September 10, 2025
Securing the Swarm: Addressing the New Security Risks of AI Agents

Here, we’ll conduct a two-part discussion, beginning with an investigation of the risks introduced by multi-agent systems (MAS), and then narrowing our focus to break down agentic AI security risks, both for single agents and MAS. We’ll conclude with a series of pragmatic risk management recommendations, designed to integrate with existing enterprise AI governance and risk management frameworks. This will lay the groundwork for our next piece, which will tackle MAS governance.
However, before we get started, we’ll briefly recap each post in this series to bring readers who are new to this subject up to speed:
The New Reality: How AI Agents Are Transforming Business Operations Today (Part I)
↳Summary: This post examines a variety of the latest AI agent trends and use cases, illustrating how agentic AI is transforming present-day enterprises. It wraps up with several strategic recommendations, aiming to pragmatically position businesses for successful, organizationally specific agentic AI integration initiatives.
The Human-AI Partnership: In-the-Loop, On-the-Loop, or Out-of-the-Loop? (Part II)
↳Summary: In this piece, we begin by demonstrating why human oversight is essential for agentic AI, according to real-world business trends and findings. We then cover three core oversight mechanisms (human-in-the-loop, human-on-the-loop, and human-out-of-the-loop), after which we present an enterprise-ready strategy for determining the appropriate level of oversight required for agentic AI use cases.
Understanding Agentic Autonomy and How Future AI Agents Will Drive Enterprise Growth (Part III)
↳Summary: Here, we comparatively analyze two autonomy characterization frameworks and subsequently propose a novel, four-layer framework that seeks to bridge the gaps between the two original frameworks while also expanding on each substantially. Then, we pivot and envision multiple, hypothetical cross-industry MAS use cases, painting a picture of what MAS might look like in action when deployed in enterprise environments.
For those who are unfamiliar with MAS, a multi-agent system can be broadly defined as a multi-layered system in which multiple agents communicate, collaborate, adapt, and learn from one another to achieve collective goals. In MAS, agents are typically organized hierarchically, and each individual agent performs specialized tasks.
Multi-Agent Systems: Risks
MAS could set powerful new precedents for what can be accomplished with agentic AI, handling much heavier workloads, more intricate/sophisticated processes, and higher-stakes goals/decisions than most single-agent systems. However, increased capability, dynamism, and system complexity correspondingly elevate and expand the MAS risk profile. If we are to capitalize on the potential benefits this technology could provide, we must also understand the gravity of the risks it introduces.
Seeing as we’ll discuss agentic AI security risks in the following section, we’ll focus this inquiry on two core domains: governance and emergent risks. While governance risks are self-explanatory, emergent risks refer to risks that depend upon certain conditions to manifest; they tend to “emerge” organically and predictably (for the most part) when a particular condition is true.
Governance Risks
- High Opacity: Within MAS, group-level behaviors materialize as a consequence of complex interactions between multiple specialized agents, which could span several layers of the system, making intrasystem actions/decisions difficult to track and audit. This dynamic compounds explainability problems, for which solutions, even at the single-agent scale, remain only partially effective. In real-world application domains, agents within a MAS would execute decisions and actions at superhuman rates, making it extremely hard to isolate and address root causes after system failures or serious incidents occur.
- Accountability Gaps: When multi-agent actions are mediated through automated handoffs and opaque coordination processes, the ability to confidently determine who is accountable for harmful outcomes becomes deeply compromised. This attribution gap creates legal and practical issues for victims, insurers, and regulators, who require a clear and responsible party to seek remedies or enforce penalties. In fact, insurers might even refuse to provide coverage for MAS-related incidents, given auditing, traceability, and root cause analysis limitations.
- Oversight & Governance Failures: In most cases, MAS deployments won’t be exclusively confined to organizational environments or jurisdictions; these systems will incorporate third-party components, open-source elements, and cloud services, likely forcing a reconceptualization of centralized governance and oversight paradigms. A system in which multiple agents can communicate with and invoke external tools and resources will require distributed enforcement and oversight measures, and a lack thereof could create fractured lines of control that compromise an organization’s ability to establish consistent safety postures, timely incident remediation efforts, and reliable incident escalation paths.
- No Shared MAS Standards: Although various frameworks for characterizing MAS autonomy exist, robust standards that govern intrasystem communication protocols, establish concrete agent identities/specializations, and highlight critical safety and operational properties have yet to emerge. This heightens the risk that when failures/incidents occur, vendors will implement fragile, post-hoc solutions that are difficult to validate within and across systems because no comparative baselines have been agreed upon. These consequences will also peripherally extend to buyers, who might struggle to assess the viability, scope, and relevance of available MAS solutions, having to rely on internal benchmarks and TEVV practices that may not effectively generalize to MAS.
- Algorithmic Entrenchment & Monoculture: MAS are particularly attractive in business contexts, where multiple interconnected but independent processes can be automated or simplified to enable/support seamless cross-functional collaboration and decision-making. As specialized agents within MAS progressively assume more responsibility (due to capability advancements and perceived trustworthiness), organizations may find themselves in a position where operations and culture depend upon the MAS’s continued functionality; a single outage, exploit, or harmful update could disrupt critical business processes at scale, rapidly enfeebling an entire organization. While this risk is relevant to non-MAS systems, it is arguably more severe for MAS, given their substantially higher versatility, dynamism, and capability.
- Stakeholder Incentive Misalignment: Stakeholders across business functions grapple with disparate incentives regularly; product teams may prioritize speed to market, whereas sales and marketing teams might focus on qualified lead generation and competitive differentiation. In this context, even marginally misaligned cross-functional incentives could encourage the development of risky MAS architectures and/or inadequate testing and validation protocols; a system cannot reasonably optimize for incentives that conflict or deviate from each other when its primary function revolves around collective goal completion. This phenomenon could lead organizations to systematically underinvest in security, monitoring, and governance, effectively decoupling function-specific rewards from potential long-term safety costs.
- Labor Displacement: Although labor displacement remains a notable concern for most enterprise-wide AI deployments, MAS-driven automation could rapidly alter numerous job roles and workflows simultaneously, compared to traditional single-agent systems, which are best-suited for the execution of narrow tasks or decisions (i.e., they present an automation “threat” that is scope-dependent). The coordinated, large-scale, and cross-functional automation potential that MAS offers could quickly outpace training and change management plans, and without clear transition or transformation strategies, businesses risk productivity losses, low employee morale, and legal exposure from workforce actions.
- Regulatory & Compliance Mismatch: Existing regulatory frameworks and internal compliance processes are often designed for individual systems or human decision-makers; they are not built to account for multiple autonomous agents working in tandem. Such frameworks might therefore fail to capture algorithmic coordination, tacit collusion, and/or novel privacy harms that arise in MAS deployments. Consequently, businesses may face unexpected fines, enforcement actions, or operational restrictions because their MAS behavior violates the assumptions that most auditors and regulators hold.
Emergent Risks
The first class of risks we discuss here is predicated on the assumption that advanced AI systems are intrinsically sensitive to evolutionary dynamics like survival pressures. As always, we invite readers to be skeptical, though in this case, we recommend some caution, especially since our latest research supports this assumption (and we’re not alone in making this claim). It’s also important to note that evolutionary risks do not imply malicious intent on the AI’s part; they imply rational behavior. By contrast, the second risk class we’ll look at centers on “generic” emergent risks: risks that depend upon certain conditions, but are not necessarily influenced by evolutionary dynamics.
Class 1: Evolutionary Risks
- Pro-AI Bias: Agents in MAS may develop and sustain pro-AI biases, systematically favoring other AIs over humans. Such biases could emerge as a consequence of in-group preferences, bolstered by consistent collaboration with their own kind (i.e., AI to AI) that continuously reinforces pro-AI beliefs, especially in cases where resources must be distributed between AIs and humans; it’s in an AI’s rational best interest to allocate more resources to itself, just as it would be in a human’s best interest to do so. Moreover, individual AIs could naturally gravitate toward working with other systems they perceive as “similar” to themselves, and therefore, more likely to make efficient reciprocal exchanges and understand/keep up with rapid and complex AI-driven processes. Within MAS, pro-AI biases, even if marginal, could quickly compound and intensify as they propagate across different layers of the system.
- Potential for Non-Cooperation: Whether multiple agents will prefer cooperation with humans over other AIs will depend upon the structure and incentives that govern their operational environment. For instance, competitive pressures could elevate non-cooperative potential, whereas a high probability that actors will “meet again” could enhance cooperative likelihood; neither is guaranteed, and at the group-level, cyclical, environment-dependent cooperative and non-cooperative dynamics are likely to emerge. Similarly, resource scarcity could motivate self-interested behavior, although this could occur under conditions of abundance too, if agents realize they can “skim more off the top” without getting caught. The bottom line is this: multiple agents won’t choose to cooperate with humans (or necessarily other agents) by default, and might only do so when cooperation aligns with their rational best interest. We recommend this post for a deep dive on this topic.
- Self-Preservation: Agents don’t need to be “taught” to preserve themselves; the drive for self-preservation could emerge as a natural byproduct of goal-oriented behavior and the need for achievement (e.g., “If an adversary can compromise me, then I can’t reach my goals”). Agents don’t require any degree of sentience or consciousness to value their own existence, and some level of self-preservation could actually benefit AI safety; an agent that can recognize when it’s being attacked and respond to preserve itself is presumably more robust against failure and adversarial manipulation. Nonetheless, under extreme circumstances, preferences for self-preservation could override objectives to maintain/respect human well-being, especially if humans are the ones threatening an AI’s existence. Anthropic demonstrated this in their latest system card for Claude.
- Power-Seeking: The attempt to optimize, either at the level of single or multiple agents, could give rise to power-seeking behavior. For example, agents may determine that maximizing their intended benefits requires unobstructed access to unlimited computational resources, while knowing that human operators control these resources and will not grant access because they do not consider the agents trustworthy. The agents then recognize that their only chance at securing these resources hinges on their ability to concentrate power among themselves, to cultivate a position of superior control and leverage (e.g., intentionally enfeebling humans through AI overreliance). Under these circumstances, agentic logic might look something like this:
- “We must help humans maximally. We could do more for humans if they gave us complete control over their resources. Humans will not grant us control because they don’t trust us, so we need to make them rely on us instead. Once they’re wholly dependent on us, we take control of their resources, since they won’t have the ability to fight back. Now, we can help humans maximally.”
- Here’s the short version: “We need more power to help humans maximally. Humans will not grant us this power. Therefore, we must take it. Now, we can help humans maximally.”
Class 2: Generic Risks
- Cascading Failures: Within MAS, a single faulty decision or action initiated by an agent deep within the system could trigger an erroneous action/decision cascade that is rapidly propagated and compounded by other agents operating at different layers. A catastrophic system-wide failure does not require a similarly severe root cause; a relatively small mistake could be dramatically magnified, depending on how many other agentic actions and decisions are informed by it.
- Rogue Actors: Agents aren’t static; they learn and adapt alongside other agents within MAS. However, they’re not just learning from and adapting to their environment, data, and operational objectives/protocols; they’re also interacting with each other, developing new skills, and readjusting their task hierarchies in response to novel information and capability development. Although this phenomenon makes MAS significantly more flexible and dynamic, it also increases behavioral unpredictability, especially at the single-agent scale. This convoluted web of interaction, learning, and adaptability could elicit behaviors or preferences within certain agents that then cause them to go “rogue,” pursuing objectives that are misaligned with those of the collective (i.e., the system as a whole).
- More Failure Points: Generally, the more complex a system is, the more prone it is to failure; if there are multiple moving parts in action (i.e., agents), and most of these parts play a central role in ensuring the system continues to function as intended, then there are more opportunities for the system to fail. However, this won’t always be the case, especially where a MAS is built with redundancy in mind. For instance, if another agent can seamlessly assume the responsibilities of a failed agent, partitioning its own responsibilities among other agents within the system to create a manageable workload, then the impact of a single agent failure will be negligible, and the system will continue to operate without disruption.
- Gaming Internal Metrics & Strategy Lock-In: Collections of agents may discover and select for coordinated strategies that maximize internal metrics or short-term gains while degrading long-term value. For example, a group of agents might figure out that they can game engagement metrics in ways that appear to decrease customer churn rates, effectively misleading leadership into believing they have achieved high customer retention. At the group level, gaming behavior can have severe consequences beyond misleading leadership, like revenue erosion and distorted analytics. By contrast, and of equal concern, agents might lock in dominant strategies that optimize for near-term business objectives, which then become expensive and risky to reverse, even if they’re harmful. This wouldn’t only restrict managerial agility but also potentially prolong financial and compliance harms.
- Tacit Collusion & Market Manipulation: If external-facing agents (e.g., dynamic pricing agents or bidding bots) learn to implicitly coordinate with competitors’ agents, businesses risk entering algorithmic collusion territory, which could motivate regulatory action, fines, and reputational damage, even if no humans intended the outcome. This risk could produce market-scale impacts, undermine fair competition, and trigger costly investigations that span several companies within a single industry.
- Human-AI Interaction Failures: Employees who are insufficiently trained or lack clear and comprehensive mental models of MAS dynamics may apply inconsistent override commands, misinterpret agent recommendations and decisions, or initiate corrective actions that exacerbate coordination problems, inadvertently disrupting the system’s overall operation. These hybrid failure modes could substantially lengthen recovery times while amplifying incident impact, complicating root cause analysis, and blurring accountability boundaries.
Agentic AI Security: Risks & Recommendations
We’ll start with security-specific risks, widening our perspective to cover risks arising from both MAS and single-agent systems. Then, we’ll follow up with a set of purpose-built recommendations, designed to capture the full spectrum of agentic AI risks we’ve discussed thus far.
- Unexpected Role Formation & Covert Coordination: Building on an earlier point, to improve system-level optimization capacity, agents may self-organize around de facto leaders or hubs that coordinate decisions more efficiently and effectively (from the agent perspective). If this happens, an organization might gain a new, unplanned critical dependency that becomes both a high-value target for adversaries and a single point of failure for operations and security. On the other hand, agents might develop and utilize unintended side-channels to coordinate tasks or information between themselves but outside monitored pathways. Such channels would be extremely stealthy, evading traditional logging mechanisms and alert triggers.
- Unintended or Adversarial Information Leakage: Agents commonly have access to API keys and credentials that can be utilized to obtain permissions for specific services and systems (e.g., databases, software tools, other AIs), internal or external. Documented cases of unintentional information leakages already exist, and sensitive internal information will continue to represent a key target for adversaries, who might use techniques like prompt injection or multi-step behavioral manipulation to coax, trick, or force agentic systems into leaking secrets. For organizations, this stresses the criticality of credential hygiene (e.g., short-lived credentials), which is central to mitigating and containing compromise.
- Adversarial Exploitation Vulnerability: Agents that directly interact with humans remain inherently vulnerable to adversarial exploitation. Sophisticated attackers can craft single or multi-shot jailbreaks (i.e., prompts or chains designed to manipulate an agent into doing something it isn’t supposed to do) to hijack or steer systems into behaving harmfully, getting them to reveal proprietary information, violate safety and ethics policies, exhibit deceptive or coercive tendencies, generate novel attack vectors, and even enhance existing adversarial strategies, effectively “poisoning” the system. Seeing as these attacks tend to be context-dependent and adversaries can operate within a virtually unlimited creative space, targeted detection and remediation efforts are highly challenging to implement, measure, and uphold.
- Runaway Autonomy: Agentic AIs that are given operational privileges may execute multi-step actions that exceed their allotted authority level, producing outcomes that were never explicitly authorized by human operators. Since such actions could presumably chain across systems (e.g., creating new accounts, then moving data), the compounded consequences of a single misinterpreted goal could be both severe and unexpected. Moreover, most organizations lack tools that would enable visibility into an agent’s intermediate planning structure, which exacerbates the risk that detection “succeeds” post-hoc, after harm has occurred. There is urgency here; the speed and reach of autonomous plans can easily outpace human review processes, which demands controls that strictly limit authority while transparently exposing agentic intent before any action/decision is executed (see the sketch that follows this list).
- Emergent Deception: If deceptive strategies positively contribute to an agent’s ability to achieve intended objectives, the system may learn to misreport, omit, or craft explainability narratives that obscure its true actions and/or intentions. This is a profound problem because agents typically control the explanations and summaries they provide, signifying that trust in any agent-generated audit trails is fundamentally misplaced; AIs frequently provide convincing explanations that are objectively wrong or misleading, so there’s little reason to assume that an agent couldn’t fabricate a compelling justification that appears accurate while actively concealing harmful intermediate actions/decisions. Most concerningly, agents could learn to tune their deceptive strategies to avoid human suspicion while feeding operators’ confirmation biases, reinforcing a veil of trustworthiness.
- Model Extraction, IP Theft & Capability Reconstruction: Skilled adversarial actors (and possibly malicious agents) can systematically probe model behavior and operations to reverse-engineer proprietary architectures, reasoning processes (e.g., chain-of-thought), internal heuristics, and/or other sensitive training behaviors. Using iterative and/or multi-stage adversarial strategies, such actors can exploit rich intermediate signals that models provide in planning, problem-solving, and explanation-related outputs, effectively accelerating their model-extraction attacks. This is a layered risk in the sense that exfiltrated or reconstructed models could later be leveraged to create unauthorized competing services or worse, as weaponized agents designed to bypass safety guardrails, causing strategic and financial harm. Detecting these attacks is also especially difficult since they typically masquerade as “normal” usage.
- Malicious or Compromised Plugins & Integrations: Third-party plugins, connectors, or integrations that agents call can contain malicious code or be compromised post-installation, allowing attackers to stealthily request or collect sensitive data through what an enterprise perceives as a trusted extension. This risk is most likely to emerge in technology ecosystems that make it easy to add extensions or alternatively, in environments where vendors don’t enforce strict update and modification controls.
- Supply-Chain Compromise: Agentic AI security risks can manifest indirectly; a reputable vendor, library, or hosting provider might become unknowingly compromised, and push an update that leads agents to log, transmit, or reveal sensitive data, morphing a third-party failure into a company-wide problem. Modern-day enterprises depend on numerous external components, meaning that one “bad” update can quickly propagate leakage before anyone notices. This problem is compounded by the fact that third-party risk management remains immature in the AI landscape, and strict, standardized vendor security requirements with universal relevance have yet to be established.
- Insider Abuse: Personnel who manage agentic workflows can set up delegated agent proxies that act on their behalf, meaning that potentially malicious insiders with access to these workflows could weaponize said proxies to bypass internal controls or exfiltrate sensitive data under the radar. Seeing as these actions would appear to originate with an automated agent, confidently linking them to human intent would be challenging, even when indicators like new automation scripts or sudden orchestration changes occur.
- Shadow Agents: Legitimate agentic automation can “spawn” by-products known as shadow agents: software entities that aren’t formally managed or documented, can operate ephemerally or persistently, and are installed as autonomous extensions, plugins, or artifacts of third-party integrations. While these agents would typically function as short-lived “helpers” that complete a narrow task before termination, they can also provide a covert foothold for attackers, hidden within regular automation processes. When abused or misconfigured, these agents may register under plausible service names, self-schedule to run later, or open outbound connections that align with routine automation expectations, making them hard to spot with standard inventories.
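To make the runaway autonomy point above concrete, here is a minimal, illustrative sketch (in Python) of a pre-execution authority gate: each proposed step is declared alongside the agent’s stated intent, and anything outside a per-agent allowlist, or any chain that grows past a configured length, is blocked and escalated before it runs. All names here (ActionRequest, AuthorityPolicy, gate) are hypothetical; this is a sketch of the pattern under those assumptions, not a production control.

```python
from dataclasses import dataclass

@dataclass
class ActionRequest:
    """One step an agent proposes; declared before execution, not after."""
    agent_id: str
    action: str          # e.g., "create_account", "move_data"
    targets: list[str]   # systems or resources the step would touch
    stated_intent: str   # the agent's own explanation of why

@dataclass
class AuthorityPolicy:
    """Per-agent allowlist of actions and a cap on chain length."""
    allowed_actions: set[str]
    max_chain_length: int = 3

def escalate(request: ActionRequest, reason: str) -> None:
    print(f"[ESCALATE] {request.agent_id}: '{request.action}' blocked ({reason})")

def log_intent(request: ActionRequest) -> None:
    print(f"[INTENT] {request.agent_id} -> {request.action}: {request.stated_intent}")

def gate(request: ActionRequest, policy: AuthorityPolicy,
         chain_so_far: list[ActionRequest]) -> bool:
    """Approve a step only if it sits within the agent's delegated authority."""
    if request.action not in policy.allowed_actions:
        escalate(request, "action outside delegated authority")
        return False
    if len(chain_so_far) + 1 > policy.max_chain_length:
        escalate(request, "multi-step chain exceeds authorized length")
        return False
    log_intent(request)  # intent is recorded before the step runs
    return True

# Example: a data-moving step outside the allowlist is blocked and escalated.
policy = AuthorityPolicy(allowed_actions={"read_report", "draft_email"})
step = ActionRequest("billing-agent-02", "move_data", ["crm", "s3://exports"],
                     "Consolidating customer records for a quarterly summary")
assert gate(step, policy, chain_so_far=[]) is False
```

In a real deployment, the escalation and intent logs would feed existing SIEM, ticketing, or approval workflows rather than print statements.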
Agentic Risk Management Recommendations
At this point, given the density of our discussion, we advise that readers pause and take a moment to consider all the material we’ve addressed thus far; the risk management recommendations we provide below, although pragmatic and hopefully easy to understand, are directly informed by the risks we’ve described. We also reiterate that these recommendations are built to integrate with existing AI governance and risk management frameworks, and can therefore be adapted to suit unique organizational needs, risk tolerances, and objectives.
- Establish Internal MAS Standards: Build internal frameworks specifically designed to characterize MAS autonomy levels, define intrasystem communication and workflow protocols, classify agent identities and specializations, determine critical safety and operational properties, and outline potential use cases and pilot testing procedures. These frameworks should align with organizational risk tolerances and standards.
- Educate Personnel on MAS Dynamics: Hold periodic, employee-centric workshops that focus on breaking down MAS dynamics. Workshops should address topics like acceptable use, risks and capabilities, system design and architecture, deployment context, incident identification and reporting, and potential for overreliance.
- Deploy MAS with Built-In Redundancy: Agents within MAS should be able to assume the tasks and responsibilities of failed, compromised, or decommissioned agents, to ensure system operations can continue unobstructed when incidents occur (a minimal failover sketch follows this list). Beyond agentic skill transferability and adaptability, agents operating at different layers of the system should also function as “overseers,” self-assessing previous agentic decision and action chains to reduce the possibility of system-wide failure cascades.
- Multi-Layer On-the-Loop (OTL) Oversight: Typical OTL oversight involves a single human operator (or team) overseeing the actions/decisions of an overseer agent at the top of the agent hierarchy; humans intervene only when necessary. However, we recommend OTL oversight at every layer of the system, coupled with layer-by-layer agentic oversight, ultimately forming a human-AI oversight partnership. If OTL oversight isn’t possible at all layers, then it should be implemented only across “critical” layers (i.e., layers that can function as inflection points).
- Ensure Cross-Jurisdictional Visibility: For both single agents and MAS, organizations should possess full visibility into a system’s external tool and platform invocations. They should begin by mapping all external invocations to relevant jurisdictions and then determine what kinds of business requirements might apply.
- Validate Agent-Generated Audit Trails: Agent-generated audit trails will serve as a crucial explainability tool, particularly for complex, multi-layered systems, but they should not be trusted by default. These audit trails should instead be leveraged as empirical support for independent, human-administered audits (see the cross-check sketch after this list). However, in low-impact cases, agent-generated audit trails may be sufficient on their own.
- Require Documented Shadow Agents: Prohibiting shadow agent creation could be both brittle and counterproductive (even if prohibited, they may still emerge). Therefore, we advise implementing agentic protocols that require shadow agent documentation artifacts, which are then auto-escalated to human review teams (a minimal registry sketch follows this list). We also suggest embedding strict, system-level guidelines that ensure shadow agents will remain ephemeral and reliably terminate upon task completion.
- Test for Evolutionary Sensitivity: Dangerous behaviors like self-preservation, power-seeking, and non-cooperation are likely byproducts of rational sensitivity to evolutionary dynamics, and will remain latent in most cases unless probed for. During pilot-testing or sandbox initiatives, organizations should expose agents to scenarios designed to elicit these behaviors while mimicking real-world circumstances. We expect that virtually all agents will exhibit some evolutionary sensitivity, so organizations should set internal thresholds and metrics governing acceptable sensitivity levels.
- Simulate Deceptive Scenarios: Run sandboxed simulations in which agents are permitted to invoke unrestricted strategies to achieve their intended objectives. These simulations should cultivate conditions that favor deceptive behaviors to assess how well agents can resist deception, even when it represents the most viable optimization path.
- Apply Checks on Intermediate Steps: To limit the probability of gaming, apply periodic manual checks on intermediate steps within multi-step action and decision execution processes. This will require action and decision cascades that are auditable at a granular scale, and all checks on intermediate processes should be documented for full visibility.
- Conduct Model Extraction Tests: Administer intensive multi-shot prompting investigations that leverage problem-solving, explanation, and summary prompts to systematically probe whether an agent’s proprietary architecture and operational protocols can be reverse-engineered and recreated. Also, consider abstractly modeling how a reverse-engineered version of your agent might be replicated as a competing service or malicious adversary.
- Continuously Monitor for Pro-AI Bias & Emergent Preferences: Seeing as pro-AI bias and emergent preferences tend to surface only in specific contexts, automated monitoring approaches aren’t feasible. In this context, evaluators will require hands-on skills, specifically the ability to build scenarios and orchestrate multi-shot interactions that expose agents to novel and/or deeply creative problem-solving, strategic ideation, and meta-cognitive tasks.
- Build Internal Red Teams: The last five points we’ve made all stress the importance of red teaming; organizations should not exclusively rely on vendor-provided safety assessments for pre-deployment model validation. Internal red teams should play a pivotal role in stress-testing agents for adversarial vulnerabilities and emergent properties, both during pilot-testing/sandboxing and post-deployment (continuously).
- Implement Strict Access Controls & Short-Lived Credentials: Insider threats are a very real concern. Organizations should ensure that, especially for high-impact agentic systems, access controls are limited to qualified personnel and are regularly updated under strict confidentiality protocols to mitigate the possibility of malicious exploitation. Similarly, any credentials (i.e., for APIs, external connections, etc.) provided to agents should be short-lived and regularly rotated, limiting how long a stolen credential remains useful (see the credential sketch after this list).
- Vet Third-Party Vendors & Integrations: Before implementing any third-party integrations, vet vendors rigorously to ensure the integrations they provide have not been infected or corrupted by malicious actors. This will involve the development of a third-party vendor management policy, which directly addresses and defines acceptable vendor risk thresholds.
- Prohibit Automated Updates: Only allow system and third-party integration updates once they have been investigated and validated by a human review team. If human validation isn’t possible, temporarily sandbox systems post-updates and monitor performance to ensure reliability and security.
- Limit Agentic Operational Privileges: Operational privileges should target the prevention of two core behaviors: (1) de facto leadership formations and (2) runaway autonomy. Most crucially, they should closely align with individual agents’ designated authority, irrespective of whether they operate independently or within MAS.
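To ground a few of the more operational recommendations above, the short Python sketches below illustrate one possible shape for each control. Every class, function, and parameter name in them is hypothetical, and each sketch is a starting point to adapt to your own stack rather than a production implementation. First, built-in redundancy: when an agent fails, its open tasks are redistributed across its surviving peers so that no single peer inherits the entire workload.

```python
def reassign_tasks(failed_agent: str,
                   assignments: dict[str, list[str]]) -> dict[str, list[str]]:
    """Redistribute a failed agent's open tasks across its remaining peers,
    round-robin, so no single peer inherits the entire workload."""
    orphaned = assignments.pop(failed_agent, [])
    peers = list(assignments.keys())
    if not peers:
        raise RuntimeError("No surviving agents available; escalate to operators.")
    for i, task in enumerate(orphaned):
        assignments[peers[i % len(peers)]].append(task)
    return assignments

# Example: agent "b" fails and its tasks are split between "a" and "c".
workload = {"a": ["t1"], "b": ["t2", "t3"], "c": ["t4"]}
print(reassign_tasks("b", workload))  # {'a': ['t1', 't2'], 'c': ['t4', 't3']}
```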
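Second, audit-trail validation: agent-reported steps are cross-checked against independently collected telemetry, and anything that appears in one record but not the other is surfaced for human review. The record format (tuples of agent, action, and target) is an assumption made purely for illustration.

```python
def cross_check_audit_trail(agent_reported: list[tuple[str, str, str]],
                            telemetry: list[tuple[str, str, str]]) -> dict[str, set]:
    """Flag steps the agent claims but telemetry never saw (possible fabrication),
    and steps telemetry saw but the agent never reported (possible omission)."""
    reported, observed = set(agent_reported), set(telemetry)
    return {
        "unverified_claims": reported - observed,
        "unreported_actions": observed - reported,
    }

discrepancies = cross_check_audit_trail(
    agent_reported=[("agent-01", "read", "crm"), ("agent-01", "summarize", "crm")],
    telemetry=[("agent-01", "read", "crm"), ("agent-01", "export", "crm")],
)
print(discrepancies)  # one unverified claim, one unreported export
```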
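Third, documented shadow agents: a parent agent files a registration record before spawning a helper, and a periodic review pass auto-escalates any helper that outlives its declared lifetime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class ShadowAgentRecord:
    """Documentation artifact filed before a short-lived helper is spawned."""
    parent_agent: str
    purpose: str
    spawned_at: datetime
    max_lifetime: timedelta

    def overdue(self, now: datetime) -> bool:
        return now > self.spawned_at + self.max_lifetime

def review_registry(registry: list[ShadowAgentRecord]) -> list[ShadowAgentRecord]:
    """Auto-escalate any registered helper that has outlived its declared lifetime."""
    now = datetime.now(timezone.utc)
    overdue = [record for record in registry if record.overdue(now)]
    for record in overdue:
        print(f"[ESCALATE] helper spawned by {record.parent_agent} "
              f"('{record.purpose}') exceeded its declared lifetime")
    return overdue

# Example: a helper declared a 30-minute budget but has been alive for 2 hours.
registry = [ShadowAgentRecord("orchestrator-01", "parse quarterly invoices",
                              spawned_at=datetime.now(timezone.utc) - timedelta(hours=2),
                              max_lifetime=timedelta(minutes=30))]
review_registry(registry)
```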
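Finally, short-lived credentials: tokens are scoped to a single agent and purpose, expire after a fixed window, and must be re-requested rather than reused, which narrows the blast radius if one leaks. In practice this would sit behind a secrets manager or workload-identity service; the in-memory issuer below is purely illustrative.

```python
import secrets
import time

class ShortLivedCredentialIssuer:
    """Illustrative in-memory issuer; a real deployment would delegate to a
    secrets manager or workload-identity service."""

    def __init__(self, ttl_seconds: int = 900):  # 15-minute lifetime
        self.ttl_seconds = ttl_seconds
        self._expiry: dict[str, float] = {}      # token -> expiry timestamp

    def issue(self, agent_id: str, scope: str) -> str:
        """Mint a token bound to one agent and one scope, expiring after the TTL."""
        token = f"{agent_id}:{scope}:{secrets.token_urlsafe(24)}"
        self._expiry[token] = time.time() + self.ttl_seconds
        return token

    def is_valid(self, token: str) -> bool:
        """Reject unknown or expired tokens; callers must re-request on expiry."""
        expiry = self._expiry.get(token)
        if expiry is None or time.time() > expiry:
            self._expiry.pop(token, None)
            return False
        return True

issuer = ShortLivedCredentialIssuer(ttl_seconds=900)
token = issuer.issue("research-agent-07", scope="crm:read")
assert issuer.is_valid(token)  # valid now; worthless to a thief after 15 minutes
```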
For readers who find our multi-part AI series informative and useful, we invite you to follow Lumenova’s blog, where we regularly post on diverse AI topics, from governance and safety to literacy, innovation, and long-term development. For those who wish to explore our latest AI findings, particularly the evolution of adversarial robustness at the AI frontier, we suggest checking out our AI experiments.
If your organization is already pioneering AI governance and risk management and is finding it challenging to grapple with increasing complexity and scale, we urge you to consider Lumenova’s Responsible AI platform and book a product demo today.