August 13, 2025

The Human-AI Partnership: In-the-Loop, On-the-Loop, or Out-of-the-Loop?


In Part I of this series, we began by examining a wide selection of notable AI-specific business trends and statistics, after which we took a deep dive into present-day agentic AI use cases across multiple industries, from finance to supply chain operations and cybersecurity. We concluded with a set of targeted recommendations for capturing and sustaining agentic AI’s value in enterprise environments.

Here, we’ll extend our discussion, focusing on three key human oversight mechanisms: in-the-loop, on-the-loop, and out-of-the-loop. We’ll start by defining each mechanism, breaking down its core characteristics, and pairing it with plausible real-world examples. Next, we’ll outline a high-level strategy designed to help enterprises select and implement oversight mechanisms aligned with their AI use cases.

We’ll wrap up with a brief overview of the core characteristics that robust oversight frameworks must uphold, followed by a few predictions about how oversight mechanisms could evolve in the near future.

However, before tackling any of this, we need to establish some context to demonstrate why AI oversight remains essential, even within a profoundly unstructured, uncertain, and immature US-based regulatory ecosystem. In this respect, we showcase several important AI governance and oversight trends below:

  • In 2024, large enterprises captured 70% of the AI governance market, a trend explained by increasingly complex compliance demands and driven by the need to integrate explainability, auditability, and risk management mechanisms and protocols directly into organizational workflows.
  • In the same year, AI oversight solutions (e.g., continuous monitoring tools, dashboards, and compliance modules) comprised 66% of the AI governance market, highlighting the growing popularity of key AI oversight functions like bias detection, explainability modules, compliance audits, and automated governance platforms.
  • 2024 further revealed a major gap (42%) between AI deployment expectations and reality, with leading causes including a fragmented regulatory environment, poorly managed third-party models, and a lack of concrete governance ownership and accountability.
  • According to Cential Consulting, high-profile corporations are actively shifting gears from siloed AI risk management strategies to enterprise-wide, integrated risk platforms, designed to support quantitative, real-time, and predictive AI-powered risk assessment. This emphasizes the holistic nature of AI oversight and the pivotal role that AI itself can play in enhancing oversight practices.
  • The World Economic Forum AI Governance Alliance now totals 500 organizations and 644 partners globally, all of which work together to advance and accelerate the adoption of international standards and best practices, pragmatically dedicated to bolstering human potential and AI progress while preserving societal strength and integrity in the face of rapid AI innovation.
  • Deloitte identifies four categories of AI risks, ranging from internal to external: (1) enterprise (e.g., app security, sensitive data protection), (2) AI capabilities (e.g., prompt injection, data poisoning, evasion), (3) adversarial AI (e.g., malware, impersonation fraud, phishing), and (4) market (e.g., regulatory uncertainty, complex ecosystem management, legal risks). Across all these categories, AI oversight, specifically monitoring solutions, can and will be crucial to mitigating related AI risks.
  • In a separate 2024 study, Deloitte revealed that almost 60% of US respondents had significantly increased their cybersecurity investments since 2023, predominantly due to advanced AI integration initiatives. This strongly suggests that cybersecurity will become an AI oversight cornerstone in the age of frontier AI.
  • Enterprise investments in AI ethics are estimated to rise to 5.4% of total AI budgets by 2026, up from 2.9% in 2022 and 4.6% in 2024. This sends a potent signal that AI-enabled enterprises are now prioritizing transparency and responsible AI (RAI) practices at scale, despite regulatory uncertainty.
  • Findings by Harvard Law show that 60% of S&P 500 companies cite “material risk” concerning AI in official governance disclosures, demonstrating that trust, reputation, and societal impact have evolved into critical board-level concerns, not merely low-level operational issues.
  • According to a 2024 TechTarget survey, the cloud segment of the AI governance market is projected to be among the fastest-growing segments, reinforcing the importance of cost-effective, rapid scaling of AI governance and oversight practices across enterprise domains.

Breakdown: In-the-Loop, Out-of-the-Loop, On-the-Loop

Below, we’ll break down three key AI oversight mechanisms: in-the-loop, out-of-the-loop, and on-the-loop.

In-the-Loop

Definition: In-the-loop (ITL) oversight represents the most direct form of AI oversight. A human is directly responsible for verifying and validating the AI’s decision-making and/or action-execution process. This means that a system can’t act autonomously unless it has received or undergone human approval or intervention. Models subject to ITL oversight tend to function as advisors or assistants.

Analogy: ITL is like walking your dog on a leash; they can sniff and wander a bit, but they can’t move forward or cross the street without you tugging the leash. You’re always making the final call.

Core Characteristics

  • Labor-Intensive: Human review and approval are required before any action or decision is taken.
  • Decelerated Decision-Making: Due to human intervention and guidance requirements, AI decision and action processes will be significantly slower than those of a system with minimal or no oversight.
  • High-Risk Scope: Well-suited for high-risk AI systems, where decision or action-related impacts can be severe.
  • No Autonomy: An AI system is wholly restricted from pursuing any actions or decisions without human intervention.
  • Clear Accountability: If an AI pursues harmful decisions or actions, a human can be held directly accountable.
  • Transparency by Necessity: An AI system must meet a certain transparency and explainability threshold, such that its actions and decisions are easily interpretable to human overseers.
  • Enhanced Control: A human can ensure that the actions or decisions an AI initiates are aligned with ethical, legal, and social standards and that its behavior remains consistent with its intended purpose and use.

Examples

  1. A diagnostic AI identifies and detects early cancer signs in imaging scans, generating medical reports that must be verified by a physician before a diagnosis is given or treatment is initiated.
  2. A legal AI assistant drafts sections of a legal contract, but the contract must first be reviewed, edited, and signed off by a lawyer before it’s shared or finalized.
  3. A manufacturing AI monitors industrial machinery and, after spotting a potential mechanical failure, recommends halting the production line, though technician approval is required for shutdown.
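
To make the ITL pattern concrete, here’s a minimal Python sketch of an approval gate. The ProposedAction type and the ask_human_to_approve step are hypothetical stand-ins for a model’s proposed output and your review workflow; the point is simply that nothing executes until a human explicitly signs off.

```python
from dataclasses import dataclass


@dataclass
class ProposedAction:
    description: str
    rationale: str  # the model's explanation, needed for ITL transparency


def ask_human_to_approve(action: ProposedAction) -> bool:
    # Hypothetical review step: a human reads the proposal and approves or rejects it.
    answer = input(f"Approve '{action.description}'? Rationale: {action.rationale} [y/N] ")
    return answer.strip().lower() == "y"


def execute(action: ProposedAction) -> None:
    print(f"Executing: {action.description}")


def run_in_the_loop(action: ProposedAction) -> None:
    # The system cannot act until a human explicitly approves; rejection means nothing happens.
    if ask_human_to_approve(action):
        execute(action)
    else:
        print("Action rejected; nothing was executed.")


run_in_the_loop(ProposedAction("Halt production line 3", "Vibration anomaly detected on bearing 7"))
```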

Out-of-the-Loop

Definition: Out-of-the-loop (OFTL) oversight is the most “laissez-faire” form of AI oversight. A human never monitors or intervenes in an AI’s decision-making and/or action-execution process, and a system is left to behave with full autonomy. In simple terms, OFTL indicates that a system isn’t subject to any human oversight.

Analogy: OFTL is like sending a spacecraft on an autonomous journey to Mars; once it’s launched, there’s no joystick or control panel. You must simply trust it’ll do what you designed it to do.

Core Characteristics

  • No Labor & Full Autonomy: An AI system is granted full authority to initiate decisions and actions without any human intervention or guidance.
  • Limited Control: Human operators will not be able to verify whether a system’s behavior is aligned with its intended purpose, use, and other relevant standards.
  • Maximum Speed & Scalability: A human will never intervene in action or decision processes, allowing an AI to pursue these processes at maximum speed and scale.
  • Differential Low-Risk/High-Risk Scope: Ideally reserved for tasks where actions and decisions can’t cause severe negative impacts, or for mission-critical, time-sensitive high-risk domains (e.g., missile defense systems) where waiting on human intervention would be too slow.
  • Compromised Accountability: If an AI system pursues a harmful action or decision, clear accountability will be challenging to assign, raising serious liability concerns.
  • High Opacity: A human doesn’t need to understand a system’s decision and action processes, meaning that the system isn’t required to meet a high explainability and transparency threshold.
  • Diminished Situational Awareness: Over time, humans become less familiar with system dynamics and may lose the ability to accurately predict or understand behavior.
  • Pre-Deployment Design Dependency: Human operators must rely on developers’ capacity to build robust, well-aligned, reliable, and compliant systems.

Examples

  1. AI agents make thousands of stock trades per second without any human oversight during execution, and trades are reviewed only after they’ve been made.
  2. A home AI assistant autonomously adjusts temperature, airflow, and lighting based on sensor inputs according to pre-built user preferences, with no human approval or supervision.
  3. A cybersecurity AI agent continuously monitors and neutralizes cyber threats, autonomously quarantining traffic or reconfiguring firewalls, and generating logs for the cybersecurity team’s post-hoc review.
  4. Robotic agents navigate a warehouse, retrieve items, and restock shelves based entirely on internal optimization algorithms.
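
For contrast, here’s an equally minimal sketch of the OFTL pattern, assuming a hypothetical decide policy and a threat-score threshold of our own choosing: the system acts immediately and entirely on its own, and the only human touchpoint is an append-only log reviewed after the fact.

```python
import json
import time

AUDIT_LOG = "decisions.jsonl"  # reviewed post-hoc; no human sits in the execution path


def decide(observation: dict) -> str:
    # Hypothetical autonomous policy: map an observation straight to an action.
    return "quarantine_traffic" if observation.get("threat_score", 0) > 0.9 else "allow"


def act(action: str, observation: dict) -> None:
    # Execute immediately and log the decision; there is no approval gate in this pattern.
    record = {"timestamp": time.time(), "observation": observation, "action": action}
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")


for obs in [{"threat_score": 0.95, "source": "10.0.0.7"}, {"threat_score": 0.2, "source": "10.0.0.8"}]:
    act(decide(obs), obs)
```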

On-the-Loop

Definition: On-the-loop (OTL) oversight finds a middle ground between ITL and OFTL. A human is responsible for monitoring an AI’s decision-making and/or action-execution process and can manually intervene if necessary, for example, to prohibit a specific decision or action. Systems subject to OTL are autonomous by default, and they don’t undergo routine approval for every decision or action they initiate.

Analogy: OTL can be likened to hosting a party at home; you keep an eye on your guests, food and drinks, and the general vibe, and you only intervene if things start to get a bit too hectic.

Core Characteristics

  • Dynamic Labor: Depending on how an AI system is performing, the scope of human intervention and guidance will fluctuate throughout the system’s lifecycle.
  • Partial Autonomy: Even though a system is autonomous by default, it doesn’t have full authority to pursue all actions and decisions without human intervention.
  • Speed & Scalability Tradeoff: Enables faster and more scalable decision-making than in-the-loop models, while retaining a human “safety net” in case of malfunctions or ethical issues.
  • Oversight Tuned to Scope: For high-risk deployments, oversight measures can be increased, and for low-risk deployments, they can be decreased.
  • Monitoring Interfaces Required: To maintain effective situational awareness, human operators will require real-time dashboards, alerts, or explainability tools.
  • Risk of Automation Bias: Over time, humans may come to place excess trust in the system and begin overlooking potential failures while treating all outputs as inherently credible.
  • Complex Accountability: Because oversight measures may change throughout a system’s lifecycle, assigning accountability for harmful outcomes could depend on which oversight stage the system is or was in.

Examples

  1. A recommender system leverages AI-powered filtering to remove inappropriate content in real-time, while human moderators periodically audit flagged items and override filtering decisions when needed.
  2. An infrastructure AI dynamically reroutes power in response to demand fluctuations, while engineers oversee operations and step in if cascading failure signals are detected.
  3. Autonomous drones patrol a wildfire zone, mapping a fire’s spread and intensity while a human operations center oversees their behavior, ready to redirect or ground drones when/if needed.
  4. A security AI agent autonomously detects network vulnerabilities and generates patches, which are applied automatically unless a system administrator intervenes.
  5. A trading AI agent executes high-frequency trades driven by real-time market data while a human portfolio manager watches trading activity, suspending system operations when market anomalies are identified.
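
Finally, here’s a minimal sketch of the OTL pattern, which sits between the two sketches above: the system acts autonomously by default, but a monitoring hook surfaces alerts and a human can suspend execution at any time. The decide policy and the anomaly threshold are illustrative assumptions, not a prescribed design.

```python
from typing import Optional


class OnTheLoopController:
    """Autonomous by default; a human operator can suspend or resume execution at any time."""

    def __init__(self, anomaly_threshold: float = 0.8):
        self.suspended = False
        self.anomaly_threshold = anomaly_threshold

    def human_override(self, suspend: bool) -> None:
        # Called from a dashboard or alert workflow when an operator steps in.
        self.suspended = suspend

    def decide(self, observation: dict) -> str:
        # Hypothetical autonomous policy.
        return "reroute_power" if observation["load"] > 0.9 else "hold"

    def step(self, observation: dict) -> Optional[str]:
        if self.suspended:
            return None  # the human safety net is engaged; no action is taken
        if observation.get("anomaly_score", 0) > self.anomaly_threshold:
            print("ALERT: anomaly detected; surfacing to the operator dashboard")
        return self.decide(observation)


controller = OnTheLoopController()
print(controller.step({"load": 0.95, "anomaly_score": 0.3}))  # acts autonomously
controller.human_override(suspend=True)                       # operator steps in
print(controller.step({"load": 0.97, "anomaly_score": 0.9}))  # returns None while suspended
```

Note that the only structural difference from the OFTL sketch is the suspend hook and the alert path; that small addition is what preserves the human “safety net” while retaining most of the speed and scalability benefits.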

A Basic Step-Wise Strategy for Identifying the Right Oversight Approach

Advanced AI is an organizational asset, and to ensure it delivers value as intended, it’s crucial to understand how much human intervention is needed to ensure steady performance, reliability, and safety. Here, we outline a relatively simple strategy for determining what kind of oversight (i.e., ITL, OFTL, OTL) is best suited to your specific AI use case; this strategy is predicated upon a series of questions, expanded upon below.

Step 1: What degree of autonomy does your AI exhibit?

If you’re not working with a fully autonomous system (e.g., an AI assistant), then you immediately know that ITL oversight is what you need. If you’re working with an autonomous system, then you have to choose between OFTL and OTL. How you make this choice will depend on how you answer the following questions.

Step 2: How transparent and explainable is your AI?

If your AI meets a high transparency and explainability threshold, you know that ITL is likely the best approach; we say “likely” because systems subject to OTL or OFTL can still meet explainable AI (XAI) standards, and in this context, this feature might be interpreted as a “cherry on top”. Nonetheless, if your system exhibits low-to-moderate transparency and explainability (i.e., medium to high opacity), OTL or OFTL could be better suited.

Step 3: How reliably does your AI perform?

This question (and the next) is where the differentiation process really begins. If you’re dealing with a dynamic system whose performance fluctuates regularly (even by a few percentage points above or below your internal criteria), you’ll need the ability to intervene and guide the system directly, which points to ITL or OTL; the same applies to systems that routinely undergo modifications or updates. By contrast, if your system’s performance is highly reliable, meaning that it’s consistent over time, data, and environments (e.g., model drift risk is low and accuracy is high), you can begin considering OFTL.

Step 4: What is your AI’s risk profile?

If your AI is used in critical decision-making contexts, showcases dangerous capabilities (e.g., deception), can materially, psychologically, or financially affect stakeholders, or exhibits other concerning properties (e.g., misalignment, security vulnerabilities, etc.), it may qualify as high-risk. In this case, the intuitive approach would involve at least some degree of human oversight, whether ITL or OTL.

But let’s consider an added layer of complexity: say you have a high-risk AI that performs reliably, is fully autonomous, and meets XAI standards. To add another layer, let’s say this system is used for some emergency response function (e.g., EMS dispatch) where the timeframe within which a decision or action is executed drives life or death consequences. Would human oversight be warranted here?

No, and here’s why: the degree of human oversight applied to an AI depends on the interplay among several factors, precisely the ones these questions seek to target. In the above scenario, human oversight would do more harm than good; in emergency contexts, seconds can save lives, and it’s always better to respond to an emergency even if we later find out that it was a false alarm.

Step 5: Is your AI static or dynamic?

Is your AI designed to execute predefined, repetitive tasks? Does it lack the ability to adapt in real time? Is it limited to inferences based solely on its initial training data? Does it only change when explicitly updated by humans? If you answered ‘yes’ to most of these questions, you’re likely working with a static system. If you answered ‘no,’ then you’re probably using a dynamic system, that is, one that exhibits adaptive, context-sensitive, or learning behaviors. Of course, real-world systems often exist on a spectrum between these two extremes, but this contrast helps illustrate the basic difference, and we hope readers get the point.

In essence, as a system becomes more dynamic (i.e., adapting in real time or modifying its internal state in response to new data), its behavior tends to correspondingly become more unpredictable. This dynamism often entails elevated structural and interactional complexity, which increases potential failure modes and typically reduces system explainability, especially when transparency or control constraints aren’t applied. The bottom line: trust in dynamic systems should be earned differently and maintained more carefully than in static ones.

This means that dynamic systems will usually require a level of oversight that aligns with their dynamism, and because dynamism can change throughout a system’s lifecycle, oversight levels should track these changes as necessary. Regardless, as we’ve said before, the oversight an AI should receive will be informed not only by its dynamism, but also by the other factors we’ve pointed out here.

Step 6: If things go wrong, what consequences will you face?

Of all the questions we’ve posed, this is the most pragmatic and easy to answer. If the consequences of an AI failure are severe (e.g., costly compliance penalties, litigation, large operational disruptions), it would be wise to implement oversight, even if a system doesn’t technically require it. Whether this oversight manifests in the form of periodic system performance reviews or post-hoc verification for logged decisions or actions will depend on organizational preferences and needs.
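
To make the interplay between these six questions a little more tangible, here’s a minimal sketch that encodes one plausible reading of the strategy as a decision helper. The ordering and the specific rules are our own simplifications (a real assessment will weigh these factors with far more nuance and context), but the sketch shows how the answers combine, including the emergency-response caveat from Step 4.

```python
from dataclasses import dataclass


@dataclass
class OversightAssessment:
    fully_autonomous: bool              # Step 1
    meets_xai_standards: bool           # Step 2
    performs_reliably: bool             # Step 3
    high_risk: bool                     # Step 4
    time_critical: bool                 # Step 4 (emergency-response caveat)
    dynamic: bool                       # Step 5
    severe_failure_consequences: bool   # Step 6


def recommend_oversight(a: OversightAssessment) -> str:
    # Step 1: non-autonomous systems (advisors/assistants) default to in-the-loop.
    if not a.fully_autonomous:
        return "in-the-loop"
    # Step 4 caveat: reliable, explainable, time-critical systems shouldn't wait on humans.
    if a.time_critical and a.performs_reliably and a.meets_xai_standards:
        return "out-of-the-loop"
    # Steps 3-6: unreliability, dynamism, high risk, or severe consequences all argue
    # for keeping a human able to intervene.
    if a.high_risk or a.dynamic or a.severe_failure_consequences or not a.performs_reliably:
        return "on-the-loop"
    # Otherwise: stable, low-risk, low-consequence autonomy can run unattended.
    return "out-of-the-loop"


# The emergency-response example from Step 4: autonomous, reliable, explainable, time-critical.
print(recommend_oversight(OversightAssessment(
    fully_autonomous=True, meets_xai_standards=True, performs_reliably=True,
    high_risk=True, time_critical=True, dynamic=False,
    severe_failure_consequences=False)))  # -> "out-of-the-loop"
```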

Tying it all Together

These six steps define our strategy for identifying the right oversight approach, and while the strategy itself is simple, we remind readers that how they answer these questions is absolutely crucial, especially in terms of how they acknowledge the interplay between the various factors we’ve isolated and the tradeoffs they’re willing to make. Still, we’d like to paint a concrete picture here: below, we’ve classified oversight levels according to three generic AI systems.

SYSTEM 1
  • Is the AI fully autonomous? Yes, but it does execute some high-impact decisions.
  • Does the AI meet XAI standards? No, but many internal processes can be well explained, and the system’s logic is mostly comprehensible.
  • Does the AI perform reliably? Yes, but it undergoes regular updates and modifications.
  • What’s the AI’s risk profile? High, but only if exploited or misused.
  • Is the AI static or dynamic? Dynamic, but its dynamism changes in response to system updates.
  • If the AI fails, are the consequences severe? No, the system doesn’t pose any liability or operational risk.
  • Oversight required: On-the-Loop

SYSTEM 2
  • Is the AI fully autonomous? Yes, but it’s a traditional machine learning system.
  • Does the AI meet XAI standards? No, but it can be explained using current explainability tools if necessary.
  • Does the AI perform reliably? Yes, it performs with near-perfect accuracy.
  • What’s the AI’s risk profile? High, but only because it’s used for emergency response.
  • Is the AI static or dynamic? Static, though it infrequently undergoes minimal modification.
  • If the AI fails, are the consequences severe? No, a false positive would initiate an emergency response, which is better than no response.
  • Oversight required: Out-of-the-Loop

SYSTEM 3
  • Is the AI fully autonomous? No, it functions as an advisor to our legal team.
  • Does the AI meet XAI standards? Yes, it’s rigorously documented, and its internal processes are fully visible.
  • Does the AI perform reliably? No, but it remains helpful and its performance consistently improves with updates and modifications.
  • What’s the AI’s risk profile? Low, it only serves an assistant-based function.
  • Is the AI static or dynamic? Dynamic, frequently adapting to user preferences and feedback.
  • If the AI fails, are the consequences severe? Yes, but only if our legal team fails to verify outputs.
  • Oversight required: In-the-Loop

Conclusion

As stated at the beginning of this post, we’ll wrap up with two brief sections: the first outlines the core characteristics that oversight frameworks should contain, while the second offers a few predictions on the future of oversight mechanisms.

Oversight Frameworks: Core Characteristics

  • Observability: Oversight frameworks must ensure that AI system behavior is continuously observable, whether it’s in real time (e.g., ITL) or via comprehensive logging and telemetry (e.g., OFTL), effectively enabling operators to monitor, analyze, and comprehend system performance, decisions, and state transitions over time. Most importantly, observability lays the foundation of accountability, diagnostics, and trustworthiness, irrespective of when or how the system is accessed.
  • Traceability & Auditability: Irrespective of complexity, all AI systems should produce durable, verifiable records of their behavior, inputs, updates, and interventions, either autonomously or with human input and guidance. Traceability ensures that oversight remains feasible even in retrospective analysis, while auditability supports accountability, investigation, and regulatory compliance across both real-time and post-hoc settings (a minimal sketch of such a record follows this list).
  • Intervenability: AI systems must be designed to allow seamless human-driven intervention, whether through real-time control, delayed overrides, or constraint mechanisms established in advance. Oversight doesn’t always require intensive (i.e., moment-to-moment) human intervention, but it must support corrective action that can be taken before, during, or after deployment, depending on the system’s autonomy level.
  • Scalability: Oversight mechanisms must scale across system complexity, deployment volume, and autonomy levels without losing efficacy. Independent of whether oversight targets a single static model or a swarm of autonomous agents, an oversight framework should retain the ability to detect potential incidents, enforce ethical and safety constraints, and verify and validate AI behavior at scale, within and across systems.
  • Defined Oversight Objectives: Effective oversight hinges on concretely defined and articulated goals, thresholds, and risk tolerances or boundaries that define the full range of acceptable behaviors for a given system. These objectives, once clearly formulated, should underpin the development, validation, monitoring, and incident response strategies an organization builds, and would be embedded into oversight tools regardless of whether the system requires real-time supervision.
  • Explainability & Interpretability: Oversight depends on the ability to recognize and understand how and why an AI system behaves as it does, whether through transparent internal processes or post-hoc review and auditing. Moreover, interpretability must be accessible to all relevant stakeholders and enable analytic scrutiny across various system stages, from input and processing to output and impact.
  • Alignment Feedback Loops: Oversight findings obtained from monitoring, audits, or incident analysis should feed back into the system’s design, training, governance policies, and/or operational constraints. Oversight is not just observational; it is cyclical, driving ongoing improvement and adaptation to maintain alignment over time.
  • Resilience to Deception & Emergence: AI oversight must anticipate that systems, especially complex or autonomous ones, could exhibit unexpected, emergent, or deceptive behaviors, preferences, and goals. A robust oversight framework should meticulously investigate and apply proactive defenses like adversarial testing and red-teaming, regular behavioral invariance checks, and anomaly detection tools to consistently maintain trustworthiness under uncertainty and ambiguity.
  • Modularity & Layered Oversight: Oversight should be distributed across multiple layers, from internal models and decision policies to environmental/inter-system interactions and overall system outputs. This layered, modular approach should enhance fault tolerance while diminishing the risk that if one component escapes control, others will be unable to support analysis, intervention, or recovery.
  • Role-Based Oversight & Escalation: Oversight responsibilities should be clearly delineated across stakeholder roles and functions, with clearly defined permissions, escalation protocols, and intervention scopes and strategies. Even in highly autonomous systems, governance structures must support oversight mechanisms that allow operation within institutional, technical, and ethical boundaries, tailored to each role’s expertise and authority.
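
To ground the traceability and auditability point above, here’s a minimal sketch of what a durable, verifiable record might look like: an append-only JSONL trail in which each entry includes a hash of the previous one, so tampering with earlier entries becomes detectable. The field names and the hashing scheme are illustrative assumptions, not a standard.

```python
import hashlib
import json
import time

LOG_PATH = "oversight_trail.jsonl"  # append-only; each record references the previous one


def _record_hash(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def append_record(event: str, details: dict, prev_hash: str) -> str:
    """Write one traceable record and return its hash for chaining to the next record."""
    record = {
        "timestamp": time.time(),
        "event": event,          # e.g., "decision", "model_update", "human_intervention"
        "details": details,
        "prev_hash": prev_hash,  # links records so the trail can be verified end to end
    }
    record["hash"] = _record_hash(record)
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["hash"]


h = append_record("decision", {"input_id": "scan-042", "output": "flag_for_review"}, prev_hash="genesis")
h = append_record("human_intervention", {"operator": "reviewer-1", "action": "override"}, prev_hash=h)
```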

Oversight Mechanisms: Predictions

  • AI in-the-loop: This is exactly what it sounds like; instead of placing a human ITL, we’d place an AI ITL, which would be responsible for automating the functions that a human ITL would typically perform. This approach could support ITL oversight for high-autonomy, high-complexity, high-risk systems, for which true ITL oversight remains practically unfeasible today.
  • Human and AI in-the-loop: In this context, a human ITL wouldn’t be responsible for verifying an AI ITL’s performance. Rather, the human would accompany and collaborate with the AI ITL, and the human ITL would perform functions that the AI ITL isn’t yet capable of performing reliably. Think of this as a human-AI oversight partnership.
  • Self-monitoring AI: An AI that self-monitors its decision-making and action-execution processes in real-time, with a high degree of transparency and explainability, performing a variety of functions like self-auditing, risk assessment, dynamic alignment and policy checks, and incident reporting autonomously, then logging and escalating findings to OFTL human operators for post-hoc review.
  • AI in-the-loop with human on-the-loop: An AI ITL that directly reports to a human OTL. The human OTL would supervise the AI ITL and have the authority to intervene, correct, or guide the AI ITL’s behavior when necessary; otherwise, the AI ITL will automate all ITL functions.
  • Multi-layered AI on-the-loop: This approach would target multi-agent systems, whereby at each layer of the system, an AI OTL is implemented. This could manifest in one of two ways: (1) an AI that doesn’t perform any functions within the multi-agent system is independently inserted at each layer of the agent hierarchy for OTL monitoring or (2) each agent within the agent hierarchy is required to perform OTL monitoring for the agent(s) that exist in the layer beneath it. In both cases, OTL monitoring decisions and actions would permeate and aggregate through layers of the agent hierarchy (to maintain coherence), getting logged by the final AI ITL, which would generate a report for OFTL human operators responsible for post-hoc review (a conceptual sketch of the first variant follows this list).
  • AI at the bottom and AI at the top: This also targets multi-agent systems, specifically those that contain multiple agents (within a single layer of the agent hierarchy) or layers (beyond what’s pragmatically manageable by independent AIs or agents themselves). In essence, this approach involves implementing a single AI OTL (independent or part of the system) at the lowest and highest layers of the agent hierarchy. The success of this approach would depend on whether the action/decision process the system follows always assumes the same protocol (e.g., bottom-up vs. any agent within any layer can trigger actions/decisions as needed).
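
Since the multi-layered prediction above is the most architectural of the group, here’s a purely conceptual sketch of its first variant: an independent monitor attached to each layer of a hypothetical agent hierarchy, with the aggregated findings handed to human operators for post-hoc review. Every name and check here is an assumption made for illustration only.

```python
from typing import Callable, Dict, List

Layer = Callable[[Dict], Dict]  # one layer of a hypothetical agent hierarchy


def make_monitor(layer_name: str) -> Callable[[Dict], Dict]:
    """An independent OTL monitor attached to a single layer (variant 1 above)."""
    def monitor(output: Dict) -> Dict:
        # Illustrative check only: flag any layer output that lacks a rationale.
        return {"layer": layer_name, "ok": "rationale" in output, "output": output}
    return monitor


def run_hierarchy(layers: List[Layer], task: Dict) -> List[Dict]:
    # Run each layer in order; a monitor reviews every layer's output, and the
    # aggregated findings are logged for human operators to review post-hoc.
    findings, payload = [], task
    for i, layer in enumerate(layers):
        payload = layer(payload)
        findings.append(make_monitor(f"layer-{i}")(payload))
    return findings


# Two hypothetical layers: a planner and an executor (the executor omits its rationale).
planner = lambda task: {"plan": ["pick items", "restock aisle 4"], "rationale": "shortest route"}
executor = lambda plan: {"result": "done"}
print(run_hierarchy([planner, executor], {"goal": "restock aisle 4"}))
```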

For those who’ve found this post insightful and/or pragmatically useful, we suggest following Lumenova’s blog, where you can explore a range of diverse content on everything from AI governance and innovation to risk management and literacy. For those who’d like to dig deeper and see what frontier AI is capable of, we encourage you to take a look at our AI experiments, published weekly.

Similarly, if you’ve already begun your AI governance and risk management journey, we invite you to check out Lumenova’s RAI platform and book a product demo today to address your RAI needs throughout the AI lifecycle.

