Organizational Theory as the Missing Foundation for Agentic AI Systems
- Jonathan H. Westover, PhD
Abstract: As artificial intelligence systems evolve toward multi-agent architectures, practitioners and researchers are rediscovering coordination challenges that organizational theorists have studied for decades. Current agentic AI implementations often assume models possess unlimited managerial capacity, ignore well-established principles of span of control, and rely on unstructured information transfer between agents. This creates predictable coordination failures, token inefficiency, and brittleness at scale. Drawing on organizational theory—particularly concepts of span of control, boundary objects, coupling mechanisms, and bounded rationality—this article argues that effective multi-agent AI systems require deliberate organizational design choices, not merely more capable models. Through examination of early implementations and organizational parallels, we identify evidence-based design principles including hierarchical structuring with intermediate coordination layers, structured artifacts for inter-agent communication, and calibrated coupling mechanisms. Organizations that approach agentic AI as an organizational design challenge rather than purely a technical one will achieve more reliable, scalable, and economically viable systems. The article concludes by proposing experimental directions that integrate organizational science with AI system design.
The rapid advancement of large language models has sparked intense interest in agentic AI—systems where multiple AI agents coordinate to accomplish complex tasks that exceed the capability of any single agent. These systems promise to automate sophisticated workflows spanning research, software development, customer service, and strategic analysis. Early demonstrations suggest transformative potential: agents that plan, delegate, execute, and integrate results across multi-step processes.
Yet as practitioners move from prototype to production, a pattern emerges. Systems that work elegantly with three agents collapse under coordination overhead with ten. Orchestrator agents that manage dozens of subordinate agents produce incoherent results despite each sub-agent performing adequately in isolation. Information passed between agents degrades or gets misinterpreted. Token costs explode as agents repeatedly clarify, restate, and reconcile. The bottleneck isn't model capability—it's coordination.
These challenges are not new. Organizational theorists have spent over a century studying how humans coordinate in groups, how information flows across boundaries, and what structures enable effective action at scale (Galbraith, 1974; March & Simon, 1958; Thompson, 1967). Classic concepts—span of control, boundary objects, coupling, bounded rationality—directly address the failures appearing in multi-agent AI. A human manager's effective span of control rarely exceeds seven to ten direct reports (Urwick, 1956; Van Fleet & Bedeian, 1977). Marketing teams and engineering teams use prototypes and specifications as boundary objects to coordinate despite different expertise and vocabularies (Star & Griesemer, 1989). Organizations calibrate how tightly units are coupled based on task interdependence and environmental uncertainty (Orton & Weick, 1990; Perrow, 1967).
Enterprises are investing heavily in agentic systems for code generation, business process automation, and knowledge work augmentation. Vendors promise "agent swarms" that self-organize to solve complex problems. Yet without organizational design principles, these systems risk becoming expensive, unreliable, and unable to scale beyond demonstrations. Organizations that recognize agentic AI as fundamentally an organizational design problem—not merely a technical one—stand to gain significant competitive advantage.
This article synthesizes organizational theory with emerging multi-agent AI practice. We examine how principles of span of control, boundary objects, and coupling apply to agent coordination, review evidence from early implementations, and propose design patterns that organizations can adopt today. Our goal is to bridge two communities—organizational scientists and AI practitioners—whose combined insight is essential for building reliable agentic systems.
The Agentic AI Landscape
Defining Agentic AI in Practice
Agentic AI refers to systems in which one or more AI models operate with some degree of autonomy to pursue goals, make decisions, and take actions beyond single-turn response generation (Wang et al., 2024). Unlike traditional chatbots that respond to discrete queries, agents maintain state, plan sequences of actions, invoke tools, and coordinate with other agents or humans.
In practice, agentic systems span a continuum. At the simpler end, a single agent might iterate on a coding task—writing code, running tests, debugging failures, and refining until tests pass (Qian et al., 2024). At the complex end, multiple specialized agents divide labor: one agent researches a topic, another synthesizes findings, a third generates graphics, and an orchestrator coordinates handoffs and final assembly.
Common architectural patterns include:
Single-agent with tools: One model invokes external APIs, executes code, or searches databases to accomplish tasks requiring multiple steps.
Chain-of-agents: Tasks flow sequentially through specialized agents (e.g., requirements analysis → design → implementation → testing).
Hierarchical orchestration: A primary agent decomposes tasks and delegates to subordinate agents, then integrates results.
Peer-to-peer collaboration: Agents negotiate, critique each other's outputs, or vote to reach consensus.
The architectural choice significantly affects coordination demands. Hierarchical systems concentrate coordination burden on the orchestrator. Peer-to-peer systems distribute coordination but require sophisticated negotiation protocols.
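The hierarchical orchestration pattern above can be sketched in a few lines. This is an illustrative skeleton only; the `Agent` and `Orchestrator` names and interfaces are our own, not those of any particular framework, and the `run` method stands in for what would be a model call in a real system.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Minimal stand-in for an LLM-backed agent (hypothetical interface)."""
    name: str

    def run(self, task: str) -> str:
        # A real system would invoke a model here; we echo for illustration.
        return f"[{self.name}] completed: {task}"

@dataclass
class Orchestrator:
    """Decomposes a goal into subtasks, delegates, and integrates results."""
    subordinates: list[Agent] = field(default_factory=list)

    def execute(self, goal: str, subtasks: list[str]) -> str:
        # Round-robin delegation: each subtask goes to one subordinate.
        results = [
            self.subordinates[i % len(self.subordinates)].run(sub)
            for i, sub in enumerate(subtasks)
        ]
        # Integration step: combine subordinate outputs into one artifact.
        return f"Goal: {goal}\n" + "\n".join(results)

team = Orchestrator(subordinates=[Agent("researcher"), Agent("writer")])
report = team.execute("draft brief", ["gather sources", "write summary"])
```

Even this toy version makes the coordination burden visible: the orchestrator must track which subordinate handled which subtask and assemble the pieces, which is exactly where real systems strain as agent counts grow.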
State of Practice and Emerging Patterns
Industry adoption of agentic AI is accelerating. GitHub Copilot Workspace and similar tools allow developers to describe features in natural language and receive multi-file implementations generated through agent-driven workflows. Customer service platforms deploy agents that triage inquiries, retrieve relevant knowledge, and draft responses with human oversight (Huang & Rust, 2021). Consulting firms experiment with research agents that gather data, synthesize findings, and produce draft reports (Davenport & Mittal, 2022).
Despite enthusiasm, practitioners report consistent challenges:
Coordination overhead: As agent count increases, the proportion of tokens spent on inter-agent communication rather than productive work grows rapidly.
Context fragmentation: Information gets lost or distorted as it passes through multiple agents, similar to the "telephone game" effect in human organizations.
Brittle handoffs: Agents optimized for their specialized task often produce outputs that downstream agents cannot effectively use.
Scalability walls: Systems that work with 3-5 agents degrade sharply at 10+ agents, not because individual agents fail but because coordination breaks down.
These patterns mirror well-documented organizational pathologies. The question is whether organizational theory can provide systematic solutions rather than trial-and-error discovery.
Organizational and Individual Consequences of Poor Agent Coordination
Organizational Performance Impacts
Poorly coordinated agentic systems impose measurable costs on organizations. Token consumption provides a direct financial metric. When agents must repeatedly clarify instructions, restate context, or reconcile conflicting outputs, token usage can increase 3-5× compared to well-coordinated alternatives. At current API pricing, this transforms economically viable automation into cost-prohibitive experimentation.
Latency compounds the problem. Multi-agent workflows that require sequential processing already face additive latency: if each of five agents takes 10 seconds, the workflow requires at least 50 seconds. Coordination failures that necessitate retries or reconciliation extend this further. For customer-facing applications, latency beyond 30-60 seconds undermines user experience and adoption (Luo et al., 2019).
Reliability suffers as coordination complexity increases. Early empirical studies of agentic coding systems show that success rates decline as task complexity grows, often due to coordination failures rather than individual agent capability limitations (Qian et al., 2024). When a ten-agent workflow has 90% reliability per agent, overall workflow reliability drops to roughly 35% without effective coordination mechanisms.
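The compounding effect follows directly from multiplying independent per-agent success probabilities, and a quick check reproduces the ~35% figure cited above:

```python
def workflow_reliability(per_agent: float, n_agents: int) -> float:
    """Overall success probability when every agent in a sequential
    workflow must succeed independently."""
    return per_agent ** n_agents

# Ten agents at 90% individual reliability: about 35% end-to-end.
r = workflow_reliability(0.90, 10)
```

The independence assumption is a simplification; correlated failures or retry mechanisms change the arithmetic, but the qualitative lesson holds: reliability erodes geometrically with chain length.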
Organizations also face technical debt accumulation when coordination mechanisms are ad hoc. Each new agent type requires custom integration logic. Changing one agent's output format cascades through dependent agents. This brittleness slows innovation and increases maintenance burden, familiar problems in poorly architected software systems (Martini et al., 2018).
Individual Wellbeing and Stakeholder Impacts
End users—employees interacting with agentic systems or customers served by them—experience frustration when coordination failures produce incoherent results. A customer receiving contradictory information from different "expert agents" within a single interaction loses trust in the entire system (Følstad & Brandtzæg, 2017).
For knowledge workers whose roles involve orchestrating AI agents, coordination complexity creates cognitive burden. Managing ten autonomous agents without clear structure mirrors the documented stress of excessive managerial span of control in human organizations (Puranam et al., 2014). Users resort to micromanaging agents to prevent coordination failures, undermining the autonomy that makes agentic systems valuable.
Development teams building agentic systems face their own challenges. Without established patterns for agent coordination, teams reinvent solutions, debug opaque inter-agent failures, and struggle to diagnose whether problems stem from model capability or coordination design. This extends development timelines and increases likelihood of project abandonment.
Executives evaluating agentic AI investments face uncertainty. Proofs-of-concept succeed, but production scaling fails in ways that are difficult to predict or to cost out. This leads to either over-caution—rejecting valuable automation—or over-investment in systems that cannot deliver promised ROI.
Evidence-Based Organizational Responses
Implementing Span of Control Principles
Organizational research consistently demonstrates that effective management requires bounded spans of control—the number of subordinates one manager can effectively supervise. Classical management theorists suggested ideal spans of 5-7 direct reports (Urwick, 1956). Empirical studies show variation based on task complexity, subordinate capability, and geographic distribution, but confirm that excessively wide spans reduce supervisory effectiveness, increase errors, and decrease subordinate satisfaction (Van Fleet & Bedeian, 1977; Davison, 2003).
These principles apply directly to agentic AI. An orchestrator agent managing dozens of subordinate agents faces analogous challenges: tracking progress across multiple parallel tasks, maintaining coherent context about each agent's state, resolving conflicts between agent outputs, and integrating results into a unified product. Unlike humans, language models don't experience fatigue, but they face context window limits and attention degradation across long contexts (Liu et al., 2024). An orchestrator tracking 50 agents likely loses critical details or makes integration errors.
Effective approaches to managing agent span of control:
Hierarchical structuring with intermediate layers: Introduce middle-manager agents that coordinate 5-7 subordinate agents each and report to a top-level orchestrator. This mirrors organizational hierarchies and keeps each agent's coordination burden manageable.
Functional clustering: Group related agents under specialized coordinators—a "research manager" agent coordinates multiple research specialist agents; a "development manager" coordinates coding, testing, and documentation agents. This reduces cross-domain coordination complexity.
Dynamic team formation: Rather than all agents active simultaneously, activate subteams as needed for specific task phases. A writing workflow might activate research agents first, then synthesis agents, then editing agents, reducing concurrent coordination demands.
Explicit delegation protocols: Standardize how orchestrators assign tasks to subordinates, including clear success criteria, expected output formats, and escalation triggers if subordinates encounter problems. This reduces supervisory overhead for routine cases.
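A minimal sketch of the first approach, hierarchical structuring, is simply a partition of the agent pool into bounded clusters, each intended to sit under one middle-manager agent. The `max_span` default of 7 reflects the classical span-of-control guidance discussed above; the function and its name are illustrative, not drawn from any existing framework.

```python
def cluster_agents(agents: list[str], max_span: int = 7) -> list[list[str]]:
    """Partition a flat agent pool into clusters no wider than max_span,
    each cluster to be coordinated by one middle-manager agent."""
    if max_span < 1:
        raise ValueError("max_span must be positive")
    return [agents[i:i + max_span] for i in range(0, len(agents), max_span)]

# Twenty worker agents become three clusters (7, 7, 6),
# each reporting to its own coordinator rather than one overloaded orchestrator.
clusters = cluster_agents([f"worker-{i}" for i in range(20)])
```

Real systems would cluster by function rather than by index, as the functional clustering pattern suggests, but the structural point is the same: no single coordinator's span exceeds a manageable bound.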
Several AI development platforms have begun implementing hierarchical patterns. Anthropic's research on constitutional AI includes experiments with "critic" and "executor" agent separation, effectively creating a two-tier hierarchy (Bai et al., 2022). Microsoft's Semantic Kernel framework provides abstractions for agent hierarchies, though best practices for depth and span are still emerging.
OpenAI o1's approach to complex reasoning demonstrates the value of internal structure. Rather than generating answers in a single pass, o1 performs extended "thinking" through internal chains of reasoning before producing outputs (OpenAI, 2024). While not a multi-agent system per se, this reflects hierarchical decomposition—high-level reasoning coordinates lower-level analytical steps. Organizations building multi-agent systems can apply similar principles, explicitly decomposing coordination across agent layers rather than expecting flat agent pools to self-organize.
Designing and Deploying Boundary Objects
Organizational theorists use the term boundary objects to describe artifacts that facilitate coordination across groups with different expertise, vocabularies, and goals (Star & Griesemer, 1989). A prototype, for instance, allows engineers and marketers to discuss a product despite different priorities and terminology. The prototype is "plastic" enough that each group can adapt it to their needs, yet "robust" enough to maintain shared meaning.
Current agentic systems typically pass unstructured text between agents—essentially meeting notes or email chains. This works adequately for simple handoffs but fails as complexity grows. Downstream agents must parse freeform text, infer structure, and hope upstream agents included everything necessary. Information degrades with each handoff.
Structured boundary objects would provide shared data structures that multiple agents read from and write to, with defined schemas that enforce information completeness. Consider a software development workflow:
A requirements document object with fields for user stories, acceptance criteria, and constraints
A design specification object with architectural decisions, component interfaces, and data models
An implementation artifact object with code files, test suites, and documentation
A review feedback object with categorized comments, severity ratings, and resolution status
Each agent reads the objects relevant to its role and writes to designated fields. An orchestrator tracks which objects exist and which agents have completed their contributions. This approach offers several advantages:
Reduced ambiguity: Structured formats make explicit what information is required, reducing back-and-forth clarification.
Incremental updates: Agents can update specific object fields without regenerating entire contexts.
Parallel work: Multiple agents can safely work on different fields of the same object simultaneously.
Validation: Schemas enable automated checking that outputs meet downstream requirements before handoff.
Audit trails: Changes to objects are versioned and attributed to specific agents, facilitating debugging.
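One way to sketch such a schema-enforced boundary object in plain Python is shown below. The field names, the attribution scheme, and the completeness check are all illustrative assumptions, not a standard; real systems would likely use formal schema validation (e.g., JSON Schema) rather than a dataclass.

```python
from dataclasses import dataclass, field

@dataclass
class RequirementsDocument:
    """A structured boundary object shared by planning and implementation
    agents. Field names are illustrative and would be domain-specific."""
    user_stories: list[str] = field(default_factory=list)
    acceptance_criteria: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    version: int = 1
    history: list[str] = field(default_factory=list)  # simple audit trail

    def update(self, agent: str, field_name: str, items: list[str]) -> None:
        # Agents write only to designated fields; every edit is
        # versioned and attributed, giving the audit trail described above.
        getattr(self, field_name).extend(items)
        self.version += 1
        self.history.append(f"v{self.version}: {agent} updated {field_name}")

    def is_complete(self) -> bool:
        # Validation before handoff: every required field must be populated.
        return all([self.user_stories, self.acceptance_criteria, self.constraints])

doc = RequirementsDocument()
doc.update("analyst-agent", "user_stories", ["As a user, I can export reports"])
doc.update("analyst-agent", "acceptance_criteria", ["Export completes in < 5 s"])
```

Because `is_complete` gates the handoff, a downstream agent never receives a half-populated requirements document, which is precisely the validation advantage listed above.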
Effective approaches to boundary object implementation:
Task-specific schemas: Define object structures tailored to workflow domains—research workflows use evidence citation objects; software workflows use requirement and specification objects; creative workflows use style guide and asset library objects.
Progressive elaboration: Design objects that support iterative refinement. Early workflow stages populate high-level fields; later stages add detail without breaking earlier content.
Multi-level abstractions: Include both human-readable narrative fields and machine-structured data fields, allowing human oversight without forcing agents to parse unstructured text.
Cross-referencing mechanisms: Enable objects to reference each other explicitly, maintaining relationships (e.g., a design specification references the requirements it addresses; implementation artifacts reference their specifications).
Stripe's approach to internal API development provides a relevant parallel. Engineering teams use detailed API specification documents (boundary objects) that product, backend, and frontend teams all reference. These specs include interface contracts, example payloads, and error conditions. Different teams work from the same object, ensuring alignment despite different implementation concerns (Stripe, 2023). Agentic systems could adopt similar patterns—a specification object that research, design, and implementation agents all read and update creates shared understanding.
Atlassian has long emphasized structured work artifacts in Jira and Confluence (Atlassian, 2022). User stories follow defined formats; technical designs use templates; documentation follows hierarchical structures. This standardization allows distributed teams—often across continents—to coordinate without constant synchronous communication. The same principle enables distributed agents to coordinate via shared, structured artifacts rather than unstructured message passing.
Calibrating Coupling Mechanisms
Organizational coupling refers to how tightly units within an organization are bound together (Orton & Weick, 1990). Tight coupling means units are highly interdependent—one unit's actions immediately constrain another's. Loose coupling means units operate more independently, with buffering or delayed interactions. Each has tradeoffs:
Tight coupling enables precise coordination and rapid error detection but reduces flexibility and increases brittleness. A problem in one unit immediately halts dependent units.
Loose coupling allows flexibility and resilience but risks drift, inconsistency, and delayed problem detection.
Classic organizational theory suggests matching coupling to task characteristics (Thompson, 1967; Perrow, 1967). Tasks with high interdependence and low ambiguity benefit from tighter coupling. Tasks with uncertainty and need for innovation benefit from looser coupling.
Current agentic systems tend toward extremes. Some require orchestrator approval after every sub-agent action (extreme tight coupling), creating bottlenecks and negating agent autonomy. Others grant agents broad autonomy with minimal oversight (extreme loose coupling), leading to divergent interpretations and wasted work when agents misunderstand requirements.
Effective approaches to coupling calibration:
Phase-based coupling: Vary coupling tightness across workflow stages. Initial planning phases use tighter coupling to ensure shared understanding; execution phases use looser coupling for parallel work; integration phases tighten again for coherence checking.
Exception-based oversight: Establish clear operating parameters for agents and intervene only when agents signal uncertainty or when output validation fails. This provides autonomy for routine cases while maintaining control for edge cases.
Asynchronous checkpoints: Rather than synchronous approval after each step, define checkpoints where agents publish intermediate results for review before proceeding. Downstream agents can start work based on preliminary outputs while upstream refinements continue.
Buffering with intermediate artifacts: Place boundary objects between agents as buffers. Upstream agents complete outputs and move on; downstream agents work from stable artifacts rather than depending on upstream agents remaining "online."
Confidence signaling: Have agents report confidence levels alongside outputs. High-confidence outputs proceed with looser coupling; low-confidence outputs trigger tighter oversight or additional validation.
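The exception-based oversight and confidence-signaling patterns above can be combined into a single routing rule. This is a deliberately simplified sketch: the 0.8 threshold, the self-reported confidence scalar, and the two-way routing are our own assumptions, and calibrating model-reported confidence is itself a hard open problem.

```python
from dataclasses import dataclass

@dataclass
class AgentOutput:
    """An agent's result paired with a self-reported confidence signal."""
    content: str
    confidence: float  # 0.0-1.0, reported by the agent alongside its output

def route_output(output: AgentOutput, threshold: float = 0.8) -> str:
    """Exception-based oversight: high-confidence work proceeds under loose
    coupling; low-confidence work escalates for tighter review."""
    if output.confidence >= threshold:
        return "proceed"       # loose coupling: no orchestrator approval needed
    return "escalate_review"   # tight coupling: orchestrator inspects first

routine = route_output(AgentOutput("refactored module A", confidence=0.93))
uncertain = route_output(AgentOutput("ambiguous requirement", confidence=0.55))
```

The design choice mirrors the organizational principle: routine cases flow without supervisory overhead, while the orchestrator's attention is reserved for genuine exceptions.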
Toyota's production system exemplifies calibrated coupling (Liker, 2004). Assembly line stations are tightly coupled through just-in-time inventory, ensuring quality issues are immediately visible. But suppliers are more loosely coupled, with buffer inventory and flexible delivery windows, allowing them to optimize their own operations. Agentic systems can similarly vary coupling: tightly couple agents working on immediately interdependent sub-tasks (like code generation and immediate testing) while loosely coupling agents working on parallel concerns (like multiple research threads or independent feature implementations).
Spotify's squad model for software development provides another example (Kniberg & Ivarsson, 2012). Squads operate with substantial autonomy (loose coupling) but share standards through "chapters" and "guilds" that set best practices. Squads can move quickly without constant coordination, yet maintain enough consistency for integration. Agentic systems could adopt analogous patterns—agents operate autonomously within defined standards, with periodic synchronization on shared conventions and integration points.
Managing Bounded Rationality and Information Processing
Herbert Simon's concept of bounded rationality recognizes that decision-makers face cognitive limits, incomplete information, and time constraints (Simon, 1957; March & Simon, 1958). Organizations develop routines, heuristics, and simplified models to make tractable decisions despite these limits. Information processing theory extends this, examining how organizations design structures and processes to match information processing requirements with capacity (Galbraith, 1974; Tushman & Nadler, 1978).
Language models face analogous limitations. Context windows, while growing, remain finite. Attention mechanisms degrade with context length, causing models to "forget" information from distant parts of long contexts (Liu et al., 2024). Processing very long contexts increases latency and cost. Models exhibit systematic biases, such as favoring information early or late in context ("lost in the middle" effects). These aren't defects—they're inherent constraints of current architectures.
Organizations address bounded rationality through division of labor, abstraction, and hierarchical information aggregation. The same strategies apply to agentic systems.
Effective approaches to managing agent information processing limits:
Specialization and modularity: Assign agents narrow, well-defined roles rather than expecting generalist agents to handle everything. A specialized code review agent maintains expertise in coding standards; a security review agent focuses on vulnerabilities. Each operates within bounded information domains.
Hierarchical information summarization: As information flows up organizational hierarchies, each level abstracts and summarizes. Worker agents report detailed outputs to middle managers; middle managers provide aggregated summaries to top orchestrators. This keeps each agent's information load manageable.
Explicit memory systems: Rather than relying on context windows to carry all information forward, maintain external memory stores. Agents write key facts, decisions, and intermediate results to shared memory that can be selectively retrieved when needed, rather than carrying full transcripts.
Attention directing mechanisms: When presenting information to agents, use explicit markers, formatting, or salience signals to highlight critical information and reduce load on attention mechanisms. This mirrors how human organizations use executive summaries and bolded key points.
Satisficing rather than optimizing: Design workflows where agents seek satisfactory solutions rather than optimal ones. Accept "good enough" intermediate outputs that meet defined thresholds rather than requiring perfection, reducing decision complexity.
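Hierarchical information summarization, the second approach above, can be sketched as a two-level aggregation in which each middle-manager agent condenses its workers' detailed reports before anything reaches the top orchestrator. The `summarize` function here is a trivial stand-in for what would be a model call; the span of 5 and the truncation rule are illustrative assumptions.

```python
def summarize(texts: list[str], max_items: int) -> list[str]:
    """Stand-in for an LLM summarization call: keep only the first
    max_items entries, truncated. A real system would call a model here."""
    return [t[:80] for t in texts[:max_items]]

def aggregate_up(worker_reports: list[list[str]], span: int = 5) -> list[str]:
    """Each middle-manager agent condenses its workers' detailed reports,
    so the top orchestrator sees one bounded summary per manager."""
    manager_summaries = []
    for reports in worker_reports:
        condensed = summarize(reports, max_items=span)
        manager_summaries.append(
            f"{len(reports)} reports; top items: " + " | ".join(condensed)
        )
    return manager_summaries

# Three managers, each condensing a dozen worker reports into one line.
top_view = aggregate_up([[f"detail-{i}" for i in range(12)] for _ in range(3)])
```

The orchestrator's information load is now bounded by the number of managers, not the number of workers, which is exactly how human hierarchies keep each level's processing demands tractable.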
McKinsey & Company's knowledge management system provides a relevant organizational example (Baumard & Starbuck, 2005). Consultants working on projects cannot read every document ever produced by the firm—bounded rationality limits apply. Instead, the firm maintains structured repositories with metadata, summaries, and expert directories. When consultants need information, they search efficiently and retrieve relevant excerpts, not entire document histories. Agentic systems should similarly provide agents with retrieval mechanisms and summaries rather than forcing comprehensive context inclusion.
NASA's Mars rover operations demonstrate effective management of information processing under extreme constraints (Malin & Edgett, 2001). Ground teams cannot directly teleoperate rovers due to communication delays. Instead, they provide high-level goals and constraints; rovers autonomously execute within those parameters, reporting back summaries and requesting guidance on ambiguous cases. This balances autonomy with oversight despite severe information and communication limitations. Agentic AI systems can adopt similar patterns: orchestrators set goals and constraints; executor agents work autonomously within bounds; exceptions escalate for guidance.
Establishing Communication Protocols and Shared Language
Organizational communication research emphasizes that effective coordination requires not just information exchange but shared interpretive frameworks—common vocabularies, mental models, and assumptions that allow messages to be correctly understood (Cramton, 2001; Weick & Roberts, 1993). Distributed teams that lack shared context experience more misunderstandings and coordination failures than co-located teams.
Multi-agent AI systems face similar challenges. Agents fine-tuned on different datasets, using different prompt patterns, or optimized for different objectives may interpret the same instructions differently. An orchestrator's instruction to "improve code quality" might be interpreted as refactoring for readability by one agent, optimizing performance by another, and adding test coverage by a third.
Effective approaches to establishing shared language among agents:
Domain-specific ontologies: Define controlled vocabularies for the workflow domain. In software development, establish precise definitions for terms like "component," "module," "service," and "interface." In research workflows, define "source," "evidence," "claim," and "synthesis." Agents reference this ontology when communicating.
Standardized instruction templates: Rather than freeform natural language requests, use structured templates for common agent interactions. A task assignment template might specify objective, deliverables, constraints, and success criteria in fixed fields. This reduces ambiguity.
Explicit confirmation protocols: Have receiving agents restate their understanding of assignments back to the sender before proceeding. This catches misinterpretations early, similar to aviation's "read-back" protocols that reduce miscommunication errors.
Shared examples and non-examples: Include concrete examples of desired outputs and anti-examples of common mistakes in agent instructions. This grounds abstract instructions in specifics.
Iterative protocol refinement: Monitor where coordination failures occur, identify root-cause misunderstandings, and refine communication protocols to address them. Treat protocols as living documents that improve with operational experience.
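The standardized instruction template and explicit confirmation protocol above can be sketched together. In this toy version the read-back is mechanical, so confirmation always succeeds; in a real system the receiving agent would paraphrase its understanding and the sender would diff that paraphrase against the original. All field names here are illustrative assumptions.

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class TaskAssignment:
    """A standardized instruction template with fixed fields,
    replacing freeform natural-language requests."""
    objective: str
    deliverables: tuple[str, ...]
    constraints: tuple[str, ...]
    success_criteria: str

def read_back(assignment: TaskAssignment) -> dict:
    """Receiving agent restates its understanding as structured data
    (mechanically here; a real agent would paraphrase)."""
    return asdict(assignment)

def confirm(original: TaskAssignment, restated: dict) -> bool:
    """Sender checks the read-back matches before the receiver proceeds,
    mirroring aviation's read-back protocol."""
    return asdict(original) == restated

task = TaskAssignment(
    objective="Summarize Q3 churn analysis",
    deliverables=("two-page brief",),
    constraints=("no customer names",),
    success_criteria="reviewed by analytics lead",
)
ok = confirm(task, read_back(task))
```

Fixed fields make missing information structurally visible: an assignment without success criteria cannot even be constructed, which removes one common source of ambiguous handoffs.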
Boeing's commercial aircraft development involves coordination across thousands of engineers in multiple locations designing highly interdependent systems (Hirshorn, 2000). Boeing maintains extensive design standards, interface specifications, and communication protocols. When an electrical systems engineer specifies a requirement, mechanical engineers interpreting it use shared definitions of terms like "maximum load," "operating envelope," and "failure mode." This shared language, documented in thick standards manuals, prevents costly misinterpretations. Agentic systems need analogous shared vocabularies to keep agents from developing divergent interpretations.
Table 1: Organizational Theory Principles for Agentic AI System Design
Organizational Concept | Description | Proposed AI Implementation Strategy | Benefits for AI Coordination | Real-world Organizational Example | Key Challenge Addressed |
Span of Control | The number of subordinates one manager can effectively supervise, typically 5-7 individuals. | Introduce hierarchical structuring with intermediate manager agents coordinating clusters of 5-7 subordinates to avoid orchestrator overload. | Maintains coherent context, reduces integration errors, and prevents attention degradation across long contexts. | Classical management theory (Urwick, 1956) suggesting ideal ratios for supervisory effectiveness. | Orchestrator agents managing too many subordinates produce incoherent results and face context window limits. |
Boundary Objects | Artifacts that facilitate coordination across groups with different expertise and goals while maintaining shared meaning. | Use structured data schemas (JSON/objects) for inter-agent handoffs rather than unstructured text. | Reduces ambiguity, enables automated validation of outputs, and allows parallel work on specific fields. | Stripe's API specifications or Atlassian's Jira/Confluence templates used by distributed teams. | Information degradation and the "telephone game" effect where context is lost or distorted during handoffs. |
Coupling Mechanisms | The degree of interdependence between units, ranging from tight (highly bound) to loose (independent/buffered). | Apply phase-based or exception-based coupling; tightly couple interdependent sub-tasks while loosely coupling parallel threads. | Balances autonomy with consistency; prevents bottlenecks caused by constant human/orchestrator approval. | Toyota's Production System (tightly coupled assembly but loosely coupled suppliers) and Spotify's squad model. | Brittle handoffs and efficiency bottlenecks caused by extreme tight or loose oversight. |
Bounded Rationality | The recognition that decision-makers face cognitive limits, incomplete information, and time constraints. | Implement specialized modular agents, hierarchical information summarization, and explicit external memory stores. | Manages information load by preventing models from being 'lost in the middle' of massive contexts. | McKinsey & Company's knowledge management system and NASA's Mars rover autonomous operations. | Context window limits and attention mechanism degradation in long sequences. |
Shared Language / Communication Protocols | Common vocabularies and mental models that allow messages to be correctly understood across distributed teams. | Define domain-specific ontologies, standardized instruction templates, and confirmation (read-back) protocols. | Ensures agents optimized for different tasks interpret instructions identically (e.g., 'improve quality'). | Boeing’s design standards manuals used by electrical and mechanical engineers. | Misinterpretation of natural language instructions across diverse model types or prompt patterns. |
Building Long-Term Organizational Capability for Agentic AI
Developing Agent Coordination Maturity Models
Organizations adopting agentic AI should approach it as a capability that matures over time, not a one-time implementation. Drawing on capability maturity models from software engineering and organizational development (Paulk et al., 1993), we can outline progressive maturity levels:
Level 1 - Ad Hoc: Single-agent or simple multi-agent systems without explicit coordination mechanisms. Success depends on individual agent capability and luck. Coordination challenges are addressed reactively through prompt engineering.
Level 2 - Repeatable: Basic coordination patterns established for common workflows. Simple hierarchies or chains implemented. Structured handoff formats used for well-understood agent interactions. Success is repeatable for familiar tasks but new workflows require significant trial-and-error.
Level 3 - Defined: Organization has documented coordination patterns, standard boundary object schemas, and coupling guidelines for different task types. Developers reference these patterns when building new agentic workflows. Span of control and hierarchy depth are consciously designed choices rather than arbitrary.
Level 4 - Managed: Coordination effectiveness is measured. Organizations track metrics like tokens-per-productive-output, coordination-overhead-ratio, and agent-utilization efficiency. Coordination patterns are empirically validated and refined based on operational data. A/B testing compares coordination approaches.
Level 5 - Optimizing: Organizations systematically experiment with coordination innovations. They contribute coordination patterns back to the community. Coordination design is recognized as a distinct competency. Cross-functional teams including organizational designers, AI engineers, and domain experts collaborate on agent system design.
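The "standard boundary object schemas" that characterize Level 3 might look like the following hypothetical handoff artifact for a research workflow. A sketch only; the class and field names are assumptions chosen for illustration, and real schemas would be tailored to the workflow domain:

```python
from dataclasses import dataclass, field

# Hypothetical structured handoff artifact: both producer and consumer
# agents validate required fields instead of passing free-form text.

@dataclass
class ResearchHandoff:
    question: str                      # what the downstream agent must answer
    findings: list[str]                # claims produced so far
    sources: list[str]                 # provenance backing those claims
    confidence: float                  # producer's self-assessed confidence
    open_issues: list[str] = field(default_factory=list)

    def validate(self) -> None:
        if not self.question:
            raise ValueError("handoff missing its driving question")
        if len(self.sources) < len(self.findings):
            raise ValueError("every finding needs at least one source")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

handoff = ResearchHandoff(
    question="Does span of control predict coordination overhead?",
    findings=["Overhead grows superlinearly past ~8 direct reports"],
    sources=["internal experiment log 2024-11"],
    confidence=0.7,
)
handoff.validate()
```

Because validation runs at the handoff boundary, a malformed artifact fails loudly at the point of transfer rather than propagating silently downstream.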
Moving up this maturity curve requires investment in documentation, training, and experimentation infrastructure—not just model improvement. Organizations at Level 3+ gain significant competitive advantage in deploying reliable agentic systems.
Creating Cross-Disciplinary Agent Design Teams
Effective agentic system design requires expertise spanning multiple domains: machine learning engineering, software architecture, domain knowledge, and organizational design. Yet most current development teams lack organizational design expertise, while organizational researchers rarely engage with AI systems directly.
Organizations building long-term agentic AI capability should establish cross-disciplinary agent design teams:
Organizational designers or industrial-organizational psychologists: Bring expertise in coordination mechanisms, information flows, role design, and span of control. They translate organizational principles into system requirements.
AI/ML engineers: Understand model capabilities, limitations, and implementation constraints. They translate organizational designs into working technical systems and identify where organizational principles conflict with technical realities.
Domain experts: Contribute deep understanding of the workflows being automated—whether software development, research, customer service, or other domains. They ensure coordination mechanisms align with actual task requirements.
Data scientists/analysts: Instrument systems to measure coordination effectiveness, identify bottlenecks, and validate whether design changes improve outcomes. They enable evidence-based iteration on coordination patterns.
This cross-disciplinary composition mirrors successful approaches in other domains. User experience design teams combine psychology, visual design, and software engineering. Healthcare system redesign combines clinicians, operations researchers, and health informaticists. The complex coordination challenges of agentic AI similarly require diverse expertise.
IDEO's approach to innovation combines designers, engineers, and business strategists in integrated teams (Brown, 2008). Projects benefit from simultaneous consideration of human needs, technical feasibility, and business viability. Agentic AI development would benefit from analogous integration—technical feasibility, organizational coordination principles, and domain requirements addressed simultaneously rather than sequentially.
Procter & Gamble's Connect + Develop program brought together internal R&D scientists, external partners, and business strategists to identify and develop innovations (Huston & Sakkab, 2006). The diversity of perspectives reduced blind spots and enabled faster problem-solving. Organizations developing agentic systems should similarly integrate diverse expertise rather than relegating organizational considerations to afterthoughts.
Establishing Agent Governance and Oversight Frameworks
As agentic systems become more autonomous and handle higher-stakes tasks, organizations need governance frameworks that balance autonomy with appropriate oversight (Cath et al., 2018). This parallels corporate governance—providing accountability and risk management without micromanaging every decision.
Key elements of agent governance frameworks include:
Role and authority boundaries: Explicit documentation of what each agent type is authorized to do, what it must escalate, and what is prohibited. Analogous to delegation-of-authority matrices in human organizations.
Audit and monitoring mechanisms: Logging of agent decisions and actions sufficient to reconstruct decision paths. Periodic review of agent coordination patterns to identify drift, inefficiencies, or emerging risks.
Human-in-the-loop calibration: Defining which decisions require human approval based on risk, uncertainty, or value. High-stakes, high-uncertainty decisions flow to humans; routine, low-stakes decisions proceed autonomously. Thresholds should be empirically validated and refined.
Performance evaluation and feedback loops: Systematic assessment of individual agent and overall system performance. Identifying where agents excel, where they struggle, and feeding these insights back into prompt refinement, fine-tuning, or architecture changes.
Incident response protocols: Defined processes for when agent systems produce seriously flawed outputs or when coordination failures cause significant issues. Analogous to incident management in DevOps or safety management in high-reliability organizations (Weick & Sutcliffe, 2007).
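The first and third elements above can be combined into a single routing rule. A minimal sketch, assuming a delegation-of-authority matrix and scalar risk/uncertainty scores; the roles, actions, and threshold values are illustrative placeholders that would be calibrated empirically, as the text notes:

```python
# Hypothetical risk-proportionate routing: an authority matrix defines what
# each agent role may do on its own, and decisions above a risk or
# uncertainty threshold escalate to a human reviewer.

AUTHORITY = {  # delegation-of-authority matrix: role -> permitted actions
    "researcher": {"search", "summarize"},
    "executor":   {"search", "summarize", "write_file"},
}

RISK_THRESHOLD = 0.7         # illustrative; tune from operational data
UNCERTAINTY_THRESHOLD = 0.5  # illustrative; tune from operational data

def route(role: str, action: str, risk: float, uncertainty: float) -> str:
    """Return 'deny', 'escalate', or 'proceed' for a proposed action."""
    if action not in AUTHORITY.get(role, set()):
        return "deny"                      # outside the role's authority
    if risk >= RISK_THRESHOLD or uncertainty >= UNCERTAINTY_THRESHOLD:
        return "escalate"                  # human-in-the-loop review
    return "proceed"                       # routine, low-stakes: autonomous

assert route("researcher", "write_file", 0.1, 0.1) == "deny"
assert route("executor", "write_file", 0.9, 0.1) == "escalate"
assert route("executor", "summarize", 0.1, 0.1) == "proceed"
```

Logging each `route` decision alongside its inputs also yields the audit trail the second element calls for, since decision paths can be reconstructed from the recorded scores.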
Governance frameworks should be risk-proportionate. Internal research applications warrant lighter governance than customer-facing systems or systems making financial commitments. But all agentic systems benefit from thoughtful governance that enables rather than stifles autonomy.
Microsoft's approach to responsible AI includes governance frameworks that define review processes for AI systems based on impact assessments (Microsoft, 2023). Systems that touch sensitive data, interact with vulnerable populations, or make high-stakes recommendations receive more intensive review than low-risk applications. This risk-based approach balances innovation with responsibility. Organizations should adopt similar frameworks for agentic systems—routine automation proceeds with light oversight; high-stakes applications receive more intensive governance.
Investing in Coordination Experimentation Infrastructure
Organizations serious about agentic AI should establish infrastructure for systematic experimentation with coordination patterns. This includes:
Testbeds with realistic workflows: Reproductions of actual organizational workflows at sufficient scale to reveal coordination challenges. Simple toy problems won't expose real coordination issues.
Instrumentation for coordination metrics: Beyond task completion rates, measure coordination-specific outcomes like coordination overhead (tokens spent on inter-agent communication vs. productive work), information loss across handoffs, latency from coordination delays, and agent utilization rates.
A/B testing frameworks: Infrastructure to run controlled experiments comparing coordination approaches—hierarchical vs. flat structures, different span-of-control parameters, alternative boundary object schemas, tighter vs. looser coupling.
Rapid prototyping tools: Abstractions and frameworks that make it easy to instantiate different coordination patterns without rewriting everything from scratch. This enables faster exploration of the design space.
Shared learnings repositories: Documentation of what coordination patterns were tried, what worked, what failed, and why. This organizational memory prevents repeated mistakes and accelerates learning.
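The coordination metrics above can be computed from ordinary message logs. A sketch under an assumed log format, where each entry is tagged as coordination (inter-agent chatter) or production (task output); the field names and figures are illustrative:

```python
# Hypothetical instrumentation: compute coordination-overhead ratio and
# per-agent utilization from per-message token logs.

log = [
    {"agent": "planner", "kind": "coordination", "tokens": 400},
    {"agent": "worker1", "kind": "production",   "tokens": 1200},
    {"agent": "worker1", "kind": "coordination", "tokens": 300},
    {"agent": "worker2", "kind": "production",   "tokens": 900},
]

def overhead_ratio(entries) -> float:
    """Fraction of all tokens spent on inter-agent coordination."""
    coord = sum(e["tokens"] for e in entries if e["kind"] == "coordination")
    total = sum(e["tokens"] for e in entries)
    return coord / total if total else 0.0

def utilization(entries) -> dict:
    """Share of each agent's tokens spent on productive output."""
    out = {}
    for e in entries:
        spent, prod = out.get(e["agent"], (0, 0))
        out[e["agent"]] = (spent + e["tokens"],
                           prod + (e["tokens"] if e["kind"] == "production" else 0))
    return {agent: prod / spent for agent, (spent, prod) in out.items()}

print(round(overhead_ratio(log), 3))   # 700 / 2800 = 0.25
```

Tracking these two numbers over time is what turns "the system feels slow" into an actionable signal: a rising overhead ratio with flat task output points at coordination design, not model capability.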
Without dedicated experimentation infrastructure, organizations will rely on ad hoc trial-and-error, learning slowly and wasting resources. Systematic experimentation—applying scientific method to organizational design—enables faster progress.
Microsoft's online experimentation platform supports extensive controlled experimentation, running many A/B tests concurrently across its products (Kohavi et al., 2009). This rigorous experimentation culture identifies what truly improves user experience vs. what merely sounds appealing. Organizations should apply similar rigor to agentic coordination—systematically testing coordination hypotheses rather than assuming plausible-sounding patterns will work.
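A coordination A/B test of this kind can be sketched in a few lines. Everything here is a stand-in: `run_task` simulates an agentic workflow's token cost, and the assumption that hierarchical coordination is cheaper is baked in purely for illustration, since the point of the experiment is precisely that this should be measured, not assumed:

```python
import random
import statistics

# Hypothetical A/B sketch: assign tasks at random to two coordination
# variants and compare tokens spent per completed task.

random.seed(0)  # fixed seed so the simulated run is reproducible

def run_task(variant: str) -> int:
    """Placeholder workload; real code would invoke the agentic workflow."""
    base = 900 if variant == "hierarchical" else 1100
    return base + random.randint(-100, 100)

results = {"flat": [], "hierarchical": []}
for _ in range(200):
    variant = random.choice(["flat", "hierarchical"])
    results[variant].append(run_task(variant))

for variant, costs in results.items():
    print(variant, round(statistics.mean(costs)), "tokens/task")
```

A production version would add a proper significance test and run against the realistic-workflow testbeds described above, but the random-assignment skeleton is the same.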
Conclusion
The trajectory of agentic AI depends as much on organizational design as on model capabilities. As systems scale from single agents to complex multi-agent workflows, coordination challenges dominate. Pretending that models have unlimited managerial capacity, passing unstructured text between agents, and choosing coupling levels arbitrarily will produce brittle, expensive, unreliable systems.
Organizational theory provides tested frameworks for addressing these challenges. Bounded spans of control prevent coordination overload. Structured boundary objects reduce information loss and ambiguity across handoffs. Calibrated coupling balances flexibility with consistency. Explicit communication protocols establish shared understanding. Hierarchical information aggregation manages bounded rationality.
Organizations that approach agentic AI through an organizational design lens—establishing cross-disciplinary teams, documenting coordination patterns, measuring coordination effectiveness, and systematically experimenting—will develop durable competitive advantages. They will deploy more reliable systems, scale them more efficiently, and adapt them more quickly as requirements evolve.
The path forward requires bridging research communities. Organizational scientists should engage with agentic AI as a domain where their theories apply and can be tested at unprecedented scale. AI researchers and practitioners should treat coordination as a first-class design concern, not an afterthought. Together, these communities can establish evidence-based patterns that make agentic AI practical and reliable.
The alternative is predictable: continued disappointment as prototype systems fail to scale, resources wasted rediscovering organizational principles, and potential underinvestment in genuinely transformative technology because early implementations don't deliver. We have decades of organizational research to draw upon. It's time to use it.
Key takeaways for practitioners:
Design hierarchies explicitly: Don't allow flat agent structures to exceed manageable spans of control (roughly 5-10 agents per coordinator). Introduce intermediate coordination layers for complex workflows.
Implement structured boundary objects: Replace unstructured text handoffs with defined schemas that multiple agents can reliably read and write. Tailor schemas to your workflow domain.
Calibrate coupling to task characteristics: Vary how tightly agents are coupled based on interdependence and uncertainty. Use phase-based or exception-based approaches rather than uniformly tight or loose coupling.
Establish shared vocabularies and protocols: Define ontologies, instruction templates, and confirmation protocols that reduce misinterpretation across agents.
Build cross-disciplinary teams: Combine AI engineering, organizational design, domain expertise, and analytics. Coordination effectiveness requires diverse perspectives.
Invest in measurement and experimentation: Instrument systems to measure coordination overhead and effectiveness. Systematically test coordination alternatives rather than relying on intuition.
Treat coordination as a capability to mature: Progress from ad hoc implementations to defined patterns to systematic optimization. Document and refine coordination approaches over time.
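The first takeaway can be made concrete with a small sketch. The span limit of 7 and the layer-stacking rule are illustrative assumptions within the 5-10 range the text cites:

```python
import math

# Hypothetical sketch: rather than one coordinator overseeing every agent,
# stack intermediate coordination layers so no node's span exceeds MAX_SPAN.

MAX_SPAN = 7  # illustrative choice within the classic 5-10 range

def build_layers(n_agents: int) -> list[int]:
    """Return node counts per layer, workers first, up to a single root."""
    layers = [n_agents]
    while layers[-1] > 1:
        layers.append(math.ceil(layers[-1] / MAX_SPAN))
    return layers

print(build_layers(40))   # [40, 6, 1]: one coordination layer suffices
print(build_layers(300))  # [300, 43, 7, 1]: two intermediate layers needed
```

The logarithmic growth of depth is the practical payoff: hierarchy depth rises slowly as agent counts grow, so no coordinator ever faces an unmanageable span.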
The organizations that succeed with agentic AI will be those that recognize it as fundamentally an organizational design challenge. The models will keep improving. The real competitive advantage lies in designing the coordination structures that enable them to work together effectively.
Research Infographic

References
Atlassian. (2022). Team playbook: Health monitor and plays. Atlassian.
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., Chen, C., Olsson, C., Olah, C., Hernandez, D., Drain, D., Ganguli, D., Li, D., Tran-Johnson, E., Perez, E., ... Kaplan, J. (2022). Constitutional AI: Harmlessness from AI feedback. Anthropic.
Baumard, P., & Starbuck, W. H. (2005). Learning from failures: Why it may not happen. Long Range Planning, 38(3), 281–298.
Brown, T. (2008). Design thinking. Harvard Business Review, 86(6), 84–92.
Cath, C., Wachter, S., Mittelstadt, B., Taddeo, M., & Floridi, L. (2018). Artificial intelligence and the 'good society': The US, EU, and UK approach. Science and Engineering Ethics, 24(2), 505–528.
Cramton, C. D. (2001). The mutual knowledge problem and its consequences for dispersed collaboration. Organization Science, 12(3), 346–371.
Davenport, T. H., & Mittal, N. (2022). All in on AI: How smart companies win big with artificial intelligence. Harvard Business Review Press.
Davison, B. (2003). Management span of control: How wide is too wide? Journal of Business Strategy, 24(4), 22–29.
Følstad, A., & Brandtzæg, P. B. (2017). Chatbots and the new world of HCI. interactions, 24(4), 38–42.
Galbraith, J. R. (1974). Organization design: An information processing view. Interfaces, 4(3), 28–36.
Hirshorn, S. R. (2000). Organizational structures for the 21st century (NASA Technical Report). NASA Johnson Space Center.
Huang, M. H., & Rust, R. T. (2021). A strategic framework for artificial intelligence in marketing. Journal of the Academy of Marketing Science, 49(1), 30–50.
Huston, L., & Sakkab, N. (2006). Connect and develop: Inside Procter & Gamble's new model for innovation. Harvard Business Review, 84(3), 58–66.
Kniberg, H., & Ivarsson, A. (2012). Scaling agile @ Spotify with tribes, squads, chapters & guilds. Spotify.
Kohavi, R., Crook, T., Longbotham, R., Frasca, B., Henne, R., Ferres, J. L., & Melamed, T. (2009). Online experimentation at Microsoft. Data Mining Case Studies, 11.
Liker, J. K. (2004). The Toyota way: 14 management principles from the world's greatest manufacturer. McGraw-Hill.
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173.
Luo, X., Tong, S., Fang, Z., & Qu, Z. (2019). Frontiers: Machines vs. humans: The impact of artificial intelligence chatbot disclosure on customer purchases. Marketing Science, 38(6), 937–947.
Malin, M. C., & Edgett, K. S. (2001). Mars Global Surveyor Mars Orbiter Camera: Interplanetary cruise through primary mission. Journal of Geophysical Research: Planets, 106(E10), 23429–23570.
March, J. G., & Simon, H. A. (1958). Organizations. Wiley.
Martini, A., Bosch, J., & Chaudron, M. (2018). Architecture technical debt: Understanding causes and a qualitative model. IEEE Software, 35(5), 88–93.
Microsoft. (2023). Responsible AI principles and approach. Microsoft Corporation.
OpenAI. (2024). Learning to reason with LLMs. OpenAI.
Orton, J. D., & Weick, K. E. (1990). Loosely coupled systems: A reconceptualization. Academy of Management Review, 15(2), 203–223.
Paulk, M. C., Curtis, B., Chrissis, M. B., & Weber, C. V. (1993). Capability maturity model, version 1.1. IEEE Software, 10(4), 18–27.
Perrow, C. (1967). A framework for the comparative analysis of organizations. American Sociological Review, 32(2), 194–208.
Puranam, P., Alexy, O., & Reitzig, M. (2014). What's "new" about new forms of organizing? Academy of Management Review, 39(2), 162–180.
Qian, C., Liu, W., Liu, H., Chen, N., Dang, Y., Li, J., Yang, C., Chen, W., Su, Y., Cong, Y., & Liu, T. (2024). ChatDev: Communicative agents for software development. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics.
Simon, H. A. (1957). Models of man: Social and rational. Wiley.
Star, S. L., & Griesemer, J. R. (1989). Institutional ecology, 'translations' and boundary objects: Amateurs and professionals in Berkeley's Museum of Vertebrate Zoology, 1907–39. Social Studies of Science, 19(3), 387–420.
Stripe. (2023). API design principles. Stripe, Inc.
Thompson, J. D. (1967). Organizations in action: Social science bases of administrative theory. McGraw-Hill.
Tushman, M. L., & Nadler, D. A. (1978). Information processing as an integrating concept in organizational design. Academy of Management Review, 3(3), 613–624.
Urwick, L. (1956). The manager's span of control. Harvard Business Review, 34(3), 39–47.
Van Fleet, D. D., & Bedeian, A. G. (1977). A history of the span of management. Academy of Management Review, 2(3), 356–372.
Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., Zhao, W. X., Wei, Z., & Wen, J. R. (2024). A survey on large language model based autonomous agents. Frontiers of Computer Science, 18(6), 186345.
Weick, K. E., & Roberts, K. H. (1993). Collective mind in organizations: Heedful interrelating on flight decks. Administrative Science Quarterly, 38(3), 357–381.
Weick, K. E., & Sutcliffe, K. M. (2007). Managing the unexpected: Resilient performance in an age of uncertainty (2nd ed.). Jossey-Bass.

Jonathan H. Westover, PhD is Chief Research Officer (Nexus Institute for Work and AI); Associate Dean and Director of HR Academic Programs (WGU); Professor, Organizational Leadership (UVU); OD/HR/Leadership Consultant (Human Capital Innovations). Read Jonathan Westover's executive profile here.
Suggested Citation: Westover, J. H. (2026). Organizational Theory as the Missing Foundation for Agentic AI Systems. Human Capital Leadership Review, 33(1). doi.org/10.70175/hclreview.2020.33.1.7