When Artificial Intelligence Confronts the Unknown: ARC-AGI-3 and the Future of Adaptive Intelligence

Jonathan H. Westover, PhD

Abstract: As artificial intelligence systems demonstrate increasing proficiency across specialized domains, the fundamental question persists: how close are we to genuine artificial general intelligence? This article examines the introduction of ARC-AGI-3, an interactive benchmark designed to measure agentic intelligence through exploration, goal inference, and adaptive planning in novel environments. Unlike predecessor benchmarks that focused on static pattern recognition, ARC-AGI-3 evaluates systems on their ability to autonomously navigate "unknown unknowns" without explicit instructions or prior exposure. With frontier AI systems scoring below 1% while humans achieve 100% success rates as of March 2026, this benchmark reveals a critical capability gap. Drawing on intelligence theory, organizational learning frameworks, and research on adaptive systems, this article explores what ARC-AGI-3 reveals about current AI limitations, the distinction between domain-specific automation and general intelligence, and the organizational implications of building truly adaptive intelligent systems. The analysis offers evidence-based insights for leaders navigating AI implementation while highlighting the distance remaining before artificial general intelligence becomes reality.

In December 2024, when OpenAI announced that its o3 system had achieved breakthrough performance on the ARC-AGI benchmark, headlines proclaimed we were nearing artificial general intelligence (AGI). Yet by March 2026, the ARC Prize Foundation had released ARC-AGI-3, a fundamentally different challenge on which frontier systems score below 1% while ordinary humans with no special training solve 100% of tasks (ARC Prize Foundation, 2026). This reversal illustrates a crucial reality: the path to AGI remains far longer than optimistic projections suggest.

The practical stakes extend beyond academic interest in machine intelligence. Organizations across sectors are making substantial investments in AI automation, often based on assumptions about systems' adaptability and generalization capabilities. Understanding the true nature of current AI capabilities—what these systems can and cannot do—has become essential for realistic strategic planning. As Chollet (2019) argued in introducing the original ARC benchmark, intelligence should be measured not by task-specific performance but by the efficiency with which systems acquire new skills across unfamiliar domains.

ARC-AGI-3 represents a methodological shift in how we assess progress toward AGI. Rather than evaluating models on increasingly complex versions of tasks they've been trained to handle, it presents genuinely novel interactive environments where systems must explore, infer objectives, build internal models, and plan effective action sequences—all without explicit instructions. The benchmark's design reflects emerging consensus that general intelligence fundamentally involves adapting to situations the system was not specifically designed for (Chollet, 2019; ARC Prize Foundation, 2026).

This article examines what ARC-AGI-3 reveals about the current state of artificial intelligence and the organizational implications of this intelligence gap. We explore why systems that excel at complex reasoning within known domains struggle dramatically when confronting novel situations, what this suggests about the nature of different types of intelligence, and how organizations should calibrate their expectations and strategies accordingly.

The Intelligence Assessment Landscape

Defining Intelligence in Machine and Human Contexts

The measurement of intelligence—whether human or artificial—has long presented both conceptual and practical challenges. Traditional approaches to AI evaluation have emphasized performance on specific tasks: defeating world champions at chess or Go, achieving high scores on standardized tests, or generating human-quality text (Silver et al., 2016). However, these accomplishments, impressive as they are, may reflect narrow competence rather than general intelligence.

Chollet (2019) proposed a formal framework distinguishing skill (the ability to perform a task) from intelligence (the efficiency of acquiring new skills). This distinction proves crucial for understanding current AI capabilities. Large language models demonstrate remarkable skill across domains where they've been extensively trained, yet their ability to efficiently acquire genuinely novel capabilities remains constrained. The framework suggests that intelligence should be measured by adaptation efficiency: how quickly and with how few examples a system can master tasks it has never encountered.

This efficiency-based definition aligns with longstanding psychological theories of fluid intelligence—the capacity to reason and solve novel problems independent of acquired knowledge (Spelke & Kinzler, 2007). Humans demonstrate this capacity remarkably: we regularly encounter situations we've never experienced and successfully navigate them using core cognitive abilities. Current AI systems, by contrast, remain heavily dependent on extensive exposure to task-related data, even when equipped with sophisticated reasoning capabilities.

Evolution of the ARC-AGI Benchmark Series

The Abstraction and Reasoning Corpus (ARC-AGI-1), introduced in 2019, was designed specifically to resist the memorization shortcuts that had allowed AI systems to claim superhuman performance on other benchmarks (Chollet, 2019). Each task presented novel transformation rules that must be inferred from just a handful of input-output grid examples, with each grid containing up to 30×30 cells using 10 distinct colors. Critically, tasks were grounded in core knowledge priors—innate human cognitive capacities like understanding of objectness, basic geometry, and simple physics—rather than acquired cultural or linguistic knowledge (Spelke & Kinzler, 2007).
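To make the task format concrete, the following is a minimal Python sketch of how one such task could be represented and checked. The class and function names (Pair, is_valid_grid, fits_all_examples) and the toy "mirror" task are illustrative assumptions, not part of the benchmark's tooling; only the 30×30 size limit and the 10-color palette come from the description above.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[int]]  # each cell holds an integer color code

MAX_SIDE = 30    # grids are at most 30x30, per the description above
NUM_COLORS = 10  # 10 distinct colors, encoded here as 0..9 (an assumption)


@dataclass
class Pair:
    """One demonstration: an input grid and the grid it should become."""
    input_grid: Grid
    output_grid: Grid


def is_valid_grid(grid: Grid) -> bool:
    """Check the size and color constraints described in the text."""
    if not grid or not grid[0]:
        return False
    height, width = len(grid), len(grid[0])
    if height > MAX_SIDE or width > MAX_SIDE:
        return False
    return all(
        len(row) == width and all(0 <= cell < NUM_COLORS for cell in row)
        for row in grid
    )


def fits_all_examples(rule: Callable[[Grid], Grid], examples: List[Pair]) -> bool:
    """A candidate rule counts only if it reproduces every example exactly."""
    return all(rule(p.input_grid) == p.output_grid for p in examples)


# Hypothetical toy task: the hidden rule is "mirror each row left to right".
examples = [
    Pair([[1, 0, 0], [0, 2, 0]], [[0, 0, 1], [0, 2, 0]]),
    Pair([[3, 3, 0]], [[0, 3, 3]]),
]


def mirror(grid: Grid) -> Grid:
    return [list(reversed(row)) for row in grid]


def identity(grid: Grid) -> Grid:
    return [row[:] for row in grid]


print(fits_all_examples(mirror, examples))    # the mirror rule matches both pairs
print(fits_all_examples(identity, examples))  # the identity rule does not
```

The point of the sketch is the scoring criterion: a solver sees only a handful of pairs and must produce a rule that matches all of them exactly, which leaves little room for statistical approximation.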

For five years, ARC-AGI-1 proved remarkably durable. The 2020 Kaggle competition attracted 913 teams, with the winning solution achieving only 20% accuracy through brute-force program search over hand-crafted primitives (ARC Prize Foundation, 2024). The benchmark successfully identified a genuine capability gap: while humans solved ARC-AGI-1 tasks in approximately 30 seconds on average, AI systems struggled despite massive computational resources.
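The idea of brute-force program search over hand-crafted primitives can be illustrated with a short Python sketch. The three primitives below (flip_h, flip_v, transpose) and the toy rotation task are assumptions chosen for brevity; they are not the primitives used by the 2020 winning solution, whose details are not given here.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]


# Hand-crafted primitives (illustrative choices only).
def flip_h(g: Grid) -> Grid:
    return [row[::-1] for row in g]


def flip_v(g: Grid) -> Grid:
    return [row[:] for row in g[::-1]]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def run(program: Tuple[str, ...], g: Grid) -> Grid:
    """Apply the named primitives to the grid in order."""
    for name in program:
        g = PRIMITIVES[name](g)
    return g


def search(
    examples: List[Tuple[Grid, Grid]], max_depth: int = 3
) -> Optional[Tuple[str, ...]]:
    """Enumerate primitive sequences by length; return the first that fits every example."""
    for depth in range(1, max_depth + 1):
        for program in product(PRIMITIVES, repeat=depth):
            if all(run(program, x) == y for x, y in examples):
                return program
    return None


# Hypothetical toy task: the hidden rule rotates the grid 90 degrees clockwise.
examples = [
    ([[1, 2], [3, 4]], [[3, 1], [4, 2]]),
    ([[5, 0, 0]], [[5], [0], [0]]),
]
found = search(examples)
print(found)
```

With k primitives, the number of candidate programs at depth d is k to the power d, so the search space grows exponentially. That growth is one way to understand why this style of solution plateaued at low accuracy despite substantial compute.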

The landscape shifted dramatically in late 2024. OpenAI's o3 system, employing extensive test-time reasoning, achieved breakthrough performance on ARC-AGI-1 (Chollet, 2024). This represented a genuine advance: unlike earlier systems that relied purely on pattern matching over training data, o3 demonstrated the ability to reason through novel problems at test time. The introduction of "Large Reasoning Models" (LRMs) marked a qualitative change in AI capabilities.

ARC-AGI-2, released in March 2025, responded by increasing reasoning complexity while maintaining the static grid format (ARC Prize Foundation, 2025). Tasks required deeper multi-step reasoning and symbolic interpretation, with average human solve times rising from about 30 seconds to about 300 seconds. In the 2025 competition, NVIDIA's team achieved 24% accuracy using synthetic data generation and test-time training on a 4-billion-parameter model (Sorokin & Puget, 2025).

Known Limitations of Current AI Reasoning

The ARC-AGI series has revealed critical insights about both the capabilities and limitations of frontier AI systems. Modern LRMs enable automation effectively in domains meeting two conditions: the base model contains sufficient knowledge coverage of the domain, and the domain provides exact correctness feedback—what practitioners call "verifiable domains" (ARC Prize Foundation, 2026). This explains their considerable success with coding tasks, where programming environments provide immediate verification of solution correctness.

However, this capability pattern reveals an unexpected constraint: AI reasoning remains bound to knowledge domains. As the ARC Prize Foundation (2026) notes, "Human reasoning capability is not bound by domain knowledge. This leads to imprecise descriptions of LLMs as 'jagged intelligence', when in reality LLMs remain bound to task-specific training, albeit now over task-specific reasoning chains instead of the literal task data" (p. 3).

This observation challenges popular narratives about AI capabilities. Current systems have not transcended the fundamental limitation of being tied to their training distribution; they have simply expanded what can be learned through training to include reasoning patterns rather than just task solutions. This represents meaningful progress but falls short of the flexible, generalizable intelligence that characterizes human cognition.

Evidence of this limitation emerged clearly during ARC Prize Foundation verification of frontier models. When testing Gemini 3, researchers discovered the model spontaneously using the correct integer-to-color mapping specific to ARC-AGI tasks, despite no mention of ARC-AGI in the prompt (ARC Prize Foundation, 2026). This strongly suggests extensive representation of ARC-AGI patterns in the model's training data—enough to make correct inferences based solely on the structure of 2D integer arrays. Performance improvement came not from greater generalization capability but from denser sampling of the task space during training.

Organizational and Economic Consequences of the Intelligence Gap

The Automation Feasibility Boundary

The constraints on current AI capabilities create clear boundaries around which organizational functions can be effectively automated and which remain beyond reach. Understanding these boundaries proves essential for realistic strategic planning and investment decisions.

Automatable domains share specific characteristics. First, they possess well-defined correctness criteria that enable verification of system outputs. Software development, certain aspects of financial analysis, and routine diagnostic procedures meet this requirement. Second, relevant domain knowledge exists in sufficient quantity and quality within model training data. Third, the tasks involve primarily within-distribution reasoning—applying established patterns and rules rather than developing fundamentally novel approaches.

Organizations have achieved considerable success automating functions meeting these criteria. Coding assistants have demonstrated measurable productivity improvements for software engineers. Legal research tools effectively retrieve and synthesize relevant case law. Medical diagnostic systems identify patterns in imaging data with high accuracy. These applications create genuine economic value and represent appropriate targets for AI investment.

Non-automatable domains present different characteristics. They require navigating genuine novelty—situations lacking clear precedents in training data. They involve setting objectives rather than achieving predetermined goals. They demand integrating insights across disparate knowledge domains in ways not specifically represented in training examples. Strategic planning, creative innovation, and adaptive leadership exemplify such functions.

Consider pharmaceutical research. While AI systems excel at predicting molecular properties based on known chemical relationships (a verifiable domain with extensive training data), they struggle with the more fundamental challenge of identifying which novel research directions merit investigation. As Hsu (2025) reported, specialized LRM systems have discovered new results in quantum physics, but these breakthroughs occurred in highly mechanistic domains with clear verification criteria. The more ambiguous, exploratory aspects of scientific innovation remain human-dependent.

Investment Implications and Resource Allocation

This capability boundary has direct implications for organizational investment strategies. The prevailing approach of maximizing AI deployment across all functions risks misallocating resources toward applications unlikely to deliver sustainable returns.

Organizations should allocate AI investment across three tiers. Tier one includes highly verifiable domains with strong knowledge coverage where current systems excel: software development, data analysis, document processing, and similar structured tasks. These represent proven applications warranting aggressive investment. Tier two encompasses domains with partial verifiability where AI augments rather than replaces human judgment, such as research synthesis, preliminary analysis, and option generation. These benefit from measured investment paired with robust human oversight. Tier three involves non-verifiable or highly novel domains where human intelligence remains essential, such as strategic direction-setting, managing unprecedented situations, and creative innovation. AI investment here should focus on support tools rather than autonomous systems.
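The three-tier logic can be expressed as a simple triage rule. The Python sketch below is one possible encoding, not a validated instrument: the 0-to-1 scores, the field names, and the 0.7 and 0.4 cut-offs are all hypothetical and would need calibration against an organization's own portfolio.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ONE = "invest aggressively; automation is proven"
    TWO = "invest with human oversight; AI augments judgment"
    THREE = "support tools only; humans decide"


@dataclass(frozen=True)
class UseCase:
    name: str
    verifiability: float       # 0..1: can outputs be checked for correctness? (hypothetical score)
    knowledge_coverage: float  # 0..1: is the domain well represented in training data? (hypothetical score)
    novelty: float             # 0..1: how far do typical cases sit from precedent? (hypothetical score)


def assign_tier(uc: UseCase, high: float = 0.7, low: float = 0.4) -> Tier:
    """Map a use case to a tier; the thresholds are illustrative assumptions."""
    # Highly novel or essentially unverifiable work stays with humans.
    if uc.novelty >= high or uc.verifiability < low:
        return Tier.THREE
    # Strong verification plus strong knowledge coverage is the proven zone.
    if uc.verifiability >= high and uc.knowledge_coverage >= high:
        return Tier.ONE
    # Everything in between is augmentation with oversight.
    return Tier.TWO


portfolio = [
    UseCase("code generation", 0.9, 0.9, 0.2),
    UseCase("research synthesis", 0.5, 0.8, 0.4),
    UseCase("strategic direction-setting", 0.1, 0.3, 0.9),
]
for uc in portfolio:
    print(f"{uc.name}: Tier {assign_tier(uc).name}")
```

Scoring each proposed application explicitly, even with rough numbers, forces the verifiability and novelty questions to be asked before budget is committed.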

The cost structure of deploying frontier AI systems also demands careful analysis. Running comprehensive evaluations on interactive benchmarks like ARC-AGI-3 using high-reasoning model APIs can cost tens of thousands of dollars, even with action limits imposed (ARC Prize Foundation, 2026). While such costs will decline as models become more efficient, they establish a clear economic boundary: applications must generate sufficient value to justify both initial development and ongoing operational expenses. Many proposed AI automation initiatives fail this cost-benefit test when analyzed rigorously.

Competitive Dynamics and Strategic Positioning

The uneven distribution of AI capabilities across domains creates predictable competitive dynamics. Organizations competing primarily in automatable domains face intense AI-driven competition and compressed profit margins. As capabilities become widely available through commercial APIs, sustainable advantage increasingly depends on execution excellence, data quality, and integration effectiveness rather than mere access to AI technology.

Conversely, organizations whose competitive advantage relies heavily on non-automatable capabilities—strategic insight, creative innovation, adaptive response to novel situations—enjoy temporary insulation from AI disruption. However, this insulation should not breed complacency. The boundary between automatable and non-automatable domains will shift as AI capabilities advance. Forward-looking organizations identify approaching boundary shifts and position accordingly.

Financial services illustrate these dynamics clearly. Algorithmic trading, fraud detection, and credit risk assessment—highly structured domains with clear verification criteria—have undergone extensive AI-driven transformation. However, relationship banking, complex deal structuring, and crisis response in unprecedented market conditions continue requiring substantial human judgment. Successful financial institutions increasingly bifurcate their talent strategies: aggressive automation of routine analysis paired with concentrated investment in uniquely human capabilities.

Evidence-Based Organizational Responses to the Intelligence Gap

Table 1: Evolution and Performance of ARC-AGI Benchmark Series

Benchmark Version	Release Year	Key Intelligence Focus	Task Complexity & Format	Top AI System Performance	Human Performance	Core Capabilities Tested
ARC-AGI-1	2019	Fluid intelligence and adaptation efficiency via novel transformation rules	Static grids (up to 30x30) with few input-output examples; 10 colors	OpenAI o3 (breakthrough high score), 2020 Kaggle winner (20%)	Solved in ~30 seconds on average	Objectness, basic geometry, simple physics (core knowledge priors)
ARC-AGI-2	2025	Increased reasoning complexity and symbolic interpretation	Static grid format with deeper multi-step reasoning requirements	NVARC (NVIDIA team) at 24% accuracy	Solved in ~300 seconds on average	Multi-step reasoning and symbolic interpretation
ARC-AGI-3	2026	Agentic intelligence, exploration, and adaptive planning in novel environments	Interactive environments; exploration without explicit instructions; unknown unknowns	Frontier AI systems (including Gemini 3) scoring below 1%	100%	Goal inference, adaptive planning, building internal models, and autonomous navigation

Realistic Capability Assessment and Deployment Frameworks

Organizations implementing AI systems should adopt structured frameworks for assessing which applications will succeed and which will disappoint. The ARC-AGI research provides a useful template: rigorous human baseline establishment, careful measurement of system performance against that baseline, and honest acknowledgment when systems fall short.

Verification-based deployment protocols establish clear success criteria before system deployment. For any proposed AI application, organizations should explicitly answer: What constitutes correct performance? How will we verify correctness? What are the consequences of undetected errors? Systems deployed in domains lacking clear answers to these questions require extensive human oversight, often negating claimed efficiency benefits.

Leading technology companies have internalized these principles. When Amazon developed AI systems for warehouse operations, they established quantifiable performance metrics and maintained human oversight for exceptions (industry practice). When systems encountered situations outside their training distribution, human operators intervened rather than allowing autonomous operation. This protocol acknowledges the bounded nature of AI capabilities while still capturing efficiency benefits in routine scenarios.

Adaptive deployment strategies recognize that AI capabilities and limitations will evolve. Rather than implementing static AI solutions, organizations should design systems that gracefully degrade when encountering novel situations, clearly signal when human judgment becomes necessary, and systematically learn from edge cases. This approach treats AI deployment as an ongoing process rather than a one-time implementation.
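One way to sketch this pattern in code is a thin routing layer that answers only when the system is confident, hands everything else to a person, and records the hand-offs for later review. In the Python sketch below, the Router class, the 0.8 threshold, and the stub model are hypothetical, and the assumption that a model can supply a usable confidence score is a strong one that would need to be tested in any real deployment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Router:
    """Answer when confident; otherwise escalate and log the case for learning."""
    model: Callable[[str], Tuple[str, float]]  # returns (answer, confidence in 0..1)
    threshold: float = 0.8                     # illustrative cut-off, not a recommendation
    edge_cases: List[str] = field(default_factory=list)

    def handle(self, request: str) -> str:
        answer, confidence = self.model(request)
        if confidence >= self.threshold:
            return answer
        # Graceful degradation: signal clearly that human judgment is needed
        # and keep the case so it can inform later improvements.
        self.edge_cases.append(request)
        return f"ESCALATED to human review: {request}"


# Stub standing in for a deployed system; known requests get high confidence.
KNOWN: Dict[str, str] = {"reset password": "Send the reset link."}


def stub_model(request: str) -> Tuple[str, float]:
    if request in KNOWN:
        return KNOWN[request], 0.95
    return "best guess", 0.30


router = Router(model=stub_model)
print(router.handle("reset password"))
print(router.handle("customer disputes an unprecedented fee type"))
print(router.edge_cases)
```

The logged edge cases are the raw material for the systematic learning described above: reviewing them periodically shows where the automation boundary actually sits.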

Building Human-AI Complementarity

The performance gap between current AI systems and humans on genuinely novel tasks suggests that optimal organizational design emphasizes complementarity rather than substitution. Humans contribute capabilities AI systems lack—flexible adaptation to novelty, efficient learning from minimal examples, and autonomous goal-setting. AI systems provide capabilities humans struggle with—processing vast data at scale, maintaining consistency across repetitive tasks, and applying learned patterns without fatigue.

Role redesign for complementarity restructures work to exploit these different capability profiles. Rather than asking "which roles can AI replace?" organizations should ask "how can we redesign workflows so humans and AI systems each handle tasks matching their capabilities?" This often leads to unexpected configurations.

Consider medical diagnosis. Rather than attempting to replace physicians with diagnostic AI, leading healthcare systems position AI as a "second opinion" provider that flags potential issues for physician review. The AI system processes imaging data comprehensively, identifying patterns potentially overlooked in time-constrained human review. Physicians contribute contextual judgment, patient relationship, and adaptive reasoning when atypical presentations arise. This configuration outperforms either humans or AI operating independently.

Skill development for the complementarity model requires reconsidering organizational training priorities. As routine analysis becomes AI-automated, human value increasingly concentrates in capabilities AI systems lack. Organizations should therefore invest systematically in developing employee skills in problem framing, novel situation navigation, cross-domain integration, and adaptive learning. These "meta-skills" prove durable even as specific technical capabilities shift between human and machine domains.

Strategic Transparency and Stakeholder Communication

The hype-reality gap surrounding AI capabilities creates strategic communication challenges. Overstatement of AI capabilities leads to unrealistic stakeholder expectations, misallocated resources, and eventual credibility damage. Understatement causes organizations to miss legitimate opportunities and fall behind more aggressive competitors.

Evidence-based communication protocols ground claims about AI capabilities in concrete, measurable evidence. Rather than abstract assertions about "artificial intelligence" transforming operations, effective communication specifies: precisely which tasks systems can perform, under what conditions, with what error rates, and with what oversight requirements. This specificity enables realistic expectation-setting while still acknowledging genuine capabilities.

The ARC Prize Foundation's approach to benchmark communication provides a useful model. They explicitly distinguish between performance on tasks similar to training data versus performance on genuinely novel tasks, clearly explain scoring methodologies and their limitations, and publish detailed human baseline data for comparison (ARC Prize Foundation, 2026). This transparency serves long-term credibility even when revealing current limitations.

Scenario-based planning for capability evolution acknowledges that AI capabilities will advance, though timing and trajectory remain uncertain. Rather than making point predictions about when specific capabilities will arrive, organizations should develop contingency plans for multiple scenarios: gradual incremental improvement, sudden breakthrough in specific domains, or extended plateau at current capability levels. This approach enables agile response regardless of how AI capabilities actually evolve.

Building Long-Term Organizational Intelligence Capabilities

Developing Adaptive Learning Systems

The ARC-AGI-3 benchmark evaluates intelligence through adaptation efficiency—how rapidly systems acquire new capabilities in unfamiliar domains (ARC Prize Foundation, 2026). While current AI systems struggle with this challenge, organizations themselves face parallel demands for adaptive learning. The most successful organizations will likely be those that develop systematic capabilities for efficient skill acquisition at the organizational level.

Exploration protocols for novel domains establish structured approaches for entering unfamiliar territory. Rather than expecting comprehensive analysis before action in genuinely novel situations, these protocols embrace controlled experimentation: rapid low-cost probes that generate learning, clear decision rules for continuing or abandoning exploration, and systematic capture of insights for organizational learning. This approach parallels how ARC-AGI-3 evaluates agentic systems through their exploration behavior.

Manufacturing organizations have implemented this principle through "lighthouse" facilities—small-scale implementations of novel technologies or processes that generate learning before broad deployment. When BMW explored additive manufacturing applications, they established specialized teams working on clearly bounded projects with well-defined learning objectives. Insights from these explorations informed broader manufacturing strategy without betting the organization on unproven approaches (industry example illustrating principle).

Rapid feedback mechanisms accelerate organizational learning by compressing the cycle between action and evaluation. The ARC-AGI-3 scoring methodology emphasizes action efficiency precisely because actions represent the fundamental unit of learning in interactive environments (ARC Prize Foundation, 2026). Organizations should similarly optimize for learning speed rather than merely eventual success.

Technology development teams at organizations like Spotify have embraced this principle through squad-based organizational structures. Small autonomous teams deploy changes rapidly, measure user response through quantified metrics, and iterate based on feedback. The system prioritizes learning speed, accepting that individual experiments may fail while ensuring the portfolio generates systematic knowledge accumulation (well-documented organizational practice).

Cultivating Decision-Making Resilience

ARC-AGI-3 evaluates not just whether systems eventually solve problems, but whether they do so efficiently and without excessive resource consumption (ARC Prize Foundation, 2026). The benchmark's action-counting methodology penalizes brute-force approaches that blindly try numerous options. Organizational decision-making faces parallel challenges: how to maintain effectiveness while avoiding analysis paralysis or resource-exhausting trial-and-error.
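The intuition behind action counting can be shown with a toy scoring function. The formula in this Python sketch, the ratio of a human baseline action count to the actions a system actually used, capped at 1, is an assumption made for illustration and is not the benchmark's published scoring rule.

```python
def action_efficiency(
    solved: bool, actions_taken: int, human_baseline_actions: int
) -> float:
    """Toy efficiency score in [0, 1]; NOT the official ARC-AGI-3 formula."""
    if not solved or actions_taken <= 0:
        return 0.0
    # Using more actions than the human baseline lowers the score;
    # using fewer is capped at 1.0.
    return min(1.0, human_baseline_actions / actions_taken)


# Hypothetical runs against a human baseline of 40 actions.
print(action_efficiency(True, 40, 40))    # matches the baseline
print(action_efficiency(True, 400, 40))   # solved, but by trying many options
print(action_efficiency(False, 400, 40))  # never solved
```

Under a rule of this shape, a system that eventually stumbles onto the solution after ten times as many actions as a person earns only a fraction of the credit, which is the sense in which brute force is penalized.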

Bounded experimentation frameworks enable systematic learning while controlling resource exposure. These frameworks establish clear parameters: maximum resources committed to exploration, criteria for determining when sufficient information exists for decisions, and escalation protocols when situations exceed predefined boundaries. The frameworks acknowledge that genuinely novel situations may lack perfect information while avoiding paralysis or reckless commitment.
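The three parameters named above (a resource cap, a sufficiency criterion, and an escalation rule) can be sketched as a small decision loop. In the Python sketch below, the Bounds fields, the evidence scale from 0 to 1, and the three outcome labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Bounds:
    max_budget: float  # total resources that may be spent on probes
    decide_at: float   # evidence level (0..1) treated as sufficient to commit
    abandon_at: float  # evidence level at or below which exploration stops


def explore(probes: Iterable[Tuple[float, float]], bounds: Bounds) -> str:
    """Run probes in order; each probe is (cost, evidence level after the probe)."""
    spent = 0.0
    for cost, evidence in probes:
        # Never exceed the resource cap; hand the decision upward instead.
        if spent + cost > bounds.max_budget:
            return "escalate"
        spent += cost
        if evidence >= bounds.decide_at:
            return "commit"
        if evidence <= bounds.abandon_at:
            return "abandon"
    # Probes ran out without a clear signal: the situation exceeds the framework.
    return "escalate"


bounds = Bounds(max_budget=10.0, decide_at=0.8, abandon_at=0.2)
print(explore([(3.0, 0.5), (3.0, 0.85)], bounds))  # evidence crosses the commit line
print(explore([(3.0, 0.5), (3.0, 0.1)], bounds))   # evidence collapses
print(explore([(6.0, 0.5), (6.0, 0.9)], bounds))   # second probe would break the budget
```

Writing the stopping rules down before the first probe is what separates bounded experimentation from open-ended trial and error.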

Financial institutions employ this approach when evaluating emerging technologies. Rather than committing immediately to full-scale implementation or rejecting innovations entirely, they often conduct time-bound, resource-limited pilots with predetermined evaluation criteria. If pilots meet success thresholds, investment scales accordingly. If they reveal fundamental limitations, resources redirect before substantial losses accumulate.

Strategic option preservation recognizes that value often resides in maintaining flexibility amid uncertainty. When facing genuinely novel situations where AI systems provide limited guidance—precisely the scenarios where current AI struggles most—organizations should consider whether decisions can be structured to preserve options rather than forcing premature commitment. This principle directly contradicts the "move fast and break things" ethos but aligns with evidence about effective decision-making under deep uncertainty.

Maintaining Human Intelligence Infrastructure

As organizations increasingly incorporate AI systems, maintaining and developing human intelligence capabilities paradoxically becomes more critical rather than less. The ARC-AGI-3 results demonstrate that humans possess cognitive capabilities—flexible adaptation to novelty, efficient learning from minimal examples, autonomous goal-setting—that current AI systems lack despite massive computational resources (ARC Prize Foundation, 2026). Organizations that allow these uniquely human capabilities to atrophy through over-reliance on AI systems risk severe vulnerability when situations exceed AI limitations.

Cognitive diversity preservation maintains organizational capability across different reasoning styles and knowledge domains. Research on collective intelligence consistently demonstrates that cognitively diverse teams outperform homogeneous teams on complex novel problems, even when homogeneous teams contain higher-performing individuals (Page, 2007). As AI systems standardize certain types of analysis, preserving human cognitive diversity becomes a strategic priority.

Organizations can implement this through deliberate hiring and development practices. Rather than optimizing purely for candidates whose skills align closely with current AI-augmented workflows, forward-looking organizations maintain pools of talent with capabilities difficult to replicate through AI—cross-domain synthesis, creative problem reframing, and adaptive response to genuine novelty.

Deep work capacity maintenance preserves human ability for sustained concentration and complex reasoning. As AI systems handle routine information processing, human value increasingly concentrates in difficult cognitive work requiring sustained attention. However, organizational cultures emphasizing constant connectivity and rapid response actively undermine this capability. Leading organizations deliberately structure work to protect time for sustained concentration, recognizing this as increasingly scarce and valuable.

Microsoft Research has documented internal efforts to preserve "focus time" for engineers and researchers amid growing communication demands. By establishing norms around asynchronous communication, blocking calendar time for deep work, and reducing meeting defaults, the organization protects cognitive capacity for work AI systems cannot yet replicate (organizational practice example).

Conclusion

The introduction of ARC-AGI-3 and the dramatic performance gap between humans and frontier AI systems illuminates a crucial reality: the journey to artificial general intelligence remains far longer than optimistic projections suggest. While contemporary AI systems demonstrate impressive capabilities within specific domains—particularly those with clear verification criteria and extensive training data—they struggle profoundly when confronting genuinely novel situations requiring exploration, autonomous goal-setting, and efficient adaptation.

For organizational leaders, this intelligence gap carries clear implications. AI investment should concentrate in domains meeting specific criteria: verifiable correctness, substantial training data coverage, and primarily within-distribution reasoning demands. Applications in these domains generate legitimate value and warrant aggressive pursuit. However, functions requiring flexible adaptation to novelty, strategic judgment in ambiguous situations, or creative innovation across knowledge domains remain predominantly human endeavors for the foreseeable future. Organizational strategies assuming imminent AI automation of these capabilities risk serious misallocation of resources.

The most successful organizations will likely be those that develop sophisticated understanding of both AI capabilities and limitations, implementing AI systems strategically where they excel while deliberately preserving and developing uniquely human capabilities. This balanced approach acknowledges AI as a powerful tool for specific applications rather than a near-term replacement for human intelligence. It treats organizational adaptation and learning as ongoing processes rather than one-time technology implementations.

As we look toward the future, the ARC-AGI-3 benchmark provides a concrete metric for genuine progress toward more general artificial intelligence. When AI systems begin approaching human-level efficiency on these truly novel tasks—matching not just eventual success but speed of learning and resource efficiency—it will signal a qualitative shift in capabilities. Until that milestone, the artificial intelligence revolution remains a story of powerful but bounded tools rather than genuine rivals to human adaptability.

The organizational imperative is clear: harness AI's considerable strengths within appropriate domains while building and maintaining the uniquely human capabilities that current systems cannot replicate. This dual focus positions organizations to capture near-term AI benefits while remaining resilient in the face of situations that exceed AI limitations—situations that, as ARC-AGI-3 demonstrates, remain far more common than popular narratives suggest.

References

ARC Prize Foundation. (2024). ARC Prize 2024 competition.
ARC Prize Foundation. (2025). ARC Prize 2025 competition.
ARC Prize Foundation. (2026). ARC-AGI-3: A new challenge for frontier agentic intelligence. arXiv preprint arXiv:2603.24621.
Chollet, F. (2019). On the measure of intelligence. arXiv preprint arXiv:1911.01547.
Chollet, F. (2024). OpenAI o3 breakthrough high score on ARC-AGI-Pub. ARC Prize Foundation.
Hsu, S. (2025). Post on LRM automation discovering novel results in quantum physics [Social media post].
Page, S. E. (2007). The difference: How the power of diversity creates better groups, firms, schools, and societies. Princeton University Press.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Sorokin, I., & Puget, J.-F. (2025). NVARC solution to ARC-AGI-2 2025. Technical report.
Spelke, E. S., & Kinzler, K. D. (2007). Core knowledge. Developmental Science, 10(1), 89–96.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain-of-thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903.

Jonathan H. Westover, PhD is Chief Research Officer (Nexus Institute for Work and AI); Associate Dean and Director of HR Academic Programs (WGU); Professor, Organizational Leadership (UVU); OD/HR/Leadership Consultant (Human Capital Innovations).

Suggested Citation: Westover, J. H. (2026). Organizational AI Transparency and Employee Resilience: Building Trust, Autonomy, and Confidence in Hybrid Work. Human Capital Leadership Review, 27(4). doi.org/10.70175/hclreview.2020.27.4.3