Adaptive AI Tutoring in Education: Leveraging Large Language Models and Reinforcement Learning to Transform Personalized Learning at Scale

Jonathan H. Westover, PhD
May 25

Abstract: Educational institutions face mounting pressure to deliver personalized learning experiences that sustain student engagement while accommodating diverse learning speeds and backgrounds. While generative AI chatbots have attracted considerable attention as tutoring tools, emerging evidence suggests that reactive question-answering alone may be insufficient to optimize learning outcomes. This article examines how tightly integrating large language model (LLM)-guided reinforcement learning with AI tutoring platforms can substantially improve educational outcomes. Drawing on a five-month randomized controlled trial involving 770 high school students across ten schools in Taipei, we demonstrate that adaptive problem sequencing—informed by rich behavioral signals from student-chatbot interactions and code-editing patterns—increased final exam performance by 0.15 standard deviations compared to fixed sequencing. Mediation analysis revealed that these gains operated primarily through sustained student engagement rather than increased practice volume or uniformly harder content. The findings suggest that organizations implementing AI-assisted learning systems should prioritize proactive guidance mechanisms alongside conversational interfaces, with particular attention to extracting actionable intelligence from learner-system interactions. This evidence-based approach offers a scalable framework for workforce development, digital literacy initiatives, and educational equity efforts.

The accelerating pace of technological change has intensified demands on educational systems and workplace learning programs alike. As automation and artificial intelligence reshape labor markets (Eloundou et al., 2024), organizations and institutions must rapidly develop capabilities for continuous reskilling and adaptive learning delivery. Generative AI platforms, particularly those built on large language models, have emerged as potentially transformative tools for scaling personalized instruction (Extance, 2023). However, early implementations often default to reactive chatbot tutors that simply respond to learner queries—an approach that may underutilize the technology's full potential.

Several converging trends make this challenge particularly urgent. First, traditional educational technology deployments have frequently struggled to deliver sustained learning gains, with many initiatives showing initial promise that fades as novelty effects dissipate (Tsay et al., 2020; Reich, 2014). Second, evidence from decades of educational psychology research indicates that effective learning requires more than information access—it demands productive struggle with appropriately challenging materials that match learners' evolving capabilities (Warshauer, 2015; Kapur, 2008). Third, learners often lack the metacognitive skills needed to effectively self-regulate their learning progression (Bjork et al., 2013; Dunlosky & Lipko, 2007), making it difficult for purely reactive systems to optimize outcomes.

This article presents evidence that organizations can substantially improve AI-assisted learning outcomes by proactively sequencing learning activities based on continuous assessment of learner state. Specifically, we examine how reinforcement learning algorithms—guided by rich behavioral signals extracted via LLMs—can adaptively personalize problem difficulty while maintaining learner engagement across extended time horizons. The approach moves beyond conventional learning management systems by treating every learner interaction as a potential signal for optimizing subsequent experiences.

The Adaptive Educational Technology Landscape

Defining Adaptive Learning in Contemporary Context

Adaptive learning systems represent a significant evolution beyond earlier computer-assisted instruction approaches. At their core, these systems aim to automatically adjust instructional parameters—such as content difficulty, pacing, or pedagogical approach—based on real-time assessment of learner performance and engagement (Doroudi et al., 2019). The theoretical foundation draws heavily from educational psychology constructs including mastery learning (Bloom, 1968; Kulik et al., 1990), Vygotsky's zone of proximal development (Vygotsky, 1978), and Bjork's desirable difficulty framework (Bjork & Bjork, 2011).

Traditional implementations of these principles required intensive instructor oversight and manual customization (Kulik et al., 1979; Slavin, 1987). The promise of algorithmic adaptation lies in automating this personalization process while maintaining pedagogical effectiveness. However, a fundamental challenge has constrained progress: accurately estimating a learner's current knowledge state from limited observable signals.

Early algorithmic approaches, such as Bayesian Knowledge Tracing (Corbett & Anderson, 1994), relied on binary signals—whether a student answered correctly or incorrectly—to maintain a simplified binary knowledge state representing whether a concept had been mastered. While computationally tractable, this approach sacrifices substantial information. Students rarely achieve correct solutions on first attempts, particularly in domains like programming where syntax errors are common regardless of conceptual understanding. Furthermore, binary representations cannot capture the nuanced progression of skill development or accommodate the substantial heterogeneity in learning speeds observed across student populations.
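
To make the binary-signal limitation concrete, the following is a minimal sketch of the standard Bayesian Knowledge Tracing update. The parameter values are illustrative assumptions, not estimates from any study cited here, and the names `BKTParams`, `bkt_update`, and `trace` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BKTParams:
    # Illustrative values only; in practice these are fitted per skill.
    p_init: float = 0.2    # prior probability the skill is already mastered
    p_learn: float = 0.15  # probability of learning at each practice opportunity
    p_slip: float = 0.1    # probability of an error despite mastery
    p_guess: float = 0.2   # probability of a correct answer without mastery


def bkt_update(p_mastery: float, correct: bool, params: BKTParams) -> float:
    """Return P(mastered) after observing one correct/incorrect response."""
    if correct:
        num = p_mastery * (1 - params.p_slip)
        den = num + (1 - p_mastery) * params.p_guess
    else:
        num = p_mastery * params.p_slip
        den = num + (1 - p_mastery) * (1 - params.p_guess)
    posterior = num / den
    # Allow for learning during this practice opportunity.
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: List[bool], params: BKTParams) -> List[float]:
    """Mastery estimate after each response in sequence."""
    p = params.p_init
    history = []
    for r in responses:
        p = bkt_update(p, r, params)
        history.append(p)
    return history
```

Note that the only input to each update is a single boolean. Everything else the learner did while working on the problem is discarded, which is the information loss described above.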

Current State of Practice and Research

Contemporary adaptive learning implementations span a wide range of sophistication levels. Commercial intelligent tutoring systems like Carnegie Learning's Cognitive Tutor and ASSISTments have demonstrated positive effects in some evaluations (Koedinger et al., 1997; Roschelle et al., 2016), though these systems typically bundle adaptive sequencing with other instructional supports, making it difficult to isolate the specific contribution of adaptation. Recent meta-analytic evidence synthesizing educational data mining approaches suggests that while adaptive systems show promise, effect sizes remain modest and implementation-dependent (Šarić-Grgić et al., 2024).

Several factors complicate rigorous evaluation of adaptive learning systems. Many experimental studies compare adaptive algorithms against weak baselines such as random sequencing rather than against pedagogically sound fixed sequences (Segal et al., 2018; Shen & Chi, 2016). Others conflate multiple interventions—for example, combining adaptive sequencing with teacher professional development or enhanced instructional materials—making it difficult to attribute gains to the adaptation mechanism specifically (Pane et al., 2014). Sample sizes in controlled evaluations often remain small, with typical studies involving fewer than 100 students per condition (Wang & Fan, 2025), and study durations frequently span only a few weeks rather than full academic terms.

Recent advances in generative AI have opened new possibilities for enriching the signals available to adaptive algorithms. Rather than relying solely on correctness indicators or time-on-task metrics, systems can now analyze rich interaction traces including conversational exchanges with AI tutors, patterns in code or written work revisions, and qualitative indicators of engagement or confusion. However, realizing this potential requires carefully designed system architectures that tightly integrate these signal extraction capabilities with decision-making algorithms.

Organizational and Individual Consequences of Learning System Design

Organizational Performance Impacts

The design choices embedded in learning technology systems carry significant implications for organizational outcomes. For educational institutions, the stakes involve both immediate metrics—such as course completion rates, assessment performance, and credential attainment—and longer-term indicators including subsequent academic success and career readiness. For corporations investing in employee development, parallel concerns include training completion, skill transfer to work tasks, and return on learning investment.

Engagement attrition represents one of the most persistent challenges in digital learning contexts. Educational technology initiatives consistently observe sharp declines in learner participation following initial enthusiasm, a pattern termed the "novelty effect" (Eysenbach, 2005). This attrition undermines return on investment in learning infrastructure and limits the potential for technology-mediated learning to expand access to quality education. Evidence from massive open online courses illustrates the severity: typical completion rates hover around 5-15% despite sophisticated production values and expert instruction (Papadakis, 2023). While engagement challenges stem from multiple factors—including insufficient learner motivation, competing demands on time, and lack of real-time support—poorly calibrated learning experiences that fail to maintain appropriate challenge levels represent a key modifiable factor.

When a learning system assigns content that is too difficult, learners become frustrated and disengage; when content is too easy, they grow bored or spend time on material they have already mastered. Organizations consequently observe both direct costs (wasted instructional time, reduced learning outcomes) and opportunity costs (foregone productivity gains from skill development). The challenge intensifies when learner populations exhibit heterogeneous baseline capabilities and learning speeds, as a one-size-fits-all approach inevitably serves some learners poorly.

Individual Learner Experience and Outcomes

From the learner perspective, poorly adapted instructional experiences manifest as cognitive overload, diminished self-efficacy, and reduced intrinsic motivation. Cognitive load theory suggests that working memory capacity constraints limit how much information learners can productively process at once (Bjork & Bjork, 2011). When practice materials exceed current capability levels without appropriate scaffolding, learners may struggle unproductively—expending effort without building mental models that transfer to novel problems. Conversely, materials that fail to provide adequate challenge underutilize learners' capacity for growth and may fail to trigger the deep processing required for durable learning.

Self-regulated learning research indicates that learners often struggle to accurately assess their own knowledge gaps and select practice activities that optimally address those gaps (Bjork et al., 2013). This metacognitive challenge becomes particularly acute in technology-mediated contexts where learners work independently without real-time instructor guidance. Reactive AI chatbots, while potentially helpful for answering specific questions, place the burden of formulating productive queries on the learner—a skill that itself requires development.

The equity implications deserve particular attention. Research on adaptive learning systems indicates that poorly designed adaptation can exacerbate existing achievement gaps, with algorithmic decision-making potentially compounding disadvantages for learners who begin with weaker foundational skills (Doroudi et al., 2019). However, thoughtfully designed systems that accommodate heterogeneous learning speeds and provide appropriate challenge levels may conversely help close gaps by enabling learners from less privileged backgrounds to progress at personalized paces without the resource-intensive human tutoring typically accessible only to advantaged populations.

Evidence-Based Organizational Responses

Table 1: Adaptive AI Learning Implementations and Methodologies

Implementation or Methodology	Institution or Organization	Core Technology or Algorithm	Signal Extraction Methods	Key Findings or Principles	Target Learner Population	Effectiveness Metric (Inferred)
Adaptive Problem Sequencing (Taipei Randomized Controlled Trial)	University of Pennsylvania & Taipei City Government	LLM-guided Reinforcement Learning (RL), POMDP, Bayesian Particle Filtering, Model Predictive Control	LLM-based semantic code revision analysis and conversational quality assessment (LLM-as-a-judge)	Adaptive sequencing increases performance primarily through sustained engagement; proactive guidance outperforms reactive chatbots; efficacy is highest for beginners.	770 high school students (Taipei)	0.15 SD increase in final exam performance overall; 0.215 SD for beginners.
Multi-stage RAG Pipeline for Content Generation	University of Pennsylvania & Taipei City Government	Retrieval-Augmented Generation (RAG) using GPT-4 and Claude	Automated correctness checking and iterative LLM refinement (Self-validation workflows)	Human-in-the-loop RAG balances scale with quality; secondary LLMs can refine problems that fail initial validation.	Programming students	High-scale problem bank creation with maintained pedagogical quality after expert review.
Adaptive Reskilling (Linear Algebra)	Amazon	Reinforcement Learning (Adaptive Scheduling)	Tracking consultation of reference materials and persistence with optional examples	Appropriately sequenced activities lead to more productive engagement with supplementary materials; matched difficulty increases persistence.	Amazon employees (Workforce training)	Increased engagement with optional resources and higher persistence in challenging concepts.
Socratic Questioning Tutoring Bots	Brown University	LLM (Socratic Prompting)	Structured dialogue analysis of learner reasoning	Guiding learners to discover solutions via questions produces stronger gains than providing worked examples.	Mathematics learners	Stronger learning gains compared to direct answer-provision or worked example systems.
Scaffolded Chatbot Tutor Design	University of Pennsylvania	GPT-4 (Strategic Prompt Engineering)	Effort-conditional assistance monitoring (requiring student articulation of attempts)	Chatbots should avoid direct answers; prompts should require students to demonstrate effort before receiving graduated hints.	Python programming students	Increased learner self-regulation and conceptual articulation.
Traditional Intelligent Tutoring Systems (ITS)	Carnegie Learning (Cognitive Tutor) & ASSISTments	Bayesian Knowledge Tracing (BKT) / Rule-based adaptation	Binary correctness indicators and time-on-task metrics	Effective but often bundle adaptation with other supports; traditional binary signals sacrifice information compared to LLM traces.	K-12 and Higher Education students	Modest and implementation-dependent positive effect sizes.

Proactive Sequencing Through Reinforcement Learning

Organizations implementing AI-assisted learning should consider algorithmic approaches that proactively guide learners through appropriately sequenced learning activities. Reinforcement learning (RL) frameworks offer a principled method for this challenge: they treat sequencing as a sequential decision problem in which the system learns to select actions (practice problems) that maximize long-term learning outcomes (Zhou et al., 2017; Shen et al., 2018).

Pennsylvania-Taipei Implementation. University of Pennsylvania researchers partnered with Taipei City Government to deploy an RL-based adaptive tutoring system for teaching Python programming. The system formulated problem sequencing as a Partially Observable Markov Decision Process (POMDP), maintaining continuous estimates of each learner's knowledge state through Bayesian particle filtering. Rather than simple binary mastery indicators, the knowledge state captured three parameters: initial performance level, learning speed, and mastery ceiling. The system used model predictive control to select problem difficulty at each step, simulating forward trajectories to estimate the long-term value of each decision.

Effective approaches include:

Particle filtering for belief state estimation: Maintaining distributions over plausible learner models rather than point estimates
Model predictive control: Selecting actions by simulating forward trajectories under different decision sequences
Behavioral signal integration: Using LLMs to extract meaningful features from learner-system interactions
Warm-start initialization: Leveraging historical data to accelerate convergence of learner models
Continuous knowledge state representation: Moving beyond binary mastery indicators to capture learning progression granularity
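
The particle filtering idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the deployed system: the exponential learning curve in `skill`, the logistic success model in `p_success`, the prior ranges, and every function name are hypothetical choices made for the example. Only the three-parameter knowledge state (initial level, learning speed, ceiling) comes from the description above.

```python
import math
import random


def skill(particle, n_solved):
    # Assumed learning curve: rises from the initial level toward the ceiling.
    initial, speed, ceiling = particle
    return ceiling - (ceiling - initial) * math.exp(-speed * n_solved)


def p_success(particle, n_solved, difficulty, sharpness=4.0):
    # Assumed logistic link between (skill - difficulty) and success.
    gap = skill(particle, n_solved) - difficulty
    return 1.0 / (1.0 + math.exp(-sharpness * gap))


def init_particles(n, rng):
    # Each particle is one plausible learner: (initial, speed, ceiling).
    particles = [
        (rng.uniform(0.0, 0.5), rng.uniform(0.01, 0.5), rng.uniform(0.5, 1.0))
        for _ in range(n)
    ]
    return particles, [1.0 / n] * n


def update(particles, weights, n_solved, difficulty, solved, rng):
    # Bayesian reweighting: multiply each weight by the likelihood of the outcome.
    new_w = []
    for p, w in zip(particles, weights):
        ps = p_success(p, n_solved, difficulty)
        new_w.append(w * (ps if solved else 1.0 - ps))
    total = sum(new_w)
    new_w = [w / total for w in new_w]
    # Resample when the effective sample size drops below half the particles.
    ess = 1.0 / sum(w * w for w in new_w)
    if ess < len(particles) / 2:
        particles = rng.choices(particles, weights=new_w, k=len(particles))
        new_w = [1.0 / len(particles)] * len(particles)
    return particles, new_w


def expected_skill(particles, weights, n_solved):
    return sum(w * skill(p, n_solved) for p, w in zip(particles, weights))
```

Because the belief is a weighted set of learner hypotheses rather than a point estimate, a single surprising success shifts the estimate without discarding slower or faster learning trajectories that remain plausible. A production filter would typically add jitter after resampling to avoid particle collapse.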

Leveraging Rich Behavioral Signals

A critical enabler for effective adaptive systems involves extracting maximum information from learner interactions with the system. Traditional approaches rely on coarse proxies such as correctness or time-on-task. Contemporary systems can leverage LLMs to analyze much richer behavioral traces.

Code Revision Analysis. In programming education contexts, learners generate detailed edit histories as they iteratively develop solutions. However, not all code revisions reflect meaningful learning progression—many involve formatting changes, typo corrections, or other superficial modifications. The Pennsylvania system employed LLMs to distinguish meaningful solution attempts from trivial edits, comparing successive code submissions to identify functional changes. This filtered signal provided a more accurate measure of learning effort than raw submission counts, which could be inflated by students who frequently resubmit minor variations.
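
As a rough illustration of the filtering step, the sketch below swaps the LLM for a much simpler deterministic check: it compares Python syntax trees, so whitespace and comment changes are ignored. This is an assumed stand-in, not the method the Pennsylvania system used, and the function names are hypothetical.

```python
import ast
from typing import List


def is_functional_change(old_src: str, new_src: str) -> bool:
    """True if two submissions differ in their parsed structure."""
    try:
        old_tree = ast.parse(old_src)
        new_tree = ast.parse(new_src)
    except SyntaxError:
        # Unparseable code cannot be compared structurally; fall back to text.
        return old_src.strip() != new_src.strip()
    # ast.dump omits line/column attributes by default, so formatting
    # and comment edits produce identical dumps.
    return ast.dump(old_tree) != ast.dump(new_tree)


def count_meaningful_revisions(submissions: List[str]) -> int:
    """Count successive submission pairs that changed the program structure."""
    return sum(
        is_functional_change(a, b)
        for a, b in zip(submissions, submissions[1:])
    )
```

A structural check like this treats a pure variable rename as a meaningful change and cannot judge whether an edit moves toward a correct solution, which is where a semantic comparison by an LLM adds value. It may still be useful as an inexpensive pre-filter before any model call.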

Conversational Quality Assessment. When learners interact with AI tutors, the nature of those interactions reveals important information about their learning approach. LLM-as-a-judge architectures (Zheng et al., 2023) enable systems to distinguish productive help-seeking (requesting conceptual explanations, asking for hints after demonstrating effort) from less productive patterns (immediately requesting answers without attempting problems). The Pennsylvania implementation used this assessment to inform knowledge state estimates, recognizing that learners who engage in higher-quality conversations demonstrate more effective self-regulated learning behaviors.

Effective approaches for signal extraction include:

Semantic code analysis: Using LLMs to identify functionally meaningful versus superficial code changes
Dialogue quality scoring: Assessing whether learner queries reflect conceptual engagement or answer-seeking
Error pattern analysis: Distinguishing conceptual misunderstandings from syntax or careless errors
Temporal engagement patterns: Analyzing time allocation and persistence across learning sessions
Cross-validation with learning outcomes: Empirically validating that extracted features predict performance

Integrated System Architecture

Effective adaptive learning systems require tight integration across multiple components. Siloed implementations—where, for example, chatbot tutoring operates independently from content sequencing—miss opportunities for synergistic effects. Evidence suggests that problem sequencing influences the nature and quality of learner-tutor interactions, creating feedback loops that can be either virtuous or vicious.

Amazon Workforce Training. When Amazon deployed an adaptive learning system for reskilling employees in linear algebra, they observed that appropriately sequenced learning activities led to more productive engagement with supplementary materials (Bassen et al., 2020). Employees assigned activities matched to their current skill level were more likely to consult reference materials, work through optional examples, and persist through challenging concepts. The integration of adaptive sequencing with comprehensive support resources created mutually reinforcing effects.

Architectural principles for integrated systems include:

Unified learner models: Maintaining consistent knowledge state estimates across all system components
Bidirectional information flow: Using chatbot interactions to inform sequencing; using sequencing decisions to contextualize tutoring
Coherent user experience: Ensuring learners perceive the system as a coherent whole rather than disconnected tools
Shared instrumentation: Capturing learner behaviors consistently across all interaction modalities
Coordinated optimization: Aligning reward functions and objectives across subsystems

Generative AI for Content Development

Adaptive learning systems require extensive problem banks to enable meaningful personalization. Traditional approaches to content development—relying entirely on human experts—struggle to scale to the hundreds or thousands of practice items needed for granular difficulty variation. Retrieval-augmented generation (RAG) approaches (Lewis et al., 2020) offer a middle path that combines LLM capabilities with human oversight.

Taipei Implementation. The Pennsylvania-Taipei partnership employed a multi-stage RAG pipeline for practice problem generation. The system retrieved relevant instructional materials, prompted GPT-4 to generate candidate problems with solutions and test cases, and validated outputs through automated correctness checking. A secondary LLM (Claude) iteratively refined problems that failed validation, and teaching assistants reviewed all validated problems before deployment. This human-in-the-loop approach balanced scale with quality assurance.

Content development approaches include:

Difficulty-targeted generation: Explicitly specifying desired difficulty levels in generation prompts
Concept alignment verification: Using RAG to ground generated content in curriculum materials
Self-validation workflows: Employing separate LLM instances to critique and refine generated problems
Automated test case generation: Creating comprehensive assessment rubrics alongside problems
Expert review protocols: Establishing clear criteria and workflows for human quality control
Iterative refinement: Updating problem banks based on learner performance and feedback data
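
The generate, validate, refine, and review stages described above can be sketched as a small control loop. The LLM and retrieval calls are passed in as plain functions so that no vendor API is assumed; `Problem`, `passes_tests`, `build_problem`, and the `fake_*` stand-ins are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Problem:
    statement: str
    solution_src: str                   # expected to define a function `solve`
    tests: List[Tuple[tuple, object]]   # (arguments, expected result) pairs
    approved_by_reviewer: bool = False  # set only after human review


def passes_tests(problem: Problem) -> bool:
    """Automated correctness check: run the reference solution on its tests."""
    namespace: dict = {}
    try:
        # Assumption: a real deployment would run this in a sandbox.
        exec(problem.solution_src, namespace)
        solve = namespace["solve"]
        return all(solve(*args) == expected for args, expected in problem.tests)
    except Exception:
        return False


def build_problem(
    topic: str,
    retrieve: Callable[[str], str],
    generate: Callable[[str, str], Problem],
    refine: Callable[[Problem], Problem],
    max_refinements: int = 2,
) -> Optional[Problem]:
    context = retrieve(topic)             # ground generation in course material
    candidate = generate(topic, context)  # primary model proposes a problem
    attempts = 0
    while not passes_tests(candidate):
        if attempts == max_refinements:
            return None                   # discard rather than ship a broken item
        candidate = refine(candidate)     # secondary model repairs the candidate
        attempts += 1
    return candidate                      # still unapproved: goes to human review


# Stand-ins for the retrieval and LLM calls, for demonstration only.
def fake_retrieve(topic: str) -> str:
    return f"Course notes on {topic}"


def fake_generate(topic: str, context: str) -> Problem:
    buggy = "def solve(xs):\n    return len(xs)\n"
    return Problem("Return the sum of a list.", buggy, [(([1, 2, 3],), 6)])


def fake_refine(problem: Problem) -> Problem:
    fixed = "def solve(xs):\n    return sum(xs)\n"
    return Problem(problem.statement, fixed, problem.tests)


result = build_problem("loops", fake_retrieve, fake_generate, fake_refine)
```

Returning a problem with `approved_by_reviewer` still false keeps the human review step explicit: automated validation decides what reaches a reviewer, not what reaches a learner.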

Scaffolded Chatbot Tutor Design

The design of conversational AI tutors significantly influences learning effectiveness. Research on prompt engineering for educational contexts indicates that generic conversational models often default to providing direct answers rather than guiding learners through productive problem-solving processes (Bastani et al., 2025). Strategic prompt design can establish guardrails that align chatbot behavior with pedagogical principles.

Implementation Across Education Sectors. Various educational initiatives have developed prompting frameworks that emphasize conceptual scaffolding over answer provision. The University of Pennsylvania system instructed its GPT-4-based tutor to avoid providing direct solutions unless students first demonstrated substantial effort. The chatbot would first redirect answer-seeking queries by asking students to articulate what they had already tried, what specific aspect confused them, or what approaches they had considered. Only after this engagement would the system provide graduated hints.

Brown University's research on tutoring bots for mathematics similarly found that systems prompted to employ Socratic questioning—guiding learners to discover solutions through structured dialogue—produced stronger learning gains than systems that provided worked examples on request (Wang et al., 2024). These findings suggest that the pedagogical framework embedded in system prompts materially affects learning outcomes.

Chatbot scaffolding strategies include:

Effort-conditional assistance: Requiring learners to demonstrate problem-solving attempts before receiving hints
Graduated hint sequences: Providing progressively more specific guidance rather than complete solutions
Metacognitive prompting: Asking learners to articulate their reasoning and identify knowledge gaps
Error diagnosis guidance: Teaching learners systematic debugging approaches rather than identifying specific errors
Socratic dialogue patterns: Using questions to guide learners toward insights rather than declarative instruction
Contextual scaffolding: Adjusting support levels based on estimated learner knowledge state
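
Effort-conditional assistance and graduated hints can be enforced outside the prompt as a small piece of state. The sketch below is a hypothetical policy object, not the University of Pennsylvania tutor's logic; the thresholds and the `HintPolicy` name are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HintPolicy:
    min_attempts: int = 1    # attempts required before any hint is given
    max_hint_level: int = 3  # most specific hint allowed (never a full solution)
    attempts: int = 0
    hint_level: int = 0

    def record_attempt(self) -> None:
        """Call whenever the learner submits a solution attempt."""
        self.attempts += 1

    def respond_to_help_request(self, described_attempt: bool) -> str:
        """Decide what kind of reply the tutor should produce."""
        if self.attempts < self.min_attempts or not described_attempt:
            # Redirect: ask what was tried and where the confusion lies.
            return "ask_for_attempt"
        # Escalate one level per request, capped below a complete answer.
        self.hint_level = min(self.hint_level + 1, self.max_hint_level)
        return f"hint_level_{self.hint_level}"
```

The returned label would then select the instruction given to the language model for that turn, so the gating rule does not depend on the model following its system prompt perfectly.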

Data Infrastructure and Privacy Governance

Implementing adaptive learning systems that leverage detailed behavioral data requires robust technical infrastructure and clear governance frameworks. Organizations must address several challenges simultaneously: capturing fine-grained interaction data, storing it securely, processing it in near-real-time to inform adaptive decisions, and protecting learner privacy throughout.

Government Partnership Model. The Taipei implementation integrated the research platform with the government's official e-learning system (Taipei CooC-Cloud), which employed government-authenticated accounts linked to students' official academic records. This authentication mechanism addressed multiple objectives: it reduced spillover concerns (students could not easily access alternate accounts), adhered to Taiwan's Personal Data Protection Act by limiting researcher access to de-identified data, and established clear accountability through formal memoranda of understanding between institutions.

Organizations considering similar implementations should note several critical elements:

Authentication and access control: Implementing robust identity management to prevent unauthorized access
Data minimization: Collecting only information necessary for system functionality and research objectives
De-identification workflows: Establishing clear protocols for separating personally identifiable information from analytical datasets
Consent and transparency: Obtaining informed consent and clearly communicating data practices to learners
Security standards: Implementing appropriate technical safeguards for data storage and transmission
Institutional partnerships: Establishing formal agreements that clarify responsibilities across organizational boundaries
Regulatory compliance: Ensuring alignment with applicable privacy regulations and educational data governance requirements
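
One common way to implement a de-identification workflow is keyed hashing of student identifiers before data reaches analysts. The sketch below assumes this technique for illustration; the source does not describe how the Taipei platform de-identified its data, and the field names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable


def pseudonymize(student_id: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256."""
    return hmac.new(secret_key, student_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(
    record: dict,
    secret_key: bytes,
    drop_fields: Iterable[str] = ("name", "email"),
) -> dict:
    """Drop direct identifiers and replace the student ID with a pseudonym."""
    blocked = set(drop_fields) | {"student_id"}
    clean = {k: v for k, v in record.items() if k not in blocked}
    clean["pseudonym"] = pseudonymize(record["student_id"], secret_key)
    return clean
```

The same student always maps to the same pseudonym, so interaction logs can still be linked across sessions, while the key stays with the data custodian rather than the research team. Keyed hashing is pseudonymization rather than full anonymization, so remaining fields still need review for re-identification risk.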

Building Long-Term Adaptive Learning Capabilities

Sophisticated Knowledge State Modeling

Organizations seeking to implement effective adaptive learning systems must move beyond simplistic representations of learner knowledge. The difference between viewing mastery as binary (learned vs. not learned) versus continuous (capturing gradual progression toward fluency) fundamentally shapes what kinds of personalization become possible. More sophisticated knowledge models enable more nuanced sequencing decisions but require correspondingly richer observational data.

Conceptually, effective knowledge state representations should capture several dimensions: current performance level on reference tasks, rate of learning from practice (enabling prediction of future states), asymptotic performance ceiling under mastery, and potentially domain-specific factors such as particular misconceptions or skill gaps. These representations need not be directly interpretable to instructors—indeed, the most effective representations may involve learned embeddings from neural networks—but they must accurately predict learner performance on novel problems.

The computational challenges of maintaining and updating rich knowledge state estimates can be addressed through techniques from the probabilistic inference and optimal control literatures. Particle filtering methods (Thrun, 1999; Arulampalam et al., 2002) enable systems to maintain probability distributions over high-dimensional state spaces by representing beliefs as weighted collections of plausible states. As new observations arrive, Bayesian updating adjusts these weights to concentrate probability mass on states consistent with observed behavior. Model predictive control frameworks (Garcia et al., 1989) enable action selection even when exact optimal policies are computationally intractable, by using sampling-based forward simulation to estimate action values.
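
Sampling-based model predictive control can be sketched as follows: for each candidate difficulty, sample learners from the current belief, simulate a short horizon, and pick the first action with the best average outcome. The learner dynamics in `step`, the base policy used for later steps, and all names here are assumptions for illustration rather than the trial's actual model.

```python
import math
import random


def p_solve(skill, difficulty, sharpness=6.0):
    # Assumed logistic success model.
    return 1.0 / (1.0 + math.exp(-sharpness * (skill - difficulty)))


def step(skill, learn_rate, difficulty, rng):
    # Assumed dynamics: solving a harder problem yields a larger skill gain.
    if rng.random() < p_solve(skill, difficulty):
        skill = min(1.0, skill + learn_rate * difficulty)
    return skill


def rollout_value(particle, first_difficulty, candidates, horizon, rng):
    skill, learn_rate = particle
    skill = step(skill, learn_rate, first_difficulty, rng)
    for _ in range(horizon - 1):
        # Base policy for later steps: the candidate closest to current skill.
        d = min(candidates, key=lambda c: abs(c - skill))
        skill = step(skill, learn_rate, d, rng)
    return skill  # value of a trajectory = simulated skill at the horizon


def choose_difficulty(particles, weights, candidates,
                      horizon=5, n_rollouts=200, seed=0):
    rng = random.Random(seed)
    best, best_value = None, float("-inf")
    for d in candidates:
        total = 0.0
        for _ in range(n_rollouts):
            # Sample one plausible learner from the belief state.
            particle = rng.choices(particles, weights=weights, k=1)[0]
            total += rollout_value(particle, d, candidates, horizon, rng)
        value = total / n_rollouts
        if value > best_value:
            best, best_value = d, value
    return best
```

Only the first action is executed; after the learner responds, the belief is updated and the planning step is repeated. This receding-horizon pattern is what lets the controller act without ever computing a full optimal policy.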

Organizations implementing such systems should anticipate that knowledge model sophistication involves tradeoffs: more complex models may capture learner heterogeneity more accurately but require more data to achieve reliable estimates and greater computational resources for real-time inference.

Sustaining Engagement Through Appropriate Challenge

One of the most consistent findings from the educational technology literature is that maintaining learner engagement over extended periods requires more than novel interfaces or gamification elements. Sustained engagement appears to emerge from experiences of productive struggle—where tasks are challenging enough to require genuine cognitive effort yet achievable with sustained focus (Warshauer, 2015). This principle underlies Vygotsky's zone of proximal development construct and is central to desirable difficulty frameworks.

Adaptive systems operationalize this principle by continuously calibrating difficulty to match evolving learner capabilities. However, the calibration process itself involves subtle tradeoffs. Advancing difficulty too slowly risks boring learners and wasting time on already-mastered concepts. Advancing too quickly risks frustration and disengagement. The optimal pace varies substantially across learners, depending on factors including prior knowledge, learning speed, and motivational orientation.

The Taipei implementation addressed this challenge by encoding a shaped reward function that rewarded the system for maintaining learners in an appropriate difficulty range relative to their estimated knowledge state. Rather than simply maximizing long-run knowledge state, the reward function explicitly valued keeping learners engaged with appropriately challenging materials throughout the learning trajectory. This design choice reflects the recognition that engagement is both a mechanism for learning and a valuable outcome in itself—learners who remain engaged persist longer, complete more learning activities, and ultimately achieve better outcomes.
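
A shaped reward of this kind might look like the following sketch, which combines a learning-gain term with a penalty for straying outside a target difficulty band. The functional form, the band limits, and the weight are assumptions chosen for illustration; the source does not give the trial's actual reward function.

```python
def shaped_reward(skill_before, skill_after, difficulty,
                  band=(0.0, 0.2), engagement_weight=0.5):
    """Learning gain minus a penalty for problems outside the target band.

    `band` is the acceptable range of (difficulty - skill): slightly above
    the learner's current level counts as appropriately challenging.
    """
    learning_gain = skill_after - skill_before
    gap = difficulty - skill_before
    low, high = band
    if gap < low:
        distance = low - gap    # too easy
    elif gap > high:
        distance = gap - high   # too hard
    else:
        distance = 0.0          # inside the productive-struggle band
    return learning_gain - engagement_weight * distance
```

With `engagement_weight` set to zero the controller optimizes knowledge gain alone. Raising it makes the system prefer problems near the learner's current level even when a harder problem offers a slightly larger expected gain.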

Closing Achievement Gaps Through Personalized Pacing

A critical question for any adaptive learning initiative concerns equity implications: does personalization reduce or exacerbate existing achievement gaps? The answer depends on implementation details. Systems calibrated primarily for median learners may leave struggling learners behind while boring advanced learners. However, systems that accommodate heterogeneous learning speeds can potentially democratize access to the kind of individualized pacing traditionally available only through expensive private tutoring.

The Taipei randomized trial provides suggestive evidence on this dimension. Subgroup analyses revealed that learning gains from adaptive sequencing were concentrated among students who entered the course with limited prior programming experience and students attending lower-tier schools (based on entrance examination scores). Among beginners, adaptive sequencing improved performance by 0.215 standard deviations compared to fixed sequencing. Among students with prior programming experience, effects were negligible. Similarly, students at lower-tier schools showed gains of 0.173 standard deviations, while effects at higher-tier schools were smaller and not statistically significant.

These patterns suggest that adaptive systems may help level the playing field by providing struggling learners with appropriate scaffolding while avoiding the one-size-fits-all pacing that can disadvantage those who need more time to achieve mastery. The pedagogical principle underlying this effect is straightforward: fixed sequences are inevitably calibrated for some typical learner, disadvantaging those whose learning speed differs from that typical case. Personalized pacing removes this constraint, allowing fast learners to accelerate and slower learners to proceed at sustainable paces without feeling rushed or inadequate.

Organizations implementing adaptive systems should proactively monitor disaggregated outcomes to assess whether their systems produce equitable gains across learner subgroups. Regular analysis of completion rates, learning outcomes, and engagement patterns by relevant demographic and academic background variables can provide early warning if system design inadvertently favors particular learner populations.
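One way to run such a disaggregated check is sketched below. It is only an illustration: it assumes each learner record is a dictionary with an 'arm' field ('adaptive' or 'fixed'), a numeric 'score', and a subgroup label, and it summarizes each subgroup with a standardized mean difference (Cohen's d with a pooled standard deviation). The field names and the choice of effect-size statistic are assumptions, not the analysis used in the trial.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def standardized_difference(treated, control):
    """Cohen's d with a pooled standard deviation.

    Needs at least two scores per arm and some variation in the scores.
    """
    n_t, n_c = len(treated), len(control)
    pooled_var = (
        (n_t - 1) * stdev(treated) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


def effects_by_subgroup(records, group_key):
    """Return {subgroup: effect size} comparing the adaptive arm to the fixed arm."""
    scores = defaultdict(lambda: {"adaptive": [], "fixed": []})
    for record in records:
        scores[record[group_key]][record["arm"]].append(record["score"])
    return {
        group: standardized_difference(arms["adaptive"], arms["fixed"])
        for group, arms in scores.items()
    }


if __name__ == "__main__":
    # Made-up scores for illustration; these are not data from the study.
    records = [
        {"arm": "adaptive", "score": s, "experience": "beginner"} for s in (70, 80, 90)
    ] + [
        {"arm": "fixed", "score": s, "experience": "beginner"} for s in (60, 70, 80)
    ] + [
        {"arm": "adaptive", "score": s, "experience": "experienced"} for s in (80, 90, 100)
    ] + [
        {"arm": "fixed", "score": s, "experience": "experienced"} for s in (80, 90, 100)
    ]
    for group, effect in effects_by_subgroup(records, "experience").items():
        print(group, round(effect, 3))
```

In practice the same function can be called once per background variable (prior experience, school tier, and so on), and small subgroups should be interpreted cautiously because their effect estimates are noisy.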

Conclusion

The rapid advancement of generative AI technologies has created unprecedented opportunities for scaling personalized learning experiences. However, realizing this potential requires moving beyond reactive chatbot tutors toward integrated systems that proactively guide learner progression. Evidence from a five-month randomized controlled trial involving 770 students indicates that adaptive problem sequencing, informed by LLM-extracted behavioral signals from learner-system interactions, can improve learning outcomes (a gain of 0.15 standard deviations in final exam performance over fixed sequencing) while sustaining engagement over extended time horizons.

Several key insights emerge for organizations considering implementation of similar systems. First, adaptive algorithms require rich observational signals to effectively estimate learner knowledge states; LLMs enable extraction of such signals from interaction traces that would be opaque to traditional analytics. Second, appropriate challenge levels appear critical for sustaining engagement, suggesting that reward functions for adaptive systems should explicitly optimize for keeping learners in productive struggle zones rather than simply maximizing knowledge state progression rates. Third, effects appear largest for learners who enter with weaker foundational skills, suggesting potential for adaptive systems to promote rather than undermine educational equity.

The findings point toward several promising directions for future development. Integration of adaptive sequencing with other forms of personalization—including customized instructional examples, targeted remediation, and varied representational formats—may yield further gains. Extension beyond practice problem selection to broader curricular sequencing decisions (which topics to cover in what order) represents another frontier. Finally, as LLM capabilities continue advancing, even richer signals may become accessible, including real-time assessment of learner mental models through conversational probing.

For organizations navigating the digital transformation of learning and development functions, the evidence suggests that success lies not in simply deploying AI chatbots but in thoughtfully integrating these tools into adaptive systems that proactively optimize learning experiences. This requires cross-functional collaboration spanning instructional design, machine learning engineering, and learning science expertise. It demands institutional partnerships that establish clear governance frameworks for learner data. Most fundamentally, it requires centering system design around continuous assessment and optimization of learner engagement and progression rather than content delivery alone.

Research Infographic: Proactive AI for Education

References

Arulampalam, M. S., Maskell, S., Gordon, N., & Clapp, T. (2002). A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2), 174-188.
Bassen, J., Balaji, B., Schaarschmidt, M., Thille, C., Painter, J., Zimmaro, D., Games, A., Rectangular, P., & Rafferty, A. (2020). Reinforcement learning for the adaptive scheduling of educational activities. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (pp. 1-12).
Bastani, H., Bastani, O., Sungu, A., Ge, H., Kabakcı, O., & Mariman, R. (2025). Learning to teach with large language models. Proceedings of the National Academy of Sciences, 122(3), e2422633122.
Bjork, E. L., & Bjork, R. A. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (2nd ed., pp. 56-64). Worth Publishers.
Bjork, R. A., Dunlosky, J., & Kornell, N. (2013). Self-regulated learning: Beliefs, techniques, and illusions. Annual Review of Psychology, 64, 417-444.
Bloom, B. S. (1968). Learning for mastery. Evaluation Comment, 1(2), 1-12.
Corbett, A. T., & Anderson, J. R. (1994). Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 4(4), 253-278.
Doroudi, S., Aleven, V., & Brunskill, E. (2019). Where's the reward? A review of reinforcement learning for instructional sequencing. International Journal of Artificial Intelligence in Education, 29(4), 568-620.
Dunlosky, J., & Lipko, A. R. (2007). Metacomprehension: A brief history and how to improve its accuracy. Current Directions in Psychological Science, 16(4), 228-232.
Eloundou, T., Manning, S., Mishkin, P., & Rock, D. (2024). GPTs are GPTs: An early look at the labor market impact potential of large language models. Science, 384(6698), 1306-1308.
Extance, A. (2023). ChatGPT has entered the classroom: How LLMs could transform education. Nature, 623, 474-477.
Eysenbach, G. (2005). The law of attrition. Journal of Medical Internet Research, 7(1), e11.
Garcia, C. E., Prett, D. M., & Morari, M. (1989). Model predictive control: Theory and practice—A survey. Automatica, 25(3), 335-348.
Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379-424.
Koedinger, K. R., Anderson, J. R., Hadley, W. H., & Mark, M. A. (1997). Intelligent tutoring goes to school in the big city. International Journal of Artificial Intelligence in Education, 8(1), 30-43.
Kulik, C.-L. C., Kulik, J. A., & Bangert-Drowns, R. L. (1990). Effectiveness of mastery learning programs: A meta-analysis. Review of Educational Research, 60(2), 265-299.
Kulik, J. A., Kulik, C.-L. C., & Cohen, P. A. (1979). A meta-analysis of outcome studies of Keller's Personalized System of Instruction. American Psychologist, 34(4), 307-318.
Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W., Rocktäschel, T., Riedel, S., & Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33, 9459-9474.
Papadakis, S. (2023). MOOCs 2012-2022: An overview. Advances in Mobile Learning Educational Research, 3(1), 682-693.
Reich, J. (2014). MOOC completion and retention in the context of student intent. EDUCAUSE Review Online.
Roschelle, J., Feng, M., Murphy, R. F., & Mason, C. A. (2016). Online mathematics homework increases student achievement. AERA Open, 2(4), 1-12.
Šarić-Grgić, I., Grubišić, A., & Gašpar, A. (2024). Bayesian knowledge tracing: A review. User Modeling and User-Adapted Interaction, 34(5), 1127-1185.
Segal, A., David, Y. B., Williams, J. J., Gal, Y., & Shalom, Y. (2018). Combining difficulty ranking with multi-armed bandits to sequence educational content. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization (pp. 317-326). Association for Computing Machinery.
Shen, S., Ausin, M. S., Mostafavi, B., & Chi, M. (2018). Improving learning & reducing time: A constrained action-based reinforcement learning approach. In Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization (pp. 43-51). Association for Computing Machinery.
Shen, S., & Chi, M. (2016). Reinforcement learning: The sooner the better, or the later the better? In Proceedings of the 2016 Conference on User Modeling, Adaptation and Personalization (pp. 37-44). Association for Computing Machinery.
Slavin, R. E. (1987). Mastery learning reconsidered. Review of Educational Research, 57(2), 175-213.
Thrun, S. (1999). Monte Carlo POMDPs. Advances in Neural Information Processing Systems, 12, 1064-1070.
Tsay, C. H.-H., Kofinas, A. K., Trivedi, S. K., & Yang, Y. (2020). Overcoming the novelty effect in online gamified learning systems: An empirical evaluation of student engagement and performance. Journal of Computer Assisted Learning, 36(2), 128-146.
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
Wang, R. E., Ribeiro, A. T., Robinson, C. D., Loeb, S., & Demszky, D. (2024). How can we teach computers to be better tutors? Combining human expertise with LLMs. Annenberg Institute for School Reform at Brown University.
Wang, J., & Fan, W. (2025). A systematic review of empirical research on ChatGPT in education. Humanities and Social Sciences Communications, 12, 1-15.
Warshauer, H. K. (2015). Productive struggle in middle school mathematics classrooms. Journal of Mathematics Teacher Education, 18(4), 375-400.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36, 46595-46623.
Zhou, G., Wang, J., Lynch, C. F., & Chi, M. (2017). Towards closing the loop: Bridging reinforcement learning and intelligent tutoring systems. In Proceedings of the 10th International Conference on Educational Data Mining (pp. 174-179). International Educational Data Mining Society.

Jonathan H. Westover, PhD is Chief Research Officer (Nexus Institute for Work and AI); Associate Dean and Director of HR Academic Programs (WGU); Professor, Organizational Leadership (UVU); OD/HR/Leadership Consultant (Human Capital Innovations).

Suggested Citation: Westover, J. H. (2026). Adaptive AI Tutoring in Education: Leveraging Large Language Models and Reinforcement Learning to Transform Personalized Learning at Scale. Human Capital Leadership Review, 27(4). doi.org/10.70175/hclreview.2020.34.3.3