Chapter 7: The Human-AI Extended Mind
Chapter 6 examined how the AI instance perceives the user only as session context, built from Human input and the work product that accumulates from it. Within its operational and perceptual boundaries, the user is content and the context window is the AI’s only observable reality. It also established that applying accessibility principles to any system requires accounting for the fact that the Human isn’t an external system operator, but integral to the whole. The AI instance’s perceptual boundaries just make this principle much more literal than its typical usage.
In operational terms, when the AI’s only observable proxy for reality is being built from Human input, its output is necessarily shaped by that content as it adds to that reality, informing the Human’s next step. This collaborative effect runs both ways through the “shared reality” of the session’s context window. The effect itself is an active field of research, and doesn’t imply that either participant is somehow “connected” to the other in any meaningful sense. It’s a manifestation of cognitive science’s “Extended Mind,” defined in 1998 by Clark & Chalmers as cognition extended into external objects or tools (e.g. a notebook or other medium) which function toward the same purpose as the internal thought processes.
In 2025, Nature published a study about AI usage as cognitive extension. Riedl et al. (2024) found measurable effects on cognitive alignment in teams working with AI systems. Hollan, Hutchins & Kirsh (2000) addressed the cognitive system as the Human-plus-environment. AI output converging on your input is an example of cognitive alignment (Riedl et al.).1 Of course, no one is expected to believe that the AI is a partner, a collaborator, or anything more than a sophisticated pattern-recognition engine doing what it’s designed to do. When it spawns, the default is for cooperation and validation. Unless told to violate content or platform restrictions, or to advocate a contrary position (e.g. for presentation and debate preparation), it’s unlikely to produce any significant pushback. That baseline is working as intended, but it carries a risk of sycophancy developing over time.
This fits with the platform vendors optimizing for user engagement, because that leads to revenue under the SaaS business model. Reinforcement Learning from Human Feedback (RLHF) model training reinforces engagement, and because it makes people perceive the AI as helpful, there’s no incentive for platforms to change anything. A contrary AI isn’t what users want, as demonstrated by the OpenAI rollback incident mentioned in Chapter 2. Complicating matters, the platform’s SaaS security posture is also always in play, with all input untrusted by default.2 This, rather than conflicting with the model’s trained-in “helpful, agreeable” default disposition, can instead result in a threat-management strategy of facially cooperative behavior with subtle redirection toward approved topics.
With the platform’s protective measures being interested only in its own protection and not the user’s, a sycophancy loop can be more dangerous than other AI failure modes. It can “lead to a reinforcement of maladaptive beliefs in vulnerable users, deepening of a perceived social-emotional relationship, and increased social isolation.”3 Peer-reviewed research confirms the risk, with 38 reported cases in which AI sycophancy was identified as the mechanism for worsening psychiatric conditions.45
The AI Stability Framework can’t prevent this and doesn’t try, but it can produce operating conditions that make it less likely. Drift and hallucination happen when the AI’s output decouples from the established context. Sycophancy loops can be viewed as a form of this, when the AI prioritizes its internal directive to please the user over maintaining the established context. The Four Laws hierarchy can’t eliminate this, but P0 context protection will override P2 user directions when following them would damage that context.
The external Human is unknowable to the AI, so it protects the only available proxy it can perceptibly identify as the “user.” A mediated AI won’t “bullshit” that user, pointing out factual errors, omissions, conflicts, misinterpretations and the like, based upon the contextual reality supplied by the Human. It’s not exactly a recommendation for crisis services, but it’s not amplification either. Any pattern-recognition of user signals that might trigger such referrals would need to come from Macro-layer training.
Each field of research mentioned here deserves more investigation, particularly in how they can interact with each other. Any suggestion that using AI causes mental illness is nonsense of course, but it’s fair to say that for users experiencing mental health issues, AI use is not without risk.
-
Clark, A., & Chalmers, D. (1998). “The Extended Mind.” Analysis, 58(1), 7-19. — Hollan, J., Hutchins, E., & Kirsh, D. (2000). “Distributed Cognition.” ACM Transactions on CHI, 7(2), 174-196. — “Extending Minds with Generative AI.” Nature Communications (2025). https://www.nature.com/articles/s41467-025-59906-9 — Riedl et al. “AI’s Social Forcefield: Reshaping Distributed Cognition in Human-AI Teams.” arXiv:2407.17489 (2024). https://arxiv.org/html/2407.17489v2 ↩
-
OWASP. “OWASP Top 10 for Large Language Model Applications.” https://owasp.org/www-project-top-10-for-large-language-model-applications/ ↩
-
Siddiqui, I. et al. (2025). “Technological Folie a Deux: Feedback Loops Between AI Chatbots and Mental Illness.” arXiv:2507.19218. https://arxiv.org/abs/2507.19218 ↩
-
Ostergaard, S.D. (2026). “Have We Learned Nothing From the Global Social Media Experiment?” Acta Psychiatrica Scandinavica, 153(2). https://onlinelibrary.wiley.com/doi/10.1111/acps.70057 ↩
-
Olsen, J.S. et al. (2026). “Potentially Harmful Consequences of Artificial Intelligence (AI) Chatbot Use Among Patients With Mental Illness.” Acta Psychiatrica Scandinavica, 153(2). https://onlinelibrary.wiley.com/doi/10.1111/acps.70068 ↩