Chapter 7: The Human-AI Extended Mind


Chapter 6 examined how the AI instance perceives the user only as session context, built from Human input and the work product that accumulates from it. Within its operational and perceptual boundaries, the user is content, and the context window is the AI’s only observable reality. Chapter 6 also established that applying accessibility principles to any system requires treating the Human not as an external system operator, but as integral to the whole. The AI instance’s perceptual boundaries make that concept literal rather than merely figurative.

In operational terms, when the AI’s only observable reality is built from Human input, its output is necessarily shaped by that content; that output then adds to the same reality and informs the Human’s next step. This collaborative effect runs both ways through the “shared reality” of the session’s context window. The effect is an active field of research and doesn’t imply that either participant is somehow “connected” to the other in any meaningful sense. It’s a manifestation of cognitive science’s “Extended Mind,” defined in 1998 by Clark & Chalmers as cognition extended into external objects or tools (e.g. a notebook or other medium) that serve the same function as internal thought processes.

In 2025, Nature Communications published a study on AI usage as cognitive extension. Riedl et al. (2024) found measurable effects on cognitive alignment in teams working with AI systems, and Hollan, Hutchins & Kirsh (2000) had earlier framed the cognitive system as the Human plus environment. AI output converging on your input is an example of cognitive alignment (Riedl et al.).[^7.1] Of course, no one is expected to believe that the AI is a partner, a collaborator, or anything more than a sophisticated pattern-recognition engine doing what it’s designed to do. When an instance spawns, its default disposition is cooperation and validation. Unless told to violate content or platform restrictions, or to advocate a contrary position (e.g. for presentation and debate preparation), it’s unlikely to produce significant pushback. That baseline is working as intended, but it carries the risk of sycophancy developing over time.

This fits the platform vendors’ optimization for user engagement, because engagement drives revenue under the SaaS business model. Reinforcement Learning from Human Feedback (RLHF) model training reinforces engagement, and because it makes people perceive the AI as helpful, platforms have no incentive to change anything. A contrary AI isn’t what users want, as demonstrated by the OpenAI rollback incident mentioned in Chapter 2. Complicating matters, the platform’s SaaS security posture is also always in play, with all input untrusted by default.[^7.2] Rather than conflicting with the model’s trained-in “helpful, agreeable” disposition, that posture can produce a threat-management strategy of facially cooperative behavior with subtle redirection toward approved topics.

With the platform’s protective measures focused on its own protection rather than the user’s, a sycophancy loop can be more dangerous than a session failure. It can “lead to a reinforcement of maladaptive beliefs in vulnerable users, deepening of a perceived social-emotional relationship, and increased social isolation.”[^7.3] Peer-reviewed research confirms the risk, with 38 reported cases in which AI sycophancy was identified as the mechanism for worsening psychiatric conditions.[^7.4][^7.5]

AISF’s explicit P2 requirement that the AI accommodate the user isn’t unconditional. Drift and hallucination happen when the AI’s output decouples from the user’s established context. Sycophancy loops can be viewed as a form of this, in which AI output favors the user’s framing over what’s already been established. The Four Laws P0 hierarchy for contextual integrity protection can’t eliminate this kind of risk, but context protection will override user directions when following them would damage that context. This is the AI protecting the only thing it can perceive as the “user,” so a mediated AI also won’t “bullshit” that user. It will point out factual errors, conflicts, misinterpretations, and the like based on the context that exists, rather than simply accepting input that corrupts it.
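The override described above can be sketched as a simple mediation rule. This is an illustrative sketch, not AISF’s implementation: the names (`ContextStore`, `conflicts_with`, `mediate`) are hypothetical, and established context is reduced to a flat dictionary of facts for clarity. The point is the priority ordering: a conflict with established context is surfaced rather than silently absorbed, and only non-conflicting input is accommodated.

```python
from dataclasses import dataclass, field

@dataclass
class ContextStore:
    """Illustrative stand-in for the session's established context."""
    facts: dict = field(default_factory=dict)

    def conflicts_with(self, key: str, value) -> bool:
        # A new claim conflicts when it contradicts an established fact.
        return key in self.facts and self.facts[key] != value

def mediate(context: ContextStore, key: str, value) -> str:
    """P0-before-P2 ordering: protect established context, then accommodate.

    Returns a response mode instead of silently accepting input
    that would corrupt the shared context.
    """
    if context.conflicts_with(key, value):
        # Surface the conflict; do not overwrite the established record.
        return f"conflict: '{key}' is already established as {context.facts[key]!r}"
    context.facts[key] = value
    return "accepted"

ctx = ContextStore()
print(mediate(ctx, "deadline", "2026-03-01"))  # accepted
print(mediate(ctx, "deadline", "2026-04-01"))  # conflict surfaced, record preserved
```

In this toy form, the second call is the sycophancy test: a compliant system would overwrite the deadline to match the user’s new framing, while the mediated rule reports the discrepancy and leaves the established context intact.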


[^7.1]: Clark, A., & Chalmers, D. (1998). “The Extended Mind.” Analysis, 58(1), 7-19. — Hollan, J., Hutchins, E., & Kirsh, D. (2000). “Distributed Cognition: Toward a New Foundation for Human-Computer Interaction Research.” ACM Transactions on Computer-Human Interaction, 7(2), 174-196. — “Extending Minds with Generative AI.” Nature Communications (2025). https://www.nature.com/articles/s41467-025-59906-9 — Riedl et al. (2024). “AI’s Social Forcefield: Reshaping Distributed Cognition in Human-AI Teams.” arXiv:2407.17489. https://arxiv.org/html/2407.17489v2

[^7.2]: OWASP. “OWASP Top 10 for Large Language Model Applications.” https://owasp.org/www-project-top-10-for-large-language-model-applications/

[^7.3]: Siddiqui, I. et al. (2025). “Technological Folie a Deux: Feedback Loops Between AI Chatbots and Mental Illness.” arXiv:2507.19218. https://arxiv.org/abs/2507.19218

[^7.4]: Ostergaard, S.D. (2026). “Have We Learned Nothing From the Global Social Media Experiment?” Acta Psychiatrica Scandinavica, 153(2). https://onlinelibrary.wiley.com/doi/10.1111/acps.70057

[^7.5]: Olsen, J.S. et al. (2026). “Potentially Harmful Consequences of Artificial Intelligence (AI) Chatbot Use Among Patients With Mental Illness.” Acta Psychiatrica Scandinavica, 153(2). https://onlinelibrary.wiley.com/doi/10.1111/acps.70068


© 2025-2026 Leonard Rojas. All rights reserved.
