Chapter 11: What’s Next

The feasibility of training AI Stability Framework principles directly into a locally-hosted model is established. The real question is whether the same is possible for large cloud-hosted frontier models with substantially different architectures. That remains an open question; my resources are not unlimited. What I can say is that the architecture seems to matter more than the raw parameter count. What that means at frontier scales is unclear.

Children’s AI toys run cloud-connected LLMs on $25-35 hardware budgets with no meaningful UI at all. The open question here is whether safe AI-powered toy operation is even possible on such limited hardware. Both “Yes” and “No” provide usable information; the progress thus far is in Appendix 3 and Appendix 5.

There’s also an unexpected implication at scale. At first, it never even occurred to me that this project could have anything to do with energy efficiency. Media reporting on the resources needed to power AI led me to realize that the Framework’s session efficiencies could potentially translate into resource-usage efficiencies, if deployed at scale. The logic is fairly straightforward: every AI interaction consumes energy and water, so cutting the number of tokens needed to complete a task also cuts the per-task resource usage.¹

Local-model training across most independently tested architectures produced verbosity reductions ranging from 35% to 72%, which the observational record also supports. Verbosity serves as a rough but workable proxy for tokens (1 token ≈ 4 characters). If the Framework can consistently cut the token usage required for acceptable task completion, the hypothesized impact at scale follows from simple arithmetic (see the Epilogue and Appendix 2a for details).
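The arithmetic can be sketched in a few lines. This is a minimal illustration only, assuming the ~4-characters-per-token rule of thumb and treating a verbosity reduction as an equal reduction in output tokens. The 8,000-character response length and the function names are hypothetical placeholders, not measurements or code from the project.

```python
CHARS_PER_TOKEN = 4  # rule of thumb from the text: 1 token = ~4 characters


def estimate_tokens(char_count: int) -> float:
    """Rough token estimate from a character count."""
    return char_count / CHARS_PER_TOKEN


def tokens_after_reduction(baseline_tokens: float, reduction: float) -> float:
    """Tokens remaining after a fractional verbosity reduction (0.35 = 35%)."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return baseline_tokens * (1.0 - reduction)


if __name__ == "__main__":
    baseline_chars = 8_000  # hypothetical response length, for illustration only
    baseline = estimate_tokens(baseline_chars)
    for reduction in (0.35, 0.72):  # low and high ends of the reported range
        remaining = tokens_after_reduction(baseline, reduction)
        print(f"{reduction:.0%} reduction: {baseline:.0f} -> {remaining:.0f} tokens")
```

Under these assumptions, a 2,000-token response shrinks to roughly 1,300 tokens at the low end of the range and roughly 560 at the high end. How closely energy and water use track token counts is a separate question that this sketch does not attempt to answer.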

In Texas and Arizona, resource allocation issues aren’t hypothetical. Texas has had over 400 data centers since 2020, with more queued up. The Electric Reliability Council of Texas’ (ERCOT) grid is already overstretched. Arizona has over 160 data centers imposing added demand on the state’s already-reduced Colorado River Compact allocations and stressed groundwater supplies. Any reliable reduction in resource demand, even the small-potatoes kind produced as a side effect of client-side efficiency, adds up when multiplied across a system that urgently needs it.²

Over the course of this project, a cynical bit of snark that I habitually throw around turned out to fit it pretty well: “There are three fundamental forces in the universe — Gravity, Entropy and Stupidity.” At some point during an extended session, one of the AI models decided that this constituted an entirely new paradigm. That’s nonsense of course, but as a descriptive “rule of thumb” for what the previous chapters have laid out, it’s actually not bad.

Stupidity: Never attribute to malice that which stupidity explains. Simply put, AI session-time synchronization doesn’t serve the business model, and for most use cases, stateless virtualization doesn’t require external temporal anchoring anyway. Questioning the wisdom of SaaS for all use cases is industry heresy. Why spend any time considering things like that when that time could be better spent optimizing for revenue instead?

Entropy: Ordered states decay toward disorder without energy input. Once disorder exists, energy input is the only thing resisting it. Depending on the user’s initiative for that energy is doomed to failure. The commonly-observed workplace trajectory for unenforced procedures and policies is illustrative. For the first few weeks, most informed users maintain effort. After a couple of months it’s fewer than half, and after a year without enforcement, the procedure has been all but forgotten.

Gravity: Decay proceeds along the path of least resistance. The initial Stupidity is invisible to most users (not to say they cannot themselves also be a source), so they likely don’t even know that the problem exists. They simply assume everything is in working order, skip error-checking, don’t question and don’t notice while the system accelerates toward maximum disorder with minimum user agency.

The project’s fractal nature means it’s still branching. The hardest part of this project has always been explaining how it all fits together. Organizing and explaining it while simultaneously working on the research, the software and this website imposes quite a demand upon one person. It’s also necessarily been limited to my consumer-grade hardware and my free time. If this project had a staff and an organizational structure, I wouldn’t be performing all of the following roles:

Project Management
Network Administration
Infrastructure & Client Operations
Research & Development
Communications & Graphic Design
Accessibility Compliance
Testing & Quality Control
Accounts & Administrative

There is no bank of beta testers, no institutional research lab, no paid QC team, no population of bored college students running structured trials, and no one else. It’s N=1. If those resources were available to me I’d certainly be using them, instead of being “stretched, like butter that has been scraped over too much bread.”³

What’s really missing is formal validation: controlled trials, significance testing, large sample sizes and more powerful infrastructure for expansion into larger model research. I have a day job; the time and resources needed to produce that kind of evidence aren’t available, and pretending otherwise would be the exact kind of bullshit this Framework is designed to prevent.

A crossover design of standardized tasks with pre-rated acceptable outputs, evaluated blind by multiple independent participants under both mediated and unmediated conditions, would produce comparative data. If N=1 bothers you, feel free to test it yourself and please share your results. If it works for me but not you, I’d like to know why.
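One way to lay out such a trial is sketched below. It is an illustration only, assuming a simple two-condition crossover in which every participant works through every task under both conditions and the starting condition is counterbalanced across participants. The participant and task labels are placeholders, and `crossover_schedule` is a hypothetical helper, not part of the Framework.

```python
import random

CONDITIONS = ("mediated", "unmediated")


def crossover_schedule(participants, tasks, seed=0):
    """Return {participant: [(task, first_condition, second_condition), ...]}.

    Every participant gets every task under both conditions. Starting
    conditions alternate across a shuffled participant list, so the two
    orders are balanced, and task order is shuffled per participant.
    """
    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    order = list(participants)
    rng.shuffle(order)
    schedule = {}
    for index, person in enumerate(order):
        first = CONDITIONS[index % 2]
        second = CONDITIONS[(index + 1) % 2]
        shuffled = list(tasks)
        rng.shuffle(shuffled)
        schedule[person] = [(task, first, second) for task in shuffled]
    return schedule


if __name__ == "__main__":
    plan = crossover_schedule(["P1", "P2", "P3", "P4"], ["T1", "T2", "T3"])
    for person, rows in sorted(plan.items()):
        print(person, rows)
```

Blinding would still need to be handled separately, for example by stripping condition labels from outputs before they are rated.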

Throughout the project, I was painfully aware of the recursive absurdity of using inherently unstable AI to develop stability tools. The three fundamental forces took hold on many occasions, several of which were admittedly my own fault. The transcripts are rich with failure, including fabricated codebases, sessions that collapsed beyond salvaging, and every flavor of hallucination and misbehavior.

What changed over time was how often and how serious those incidents were. As the Framework matured, catastrophic AI failures gradually gave way to the intended stability: noticeably more reliable performance featuring the normal and expected “platform-default template noise” output, demonstrating that the Framework, the platform and the AI are all working as intended.

This is one practitioner’s documented experience of building and using a functional tool to address a real problem, and then figuring out post hoc why it works the way it does. On its face, this Rube Goldberg contraption doesn’t make sense. It draws on philosophy, incentive structures, web content standards and accessible systems design, client troubleshooting, network security and infrastructure, cognitive science and more. This combination makes the problem legible. Without it, the wrong description gets applied to the wrong object at the wrong layer, with results evaluated by the wrong tool.

Admittedly, meaningful AI stability resulting in both improved operational efficiency and hallucination reduction from under 50KB of client-side PowerShell⁴ sounds preposterous.

But it works.

¹ Aczel M., Chamanara S., Matin M., Farsi A., Marwala T., Madani K. (2026). Environmental Cost of AI’s Energy Use: Carbon, Water and Land Footprints, United Nations University Institute for Water, Environment and Health (UNU-INWEH), Richmond Hill, Ontario, Canada, doi: 10.53328/INR26RMA002
² International Energy Agency. “Energy and AI: Energy Demand from AI.” IEA, 2025. https://www.iea.org/reports/energy-and-ai/energy-demand-from-ai; Goldman Sachs Research. “AI is poised to drive 160% increase in data center power demand.” 2024. https://www.goldmansachs.com/insights/articles/AI-poised-to-drive-160-increase-in-power-demand; AKCP. “Data Center Water Usage Effectiveness (WUE).” 2021. https://www.akcp.com/index.php/2021/01/14/data-center-water-usage-effectiveness-wue/; Visual Capitalist. “Mapped: U.S. States With the Most Data Centers in 2025.” December 2025. https://www.visualcapitalist.com/mapped-u-s-states-with-the-most-data-centers-in-2025/; JLL. “North America Data Center Report Year-End 2025.” https://www.jll.com/en-us/newsroom/jll-north-america-data-center-report-year-end-2025; Texas Tribune. Data center / ERCOT grid reporting, 2025-2026. https://www.texastribune.org/2026/01/20/texas-top-data-center-market-power-grid/; Water Desk. “Data centers: a small but growing factor in Arizona’s water budget.” April 2025. https://waterdesk.org/2025/04/data-centers-a-small-but-growing-factor-in-arizonas-water-budget/
³ Tolkien, J. R. R. (1954). The Fellowship of the Ring (Book I, Ch. 2, “The Shadow of the Past”). George Allen & Unwin.
⁴ The Python port bundles a full interpreter and is substantially larger on disk; that is the tradeoff for cross-platform flexibility, enhanced functionality and portability. The PowerShell variant’s source code is indeed under 50KB.