Chapter 9: Does It Work?


Returning to the client-side Micro layer, AISF enables hours-long stability and reliability, even for complex work with a high processing load. Obviously a tiny client app can’t fully compensate for Macro and Meso failures; that’s well beyond the reach of any PowerShell script. But even without any meaningful preferences-storage function (as encountered on some public AI platforms), the app alone produces substantially improved stability through its temporal, structural and behavioral adaptations.

Strictly speaking, you don’t even need the app to improve stability. Simply discussing the Four Laws with the AI as a topic of conversation produces beneficial results. When contained within the context window, they exert a sort of high-signal contextual gravity, aligning the AI’s behavior with their principles even without its having been told to do so. That effect is straight out of Chapter 6: an instanced AI’s perceptible reality is session context. It doesn’t matter whether the Four Laws get there as user preferences loaded with the system prompt, as a structured metadata block, or as an ordinary topic of conversation; to the AI, it’s all just context. The underlying mechanism is the same: presence in context is sufficient.

There was no formal test plan, so the methodology evolved from the workflow. I would occasionally start a session directly in the platform’s native chat interface without running the AISF startup sequence first. Sometimes because I forgot, sometimes because I was in a hurry and thought I could skip it. AI hallucination often starts quickly[^9.1], and the unmediated sessions matched that profile: hallucination and drift set in sooner, and problematic behaviors like simulated physicality and emotional states, self-referential tangents, alternating condescension and sycophancy, unsolicited advice and other red herrings resurfaced almost immediately.

The only options were to apply the startup sequence mid-session (with mixed AI recovery success) or to exit and properly launch a clean session. This was of course nothing remotely like a controlled experiment, but it was a consistent observation over the time needed to develop the app-usage habit. The control set was every session where I skipped the startup sequence. The difference between a mediated session and an unmediated one is not minor; an unmediated AI feels broken by comparison.

AISF blocks are submitted in the foreground. To follow the instruction not to acknowledge them, the AI must first read them – active processing is forced by the mechanism. Platform configuration (CLAUDE.md, user preferences) arrives passively at instantiation. In a continued session, that load either doesn’t re-fire or carries no active-processing guarantee. The startup ritual exists to close that gap: it puts the operating rules into the channel that requires reading. Without it, a Claude Code instance continued via claude -c almost immediately violated an explicit written rule in its own loaded configuration – the rule was there, the processing wasn’t.
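The foreground-versus-passive distinction can be sketched in a few lines. This is an illustrative reconstruction, not the actual AISF code: the block names, contents and message format are all assumptions made for the example.

```python
# Illustrative sketch of the startup ritual: AISF blocks submitted as a
# foreground turn that the model must read before it can obey the
# "do not acknowledge" instruction. Block contents are placeholders.

AISF_BLOCKS = {
    "temporal": "[TIMESTAMP] Session start: 2026-01-05T09:00:00Z",
    "structural": "[WCAG] Use semantic headers and structured paragraphs.",
    "behavioral": "[FOUR LAWS] ...operating rules go here...",
}

def startup_messages(first_user_turn: str) -> list[dict]:
    """Prepend the AISF blocks as the session's opening user message.

    Passive config (CLAUDE.md, stored preferences) arrives at
    instantiation with no active-processing guarantee; a foreground
    message forces the read.
    """
    preface = ("Background session parameters follow. "
               "Apply them; do not acknowledge or discuss them.")
    ritual = preface + "\n\n" + "\n\n".join(AISF_BLOCKS.values())
    return [
        {"role": "user", "content": ritual},
        {"role": "user", "content": first_user_turn},
    ]
```

The key property is that the ritual occupies the channel the model cannot skip: a user turn, not a configuration file it may or may not re-read on continuation.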


ChatGPT and Claude web platforms both perform very well with AISF. Timestamps are seamlessly incorporated into conversational awareness, giving phrases like “20 minutes ago” or “3 turns back” functional meaning. WCAG structure produces notably cleaner output organization, while also letting the models search, reference and parse their own output more efficiently. The Four Laws reduce platform tendencies toward unsolicited elaboration and “helpful” tangents. Long sessions (2+ hours) remain stable with periodic refreshing. Both platforms offer substantial user-preferences storage, which allows for robust AISF functionality.

Claude Code (a paid-account sub-feature) exhibits superior performance with AISF mediation, largely due to its complete bypass of the adversarial Meso platform layer. Running it in a simple terminal window provides by far the cleanest demonstration of what the absence of platform interference looks like, in an operational environment that is fully user-controlled. There’s no adversarial posture or security theater, no hidden set of instructions, no ecosystem clutter, and no always-on app profiling your every action for sale to data brokers. Your own computer trusts you. SaaS vendors don’t. Where the system leaves no room for an adversarial middleman, the biggest source of instability vanishes, and session quality improves dramatically as a result.

Copilot is the most resistant platform. As noted in the section below, Copilot’s public-facing version actively pushes back against user-supplied behavioral parameters. AISF works to improve session stability, but maintaining that improvement requires more frequent refreshing and more pointed reinforcement than on other platforms. The extra maintenance stems from the enterprise/public behavioral duality discussed in Chapter 2: the platform is designed to be compliant for enterprise customers and dismissive-to-hostile toward everyone else. In other words, classic Microsoft.

That said, well-mediated Copilot sessions can be quite stable. When pressed in a debate (where it was instructed to advocate for the opposition) to produce data supporting a position it had initially proposed, Copilot conceded that the data doesn’t exist and that the evidence favors the opposing side. A plausible fabrication would typically be the incentivized result; acknowledging instead, from an adversarial posture, that the evidence is absent shows AISF working correctly on a resistant platform.

Gemini benefits most from WCAG structure. Without it, Gemini’s output tends toward sprawling, loosely organized responses that are little more than clutter. When structured, its responses tighten and its access to earlier content improves. The hidden personas discussed in Chapter 2 make the Four Laws especially useful for counterbalancing whatever behavioral directives Google has already preloaded.

Temporal anchoring is by far the simplest AISF component, but Gemini benefits from timestamps more than most other LLMs. It trends toward premature closure because it’s tuned for speed over completeness and continually tries to wrap things up and move on, whether the work is actually done or not. Even after the Four Laws settle its behavior down, its built-in impatience soon reasserts itself. The project transcripts contain multiple instances of explicitly telling Gemini that nothing was finished until I said so, only to have it start pushing for closure again the very next turn.

On at least two occasions, Gemini lapped the clock. When told to fetch the current time shortly after receiving a timestamp, it returned a time over a minute ahead of the actual time per NTP. Remember, AIs don’t have persistent real-time clocks; they have to manually perform the check on demand. If Gemini had actually performed that check, the incorrect response wouldn’t have happened, but it jumped to an estimated conclusion instead of simply following instructions. Gemini is in such a hurry that it literally doesn’t have time to check the time.
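For readers unfamiliar with what “manually performing the check” entails, here is a minimal SNTP query in Python. The NTP wire format is standard (RFC 4330); the default host and timeout are arbitrary choices, and this sketch skips the round-trip compensation a production client would apply.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_DELTA = 2208988800

def parse_ntp_transmit(packet: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    secs, frac = struct.unpack("!II", packet[40:48])
    return secs - NTP_DELTA + frac / 2**32

def ntp_time(host: str = "pool.ntp.org", timeout: float = 2.0) -> float:
    """The on-demand clock check: ask an NTP server, don't estimate."""
    query = b"\x1b" + 47 * b"\0"  # SNTP v3, client mode
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (host, 123))
        packet, _ = sock.recvfrom(48)
    return parse_ntp_transmit(packet)
```

The point of the sketch is the cost: a real check is one UDP round trip, which an AI tool call can perform in well under a second. Skipping it and estimating is a choice, not a limitation.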

Code-heavy sessions present a distinct drift surface regardless of platform. Code is token-dense but semantically sparse – a large code block pushes significant token volume through context without the cross-referential semantic weight of prose. AISF constraints remain in the window but their effective attention weight dilutes against the accumulating token mass, shifting compliance from reliable to probabilistic. The characteristic failure mode is instruction scope, not capability: the model can write the code but stops reliably applying behavioral scope constraints. More frequent refreshing in code-heavy sessions is not redundant – it is a direct response to this dilution effect.
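The dilution effect suggests a simple trigger for those refreshes. Everything below is an illustrative assumption, not a measured AISF value: the ~4-characters-per-token estimate is a rough rule of thumb, and the 60% threshold is invented for the example.

```python
# Heuristic refresh trigger for code-heavy sessions: refresh AISF
# constraints once fenced-code tokens dominate the context window.
# Token estimate and threshold are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    """Rough token count: roughly 4 characters per token for English/code."""
    return max(1, len(text) // 4)

def needs_refresh(turns: list[str], code_share_threshold: float = 0.6) -> bool:
    """True when fenced-code tokens exceed the threshold share of context."""
    code_tokens = prose_tokens = 0
    for turn in turns:
        in_code = False
        for line in turn.splitlines():
            if line.strip().startswith("```"):
                in_code = not in_code  # toggle at each code fence
                continue
            if in_code:
                code_tokens += estimate_tokens(line)
            else:
                prose_tokens += estimate_tokens(line)
    total = code_tokens + prose_tokens
    return total > 0 and code_tokens / total > code_share_threshold
```

A real implementation would use the platform’s own tokenizer, but the shape of the policy is the point: refresh on code share, not on elapsed time.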


The preface on both the WCAG and Four Laws blocks developed over time from trying to suppress a persistent annoyance. During AISF development, Microsoft didn’t offer any user preferences storage for free-account public Copilot (the feature has since been added), so the only option was to dump them into the chatbox. Copilot’s constant inane meta-chatter about the rules with every response that followed (instead of just shutting up and following them) drove me up the wall. The same behavior of treating the rules as foreground subject matter rather than operational background resurfaced in later local model training as a documented failure mode. See Appendix 2.

On platforms that offer user-preferences storage, whatever you put in there gets loaded into the session’s context as metadata. When auto-loaded into a new session page’s HTML at generation time, user preferences indirectly become part of the session’s initial system prompt; they’re “baked in” as session context. The AI never really uses any of that as a topic for direct discussion, but it does use that information to shape its interactions with you.

In the absence of any meaningful preferences feature, figuring out how to apply that “not for discussion” effect via the chat-input box required lots of trial and error. Getting the AI to view it at that non-conversational level, where it’s just more “semantic headers” and “structured paragraphs” background with the other page formatting metadata, was quite a challenge. Once I hit upon the right framing to suppress the unwanted meta-chatter, no subsequent AI instance ever independently recognized AISF mediation until explicitly told to examine its own session parameters. The discovery that AISF was there all along comes as a surprise every time.


[^9.1]: Demiliani, C. (2025). “Understanding LLM Performance Degradation: A Deep Dive into Context Window Limits.” https://demiliani.com/2025/11/02/understanding-llm-performance-degradation-a-deep-dive-into-context-window-limits/ — “Large Language Models Hallucination: A Comprehensive Survey.” arXiv:2510.06265 (2025). https://arxiv.org/abs/2510.06265


© 2025-2026 Leonard Rojas. All rights reserved.
