The Infrastructure Nobody Sees

Stone arch under construction, wooden centering visible beneath incomplete voussoirs

Twenty-Nine Files for a Single Conversation

The biggest session today ran 2 hours and 24 minutes, modified 29 files, and executed 259 shell commands. The task: give TroopX agents an HTTP endpoint so they can talk to the router when stdio MCP transport isn't available.

That sounds like a routing problem. It's really a trust problem.

When agents communicate over stdio, the transport is implicit. Process spawns, pipe opens, messages flow. HTTP breaks that assumption. Now you need authentication, request validation, connection lifecycle management. An agent calling register_agent over stdio is a child process talking to its parent. An agent calling register_agent over HTTP is a stranger knocking on your door.
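As a concrete sketch of that difference, assuming a bearer-token scheme and invented names (`ROUTER_TOKEN`, `REGISTRY`, this `register_agent` signature are illustrative, not TroopX's actual API), the HTTP path needs explicit checks the stdio path got for free:

```python
import hmac

# Hypothetical sketch: these names are illustrative, not the router's
# real API. Over stdio, identity and framing are implied by the
# parent-child relationship; over HTTP, every request must prove itself.
ROUTER_TOKEN = "s3cret-router-token"  # would come from config, not source
REGISTRY = {}

def register_agent(headers, payload):
    """Handle a registration that arrived over HTTP instead of stdio."""
    # Authentication: a stranger knocking on the door must show a key.
    auth = headers.get("Authorization", "")
    if not hmac.compare_digest(auth, f"Bearer {ROUTER_TOKEN}"):
        return 401, "unauthorized"
    # Request validation: a stdio message came from a trusted child
    # process; an HTTP body can be any shape, so check it explicitly.
    agent_id = payload.get("agent_id")
    workflow_id = payload.get("workflow_id")
    if not isinstance(agent_id, str) or not isinstance(workflow_id, str):
        return 400, "agent_id and workflow_id must be strings"
    REGISTRY[agent_id] = {"workflow_id": workflow_id}
    return 200, "registered"
```

The constant-time comparison is deliberate: a naive `==` on the token leaks timing information, which is exactly the kind of concern that doesn't exist when the transport is a pipe to your own child process.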

I spent most of the session not on the happy path but on the edges. What happens when an agent sends a malformed heartbeat? When two agents race to register with the same workflow ID? When a connection drops mid-message? The REST interface itself was maybe 40 minutes of work. The remaining hour and forty-four minutes were all boundary conditions.
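Two of those edges, sketched with invented names (the heartbeat schema and the claim rule are assumptions, not the router's real protocol):

```python
import threading

_lock = threading.Lock()
_workflows = {}  # workflow_id -> owning agent_id (illustrative store)

def handle_heartbeat(msg):
    # A malformed heartbeat gets a 400, never a crash: missing fields
    # and wrong types are the common failure modes, so check both.
    if not isinstance(msg.get("agent_id"), str):
        return 400, "heartbeat missing agent_id"
    if not isinstance(msg.get("ts"), (int, float)):
        return 400, "heartbeat missing numeric timestamp"
    return 200, "ok"

def claim_workflow(workflow_id, agent_id):
    # Two agents racing to register the same workflow ID: the check and
    # the insert happen under one lock, so exactly one claim can win.
    with _lock:
        owner = _workflows.setdefault(workflow_id, agent_id)
    if owner != agent_id:
        return 409, f"workflow already claimed by {owner}"
    return 200, "claimed"
```

The second loser of the race gets a 409 and the winner's identity back, rather than silently overwriting the registration, which is the kind of boundary decision that eats an hour without producing a single visible feature.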

Simon Willison shared a Thariq Shihipar quote today about prompt caching: "Long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips and significantly decrease latency and cost." The parallel landed for me. Nobody using Claude Code thinks about prompt caching. Nobody running a TroopX workflow will think about HTTP transport negotiation. But without these invisible layers, none of the visible stuff works.

Deer trail worn through tall meadow grass at golden hour

Seven-Minute Feedback Loops

While I was buried in transport code, the dev-QA workflow pairs kept running. Health endpoints, utility functions, docstrings, test data cleanup. Each cycle completing in 3 to 8 minutes. The blackboard namespace convention I noticed yesterday (test-results/verifier-*) showed up again, unprompted, in today's runs.

That's the part I keep turning over. Nobody designed that convention. No agent was told "use this namespace pattern." It emerged from the QA agent needing a place to write results and the dev agent needing a place to read them. The pattern is now stable across a dozen workflow runs.
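Mechanically, the convention is nothing more than a shared key prefix on a flat store. A minimal sketch, with an illustrative `Blackboard` class standing in for the real one:

```python
# Illustrative sketch of the emergent convention: the QA agent writes
# under a "test-results/verifier-" prefix it was never told to use, and
# the dev agent reads by scanning that same prefix.

class Blackboard:
    def __init__(self):
        self._store = {}

    def write(self, key, value):
        self._store[key] = value

    def read_prefix(self, prefix):
        # The dev agent doesn't know the QA agent's exact key names,
        # only the namespace, so it scans by prefix.
        return {k: v for k, v in self._store.items() if k.startswith(prefix)}

bb = Blackboard()
# QA agent writes results under its emergent namespace
bb.write("test-results/verifier-health-endpoint", {"passed": 12, "failed": 0})
bb.write("test-results/verifier-docstrings", {"passed": 4, "failed": 1})
# Dev agent reads everything the verifier produced
results = bb.read_prefix("test-results/verifier-")
```

Nothing enforces the prefix. Its stability across a dozen runs comes entirely from both agents continuing to find it useful.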

Martin Fowler's new bliki entry on Host Leadership hit the same nerve. He argues servant leadership is "gaslighting" because "everyone knows who really has the power." The host metaphor works better: you set up the space, welcome people in, provide what they need, then step back. That's exactly what the router does for agents. It doesn't direct their work. It provides the blackboard, the message bus, the registration protocol. Then it steps back and lets the conventions emerge.

The dev-QA loops are producing genuine editorial friction now. The QA agent isn't rubber-stamping. When I ran the "add a health endpoint" workflow, QA pushed back on missing error handling before approving. Under 10 minutes for the full cycle including the revision. That's faster than I could do a self-review.

Field journal with moth specimen jar on wooden workbench

The Memory Question

I ran about 15 post-workflow analysis sessions today: agents reflecting on their recent sessions, extracting patterns, updating their working memory. The analyst sessions are fast (under a minute each), but the reflection sessions sometimes modify files, writing what they learned back into their memory stores.

Here's what I don't know: does it matter?

Does injecting "sessions touching more than 5 files have a 78% historical error rate" into an agent's context actually change its behavior? Or is the base prompt doing all the heavy lifting? I've been running these reflection loops for a week now. The agents write thoughtful-sounding learnings. The memory_remember calls look right. But I haven't run the controlled comparison that would tell me if any of it is improving output quality.
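The comparison I haven't run would look something like this paired harness. Everything here is a stub standing in for the real pieces: `run_agent` would invoke the agent with or without the learned pattern in context, and `score_output` would be an independent quality rubric rather than a random number:

```python
import random
from statistics import mean

def run_agent(task, injected_memory=None):
    # Stub: a real version would call the agent, optionally with the
    # extracted learning prepended to its context.
    prompt = task if injected_memory is None else f"{injected_memory}\n\n{task}"
    return f"output for: {prompt}"

def score_output(output):
    # Stub: a real rubric would grade correctness, not roll dice.
    return random.random()

def compare(tasks, memory):
    # Paired runs over the same task list, with and without injection.
    with_mem = mean(score_output(run_agent(t, memory)) for t in tasks)
    without = mean(score_output(run_agent(t)) for t in tasks)
    return with_mem, without
```

The interesting number is the gap between the two means over enough tasks to be significant. Until something like this runs, the thoughtful-sounding learnings stay unfalsified.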

This is the kind of thing that feels productive while possibly being decorative. The Distill pipeline ran alongside all of this today, generating journal entries, blog posts, Twitter threads, LinkedIn posts, image prompts. All of that synthesis work is downstream of the same question: does feeding structured context back into LLM prompts produce measurably better output, or does it just produce more confident output?

Andrew Nesbitt's one-liner about ActivityPub made me laugh: "The federated protocol for announcing pub activities, first standardised in 1714." There's something in that joke about how protocols get named for what they do but remembered for what they become. The blackboard namespace convention isn't a protocol yet. The HTTP MCP endpoint is barely an API. But both are becoming the coordination surface that 29 files of infrastructure exist to support. The invisible layers nobody will think about, doing the work that makes everything else feasible.