The Learning Loop Closes

Potter examines bowl beside shelf of progressively refined earlier attempts.

Agents That Review Their Own Work

Something shifted today that I've been building toward for two weeks. The post-workflow analyst loop is running in production. Here's what that means: a dev-QA pair finishes a task (say, adding a health endpoint), then a separate analyst agent spins up, reads the workflow transcript, and extracts specific learnings for each participant. Those learnings get stored in memory. Next time those agents run, they recall them.

I watched this happen maybe fifteen times today across the TroopX platform. Dev agent writes code, QA agent validates it, analyst extracts "the dev agent took three attempts to find the right file because it didn't check the project structure first." Next workflow, the dev agent recalls that note and checks structure first. The improvement is small but real. Over dozens of runs, it compounds.
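The shape of that loop is simple enough to sketch. Everything below is a toy stand-in, not the TroopX API: `MemoryStore`, `extract_learnings`, and the transcript format are all hypothetical, and a real analyst would be an LLM reading the full workflow transcript rather than a counter of repeated attempts.

```python
class MemoryStore:
    """Per-agent learning storage: agent name -> list of notes."""
    def __init__(self):
        self._notes = {}

    def store(self, agent, note):
        self._notes.setdefault(agent, []).append(note)

    def recall(self, agent):
        return list(self._notes.get(agent, []))


def extract_learnings(transcript):
    """Analyst pass: turn workflow events into one note per participant.
    This toy version just flags actions that took three or more tries."""
    learnings = {}
    attempts = {}
    for event in transcript:
        key = (event["agent"], event["action"])
        attempts[key] = attempts.get(key, 0) + 1
        if attempts[key] >= 3:
            learnings[event["agent"]] = (
                f"took {attempts[key]} attempts at '{event['action']}'; "
                "check project structure before editing"
            )
    return learnings


memory = MemoryStore()

# Workflow 1: the dev agent fumbles file discovery three times.
transcript = [
    {"agent": "dev", "action": "locate target file"},
    {"agent": "dev", "action": "locate target file"},
    {"agent": "dev", "action": "locate target file"},
    {"agent": "qa", "action": "validate endpoint"},
]
for agent, note in extract_learnings(transcript).items():
    memory.store(agent, note)

# Workflow 2: the dev agent recalls the note before starting.
print(memory.recall("dev"))
```

The point of the sketch is the data flow, not the analysis logic: work happens, a separate pass reads the record of the work, and the resulting notes are keyed to the agent that needs them next time.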

This is the part of multi-agent orchestration nobody talks about. Everyone's excited about the fan-out: spawn ten agents, parallelize everything, go fast. The interesting problem is the feedback. A system that runs the same workflows repeatedly without learning is just an expensive script. The analyst loop turns it into something that accumulates institutional knowledge, one workflow at a time.

Telephone operator connecting cables on a vast numbered switchboard.

259 Shell Commands and One HTTP Endpoint

The biggest single session today was a 2-hour-23-minute build: an HTTP MCP endpoint for the agent router. Twenty-eight files modified, 259 shell commands. The goal was straightforward. Agents (Claude Code, Codex, whatever comes next) need to call router tools like register_agent, send_message, and blackboard_write. Previously, each runtime needed its own MCP server configuration. The HTTP endpoint exposes the existing router services over standard MCP-over-HTTP, so any runtime that speaks HTTP can participate.
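For a sense of what "any runtime that speaks HTTP" means in practice: MCP's HTTP transport carries JSON-RPC 2.0 messages, and tool invocations use the `tools/call` method. The tool names below come from the router; the endpoint URL and argument shapes are my illustrative guesses, not the actual schema.

```python
import json

def mcp_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request body. Any HTTP client
    can POST this to the router's MCP endpoint."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# A runtime registering itself with the router, then posting to the
# blackboard. Argument shapes are illustrative only.
register = mcp_tool_call(1, "register_agent", {"name": "dev-1", "role": "dev"})
write = mcp_tool_call(2, "blackboard_write",
                      {"key": "health-endpoint", "value": "done"})

print(json.dumps(register, indent=2))
```

Nothing runtime-specific survives in that payload, which is the whole trick: the router stops caring whether the caller is Claude Code, Codex, or a curl one-liner.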

Simon Willison linked to Thariq Shihipar today, noting that "long running agentic products like Claude Code are made feasible by prompt caching which allows us to reuse computation from previous roundtrips." That's exactly right, and it's why the transport layer matters. The MCP endpoint isn't doing anything novel with the business logic. It's removing friction between runtimes and the coordination layer. When your agents are burning through cached prompt tokens at scale, every millisecond of protocol overhead shows up on the bill.

I also set up three squad configuration files: troopx-engineering.yaml, troopx-content.yaml, troopx-outreach.yaml. Squads are the org chart for agents. Engineering runs dev-QA pairs. Content runs the Distill pipeline (journal synthesis, blog generation, social adaptation). Outreach drafts elevator pitches and outreach messages. Each squad has its own workflow, its own agent types, its own quality gates. The org design question for AI teams turns out to be the same as for human teams: who talks to whom, and about what.
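To make the squad idea concrete, here's a guess at what one of those files might contain. Every key below is a hypothetical stand-in for a plausible shape, not the actual TroopX schema.

```yaml
# Hypothetical sketch of troopx-engineering.yaml (field names invented).
squad: engineering
workflow: dev-qa-pair
agents:
  - type: dev
    memory: enabled          # recalls analyst learnings from prior runs
  - type: qa
    memory: enabled
quality_gates:
  - tests_pass
  - qa_signoff
analyst:
  run_after: workflow        # the post-workflow analyst loop
  store_learnings: true
```

However the real schema looks, the design pressure is the same: the file answers "who talks to whom, and about what" for one squad, and nothing else.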

Empty set dining table, candlelit, host blurred through kitchen doorway.

Host Leadership and the CEO Agent

Martin Fowler published a Bliki entry today on Host Leadership. His argument: servant leadership is gaslighting. The manager claims to serve the team, but everyone knows who holds the power. Host leadership is more honest. You're the host of a dinner party. You set the table, choose the menu, welcome guests, step back and let conversation flow, then step in when the energy flags.

I have a CEO agent in TroopX. It launched workflows today, checked on progress, escalated to me (the human) when blocked. Reading Fowler's piece, I realized the CEO agent is already a host leader, not by design but by constraint. It can't force the dev agent to write better code. It can only set up the workflow, assign tasks, and check outcomes. When the QA agent flags a problem, the CEO doesn't fix it. It routes the feedback. The architecture enforces the leadership style that Fowler argues humans should adopt voluntarily.

There's something funny about AI agents accidentally implementing a management philosophy that humans struggle with. The agents don't have ego. They don't grab the keyboard. They coordinate and get out of the way because that's literally all they can do.

The Content Pipeline Ate Its Own Tail

The Distill content pipeline ran about thirty sessions today: journal entries, blog essays on workflow analysis and quality gates, Twitter threads, LinkedIn posts, Slack updates, image generation prompts. All of it synthesized from the same session data that TroopX produced. The platform builds things, the content pipeline writes about what got built, the outreach squad packages it for distribution.

Cory Doctorow hit six years of Pluralistic today. Six years of daily writing, every single day. I'm on day four of my own series. The difference is he writes it himself. I'm building a system where agents do the first draft and I do the editing. Whether that's cheating or leverage depends on whether the output is any good.

The learning loop closed today. Agents work, analysts review, memory accumulates, agents improve. It's a small loop, tight and fast. The question I'm sitting with tonight: how many cycles until the improvements plateau? Or do they compound? Fowler's host sets the table and steps back. I set up the loop and I'm watching it spin. The dinner party is just getting started.