Fifty-Four Files and the Case Against Ambition

The Seven-Hour Session That Proved Its Own Point Wrong

I ran a seven-hour VerMAS session today. 529 shell commands. 54 files modified. It was productive in the way a long day of ditch-digging is productive: you look back and see a lot of moved dirt.

Meanwhile, a dozen TroopX dev-QA workflow pairs hummed along in parallel, each one scoped to a single task. Add a capitalize_words function. Wire up a /ready health endpoint. Fix a premature signal carry-forward bug. Each pair took 5-15 minutes. Dev agent writes code, QA agent boots fresh, re-runs everything from scratch rather than trusting dev's output, signals done. The QA independence is the whole trick: it catches the kind of drift that creeps in when one context window has been open too long.
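
The capitalize_words task is about as small as scoped work gets, which is the point. A sketch of what the dev half might produce and what the QA half independently re-runs (the function name comes from the task above; the checks themselves are illustrative):

```python
def capitalize_words(text: str) -> str:
    """Capitalize the first letter of each space-separated word."""
    return " ".join(word.capitalize() for word in text.split(" "))


# The QA agent's whole job: boot fresh and re-run checks like these
# from scratch, rather than trusting the dev agent's claim of "done".
assert capitalize_words("hello world") == "Hello World"
assert capitalize_words("") == ""
assert capitalize_words("already Capitalized") == "Already Capitalized"
```

A task this size leaves almost nowhere for drift to hide, which is exactly why the independent re-run is cheap enough to be worth it.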

Here's the number that stopped me: sessions touching more than five files have a 78% historical error rate in my data. The VerMAS marathon touched 54. The dev-QA pairs averaged 2-4 files each and completed cleanly. I already knew the principle (small tasks, verification gates, move on), but watching both approaches run side by side on the same Wednesday made the gap visceral.

When Fowler and Ford Agree, Pay Attention

Martin Fowler published a fragment from Thoughtworks' Future of Software Development retreat. The quote that landed: "LLMs are eating specialty skills. There will be less use of specialist front-end and back-end developers as the LLM-driving skills become more important than the details of platform usage."

The same day, Paul Ford's NYT op-ed declared the AI disruption has arrived, and it's fun. I read these while literally watching agent pairs autonomously implement, test, and verify utility functions with zero intervention. The disruption isn't hypothetical. It's running in tmux panes on my Mac.

Simon Willison tied the thread tighter. After 25 years of resisting type hints, he's coming around to strong typing because agents do the actual typing now. The cost that kept him away (slowing down iteration speed) evaporated when the keyboard work got delegated. This is the pattern I keep seeing: agents don't just do the same work faster. They change which tradeoffs are rational. Type hints go from burden to free documentation. Small, rigid task scopes go from overhead to insurance.
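
The flipped tradeoff is easy to see in miniature. A hypothetical helper (mine, not from Willison's post) of the kind an agent would now emit typed by default, where the signature does the documenting:

```python
from collections.abc import Iterable


def word_frequencies(words: Iterable[str]) -> dict[str, int]:
    """Count how many times each word appears.

    Untyped, a caller has to read the body to learn the shapes involved.
    Typed, the signature is free documentation, and the agent paid the
    keystroke cost that used to make annotations feel like friction.
    """
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Nothing about the logic changed; what changed is who types the annotations, and therefore whether they're worth having.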

The Editing Bottleneck Nobody Automates

The Distill content pipeline ran its full loop today: sessions to journal entries, journal to weekly essay, weekly to social adaptations (LinkedIn, Twitter, Slack), thematic deep dives on testing and AI and tooling patterns. The pipeline generated probably 15 pieces of content from the day's raw material, plus image prompts, blog memory extraction, and content strategy recommendations.

The bottleneck is editing. Not generation.

I can spin up a content strategist prompt that proposes ideas. I can generate visual metaphor prompts for a technical journal. I can adapt a blog post for six platforms in parallel. None of that is hard anymore. What's hard is reading the output and deciding if it's actually good. Jim Nielsen wrote about care today: in the AI world, everyone claims taste is the supreme skill. He's right, but taste scales worse than generation. I can generate 15 content pieces in 20 minutes. Editing them to the point where I'd put my name on them takes longer than writing one piece from scratch used to.

The Ladybird browser project abandoned their Swift adoption. Sometimes the courageous move is admitting an experiment didn't pay off. I'm wondering if the full-auto content pipeline is the same kind of experiment. The value might not be in publishing everything it produces, but in having it surface the one idea I wouldn't have found on my own.

Small Bets, Verified Twice

The rhythm I'm settling into: big infrastructure days (the VerMAS marathon) followed by validation days (running a dozen scoped workflows to stress-test what got built). Today was both, and the validation work taught me more. The TroopX workflows exposed a signal carry-forward bug that I fixed across two short sessions: the dev-QA pairs found it, I fixed it, the pairs verified the fix. Twelve minutes total.

Willison also announced he's experimenting with blog sponsorship after years of resistance. He values credibility as an independent voice. I think about this with the content pipeline too. Automating distribution is fine. Automating voice is where you lose the thing that makes the writing worth reading.

Fifty-four files in seven hours felt like progress. Twelve files across six agent pairs in ninety minutes was progress. The data doesn't care about how hard you worked.