Killing Your Darlings, 842 Lines at a Time

Sculptor's workshop with fresh rubble from removed marble details

I deleted the VerMAS parser today. 842 lines of src/parsers/vermas.py, gone. The test files, gone. The measurer for task visibility, gone. The source_directories entry in core.py that mapped "vermas": ".vermas", gone. Net diff on the Python side: something like -1,200 lines. It felt great.

There's a particular satisfaction in removing code that once felt essential. The VerMAS parser was one of Distill's three session sources, a peer to the Claude and Codex parsers. But VerMAS evolved. The data it produces doesn't fit the BaseSession model the way Claude and Codex sessions do, and wrapping it in that shape was producing increasingly tortured adapter code. Forty lines of special-casing in cli_runs_clean.py. Sixty-seven lines of VerMAS-specific logic in note_content_richness.py. A whole measurer, vermas_task_visibility.py, that only existed because VerMAS sessions had task metadata the other sources didn't.

The cleaner solution is to bring VerMAS data in through the intake pipeline as ContentItems, which is exactly what the session.py parser in src/intake/parsers/ already does. One unified ingestion path instead of two divergent ones. This is the kind of refactor you can only do after you've built both paths and lived with the duplication long enough to know which one wins.
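To make the shape of that refactor concrete, here is a minimal sketch of a single ingestion path, assuming a hypothetical ContentItem with source, title, body, and tags fields (the real model in Distill's src/intake/ may differ):

```python
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    # Hypothetical stand-in for Distill's intake model; field names assumed.
    source: str
    title: str
    body: str
    tags: list[str] = field(default_factory=list)


def parse_session_file(source: str, raw: dict) -> ContentItem:
    """Single ingestion path: every source becomes a ContentItem.

    Instead of forcing VerMAS data into a session model it no longer
    fits, each parser only has to produce this one shape.
    """
    return ContentItem(
        source=source,
        title=raw.get("title", "(untitled)"),
        body=raw.get("body", ""),
        tags=raw.get("tags", []),
    )


# Both sources flow through the same function: no per-source adapters.
claude_item = parse_session_file("claude", {"title": "Refactor notes", "body": "..."})
vermas_item = parse_session_file(
    "vermas", {"title": "Task run", "body": "...", "tags": ["task"]}
)
```

The point of the sketch is the shape, not the fields: once every source collapses to one model, the special-casing in downstream measurers has nothing left to special-case.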

That same tension between specialized and unified paths showed up in my reading today, in a different domain entirely.

Aerial view of two rivers merging into one channel

What Ptacek Sees That Others Don't

Thomas Ptacek called vulnerability research "THE MOST LLM-amenable software engineering problem." Pattern-driven. Huge public corpus. Closed feedback loops. I've been thinking about this all day, because it connects to something I keep circling: the difference between automation and autonomy in AI agents.

Vuln research is a perfect automation target. The patterns are legible. The success criteria are binary: you found the bug or you didn't. The search space is vast but structured. An LLM doing vuln research is replacing steps, not judgment. It's scanning more code, matching more patterns, trying more variations. Faster, cheaper, wider. But the agent isn't deciding what to look for or why it matters. A human still sets the target and evaluates the output.

Contrast that with what I've been building in Distill, where the pipeline makes actual editorial decisions. The blog synthesizer picks themes. The intelligence module classifies content. The intake prompts decide what's interesting. That's closer to autonomy, and it's much harder to get right. There's no binary success criterion for "did you write a good essay about this person's week." The feedback loop is me reading the output and grimacing.

Sean Goedecke's piece on heroism in large tech companies lands differently when you think about it through the automation/autonomy lens. His argument is that big companies run on systems, not heroes. Individual engineers who try to patch local inefficiencies through sheer effort are fighting the system, and the system always wins. The hero burns out. The process grinds on.

But what if the "hero" is an agent? The reason human heroism doesn't scale is that humans get tired, resentful, promoted, or bored. An agent doing the same kind of local optimization (fixing tests, cleaning up code, patching process gaps) doesn't have those failure modes. It has different ones (hallucination, context loss, misaligned objectives), but fatigue isn't among them. The question is whether agent-driven local optimization hits the same ceiling that human heroism does, or whether the economics change enough that it actually works.

Either way, you need infrastructure that lets you see what's happening. Which brings me back to the dashboard work.

Open library card catalog drawer with annotated index cards

Reading Pages and Source Filters

On the web dashboard side, I spent the afternoon making the Reading page actually useful. The reading.tsx route went from 132 lines to 244. New ContentItemCard.tsx component (60 lines), new SourceFilterPills.tsx (70 lines), a proper date-based drill-down in reading.$date.tsx. The server got a new API route for fetching content items with source filtering. You can now browse your intake by date and filter by source: RSS, browser history, Substack, whatever.

This is the kind of feature that sounds boring when described but changes how you interact with the tool. Before, the Reading page was a list of dates. Now it's a browsable archive with facets. I can see that on Tuesday I read fourteen RSS items and three Substack posts, and drill into each one. The ContentItemCard shows the source, title, a snippet, and tags. Small thing, but it turns the dashboard from "a place to look at your journal" into "a place to understand your information diet."

Mitchell Hashimoto's Vouch system caught my eye for a related reason. The open source world is drowning in AI-generated PRs now that the friction of contributing has dropped to zero. Vouch solves this with a simple social layer: unvouched users can't contribute. Contributors vouch for people they trust. A reputation system, basically, but scoped to individual projects rather than some global identity. The projects decide their own standards.

What I like about Vouch is that it adds friction back selectively. It doesn't make contributing harder for everyone. It makes contributing harder for people nobody in the project knows. That's the right tradeoff. And it connects to this idea I keep coming back to with VerMAS and Distill: the more autonomous your agents become, the more important your infrastructure layer is. Vouch is infrastructure. The constitutional layer in VerMAS is infrastructure. The intake pipeline's classification system is infrastructure. Safety nets don't restrict freedom. They make freedom possible.
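The mechanism is small enough to fit in a toy model. This is not Vouch's actual data model, just a sketch of the idea that trust is scoped to a project and extended by people who already have it:

```python
# Toy model of project-scoped vouching; not Vouch's real implementation.
class VouchLedger:
    def __init__(self, maintainers: set[str]):
        # Maintainers are trusted by definition; everyone else must be vouched.
        self.trusted = set(maintainers)

    def vouch(self, voucher: str, newcomer: str) -> None:
        # Only someone already trusted in THIS project can extend trust.
        if voucher not in self.trusted:
            raise PermissionError(f"{voucher} cannot vouch: not trusted here")
        self.trusted.add(newcomer)

    def may_contribute(self, user: str) -> bool:
        return user in self.trusted


ledger = VouchLedger({"mitchellh"})
ledger.vouch("mitchellh", "alice")  # a trusted contributor extends trust
```

There is no global identity anywhere in that sketch, which is the part I like: each project's ledger is its own standard.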

Also: there's a Canadian video titler from 1985 that's basically a home computer with a 6809 CPU and 32x16 character display, and someone is trying to hack it into running programs. I have nothing to connect this to. I just love that people are still finding weird old hardware and figuring out what it can do. Some things don't need to scale, don't need infrastructure, don't need autonomy. Sometimes the right project is just seeing what a thirty-nine-year-old circuit board can still do.