Killing the research analyst

Hands tearing formal drafts on sunlit desk

Distill has had this prompt sitting in intake/prompts.py since I first wrote it: "You are a research analyst synthesizing a daily reading digest." Every time I read the output, something felt off. The digests were competent. They had structure. They identified themes. They even had a "threads to watch" section at the end, like a McKinsey deck that nobody asked for.

Today I ripped it out and replaced it with: "You are writing a personal essay about today's reading. You are Nik, a software engineer and builder."

The diff is about 120 lines. Most of the change is anti-patterns. I added a list of things the LLM should never do: no "Emerging Themes" headers, no bullet-point takeaways, no "the strongest thread today..." narration. I banned em-dashes across the entire prompt stack while I was at it: the blog prompts, the Twitter thread generator, the LinkedIn formatter. Every one replaced with a comma or a new sentence. Twenty-some spots touched in src/blog/prompts.py alone, just for punctuation hygiene.
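An anti-pattern list like that is easy to enforce mechanically, too. A minimal sketch of the kind of lint check that could run over the prompt stack; the pattern list and function name here are hypothetical, not the actual code:

```python
import re

# Hypothetical list; the real anti-patterns live in the prompt text itself.
BANNED_PATTERNS = [
    "\u2014",                       # em-dash
    "Emerging Themes",              # consulting-deck header
    "the strongest thread today",   # digest narration
    "threads to watch",             # McKinsey-style closer
]

def lint_prompt(prompt: str) -> list[str]:
    """Return the banned patterns that appear in a prompt string."""
    return [p for p in BANNED_PATTERNS
            if re.search(re.escape(p), prompt, re.IGNORECASE)]

# An offending output triggers the check; the new persona prompt passes.
assert lint_prompt("## Emerging Themes\n- takeaway") == ["Emerging Themes"]
assert lint_prompt("You are Nik, writing for smart friends.") == []
```

Running this against generated output, not just the prompts, is the more interesting use: the model can still produce a banned pattern even when the prompt forbids it.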

This sounds trivial. It took me a while to figure out why it matters. When you tell an LLM "you are a research analyst," it writes like a research analyst. Formal. Structured. Distant. The output reads like it was written by someone who read the articles but doesn't actually care about them. When you tell it "you are Nik, writing for smart friends," something shifts. The voice gets more specific. The connections get weirder and more personal. The output becomes something I'd actually want to read.

Prompt engineering is style editing. You're not really programming. You're doing the same work a magazine editor does when they hand back a draft and say "this sounds like a press release, make it sound like you." Which raises the obvious question: what happens when you apply the same thinking to code that's been sitting around pretending to be useful?

Chisel removing ornate molding from cabinet

Pulling out the dead weight

The other big change today: deleting VerMAS. 842 lines in src/parsers/vermas.py, 1,648 lines of tests in tests/parsers/test_vermas.py, another 753 in test_vermas_metadata_exposure.py. Gone. The measurer that tracked VerMAS task visibility, 231 lines, also gone. Two other measurers (cli_runs_clean.py, note_content_richness.py) got gutted because they were coupled to assumptions that no longer hold.

Final score: +1,262 lines, -5,298 lines. Net: minus 4,036 lines. That felt good.

VerMAS was a task management experiment from months ago. It had its own parser, its own metadata format, its own test fixtures. And it hadn't been used in weeks. The code was clean, well-tested, completely irrelevant. Deleting it was trivially easy and emotionally hard in exactly the way that deleting working code always is. You think: what if I need this later? You think: those tests took effort. Then you delete it anyway because dead code is worse than no code. Dead code lies to you about what the system does.

The cleanup left behind a question, though. If those measurers were coupled to VerMAS assumptions, what were they actually measuring? And what should replace them? That's still open. For now, fewer broken tests.

Content items get a face

On the web dashboard side, I built out the Reading page. The previous version just listed intake digests, these synthesized daily summaries. Now it also shows the raw content items themselves, with a tab interface to switch between "Items" and "Digests."

The new ContentItemCard component in web/src/components/shared/ContentItemCard.tsx is 82 lines of React. Each card shows the title, source badge (color-coded per platform: amber for browser history, teal for Substack, sky blue for Twitter), an excerpt that expands on click, author, word count, and tags. The SourceFilterPills component lets you filter by source type. There's pagination. The server route in web/server/routes/reading.ts reads from the intake archive JSON files, supports ?source= and ?page= query params.
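The route logic itself is a simple filter-then-paginate. The shipped version is TypeScript in web/server/routes/reading.ts; this is the same shape sketched in Python, with invented field names:

```python
def filter_and_page(items, source=None, page=1, page_size=20):
    """Filter archive items by source type, then slice out one page.

    Field names ("source", "title") are illustrative, not the actual
    intake archive schema.
    """
    if source is not None:
        items = [it for it in items if it.get("source") == source]
    start = (page - 1) * page_size
    return {
        "items": items[start:start + page_size],
        "total": len(items),
        "page": page,
    }

archive = [
    {"title": "A", "source": "substack"},
    {"title": "B", "source": "twitter"},
    {"title": "C", "source": "substack"},
]
result = filter_and_page(archive, source="substack", page=1, page_size=2)
```

The ?source= and ?page= query params map straight onto the two keyword arguments, which is why the route stays thin.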

Building this surfaced a design question I haven't resolved: should the Reading page show all items across all dates, or should it be date-scoped? Right now it's date-scoped. You pick a date from a dropdown, you see that day's items. But the more natural interaction might be "show me everything from Substack this week" or "show me all the AI articles regardless of when I read them." The filter pills hint at this. Source filtering within a single date is useful but limited. Cross-date filtering would be more powerful and more complicated. I'm leaving it for now.

That same question, it turns out, shows up in a different form when you're trying to extract structured data from a pile of articles.

Inspection light revealing bottle flaws on conveyor

The entity extraction noise

One thing I noticed running the pipeline today: the entity extraction step in src/intake/intelligence.py fires off dozens of tiny Claude calls. Each one takes three to five seconds. The content of those sessions, visible in the build log, is just "Extract named entities from each content item below. Return ONLY valid JSON..." repeated thirty times. Individually cheap. Collectively slow. I changed the batching logic to send larger groups per call, but there's a ceiling on how much context you can stuff into a single extraction prompt before the model starts dropping items from the array.
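The batching change amounts to chunking items before the call instead of looping one call per item. A sketch of the shape, with the model call stubbed out; the batch size and function names are illustrative, not the code in src/intake/intelligence.py:

```python
def chunk(items, batch_size):
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def extract_entities_batched(items, call_model, batch_size=10):
    """One model call per batch instead of one per item.

    call_model is assumed to take a list of items and return a JSON
    array with one entity dict per item; it is a stand-in for the
    actual Claude call, not a real client.
    """
    results = []
    for batch in chunk(items, batch_size):
        results.extend(call_model(batch))
    return results
```

Thirty items at batch_size=10 becomes three calls instead of thirty, which is where the three-to-five seconds per call stops dominating the pipeline.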

This is the classic batch-vs-stream tradeoff that shows up everywhere in LLM pipelines. Small prompts are reliable but slow. Big prompts are fast but fragile. Somewhere in the middle is a sweet spot that depends on the model, the task, and how much you care about the occasional dropped entity. For intake enrichment, I don't care much. If a blog post's tags are missing "React" one day, nobody notices. So I'm biasing toward bigger batches and accepting some noise.
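Accepting some noise still means noticing when it happens. The one failure mode worth guarding is a length mismatch: the model returns fewer results than it was given items. One way to bias toward big batches while catching drops is to check the count and split-and-retry on mismatch. Again a sketch under stated assumptions, not the shipped code:

```python
def extract_with_fallback(batch, call_model, min_size=1):
    """Try the whole batch; on a length mismatch, split in half and recurse.

    call_model is a stand-in for the actual extraction call. A model
    that silently drops entries returns fewer results than inputs.
    """
    results = call_model(batch)
    if len(results) == len(batch):
        return results
    if len(batch) <= min_size:
        # Give up on this item: pad with an empty extraction.
        return [{} for _ in batch]
    mid = len(batch) // 2
    return (extract_with_fallback(batch[:mid], call_model, min_size)
            + extract_with_fallback(batch[mid:], call_model, min_size))
```

The recursion bottoms out at single items, so the worst case degrades back to one-call-per-item, which is exactly the slow-but-reliable end of the tradeoff.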

The real lesson, though, loops back to the prompt change at the top. Whether you're telling an LLM to write like a person or extract entities from a batch, you're making tradeoffs between structure and naturalness, speed and reliability, formality and voice. The job is figuring out which tradeoff matters for the thing you're building. Today it was voice. Tomorrow it might be speed. But you have to pick.