I read a lot of news, and for years I’ve had the same three problems: the same story showing up five times from five sites, a firehose that didn’t know what I cared about, and certain topics I just never want to see. I finally fixed all three by letting a small language model running on my own hardware understand the articles.

All of the AI runs locally through Ollama, hosted on a server on my network. No article, no click, and none of my reading habits ever leave the house. That privacy angle is the whole reason I built it myself instead of using a hosted reader.

The foundation: embeddings

The trick behind all three features is the same. Besides asking the model for a relevance score and a summary, my ingestion pipeline asks a small embedding model (nomic-embed-text) to turn each article into a vector. Two articles about the same thing land near each other in that vector space, even if they share zero words. Once you can measure “how similar in meaning are these two things,” a lot becomes possible.
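Here is a minimal sketch of that building block, not the pipeline's actual code. It assumes Ollama is listening on its default local port and uses its /api/embeddings endpoint; the function names embed and cosine_similarity are illustrative.

```python
import json
import math
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def embed(text, model="nomic-embed-text"):
    """Ask the local Ollama server to turn text into an embedding vector."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)
```

With a server running, cosine_similarity(embed(title_a), embed(title_b)) gives the "how similar in meaning" number that everything else builds on.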

Collapsing duplicate coverage

The most satisfying one. When ten outlets cover the same announcement, I want one card, not ten. So the pipeline clusters articles by how close their embeddings are and shows a single representative, with the rest folded underneath.

This is source-agnostic. A duplicate is a duplicate whether it’s two takes from the same site or the same story across five different ones. Each rolled-up card gets a little “+N” badge and a “Show N collapsed articles” button that expands an inline list so I can jump to a specific outlet’s version if I want.

The lesson here was about tuning. I started with a strict similarity threshold and almost nothing merged; when I loosened it too far, unrelated stories got lumped together. I ended up testing it against a real day of articles and settling around 0.74: tight enough to keep distinct stories apart, loose enough to actually collapse the dupes.
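To make the threshold concrete, here is one simple way to do the grouping: a greedy pass that compares each article to the representative of every existing cluster. This is a sketch under assumptions, since the post doesn't say which clustering method the pipeline uses; cluster_articles and the (title, vector) tuple shape are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_articles(articles, threshold=0.74):
    """Group (title, vector) pairs into clusters of near-duplicates.

    Articles are assumed to arrive best-first, so the first member of
    each cluster acts as its representative card.
    """
    clusters = []
    for title, vector in articles:
        for cluster in clusters:
            representative_vector = cluster[0][1]
            if cosine_similarity(vector, representative_vector) >= threshold:
                cluster.append((title, vector))
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([(title, vector)])
    return clusters
```

The “+N” badge for a card is then just len(cluster) - 1. Raising the threshold produces more, smaller clusters; lowering it merges more aggressively.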

A reader that learns my taste

Source-based ranking (“you click The Verge a lot, so rank The Verge higher”) is crude. I wanted ranking based on topics I engage with, not just where they’re published.

So the pipeline builds a taste vector: it takes everything I’ve saved (weighted heavily) and opened, averages their embeddings into a single point that represents “stuff I like,” and then scores every new article by how close it is to that point. That score becomes a gentle boost in the ranking. The more I save and read, the sharper it gets. There’s a single “personalization strength” slider if it ever leans too hard in one direction.
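A rough sketch of the taste vector follows. The 3:1 weighting of saved versus opened articles, the strength default, and the function names are all assumptions for illustration; the post only says saves are "weighted heavily" and that the boost is gentle.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def taste_vector(saved, opened, saved_weight=3.0, opened_weight=1.0):
    """Weighted average of the embeddings of saved and opened articles.

    The weights are placeholders, not the pipeline's real values.
    Returns None when there is no history to learn from yet.
    """
    weighted = [(v, saved_weight) for v in saved]
    weighted += [(v, opened_weight) for v in opened]
    if not weighted:
        return None
    total_weight = sum(w for _, w in weighted)
    dimensions = len(weighted[0][0])
    return [
        sum(v[i] * w for v, w in weighted) / total_weight
        for i in range(dimensions)
    ]


def taste_boost(article_vector, taste, strength=0.5):
    """Score to add to an article's rank; strength mimics the slider."""
    if taste is None:
        return 0.0
    # Negative similarity earns no boost rather than a penalty.
    return strength * max(0.0, cosine_similarity(article_vector, taste))
```

Setting strength to 0 switches personalization off entirely, which is the kind of escape hatch the slider provides.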

When I turned it on, my saved history skewed heavily toward AI tooling, and sure enough those stories floated straight to the top. It feels like the reader knows me.

The full ranking chain

Putting it all together, the full ranking chain now runs: LLM relevance score → cross-source coverage boost → personal “taste” boost → source affinity → retention.

Blocking topics, not just words

I already had keyword blocking, the kind of thing that drops anything with “Black Friday” in the title. Useful, but dumb: it only catches the exact words.

The semantic version lets me block a concept. I type a short phrase like “celebrity gossip” or “cryptocurrency price predictions” and the pipeline drops any article whose meaning is close to it, regardless of the specific wording. Keyword and built-in spam filtering still run first; this is a smarter layer on top.

This one taught me the most about the quirks of embedding models. The similarity scores from nomic-embed-text are compressed into a narrow range, so my first threshold blocked nothing at all. After measuring actual scores, I recalibrated and learned the real rule of thumb: be specific. A focused phrase like “reality TV drama” separates cleanly from everything else; a broad word like “gaming” overlaps with half of tech and blocks unpredictably. There’s a sensitivity slider for the rest.
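The semantic filter and the "measure first" step can be sketched like this. The 0.6 default is a placeholder, not the calibrated value (the post doesn't give one), and is_blocked and score_spread are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_blocked(article_vector, blocked_vectors, threshold=0.6):
    """True if the article is close in meaning to any blocked phrase.

    The threshold is a placeholder; it needs calibrating against real
    scores from the embedding model in use.
    """
    return any(
        cosine_similarity(article_vector, blocked) >= threshold
        for blocked in blocked_vectors
    )


def score_spread(article_vectors, phrase_vector):
    """Return (lowest, highest) similarity of a phrase across a sample.

    Running this on a real day of articles shows where the scores
    actually sit before a threshold is chosen.
    """
    scores = [cosine_similarity(v, phrase_vector) for v in article_vectors]
    return min(scores), max(scores)
```

If score_spread shows a blocked phrase scoring in a tight band against almost everything, the phrase is probably too broad to separate cleanly, which matches the "be specific" rule.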

Other important changes for v2

Two structural changes came first. I dropped the n8n dependency and rebuilt the ingestion workflow in plain Python, which made the whole thing far easier to version and reason about. I also collapsed what used to be a separate view container and service container into a single container, so the stack is one thing to deploy.

Smarter story retention

The biggest change is how long stories live.

  • Replaced the “fresh 36-hour snapshot” model. The digest used to be rebuilt from scratch every run, so unread stories fell off purely on the clock. That was worst on quiet weekends, when little new content backfills what ages out.
  • Carry-forward: unread stories now persist across runs, so a story you didn’t get to doesn’t vanish just because its source feed rotated it out of the RSS window.
  • Rank-weighted lifespan: each story lives longer the higher it ranks. Top stories stick around up to 72h, weak ones drop after 24h, everything in between scaled by rank.
  • Floor and ceiling band: the inbox never goes dry (at least ~25 stories) and never floods (cap ~60). Over the cap, the lowest-ranked fall off first.
  • Hard age cap (120h) overrides everything, so stale news can’t linger even to hit the floor.
  • Operates on stories, not articles. Clustered coverage counts as one story, and a story’s “age” is its freshest coverage, so a developing story stays alive as new coverage lands.
  • Read stories are vacated from the carried pool, while History keeps its own untouched copy, so retention never breaks “find that thing I read recently to share it.”
  • Five new runtime-tunable settings (floor, ceiling, min and max lifespan, hard cap), all editable in-app with no restart.
  • Efficient by design: carried stories reuse their existing LLM score and summary and only re-embed, so they re-cluster against fresh coverage without re-paying the expensive scoring pass.
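The retention rules above can be sketched in a few lines. This is an illustration, not the real implementation: it assumes the rank score is normalized to 0..1, that lifespan scales linearly between the minimum and maximum, and that the pool passed in contains only unread stories. Story, lifespan_hours, and retain are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Story:
    title: str
    rank_score: float  # assumed normalized: 0.0 (weakest) to 1.0 (top)
    age_hours: float   # age of the story's freshest coverage


def lifespan_hours(rank_score, min_life=24.0, max_life=72.0):
    """Linear scaling is an assumption; the post only says 'scaled by rank'."""
    clamped = min(1.0, max(0.0, rank_score))
    return min_life + clamped * (max_life - min_life)


def retain(stories, floor=25, ceiling=60, min_life=24.0,
           max_life=72.0, hard_cap=120.0):
    """Return the unread stories that survive this run, best-ranked first."""
    # The hard age cap overrides everything, including the floor.
    eligible = sorted(
        (s for s in stories if s.age_hours <= hard_cap),
        key=lambda s: s.rank_score,
        reverse=True,
    )
    kept, expired = [], []
    for story in eligible:
        limit = lifespan_hours(story.rank_score, min_life, max_life)
        (kept if story.age_hours <= limit else expired).append(story)
    # Floor: backfill with the best expired stories so the inbox never goes dry.
    if len(kept) < floor:
        kept.extend(expired[: floor - len(kept)])
        kept.sort(key=lambda s: s.rank_score, reverse=True)
    # Ceiling: over the cap, the lowest-ranked fall off first.
    return kept[:ceiling]
```

Because the five values are plain keyword arguments here, they map directly onto the five runtime-tunable settings: floor, ceiling, minimum and maximum lifespan, and hard cap.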

A balanced, self-healing layout

  • Bounded the “Earlier this week” rail to roughly match the “Latest” grid’s height, instead of a single-column list running far down the page past the 3-wide grid.
  • A “Show more / Show less” control reveals the rail’s reserve on demand.
  • Dismiss-to-backfill: closing a story re-renders so the gap fills. A vacated hero or grid slot promotes the next-ranked story, and the rail pulls its reserve up at the end.
  • No more empty sections: if you clear today’s stories off the top, the front page promotes the best carried-forward stories by ranking instead of showing “No stories.”
  • Fixed a misleading count: the unread badge now reflects the whole retained inbox (deduped to stories) rather than only “today’s” stories, so it stops resetting at midnight.

Why local matters

None of this is novel AI. It’s embeddings and cosine similarity, techniques that have been around for years. What makes it feel good is that it’s mine: it runs on my hardware, learns from my behavior, and answers only to me. No engagement-maximizing feed, no data broker, no account. Just a quiet little model on a server under the stairs, reading the news so I don’t have to read the same story twice.