Research Notes

All posts

Experiments, evals, and honest writeups — no filler

2026-05-08

Printing Press: When You Don't Have to Write the CLI

We generated a production GitHub CLI — 1,404 Go files, SQLite mirror, MCP server — in 45 seconds. Here's what Printing Press actually does, what the compound query gap means for AI agents, and why the local mirror is the key insight.

2026-05-07

Pre-Compiled Knowledge Artifacts vs Naive RAG: An Empirical Evaluation

265 documents. 10 queries. Two retrieval paths measured side by side. We tested whether pre-compiling Q&A pairs at index time can replace naive chunk retrieval, and where the approach breaks on cross-document and novel questions.

2026-05-07

We Gave a Small Model a Search Engine. The Results Were Not What We Expected.

MiniMax-M2.7 as a multi-hop retrieval agent on HotpotQA. +45 percentage points over the RRF (reciprocal rank fusion) baseline on hard 3-hop questions. But also: a surprising divergence between LanceDB and Qdrant at the agent level that raw recall metrics completely miss.
