APR 23
3 min
analysis
Long-context evals keep diverging from reality: the 1M-token number nobody earns
Vendor 1M-context benchmark scores keep beating what I measure on my production RAG task by 30+ points. Three reasons the benchmarks lie, and what I trust instead.
APR 23
2 min
cursor
Cursor 3 ships parallel agents: what changes in my pipeline, and what does not
Cursor 3 shipped parallel Composer 2 agents and a background agent on April 2, 2026. Two tests moved in my pipeline, four did not. The 90-second summary with numbers.