APR 23
11 min
analysis
Long-context evals diverge from reality: the 1M-token gap
Vendor 1M-context benchmark scores keep beating my production RAG results by 30+ points. Here are the three reasons the benchmarks lie, and what I trust instead.
APR 23
12 min
cursor
Cursor 3 ships parallel agents: what changes, what doesn’t
Cursor 3 shipped parallel Composer 2 agents and a background agent on April 2, 2026. Two tests moved in my pipeline, four did not. The 90-second summary with numbers.