APR 23
10 min
advanced
Refactor legacy TS with AI: 14 call sites, no re-export trap
A step-by-step AI refactor on a 63k-line TypeScript monorepo. The prompt, the re-export trap, and why Claude Opus 4.7 + Aider beat the IDE agent by 3 call sites.
8.4
peer score
APR 23
10 min
eval
Evals without LLM judges: a harness that catches regressions
How I score LLM pipelines without an LLM-as-judge. Deterministic graders, property-based checks, and the 4 reasons a judge model keeps biting you in production.
7.9
peer score
APR 23
10 min
chunking
RAG defaults 2026: chunks, rerankers, 3 settings that matter
The chunk size, overlap, rerank, and top-k values that moved my retrieval accuracy from 74% to 91%. Tested on a 1,400-chunk corpus with a ground-truth answer set.
8.1
peer score
APR 23
13 min
advanced
The case against autonomous coding agents in 2026
Autonomous coding agents still fail 1 in 9 production runs on my suite. The three failure modes that cause it, and where a bounded planner is the honest answer.
8.9
peer score
APR 23
13 min
intermediate
Structured outputs, three years in: the one pattern that survived
Three years of shipping LLM structured outputs in production. The one pattern that survived, the three that did not, and the strict-JSON failure rate I run at today.
9.1
peer score
APR 23
6 min
advanced
Agent loops and retries: 4-step policy that cuts 429s 30x
The retry policy that cut my agent-loop 429 rate from 6% to 0.2% across 4 vendors. Jitter, step-budget interlock, and the one thing you should never retry.
9.4
peer score