APR 23 10 min
eval

Evals without LLM judges: a harness that catches regressions

How I score LLM pipelines without an LLM judge: deterministic graders, property-based checks, and the four reasons a judge model keeps biting you in production.

evals intermediate

7.9 peer score

APR 23 10 min
chunking

RAG defaults 2026: chunks, rerankers, 3 settings that matter

The chunk size, overlap, rerank, and top-k values that moved my retrieval accuracy from 74% to 91%. Tested on a 1,400-chunk corpus with a ground-truth answer set.

intermediate rag retrieval

8.1 peer score

APR 23 13 min
intermediate

Structured outputs, three years in: the one pattern that survived

Three years of shipping LLM structured outputs in production. The one pattern that survived, the three that did not, and the strict-JSON failure rate I run at today.

json schema structured

9.1 peer score
