The Slack-pasteable version of TCC’s RAG defaults for April 2026. The full reasoning, the ablations, and the per-parameter charts are in the RAG defaults guide. This is the table and the copy-paste config.
The settings that move top-1 accuracy
| Parameter | Naive default | Ship in 2026 | Direction of move |
|---|---|---|---|
| Chunk size (tokens) | 512 | 768 | +~5 pp on technical corpora |
| Chunk overlap (tokens) | 0 | 128 | +~3 pp; recovers boundary context |
| Chunking strategy | semantic | fixed-size | +~2 pp; simpler, more reliable |
| Retriever top-k (before rerank) | 5 | 20 | +~6 pp when paired with rerank |
| Reranker | none | Cohere Rerank v4.0, final k=3 | +~4-5 pp; the single biggest lever |
| Embedding model | ada-002 era | Voyage 4 Large or OpenAI text-embedding-3-large | +~1 pp; modern embeddings still help |
| Hybrid BM25 weight | 0 | 0.35 (technical corpus) | +~2 pp on code/SKU; 0 on prose |
| Boilerplate strip | off | on | +~2 pp; PDF and HTML extraction |
The direction-of-move column comes from independent and TCC editorial ablations on a mixed-technical corpus. Your fixture will produce different absolute deltas; the direction holds. The single largest lever is the reranker: if you only have time for one change, set top-k to 20 and bolt on Cohere Rerank v4.0.
Paste-this config
```yaml
# rag.yaml
chunker:
  strategy: fixed_size
  size_tokens: 768
  overlap_tokens: 128
  strip_boilerplate: true
embed:
  provider: voyage
  model: voyage-4-large
retrieve:
  top_k: 20
  hybrid_bm25_weight: 0.35  # set to 0 on prose-only corpora
rerank:
  provider: cohere
  model: rerank-v4.0-pro
  final_k: 3
eval:
  ground_truth_path: evals/ragtruth.jsonl
  min_top_1: 0.88
```
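To make the retrieve-then-rerank shape concrete, here is a minimal Python sketch of the hot path the config describes. The vector-store handle and its `search()` method are hypothetical stand-ins for whatever retriever you run; the Cohere call assumes their Python SDK's standard rerank endpoint and lifts the model name straight from rag.yaml, so adjust both to your stack.

```python
import cohere

co = cohere.Client()  # assumes COHERE_API_KEY is set in the environment

def retrieve_and_rerank(query: str, index, top_k: int = 20, final_k: int = 3):
    """Over-retrieve wide, then let the reranker pick the final few.

    `index` is a hypothetical vector-store handle whose .search() returns
    dicts like {"text": ..., "source": ..., "offset": ...}; swap in your
    own retriever. Only the shape matters: top_k=20 in, final_k=3 out.
    """
    # Step 1: cheap, wide retrieval (dense + BM25 hybrid per rag.yaml).
    candidates = index.search(query, top_k=top_k)

    # Step 2: rerank the candidates and keep the best final_k.
    # Model name comes from rag.yaml; query/documents/top_n is the
    # standard shape of Cohere's rerank call.
    reranked = co.rerank(
        model="rerank-v4.0-pro",
        query=query,
        documents=[c["text"] for c in candidates],
        top_n=final_k,
    )
    return [candidates[r.index] for r in reranked.results]
```

Everything downstream (prompt assembly, citation links) sees only the reranked chunks, which is why the top-k bump is cheap: the extra candidates never reach the generator.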
2 flags people forget
- Normalize whitespace before chunking. PDF extraction leaves ragged spacing. Run the text through `re.sub(r"\s+", " ", text).strip()` before you chunk. The recurring r/MachineLearning “RAG quality is bad on PDFs” thread converges on this in the top reply most of the time.
- Store the chunk offset, not just the text. When you show the retrieved chunk to a user, you will want to link into the source document at the right byte offset. Reconstructing that offset from the chunk text later is a multi-day project. Store it at index time. Both flags are sketched right after this list.
Watch-outs
- Chunk overlap is not free. At 128 tokens of overlap on a 1,400-chunk corpus, your index grows ~18%. Budget accordingly.
- BM25 hybrid weight of 0.35 is the number for a mixed-technical corpus. Pure prose: set it to 0. Pure code or SKU-heavy: try 0.5. Always ground-truth the choice; “it feels better” is not a reason to keep it. A sketch of how the weight enters the score follows this list.
- The reranker is the single biggest accuracy lever. If you only ship one change, set top-k to 20 and add Cohere Rerank v4.0. Everything else is second-order.
- Long-context models (Gemini 3.1 Pro at 1M, Claude Opus 4.7 at 1M) shift the math: smaller corpora can skip retrieval entirely. The long-context divergence trend post covers when that holds and when it does not.
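On the hybrid weight specifically: it is just a convex blend of the two retrievers' scores. The sketch below uses min-max normalization before mixing, which is one common formulation and not necessarily what your retrieval library does internally (several use reciprocal rank fusion instead), so treat it as a reference for what the 0.35 means rather than a drop-in.

```python
def hybrid_scores(bm25, dense, bm25_weight=0.35):
    """Blend BM25 and dense scores for the same candidate set.

    bm25, dense: dicts mapping chunk_id -> raw score from each retriever.
    Min-max normalization puts both on [0, 1] before mixing; with
    bm25_weight=0 this reduces to pure dense retrieval (the prose-only
    setting), and 0.5 weights the two retrievers equally.
    """
    def minmax(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {k: (v - lo) / span for k, v in scores.items()}

    b, d = minmax(bm25), minmax(dense)
    return {
        k: bm25_weight * b.get(k, 0.0) + (1 - bm25_weight) * d.get(k, 0.0)
        for k in set(b) | set(d)
    }
```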
Related
The full methodology, the ground-truth set, and the per-parameter ablations are in the RAG defaults guide. The eval harness wrapped around every RAG change is covered in the evals-without-judges post. The Pinecone Learn series has the long-form explanation of hybrid scoring if you want the math.
One-line takeaway
Fixed-size 768-token chunks with 128 tokens of overlap, retrieve top-20, rerank to top-3 with Cohere Rerank v4.0, BM25 hybrid at 0.35 on technical corpora only, boilerplate strip on by default. That is the production-grade RAG default for April 2026.