RAG defaults 2026 cheatsheet: copy, paste, ship

The Slack-pasteable version of TCC’s RAG defaults for April 2026. The full reasoning, the ablations, and the per-parameter charts are on the RAG defaults guide. This is the table and the copy-paste config.

The settings that move top-1 accuracy

Parameter	Naive default	Ship in 2026	Direction of move
Chunk size (tokens)	512	768	+~5 pp on technical corpora
Chunk overlap (tokens)	0	128	+~3 pp; recovers boundary context
Chunking strategy	semantic	fixed-size	+~2 pp; simpler, more reliable
Retriever top-k (before rerank)	5	20	+~6 pp when paired with rerank
Reranker	none	Cohere Rerank v4.0, final k=3	+~4-5 pp; the single biggest lever
Embedding model	ada-002 era	Voyage 4 Large or OpenAI text-embedding-3-large	+~1 pp; modern embeddings still help
Hybrid BM25 weight	0	0.35 (technical corpus)	+~2 pp on code/SKU; 0 on prose
Boilerplate strip	off	on	+~2 pp; PDF and HTML extraction

Direction of move is from independent and TCC editorial ablations on a mixed-technical corpus. Your fixture will produce different absolute deltas; the direction holds. The single largest lever is the reranker; if you only have time for one change, set top-k to 20 and bolt on Cohere Rerank v4.0.

Paste-this config

# rag.yaml
chunker:
  strategy: fixed_size
  size_tokens: 768
  overlap_tokens: 128
  strip_boilerplate: true

embed:
  provider: voyage
  model: voyage-4-large

retrieve:
  top_k: 20
  hybrid_bm25_weight: 0.35    # set to 0 on prose-only corpora

rerank:
  provider: cohere
  model: rerank-v4.0-pro
  final_k: 3

eval:
  ground_truth_path: evals/ragtruth.jsonl
  min_top_1: 0.88

2 flags people forget

Normalize whitespace before chunking. PDF extraction leaves ragged spacing. Run the text through re.sub(r"\s+", " ", text).strip() before you chunk. The recurring r/MachineLearning “RAG quality is bad on PDFs” thread converges on this in the top reply most of the time.
Store the chunk offset, not just the text. When you show the retrieved chunk to a user, you will want to link into the source document at the right byte offset. Reconstructing that offset from the chunk text later is a multi-day project. Store it at index time.

Watch-outs

Chunk overlap is not free. At 128 tokens of overlap on a 1,400-chunk corpus, your index grows ~18%. Budget accordingly.
BM25 hybrid weight of 0.35 is the number for a mixed-technical corpus. Pure prose: set to 0. Pure code or SKU-heavy: try 0.5. Always ground-truth the choice; “it feels better” is not a reason to keep it.
The reranker is the single biggest accuracy lever. If you only ship one change, set top-k to 20 and add Cohere Rerank v4.0. Everything else is second-order.
Long-context models (Gemini 3.1 Pro at 1M, Claude Opus 4.7 at 1M) shift the math: smaller corpora can skip retrieval entirely. The long-context divergence trend post covers when that holds and when it does not.

The full methodology, the ground-truth set, and the per-parameter ablations are in the RAG defaults guide. The eval harness wrapped around every RAG change is on the evals-without-judges post. The Pinecone learn series has the long-form explanation of hybrid scoring if you want the math.

One-line takeaway

Fixed-size 768-token chunks with 128 overlap, retrieve top-20, rerank to top-3 with Cohere Rerank v4.0, BM25 hybrid at 0.35 only on technical corpora, boilerplate strip on by default. That is the production-grade RAG default in April 2026.