§ EDITOR · PROFILE · WORKING ENGINEER. RUNS THE TOOLS ON REAL CODEBASES.

Adrian Marcus

Working engineer. Runs the tools on real codebases. · Remote

I'm Adrian Marcus. I write every review and guide on this site myself. I pay for my own subscriptions and I run the tools on real codebases, not on toy examples.

I've been shipping production software for over a decade across TypeScript, Python, Go, and Rust, mostly on developer tooling and large-scale web applications. I started The Coding Colosseum because the AI-coding reviews I kept finding were either press releases in disguise or one-shot benchmarks that never got rerun. So I built the thing I wanted: the same 14 tasks, rerun weekly, scored on the median of five runs, with every failure pattern written down.

If a tool drops a point between runs, I say so. If it earns one back, I say that too. No sponsored placements, no NDA previews, no affiliate deals that move scores.

§ PUBLISHED · 25 ARTICLES

Recent work

  1. TUTORIALS

    React useEffect cleanup function: when, why, and 4 patterns

    When and why React useEffect needs a cleanup function, the 4 patterns that cover 95% of cases, plus what changed in React 18 Strict Mode (in development, effects mount, clean up, and run a second time).

  2. TRENDS

    Long-context evals keep diverging from reality: the 1M-token number nobody earns

    Vendor-reported 1M-context scores keep beating what I measure on my production RAG task by 30+ points. The three reasons the benchmarks lie, and what I trust instead.

  3. TRENDS

    Cursor 3 ships parallel agents: what changes in my pipeline, and what does not

    Cursor 3 shipped parallel Composer 2 agents and a background agent on April 2, 2026. Two tests moved in my pipeline, four did not. The 90-second summary with numbers.

  4. CHEATSHEETS

    RAG defaults 2026 cheatsheet: copy, paste, ship

    The RAG parameter defaults that moved my top-1 accuracy from 74% to 91% in 2026. Chunk size, overlap, rerank, hybrid BM25, and the 2 flags people forget.

  5. CHEATSHEETS

    Cursor 3 shortcuts and settings cheatsheet

    The 18 Cursor 3 keyboard shortcuts and 6 settings that changed since 2.x. Composer, parallel agents, tab-complete, and the bindings they moved.

  6. CHEATSHEETS

    Claude Opus 4.7 tool calling cheatsheet: the 7 settings that make tool use reliable

    The 7 settings that move Claude Opus 4.7 tool-call reliability from 94% to 99.2%. Adaptive thinking, tool_choice, disable_parallel_tool_use, stop_sequences, and the sampling params you must now omit.

  7. CHEATSHEETS

    GPT-5.4 API cheatsheet: the 9 parameters that matter in 2026

    GPT-5.4 API parameters, defaults, and the 3 that break your pipeline if you do not set them. Strict JSON, reasoning_effort, tool_choice, and the cost line to watch.

  8. AI REVIEWS

    Gemini 3.1 Pro review: cheapest frontier token on the leaderboard, and where it still lags

    Gemini 3.1 Pro scored 7.8 on refactoring and 7.9 on structured output at $0.21 per task. The domains where cheap wins and where you need to route traffic elsewhere.

  9. AI REVIEWS

    Windsurf 2.0 review: Cascade agent on the $20 Pro plan, and the 2 tasks where it beat Cursor

    Windsurf 2.0 with Cascade 2 scored 7.9 on refactoring and 8.1 on test-gen on a 14-task suite. The 2 tasks it beat Cursor 3 + Composer 2 on, and the 3 tasks…

  10. AI REVIEWS

    Aider review: the terminal agent that still wins on diff quality

    Aider 0.80 paired with Claude Opus 4.7 scored 8.7 on refactoring and 8.5 on RAG. The diff-based workflow, the 3 commands that matter, and where it breaks.
