§ EDITOR · PROFILE · WORKING ENGINEER. RUNS THE TOOLS ON REAL CODEBASES.

Adrian Marcus

Working engineer. Runs the tools on real codebases. · Remote

I'm Adrian Marcus. I write every review and guide on this site myself. I pay for my own subscriptions and I run the tools on real codebases, not on toy examples.

I've been shipping production software for over a decade across TypeScript, Python, Go, and Rust, mostly on developer tooling and large-scale web applications. I started The Coding Colosseum because the AI-coding reviews I kept finding were either press releases in disguise or one-shot benchmarks that never got re-run. So I built the thing I wanted: the same 14 tasks, rerun weekly, scored on the median of five runs, with every failure pattern written down.

If a tool drops a point between runs, I say so. If it earns one back, I say that too. No sponsored placements, no NDA previews, no affiliate deals that move scores.

§ PUBLISHED · 21 ARTICLES

Recent work

  1. TROUBLESHOOTING

    Cannot read properties of undefined: 6 root causes, 1 fix each

    Six places this JavaScript TypeError comes from, in order from most to least common, with the exact line you should rewrite for each.

  2. TUTORIALS

    React useEffect cleanup function: when, why, and 4 patterns

    When and why React useEffect needs a cleanup function, the 4 patterns that cover 95% of cases, and what changed in React 18 Strict Mode, where effects run twice in development.

  3. TRENDS

    Long-context evals diverge from reality: the 1M-token gap

    Vendor 1M-context scores keep landing 30+ points above what the same models score on my production RAG task. The three reasons the benchmarks lie, and what I trust instead.

  4. TRENDS

    Cursor 3 ships parallel agents: what changes, what doesn’t

    Cursor 3 shipped parallel Composer 2 agents and a background agent on April 2, 2026. Two tests moved in my pipeline, four did not. The 90-second summary with numbers.

  5. CHEATSHEETS

    Cursor 3 shortcuts and settings cheatsheet: the 22 that matter

    The 18 Cursor 3 keyboard shortcuts and 6 settings that changed since 2.x. Composer, parallel agents, tab-complete, and the bindings they moved.

  6. CHEATSHEETS

    Claude Opus 4.7 tool calling: 7 settings for reliable use

    The 7 settings that move Claude Opus 4.7 tool-call reliability from 94% to 99.2%. Adaptive thinking, tool_choice, disable_parallel_tool_use, stop_sequences, and the sampling params you must now omit.

  7. CHEATSHEETS

    GPT-5.4 API cheatsheet: the 9 parameters that matter in 2026

    GPT-5.4 API parameters, defaults, and the 3 that break your pipeline if you do not set them. Strict JSON, reasoning_effort, tool_choice, and the cost line to watch.

  8. AI REVIEWS

    Gemini 3.1 Pro review: cheapest frontier token, 4 places it lags

    Gemini 3.1 Pro scored 7.8 on refactoring and 7.9 on structured output at $0.21 per task. The domains where cheap wins and where you need to route traffic elsewhere.

  9. AI REVIEWS

    Windsurf 2.0 review: Cascade on $20 Pro, 2 wins over Cursor

    Windsurf 2.0 with Cascade 2 scored 7.9 on refactoring and 8.1 on test-gen on a 14-task suite. The 2 tasks it beat Cursor 3 + Composer 2 on, and the 3 tasks…

  10. AI REVIEWS

    Aider review: the terminal agent that still wins on diff quality

    Aider 0.80 paired with Claude Opus 4.7 scored 8.7 on refactoring and 8.5 on RAG. The diff-based workflow, the 3 commands that matter, and where it breaks.
