§ EDITOR · PROFILE · WORKING ENGINEER. RUNS THE TOOLS ON REAL CODEBASES.

Adrian Marcus

Working engineer. Runs the tools on real codebases. · Remote

I'm Adrian Marcus. I write every review and guide on this site myself. I pay for my own subscriptions and I run the tools on real codebases, not on toy examples.

I've been shipping production software for over a decade across TypeScript, Python, Go, and Rust, mostly on developer tooling and large-scale web applications. I started The Coding Colosseum because the AI-coding reviews I kept finding were either press releases in disguise or one-shot benchmarks that never got re-run. So I built the thing I wanted: the same 14 tasks, rerun weekly, scored on the median of five runs, with every failure pattern written down.

If a tool drops a point between runs, I say so. If it earns one back, I say that too. No sponsored placements, no NDA previews, no affiliate deals that move scores.

§ PUBLISHED · 25 ARTICLES

Recent work

TUTORIALS Apr 24, 2026

React useEffect cleanup function: when, why, and 4 patterns

When and why React useEffect needs a cleanup function, the 4 patterns that cover 95% of cases, plus what changed in React 18 Strict Mode (effects run twice on mount in development).

TRENDS Apr 23, 2026

Long-context evals keep diverging from reality: the 1M-token number nobody earns

Vendor 1M-context numbers keep outperforming my production RAG task by 30+ points. The three reasons the benchmarks lie, and what I trust instead.

TRENDS Apr 23, 2026

Cursor 3 ships parallel agents: what changes in my pipeline, and what does not

Cursor 3 shipped parallel Composer 2 agents and a background agent on April 2, 2026. Two tests moved in my pipeline, four did not. The 90-second summary with numbers.

CHEATSHEETS Apr 23, 2026

RAG defaults 2026 cheatsheet: copy, paste, ship

The RAG parameter defaults that moved my top-1 accuracy from 74% to 91% in 2026. Chunk size, overlap, rerank, hybrid BM25, and the 2 flags people forget.

CHEATSHEETS Apr 23, 2026

Cursor 3 shortcuts and settings cheatsheet

The 18 Cursor 3 keyboard shortcuts and 6 settings that changed since 2.x. Composer, parallel agents, tab-complete, and the bindings they moved.

CHEATSHEETS Apr 23, 2026

Claude Opus 4.7 tool calling cheatsheet: the 7 settings that make tool use reliable

The 7 settings that move Claude Opus 4.7 tool-call reliability from 94% to 99.2%. Adaptive thinking, tool_choice, disable_parallel_tool_use, stop_sequences, and the sampling params you must now omit.

CHEATSHEETS Apr 23, 2026

GPT-5.4 API cheatsheet: the 9 parameters that matter in 2026

GPT-5.4 API parameters, defaults, and the 3 that break your pipeline if you do not set them. Strict JSON, reasoning_effort, tool_choice, and the cost line to watch.

AI REVIEWS Apr 23, 2026

Gemini 3.1 Pro review: cheapest frontier token on the leaderboard, and where it still lags

Gemini 3.1 Pro scored 7.8 on refactoring and 7.9 on structured output at $0.21 per task. The domains where cheap wins and where you need to route traffic elsewhere.

AI REVIEWS Apr 23, 2026

Windsurf 2.0 review: Cascade agent on the $20 Pro plan, and the 2 tasks where it beat Cursor

Windsurf 2.0 with Cascade 2 scored 7.9 on refactoring and 8.1 on test-gen on a 14-task suite. The 2 tasks it beat Cursor 3 + Composer 2 on, and the 3 tasks…

AI REVIEWS Apr 23, 2026

Aider review: the terminal agent that still wins on diff quality

Aider 0.80 paired with Claude Opus 4.7 scored 8.7 on refactoring and 8.5 on RAG. The diff-based workflow, the 3 commands that matter, and where it breaks.
