~/about-adrian-marcus-and-the-coding-colosseum
§ PAGE · updated APR 23, 2026

About Adrian Marcus and The Coding Colosseum

Adrian Marcus reviews AI coding tools on real codebases, scored on a fixed 14-task suite, rerun every week. No sponsored posts. No vendor previews.

I’m Adrian Marcus. I write every review, guide, and prompt on this site myself. I pay for my own Claude, GPT-5, Gemini, Cursor, and Aider subscriptions out of the same budget I use for side projects. Nothing on The Coding Colosseum is sponsored, previewed by a vendor, or moved by an affiliate deal.

I’ve been shipping production software for over a decade. Most of that work was on developer tooling and large-scale web apps, across TypeScript, Python, Go, and Rust. I started this site because the AI-coding reviews I kept finding were either press releases reformatted into listicles, or one-shot benchmarks that nobody reran when the next model dropped a month later. The prompt-engineering posts were worse: a screenshot of a ChatGPT tab and three bullet points of advice.

What the site publishes

Three things, in order of priority: reviews, guides, and prompts.


How I test a tool

The same 14 tasks, every time. Refactors across a 63k-line TypeScript monorepo, test-gen with property-based assertions, debugging non-trivial production stack traces, schema design from prose requirements, bounded-budget agent planning, and six more. Tasks are versioned in a private fixture repo with deterministic inputs. I run each task 5 times with a clean context, record the transcripts, and score each run 0-10 on a rubric that is also public.
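The scoring mechanics above can be sketched in a few lines. This is an illustrative sketch, not the site's actual harness: the task names, the results shape, and the aggregation (a plain mean of per-task means) are my assumptions; the source only specifies 14 tasks, 5 clean-context runs each, and a 0-10 rubric.

```python
from statistics import mean

# Hypothetical shape of one suite run: task id -> five rubric scores
# (0-10), one per clean-context run. Task names are invented examples,
# not the private fixture repo's real identifiers.
runs = {
    "ts-monorepo-refactor": [7, 8, 7, 6, 8],
    "property-test-gen": [9, 9, 8, 9, 10],
    "stack-trace-debug": [5, 6, 5, 7, 6],
}

def suite_score(results: dict[str, list[int]]) -> float:
    """Average the per-task means into one headline suite score."""
    task_means = [mean(scores) for scores in results.values()]
    return round(mean(task_means), 2)

for task, scores in runs.items():
    print(f"{task}: {mean(scores):.1f}")
print("suite:", suite_score(runs))
```

Averaging the per-task means (rather than pooling all 15 runs) keeps each task's weight equal regardless of score spread; whether the real rubric aggregates this way is an assumption.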

Every tool gets the same tasks, the same clock, and the same hardware. The one thing that changes is the model or the client. When a tool ships an update I’ve already tested, I rerun the suite within a week and publish a dated delta. A full example of a rerun and the resulting score change is in the Claude Opus 4.7 review.

How I cover vendors

The same way I’d cover a compiler. I read the official docs (Anthropic, OpenAI, Google AI, Aider), I open a public paid account, and I run the 14 tasks. No NDA previews. No advance copies. I never ask a vendor to comment on a review before publication, and I don’t accept changes to a score in exchange for coverage. Corrections are a different matter: if a score is wrong, I fix it, date it, and keep the old number visible in the changelog on the affected post.

Where I draw the line

I don’t benchmark public evals against each other. I don’t run HumanEval again. I don’t chart HELM scores on a color-graded bar. The 14-task suite is the suite. If a tool wins there, it wins here. If it loses, it loses.

I also don’t pretend my suite is the whole picture. It covers the coding work I actually do, plus tasks that showed up in threads on Hacker News and r/ExperiencedDevs more than once. If you build infra for self-driving cars, the suite undersells the tool you need. If you ship web apps, it’s close.

Contact and corrections

Wrong score, dead link, broken fixture, or a claim that’s gone stale? Email corrections@thecodingcolosseum.com with a link and two lines. I fix it in under a week and keep a dated changelog on the affected post. Editorial, partnerships, or speaking: editor@thecodingcolosseum.com.

That’s the whole operation. One person, one suite, one repo of fixtures, rerun weekly.
