§ REVIEW · APR 23, 2026 AIDER · DIFF · TERMINAL v1.0

Aider review: the terminal agent that still wins on diff quality

Aider 0.80 paired with Claude Opus 4.7 scored 8.7 on refactoring and 8.5 on RAG. The diff-based workflow, the 3 commands that matter, and where it breaks.
Adrian Marcus. Working engineer. Reviews AI-coding tools on real codebases, scored on a fixed 14-task suite, rerun weekly.
  5 min read
8.2 / 10
Peer score · Apr 2026
scaffold 8.0
refactor 8.7
test-gen 7.9
debug 8.1
agent 7.6

Aider is the open-source AI pair-programming tool with 43,000 GitHub stars and the only coding tool that publishes its own independent multi-language leaderboard. The polyglot benchmark (225 Exercism exercises across C++, Go, Java, JavaScript, Python, and Rust) is the most language-diverse public coding evaluation that exists. The leader on it is GPT-5 (high) at 88.0%, with DeepSeek V3.2 at 74.2% for $1.30 per benchmark run and Claude Opus 4 at 72.0%. Aider has no per-seat subscription; you bring your own model API key and pay only the inference cost. The recurring r/cursor "Why are people still using Aider in 2026?" thread gets the same answer in dozens of replies: terminal-native diff-edit format, no editor lock-in, and benchmark transparency that no closed tool matches. This is the review.

Quick Verdict
Best for: diff quality, terminal-first workflows, surgical refactors with full git history
Not best for: GUI-IDE users; visual debugging; juniors who want autocomplete
Watch out for: model selection on big diffs; cost on chained edits with --auto-commits
Pro tip: use /architect for planning, /code for execution; two passes beat one

Quick answer: if you live in the terminal and want a model-agnostic agent that produces clean unified-diff edits and lets you swap providers without changing your workflow, Aider is the best free tool in April 2026. If you want an IDE with parallel agents and a polished UX, take Cursor 3 or Claude Code instead.
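The two-pass pro tip above can be sketched as a pair of shell functions. The `--architect` and `--editor-model` flags are real Aider options; the model pairing is this review's stack pick, not an Aider default:

```shell
# Pass 1: architect mode. A strong reasoning model plans the change;
# a cheaper "editor" model turns the plan into concrete diffs.
plan_pass() {
  aider --architect \
        --model claude-opus-4-7 \
        --editor-model deepseek/deepseek-chat "$@"
}

# Pass 2: plain code mode for the mechanical follow-up edits.
code_pass() {
  aider --model deepseek/deepseek-chat --edit-format diff "$@"
}
```

Run `plan_pass` on the hard design step, then `code_pass` for the cleanup edits; the expensive model never touches the mechanical work.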

What Aider is

Where it wins on the public leaderboard

Aider polyglot is the only major public coding benchmark that publishes the cost per run alongside the score. The April 2026 leaderboard:

Model (effort) Polyglot pass rate Cost per benchmark run Source
GPT-5 (high) 88.0% $29.08 aider.chat
GPT-5 (medium) 86.7% $17.69 aider.chat
o3-pro (high) 84.9% $146.32 aider.chat
Gemini 2.5 Pro 83.1% $49.88 aider.chat
GPT-5 (low) 81.3% $10.37 aider.chat
DeepSeek V3.2 74.2% $1.30 aider.chat
Claude Opus 4 72.0% $65.75 aider.chat

DeepSeek V3.2 at 74.2% for $1.30 per run was the standout when this leaderboard was last refreshed (August 2025). The board has not yet been updated for the April 2026 wave of frontier models: Claude Opus 4.7 (April 16), GPT-5.4 (March 5), GPT-5.5 (April 23), and DeepSeek V4 Pro / V4 Flash (April 24). Once added, V4 Flash should undercut V3.2 on cost-per-run by roughly 5x at $0.14/$0.28 per M tokens, and DeepSeek’s own tech report puts both V4 models at GPT-5.4-level coding performance. Treat the table above as the public benchmark of record, not as the current cost-per-task picture.

Where it wins, in our 14-task editorial scoring

Domain Aider + Opus 4.7 Aider + GPT-5.3-Codex Delta vs Cursor Composer 2
Refactor (single file) 8.8 8.4 +0.7
Refactor (cross-package) 8.6 8.0 +0.5
Test-gen 8.5 8.7 +0.5
Diff-edit cleanliness 9.4 9.0 +1.1
Strict JSON 8.0 9.0 +0.8
Editor UX (terminal-only) 6.5 6.5 -2.7

Diff quality is where Aider lives. The diff edit format combined with a strong model produces cleaner unified diffs than any IDE-embedded agent we have tested. The 9.4 is not an artifact; the percent-cases-well-formed metric on Aider’s own leaderboard (97-99% on top models) confirms it. On the recurring r/ChatGPTCoding thread “Aider vs Cursor for refactor”, the consensus is the same: Aider produces a diff a senior engineer would write; Cursor and Claude Code produce a diff a senior engineer would have to clean up.

Where it loses

The terminal. If you are already in VS Code and your team reviews PRs through the GitHub UI, Aider’s git diff-first workflow is the wrong default. The 6.5 on editor UX is not Aider failing; it is Aider being a terminal tool. Pair it with delta for color-coded diffs and the experience improves; it still does not have the multi-file preview, accept-some-reject-others UX of Cursor.
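Since Aider auto-commits each edit, the simplest way to get the color-coded review mentioned above is to make delta git's pager and read the history (this assumes delta is installed; the config key is standard git):

```shell
# Route all paged git output -- including `git log -p` and `git show`,
# which is where Aider's auto-commits end up -- through delta.
git config --global core.pager delta
```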

The other miss is parallel agents. Aider runs one session at a time. If you want /best-of-n across multiple models on the same task, take Cursor 3 with the Agents Window.

The best Aider stacks for April 2026

Goal Recommended stack Why
Cheapest competitive Aider + DeepSeek V4 Flash Released April 24, 2026 at $0.14/M input and $0.28/M output — undercuts V3.2 by ~5x; coding parity with GPT-5.4 per DeepSeek’s own tech report
Best refactor quality Aider + Claude Opus 4.7 SWE-bench Pro 64.3% leader; clean diffs out of the box
Best code-only quality Aider + GPT-5.3-Codex GPT-5 family leads the polyglot board at 88.0%; strict structured-output mode for JSON pipelines
Local-first / air-gapped Aider + Qwen3 235B via Ollama Strong SWE-bench numbers among open weights; no data leaves your machine
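Any of these stacks can be pinned in a `.aider.conf.yml` so the flags travel with the repo. The keys below come from Aider's configuration docs; the values are this review's picks, not Aider defaults:

```yaml
# .aider.conf.yml -- project-level defaults, read on startup
model: claude-opus-4-7   # swap for deepseek/deepseek-chat on a budget
edit-format: diff        # the diff edit format this review scores
auto-commits: true       # every edit lands as its own git commit
```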

What the threads are saying

Three patterns dominate the Aider community discussion:

  1. The leaderboard is the trust unlock. r/LocalLLaMA refers back to Aider’s polyglot more than any other public benchmark when comparing open-weight coding models. The cost-per-run column is what makes it useful for buying decisions.
  2. Diff-edit format wins on quality. The recurring “diff vs whole” question on the Aider discussions tab lands on the same answer: diff for any file you are not creating from scratch, whole only for greenfield. The benchmark numbers back it.
  3. Maintainer responsiveness. Paul Gauthier ships releases roughly weekly; the HISTORY.md has support for new model releases within days. The Claude Sonnet 4 / Opus 4 series landed in v0.84 four days after the model launched.

Setup in 90 seconds

# Install (Python 3.10+)
pip install aider-install
aider-install

# Set your provider key
export ANTHROPIC_API_KEY=sk-ant-...

# Start in your project root
cd ~/code/myproject
aider --model claude-opus-4-7 --edit-format diff

# Or with DeepSeek for cheap
export DEEPSEEK_API_KEY=...
aider --model deepseek/deepseek-chat

The official install guide covers Windows, Conda, and Docker. The model selection page has every supported provider.
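Once a session is running, three in-chat commands carry most of the work; the command names are from Aider's own command list, and the file path is illustrative:

```
/add src/parser.py   # pull a file into the chat context
/diff                # inspect the change Aider just made
/undo                # revert Aider's last auto-commit
```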

How it compares

TCC editorial score Aider Cursor 3 + Composer 2 Claude Code + Opus 4.7 Windsurf 2.0 + Cascade
Diff quality 9.4 8.3 8.6 8.4
Editor UX 6.5 9.4 7.8 9.0
Model flexibility 9.5 8.7 6.0 (Anthropic only) 8.7
Cheapest to run $0 + API $20-$200/mo $20-$200/mo $20-$200/mo
Parallel agents 4.0 9.1 7.6 7.8

Verdict

Aider is the right tool when you care about diff quality, model flexibility, and audit trail more than IDE polish. It is the cheapest path to frontier-quality coding (DeepSeek V4 Flash launched April 24, 2026 at $0.14/$0.28 per M tokens — ~5x cheaper than V3.2 and at GPT-5.4-level coding quality per DeepSeek’s own tech report) and the most honest benchmark publisher in the category. Pair it with Claude Opus 4.7 for hard refactors, with GPT-5.3-Codex for strict-JSON pipelines, with GPT-5.5 for terminal-style agent loops, and with DeepSeek V4 when budget is the binding constraint.

For the methodology behind the scores above, see the 14-task scorecard. For the model comparison context, see the Claude Opus 4.7 review and the GPT-5.3-Codex review.
