Aider is an open-source AI pair-programming tool with 43,000 GitHub stars, and the only coding tool that publishes its own independent multi-language leaderboard. Its polyglot benchmark (225 Exercism exercises across C++, Go, Java, JavaScript, Python, and Rust) is the most language-diverse public coding evaluation that exists. GPT-5 (high) leads it at 88.0%; DeepSeek V3.2 scores 74.2% for $1.30 per benchmark run, and Claude Opus 4 sits at 72.0%. Aider has no per-seat subscription: you bring your own model API key and pay only the inference cost. The recurring r/cursor “Why are people still using Aider in 2026?” thread gets the same answer in dozens of replies: terminal-native diff-edit format, no editor lock-in, and benchmark transparency that no closed tool matches. This is the review.
Quick answer: if you live in the terminal and want a model-agnostic agent that produces clean unified-diff edits and lets you swap providers without changing your workflow, Aider is the best free tool in April 2026. If you want an IDE with parallel agents and a polished UX, take Cursor 3 or Claude Code instead.
What Aider is
- Open-source CLI tool (Aider-AI/aider on GitHub).
- Bring-your-own-key for any major LLM provider: OpenAI, Anthropic, Google, DeepSeek, OpenRouter, local models via Ollama.
- Two edit formats: diff (default, surgical changes) and whole (rewrite the file). Diff is what wins on the polyglot benchmark.
- Git-aware: every model edit is a separate commit you can revert. The Aider commit history is its own audit trail.
- No subscription. You pay only the inference cost on the model you choose.
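Because every model edit lands as its own git commit, rolling back a bad edit is plain git. A minimal sketch in a throwaway repo (the "aider:" commit here is a stand-in for what the tool would produce, not aider itself):

```shell
# Simulate an Aider-style per-edit commit, then revert just that edit.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
echo "v1" > app.py
git add app.py && git commit -qm "initial"
echo "v2" > app.py
git add app.py && git commit -qm "aider: refactor app.py"  # stand-in for a model edit
git revert --no-edit HEAD   # undo that one edit; history keeps the audit trail
cat app.py                  # back to v1
```

The revert is itself a commit, so the audit trail records both the model's edit and your rejection of it.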
Where it wins on the public leaderboard
Aider's polyglot leaderboard is the only major public coding benchmark that publishes the cost per run alongside the score. The April 2026 leaderboard:
| Model (effort) | Polyglot pass rate | Cost per benchmark run | Source |
|---|---|---|---|
| GPT-5 (high) | 88.0% | $29.08 | aider.chat |
| GPT-5 (medium) | 86.7% | $17.69 | aider.chat |
| o3-pro (high) | 84.9% | $146.32 | aider.chat |
| Gemini 2.5 Pro | 83.1% | $49.88 | aider.chat |
| GPT-5 (low) | 81.3% | $10.37 | aider.chat |
| DeepSeek V3.2 | 74.2% | $1.30 | aider.chat |
| Claude Opus 4 | 72.0% | $65.75 | aider.chat |
DeepSeek V3.2 at 74.2% for $1.30 per run was the standout when this leaderboard was last refreshed (August 2025). The board has not yet been updated for the April 2026 wave of frontier models: Claude Opus 4.7 (April 16), GPT-5.4 (March 5), GPT-5.5 (April 23), and DeepSeek V4 Pro / V4 Flash (April 24). Once added, V4 Flash should undercut V3.2 on cost-per-run by roughly 5x at $0.14/$0.28 per M tokens, and DeepSeek’s own tech report puts both V4 models at GPT-5.4-level coding performance. Treat the table above as the public benchmark of record, not as the current cost-per-task picture.
Where it wins, in our 14-task editorial scoring
| Domain | Aider + Opus 4.7 | Aider + GPT-5.3-Codex | vs Cursor Composer 2 |
|---|---|---|---|
| Refactor (single file) | 8.8 | 8.4 | +0.7 |
| Refactor (cross-package) | 8.6 | 8.0 | +0.5 |
| Test-gen | 8.5 | 8.7 | +0.5 |
| Diff-edit cleanliness | 9.4 | 9.0 | +1.1 |
| Strict JSON | 8.0 | 9.0 | +0.8 |
| Editor UX (terminal-only) | 6.5 | 6.5 | -2.7 |
Diff quality is where Aider lives. The diff edit format combined with a strong model produces cleaner unified diffs than any IDE-embedded agent we have tested. The 9.4 is not an artifact; the percent-cases-well-formed metric on Aider’s own leaderboard (97-99% on top models) confirms it. On the recurring r/ChatGPTCoding thread “Aider vs Cursor for refactor”, the consensus is the same: Aider produces a diff a senior engineer would write; Cursor and Claude Code produce a diff a senior engineer would have to clean up.
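What "percent cases well-formed" measures is, in practice, whether the model's unified diff applies cleanly. A quick way to see the bar an edit has to clear (the file and patch here are invented for illustration):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
printf 'def add(a, b):\n    return a + b\n' > calc.py
cat > edit.patch <<'EOF'
--- a/calc.py
+++ b/calc.py
@@ -1,2 +1,2 @@
 def add(a, b):
-    return a + b
+    return int(a) + int(b)
EOF
# A well-formed diff passes the dry run, then applies without fuzz.
git apply --check edit.patch && echo "well-formed"
git apply edit.patch
```

A malformed hunk header or stale context line fails `git apply --check`, which is the kind of miss the leaderboard's well-formed metric counts against a model.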
Where it loses
The terminal. If you are already in VS Code and your team reviews PRs through the GitHub UI, Aider’s git diff-first workflow is the wrong default. The 6.5 on editor UX is not Aider failing; it is Aider being a terminal tool. Pair it with delta for color-coded diffs and the experience improves; it still does not have the multi-file preview, accept-some-reject-others UX of Cursor.
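The delta pairing mentioned above is a couple of git settings, taken from delta's standard setup (assumes delta is already on your PATH):

```shell
# Route git's pager through delta for syntax-highlighted diffs.
git config --global core.pager delta
git config --global interactive.diffFilter "delta --color-only"
git config --global delta.navigate true   # n / N to jump between files
```

With this in place, the diffs Aider commits render through delta in every `git log -p` and `git diff` you run afterward.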
The other miss is parallel agents. Aider runs one session at a time. If you want /best-of-n across multiple models on the same task, take Cursor 3 with the Agents Window.
The best Aider stacks for April 2026
| Goal | Recommended stack | Why |
|---|---|---|
| Cheapest competitive | Aider + DeepSeek V4 Flash | Released April 24, 2026 at $0.14/M input and $0.28/M output — undercuts V3.2 by ~5x; coding parity with GPT-5.4 per DeepSeek’s own tech report |
| Best refactor quality | Aider + Claude Opus 4.7 | SWE-bench Pro 64.3% leader; clean diffs out of the box |
| Best code-only quality | Aider + GPT-5.3-Codex | Codex line of the polyglot leader (GPT-5 high, 88.0%); strict mode for any JSON tools |
| Local-first / air-gapped | Aider + Qwen3 235B via Ollama | Strong SWE-bench numbers among open weights; no data leaves your machine |
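The local-first row can be pinned per project: aider reads a `.env` file from the repo root, and its CLI flags have `AIDER_`-prefixed environment-variable equivalents. A sketch, with the Ollama model tag as an assumption (substitute whatever `ollama list` reports on your machine):

```shell
# Pin the local-first stack in the project root; aider picks this up on start.
cat > .env <<'EOF'
OLLAMA_API_BASE=http://127.0.0.1:11434
AIDER_MODEL=ollama_chat/qwen3:235b
AIDER_EDIT_FORMAT=diff
EOF
```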
What the threads are saying
Three patterns dominate the Aider community discussion:
- The leaderboard is the trust unlock. r/LocalLLaMA refers back to Aider’s polyglot more than any other public benchmark when comparing open-weight coding models. The cost-per-run column is what makes it useful for buying decisions.
- Diff-edit format wins on quality. The recurring “diff vs whole” question on the Aider discussions tab lands on the same answer: diff for any file you are not creating from scratch, whole only for greenfield. The benchmark numbers back it.
- Maintainer responsiveness. Paul Gauthier ships releases roughly weekly, and HISTORY.md shows support for new models landing within days: the Claude Sonnet 4 / Opus 4 series was in v0.84 four days after the models launched.
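The diff-vs-whole choice can also live in version control: aider reads `.aider.conf.yml` from the repo root, with option names mirroring the CLI flags. A minimal sketch:

```shell
cat > .aider.conf.yml <<'EOF'
edit-format: diff   # surgical edits for existing files; whole only for greenfield
auto-commits: true  # one git commit per model edit
EOF
```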
Setup in 90 seconds
```shell
# Install (Python 3.10+)
pip install aider-install
aider-install

# Set your provider key
export ANTHROPIC_API_KEY=sk-ant-...

# Start in your project root
cd ~/code/myproject
aider --model claude-opus-4-7 --edit-format diff

# Or with DeepSeek for cheap
export DEEPSEEK_API_KEY=...
aider --model deepseek/deepseek-chat
```
The official install guide covers Windows, Conda, and Docker. The model selection page has every supported provider.
How it compares
| TCC editorial comparison | Aider | Cursor 3 + Composer 2 | Claude Code + Opus 4.7 | Windsurf 2.0 + Cascade |
|---|---|---|---|---|
| Diff quality | 9.4 | 8.3 | 8.6 | 8.4 |
| Editor UX | 6.5 | 9.4 | 7.8 | 9.0 |
| Model flexibility | 9.5 | 8.7 | 6.0 (Anthropic only) | 8.7 |
| Cheapest to run | $0 + API | $20-$200/mo | $20-$200/mo | $20-$200/mo |
| Parallel agents | 4.0 | 9.1 | 7.6 | 7.8 |
Verdict
Aider is the right tool when you care about diff quality, model flexibility, and audit trail more than IDE polish. It is the cheapest path to frontier-quality coding (DeepSeek V4 Flash launched April 24, 2026 at $0.14/$0.28 per M tokens — ~5x cheaper than V3.2 and at GPT-5.4-level coding quality per DeepSeek’s own tech report) and the most honest benchmark publisher in the category. Pair it with Claude Opus 4.7 for hard refactors, with GPT-5.3-Codex for strict-JSON pipelines, with GPT-5.5 for terminal-style agent loops, and with DeepSeek V4 when budget is the binding constraint.
For the methodology behind the scores above, see the 14-task scorecard. For the model comparison context, see the Claude Opus 4.7 review and the GPT-5.3-Codex review.