Aider is the open-source AI pair-programming tool with 43,000 GitHub stars and the only coding tool that publishes its own independent multi-language leaderboard. The polyglot benchmark (225 Exercism exercises across C++, Go, Java, JavaScript, Python, and Rust) is one of the most language-diverse public coding evaluations, and the current leader is GPT-5 (high) at 88.0%, with DeepSeek V3.2 at 74.2% for $1.30 per benchmark run. Aider has no per-seat subscription; you bring your own model API key and pay only the inference cost. The recurring r/cursor “Why are people still using Aider in 2026?” thread has the same answer in dozens of replies: terminal-native diff-edit format, no editor lock-in, and benchmark transparency that no closed tool matches. This is the review.
Quick answer: if you live in the terminal and want a model-agnostic agent that produces clean unified-diff edits and lets you swap providers without changing your workflow, Aider is the best free tool in April 2026. If you want parallel agents and a more polished UX, take Cursor 3 (an IDE) or Claude Code instead. If you want an open-source option inside VS Code, Cline is the closer comparison.
What Aider is
- Open-source CLI tool (Aider-AI/aider on GitHub). Apache 2.0 license.
- Bring-your-own-key for any major LLM provider: OpenAI, Anthropic, Google, DeepSeek, OpenRouter, local models via Ollama.
- Two edit formats: diff (default, surgical changes) and whole (rewrite the file). Diff is what wins on the polyglot benchmark.
- Git-aware: every model edit is a separate commit you can revert. The Aider commit history is its own audit trail.
- No subscription. You pay only the inference cost on the model you choose.
- Runs anywhere: local terminal, remote servers over SSH, CI pipelines, headless environments where GUI editors are not an option.
Where it wins on the public leaderboard
Aider’s polyglot benchmark is the only major public coding benchmark that reports the cost per run alongside the score. The April 2026 leaderboard:
| Model (effort) | Polyglot pass rate | Cost per benchmark run | Source |
|---|---|---|---|
| GPT-5 (high) | 88.0% | $29.08 | aider.chat |
| GPT-5 (medium) | 86.7% | $17.69 | aider.chat |
| o3-pro (high) | 84.9% | $146.32 | aider.chat |
| Gemini 2.5 Pro | 83.1% | $49.88 | aider.chat |
| GPT-5 (low) | 81.3% | $10.37 | aider.chat |
| DeepSeek V3.2 | 74.2% | $1.30 | aider.chat |
| Claude Opus 4 | 72.0% | $65.75 | aider.chat |
Freshness note (April 28, 2026): Aider’s upstream leaderboard has not retested the April 2026 wave of frontier models (Claude Opus 4.7, GPT-5.5, DeepSeek V4) as of this writing. For the current frontier ranking on harder coding tasks, see the live SWE-bench Verified leaderboard and the TokenMix April 2026 summary. DeepSeek V4 Flash (April 24, 2026, $0.14/$0.28 per M tokens) should undercut V3.2 on cost-per-run by roughly 5x once added, with DeepSeek’s own tech report putting V4 at GPT-5.4-level coding quality.
Repository mapping: how Aider understands large codebases
Aider builds a map of your entire repository (function signatures, class definitions, module structure) and sends a compressed version to the LLM alongside your prompt. This gives the model context about your codebase without burning your entire token budget on file contents.
In practice: on a 150-file Python project, asking Aider to “add a caching layer to the user service” without the repo map requires you to manually specify every relevant file. With the repo map enabled, Aider identifies the user service, the database layer, the existing Redis configuration, and the test files on its own, then makes coherent edits across all of them. The map is what elevates Aider from “chat that edits files” to “assistant that understands your project.”
For repos over 300 files, use --map-tokens 1024 to limit how much context the map consumes. The default settings work well up to around 300 files; above that, the map overhead starts competing with the actual task context.
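To make the idea concrete, here is a deliberately crude sketch of a repo map in shell: it lists each Python file’s class and def lines and stops once a rough token budget is spent (about 4 characters per token, an assumption). It is not Aider’s implementation; Aider parses code with tree-sitter and ranks symbols by relevance to the chat, which this sketch does not attempt. The function name repo_map is hypothetical.

```bash
#!/usr/bin/env bash
# Crude repo-map sketch: list class/def lines from every Python file under a root,
# then stop once a rough token budget is spent (assumes ~4 characters per token).
# Aider's real map uses tree-sitter parsing and relevance ranking; this does neither.
set -eu

repo_map() {
  root="${1:-.}"
  budget_tokens="${2:-1024}"
  budget_chars=$((budget_tokens * 4))

  # Sorted file list keeps the output deterministic.
  find "$root" -type f -name '*.py' | sort | while IFS= read -r file; do
    rel="${file#"$root"/}"
    # Keep only definition headers (including async def and indented methods).
    grep -E '^[[:space:]]*(async[[:space:]]+)?(def|class)[[:space:]]' "$file" |
      while IFS= read -r line; do
        printf '%s: %s\n' "$rel" "$line"
      done
  done | awk -v max="$budget_chars" '{ used += length($0) + 1; if (used > max) exit; print }'
}
```

Calling repo_map src 1024 prints one "file: signature" line per definition until the budget runs out, which is the kind of signature-level summary a map hands the model instead of full file contents.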
Edit formats: diff vs whole
Use diff (the default) for any file you are not creating from scratch. In this format the model returns search/replace blocks that Aider applies surgically, without touching unrelated code. Use whole only for greenfield files or cases where the model needs to rewrite the entire file to get the structure right. The polyglot leaderboard numbers confirm the quality gap: diff-format models score 5-8 points higher than whole-file rewrite on the same model at the same effort level. Aider automatically selects the best format per model based on its own benchmark testing; you can override it with --edit-format diff or --edit-format whole.
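The --edit-format override can also live in a config file so it applies to every launch in a repo. Aider reads a .aider.conf.yml whose keys match its long option names without the leading dashes; the sketch below writes one that pins the diff format and the 1024-token map budget discussed above. The helper name write_aider_conf is hypothetical, and it overwrites any existing config file.

```bash
#!/usr/bin/env bash
# Sketch: pin Aider's edit format and repo-map budget in a per-repo config file.
# Keys mirror the long CLI flags (--edit-format, --map-tokens) without the dashes.
set -eu

write_aider_conf() {
  repo_root="${1:-.}"
  # Note: '>' replaces any existing .aider.conf.yml in that directory.
  cat > "$repo_root/.aider.conf.yml" <<'EOF'
# Search/replace edits instead of whole-file rewrites
edit-format: diff
# Token budget for the repository map
map-tokens: 1024
EOF
}
```

Leave the edit-format key out if you would rather let Aider keep choosing the format per model.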
The architect + code two-pass workflow
The most effective pattern for complex tasks is architect mode: launch aider --architect --model with a planning model and --editor-model with an execution model. The architect model, typically the stronger reasoner, designs the approach without making file edits; the editor model turns that plan into actual file changes. Inside a running session you can switch modes per message with /architect and /code. The split costs more per task but produces noticeably cleaner results on anything that involves design decisions, not just mechanical code changes. Use it on feature implementations, architectural refactors, and any task where “how to do it” is as hard as “doing it.”
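Aider’s architect mode can run both passes in a single invocation: --model picks the architect (planner) and --editor-model picks the model that writes the edits. The wrapper below is a hypothetical convenience function, not part of Aider; its defaults are the model IDs used elsewhere in this article, and DRY_RUN=1 prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical launcher for Aider's two-model architect workflow.
# The architect model plans the change; the editor model writes the file edits.
# Defaults are the model IDs used in this article; override via environment variables.
set -eu

aider_architect() {
  architect="${ARCHITECT_MODEL:-claude-opus-4-7}"
  editor="${EDITOR_MODEL:-deepseek/deepseek-v4-flash}"
  # Rebuild the argument list: fixed flags first, then any files passed in.
  set -- aider --architect --model "$architect" --editor-model "$editor" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # show the command without running it
  else
    "$@"
  fi
}
```

For example, ARCHITECT_MODEL=o3 aider_architect src/services/user_service.py swaps only the planner and keeps the cheaper editor model.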
Voice-to-code
Aider includes voice input via OpenAI’s Whisper API, so you can describe changes by speaking instead of typing. This is genuinely useful for long, specific instructions: “Add error handling to the database connection function in db.py, catch ConnectionError and retry three times with exponential backoff with jitter” is faster to speak than to type. Whisper handles programming vocabulary well: function names, library names, CLI flags. The Whisper API adds a small per-request cost (~$0.006 per minute of audio at standard pricing). Start dictating with the /voice command inside an Aider session; it uses your OpenAI API key.
Git-native workflow
Every edit the AI makes is automatically committed with a clear, descriptive commit message. You end up with a Git history that reads like a changelog of exactly what the model did. When a code reviewer looks at the PR, they can see precisely which changes were human-written and which were AI-generated. That transparency changes the review dynamic.
The Git integration also makes aggressive experimentation safe. Tell Aider to try an ambitious refactor, then back it out with git revert (or /undo inside the session for the most recent Aider commit) if it goes sideways. Instead of carefully crafting prompts to avoid mistakes, you let it move fast and use Git as the undo button. This is a fundamentally different trust model from GUI-based tools, where you preview changes in an accept/reject panel.
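Because each Aider edit is its own commit, undoing one is plain Git. The sketch below reverts HEAD only when the author name contains “(aider)”, which is how Aider attributes its commits by default (an assumption worth checking against your version’s attribution settings); git revert records the undo as a new commit instead of rewriting history. The function name undo_last_ai_commit is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: revert the most recent commit only if Aider made it.
# Assumes Aider's default attribution, which appends "(aider)" to the Git author name;
# adjust the pattern if you changed Aider's attribution settings.
set -eu

undo_last_ai_commit() {
  repo="${1:-.}"
  author=$(git -C "$repo" log -1 --format='%an')
  case "$author" in
    *"(aider)"*)
      # git revert adds a new commit that undoes HEAD, so history stays intact.
      git -C "$repo" revert --no-edit HEAD >/dev/null
      ;;
    *)
      echo "HEAD was not authored by Aider: $author" >&2
      return 1
      ;;
  esac
}
```

A multi-commit refactor can be backed out the same way with git revert over a commit range; the guard above deliberately handles only the latest commit.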
Cost: what you will actually pay
Aider is free; the API usage is not. Real monthly cost examples at moderate daily use:
| Usage profile | Typical monthly cost |
|---|---|
| Light use (a few tasks/day, cheaper models) | $5-20 |
| Moderate use (Claude Opus 4.7 primary + DeepSeek for simple tasks) | $45-80 |
| Heavy use with large codebases (mostly Claude Opus 4.7) | $80-150 |
| Single major refactor session on a large codebase | $20-40 per session |
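The key monitoring command is /tokens: inside an Aider session it reports how many tokens the current chat context uses (system prompt, chat history, repo map, and each added file) and the estimated cost of sending it. Aider also prints the tokens sent and received, plus the message and session cost, after each reply. Use both. Without monitoring, it is easy to burn $20 in a long session with large files in context, especially with Opus 4.7 at $5/$25 per M tokens.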
The pay-per-use model is better than a subscription for developers with variable workloads: quiet weeks cost almost nothing. Major refactor weeks can spike. For developers who want predictable costs, a Cursor Pro subscription at $20/month may be preferable.
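The arithmetic behind those budgets is simple: tokens times the per-million price, summed over input and output. A minimal sketch, with prices passed in as arguments so they can track whatever your provider charges; session_cost is a hypothetical helper, and the examples use the Opus 4.7 and DeepSeek V4 Flash rates quoted in this article.

```bash
#!/usr/bin/env bash
# Estimate a session's API cost from token counts and per-million-token prices.
# Usage: session_cost INPUT_TOKENS OUTPUT_TOKENS INPUT_PRICE_PER_M OUTPUT_PRICE_PER_M
set -eu

session_cost() {
  # awk does the floating-point math that plain shell arithmetic cannot.
  awk -v i="$1" -v o="$2" -v pi="$3" -v po="$4" \
    'BEGIN { printf "%.2f\n", (i * pi + o * po) / 1000000 }'
}
```

For example, session_cost 2000000 200000 5 25 prints 15.00: 2M input tokens cost $10 and 200k output tokens cost $5 on Opus 4.7. The same token counts on DeepSeek V4 Flash (session_cost 2000000 200000 0.14 0.28) come to 0.34. Input usually dominates long sessions because every turn resends the chat context.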
Where it wins: our 14-task editorial scoring
| Domain | Aider + Opus 4.7 | Aider + GPT-5.3-Codex | vs Cursor Composer 2 |
|---|---|---|---|
| Refactor (single file) | 8.8 | 8.4 | +0.7 |
| Refactor (cross-package) | 8.6 | 8.0 | +0.5 |
| Test-gen | 8.5 | 8.7 | +0.5 |
| Diff-edit cleanliness | 9.4 | 9.0 | +1.1 |
| Strict JSON | 8.0 | 9.0 | +0.8 |
| Editor UX (terminal-only) | 6.5 | 6.5 | -2.7 |
Diff quality is where Aider lives. The diff edit format combined with a strong model produces cleaner unified diffs than any IDE-embedded agent we have tested. The 9.4 is not an artifact: the percent-cases-well-formed metric on Aider’s own leaderboard backs it up, with top models returning well-formed edits in 97-99% of cases. On the recurring r/ChatGPTCoding thread “Aider vs Cursor for refactor,” the consensus is the same: Aider produces a diff a senior engineer would write; Cursor and Claude Code produce a diff a senior engineer would have to clean up.
Where it loses
No autocomplete. There are no inline suggestions while you type. If you are coming from Cursor or GitHub Copilot, you will immediately notice the absence. Aider is entirely chat-driven; you have to actively ask for every change. There is no passive productivity gain.
Terminal-only. No visual diff previews, no inline annotations, no click-to-accept workflow. You need to be comfortable reading Git diffs in the terminal or using a separate diff tool like delta. The UX gap versus Cursor is real and the 6.5 editor UX score reflects it.
No parallel agents. Aider runs one session at a time. If you want /best-of-n across multiple models on the same task, take Cursor 3 with the Agents Window.
API cost unpredictability. There is no spending cap by default. A runaway session with a large codebase and aggressive model selection can hit $30-40 in a single afternoon. The /tokens command is your monitoring tool; use it actively.
The best Aider stacks for April 2026
| Goal | Recommended stack | Why |
|---|---|---|
| Cheapest competitive | Aider + DeepSeek V4 Flash | $0.14/$0.28 per M tokens (April 24, 2026), ~5x cheaper than V3.2, GPT-5.4-level coding quality per DeepSeek’s tech report |
| Best refactor quality | Aider + Claude Opus 4.7 | SWE-bench Pro 64.3% leader; clean diffs out of the box |
| Best code-only quality | Aider + GPT-5.3-Codex | Polyglot leader at GPT-5 level; strict JSON mode for any JSON tool tasks |
| Local-first / air-gapped | Aider + Qwen3 235B via Ollama | Strong SWE-bench numbers among open weights; no data leaves your machine |
| Cost-conscious daily use | Aider + DeepSeek V4 Flash (default) + Opus 4.7 (architect pass only) | Run cheap for execution, expensive only for design decisions |
Setup in 90 seconds
```bash
pip install aider-chat

# Set your API keys
export ANTHROPIC_API_KEY=your-key
export OPENAI_API_KEY=your-key
export DEEPSEEK_API_KEY=your-key

# Launch with Claude Opus 4.7 (best refactor quality)
aider --model claude-opus-4-7

# Launch with DeepSeek V4 Flash (cheapest frontier)
aider --model deepseek/deepseek-v4-flash

# Two-pass architect + editor workflow in one session:
# --model plans the change, --editor-model writes the edits
aider --architect --model claude-opus-4-7 --editor-model deepseek/deepseek-v4-flash

# The commands below are typed at the aider prompt, not in your shell.
# Add files to context
/add src/services/user_service.py src/models/user.py
# Check token usage and cost
/tokens
```
The official install guide covers Windows, Conda, and Docker. The model selection page has every supported provider including Ollama for local models.
Aider vs Cline
Both are free and open-source, but they serve different workflows. Cline lives inside VS Code and offers a richer UI experience with plan/act modes and browser automation built in. Aider lives in the terminal and offers superior Git integration, model flexibility, and SSH/headless support. If you want an open-source tool inside your editor without leaving VS Code, choose Cline. If you want an open-source tool with the cleanest Git workflow and the ability to run on remote servers, choose Aider. Many developers use both and switch by context.
What the threads are saying
Three patterns dominate the Aider community discussion:
- The leaderboard is the trust unlock. r/LocalLLaMA refers back to Aider’s polyglot more than any other public benchmark when comparing open-weight coding models. The cost-per-run column is what makes it useful for purchasing decisions, not just raw accuracy comparisons.
- Diff format wins on quality. The recurring “diff vs whole” question on the Aider discussions tab lands on the same answer: diff for any file you are not creating from scratch, whole only for greenfield. The benchmark numbers back it.
- Maintainer responsiveness is real. Paul Gauthier ships releases roughly weekly, and HISTORY.md shows support for new models landing within days of their release. The Claude Sonnet 4 / Opus 4 series landed in v0.84 four days after launch.
How it compares
| TCC editorial score | Aider | Cursor 3 + Composer 2 | Claude Code + Opus 4.7 | Windsurf 2.0 + Cascade | GitHub Copilot |
|---|---|---|---|---|---|
| Diff quality | 9.4 | 8.3 | 8.6 | 8.4 | 7.8 |
| Editor UX | 6.5 | 9.4 | 7.8 | 9.0 | 8.5 |
| Model flexibility | 9.5 | 8.7 | 6.0 (Anthropic only) | 8.7 | 7.0 |
| Cost to run | $0 + API | $20-$200/mo | $20-$200/mo | $20-$200/mo | $10/mo |
| Parallel agents | 4.0 | 9.1 | 7.6 | 7.8 | 7.2 |
| SSH / headless support | 9.5 | n/a | 7.0 | n/a | 6.0 |
Pros and cons
| Strengths | Weaknesses |
|---|---|
| Git-native: every AI edit is a clean, revertible commit | No autocomplete; entirely chat-driven, no passive suggestions |
| Best diff quality of any tool we tested (9.4/10) | Terminal-only; no visual diff preview or click-to-accept UI |
| Model-agnostic: swap between Claude, GPT, DeepSeek, Ollama | API costs unpredictable without active /tokens monitoring |
| Free and open-source (Apache 2.0) | No parallel agents; one session at a time |
| Runs on remote servers over SSH, CI pipelines, headless envs | Learning curve steeper than GUI tools; 2-3 days to internalize |
| Transparent benchmark publisher with cost-per-run data | Leaderboard not updated for April 2026 frontier models yet |
Frequently asked questions
Is Aider completely free? Aider the tool is free and open-source. You pay only for the LLM API usage on the model you choose. Light users can keep costs under $20/month; heavy users running Claude Opus 4.7 as the primary model should budget $45-80/month or more depending on session intensity.
Does Aider have autocomplete? No. Aider is entirely chat-driven. There are no inline suggestions while you type. If autocomplete is part of your workflow, Aider is not the right primary tool. Use it alongside Cursor or GitHub Copilot rather than as a replacement.
Can I use Aider with local models? Yes. Aider supports Ollama and any OpenAI-compatible API endpoint. Point it at a locally running Qwen3 or Llama 4 instance with --model ollama_chat/qwen3:235b (Aider’s Ollama docs recommend the ollama_chat/ prefix over ollama/, with OLLAMA_API_BASE set to your server’s address) and your code never leaves your machine. Useful for air-gapped environments and sensitive codebases.
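A minimal launcher for the local setup, assuming Ollama is already serving on its default port (11434) and the model has been pulled. It sets OLLAMA_API_BASE, the variable Aider’s Ollama docs use for the server address, and uses the ollama_chat/ model prefix those docs recommend. aider_local is a hypothetical name, and DRY_RUN=1 prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical launcher for a local model served by Ollama.
# Assumes Ollama is running on its default port (11434) and the model is already pulled,
# for example with: ollama pull qwen3:235b
set -eu

aider_local() {
  model="${1:-qwen3:235b}"
  if [ "$#" -gt 0 ]; then shift; fi
  base="${OLLAMA_API_BASE:-http://127.0.0.1:11434}"
  # Aider's Ollama docs recommend the ollama_chat/ prefix over ollama/.
  set -- aider --model "ollama_chat/$model" "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'OLLAMA_API_BASE=%s %s\n' "$base" "$*"
  else
    # Pass the server address only to the aider process.
    OLLAMA_API_BASE="$base" "$@"
  fi
}
```

Any arguments after the model name (such as files to add) are passed straight through to aider.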
How does Aider compare to Claude Code? Claude Code is better at open-ended reasoning, exploration, and complex multi-step tasks with a more guided experience. Aider is better at targeted file edits with clean Git integration and model flexibility. Claude Code is a thinking partner; Aider is an executing partner. Many developers run both and switch by task type.
How do I prevent runaway API costs? Use the /tokens command in every session to monitor spend. For long sessions, set a budget prompt at the start: “I want you to tell me before making any change that will consume more than 10,000 tokens.” For larger repos, use --map-tokens 1024 to limit repo map overhead. Consider running DeepSeek V4 Flash for execution passes and reserving Opus 4.7 for architect/planning passes only.
What is the two-pass architect + code workflow? Launch aider --architect with a strong reasoning model (Claude Opus 4.7) as --model and an execution-focused model (DeepSeek V4 Flash or GPT-5.3-Codex) as --editor-model. The architect model designs the approach without making file edits; the editor model then implements the plan. The split costs more per task but produces cleaner results on anything involving real design decisions.
Verdict
Aider is the right tool when you care about diff quality, model flexibility, and audit trail more than IDE polish. It is the cheapest path to frontier-quality coding (DeepSeek V4 Flash, launched April 24, 2026 at $0.14/$0.28 per M tokens, is expected to run roughly 5x cheaper than V3.2 and, per DeepSeek’s own tech report, at GPT-5.4-level coding quality) and the most honest benchmark publisher in the category. Pair it with Claude Opus 4.7 for hard refactors, with GPT-5.3-Codex for strict-JSON pipelines, and with DeepSeek V4 Flash when budget is the binding constraint. Use the architect + code two-pass workflow on anything where design matters as much as implementation.
For the methodology behind the scores, see the 14-task scorecard. For the model comparison context, see the Claude Opus 4.7 review and the GPT-5.3-Codex review. For the Cursor comparison, see the Cursor 3 + Composer 2 review.