Cursor 3 shipped the Agents Window in late February 2026, and Composer 2 followed on March 19. The combination is what r/cursor has been calling “the first IDE-native parallel agent stack that does not eat itself”: multiple agent panes, /best-of-n that fans the same task across models and lets you pick the best result, background cloud agents that survive a closed laptop, and a local-to-cloud session handoff. Composer 2 itself posts 61.3 on CursorBench (Anysphere’s first-party benchmark, +39% over Composer 1.5), runs at 200+ tokens per second on the fast variant, and prices the standard tier at $0.50/M input and $2.50/M output. CursorBench is Anysphere’s own benchmark, so the score warrants caution; the independent take from TokenMix walks through what to trust and what is still pending. This is the review.
Quick answer: if you live in VS Code, want parallel agents in one workspace, and care about cost more than absolute model quality, Cursor 3 with Composer 2 is the strongest editor-native answer in April 2026. If you need the absolute best model on every task, route Cursor to Claude Opus 4.7 or run Claude Code in parallel.
What Cursor 3 ships
- Agents Window. All local and cloud agents live in one sidebar. You can kick off agents from desktop, mobile, web, Slack, GitHub, or Linear and they all surface here.
- /best-of-n. Run the same task across multiple models in parallel, compare outputs, choose the strongest. The eWeek launch coverage explains the team’s motivation; on r/cursor the early consensus is that /best-of-n is the killer feature for hard one-shot refactors where one model just gets it. As of April 24, 2026 the model picker includes Composer 2, Claude Opus 4.7, GPT-5.4, GPT-5.5 (added the day after OpenAI’s April 23 launch), GPT-5.3-Codex, and Gemini 3.1 Pro. GPT-5.5 in particular is worth the slot when terminal-style agent tasks are part of the mix.
- Background cloud agents. Long-running tasks (8-hour refactors, codebase migrations) run on Anysphere’s cloud and survive your laptop closing. Beta-quality per community reports; Pro tier has a 2-hour soft cap, Max removes it.
- Local-to-cloud handoff. Move an agent session from cloud to local to make hand edits or run tests on your machine, then push it back to cloud to keep running while you context-switch.
- Design Mode. Iterate on UI from a screenshot or sketch directly in the editor. Confirmed GA per the Cursor 3 feature page.
- Multi-repo workspace. One window, several repos, agents that move between them.
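The /best-of-n idea in the list above is a fan-out-and-select pattern. The sketch below illustrates that pattern in plain Python; it is not Cursor's implementation or API. `best_of_n`, the model callables, and the `score` function are all hypothetical stand-ins. In Cursor the selection step is a person comparing outputs; here a scoring function takes that role so the example runs end to end.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in for a model call: takes a task prompt, returns a candidate result.
ModelFn = Callable[[str], str]


def best_of_n(
    task: str,
    models: dict[str, ModelFn],
    score: Callable[[str], float],
) -> tuple[str, str]:
    """Run `task` on every model in parallel; return (model_name, output) with the top score."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # Fan out: one future per model, all started before any result is awaited.
        futures = {name: pool.submit(fn, task) for name, fn in models.items()}
        results = {name: future.result() for name, future in futures.items()}
    # Select: highest score wins; on a tie, the model listed first wins.
    winner = max(results, key=lambda name: score(results[name]))
    return winner, results[winner]


if __name__ == "__main__":
    # Toy models and a toy scorer (shorter output scores higher), for illustration only.
    toy_models = {
        "model-a": lambda t: f"{t}: long verbose patch",
        "model-b": lambda t: f"{t}: patch",
    }
    name, output = best_of_n("rename foo", toy_models, score=lambda out: -len(out))
    print(name, "->", output)
```

The sketch assumes at least one model is supplied. A real scorer would more plausibly run the test suite against each candidate than measure length.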
Composer 2: the in-house model
Composer 2 is Cursor’s own agentic coding model and is now the default in Auto mode. The official numbers per Cursor’s model docs:
- 61.3 on CursorBench, +39% over Composer 1.5.
- 200+ tokens per second on the fast variant via custom GPU kernels (snapshot from Cursor traffic on March 18, 2026).
- Pricing: $0.50/M input and $2.50/M output on the standard tier; $1.50/M and $7.50/M on the fast variant. Fast is the default.
- Tuned for tool use, file edits, and terminal operations inside Cursor.
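To make the per-token pricing concrete, here is a minimal cost sketch using the four prices quoted above. The function name and the example token counts (40,000 input, 2,000 output) are illustrative assumptions, not measured Cursor traffic.

```python
# Per-million-token prices in USD, as quoted in the list above.
PRICES = {
    "standard": {"input": 0.50, "output": 2.50},
    "fast": {"input": 1.50, "output": 7.50},
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request on the given Composer 2 tier."""
    price = PRICES[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 40,000 tokens of context in, 2,000 tokens of edits out.
    for tier in PRICES:
        print(f"{tier}: ${request_cost(tier, 40_000, 2_000):.3f}")
```

For that hypothetical request the standard tier comes to $0.025 and the fast variant to $0.075. Both fast prices are three times the standard ones, so the ratio holds for any mix of input and output tokens.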
Caveat: CursorBench is Anysphere’s own benchmark and is not directly comparable to SWE-bench Verified or Aider polyglot. The independent TokenMix review flags the same point. On the public leaderboards Claude Opus 4.7 still leads SWE-Bench Pro at 64.3%, while OpenAI’s GPT-5.5 (released April 23, 2026) takes Terminal-Bench 2.0 at 82.7% and the Artificial Analysis Intelligence Index at 60. Composer 2’s value is not “best on every benchmark”; it is “best price-per-token at frontier-quality inside the editor where you are already typing”, and it picks up the new frontier models the day they ship via /best-of-n routing.
Where it wins, in our 14-task editorial scoring
| Domain | Composer 2 (auto) | Δ vs Claude Opus 4.7 | Δ vs GPT-5.3-Codex |
|---|---|---|---|
| Refactor (multi-file) | 8.1 | -0.9 | -0.3 |
| Test-gen | 8.0 | -0.4 | -0.7 |
| Debug | 8.2 | -0.6 | -0.2 |
| Agent & tool use (parallel) | 8.3 | -0.8 | -0.3 |
| Strict JSON | 8.0 | -0.2 | -1.0 |
| Daily editor flow (latency-adjusted) | 9.2 | +1.5 | +1.2 |
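The two comparison columns are signed differences, so the other models' absolute scores can be backed out. The sketch below assumes each delta is the Composer 2 score minus the other model's score (a negative number means Composer 2 scored lower); that reading is an assumption, and the helper name is illustrative.

```python
# (Composer 2 score, delta vs Claude Opus 4.7, delta vs GPT-5.3-Codex), copied from the table.
ROWS = {
    "Refactor (multi-file)": (8.1, -0.9, -0.3),
    "Test-gen": (8.0, -0.4, -0.7),
    "Debug": (8.2, -0.6, -0.2),
    "Agent & tool use (parallel)": (8.3, -0.8, -0.3),
    "Strict JSON": (8.0, -0.2, -1.0),
    "Daily editor flow (latency-adjusted)": (9.2, 1.5, 1.2),
}


def implied_scores(composer: float, delta_opus: float, delta_codex: float) -> tuple[float, float]:
    """Back out the other models' scores, assuming delta = Composer 2 minus other model."""
    return round(composer - delta_opus, 1), round(composer - delta_codex, 1)


if __name__ == "__main__":
    for domain, row in ROWS.items():
        opus, codex = implied_scores(*row)
        print(f"{domain}: Composer 2 {row[0]}, Opus 4.7 {opus}, GPT-5.3-Codex {codex}")
```

Under that assumption, the multi-file refactor row implies 9.0 for Opus 4.7 and 8.4 for GPT-5.3-Codex, while the daily editor flow row implies 7.7 and 8.0.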
The 9.2 on daily editor flow is the number that matters. Composer 2 at 200 tok/s feels closer to autocomplete than to an agent. The first 5 minutes of a session are noticeably faster than running Opus 4.7 inside Claude Code, and the Cursor UX (file diffs, accept/reject, multi-file preview) cuts the back-and-forth. Over a sustained 8-hour coding day, the latency advantage compounds.
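A rough way to see how the latency gap compounds is to multiply it out. The sketch below uses the 200 tok/s figure from this section and the 60 tok/s comparison point that r/cursor users quote. The 2,000-token response size and the 20 requests per hour are illustrative assumptions, and the model ignores time-to-first-token and tool-call overhead.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream a response at a steady rate (no startup or tool-call time)."""
    return output_tokens / tokens_per_second


def hourly_wait_minutes(output_tokens: int, tokens_per_second: float, requests_per_hour: int) -> float:
    """Minutes per hour spent waiting on generation at the given request rate."""
    return generation_seconds(output_tokens, tokens_per_second) * requests_per_hour / 60


if __name__ == "__main__":
    # Hypothetical workload: 2,000-token responses, 20 agent requests per hour.
    for rate in (200, 60):
        secs = generation_seconds(2_000, rate)
        mins = hourly_wait_minutes(2_000, rate, 20)
        print(f"{rate} tok/s: {secs:.1f} s per response, {mins:.1f} min waiting per hour")
```

With those assumptions a response takes about 10 seconds at 200 tok/s and about 33 seconds at 60 tok/s, which works out to roughly 3 versus 11 minutes of waiting per hour.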
Where it loses
The hardest cross-package refactors. Composer 2 hits the same re-exported-types pattern that our refactor TypeScript guide describes as the “barrel-file trap”; Opus 4.7 catches the indirected call sites more often. The fix in Cursor is to either run /best-of-n with Opus 4.7 in the mix, or to switch the same task to Claude Opus 4.7 directly. Composer 2 + Opus 4.7 in the same Cursor workspace is a real workflow, not a fork in the road.
Pricing and plans
| Plan | Price | What you get |
|---|---|---|
| Free | $0 | ~50 slow Composer requests/day, all paid models BYOK |
| Pro | $20/mo | Generous Composer pool, all premium models, parallel agents, 2-hour soft cap on background agents |
| Max | $200/mo | Background cloud agents without the cap, priority GPU, heavier limits |
| Business | $40/user/mo | Admin controls, team policies, SSO |
The Pro tier is the same price as Windsurf Pro and ChatGPT Plus; Max is the same price as ChatGPT Pro. If you already pay for Claude Pro at $20 and ChatGPT Plus at $20, adding Cursor Pro is the third $20 and, of the three, it is the one most teams say they feel the most. If you only have budget for one, the recurring “Cursor or Claude Code” thread on r/cursor splits cleanly: VS Code people pick Cursor, terminal people pick Claude Code.
What the threads are saying
Three patterns dominate r/cursor since the Composer 2 launch:
- Speed is the unlock. 200 tok/s is the number people quote. Once you have used Composer 2 for a week, switching back to a 60 tok/s model feels jarring.
- Background agents are still beta. The 8-hour migration story works; merge resolution when parallel agents touch the same file is rough. Anysphere is shipping fixes weekly. Pin the version if you need reproducible behavior.
- The SpaceX rumor. A reported $60B Cursor acquisition by SpaceX has been in negotiation since mid-April. As of April 22, 2026 it has not closed; Composer 2 stays as default in Auto mode regardless. Sentiment on r/cursor is mixed: some welcome the resources, others worry about the editor’s roadmap.
How it compares
| TCC editorial score | Cursor 3 + Composer 2 | Claude Code + Opus 4.7 | Windsurf 2.0 + Cascade | Aider + Opus 4.7 |
|---|---|---|---|---|
| Editor UX | 9.4 | 7.8 | 9.0 | 6.5 (terminal) |
| Best model on hard refactor | 8.6 (with /best-of-n) | 9.0 | 8.4 | 8.8 |
| Parallel agents | 9.1 | 7.6 | 7.8 | n/a |
| Background long-running | 8.4 (Max) | 8.7 (Routines) | 8.0 (Max + Devin) | n/a |
| Monthly cost, floor to ceiling | $20-$200 | $20-$200 | $20-$200 | API-pass-through |
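One way to turn the table into a decision is to weight the scored rows by how much each matters in your own day. The sketch below is an illustration under stated assumptions: the scores are copied from the table, but the weights, the criterion keys, and the `weighted_score` helper are hypothetical and not part of the TCC methodology. An "n/a" cell is treated as missing, so a tool with no score on a criterion you care about is reported as not comparable.

```python
from typing import Optional

# Scores copied from the comparison table; None marks the "n/a" cells.
SCORES: dict[str, dict[str, Optional[float]]] = {
    "Cursor 3 + Composer 2": {"editor_ux": 9.4, "hard_refactor": 8.6, "parallel": 9.1, "background": 8.4},
    "Claude Code + Opus 4.7": {"editor_ux": 7.8, "hard_refactor": 9.0, "parallel": 7.6, "background": 8.7},
    "Windsurf 2.0 + Cascade": {"editor_ux": 9.0, "hard_refactor": 8.4, "parallel": 7.8, "background": 8.0},
    "Aider + Opus 4.7": {"editor_ux": 6.5, "hard_refactor": 8.8, "parallel": None, "background": None},
}

# Hypothetical weights for someone who spends most of the day in the editor.
EDITOR_HEAVY = {"editor_ux": 0.4, "hard_refactor": 0.3, "parallel": 0.2, "background": 0.1}


def weighted_score(scores: dict[str, Optional[float]], weights: dict[str, float]) -> Optional[float]:
    """Weighted average over the weighted criteria; None if a needed score is missing."""
    total = 0.0
    for criterion, weight in weights.items():
        value = scores.get(criterion)
        if value is None:
            if weight > 0:
                return None  # Tool has no score on a criterion that matters here.
            continue
        total += value * weight
    return total / sum(weights.values())


if __name__ == "__main__":
    for tool, scores in SCORES.items():
        result = weighted_score(scores, EDITOR_HEAVY)
        print(tool, "not comparable" if result is None else f"{result:.2f}")
```

Changing the weights changes the ranking: with all the weight on `hard_refactor`, for instance, the two Opus 4.7 setups come out ahead. The output is only as meaningful as the weights you feed it, and at least one weight must be positive.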
Verdict
Cursor 3 is the most consequential editor release of 2026 and Composer 2 is the model that makes the upgrade worth running on day one. The Anysphere benchmark is first-party, so anchor your decision on the public leaderboards (Opus 4.7 still leads) and on what your day actually looks like. If you switch in and out of agents twenty times an hour, Composer 2 at 200 tok/s changes the loop. If you run an overnight migration, take the Max tier or run the job in Claude Code Routines.
Pair this with the Cursor 3 shortcuts cheatsheet and the Cursor 3 parallel agents trend post. For the methodology behind every score above, see the 14-task scorecard.