How Good is Cursor's Composer 2?
Breaking News from Anysphere.
I’ve been thinking a lot about Anysphere, the company behind Cursor. Today they announced their new model, Composer 2. Building capable in-house models is a big challenge for a relatively small team of about 400 employees.
The thing is, it’s going to surprise some people. a16z.
Introducing Composer 2
Composer 2 is now available in Cursor.
It’s frontier-level at coding and priced at $0.50/M input and $2.50/M output tokens.
Frontier-level coding intelligence
Cursor says it is rapidly improving the quality of its models. Composer 2 delivers large improvements on all benchmarks the team measures, including Terminal-Bench 2.0 and SWE-Bench Multilingual:
[Benchmark table: Model | CursorBench | Terminal-Bench 2.0; scores not preserved]
The Cursor team trained Composer for long-horizon tasks through reinforcement learning. Composer 2 is able to solve challenging coding tasks requiring hundreds of actions.
The team was also able to significantly improve both the model quality and the cost to serve Composer 2. These quality improvements come from Cursor’s first continued pretraining run, which provides a far stronger base for scaling its reinforcement learning.
More for Less
Composer 2 is priced at $0.50/M input and $2.50/M output tokens.
There is also a faster variant with the same intelligence at $1.50/M input and $7.50/M output tokens, which is cheaper than other fast models. Cursor is making the fast variant the default option.
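To make the pricing concrete, here is a minimal sketch of the arithmetic. The per-million-token prices come from the announcement above; the variant labels and the workload (2M input tokens, 200k output tokens) are my own hypothetical choices, not figures from Cursor.

```python
# Prices in USD per million tokens, taken from the announcement quoted above.
# The dictionary keys are hypothetical labels, not official model identifiers.
PRICES = {
    "composer-2": {"input": 0.50, "output": 2.50},
    "composer-2-fast": {"input": 1.50, "output": 7.50},
}


def session_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a session for the given variant."""
    p = PRICES[variant]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 2M input tokens and 200k output tokens.
standard = session_cost("composer-2", 2_000_000, 200_000)
fast = session_cost("composer-2-fast", 2_000_000, 200_000)
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")
# prints "standard: $1.50, fast: $4.50"
```

Since both the input and output prices of the fast variant are exactly three times the standard ones, the fast variant costs 3x as much for any mix of input and output tokens.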
On individual plans, Composer usage is part of a standalone usage pool with generous usage included. Try Composer 2 today in Cursor or in the early alpha of Cursor’s new interface.
Old and New
Cursor’s pace of feature releases has noticeably picked up since the company hit a $2 billion ARR run rate (as of February 2026), about 33 months after being founded.
How Fast Is Cursor Iterating?
5 Months, 3 Generations
Composer 2 is the third Composer release since October. Cursor shipped the original Composer model, along with its 2.0 platform redesign, in October 2025. Composer 1.5 followed this February, and at the time, it was still trailing Opus 4.6 by 10% on Terminal-Bench 2.0. - The New Stack
Using RL
DeepSeek and Cursor are using RL in new and innovative ways from what I can tell. Cursor’s approach, which the team calls compaction-in-the-loop reinforcement learning, builds summarization directly into the training loop. When a generation hits a token-length trigger, the model pauses and compresses its own context to roughly 1,000 tokens, down from 5,000 or more with more traditional methods. Because the reinforcement learning reward the team used when training the model covers the entire chain, including the summarization steps, the model learns which details to keep and which to discard.
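Here is a toy sketch of the trigger mechanism as I understand it from the description above. It is not Cursor’s implementation: the trigger value of 8,000 tokens, the whitespace token counter, and the `summarize` placeholder (which just keeps the most recent words) are all assumptions made for illustration. In the real system, the summary would be generated by the model itself.

```python
from dataclasses import dataclass, field

TRIGGER_TOKENS = 8_000  # hypothetical; the source only mentions "a token-length trigger"
SUMMARY_BUDGET = 1_000  # "roughly 1,000 tokens", per the description above


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(history: list[str], budget: int) -> str:
    # Placeholder for the model's learned self-summary: keep the most recent words.
    words = " ".join(history).split()
    return " ".join(words[-budget:])


@dataclass
class Context:
    history: list[str] = field(default_factory=list)
    compactions: int = 0

    def tokens(self) -> int:
        return sum(count_tokens(chunk) for chunk in self.history)

    def append(self, chunk: str) -> None:
        # Add new output, then compact the whole history once the trigger is hit.
        self.history.append(chunk)
        if self.tokens() >= TRIGGER_TOKENS:
            self.history = [summarize(self.history, SUMMARY_BUDGET)]
            self.compactions += 1


ctx = Context()
for _ in range(100):
    ctx.append("tok " * 200)  # each simulated step adds 200 tokens

print(ctx.compactions, ctx.tokens())  # prints "2 6000"
```

With these made-up numbers, a 100-step run compacts twice and never holds more than 8,000 tokens at once, even though the steps produce 20,000 tokens in total.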
Why Compaction-in-the-Loop Matters
This is used to solve the “long-horizon” problem in AI coding agents.
Traditionally, when an AI agent runs out of context window space, a separate “harness” or prompt is used to summarize the past history. This often causes the model to “forget” critical technical details, leading to errors in complex, multi-hour coding tasks.
How it Works
Instead of treating summarization as an external post-processing step, Cursor baked it directly into the model’s training:
Self-Summarization: During training, the model is intentionally pushed to its context limit. It is then rewarded via RL for generating a “compacted” version of its own history that successfully allows it to complete a task.
The “Loop”: The compaction process is literally “in the loop” of the RL trajectory. If the model summarizes poorly and loses a vital piece of information (like a specific variable name or a past bug fix), it fails the task and receives a negative reward.
Token Efficiency: Because the model learns exactly what is important to keep for a developer’s workflow, these summaries are roughly 5x more token-efficient than standard prompt-based summaries.
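The key idea in the points above is that the summarization steps get the same outcome signal as the coding actions. A minimal sketch of that credit assignment, under my own assumptions: the `Step` type, the +1/-1 reward values, and the uniform broadcast of the final outcome to every step are hypothetical simplifications, not details confirmed by Cursor.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str  # "action" or "compaction"
    reward: float = 0.0


def assign_outcome_reward(trajectory: list[Step], task_succeeded: bool) -> list[Step]:
    """Give every step, compaction steps included, the same terminal outcome signal."""
    outcome = 1.0 if task_succeeded else -1.0  # hypothetical reward values
    return [Step(step.kind, outcome) for step in trajectory]


# A short trajectory in which the model compacts its context once mid-task.
trajectory = [Step("action"), Step("action"), Step("compaction"), Step("action")]

# If the task fails (say, the summary dropped a variable name), the
# compaction step is penalized along with everything else.
failed = assign_outcome_reward(trajectory, task_succeeded=False)
compaction_rewards = [step.reward for step in failed if step.kind == "compaction"]
print(compaction_rewards)  # prints "[-1.0]"
```

Because the compaction step shares in the task’s outcome, a policy trained this way has a direct incentive to keep the details that later steps depend on.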
According to Cursor’s research released in March 2026, this approach:
Reduces Compaction Error by 50%: It significantly lowers the “forgetting” rate compared to previous state-of-the-art methods.
Enables “Project-Scale” Refactors: It allows the Composer agent to work through hundreds of sequential actions (like refactoring an entire repository) without losing the thread of the original goal.
Reuses KV Cache: Technically, it’s optimized to reuse existing computations, making the transition between “full context” and “compacted context” nearly instant for the user.
Faster, Cheaper
Anyway, I obviously can’t verify any of this; it’s just really interesting model news. Remember that rumor on X? A mysterious, unattributed AI model called Hunter Alpha (sometimes referred to as “Alpha Hunter”) appeared anonymously on the OpenRouter platform around March 11, 2026. It quickly surged to the top of the platform’s usage leaderboard after users tested it, racking up over a trillion tokens in usage in a very short time.
Not sure if it’s DeepSeek or this, but it warrants watching. According to one account, the model was later revealed to be an early internal test build from Xiaomi’s AI team (MiMo-V2-Pro), run by former DeepSeek researchers. Xiaomi confirmed it was designed as a flagship “brain” for AI agents (tool-using systems that handle complex tasks autonomously).
Mysterious anonymous models have appeared on LMSYS/Chatbot Arena in the past and turned out to be previews from big labs.
So you can see why I’m a little bit excited. DeepSeek-R2 can’t be far away by now either. Here’s the link to the Xiaomi MiMo-V2-Pro model launch. Here’s my deep dive on Cursor’s business growth:
If Composer 2 is as good as it looks to be, Anysphere is for real in 2026.

It’s actually great at coding. The challenge comes when doing more than just coding, such as running long-term tasks like operating or maintaining systems. Composer doesn’t have the reasoning depth that Opus 4.6 has. That said, if all you’re doing is straight coding, then it’s a solid model.