Claude Opus 4.6: Anthropic Just Dropped Its Most Intelligent Model and Wall Street Is Paying Attention


1M token context, agent teams, adaptive thinking, and the benchmarks that outpace GPT-5.2

By LDS Team February 6, 2026

Three months after releasing Opus 4.5, Anthropic has launched Claude Opus 4.6—an upgrade so significant that software stocks dropped further on the news. The model claims the top spot on nearly every major benchmark, introduces agent teams that let multiple AI instances collaborate in parallel, and expands the context window to one million tokens. It is available now on claude.ai, the Claude API (model ID: claude-opus-4-6), Amazon Bedrock, Google Cloud Vertex AI, and Snowflake Cortex AI.

This is not a minor version bump. This is Anthropic signaling where it thinks AI is headed—from tools that answer questions to agents that do the work.

What Is Claude Opus 4.6?

Claude Opus 4.6 is Anthropic's most capable AI model, purpose-built for coding, enterprise agents, and professional knowledge work.

It builds directly on the foundation of Opus 4.5, released in November 2025, but with meaningful improvements across the board: a 5x larger context window, better performance on long-document retrieval, stronger coding abilities, and a new architecture for running multiple agents simultaneously.

The key upgrades at a glance:

| Feature | Opus 4.5 | Opus 4.6 |
|---|---|---|
| Context window | 200K tokens | 1M tokens (beta) |
| Max output tokens | 64K | 128K |
| Terminal-Bench 2.0 | 59.8% | 65.4% |
| Thinking mode | Extended thinking | Adaptive thinking (4 effort levels) |
| Agent architecture | Single agent | Agent teams (parallel) |
| Context management | Manual | Auto-compaction (beta) |

Dianne Penn, Anthropic's head of product management for research, put it plainly: "We think that Opus 4.6 is going to be an inflection point for knowledge work in many ways."

The Benchmarks: Where Opus 4.6 Stands

Numbers tell a clear story. Opus 4.6 leads across multiple independent evaluations, including several where it outperforms OpenAI's GPT-5.2.

| Benchmark | What It Measures | Opus 4.6 Result |
|---|---|---|
| GDPval-AA | Economically valuable knowledge work (finance, legal) | +144 Elo over GPT-5.2, +190 over Opus 4.5. Wins ~70% of head-to-head comparisons. |
| Terminal-Bench 2.0 | Agentic coding and system tasks | 65.4% (highest reported score) |
| Humanity's Last Exam | Multi-discipline reasoning | #1 among all frontier models |
| BrowseComp | Locating hard-to-find information online | Best performance |
| MRCR v2 (8-needle, 1M) | Long-context retrieval accuracy | 76% (vs. 18.5% for Sonnet 4.5) |

The GDPval result is particularly worth noting. This benchmark tests real-world financial and legal analysis tasks—the kind of work traditionally done by junior analysts, paralegals, and consultants. Beating GPT-5.2 by 144 Elo points on this specific evaluation is what sent ripples through Wall Street.

Agent Teams: Multiple Agents, One Task

The headline feature for developers is agent teams—a new capability in Claude Code, Anthropic's CLI tool for software engineers.

Here is how it works: instead of one agent processing tasks one at a time, you can now spin up multiple Claude Code agents that work in parallel. Each agent gets its own context window, can pick up tasks from a shared task list, and communicates directly with the other agents on the team.

Think of it like pair programming, except your pair is a team of AI agents splitting the work.

When Agent Teams Shine

  • Codebase reviews: Multiple agents can review different files simultaneously
  • Large refactors: Each agent handles a different module
  • Research-heavy tasks: Agents independently gather information, then converge
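Conceptually, this is a shared work queue with several workers pulling from it. A minimal Python illustration of the pattern (this is not Claude Code's actual implementation; the agent names and tasks are invented):

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Queue, Empty

def run_agent(name, tasks, results):
    # Each "agent" pulls tasks from the shared queue until it is empty.
    while True:
        try:
            task = tasks.get_nowait()
        except Empty:
            return
        # Stand-in for real work (code review, refactoring, research).
        results.append((name, f"done: {task}"))

tasks = Queue()
for t in ["review auth.py", "review db.py", "review api.py"]:
    tasks.put(t)

results = []
with ThreadPoolExecutor(max_workers=3) as pool:
    for i in range(3):
        pool.submit(run_agent, f"agent-{i}", tasks, results)

print(len(results))  # 3: every task claimed by exactly one agent
```

The key property is the same one Anthropic describes: each worker claims a task exactly once, so the team parallelizes without duplicating work.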

How to Enable It

Agent teams are off by default. You can enable them by setting:

```shell
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
```

You can also take direct control of any individual agent via Shift+Up/Down or through tmux sessions—fitting naturally into terminal-centric developer workflows.

One important note: running multiple agents increases token usage proportionally. More agents means faster results, but also higher API costs.

Adaptive Thinking: The Model Decides How Hard to Think

Previous Claude models offered extended thinking as a binary toggle—on or off. Opus 4.6 replaces this with adaptive thinking, where the model dynamically decides how much reasoning a task requires.

Developers can guide this with four effort levels:

| Effort Level | Behavior |
|---|---|
| Max | Always uses extended thinking with no depth constraints |
| High (default) | Always thinks, provides deep reasoning |
| Medium | Moderate thinking, may skip for simple queries |
| Low | Skips thinking for simple tasks, prioritizes speed |

The recommended setting for most use cases is the default adaptive mode (thinking: {type: "adaptive"}). Claude reads contextual clues to determine how much thinking is appropriate—no developer configuration needed for most tasks.
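In request terms, that means passing a `thinking` block alongside the usual message fields. A sketch of what such a request body might look like, using the `thinking: {type: "adaptive"}` form quoted above (the surrounding field names follow the standard Messages API shape; check Anthropic's API documentation for the exact schema):

```python
# Sketch of a Messages API request body with adaptive thinking enabled.
# Only the "thinking" field comes from this article; treat the rest as
# the familiar request shape, not a verified Opus 4.6 schema.
request = {
    "model": "claude-opus-4-6",
    "max_tokens": 2048,
    "thinking": {"type": "adaptive"},  # model decides how hard to think
    "messages": [
        {"role": "user", "content": "Summarize this 400-page filing."}
    ],
}
```

With this setting there is nothing else to tune: the model itself scales its reasoning depth to the difficulty of the prompt.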

1 Million Token Context Window

Opus 4.6 is the first Opus-class model to support a 1M token context window (currently in beta). That is roughly 3,000 pages of text in a single conversation.

The practical impact is significant. On the MRCR v2 benchmark, which tests whether a model can find specific pieces of information buried in massive documents, Opus 4.6 scores 76% on the hardest variant (8-needle, 1M tokens). Sonnet 4.5 manages just 18.5% on the same test.

This matters for anyone working with large codebases, legal documents, research papers, or financial filings.

For prompts exceeding 200K tokens, premium pricing applies: $10 per million input tokens and $37.50 per million output tokens (compared to $5/$25 at standard usage).
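Those rates make long-context cost easy to estimate. A rough calculator (illustrative only; it assumes the whole request is billed at one tier based on prompt size, which is a simplification of how billing may actually work):

```python
def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough Opus 4.6 cost in dollars, using the rates quoted in the article."""
    if input_tokens > 200_000:            # premium tier for long prompts
        in_rate, out_rate = 10.00, 37.50  # $ per million tokens
    else:                                 # standard tier
        in_rate, out_rate = 5.00, 25.00
    return (input_tokens / 1e6) * in_rate + (output_tokens / 1e6) * out_rate

# A 500K-token prompt with a 10K-token reply:
print(round(estimate_cost(500_000, 10_000), 3))  # 5.375
```

So a single half-million-token prompt costs a few dollars, not pennies: the 1M window is powerful, but worth budgeting for.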

Context Compaction: Infinite Conversations

Long conversations have always been a pain point with LLMs. Hit the context limit, and you either lose context or start over.

Opus 4.6 introduces context compaction (beta)—automatic, server-side summarization of older parts of a conversation. When the context approaches the window limit, the API summarizes earlier exchanges into a compact block, preserving the essentials while freeing up space.

For developers building agents that run for hours or days, this is a practical necessity. It enables effectively infinite conversations without manual context management.
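The article describes compaction as server-side, so developers never implement it themselves. Purely to illustrate the idea, here is a client-side analogue that folds older turns into a single summary placeholder once a history grows past a budget (the summary here is a stub; the real feature would use a model-generated digest):

```python
def compact(messages, keep_recent=4):
    """Fold all but the most recent turns into one summary message.

    Client-side illustration of the compaction concept only; Opus 4.6's
    version runs server-side with a real model-written summary.
    """
    if len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user",
               "content": f"[Summary of {len(older)} earlier messages]"}
    return [summary] + recent

history = [{"role": "user", "content": f"msg {i}"} for i in range(10)]
compacted = compact(history)
print(len(compacted))  # 5: one summary plus the four most recent turns
```

The trade-off is the same either way: recent turns stay verbatim, older turns survive only as a digest, and the context window never overflows.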

The Market Reaction: Software Stocks Under Pressure

Claude Opus 4.6 did not launch in a vacuum. The broader narrative around AI replacing knowledge work has been building for months, and this release added fuel to the fire.

Since the start of 2026:

| Company | Stock Decline (YTD) |
|---|---|
| Intuit | -32% |
| Thomson Reuters | -30% |
| Salesforce | -25% |
| SAP | -18% |

The Nasdaq posted its worst two-day tumble since April, driven in part by the selloff in legal and financial analysis software stocks. Anthropic's release of industry-specific plug-ins for its Claude Cowork tool earlier this year had already triggered selling. Opus 4.6, with its superior financial analysis and legal reasoning capabilities, intensified the pressure.

As CNN Business reported, Opus 4.6 is "the AI that shook software stocks" getting "a big update."

"Vibe Working": Anthropic's Vision

Anthropic is framing this release around a concept it calls "vibe working"—an evolution of the popular "vibe coding" trend.

The idea: AI moves beyond being a tool that executes discrete commands into a collaborative partner that understands the broader context and objectives of your work. You describe what you want at a high level, and the AI handles the execution—writing code, conducting research, drafting analyses, building presentations.

With Opus 4.6's agent teams, adaptive thinking, and expanded context, this vision becomes more tangible. You are not prompting a chatbot. You are directing a team of agents.

Additional Product Updates

Opus 4.6 is not limited to the API and Claude Code. Anthropic also announced:

  • Claude in PowerPoint (research preview): The model can read existing slide layouts, fonts, and templates, then generate or edit slides that match your design system. You describe what you want, and Claude builds the deck.
  • Claude in Excel: Improved handling of long-running tasks and multi-step changes.
  • **$50 extra usage credit**: Current Pro and Max subscribers receive $50 in additional usage, automatically applied, to try the latest features across Claude Code and the Claude apps.

Pricing and Availability

Opus 4.6 is available immediately across all major platforms:

| Platform | Status |
|---|---|
| claude.ai | Available now |
| Claude API | Model ID: claude-opus-4-6 |
| Claude Code | Default model |
| Amazon Bedrock | Available now |
| Google Cloud Vertex AI | Available now |
| Snowflake Cortex AI | Available now |
| GitHub Copilot | Generally available |

Standard pricing: $5 per million input tokens, $25 per million output tokens. Prompt caching and batch processing offer additional cost reductions.

Premium pricing (for prompts over 200K tokens): $10 per million input tokens, $37.50 per million output tokens.

The Bottom Line

Opus 4.6 is not a revolution—it is the steady, relentless march of capability that makes each generation of AI models more useful than the last. The benchmarks are better. The context window is bigger. The agent architecture is more sophisticated.

But the real story is the convergence of features. A model that can reason through a million tokens of context, dynamically adjust its thinking depth, coordinate with other agents in parallel, and outperform every competitor on economically valuable knowledge work—that is not just a better chatbot. That is the beginning of what Anthropic calls vibe working.

Whether that vision excites or concerns you probably depends on what you do for a living.
