Claude Opus 4.7 (released April 16, 2026) ships with API changes that will break a working Sonnet 4.5 or Opus 4.6 call site on copy-paste: sampling knobs (temperature, top_p, top_k) are rejected with a 400 error, extended-thinking-with-a-token-budget is replaced by thinking={"type":"adaptive"} with output_config.effort, and prefilled assistant messages are gone. This page covers the 8 settings for production tool calling, the new programmatic tool calling feature that eliminates per-tool round trips, and the migration checklist for anyone upgrading from Opus 4.6 or earlier.
By Adrian Marcus. Updated May 2026.
The 8 settings for production tool calling
| Setting | Default | Production value | Why |
|---|---|---|---|
model |
none | claude-opus-4-7 |
The alias resolves to the shipped snapshot. Pin a dated snapshot in production if you need replay reproducibility. |
tool_choice |
{"type":"auto"} |
{"type":"tool","name":"<name>"} |
Forces the named tool. The single largest delta on tool-call reliability. auto returns prose on a non-trivial fraction of calls when intent is ambiguous. |
disable_parallel_tool_use |
false |
true unless your tools are parallel-safe |
Default parallelism races on state-changing tools (writes, deletes, mutations). Set this on every call involving side effects. |
thinking |
off | {"type":"adaptive"} when reasoning depth matters |
Adaptive replaces extended thinking on Opus 4.7. The model decides how long to think; you bias depth via output_config.effort. Off by default – requests without this field run without thinking, matching Opus 4.6 behavior. |
output_config.effort |
n/a | "xhigh" for coding/agentic, "high" minimum for intelligence-sensitive work, "medium" for cost-sensitive |
Replaces budget_tokens. Legal values: low, medium, high, xhigh, max. New in Opus 4.7 and more important here than on any prior Opus. |
max_tokens |
none | 4096 single-turn, 16384 when effort >= high, 64000+ at xhigh/max | Hard per-request ceiling. Set a large value at xhigh/max so the model has room to think across subagents and tool calls. The new tokenizer produces 1.0-1.35x more tokens than Opus 4.6 for the same input – add headroom. |
system |
none | Minimal preamble plus explicit tool-use rules | Long system prompts hurt tool-call precision and are the second most-flagged issue on the Anthropic SDK tracker. Keep it short; add an anti-rule in each tool description instead. |
thinking.display |
"omitted" |
"summarized" if your UI streams reasoning to users |
Thinking blocks appear in the response stream but their content is empty by default on Opus 4.7. This is a silent change from Opus 4.6. If your product shows reasoning progress, set display: "summarized" or users see a long pause before output. |
Sampling parameters are gone
Opus 4.7 rejects temperature, top_p, and top_k with a 400 error. If you set temperature=0.2 on Sonnet 4.6 or Opus 4.1 for tool-call determinism, remove it before the first request lands. The model picks its own sampling internally; you only bias thinking depth via output_config.effort. This is the most common migration error in the Anthropic Python SDK issue tracker.
Note: temperature=0 never guaranteed identical outputs on prior models either. If you need deterministic outputs for testing, use fixed-seed evals with consistent prompts.
effort levels reference
| Level | Best for | Notes |
|---|---|---|
low |
Short, scoped tasks, latency-sensitive workloads that are not intelligence-sensitive | Fastest, cheapest |
medium |
Cost-sensitive use cases trading off intelligence for token reduction | Balanced |
high |
Most intelligence-sensitive use cases – recommended minimum for production | Good balance of capability and cost |
xhigh |
Coding and agentic use cases – Anthropic’s recommended default for these workloads | New in Opus 4.7 |
max |
Hardest reasoning tasks; may show diminishing returns and over-thinking | Test before committing |
Tool schema – the way it should look
tools = [
{
"name": "search_invoices",
"description": "Search the invoice database. Use only when the user asks about invoices or payments.",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "minLength": 1, "maxLength": 200},
"since": {"type": "string", "format": "date"},
"limit": {"type": "integer", "minimum": 1, "maximum": 50}
},
"required": ["query"],
"additionalProperties": False
}
}
]
Four schema constraints that matter for reliability:
additionalProperties: false. The model will not invent fields your code does not expect. This is the single highest-value schema constraint.- Tight
minLengthandmaxLengthon strings. Prevents the “pass empty string” and “pass the entire prior conversation” failure modes. - An anti-rule in the
description. “Use only when the user asks about invoices” drops off-topic tool calls that positive-only descriptions let through. - Keep descriptions under 150 words. Opus 4.7 is more literal than 4.6; long descriptions that list exceptions and edge cases tend to fire the exceptions as often as the happy path.
Minimal production call
from anthropic import Anthropic
client = Anthropic()
resp = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
system="You call exactly one tool per turn. Return tool input only, no prose.",
tools=tools,
tool_choice={"type": "tool", "name": "search_invoices"},
disable_parallel_tool_use=True,
thinking={"type": "adaptive"},
output_config={"effort": "xhigh"},
messages=[{"role": "user", "content": user_msg}],
)
# Extract the tool call result
for block in resp.content:
if block.type == "tool_use":
print(f"Tool: {block.name}, Input: {block.input}")
Programmatic tool calling (new in 2026)
Standard tool calling requires one model round trip per tool invocation. Programmatic tool calling lets Claude write code that calls your tools inside a code execution container, running multiple tool calls in a single request. On a task like checking budget compliance across 20 employees, the standard approach needs 20 round trips. With programmatic tool calling, a single script runs all 20 lookups, filters results, and returns only the employees who exceeded limits.
This feature requires the code_execution_20260120 tool and uses the allowed_callers field to restrict which tools the code executor can call:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-opus-4-7",
max_tokens=4096,
messages=[
{
"role": "user",
"content": "Query sales data for the West, East, and Central regions, "
"then tell me which region had the highest revenue last quarter."
}
],
tools=[
# Enable code execution container
{"type": "code_execution_20260120", "name": "code_execution"},
# Register your tool, restrict it to code execution calls only
{
"name": "query_database",
"description": "Execute a SQL query against the sales database. Returns rows as JSON.",
"input_schema": {
"type": "object",
"properties": {
"sql": {"type": "string", "description": "SQL query to execute"}
},
"required": ["sql"],
"additionalProperties": False
},
"allowed_callers": ["code_execution_20260120"] # restricts direct model calls
}
]
)
print(response)
When to use it: multi-tool workflows where Claude needs to loop or filter results before returning them to the caller. When not to use it: single-tool calls where a standard round trip is fine, or any tool that should not accept programmatic invocations from code blocks.
Task budgets (beta)
Task budgets are a beta feature in Opus 4.7 that give the model a token target for a complete agentic loop – including thinking, tool calls, tool results, and final output. The model sees a running countdown and uses it to pace and prioritize work.
resp = client.messages.create(
model="claude-opus-4-7",
max_tokens=64000,
extra_headers={"anthropic-beta": "task-budgets-2026-03-13"},
thinking={"type": "adaptive"},
output_config={
"effort": "xhigh",
"task_budget": {"type": "tokens", "total": 32000}
},
messages=[{"role": "user", "content": complex_task}],
)
Task budgets are advisory, not a hard cap. They differ from max_tokens: task_budget is passed to the model so it can self-moderate; max_tokens is a hard ceiling the model does not see. Minimum task budget is 20k tokens. For open-ended tasks where quality matters more than speed, skip the task budget and let the model run.
Watch-outs
- Parallel tool-use surprise. If the model calls the same tool twice in one turn despite your intent, you forgot
disable_parallel_tool_use. It is a top-level request field, not a tool field. - stop_sequences with adaptive thinking.
stop_sequencescan clip the thinking block on the first tool chunk when thinking is on. Leavestop_sequencesempty whenthinkingis enabled; add stop behavior to the system prompt instead. - Do not send budget_tokens.
thinking: {type: "enabled", budget_tokens: N}returns a 400 error on Opus 4.7. The field was the Sonnet 4.x / Opus 4.1 extended-thinking knob and is rejected withinvalid_request_errorpointing atoutput_config.effort. - Tokenizer cost increase. The same input maps to roughly 1.0-1.35x more tokens than on Opus 4.6 due to the new tokenizer. Plan for up to 35% input cost increase before effort level changes. Update
max_tokensto give additional headroom and re-test any code path that estimates token counts client-side. - Prefill removal. Prefilling assistant messages returns a 400 error. Use structured outputs, system prompt instructions, or
output_config.formatinstead. - High-resolution image tokens. Opus 4.7 supports images up to 2576px and uses up to approximately 3x more image tokens per full-resolution image (up to 4,784 tokens vs the previous cap of roughly 1,600). If your pipeline sends images, re-budget max_tokens and downsample before sending if the additional resolution is not needed.
- MCP tools. If you call the API directly with MCP tools, you must also pass the MCP server config. Tools alone will not wire the connection.
Migration checklist from Opus 4.6
- Update model name from
claude-opus-4-6toclaude-opus-4-7. - Remove
temperature,top_p, andtop_kfrom all request payloads. - Replace
thinking: {type: "enabled", budget_tokens: N}withthinking: {type: "adaptive"}plusoutput_config.effort. - Remove any assistant-message prefills; use structured outputs or system instructions instead.
- If your UI streams thinking to users, set
thinking.display: "summarized"explicitly – the default changed to"omitted". - Raise
max_tokensby 35% minimum to account for the new tokenizer; at xhigh/max effort, start at 64k. - Re-test any client-side token-count estimations against the new tokenizer.
- For image-heavy workloads, re-budget for up to 3x more tokens per full-resolution image; downsample if the extra resolution is not needed.
Related
For the Opus 4.7 scores on agent and tool-use tasks (TCC editorial scoring), see the Opus 4.7 review. For the retry policy that wraps every tool call, see the agent loop retry policy post. For the bounded-planner prompt that respects step budgets on this model, see the bounded-planner prompt.
One-line takeaway
Pin claude-opus-4-7, set tool_choice to a specific tool, disable_parallel_tool_use unless you designed for it, drop temperature/top_p/top_k, switch thinking to {"type":"adaptive"} with output_config.effort at xhigh for agentic work, keep stop_sequences empty, and set additionalProperties: false on every tool schema. Add 35% headroom to max_tokens for the new tokenizer. That is the production-grade configuration in May 2026.