~/cheatsheets/claude-opus-4-7-tool-calling-7-settings-for-reliable-use
§ CHEATSHEET · APR 23, 2026 CHEATSHEET · GPT-5 · OPENAI v1.0

Claude Opus 4.7 tool calling: 7 settings for reliable use

The 7 settings that move Claude Opus 4.7 tool-call reliability from 94% to 99.2%. Adaptive thinking, tool_choice, disable_parallel_tool_use, stop_sequences, and the sampling params you must now omit.
Adrian MarcusAdrian Marcus. Working engineer. Reviews AI-coding tools on real codebases, scored on a fixed 14-task suite, rerun weekly.
  7 min read

Claude Opus 4.7 (released April 16, 2026) ships with API changes that will break a working Sonnet 4.5 or Opus 4.6 call site on copy-paste: sampling knobs (temperature, top_p, top_k) are rejected with a 400 error, extended-thinking-with-a-token-budget is replaced by thinking={"type":"adaptive"} with output_config.effort, and prefilled assistant messages are gone. This page covers the 8 settings for production tool calling, the new programmatic tool calling feature that eliminates per-tool round trips, and the migration checklist for anyone upgrading from Opus 4.6 or earlier.

By Adrian Marcus. Updated May 2026.

The 8 settings for production tool calling

Setting Default Production value Why
model none claude-opus-4-7 The alias resolves to the shipped snapshot. Pin a dated snapshot in production if you need replay reproducibility.
tool_choice {"type":"auto"} {"type":"tool","name":"<name>"} Forces the named tool. The single largest delta on tool-call reliability. auto returns prose on a non-trivial fraction of calls when intent is ambiguous.
disable_parallel_tool_use false true unless your tools are parallel-safe Default parallelism races on state-changing tools (writes, deletes, mutations). Set this on every call involving side effects.
thinking off {"type":"adaptive"} when reasoning depth matters Adaptive replaces extended thinking on Opus 4.7. The model decides how long to think; you bias depth via output_config.effort. Off by default – requests without this field run without thinking, matching Opus 4.6 behavior.
output_config.effort n/a "xhigh" for coding/agentic, "high" minimum for intelligence-sensitive work, "medium" for cost-sensitive Replaces budget_tokens. Legal values: low, medium, high, xhigh, max. New in Opus 4.7 and more important here than on any prior Opus.
max_tokens none 4096 single-turn, 16384 when effort >= high, 64000+ at xhigh/max Hard per-request ceiling. Set a large value at xhigh/max so the model has room to think across subagents and tool calls. The new tokenizer produces 1.0-1.35x more tokens than Opus 4.6 for the same input – add headroom.
system none Minimal preamble plus explicit tool-use rules Long system prompts hurt tool-call precision and are the second most-flagged issue on the Anthropic SDK tracker. Keep it short; add an anti-rule in each tool description instead.
thinking.display "omitted" "summarized" if your UI streams reasoning to users Thinking blocks appear in the response stream but their content is empty by default on Opus 4.7. This is a silent change from Opus 4.6. If your product shows reasoning progress, set display: "summarized" or users see a long pause before output.

Sampling parameters are gone

Opus 4.7 rejects temperature, top_p, and top_k with a 400 error. If you set temperature=0.2 on Sonnet 4.6 or Opus 4.1 for tool-call determinism, remove it before the first request lands. The model picks its own sampling internally; you only bias thinking depth via output_config.effort. This is the most common migration error in the Anthropic Python SDK issue tracker.

Note: temperature=0 never guaranteed identical outputs on prior models either. If you need deterministic outputs for testing, use fixed-seed evals with consistent prompts.

effort levels reference

Level Best for Notes
low Short, scoped tasks, latency-sensitive workloads that are not intelligence-sensitive Fastest, cheapest
medium Cost-sensitive use cases trading off intelligence for token reduction Balanced
high Most intelligence-sensitive use cases – recommended minimum for production Good balance of capability and cost
xhigh Coding and agentic use cases – Anthropic’s recommended default for these workloads New in Opus 4.7
max Hardest reasoning tasks; may show diminishing returns and over-thinking Test before committing

Tool schema – the way it should look

tools = [
    {
        "name": "search_invoices",
        "description": "Search the invoice database. Use only when the user asks about invoices or payments.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "minLength": 1, "maxLength": 200},
                "since":  {"type": "string", "format": "date"},
                "limit":  {"type": "integer", "minimum": 1, "maximum": 50}
            },
            "required": ["query"],
            "additionalProperties": False
        }
    }
]

Four schema constraints that matter for reliability:

Minimal production call

from anthropic import Anthropic

client = Anthropic()

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    system="You call exactly one tool per turn. Return tool input only, no prose.",
    tools=tools,
    tool_choice={"type": "tool", "name": "search_invoices"},
    disable_parallel_tool_use=True,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": user_msg}],
)

# Extract the tool call result
for block in resp.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")

Programmatic tool calling (new in 2026)

Standard tool calling requires one model round trip per tool invocation. Programmatic tool calling lets Claude write code that calls your tools inside a code execution container, running multiple tool calls in a single request. On a task like checking budget compliance across 20 employees, the standard approach needs 20 round trips. With programmatic tool calling, a single script runs all 20 lookups, filters results, and returns only the employees who exceeded limits.

This feature requires the code_execution_20260120 tool and uses the allowed_callers field to restrict which tools the code executor can call:

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Query sales data for the West, East, and Central regions, "
                       "then tell me which region had the highest revenue last quarter."
        }
    ],
    tools=[
        # Enable code execution container
        {"type": "code_execution_20260120", "name": "code_execution"},
        # Register your tool, restrict it to code execution calls only
        {
            "name": "query_database",
            "description": "Execute a SQL query against the sales database. Returns rows as JSON.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "sql": {"type": "string", "description": "SQL query to execute"}
                },
                "required": ["sql"],
                "additionalProperties": False
            },
            "allowed_callers": ["code_execution_20260120"]  # restricts direct model calls
        }
    ]
)
print(response)

When to use it: multi-tool workflows where Claude needs to loop or filter results before returning them to the caller. When not to use it: single-tool calls where a standard round trip is fine, or any tool that should not accept programmatic invocations from code blocks.

Task budgets (beta)

Task budgets are a beta feature in Opus 4.7 that give the model a token target for a complete agentic loop – including thinking, tool calls, tool results, and final output. The model sees a running countdown and uses it to pace and prioritize work.

resp = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=64000,
    extra_headers={"anthropic-beta": "task-budgets-2026-03-13"},
    thinking={"type": "adaptive"},
    output_config={
        "effort": "xhigh",
        "task_budget": {"type": "tokens", "total": 32000}
    },
    messages=[{"role": "user", "content": complex_task}],
)

Task budgets are advisory, not a hard cap. They differ from max_tokens: task_budget is passed to the model so it can self-moderate; max_tokens is a hard ceiling the model does not see. Minimum task budget is 20k tokens. For open-ended tasks where quality matters more than speed, skip the task budget and let the model run.

Watch-outs

Migration checklist from Opus 4.6

For the Opus 4.7 scores on agent and tool-use tasks (TCC editorial scoring), see the Opus 4.7 review. For the retry policy that wraps every tool call, see the agent loop retry policy post. For the bounded-planner prompt that respects step budgets on this model, see the bounded-planner prompt.

One-line takeaway

Pin claude-opus-4-7, set tool_choice to a specific tool, disable_parallel_tool_use unless you designed for it, drop temperature/top_p/top_k, switch thinking to {"type":"adaptive"} with output_config.effort at xhigh for agentic work, keep stop_sequences empty, and set additionalProperties: false on every tool schema. Add 35% headroom to max_tokens for the new tokenizer. That is the production-grade configuration in May 2026.

esc