§ CHEATSHEET · APR 23, 2026

GPT-5.4 API cheatsheet: the 9 parameters that matter in 2026

GPT-5.4 API parameters, defaults, and the 3 that break your pipeline if you do not set them. Strict JSON, reasoning_effort, tool_choice, and the cost line to watch.
Adrian Marcus. Working engineer. Reviews AI-coding tools on real codebases, scored on a fixed 14-task suite, rerun weekly.
  3 min read

GPT-5.4 shipped March 5, 2026. The 9 parameters below are the ones a production-grade call site needs to set explicitly; the 3 footguns are the ones the recurring r/ChatGPTCoding “GPT-5.4 in production” threads keep flagging. Pin the model, force the schema, cap the bill.

Update — April 23, 2026
New flagship: OpenAI shipped GPT-5.5, the first fully retrained base since GPT-4.5, with 1M context at $5/$30 per M tokens. Every parameter on this page still applies; swap model="gpt-5.4" for model="gpt-5.5".
Cost notice: GPT-5.5 doubles input ($2.50 → $5.00) and output ($15 → $30). Stay on GPT-5.4 unless you need the Terminal-Bench 2.0 lead, agent-loop accuracy, or 1M context.
When to upgrade: terminal automation, multi-step tool chains, long-horizon agents. For strict-JSON extraction, GPT-5.4 stays the right pick on $/successful-call.

The 9 parameters to set explicitly

| Parameter | API default | Production value | Why |
| --- | --- | --- | --- |
| model | none | gpt-5.4 (or pinned snapshot) | Pin the version. A floating -latest breaks reproducibility. |
| reasoning_effort | medium | medium for coding, high for planning | Higher effort costs 2-4x in hidden reasoning tokens; only worth it on long-horizon work. |
| response_format | text | {"type":"json_schema","strict":true,...} | Strict mode compiles the schema into constrained decoding; the classic JSON parse-error classes can no longer occur. |
| tool_choice | auto | "required" or {"type":"function",...} | Forces the model to call the tool you registered. |
| max_output_tokens | model max | 2048-8192 | Caps runaway generations. The unbounded bill line is the biggest single leak we see in audits. |
| temperature | 1.0 | 0.0 for deterministic, 0.4 for test-gen | 0 is not fully deterministic on its own; pair with seed. |
| seed | none | fixed int per call class | Reproducibility on replays. |
| top_logprobs | none | 5 on eval runs | Margin of confidence at eval time without a second call. |
| metadata | none | {"eval_id":..., "call_class":...} | Tags calls for post-hoc analysis in the OpenAI dashboard. |
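The top_logprobs row is the cheapest eval-time signal in the table. A minimal sketch of the margin-of-confidence computation, using a hypothetical payload shaped like the API's logprobs.content[i].top_logprobs list (token plus logprob pairs; the tokens and values here are made up):

```python
import math

# Hypothetical top_logprobs entries for a single classification-style
# output token. In a real response these come back alongside the message.
top_logprobs = [
    {"token": "APPROVE", "logprob": -0.08},
    {"token": "REJECT",  "logprob": -2.60},
    {"token": "DEFER",   "logprob": -4.10},
]

# Convert logprobs to probabilities and take the gap between the top two
# candidates: a small margin flags calls worth routing to human review.
probs = sorted((math.exp(t["logprob"]) for t in top_logprobs), reverse=True)
margin = probs[0] - probs[1]
print(f"margin={margin:.3f}")
```

The point of the sketch: the confidence signal is already in the response you paid for, so no second "are you sure?" call is needed.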

The 3 settings that break you

  1. response_format without "strict": true. Without strict mode the model emits JSON-flavoured text and your parser rejects a non-trivial percentage of calls on high-volume traffic. Always set "strict": true and supply the full schema. The OpenAI structured outputs guide is the reference; the strict-JSON prompt is what we pair it with.
  2. tool_choice left at auto. If you registered a tool because the workflow requires it, set tool_choice to "required". Otherwise the model will answer in prose on a meaningful fraction of calls. That is a spec violation on any pipeline that expects a tool output.
  3. max_output_tokens unset. On an error path the model runs all the way to the model max. The bill line for a week of unbounded calls is the most common surprise in our reader-submitted post-mortems.
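All three fixes can be seen in one request body. This is an illustrative settings dict, not the production call below: the tool name record_invoice and the invoice schema are hypothetical, and the nesting follows the current Chat Completions conventions for strict structured outputs:

```python
# Illustrative request body with the three footguns closed.
request = {
    "model": "gpt-5.4",  # pinned, not -latest
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "invoice",
            "strict": True,  # footgun 1: without this, JSON-flavoured text
            "schema": {
                "type": "object",
                "properties": {
                    "id": {"type": "string"},
                    "total_cents": {"type": "integer"},
                    "currency": {"type": "string"},
                },
                "required": ["id", "total_cents", "currency"],
                "additionalProperties": False,
            },
        },
    },
    # Footgun 2: pin the exact tool instead of leaving tool_choice at auto.
    "tool_choice": {"type": "function", "function": {"name": "record_invoice"}},
    # Footgun 3: a real cap, not the model max.
    "max_output_tokens": 2048,
}
```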

Minimal production call

from openai import OpenAI
from pydantic import BaseModel

# Target schema; parse() compiles it into a strict JSON schema server-side.
class Invoice(BaseModel):
    id: str
    total_cents: int
    currency: str

client = OpenAI()
resp = client.chat.completions.parse(
    model="gpt-5.4",                # pinned model, never -latest
    reasoning_effort="medium",
    response_format=Invoice,        # strict structured output
    max_output_tokens=2048,         # hard cap on runaway generations
    temperature=0.0,
    seed=42,                        # pair with temperature=0 for replays
    metadata={"call_class": "invoice_extract", "env": "prod"},
    messages=[...],
)
invoice: Invoice = resp.choices[0].message.parsed

Pricing (April 2026)

| Tier | Input $/M | Cached input $/M | Output $/M | Best for |
| --- | --- | --- | --- | --- |
| GPT-5.4 | $2.50 | $0.25 | $15.00 | Default production |
| GPT-5.5 | $5.00 | $0.50 | $30.00 | Agentic / terminal / 1M context |
| GPT-5.5 Pro | $30.00 | $3.00 | $180.00 | Hardest reasoning, accept-once tasks (replaces GPT-5.4 Pro) |
| Batch (any tier) | 50% of standard | 50% | 50% | Overnight evals, large structured runs |
| Priority | 2x standard | 2x | 2x | SLA-critical low-latency |

Pricing per the GPT-5.4 launch post. Cached input at 1/10 of standard is the lever most teams underuse: long stable system prompts pay for themselves in 2-3 calls.
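The caching lever is easiest to see as arithmetic. A sketch using the GPT-5.4 rates from the table and a hypothetical 20k-token stable system prompt (the token count is an assumption for illustration):

```python
# GPT-5.4 rates from the pricing table, in dollars per million input tokens.
INPUT_PER_M = 2.50
CACHED_PER_M = 0.25  # cached input reads at 1/10 of standard

prompt_tokens = 20_000  # hypothetical long, stable system prompt

cold = prompt_tokens / 1_000_000 * INPUT_PER_M   # first call: cache miss
warm = prompt_tokens / 1_000_000 * CACHED_PER_M  # later calls: cache hit

def prompt_cost(n_calls: int) -> float:
    """Total system-prompt spend for n calls: one miss, the rest hits."""
    return cold + (n_calls - 1) * warm

print(f"cold=${cold:.4f}, warm=${warm:.4f}, 100 calls=${prompt_cost(100):.3f}")
```

At these rates, 100 calls cost $0.545 on the prompt line versus $5.00 uncached, a 90% cut on everything after the first call. The catch is that the hit rate depends on keeping the prompt prefix byte-stable, which is why "long stable system prompt" is doing the work in that sentence.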

Further reading

For the side-by-side with GPT-5.3-Codex on structured output at scale, see the GPT-5.3-Codex review. For the prompt pattern that pairs with these parameters and posts 100 of 100 on a 40-property schema, see the strict-JSON prompt.

One-line takeaway

Set model with a date, reasoning_effort to medium, response_format to strict JSON schema, tool_choice to required, and max_output_tokens to a real number. Everything else is a knob you can leave alone until you have a reason to move it.
