GPT-5.4 API cheatsheet: the 9 parameters that matter in 2026

GPT-5.4 shipped March 5, 2026. The 9 parameters below are the ones a production-grade call site needs to set explicitly; the 3 footguns are the ones the recurring r/ChatGPTCoding “GPT-5.4 in production” threads keep flagging. Pin the model, force the schema, cap the bill.

Update — April 23, 2026

New flagship: OpenAI shipped GPT-5.5, the first fully retrained base since GPT-4.5, with 1M context at $5/$30 per M tokens. Every parameter on this page still applies; swap model="gpt-5.4" for model="gpt-5.5".

Cost notice: GPT-5.5 doubles input ($2.50 → $5.00) and output ($15 → $30). Stay on GPT-5.4 unless you need the Terminal-Bench 2.0 lead, agent-loop accuracy, or 1M context.

When to upgrade: terminal automation, multi-step tool chains, long-horizon agents. For strict-JSON extraction, GPT-5.4 stays the right pick on $/successful-call.

The 9 parameters to set explicitly

Parameter	API default	Production value	Why
`model`	none	`gpt-5.4` (or pinned snapshot)	Pin the version. Floating `-latest` breaks reproducibility.
`reasoning_effort`	`medium`	`medium` for coding, `high` for planning	Higher effort costs 2-4x in hidden reasoning tokens; only worth it on long-horizon work.
`response_format`	text	`{"type":"json_schema","strict":true,...}`	Strict mode compiles the schema into constrained decoding, so malformed or off-schema JSON is ruled out. Truncated output and refusals still need handling.
`tool_choice`	`auto`	`"required"` or `{"type":"function",...}`	Forces the model to call the tool you registered.
`max_output_tokens`	model max	2048-8192	Caps runaway generations. The unbounded bill line is the biggest single leak we see in audits.
`temperature`	1.0	0.0 for deterministic, 0.4 for test-gen	0 is not fully deterministic on its own; pair with `seed`.
`seed`	none	fixed int per call class	Reproducibility on replays.
`top_logprobs`	none	`5` on eval runs	Margin-of-confidence at eval time without a second call.
`metadata`	none	`{"eval_id":..., "call_class":...}`	Tags calls for post-hoc analysis in the OpenAI dashboard.

The 3 settings that break you

response_format without "strict": true. Without strict mode the model emits JSON-flavoured text and your parser rejects a non-trivial percentage of calls on high-volume traffic. Always set "strict": true and supply the full schema. The OpenAI structured outputs guide is the reference; the strict-JSON prompt is what we pair it with.
tool_choice left at auto. If you registered a tool because the workflow requires it, set tool_choice to "required". Otherwise the model will answer in prose on a meaningful fraction of calls. That is a spec violation on any pipeline that expects a tool output.
max_output_tokens unset. When a generation goes wrong (for example, a repetition loop), nothing stops it short of the model maximum. The bill line for a week of unbounded calls is the most common surprise in our reader-submitted post-mortems.
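
The three settings above can be checked mechanically before a request leaves your code. The sketch below is one way to do that, not an official tool: `strict_response_format` and `find_footguns` are hypothetical helper names, and the nested `json_schema` shape assumes the Chat Completions form of `response_format`. It only inspects a plain dict of request arguments and never calls the API.

```python
def strict_response_format(name: str, properties: dict) -> dict:
    """Build a Chat Completions-style response_format with strict mode on."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                # Strict mode expects every property to be listed as required
                # and additional properties to be disallowed.
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


def find_footguns(kwargs: dict) -> list:
    """Return one finding per footgun present in the request arguments."""
    findings = []

    # Footgun 1: a JSON schema response format without strict mode.
    fmt = kwargs.get("response_format")
    if isinstance(fmt, dict) and fmt.get("type") == "json_schema":
        if fmt.get("json_schema", {}).get("strict") is not True:
            findings.append("response_format is missing strict=True")

    # Footgun 2: tools registered but tool_choice left at its default.
    if kwargs.get("tools") and kwargs.get("tool_choice", "auto") == "auto":
        findings.append("tools registered but tool_choice is auto")

    # Footgun 3: no output cap under either spelling of the parameter.
    cap_names = ("max_output_tokens", "max_completion_tokens")
    if not any(kwargs.get(name) for name in cap_names):
        findings.append("no output token cap set")

    return findings


invoice_format = strict_response_format(
    "invoice",
    {"id": {"type": "string"}, "total_cents": {"type": "integer"}},
)
good = {"response_format": invoice_format, "max_output_tokens": 2048}
bad = {
    "response_format": {"type": "json_schema", "json_schema": {"name": "invoice"}},
    "tools": [{"type": "function", "function": {"name": "lookup_invoice"}}],
}
print(find_footguns(good))  # []
print(find_footguns(bad))   # one finding per footgun
```

Running a check like this in CI or at client construction time is cheaper than discovering any of the three in a post-mortem.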

Minimal production call

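from openai import OpenAI
from pydantic import BaseModel

# Pydantic model that the SDK converts into a strict JSON schema.
class Invoice(BaseModel):
    id: str
    total_cents: int
    currency: str

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.parse(
    model="gpt-5.4",  # pin the version
    reasoning_effort="medium",
    response_format=Invoice,
    # Chat Completions names the output cap max_completion_tokens;
    # max_output_tokens is the Responses API spelling of the same cap.
    max_completion_tokens=2048,
    temperature=0.0,
    seed=42,  # pair with temperature=0.0 for replayable runs
    metadata={"call_class": "invoice_extract", "env": "prod"},
    messages=[...],
)
# parsed is an Invoice instance, or None if the model refused.
invoice: Invoice = resp.choices[0].message.parsed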
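
The call above reads `message.parsed` on the happy path only. A minimal sketch of a guard for the two cases strict mode does not cover, truncation at the token cap and a refusal, might look like the following. `ExtractionError` and `unwrap_parsed` are hypothetical names, and the `SimpleNamespace` stubs stand in for a real response so the example runs without network access. Some SDK versions raise their own error on truncation before you reach this point, so treat the `finish_reason` check as a defensive extra.

```python
from types import SimpleNamespace


class ExtractionError(RuntimeError):
    """Raised when a completion cannot be turned into a parsed object."""


def unwrap_parsed(choice):
    """Return choice.message.parsed, or raise with the reason it is missing."""
    # A "length" finish reason means the output hit the token cap,
    # so any JSON in the message is incomplete.
    if choice.finish_reason == "length":
        raise ExtractionError("output truncated at the token cap")
    message = choice.message
    # A refusal arrives as text in message.refusal, with parsed left as None.
    if getattr(message, "refusal", None):
        raise ExtractionError(f"model refused: {message.refusal}")
    if message.parsed is None:
        raise ExtractionError("no parsed payload returned")
    return message.parsed


# Stub choices shaped like the fields read above (illustrative data only).
ok_choice = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(refusal=None, parsed={"id": "inv_1", "total_cents": 1250}),
)
cut_choice = SimpleNamespace(
    finish_reason="length",
    message=SimpleNamespace(refusal=None, parsed=None),
)

print(unwrap_parsed(ok_choice))
try:
    unwrap_parsed(cut_choice)
except ExtractionError as exc:
    print("rejected:", exc)
```

In a real pipeline you would call `unwrap_parsed(resp.choices[0])` and route `ExtractionError` to a retry or a dead-letter queue rather than letting a `None` flow downstream.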

Pricing (April 2026)

Tier	Input $/M	Cached input $/M	Output $/M	Best for
GPT-5.4	$2.50	$0.25	$15.00	Default production
GPT-5.5	$5.00	$0.50	$30.00	Agentic / terminal / 1M context
GPT-5.5 Pro	$30.00	$3.00	$180.00	Hardest reasoning, accept-once tasks (replaces GPT-5.4 Pro)
Batch (any tier)	50% of standard	50%	50%	Overnight evals, large structured runs
Priority	2x standard	2x	2x	SLA-critical low-latency

Pricing per the GPT-5.4 launch post. Cached input at 1/10 of standard is the lever most teams underuse: long stable system prompts pay for themselves in 2-3 calls.
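
To make the table concrete, here is a small cost estimator built only from the prices listed above. It is a sketch under stated assumptions: the dictionary keys (including `gpt-5.5-pro`) are illustrative labels rather than confirmed model IDs, the Batch and Priority multipliers are assumed to apply uniformly to input, cached input, and output as the table suggests, and `call_cost_usd` is a hypothetical helper name.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES = {
    "gpt-5.4": {"input": 2.50, "cached_input": 0.25, "output": 15.00},
    "gpt-5.5": {"input": 5.00, "cached_input": 0.50, "output": 30.00},
    "gpt-5.5-pro": {"input": 30.00, "cached_input": 3.00, "output": 180.00},
}

# Batch is 50% of standard, Priority is 2x standard.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "priority": 2.0}


def call_cost_usd(model, input_tokens, output_tokens,
                  cached_input_tokens=0, tier="standard"):
    """Estimate the cost of one call in USD.

    cached_input_tokens is the part of input_tokens served from cache.
    """
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    price = PRICES[model]
    uncached = input_tokens - cached_input_tokens
    per_million = (
        uncached * price["input"]
        + cached_input_tokens * price["cached_input"]
        + output_tokens * price["output"]
    )
    return per_million / 1_000_000 * TIER_MULTIPLIER[tier]


# A 10,000-token prompt with an 8,000-token stable prefix and 500 output tokens.
cold = call_cost_usd("gpt-5.4", 10_000, 500)
warm = call_cost_usd("gpt-5.4", 10_000, 500, cached_input_tokens=8_000)
print(f"uncached: ${cold:.4f}  cached prefix: ${warm:.4f}")
```

For that prompt shape the cached call comes in at under half the uncached cost, which is the lever the paragraph above describes.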

Watch-outs

reasoning_effort="high" inflates usage in a way the visible response does not show. Hidden reasoning tokens are billed as output tokens and reported under usage.output_tokens_details.reasoning_tokens (usage.completion_tokens_details.reasoning_tokens on Chat Completions). Build a dashboard alert on this field whenever reasoning effort is high.
GPT-5.4 deprecates the logit_bias path for steering JSON structure. Carrying 2024 bias tables forward causes interaction errors with strict JSON mode.
The prediction parameter (speculative decoding) is supported. On test-gen workloads it cuts latency materially; it conflicts with stream in some SDK versions, so check your client before enabling.
GPT-5.2 Thinking is removed from the ChatGPT picker on June 5, 2026 per the launch post. API access is unchanged for now, but plan migration tests for any downstream consumer that still pins gpt-5.2.
model="gpt-5.5" uses the same parameters listed in this cheatsheet. The one drift: reasoning_effort="xhigh" is the new tier above "high" on GPT-5.5. Use it sparingly; it is the line item that doubles bills in week one of upgrades.
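
The reasoning-token alert from the first watch-out can start as a few lines over the usage payload. The sketch below assumes the usage object has been converted to a dict with the `output_tokens_details.reasoning_tokens` path named above and that `output_tokens` already includes the reasoning tokens. The 0.75 threshold and the function names are illustrative choices, not recommendations from the API documentation.

```python
def reasoning_share(usage: dict) -> float:
    """Fraction of output tokens that were hidden reasoning tokens."""
    output_tokens = usage.get("output_tokens", 0)
    details = usage.get("output_tokens_details") or {}
    reasoning_tokens = details.get("reasoning_tokens", 0)
    if output_tokens <= 0:
        return 0.0
    return reasoning_tokens / output_tokens


def should_alert(usage: dict, threshold: float = 0.75) -> bool:
    """True when reasoning tokens exceed the given share of output tokens."""
    return reasoning_share(usage) > threshold


# Illustrative usage payloads, not real API output.
light = {"output_tokens": 1_000, "output_tokens_details": {"reasoning_tokens": 200}}
heavy = {"output_tokens": 1_000, "output_tokens_details": {"reasoning_tokens": 800}}

print(reasoning_share(light), should_alert(light))
print(reasoning_share(heavy), should_alert(heavy))
```

Tagging each call with a `call_class` in `metadata` lets you set this threshold per workload instead of globally.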

For the side-by-side with GPT-5.3-Codex on structured output at scale, see the GPT-5.3-Codex review. For the prompt pattern that pairs with these parameters and posts 100 of 100 on a 40-property schema, see the strict-JSON prompt.

One-line takeaway

Pin model to a dated snapshot, set reasoning_effort to medium, response_format to a strict JSON schema, tool_choice to required wherever a tool is mandatory, and max_output_tokens to a real number. Everything else is a knob you can leave alone until you have a reason to move it.