GPT-5.4 shipped March 5, 2026. The 9 parameters below are the ones a production-grade call site needs to set explicitly; the 3 footguns are the ones the recurring r/ChatGPTCoding “GPT-5.4 in production” threads keep flagging. Pin the model, force the schema, cap the bill.
To run the same stack on GPT-5.5, swap `model="gpt-5.4"` for `model="gpt-5.5"`.

## The 9 parameters to set explicitly
| Parameter | API default | Production value | Why |
|---|---|---|---|
| `model` | none | `gpt-5.4` (or pinned snapshot) | Pin the version. Floating `-latest` breaks reproducibility. |
| `reasoning_effort` | `medium` | `medium` for coding, `high` for planning | Higher effort costs 2-4x in hidden reasoning tokens; only worth it on long-horizon work. |
| `response_format` | `text` | `{"type":"json_schema","strict":true,...}` | Strict mode compiles the schema into constrained decoding, so the classic JSON parse-error classes can no longer occur. |
| `tool_choice` | `auto` | `"required"` or `{"type":"function",...}` | Forces the model to call the tool you registered. |
| `max_output_tokens` | model max | 2048-8192 | Caps runaway generations. The unbounded bill line is the biggest single leak we see in audits. |
| `temperature` | 1.0 | 0.0 for deterministic work, 0.4 for test-gen | 0 is not fully deterministic on its own; pair with `seed`. |
| `seed` | none | fixed int per call class | Reproducibility on replays. |
| `top_logprobs` | none | 5 on eval runs | Margin-of-confidence at eval time without a second call. |
| `metadata` | none | `{"eval_id":..., "call_class":...}` | Tags calls for post-hoc analysis in the OpenAI dashboard. |
## The 3 settings that break you
1. `response_format` without `"strict": true`. Without strict mode the model emits JSON-flavoured text and your parser rejects a non-trivial percentage of calls on high-volume traffic. Always set `"strict": true` and supply the full schema. The OpenAI structured outputs guide is the reference; the strict-JSON prompt is what we pair it with.
2. `tool_choice` left at `auto`. If you registered a tool because the workflow requires it, set `tool_choice` to `"required"`. Otherwise the model will answer in prose on a meaningful fraction of calls. That is a spec violation on any pipeline that expects a tool output.
3. `max_output_tokens` unset. The model will run to the model max on error. The bill line for a week of unbounded calls is the most common surprise in our reader-submitted post-mortems.
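The `tool_choice` footgun is worth a concrete shape. A sketch of pinning a registered tool, using the Chat Completions tools format; the `extract_invoice` tool name and its parameters are hypothetical:

```python
# Illustrative: force the registered tool instead of leaving tool_choice="auto".
# The tool name and parameter schema are made up; the "function" envelope
# follows the Chat Completions tools format.
tools = [
    {
        "type": "function",
        "function": {
            "name": "extract_invoice",  # hypothetical tool name
            "description": "Pull structured invoice fields out of raw text.",
            "parameters": {
                "type": "object",
                "properties": {"raw_text": {"type": "string"}},
                "required": ["raw_text"],
            },
        },
    }
]

# "required" means the model must call *some* tool; naming the function
# pins it to this specific one.
tool_choice = {"type": "function", "function": {"name": "extract_invoice"}}
```

Pass both `tools` and `tool_choice` on every call in the pipeline; pinning only one of them reintroduces the prose-answer failure mode.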
## Minimal production call

```python
from openai import OpenAI
from pydantic import BaseModel

class Invoice(BaseModel):
    id: str
    total_cents: int
    currency: str

client = OpenAI()

resp = client.chat.completions.parse(
    model="gpt-5.4",
    reasoning_effort="medium",
    response_format=Invoice,
    max_output_tokens=2048,
    temperature=0.0,
    seed=42,
    metadata={"call_class": "invoice_extract", "env": "prod"},
    messages=[...],
)

invoice: Invoice = resp.choices[0].message.parsed
```
## Pricing (April 2026)
| Tier | Input $/M | Cached input $/M | Output $/M | Best for |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | Default production |
| GPT-5.5 | $5.00 | $0.50 | $30.00 | Agentic / terminal / 1M context |
| GPT-5.5 Pro | $30.00 | $3.00 | $180.00 | Hardest reasoning, accept-once tasks (replaces GPT-5.4 Pro) |
| Batch (any tier) | 50% of standard | 50% | 50% | Overnight evals, large structured runs |
| Priority | 2x standard | 2x | 2x | SLA-critical low-latency |
Pricing per the GPT-5.4 launch post. Cached input at 1/10 of standard is the lever most teams underuse: long stable system prompts pay for themselves in 2-3 calls.
## Watch-outs
- `reasoning_effort=high` does not change the returned token count in the response metadata the way you might expect. Hidden reasoning tokens are billed separately and appear under `usage.output_tokens_details.reasoning_tokens`. Build a dashboard alert on this if reasoning effort is high.
- GPT-5.4 deprecates the `logit_bias` path for steering JSON structure. Carrying 2024 bias tables forward causes interaction errors with strict JSON mode.
- The `prediction` parameter (speculative decoding) is supported. On test-gen workloads it cuts latency materially; it conflicts with `stream` in some SDK versions, so check your client before enabling.
- GPT-5.2 Thinking is removed from the ChatGPT picker on June 5, 2026 per the launch post. API access is unchanged for now, but plan migration tests for any downstream consumer that still pins `gpt-5.2`.
- `model="gpt-5.5"` uses the same parameters listed in this cheatsheet. The one drift: `reasoning_effort="xhigh"` is the new tier above `"high"` on GPT-5.5. Use it sparingly; it is the line item that doubles bills in week one of upgrades.
## Related
For the side-by-side with GPT-5.3-Codex on structured output at scale, see the GPT-5.3-Codex review. For the prompt pattern that pairs with these parameters and posts 100 of 100 on a 40-property schema, see the strict-JSON prompt.
## One-line takeaway
Set `model` to a dated snapshot, `reasoning_effort` to `medium`, `response_format` to a strict JSON schema, `tool_choice` to `required`, and `max_output_tokens` to a real number. Everything else is a knob you can leave alone until you have a reason to move it.