“Strict JSON” without the strict flag is the single most common production parsing failure across the OpenAI and Anthropic SDKs in 2026. The recurring r/ChatGPTCoding and Anthropic Discord threads on parse errors land on the same answer: the prompt does not fix it, the strict flag does, and the prompt is the insurance policy on top. The 11-line prompt below moves Claude Opus 4.7 to 99-100 of 100 and GPT-5.3-Codex to 100 of 100 across 500 runs on the TCC adversarial set (40-property schema, 100 inputs designed to break naive prompts).
The prompt
Return exactly one JSON object that validates against the schema.
Rules, in order of priority:
1. Return JSON only. No prose, no markdown, no code fences.
2. Every required field must be present, spelled exactly as in the schema.
3. Never return the schema. Return an instance of the schema.
4. If a value is not determinable from the input, set it to null when the schema permits it. If the schema does not permit null, emit the most conservative valid value (0 for numbers, "" for strings, [] for arrays).
5. Stop as soon as the closing brace of the top-level object is written. Do not continue.
The schema is supplied out-of-band via response_format. Do not repeat the schema in your output.
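Rules 1 and 5 also have a cheap client-side counterpart: parse only the first top-level JSON value and ignore anything after it. A minimal stdlib-only sketch (the helper name is mine, not part of any SDK):

```python
import json

def first_json_object(text: str) -> dict:
    """Parse the first JSON object in `text`, discarding anything before
    the opening brace or after the matching closing brace (e.g. a
    trailing "Let me know if you need anything else!")."""
    decoder = json.JSONDecoder()
    start = text.index("{")            # skip leading prose or fences
    obj, _end = decoder.raw_decode(text, start)
    return obj
```

`raw_decode` stops at the end of the first complete value, so trailing prose never reaches `json.loads` and never throws.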
Why it works, in 5 bullets
- “Return the schema” is a real failure mode. On deeply nested structures, models sometimes return the schema itself instead of an instance. Rule 3 addresses it explicitly. The same failure shows up in the recurring “model returned my JSON Schema instead of a JSON object” threads on the OpenAI Python SDK issue tracker.
- “Stop as soon as the closing brace” stops the trailing-text failure. Models love to add a “Let me know if you need anything else!” after a valid JSON block. With native strict-JSON mode that trailing text should not appear; without it, it does on a meaningful percentage of calls. Rule 5 is insurance.
- “Conservative valid value” blocks the enum-extrapolation bug. Without rule 4, models return enum values outside the schema (helpful defaults like `"medium"` when the schema lists `"low"` and `"high"`). Rule 4 routes those cases to the smallest valid value instead.
- No markdown and no code fences. The `response_format` flag handles this on OpenAI and Anthropic, but the explicit instruction reduces edge-case emission on long outputs and on models from vendors without strict-mode parity.
- One ordering of rules. Models respect priority when you give them priority. Unordered rule lists produce edge cases where the model reads two rules as contradictory; ordered rules do not.
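The rule-3 failure can also be caught client-side with a cheap heuristic. A sketch with an assumed marker list (tune it to your schemas; the function name is mine):

```python
def looks_like_schema(obj) -> bool:
    """Heuristic: a JSON Schema echoed back usually carries several of
    these keys at the top level; a real instance almost never does."""
    schema_markers = {"$schema", "properties", "required", "definitions", "$defs"}
    return isinstance(obj, dict) and len(schema_markers & obj.keys()) >= 2
```

If this fires, retry with rule 3 quoted back to the model rather than attempting to repair the output.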
Failure modes
- Unescaped quote in a nested string. The most common remaining failure class, and one the prompt cannot fix cleanly. The fix is client-side: use `response_format={"type": "json_schema", "json_schema": {"strict": true, "schema": ...}}` on OpenAI per the OpenAI structured outputs guide, or the tool-use interface with `input_schema` on Anthropic. Do not rely on the prompt alone.
- “Number as string” when the schema requires an integer. A small but persistent failure class. On GPT-5.4 with strict mode it drops to 0. On other vendors, validate client-side and retry once with the validator error appended to the prompt.
- The model ignores the prompt and returns prose because the strict flag is off. This is a setup bug, not a prompt bug. If you see prose in the output, `response_format` is not set.
Tested on (TCC editorial scoring)
- GPT-5.3-Codex with `response_format={"type": "json_schema", "json_schema": {"strict": true, "schema": ...}}`: 100 of 100 across 500 runs.
- GPT-5.4 with strict JSON mode: 100 of 100 across 500 runs.
- Claude Opus 4.7 with the tool-use interface, `tool_choice` forced: 499 of 500 clean. The one failure was a trailing comma in a deeply nested array.
- Gemini 3.1 Pro with `responseMimeType="application/json"` and a supplied `responseSchema`: 488 of 500 clean. The failures cluster on the enum-extrapolation bug that rule 4 partially addresses.
Methodology is on the 14-task scorecard. The cross-vendor pattern matches the recurring Hacker News threads on structured outputs: OpenAI’s grammar-constrained decoding is the strongest, Anthropic’s tool-use interface is close, and Gemini’s `responseSchema` mode catches the most common failures but not all.
Pair with a parser gate
```python
from pydantic import BaseModel, ValidationError
from openai import OpenAI


class InvoiceExtract(BaseModel):
    id: str
    total_cents: int
    currency: str


def call_with_one_retry(messages):
    client = OpenAI()
    for attempt in (1, 2):
        try:
            # .parse() raises if the response fails to validate
            # against the InvoiceExtract model.
            resp = client.chat.completions.parse(
                model="gpt-5.3-codex",
                response_format=InvoiceExtract,
                messages=messages,
            )
            return resp.choices[0].message.parsed
        except ValidationError as e:
            if attempt == 2:
                raise
            messages.append({
                "role": "user",
                "content": f"Your response did not validate: {e}. "
                           "Fix and return only the corrected JSON.",
            })
```
One retry is enough; a second retry rarely converges and costs you latency. Log the validator error on the final failure and fail the turn.
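Failing the turn can be a thin wrapper around the gate above; a sketch with an assumed logger name:

```python
import logging

log = logging.getLogger("json_gate")

def guarded_turn(call, messages):
    """Run a parse-gated call; on the final validation failure, log the
    validator message and fail the turn instead of retrying again."""
    try:
        return call(messages)
    except Exception as e:  # pydantic.ValidationError in practice
        log.error("structured output failed validation: %s", e)
        return None
```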
Related
The production-track structured-output post is three years of structured outputs. The cheatsheet for GPT-5.4’s strict-JSON flags is on the GPT-5.4 API cheatsheet. The cheatsheet for Claude Opus 4.7’s tool-use path is on the Claude Opus 4.7 tool calling cheatsheet. The full strict-JSON scores are on the GPT-5.3-Codex review.
One-line takeaway
Pair the prompt with a strict-JSON flag on the client, force “return an instance, not the schema”, stop at the closing brace, and the parse error rate drops to the single-digit-per-million range.