§ PROMPT · APR 23, 2026 · GPT-5 · PROPERTY · TESTING v1.0

Property-based test generation prompt: 6 invariants on the first run

The prompt that writes 6 Hypothesis invariants for a JSON-diff library on the first run, with shrink strategies. Tested on GPT-5.3-Codex, Claude Opus 4.7, and Aider.
Adrian Marcus. Working engineer. Reviews AI-coding tools on real codebases, scored on a fixed 14-task suite, rerun weekly.
# TESTING · gpt-5.3-codex

The recurring r/Python and r/Hypothesis “model wrote example tests when I asked for property-based” thread has a structural answer: the prompt has to demand a count, ban example-based tests, and force a recursive shrink strategy on nested data. The 7-rule prompt below moves Claude Opus 4.7 to 6 of 6 invariants on the TCC editorial property-test fixture (a JSON-diff library with 6 documented invariants), with shrink strategies and a state machine for the stateful invariant. GPT-5.3-Codex hits the same score. The bare “write me property-based tests” prompt produces 4 of 6 invariants and skips the recursion shrinker on most runs, across both models.

The prompt

Write property-based tests for the library below, in Python, using Hypothesis.

Inputs:
- Source file: <paste source here or reference by path>
- Function under test: <name>
- Public contract (plain English): <2-4 lines>

Produce exactly this output:

1. A list of 6 invariants, numbered. Each invariant is one sentence.
2. For each invariant, a @given-decorated test function. Name the test `test_prop_<short_name>`.
3. Include shrink-friendly strategies. For recursive or nested inputs, use `st.recursive` with a size cutoff (max_leaves=8) and compose with shared structures where the invariant requires shared identity.
4. Use `@settings(max_examples=200, deadline=timedelta(seconds=1))`. Adjust only if justified in a comment.
5. If any invariant is not provable as a pure property (requires a model), emit it as a stateful RuleBasedStateMachine instead, not as a @given test. Mark it clearly in a comment.
6. Do not write example-based tests. Do not write docstrings longer than one line per test.
7. At the end, emit a single line: `# Run: pytest -q && pytest --hypothesis-profile=ci`
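
To make rules 3–5 concrete, here is a minimal sketch of the output shape the prompt asks for. The fixture is not public, so `jsondiff_fixture`, `diff`, and `patch` are placeholder names, and the roundtrip property stands in for whichever of the 6 documented invariants your library exposes.

```python
from copy import deepcopy
from datetime import timedelta

from hypothesis import given, settings, strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule

# Placeholder import: the TCC fixture is not public; swap in your
# library's real diff/patch entry points.
from jsondiff_fixture import diff, patch

# Rule 3: bounded recursive JSON values, shrink-friendly.
json_values = st.recursive(
    st.none() | st.booleans() | st.integers() | st.text(max_size=8),
    lambda children: st.lists(children, max_size=4)
    | st.dictionaries(st.text(max_size=4), children, max_size=4),
    max_leaves=8,
)


# Invariant 1: applying diff(a, b) to a reproduces b.
@settings(max_examples=200, deadline=timedelta(seconds=1))
@given(a=json_values, b=json_values)
def test_prop_patch_roundtrip(a, b):
    assert patch(a, diff(a, b)) == b
```

And the rule 5 route for an invariant that spans a sequence of edits, again under the same placeholder API:

```python
# Rule 5: stateful invariant as a RuleBasedStateMachine. After every
# mutation, diff(before, after) must patch `before` into `after`.
class DiffPatchMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.doc = {}

    @rule(key=st.text(max_size=4), value=json_values)
    def set_key(self, key, value):
        before = deepcopy(self.doc)
        self.doc[key] = value
        assert patch(before, diff(before, self.doc)) == self.doc

    @rule(key=st.text(max_size=4))
    def delete_key(self, key):
        before = deepcopy(self.doc)
        self.doc.pop(key, None)
        assert patch(before, diff(before, self.doc)) == self.doc


TestDiffPatchStateful = DiffPatchMachine.TestCase
```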

Why it works, in 5 bullets

- Demanding exactly 6 invariants stops the model from quitting at the easy 4; the count is the contract.
- Banning example-based tests (rule 6) closes the default failure mode from the Reddit threads: unit tests wearing a @given decorator.
- Forcing `st.recursive` with max_leaves=8 (rule 3) is what produces the recursion shrinker the bare prompt skips.
- Routing non-pure invariants to a RuleBasedStateMachine (rule 5) yields a real stateful test instead of a property that quietly assumes purity.
- Pinning @settings (rule 4) bounds runtime, so the generated suite runs as-is on the first pass.

Failure modes

- Drop rule 6 and both models drift back to example-based tests on nested-input tasks, the exact pattern from the Reddit threads.
- Drop rule 3 and the recursion shrinker is skipped on most runs, so failures on nested inputs don't shrink to readable counterexamples.
- Drop rule 5 and the stateful invariant comes back as a @given test that quietly assumes purity.

Tested on (TCC editorial scoring)

- Claude Opus 4.7: 6 of 6 invariants, shrink strategies, state machine for the stateful invariant.
- GPT-5.3-Codex: 6 of 6, same output shape.
- Bare-prompt baseline, both models: 4 of 6 invariants, recursion shrinker skipped on most runs.

Methodology is on the 14-task scorecard. The pattern (Opus and Codex tied at the top, Gemini behind on stateful tasks) is consistent with the recurring r/MachineLearning thread on PBT generation across frontier models in early 2026.

TypeScript variant

For TypeScript, swap the prompt’s stack section:

Use fast-check, TypeScript strict mode, jest runner. Replace:
- @given -> fc.property
- @settings(max_examples=200, deadline=timedelta(seconds=1)) -> { numRuns: 200, timeout: 1000 }
- RuleBasedStateMachine -> fast-check model-based testing: fc.commands + fc.modelRun

The same rules apply. The shrink-strategy line maps to fc.letrec for recursive inputs, and the deadline maps to timeout. See the fast-check docs for the API. The TCC fixture has not been ported to TypeScript yet; results may shift.
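
As a sketch of the mapped variant, assuming the same placeholder diff/patch pair (`./jsondiff` is not a real package) and jest globals:

```typescript
import fc from "fast-check";
// Placeholder import: the fixture has not been ported; swap in your own.
import { diff, patch } from "./jsondiff";

// fc.letrec plays the role of st.recursive: bounded nested JSON.
const { json } = fc.letrec((tie) => ({
  json: fc.oneof(
    { maxDepth: 3 }, // depth cutoff, the analogue of max_leaves=8
    fc.constant(null),
    fc.boolean(),
    fc.integer(),
    fc.string({ maxLength: 8 }),
    fc.array(tie("json"), { maxLength: 4 }),
    fc.dictionary(fc.string({ maxLength: 4 }), tie("json"), { maxKeys: 4 })
  ),
}));

// Invariant 1 again: applying diff(a, b) to a reproduces b.
test("prop: patch roundtrip", () => {
  fc.assert(
    fc.property(json, json, (a, b) => {
      expect(patch(a, diff(a, b))).toEqual(b);
    }),
    { numRuns: 200 } // the @settings(max_examples=200) mapping above
  );
});
```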

The deterministic eval harness that uses property-based tests as graders is in the evals-without-judges post. The strict-JSON prompt for your test-gen harness's output schema is in the strict-JSON post. The per-model test-gen scores are in the GPT-5.3-Codex review and the Claude Opus 4.7 review.

One-line takeaway

Ask for exactly 6 invariants, force st.recursive on nested inputs, route stateful invariants to a state machine, and the first run produces a PBT suite you can ship.
