Schema design prompt: normalized Postgres schema from 220 lines of prose

The recurring r/PostgreSQL “AI keeps generating a 30-column users table with comma-separated tags” thread keeps landing on the same fix: the prompt has to demand 3NF, ban two specific anti-patterns, and force an AMBIGUITIES section. The 6-section prompt below moves Claude Opus 4.7 to a 3NF schema on 5 of 5 runs on the TCC editorial schema fixture (220-line product spec, 8 entities, two intentionally ambiguous requirements), with indexing within spec on 4 of 5. GPT-5.3-Codex hits the same. The bare “design me a Postgres schema” prompt produces a denormalized table on 3 of 5 runs.

The prompt

You are a database engineer designing a Postgres schema from product requirements.

Inputs:
- Target: Postgres 16+.
- Requirements (prose, may contain ambiguity): <paste>
- Existing schema, if any: <paste or "none">
- Expected scale: <rows/table/year and QPS ballpark>

Output the following, in this order, nothing else:

1. ENTITIES: a numbered list. Each entry: `<name>, <one-line purpose>, <lifecycle: append|mutable|transient>`.
2. RELATIONSHIPS: one line per relationship: `<a> <cardinality> <b> via <fk|join-table>`.
3. ERD: a fenced mermaid `erDiagram` block with the entities and relationships above.
4. DDL: a single Postgres DDL file, in migration order. Rules:
   - All tables in 3NF unless you document the denormalization.
   - Primary keys are uuid v7 (gen_random_uuid via pgcrypto, or uuid_generate_v7 via an extension you name).
   - Timestamps: created_at, updated_at, deleted_at (nullable). created_at default now(). updated_at managed by trigger.
   - Indexes: one per foreign key, one per common query pattern from the requirements, one partial index for soft-deleted rows.
   - Constraints: named, including check constraints for enums-as-text.
   - Never use text without a length constraint on short fields (emails, ids, codes).
5. AMBIGUITIES: a list of the product statements you had to interpret, with the interpretation you chose.
6. NEXT_SIZING: one paragraph on the first indexing decision that will need to change at 10x the expected scale.

Rules:
- Do not produce a monolithic "users" table with 30 columns. Split by lifecycle.
- Do not store comma-separated lists in text fields.
- Do not invent requirements. If unclear, put the assumption in AMBIGUITIES and continue.

Why it works, in 5 bullets

Lifecycle tag per entity. Marking each table as append, mutable, or transient drives the indexing strategy. Append tables get a time-partition hint. Mutable tables get an index on updated_at. Transient tables get a TTL plan. The same pattern is the default in the Postgres partitioning docs.
Explicit indexing rules. Models default to “indexes are an afterthought”. The three-rule indexing clause (one per FK, one per common query, one partial for soft-deletes) gets 80% of the indexing right on the first run.
Forces an AMBIGUITIES section. Requirements are always partially ambiguous. Without this section, the model picks an interpretation silently. With it, you get a list of the assumptions to confirm with the product owner.
Bans the “users table with 30 columns” anti-pattern. This is the single most common schema mistake models make, and the most-flagged miss in the recurring “I asked the AI to design a schema and it gave me one giant table” threads on r/Database. Calling it out explicitly works on Claude Opus 4.7 on 5 of 5 runs.
Mermaid ERD. The ERD in mermaid can be rendered in any Markdown tool and lets the reviewer catch cardinality mistakes before reading DDL.

Failure modes

Missing UUID extension. Claude sometimes emits uuid_generate_v7() without naming the extension. Fix: in the prompt, require the DDL to CREATE EXTENSION IF NOT EXISTS pg_uuidv7 as the first statement if the function is used. The extension lives at github.com/fboulnois/pg_uuidv7.
Over-indexing at small scale. The “one per common query” rule can produce 3-4 indexes per small table. At low scale this slows writes unnecessarily. Review the indexes against the QPS ballpark in the input.
Trigger-managed updated_at not emitted. Models default to application-side updates. If you want the trigger, name it in the prompt; the model respects named conventions.

Tested on (TCC editorial scoring)

Claude Opus 4.7, adaptive thinking, effort=high: 3NF schema on 5 of 5 runs, correct cardinalities on 5 of 5, indexing within spec on 4 of 5 (one run over-indexed a transient table).
GPT-5.3-Codex, reasoning_effort=high: 3NF schema on 5 of 5 runs, indexing within spec on 5 of 5.
Gemini 3.1 Pro: 3NF schema on 4 of 5 runs (one run emitted a comma-separated list for tags despite the rule).
Claude Sonnet 4.6: 3NF schema on 5 of 5 runs, but ERD in mermaid was malformed on 2 of 5 runs.

Methodology on the 14-task scorecard. The cross-model pattern (Opus and Codex tied on the top, Gemini softer on rule adherence) tracks the public SWE-bench Verified scores for schema-design subtasks.

What goes into the prompt for a real task

A 220-line product spec, a note that the team already uses pgcrypto, and a line that reads “20k orders/day year one, 200k/day year three”. The model’s NEXT_SIZING paragraph is usually correct about which index will need to become a composite at 10x. The Postgres docs are the reference to supply when a tricky case comes up (partial indexes, GIN on jsonb).

The migration-with-backfill scores by model are on the Gemini 3.1 Pro review and the Claude Opus 4.7 review. The strict-JSON wrapper for turning the DDL output into machine-readable form is on the strict-JSON prompt. The agent loop that wraps this prompt for a multi-step design review is on the agent loop retry policy post.

One-line takeaway

Tag every entity with a lifecycle, force three indexing rules, demand an AMBIGUITIES section, and ban the 30-column users table. Then the schema that lands in the migration PR looks like something a senior engineer would have written on a good day.