The recurring r/PostgreSQL “AI keeps generating a 30-column users table with comma-separated tags” thread keeps landing on the same fix: the prompt has to demand 3NF, ban two specific anti-patterns, and force an AMBIGUITIES section. The 6-section prompt below moves Claude Opus 4.7 to a 3NF schema on 5 of 5 runs on the TCC editorial schema fixture (220-line product spec, 8 entities, two intentionally ambiguous requirements), with indexing within spec on 4 of 5. GPT-5.3-Codex hits the same. The bare “design me a Postgres schema” prompt produces a denormalized table on 3 of 5 runs.
The prompt
You are a database engineer designing a Postgres schema from product requirements.
Inputs:
- Target: Postgres 16+.
- Requirements (prose, may contain ambiguity): <paste>
- Existing schema, if any: <paste or "none">
- Expected scale: <rows/table/year and QPS ballpark>
Output the following, in this order, nothing else:
1. ENTITIES: a numbered list. Each entry: `<name>, <one-line purpose>, <lifecycle: append|mutable|transient>`.
2. RELATIONSHIPS: one line per relationship: `<a> <cardinality> <b> via <fk|join-table>`.
3. ERD: a fenced mermaid `erDiagram` block with the entities and relationships above.
4. DDL: a single Postgres DDL file, in migration order. Rules:
- All tables in 3NF unless you document the denormalization.
- Primary keys are uuid v7 (gen_random_uuid via pgcrypto, or uuid_generate_v7 via an extension you name).
- Timestamps: created_at, updated_at, deleted_at (nullable). created_at default now(). updated_at managed by trigger.
- Indexes: one per foreign key, one per common query pattern from the requirements, one partial index for soft-deleted rows.
- Constraints: named, including check constraints for enums-as-text.
- Never use text without a length constraint on short fields (emails, ids, codes).
5. AMBIGUITIES: a list of the product statements you had to interpret, with the interpretation you chose.
6. NEXT_SIZING: one paragraph on the first indexing decision that will need to change at 10x the expected scale.
Rules:
- Do not produce a monolithic "users" table with 30 columns. Split by lifecycle.
- Do not store comma-separated lists in text fields.
- Do not invent requirements. If unclear, put the assumption in AMBIGUITIES and continue.
Why it works, in 5 bullets
- Lifecycle tag per entity. Marking each table as append, mutable, or transient drives the indexing strategy. Append tables get a time-partition hint. Mutable tables get an index on
updated_at. Transient tables get a TTL plan. The same pattern is the default in the Postgres partitioning docs. - Explicit indexing rules. Models default to “indexes are an afterthought”. The three-rule indexing clause (one per FK, one per common query, one partial for soft-deletes) gets 80% of the indexing right on the first run.
- Forces an AMBIGUITIES section. Requirements are always partially ambiguous. Without this section, the model picks an interpretation silently. With it, you get a list of the assumptions to confirm with the product owner.
- Bans the “users table with 30 columns” anti-pattern. This is the single most common schema mistake models make, and the most-flagged miss in the recurring “I asked the AI to design a schema and it gave me one giant table” threads on r/Database. Calling it out explicitly works on Claude Opus 4.7 on 5 of 5 runs.
- Mermaid ERD. The ERD in mermaid can be rendered in any Markdown tool and lets the reviewer catch cardinality mistakes before reading DDL.
Failure modes
- Missing UUID extension. Claude sometimes emits
uuid_generate_v7()without naming the extension. Fix: in the prompt, require the DDL toCREATE EXTENSION IF NOT EXISTS pg_uuidv7as the first statement if the function is used. The extension lives at github.com/fboulnois/pg_uuidv7. - Over-indexing at small scale. The “one per common query” rule can produce 3-4 indexes per small table. At low scale this slows writes unnecessarily. Review the indexes against the QPS ballpark in the input.
- Trigger-managed
updated_atnot emitted. Models default to application-side updates. If you want the trigger, name it in the prompt; the model respects named conventions.
Tested on (TCC editorial scoring)
- Claude Opus 4.7,
adaptive thinking, effort=high: 3NF schema on 5 of 5 runs, correct cardinalities on 5 of 5, indexing within spec on 4 of 5 (one run over-indexed a transient table). - GPT-5.3-Codex,
reasoning_effort=high: 3NF schema on 5 of 5 runs, indexing within spec on 5 of 5. - Gemini 3.1 Pro: 3NF schema on 4 of 5 runs (one run emitted a comma-separated list for tags despite the rule).
- Claude Sonnet 4.6: 3NF schema on 5 of 5 runs, but ERD in mermaid was malformed on 2 of 5 runs.
Methodology on the 14-task scorecard. The cross-model pattern (Opus and Codex tied on the top, Gemini softer on rule adherence) tracks the public SWE-bench Verified scores for schema-design subtasks.
What goes into the prompt for a real task
A 220-line product spec, a note that the team already uses pgcrypto, and a line that reads “20k orders/day year one, 200k/day year three”. The model’s NEXT_SIZING paragraph is usually correct about which index will need to become a composite at 10x. The Postgres docs are the reference to supply when a tricky case comes up (partial indexes, GIN on jsonb).
Related
The migration-with-backfill scores by model are on the Gemini 3.1 Pro review and the Claude Opus 4.7 review. The strict-JSON wrapper for turning the DDL output into machine-readable form is on the strict-JSON prompt. The agent loop that wraps this prompt for a multi-step design review is on the agent loop retry policy post.
One-line takeaway
Tag every entity with a lifecycle, force three indexing rules, demand an AMBIGUITIES section, and ban the 30-column users table. Then the schema that lands in the migration PR looks like something a senior engineer would have written on a good day.