The recurring r/PostgreSQL “AI keeps generating a 30-column users table with comma-separated tags” thread keeps landing on the same fix: the prompt has to demand 3NF, ban two specific anti-patterns, and force an AMBIGUITIES section. The 6-section prompt below moves Claude Opus 4.7 to a 3NF schema on 5 of 5 runs on the TCC editorial schema fixture (220-line product spec, 8 entities, two intentionally ambiguous requirements), with indexing within spec on 4 of 5. GPT-5.3-Codex hits the same. The bare “design me a Postgres schema” prompt produces a denormalized table on 3 of 5 runs.
Why AI generates bad schemas by default
AI models are trained on millions of code examples, and a large fraction of those are tutorials, quick demos, and prototypes. Tutorial code optimizes for getting something working quickly, not for production readiness. So when you ask an AI to “create a database for a SaaS app,” you get exactly what a tutorial would show you: everything in a single table, no indexes, string fields where enums should be, no foreign key constraints.
The most common AI schema problems: denormalized monoliths (a “users” table with 30 columns spanning identity, billing, and preferences), comma-separated lists in text fields instead of join tables, missing indexes on foreign keys, and timestamps without timezone awareness. The AI is not being lazy. It is pattern-matching against the most common examples in its training data.
The fix is to prompt with the constraints that production databases actually need. Five sections eliminate five categories of mistakes.
The prompt
You are a database engineer designing a Postgres schema from product requirements.
Inputs:
- Target: Postgres 16+.
- Requirements (prose, may contain ambiguity): <paste>
- Existing schema, if any: <paste or "none">
- Expected scale: <rows/table/year and QPS ballpark>
Output the following, in this order, nothing else:
1. ENTITIES: a numbered list. Each entry: `<name>, <one-line purpose>, <lifecycle: append|mutable|transient>`.
2. RELATIONSHIPS: one line per relationship: `<a> <cardinality> <b> via <fk|join-table>`.
3. ERD: a fenced mermaid `erDiagram` block with the entities and relationships above.
4. DDL: a single Postgres DDL file, in migration order. Rules:
- All tables in 3NF unless you document the denormalization.
- Primary keys are uuid v7 (gen_random_uuid via pgcrypto, or uuid_generate_v7 via an
extension you name).
- Timestamps: created_at, updated_at, deleted_at (nullable). created_at default now().
updated_at managed by trigger.
- Indexes: one per foreign key, one per common query pattern from the requirements,
one partial index for soft-deleted rows.
- Constraints: named, including check constraints for enums-as-text.
- Never use text without a length constraint on short fields (emails, ids, codes).
5. AMBIGUITIES: a list of the product statements you had to interpret, with the
interpretation you chose.
6. NEXT_SIZING: one paragraph on the first indexing decision that will need to change at
10x the expected scale.
Rules:
- Do not produce a monolithic "users" table with 30 columns. Split by lifecycle.
- Do not store comma-separated lists in text fields.
- Do not invent requirements. If unclear, put the assumption in AMBIGUITIES and continue.
Example: prompt applied to a B2B SaaS spec
A one-paragraph spec: “Organizations can invite members. Members can be owners, admins, or regular members. Each organization subscribes to a plan. Plans have a seat limit. Subscriptions can be active, past due, trialing, or canceled. Organizations get a slug for their dashboard URL.”
ENTITIES output:
1. users, identity and auth record, mutable
2. organizations, tenant account, mutable
3. memberships, links users to organizations with roles, mutable
4. plans, subscription tier definitions, append
5. subscriptions, active plan assignment per org, mutable
6. invitations, pending org invites, transient
Key DDL excerpt:
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE TYPE membership_role AS ENUM ('owner', 'admin', 'member');
CREATE TYPE subscription_status AS ENUM ('active', 'past_due', 'trialing', 'canceled');
CREATE TABLE organizations (
id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
name varchar(120) NOT NULL,
slug varchar(63) NOT NULL,
created_at timestamptz DEFAULT now() NOT NULL,
updated_at timestamptz DEFAULT now() NOT NULL,
deleted_at timestamptz,
CONSTRAINT organizations_slug_unique UNIQUE (slug),
CONSTRAINT organizations_slug_format CHECK (slug ~ '^[a-z0-9-]+$')
);
CREATE TABLE memberships (
id uuid DEFAULT gen_random_uuid() PRIMARY KEY,
user_id uuid NOT NULL REFERENCES users(id) ON DELETE CASCADE,
organization_id uuid NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
role membership_role NOT NULL DEFAULT 'member',
created_at timestamptz DEFAULT now() NOT NULL,
updated_at timestamptz DEFAULT now() NOT NULL,
CONSTRAINT memberships_unique_user_org UNIQUE (user_id, organization_id)
);
-- Partial index: only one owner per org
CREATE UNIQUE INDEX memberships_one_owner_per_org
ON memberships (organization_id)
WHERE role = 'owner';
CREATE INDEX idx_memberships_user_id ON memberships (user_id);
CREATE INDEX idx_memberships_organization_id ON memberships (organization_id);
CREATE INDEX idx_memberships_deleted ON memberships (deleted_at) WHERE deleted_at IS NOT NULL;
AMBIGUITIES output:
1. "Seat limit" — interpreted as plans.max_seats (integer). Enforcement assumed at the
application layer; not modeled as a DB constraint because it depends on active membership
count which requires a count query.
2. "Organizations get a slug" — interpreted as required at creation, not optional.
Assumed lowercase alphanumeric with hyphens (URL-safe).
Why it works, in 5 bullets
- Lifecycle tag per entity. Marking each table as append, mutable, or transient drives the indexing strategy. Append tables get a time-partition hint. Mutable tables get an index on
updated_at. Transient tables get a TTL plan. The same pattern is the default in the Postgres partitioning docs. - Explicit indexing rules. Models default to “indexes are an afterthought”. The three-rule indexing clause (one per FK, one per common query, one partial for soft-deletes) gets 80% of the indexing right on the first run.
- Forces an AMBIGUITIES section. Requirements are always partially ambiguous. Without this section, the model picks an interpretation silently. With it, you get a list of the assumptions to confirm with the product owner before migration day.
- Bans the “users table with 30 columns” anti-pattern. This is the single most common schema mistake models make. Calling it out explicitly works on Claude Opus 4.7 on 5 of 5 runs and GPT-5.3-Codex on 5 of 5 runs.
- Mermaid ERD. The ERD in mermaid can be rendered in any Markdown tool and lets the reviewer catch cardinality mistakes before reading DDL.
Including query patterns in the input
The “one index per common query pattern” rule only works if the input tells the model which queries matter. Add a QUERY PATTERNS section to your requirements paste:
QUERY PATTERNS:
- Look up user by email (authentication, high frequency)
- List all members of an organization with their roles
- Get the active subscription for an organization
- List all organizations a user belongs to
- Filter subscriptions by status across all organizations (admin panel, low frequency)
Without this, the model generates tables with no indexes. With it, every query pattern listed above maps to at least one index in the DDL output. The “high frequency” and “low frequency” labels let the model prioritize composite indexes correctly.
Migration-aware prompting
When the schema needs to evolve (and it will), add migration context:
MIGRATION CONTEXT:
- This is an ALTER migration, not a fresh schema
- The users and organizations tables already exist (paste current DDL)
- Add a new "invitations" table for pending org invites
- Add an "invited_by" column to memberships
- Preserve all existing data
- Include both UP and DOWN migration SQL
- Use IF NOT EXISTS and IF EXISTS guards
The key phrase is “preserve all existing data.” Without it, the model sometimes generates DROP TABLE statements or column changes that would destroy data. Specifying UP and DOWN migrations gives you rollback capability.
Failure modes
- Missing UUID extension. Claude sometimes emits
uuid_generate_v7()without naming the extension. Fix: require the DDL toCREATE EXTENSION IF NOT EXISTS pg_uuidv7as the first statement if the function is used. The extension is at github.com/fboulnois/pg_uuidv7. - Over-indexing at small scale. The “one per common query” rule can produce 3-4 indexes per small table. At low scale this slows writes unnecessarily. Review the indexes against the QPS ballpark in the input.
- Trigger-managed
updated_atnot emitted. Models default to application-side updates. If you want the trigger, name it explicitly in the prompt; the model respects named conventions. - Gemini 3.1 Pro emits comma-separated lists despite the rule. Seen in 1 of 5 runs on the TCC fixture. Add “Do not store comma-separated lists in text fields” as a bold rule in the input and it drops to 0 of 5.
Tested on (TCC editorial scoring)
- Claude Opus 4.7,
adaptive thinking, effort=high: 3NF schema on 5 of 5 runs, correct cardinalities on 5 of 5, indexing within spec on 4 of 5 (one run over-indexed a transient table). - GPT-5.3-Codex,
reasoning_effort=high: 3NF schema on 5 of 5 runs, indexing within spec on 5 of 5. - Gemini 3.1 Pro: 3NF schema on 4 of 5 runs (one run emitted a comma-separated list for tags despite the rule).
- Claude Sonnet 4.6: 3NF schema on 5 of 5 runs, but ERD in mermaid was malformed on 2 of 5 runs.
Methodology on the 14-task scorecard. The cross-model pattern (Opus and Codex tied on the top, Gemini softer on rule adherence) tracks the public SWE-bench Verified scores for schema-design subtasks.
Frequently asked questions
Should I use the schema the AI generates directly in production?
Treat it as a first draft from a competent junior engineer. Review the AMBIGUITIES section with your product owner, check the indexing against your actual query patterns, and run EXPLAIN ANALYZE on the critical paths before deploying. The AI gets structure and normalization right most of the time; cardinality edge cases and business logic constraints need a human review pass.
What if my schema already exists?
Paste the existing DDL in the “Existing schema, if any” field. The model will output only the delta and put existing table assumptions in AMBIGUITIES. Combine with the migration context section for ALTER migrations.
How do I handle Supabase or PlanetScale?
For Supabase, add a SUPABASE section to the input specifying Row Level Security policies per table and auth.uid() as the reference for current user. For PlanetScale (MySQL-compatible, no foreign keys at the DB layer), add “do not emit FOREIGN KEY constraints; enforce referential integrity at the application layer”.
Can the model handle JSONB columns correctly?
Yes, but you have to tell it when JSONB is appropriate. Add a note like “the features column on plans stores a variable set of feature flags; JSONB is acceptable here” and the model will add a GIN index on the JSONB column for containment queries. Without the note, it defaults to text or a separate features table.
Related
The migration-with-backfill scores by model are on the Gemini 3.1 Pro review and the Claude Opus 4.7 review. The strict-JSON wrapper for turning the DDL output into machine-readable form is on the strict-JSON prompt. The agent loop that wraps this prompt for a multi-step design review is on the agent loop retry policy post.
One-line takeaway
Tag every entity with a lifecycle, include your query patterns in the input, force three indexing rules, demand an AMBIGUITIES section, and ban the 30-column users table. Then the schema that lands in the migration PR looks like something a senior engineer would have written on a good day.