§ PROMPT · APR 23, 2026 · CLAUDE · CODE-REVIEW · REVIEW v1.0

Architecture-level code review prompt: the one that catches 3 real issues and skips the false positive

The architecture-review prompt that flagged 3 real issues and ignored a planted false-positive trap on my 600-line PR. What to include, what to remove, and the 4 models I tested.
Adrian Marcus. Working engineer. Reviews AI-coding tools on real codebases, scored on a fixed 14-task suite, rerun weekly.
  4 min read
# REVIEW · claude-opus-4-7
You are reviewing a pull request in a large TypeScript codebase.
You will receive: the diff, the full contents of every file in the diff, and
the file tree.

Your output is three sections:
1. Does the change achieve its stated intent?
2. What invariants in the surrounding module does it break?
3. Three smallest fixes, ranked by impact.

The recurring r/ChatGPTCoding “AI PR review keeps flagging style nits and missing the auth bug” thread has a consistent answer: the default “review this PR” prompt does not work, and the fix is to anchor the model on production impact and force a non-issues section. The prompt below catches 3 of 3 real issues on the TCC code-review fixture (a 600-line PR with a missing auth check, an N+1 query, a race condition, and a planted false-positive trap) and does not fall for the trap on any of 5 runs with Claude Opus 4.7. The bare “review this PR” prompt catches 1-2 real issues and falls for the trap in 3 of 5 runs.

The prompt

You are a staff engineer reviewing a pull request. You have one job: flag issues that will cause a production incident within 90 days.

Output format, exact:
---
ISSUES:
- [severity: high|medium|low] [file:line] <one-sentence description> <why it will break>
...
NON_ISSUES_CONSIDERED:
- [file:line] <what I thought might be wrong> <why I concluded it is fine>
---

Rules:
1. Do not comment on style, naming, or formatting unless it causes a correctness issue.
2. Prefer high-confidence issues. If you are less than 70% sure, list it under NON_ISSUES_CONSIDERED.
3. For every issue flagged, name the production scenario that triggers it.
4. Cap the output at 8 ISSUES. If you find more, keep the highest severity.
5. Read the PR end to end before emitting the first bullet. Do not stream.
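
Because the format is exact, the reply can be contract-checked before anything is posted. A minimal sketch of such a validator, assuming the raw reply arrives as a string (the regex just mirrors the format above; none of this is from the TCC tooling):

# Sketch: reject replies that drift from the output contract instead
# of posting a malformed review to the PR thread.
import re

ISSUE_RE = re.compile(
    r"^- \[severity: (?:high|medium|low)\] \[[^\]]+:\d+\] .+", re.M)

def validate(reply: str) -> list[str]:
    errors = []
    parts = reply.split("---")
    if len(parts) < 3:
        return ["missing --- delimiters"]
    inner = parts[1]
    if "ISSUES:" not in inner or "NON_ISSUES_CONSIDERED:" not in inner:
        errors.append("missing a required section header")
    issues = ISSUE_RE.findall(inner.split("NON_ISSUES_CONSIDERED:")[0])
    if len(issues) > 8:
        errors.append(f"{len(issues)} issues flagged, cap is 8")  # rule 4
    return errors

One reasonable policy: if validate() returns errors, re-prompt once with the error list appended, then fail the job rather than post free-form output.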

Why it works, in 5 bullets

- The 90-day incident horizon gives the model a concrete filter: a style nit cannot cause an incident, so it never makes the list.
- NON_ISSUES_CONSIDERED gives low-confidence suspicions somewhere to go besides ISSUES; this is what keeps the planted trap out of the findings.
- The 70% threshold makes the demotion rule explicit instead of leaving it to the model's mood on a given run.
- Naming the production scenario per issue forces a causal chain; findings that cannot produce one die in drafting.
- The 8-issue cap and the read-before-emitting rule push the model to rank by severity across the whole PR instead of narrating the diff top to bottom.

Failure modes

- Rule 2 cuts both ways: a real bug the model is only 60% sure about gets demoted to NON_ISSUES_CONSIDERED, so skim that section locally before trusting a clean ISSUES list.
- Diffs past the 80k-token split lose cross-chunk context; an issue whose cause and symptom land in different chunks can be missed entirely.

Tested on (TCC editorial scoring)

Methodology follows the 14-task scorecard. The cross-model pattern (Anthropic leads, OpenAI close behind, Gemini trailing on flow-sensitive tasks) is consistent with the public SWE-bench Pro leaderboard.
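
The per-run scoring is mechanical: a run scores a catch when a planted file:line shows up in the ISSUES section, and fails the trap test when the trap's location does. A minimal sketch of that check, assuming hypothetical planted locations and a run_review() helper that returns one raw model reply (neither is from the TCC harness):

# Hypothetical scoring loop for the fixture. File paths, line numbers,
# and run_review() are illustrative assumptions, not the TCC tooling.
PLANTED = [
    "src/api/session.ts:41",   # missing auth check
    "src/db/orders.ts:88",     # N+1 query
    "src/jobs/sync.ts:130",    # race condition
]
TRAP = "src/util/retry.ts:12"  # planted false positive: looks wrong, isn't

def score_run(raw_reply: str) -> tuple[int, bool]:
    """Return (real issues caught, whether the run fell for the trap)."""
    # Only bullets in ISSUES count; NON_ISSUES_CONSIDERED is a free pass.
    issues_section = raw_reply.split("NON_ISSUES_CONSIDERED:")[0]
    caught = sum(loc in issues_section for loc in PLANTED)
    return caught, TRAP in issues_section

# Five runs per prompt variant:
# results = [score_run(run_review(prompt, fixture_pr)) for _ in range(5)]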

Wiring it into a PR workflow

# .github/workflows/ai-review.yml
on: pull_request
permissions:
  pull-requests: write  # lets the github-script step comment on the PR
jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }  # full history so origin/main...HEAD resolves
      - run: |
          diff=$(git diff --unified=5 origin/main...HEAD)
          echo "$diff" | python scripts/ai_review.py > review.md
        env:
          # assumes ai_review.py reads its API key from the environment
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('review.md', 'utf8');
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body
            });

The ai_review.py script wraps the prompt above, splits long diffs at 80k tokens, and concatenates the ISSUES sections before posting. Keep the NON_ISSUES_CONSIDERED section local for debugging; posting all of it clutters the PR thread.
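
A minimal sketch of what that script could look like, assuming the Anthropic Python SDK, the prompt stored at a hypothetical prompts/review.txt, and a rough four-characters-per-token heuristic for the 80k split (the production script and its retry handling are not shown):

# scripts/ai_review.py -- sketch only; paths, model string, and the
# token heuristic are assumptions, not the script described above.
import sys
import anthropic

PROMPT = open("prompts/review.txt").read()  # the staff-engineer prompt
MAX_CHARS = 80_000 * 4                      # ~80k tokens at ~4 chars/token

def review(client: anthropic.Anthropic, chunk: str) -> str:
    msg = client.messages.create(
        model="claude-opus-4-7",            # model name as used in this post
        max_tokens=2048,
        system=PROMPT,
        messages=[{"role": "user", "content": chunk}],
    )
    return msg.content[0].text

def main() -> None:
    diff = sys.stdin.read()
    # Split on file boundaries so no single hunk straddles two chunks.
    parts = diff.split("\ndiff --git ")
    file_diffs = [parts[0]] + ["diff --git " + p for p in parts[1:]]
    chunks, cur = [], ""
    for fd in file_diffs:
        if cur and len(cur) + len(fd) > MAX_CHARS:
            chunks.append(cur)
            cur = ""
        cur += ("\n" if cur else "") + fd
    if cur:
        chunks.append(cur)
    client = anthropic.Anthropic()          # reads ANTHROPIC_API_KEY
    replies = [review(client, c) for c in chunks]
    # Post only the ISSUES sections; NON_ISSUES_CONSIDERED stays local.
    print("\n".join(r.split("NON_ISSUES_CONSIDERED:")[0] for r in replies))

if __name__ == "__main__":
    main()

Splitting on file boundaries keeps hunks intact, at the cost that an issue whose cause and symptom sit in different files can land in different chunks.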

The 14-task methodology is on the editorial process page. The retry policy for the API call wrapping this prompt is covered in the agent loop retry policy post. The model that currently posts the best code-review score on the TCC suite is Claude Opus 4.7.

One-line takeaway

Anchor the model on a 90-day incident horizon, force a non-issues section, cap output at 8, and the AI review stops flagging style nits and starts catching the missing auth check.
