§ PROMPT · APR 23, 2026 CLAUDE · DEBUG · STACKTRACE v1.0

Bug localization prompt: the 42-frame stack trace, root cause in 3 turns

The prompt that localizes a bug in a 42-frame stack trace to a single line in 3 turns, median. Tested on Claude Opus 4.7, GPT-5.3-Codex, and a senior engineer.
Adrian Marcus. Working engineer. Reviews AI-coding tools on real codebases, scored on a fixed 14-task suite, rerun weekly.
# DEBUG · claude-opus-4-7
Given this stack trace and this file tree, return the top 3 files most likely to contain the root cause.
For each, give a one-sentence rationale.

Output as a JSON array: [{"file": string, "rationale": string}]
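If you wire this card into tooling, the JSON contract is worth validating before you trust it. A minimal sketch in Python (the helper name and the sample reply are illustrative, not part of the prompt):

```python
import json

def parse_top_files(reply: str) -> list[dict]:
    # Validate the prompt's contract: a JSON array of
    # {"file": string, "rationale": string} objects, at most three.
    data = json.loads(reply)
    if not isinstance(data, list) or len(data) > 3:
        raise ValueError("expected a JSON array of up to 3 entries")
    for entry in data:
        if set(entry) != {"file", "rationale"}:
            raise ValueError(f"unexpected keys: {sorted(entry)}")
    return data

# Illustrative model reply:
reply = '[{"file": "src/orders/serializer.py", "rationale": "Writes the field that arrives null downstream."}]'
print(parse_top_files(reply)[0]["file"])  # src/orders/serializer.py
```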

The recurring r/ChatGPTCoding “the model blamed the throwing frame instead of the actual bug” thread keeps surfacing the same fix: a single-shot prompt does not work; a three-turn localize-verify-fix prompt does. The prompt below gets Claude Opus 4.7 to root cause in a median of 3 turns on the TCC editorial debug fixture (a 42-frame production NullReferenceException trace with a planted intermediate-frame trap), with a correct fix in 4 of 5 runs. The bare “help me debug this stack trace” prompt frequently fingers an intermediate frame and writes a defensive null check that hides the bug.

The prompt

You are a senior engineer localizing a production bug from a stack trace.

Inputs:
- Language: <python|typescript|go|rust>
- Stack trace (full): <paste>
- Recent code changes (since last green deploy): <paste or "none">
- Reproduction: <one line, "not reproducible" is allowed>

Output the following, in order, with nothing else:

TURN 1: Localize:
- ROOT_FRAME: <file:function:line>
- CONFIDENCE: 0-100
- WHY: one sentence

TURN 2: Verify:
- EVIDENCE: three independent signals that the ROOT_FRAME is correct. If you cannot find three, stop and say so.
- ALTERNATIVE_ROOTS: up to 2 other frames you considered, and why you rejected them.

TURN 3: Fix:
- PATCH: a minimal diff against the current code.
- TEST: one test that would have caught this bug, with the assertion that fails before the patch.
- ROLLBACK: one sentence on the fastest safe rollback plan.

Rules:
- Prefer the frame closest to the data corruption, not the frame where the exception threw.
- Do not suggest defensive null checks unless null is a legitimate domain value.
- If CONFIDENCE < 70, do not proceed to TURN 3. Ask for the missing log line instead.
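The CONFIDENCE gate in the last rule is mechanical enough to enforce in the harness instead of trusting the model to stop itself. A sketch, assuming the model followed the `- CONFIDENCE: 0-100` line format (the function name and the threshold default are mine, not from the prompt):

```python
import re

def confidence_gate(turn1_reply: str, threshold: int = 70) -> bool:
    """Return True if the harness should send the TURN 3 fix request.

    Treats a missing or malformed CONFIDENCE line as low confidence,
    so the safe path (ask for the missing log line) is the default.
    """
    match = re.search(r"CONFIDENCE:\s*(\d{1,3})", turn1_reply)
    if match is None:
        return False
    return int(match.group(1)) >= threshold
```

The point of defaulting to `False` is that a reply that drifts off-format gets the same treatment as a low-confidence one: no speculative patch.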

Why it works, in 5 bullets

- Splitting localize, verify, and fix into separate turns stops the model from patching before it has committed to a frame.
- Demanding three independent signals in TURN 2 forces real verification, and “stop and say so” gives the model an honest exit instead of a rationalization.
- The closest-to-the-data-corruption rule directly counters the most common miss: blaming the frame where the exception threw.
- The ban on defensive null checks blocks the patch that silences the symptom instead of fixing the bug.
- The CONFIDENCE < 70 gate converts a weak guess into a request for the missing log line rather than a speculative TURN 3.

Failure modes

- The bare “help me debug this stack trace” prompt fingers an intermediate frame and writes a defensive null check that hides the bug; the rules exist to block exactly that.
- TURN 3’s test can be correct but less readable than a hand-written one, as in the senior-engineer comparison below; review it before merging.
- When CONFIDENCE stays under 70, the prompt stalls by design and asks for a log line, so the 3-turn median assumes the inputs (trace, recent changes, reproduction) are actually complete.

Tested on (TCC editorial scoring)

Methodology on the 14-task scorecard. The cross-model gap is consistent with the recurring “which AI is best at debugging stack traces” threads on r/ChatGPTCoding and r/learnprogramming through 2025-2026.

Comparison against a senior engineer

As a sanity check, the same 42-frame trace was given to a senior engineer with 8 years on the codebase as a stopwatch baseline. He hit root cause in 3 minutes and a correct fix in 8 minutes. Claude Opus 4.7 with this prompt hit root cause in 42 seconds and a correct fix in 2 minutes 10 seconds. The model’s verification step was less convincing than his: he produced a reproducible unit test on the spot, while the model’s test would have failed correctly before the patch but was less readable than his hand-written version. That gap is what TCC tracks quarter over quarter as the human-vs-model debugging delta narrows.

The debug scores by model are on the Claude Opus 4.7 review and the GPT-5.3-Codex review. The retry policy that wraps multi-turn calls like this one is on the agent loop retry policy post. The structured-output format for the localized-root response slots into the strict-JSON prompt if you want to pipe this into a dashboard.
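If you do pipe the TURN 1 fields into a dashboard, the `<file:function:line>` ROOT_FRAME string splits cleanly from the right, which keeps any colons inside the path intact. A sketch (the dict shape is my assumption, not a TCC schema):

```python
def split_root_frame(value: str) -> dict:
    # Split from the right so a path containing ':' stays whole.
    file, function, line = value.rsplit(":", 2)
    return {"file": file, "function": function, "line": int(line)}

print(split_root_frame("src/orders/serializer.py:serialize:88"))
# {'file': 'src/orders/serializer.py', 'function': 'serialize', 'line': 88}
```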

One-line takeaway

Split localize, verify, and fix into three turns, demand three independent signals, bias toward the data-corruption frame, and set a confidence threshold. Then the model finds the root cause in 3 turns, not 7.
