fix(ci): reduce AI review hallucinations in pr-review workflow
Some checks failed
Test / rust-fmt-check (pull_request) Successful in 1m46s
Test / frontend-typecheck (pull_request) Successful in 1m49s
Test / frontend-tests (pull_request) Successful in 1m46s
Test / rust-clippy (pull_request) Failing after 3m12s
PR Review Automation / review (pull_request) Successful in 4m37s
Test / rust-tests (pull_request) Successful in 4m34s

Three changes:
- Exclude Cargo.lock/lockfiles from the diff — removes ~163 lines of
  hash noise that waste the review budget with no value
- Raise line cap from 500 to 3000 and add a truncation notice when
  the diff is cut, so the model knows the diff is incomplete
- Harden prompt: require quoted evidence for every finding; add explicit
  self-verification step for missing-identifier claims (search full diff
  before raising); tighten no-hallucinate instruction
This commit is contained in:
Shaun Arman 2026-05-31 14:08:10 -05:00
parent cf1d5adb83
commit 06956940e2

View File

@ -40,8 +40,17 @@ jobs:
run: | run: |
set -euo pipefail set -euo pipefail
git fetch origin ${{ github.base_ref }} git fetch origin ${{ github.base_ref }}
git diff origin/${{ github.base_ref }}..HEAD > /tmp/pr_diff.txt # Exclude generated/lock files — they add noise with no review value
echo "diff_size=$(wc -l < /tmp/pr_diff.txt | tr -d ' ')" >> $GITHUB_OUTPUT git diff origin/${{ github.base_ref }}..HEAD \
-- ':!Cargo.lock' ':!package-lock.json' ':!*.lock' \
> /tmp/pr_diff.txt
FULL_SIZE=$(wc -l < /tmp/pr_diff.txt | tr -d ' ')
echo "diff_size=${FULL_SIZE}" >> $GITHUB_OUTPUT
echo "diff_full_size=${FULL_SIZE}" >> $GITHUB_OUTPUT
# Build a manifest of changed files so the prompt can include it
git diff --name-only origin/${{ github.base_ref }}..HEAD \
-- ':!Cargo.lock' ':!package-lock.json' ':!*.lock' \
> /tmp/pr_files.txt
- name: Analyze with LLM - name: Analyze with LLM
id: analyze id: analyze
@ -57,10 +66,17 @@ jobs:
if grep -q "^Binary files" /tmp/pr_diff.txt; then if grep -q "^Binary files" /tmp/pr_diff.txt; then
echo "WARNING: Binary file changes detected — they will be excluded from analysis" echo "WARNING: Binary file changes detected — they will be excluded from analysis"
fi fi
DIFF_CONTENT=$(head -n 500 /tmp/pr_diff.txt \ CHANGED_FILES=$(cat /tmp/pr_files.txt | tr '\n' ' ')
FULL_LINES=$(wc -l < /tmp/pr_diff.txt | tr -d ' ')
DIFF_CONTENT=$(head -n 3000 /tmp/pr_diff.txt \
| grep -v -E '^[+-].*(password[[:space:]]*[=:"'"'"']|token[[:space:]]*[=:"'"'"']|secret[[:space:]]*[=:"'"'"']|api_key[[:space:]]*[=:"'"'"']|private_key[[:space:]]*[=:"'"'"']|Authorization:[[:space:]]|AKIA[A-Z0-9]{16}|xox[baprs]-[0-9]{10,13}-[0-9]{10,13}-[a-zA-Z0-9]{24}|gh[opsu]_[A-Za-z0-9_]{36,}|https?://[^@[:space:]]+:[^@[:space:]]+@)' \ | grep -v -E '^[+-].*(password[[:space:]]*[=:"'"'"']|token[[:space:]]*[=:"'"'"']|secret[[:space:]]*[=:"'"'"']|api_key[[:space:]]*[=:"'"'"']|private_key[[:space:]]*[=:"'"'"']|Authorization:[[:space:]]|AKIA[A-Z0-9]{16}|xox[baprs]-[0-9]{10,13}-[0-9]{10,13}-[a-zA-Z0-9]{24}|gh[opsu]_[A-Za-z0-9_]{36,}|https?://[^@[:space:]]+:[^@[:space:]]+@)' \
| grep -v -E '^[+-].*[A-Za-z0-9+/]{40,}={0,2}([^A-Za-z0-9+/=]|$)') | grep -v -E '^[+-].*[A-Za-z0-9+/]{40,}={0,2}([^A-Za-z0-9+/=]|$)')
PROMPT="You are a senior engineer performing a focused code review. Your review must be grounded strictly in the diff provided — do not invent issues about code you cannot see.\n\nPR Title: ${PR_TITLE}\n\nDiff:\n${DIFF_CONTENT}\n\n## Instructions\n\n1. **Read the entire diff first.** Before raising any issue, verify it against the actual lines in the diff. If something appears to be missing, confirm it is absent from ALL relevant files in the diff before claiming it is missing.\n\n2. **Quote the evidence.** For every issue you raise, cite the specific file and line from the diff that supports your claim. If you cannot quote a line, do not raise the issue.\n\n3. **Distinguish severity clearly:**\n - BLOCKER: broken right now, will cause crashes or data loss\n - WARNING: works but has a real risk that should be addressed before merge\n - SUGGESTION: improvement worth considering in a follow-up PR\n\n4. **Do not raise issues about code outside the diff.** If a concern requires reading files not present in the diff, say 'outside the scope of this diff' and skip it.\n\n5. **Keep it concise.** Lead with a one-paragraph summary, then list only genuine findings with evidence. Avoid restating what the code already does correctly unless it is directly relevant to a finding.\n\n## Output format\n\n**Summary** (1 paragraph)\n\n**Findings** (only real issues with quoted evidence)\n- [BLOCKER/WARNING/SUGGESTION] filename:line — description and suggested fix\n\n**Verdict**: APPROVE / APPROVE WITH COMMENTS / REQUEST CHANGES" if [ "$FULL_LINES" -gt 3000 ]; then
TRUNCATION_NOTICE="NOTE: This diff was truncated at 3000 lines (full diff is ${FULL_LINES} lines). Code appearing near the end of the diff may be incomplete. Do NOT raise findings about code that appears to be missing or incomplete — it may simply be outside the truncated window."
else
TRUNCATION_NOTICE="This diff is complete (${FULL_LINES} lines, no truncation)."
fi
PROMPT="You are a senior engineer performing a focused code review. Your review must be grounded strictly in the diff provided — do not speculate about code you cannot see.\n\nPR Title: ${PR_TITLE}\nFiles changed: ${CHANGED_FILES}\n${TRUNCATION_NOTICE}\n\nDiff:\n${DIFF_CONTENT}\n\n## Instructions\n\n1. **Read the entire diff before writing anything.** Do not begin composing your review until you have read every line of the diff above.\n\n2. **Verify before claiming.** Before raising any finding:\n a. Quote the exact line(s) from the diff that support it.\n b. For findings about missing identifiers (undeclared variables, missing parameters, undefined functions): search the ENTIRE diff for the identifier. If it appears anywhere in the diff, the finding is WRONG — discard it.\n c. If you cannot quote supporting evidence from the diff, do not raise the finding.\n\n3. **Do not hallucinate.** You may only raise issues visible in the diff. If a concern requires reading a file not shown in the diff, write 'outside the scope of this diff' and skip it. Never infer that code is broken from partial information.\n\n4. **Distinguish severity:**\n - BLOCKER: provably broken from what is shown — will fail to compile or cause data loss\n - WARNING: works but has a real risk that should be addressed before merge\n - SUGGESTION: improvement worth considering in a follow-up PR\n\n5. **Keep it concise.** Lead with a one-paragraph summary, then list only genuine findings with quoted evidence.\n\n## Output format\n\n**Summary** (1 paragraph)\n\n**Findings** (each must include a quoted line from the diff)\n- [BLOCKER/WARNING/SUGGESTION] filename:line — description, quoted evidence, suggested fix\n\n**Verdict**: APPROVE / APPROVE WITH COMMENTS / REQUEST CHANGES"
BODY=$(jq -cn \ BODY=$(jq -cn \
--arg model "qwen3-coder-next" \ --arg model "qwen3-coder-next" \
--arg content "$PROMPT" \ --arg content "$PROMPT" \