From 6373f0b09cb8536662bd2a3b5f4948fd5b00b246 Mon Sep 17 00:00:00 2001
From: Shaun Arman
Date: Sun, 31 May 2026 14:33:44 -0500
Subject: [PATCH] fix(ci): fix secret scrubbing regex that was deleting legitimate code lines
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous regex matched any line containing "password", "token", etc.
near certain punctuation characters. This silently removed function
signatures, variable declarations, and test assertions from the context
sent to the LLM, causing it to hallucinate 3 BLOCKERs per review:

- "function signature missing" (the `password: &str` param was scrubbed)
- "filter body empty" (the filter condition containing "password" was scrubbed)
- "password passed unencrypted" (the decrypt_token call line was scrubbed)

Fix: match actual credential VALUES only:
- Well-known token formats (AKIA..., ghp_..., xox...)
- keyword = "long_quoted_literal" (25+ chars, clearly a value, not a name)
- Standalone base64 blob lines (60+ chars, PEM-style)

Never scrub a line just because it contains a credential-related word.
---
 .gitea/workflows/pr-review.yml | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/.gitea/workflows/pr-review.yml b/.gitea/workflows/pr-review.yml
index 4df1f408..065600a8 100644
--- a/.gitea/workflows/pr-review.yml
+++ b/.gitea/workflows/pr-review.yml
@@ -58,9 +58,13 @@ jobs:
           # Build context: full file content for each changed file.
           # Files <= 500 lines: include complete content.
           # Files > 500 lines: include the per-file diff with generous context (±50 lines).
-          # Secret scrubbing applied to both paths.
-          SECRET_PATTERN='^([[:space:]]*[+\-]?[[:space:]]*).*[pP]assword[[:space:]]*[=:"'"'"']|[tT]oken[[:space:]]*[=:"'"'"']|[aA][pP][iI][_][kK]ey[[:space:]]*[=:"'"'"']|AKIA[A-Z0-9]{16}|gh[opsu]_[A-Za-z0-9_]{36,}|Authorization:[[:space:]]'
-          B64_PATTERN='^[[:space:]]*[+\-]?[[:space:]]*[A-Za-z0-9+/]{40,}={0,2}([^A-Za-z0-9+/=]|$)'
+          #
+          # Secret scrubbing: match actual credential VALUES only — known API key formats,
+          # or keyword="long_quoted_literal" (25+ chars). Never scrub on keyword alone,
+          # which would silently delete function signatures, variable declarations, and tests.
+          SECRET_PATTERN='AKIA[A-Z0-9]{16}|gh[opsu]_[A-Za-z0-9_]{36,}|xox[baprs]-[0-9]{10,13}-[0-9]{10,13}-[a-zA-Z0-9]{24}|(password|token|api_key|secret)[[:space:]]*=[[:space:]]*["'"'"'][A-Za-z0-9+/_\-!@#]{25,}["'"'"']'
+          # Only strip lines that are ENTIRELY a long base64 blob (e.g. PEM cert bodies)
+          B64_PATTERN='^[[:space:]]*[A-Za-z0-9+/]{60,}={0,2}[[:space:]]*$'
           > /tmp/pr_context.txt
           while IFS= read -r file; do