tftsr-devops_investigation/.gitea
Shaun Arman cf5bc83b75
Some checks failed
Test / rust-fmt-check (pull_request) Successful in 1m42s
Test / frontend-typecheck (pull_request) Successful in 1m42s
Test / frontend-tests (pull_request) Successful in 1m42s
Test / rust-clippy (pull_request) Successful in 3m16s
PR Review Automation / review (pull_request) Failing after 4m33s
Test / rust-tests (pull_request) Successful in 4m54s
fix(ci): add post-generation evidence verification to pr-review
qwen3-coder-next fabricates plausible-looking code in its Evidence
blocks instead of quoting from the actual files provided. This adds a
Python verification step that greps each fenced code block against the
real changed files and tags any finding whose evidence cannot be found
as UNVERIFIED.
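
The verification step described above could look roughly like the sketch below. It is not the script from the workflow, which is not shown here. The function names, the "UNVERIFIED: " prefix format, the representation of findings as plain strings, and the rule that every non-blank evidence line must appear in a single changed file are all assumptions made for illustration.

```python
import re

# Build the fence marker programmatically so this example does not
# contain a literal triple backtick of its own.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def normalize(line: str) -> str:
    """Collapse whitespace so indentation differences do not cause misses."""
    return " ".join(line.split())


def extract_fenced_blocks(text: str) -> list[str]:
    """Return the body of every fenced code block in the review text."""
    return FENCE_RE.findall(text)


def block_is_grounded(block: str, sources: dict[str, str]) -> bool:
    """True if every non-blank line of the block occurs in one changed file.

    `sources` maps a file path to that file's full text (hypothetical shape).
    """
    wanted = [normalize(l) for l in block.splitlines() if l.strip()]
    if not wanted:
        return False
    for content in sources.values():
        have = {normalize(l) for l in content.splitlines()}
        if all(w in have for w in wanted):
            return True
    return False


def tag_findings(findings: list[str], sources: dict[str, str]) -> list[str]:
    """Prefix findings whose evidence cannot be found in the changed files.

    Assumption: a finding with no evidence block at all is also treated
    as unverified.
    """
    tagged = []
    for finding in findings:
        blocks = extract_fenced_blocks(finding)
        ok = bool(blocks) and all(block_is_grounded(b, sources) for b in blocks)
        tagged.append(finding if ok else "UNVERIFIED: " + finding)
    return tagged
```

Matching whole normalized lines rather than raw substrings is one possible design choice: it tolerates re-indented quotes but still rejects evidence that merely resembles the real code.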

This is a safeguard, not a fix — the model is fundamentally unreliable
for grounded code review. The longer-term fix is to replace qwen3-coder
with a model that stays grounded to context (Claude Haiku, devstral,
or deepseek-coder-v2 via the LiteLLM proxy / vLLM at 172.0.1.42).
2026-05-31 14:41:47 -05:00
workflows fix(ci): add post-generation evidence verification to pr-review 2026-05-31 14:41:47 -05:00