tftsr-devops_investigation

Shaun Arman cf5bc83b75	2026-05-31 14:41:47 -05:00

Some checks failed (pull_request):
Test / rust-fmt-check: successful in 1m42s
Test / frontend-typecheck: successful in 1m42s
Test / frontend-tests: successful in 1m42s
Test / rust-clippy: successful in 3m16s
PR Review Automation / review: failing after 4m33s
Test / rust-tests: successful in 4m54s

fix(ci): add post-generation evidence verification to pr-review

qwen3-coder-next fabricates plausible-looking code in its Evidence blocks instead of quoting from the actual files provided. This adds a Python verification step that greps each fenced code block against the real changed files and tags any finding whose evidence cannot be found as UNVERIFIED.

This is a safeguard, not a fix: the model is fundamentally unreliable for grounded code review. The longer-term fix is to replace qwen3-coder with a model that stays grounded to context (Claude Haiku, devstral, or deepseek-coder-v2 via the LiteLLM proxy / vLLM at 172.0.1.42).
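The verification step described in the commit message could look roughly like the sketch below. It is not the script from the workflow; the names `evidence_found` and `tag_unverified`, the whitespace normalization, the line-by-line matching rule, and the choice to tag findings that contain no fenced block at all are assumptions made for illustration.

```python
import re

# Build the fence marker programmatically so this example can itself
# live inside a fenced block.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def normalize(text):
    """Collapse whitespace so indentation differences do not cause misses."""
    return " ".join(text.split())


def evidence_found(block, changed_files):
    """Return True if every non-blank line of `block` occurs in one changed file.

    `changed_files` maps a path to that file's full text (hypothetical shape).
    """
    lines = [normalize(line) for line in block.splitlines() if line.strip()]
    if not lines:
        return False
    for content in changed_files.values():
        haystack = {normalize(line) for line in content.splitlines()}
        if all(line in haystack for line in lines):
            return True
    return False


def tag_unverified(finding, changed_files):
    """Prefix a finding with UNVERIFIED unless all its evidence blocks match.

    Assumption: a finding with no fenced evidence block is also tagged.
    """
    blocks = FENCE_RE.findall(finding)
    if blocks and all(evidence_found(b, changed_files) for b in blocks):
        return finding
    return "UNVERIFIED: " + finding
```

Matching whole lines rather than raw substrings keeps a fabricated snippet from passing just because a short fragment of it happens to appear somewhere in the diff. As the commit message notes, this only flags ungrounded evidence; it does not make the model's findings correct.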
..
workflows	fix(ci): add post-generation evidence verification to pr-review	2026-05-31 14:41:47 -05:00