Remove frontend detectPiiCmd pre-scan loop — backend is sole redaction
authority; bubble update via response.user_message covers user feedback.
Detect PII on full file content before truncating. Previous order
(truncate to 8000 bytes then scan) could miss PII straddling the
boundary. Now: read full content, scan, redact, then truncate to
EMBED_LIMIT (8000 bytes) at a valid UTF-8 char boundary.
logFileIds IPC: pass undefined (not null) for empty array so Tauri
serialises it correctly to Rust Option::None.
Add MAX_TEXT_SCAN_BYTES (32 KB) guard in scan_text_for_pii to prevent
unbounded regex evaluation on oversized payloads.
Fix clippy uninlined_format_args in ai.rs.
Addresses three findings from the third automated review:
[BLOCKER] No frontend PII pre-check on attachments.
Added detectPiiCmd call for each logFileId before chatMessageCmd.
PII is not blocked (per explicit product decision: auto-redact and
send) but the user now sees a non-blocking amber notice listing
each file and the PII types that will be auto-redacted. Backend
remains the authoritative redaction layer.
[WARNING 2] Chat bubble showed original PII-laden message even though
only the redacted form was sent to AI.
Added updateMessageContent to sessionStore. After chatMessageCmd
returns, if response.user_message is set the user bubble is updated
to reflect what was actually stored in the DB, so the UI is
consistent with the audit log.
CI fix: cargo fmt changes to analysis.rs were not staged in the prior
commit. Committed here — fmt check now passes cleanly.
Resolves all three findings from the second automated review and
fixes the cargo fmt --check CI failure (formatting drift in analysis.rs
from a prior merge).
[BLOCKER 1 + BLOCKER 2 + WARNING]
Frontend no longer performs any PII scanning or redaction. All three
concerns stemmed from the same root cause: outMessage was derived
on the frontend and used for display, DB storage (via lastUserMsgRef
and the chat bubble), and the AI payload — causing the original message
to be silently replaced before the backend received it.
Fix: frontend sends the original message verbatim. Backend is now the
sole authority. chat_message auto-redacts the typed message text using
PiiDetector + apply_redactions() before building the full payload, logs
the PII types via tracing::warn, and stores only the redacted form in
ai_messages and the audit log. The redacted form is returned to the
caller as ChatResponse.user_message (Option<String>, absent from direct
provider calls).
Frontend uses message (original) for the chat bubble and
lastUserMsgRef — resolution steps show natural language, not
[Password] tokens. The AI and DB see only the redacted version.
CI fix: cargo fmt applied to analysis.rs; all format checks now pass.
Resolves all four findings from the automated review:
[BLOCKER 1] Attachment PII scan error path left pendingFiles intact,
allowing retry with stale file references. Fix: file content is no
longer held in frontend state at all — PendingFile drops the content
field entirely. logFileIds are captured before setPendingFiles([]) and
passed directly to the backend.
[BLOCKER 2] Raw file content stored in PendingFile.content created a
UI-visible PII surface and a data-residency risk. Fix: frontend never
reads or stores file content. The backend loads file data from disk,
auto-redacts PII in-memory using pii::apply_redactions(), and embeds
the clean text into the AI message. No PII ever touches the frontend.
[WARNING 1] String-based attachment header parsing was fragile and
bypassable. Fix: parsing is gone — backend identifies attachments by
log_file_id, reads them directly from the DB/disk path, and applies
redaction at that level.
[WARNING 2] Error message disclosed PII type list to the caller. Fix:
PII types are logged via tracing::warn only; no type details in the
user-facing error or API response.
Additionally: typed chat messages are now auto-redacted rather than
blocked. scanTextForPiiCmd runs on the typed text; detected spans are
replaced in reverse-offset order before the message is sent to the AI
and stored in the DB. The user sees the redacted form in their chat
bubble.
Architecture:
- chat_message now accepts log_file_ids: Option<Vec<String>>
- Backend reads file → detects PII → redacts in memory → embeds
- Frontend: no readTextFile, no content field, no frontend PII gate
File attachments were embedded into AI messages without any PII
scanning, allowing credentials, tokens, and other sensitive data
to be forwarded to AI providers in plaintext.
Typed chat messages had the same gap: a user could type a password
or API key directly and it would be sent unscanned.
Changes:
- chat_message (Rust): defence-in-depth scan of all attachment body
content (between --- Attached: markers); hard rejects if PII found
- detect_pii (Rust): fix return type from pii::PiiDetectionResult
(spans/original_text) to db::models::PiiDetectionResult
(detections/total_pii_found) to match the TypeScript contract; the
LogUpload PII review workflow was receiving undefined for detections
- scan_text_for_pii (Rust): new command — scans arbitrary text for PII
without creating DB records; used for typed message warnings
- Triage/index.tsx: PendingFile now carries logFileId; handleSend gates
each text attachment through detectPiiCmd (hard block on PII found);
typed message text scanned via scanTextForPiiCmd with a one-time
warning — second send of same message proceeds as acknowledgment
- compress_text now returns Result<Vec<u8>, String>; callers propagate
the error instead of silently storing empty BLOB on gzip failure
- upload_image_attachment_by_content and upload_paste_image now validate
decoded byte length against MAX_IMAGE_FILE_BYTES before DB storage,
closing the size-bypass gap that existed for base64 content uploads
- Image View modal in AttachmentsTab now surfaces the error string when
get_image_attachment_data fails, replacing the opaque "could not be
loaded" message with actionable diagnostic text
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Store compressed log content and raw image bytes in SQLite so attachments
are self-contained regardless of source file availability on disk.
DB (migrations 020-022):
- log_files.content_compressed BLOB — gzip-compressed extracted text
- image_attachments.image_data BLOB — raw image bytes
- Views v_log_files_with_issue and v_image_attachments_with_issue for
cross-incident queries with joined issue title
Rust backend:
- compress_text / decompress_text helpers (flate2 rust_backend / miniz_oxide)
with 100 MB decompression-bomb guard
- upload_log_file*, upload_log_file_by_content store content_compressed
- upload_image_attachment*, upload_paste_image store image_data
- New commands: get_log_file_content, list_all_log_files (analysis.rs)
- New commands: get_image_attachment_data, list_all_image_attachments (image.rs)
- All commands fall back to file_path for pre-migration records
Frontend:
- LogFileSummary, ImageAttachmentSummary types in tauriCommands.ts
- attachmentStore (Zustand) — loadAttachments, searchAttachments
- History page: Issues tab (existing) + Attachments tab (new)
with log/image tables, search bar, View modals, lazy thumbnails
Tests: 227 Rust (+16 new), 103 frontend (+9 new), tsc clean, clippy clean
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Three issues addressed together:
1. Race condition (was PR #56): changelog job now CREATES the Gitea
release rather than assuming build jobs have already created it.
Build jobs continue to use create-or-skip + upload unchanged.
2. Detached HEAD push: 'git push origin master' fails when HEAD is
detached (no local branch named master). Changed to 'HEAD:master'.
3. git-cliff tag guard: verify tag is present locally before running
git-cliff, to fail fast with a clear message rather than silently
generating a wrong changelog.
4. git commit idiom: replaced 'git commit || echo' (swallows all
non-zero exit codes including real failures) with an explicit
'git diff --staged --quiet' guard so set -euo pipefail is not
undermined.
The changelog job checks out a specific SHA (detached HEAD) then
commits CHANGELOG.md and tries to push with 'git push origin master'.
Since there is no local branch named 'master', git rejects the push
with 'src refspec master does not match any'.
Fix: use 'git push origin HEAD:master' which explicitly maps the
current detached HEAD to the remote master branch regardless of
local branch state.
Addresses the review warning: git rev-parse confirms the tag is
present in the local repo after git fetch --tags before git-cliff
or git tag --sort= runs against it. Fails fast with a clear error
if the tag is missing rather than silently generating an incomplete
changelog.
The changelog and build-* jobs all fan out from autotag in parallel.
Build jobs create the Gitea release with 'curl ... || true', but the
changelog job was trying to GET the release before any build job had
run, reliably failing with 'Could not find release for tag vX.Y.Z'.
Fix: changelog job owns release creation. It creates the release with
the git-cliff body if it does not exist, or patches the body if a
prior run already created it. Build jobs continue using their existing
create-or-skip + upload pattern unchanged.
Fixes two bugs identified in the AI code review:
1. INSERT OR REPLACE with a freshly generated UUID never matches the
existing primary key, so it appended rows instead of replacing.
Switch to DELETE-then-INSERT to guarantee exactly one row.
2. Username defaulted to empty string. Resolve it to the current OS
user (USER/LOGNAME env vars, fallback 'local') so credentials are
always bound to a specific user identity.
test_sudo_password now passes -u <username> to sudo so the test
runs scoped to the stored user, not an arbitrary one.
UI: show the configured username prominently in status; relabel the
field and add a scope hint below it.
Tests: test_set_sudo_singleton_delete_then_insert, three username
resolution tests.
The 147KB JSON body was being passed as a shell argument to curl,
hitting the kernel ARG_MAX limit. Write it to /tmp/body.json via
jq redirection and use curl --data @/tmp/body.json instead.
bash printf treats format strings starting with '-' as option flags in
some environments. The POSIX-safe idiom is 'printf "%s\n" content'
where the format is always "%s\n" and the content is an argument.
Applied to all prompt printf calls. Also replaced '--' in prompt text
with single '-' to eliminate any remaining double-dash ambiguity.
1. SECRET_PATTERN had [A-Za-z0-9+/_\-!@#] -- backslash-escaped hyphen
is invalid POSIX ERE; grep parsed it as a range with invalid bounds.
Fix: move hyphen to end of class: [A-Za-z0-9+/_!@#-].
2. printf -- '---\n' fails with 'invalid option' in bash because the
builtin does not accept -- as end-of-options. Removed -- from all
four printf calls.
Shell heredocs with unindented bodies (line 1 content) terminate YAML
run: | block scalars. The YAML parser sees the unindented heredoc body
as leaving the block, making the workflow file unparseable -- Gitea
silently stops creating runs for a workflow with invalid YAML.
Replace the single-quoted heredoc prompt with a group of printf + cat
calls. Every line stays properly indented within the YAML block scalar.
Use jq --rawfile instead of --arg to load the prompt from a temp file,
which also eliminates shell escaping hazards for large strings.
Gitea 1.22 cancel-in-progress does not behave like GitHub Actions: when
a new synchronize event arrives while a review is running, instead of
cancelling the running job and starting a new one, it drops the new run
silently. Remove the concurrency block entirely so every commit to a PR
gets its own review run.
The PROMPT string contained backtick-quoted text for the Evidence field
example. Inside a double-quoted bash string, backticks trigger command
substitution, causing 'exact: command not found' at runtime.
Fix: build the prompt using a single-quoted heredoc (no shell expansion
inside) then splice dynamic values via sed and python3 replace() instead
of shell variable interpolation.
Two changes to reduce hallucinations in pr-review:
1. Codebase index (new step "Build codebase index"):
Generates a compact manifest of everything that EXISTS in the project:
- All registered Tauri commands (from lib.rs generate_handler![])
- All TypeScript exports (from tauriCommands.ts)
- All public Rust fn signatures in commands/
- All DB migration names
This index is prepended to the prompt so the model cannot invent
functions like authenticate_sudo or continue_chat_history that are
absent from both the index and the file contents.
2. Full-repo verification (updated "Verify findings" step):
Previously only grepped changed files, which falsely tagged findings
about unchanged-but-real code as UNVERIFIED. Now runs git ls-files
to load all tracked source files, so verification only fails for
code that genuinely does not exist anywhere in the codebase.
If qwen3-coder continues to hallucinate after these changes, swap the
model name on line 184 to bedrock-personal or claude-haiku.
qwen3-coder-next fabricates plausible-looking code in its Evidence
blocks instead of quoting from the actual files provided. This adds a
Python verification step that greps each fenced code block against the
real changed files and tags any finding whose evidence cannot be found
as UNVERIFIED.
This is a safeguard, not a fix — the model is fundamentally unreliable
for grounded code review. The longer-term fix is to replace qwen3-coder
with a model that stays grounded to context (Claude Haiku, devstral,
or deepseek-coder-v2 via the LiteLLM proxy / vLLM at 172.0.1.42).
The previous regex matched any line containing "password", "token", etc.
near certain punctuation characters. This silently removed function
signatures, variable declarations, and test assertions from the context
sent to the LLM — causing it to hallucinate 3 BLOCKERs per review:
- "function signature missing" (the `password: &str` param was scrubbed)
- "filter body empty" (the filter condition containing "password" was scrubbed)
- "password passed unencrypted" (the decrypt_token call line was scrubbed)
Fix: match actual credential VALUES only:
- Well-known token formats (AKIA..., ghp_..., xox...)
- keyword = "long_quoted_literal" (25+ chars, clearly a value not a name)
- Standalone base64 blob lines (60+ chars, PEM-style)
Never scrub a line just because it contains a credential-related word.
Rust 1.88 enforces clippy::uninlined_format_args as a style lint under
-D warnings. Change `writeln!(stdin, "{}", password)` to the inline
form `writeln!(stdin, "{password}")`.
Three changes:
- Exclude Cargo.lock/lockfiles from the diff — removes ~163 lines of
hash noise that waste the review budget with no value
- Raise line cap from 500 to 3000 and add a truncation notice when
the diff is cut, so the model knows the diff is incomplete
- Harden prompt: require quoted evidence for every finding; add explicit
self-verification step for missing-identifier claims (search full diff
before raising); tighten no-hallucinate instruction
Only a single hardcoded entry (word/document.xml) is ever accessed from
the ZIP archive; no arbitrary path extraction occurs, so path traversal
attacks cannot apply. Add a comment to make this invariant explicit for
future maintainers.
- Add extension allowlist (SAFE_TEXT_EXTENSIONS + SAFE_BINARY_EXTENSIONS)
rejecting unsupported file types at both upload_log_file and
upload_log_file_by_content entry points
- Add extract_text_content() with PDF text extraction via lopdf and
DOCX extraction via zip+quick-xml
- Binary files (PDF/DOCX) get extracted text written to .extracted.txt
for downstream PII detection
- Expand frontend file input accept list and add collapsible
supported-formats disclosure element
- Add 11 unit tests covering allowlist logic and extraction paths
AI history continuity: Changed the history-load query in chat_message to
JOIN ai_conversations and select by issue_id instead of single conversation_id.
This preserves full context when provider/model changes mid-triage.
Deep search: Added DISTINCT to list_issues SELECT and extended the search
filter with EXISTS subqueries covering ai_messages, resolution_steps,
log_files, and timeline_events. Ensures comprehensive search without
duplicate results.
Includes 11 new unit tests covering both features.
Cargo.toml was updated to 0.3.0 for the MCP feature but tauri.conf.json
was missed. Bundle artifact filenames are derived from tauri.conf.json,
so all builds were producing files named 0.2.68.
tea is not available in alpine:latest containers. Replace both tea calls
with curl API requests against the Gitea releases endpoint.
Also: auto-tag now reads src-tauri/Cargo.toml version first. If it is
ahead of the latest git tag (major/minor bump), that version is used
directly instead of incrementing the patch.
- call_tool: 30s hard timeout via tokio::time::timeout
- discover_server: 60s hard timeout wrapping full connect+discover sequence
- delete_mcp_server: write_audit_event before cascade delete, capturing
server name, tool count, and resource count
- initiate_mcp_oauth: append cryptographically random state nonce for CSRF
- pr-review.yml: rewrite prompt to require line-quoted evidence for every
finding and eliminate hallucinated false positives