tftsr-devops_investigation/docs/TICKET-pii-bypass-chat-attachments.md

# TICKET: PII Detection Bypass in AI Chat

**Branch**: `fix/pii-detection-bypass`

---

## Description

Two PII detection bypasses were identified and fixed in the AI triage chat interface.

### Bypass 1 — File Attachments (Critical)

When a user attached a file to a chat message, its content was read via `readTextFile()`, sliced to 8 KB, and embedded directly into the AI message string — bypassing the PII pipeline entirely. The message was forwarded to the configured AI provider in plaintext with no redaction marker in the audit log.

**Root cause**: `handleAttach` stored raw file content in React state. `handleSend` concatenated it into `aiMessage` with no PII check. The backend `chat_message` command applied no validation.

### Bypass 2 — Typed Chat Messages (High)

Plain typed chat messages were sent to the AI provider without any PII scan. A user typing `How secure is my password: abc123!!` would have the password forwarded to the AI and persisted in the audit log in plaintext.

### Related Fix — Wrong Return Type on `detect_pii`

`detect_pii` was serialising `pii::PiiDetectionResult` (`spans`, `original_text`) while the TypeScript interface expected `db::models::PiiDetectionResult` (`detections`, `total_pii_found`). All frontend code reading `result.detections` received `undefined`, meaning the LogUpload PII review workflow was silently broken.
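
The mismatch can be pictured as a missing conversion between two shapes. The sketch below is illustrative only: `DetectorResult`, `ApiResult`, and the fields of `PiiSpan` are hypothetical stand-ins, and only the field names `spans`, `original_text`, `detections`, and `total_pii_found` come from this ticket. The real types may carry more fields.

```rust
/// Hypothetical span type; the field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct PiiSpan {
    start: usize,
    end: usize,
    pii_type: String,
}

/// Stand-in for the detector-side shape (`pii::PiiDetectionResult` in the ticket).
struct DetectorResult {
    spans: Vec<PiiSpan>,
    original_text: String,
}

/// Stand-in for the shape the frontend reads (`db::models::PiiDetectionResult`).
#[derive(Debug, PartialEq)]
struct ApiResult {
    detections: Vec<PiiSpan>,
    total_pii_found: usize,
}

impl From<DetectorResult> for ApiResult {
    fn from(result: DetectorResult) -> Self {
        // Count first, then move the spans into the field the frontend expects.
        let total_pii_found = result.spans.len();
        ApiResult {
            detections: result.spans,
            total_pii_found,
        }
    }
}
```

Serialising `ApiResult` rather than `DetectorResult` is what makes `result.detections` defined on the TypeScript side. In this sketch `original_text` is deliberately not forwarded.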

---

## Design Decision: Auto-Redact, Not Block

An initial implementation explored a blocking (warn-then-proceed) approach. The product decision was instead to **auto-redact PII in place and send**:

- File attachments: PII is detected on full file content and replaced with type tokens (`[Password]`, `[Email]`, etc.) before the content is embedded in the AI message. The redacted form is stored in the DB and audit log.
- Typed messages: The same auto-redaction is applied to the user's typed text before the message is sent to the AI provider.
- The user's chat bubble is updated after the response to show the redacted form, so users can see exactly what reached the AI.
- The audit log records `was_pii_redacted: bool` and `pii_types_redacted: [...]` alongside the redacted message.
- There is no user blocking or acknowledgment flow; PII is handled transparently.
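
As a rough illustration of replacing detected PII with type tokens, the sketch below substitutes each span with `[Type]` and collects the redacted types for the audit fields. `Span` and `redact` are hypothetical names; the project's actual `PiiSpan` and `apply_redactions` may have different signatures. Spans are assumed to be byte offsets that lie on UTF-8 char boundaries within the text.

```rust
/// Hypothetical span type for illustration only.
#[derive(Debug, Clone)]
struct Span {
    start: usize,     // byte offset, inclusive
    end: usize,       // byte offset, exclusive
    pii_type: String, // e.g. "Password", "Email"
}

/// Replace each span with a `[Type]` token.
/// Returns the redacted text and the distinct PII types that were replaced.
fn redact(text: &str, spans: &[Span]) -> (String, Vec<String>) {
    // Process spans left to right so the output can be built in one pass.
    let mut sorted: Vec<&Span> = spans.iter().collect();
    sorted.sort_by_key(|s| s.start);

    let mut out = String::with_capacity(text.len());
    let mut types: Vec<String> = Vec::new();
    let mut cursor = 0;
    for span in sorted {
        if span.start < cursor {
            // Overlaps a span that was already replaced; skip it.
            continue;
        }
        out.push_str(&text[cursor..span.start]);
        out.push('[');
        out.push_str(&span.pii_type);
        out.push(']');
        if !types.contains(&span.pii_type) {
            types.push(span.pii_type.clone());
        }
        cursor = span.end;
    }
    out.push_str(&text[cursor..]);
    (out, types)
}
```

Under this sketch, `was_pii_redacted` would simply be `!types.is_empty()`, and `types` would feed `pii_types_redacted`.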

---

## Acceptance Criteria

- [x] Attaching a text file containing PII sends successfully; content is auto-redacted before the AI sees it
- [x] Attaching a clean text file proceeds normally with no modification
- [x] PII detection runs on the full file content before truncating to the 8 KB embed limit, so PII that straddles the truncation boundary is still redacted
- [x] Typed messages containing PII are auto-redacted before being sent to the AI provider
- [x] The chat bubble is updated post-send to show the redacted form of the user's message
- [x] The audit log records `was_pii_redacted`, `pii_types_redacted`, and the full redacted `user_message`
- [x] `detectPiiCmd` returns `detections: PiiSpan[]` and `total_pii_found: number` matching the TypeScript contract
- [x] `chatMessageCmd` passes `logFileIds` as `undefined` (not `null`) when no files are attached
- [x] `scan_text_for_pii` rejects inputs over 32 KB to prevent DoS
- [x] `response.user_message ?? message` is used as the bubble fallback, so a missing `user_message` never produces an `"undefined..."` concatenation
- [x] All Rust and frontend tests pass; zero clippy warnings; `cargo fmt --check` clean; `npx tsc --noEmit` clean

---

## Work Implemented

### `src-tauri/src/ai/mod.rs`
- Added `user_message: Option<String>` to `ChatResponse` — set by `chat_message`, absent from direct provider calls

### `src-tauri/src/ai/anthropic.rs`, `gemini.rs`, `mistral.rs`, `ollama.rs`, `openai.rs`
- Added `user_message: None` to all `ChatResponse { ... }` constructors

### `src-tauri/src/commands/ai.rs`
- `chat_message` now accepts `log_file_ids: Option<Vec<String>>`
- Step 1: auto-redacts the typed message text with `PiiDetector` + `apply_redactions`
- Step 2: loads each attachment from DB, detects PII on **full file content**, applies redactions, then truncates to 8 KB at a valid UTF-8 char boundary
- Tracks `was_pii_redacted` and `redacted_pii_types` across both steps
- Audit log includes `was_pii_redacted: bool` and `pii_types_redacted: [...]`
- Returns `user_message: Some(stored_user_message)` in `ChatResponse`
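
The truncation in Step 2 has to avoid cutting through a multi-byte character, because slicing a Rust `&str` at a non-boundary index panics. A minimal sketch of one way to do it follows. The helper name `truncate_at_char_boundary` is hypothetical, and the limit is a parameter because the exact byte value behind "8 KB" is not stated here.

```rust
/// Truncate `s` to at most `max_bytes` bytes without splitting a UTF-8 code point.
fn truncate_at_char_boundary(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    let mut end = max_bytes;
    // Walk back until `end` is a char boundary; index 0 always is one,
    // so the loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

Calling this on the already redacted text, rather than on the raw file content, is what keeps a secret that crosses the cut point from surviving as a partial, undetected fragment.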

### `src-tauri/src/commands/analysis.rs`
- Fixed `detect_pii` return type from `pii::PiiDetectionResult` to `db::models::PiiDetectionResult`
- Added `scan_text_for_pii(text: String)` with 32 KB input cap
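
The input cap can be sketched as a guard that runs before any scanning work. `check_scan_input` and `MAX_SCAN_BYTES` are hypothetical names, and the sketch assumes "32 KB" means 32 * 1024 bytes; the real command may use a different constant or error type.

```rust
/// Assumed cap: 32 * 1024 bytes (the ticket only says "32 KB").
const MAX_SCAN_BYTES: usize = 32 * 1024;

/// Reject oversized input before any detection work is done.
fn check_scan_input(text: &str) -> Result<(), String> {
    // `str::len` is the byte length, which is what bounds the scan cost.
    if text.len() > MAX_SCAN_BYTES {
        return Err(format!(
            "input is {} bytes; limit is {} bytes",
            text.len(),
            MAX_SCAN_BYTES
        ));
    }
    Ok(())
}
```

Checking the byte length first keeps the rejection path constant-time, regardless of how expensive the detector's patterns are.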

### `src-tauri/src/lib.rs`
- Registered `scan_text_for_pii`

### `src/lib/tauriCommands.ts`
- `ChatResponse` interface: added `user_message?: string`
- `chatMessageCmd` signature: added `logFileIds: string[]`; passes `undefined` when empty
- Added `scanTextForPiiCmd` wrapper

### `src/stores/sessionStore.ts`
- Added `updateMessageContent(id, content)` action

### `src/pages/Triage/index.tsx`
- `PendingFile` type: `{ name: string; logFileId: string }` — no raw content stored
- `handleAttach`: only uploads the file and stores `logFileId`; no `readTextFile`
- `handleSend`: passes `logFileIds` to backend; after response updates the bubble with `(response.user_message ?? message) + suffix`

---

## Testing Needed

1. Attach a file containing `password: secret123` → message sends; chat bubble shows `[Password]` in the embedded content; no plaintext credential in bubble or DB
2. Attach a clean text file → content appears unmodified in the chat context
3. Attach a file where PII appears near the 8000-byte mark → content is fully redacted before truncation
4. Type `My password is abc123!!` → message sends; bubble shows `My [Password] is [Password]`
5. On LogUpload page, upload a file with a known IP/email → PII spans appear in the review UI
6. Check audit log after a PII-containing message: `was_pii_redacted: true`, `pii_types_redacted` populated
7. Check audit log after a clean message: `was_pii_redacted: false`, `pii_types_redacted: []`
8. `cargo test` → 228/228 pass; `npm run test:run` → 103/103 pass; `cargo fmt --check` clean; `npx tsc --noEmit` clean