tftsr-devops_investigation/docs/wiki/PII-Detection.md
Shaun Arman 093495a653
2026-06-05 14:12:43 -05:00

PII Detection

Overview

Before any text is sent to an AI provider, TRCAA scans it for personally identifiable information (PII). Users must review and approve each detected span before the redacted text is transmitted.

Detection Flow

1. Upload log file
      ↓
2. detect_pii(log_file_id)
   → Scans content with PII regex patterns (including hostname + expanded card brands)
   → Resolves overlapping matches (longest wins)
   → Returns Vec<PiiSpan> with byte offsets + replacements
      ↓
3. User reviews spans in PiiDiffViewer (before/after diff)
   → Approves or rejects each span
      ↓
4. apply_redactions(log_file_id, approved_span_ids)
   → Rewrites text with replacements (iterates spans in REVERSE order to preserve offsets)
   → Records SHA-256 hash of redacted text in audit_log
      ↓
5. Redacted text safe to send to AI

Detection Patterns

| Type | Replacement | Pattern notes |
| --- | --- | --- |
| UrlWithCredentials | [URL] | scheme://user:pass@host |
| BearerToken | [Bearer] | Case-insensitive bearer keyword + token chars |
| ApiKey | [ApiKey] | api_key=, apikey=, access_token= + 16+ char value |
| Password | [Password] | password=, passwd=, pwd= + non-whitespace value |
| Ssn | [SSN] | \b\d{3}-\d{2}-\d{4}\b |
| CreditCard | [CreditCard] | Visa/MC/Amex/Discover/JCB/Diners patterns |
| Email | [Email] | RFC-compliant email addresses |
| MacAddress | [MAC] | XX:XX:XX:XX:XX:XX and XX-XX-XX-XX-XX-XX |
| Ipv6 | [IPv6] | Full and compressed IPv6 addresses |
| Ipv4 | [IPv4] | Standard dotted-quad notation |
| PhoneNumber | [Phone] | US and international phone formats |
| Hostname | [Hostname] | FQDN/hostname detection for internal names |
| UrlCredentials | (covered by UrlWithCredentials) | |
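
To make one table row concrete, the sketch below shows how the Ssn pattern could turn into a span with byte offsets. The detector described here uses regex patterns; this example swaps in a hand-written byte scan so that it runs with the standard library alone. `Span`, `is_word`, and `find_ssns` are hypothetical names, and the word-boundary check is an ASCII-only approximation of `\b`.

```rust
/// Hypothetical, trimmed-down stand-in for PiiSpan.
#[derive(Debug, PartialEq)]
pub struct Span {
    pub start: usize, // inclusive byte offset
    pub end: usize,   // exclusive byte offset
    pub replacement: &'static str,
}

/// True for ASCII "word" characters (letter, digit, underscore).
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Scans `text` for the shape \b\d{3}-\d{2}-\d{4}\b without a regex engine.
pub fn find_ssns(text: &str) -> Vec<Span> {
    const LEN: usize = 11; // length of "ddd-dd-dddd"
    let bytes = text.as_bytes();
    let mut spans = Vec::new();
    let mut i = 0;
    while i + LEN <= bytes.len() {
        let window = &bytes[i..i + LEN];
        // Positions 3 and 6 must be dashes, every other position a digit.
        let shape_ok = window.iter().enumerate().all(|(k, &b)| match k {
            3 | 6 => b == b'-',
            _ => b.is_ascii_digit(),
        });
        // Word boundaries: neighbouring bytes must not be word characters.
        let left_ok = i == 0 || !is_word(bytes[i - 1]);
        let right_ok = i + LEN == bytes.len() || !is_word(bytes[i + LEN]);
        if shape_ok && left_ok && right_ok {
            spans.push(Span { start: i, end: i + LEN, replacement: "[SSN]" });
            i += LEN; // skip past the match
        } else {
            i += 1;
        }
    }
    spans
}
```

Because every matched byte is ASCII, the returned offsets always fall on character boundaries, so they can be used directly to slice the original text.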

Overlap Resolution

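When two patterns match overlapping text, the longer match wins. Spans are scanned in order of start offset, and each span is compared with the last one kept; if the two are the same length, the earlier span stays: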

let mut filtered: Vec<PiiSpan> = Vec::new();
for span in sorted_by_start {
    if let Some(last) = filtered.last() {
        if span.start < last.end {
            // Overlap: keep the longer span
            if span.end - span.start > last.end - last.start {
                filtered.pop();
                filtered.push(span);
            }
            continue;
        }
    }
    filtered.push(span);
}
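
The loop above can be wrapped into a self-contained function for experimentation. This is a sketch under assumptions: `Span` is a hypothetical trimmed-down stand-in for PiiSpan, `resolve_overlaps` is an illustrative name, and the sort step is assumed to be what produces `sorted_by_start`.

```rust
/// Hypothetical, trimmed-down stand-in for PiiSpan.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub start: usize, // inclusive byte offset
    pub end: usize,   // exclusive byte offset
    pub label: &'static str,
}

/// Keeps a set of non-overlapping spans; on overlap the longer span wins.
pub fn resolve_overlaps(mut spans: Vec<Span>) -> Vec<Span> {
    // Assumed pre-step: order spans by where they start.
    spans.sort_by_key(|s| s.start);

    let mut filtered: Vec<Span> = Vec::new();
    for span in spans {
        if let Some(last) = filtered.last() {
            if span.start < last.end {
                // Overlap: replace the kept span only if the new one is strictly longer.
                if span.end - span.start > last.end - last.start {
                    filtered.pop();
                    filtered.push(span);
                }
                continue;
            }
        }
        filtered.push(span);
    }
    filtered
}
```

For example, given a `[URL]` span at 0..26, an `[IPv4]` span at 18..26 inside it, and an unrelated `[SSN]` span at 40..51, the function keeps the `[URL]` and `[SSN]` spans and drops the shorter `[IPv4]` one.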

PiiSpan Struct

pub struct PiiSpan {
    pub id: String,          // UUID v7
    pub pii_type: PiiType,
    pub start: usize,        // byte offset in original text (inclusive)
    pub end: usize,          // byte offset in original text (exclusive)
    pub original: String,
    pub replacement: String, // e.g., "[IPv4]"
}
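
One property implied by the struct is that `start..end` should select exactly `original` in the scanned text. The sketch below checks that invariant. It is illustrative only: the two-variant `PiiType` is a hypothetical subset of the real enum, the id is a placeholder string rather than a real UUID v7, and `span_matches_text` is a made-up helper name.

```rust
/// Hypothetical subset of the PiiType variants listed in the pattern table.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PiiType {
    Ipv4,
    Email,
}

#[derive(Debug, Clone)]
pub struct PiiSpan {
    pub id: String,
    pub pii_type: PiiType,
    pub start: usize,
    pub end: usize,
    pub original: String,
    pub replacement: String,
}

/// Returns true if the span's offsets select exactly `original` within `text`.
/// `str::get` yields None for out-of-range or non-char-boundary offsets,
/// so this check never panics.
pub fn span_matches_text(span: &PiiSpan, text: &str) -> bool {
    text.get(span.start..span.end) == Some(span.original.as_str())
}
```

A check like this could run before redaction to catch spans that were computed against a different version of the text.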

Redaction Algorithm

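Approved spans are applied from last to first. A replacement usually changes the length of the text, so working backwards keeps the byte offsets of the spans not yet applied valid. This relies on the spans being ordered by start offset and not overlapping: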

let mut redacted = original.to_string();
for span in approved_spans.iter().rev() {  // reverse!
    redacted.replace_range(span.start..span.end, &span.replacement);
}
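
A slightly more defensive variant is sketched below. It filters to the approved span ids and validates offsets before each replacement, since `String::replace_range` panics on out-of-range or non-char-boundary offsets. `Span` is a hypothetical trimmed-down stand-in for PiiSpan, and `redact` with its `Option` return is an assumption for illustration, not the signature of `apply_redactions`.

```rust
/// Hypothetical, trimmed-down stand-in for PiiSpan.
#[derive(Debug, Clone)]
pub struct Span {
    pub id: String,
    pub start: usize, // inclusive byte offset in the original text
    pub end: usize,   // exclusive byte offset in the original text
    pub replacement: String,
}

/// Applies only the approved spans, back to front.
/// Returns None if an approved span is out of range, splits a character,
/// or overlaps another approved span.
pub fn redact(original: &str, spans: &[Span], approved_ids: &[&str]) -> Option<String> {
    let mut approved: Vec<&Span> = spans
        .iter()
        .filter(|s| approved_ids.contains(&s.id.as_str()))
        .collect();
    approved.sort_by_key(|s| s.start);

    let mut redacted = original.to_string();
    // Upper bound for the next span's end; starts at the end of the text.
    let mut upper = original.len();
    for span in approved.iter().rev() {
        let valid = span.start <= span.end
            && span.end <= upper
            && original.is_char_boundary(span.start)
            && original.is_char_boundary(span.end);
        if !valid {
            return None;
        }
        // Text before `span.end` is still untouched, so original offsets apply.
        redacted.replace_range(span.start..span.end, &span.replacement);
        upper = span.start;
    }
    Some(redacted)
}
```

With the text `ip 10.0.0.5 mail a@b.io`, an `[IPv4]` span at 3..11 and an `[Email]` span at 17..23, approving both yields `ip [IPv4] mail [Email]`. Applying the `[IPv4]` span first would shorten the text by two bytes and leave the `[Email]` offsets pointing two bytes too far, which is what the reverse iteration avoids.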

Audit Logging

Every redaction and every AI send is written to the audit log. An AI send is recorded like this:

write_audit_event(
    &conn,
    "ai_send",                        // action
    "issue",                          // entity_type
    &issue_id,                        // entity_id
    &json!({
        "log_file_ids": [...],
        "redacted_hash": sha256_hex,  // SHA-256 of redacted text
        "provider": provider_name,
    }).to_string(),
)?;

Security Guarantees

  • PII detection runs locally; the original text never leaves the machine
  • Only the redacted text is sent to AI providers
  • The SHA-256 hash in the audit log makes it possible to verify later that a given text is exactly the redacted text that was sent
  • If redaction is skipped (no PII detected), the audit log still records the send
  • Stored pii_spans.original_value metadata is cleared after redaction is finalized