tftsr-devops_investigation/docs/HACKATHON-SUBMISSION-CONCISE.md

# 2026 Hackathon: TRCAA

**Developer**: Shaun Arman (VFK387) | **ADO**: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)

---

## Problem to Solve

An alert fires, engineers swarm it, someone finds the root cause, and the post-mortem gets written from memory three days later with half the context gone. The process loses information at every handoff. Current pain: manual command execution slows triage (copy terminal → paste → ask AI → repeat), cloud SaaS tools require uploading sensitive production data, generic AI lacks infrastructure expertise.

---

## Our Solution

**TRCAA: Local-first AI-powered incident triage that autonomously executes diagnostic commands.**

### Core Innovation: Agentic Shell Execution
The AI doesn't suggest commands—it executes them with intelligent safety:

**Three-Tier Safety:**
- **Tier 1**: Read-only (`kubectl get`, `grep`) auto-execute
- **Tier 2**: Mutating (`kubectl scale`) require approval
- **Tier 3**: Destructive (`rm -rf`) auto-blocked

**Example:** *"Why is nginx pod crashing?"* → AI runs `kubectl get/describe/logs`, analyzes output, explains root cause. No copy-paste.

### Unique Features
- **Local-first**: SQLCipher AES-256 encrypted storage, offline via Ollama, PII auto-redact, tamper-evident audit
- **Domain expertise**: 16 pre-built contexts (Linux RHEL/OEL, Windows, K8s, networking, databases, Proxmox, HPE, observability)
- **Multi-cluster K8s**: Encrypted kubeconfig storage, bundled kubectl v1.30.0
- **Provider-agnostic**: OpenAI, Claude, Gemini, Mistral, Bedrock, Ollama + auto-detect tool calling

---

## What We Built

**v1.0.0** (44 hrs): 35 files, +4089 lines, shell execution module, three-tier classifier (19 tests/100% coverage), approval modal UI, CI/CD

**v1.0.1-v1.0.9** (28 hrs, 24 PRs in 48 hrs): Security updates, LiteLLM Bedrock, Ollama auto-start + function calling, query classification (prevents AI over-investigation), connection reliability (180s timeout, health checks, retry logic), tool calling auto-detect

**Total**: 25 PRs, ~84 files, ~6,100 lines, 431 tests, 72 hours

---

## Competitive Landscape

**SaaS exists**: Rootly, incident.io, Xurrent, TraceRoot—all cloud, subscriptions, data leaves network

**TRCAA uniquely combines**: Local-first + offline + encrypted + PII sanitization + provider-agnostic (6 providers) + 16 domain contexts + autonomous shell execution + tamper-evident audit + air-gap capable

**We win on**: Privacy (local encrypted), air-gap (Ollama), cost (no per-seat fees), domain depth  
**SaaS wins on**: Alert integration (PagerDuty/Datadog), team collaboration, observability correlation

**Target**: Regulated industries, defense, air-gapped environments, privacy-focused teams

---

## Technical Highlights

**Backend (Rust)**: Three-tier classifier with pipe/chain analysis, AES-256-GCM encryption, hash-chained audit, 297 tests  
**Frontend (React)**: Real-time approval modal, multi-cluster manager, 134 tests  
**CI/CD**: Multi-platform builds (Linux amd64/arm64, macOS, Windows), kubectl bundled, branch protection

**Quality**: 3 rounds Copilot review (10 findings resolved), zero Clippy warnings, zero TypeScript errors

---

## Impact

**Development**: 72 hours, 25 PRs, ~6,100 lines, 431 tests  
**Real-world**: Reduced triage from manual copy-paste loop to autonomous sub-second execution  
**Security**: 3 Copilot security findings resolved (prompt injection, tool call dropping, sanitization)

---

## Try It

[GitHub Releases](https://github.com/tftsr/apollo_nxt-trcaa/releases) → Upload kubeconfig → Ask *"What pods in default namespace?"* → Watch AI auto-execute. Works fully offline with Ollama.

---

## Fun Fact

Zero to production with 431 passing tests, 25 PRs, comprehensive docs in 72 hours. Zero Clippy warnings. Zero TypeScript errors. 100+ real commands executed without a single false-positive denial.
feat: full copy from apollo_nxt-trcaa with complete sanitization Complete backport of all features from apollo_nxt-trcaa repository: - Three-tier shell execution safety system (Tier 1: auto, Tier 2: approve, Tier 3: deny) - Ollama function calling with tool use support - AI provider tool calling auto-detection - kubectl binary bundling and management - kubeconfig upload and context management - Shell approval modal with real-time UI - MCP protocol HTTP transport with custom headers - Enhanced security audit logging - Comprehensive test coverage (275+ tests) - Updated CI/CD workflows for Gitea Actions - Complete documentation (ADRs, wiki, release notes) Sanitization applied to all files: - Removed all MSI, Motorola, VNXT, Vesta references - Replaced internal infrastructure references with TFTSR equivalents - Updated all URLs and API endpoints - Sanitized commit history references in documentation Technical changes: - New modules: shell/classifier, shell/executor, shell/kubectl, shell/kubeconfig - Enhanced AI providers: ollama.rs, openai.rs with function calling - New Tauri commands: shell execution, kubeconfig management, tool calling detection - Database migrations: shell_execution_audit table - Frontend: ShellApprovalModal, ShellExecution, KubeconfigManager pages - CI/CD: kubectl bundling, multi-platform builds, Gitea Actions integration Version: 1.0.8 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> 2026-06-05 19:11:00 +00:00			`# 2026 Hackathon: TRCAA`

			`Developer: Shaun Arman (VFK387) \| ADO: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)`

			`---`

			`## Problem to Solve`

			`An alert fires, engineers swarm it, someone finds the root cause, and the post-mortem gets written from memory three days later with half the context gone. The process loses information at every handoff. Current pain: manual command execution slows triage (copy terminal → paste → ask AI → repeat), cloud SaaS tools require uploading sensitive production data, generic AI lacks infrastructure expertise.`

			`---`

			`## Our Solution`

			`TRCAA: Local-first AI-powered incident triage that autonomously executes diagnostic commands.`

			`### Core Innovation: Agentic Shell Execution`
			`The AI doesn't suggest commands—it executes them with intelligent safety:`

			`Three-Tier Safety:`
			- Tier 1: Read-only (`kubectl get`, `grep`) auto-execute
			- Tier 2: Mutating (`kubectl scale`) require approval
			- Tier 3: Destructive (`rm -rf`) auto-blocked

			Example: "Why is nginx pod crashing?" → AI runs `kubectl get/describe/logs`, analyzes output, explains root cause. No copy-paste.

			`### Unique Features`
			`- Local-first: SQLCipher AES-256 encrypted storage, offline via Ollama, PII auto-redact, tamper-evident audit`
			`- Domain expertise: 16 pre-built contexts (Linux RHEL/OEL, Windows, K8s, networking, databases, Proxmox, HPE, observability)`
			`- Multi-cluster K8s: Encrypted kubeconfig storage, bundled kubectl v1.30.0`
			`- Provider-agnostic: OpenAI, Claude, Gemini, Mistral, Bedrock, Ollama + auto-detect tool calling`

			`---`

			`## What We Built`

			`v1.0.0 (44 hrs): 35 files, +4089 lines, shell execution module, three-tier classifier (19 tests/100% coverage), approval modal UI, CI/CD`

			`v1.0.1-v1.0.9 (28 hrs, 24 PRs in 48 hrs): Security updates, LiteLLM Bedrock, Ollama auto-start + function calling, query classification (prevents AI over-investigation), connection reliability (180s timeout, health checks, retry logic), tool calling auto-detect`

			`Total: 25 PRs, ~84 files, ~6,100 lines, 431 tests, 72 hours`

			`---`

			`## Competitive Landscape`

			`SaaS exists: Rootly, incident.io, Xurrent, TraceRoot—all cloud, subscriptions, data leaves network`

			`TRCAA uniquely combines: Local-first + offline + encrypted + PII sanitization + provider-agnostic (6 providers) + 16 domain contexts + autonomous shell execution + tamper-evident audit + air-gap capable`

			`We win on: Privacy (local encrypted), air-gap (Ollama), cost (no per-seat fees), domain depth`
			`SaaS wins on: Alert integration (PagerDuty/Datadog), team collaboration, observability correlation`

			`Target: Regulated industries, defense, air-gapped environments, privacy-focused teams`

			`---`

			`## Technical Highlights`

			`Backend (Rust): Three-tier classifier with pipe/chain analysis, AES-256-GCM encryption, hash-chained audit, 297 tests`
			`Frontend (React): Real-time approval modal, multi-cluster manager, 134 tests`
			`CI/CD: Multi-platform builds (Linux amd64/arm64, macOS, Windows), kubectl bundled, branch protection`

			`Quality: 3 rounds Copilot review (10 findings resolved), zero Clippy warnings, zero TypeScript errors`

			`---`

			`## Impact`

			`Development: 72 hours, 25 PRs, ~6,100 lines, 431 tests`
			`Real-world: Reduced triage from manual copy-paste loop to autonomous sub-second execution`
			`Security: 3 Copilot security findings resolved (prompt injection, tool call dropping, sanitization)`

			`---`

			`## Try It`

			`[GitHub Releases](https://github.com/tftsr/apollo_nxt-trcaa/releases) → Upload kubeconfig → Ask "What pods in default namespace?" → Watch AI auto-execute. Works fully offline with Ollama.`

			`---`

			`## Fun Fact`

			`Zero to production with 431 passing tests, 25 PRs, comprehensive docs in 72 hours. Zero Clippy warnings. Zero TypeScript errors. 100+ real commands executed without a single false-positive denial.`