tftsr-devops_investigation/README.md
Shaun Arman e20228da6f refactor(ci): remove standalone release workflow
Delete .gitea/workflows/release.yml and keep release orchestration in auto-tag.yml only, then update related workflow tests and docs to reference the unified pipeline.

Made-with: Cursor
2026-04-04 21:34:15 -05:00

# Troubleshooting and RCA Assistant
A structured, AI-backed desktop tool for IT incident triage, 5-Whys root cause analysis, RCA document generation, and blameless post-mortems. Runs fully offline with local Ollama models, or connects to cloud AI providers.
Built with **Tauri 2** (Rust + WebView), **React 18**, **TypeScript**, and **SQLCipher AES-256** encrypted storage.
**CI status:** ![CI](http://172.0.0.29:3000/sarman/tftsr-devops_investigation/actions/workflows/test.yml/badge.svg) — all checks green (rustfmt · clippy · 64 Rust tests · tsc · vitest)
---
## Features
- **5-Whys AI Triage** — Guided root cause analysis via AI chat, with auto-detection of why levels 1–5
- **PII Sanitization** — Automatic detection and redaction of IPv4/IPv6, emails, tokens, passwords, SSNs, and more before any data leaves the machine
- **Multi-Provider AI** — OpenAI, Anthropic Claude, Google Gemini, Mistral, and local [Ollama](https://ollama.com) (offline)
- **Encrypted Database** — SQLCipher AES-256 encrypted SQLite; all issue history stays local
- **RCA + Post-Mortem Generation** — Auto-populated Markdown templates, exportable to `.md` and `.pdf`
- **Ollama Management** — Hardware detection, model recommendations, pull/delete models in-app
- **Audit Trail** — Every external data send logged with SHA-256 hash
- **Domain System Prompts** — Pre-built expert context for 8 IT domains (Linux, Windows, Network, Kubernetes, Databases, Virtualization, Hardware, Observability)
- **Integrations** *(v0.2, coming soon)* — Confluence, ServiceNow, Azure DevOps
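The sanitization step above can be pictured with a plain shell sketch. The patterns here are deliberately simplified and illustrative only; the in-app engine is the Rust-side regex + aho-corasick pipeline, not these two `sed` expressions:
```bash
# Illustrative only: show the shape of the transformation (IPv4 + email redaction).
echo 'Connection refused from 192.168.1.50 for user alice@example.com' \
  | sed -E \
      -e 's/[0-9]{1,3}(\.[0-9]{1,3}){3}/[REDACTED_IP]/g' \
      -e 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[REDACTED_EMAIL]/g'
# → Connection refused from [REDACTED_IP] for user [REDACTED_EMAIL]
```
In the app, the equivalent redaction runs before any payload leaves the machine, and the diff is shown for user approval.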
---
## Supported Domains
| Domain | Coverage |
|---|---|
| Linux | RHEL/OEL, systemd, journald, SELinux, kernel panics |
| Windows | Event IDs, WinRM, BSOD codes, Server 2019/2022 |
| Network | Fortigate, Cisco IOS, Aruba AOS-CX, Nokia SR-OS, VoIP SIP/RTP |
| Kubernetes | k3s, OpenShift, CrashLoopBackOff, OOMKill, etcd, Rancher |
| Databases | PostgreSQL WAL, Redis AOF/RDB, RabbitMQ, MSSQL |
| Virtualization | Proxmox VE/PBS, VDI sessions |
| Hardware | HPE Synergy 12000, DL-20/320/360/380, iLO event logs |
| Observability | Kibana/ECK, Elasticsearch shard failures |
---
## Architecture
| Component | Technology |
|---|---|
| App framework | Tauri 2.x (Rust + WebView) |
| Frontend | React 18 + TypeScript + Vite |
| UI | Tailwind CSS (custom shadcn-style components) |
| Database | rusqlite + `bundled-sqlcipher` (AES-256) |
| Secret storage | `tauri-plugin-stronghold` |
| State management | Zustand (persisted settings store) |
| AI providers | reqwest (async HTTP) |
| PII detection | regex + aho-corasick multi-pattern engine |
---
## Prerequisites
### System Libraries (Linux — Fedora/RHEL)
```bash
sudo dnf install -y \
glib2-devel gtk3-devel webkit2gtk4.1-devel \
libsoup3-devel openssl-devel librsvg2-devel
```
### System Libraries (Linux — Debian/Ubuntu)
```bash
sudo apt-get install -y \
libwebkit2gtk-4.1-dev libssl-dev libgtk-3-dev \
libayatana-appindicator3-dev librsvg2-dev patchelf pkg-config
```
### Toolchain
```bash
# Rust (minimum 1.88 — required by cookie_store, time, darling)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Node.js 22+ (via your package manager)
# Verify:
rustc --version # 1.88+
node --version # 22+
```
---
## Getting Started
```bash
# Clone
git clone https://gogs.tftsr.com/sarman/tftsr-devops_investigation.git
cd tftsr-devops_investigation
npm install --legacy-peer-deps
# Development mode (hot reload)
source ~/.cargo/env
cargo tauri dev
# Production build
cargo tauri build
# Output: src-tauri/target/release/bundle/
```
---
## Releases
Pre-built installers are attached to each [tagged release](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/releases):
| Platform | Format | Notes |
|---|---|---|
| Linux amd64 | `.deb`, `.rpm`, `.AppImage` | Standard package or universal AppImage |
| Windows amd64 | `.exe` (NSIS), `.msi` | Cross-compiled via mingw-w64 |
| Linux arm64 | `.deb`, `.rpm`, `.AppImage` | Built natively on an arm64 runner |
| macOS | — | Requires macOS runner — build locally |
---
## AI Provider Setup
Launch the app and go to **Settings → AI Providers** to add a provider:
| Provider | API URL | Notes |
|---|---|---|
| OpenAI | `https://api.openai.com/v1` | Requires API key |
| Anthropic | `https://api.anthropic.com` | Requires API key |
| Google Gemini | `https://generativelanguage.googleapis.com` | Requires API key |
| Mistral | `https://api.mistral.ai/v1` | Requires API key |
| Ollama (local) | `http://localhost:11434` | No key needed — fully offline |
| Azure OpenAI | `https://<resource>.openai.azure.com/openai/deployments/<deployment>` | Requires API key |
| **AWS Bedrock (via LiteLLM)** | `http://localhost:8000/v1` | See [LiteLLM + AWS Bedrock](#litellm--aws-bedrock-setup) below |
For offline use, install [Ollama](https://ollama.com) and pull a model:
```bash
ollama pull llama3.2:3b # Good for most hardware (≥8 GB RAM)
ollama pull llama3.1:8b # Better quality (≥16 GB RAM)
```
Or use **Settings → Ollama** to pull models directly from within the app.
### LiteLLM + AWS Bedrock Setup
To use Claude via AWS Bedrock (ideal for enterprise environments with existing AWS contracts):
1. **Install LiteLLM:**
```bash
pip install litellm[proxy]
```
2. **Create config file** at `~/.litellm/config.yaml`:
```yaml
model_list:
  - model_name: bedrock-claude
    litellm_params:
      model: bedrock/us.anthropic.claude-sonnet-4-6
      aws_region_name: us-east-1
      # Optionally specify aws_profile_name if not using the default profile

general_settings:
  master_key: sk-your-secure-key  # Any value; used for API auth
```
3. **Start LiteLLM proxy:**
```bash
nohup litellm --config ~/.litellm/config.yaml --port 8000 > ~/.litellm/litellm.log 2>&1 &
```
4. **Configure in Troubleshooting and RCA Assistant:**
- Provider: **OpenAI** (OpenAI-compatible)
- Base URL: `http://localhost:8000/v1`
- API Key: `sk-your-secure-key` (from config)
- Model: `bedrock-claude`
For detailed setup including multiple AWS accounts and Claude Code integration, see the [LiteLLM + Bedrock wiki page](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/wiki/LiteLLM-Bedrock-Setup).
---
## Triage Workflow
```
1. New Issue → Select domain, enter title and severity
2. Log Upload → Drag-and-drop log files, review PII redactions
3. Triage → 5-Whys AI conversation, auto-tracked why levels 1–5
4. Resolution → Review and confirm each root cause and action
5. RCA → Auto-generated RCA document, export as MD or PDF
6. Post-Mortem → Blameless post-mortem document with action items
```
---
## Project Structure
```
tftsr/
├── src-tauri/src/
│   ├── ai/            # AI provider clients (OpenAI, Anthropic, Gemini, Mistral, Ollama)
│   ├── pii/           # PII detection + redaction engine
│   ├── db/            # SQLCipher connection, migrations, models
│   ├── ollama/        # Hardware detection, model recommendations, download manager
│   ├── docs/          # RCA + post-mortem generators, PDF/MD exporters
│   ├── integrations/  # Confluence, ServiceNow, Azure DevOps (v0.2 stubs)
│   ├── audit/         # Audit log writer
│   ├── commands/      # Tauri IPC command handlers
│   ├── lib.rs         # App builder, plugin registration, command handler registration
│   └── state.rs       # AppState (DB connection, settings)
├── src/
│   ├── pages/         # Dashboard, NewIssue, LogUpload, Triage, Resolution, RCA, Postmortem, History, Settings
│   ├── components/    # ChatWindow, TriageProgress, PiiDiffViewer, DocEditor, HardwareReport, ModelSelector, UI
│   ├── stores/        # sessionStore, settingsStore (persisted), historyStore
│   ├── lib/           # tauriCommands.ts (typed IPC wrappers), domainPrompts.ts
│   └── styles/        # Tailwind + CSS custom properties
├── tests/
│   ├── unit/          # Vitest unit tests (PII, session store, settings store)
│   └── e2e/           # WebdriverIO + tauri-driver E2E skeletons
├── docs/wiki/         # Source of truth for the Gitea wiki
└── .gitea/
    └── workflows/
        ├── test.yml      # CI: rustfmt · clippy · cargo test · tsc · vitest (every push/PR)
        └── auto-tag.yml  # Auto tag + release: linux/amd64 + windows/amd64 + linux/arm64 + macOS
```
---
## Testing
```bash
# Unit tests (Vitest) — 13/13 passing
npm run test:run
# Frontend coverage
npm run test:coverage
# TypeScript type check
npx tsc --noEmit
# Rust checks — 64/64 tests passing
cargo check --manifest-path src-tauri/Cargo.toml
cargo test --manifest-path src-tauri/Cargo.toml
# E2E tests (requires compiled app binary)
TAURI_BINARY_PATH=./src-tauri/target/release/tftsr npm run test:e2e
```
---
## CI/CD — Gitea Actions
The project uses **Gitea Actions** (act_runner v0.3.1) connected to the Gitea instance at `gogs.tftsr.com`.
| Workflow | Trigger | Jobs |
|---|---|---|
| `.gitea/workflows/test.yml` | Every push / PR | rustfmt · clippy · cargo test (64) · tsc · vitest (13) |
| `.gitea/workflows/auto-tag.yml` | Push to `master` | Auto-tag, then build linux/amd64 + windows/amd64 + linux/arm64 + macOS and upload assets |
**Runners:**
| Runner | Platform | Host | Purpose |
|---|---|---|---|
| `amd64-docker-runner` | linux/amd64 | 172.0.0.29 (Docker) | Test pipeline + amd64/windows release builds |
| `arm64-native-runner` | linux/arm64 | Local arm64 machine | Native arm64 release builds |
**Branch protection:** master requires a PR approved by `sarman`, with all 5 CI checks passing before merge.
> See [CI/CD Pipeline wiki](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/wiki/CICD-Pipeline) for full infrastructure docs.
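As a rough orientation, a Gitea Actions workflow covering the checks listed above could be shaped as follows. This is a sketch only — job names, runner labels, and step grouping here are illustrative, and the actual `.gitea/workflows/test.yml` in the repo is the source of truth:
```yaml
name: test
on: [push, pull_request]

jobs:
  checks:
    runs-on: ubuntu-latest   # label must match a registered runner
    steps:
      - uses: actions/checkout@v4
      - name: Rust format + lint
        run: |
          cargo fmt --check --manifest-path src-tauri/Cargo.toml
          cargo clippy --manifest-path src-tauri/Cargo.toml -- -D warnings
      - name: Rust tests
        run: cargo test --manifest-path src-tauri/Cargo.toml
      - name: Frontend type check + unit tests
        run: |
          npm ci
          npx tsc --noEmit
          npm run test:run
```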
---
## Security
| Concern | Implementation |
|---|---|
| API keys / tokens | `tauri-plugin-stronghold` encrypted vault |
| Database at rest | SQLCipher AES-256; key derived via PBKDF2 |
| PII before AI send | Rust-side detection + mandatory user approval in UI |
| Audit trail | Every `ai_send` / `publish` event logged with SHA-256 hash |
| Network | `reqwest` with TLS; HTTP blocked by Tauri capability config |
| Capabilities | Least-privilege: scoped fs access, no arbitrary shell by default |
| CSP | Strict CSP in `tauri.conf.json`; no inline scripts |
| Telemetry | None — zero analytics, crash reporting, or usage tracking |
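The audit-trail hashing in the table above can be reproduced from a shell for spot-checking. This is a sketch assuming the log records a plain SHA-256 hex digest of the outbound payload; the exact bytes the app hashes are an implementation detail not shown here:
```bash
# Compute a SHA-256 hex digest the way an audit entry could record one.
printf '%s' 'sanitized payload sent to provider' | sha256sum | cut -d' ' -f1
```
The resulting 64-character hex string lets an auditor confirm after the fact exactly which sanitized payload was sent.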
---
## Database
All data is stored locally in a SQLCipher-encrypted database at:
| OS | Path |
|---|---|
| Linux | `~/.local/share/tftsr/tftsr.db` |
| macOS | `~/Library/Application Support/tftsr/tftsr.db` |
| Windows | `%APPDATA%\tftsr\tftsr.db` |
Override with the `TFTSR_DATA_DIR` environment variable.
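How the override interacts with the platform default can be sketched in shell (Linux default shown). The lookup below mirrors the table; it is not lifted from the app's code:
```bash
# Resolve the effective database path: TFTSR_DATA_DIR wins when set,
# otherwise the platform default (Linux shown) is used.
DB_PATH="${TFTSR_DATA_DIR:-$HOME/.local/share/tftsr}/tftsr.db"
echo "$DB_PATH"
```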
---
## Environment Variables
| Variable | Default | Purpose |
|---|---|---|
| `TFTSR_DATA_DIR` | Platform data dir | Override database location |
| `TFTSR_DB_KEY` | `dev-key-change-in-prod` | Database encryption key (release builds) |
| `RUST_LOG` | `info` | Tracing log level (`debug`, `info`, `warn`, `error`) |
---
## Implementation Status
| Phase | Description | Status |
|---|---|---|
| 1 | Scaffold & Foundation | ✅ Complete |
| 2 | Security & Database Layer | ✅ Complete |
| 3 | PII Sanitization Engine | ✅ Complete |
| 4 | AI Provider Layer | ✅ Complete |
| 5 | Ollama Integration | ✅ Complete |
| 6 | Log Upload & Analysis | ✅ Complete |
| 7 | 5-Whys Triage Engine | ✅ Complete |
| 8 | RCA & Post-Mortem Generation | ✅ Complete |
| 9 | History & Search | 🔲 Pending |
| 10 | Integrations (Confluence, ServiceNow, ADO) | 🔲 v0.2 |
| 11 | CI/CD Pipeline | ✅ Complete — Gitea Actions, all checks green |
| 12 | Release Packaging | ✅ linux/amd64 · linux/arm64 (native) · windows/amd64 |
---
## License
Private — internal tooling. All rights reserved.