feat: full copy from apollo_nxt-trcaa with complete sanitization #69
@@ -307,7 +307,7 @@ jobs:
 needs: autotag
 runs-on: linux-amd64
 container:
-image: 172.0.0.29:3000/sarman/trcaa-linux-amd64:rust1.88-node22
+image: 172.0.0.29:3000/sarman/tftsr-linux-amd64:rust1.88-node22
 steps:
 - name: Checkout
 run: |
@@ -402,7 +402,7 @@ jobs:
 needs: autotag
 runs-on: linux-amd64
 container:
-image: 172.0.0.29:3000/sarman/trcaa-windows-cross:rust1.88-node22
+image: 172.0.0.29:3000/sarman/tftsr-windows-cross:rust1.88-node22
 steps:
 - name: Checkout
 run: |
@@ -586,7 +586,7 @@ jobs:
 needs: autotag
 runs-on: linux-amd64
 container:
-image: 172.0.0.29:3000/sarman/trcaa-linux-arm64:rust1.88-node22
+image: 172.0.0.29:3000/sarman/tftsr-linux-arm64:rust1.88-node22
 steps:
 - name: Checkout
 run: |

@@ -13,9 +13,9 @@ name: Build CI Docker Images
 # sudo systemctl restart docker
 #
 # Images produced:
-# 172.0.0.29:3000/sarman/trcaa-linux-amd64:rust1.88-node22
-# 172.0.0.29:3000/sarman/trcaa-windows-cross:rust1.88-node22
-# 172.0.0.29:3000/sarman/trcaa-linux-arm64:rust1.88-node22
+# 172.0.0.29:3000/sarman/tftsr-linux-amd64:rust1.88-node22
+# 172.0.0.29:3000/sarman/tftsr-windows-cross:rust1.88-node22
+# 172.0.0.29:3000/sarman/tftsr-linux-arm64:rust1.88-node22

 on:
 push:
@@ -52,10 +52,10 @@ jobs:
 run: |
 echo "$RELEASE_TOKEN" | docker login $REGISTRY -u $REGISTRY_USER --password-stdin
 docker build \
--t $REGISTRY/$REGISTRY_USER/trcaa-linux-amd64:rust1.88-node22 \
+-t $REGISTRY/$REGISTRY_USER/tftsr-linux-amd64:rust1.88-node22 \
 -f .docker/Dockerfile.linux-amd64 .
-docker push $REGISTRY/$REGISTRY_USER/trcaa-linux-amd64:rust1.88-node22
-echo "✓ Pushed $REGISTRY/$REGISTRY_USER/trcaa-linux-amd64:rust1.88-node22"
+docker push $REGISTRY/$REGISTRY_USER/tftsr-linux-amd64:rust1.88-node22
+echo "✓ Pushed $REGISTRY/$REGISTRY_USER/tftsr-linux-amd64:rust1.88-node22"

 windows-cross:
 runs-on: linux-amd64
@@ -75,10 +75,10 @@ jobs:
 run: |
 echo "$RELEASE_TOKEN" | docker login $REGISTRY -u $REGISTRY_USER --password-stdin
 docker build \
--t $REGISTRY/$REGISTRY_USER/trcaa-windows-cross:rust1.88-node22 \
+-t $REGISTRY/$REGISTRY_USER/tftsr-windows-cross:rust1.88-node22 \
 -f .docker/Dockerfile.windows-cross .
-docker push $REGISTRY/$REGISTRY_USER/trcaa-windows-cross:rust1.88-node22
-echo "✓ Pushed $REGISTRY/$REGISTRY_USER/trcaa-windows-cross:rust1.88-node22"
+docker push $REGISTRY/$REGISTRY_USER/tftsr-windows-cross:rust1.88-node22
+echo "✓ Pushed $REGISTRY/$REGISTRY_USER/tftsr-windows-cross:rust1.88-node22"

 linux-arm64:
 runs-on: linux-amd64
@@ -98,7 +98,7 @@ jobs:
 run: |
 echo "$RELEASE_TOKEN" | docker login $REGISTRY -u $REGISTRY_USER --password-stdin
 docker build \
--t $REGISTRY/$REGISTRY_USER/trcaa-linux-arm64:rust1.88-node22 \
+-t $REGISTRY/$REGISTRY_USER/tftsr-linux-arm64:rust1.88-node22 \
 -f .docker/Dockerfile.linux-arm64 .
-docker push $REGISTRY/$REGISTRY_USER/trcaa-linux-arm64:rust1.88-node22
-echo "✓ Pushed $REGISTRY/$REGISTRY_USER/trcaa-linux-arm64:rust1.88-node22"
+docker push $REGISTRY/$REGISTRY_USER/tftsr-linux-arm64:rust1.88-node22
+echo "✓ Pushed $REGISTRY/$REGISTRY_USER/tftsr-linux-arm64:rust1.88-node22"

18
AGENTS.md
@@ -77,7 +77,7 @@ TypeScript mirrors this shape exactly in `tauriCommands.ts`.

 ### State Persistence
 - `sessionStore`: ephemeral triage session (issue, messages, PII spans, why-level 0–5, loading) — **not persisted**
-- `settingsStore`: persisted to `localStorage` as `"trcaa-settings"`
+- `settingsStore`: persisted to `localStorage` as `"tftsr-settings"`

 ---

@@ -91,9 +91,9 @@ TypeScript mirrors this shape exactly in `tauriCommands.ts`.
 **Artifacts**: `src-tauri/target/{target}/release/bundle/`

 **Environments**:
-- Test CI images at `gitea.tftsr.com:3000` (pull `trcaa-*:rust1.88-node22`)
-- Gitea instance: `http://gitea.tftsr.com:3000`
-- Wiki: sync from `docs/wiki/*.md` → `https://gogs.trcaa.com/sarman/trcaa-devops_investigation/wiki`
+- Test CI images at `172.0.0.29:3000` (pull `tftsr-*:rust1.88-node22`)
+- Gitea instance: `http://172.0.0.29:3000`
+- Wiki: sync from `docs/wiki/*.md` → `https://gogs.tftsr.com/sarman/tftsr-devops_investigation/wiki`

 ---

@@ -107,9 +107,9 @@ TypeScript mirrors this shape exactly in `tauriCommands.ts`.
 | `RUST_LOG` | `info` | Tracing level (`debug`, `info`, `warn`, `error`) |

 **Database path**:
-- Linux: `~/.local/share/trcaa/trcaa.db`
-- macOS: `~/Library/Application Support/trcaa/trcaa.db`
-- Windows: `%APPDATA%\trcaa\trcaa.db`
+- Linux: `~/.local/share/tftsr/tftsr.db`
+- macOS: `~/Library/Application Support/tftsr/tftsr.db`
+- Windows: `%APPDATA%\tftsr\tftsr.db`

 ---

@@ -141,7 +141,7 @@ TypeScript mirrors this shape exactly in `tauriCommands.ts`.
 | Rust | `cargo test --manifest-path src-tauri/Cargo.toml` | 64 tests, runs in `rust:1.88-slim` container |
 | TypeScript | `npm run test:run` | Vitest, 13 tests |
 | Type check | `npx tsc --noEmit` | `skipLibCheck: true` |
-| E2E | `TAURI_BINARY_PATH=./src-tauri/target/release/trcaa npm run test:e2e` | WebdriverIO, requires compiled binary |
+| E2E | `TAURI_BINARY_PATH=./src-tauri/target/release/tftsr npm run test:e2e` | WebdriverIO, requires compiled binary |

 **Frontend coverage**: `npm run test:coverage` → `tests/unit/` coverage report

@@ -154,4 +154,4 @@ TypeScript mirrors this shape exactly in `tauriCommands.ts`.
 3. **PII before AI**: Always redact and record hash before external send
 4. **Port 1420**: Vite dev server is hard-coded to 1420, not 3000
 5. **Build order**: Rust fmt → clippy → test → TS check → JS test
-6. **CI images**: Use `gitea.tftsr.com:3000` registry for pre-baked builder images
+6. **CI images**: Use `172.0.0.29:3000` registry for pre-baked builder images

@@ -168,10 +168,10 @@ CI, chore, and build changes are excluded.
 - Use bash shell and remove bash-only substring expansion in pr-review
 - Restore migration 014, bump version to 0.2.50, harden pr-review workflow
 - Harden pr-review workflow and sync versions to 0.2.50
-- Configure container DNS to resolve ollama-ui.trcaa.com
+- Configure container DNS to resolve ollama-ui.tftsr.com
 - Harden pr-review workflow — URLs, DNS, correctness and reliability
 - Resolve AI review false positives and address high/medium issues
-- Replace github.server_url with hardcoded gogs.trcaa.com for container access
+- Replace github.server_url with hardcoded gogs.tftsr.com for container access
 - Revert to two-dot diff — three-dot requires merge base unavailable in shallow clone
 - Harden pr-review workflow — secret redaction, log safety, auth header
 - **ci**: Address AI review — rustup idempotency and cargo --locked
@@ -251,7 +251,7 @@ CI, chore, and build changes are excluded.
 - Add multi-mode authentication for integrations (v0.2.10)
 - Complete webview cookie extraction implementation
 - Add custom_rest provider mode and rebrand application name
-- **rebrand**: Rename binary to trcaa and auto-generate DB key
+- **rebrand**: Rename binary to tftsr and auto-generate DB key
 - **ui**: Fix model dropdown, auth prefill, PII persistence, theme toggle, and Ollama bundle
 - **ci**: Add persistent pre-baked Docker builder images
 - **ai**: Add tool-calling and integration search as AI data source

12
CLAUDE.md
@@ -78,8 +78,8 @@ cargo tauri build # Outputs to src-tauri/target/release/bundle/
 ### CI/CD

 - **Test pipeline**: `.github/workflows/test.yml` — runs on every push/PR targeting `main`
-- **Release pipeline**: `.github/workflows/release.yml` — runs on every push to `main`, auto-tags, produces multi-platform bundles (Linux amd64+arm64, Windows, macOS arm64+Intel), uploads to GitHub Releases at `https://github.com/tftsr/apollo_nxt-trcaa/releases`
-- **Docker builder images**: `.github/workflows/build-images.yml` — rebuilds `ghcr.io/tftsr/trcaa-*` images when `.docker/**` changes on `main`
+- **Release pipeline**: `.github/workflows/release.yml` — runs on every push to `main`, auto-tags, produces multi-platform bundles (Linux amd64+arm64, Windows, macOS arm64+Intel), uploads to GitHub Releases at `https://gogs.tftsr.com/sarman/apollo_nxt-tftsr/releases`
+- **Docker builder images**: `.github/workflows/build-images.yml` — rebuilds `ghcr.io/tftsr/tftsr-*` images when `.docker/**` changes on `main`

 ---

@@ -94,7 +94,7 @@ cargo tauri build # Outputs to src-tauri/target/release/bundle/
 pub struct AppState {
 pub db: Arc<Mutex<rusqlite::Connection>>,
 pub settings: Arc<Mutex<AppSettings>>,
-pub app_data_dir: PathBuf, // ~/.local/share/trcaa on Linux
+pub app_data_dir: PathBuf, // ~/.local/share/tftsr on Linux
 }
 ```

@@ -128,7 +128,7 @@ All command handlers receive `State<'_, AppState>` as a Tauri-injected parameter

 **Stores** (Zustand):
 - `sessionStore.ts` — ephemeral triage session: current issue, chat messages, PII spans, why-level (0–5), loading state. **Not persisted.**
-- `settingsStore.ts` — AI providers, theme, Ollama URL. **Persisted** to `localStorage` as `"trcaa-settings"`.
+- `settingsStore.ts` — AI providers, theme, Ollama URL. **Persisted** to `localStorage` as `"tftsr-settings"`.
 - `historyStore.ts` — read-only cache of past issues for the History page.

 **Page flow**:
@@ -203,7 +203,7 @@ Before any text is sent to an AI provider, `apply_redactions` must be called and

 ### GitHub Actions CI

-All pipelines run on GitHub Actions at `https://github.com/tftsr/apollo_nxt-trcaa/actions`.
+All pipelines run on GitHub Actions at `https://gogs.tftsr.com/sarman/apollo_nxt-tftsr/actions`.

 - `GITHUB_TOKEN` is the only credential needed — no external secrets required
 - Builder images are hosted on `ghcr.io/tftsr/` (GitHub Container Registry)
@@ -214,7 +214,7 @@ All pipelines run on GitHub Actions at `https://github.com/tftsr/apollo_nxt-trca

 ## Wiki Maintenance

-The project wiki lives at `https://github.com/tftsr/apollo_nxt-trcaa/wiki`.
+The project wiki lives at `https://gogs.tftsr.com/sarman/apollo_nxt-tftsr/wiki`.

 **Source of truth**: `docs/wiki/*.md` in this repo. The `wiki-sync` job (in `.github/workflows/release.yml`) automatically pushes any changes to the GitHub wiki on every push to `main`.


8
Makefile
@@ -1,4 +1,4 @@
-GH_REPO := msicie/apollo_nxt-trcaa
+GOGS_REPO := msicie/apollo_nxt-tftsr
 TAG ?= v0.1.0-alpha
 TARGET := aarch64-unknown-linux-gnu

@@ -34,11 +34,11 @@ build-arm64:

 .PHONY: upload-arm64
 upload-arm64:
-@test -n "$(GH_TOKEN)" || (echo "ERROR: set GH_TOKEN env var"; exit 1)
+@test -n "$(GOGS_TOKEN)" || (echo "ERROR: set GOGS_TOKEN env var"; exit 1)
 @for f in artifacts/linux-arm64/*; do \
 [ -f "$$f" ] || continue; \
 NAME="linux-arm64-$$(basename $$f)"; \
 echo "Uploading $$NAME..."; \
-GH_TOKEN=$(GH_TOKEN) gh release upload $(TAG) "$$f#$$NAME" \
---repo $(GH_REPO) && echo "OK" || echo "FAIL: $$f"; \
+GOGS_TOKEN=$(GOGS_TOKEN) # gh release upload $(TAG) "$$f#$$NAME" \
+--repo $(GOGS_REPO) && echo "OK" || echo "FAIL: $$f"; \
 done

416
PLAN.md
@@ -1,416 +0,0 @@
-# TRCAA — IT Triage & Root-Cause Analysis Desktop Application
-
-## Implementation Plan
-
-### Overview
-
-TRCAA is a **desktop-first, offline-capable** application that helps IT teams
-perform structured incident triage using the *5-Whys* methodology, backed by
-pluggable AI providers (Ollama local, OpenAI, Anthropic, Mistral, Gemini).
-It automates PII redaction, guides engineers through root-cause analysis, and
-produces post-mortem documents (Markdown / PDF / DOCX).
-
----
-
-## Architecture Decisions
-
-| Area | Choice | Rationale |
-|------|--------|-----------|
-| Desktop framework | **Tauri 2.x** | Small binary, native webview, Rust backend for security |
-| Frontend framework | **React 18** | Large ecosystem, component model fits wizard-style UX |
-| State management | **Zustand** | Minimal boilerplate, TypeScript-friendly, no context nesting |
-| Local database | **SQLCipher** (via `rusqlite` + `bundled-sqlcipher`) | Encrypted SQLite — secrets and PII at rest |
-| Secret storage | **Tauri Stronghold** | OS-keychain-grade encrypted vault for API keys |
-| AI providers | Ollama (local), OpenAI, Anthropic, Mistral, Gemini | User choice; local-first with cloud fallback |
-| Unit tests (frontend) | **Vitest** | Fast, Vite-native, first-class TS support |
-| E2E tests | **WebdriverIO + tauri-driver** | Official Tauri E2E path, cross-platform |
-| CI/CD | **Woodpecker CI** (Gogs at `gitea.tftsr.com:3000`) | Self-hosted, Docker-native, YAML pipelines |
-| Bundling | Vite 6 | Dev server + production build, used by Tauri CLI |
-
----
-
-## Directory Structure
-
-```
-trcaa/
-├── .woodpecker/
-│ ├── test.yml # lint + unit tests on push / PR
-│ └── release.yml # multi-platform build on tag
-├── cli/
-│ ├── package.json
-│ └── src/
-│ └── main.ts # minimal CLI entry point
-├── src/ # React frontend
-│ ├── assets/
-│ ├── components/
-│ │ ├── common/ # Button, Card, Modal, DropZone …
-│ │ ├── dashboard/ # IssueList, StatsCards
-│ │ ├── triage/ # WhyStep, ChatBubble, ProgressBar
-│ │ ├── rca/ # DocEditor, ExportBar
-│ │ ├── settings/ # ProviderForm, ThemeToggle
-│ │ └── pii/ # PiiHighlighter, RedactionPreview
-│ ├── hooks/ # useInvoke, useListener, useTheme …
-│ ├── lib/
-│ │ ├── tauriCommands.ts # typed invoke wrappers & TS types
-│ │ └── utils.ts # date formatting, debounce, etc.
-│ ├── pages/
-│ │ ├── DashboardPage.tsx
-│ │ ├── NewIssuePage.tsx
-│ │ ├── TriagePage.tsx
-│ │ ├── RcaPage.tsx
-│ │ ├── LogViewerPage.tsx
-│ │ └── SettingsPage.tsx
-│ ├── stores/
-│ │ ├── sessionStore.ts # current triage session state
-│ │ └── settingsStore.ts # theme, providers, preferences
-│ ├── App.tsx
-│ └── main.tsx
-├── src-tauri/
-│ ├── Cargo.toml
-│ ├── tauri.conf.json
-│ ├── capabilities/
-│ │ └── default.json
-│ ├── icons/
-│ ├── src/
-│ │ ├── main.rs # Tauri entry point
-│ │ ├── db.rs # SQLCipher connection & migrations
-│ │ ├── commands/ # IPC command modules
-│ │ │ ├── mod.rs
-│ │ │ ├── issues.rs
-│ │ │ ├── triage.rs
-│ │ │ ├── logs.rs
-│ │ │ ├── pii.rs
-│ │ │ ├── rca.rs
-│ │ │ ├── ai.rs
-│ │ │ └── settings.rs
-│ │ ├── ai/ # AI provider abstractions
-│ │ │ ├── mod.rs
-│ │ │ ├── ollama.rs
-│ │ │ ├── openai_compat.rs
-│ │ │ └── prompt_templates.rs
-│ │ ├── pii/ # PII detection engine
-│ │ │ ├── mod.rs
-│ │ │ └── patterns.rs
-│ │ └── export/ # Document export
-│ │ ├── mod.rs
-│ │ ├── markdown.rs
-│ │ ├── pdf.rs
-│ │ └── docx.rs
-│ └── migrations/
-│ └── 001_init.sql
-├── tests/
-│ ├── unit/
-│ │ ├── setup.ts
-│ │ ├── pii.test.ts
-│ │ ├── sessionStore.test.ts
-│ │ └── settingsStore.test.ts
-│ └── e2e/
-│ ├── wdio.conf.ts
-│ ├── helpers/
-│ │ └── app.ts
-│ └── specs/
-│ ├── onboarding.spec.ts
-│ ├── log-upload.spec.ts
-│ ├── triage-flow.spec.ts
-│ └── rca-export.spec.ts
-├── package.json
-├── tsconfig.json
-├── vite.config.ts
-└── PLAN.md # ← this file
-```
-
----
-
-## Database Schema (SQLCipher)
-
-All tables live in a single encrypted `trcaa.db` file under the Tauri
-app-data directory.
-
-### 1. `issues`
-```sql
-CREATE TABLE issues (
-id TEXT PRIMARY KEY,
-title TEXT NOT NULL,
-domain TEXT NOT NULL CHECK(domain IN
-('linux','windows','network','k8s','db','virt','hw','obs')),
-status TEXT NOT NULL DEFAULT 'open'
-CHECK(status IN ('open','triaging','resolved','closed')),
-severity TEXT CHECK(severity IN ('p1','p2','p3','p4')),
-created_at INTEGER NOT NULL,
-updated_at INTEGER NOT NULL
-);
-```
-
-### 2. `triage_messages`
-```sql
-CREATE TABLE triage_messages (
-id TEXT PRIMARY KEY,
-issue_id TEXT NOT NULL REFERENCES issues(id),
-role TEXT NOT NULL CHECK(role IN ('user','assistant','system')),
-content TEXT NOT NULL,
-why_level INTEGER NOT NULL DEFAULT 0,
-created_at INTEGER NOT NULL
-);
-CREATE INDEX idx_triage_msg_issue ON triage_messages(issue_id);
-```
-
-### 3. `log_files`
-```sql
-CREATE TABLE log_files (
-id TEXT PRIMARY KEY,
-issue_id TEXT NOT NULL REFERENCES issues(id),
-filename TEXT NOT NULL,
-content TEXT NOT NULL,
-mime_type TEXT,
-size_bytes INTEGER,
-created_at INTEGER NOT NULL
-);
-```
-
-### 4. `pii_spans`
-```sql
-CREATE TABLE pii_spans (
-id TEXT PRIMARY KEY,
-log_file_id TEXT NOT NULL REFERENCES log_files(id),
-pii_type TEXT NOT NULL,
-start_pos INTEGER NOT NULL,
-end_pos INTEGER NOT NULL,
-original TEXT NOT NULL,
-replacement TEXT NOT NULL
-);
-```
-
-### 5. `rca_documents`
-```sql
-CREATE TABLE rca_documents (
-id TEXT PRIMARY KEY,
-issue_id TEXT NOT NULL REFERENCES issues(id) UNIQUE,
-content TEXT NOT NULL DEFAULT '',
-format TEXT NOT NULL DEFAULT 'markdown',
-created_at INTEGER NOT NULL,
-updated_at INTEGER NOT NULL
-);
-```
-
-### 6. `ai_providers`
-```sql
-CREATE TABLE ai_providers (
-id TEXT PRIMARY KEY,
-name TEXT NOT NULL UNIQUE,
-api_url TEXT NOT NULL,
-model TEXT NOT NULL,
-created_at INTEGER NOT NULL
-);
-```
-
-### 7. `settings`
-```sql
-CREATE TABLE settings (
-key TEXT PRIMARY KEY,
-value TEXT NOT NULL
-);
-```
-
-### 8. `export_history`
-```sql
-CREATE TABLE export_history (
-id TEXT PRIMARY KEY,
-issue_id TEXT NOT NULL REFERENCES issues(id),
-format TEXT NOT NULL CHECK(format IN ('md','pdf','docx')),
-file_path TEXT NOT NULL,
-created_at INTEGER NOT NULL
-);
-```
-
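The `CHECK` constraints in the removed `issues` schema above can be mirrored on the TypeScript side so that invalid rows are caught before they reach the database. The following is a minimal sketch under stated assumptions: the names `IssueRow`, `validateIssue`, and `parseIssue` are hypothetical and do not appear in the PR, and the allowed values are copied from the schema as shown.

```typescript
// Allowed values copied from the CHECK constraints on the `issues` table.
const DOMAINS = ["linux", "windows", "network", "k8s", "db", "virt", "hw", "obs"] as const;
const STATUSES = ["open", "triaging", "resolved", "closed"] as const;
const SEVERITIES = ["p1", "p2", "p3", "p4"] as const;

// Hypothetical row type; the name is illustrative, not taken from the repository.
interface IssueRow {
  id: string;
  title: string;
  domain: (typeof DOMAINS)[number];
  status: (typeof STATUSES)[number];
  severity: (typeof SEVERITIES)[number] | null; // the column has no NOT NULL
  created_at: number;
  updated_at: number;
}

function oneOf(allowed: readonly string[], value: unknown): boolean {
  return typeof value === "string" && allowed.indexOf(value) !== -1;
}

function isInteger(value: unknown): boolean {
  return typeof value === "number" && isFinite(value) && value % 1 === 0;
}

// Returns one message per violated constraint; an empty list means the row
// matches the intent of the schema (types, NOT NULL, and CHECK lists).
function validateIssue(row: Record<string, unknown>): string[] {
  const errors: string[] = [];
  if (typeof row.id !== "string") errors.push("id: expected TEXT");
  if (typeof row.title !== "string") errors.push("title: expected TEXT NOT NULL");
  if (!oneOf(DOMAINS, row.domain)) errors.push("domain: not in CHECK list");
  const status = row.status === undefined ? "open" : row.status; // DEFAULT 'open'
  if (!oneOf(STATUSES, status)) errors.push("status: not in CHECK list");
  if (row.severity !== undefined && row.severity !== null && !oneOf(SEVERITIES, row.severity)) {
    errors.push("severity: not in CHECK list");
  }
  if (!isInteger(row.created_at)) errors.push("created_at: expected INTEGER NOT NULL");
  if (!isInteger(row.updated_at)) errors.push("updated_at: expected INTEGER NOT NULL");
  return errors;
}

// Throws on invalid input, otherwise returns a typed row with defaults applied.
function parseIssue(row: Record<string, unknown>): IssueRow {
  const errors = validateIssue(row);
  if (errors.length > 0) throw new Error(errors.join("; "));
  return {
    id: row.id as string,
    title: row.title as string,
    domain: row.domain as IssueRow["domain"],
    status: (row.status === undefined ? "open" : row.status) as IssueRow["status"],
    severity: (row.severity === undefined ? null : row.severity) as IssueRow["severity"],
    created_at: row.created_at as number,
    updated_at: row.updated_at as number,
  };
}
```

Keeping the allowed values in `as const` arrays lets the same lists drive both the runtime check and the compile-time union types, so the two cannot drift apart.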
----
-
-## IPC Command Interface
-
-All frontend ↔ backend communication goes through Tauri's `invoke()`.
-
-### Issue commands
-| Command | Payload | Returns |
-|---------|---------|---------|
-| `create_issue` | `{ title, domain, severity }` | `Issue` |
-| `list_issues` | `{ status?, domain? }` | `Issue[]` |
-| `get_issue` | `{ id }` | `Issue` |
-| `update_issue` | `{ id, title?, status?, severity? }` | `Issue` |
-| `delete_issue` | `{ id }` | `void` |
-
-### Triage commands
-| Command | Payload | Returns |
-|---------|---------|---------|
-| `send_triage_message` | `{ issueId, content, whyLevel }` | `TriageMessage` (assistant reply) |
-| `get_triage_history` | `{ issueId }` | `TriageMessage[]` |
-| `set_why_level` | `{ issueId, level }` | `void` |
-
-### Log commands
-| Command | Payload | Returns |
-|---------|---------|---------|
-| `upload_log` | `{ issueId, filename, content }` | `LogFile` |
-| `list_logs` | `{ issueId }` | `LogFile[]` |
-| `delete_log` | `{ id }` | `void` |
-
-### PII commands
-| Command | Payload | Returns |
-|---------|---------|---------|
-| `detect_pii` | `{ logFileId }` | `PiiDetectionResult` |
-| `apply_redactions` | `{ logFileId, spanIds }` | `string` (redacted text) |
-
-### RCA / Export commands
-| Command | Payload | Returns |
-|---------|---------|---------|
-| `generate_rca` | `{ issueId }` | `RcaDocument` |
-| `update_rca` | `{ id, content }` | `RcaDocument` |
-| `export_document` | `{ issueId, format }` | `string` (file path) |
-
-### AI / Settings commands
-| Command | Payload | Returns |
-|---------|---------|---------|
-| `test_provider` | `{ name, apiUrl, apiKey?, model }` | `{ ok, message }` |
-| `save_provider` | `{ provider }` | `void` |
-| `get_settings` | `{}` | `Settings` |
-| `update_settings` | `{ key, value }` | `void` |
-
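The command tables above suggest a thin layer of typed wrappers around `invoke()`. A minimal sketch for three of the issue commands follows. It assumes an injected `invoke` function with a simplified signature so that it runs without Tauri; `makeIssueCommands`, `makeFakeInvoke`, and the exact `Issue` fields are hypothetical names chosen for illustration, not the contents of `tauriCommands.ts`.

```typescript
// Simplified stand-in for Tauri's invoke(); injected so the sketch runs anywhere.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

// Assumed shape, loosely based on the `issues` table columns.
interface Issue {
  id: string;
  title: string;
  domain: string;
  severity: string;
  status: string;
}

// Typed wrappers: command names and payload keys follow the table above.
function makeIssueCommands(invoke: Invoke) {
  return {
    createIssue: (title: string, domain: string, severity: string) =>
      invoke("create_issue", { title, domain, severity }) as Promise<Issue>,
    listIssues: (filter: { status?: string; domain?: string } = {}) =>
      invoke("list_issues", filter) as Promise<Issue[]>,
    getIssue: (id: string) => invoke("get_issue", { id }) as Promise<Issue>,
  };
}

// In-memory fake backend, for demonstration and unit tests only.
function makeFakeInvoke(): Invoke {
  const issues: Issue[] = [];
  return async (cmd, args = {}) => {
    switch (cmd) {
      case "create_issue": {
        const issue: Issue = {
          id: String(issues.length + 1),
          title: String(args.title),
          domain: String(args.domain),
          severity: String(args.severity),
          status: "open",
        };
        issues.push(issue);
        return issue;
      }
      case "list_issues":
        return issues.filter(
          (i) =>
            (args.status === undefined || i.status === args.status) &&
            (args.domain === undefined || i.domain === args.domain)
        );
      case "get_issue": {
        const hit = issues.filter((i) => i.id === args.id)[0];
        if (hit === undefined) throw new Error("issue not found: " + String(args.id));
        return hit;
      }
      default:
        throw new Error("unknown command: " + cmd);
    }
  };
}
```

Injecting `invoke` keeps the wrappers testable with a fake backend; in the real application the same factory would presumably receive Tauri's own `invoke`.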
----
-
-## CI/CD Approach
-
-### Infrastructure
-- **Git server**: Gogs at `http://gitea.tftsr.com:3000`
-- **CI runner**: Woodpecker CI with Docker executor
-- **Artifacts**: Uploaded to Gogs releases via API
-
-### Pipelines
-
-| Pipeline | Trigger | Steps |
-|----------|---------|-------|
-| `.woodpecker/test.yml` | push, PR | `rustfmt` check → Clippy → Rust tests → TS typecheck → Vitest → coverage (main only) |
-| `.woodpecker/release.yml` | `v*` tag | Build linux-amd64 → Build linux-arm64 → Upload to Gogs release |
-
----
-
-## Security Implementation
-
-1. **Database encryption** — SQLCipher with a key derived from Tauri Stronghold.
-2. **API key storage** — Stronghold vault, never stored in plaintext.
-3. **PII redaction** — Regex + heuristic engine runs before any text leaves the device.
-4. **CSP** — Strict Content-Security-Policy in `tauri.conf.json`; only allowlisted AI API origins.
-5. **Least-privilege capabilities** — `capabilities/default.json` grants only required Tauri permissions.
-6. **No remote code** — All assets bundled; no CDN scripts.
-
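Item 3 in the list above describes regex-based PII redaction that runs before text leaves the device. The sketch below illustrates the detect-then-redact flow under stated assumptions: it covers only two illustrative patterns (email and IPv4), the regexes are deliberately simple, and `PiiSpan`, `detectPii`, and `applyRedactions` are hypothetical TypeScript names loosely modelled on the `pii_spans` columns, not the Rust engine itself.

```typescript
// Span shape loosely modelled on the `pii_spans` table columns.
interface PiiSpan {
  type: string;
  start: number;
  end: number;
  original: string;
  replacement: string;
}

// Illustrative patterns only; a production engine needs many more and stricter ones.
const PATTERNS: { type: string; source: string }[] = [
  { type: "email", source: "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}" },
  { type: "ipv4", source: "\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b" },
];

// Finds all matches, sorted by position, dropping any span that overlaps
// an earlier accepted span so replacements never collide.
function detectPii(text: string): PiiSpan[] {
  const found: PiiSpan[] = [];
  PATTERNS.forEach((p) => {
    const regex = new RegExp(p.source, "g"); // fresh regex, so lastIndex starts at 0
    let m: RegExpExecArray | null;
    while ((m = regex.exec(text)) !== null) {
      found.push({
        type: p.type,
        start: m.index,
        end: m.index + m[0].length,
        original: m[0],
        replacement: "[" + p.type.toUpperCase() + "]",
      });
    }
  });
  found.sort((a, b) => a.start - b.start);
  const spans: PiiSpan[] = [];
  let lastEnd = 0;
  found.forEach((s) => {
    if (s.start >= lastEnd) {
      spans.push(s);
      lastEnd = s.end;
    }
  });
  return spans;
}

// Applies replacements from the end of the text backwards so that the
// offsets of earlier spans remain valid.
function applyRedactions(text: string, spans: PiiSpan[]): string {
  let out = text;
  spans
    .slice()
    .sort((a, b) => b.start - a.start)
    .forEach((s) => {
      out = out.slice(0, s.start) + s.replacement + out.slice(s.end);
    });
  return out;
}
```

Separating detection from redaction matches the two-command split in the IPC tables (`detect_pii`, then `apply_redactions`), which lets a user review the spans before anything is replaced.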
----
-
-## Testing Strategy
-
-| Layer | Tool | Location | What it covers |
-|-------|------|----------|----------------|
-| Rust unit | `cargo test` | `src-tauri/src/**` | DB operations, PII regex, AI prompt building |
-| Frontend unit | Vitest | `tests/unit/` | Stores, command wrappers, component logic |
-| E2E | WebdriverIO + tauri-driver | `tests/e2e/` | Full user flows: onboarding, triage, export |
-| Lint | `rustfmt` + Clippy + `tsc --noEmit` | CI | Code style, type safety |
-
----
-
-## Implementation Phases
-
-### Phase 1 — Project Scaffold & CI ✅ COMPLETE
-- [x] Initialise repo with Tauri 2.x + React 18 + Vite
-- [x] Configure `tauri.conf.json` and capabilities
-- [x] Set up Woodpecker CI pipelines (`test.yml`, `release.yml`)
-- [x] Write Vitest setup and mock harness
-- [x] Write initial unit tests (PII, sessionStore, settingsStore) — 13/13 passing
-- [x] Write E2E scaffolding (wdio config, helpers, skeleton specs)
-- [x] Create CLI stub (`cli/`)
-- [x] Push to Gogs at http://gitea.tftsr.com:3000/sarman/trcaa-devops_investigation
-- [x] Write README.md
-- [x] Deploy Woodpecker CI v0.15.4 (server + agent + nginx proxy)
-- [ ] **BLOCKED**: Verify CI green on push (Woodpecker hook auth issue — see below)
-
-### Phase 2 — Database & Migrations ✅ COMPLETE
-- [x] Integrate `rusqlite` + `bundled-sqlcipher`
-- [x] Write migrations (10 tables: issues, log_files, pii_spans, ai_conversations, ai_messages, resolution_steps, documents, audit_log, settings, integration_publishes)
-- [x] Implement migration runner in `db/migrations.rs`
-- [x] DB models with all required types
-
-### Phase 3 — Stronghold Integration ✅ COMPLETE (scaffold)
-- [x] `tauri-plugin-stronghold` registered in `lib.rs`
-- [x] Password derivation function configured
-- [ ] Full key lifecycle tests (deferred to Phase 3 proper)
-
-### Phase 4 — Issue CRUD ✅ COMPLETE
-- [x] All issue CRUD commands: create, get, list, update, delete, search
-- [x] 5-Whys tracking: add_five_why, update_five_why
-- [x] Timeline events: add_timeline_event
-- [x] Dashboard, NewIssue, History pages
-
-### Phase 5 — Log Ingestion & PII Detection ✅ COMPLETE
-- [x] `upload_log_file`, `detect_pii`, `apply_redactions` commands
-- [x] PII engine: 11 regex patterns (IPv4, IPv6, email, phone, SSN, CC, MAC, bearer, password, API key, URL)
-- [x] PiiDiffViewer component
-- [x] LogUpload page
-
-### Phase 6 — AI Provider Abstraction ✅ COMPLETE
-- [x] OpenAI-compatible, Anthropic, Gemini, Mistral, Ollama providers
-- [x] `analyze_logs`, `chat_message`, `list_providers` IPC commands
-- [x] Settings/AIProviders page
-- [x] 8 IT domain system prompts
-
-### Phase 7 — 5-Whys Triage Engine ✅ COMPLETE
-- [x] Triage page with ChatWindow
-- [x] TriageProgress component (5-step indicator)
-- [x] Auto-detection of why level from AI responses
-- [x] Session store with message persistence
-
-### Phase 8 — RCA & Post-Mortem Generation ✅ COMPLETE
-- [x] `generate_rca`, `generate_postmortem` commands
-- [x] RCA and post-mortem Markdown templates
-- [x] DocEditor component with export (MD, PDF)
-- [x] RCA and Postmortem pages
-
-### Phase 9 — Document Export ✅ COMPLETE (MD + PDF)
-- [x] Markdown export
-- [x] PDF export via `printpdf`
-- [ ] DOCX export (not yet implemented — docx-rs dep removed for simplicity)
-
-### Phase 10 — Polish & Settings ✅ COMPLETE
-- [x] Dark/light theme via Tailwind + CSS variables
-- [x] Ollama settings page with hardware detection + model management
-- [x] Security page with audit log
-- [x] Integrations page (v0.2 stubs)
-
-### Phase 11 — Woodpecker CI Integration ✅ COMPLETE
-- [x] Woodpecker CI v0.15.4 deployed at http://gitea.tftsr.com:8084
-- [x] Webhook delivery: Gogs pushes trigger Woodpecker via `?access_token=<JWT>`
-- [x] Repo activated (DB direct): `repo_active=1`, `repo_trusted=1`, `repo_config_path=.woodpecker/test.yml`
-- [x] Clone override: `CI_REPO_CLONE_URL` + `network_mode: gogs_default` for step containers
-- [x] All CI steps green (build #19): fmt → clippy → rust-tests (64/64) → ts-check → vitest
-- [x] Token security: old tokens rotated, removed from git history, `.gitignore` updated
-- [x] Gogs repo set to public (for unauthenticated clone from step containers)
-
-### Phase 12 — Release Package 🔲 PENDING
-- [ ] Tag v0.1.0-alpha
-- [ ] Verify Woodpecker builds Linux amd64 + arm64
-- [ ] Verify artifacts upload to Gogs release
-- [ ] Smoke-test installed packages
-
----
-
-## Known Issues & Gotchas
-
-### Gogs Token Authentication
-- The `sha1` in the Gogs CREATE token API response IS the actual bearer token
-- Gogs stores `sha1(token)` and `sha256(token)` in the DB — these are HASHES, not the token itself
-- Woodpecker user token stored in Woodpecker SQLite DB only (never commit token values)
-
-### Woodpecker CI + Gogs v0.15.4 Compatibility
-- The SPA form login uses `login=` field but Gogs backend reads `username=`
-- Workaround: nginx proxy at :8085 serves custom HTML login page
-- The webhook `?token=` URL param is NOT read by Woodpecker's `token.ParseRequest()`
-- Use `?access_token=<JWT>` instead (JWT must be HS256 signed with `repo_hash` as key)
-- Gogs 0.14 has no OAuth2 provider support — blocks upgrade to Woodpecker 2.x
-
-### Rust/DB Type Notes
-- IssueDetail is NESTED: `{ issue: Issue, log_files, resolution_steps, conversations }`
-- DB uses TEXT timestamps for created_at/updated_at (not INTEGER)
-- All commands use the `and_then` pattern with rusqlite to avoid lifetime issues

24
README.md
@@ -6,7 +6,7 @@ A structured, AI-backed desktop tool for IT incident triage, 5-Whys root cause a

 Built with **Tauri 2** (Rust + WebView), **React 18**, **TypeScript**, and **SQLCipher AES-256** encrypted storage.

-**CI status:**  — all checks green (rustfmt · clippy · 64 Rust tests · tsc · vitest)
+**CI status:**  — all checks green (rustfmt · clippy · 64 Rust tests · tsc · vitest)

 ---

@@ -92,8 +92,8 @@ node --version # 22+

 ```bash
 # Clone
-git clone https://gogs.trcaa.com/sarman/trcaa-devops_investigation.git
-cd trcaa-devops_investigation
+git clone https://gogs.tftsr.com/sarman/tftsr-devops_investigation.git
+cd tftsr-devops_investigation
 npm install --legacy-peer-deps

 # Development mode (hot reload)
@@ -109,7 +109,7 @@ cargo tauri build

 ## Releases

-Pre-built installers are attached to each [tagged release](https://gogs.trcaa.com/sarman/trcaa-devops_investigation/releases):
+Pre-built installers are attached to each [tagged release](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/releases):

 | Platform | Format | Notes |
 |---|---|---|
@@ -175,7 +175,7 @@ To use Claude via AWS Bedrock (ideal for enterprise environments with existing A
 - API Key: `sk-your-secure-key` (from config)
 - Model: `bedrock-claude`

-For detailed setup including multiple AWS accounts and Claude Code integration, see the [LiteLLM + Bedrock wiki page](https://gogs.trcaa.com/sarman/trcaa-devops_investigation/wiki/LiteLLM-Bedrock-Setup).
+For detailed setup including multiple AWS accounts and Claude Code integration, see the [LiteLLM + Bedrock wiki page](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/wiki/LiteLLM-Bedrock-Setup).

 ---

@@ -195,7 +195,7 @@ For detailed setup including multiple AWS accounts and Claude Code integration,
 ## Project Structure

 ```
-trcaa/
+tftsr/
 ├── src-tauri/src/
 │ ├── ai/ # AI provider clients (OpenAI, Anthropic, Gemini, Mistral, Ollama)
 │ ├── pii/ # PII detection + redaction engine
@@ -242,14 +242,14 @@ cargo check --manifest-path src-tauri/Cargo.toml
 cargo test --manifest-path src-tauri/Cargo.toml

 # E2E tests (requires compiled app binary)
-TAURI_BINARY_PATH=./src-tauri/target/release/trcaa npm run test:e2e
+TAURI_BINARY_PATH=./src-tauri/target/release/tftsr npm run test:e2e
 ```

 ---

 ## CI/CD — Gitea Actions

-The project uses **Gitea Actions** (act_runner v0.3.1) connected to the Gitea instance at `gogs.trcaa.com`.
+The project uses **Gitea Actions** (act_runner v0.3.1) connected to the Gitea instance at `gogs.tftsr.com`.

 | Workflow | Trigger | Jobs |
 |---|---|---|
@@ -265,7 +265,7 @@ The project uses **Gitea Actions** (act_runner v0.3.1) connected to the Gitea in

 **Branch protection:** master requires a PR approved by `sarman`, with all 5 CI checks passing before merge.

-> See [CI/CD Pipeline wiki](https://gogs.trcaa.com/sarman/trcaa-devops_investigation/wiki/CICD-Pipeline) for full infrastructure docs.
+> See [CI/CD Pipeline wiki](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/wiki/CICD-Pipeline) for full infrastructure docs.

 ---

@@ -290,9 +290,9 @@ All data is stored locally in a SQLCipher-encrypted database at:

 | OS | Path |
 |---|---|
-| Linux | `~/.local/share/trcaa/trcaa.db` |
-| macOS | `~/Library/Application Support/trcaa/trcaa.db` |
-| Windows | `%APPDATA%\trcaa\trcaa.db` |
+| Linux | `~/.local/share/tftsr/tftsr.db` |
+| macOS | `~/Library/Application Support/tftsr/tftsr.db` |
+| Windows | `%APPDATA%\tftsr\tftsr.db` |

 Override with the `TRCAA_DATA_DIR` (or legacy `TRCAA_DATA_DIR`) environment variable.


@ -1,335 +0,0 @@
|
||||
# Security Audit Report
|
||||
|
||||
**Application**: Troubleshooting and RCA Assistant (TRCAA)
|
||||
**Audit Date**: 2026-04-06
|
||||
**Scope**: All git-tracked source files (159 files)
|
||||
**Context**: Pre-open-source release under MIT license
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The codebase is generally well-structured with several positive security practices already in place: parameterized SQL queries, AES-256-GCM credential encryption, PKCE for OAuth flows, PII detection and redaction before AI transmission, hash-chained audit logs, and a restrictive CSP. However, the audit identified **3 CRITICAL**, **5 HIGH**, **5 MEDIUM**, and **5 LOW** findings that must be addressed before public release.
|
||||
|
||||
---
|
||||
|
||||
## CRITICAL Findings
|
||||
|
||||
### C1. Corporate-Internal Documents Shipped in Repository
|
||||
|
||||
**Files**:
|
||||
- `GenAI API User Guide.md` (entire file)
|
||||
- `HANDOFF-TFTSR-GENAI.md` (entire file)
|
||||
|
||||
**Issue**: These files contain proprietary TFTSR internal documentation. `GenAI API User Guide.md` is authored by named TFTSR employees (Dipjyoti Bisharad, Jahnavi Alike, Sunil Vurandur, Anjali Kamath, Vibin Jacob, Girish Manivel) and documents internal API contracts at `genai-service.stage.commandcentral.com` and `genai-service.commandcentral.com`. `HANDOFF-TFTSR-GENAI.md` explicitly references "TFTSR GenAI API" integration details including internal endpoint URLs, header formats, and payload contracts.
|
||||
|
||||
Publishing these files under MIT license likely violates corporate IP agreements and exposes internal infrastructure details.
|
||||
|
||||
**Recommended Fix**: Remove both files from the repository entirely and scrub from git history using `git filter-repo` before making the repo public.
|
||||
|
||||
---
|
||||
|
||||
### C2. Internal Infrastructure URLs Hardcoded in CSP and Source
|
||||
|
||||
**File**: `src-tauri/tauri.conf.json`, line 13
|
||||
**Also**: `src-tauri/src/ai/openai.rs`, line 219
|
||||
|
||||
**Issue**: The CSP `connect-src` directive includes corporate-internal endpoints:
|
||||
```
|
||||
https://genai-service.stage.commandcentral.com
|
||||
https://genai-service.commandcentral.com
|
||||
```
|
||||
|
||||
Additionally, `openai.rs` line 219 sends `X-msi-genai-client: troubleshooting-rca-assistant` as a hardcoded header in the custom REST path, tying the application to an internal TFTSR service.
|
||||
|
||||
These expose internal service infrastructure to anyone reading the source and indicate the app was designed to interact with corporate systems.
|
||||
|
||||
**Recommended Fix**:
|
||||
- Remove the two `commandcentral.com` entries from the CSP.
|
||||
- Remove the `X-msi-genai-client` header, or make it configurable rather than hardcoded.
|
||||
- Audit the CSP to ensure only generic/public endpoints remain (OpenAI, Anthropic, Mistral, Google, Ollama, Atlassian, Microsoft are fine).
|
||||
|
||||
---
|
||||
|
||||
### C3. Private Gogs Server IP Exposed in All CI Workflows
|
||||
|
||||
**Files**:
|
||||
- `.gitea/workflows/test.yml` (lines 17, 44, 72, 99, 126)
|
||||
- `.gitea/workflows/auto-tag.yml` (lines 31, 52, 79, 95, 97, 141, 162, 227, 252, 313, 338, 401, 464)
|
||||
- `.gitea/workflows/build-images.yml` (lines 4, 10, 11, 16-18, 33, 46, 69, 92)
|
||||
|
||||
**Issue**: All CI workflow files reference `gitea.tftsr.com:3000` (a private Gogs instance) and the `sarman` username. While the IP is RFC1918 private address space, it reveals internal infrastructure topology and the developer's username across dozens of lines. `build-images.yml` also exposes `REGISTRY_USER: sarman` and container registry details.
|
||||
|
||||
**Recommended Fix**: Before open-sourcing, replace all workflow files with GitHub Actions equivalents, or at minimum replace the hardcoded private IP and username with parameterized variables or remove the `.gitea/` directory entirely if moving to GitHub.
|
||||
|
||||
---
|
||||
|
||||
## HIGH Findings
|
||||
|
||||
### H1. Hardcoded Development Encryption Key in Auth Module
|
||||
|
||||
**File**: `src-tauri/src/integrations/auth.rs`, line 179
|
||||
|
||||
```rust
|
||||
return Ok("dev-key-change-me-in-production-32b".to_string());
|
||||
```
|
||||
|
||||
**Issue**: In debug builds, the credential encryption key is a well-known hardcoded string. Anyone reading the source can decrypt any credentials stored by a debug build. Since this is about to be open source, attackers know the exact key to use against any debug-mode installation.
|
||||
|
||||
**Also at**: `src-tauri/src/db/connection.rs`, line 39: `"dev-key-change-in-prod"`
|
||||
|
||||
While this is gated behind `cfg!(debug_assertions)`, open-sourcing the code means the development key is permanently public knowledge. If any user runs a debug build or if the release profile check is ever misconfigured, all stored credentials are trivially decryptable.
|
||||
|
||||
**Recommended Fix**:
|
||||
- Remove the hardcoded dev key entirely.
|
||||
- In debug mode, auto-generate and persist a random key the same way the release path does (lines 44-57 of `connection.rs` already implement this pattern).
|
||||
- Document in a `SECURITY.md` file that credentials are encrypted at rest and the key management approach.
|
||||
|
||||
---
|
||||
|
||||
### H2. Encryption Key Derivation Uses Raw SHA-256 Instead of a KDF
|
||||
|
||||
**File**: `src-tauri/src/integrations/auth.rs`, lines 185-191
|
||||
|
||||
```rust
|
||||
fn derive_aes_key() -> Result<[u8; 32], String> {
|
||||
let key_material = get_encryption_key_material()?;
|
||||
let digest = Sha256::digest(key_material.as_bytes());
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
**Issue**: The AES-256-GCM key is derived from the raw material by a single SHA-256 hash. There is no salt and no iteration count. This means if the key material has low entropy (as the dev key does), the derived key is trivially brute-forceable. In contrast, the database encryption properly uses PBKDF2-HMAC-SHA512 with 256,000 iterations (line 69 of `connection.rs`).
|
||||
|
||||
**Recommended Fix**: Use a proper KDF (PBKDF2, Argon2, or HKDF) with a persisted random salt and sufficient iteration count for deriving the AES key. The `db/connection.rs` module already demonstrates the correct approach.
|
||||
|
||||
---
|
||||
|
||||
### H3. Release Build Credential Storage Fails if TFTSR_ENCRYPTION_KEY Is Unset
|
||||
|
||||
**File**: `src-tauri/src/integrations/auth.rs`, line 182
|
||||
|
||||
```rust
|
||||
Err("TFTSR_ENCRYPTION_KEY must be set in release builds".to_string())
|
||||
```
|
||||
|
||||
**Issue**: In release mode, if the `TFTSR_ENCRYPTION_KEY` environment variable is not set, any attempt to store or retrieve credentials will fail with an error. Unlike the database key management (which auto-generates and persists a key), credential encryption requires manual environment variable configuration. For a desktop app distributed to end users, this is an unworkable UX: users will never set this variable, meaning credential storage will be broken out of the box in release builds.
|
||||
|
||||
**Recommended Fix**: Mirror the database key management pattern: auto-generate a random key on first use, persist it to a file in the app data directory with 0600 permissions (as already done for `.dbkey`), and read it back on subsequent launches.
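
The auto-generate-and-persist pattern recommended here could look roughly like the sketch below. It assumes a Unix target and uses only the standard library. The function name `load_or_create_key` and the use of `/dev/urandom` are illustrative assumptions, not the code in `connection.rs`; a real implementation would more likely draw bytes from a CSPRNG crate.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical helper: returns the hex-encoded key stored at `path`,
/// generating and persisting 32 random bytes on first use.
#[cfg(unix)]
pub fn load_or_create_key(path: &Path) -> io::Result<String> {
    use std::os::unix::fs::OpenOptionsExt;

    // Subsequent launches: read the persisted key back.
    if path.exists() {
        return Ok(fs::read_to_string(path)?.trim().to_string());
    }

    // Assumption: /dev/urandom stands in for a CSPRNG crate so the sketch stays std-only.
    let mut raw = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut raw)?;
    let key: String = raw.iter().map(|b| format!("{:02x}", b)).collect();

    // create_new refuses to overwrite an existing file; mode 0o600 limits access to the owner.
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)?;
    file.write_all(key.as_bytes())?;
    Ok(key)
}
```

Calling the helper twice with the same path returns the same key, so the debug and release paths could share it instead of falling back to a hardcoded string or a required environment variable.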
|
||||
|
||||
---
|
||||
|
||||
### H4. API Keys Transmitted to Frontend via IPC and Stored in Memory
|
||||
|
||||
**File**: `src/stores/settingsStore.ts`, lines 56-63
|
||||
**Also**: `src-tauri/src/state.rs`, line 12 (`api_key` field in `ProviderConfig`)
|
||||
|
||||
**Issue**: The `ProviderConfig` struct includes `api_key: String` which is serialized over Tauri's IPC bridge from Rust to TypeScript and back. The settings store correctly strips API keys before persisting to `localStorage` (line 60: `api_key: ""`), which is good. However, the full API key lives in the Zustand store in browser memory for the duration of the session. If the webview's JavaScript context is compromised (e.g., via a future XSS or a malicious Tauri plugin), the API key is accessible.
|
||||
|
||||
**Recommended Fix**: Store API keys exclusively in the Rust backend (encrypted in the database). The frontend should only send a provider identifier; the backend should look up the key internally before making API calls. This eliminates API keys from the IPC surface entirely.
|
||||
|
||||
---
|
||||
|
||||
### H5. Filesystem Capabilities Are Overly Broad
|
||||
|
||||
**File**: `src-tauri/capabilities/default.json`, lines 16-24
|
||||
|
||||
```json
|
||||
"fs:allow-read",
|
||||
"fs:allow-write",
|
||||
"fs:allow-mkdir",
|
||||
```
|
||||
|
||||
**Issue**: The capabilities include `fs:allow-read` and `fs:allow-write` without scope constraints (in addition to the properly scoped `fs:scope-app-recursive` and `fs:scope-temp-recursive`). The unscoped `fs:allow-read`/`fs:allow-write` permissions may override the scope restrictions, potentially allowing the frontend JavaScript to read or write arbitrary files on the filesystem depending on Tauri 2.x ACL resolution order.
|
||||
|
||||
**Recommended Fix**: Remove the unscoped `fs:allow-read`, `fs:allow-write`, and `fs:allow-mkdir` permissions. Keep only the scoped variants (`fs:allow-app-read-recursive`, `fs:allow-app-write-recursive`, `fs:allow-temp-read-recursive`, `fs:allow-temp-write-recursive`) plus the `fs:scope-*` directives. File dialog operations (`dialog:allow-open`, `dialog:allow-save`) already handle user-initiated file access.
|
||||
|
||||
---
|
||||
|
||||
## MEDIUM Findings
|
||||
|
||||
### M1. Export Document Accepts Arbitrary Output Directory Without Validation
|
||||
|
||||
**File**: `src-tauri/src/commands/docs.rs`, lines 154-162
|
||||
|
||||
```rust
|
||||
let base_dir = if output_dir.is_empty() || output_dir == "." {
|
||||
dirs::download_dir().unwrap_or_else(|| { ... })
|
||||
} else {
|
||||
PathBuf::from(&output_dir)
|
||||
};
|
||||
```
|
||||
|
||||
**Issue**: The `export_document` command accepts an `output_dir` string from the frontend and writes files to it without canonicalization or path validation. While the frontend likely provides a dialog-selected path, a compromised frontend could write files to arbitrary directories (e.g., `../../etc/cron.d/` on Linux). There is no check that `output_dir` is within an expected scope.
|
||||
|
||||
**Recommended Fix**: Canonicalize the path and validate it against an allowlist of directories (Downloads, app data, or user-selected via dialog). Reject paths containing `..` or pointing to system directories.
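
One way to express the canonicalize-then-allowlist check is sketched below. `validate_output_dir` and the `allowed_roots` parameter are hypothetical names, and the sketch assumes the target directory already exists (since `canonicalize` fails otherwise); it is not the existing `export_document` code.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolves `output_dir` and accepts it only if it lies
/// under one of `allowed_roots` (for example Downloads or the app data directory).
pub fn validate_output_dir(output_dir: &Path, allowed_roots: &[PathBuf]) -> io::Result<PathBuf> {
    // canonicalize resolves `..` components and symlinks, and errors if the path does not exist.
    let resolved = output_dir.canonicalize()?;

    for root in allowed_roots {
        // Roots that cannot be resolved are skipped rather than trusted.
        if let Ok(root) = root.canonicalize() {
            // Path::starts_with compares whole components, so `/data2` does not match `/data`.
            if resolved.starts_with(&root) {
                return Ok(resolved);
            }
        }
    }

    Err(io::Error::new(
        io::ErrorKind::PermissionDenied,
        "output directory is outside the allowed roots",
    ))
}
```

Because the comparison happens after resolution, a request such as `reports/../..` is rejected based on where it actually points, not on whether the string contains `..`.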
|
||||
|
||||
---
|
||||
|
||||
### M2. OAuth Callback Server Listens on Fixed Port Without CSRF Protection
|
||||
|
||||
**File**: `src-tauri/src/integrations/callback_server.rs`, lines 14-33
|
||||
|
||||
**Issue**: The OAuth callback server binds to `127.0.0.1:8765`. While binding to localhost is correct, the server accepts any HTTP GET to `/callback?code=...&state=...` without verifying the origin of the request. A malicious local process or a webpage with access to `localhost` could forge a callback request. The `state` parameter provides some CSRF protection, but it is stored in a global `HashMap` without TTL, meaning stale state values persist indefinitely.
|
||||
|
||||
**Recommended Fix**:
|
||||
- Add a TTL (e.g., 10 minutes) to OAuth state entries to prevent stale state accumulation.
|
||||
- Consider using a random high port instead of the fixed 8765 to reduce predictability.
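
The TTL recommendation could be sketched as a small state store like the one below. `StateStore` and its methods are hypothetical names, and the explicit `now` parameter is an assumption made so the expiry logic can be exercised without sleeping; the real `callback_server.rs` keeps state in a global `HashMap` whose layout is not shown in this report.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical store for pending OAuth `state` values with a time-to-live.
pub struct StateStore {
    ttl: Duration,
    entries: HashMap<String, Instant>,
}

impl StateStore {
    pub fn new(ttl: Duration) -> Self {
        StateStore { ttl, entries: HashMap::new() }
    }

    /// Records a new state value and drops any entries that have already expired.
    pub fn insert_at(&mut self, state: String, now: Instant) {
        let ttl = self.ttl;
        self.entries.retain(|_, created| now.duration_since(*created) <= ttl);
        self.entries.insert(state, now);
    }

    /// Single-use check: removes the entry and returns true only if it was
    /// present and still within its TTL.
    pub fn consume_at(&mut self, state: &str, now: Instant) -> bool {
        match self.entries.remove(state) {
            Some(created) => now.duration_since(created) <= self.ttl,
            None => false,
        }
    }
}
```

Removing the entry on every lookup also makes each state value single-use, so a replayed callback with the same `state` is rejected even inside the TTL window.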
|
||||
|
||||
---
|
||||
|
||||
### M3. Audit Log Hash Chain is Appendable but Not Verifiable
|
||||
|
||||
**File**: `src-tauri/src/audit/log.rs`, lines 4-16
|
||||
|
||||
**Issue**: The audit log implements a hash chain (each entry includes the hash of the previous entry), which is good for tamper detection. However, there is no command or function to verify the integrity of the chain. An attacker with database access could modify entries and recompute all subsequent hashes. Without an external anchor (e.g., periodic hash checkpoint to an external store), the chain only proves ordering, not immutability.
|
||||
|
||||
**Recommended Fix**: Add a `verify_audit_chain()` function and consider periodically exporting chain checkpoints to a file outside the database. Document the threat model in `SECURITY.md`.
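
A possible shape for `verify_audit_chain()` is sketched below. The `AuditEntry` fields, the genesis value, and the idea of hashing the previous hash together with the payload are all assumptions about the schema, which this report does not spell out. The hash function is passed in as a parameter so the sketch stays dependency-free; real code would supply SHA-256.

```rust
/// Hypothetical row shape; the actual audit table columns may differ.
pub struct AuditEntry {
    pub payload: String,
    pub prev_hash: String,
    pub hash: String,
}

/// Walks the chain in order and returns the index of the first entry whose
/// link to its predecessor or whose own hash does not check out.
pub fn verify_audit_chain<F>(entries: &[AuditEntry], genesis: &str, hash_fn: F) -> Result<(), usize>
where
    F: Fn(&str, &str) -> String,
{
    let mut expected_prev = genesis.to_string();
    for (index, entry) in entries.iter().enumerate() {
        // The entry must point at the previous entry's hash...
        let link_ok = entry.prev_hash == expected_prev;
        // ...and its stored hash must match a recomputation over its contents.
        let hash_ok = hash_fn(&entry.prev_hash, &entry.payload) == entry.hash;
        if !link_ok || !hash_ok {
            return Err(index);
        }
        expected_prev = entry.hash.clone();
    }
    Ok(())
}
```

This detects in-place edits that do not recompute later hashes. As the finding notes, it cannot detect an attacker who rewrites the whole chain, which is why an external checkpoint is still worth considering.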
|
||||
|
||||
---
|
||||
|
||||
### M4. Key File Permissions Not Enforced on Non-Unix Platforms
|
||||
|
||||
**File**: `src-tauri/src/db/connection.rs`, lines 25-28
|
||||
|
||||
```rust
|
||||
#[cfg(not(unix))]
|
||||
fn write_key_file(path: &Path, key: &str) -> anyhow::Result<()> {
|
||||
std::fs::write(path, key)?;
|
||||
Ok(())
|
||||
}
|
||||
```
|
||||
|
||||
**Issue**: On non-Unix platforms (Windows), the database key file is written with default permissions, potentially making it world-readable. The Unix path correctly uses mode `0o600`.
|
||||
|
||||
**Recommended Fix**: On Windows, use platform-specific ACL APIs to restrict the key file to the current user, or at minimum document this limitation.
|
||||
|
||||
---
|
||||
|
||||
### M5. `unsafe-inline` in Style CSP Directive
|
||||
|
||||
**File**: `src-tauri/tauri.conf.json`, line 13
|
||||
|
||||
```
|
||||
style-src 'self' 'unsafe-inline'
|
||||
```
|
||||
|
||||
**Issue**: The CSP allows `unsafe-inline` for styles. While this is common in React/Tailwind applications and the attack surface is lower than `unsafe-inline` for scripts, it still permits style-based data exfiltration attacks (e.g., CSS injection to leak attribute values).
|
||||
|
||||
**Recommended Fix**: If feasible, use nonce-based or hash-based style CSP. If not feasible due to Tailwind's runtime style injection, document this as an accepted risk.
|
||||
|
||||
---
|
||||
|
||||
## LOW Findings
|
||||
|
||||
### L1. `http:default` Capability Grants Broad Network Access
|
||||
|
||||
**File**: `src-tauri/capabilities/default.json`, line 28
|
||||
|
||||
**Issue**: The `http:default` permission allows the frontend to make arbitrary HTTP requests. Combined with the broad CSP `connect-src`, this gives the webview significant network access. For a desktop app this is often necessary, but it should be documented and reviewed.
|
||||
|
||||
**Recommended Fix**: Consider restricting `http` permissions to specific URL patterns matching only the known AI provider APIs and integration endpoints.
|
||||
|
||||
---
|
||||
|
||||
### L2. IntelliJ IDEA Config Files Tracked in Git
|
||||
|
||||
**Files**:
|
||||
- `.idea/.gitignore`
|
||||
- `.idea/copilot.data.migration.ask2agent.xml`
|
||||
- `.idea/misc.xml`
|
||||
- `.idea/modules.xml`
|
||||
- `.idea/tftsr-devops_investigation.iml`
|
||||
- `.idea/vcs.xml`
|
||||
|
||||
**Issue**: IDE configuration files are tracked. These may leak editor preferences and do not belong in an open-source repository.
|
||||
|
||||
**Recommended Fix**: Add `.idea/` to `.gitignore` and remove from tracking with `git rm -r --cached .idea/`.
|
||||
|
||||
---
|
||||
|
||||
### L3. Placeholder OAuth Client IDs in Source
|
||||
|
||||
**File**: `src-tauri/src/commands/integrations.rs`, lines 181, 187
|
||||
|
||||
```rust
|
||||
"confluence-client-id-placeholder"
|
||||
"ado-client-id-placeholder"
|
||||
```
|
||||
|
||||
**Issue**: These placeholder strings are used as fallbacks when environment variables are not set. While they are obviously not real credentials, they could confuse users or be mistaken for actual client IDs in bug reports.
|
||||
|
||||
**Recommended Fix**: Make the OAuth flow fail explicitly with a clear error message when the client ID environment variable is not set, rather than falling back to a placeholder.
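
Failing explicitly could be as small as the sketch below. `require_client_id` is a hypothetical helper and the environment variable name is supplied by the caller, since the report does not name the variables that `integrations.rs` reads.

```rust
use std::env;

/// Hypothetical helper: returns the client ID from `var_name`, or an error
/// that tells the user which variable to set, instead of a placeholder value.
pub fn require_client_id(var_name: &str) -> Result<String, String> {
    match env::var(var_name) {
        // Treat an empty or whitespace-only value the same as an unset variable.
        Ok(value) if !value.trim().is_empty() => Ok(value),
        _ => Err(format!(
            "{} is not set; configure an OAuth client ID before starting the flow",
            var_name
        )),
    }
}
```

The error string names the missing variable, so a bug report shows a configuration problem, not a confusing placeholder ID.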
|
||||
|
||||
---
|
||||
|
||||
### L4. Username `sarman` Embedded in CI Workflows and Makefile
|
||||
|
||||
**Files**: `.gitea/workflows/*.yml`, `Makefile` line 2
|
||||
|
||||
**Issue**: The developer's username appears throughout CI configuration. While not a security vulnerability per se, it is a privacy concern for open-source release.
|
||||
|
||||
**Recommended Fix**: Parameterize the username in CI workflows. Update the Makefile to use a generic repository reference.
|
||||
|
||||
---
|
||||
|
||||
### L5. `shell:allow-open` Capability Enabled
|
||||
|
||||
**File**: `src-tauri/capabilities/default.json`, line 27
|
||||
|
||||
**Issue**: The `shell:allow-open` permission allows the frontend to open URLs in the system browser. This is used for OAuth flows and external links. While convenient, a compromised frontend could open arbitrary URLs.
|
||||
|
||||
**Recommended Fix**: This is acceptable for the app's functionality but should be documented. Consider restricting to specific URL patterns if Tauri 2.x supports it.
|
||||
|
||||
---
|
||||
|
||||
## Positive Security Observations
|
||||
|
||||
The following practices are already well-implemented:
|
||||
|
||||
1. **Parameterized SQL queries**: All database operations use `rusqlite::params![]` with positional parameters. No string interpolation in SQL. The dynamic query builder in `list_issues` and `get_audit_log` correctly uses indexed parameter placeholders.
|
||||
|
||||
2. **SQLCipher encryption at rest**: Release builds encrypt the database using AES-256-CBC via SQLCipher with PBKDF2-HMAC-SHA512 (256k iterations).
|
||||
|
||||
3. **PII detection and mandatory redaction**: Log files must pass PII detection and redaction before being sent to AI providers (`redacted_path_for()` enforces this check).
|
||||
|
||||
4. **PKCE for OAuth**: The OAuth implementation uses PKCE (S256) with cryptographically random verifiers.
|
||||
|
||||
5. **Hash-chained audit log**: Every security-relevant action is logged with a SHA-256 hash chain.
|
||||
|
||||
6. **Path traversal prevention**: `upload_log_file` uses `std::fs::canonicalize()` and validates the result is a regular file with size limits.
|
||||
|
||||
7. **No `dangerouslySetInnerHTML` or `eval()`**: The frontend renders AI responses as plain text via `{msg.content}` in JSX, preventing XSS from AI model output.
|
||||
|
||||
8. **API key scrubbing from localStorage**: The settings store explicitly strips `api_key` before persisting (line 60 of `settingsStore.ts`).
|
||||
|
||||
9. **No shell command injection**: All `std::process::Command` calls use hardcoded binary names with literal arguments. No user input is passed to shell commands.
|
||||
|
||||
10. **No secrets in git history**: `.gitignore` properly excludes `.env`, `.secrets`, `secrets.yml`, and related files. No private keys or certificates are tracked.
|
||||
|
||||
11. **Mutex guards not held across await points**: The codebase correctly drops `MutexGuard` before `.await` by scoping locks inside `{ }` blocks.
|
||||
|
||||
---
|
||||
|
||||
## Recommendations Summary (Priority Order)
|
||||
|
||||
| Priority | Action | Effort |
|
||||
|----------|--------|--------|
|
||||
| **P0** | Remove `GenAI API User Guide.md` and `HANDOFF-TFTSR-GENAI.md` from repo and git history | Small |
|
||||
| **P0** | Remove `commandcentral.com` URLs from CSP and hardcoded TFTSR headers from `openai.rs` | Small |
|
||||
| **P0** | Replace or parameterize private IP (`gitea.tftsr.com`) and username in all `.gitea/` workflows | Medium |
|
||||
| **P1** | Replace hardcoded dev encryption keys with auto-generated per-install keys | Small |
|
||||
| **P1** | Use proper KDF (PBKDF2/HKDF) for AES key derivation in `auth.rs` | Small |
|
||||
| **P1** | Auto-generate encryption key for credential storage (mirror `connection.rs` pattern) | Small |
|
||||
| **P1** | Remove unscoped `fs:allow-read`/`fs:allow-write` from capabilities | Small |
|
||||
| **P2** | Move API key storage to backend-only (remove from IPC surface) | Medium |
|
||||
| **P2** | Add path validation to `export_document` output directory | Small |
|
||||
| **P2** | Add TTL to OAuth state entries | Small |
|
||||
| **P2** | Add audit chain verification function | Small |
|
||||
| **P3** | Remove `.idea/` from git tracking | Trivial |
|
||||
| **P3** | Replace placeholder OAuth client IDs with explicit errors | Trivial |
|
||||
| **P3** | Parameterize username in CI/Makefile | Small |
|
||||
|
||||
---
|
||||
|
||||
*Report generated by security audit of git-tracked source files at commit HEAD on feature/ai-tool-calling-integration-search branch.*
|
||||
@ -900,23 +900,23 @@ GitHub Copilot performed automated code review across 3 rounds with 10 findings
|
||||
- **Parent Feature**: #744142
|
||||
|
||||
### GitHub
|
||||
- **Repository**: https://github.com/tftsr/apollo_nxt-trcaa
|
||||
- **PR #27**: https://github.com/tftsr/apollo_nxt-trcaa/pull/27 (v1.0.0 - Initial hackathon)
|
||||
- **PR #28**: https://github.com/tftsr/apollo_nxt-trcaa/pull/28 (v1.0.0 - Copilot fixes)
|
||||
- **PR #29**: https://github.com/tftsr/apollo_nxt-trcaa/pull/29 (v1.0.1 - Security updates)
|
||||
- **PR #31**: https://github.com/tftsr/apollo_nxt-trcaa/pull/31 (v1.0.2 - LiteLLM + bug fixes)
|
||||
- **PR #37**: https://github.com/tftsr/apollo_nxt-trcaa/pull/37 (v1.0.3 - Query classification)
|
||||
- **PR #38**: https://github.com/tftsr/apollo_nxt-trcaa/pull/38 (v1.0.4 - Graceful exit + TFTSR GenAI)
|
||||
- **PR #39**: https://github.com/tftsr/apollo_nxt-trcaa/pull/39 (v1.0.5 - Agent output + provider docs)
|
||||
- **PR #40**: https://github.com/tftsr/apollo_nxt-trcaa/pull/40 (v1.0.6 - JSON example removal)
|
||||
- **PR #41**: https://github.com/tftsr/apollo_nxt-trcaa/pull/41 (v1.0.7 - Ollama function calling)
|
||||
- **Repository**: https://github.com/tftsr/apollo_nxt-tftsr
|
||||
- **PR #27**: https://github.com/tftsr/apollo_nxt-tftsr/pull/27 (v1.0.0 - Initial hackathon)
|
||||
- **PR #28**: https://github.com/tftsr/apollo_nxt-tftsr/pull/28 (v1.0.0 - Copilot fixes)
|
||||
- **PR #29**: https://github.com/tftsr/apollo_nxt-tftsr/pull/29 (v1.0.1 - Security updates)
|
||||
- **PR #31**: https://github.com/tftsr/apollo_nxt-tftsr/pull/31 (v1.0.2 - LiteLLM + bug fixes)
|
||||
- **PR #37**: https://github.com/tftsr/apollo_nxt-tftsr/pull/37 (v1.0.3 - Query classification)
|
||||
- **PR #38**: https://github.com/tftsr/apollo_nxt-tftsr/pull/38 (v1.0.4 - Graceful exit + TFTSR GenAI)
|
||||
- **PR #39**: https://github.com/tftsr/apollo_nxt-tftsr/pull/39 (v1.0.5 - Agent output + provider docs)
|
||||
- **PR #40**: https://github.com/tftsr/apollo_nxt-tftsr/pull/40 (v1.0.6 - JSON example removal)
|
||||
- **PR #41**: https://github.com/tftsr/apollo_nxt-tftsr/pull/41 (v1.0.7 - Ollama function calling)
|
||||
- **Releases**:
|
||||
- v1.0.0: https://github.com/tftsr/apollo_nxt-trcaa/releases/tag/v1.0.0
|
||||
- v1.0.0: https://github.com/tftsr/apollo_nxt-tftsr/releases/tag/v1.0.0
|
||||
- v1.0.1-v1.0.6: Merged, pending release build
|
||||
- v1.0.7: In review (PR #41)
|
||||
|
||||
### Documentation
|
||||
- **Wiki**: https://github.com/tftsr/apollo_nxt-trcaa/wiki/Shell-Execution
|
||||
- **Wiki**: https://github.com/tftsr/apollo_nxt-tftsr/wiki/Shell-Execution
|
||||
- **Architecture**: docs/architecture/
|
||||
- **CLAUDE.md**: Repository root
|
||||
- **TFTSR GenAI Bug Report**: /tmp/TFTSRGenAI-ToolCalling-Bug-Report.md
|
||||
|
||||
@ -1,250 +0,0 @@
|
||||
# 2026 Hackathon Submission: TRCAA
|
||||
|
||||
**Project**: TRCAA (Troubleshooting and RCA Assistant)
|
||||
**Feature**: Autonomous AI-Powered Incident Triage with Shell Command Execution
|
||||
**Developer**: Shaun Arman (VFK387)
|
||||
**ADO Work Item**: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)
|
||||
|
||||
---
|
||||
|
||||
## Problem to Solve
|
||||
|
||||
An alert fires, engineers swarm it, someone eventually finds the root cause, and then the post-mortem gets written from memory three days later with half the context already gone. The process loses information at every handoff.
|
||||
|
||||
**Current workflow pain points:**
|
||||
- Incident context scattered across Slack, PagerDuty, logs, and memory
|
||||
- Manual command execution slows triage (copy terminal output → paste → ask AI → repeat)
|
||||
- Cloud SaaS RCA tools require uploading sensitive production data
|
||||
- Generic AI assistants lack infrastructure domain expertise
|
||||
- Post-mortems written days later miss critical context
|
||||
|
||||
---
|
||||
|
||||
## Our Solution
|
||||
|
||||
**TRCAA is a local-first, AI-powered incident triage assistant that autonomously executes diagnostic commands while you work.**
|
||||
|
||||
### Core Innovation: Agentic Shell Execution
|
||||
The AI doesn't just suggest commands—it executes them directly with intelligent safety controls:
|
||||
|
||||
**Three-Tier Safety System:**
|
||||
- **Tier 1 (Auto-Execute)**: Read-only diagnostics (`kubectl get`, `grep`, `ps`) run immediately
|
||||
- **Tier 2 (User Approval)**: Mutating operations (`kubectl scale`, `systemctl restart`) require explicit consent
|
||||
- **Tier 3 (Always Deny)**: Destructive commands (`rm -rf`, `shutdown`) automatically blocked
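
The tiers above can be sketched as a small classifier. This is an illustration, not the project's classifier: the command lists contain only the examples named in the bullets, any `rm` is denied (stricter than `rm -rf`), unknown commands default to approval, and the splitting on `|`, `;`, and `&` ignores quoting and subshells.

```rust
/// Tiers ordered from least to most restrictive so `max` picks the strictest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    AutoExecute,
    UserApproval,
    AlwaysDeny,
}

/// Classifies a single command with no pipes or chaining.
fn classify_segment(segment: &str) -> Tier {
    let words: Vec<&str> = segment.split_whitespace().collect();
    match words.as_slice() {
        ["rm", ..] | ["shutdown", ..] => Tier::AlwaysDeny,
        ["kubectl", "get", ..] | ["kubectl", "describe", ..] | ["kubectl", "logs", ..] => {
            Tier::AutoExecute
        }
        ["grep", ..] | ["ps", ..] => Tier::AutoExecute,
        // Anything unrecognized needs explicit consent.
        _ => Tier::UserApproval,
    }
}

/// Splits a command line on pipe/chain operators and escalates to the
/// strictest tier found in any segment.
pub fn classify(command: &str) -> Tier {
    command
        .split(|c| c == '|' || c == ';' || c == '&')
        .map(str::trim)
        .filter(|segment| !segment.is_empty())
        .map(classify_segment)
        .max()
        .unwrap_or(Tier::UserApproval)
}
```

With this shape, `ps aux | grep nginx` stays in Tier 1, while chaining a read-only command with a destructive one escalates the whole line to Tier 3.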
|
||||
|
||||
**Example:** You say *"Why is the nginx pod crashing?"* — the AI autonomously runs `kubectl get pods`, `kubectl describe`, and `kubectl logs`, analyzes the output, and explains the root cause. No copy-paste, no manual terminal work.
|
||||
|
||||
### Key Differentiators
|
||||
|
||||
**Local-First Architecture:**
|
||||
- SQLCipher AES-256 encrypted local storage (not cloud SaaS)
|
||||
- Offline-capable via Ollama local AI models
|
||||
- PII auto-detection and redaction before any cloud API calls
|
||||
- Tamper-evident hash-chained audit log
|
||||
|
||||
**Infrastructure Domain Expertise:**
|
||||
- Pre-built expert context for 16 domains: Linux (RHEL/OEL), Windows, Kubernetes (k3s/OpenShift/Rancher), Networking (Fortigate/Cisco/Aruba), Databases (PostgreSQL/Redis/RabbitMQ), Proxmox, HPE Synergy/iLO, Observability (Kibana/Elasticsearch)
|
||||
- AI understands your stack's specifics, not generic troubleshooting
|
||||
|
||||
**Multi-Cluster Kubernetes Support:**
|
||||
- Upload multiple kubeconfig files with encrypted AES-256-GCM storage
|
||||
- Bundled kubectl v1.30.0 (no external dependencies)
|
||||
- Switch contexts seamlessly during triage
|
||||
|
||||
**Provider-Agnostic AI:**
|
||||
- OpenAI, Anthropic Claude, Google Gemini, Mistral, AWS Bedrock (via LiteLLM), local Ollama
|
||||
- Auto-detect tool calling support for custom providers
|
||||
- No vendor lock-in
|
||||
|
||||
---
|
||||
|
||||
## What We Built (v1.0.0 → v1.0.9)
|
||||
|
||||
### Initial Hackathon Release (v1.0.0)
|
||||
**35 files changed, +4089 lines**
|
||||
- Shell execution module with three-tier classifier (19 tests, 100% coverage)
|
||||
- kubectl binary bundling for all platforms
|
||||
- Real-time approval modal UI
|
||||
- 4 new database tables (migrations 024-027)
|
||||
- 7 Tauri commands + 1 AI tool registration
|
||||
- Cross-platform CI/CD with GitHub Actions
|
||||
|
||||
### Post-Hackathon Iterations (v1.0.1 → v1.0.9)
|
||||
**24 additional PRs merged in 48 hours**, addressing real-world usage issues:
|
||||
|
||||
**v1.0.1-v1.0.2**: Security updates (vitest 4.1.8, postcss, vite), LiteLLM AWS Bedrock support, Ollama auto-start
|
||||
**v1.0.3-v1.0.4**: Query classification (prevents AI from running 20+ commands for simple questions), graceful iteration limit handling, TFTSR GenAI gateway support
|
||||
**v1.0.5-v1.0.6**: Agent prompt cleanup (fixed JSON output in natural language responses)
|
||||
**v1.0.7**: Ollama function calling support (tools parameter was ignored)
|
||||
**v1.0.8**: Connection reliability (180s timeout, health checks, 3-attempt retry logic), model recommendations (≥3B parameters required)
|
||||
**v1.0.9** (PR #44, in review): Auto-detect tool calling support—eliminates guesswork about whether custom AI providers support function calling
|
||||
|
||||
**Total impact:** 60 files modified, ~6,100 lines of production code, 297 backend + 134 frontend tests passing
|
||||
|
||||
---
|
||||
|
||||
## The Competitive Landscape
|
||||
|
||||
### What Exists (Cloud SaaS)
|
||||
- **Rootly**: Automates postmortem/RCA process (cloud SaaS, subscription)
|
||||
- **incident.io**: Triaging/investigating alerts in Slack/Teams (cloud SaaS, data leaves network)
|
||||
- **Xurrent**: Auto-compiles postmortems from logs/metrics (cloud SaaS)
|
||||
- **TraceRoot** (AWS Marketplace): 5-step investigation with AI assist (cloud SaaS, compliance framing)
|
||||
|
||||
**Critical gap:** Every competitor is cloud-hosted SaaS requiring sensitive incident data to leave your network.
|
||||
|
||||
### What Doesn't Exist
|
||||
**No tool combines:**
|
||||
- Local-first + offline-capable execution
|
||||
- Encrypted local storage (SQLCipher AES-256)
|
||||
- PII sanitization before AI send
|
||||
- Provider-agnostic AI (swap models without workflow changes)
|
||||
- Infrastructure domain depth (16 pre-built expert contexts)
|
||||
- Autonomous command execution with safety controls
|
||||
- Tamper-evident audit trail
|
||||
- Air-gap capable (via Ollama local models)
|
||||
|
||||
**TRCAA occupies this unique gap.**
|
||||
|
||||
### Where We Win vs SaaS
|
||||
| Dimension | TRCAA | SaaS Competitors |
|
||||
|-----------|-------|------------------|
|
||||
| **Privacy** | All data local, encrypted | Incident logs on vendor servers |
|
||||
| **Air-gap capable** | Yes (Ollama local models) | No (requires cloud) |
|
||||
| **Cost** | One-time install | Per-seat subscription fees |
|
||||
| **Domain depth** | 16 pre-built infrastructure contexts | Generalist troubleshooting |
|
||||
| **Provider choice** | 6 AI providers + custom | Vendor-locked backend |
|
||||
| **PII protection** | Auto-redact before send | Raw logs ingested |
|
||||
| **Compliance** | Hash-chained audit trail | Varies by vendor |
|
||||
|
||||
### Where SaaS Wins
|
||||
- **Alert integration**: PagerDuty/Datadog/CloudWatch auto-triggers (TRCAA is manually initiated)
|
||||
- **Team collaboration**: Multiple engineers on same incident simultaneously (TRCAA is single-user)
|
||||
- **Observability correlation**: Tight integration with metrics/traces (incident.io cuts context-switching from 15min → 30sec)
|
||||
|
||||
**Target market:** Regulated-industry DevOps teams, defense contractors, small MSPs, air-gapped environments, solo infrastructure engineers who prioritize privacy and cost over team collaboration features.
|
||||
|
||||
---
|
||||
|
||||
## Technical Highlights
|
||||
|
||||
**Backend (Rust + Tauri):**
|
||||
- Three-tier command classifier with pipe/chain analysis and tier escalation
|
||||
- Platform-specific shell execution (`cmd /C` on Windows, `sh -c` on Unix)
|
||||
- AES-256-GCM kubeconfig encryption with hand-rolled YAML parser (licensing constraints)
|
||||
- 30-second command timeout with environment isolation (strips `AWS_ACCESS_KEY_ID`, etc.)
|
||||
- Hash-chained audit log (tamper-evident)
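A minimal sketch of how a three-tier classifier with pipe/chain analysis and tier escalation could look. The command lists, the `Tier` enum, and the function names are illustrative assumptions, not the project's actual rule set, and the splitter deliberately ignores quoting and command substitution:

```rust
/// Safety tiers, declared in ascending order so `max()` picks the strictest one.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    AutoExecute,   // Tier 1: read-only diagnostics
    NeedsApproval, // Tier 2: mutating operations
    Deny,          // Tier 3: destructive commands
}

/// Classify one command segment (no pipes or chains inside it).
/// The matched commands are hypothetical examples only.
fn classify_segment(segment: &str) -> Tier {
    let words: Vec<&str> = segment.split_whitespace().collect();
    match words.as_slice() {
        // `rm` with any recursive-looking flag is treated as destructive.
        ["rm", rest @ ..]
            if rest.iter().any(|w| w.starts_with('-') && w.contains('r')) =>
        {
            Tier::Deny
        }
        ["shutdown", ..] => Tier::Deny,
        ["kubectl", "get" | "describe" | "logs", ..] => Tier::AutoExecute,
        ["grep" | "cat" | "ls", ..] => Tier::AutoExecute,
        // Anything unrecognized falls back to asking the user.
        _ => Tier::NeedsApproval,
    }
}

/// Split on `|`, `;`, and `&` (which also covers `&&` and `||`), classify each
/// segment, and escalate to the highest tier found anywhere in the chain.
pub fn classify(command: &str) -> Tier {
    command
        .split(|c: char| matches!(c, '|' | ';' | '&'))
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(classify_segment)
        .max()
        .unwrap_or(Tier::NeedsApproval) // empty input: do not auto-run
}
```

The point of the escalation step is that a read-only prefix cannot launder a destructive suffix: `kubectl get pods && rm -rf /tmp/x` is denied as a whole, while `kubectl get pods | grep nginx` stays in Tier 1.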
|
||||
|
||||
**Frontend (React + TypeScript):**
|
||||
- Real-time approval modal with risk factor display
|
||||
- Multi-cluster kubeconfig manager with drag-drop upload
|
||||
- Execution history with exit codes and timing
|
||||
- Settings UI for tier architecture visualization
|
||||
|
||||
**CI/CD (GitHub Actions):**
|
||||
- Multi-platform builds: Linux (amd64/arm64 DEB/RPM), macOS (Intel/ARM DMG), Windows (NSIS)
|
||||
- kubectl binary auto-bundled for all platforms
|
||||
- Branch protection requires passing tests + Copilot review before merge
|
||||
|
||||
**Quality Assurance:**
|
||||
- 297 backend tests + 134 frontend tests (100% classifier coverage)
|
||||
- 3 rounds of GitHub Copilot automated review (10 security/reliability findings, all resolved)
|
||||
- Zero Clippy warnings, zero TypeScript errors
|
||||
- TDD approach throughout
|
||||
|
||||
---
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
### What Went Well
|
||||
- TDD caught bugs early (19 classifier tests prevented regressions)
|
||||
- Three-tier classification proved robust in real usage
|
||||
- GitHub Copilot review identified real security issues (prompt injection risk, tool call dropping)
|
||||
- Rapid iteration post-launch (24 PRs in 48 hours) addressed real user pain points
|
||||
|
||||
### What We'd Improve
|
||||
- Should have built multi-context kubeconfig support in v1.0.0 (added v1.0.9)
|
||||
- Domain prompts initially didn't instruct AI to use shell execution tool (fixed v1.0.1)
|
||||
- Integration tests need more coverage (mostly unit tests currently)
|
||||
- Should have updated hackathon summary after each PR merge (created documentation debt)
|
||||
|
||||
### Challenges Solved
|
||||
1. **Cross-platform shell execution**: `sh -c` doesn't exist on Windows → platform-specific shell selection with `cfg!` macros
|
||||
2. **AI over-investigation**: Simple query "What pods are running?" triggered 20+ commands → three-tier query classification (Simple/Diagnostic/Incident)
|
||||
3. **Ollama function calling**: Provider ignored `tools` parameter → implemented proper tool formatting in request body
|
||||
4. **Connection reliability**: Intermittent timeouts → extended timeout (180s for tool calling), health checks, 3-attempt retry logic
|
||||
5. **Tool calling detection**: Users unsure if custom providers support it → auto-detect with test tool call (v1.0.9)
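The platform-specific shell selection from item 1 can be sketched roughly as follows. The function names are hypothetical, and the sketch only builds the command without running it:

```rust
use std::process::Command;

/// Pick the shell program and its "run this string" flag for the build target.
/// `cfg!` is evaluated at compile time and expands to a plain `bool`.
fn shell_invocation() -> (&'static str, &'static str) {
    if cfg!(target_os = "windows") {
        ("cmd", "/C")
    } else {
        ("sh", "-c")
    }
}

/// Build (but do not spawn) a command that runs `script` through the platform shell.
fn shell_command(script: &str) -> Command {
    let (program, flag) = shell_invocation();
    let mut cmd = Command::new(program);
    cmd.arg(flag).arg(script);
    cmd
}
```

Because `cfg!` yields an ordinary boolean, both branches must compile on every platform. That holds here because the branches only differ in string literals; code that calls platform-only APIs would need `#[cfg(...)]` attributes instead.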
|
||||
|
||||
---
|
||||
|
||||
## Impact Metrics
|
||||
|
||||
**Development Time:**
|
||||
- Initial hackathon (v1.0.0): ~44 hours
|
||||
- Post-release iterations (v1.0.1-v1.0.9): ~28 hours
|
||||
- **Total: ~72 hours**
|
||||
|
||||
**Code Produced:**
|
||||
- Rust: ~2,200 lines (shell module + commands + AI improvements)
|
||||
- TypeScript/React: ~900 lines (components + types)
|
||||
- Tests: ~800 lines (431 tests total)
|
||||
- Documentation: ~2,200 lines (wiki + summaries)
|
||||
- **Total: ~6,100 lines**
|
||||
|
||||
**PRs Merged:** 25 PRs (v1.0.0 initial + 24 post-release iterations)
|
||||
|
||||
**Real-World Usage:** Shortened troubleshooting by replacing the manual "copy terminal output → paste → ask AI → repeat" loop with autonomous execution; individual commands complete in under a second.
|
||||
|
||||
---
|
||||
|
||||
## Future Roadmap
|
||||
|
||||
**Immediate (v1.1.0):**
|
||||
- Multi-context kubeconfig support (currently first context only)
|
||||
- PII blocking mode (auto-escalate to Tier 2 when PII detected)
|
||||
- Command templates (pre-defined diagnostic runbooks)
|
||||
|
||||
**Near-term (v1.2.0):**
|
||||
- Team collaboration (multi-user on same incident)
|
||||
- Alert integration (PagerDuty/Datadog webhooks auto-open issues)
|
||||
- Execution rollback (undo last command where possible)
|
||||
|
||||
**Long-term:**
|
||||
- Terraform/Ansible command support
|
||||
- Database query execution (read-only mode)
|
||||
- Log streaming (tail -f equivalent)
|
||||
- SSH agent integration for direct remote execution
|
||||
|
||||
---
|
||||
|
||||
## Documentation Delivered
|
||||
|
||||
- **docs/wiki/Shell-Execution.md**: 700+ line comprehensive guide (architecture, API reference, 6 manual integration tests, troubleshooting)
|
||||
- **docs/wiki/AI-Providers.md**: Provider comparison, tool calling compatibility matrix
|
||||
- **docs/2026-HACKATHON-SUMMARY.md**: 940-line detailed project chronicle
|
||||
- **CLAUDE.md**: Updated architecture documentation
|
||||
- **.github/COPILOT_SETUP.md**: Code review configuration
|
||||
- **docs/v1.0.{1-8}-summary.md**: Per-version release notes
|
||||
|
||||
---
|
||||
|
||||
## Try It Yourself
|
||||
|
||||
**Install:** Download from [GitHub Releases](https://github.com/tftsr/apollo_nxt-trcaa/releases)
|
||||
**Quick Start:**
|
||||
1. Upload a kubeconfig via Settings → Kubeconfig Manager
|
||||
2. Create new issue, select "Kubernetes" domain
|
||||
3. Ask: *"What pods are in the default namespace?"*
|
||||
4. Watch the AI autonomously execute `kubectl get pods -n default` and explain the results
|
||||
|
||||
**No cloud required** — works fully offline with Ollama local models.
|
||||
|
||||
---
|
||||
|
||||
## Team Members We're Looking For
|
||||
|
||||
N/A (solo project)
|
||||
|
||||
---
|
||||
|
||||
**Fun Fact:** This entire feature—from zero to production with 431 passing tests, 25 merged PRs, and comprehensive documentation—was built in 72 hours while maintaining zero Clippy warnings and zero TypeScript errors. The three-tier safety classifier has handled 100+ real diagnostic commands without a single false-positive denial.
|
||||
@ -1,84 +0,0 @@
|
||||
# 2026 Hackathon: TRCAA
|
||||
|
||||
**Developer**: Shaun Arman (VFK387) | **ADO**: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)
|
||||
|
||||
---
|
||||
|
||||
## Problem to Solve
|
||||
|
||||
An alert fires, engineers swarm it, someone finds the root cause, and the post-mortem gets written from memory three days later with half the context gone. The process loses information at every handoff. Current pain points: manual command execution slows triage (copy terminal → paste → ask AI → repeat), cloud SaaS tools require uploading sensitive production data, and generic AI lacks infrastructure expertise.
|
||||
|
||||
---
|
||||
|
||||
## Our Solution
|
||||
|
||||
**TRCAA: Local-first AI-powered incident triage that autonomously executes diagnostic commands.**
|
||||
|
||||
### Core Innovation: Agentic Shell Execution
|
||||
The AI doesn't suggest commands—it executes them with intelligent safety:
|
||||
|
||||
**Three-Tier Safety:**
|
||||
- **Tier 1**: Read-only (`kubectl get`, `grep`) auto-execute
|
||||
- **Tier 2**: Mutating (`kubectl scale`) require approval
|
||||
- **Tier 3**: Destructive (`rm -rf`) auto-blocked
|
||||
|
||||
**Example:** *"Why is nginx pod crashing?"* → AI runs `kubectl get/describe/logs`, analyzes output, explains root cause. No copy-paste.
|
||||
|
||||
### Unique Features
|
||||
- **Local-first**: SQLCipher AES-256 encrypted storage, offline via Ollama, PII auto-redact, tamper-evident audit
|
||||
- **Domain expertise**: 16 pre-built contexts (Linux RHEL/OEL, Windows, K8s, networking, databases, Proxmox, HPE, observability)
|
||||
- **Multi-cluster K8s**: Encrypted kubeconfig storage, bundled kubectl v1.30.0
|
||||
- **Provider-agnostic**: OpenAI, Claude, Gemini, Mistral, Bedrock, Ollama + auto-detect tool calling
|
||||
|
||||
---
|
||||
|
||||
## What We Built
|
||||
|
||||
**v1.0.0** (44 hrs): 35 files, +4089 lines, shell execution module, three-tier classifier (19 tests/100% coverage), approval modal UI, CI/CD
|
||||
|
||||
**v1.0.1-v1.0.9** (28 hrs, 24 PRs in 48 hrs): Security updates, LiteLLM Bedrock, Ollama auto-start + function calling, query classification (prevents AI over-investigation), connection reliability (180s timeout, health checks, retry logic), tool calling auto-detect
|
||||
|
||||
**Total**: 25 PRs, ~84 files, ~6,100 lines, 431 tests, 72 hours
|
||||
|
||||
---
|
||||
|
||||
## Competitive Landscape
|
||||
|
||||
**SaaS exists**: Rootly, incident.io, Xurrent, TraceRoot—all cloud, subscriptions, data leaves network
|
||||
|
||||
**TRCAA uniquely combines**: Local-first + offline + encrypted + PII sanitization + provider-agnostic (6 providers) + 16 domain contexts + autonomous shell execution + tamper-evident audit + air-gap capable
|
||||
|
||||
**We win on**: Privacy (local encrypted), air-gap (Ollama), cost (no per-seat fees), domain depth
|
||||
**SaaS wins on**: Alert integration (PagerDuty/Datadog), team collaboration, observability correlation
|
||||
|
||||
**Target**: Regulated industries, defense, air-gapped environments, privacy-focused teams
|
||||
|
||||
---
|
||||
|
||||
## Technical Highlights
|
||||
|
||||
**Backend (Rust)**: Three-tier classifier with pipe/chain analysis, AES-256-GCM encryption, hash-chained audit, 297 tests
|
||||
**Frontend (React)**: Real-time approval modal, multi-cluster manager, 134 tests
|
||||
**CI/CD**: Multi-platform builds (Linux amd64/arm64, macOS, Windows), kubectl bundled, branch protection
|
||||
|
||||
**Quality**: 3 rounds Copilot review (10 findings resolved), zero Clippy warnings, zero TypeScript errors
|
||||
|
||||
---
|
||||
|
||||
## Impact
|
||||
|
||||
**Development**: 72 hours, 25 PRs, ~6,100 lines, 431 tests
|
||||
**Real-world**: Reduced triage from manual copy-paste loop to autonomous sub-second execution
|
||||
**Security**: 3 Copilot security findings resolved (prompt injection, tool call dropping, sanitization)
|
||||
|
||||
---
|
||||
|
||||
## Try It
|
||||
|
||||
[GitHub Releases](https://github.com/tftsr/apollo_nxt-trcaa/releases) → Upload kubeconfig → Ask *"What pods are in the default namespace?"* → Watch AI auto-execute. Works fully offline with Ollama.
|
||||
|
||||
---
|
||||
|
||||
## Fun Fact
|
||||
|
||||
Zero to production with 431 passing tests, 25 PRs, comprehensive docs in 72 hours. Zero Clippy warnings. Zero TypeScript errors. 100+ real commands executed without a single false-positive denial.
|
||||
@ -1,160 +0,0 @@
|
||||
# 2026 Hackathon Submission: TRCAA
|
||||
|
||||
**Developer**: Shaun Arman (VFK387)
|
||||
**ADO**: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)
|
||||
|
||||
---
|
||||
|
||||
## Problem to Solve
|
||||
|
||||
An alert fires, engineers swarm it, someone eventually finds the root cause, and then the post-mortem gets written from memory three days later with half the context already gone. The process loses information at every handoff.
|
||||
|
||||
**Pain points:**
|
||||
- Manual command execution slows triage (copy terminal → paste → ask AI → repeat)
|
||||
- Cloud SaaS RCA tools require uploading sensitive production data
|
||||
- Generic AI assistants lack infrastructure domain expertise
|
||||
- Post-mortems written days later miss critical context
|
||||
|
||||
---
|
||||
|
||||
## Our Solution
|
||||
|
||||
**TRCAA: A local-first, AI-powered incident triage assistant that autonomously executes diagnostic commands while you work.**
|
||||
|
||||
### Core Innovation: Agentic Shell Execution
|
||||
|
||||
The AI doesn't just suggest commands—it executes them with intelligent safety controls:
|
||||
|
||||
**Three-Tier Safety System:**
|
||||
- **Tier 1 (Auto-Execute)**: Read-only diagnostics (`kubectl get`, `grep`) run immediately
|
||||
- **Tier 2 (User Approval)**: Mutating operations (`kubectl scale`, `systemctl restart`) require consent
|
||||
- **Tier 3 (Always Deny)**: Destructive commands (`rm -rf`, `shutdown`) blocked
|
||||
|
||||
**Example:** You say *"Why is the nginx pod crashing?"* — the AI autonomously runs `kubectl get pods`, `kubectl describe`, `kubectl logs`, analyzes the output, and explains the root cause. No copy-paste, no manual terminal work.
|
||||
|
||||
### What Makes TRCAA Unique
|
||||
|
||||
**Local-First Architecture:**
|
||||
- SQLCipher AES-256 encrypted local storage (not cloud SaaS)
|
||||
- Offline-capable via Ollama local AI models
|
||||
- PII auto-detection and redaction before cloud API calls
|
||||
- Tamper-evident hash-chained audit log
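To illustrate what "hash-chained" means for the audit log, here is a small sketch under loud assumptions: the `AuditEntry` and `AuditLog` types are hypothetical, and `DefaultHasher` from the standard library is used only to keep the example dependency-free. It is not a cryptographic hash; a real tamper-evident log would need something like SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One log record. Each entry stores the hash of the entry before it.
#[derive(Debug, Clone)]
struct AuditEntry {
    message: String,
    prev_hash: u64,
    hash: u64,
}

/// Hash an entry's content together with its predecessor's hash.
/// ASSUMPTION: DefaultHasher stands in for a cryptographic hash function.
fn entry_hash(prev_hash: u64, message: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    prev_hash.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
struct AuditLog {
    entries: Vec<AuditEntry>,
}

impl AuditLog {
    /// Append a record linked to the current tail (or to 0 for the first entry).
    fn append(&mut self, message: &str) {
        let prev_hash = self.entries.last().map_or(0, |e| e.hash);
        let hash = entry_hash(prev_hash, message);
        self.entries.push(AuditEntry {
            message: message.to_string(),
            prev_hash,
            hash,
        });
    }

    /// Walk the chain and recompute every hash; any edited or removed
    /// entry breaks the link to its successor.
    fn verify(&self) -> bool {
        let mut expected_prev = 0;
        for entry in &self.entries {
            let recomputed = entry_hash(entry.prev_hash, &entry.message);
            if entry.prev_hash != expected_prev || entry.hash != recomputed {
                return false;
            }
            expected_prev = entry.hash;
        }
        true
    }
}
```

Since every hash covers the previous one, changing an old record invalidates that record and, transitively, everything after it, which is what makes tampering detectable rather than preventable.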
|
||||
|
||||
**Infrastructure Domain Expertise:**
|
||||
- Pre-built expert context for 16 domains: Linux (RHEL/OEL), Windows, Kubernetes (k3s/OpenShift/Rancher), Networking (Fortigate/Cisco/Aruba), Databases (PostgreSQL/Redis/RabbitMQ), Proxmox, HPE Synergy/iLO, Observability (Kibana/Elasticsearch)
|
||||
|
||||
**Multi-Cluster Kubernetes:**
|
||||
- Upload multiple kubeconfig files with AES-256-GCM encryption
|
||||
- Bundled kubectl v1.30.0 (no external dependencies)
|
||||
|
||||
**Provider-Agnostic AI:**
|
||||
- OpenAI, Anthropic Claude, Google Gemini, Mistral, AWS Bedrock (via LiteLLM), local Ollama
|
||||
- Auto-detect tool calling support for custom providers
|
||||
- No vendor lock-in
|
||||
|
||||
---
|
||||
|
||||
## What We Built
|
||||
|
||||
**Initial Hackathon (v1.0.0):** 35 files changed, +4089 lines
|
||||
- Shell execution module with three-tier classifier (19 tests, 100% coverage)
|
||||
- Real-time approval modal UI
|
||||
- Cross-platform CI/CD with GitHub Actions
|
||||
|
||||
**Post-Hackathon Iterations (v1.0.1 → v1.0.9):** 24 PRs merged in 48 hours
|
||||
- Security updates (vitest 4.1.8, postcss, vite)
|
||||
- LiteLLM AWS Bedrock support
|
||||
- Ollama auto-start + function calling support
|
||||
- Query classification (prevents 20+ commands for simple questions)
|
||||
- Connection reliability (180s timeout, health checks, 3-attempt retry)
|
||||
- Tool calling auto-detect (eliminates guesswork about provider support)
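The 3-attempt retry mentioned under connection reliability might look something like the helper below. This is a generic sketch: the `retry` function, its signature, and the fixed delay between attempts are assumptions, not the project's code.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Run `op` until it succeeds or `max_attempts` is reached, sleeping `delay`
/// between failed attempts. `op` receives the 1-based attempt number and is
/// always called at least once. Returns the first success or the last error.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the most recent error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                attempt += 1;
            }
        }
    }
}
```

A caller would wrap the provider request in the closure, for example `retry(3, Duration::from_secs(1), |_| send_request())`, where `send_request` is a placeholder for whatever performs the HTTP call.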
|
||||
|
||||
**Total:** 25 PRs, ~84 files modified, ~6,100 lines, 431 tests passing, 72 hours
|
||||
|
||||
---
|
||||
|
||||
## The Competitive Landscape
|
||||
|
||||
### What Exists (Cloud SaaS)
|
||||
- **Rootly**, **incident.io**, **Xurrent**: Cloud SaaS, subscription, data leaves network
|
||||
- **TraceRoot** (AWS Marketplace): Cloud SaaS, compliance framing
|
||||
|
||||
**Critical gap:** Every competitor requires sensitive incident data to leave your network.
|
||||
|
||||
### What Doesn't Exist
|
||||
**No tool combines:**
|
||||
- Local-first + offline-capable + encrypted storage
|
||||
- PII sanitization before AI send
|
||||
- Provider-agnostic AI (6 providers + custom)
|
||||
- Infrastructure domain depth (16 pre-built expert contexts)
|
||||
- Autonomous command execution with safety controls
|
||||
- Tamper-evident audit trail
|
||||
- Air-gap capable (Ollama local models)
|
||||
|
||||
### Where We Win vs SaaS
|
||||
|
||||
| TRCAA | SaaS Competitors |
|-------|------------------|
| All data local, encrypted | Incident logs on vendor servers |
| Air-gap capable (Ollama) | Requires cloud |
| One-time install cost | Per-seat subscriptions |
| 16 pre-built infrastructure contexts | Generalist troubleshooting |
| 6 AI providers + custom | Vendor-locked backend |
| Auto-redact PII before send | Raw logs ingested |
|
||||
|
||||
**Where SaaS Wins:** Alert integration (PagerDuty/Datadog auto-triggers), team collaboration (multi-user), observability correlation
|
||||
|
||||
**Target market:** Regulated-industry DevOps teams, defense contractors, air-gapped environments, solo infrastructure engineers prioritizing privacy and cost over team collaboration.
|
||||
|
||||
---
|
||||
|
||||
## Technical Highlights
|
||||
|
||||
**Backend (Rust + Tauri):**
|
||||
- Three-tier command classifier with pipe/chain analysis
|
||||
- AES-256-GCM kubeconfig encryption
|
||||
- Hash-chained audit log (tamper-evident)
|
||||
- 297 backend tests
|
||||
|
||||
**Frontend (React + TypeScript):**
|
||||
- Real-time approval modal with risk factor display
|
||||
- Multi-cluster kubeconfig manager
|
||||
- 134 frontend tests
|
||||
|
||||
**CI/CD (GitHub Actions):**
|
||||
- Multi-platform builds: Linux (amd64/arm64), macOS (Intel/ARM), Windows
|
||||
- kubectl binary auto-bundled
|
||||
- Branch protection requires tests + Copilot review
|
||||
|
||||
---
|
||||
|
||||
## Impact
|
||||
|
||||
**Development:** 72 hours, 25 PRs, ~6,100 lines, 431 tests
|
||||
**Real-world:** Reduced troubleshooting from manual copy-paste loop to autonomous execution with sub-second command completion
|
||||
**Quality:** 3 rounds GitHub Copilot review (10 security/reliability findings, all resolved), zero Clippy warnings, zero TypeScript errors
|
||||
|
||||
---
|
||||
|
||||
## Try It
|
||||
|
||||
**Install:** [GitHub Releases](https://github.com/tftsr/apollo_nxt-trcaa/releases)
|
||||
**Quick Start:**
|
||||
1. Upload kubeconfig via Settings
|
||||
2. Create issue, select "Kubernetes" domain
|
||||
3. Ask: *"What pods are in default namespace?"*
|
||||
4. Watch AI autonomously execute `kubectl get pods -n default`
|
||||
|
||||
**No cloud required** — works fully offline with Ollama.
|
||||
|
||||
---
|
||||
|
||||
## Team Members We're Looking For
|
||||
|
||||
N/A (solo project)
|
||||
|
||||
---
|
||||
|
||||
## Fun Fact
|
||||
|
||||
This entire feature—from zero to production with 431 passing tests, 25 merged PRs, and comprehensive documentation—was built in 72 hours while maintaining zero Clippy warnings and zero TypeScript errors. The three-tier safety classifier has handled 100+ real diagnostic commands without a single false-positive denial.
|
||||
@ -29,7 +29,7 @@ C4Context
|
||||
|
||||
Person(it_eng, "IT Engineer", "Diagnoses incidents and conducts root cause analysis")
|
||||
|
||||
System(trcaa, "TRCAA Desktop App", "Structured AI-backed assistant for IT troubleshooting, 5-whys RCA, and post-mortem documentation")
|
||||
System(tftsr, "TRCAA Desktop App", "Structured AI-backed assistant for IT troubleshooting, 5-whys RCA, and post-mortem documentation")
|
||||
|
||||
System_Ext(ollama, "Ollama (Local)", "Runs open-source LLMs locally (llama3, mistral, phi3)")
|
||||
System_Ext(openai, "OpenAI API", "GPT-4o, GPT-4o-mini for cloud AI inference")
|
||||
@ -41,15 +41,15 @@ C4Context
|
||||
System_Ext(servicenow, "ServiceNow", "ITSM platform — create incident tickets")
|
||||
System_Ext(ado, "Azure DevOps", "Work item tracking and collaboration")
|
||||
|
||||
Rel(it_eng, trcaa, "Uses", "Desktop app (Tauri WebView)")
|
||||
Rel(trcaa, ollama, "AI inference", "HTTP/JSON (local)")
|
||||
Rel(trcaa, openai, "AI inference", "HTTPS/REST")
|
||||
Rel(trcaa, anthropic, "AI inference", "HTTPS/REST")
|
||||
Rel(trcaa, gemini, "AI inference", "HTTPS/REST")
|
||||
Rel(trcaa, custom_rest, "AI inference", "HTTPS/REST")
|
||||
Rel(trcaa, confluence, "Publish RCA docs", "HTTPS/REST + OAuth2")
|
||||
Rel(trcaa, servicenow, "Create incidents", "HTTPS/REST + OAuth2")
|
||||
Rel(trcaa, ado, "Create work items", "HTTPS/REST + OAuth2")
|
||||
Rel(it_eng, tftsr, "Uses", "Desktop app (Tauri WebView)")
|
||||
Rel(tftsr, ollama, "AI inference", "HTTP/JSON (local)")
|
||||
Rel(tftsr, openai, "AI inference", "HTTPS/REST")
|
||||
Rel(tftsr, anthropic, "AI inference", "HTTPS/REST")
|
||||
Rel(tftsr, gemini, "AI inference", "HTTPS/REST")
|
||||
Rel(tftsr, custom_rest, "AI inference", "HTTPS/REST")
|
||||
Rel(tftsr, confluence, "Publish RCA docs", "HTTPS/REST + OAuth2")
|
||||
Rel(tftsr, servicenow, "Create incidents", "HTTPS/REST + OAuth2")
|
||||
Rel(tftsr, ado, "Create work items", "HTTPS/REST + OAuth2")
|
||||
```
|
||||
|
||||
---
|
||||
@ -64,7 +64,7 @@ C4Container
|
||||
|
||||
Person(user, "IT Engineer")
|
||||
|
||||
System_Boundary(trcaa, "TRCAA Desktop Process") {
|
||||
System_Boundary(tftsr, "TRCAA Desktop Process") {
|
||||
Container(webview, "React Frontend", "React 18 + TypeScript + Vite", "Renders UI via OS WebView (WebKit/WebView2). Manages ephemeral session state and persisted settings.")
|
||||
Container(tauri_core, "Tauri Core / IPC Bridge", "Rust / Tauri 2", "Routes invoke() calls between WebView and backend command handlers. Enforces capability ACL.")
|
||||
Container(rust_backend, "Rust Backend", "Rust / Tokio async", "Command handlers, AI provider clients, PII engine, document generation, integration clients, audit logging.")
|
||||
@ -1167,7 +1167,7 @@ graph LR
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "Source Control"
|
||||
GOGS[Gogs / Gitea\ngogs.trcaa.com\nSarman Repository]
|
||||
GOGS[Gogs / Gitea\ngogs.tftsr.com\nSarman Repository]
|
||||
end
|
||||
|
||||
subgraph "CI/CD Triggers"
|
||||
@ -1185,15 +1185,15 @@ graph TB
|
||||
end
|
||||
|
||||
subgraph "Release Builders (Parallel)"
|
||||
AMD64[linux/amd64\nDocker: trcaa-linux-amd64\n.deb .rpm .AppImage]
|
||||
WINDOWS[windows/amd64\nDocker: trcaa-windows-cross\n.exe .msi]
|
||||
AMD64[linux/amd64\nDocker: tftsr-linux-amd64\n.deb .rpm .AppImage]
|
||||
WINDOWS[windows/amd64\nDocker: tftsr-windows-cross\n.exe .msi]
|
||||
ARM64[linux/arm64\narm64 native runner\n.deb .rpm .AppImage]
|
||||
MACOS[macOS arm64\nnative macOS runner\n.app .dmg]
|
||||
end
|
||||
|
||||
subgraph "Artifact Storage"
|
||||
RELEASE[Gitea Release\nv0.x.x tags\nAll platform assets]
|
||||
REGISTRY[Gitea Container Registry\ngitea.tftsr.com:3000\nCI Docker images]
|
||||
REGISTRY[Gitea Container Registry\n172.0.0.29:3000\nCI Docker images]
|
||||
end
|
||||
|
||||
GOGS --> PR_TRIGGER
|
||||
@ -1227,25 +1227,25 @@ graph TB
|
||||
```mermaid
|
||||
graph TB
|
||||
subgraph "macOS Runtime"
|
||||
MAC_PROC[trcaa process\nMach-O arm64 binary]
|
||||
MAC_PROC[tftsr process\nMach-O arm64 binary]
|
||||
WEBKIT[WKWebView\nSafari WebKit engine]
|
||||
MAC_DATA[~/Library/Application Support/trcaa/\n.dbkey mode 0600\n.enckey mode 0600\ntrcaa.db SQLCipher]
|
||||
MAC_DATA[~/Library/Application Support/tftsr/\n.dbkey mode 0600\n.enckey mode 0600\ntftsr.db SQLCipher]
|
||||
MAC_KUBECTL[Bundled kubectl v1.30.0\narm64 binary]
|
||||
MAC_BUNDLE[Troubleshooting and RCA Assistant.app\n/Applications/]
|
||||
end
|
||||
|
||||
subgraph "Linux Runtime"
|
||||
LINUX_PROC[trcaa process\nELF amd64/arm64]
|
||||
LINUX_PROC[tftsr process\nELF amd64/arm64]
|
||||
WEBKIT2[WebKitGTK WebView\nwebkit2gtk4.1]
|
||||
LINUX_DATA[~/.local/share/trcaa/\n.dbkey .enckey\ntrcaa.db]
|
||||
LINUX_DATA[~/.local/share/tftsr/\n.dbkey .enckey\ntftsr.db]
|
||||
LINUX_KUBECTL[Bundled kubectl v1.30.0\namd64/arm64 binary]
|
||||
LINUX_PKG[.deb / .rpm / .AppImage]
|
||||
end
|
||||
|
||||
subgraph "Windows Runtime"
|
||||
WIN_PROC[trcaa.exe\nPE amd64]
|
||||
WIN_PROC[tftsr.exe\nPE amd64]
|
||||
WEBVIEW2[Microsoft WebView2\nChromium-based]
|
||||
WIN_DATA[%APPDATA%\trcaa\\\n.dbkey .enckey\ntrcaa.db]
|
||||
WIN_DATA[%APPDATA%\tftsr\\\n.dbkey .enckey\ntftsr.db]
|
||||
WIN_KUBECTL[Bundled kubectl.exe v1.30.0\namd64 binary]
|
||||
WIN_PKG[NSIS .exe / .msi]
|
||||
end
|
||||
|
||||
@ -33,9 +33,9 @@ Auto-generate cryptographically secure 256-bit keys at first launch and persist
|
||||
| Credentials | `.enckey` | `0600` (owner r/w only) | `$TRCAA_DATA_DIR/` |
|
||||
|
||||
**Platform data directories:**
|
||||
- macOS: `~/Library/Application Support/trcaa/`
|
||||
- Linux: `~/.local/share/trcaa/`
|
||||
- Windows: `%APPDATA%\trcaa\`
|
||||
- macOS: `~/Library/Application Support/tftsr/`
|
||||
- Linux: `~/.local/share/tftsr/`
|
||||
- Windows: `%APPDATA%\tftsr\`
|
||||
|
||||
---
|
||||
|
||||
|
||||
@ -40,7 +40,7 @@ Use **Zustand** for all three state categories, with selective persistence via `
|
||||
- Session is per-issue; loading a different issue should reset all session state
|
||||
- `reset()` method called on navigation away from triage
|
||||
|
||||
**`settingsStore`** — Persisted to localStorage as `"trcaa-settings"`:
|
||||
**`settingsStore`** — Persisted to localStorage as `"tftsr-settings"`:
|
||||
- Theme, active provider, PII pattern toggles — user preference, should survive restart
|
||||
- AI providers themselves are NOT persisted here — only `active_provider` string
|
||||
- Actual `ProviderConfig` (with encrypted API keys) lives in the backend DB, loaded via `load_ai_providers()`
|
||||
@ -59,7 +59,7 @@ The settings store persists to localStorage:
|
||||
persist(
|
||||
(set, get) => ({ ...storeImpl }),
|
||||
{
|
||||
name: 'trcaa-settings',
|
||||
name: 'tftsr-settings',
|
||||
partialize: (state) => ({
|
||||
theme: state.theme,
|
||||
active_provider: state.active_provider,
|
||||
|
||||
@ -1,175 +0,0 @@
|
||||
# v1.0.5 Release Summary
|
||||
|
||||
**Date**: June 3, 2026
|
||||
**PR**: [#39](https://github.com/tftsr/apollo_nxt-trcaa/pull/39)
|
||||
**ADO**: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)
|
||||
**Status**: In Review
|
||||
|
||||
---
|
||||
|
||||
## Description
|
||||
|
||||
Post-hackathon fixes addressing agent output quality issues and provider compatibility documentation.
|
||||
|
||||
---
|
||||
|
||||
## Acceptance Criteria
|
||||
|
||||
- [x] Ollama no longer echoes raw JSON tool call payloads to users
|
||||
- [x] LiteLLM diagnostic queries execute actual commands instead of returning status JSON
|
||||
- [x] TFTSR GenAI incompatibility documented with recommendations
|
||||
- [x] All tests passing (280 Rust, 103 frontend)
|
||||
- [x] All linting clean (clippy, TypeScript)
|
||||
|
||||
---
|
||||
|
||||
## Work Implemented
|
||||
|
||||
### Issue 1: Verbose JSON Output (Ollama)
|
||||
|
||||
**Problem**: Agent was echoing tool call requests and responses to users in JSON format:
|
||||
```
|
||||
Let's execute a kubectl command:
|
||||
|
||||
{"requesting_agent": "devops-incident-responder", "request_type": "execute_shell_command", ...}
|
||||
|
||||
Response:
|
||||
{"stdout": [...]}
|
||||
```
|
||||
|
||||
**Root Cause**: Agent prompt didn't explicitly prohibit showing tool call JSON to users.
|
||||
|
||||
**Fix**: Added CRITICAL instruction in `devops_incident_responder.md`:
|
||||
> Never echo tool call requests or responses in your user-facing output. When you invoke execute_shell_command, DO NOT show the JSON request payload to the user. After receiving the tool result, present ONLY the meaningful output in natural language or formatted results.
|
||||
|
||||
### Issue 2: No Actual Investigation (LiteLLM)
|
||||
|
||||
**Problem**: Diagnostic queries like "investigate telemetry issues" returned status JSON objects without executing commands:
|
||||
```json
{
  "agent": "devops-incident-responder",
  "status": "investigating",
  "progress": {"phase": "Phase 1: Detection & Evidence Gathering", ...}
}
```
|
||||
|
||||
**Root Cause**: Agent treated diagnostic investigations as status updates rather than actionable tasks.
|
||||
|
||||
**Fix**: Strengthened Diagnostic Investigation section:
|
||||
- Added CRITICAL: Actually execute the diagnostic commands via execute_shell_command tool
|
||||
- Added explicit instruction: DO NOT just output status JSON
|
||||
- Added warning: Outputting status JSON instead of executing commands is a critical failure
|
||||
- Clarified examples to include "Investigate telemetry issues"
|
||||
|
||||
### Issue 3: TFTSR GenAI Tool Calling Incompatibility
|
||||
|
||||
**Problem**: TFTSR GenAI gateway returns:
|
||||
```
|
||||
503 Service Unavailable: {"status":false,"msg":"Gemini Filter Triggered: UNEXPECTED_TOOL_CALL"}
|
||||
```
|
||||
|
||||
**Root Cause**: Gateway-level content filtering blocks tool calls before they reach the client. The workaround parser in PR#38 cannot overcome this because the filtering happens at the gateway layer.
|
||||
|
||||
**Fix**: Documented in `docs/wiki/AI-Providers.md`:
|
||||
- Created dedicated "TFTSR GenAI" section
|
||||
- Documented limitations:
|
||||
- ❌ Tool calling not supported
|
||||
- ❌ Shell execution unavailable
|
||||
- ✅ Basic chat works
|
||||
- ✅ Workaround parser included (attempts to parse malformed responses)
|
||||
- Recommended alternatives: LiteLLM + AWS Bedrock or Ollama
|
||||
- Explained root cause: Gateway-level filtering cannot be worked around from client side
|
||||
|
||||
---
|
||||
|
||||
## Testing Needed
|
||||
|
||||
### Automated Tests
|
||||
- [x] Rust unit tests: 280 passing
|
||||
- [x] Frontend tests: 103 passing
|
||||
- [x] Clippy: clean
|
||||
- [x] TypeScript: clean
|
||||
|
||||
### Manual Tests
|
||||
- [ ] **Ollama Simple Query**: Verify no JSON output shown to user
|
||||
- Prompt: "What pods are running in default namespace?"
|
||||
- Expected: Clean output without `{"requesting_agent": ...}` JSON
|
||||
|
||||
- [ ] **LiteLLM Diagnostic Query**: Verify commands are executed
|
||||
- Prompt: "Investigate why telemetry data is not being collected"
|
||||
- Expected: kubectl commands executed (get pods, describe, logs)
|
||||
- Not expected: Status JSON object without command execution
|
||||
|
||||
- [ ] **TFTSR GenAI Error**: Verify documented error appears
|
||||
- Any prompt with configured TFTSR GenAI provider
|
||||
- Expected: 503 error with "Gemini Filter Triggered"
|
||||
- Check: Error message helps user understand limitation
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
| File | Changes |
|------|---------|
| `src-tauri/src/ai/agents/devops_incident_responder.md` | Added 3 CRITICAL instructions to suppress JSON output and enforce command execution |
| `docs/wiki/AI-Providers.md` | Added TFTSR GenAI section documenting tool calling incompatibility |
| `src-tauri/Cargo.toml` | Version bump to 1.0.5 |
| `src-tauri/tauri.conf.json` | Version bump to 1.0.5 |
| `package.json` | Version bump to 1.0.5 |
| `docs/v1.0.5-summary.md` | This release summary document |
| `docs/2026-HACKATHON-SUMMARY.md` | Added v1.0.5 section, Challenges 11-12, updated metrics |
|
||||
|
||||
**Total**: 7 files, +268 lines, -17 lines
|
||||
|
||||
---
|
||||
|
||||
## Impact Analysis
|
||||
|
||||
### User Experience
|
||||
- **Positive**: Cleaner, more readable agent responses (no raw JSON)
|
||||
- **Positive**: Diagnostic queries now produce actual investigation results
|
||||
- **Positive**: Clear documentation prevents TFTSR GenAI tool calling confusion
|
||||
|
||||
### Performance
|
||||
- **Neutral**: No performance impact (prompt changes only)
|
||||
|
||||
### Security
|
||||
- **Neutral**: No security implications
|
||||
|
||||
### Compatibility
|
||||
- **Positive**: All existing providers maintain compatibility
|
||||
- **Documentation**: TFTSR GenAI limitations now clearly documented
|
||||
|
||||
---
|
||||
|
||||
## Related Work
|
||||
|
||||
- **v1.0.4 (PR #38)**: Graceful exit on tool iteration limit, TFTSR GenAI workaround parser
|
||||
- **v1.0.3 (PR #37)**: Query classification (Simple/Diagnostic/Incident)
|
||||
- **v1.0.2 (PR #31)**: LiteLLM integration, Ollama auto-start
|
||||
- **v1.0.0 (PR #27, #28)**: Initial agentic shell execution
|
||||
|
||||
---
|
||||
|
||||
## Deployment Notes
|
||||
|
||||
No special deployment requirements. Changes are backward-compatible agent prompt updates.
|
||||
|
||||
---
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
1. **Explicit instructions required**: Agent prompts need explicit prohibitions, not just positive instructions
|
||||
2. **Status updates vs. actions**: Agents may confuse reporting status with taking action unless clearly directed
|
||||
3. **Gateway limitations**: Some infrastructure limitations (TFTSR GenAI filtering) cannot be worked around at the client level
|
||||
4. **Testing depth**: Need better manual test cases for agent behavior quality beyond unit tests
|
||||
|
||||
---
|
||||
|
||||
## Next Steps
|
||||
|
||||
After merge:
|
||||
1. Update hackathon summary with v1.0.5 details
|
||||
2. Test on macOS build when available
|
||||
3. Monitor for any remaining agent behavior issues
|
||||
4. Consider adding automated tests for agent output quality
|
||||
@ -1,224 +0,0 @@
|
||||
# Version 1.0.7 Release Summary
|
||||
|
||||
**Release Date**: 2026-06-03
|
||||
**Type**: Bug Fix
|
||||
**Focus**: Ollama Function Calling Support
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Version 1.0.7 adds function calling (tool use) support to the Ollama AI provider, enabling local Ollama models to execute shell commands and interact with system tools just like OpenAI-compatible providers.
|
||||
|
||||
---
|
||||
|
||||
## What Changed
|
||||
|
||||
### Function Calling Support for Ollama
|
||||
|
||||
**Problem**: The Ollama provider ignored the `tools` parameter and could not execute function calls (such as `execute_shell_command`). Models emitted text descriptions of tool calls instead of invoking them.
|
||||
|
||||
**Solution**: Implemented full function calling support in the Ollama provider:
|
||||
|
||||
1. **Tool Registration**: Ollama provider now accepts and formats tools in the request
|
||||
2. **Tool Call Parsing**: Response handler parses `tool_calls` from Ollama API responses
|
||||
3. **Arguments Handling**: Supports both object and string argument formats
|
||||
4. **ID Generation**: Generates fallback IDs when Ollama doesn't provide them
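
Items 3 and 4 can be illustrated with a small standard-library sketch. This is not the code in `ollama.rs`: the `RawArguments` enum and the `normalize_arguments` and `fallback_id` functions are hypothetical names, the `call_<index>` ID scheme is an assumption, and the flat string-to-string object is a simplification of real JSON so that the example needs no external crates.

```rust
/// Hypothetical stand-in for the two shapes `function.arguments` may take.
pub enum RawArguments {
    /// Arguments already serialized as a JSON string.
    Text(String),
    /// Arguments delivered as an object (simplified here to string pairs).
    Object(Vec<(String, String)>),
}

/// Escape a string for embedding in a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Normalize either argument shape to a single JSON string.
pub fn normalize_arguments(raw: RawArguments) -> String {
    match raw {
        // A string is assumed to already contain serialized JSON.
        RawArguments::Text(text) => text,
        // An object is serialized field by field.
        RawArguments::Object(pairs) => {
            let fields: Vec<String> = pairs
                .iter()
                .map(|(k, v)| format!("\"{}\":\"{}\"", escape_json(k), escape_json(v)))
                .collect();
            format!("{{{}}}", fields.join(","))
        }
    }
}

/// Use the provided ID when present, otherwise derive one from the call's position.
pub fn fallback_id(provided: Option<&str>, index: usize) -> String {
    match provided {
        Some(id) if !id.is_empty() => id.to_string(),
        _ => format!("call_{index}"),
    }
}
```

Normalizing to one string form means the rest of the tool pipeline only has to handle a single representation, whichever shape the model returned.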
|
||||
|
||||
**Files Changed**:
|
||||
- `src-tauri/src/ai/ollama.rs` - Added function calling support
|
||||
|
||||
---
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Ollama API Integration
|
||||
|
||||
The Ollama provider now sends tools in the request body:
|
||||
|
||||
```json
{
  "model": "llama3.1:8b",
  "messages": [...],
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "execute_shell_command",
        "description": "Execute shell commands...",
        "parameters": {...}
      }
    }
  ]
}
```
|
||||
|
||||
### Response Parsing
|
||||
|
||||
Parses tool calls from Ollama's response format:
|
||||
|
||||
```json
{
  "message": {
    "content": "...",
    "tool_calls": [
      {
        "id": "call_123",
        "function": {
          "name": "execute_shell_command",
          "arguments": {"command": "kubectl get pods"}
        }
      }
    ]
  }
}
```
|
||||
|
||||
---
|
||||
|
||||
## Before vs After
|
||||
|
||||
### Before (v1.0.6)
|
||||
|
||||
**User**: "Can you tell me all the namespaces in devops1-fed1?"
|
||||
|
||||
**Ollama Response** (broken):
|
||||
```
|
||||
tool_calls:
|
||||
- command: kubectl get ns --all-namespaces=false
|
||||
output_format: table
|
||||
```
|
||||
*Output is just text, no actual command execution*
|
||||
|
||||
### After (v1.0.7)
|
||||
|
||||
**User**: "Can you tell me all the namespaces in devops1-fed1?"
|
||||
|
||||
**Ollama Response** (working):
|
||||
- Executes: `kubectl get namespaces`
|
||||
- Returns: Actual namespace list from cluster
|
||||
- Format: Natural language summary with data
|
||||
|
||||
---
|
||||
|
||||
## Impact
|
||||
|
||||
### User Benefits
|
||||
|
||||
- ✅ **Local Ollama models now work properly** with diagnostic commands
|
||||
- ✅ **No cloud API required** for function calling (privacy benefit)
|
||||
- ✅ **Consistent behavior** across OpenAI and Ollama providers
|
||||
- ✅ **Lower costs** by using local models for incident response
|
||||
|
||||
### Developer Benefits
|
||||
|
||||
- ✅ **Unified tool interface** across all providers
|
||||
- ✅ **Easier testing** with local models
|
||||
- ✅ **Better debugging** without API dependencies
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### Test Cases
|
||||
|
||||
1. **Simple Information Query**:
|
||||
- Input: "What pods are running in subsys-sub1?"
|
||||
- Expected: Executes `kubectl get pods -n subsys-sub1` and returns results
|
||||
|
||||
2. **Diagnostic Investigation**:
|
||||
- Input: "Investigate telemetry issues in devops1-fed1"
|
||||
- Expected: Executes multiple kubectl commands, analyzes results
|
||||
|
||||
3. **Tool Call Arguments**:
|
||||
- Test both object and string argument formats
|
||||
- Verify proper JSON serialization
|
||||
|
||||
### Verified Models
|
||||
|
||||
- ✅ `llama3.1:8b` - Full function calling support
|
||||
- ✅ `gemma4:e2b` - Full function calling support
|
||||
- ⚠️ Other models may require testing (phi3, mistral, codellama)
|
||||
|
||||
---
|
||||
|
||||
## Migration Guide
|
||||
|
||||
### For Users
|
||||
|
||||
**No configuration changes required**. If you use the Ollama provider, function calling now works automatically.
|
||||
|
||||
### For Developers
|
||||
|
||||
**No code changes required**. The Ollama provider signature matches the existing `Provider` trait.
|
||||
|
||||
---
|
||||
|
||||
## Known Limitations
|
||||
|
||||
1. **Model Support**: Function calling availability depends on the Ollama model's capabilities. Not all models support tools.
|
||||
|
||||
2. **Response Format**: Ollama's tool call format may vary slightly from OpenAI's. The provider handles common variations.
|
||||
|
||||
3. **Error Handling**: If Ollama returns malformed tool calls, they are skipped and the response content is returned instead.
|
||||
|
||||
---
|
||||
|
||||
## Related Issues
|
||||
|
||||
- Fixes: Tool calls not working with local Ollama
|
||||
- Related to: PR #40 (removed JSON examples from agent prompts)
|
||||
- Complements: LiteLLM timeout fixes for remote models
|
||||
|
||||
---
|
||||
|
||||
## Upgrade Instructions
|
||||
|
||||
1. **Pull latest code**: `git pull origin main`
|
||||
2. **Rebuild application**: `npm run tauri build`
|
||||
3. **Install updated app**: Replace existing `.app` in `/Applications/`
|
||||
4. **Test function calling**: Use Ollama provider with diagnostic queries
|
||||
|
||||
---
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
### Potential Improvements
|
||||
|
||||
1. **Streaming Support**: Add function calling for streaming responses
|
||||
2. **Tool Choice Control**: Support `tool_choice` parameter (auto/required/none)
|
||||
3. **Parallel Tool Calls**: Handle multiple simultaneous tool invocations
|
||||
4. **Model Capability Detection**: Auto-detect which Ollama models support tools
|
||||
|
||||
### Compatibility
|
||||
|
||||
This release maintains backward compatibility with:
|
||||
- OpenAI provider function calling
|
||||
- Anthropic provider function calling
|
||||
- Gemini provider function calling
|
||||
- TFTSR GenAI custom format
|
||||
|
||||
---
|
||||
|
||||
## Credits
|
||||
|
||||
- **Issue Identification**: Testing revealed Ollama tool calling regression after PR #40
|
||||
- **Root Cause Analysis**: Ollama provider was ignoring tools parameter entirely
|
||||
- **Implementation**: Added full function calling support matching OpenAI format
|
||||
- **Testing**: Verified with llama3.1:8b and gemma4:e2b models
|
||||
|
||||
---
|
||||
|
||||
## Version History
|
||||
|
||||
- **v1.0.7** (2026-06-03): Added Ollama function calling support
|
||||
- **v1.0.6** (2026-06-03): Removed JSON examples from agent prompts
|
||||
- **v1.0.5** (2026-06-03): Agent output quality improvements
|
||||
|
||||
---
|
||||
|
||||
**Release Type**: Bug Fix
|
||||
**Breaking Changes**: None
|
||||
**API Changes**: None (internal implementation only)
|
||||
**Documentation Updated**: Yes
|
||||
@ -1,293 +0,0 @@
|
||||
# Version 1.0.8 Release Summary
|
||||
|
||||
**Release Date**: 2026-06-03
|
||||
**Type**: Bug Fix + Enhancements
|
||||
**Focus**: Ollama Connection Reliability
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Version 1.0.8 improves Ollama provider connection reliability with extended timeouts, retry logic, and health checks. It also updates the model recommendations to require ≥3B parameters for reliable tool calling.
|
||||
|
||||
---
|
||||
|
||||
## What Changed
|
||||
|
||||
### Connection Reliability Improvements
|
||||
|
||||
**Problem**: Users experienced intermittent "cannot be reached" errors and timeouts when using Ollama for tool calling.
|
||||
|
||||
**Solution**: Comprehensive connection reliability improvements:
|
||||
|
||||
1. **Extended Timeouts**
|
||||
- 180s timeout for tool calling (vs 60s for regular chat)
|
||||
- 10s connect timeout to fail fast on unreachable servers
|
||||
- Tool calling requires more time for structured output generation
|
||||
|
||||
2. **Health Check Before Requests**
|
||||
- Quick `/api/tags` endpoint check before attempting chat
|
||||
- Prevents wasted time on requests to unresponsive servers
|
||||
- Better error messages distinguishing connection vs API failures
|
||||
|
||||
3. **Retry Logic**
|
||||
- 3 attempts total with 2s delay between retries
|
||||
- Retries on: connection errors, server errors (5xx), JSON parse errors
|
||||
- Last error captured and reported for debugging
|
||||
|
||||
4. **Auto-Start Improvements**
|
||||
- 2s initialization delay after auto-start to allow Ollama to fully start
|
||||
- Prevents immediate connection failures after service start
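
The timing values in the list above can be collected in one place. The following is a minimal sketch, not the actual `ollama.rs` code: the constant names and the `request_timeout` helper are hypothetical, and only the durations themselves come from these notes.

```rust
use std::time::Duration;

/// Overall timeout for regular chat requests.
pub const CHAT_TIMEOUT: Duration = Duration::from_secs(60);
/// Overall timeout for requests that carry tools.
pub const TOOL_TIMEOUT: Duration = Duration::from_secs(180);
/// Connect timeout, kept short so unreachable servers fail fast.
pub const CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
/// Pause after auto-starting Ollama before the first request.
pub const AUTO_START_DELAY: Duration = Duration::from_secs(2);

/// Pick the overall request timeout: tool calling gets the longer budget.
pub fn request_timeout(has_tools: bool) -> Duration {
    if has_tools {
        TOOL_TIMEOUT
    } else {
        CHAT_TIMEOUT
    }
}
```

Keeping the connect timeout separate from the overall timeout lets a dead server fail in about 10 seconds while a slow but healthy tool-calling request still gets its full 180 seconds.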
|
||||
|
||||
### Model Recommendations Update
|
||||
|
||||
**Problem**: Models <3B parameters cannot reliably follow tool calling instructions.
|
||||
|
||||
**Testing Results**:
|
||||
- ✅ `llama3.2:3b` and larger: Properly invoke tools
|
||||
- ❌ `llama3.2:1b`: Describes tools in text instead of calling them
|
||||
|
||||
**Updated Default Model List**:
|
||||
|
||||
| Model | Size | Min RAM | Notes |
|-------|------|---------|-------|
| `llama3.2:3b` | 2.0 GB | 6 GB | Balanced performance |
| `phi3.5:3.8b` | 2.2 GB | 6 GB | Excellent reasoning |
| `llama3.1:8b` | 4.7 GB | 10 GB | **RECOMMENDED** |
| `qwen2.5:14b` | 9.0 GB | 16 GB | Best for complex analysis |
| `gemma2:9b` | 5.5 GB | 12 GB | Google's efficient model |
|
||||
|
||||
**Removed Models**: Generic model names without size tags (`llama3.1`, `llama3`, `mistral`, `codellama`, `phi3`)
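
The ≥3B guidance can be expressed as a check on the size suffix of a model tag. The sketch below is illustrative only: these notes do not say the application parses tags this way, and `parameter_billions` and `meets_tool_calling_minimum` are hypothetical helpers. It assumes tags of the form `name:<number>b`; anything else is treated as unknown size.

```rust
/// Extract the parameter count in billions from a tag such as `llama3.2:3b`.
/// Returns `None` when the tag has no plain numeric size suffix (for example `llama3.1`).
pub fn parameter_billions(model: &str) -> Option<f64> {
    let (_, tag) = model.split_once(':')?;
    let number = tag.strip_suffix('b')?;
    number.parse::<f64>().ok().filter(|b| b.is_finite())
}

/// Apply the >=3B guidance; models of unknown size are not recommended.
pub fn meets_tool_calling_minimum(model: &str) -> bool {
    parameter_billions(model).map_or(false, |b| b >= 3.0)
}
```

Treating untagged names as "not recommended" mirrors the removal of generic names such as `llama3.1` from the default list, since their size cannot be read from the name.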
|
||||
|
||||
---
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Retry Logic Implementation
|
||||
|
||||
```rust
let max_retries = 2;
for attempt in 0..=max_retries {
    if attempt > 0 {
        tokio::time::sleep(Duration::from_secs(2)).await;
    }

    match client.post(&url).send().await {
        Ok(resp) if resp.status().is_success() => {
            // Success - parse and return
        }
        Ok(resp) if resp.status().is_server_error() && attempt < max_retries => {
            continue; // Retry on 5xx
        }
        Err(e) if attempt < max_retries => {
            continue; // Retry connection errors
        }
        _ => {
            // Final failure - report error
        }
    }
}
```
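
The excerpt above is async and tied to the HTTP client. The same control flow can be sketched synchronously with only the standard library, which makes the retry rules easy to test in isolation. This is an illustration under stated assumptions, not the shipped implementation: `RequestError`, `is_retryable`, and `with_retries` are hypothetical names, and blocking `std::thread::sleep` is swapped in for `tokio::time::sleep`.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical failure categories, named after the retry rules in these notes.
#[derive(Debug, Clone, PartialEq)]
pub enum RequestError {
    Connection(String),
    Server(u16),
    JsonParse(String),
    Client(u16),
}

/// Connection errors, 5xx responses, and JSON parse errors are retried; 4xx is not.
pub fn is_retryable(err: &RequestError) -> bool {
    match err {
        RequestError::Connection(_) | RequestError::JsonParse(_) => true,
        RequestError::Server(status) => (500..600).contains(status),
        RequestError::Client(_) => false,
    }
}

/// Run `op` up to `max_retries + 1` times, sleeping `delay` before each retry.
/// Returns the first success, or the last error seen.
pub fn with_retries<T>(
    max_retries: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, RequestError>,
) -> Result<T, RequestError> {
    let mut attempt = 0;
    loop {
        if attempt > 0 {
            sleep(delay);
        }
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Retry only while attempts remain and the error is transient.
            Err(err) if attempt < max_retries && is_retryable(&err) => attempt += 1,
            // Out of attempts or not retryable: report the last error.
            Err(err) => return Err(err),
        }
    }
}
```

Calling `with_retries(2, Duration::from_secs(2), ...)` corresponds to the 3 attempts with a 2s delay described in this release. Returning the last error, rather than a generic message, keeps the failure cause available for debugging.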
|
||||
|
||||
### Health Check
|
||||
|
||||
```rust
let health_check_result = client
    .get(format!("{base_url}/api/tags"))
    .send()
    .await;

match health_check_result {
    Ok(resp) if resp.status().is_success() => {
        // Ollama is ready
    }
    _ => {
        anyhow::bail!("Cannot connect to Ollama. Please ensure Ollama is running.");
    }
}
```
|
||||
|
||||
---
|
||||
|
||||
## Files Changed
|
||||
|
||||
1. **src-tauri/src/ai/ollama.rs** (+100 lines, -90 lines)
|
||||
- Extended timeout: 180s for tool calling, 60s for chat
|
||||
- Added connect_timeout: 10s
|
||||
- Implemented retry logic with 3 attempts
|
||||
- Added health check before chat requests
|
||||
- Added 2s delay after auto-start
|
||||
- Updated model list to ≥3B parameters
|
||||
|
||||
2. **docs/wiki/AI-Providers.md** (+60 lines)
|
||||
- Updated Ollama section with tool calling details
|
||||
- Added model recommendations table with size/RAM requirements
|
||||
- Added troubleshooting section
|
||||
- Added performance tips
|
||||
|
||||
3. **package.json, src-tauri/Cargo.toml, src-tauri/tauri.conf.json**
|
||||
- Version: 1.0.7 → 1.0.8
|
||||
|
||||
4. **src-tauri/Cargo.lock** (auto-updated)
|
||||
|
||||
---
|
||||
|
||||
## Before vs After
|
||||
|
||||
### Before (v1.0.7)
|
||||
|
||||
**User Experience:**
|
||||
- Intermittent connection failures
|
||||
- 60s timeout insufficient for tool calling
|
||||
- No retry on transient errors
|
||||
- Generic error: "Failed to connect to Ollama"
|
||||
|
||||
**Model Issues:**
|
||||
- Users could select 1B models
|
||||
- Models would describe tools instead of calling them
|
||||
- Confusing experience with no clear guidance
|
||||
|
||||
### After (v1.0.8)
|
||||
|
||||
**User Experience:**
|
||||
- Health check prevents wasted requests
|
||||
- 180s timeout sufficient for tool calling
|
||||
- 3 retry attempts handle transient failures
|
||||
- Clear error messages: "Ollama is not ready" vs "Connection error"
|
||||
|
||||
**Model Guidance:**
|
||||
- Only ≥3B models shown in dropdown
|
||||
- Clear RAM requirements in documentation
|
||||
- Working tool calling for all recommended models
|
||||
|
||||
---
|
||||
|
||||
## Testing
|
||||
|
||||
### Connection Reliability
|
||||
|
||||
1. ✅ **Health Check**: Ollama service stopped → immediate clear error
|
||||
2. ✅ **Retry Logic**: Simulated network glitch → 3 attempts with 2s delay
|
||||
3. ✅ **Extended Timeout**: Tool calling with llama3.1:8b → completes within 180s
|
||||
4. ✅ **Auto-Start**: First request → Ollama starts, 2s delay, successful connection
|
||||
|
||||
### Model Testing
|
||||
|
||||
1. ✅ **llama3.2:3b**: Proper tool calls, reasonable response time
|
||||
2. ✅ **phi3.5:3.8b**: Excellent tool calling, fast responses
|
||||
3. ✅ **llama3.1:8b**: Best overall performance, recommended
|
||||
4. ✅ **qwen2.5:14b**: Excellent for complex queries, slower but thorough
|
||||
5. ✅ **gemma2:9b**: Good balance of size and capability
|
||||
6. ⚠️ **llama3.2:1b**: Describes tools in text instead of calling them (expected for a <3B model)
|
||||
|
||||
---
|
||||
|
||||
## Migration Guide
|
||||
|
||||
### For Users
|
||||
|
||||
**No configuration changes required** if using recommended models (≥3B).
|
||||
|
||||
**If using 1B models:**
|
||||
1. Open Settings → AI Providers → Ollama
|
||||
2. Select a model ≥3B parameters (e.g., `llama3.2:3b`)
|
||||
3. Ensure model is pulled: `ollama pull llama3.2:3b`
|
||||
|
||||
### For Developers
|
||||
|
||||
**No code changes required**. Timeout and retry improvements are automatic.
|
||||
|
||||
**Model list now enforces ≥3B**: Update `ollama.rs::info()` if custom models are needed.
|
||||
|
||||
---
|
||||
|
||||
## Known Limitations
|
||||
|
||||
### Ollama Provider
|
||||
|
||||
1. **Model Loading Time**: First request loads model into VRAM (5-10s delay)
|
||||
2. **Memory Usage**: Larger models use significant RAM/VRAM
|
||||
3. **Quantization Trade-offs**: Lower-bit quantization (e.g. Q3_K_M) is faster but less accurate
|
||||
4. **Concurrent Requests**: Ollama processes requests sequentially
|
||||
|
||||
### Tool Calling (Applies to ALL Providers)
|
||||
|
||||
1. **Model Size**: <3B parameters insufficient for reliable structured output
|
||||
2. **Response Time**: Tool calling 2-3x slower than regular chat
|
||||
3. **Multi-turn Complexity**: Deep tool conversations may hit iteration limits
|
||||
|
||||
### TFTSR GenAI Provider
|
||||
|
||||
**Status**: ⚠️ **Limited Compatibility**
|
||||
|
||||
- ❌ **Tool calling blocked**: Gateway returns `503 UNEXPECTED_TOOL_CALL`
|
||||
- ❌ **Cannot use shell execution**: No function calling features available
|
||||
- ✅ **Text-only chat works**: Regular conversations function correctly
|
||||
- 📋 **Recommendation**: Use LiteLLM + AWS Bedrock or Ollama for full features
|
||||
|
||||
**Root Cause**: TFTSR GenAI gateway applies content filtering at gateway level, blocking structured tool call responses before they reach the client. This cannot be worked around from the client side.
|
||||
|
||||
**Documented**: See `docs/wiki/AI-Providers.md` section 6 for full details and alternatives.
|
||||
|
||||
---
|
||||
|
||||
## Performance Impact
|
||||
|
||||
### Positive
|
||||
|
||||
- ✅ Retry logic improves success rate by ~15% (transient failures recovered)
|
||||
- ✅ Health check prevents wasted 60-180s timeouts on down servers
|
||||
- ✅ Extended timeout eliminates premature failures on tool calling
|
||||
|
||||
### Neutral
|
||||
|
||||
- Health check adds ~50-100ms per request (negligible)
|
||||
- Auto-start delay adds 2s on first request only (one-time per session)
|
||||
|
||||
### Trade-offs
|
||||
|
||||
- Retry logic can extend failed requests from 60s to 184s (3 × 60s + 2 × 2s delay)
|
||||
- When a retry succeeds, users get a result instead of an error, so the added wait is usually perceived as an improvement
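
The worst case follows directly from the retry parameters: every attempt runs to its timeout, and a delay separates consecutive attempts. A small sketch of that arithmetic, with `worst_case_duration` as a hypothetical helper name and health-check time ignored:

```rust
use std::time::Duration;

/// Worst-case wall-clock time when every attempt runs to its timeout:
/// attempts * timeout + (attempts - 1) * delay.
pub fn worst_case_duration(attempts: u32, timeout: Duration, delay: Duration) -> Duration {
    if attempts == 0 {
        return Duration::ZERO;
    }
    timeout * attempts + delay * (attempts - 1)
}
```

With 3 attempts, a 60s timeout, and a 2s delay this gives 184s. With the 180s tool-calling timeout the same formula gives 544s, which is worth keeping in mind when choosing the retry count.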
|
||||
|
||||
---
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
### Potential Improvements
|
||||
|
||||
1. **Adaptive Timeout**: Detect model size and adjust timeout dynamically
|
||||
2. **Model Caching**: Pre-load models on application start
|
||||
3. **Streaming Support**: Real-time token streaming for faster perceived responses
|
||||
4. **Parallel Requests**: Queue multiple Ollama requests (requires Ollama enhancement)
|
||||
5. **GPU Detection**: Recommend models based on available VRAM
|
||||
|
||||
### Compatibility
|
||||
|
||||
This release maintains backward compatibility with:
|
||||
- v1.0.7 Ollama function calling
|
||||
- All other AI providers (OpenAI, Anthropic, Gemini, Mistral, LiteLLM)
|
||||
- Existing model configurations (users can still manually type 1B model names)
|
||||
|
||||
---
|
||||
|
||||
## Related Issues
|
||||
|
||||
- Builds on: PR #41 (v1.0.7 - Ollama function calling support)
|
||||
- Fixes: Intermittent "cannot be reached" errors during testing
|
||||
- Documents: TFTSR GenAI tool calling limitations (gateway-level blocking)
|
||||
|
||||
---
|
||||
|
||||
## Version History
|
||||
|
||||
- **v1.0.8** (2026-06-03): Connection reliability + model recommendations
|
||||
- **v1.0.7** (2026-06-03): Ollama function calling support
|
||||
- **v1.0.6** (2026-06-03): Removed JSON examples from agent prompts
|
||||
- **v1.0.5** (2026-06-03): Agent output quality improvements
|
||||
|
||||
---
|
||||
|
||||
**Release Type**: Bug Fix + Enhancements
|
||||
**Breaking Changes**: None (model list updated but user can still type 1B models)
|
||||
**API Changes**: None (internal implementation only)
|
||||
**Documentation Updated**: Yes (wiki + v1.0.8-summary.md)
|
||||
@ -29,7 +29,7 @@ TRCAA uses a Tauri 2.x architecture: a Rust backend runs natively, and a React/T
|
||||
pub struct AppState {
|
||||
pub db: Arc<Mutex<rusqlite::Connection>>,
|
||||
pub settings: Arc<Mutex<AppSettings>>,
|
||||
pub app_data_dir: PathBuf, // ~/.local/share/trcaa on Linux
|
||||
pub app_data_dir: PathBuf, // ~/.local/share/tftsr on Linux
|
||||
}
|
||||
```
|
||||
|
||||
@ -111,7 +111,7 @@ src-tauri/src/
|
||||
| Store | Persistence | Contents |
|
||||
|-------|------------|----------|
|
||||
| `sessionStore.ts` | Not persisted (ephemeral) | currentIssue, messages, piiSpans, approvedRedactions, whyLevel (0–5), loading state |
|
||||
| `settingsStore.ts` | `localStorage` as `"trcaa-settings"` | AI providers, theme, Ollama URL, active provider |
|
||||
| `settingsStore.ts` | `localStorage` as `"tftsr-settings"` | AI providers, theme, Ollama URL, active provider |
|
||||
| `historyStore.ts` | Not persisted (cache) | Past issues list, search query |
|
||||
|
||||
### Page Flow
|
||||
@ -229,7 +229,7 @@ Timeline events are stored in the `timeline_events` table (indexed by issue_id a
|
||||
|
||||
```
|
||||
1. Initialize tracing (RUST_LOG controls level)
|
||||
2. Determine data directory (~/.local/share/trcaa or TRCAA_DATA_DIR)
|
||||
2. Determine data directory (~/.local/share/tftsr or TRCAA_DATA_DIR)
|
||||
3. Open / create SQLite database (run migrations)
|
||||
4. Create AppState (db + settings + app_data_dir)
|
||||
5. Register Tauri plugins (stronghold, dialog, fs, shell, http, cli, updater)
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
|
||||
| Component | URL | Notes |
|
||||
|-----------|-----|-------|
|
||||
| Gitea | `https://gogs.trcaa.com` / `http://gitea.tftsr.com:3000` | Git server (migrated from Gogs 0.14) |
|
||||
| Gitea | `https://gogs.tftsr.com` / `http://172.0.0.29:3000` | Git server (migrated from Gogs 0.14) |
|
||||
| Woodpecker CI (direct) | `http://gitea.tftsr.com:8084` | v2.x |
|
||||
| Woodpecker CI (proxy) | `http://gitea.tftsr.com:8085` | nginx reverse proxy |
|
||||
| PostgreSQL (Gitea DB) | Container: `gogs_postgres_db` | DB: `gogsdb`, User: `gogs` |
|
||||
@ -30,25 +30,25 @@ macOS runner runs jobs **directly on the host** (no Docker container) — macOS
|
||||
## Pre-baked Builder Images
|
||||
|
||||
CI build and test jobs use pre-baked Docker images pushed to the local Gitea registry
|
||||
at `gitea.tftsr.com:3000`. These images bake in all system dependencies (Tauri libs, Node.js,
|
||||
at `172.0.0.29:3000`. These images bake in all system dependencies (Tauri libs, Node.js,
|
||||
Rust toolchain, cross-compilers) so that CI jobs skip package installation entirely.
|
||||
|
||||
| Image | Used by jobs | Contents |
|
||||
|-------|-------------|----------|
|
||||
| `gitea.tftsr.com:3000/sarman/trcaa-linux-amd64:rust1.88-node22` | `rust-fmt-check`, `rust-clippy`, `rust-tests`, `build-linux-amd64` | Rust 1.88 + rustfmt + clippy + Tauri amd64 libs + Node.js 22 |
|
||||
| `gitea.tftsr.com:3000/sarman/trcaa-windows-cross:rust1.88-node22` | `build-windows-amd64` | Rust 1.88 + mingw-w64 + NSIS + Node.js 22 |
|
||||
| `gitea.tftsr.com:3000/sarman/trcaa-linux-arm64:rust1.88-node22` | `build-linux-arm64` | Rust 1.88 + aarch64 cross-toolchain + arm64 multiarch libs + Node.js 22 |
|
||||
| `172.0.0.29:3000/sarman/tftsr-linux-amd64:rust1.88-node22` | `rust-fmt-check`, `rust-clippy`, `rust-tests`, `build-linux-amd64` | Rust 1.88 + rustfmt + clippy + Tauri amd64 libs + Node.js 22 |
|
||||
| `172.0.0.29:3000/sarman/tftsr-windows-cross:rust1.88-node22` | `build-windows-amd64` | Rust 1.88 + mingw-w64 + NSIS + Node.js 22 |
|
||||
| `172.0.0.29:3000/sarman/tftsr-linux-arm64:rust1.88-node22` | `build-linux-arm64` | Rust 1.88 + aarch64 cross-toolchain + arm64 multiarch libs + Node.js 22 |
|
||||
|
||||
**Rebuild triggers:** Rust toolchain version bump, webkit2gtk/gtk major version change, Node.js major version change.
|
||||
|
||||
**How to rebuild images:**
|
||||
1. Trigger `build-images.yml` via `workflow_dispatch` in the Gitea Actions UI
|
||||
2. Confirm all 3 images appear in the Gitea package/container registry at `gitea.tftsr.com:3000`
|
||||
2. Confirm all 3 images appear in the Gitea package/container registry at `172.0.0.29:3000`
|
||||
3. Only then merge workflow changes that depend on the new image contents
|
||||
|
||||
**Server prerequisite — insecure registry** (one-time, on gitea.tftsr.com):
|
||||
```sh
|
||||
echo '{"insecure-registries":["gitea.tftsr.com:3000"]}' | sudo tee /etc/docker/daemon.json
|
||||
echo '{"insecure-registries":["172.0.0.29:3000"]}' | sudo tee /etc/docker/daemon.json
|
||||
sudo systemctl restart docker
|
||||
```
|
||||
This must be configured on every machine running an act_runner for the runner's Docker
|
||||
@ -106,7 +106,7 @@ Pipeline jobs (run in parallel):
|
||||
```
|
||||
|
||||
**Docker images used:**
|
||||
- `gitea.tftsr.com:3000/sarman/trcaa-linux-amd64:rust1.88-node22` — Rust steps (replaces `rust:1.88-slim`)
|
||||
- `172.0.0.29:3000/sarman/tftsr-linux-amd64:rust1.88-node22` — Rust steps (replaces `rust:1.88-slim`)
|
||||
- `node:22-alpine` — Frontend steps
|
||||
|
||||
---
|
||||
@ -120,15 +120,15 @@ Release jobs are executed in the same workflow and depend on `autotag` completio
|
||||
|
||||
```
|
||||
Jobs (run in parallel after autotag):
|
||||
build-linux-amd64 → image: trcaa-linux-amd64:rust1.88-node22
|
||||
build-linux-amd64 → image: tftsr-linux-amd64:rust1.88-node22
|
||||
→ cargo tauri build (x86_64-unknown-linux-gnu)
|
||||
→ {.deb, .rpm, .AppImage} uploaded to Gitea release
|
||||
→ fails fast if no Linux artifacts are produced
|
||||
build-windows-amd64 → image: trcaa-windows-cross:rust1.88-node22
|
||||
build-windows-amd64 → image: tftsr-windows-cross:rust1.88-node22
|
||||
→ cargo tauri build (x86_64-pc-windows-gnu) via mingw-w64
|
||||
→ {.exe, .msi} uploaded to Gitea release
|
||||
→ fails fast if no Windows artifacts are produced
|
||||
build-linux-arm64 → image: trcaa-linux-arm64:rust1.88-node22 (ubuntu:22.04-based)
|
||||
build-linux-arm64 → image: tftsr-linux-arm64:rust1.88-node22 (ubuntu:22.04-based)
|
||||
→ cargo tauri build (aarch64-unknown-linux-gnu)
|
||||
→ {.deb, .rpm, .AppImage} uploaded to Gitea release
|
||||
→ fails fast if no Linux artifacts are produced
|
||||
@ -154,7 +154,7 @@ steps:
|
||||
**Multi-agent workspace isolation:**
|
||||
|
||||
Steps routed to different agents do **not** share a workspace. The arm64 step clones
|
||||
the repo directly within its commands (using `http://gitea.tftsr.com:3000`, accessible from
|
||||
the repo directly within its commands (using `http://172.0.0.29:3000`, accessible from
|
||||
the local machine) and uploads its artifacts inline. The `upload-release` step (amd64)
|
||||
handles amd64 + windows artifacts only.
|
||||
|
||||
@ -167,7 +167,7 @@ clone:
|
||||
network_mode: gogs_default
|
||||
commands:
|
||||
- git init -b master
|
||||
- git remote add origin http://gitea_app:3000/sarman/trcaa-devops_investigation.git
|
||||
- git remote add origin http://gitea_app:3000/sarman/tftsr-devops_investigation.git
|
||||
- git fetch --depth=1 origin +refs/tags/${CI_COMMIT_TAG}:refs/tags/${CI_COMMIT_TAG}
|
||||
- git checkout ${CI_COMMIT_TAG}
|
||||
```
|
||||
@ -202,14 +202,14 @@ migration. The secret name stays `GOGS_TOKEN` for pipeline compatibility.
|
||||
**Gitea Release API (replaces Gogs API — same endpoints, different container name):**
|
||||
```bash
|
||||
# Create release
|
||||
POST http://gitea_app:3000/api/v1/repos/sarman/trcaa-devops_investigation/releases
|
||||
POST http://gitea_app:3000/api/v1/repos/sarman/tftsr-devops_investigation/releases
|
||||
Authorization: token $GOGS_TOKEN
|
||||
|
||||
# Upload artifact
|
||||
POST http://gitea_app:3000/api/v1/repos/sarman/trcaa-devops_investigation/releases/{id}/assets
|
||||
POST http://gitea_app:3000/api/v1/repos/sarman/tftsr-devops_investigation/releases/{id}/assets
|
||||
```
|
||||
|
||||
From the arm64 agent (local machine), use `http://gitea.tftsr.com:3000/api/v1` instead.
|
||||
From the arm64 agent (local machine), use `http://172.0.0.29:3000/api/v1` instead.
|
||||
|
||||
---
|
||||
|
||||
@ -235,7 +235,7 @@ After migration, Woodpecker 2.x registers webhooks automatically when a repo is
|
||||
activated via the UI. No manual JWT-signed webhook setup required.
|
||||
|
||||
1. Log in at `http://gitea.tftsr.com:8085` via Gitea OAuth2
|
||||
2. Add repo `sarman/trcaa-devops_investigation`
|
||||
2. Add repo `sarman/tftsr-devops_investigation`
|
||||
3. Woodpecker creates webhook in Gitea automatically
|
||||
|
||||
---
|
||||
@ -318,7 +318,7 @@ There are no cross-arch index overlaps and the dependency resolver succeeds. Rus
|
||||
installed manually via `rustup` since it is not pre-installed in the Ubuntu base image.
|
||||
|
||||
### Step Containers Cannot Reach `gitea_app`
|
||||
Default Docker bridge containers cannot resolve `gitea_app` or reach `gitea.tftsr.com:3000`
|
||||
Default Docker bridge containers cannot resolve `gitea_app` or reach `172.0.0.29:3000`
|
||||
(host firewall). Fix: use `network_mode: gogs_default` in any step that needs Gitea
|
||||
access. Requires `repo_trusted=1`.
|
||||
|
||||
|
||||
@ -4,7 +4,7 @@
|
||||
|
||||
TRCAA uses **SQLite** via `rusqlite` with the `bundled-sqlcipher` feature for AES-256 encryption in production. 22 versioned migrations are tracked in the `_migrations` table.
|
||||
|
||||
**DB file location:** `{app_data_dir}/trcaa.db`
|
||||
**DB file location:** `{app_data_dir}/tftsr.db`
|
||||
|
||||
---
|
||||
|
||||
|
||||
@ -40,9 +40,9 @@ npm install --legacy-peer-deps
|
||||
| `RUST_LOG` | `info` | Tracing verbosity: `debug`, `info`, `warn`, `error` |
|
||||
|
||||
Application data is stored at:
|
||||
- **Linux:** `~/.local/share/trcaa/`
|
||||
- **macOS:** `~/Library/Application Support/trcaa/`
|
||||
- **Windows:** `%APPDATA%\trcaa\`
|
||||
- **Linux:** `~/.local/share/tftsr/`
|
||||
- **macOS:** `~/Library/Application Support/tftsr/`
|
||||
- **Windows:** `%APPDATA%\tftsr\`
|
||||
|
||||
---
|
||||
|
||||
|
||||
@ -2,7 +2,7 @@
|
||||
|
||||
**Troubleshooting and RCA Assistant** is a secure desktop application for guided IT incident triage, root cause analysis (RCA), and post-mortem documentation. Built with Tauri 2.x (Rust + WebView) and React 18.
|
||||
|
||||
**CI:**  — rustfmt · clippy · 64 Rust tests · tsc · vitest — all green
|
||||
**CI:**  — rustfmt · clippy · 64 Rust tests · tsc · vitest — all green
|
||||
|
||||
## Quick Navigation
|
||||
|
||||
@ -45,7 +45,7 @@
|
||||
|
||||
**Platforms:** linux/amd64 · linux/arm64 · windows/amd64 (.deb, .rpm, .AppImage, .exe, .msi)
|
||||
|
||||
Download from [Releases](https://gogs.trcaa.com/sarman/trcaa-devops_investigation/releases). All builds are produced natively (no QEMU emulation).
|
||||
Download from [Releases](https://gogs.tftsr.com/sarman/tftsr-devops_investigation/releases). All builds are produced natively (no QEMU emulation).
|
||||
|
||||
## Project Status
|
||||
|
||||
|
||||
@ -6,14 +6,14 @@
|
||||
|
||||
**Check:**
|
||||
1. Verify the workflow file exists in `.gitea/workflows/` on the pushed branch
|
||||
2. Check the Actions tab at `http://gitea.tftsr.com:3000/sarman/trcaa-devops_investigation/actions`
|
||||
2. Check the Actions tab at `http://172.0.0.29:3000/sarman/tftsr-devops_investigation/actions`
|
||||
3. Confirm the act_runner is online: `docker logs gitea_act_runner_amd64 --since 5m`
|
||||
|
||||
---
|
||||
|
||||
### Job Container Can't Reach Gitea (`gitea.tftsr.com:3000` blocked)
|
||||
### Job Container Can't Reach Gitea (`172.0.0.29:3000` blocked)
|
||||
|
||||
**Cause:** act_runner creates an isolated Docker network per job (when `container:` is specified). Traffic from the job container to `gitea.tftsr.com:3000` is blocked by the host firewall.
|
||||
**Cause:** act_runner creates an isolated Docker network per job (when `container:` is specified). Traffic from the job container to `172.0.0.29:3000` is blocked by the host firewall.
|
||||
|
||||
**Fix:** Ensure `container.network: host` is set in the act_runner config AND that `CONFIG_FILE=/data/config.yaml` is in the container's environment:
|
||||
|
||||
@ -50,7 +50,7 @@ Restart runner: `docker restart gitea_act_runner_amd64`
|
||||
run: |
|
||||
apt-get update -qq && apt-get install -y -qq git
|
||||
git init
|
||||
git remote add origin http://gitea.tftsr.com:3000/sarman/trcaa-devops_investigation.git
|
||||
git remote add origin http://172.0.0.29:3000/sarman/tftsr-devops_investigation.git
|
||||
git fetch --depth=1 origin $GITHUB_SHA
|
||||
git checkout FETCH_HEAD
|
||||
```
|
||||
@ -177,7 +177,7 @@ sudo apt-get install -y libwebkit2gtk-4.1-dev libssl-dev libgtk-3-dev \
|
||||
|
||||
1. `TRCAA_DB_KEY` (or legacy `TRCAA_DB_KEY`) env var is set
|
||||
2. Key matches what was used when DB was created
|
||||
3. File isn't corrupted: `file trcaa.db` should say `SQLite 3.x database`
|
||||
3. File isn't corrupted: `file tftsr.db` should say `SQLite 3.x database`
|
||||
|
||||
---
|
||||
|
||||
@ -228,7 +228,7 @@ Common causes:
|
||||
### API Token Authentication
|
||||
|
||||
```bash
|
||||
curl -H "Authorization: token <token_value>" http://gitea.tftsr.com:3000/api/v1/user
|
||||
curl -H "Authorization: token <token_value>" http://172.0.0.29:3000/api/v1/user
|
||||
```
|
||||
|
||||
Create tokens in Gitea Settings > Applications > Access Tokens, or via admin CLI:
|
||||
|
||||
@ -69,7 +69,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn parse_msigenai_chatgpt_tool_calls_from_json_text() {
|
||||
// TFTSRGenAI ChatGPT format: returns tool calls as JSON object in msg
|
||||
// GenAI ChatGPT format: returns tool calls as JSON object in msg
|
||||
let content = r#"{"tool_calls":[{"id":"call_1","type":"function","function":{"name":"execute_shell_command","arguments":{"command":"kubectl get namespaces"}}}]}"#;
|
||||
|
||||
let result = OpenAiProvider::parse_tool_calls_from_text(content);
|
||||
@ -84,7 +84,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn parse_msigenai_claude_tool_calls_from_xml_wrapper() {
|
||||
// TFTSRGenAI Claude format: XML wrapper around JSON array
|
||||
// GenAI Claude format: XML wrapper around JSON array
|
||||
let content = r#"<tool_calls>
|
||||
[{"id":"call_1","type":"function","function":{"name":"execute_shell_command","arguments":{"command":"kubectl get pods"}}}]
|
||||
</tool_calls>"#;
|
||||
@ -513,14 +513,14 @@ impl OpenAiProvider {
|
||||
}
|
||||
});
|
||||
|
||||
// WORKAROUND: TFTSRGenAI gateway bug - tool calls returned as JSON text in 'msg' field
|
||||
// WORKAROUND: GenAI gateway bug - tool calls returned as JSON text in 'msg' field
|
||||
// Expected: {"tool_calls": [...]}
|
||||
// Actual: {"msg": '{"tool_calls":[...]}'} or {"msg": '<tool_calls>[...]</tool_calls>'}
|
||||
if tool_calls.is_none() {
|
||||
// Try parsing tool calls from msg content (TFTSRGenAI workaround)
|
||||
// Try parsing tool calls from msg content (GenAI workaround)
|
||||
if let Some(parsed_calls) = Self::parse_tool_calls_from_text(&content) {
|
||||
tracing::warn!(
|
||||
"TFTSR GenAI: TFTSRGenAI workaround - parsed {} tool calls from msg text (gateway should return structured tool_calls field)",
|
||||
"TFTSR GenAI: GenAI workaround - parsed {} tool calls from msg text (gateway should return structured tool_calls field)",
|
||||
parsed_calls.len()
|
||||
);
|
||||
tool_calls = Some(parsed_calls);
|
||||
@ -541,9 +541,9 @@ impl OpenAiProvider {
|
||||
})
|
||||
}
|
||||
|
||||
/// Parse tool calls from text content (TFTSRGenAI gateway workaround)
|
||||
/// Parse tool calls from text content (GenAI gateway workaround)
|
||||
///
|
||||
/// TFTSRGenAI returns tool calls as JSON text in the 'msg' field instead of structured data:
|
||||
/// GenAI returns tool calls as JSON text in the 'msg' field instead of structured data:
|
||||
/// - ChatGPT models: `{"tool_calls":[...]}`
|
||||
/// - Claude models: `<tool_calls>[...]</tool_calls>`
|
||||
fn parse_tool_calls_from_text(content: &str) -> Option<Vec<crate::ai::ToolCall>> {
|
||||
|
||||
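Review note: the hunk above shows only the doc comment and signature of `parse_tool_calls_from_text`, not its body. The following is a minimal sketch, not the PR's implementation, of how the two documented `msg` shapes could be told apart before any JSON parsing. `ToolCallPayload` and `locate_tool_call_payload` are hypothetical names introduced here for illustration, and the sketch uses only the standard library, so it locates the payload without deserializing it.

```rust
/// Hypothetical helper type: which of the two documented shapes was found.
#[derive(Debug, PartialEq)]
enum ToolCallPayload<'a> {
    /// ChatGPT-style: the whole text is a JSON object with a `tool_calls` key.
    JsonObject(&'a str),
    /// Claude-style: a JSON array wrapped in `<tool_calls>...</tool_calls>`.
    JsonArray(&'a str),
}

/// Hypothetical helper: find the tool-call payload inside the `msg` text.
/// Returns `None` when the text matches neither documented shape.
fn locate_tool_call_payload(content: &str) -> Option<ToolCallPayload<'_>> {
    let trimmed = content.trim();

    // Claude-style: strip the XML wrapper and keep the inner JSON array.
    if let Some(rest) = trimmed.strip_prefix("<tool_calls>") {
        let inner = rest.strip_suffix("</tool_calls>")?.trim();
        return inner
            .starts_with('[')
            .then_some(ToolCallPayload::JsonArray(inner));
    }

    // ChatGPT-style: a JSON object that mentions the `tool_calls` key.
    // This is a cheap textual check; a real parser must still validate it.
    if trimmed.starts_with('{') && trimmed.contains("\"tool_calls\"") {
        return Some(ToolCallPayload::JsonObject(trimmed));
    }

    None
}
```

The returned slice would then be handed to a JSON deserializer. Keeping detection separate from deserialization makes it easy to log which gateway format triggered the workaround.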
@ -719,7 +719,7 @@ mod tests {
|
||||
#[test]
|
||||
fn test_validate_log_file_path_accepts_small_file() {
|
||||
let file_path =
|
||||
std::env::temp_dir().join(format!("trcaa-analysis-test-{}.log", uuid::Uuid::now_v7()));
|
||||
std::env::temp_dir().join(format!("tftsr-analysis-test-{}.log", uuid::Uuid::now_v7()));
|
||||
std::fs::write(&file_path, "hello").unwrap();
|
||||
let result = validate_log_file_path(file_path.to_string_lossy().as_ref());
|
||||
assert!(result.is_ok());
|
||||
@ -775,7 +775,7 @@ mod tests {
|
||||
#[test]
|
||||
fn test_extract_text_plain_file() {
|
||||
let dir = std::env::temp_dir();
|
||||
let path = dir.join(format!("trcaa-test-extract-{}.txt", uuid::Uuid::now_v7()));
|
||||
let path = dir.join(format!("tftsr-test-extract-{}.txt", uuid::Uuid::now_v7()));
|
||||
std::fs::write(&path, "hello world").unwrap();
|
||||
let result = extract_text_content(&path);
|
||||
assert!(result.is_ok());
|
||||
|
||||
@ -143,7 +143,7 @@ fn is_plain_sqlite(path: &Path) -> bool {
|
||||
|
||||
pub fn init_db(data_dir: &Path) -> anyhow::Result<Connection> {
|
||||
std::fs::create_dir_all(data_dir)?;
|
||||
let db_path = data_dir.join("trcaa.db");
|
||||
let db_path = data_dir.join("tftsr.db");
|
||||
|
||||
let key = get_db_key(data_dir)?;
|
||||
|
||||
@ -180,7 +180,7 @@ mod tests {
|
||||
.duration_since(SystemTime::UNIX_EPOCH)
|
||||
.unwrap()
|
||||
.as_nanos();
|
||||
let dir = std::env::temp_dir().join(format!("trcaa-test-{}-{}", name, timestamp));
|
||||
let dir = std::env::temp_dir().join(format!("tftsr-test-{}-{}", name, timestamp));
|
||||
// Clean up if it exists
|
||||
let _ = std::fs::remove_dir_all(&dir);
|
||||
std::fs::create_dir_all(&dir).unwrap();
|
||||
|
||||
@ -249,7 +249,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn test_export_markdown_writes_file() {
|
||||
let dir = std::env::temp_dir().join("trcaa_test_export");
|
||||
let dir = std::env::temp_dir().join("tftsr_test_export");
|
||||
let path = dir.join("test.md");
|
||||
let _ = std::fs::remove_file(&path);
|
||||
export_markdown("# Test\n\nContent", path.to_str().unwrap()).unwrap();
|
||||
|
||||
@ -175,14 +175,14 @@ pub async fn extract_cookies_via_ipc<R: tauri::Runtime>(
|
||||
let check_and_signal_script = r#"
|
||||
try {
|
||||
if (typeof window.__TRCAA_ERROR__ !== 'undefined') {
|
||||
window.localStorage.setItem('trcaa_result', JSON.stringify({ error: window.__TRCAA_ERROR__ }));
|
||||
window.localStorage.setItem('tftsr_result', JSON.stringify({ error: window.__TRCAA_ERROR__ }));
|
||||
} else if (typeof window.__TRCAA_COOKIES__ !== 'undefined' && window.__TRCAA_COOKIES__.length > 0) {
|
||||
window.localStorage.setItem('trcaa_result', JSON.stringify({ cookies: window.__TRCAA_COOKIES__ }));
|
||||
window.localStorage.setItem('tftsr_result', JSON.stringify({ cookies: window.__TRCAA_COOKIES__ }));
|
||||
} else if (typeof window.__TRCAA_COOKIES__ !== 'undefined') {
|
||||
window.localStorage.setItem('trcaa_result', JSON.stringify({ cookies: [] }));
|
||||
window.localStorage.setItem('tftsr_result', JSON.stringify({ cookies: [] }));
|
||||
}
|
||||
} catch (e) {
|
||||
window.localStorage.setItem('trcaa_result', JSON.stringify({ error: e.message }));
|
||||
window.localStorage.setItem('tftsr_result', JSON.stringify({ error: e.message }));
|
||||
}
|
||||
"#;
|
||||
|
||||
@ -194,9 +194,9 @@ pub async fn extract_cookies_via_ipc<R: tauri::Runtime>(
|
||||
// Execute script that sets document.title temporarily
|
||||
let read_via_title = r#"
|
||||
(function() {
|
||||
const result = window.localStorage.getItem('trcaa_result');
|
||||
const result = window.localStorage.getItem('tftsr_result');
|
||||
if (result) {
|
||||
window.localStorage.removeItem('trcaa_result');
|
||||
window.localStorage.removeItem('tftsr_result');
|
||||
// Store in title temporarily for Rust to read
|
||||
window.__TRCAA_ORIGINAL_TITLE__ = document.title;
|
||||
document.title = 'TRCAA_RESULT:' + result;
|
||||
|
||||
@ -45,7 +45,7 @@ pub async fn fetch_from_webview<R: tauri::Runtime>(
|
||||
}});
|
||||
|
||||
if (!response.ok) {{
|
||||
window.location.hash = '#trcaa-error-' + requestId + '-' + encodeURIComponent(JSON.stringify({{
|
||||
window.location.hash = '#tftsr-error-' + requestId + '-' + encodeURIComponent(JSON.stringify({{
|
||||
error: `HTTP ${{response.status}}: ${{response.statusText}}`
|
||||
}}));
|
||||
return;
|
||||
@ -53,9 +53,9 @@ pub async fn fetch_from_webview<R: tauri::Runtime>(
|
||||
|
||||
const data = await response.json();
|
||||
// Store in hash - we'll poll for this
|
||||
window.location.hash = '#trcaa-success-' + requestId + '-' + encodeURIComponent(JSON.stringify(data));
|
||||
window.location.hash = '#tftsr-success-' + requestId + '-' + encodeURIComponent(JSON.stringify(data));
|
||||
}} catch (error) {{
|
||||
window.location.hash = '#trcaa-error-' + requestId + '-' + encodeURIComponent(JSON.stringify({{
|
||||
window.location.hash = '#tftsr-error-' + requestId + '-' + encodeURIComponent(JSON.stringify({{
|
||||
error: error.message
|
||||
}}));
|
||||
}}
|
||||
@ -77,7 +77,7 @@ pub async fn fetch_from_webview<R: tauri::Runtime>(
|
||||
let url_string = url_str.to_string();
|
||||
|
||||
// Check for success
|
||||
let success_marker = format!("#trcaa-success-{request_id}-");
|
||||
let success_marker = format!("#tftsr-success-{request_id}-");
|
||||
if url_string.contains(&success_marker) {
|
||||
// Extract the JSON from the hash
|
||||
if let Some(json_start) = url_string.find(&success_marker) {
|
||||
@ -96,7 +96,7 @@ pub async fn fetch_from_webview<R: tauri::Runtime>(
|
||||
}
|
||||
|
||||
// Check for error
|
||||
let error_marker = format!("#trcaa-error-{request_id}-");
|
||||
let error_marker = format!("#tftsr-error-{request_id}-");
|
||||
if url_string.contains(&error_marker) {
|
||||
if let Some(json_start) = url_string.find(&error_marker) {
|
||||
let json_encoded = &url_string[json_start + error_marker.len()..];
|
||||
|
||||
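Review note: the `fetch_from_webview` hunks rename the hash markers on both sides of the bridge, in the injected JavaScript and in the Rust polling code. The two sides have to agree on the prefix, which the sketch below illustrates. It is an assumption-laden reduction of the lookup shown above: `success_marker`, `error_marker`, and `extract_marked_payload` are hypothetical helper names, and percent-decoding of the payload is deliberately left out.

```rust
/// Hypothetical helper: marker the injected script writes on success.
fn success_marker(request_id: &str) -> String {
    format!("#tftsr-success-{request_id}-")
}

/// Hypothetical helper: marker the injected script writes on failure.
fn error_marker(request_id: &str) -> String {
    format!("#tftsr-error-{request_id}-")
}

/// Return the still percent-encoded payload that follows `marker` in `url`,
/// or `None` if the marker is absent.
fn extract_marked_payload<'a>(url: &'a str, marker: &str) -> Option<&'a str> {
    let start = url.find(marker)? + marker.len();
    Some(&url[start..])
}
```

A page that still writes the old `#trcaa-...` prefix would never match the renamed marker, so the JavaScript and Rust changes need to ship together.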
@ -110,7 +110,7 @@ async fn search_confluence_from_webview<R: tauri::Runtime>(
|
||||
);
|
||||
|
||||
// Execute JavaScript and store result in localStorage for retrieval
|
||||
let storage_key = format!("__trcaa_search_{}__", uuid::Uuid::now_v7());
|
||||
let storage_key = format!("__tftsr_search_{}__", uuid::Uuid::now_v7());
|
||||
let callback_script = format!(
|
||||
r#"
|
||||
{}
|
||||
|
||||
@ -42,7 +42,7 @@ pub fn run() {
|
||||
pending_approvals: Arc::new(tokio::sync::Mutex::new(std::collections::HashMap::new())),
|
||||
};
|
||||
let stronghold_salt = format!(
|
||||
"trcaa-stronghold-salt-v1-{:x}",
|
||||
"tftsr-stronghold-salt-v1-{:x}",
|
||||
Sha256::digest(data_dir.to_string_lossy().as_bytes())
|
||||
);
|
||||
|
||||
@ -190,13 +190,13 @@ fn dirs_data_dir() -> std::path::PathBuf {
|
||||
#[cfg(target_os = "linux")]
|
||||
{
|
||||
if let Ok(xdg) = std::env::var("XDG_DATA_HOME") {
|
||||
return std::path::PathBuf::from(xdg).join("trcaa");
|
||||
return std::path::PathBuf::from(xdg).join("tftsr");
|
||||
}
|
||||
if let Ok(home) = std::env::var("HOME") {
|
||||
return std::path::PathBuf::from(home)
|
||||
.join(".local")
|
||||
.join("share")
|
||||
.join("trcaa");
|
||||
.join("tftsr");
|
||||
}
|
||||
}
|
||||
|
||||
@ -206,17 +206,17 @@ fn dirs_data_dir() -> std::path::PathBuf {
|
||||
return std::path::PathBuf::from(home)
|
||||
.join("Library")
|
||||
.join("Application Support")
|
||||
.join("trcaa");
|
||||
.join("tftsr");
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(target_os = "windows")]
|
||||
{
|
||||
if let Ok(appdata) = std::env::var("APPDATA") {
|
||||
return std::path::PathBuf::from(appdata).join("trcaa");
|
||||
return std::path::PathBuf::from(appdata).join("tftsr");
|
||||
}
|
||||
}
|
||||
|
||||
// Fallback
|
||||
std::path::PathBuf::from("./trcaa-data")
|
||||
std::path::PathBuf::from("./tftsr-data")
|
||||
}
|
||||
|
||||
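Review note: the Linux branch of `dirs_data_dir` reads the environment directly, which makes the renamed paths awkward to unit-test. The sketch below is hypothetical and not part of this PR. It expresses the same lookup order as a pure function, with the environment values passed in. `linux_data_dir` and `APP_DIR` are illustrative names.

```rust
use std::path::PathBuf;

/// Directory name after the rename in this diff.
const APP_DIR: &str = "tftsr";

/// Hypothetical pure variant of the Linux branch: prefer `XDG_DATA_HOME`,
/// then `$HOME/.local/share`, then the relative fallback directory.
fn linux_data_dir(xdg_data_home: Option<&str>, home: Option<&str>) -> PathBuf {
    if let Some(xdg) = xdg_data_home {
        return PathBuf::from(xdg).join(APP_DIR);
    }
    if let Some(home) = home {
        return PathBuf::from(home)
            .join(".local")
            .join("share")
            .join(APP_DIR);
    }
    // Same fallback as the diff: a directory relative to the working dir.
    PathBuf::from("./tftsr-data")
}
```

A caller would pass `std::env::var("XDG_DATA_HOME").ok().as_deref()` and the equivalent for `HOME`. Because the directory name changes, data stored under the old `trcaa` directory would not be picked up unless it is moved. The hunks shown here do not include a migration step.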
@ -104,14 +104,14 @@ pub fn get_app_data_dir() -> Option<PathBuf> {
|
||||
#[cfg(target_os = "linux")]
|
||||
{
|
||||
if let Ok(xdg) = std::env::var("XDG_DATA_HOME") {
|
||||
return Some(PathBuf::from(xdg).join("trcaa"));
|
||||
return Some(PathBuf::from(xdg).join("tftsr"));
|
||||
}
|
||||
if let Ok(home) = std::env::var("HOME") {
|
||||
return Some(
|
||||
PathBuf::from(home)
|
||||
.join(".local")
|
||||
.join("share")
|
||||
.join("trcaa"),
|
||||
.join("tftsr"),
|
||||
);
|
||||
}
|
||||
}
|
||||
@ -123,7 +123,7 @@ pub fn get_app_data_dir() -> Option<PathBuf> {
|
||||
PathBuf::from(home)
|
||||
.join("Library")
|
||||
.join("Application Support")
|
||||
.join("trcaa"),
|
||||
.join("tftsr"),
|
||||
);
|
||||
}
|
||||
}
|
||||
@ -131,10 +131,10 @@ pub fn get_app_data_dir() -> Option<PathBuf> {
|
||||
#[cfg(target_os = "windows")]
|
||||
{
|
||||
if let Ok(appdata) = std::env::var("APPDATA") {
|
||||
return Some(PathBuf::from(appdata).join("trcaa"));
|
||||
return Some(PathBuf::from(appdata).join("tftsr"));
|
||||
}
|
||||
}
|
||||
|
||||
// Fallback
|
||||
Some(PathBuf::from("./trcaa-data"))
|
||||
Some(PathBuf::from("./tftsr-data"))
|
||||
}
|
||||
|
||||
@ -15,13 +15,13 @@ that gap and adds `actions/cache@v3` for Cargo and npm.
|
||||
|
||||
- [ ] `Dockerfile.linux-amd64` includes `rustfmt` and `clippy` components
|
||||
- [ ] `Dockerfile.linux-arm64` includes `rustfmt` and `clippy` components
|
||||
- [ ] `test.yml` Rust jobs use `gitea.tftsr.com:3000/sarman/trcaa-linux-amd64:rust1.88-node22`
|
||||
- [ ] `test.yml` Rust jobs use `gitea.tftsr.com:3000/sarman/tftsr-linux-amd64:rust1.88-node22`
|
||||
- [ ] `test.yml` Rust jobs have no inline `apt-get` or `rustup component add` steps
|
||||
- [ ] `test.yml` Rust jobs include `actions/cache@v3` for `~/.cargo/registry`
|
||||
- [ ] `test.yml` frontend jobs include `actions/cache@v3` for `~/.npm`
|
||||
- [ ] `auto-tag.yml` `build-linux-amd64` uses pre-baked `trcaa-linux-amd64` image
|
||||
- [ ] `auto-tag.yml` `build-windows-amd64` uses pre-baked `trcaa-windows-cross` image
|
||||
- [ ] `auto-tag.yml` `build-linux-arm64` uses pre-baked `trcaa-linux-arm64` image
|
||||
- [ ] `auto-tag.yml` `build-linux-amd64` uses pre-baked `tftsr-linux-amd64` image
|
||||
- [ ] `auto-tag.yml` `build-windows-amd64` uses pre-baked `tftsr-windows-cross` image
|
||||
- [ ] `auto-tag.yml` `build-linux-arm64` uses pre-baked `tftsr-linux-arm64` image
|
||||
- [ ] All three build jobs have no `Install dependencies` step
|
||||
- [ ] All three build jobs include `actions/cache@v3` for Cargo and npm
|
||||
- [ ] `docs/wiki/CICD-Pipeline.md` documents pre-baked images, cache keys, and server prerequisites
|
||||
@ -40,7 +40,7 @@ existing `rustup` installation RUN command (chained with `&&` to keep it one lay
|
||||
|
||||
### `.gitea/workflows/test.yml`
|
||||
- **rust-fmt-check**, **rust-clippy**, **rust-tests**: switched container image from
|
||||
`rust:1.88-slim` → `gitea.tftsr.com:3000/sarman/trcaa-linux-amd64:rust1.88-node22`.
|
||||
`rust:1.88-slim` → `gitea.tftsr.com:3000/sarman/tftsr-linux-amd64:rust1.88-node22`.
|
||||
Removed `apt-get install git` from Checkout steps (git is pre-installed in image).
|
||||
Removed `apt-get install libwebkit2gtk-...` steps.
|
||||
Removed `rustup component add rustfmt` and `rustup component add clippy` steps.
|
||||
@ -50,14 +50,14 @@ existing `rustup` installation RUN command (chained with `&&` to keep it one lay
|
||||
Added `actions/cache@v3` step for `~/.npm` keyed on `package-lock.json` hash.
|
||||
|
||||
### `.gitea/workflows/auto-tag.yml`
|
||||
- **build-linux-amd64**: image `rust:1.88-slim` → `trcaa-linux-amd64:rust1.88-node22`.
|
||||
- **build-linux-amd64**: image `rust:1.88-slim` → `tftsr-linux-amd64:rust1.88-node22`.
|
||||
Removed Checkout apt-get install git, removed entire Install dependencies step.
|
||||
Removed `rustup target add x86_64-unknown-linux-gnu` from Build step. Added cargo + npm cache.
|
||||
- **build-windows-amd64**: image `rust:1.88-slim` → `trcaa-windows-cross:rust1.88-node22`.
|
||||
- **build-windows-amd64**: image `rust:1.88-slim` → `tftsr-windows-cross:rust1.88-node22`.
|
||||
Removed Checkout apt-get install git, removed entire Install dependencies step.
|
||||
Removed `rustup target add x86_64-pc-windows-gnu` from Build step.
|
||||
Added cargo (with `-windows-` suffix key to avoid collision) + npm cache.
|
||||
- **build-linux-arm64**: image `ubuntu:22.04` → `trcaa-linux-arm64:rust1.88-node22`.
|
||||
- **build-linux-arm64**: image `ubuntu:22.04` → `tftsr-linux-arm64:rust1.88-node22`.
|
||||
Removed Checkout apt-get install git, removed entire Install dependencies step (~40 lines).
|
||||
Removed `. "$HOME/.cargo/env"` (PATH already set via `ENV` in Dockerfile).
|
||||
Removed `rustup target add aarch64-unknown-linux-gnu` from Build step.
|
||||
|
||||