diff --git a/docs/wiki/Architecture.md b/docs/wiki/Architecture.md
index ce6089b7..7f93c79f 100644
--- a/docs/wiki/Architecture.md
+++ b/docs/wiki/Architecture.md
@@ -50,7 +50,7 @@ All command handlers receive `State<'_, AppState>` as a Tauri-injected parameter
 | `commands/integrations.rs` | Confluence / ServiceNow / ADO — v0.2 stubs |
 | `ai/provider.rs` | `Provider` trait + `create_provider()` factory |
 | `pii/detector.rs` | Multi-pattern PII scanner with overlap resolution |
-| `db/migrations.rs` | Versioned schema (12 migrations in `_migrations` table) |
+| `db/migrations.rs` | Versioned schema (17 migrations in `_migrations` table) |
 | `db/models.rs` | All DB types — see `IssueDetail` note below |
 | `docs/rca.rs` + `docs/postmortem.rs` | Markdown template builders |
 | `audit/log.rs` | `write_audit_event()` — called before every external send |
@@ -176,6 +176,55 @@ pub struct IssueDetail {
 
 Use `detail.issue.title`, **not** `detail.title`.
 
+## Incident Response Methodology
+
+The application integrates a comprehensive incident response framework via system prompt injection. The `INCIDENT_RESPONSE_FRAMEWORK` constant in `src/lib/domainPrompts.ts` is appended to all 17 domain-specific system prompts (Linux, Windows, Network, Kubernetes, Databases, Virtualization, Hardware, Observability, and others).
+
+**5-Phase Framework:**
+
+1. **Detection & Evidence Gathering** — Initial issue assessment, log collection, PII redaction
+2. **Diagnosis & Hypothesis Testing** — AI-assisted analysis, pattern matching against known incidents
+3. **Root Cause Analysis with 5-Whys** — Iterative questioning to identify underlying cause (steps 1–5)
+4. **Resolution & Prevention** — Remediation planning and implementation
+5. **Post-Incident Review** — Timeline-based blameless post-mortem and lessons learned
+
+**System Prompt Injection:**
+
+The `chat_message` command accepts an optional `system_prompt` parameter. If provided, it prepends domain expertise before the conversation history. If omitted, the framework selects the appropriate domain prompt based on the issue category. This allows:
+
+- **Specialized expertise**: Different frameworks for Linux vs. Kubernetes vs. Network incidents
+- **Flexible override**: Users can inject custom system prompts for cross-domain problems
+- **Consistent methodology**: All 17 domain prompts follow the same 5-phase incident response structure
+
+**Timeline Event Recording:**
+
+Timeline events are recorded non-blockingly at key triage moments:
+
+```
+Issue Creation → triage_started
+    ↓
+Log Upload → log_uploaded (metadata: file_name, file_size)
+    ↓
+Why-Level Progression → why_level_advanced (metadata: from_level → to_level)
+    ↓
+Root Cause Identified → root_cause_identified (metadata: root_cause, confidence)
+    ↓
+RCA Generated → rca_generated (metadata: doc_id, section_count)
+    ↓
+Postmortem Generated → postmortem_generated (metadata: doc_id, timeline_events_count)
+    ↓
+Document Exported → document_exported (metadata: format, file_path)
+```
+
+**Document Generation:**
+
+RCA and Postmortem generators now use real timeline event data instead of placeholders:
+
+- **RCA**: Incorporates timeline to show detection-to-root-cause progression
+- **Postmortem**: Uses full timeline to demonstrate the complete incident lifecycle and response effectiveness
+
+Timeline events are stored in the `timeline_events` table (indexed by issue_id and created_at for fast retrieval) and dual-written to `audit_log` for security/compliance purposes.
+
 ## Application Startup Sequence
 
 ```
diff --git a/docs/wiki/Database.md b/docs/wiki/Database.md
index adcd0c21..452395ff 100644
--- a/docs/wiki/Database.md
+++ b/docs/wiki/Database.md
@@ -2,7 +2,7 @@
 
 ## Overview
 
-TFTSR uses **SQLite** via `rusqlite` with the `bundled-sqlcipher` feature for AES-256 encryption in production. 12 versioned migrations are tracked in the `_migrations` table.
+TFTSR uses **SQLite** via `rusqlite` with the `bundled-sqlcipher` feature for AES-256 encryption in production. 17 versioned migrations are tracked in the `_migrations` table.
 
 **DB file location:** `{app_data_dir}/tftsr.db`
 
@@ -38,7 +38,7 @@ pub fn init_db(data_dir: &Path) -> anyhow::Result {
 
 ---
 
-## Schema (11 Migrations)
+## Schema (17 Migrations)
 
 ### 001 — issues
 
@@ -245,6 +245,51 @@ CREATE TABLE image_attachments (
 - Basic auth (ServiceNow): Store encrypted password
 - One credential per service (enforced by UNIQUE constraint)
 
+### 017 — timeline_events (Incident Response Timeline)
+
+```sql
+CREATE TABLE timeline_events (
+    id TEXT PRIMARY KEY,
+    issue_id TEXT NOT NULL REFERENCES issues(id) ON DELETE CASCADE,
+    event_type TEXT NOT NULL,
+    description TEXT NOT NULL,
+    metadata TEXT, -- JSON object with event-specific data
+    created_at TEXT NOT NULL
+);
+
+CREATE INDEX idx_timeline_events_issue ON timeline_events(issue_id);
+CREATE INDEX idx_timeline_events_time ON timeline_events(created_at);
+```
+
+**Event Types:**
+- `triage_started` — Incident response begins, initial issue properties recorded
+- `log_uploaded` — Log file uploaded and analyzed
+- `why_level_advanced` — 5-Whys entry completed, progression to next level
+- `root_cause_identified` — Root cause determined from analysis
+- `rca_generated` — Root Cause Analysis document created
+- `postmortem_generated` — Post-mortem document created
+- `document_exported` — Document exported to file (MD or PDF)
+
+**Metadata Structure (JSON):**
+```json
+{
+  "triage_started": {"severity": "high", "category": "network"},
+  "log_uploaded": {"file_name": "app.log", "file_size": 2048576},
+  "why_level_advanced": {"from_level": 2, "to_level": 3, "question": "Why did the service timeout?"},
+  "root_cause_identified": {"root_cause": "DNS resolution failure", "confidence": 0.95},
+  "rca_generated": {"doc_id": "doc_abc123", "section_count": 7},
+  "postmortem_generated": {"doc_id": "doc_def456", "timeline_events_count": 12},
+  "document_exported": {"format": "pdf", "file_path": "/home/user/docs/rca.pdf"}
+}
+```
+
+**Design Notes:**
+- Timeline events are **queryable** (indexed by issue_id and created_at) for document generation
+- Dual-write: Events recorded to both `timeline_events` and `audit_log` — timeline for chronological reporting, audit_log for security/compliance
+- `created_at`: TEXT UTC timestamp (`YYYY-MM-DD HH:MM:SS`)
+- Non-blocking writes: Timeline events recorded asynchronously at key triage moments
+- Cascade delete from issues ensures cleanup
+
 ---
 
 ## Key Design Notes
@@ -289,4 +334,13 @@ pub struct AuditEntry {
     pub user_id: String,
     pub details: Option,
 }
+
+pub struct TimelineEvent {
+    pub id: String,
+    pub issue_id: String,
+    pub event_type: String,
+    pub description: String,
+    pub metadata: Option, // JSON
+    pub created_at: String,
+}
 ```
diff --git a/docs/wiki/IPC-Commands.md b/docs/wiki/IPC-Commands.md
index ad931460..6e7081e8 100644
--- a/docs/wiki/IPC-Commands.md
+++ b/docs/wiki/IPC-Commands.md
@@ -62,11 +62,27 @@ updateFiveWhyCmd(entryId: string, answer: string) → void
 ```
 Sets or updates the answer for an existing 5-Whys entry.
 
+### `get_timeline_events`
+```typescript
+getTimelineEventsCmd(issueId: string) → TimelineEvent[]
+```
+Retrieves all timeline events for an issue, ordered by created_at ascending.
+```typescript
+interface TimelineEvent {
+  id: string;
+  issue_id: string;
+  event_type: string; // One of: triage_started, log_uploaded, why_level_advanced, etc.
+  description: string;
+  metadata?: Record; // Event-specific JSON data
+  created_at: string; // UTC timestamp
+}
+```
+
 ### `add_timeline_event`
 ```typescript
-addTimelineEventCmd(issueId: string, eventType: string, description: string) → TimelineEvent
+addTimelineEventCmd(issueId: string, eventType: string, description: string, metadata?: Record) → TimelineEvent
 ```
-Records a timestamped event in the issue timeline.
+Records a timestamped event in the issue timeline. Dual-writes to both `timeline_events` (for document generation) and `audit_log` (for security audit trail).
 
 ---
 
@@ -137,9 +153,9 @@ Sends selected (redacted) log files to the AI provider with an analysis prompt.
 
 ### `chat_message`
 ```typescript
-chatMessageCmd(issueId: string, message: string, providerConfig: ProviderConfig) → ChatResponse
+chatMessageCmd(issueId: string, message: string, providerConfig: ProviderConfig, systemPrompt?: string) → ChatResponse
 ```
-Sends a message in the ongoing triage conversation. Domain system prompt is injected automatically on first message. AI response is parsed for why-level indicators (1–5).
+Sends a message in the ongoing triage conversation. Optional `systemPrompt` parameter allows prepending domain expertise before conversation history. If not provided, the domain-specific system prompt for the issue category is injected automatically on first message. AI response is parsed for why-level indicators (1–5).
 
 ### `list_providers`
 ```typescript
@@ -155,13 +171,13 @@ Returns the list of supported providers with their available models and configur
 ```typescript
 generateRcaCmd(issueId: string) → Document
 ```
-Builds an RCA Markdown document from the issue data, 5-Whys answers, and timeline.
+Builds an RCA Markdown document from the issue data, 5-Whys answers, and timeline events. Uses real incident response timeline (log uploads, why-level progression, root cause identification) instead of placeholders.
 
 ### `generate_postmortem`
 ```typescript
 generatePostmortemCmd(issueId: string) → Document
 ```
-Builds a blameless post-mortem Markdown document.
+Builds a blameless post-mortem Markdown document. Incorporates timeline events to show the full incident lifecycle: detection, diagnosis, resolution, and post-incident review phases.
 
 ### `update_document`
 ```typescript