feat: incident response methodology + UTC timeline tracking #45

Closed
sarman wants to merge 0 commits from feat/incident-response-timeline into master
Owner

Summary

  • Add timeline_events database table (migration 017) to record UTC-timestamped triage events as they happen
  • RCA and postmortem generators populate real timeline data instead of placeholders — includes Incident Timeline section, Incident Metrics, calculated durations
  • Inject INCIDENT_RESPONSE_FRAMEWORK methodology (5 phases: Detection, Diagnosis, RCA, Resolution, Post-Incident) into all 17 domain-specific AI system prompts
  • Record 7 event types at key triage moments: triage_started, log_uploaded, why_level_advanced, root_cause_identified, rca_generated, postmortem_generated, document_exported
  • Security hardening: event_type whitelist, JSON metadata validation, 10KB size cap, atomic SQLite transaction for dual-write to timeline_events + audit_log

Test plan

  • [x] cargo fmt --check: clean
  • [x] cargo clippy -- -D warnings: 0 warnings
  • [x] cargo test: 170/170 passed (22 new tests)
  • [x] npx tsc --noEmit: 0 errors
  • [x] npm run test:run: 94/94 passed (12 new tests)
  • [ ] Manual: create issue, triage, verify timeline events in DB
  • [ ] Manual: generate RCA, verify real timeline table and metrics
  • [ ] Manual: generate postmortem, verify real timeline rows and duration
  • [ ] Manual: verify AI responses reflect incident methodology
sarman added 5 commits 2026-04-19 23:28:17 +00:00
- Add migration 017_create_timeline_events with indexes
- Update TimelineEvent struct with issue_id, metadata, UTC string timestamps
- Add TimelineEvent::new() constructor with UUIDv7
- Add timeline_events field to IssueDetail
- Rewrite add_timeline_event to write to new table + audit_log (dual-write)
- Add get_timeline_events command for ordered retrieval
- Update get_issue to load timeline_events
- Update delete_issue to clean up timeline_events
- Register get_timeline_events in generate_handler
- Add migration tests for table, indexes, and cascade delete
- Fix flaky derive_aes_key test (env var race condition in parallel tests)
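
As an illustration of the record shape this commit describes, the following is a minimal TypeScript sketch, not the PR's Rust code. The `description` field, the function names `newTimelineEvent` and `sortTimeline`, and passing the id in as an argument are assumptions for the example; the PR itself generates a UUIDv7 inside `TimelineEvent::new()`.

```typescript
// Hypothetical mirror of the timeline event record. Only issue_id, metadata,
// event_type and created_at are named in the PR; description is assumed.
interface TimelineEvent {
  id: string;
  issue_id: string;
  event_type: string;
  description: string;
  metadata: string; // JSON-encoded string
  created_at: string; // UTC timestamp, ISO 8601
}

// Build an event stamped with the supplied (or current) time in UTC.
// The id is supplied by the caller here; the PR uses UUIDv7 on the Rust side.
function newTimelineEvent(
  id: string,
  issueId: string,
  eventType: string,
  description: string,
  metadata: Record<string, unknown> = {},
  now: Date = new Date(),
): TimelineEvent {
  return {
    id,
    issue_id: issueId,
    event_type: eventType,
    description,
    metadata: JSON.stringify(metadata),
    created_at: now.toISOString(), // toISOString always emits UTC with a "Z" suffix
  };
}

// Return a copy of the events ordered oldest-first. ISO 8601 UTC strings of
// equal precision sort chronologically under plain string comparison.
function sortTimeline(events: TimelineEvent[]): TimelineEvent[] {
  return [...events].sort((a, b) =>
    a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0,
  );
}
```

Storing timestamps as UTC strings keeps ordering independent of the machine's local time zone, which is presumably why the struct moved to string timestamps.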
Add format_event_type() and calculate_duration() helpers to convert
raw timeline events into human-readable tables and metrics. RCA now
includes an Incident Timeline section and Incident Metrics (event
count, duration, time-to-root-cause). Postmortem replaces placeholder
timeline rows with real events, calculates impact duration, and
auto-populates What Went Well from evidence.

10 new Rust tests covering timeline rendering, duration calculation,
and event type formatting.
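
The helpers named above could look roughly like the sketch below. It is written in TypeScript for illustration (the PR's helpers are Rust), and the exact output formats ("Root Cause Identified", "1h 5m", the table columns) are assumptions, since the PR page does not show them.

```typescript
// Hypothetical row shape for rendering; description is an assumed field.
interface TimelineRow {
  event_type: string;
  created_at: string; // UTC, ISO 8601
  description: string;
}

// Turn a snake_case event type into a title-cased label,
// e.g. "root_cause_identified" becomes "Root Cause Identified".
function formatEventType(eventType: string): string {
  return eventType
    .split("_")
    .filter((word) => word.length > 0)
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

// Elapsed time between two ISO 8601 timestamps, rounded down to whole minutes.
// Returns "N/A" if either timestamp cannot be parsed or the end precedes the start.
function calculateDuration(startIso: string, endIso: string): string {
  const start = Date.parse(startIso);
  const end = Date.parse(endIso);
  if (Number.isNaN(start) || Number.isNaN(end) || end < start) {
    return "N/A";
  }
  const totalMinutes = Math.floor((end - start) / 60000);
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  return hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
}

// Render events as a Markdown table, one row per event, in the order given.
function renderTimelineTable(rows: TimelineRow[]): string {
  const header = "| Time (UTC) | Event | Description |\n|---|---|---|";
  const body = rows.map(
    (row) => `| ${row.created_at} | ${formatEventType(row.event_type)} | ${row.description} |`,
  );
  return [header, ...body].join("\n");
}
```

A metric such as time-to-root-cause would then be `calculateDuration` applied to the `triage_started` and `root_cause_identified` timestamps, assuming both events were recorded.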
Add INCIDENT_RESPONSE_FRAMEWORK to domainPrompts.ts and append it to
all 17 domain prompts via getDomainPrompt(). Add system_prompt param
to chat_message command so frontend can inject domain expertise. Record
UTC timeline events (triage_started, log_uploaded, why_level_advanced,
root_cause_identified, rca_generated, postmortem_generated,
document_exported) at key moments with non-blocking calls.

Update tauriCommands.ts with getTimelineEventsCmd, optional metadata on
addTimelineEventCmd, and systemPrompt on chatMessageCmd.

12 new frontend tests (9 domain prompts, 3 timeline events).
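
The two frontend mechanisms in this commit, appending the framework to each domain prompt and recording events without blocking the UI, can be sketched as follows. The prompt texts, the `DOMAIN_PROMPTS` map, the empty-string fallback for unknown domains, and the `Recorder` type are placeholders invented for the example; only `INCIDENT_RESPONSE_FRAMEWORK` and `getDomainPrompt()` are names taken from the PR.

```typescript
// Placeholder text: the real framework content is not shown on this page.
const INCIDENT_RESPONSE_FRAMEWORK =
  "Incident response phases: Detection, Diagnosis, RCA, Resolution, Post-Incident.";

// Hypothetical two-entry stand-in for the 17 domain prompts.
const DOMAIN_PROMPTS: Record<string, string> = {
  linux: "You are a Linux systems expert.",
  networking: "You are a network engineer.",
};

// Return the domain prompt with the shared framework appended.
// Returning "" for an unknown domain is an assumption of this sketch.
function getDomainPrompt(domain: string): string {
  const base = DOMAIN_PROMPTS[domain];
  if (base === undefined) {
    return "";
  }
  return `${base}\n\n${INCIDENT_RESPONSE_FRAMEWORK}`;
}

// Hypothetical signature for whatever call persists a timeline event.
type Recorder = (issueId: string, eventType: string, metadata?: string) => Promise<void>;

// Fire-and-forget: start the call and log any rejection instead of awaiting it,
// so a failed timeline write never interrupts the triage flow.
function recordEventNonBlocking(
  record: Recorder,
  issueId: string,
  eventType: string,
  metadata?: string,
): void {
  void record(issueId, eventType, metadata).catch((err: unknown) => {
    console.warn("timeline event not recorded:", err);
  });
}
```

The trade-off of this pattern is that a dropped event is only visible in the console, so the timeline may have gaps if the backend call fails.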
Address security review findings:
- Validate event_type against whitelist of 7 known types (M-3)
- Validate metadata is valid JSON and under 10KB (M-2, M-4)
- Include metadata in audit log details (M-2)
- Wrap timeline insert + audit write + timestamp update in a
  SQLite transaction for atomicity (M-5)
- Fix TypeScript TimelineEvent interface: add issue_id, metadata
  fields and correct created_at type to string (L-3)
- Add timeline_events to IssueDetail TypeScript interface (L-4)
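
The input checks listed in this commit can be illustrated with a small validator. This is a TypeScript sketch of the idea, not the PR's Rust implementation: the function name, the error strings, and treating the cap as "more than 10 * 1024 bytes of UTF-8 is rejected" are assumptions. The transactional dual-write (M-5) is not shown.

```typescript
// The seven event types listed in the PR summary.
const ALLOWED_EVENT_TYPES: ReadonlySet<string> = new Set([
  "triage_started",
  "log_uploaded",
  "why_level_advanced",
  "root_cause_identified",
  "rca_generated",
  "postmortem_generated",
  "document_exported",
]);

// Assumed interpretation of the 10KB cap: 10 * 1024 bytes of UTF-8.
const MAX_METADATA_BYTES = 10 * 1024;

// Return null when the input is acceptable, otherwise a short error message.
function validateTimelineEvent(eventType: string, metadata: string): string | null {
  if (!ALLOWED_EVENT_TYPES.has(eventType)) {
    return `unknown event_type: ${eventType}`;
  }
  // Measure encoded bytes rather than string length, since non-ASCII
  // characters take more than one byte in UTF-8.
  const byteLength = new TextEncoder().encode(metadata).length;
  if (byteLength > MAX_METADATA_BYTES) {
    return "metadata exceeds 10KB";
  }
  try {
    JSON.parse(metadata);
  } catch {
    return "metadata is not valid JSON";
  }
  return null;
}
```

Checking the size before parsing avoids spending time parsing a payload that would be rejected anyway.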
docs: update wiki for timeline events and incident response methodology
Some checks failed
Test / rust-fmt-check (pull_request) Successful in 1m12s
Test / frontend-typecheck (pull_request) Successful in 1m17s
Test / frontend-tests (pull_request) Successful in 1m25s
PR Review Automation / review (pull_request) Failing after 2m45s
Test / rust-clippy (pull_request) Successful in 4m26s
Test / rust-tests (pull_request) Successful in 5m42s
d715ba0b25
- Database.md: document timeline_events table (migration 017), event
  types, dual-write strategy, correct migration count to 17
- IPC-Commands.md: document get_timeline_events, updated
  add_timeline_event with metadata, chat_message system_prompt param
- Architecture.md: document incident response methodology integration,
  5-phase framework, system prompt injection, correct migration count
sarman reviewed 2026-04-19 23:31:01 +00:00
sarman left a comment
Author
Owner

⚠️ Automated PR Review could not be completed — Ollama analysis failed or produced no output.

sarman closed this pull request 2026-04-19 23:38:07 +00:00

Reference: sarman/tftsr-devops_investigation#45