From e5593cbfe26592c9fcf9bdcc4d5f416c921e880b Mon Sep 17 00:00:00 2001
From: Shaun Arman
Date: Fri, 5 Jun 2026 08:12:19 -0500
Subject: [PATCH] docs: add ADRs for shell safety, MCP transport, kubectl bundling

Architecture decision records with sanitized content (proprietary
references removed).

- ADR-007: Three-Tier Shell Safety Classification
- ADR-008: MCP Protocol Integration (HTTP transport)
- ADR-009: Bundled kubectl Binary rationale

Co-Authored-By: Claude Sonnet 4.5
---
 .../adrs/ADR-007-three-tier-shell-safety.md   | 161 ++++++++++++
 .../adrs/ADR-008-mcp-protocol-integration.md  | 214 ++++++++++++++++
 .../adrs/ADR-009-bundled-kubectl-binary.md    | 241 ++++++++++++++++++
 3 files changed, 616 insertions(+)
 create mode 100644 docs/architecture/adrs/ADR-007-three-tier-shell-safety.md
 create mode 100644 docs/architecture/adrs/ADR-008-mcp-protocol-integration.md
 create mode 100644 docs/architecture/adrs/ADR-009-bundled-kubectl-binary.md

diff --git a/docs/architecture/adrs/ADR-007-three-tier-shell-safety.md b/docs/architecture/adrs/ADR-007-three-tier-shell-safety.md
new file mode 100644
index 00000000..6c2ac253
--- /dev/null
+++ b/docs/architecture/adrs/ADR-007-three-tier-shell-safety.md
@@ -0,0 +1,161 @@
+# ADR-007: Three-Tier Shell Command Safety Classification
+
+**Date**: 2026-06-02
+**Status**: Accepted
+**Deciders**: Shaun Arman, Henry Castle, RJ Cooper
+**Context**: Hackathon v1.0.0, Agentic Shell Execution
+
+---
+
+## Context
+
+TFTSR DevOps Investigation v1.0.0 introduced agentic shell command execution, allowing AI agents to execute kubectl, Proxmox, and general shell commands during troubleshooting conversations. This capability creates a significant security risk: malicious or hallucinated commands could cause data loss, service disruption, or unauthorized system access.
+
+**Requirements**:
+- AI agents need shell access for diagnostics (kubectl, pvecm, qm, etc.)
+- Read-only operations should execute immediately for fast iteration
+- Mutating operations require explicit user approval
+- Destructive operations must be blocked entirely
+- Classification must handle pipes, chains, and command substitution
+- System must be deterministic and testable
+
+**Alternatives Considered**:
+
+1. **Whitelist-only approach**: Maintain a fixed list of allowed commands
+   - ✅ Simple to implement
+   - ❌ Brittle: breaks with new commands or options
+   - ❌ Poor UX: blocks legitimate commands like `kubectl get pods -n custom-namespace`
+
+2. **Blacklist-only approach**: Block known-dangerous commands
+   - ✅ Flexible for new commands
+   - ❌ Fails open: unknown dangerous commands execute
+   - ❌ False sense of security
+
+3. **LLM-based classification**: Ask another AI to classify command safety
+   - ✅ Context-aware decisions
+   - ❌ Non-deterministic: the same command can get different classifications
+   - ❌ Latency: adds 500ms+ per command
+   - ❌ Cost: every command requires an AI call
+   - ❌ Cannot unit test
+
+4. **Sandbox all commands**: Execute in isolated containers
+   - ✅ Maximum safety
+   - ❌ Complex infrastructure
+   - ❌ Breaks kubectl (needs real cluster access)
+   - ❌ High latency
+
+---
+
+## Decision
+
+**Implement a deterministic three-tier safety classification system with static analysis and rule-based tier assignment.**
+
+### Tier Definitions
+
+| Tier | Safety Level | Approval | Examples |
+|------|--------------|----------|----------|
+| **Tier 1** | Read-only, no side effects | Auto-execute | `kubectl get`, `describe`, `logs`, `cat`, `grep`, `ls`, `pvecm status`, `qm status` |
+| **Tier 2** | Mutating, potentially disruptive | User approval required | `kubectl apply`, `delete`, `scale`, `chmod`, `systemctl restart`, `ssh`, `chown` |
+| **Tier 3** | Destructive, unrecoverable | Always deny | `rm -rf`, `shutdown`, `reboot`, `mkfs`, `dd if=/dev/zero`, `:(){:\|:&};:` (fork bomb) |
+
+### Classification Rules
+
+1. **Single command**: Classify by command + subcommand pattern
+   - `kubectl get` → Tier 1
+   - `kubectl apply` → Tier 2
+   - `rm -rf` → Tier 3
+
+2. **Piped commands** (`|`): Highest tier wins
+   - `kubectl get pods | grep nginx` → max(Tier 1, Tier 1) = Tier 1
+   - `cat /etc/passwd | tee /tmp/backup` → max(Tier 1, Tier 2) = Tier 2
+
+3. **Command chains** (`&&`, `||`, `;`): Highest tier wins
+   - `ls && cat file` → max(Tier 1, Tier 1) = Tier 1
+   - `kubectl delete pod nginx && kubectl get pods` → max(Tier 2, Tier 1) = Tier 2
+
+4. **Command substitution** (`` `...` ``, `$(...)`): Escalate Tier 1 to Tier 2
+   - `kubectl get pods $(cat namespace.txt)` → Tier 2 (even if `kubectl get` is Tier 1)
+   - Rationale: Command substitution introduces hidden indirection (the inner command's output becomes arguments the classifier never sees)
+
+5. **Any Tier 3 in chain**: Entire command becomes Tier 3
+   - `ls && rm -rf /` → Tier 3 (entire command denied)
+
+### Implementation
+
+**Backend**: `src-tauri/src/shell/classifier.rs`
+
+```rust
+pub enum CommandTier { // must derive PartialEq and PartialOrd for the comparisons below
+    Tier1, // Auto-execute
+    Tier2, // Requires approval
+    Tier3, // Always deny
+}
+
+impl CommandClassifier {
+    pub fn classify(&self, command: &str) -> ClassificationResult {
+        // Parse command structure (pipes, chains, substitution)
+        let components = Self::parse_command_structure(command);
+
+        // Classify each component and find highest tier
+        let mut highest_tier = CommandTier::Tier1;
+        for component in &components {
+            let tier = self.classify_single_command(&component.command, ...);
+            if tier > highest_tier {
+                highest_tier = tier;
+            }
+        }
+
+        // Escalate if command substitution detected
+        if command.contains("$(") || command.contains("`") {
+            if highest_tier == CommandTier::Tier1 {
+                highest_tier = CommandTier::Tier2;
+            }
+        }
+
+        ClassificationResult { tier: highest_tier, ... }
+    }
+}
+```
+
+**Testing**: 19 unit tests cover all classification rules, edge cases, and escalation logic.
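The classification rules above can be made concrete with a small, self-contained sketch. This is not the code in `classifier.rs`. The pattern lists in `classify_single`, the free-function layout, and the naive operator splitting are all assumptions made for illustration. A real classifier would also need quote-aware parsing, which this sketch skips.

```rust
/// Safety tiers, declared in ascending order so the derived `Ord`
/// gives Tier1 < Tier2 < Tier3 and `max()` implements "highest tier wins".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CommandTier {
    Tier1, // auto-execute
    Tier2, // requires approval
    Tier3, // always deny
}

/// Classify one simple command (no pipes or chains) by its leading words.
/// Hypothetical rule set: only a handful of commands from the tier table.
fn classify_single(cmd: &str) -> CommandTier {
    let words: Vec<&str> = cmd.split_whitespace().collect();
    match words.as_slice() {
        [] => CommandTier::Tier1, // empty segment: nothing to run
        ["rm", flags, ..] if flags.starts_with('-') && flags.contains('r') => CommandTier::Tier3,
        ["shutdown", ..] | ["reboot", ..] | ["mkfs", ..] => CommandTier::Tier3,
        ["kubectl", "get", ..] | ["kubectl", "describe", ..] | ["kubectl", "logs", ..] => {
            CommandTier::Tier1
        }
        ["cat", ..] | ["grep", ..] | ["ls", ..] => CommandTier::Tier1,
        // Safe default: anything unrecognised requires approval.
        _ => CommandTier::Tier2,
    }
}

/// Classify a full command line: split on `|`, `&&`, `||` and `;`,
/// take the highest tier, then escalate Tier 1 on command substitution.
pub fn classify(command: &str) -> CommandTier {
    // Rewrite the two-character operators so one split covers every separator.
    let normalised = command.replace("&&", ";").replace("||", ";");
    let mut highest = normalised
        .split(|c: char| c == ';' || c == '|')
        .map(classify_single)
        .max()
        .unwrap_or(CommandTier::Tier1);

    // Substitution hides what actually runs, so Tier 1 is no longer safe to auto-execute.
    if highest == CommandTier::Tier1 && (command.contains("$(") || command.contains('`')) {
        highest = CommandTier::Tier2;
    }
    highest
}
```

With these assumed rules, `classify("ls && rm -rf /")` yields `Tier3` and `classify("kubectl get pods $(cat namespace.txt)")` yields `Tier2`, matching rules 4 and 5. Because Tier 3 is the maximum of the ordering, rule 5 needs no separate code path.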
+
+---
+
+## Consequences
+
+### Positive
+
+- **Deterministic**: Same command always gets same classification (unit testable)
+- **Fast**: Regex-based classification completes in <1ms (no AI calls)
+- **User-friendly**: Read-only commands execute immediately without prompts
+- **Safe defaults**: Unknown commands default to Tier 2 (approval required)
+- **Transparent**: UI shows tier reasoning ("mutating operation", "contains command substitution")
+- **Session memory**: User can "Allow for Session" to approve multiple similar Tier 2 commands
+
+### Negative
+
+- **Maintenance burden**: New commands require manual tier assignment
+- **False positives**: Benign commands may be over-classified (e.g., `kubectl run --dry-run=client` is Tier 2 but harmless)
+- **Bypass via arguments**: `cat /etc/shadow` is Tier 1 (read-only) but accesses sensitive data
+  - **Mitigation**: Context matters: the AI should not ask to read `/etc/shadow` without a reason tied to the investigation
+  - **Mitigation**: Full audit log records all commands for security review
+
+### Trade-offs
+
+We chose **correctness and safety over flexibility**. A false positive (over-restricting a safe command) is acceptable; a false negative (allowing a destructive command) is not.
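The approval behaviour described in this section (auto-execute, ask, deny, plus "Allow for Session") can be sketched as a small decision function. The ADR does not say how "similar" Tier 2 commands are matched. The word-boundary prefix match below, and the names `SessionApprovals`, `Decision`, and `allow_for_session`, are assumptions for illustration only.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommandTier {
    Tier1,
    Tier2,
    Tier3,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Execute, // run without prompting
    AskUser, // show the approval dialog
    Deny,    // never run
}

/// Per-session memory of Tier 2 command prefixes the user has approved.
/// Hypothetical structure: the real storage and matching rule may differ.
#[derive(Default)]
pub struct SessionApprovals {
    allowed_prefixes: HashSet<String>,
}

impl SessionApprovals {
    /// Record an "Allow for Session" choice, e.g. for "kubectl scale".
    pub fn allow_for_session(&mut self, prefix: &str) {
        self.allowed_prefixes.insert(prefix.trim().to_string());
    }

    /// Map a classified command to an action. Session approvals only ever
    /// affect Tier 2: Tier 3 stays denied no matter what was approved.
    pub fn decide(&self, tier: CommandTier, command: &str) -> Decision {
        match tier {
            CommandTier::Tier1 => Decision::Execute,
            CommandTier::Tier3 => Decision::Deny,
            CommandTier::Tier2 => {
                let cmd = command.trim();
                // Match whole words only, so "kubectl scale" does not cover "kubectl scaled".
                let remembered = self.allowed_prefixes.iter().any(|p| {
                    cmd == p.as_str()
                        || cmd
                            .strip_prefix(p.as_str())
                            .map_or(false, |rest| rest.starts_with(' '))
                });
                if remembered {
                    Decision::Execute
                } else {
                    Decision::AskUser
                }
            }
        }
    }
}
```

Keeping the tier as an input rather than recomputing it here means an approved prefix cannot downgrade a chained command: `kubectl scale ... && rm -rf /` arrives as Tier 3 and is denied before the session memory is consulted.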
+
+---
+
+## Related Decisions
+
+- **ADR-008**: MCP Protocol Integration (provides alternative tool integration method)
+- **ADR-009**: Bundle kubectl Binary (ensures consistent kubectl version across platforms)
+
+---
+
+## References
+
+- **Implementation PR**: #30 (Hackathon v1.0.0)
+- **Test Coverage**: `src-tauri/src/shell/tests.rs` (19 tests)
+- **Wiki**: `docs/wiki/Shell-Execution.md`
+- **Database Schema**: Migrations 024-027 (shell_commands, kubeconfig_files, command_executions, approval_decisions)
diff --git a/docs/architecture/adrs/ADR-008-mcp-protocol-integration.md b/docs/architecture/adrs/ADR-008-mcp-protocol-integration.md
new file mode 100644
index 00000000..4304e777
--- /dev/null
+++ b/docs/architecture/adrs/ADR-008-mcp-protocol-integration.md
@@ -0,0 +1,214 @@
+# ADR-008: Model Context Protocol for External Tools
+
+**Date**: 2026-06-02
+**Status**: Accepted
+**Deciders**: Shaun Arman, Henry Castle
+**Context**: Hackathon v1.0.0, Extensible Tool Integration
+
+---
+
+## Context
+
+TFTSR DevOps Investigation v1.0.0 introduced agentic shell execution with statically defined tools (`execute_shell_command`, `add_ado_comment`). As the application grows, we need a way to integrate external tools and services without hardcoding every integration into the Rust backend.
+
+**Requirements**:
+- AI agents need access to third-party tools (GitHub, Slack, monitoring systems, etc.)
+- Tool definitions should be discoverable and documented
+- Tool execution should be sandboxed and timeout-protected
+- New tools should be addable without recompiling the application
+- Support both local processes (stdio) and remote services (HTTP)
+
+**Alternatives Considered**:
+
+1. **Plugin system (dynamic library loading)**
+   - ✅ Native Rust plugins with full system access
+   - ❌ Security risk: malicious plugins have full process access
+   - ❌ Unsafe Rust (`dlopen`, FFI) for plugin loading
+   - ❌ Platform-specific (.so, .dylib, .dll)
+   - ❌ No sandboxing
+
+2. **WebAssembly plugins (wasmtime)**
+   - ✅ Sandboxed execution with WASI
+   - ✅ Cross-platform (single .wasm file)
+   - ❌ Complex WASI interface design
+   - ❌ WASI preview2 still unstable
+   - ❌ Limited async support
+
+3. **gRPC tool server protocol**
+   - ✅ Industry-standard RPC
+   - ✅ Strongly typed with protobuf
+   - ❌ Complex setup for simple tools
+   - ❌ Every tool server needs gRPC boilerplate
+   - ❌ No existing ecosystem
+
+4. **Model Context Protocol (MCP)**
+   - ✅ Designed specifically for AI tool integration
+   - ✅ Existing ecosystem (Anthropic, community servers)
+   - ✅ Supports stdio (local processes) and HTTP (remote services)
+   - ✅ JSON-RPC 2.0 protocol (simple, well-understood)
+   - ✅ Tool discovery built into protocol
+   - ❌ New protocol (announced November 2024), potential churn
+
+---
+
+## Decision
+
+**Adopt the Model Context Protocol (MCP) for external tool integration, using the `rmcp` Rust client library.**
+
+### Architecture
+
+```
+AI Agent → MCP Adapter → MCP Client → Transport (stdio/HTTP) → MCP Server
+                                                                    ↓
+                                                              External Tool
+```
+
+**Components**:
+
+| Module | Responsibility |
+|--------|---------------|
+| `mcp/client.rs` | Connect to MCP servers (stdio/HTTP) |
+| `mcp/adapter.rs` | Merge MCP tools with static tools |
+| `mcp/discovery.rs` | Health check servers, update status |
+| `mcp/store.rs` | Persist server configs and tools to database |
+| `mcp/models.rs` | McpServer, McpTool, McpResource types |
+| `mcp/transport/stdio.rs` | Spawn processes with env vars |
+| `mcp/transport/http.rs` | HTTP POST with auth headers |
+
+**Database Schema** (Migration 018):
+
+```sql
+CREATE TABLE mcp_servers (
+    id TEXT PRIMARY KEY,
+    name TEXT NOT NULL,
+    url TEXT NOT NULL,
+    transport_type TEXT NOT NULL CHECK(transport_type IN ('stdio', 'http')),
+    auth_type TEXT NOT NULL CHECK(auth_type IN ('none', 'api_key', 'bearer', 'oauth2')),
+    auth_value TEXT,
+    enabled INTEGER NOT NULL DEFAULT 1,
+    discovery_status TEXT NOT NULL DEFAULT 'pending'
+        CHECK(discovery_status IN ('pending','connected','unreachable','error')),
+    env_config TEXT,  -- JSON map of environment variables
+    ...
+);
+
+CREATE TABLE mcp_tools (
+    id TEXT PRIMARY KEY,
+    server_id TEXT NOT NULL,
+    name TEXT NOT NULL,
+    tool_key TEXT NOT NULL,  -- "server_name.tool_name"
+    description TEXT,
+    parameters TEXT NOT NULL,  -- JSON schema
+    FOREIGN KEY(server_id) REFERENCES mcp_servers(id) ON DELETE CASCADE
+);
+```
+
+**Tool Calling Flow**:
+
+1. User configures MCP server in Settings (name, URL/command, transport type, auth)
+2. Application connects and calls `list_tools()` to discover available tools
+3. Tools stored in `mcp_tools` table with namespaced key (`server_name.tool_name`)
+4. AI agent requests tools via `get_enabled_mcp_tools()`
+5. MCP tools merged with static tools (`execute_shell_command`, `add_ado_comment`)
+6. AI agent calls tool by key (e.g., `github.create_issue`)
+7. Adapter routes to correct MCP client
+8. Client invokes tool with **30-second hard timeout**
+9. Result returned to AI agent
+
+**Safety Features**:
+
+- **Timeout protection**: 30-second hard timeout prevents indefinite hangs from misbehaving servers
+- **Process isolation**: Stdio servers run as separate processes with isolated env vars
+- **Auth encryption**: API keys encrypted with AES-256-GCM before storage
+- **User control**: Users explicitly enable/disable each MCP server
+- **Status tracking**: Connection health displayed in UI (connected, unreachable, error)
+
+---
+
+## Consequences
+
+### Positive
+
+- **Extensibility**: New tools without recompiling (add MCP server in Settings)
+- **Ecosystem**: Can use community MCP servers (GitHub, Slack, Prometheus, etc.)
+- **Simplicity**: JSON-RPC 2.0 protocol is simple to implement and debug
+- **Dual transport**: Supports both local tools (stdio) and cloud services (HTTP)
+- **Discovery**: Tool schemas fetched automatically via `list_tools()`
+- **Sandboxing**: Stdio servers run as separate processes; HTTP calls are timeout-protected
+
+### Negative
+
+- **Protocol churn risk**: MCP is new (announced November 2024), spec may evolve
+- **Dependency**: Relies on `rmcp` crate maintenance
+- **Stdio complexity**: Process spawning is platform-dependent (Windows cmd.exe vs Unix bash)
+- **Debugging**: Tool call failures require inspecting both application logs and MCP server logs
+
+### Trade-offs
+
+We chose **extensibility and ecosystem over protocol maturity**. MCP's design aligns with our use case (AI tool calling), and the 30-second timeout mitigates the risk of server misbehavior.
+
+---
+
+## Implementation Notes
+
+**Example: Stdio MCP Server**
+
+```text
+# User configures in Settings UI:
+Name: GitHub Tools
+Transport: stdio
+Command: npx
+Args: @modelcontextprotocol/server-github
+Env: GITHUB_TOKEN=ghp_...
+```
+
+The application spawns the process and exchanges JSON-RPC 2.0 messages over stdin/stdout:
+
+```json
+{"jsonrpc":"2.0","method":"tools/list","id":1}
+```
+
+Server responds:
+
+```json
+{
+  "jsonrpc":"2.0",
+  "id":1,
+  "result":{
+    "tools":[
+      {"name":"create_issue","description":"Create a GitHub issue","inputSchema":{...}},
+      {"name":"list_commits","description":"List commits","inputSchema":{...}}
+    ]
+  }
+}
+```
+
+**Example: HTTP MCP Server**
+
+```text
+# User configures:
+Name: Internal Monitoring
+Transport: http
+URL: https://monitoring.internal.com/mcp
+Auth Type: bearer
+Auth Value: eyJ...
+```
+
+The application sends an HTTP POST to `/mcp` with an `Authorization: Bearer eyJ...` header.
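Two steps of the tool calling flow, routing by namespaced key and the hard timeout, are easy to illustrate with the standard library alone. The sketch below is an assumption-laden stand-in, not the adapter's code. `split_tool_key` and `call_with_timeout` are hypothetical names, and the real client presumably uses an async timeout rather than a worker thread.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Split a namespaced key such as "github.create_issue" into (server, tool).
/// Splits at the first dot, so this assumes server names contain no dots.
pub fn split_tool_key(key: &str) -> Option<(&str, &str)> {
    let (server, tool) = key.split_once('.')?;
    if server.is_empty() || tool.is_empty() {
        None
    } else {
        Some((server, tool))
    }
}

/// Run `call` on a worker thread and stop waiting after `timeout`.
/// The worker is not cancelled on timeout; it simply loses its listener.
pub fn call_with_timeout<T, F>(timeout: Duration, call: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore the send error.
        let _ = tx.send(call());
    });
    match rx.recv_timeout(timeout) {
        Ok(value) => Ok(value),
        Err(mpsc::RecvTimeoutError::Timeout) => {
            Err(format!("tool call timed out after {:?}", timeout))
        }
        Err(mpsc::RecvTimeoutError::Disconnected) => {
            Err("tool call worker exited without a result".to_string())
        }
    }
}
```

A caller would route with `split_tool_key("github.create_issue")`, look up the client for `"github"`, and wrap the invocation in `call_with_timeout(Duration::from_secs(30), ...)`. Note the limitation stated in the comments: a timed-out call keeps running in the background, so a stdio server that hangs would still need to be killed separately.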
+
+---
+
+## Related Decisions
+
+- **ADR-007**: Three-Tier Shell Safety (MCP tools bypass shell classification; safety is the MCP server's responsibility)
+- Future: **ADR-010**: MCP Tool Approval System (extend three-tier safety to MCP tools)
+
+---
+
+## References
+
+- **MCP Specification**: https://spec.modelcontextprotocol.io/
+- **rmcp Rust Client**: https://github.com/tankeez/rmcp
+- **Implementation PR**: #32 (Hackathon v1.0.0)
+- **Database Schema**: Migration 018 (`mcp_servers`, `mcp_tools`, `mcp_resources`)
+- **Wiki**: `docs/wiki/AI-Providers.md` (Tool Calling section)
diff --git a/docs/architecture/adrs/ADR-009-bundled-kubectl-binary.md b/docs/architecture/adrs/ADR-009-bundled-kubectl-binary.md
new file mode 100644
index 00000000..8612620f
--- /dev/null
+++ b/docs/architecture/adrs/ADR-009-bundled-kubectl-binary.md
@@ -0,0 +1,241 @@
+# ADR-009: Bundle kubectl Binary for Cross-Platform Consistency
+
+**Date**: 2026-06-02
+**Status**: Accepted
+**Deciders**: Shaun Arman, RJ Cooper
+**Context**: Hackathon v1.0.0, Shell Execution System
+
+---
+
+## Context
+
+TFTSR DevOps Investigation v1.0.0 introduced the `execute_shell_command` tool for AI agents, with kubectl as a primary use case (diagnosing Kubernetes pod failures, checking deployments, viewing logs). kubectl is a critical tool for IT troubleshooting, but relying on the system installation has several problems:
+
+**Problems with system kubectl**:
+- Version skew: User's kubectl may be v1.25 while cluster is v1.30 (API changes)
+- Not installed: Many Windows/macOS users don't have kubectl
+- PATH issues: kubectl in non-standard location (WSL, Homebrew, Chocolatey)
+- Permission issues: System kubectl may require admin rights on Windows
+- Configuration drift: `~/.kube/config` may be misconfigured or missing
+
+**Requirements**:
+- AI agents need reliable kubectl execution across all platforms
+- Users should not need to install kubectl separately
+- kubectl version should be consistent (no version skew errors)
+- Work with multiple kubeconfig files (dev, staging, prod clusters)
+
+**Alternatives Considered**:
+
+1. **Use system kubectl (require manual install)**
+   - ✅ No binary bundling needed
+   - ❌ Poor UX: user must install kubectl separately
+   - ❌ Version skew issues
+   - ❌ PATH configuration required
+   - ❌ Windows complexity (WSL vs native)
+
+2. **Download kubectl at runtime (first use)**
+   - ✅ No bloat in installer
+   - ✅ Always latest version
+   - ❌ Requires internet on first run
+   - ❌ Download failure = broken feature
+   - ❌ Security risk (MITM; requires checksum verification)
+
+3. **Bundle kubectl as resource file**
+   - ✅ Works offline
+   - ✅ Consistent version
+   - ✅ No user setup required
+   - ❌ Increases installer size (~50MB per platform)
+   - ❌ Need to update kubectl periodically
+
+4. **Kubernetes client library (k8s-openapi crate)**
+   - ✅ No binary needed
+   - ✅ Native Rust implementation
+   - ❌ Complex API (YAML → Rust types)
+   - ❌ Doesn't support `kubectl apply -f` directly
+   - ❌ No support for kubectl plugins
+   - ❌ AI agents know kubectl CLI, not k8s-openapi API
+
+---
+
+## Decision
+
+**Bundle kubectl v1.30.0 binary for all platforms (Linux amd64/arm64, macOS arm64/Intel, Windows amd64) as a Tauri resource.**
+
+### Implementation
+
+**Build-time binary download**: `scripts/download-kubectl.sh`
+
+```bash
+#!/bin/bash
+VERSION="1.30.0"
+OS=$1    # linux, darwin, windows
+ARCH=$2  # amd64, arm64
+EXT=""; [ "$OS" = "windows" ] && EXT=".exe"  # Windows release binaries are named kubectl.exe
+curl -LO "https://dl.k8s.io/release/v${VERSION}/bin/${OS}/${ARCH}/kubectl${EXT}"
+chmod +x "kubectl${EXT}"
+mv "kubectl${EXT}" "binaries/kubectl-${OS}-${ARCH}${EXT}"
+```
+
+**CI/CD Integration**: `.github/workflows/release.yml`
+
+```yaml
+- name: Download kubectl binaries
+  run: |
+    ./scripts/download-kubectl.sh linux amd64
+    ./scripts/download-kubectl.sh linux arm64
+    ./scripts/download-kubectl.sh darwin arm64
+    ./scripts/download-kubectl.sh darwin amd64
+    ./scripts/download-kubectl.sh windows amd64
+```
+
+**Tauri Resource Bundling**: `src-tauri/tauri.conf.json`
+
+```json
+{
+  "tauri": {
+    "bundle": {
+      "resources": [
+        "binaries/kubectl-*"
+      ]
+    }
+  }
+}
+```
+
+**Runtime Binary Extraction**: `src-tauri/src/shell/kubectl.rs`
+
+```rust
+pub fn get_kubectl_path() -> Result<PathBuf, String> {
+    let resource_dir = tauri::api::path::resource_dir(...)
+        .ok_or("Failed to get resource directory")?;
+
+    #[cfg(target_os = "linux")]
+    let binary_name = if cfg!(target_arch = "aarch64") {
+        "kubectl-linux-arm64"
+    } else {
+        "kubectl-linux-amd64"
+    };
+
+    #[cfg(target_os = "macos")]
+    let binary_name = if cfg!(target_arch = "aarch64") {
+        "kubectl-darwin-arm64"
+    } else {
+        "kubectl-darwin-amd64"
+    };
+
+    #[cfg(target_os = "windows")]
+    let binary_name = "kubectl-windows-amd64.exe";
+
+    let kubectl_path = resource_dir.join(binary_name);
+
+    // Ensure executable permissions on Unix
+    #[cfg(unix)]
+    {
+        use std::os::unix::fs::PermissionsExt;
+        let metadata = std::fs::metadata(&kubectl_path)
+            .map_err(|e| format!("kubectl binary not found: {e}"))?;
+        let mut perms = metadata.permissions();
+        perms.set_mode(0o755);
+        std::fs::set_permissions(&kubectl_path, perms).map_err(|e| e.to_string())?;
+    }
+
+    Ok(kubectl_path)
+}
+```
+
+**Execution with Custom Kubeconfig**: `src-tauri/src/shell/executor.rs`
+
+```rust
+pub async fn execute_kubectl(command: &str, kubeconfig_id: Option<String>) -> Result<Output, String> {
+    let kubectl_path = kubectl::get_kubectl_path()?;
+
+    let mut cmd = Command::new(kubectl_path);
+
+    // Inject kubeconfig if provided
+    if let Some(id) = kubeconfig_id {
+        let kubeconfig = kubeconfig::get_and_decrypt(id)?;
+        let temp_path = write_temp_kubeconfig(kubeconfig)?;
+        cmd.env("KUBECONFIG", temp_path);
+    }
+
+    cmd.args(command.split_whitespace()); // naive split: quoted arguments with spaces are not preserved
+    cmd.output().await.map_err(|e| e.to_string())
+}
+```
+
+### Version Selection Rationale
+
+**kubectl v1.30.0** (released April 2024):
+- **Compatibility**: Supports Kubernetes v1.29, v1.30, v1.31 (n±1 version skew)
+- **Stability**: 1.30 is a stable release (not beta)
+- **Feature coverage**: Includes all common troubleshooting commands
+- **Size**: ~50MB per platform (acceptable for installer)
+
+---
+
+## Consequences
+
+### Positive
+
+- **Zero-configuration**: kubectl works immediately after install
+- **Consistent behavior**: Same kubectl version on all platforms
+- **Offline capable**: No internet required for kubectl execution
+- **Kubeconfig flexibility**: Users can upload multiple kubeconfig files
+- **Security**: Binary checksum verified during CI build
+- **Reliability**: No version skew errors with Kubernetes 1.29-1.31 clusters
+
+### Negative
+
+- **Installer size**: Increases by ~50MB per bundled binary (~150MB for one binary per operating system; ~250MB if all five OS/architecture binaries ship together)
+- **Update lag**: kubectl version is frozen until the next application release
+- **Disk usage**: Each install includes kubectl binary (no sharing across users)
+- **Maintenance**: Need to periodically update kubectl version
+
+### Trade-offs
+
+We chose **reliability and UX over installer size**. The 50MB increase is acceptable for a desktop application targeting IT engineers who are likely to need kubectl.
+
+---
+
+## Mitigation Strategies
+
+**Installer size**:
+- Compress binaries in bundle (reduces to ~15MB per platform)
+- Document minimum disk space requirement in README
+
+**kubectl version updates**:
+- Add `scripts/update-kubectl.sh` to automate version bumps
+- Schedule quarterly kubectl version reviews
+- Document current version in CLAUDE.md and wiki
+
+**Platform-specific issues**:
+- Windows: Sign kubectl binary to avoid SmartScreen warnings
+- macOS: Sign and notarize to pass Gatekeeper
+- Linux: Verify `chmod +x` works across all distros
+
+---
+
+## Future Enhancements
+
+1. **Optional system kubectl**: Add "Use system kubectl" toggle in Settings (falls back to bundled if not found)
+2. **Version display**: Show kubectl version in Settings UI
+3. **Auto-update**: Download newer kubectl if available (requires secure checksum verification)
+4. **Plugin support**: Bundle common kubectl plugins (kubectx, kubens, stern)
+
+---
+
+## Related Decisions
+
+- **ADR-007**: Three-Tier Shell Safety (kubectl commands classified as Tier 1/Tier 2)
+- **ADR-008**: MCP Protocol Integration (alternative to bundling binaries: use an MCP kubectl server)
+
+---
+
+## References
+
+- **kubectl Releases**: https://kubernetes.io/releases/
+- **Download Script**: `scripts/download-kubectl.sh`
+- **Binary Management**: `src-tauri/src/shell/kubectl.rs`
+- **Implementation PR**: #30 (Hackathon v1.0.0)
+- **CI/CD**: `.github/workflows/release.yml` (kubectl download step)
+- **Wiki**: `docs/wiki/Shell-Execution.md` (kubectl section)