docs: add ADRs for shell safety, MCP transport, kubectl bundling

Architecture decision records with sanitized content (proprietary
references removed).

- ADR-007: Three-Tier Shell Safety Classification
- ADR-008: MCP Protocol Integration (HTTP transport)
- ADR-009: Bundled kubectl Binary rationale

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Shaun Arman 2026-06-05 08:12:19 -05:00
parent ad2d1ced84
commit e5593cbfe2
3 changed files with 616 additions and 0 deletions


@@ -0,0 +1,161 @@
# ADR-007: Three-Tier Shell Command Safety Classification
**Date**: 2026-06-02
**Status**: Accepted
**Deciders**: Shaun Arman, Henry Castle, RJ Cooper
**Context**: Hackathon v1.0.0 — Agentic Shell Execution
---
## Context
TFTSR DevOps Investigation v1.0.0 introduced agentic shell command execution, allowing AI agents to execute kubectl, Proxmox, and general shell commands during troubleshooting conversations. This capability creates a significant security risk: malicious or hallucinated commands could cause data loss, service disruption, or unauthorized system access.
**Requirements**:
- AI agents need shell access for diagnostics (kubectl, pvecm, qm, etc.)
- Read-only operations should execute immediately for fast iteration
- Mutating operations require explicit user approval
- Destructive operations must be blocked entirely
- Classification must handle pipes, chains, and command substitution
- System must be deterministic and testable
**Alternatives Considered**:
1. **Whitelist-only approach**: Maintain a fixed list of allowed commands
   - ✅ Simple to implement
   - ❌ Brittle: breaks with new commands or options
   - ❌ Poor UX: blocks legitimate commands like `kubectl get pods -n custom-namespace`
2. **Blacklist-only approach**: Block known-dangerous commands
   - ✅ Flexible for new commands
   - ❌ Fails open: unknown dangerous commands execute
   - ❌ False sense of security
3. **LLM-based classification**: Ask another AI to classify command safety
   - ✅ Context-aware decisions
   - ❌ Non-deterministic: the same command can get different classifications
   - ❌ Latency: adds 500ms+ per command
   - ❌ Cost: every command requires an AI call
   - ❌ Cannot unit test
4. **Sandbox all commands**: Execute in isolated containers
   - ✅ Maximum safety
   - ❌ Complex infrastructure
   - ❌ Breaks kubectl (needs real cluster access)
   - ❌ High latency
---
## Decision
**Implement a deterministic three-tier safety classification system with static analysis and rule-based tier assignment.**
### Tier Definitions
| Tier | Safety Level | Approval | Examples |
|------|--------------|----------|----------|
| **Tier 1** | Read-only, no side effects | Auto-execute | `kubectl get`, `describe`, `logs`, `cat`, `grep`, `ls`, `pvecm status`, `qm status` |
| **Tier 2** | Mutating, potentially disruptive | User approval required | `kubectl apply`, `delete`, `scale`, `chmod`, `systemctl restart`, `ssh`, `chown` |
| **Tier 3** | Destructive, unrecoverable | Always deny | `rm -rf`, `shutdown`, `reboot`, `mkfs`, `dd if=/dev/zero`, `:(){ :\|:& };:` (fork bomb) |
### Classification Rules
1. **Single command**: Classify by command + subcommand pattern
   - `kubectl get` → Tier 1
   - `kubectl apply` → Tier 2
   - `rm -rf` → Tier 3
2. **Piped commands** (`|`): Highest tier wins
   - `kubectl get pods | grep nginx` → max(Tier 1, Tier 1) = Tier 1
   - `cat /etc/passwd | tee /tmp/backup` → max(Tier 1, Tier 2) = Tier 2
3. **Command chains** (`&&`, `||`, `;`): Highest tier wins
   - `ls && cat file` → max(Tier 1, Tier 1) = Tier 1
   - `kubectl delete pod nginx && kubectl get pods` → max(Tier 2, Tier 1) = Tier 2
4. **Command substitution** (`` `...` ``, `$(...)`): Escalate Tier 1 to Tier 2
   - `kubectl get pods -n $(cat namespace.txt)` → Tier 2 (even though `kubectl get` is Tier 1)
   - Rationale: Command substitution introduces hidden indirection
5. **Any Tier 3 in chain**: Entire command becomes Tier 3
   - `ls && rm -rf /` → Tier 3 (entire command denied)
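The five rules can be exercised with a small, self-contained Rust sketch. It is not the `classifier.rs` implementation. A hypothetical prefix table stands in for the regex rules, operators are split naively (a quoted `|` or `;` would be mis-split), and the body of a `$(...)` substitution is not classified on its own, so it only triggers the Tier 1 to Tier 2 escalation.

```rust
/// Severity order: the derived `Ord` makes Tier1 < Tier2 < Tier3.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    Tier1,
    Tier2,
    Tier3,
}

/// Hypothetical rule table of (command prefix, tier); the longest matching prefix wins.
const RULES: &[(&str, Tier)] = &[
    ("kubectl get", Tier::Tier1),
    ("kubectl describe", Tier::Tier1),
    ("kubectl logs", Tier::Tier1),
    ("kubectl apply", Tier::Tier2),
    ("kubectl delete", Tier::Tier2),
    ("cat", Tier::Tier1),
    ("grep", Tier::Tier1),
    ("ls", Tier::Tier1),
    ("tee", Tier::Tier2),
    ("rm -rf", Tier::Tier3),
    ("shutdown", Tier::Tier3),
];

/// Rule 1: classify one command; unknown commands default to Tier 2.
fn classify_single(cmd: &str) -> Tier {
    let cmd = cmd.trim();
    RULES
        .iter()
        .filter(|(prefix, _)| cmd == *prefix || cmd.starts_with(&format!("{prefix} ")))
        .max_by_key(|(prefix, _)| prefix.len())
        .map(|(_, tier)| *tier)
        .unwrap_or(Tier::Tier2)
}

/// Split on `&&`, `||`, `;` and `|` (naive: ignores quoting).
fn split_components(command: &str) -> Vec<String> {
    command
        .replace("&&", ";")
        .replace("||", ";")
        .split(|c| c == ';' || c == '|')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

fn classify(command: &str) -> Tier {
    // Rules 2, 3 and 5: the highest tier across all components wins
    let mut tier = split_components(command)
        .iter()
        .map(|c| classify_single(c))
        .max()
        .unwrap_or(Tier::Tier2);
    // Rule 4: command substitution escalates Tier 1 to Tier 2
    if tier == Tier::Tier1 && (command.contains("$(") || command.contains('`')) {
        tier = Tier::Tier2;
    }
    tier
}
```

Because `Tier` derives `Ord`, "highest tier wins" reduces to `Iterator::max`, and Rule 5 falls out of the same comparison rather than needing a special case.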
### Implementation
**Backend**: `src-tauri/src/shell/classifier.rs`
```rust
/// Variant order defines severity: the derived `Ord` makes Tier1 < Tier2 < Tier3,
/// which is what the `>` comparison below relies on.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CommandTier {
    Tier1, // Auto-execute
    Tier2, // Requires approval
    Tier3, // Always deny
}

impl CommandClassifier {
    pub fn classify(&self, command: &str) -> ClassificationResult {
        // Parse command structure (pipes, chains, substitution)
        let components = Self::parse_command_structure(command);
        // Classify each component and keep the highest tier
        let mut highest_tier = CommandTier::Tier1;
        for component in &components {
            let tier = self.classify_single_command(&component.command, ...);
            if tier > highest_tier {
                highest_tier = tier;
            }
        }
        // Escalate Tier 1 to Tier 2 if command substitution is detected
        if command.contains("$(") || command.contains('`') {
            if highest_tier == CommandTier::Tier1 {
                highest_tier = CommandTier::Tier2;
            }
        }
        ClassificationResult { tier: highest_tier, ... }
    }
}
```
**Testing**: 19 unit tests cover all classification rules, edge cases, and escalation logic.
---
## Consequences
### Positive
- **Deterministic**: Same command always gets same classification (unit testable)
- **Fast**: Regex-based classification completes in <1ms (no AI calls)
- **User-friendly**: Read-only commands execute immediately without prompts
- **Safe defaults**: Unknown commands default to Tier 2 (approval required)
- **Transparent**: UI shows tier reasoning ("mutating operation", "contains command substitution")
- **Session memory**: User can "Allow for Session" to approve multiple similar Tier 2 commands
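"Allow for Session" could be backed by an in-memory set of approved command keys. In this sketch, "similar" is assumed to mean "same first two tokens" (for example `kubectl scale`), and commands containing chain, pipe, or substitution characters are never pre-approved. Both choices are assumptions, not documented behavior, and `SessionApprovals` is a hypothetical type. The caller would consult it only for Tier 2 results, never for Tier 3.

```rust
use std::collections::HashSet;

/// Hypothetical session-scoped approval cache (cleared when the session ends).
#[derive(Default)]
struct SessionApprovals {
    approved: HashSet<String>,
}

impl SessionApprovals {
    /// Assumed similarity key: the first two whitespace-separated tokens.
    fn key(command: &str) -> String {
        command.split_whitespace().take(2).collect::<Vec<_>>().join(" ")
    }

    /// Chains, pipes and substitutions are never eligible for session approval.
    fn eligible(command: &str) -> bool {
        !command.contains(|c: char| matches!(c, ';' | '&' | '|' | '$' | '`'))
    }

    /// Record the user's "Allow for Session" choice; returns false if refused.
    fn allow_for_session(&mut self, command: &str) -> bool {
        if !Self::eligible(command) {
            return false;
        }
        self.approved.insert(Self::key(command));
        true
    }

    /// True if a Tier 2 command can skip the approval prompt.
    fn is_preapproved(&self, command: &str) -> bool {
        Self::eligible(command) && self.approved.contains(&Self::key(command))
    }
}
```

Refusing compound commands keeps a session approval for `kubectl scale` from silently covering `kubectl scale ... && kubectl delete ...`.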
### Negative
- **Maintenance burden**: New commands require manual tier assignment
- **False positives**: Benign commands may be over-classified (e.g., `kubectl run --dry-run=client` is Tier 2 but harmless)
- **Bypass via arguments**: `cat /etc/shadow` is Tier 1 (read-only) but accesses sensitive data
  - **Mitigation**: Context matters; the AI should not ask to read `/etc/shadow` without reason
  - **Mitigation**: A full audit log records all commands for security review
### Trade-offs
We chose **correctness and safety over flexibility**. A false positive (over-restricting a safe command) is acceptable; a false negative (allowing a destructive command) is not.
---
## Related Decisions
- **ADR-008**: MCP Protocol Integration (provides alternative tool integration method)
- **ADR-009**: Bundle kubectl Binary (ensures consistent kubectl version across platforms)
---
## References
- **Implementation PR**: #30 (Hackathon v1.0.0)
- **Test Coverage**: `src-tauri/src/shell/tests.rs` (19 tests)
- **Wiki**: `docs/wiki/Shell-Execution.md`
- **Database Schema**: Migrations 024-027 (shell_commands, kubeconfig_files, command_executions, approval_decisions)


@@ -0,0 +1,214 @@
# ADR-008: Model Context Protocol for External Tools
**Date**: 2026-06-02
**Status**: Accepted
**Deciders**: Shaun Arman, Henry Castle
**Context**: Hackathon v1.0.0 — Extensible Tool Integration
---
## Context
TFTSR DevOps Investigation v1.0.0 introduced agentic shell execution with statically defined tools (`execute_shell_command`, `add_ado_comment`). As the application grows, we need a way to integrate external tools and services without hardcoding every integration into the Rust backend.
**Requirements**:
- AI agents need access to third-party tools (GitHub, Slack, monitoring systems, etc.)
- Tool definitions should be discoverable and documented
- Tool execution should be sandboxed and timeout-protected
- New tools should be addable without recompiling the application
- Support both local processes (stdio) and remote services (HTTP)
**Alternatives Considered**:
1. **Plugin system (dynamic library loading)**
   - ✅ Native Rust plugins with full system access
   - ❌ Security risk: malicious plugins have full process access
   - ❌ Unsafe Rust (`dlopen`, FFI) for plugin loading
   - ❌ Platform-specific (.so, .dylib, .dll)
   - ❌ No sandboxing
2. **WebAssembly plugins (wasmtime)**
   - ✅ Sandboxed execution with WASI
   - ✅ Cross-platform (single .wasm file)
   - ❌ Complex WASI interface design
   - ❌ WASI 0.2 (Preview 2) tooling still maturing
   - ❌ Limited async support
3. **gRPC tool server protocol**
   - ✅ Industry-standard RPC
   - ✅ Strongly typed with protobuf
   - ❌ Complex setup for simple tools
   - ❌ Every tool server needs gRPC boilerplate
   - ❌ No existing ecosystem of AI tool servers
4. **Model Context Protocol (MCP)**
   - ✅ Designed specifically for AI tool integration
   - ✅ Existing ecosystem (Anthropic, community servers)
   - ✅ Supports stdio (local processes) and HTTP (remote services)
   - ✅ JSON-RPC 2.0 protocol (simple, well-understood)
   - ✅ Tool discovery built into protocol
   - ❌ New protocol (announced November 2024), potential churn
---
## Decision
**Adopt the Model Context Protocol (MCP) for external tool integration, using the `rmcp` Rust client library.**
### Architecture
```
AI Agent → MCP Adapter → MCP Client → Transport (stdio/HTTP) → MCP Server
                                                                   ↓
                                                              External Tool
```
**Components**:
| Module | Responsibility |
|--------|---------------|
| `mcp/client.rs` | Connect to MCP servers (stdio/HTTP) |
| `mcp/adapter.rs` | Merge MCP tools with static tools |
| `mcp/discovery.rs` | Health check servers, update status |
| `mcp/store.rs` | Persist server configs and tools to database |
| `mcp/models.rs` | McpServer, McpTool, McpResource types |
| `mcp/transport/stdio.rs` | Spawn processes with env vars |
| `mcp/transport/http.rs` | HTTP POST with auth headers |
**Database Schema** (Migration 018):
```sql
CREATE TABLE mcp_servers (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    url TEXT NOT NULL,
    transport_type TEXT NOT NULL CHECK(transport_type IN ('stdio', 'http')),
    auth_type TEXT NOT NULL CHECK(auth_type IN ('none', 'api_key', 'bearer', 'oauth2')),
    auth_value TEXT,
    enabled INTEGER NOT NULL DEFAULT 1,
    discovery_status TEXT NOT NULL DEFAULT 'pending'
        CHECK(discovery_status IN ('pending','connected','unreachable','error')),
    env_config TEXT, -- JSON map of environment variables
    ...
);

CREATE TABLE mcp_tools (
    id TEXT PRIMARY KEY,
    server_id TEXT NOT NULL,
    name TEXT NOT NULL,
    tool_key TEXT NOT NULL, -- "server_name.tool_name"
    description TEXT,
    parameters TEXT NOT NULL, -- JSON schema
    FOREIGN KEY(server_id) REFERENCES mcp_servers(id) ON DELETE CASCADE
);
```
**Tool Calling Flow**:
1. User configures MCP server in Settings (name, URL/command, transport type, auth)
2. Application connects and calls `list_tools()` to discover available tools
3. Tools stored in `mcp_tools` table with namespaced key (`server_name.tool_name`)
4. AI agent requests tools via `get_enabled_mcp_tools()`
5. MCP tools merged with static tools (`execute_shell_command`, `add_ado_comment`)
6. AI agent calls tool by key (e.g., `github.create_issue`)
7. Adapter routes to correct MCP client
8. Client invokes tool with **30-second hard timeout**
9. Result returned to AI agent
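Steps 5 to 7 amount to merging two tool namespaces and routing by key. A minimal sketch, assuming static tools are un-namespaced and MCP keys are split at the first `.` into `server_name` and `tool_name`; the `Route` type and `route` function are hypothetical, not the `mcp/adapter.rs` API.

```rust
use std::collections::HashMap;

/// Where a tool call should go: a built-in handler or a named MCP server.
#[derive(Debug, PartialEq)]
enum Route<'a> {
    Static(&'a str),
    Mcp { server: &'a str, tool: &'a str },
}

/// Built-in tools named in this ADR; they carry no server prefix.
const STATIC_TOOLS: &[&str] = &["execute_shell_command", "add_ado_comment"];

/// Resolve a tool key against static tools first, then enabled MCP servers.
fn route<'a>(key: &'a str, enabled_servers: &HashMap<&str, bool>) -> Option<Route<'a>> {
    if STATIC_TOOLS.contains(&key) {
        return Some(Route::Static(key));
    }
    // "server_name.tool_name": split at the first '.'
    let (server, tool) = key.split_once('.')?;
    match enabled_servers.get(server) {
        Some(true) => Some(Route::Mcp { server, tool }),
        _ => None, // unknown or disabled server
    }
}
```

Splitting at the first `.` means server names must not contain a dot, while tool names may.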
**Safety Features**:
- **Timeout protection**: 30-second hard timeout prevents indefinite hangs from misbehaving servers
- **Process isolation**: Stdio servers run as separate processes with isolated env vars
- **Auth encryption**: API keys encrypted with AES-256-GCM before storage
- **User control**: Users explicitly enable/disable each MCP server
- **Status tracking**: Connection health displayed in UI (connected, unreachable, error)
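The 30-second hard timeout can be illustrated with the standard library alone: run the call on a worker thread and wait on a channel with `recv_timeout`. This is a sketch, not the application's code; an async client would more likely use its runtime's timer (for example `tokio::time::timeout`), and `call_with_timeout` is a hypothetical name. Note that the worker thread is abandoned, not killed, when the limit is hit.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hard limit for a single MCP tool invocation, per this ADR.
const MCP_TOOL_TIMEOUT: Duration = Duration::from_secs(30);

/// Run a blocking call on a worker thread and stop waiting after `limit`.
fn call_with_timeout<T, F>(limit: Duration, call: F) -> Result<T, String>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // If the caller already timed out, the receiver is gone; ignore the send error.
        let _ = tx.send(call());
    });
    rx.recv_timeout(limit).map_err(|e| match e {
        mpsc::RecvTimeoutError::Timeout => format!("tool call timed out after {limit:?}"),
        mpsc::RecvTimeoutError::Disconnected => "tool call panicked".to_string(),
    })
}
```

A call site would pass `MCP_TOOL_TIMEOUT` as `limit`; tests can pass a much shorter duration.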
---
## Consequences
### Positive
- **Extensibility**: New tools without recompiling (add MCP server in Settings)
- **Ecosystem**: Can use community MCP servers (GitHub, Slack, Prometheus, etc.)
- **Simplicity**: JSON-RPC 2.0 protocol is simple to implement and debug
- **Dual transport**: Supports both local tools (stdio) and cloud services (HTTP)
- **Discovery**: Tool schemas fetched automatically via `list_tools()`
- **Sandboxing**: Stdio processes isolated, HTTP calls timeout-protected
### Negative
- **Protocol churn risk**: MCP is new (announced November 2024), spec may evolve
- **Dependency**: Relies on `rmcp` crate maintenance
- **Stdio complexity**: Process spawning platform-dependent (Windows cmd.exe vs Unix bash)
- **Debugging**: Tool call failures require inspecting both application logs and MCP server logs
### Trade-offs
We chose **extensibility and ecosystem over protocol maturity**. MCP's design aligns with our use case (AI tool calling), and the 30-second timeout mitigates the risk of server misbehavior.
---
## Implementation Notes
**Example: Stdio MCP Server**
```text
# User configures in Settings UI:
Name: GitHub Tools
Transport: stdio
Command: npx
Args: @modelcontextprotocol/server-github
Env: GITHUB_TOKEN=ghp_...
```
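The application spawns the process and exchanges newline-delimited JSON-RPC 2.0 messages over stdin/stdout. After the required `initialize` handshake, it discovers tools with: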
```json
{"jsonrpc":"2.0","method":"tools/list","id":1}
```
Server responds:
```json
{
"jsonrpc":"2.0",
"id":1,
"result":{
"tools":[
{"name":"create_issue","description":"Create a GitHub issue","inputSchema":{...}},
{"name":"list_commits","description":"List commits","inputSchema":{...}}
]
}
}
```
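On the stdio transport each JSON-RPC message is one line of JSON terminated by a newline, and the MCP lifecycle requires an `initialize` request followed by a `notifications/initialized` notification before `tools/list`. The sketch below builds that message sequence with plain string formatting to stay dependency-free; a real client (including `rmcp`) would use a JSON library. The function names and `clientInfo` values are placeholders, and `params` is assumed to be already-serialized JSON without embedded newlines.

```rust
/// One newline-delimited JSON-RPC 2.0 request for the MCP stdio transport.
fn jsonrpc_request(id: u64, method: &str, params: Option<&str>) -> String {
    match params {
        Some(p) => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{p}}}"#) + "\n",
        None => format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}"}}"#) + "\n",
    }
}

/// Notifications carry no `id` and receive no response.
fn jsonrpc_notification(method: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","method":"{method}"}}"#) + "\n"
}

/// Messages a client writes to the server's stdin, in order, before listing tools.
fn lifecycle_messages(protocol_version: &str) -> Vec<String> {
    let init_params = format!(
        r#"{{"protocolVersion":"{protocol_version}","capabilities":{{}},"clientInfo":{{"name":"example-client","version":"0.1.0"}}}}"#
    );
    vec![
        jsonrpc_request(1, "initialize", Some(init_params.as_str())),
        jsonrpc_notification("notifications/initialized"),
        jsonrpc_request(2, "tools/list", None),
    ]
}
```

In practice the client also waits for the `initialize` response, which carries the server's negotiated protocol version and capabilities, before sending the notification.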
**Example: HTTP MCP Server**
```text
# User configures:
Name: Internal Monitoring
Transport: http
URL: https://monitoring.internal.com/mcp
Auth Type: bearer
Auth Value: eyJ...
```
The application sends each JSON-RPC message as an HTTP POST to the configured URL (`https://monitoring.internal.com/mcp`) with an `Authorization: Bearer eyJ...` header.
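The `auth_type` values allowed by the `mcp_servers` CHECK constraint can be mapped to request headers in one place. This sketch assumes `auth_value` has already been decrypted, that `oauth2` sends an already-obtained access token as a bearer token, and that `api_key` uses a hypothetical `X-API-Key` header (the ADR does not name one). The `Accept` value reflects the Streamable HTTP transport in the 2025-03-26 MCP spec revision, which requires clients to accept both JSON and SSE responses.

```rust
/// Hypothetical mapping from an `mcp_servers` row to HTTP request headers.
fn request_headers(auth_type: &str, auth_value: Option<&str>) -> Result<Vec<(String, String)>, String> {
    let mut headers = vec![
        ("Content-Type".to_string(), "application/json".to_string()),
        // Streamable HTTP: server may answer with JSON or an SSE stream
        ("Accept".to_string(), "application/json, text/event-stream".to_string()),
    ];
    match (auth_type, auth_value) {
        ("none", _) => {}
        ("bearer" | "oauth2", Some(token)) => {
            headers.push(("Authorization".to_string(), format!("Bearer {token}")))
        }
        // Header name is an assumption; many APIs use their own
        ("api_key", Some(key)) => headers.push(("X-API-Key".to_string(), key.to_string())),
        (other, None) => return Err(format!("auth_type '{other}' requires an auth_value")),
        (other, Some(_)) => return Err(format!("unsupported auth_type '{other}'")),
    }
    Ok(headers)
}
```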
---
## Related Decisions
- **ADR-007**: Three-Tier Shell Safety (MCP tools bypass shell classification — server responsibility)
- Future: **ADR-010**: MCP Tool Approval System (extend three-tier safety to MCP tools)
---
## References
- **MCP Specification**: https://spec.modelcontextprotocol.io/
- **rmcp Rust Client**: https://github.com/tankeez/rmcp
- **Implementation PR**: #32 (Hackathon v1.0.0)
- **Database Schema**: Migration 018 (`mcp_servers`, `mcp_tools`, `mcp_resources`)
- **Wiki**: `docs/wiki/AI-Providers.md` (Tool Calling section)


@@ -0,0 +1,241 @@
# ADR-009: Bundle kubectl Binary for Cross-Platform Consistency
**Date**: 2026-06-02
**Status**: Accepted
**Deciders**: Shaun Arman, RJ Cooper
**Context**: Hackathon v1.0.0 — Shell Execution System
---
## Context
TFTSR DevOps Investigation v1.0.0 introduced the `execute_shell_command` tool for AI agents, with kubectl as a primary use case (diagnosing Kubernetes pod failures, checking deployments, viewing logs). kubectl is a critical tool for IT troubleshooting, but relying on the user's system installation presents several challenges:
**Problems with system kubectl**:
- Version skew: User's kubectl may be v1.25 while cluster is v1.30 (API changes)
- Not installed: Many Windows/macOS users don't have kubectl
- PATH issues: kubectl in non-standard location (WSL, Homebrew, Chocolatey)
- Permission issues: System kubectl may require admin rights on Windows
- Configuration drift: `~/.kube/config` may be misconfigured or missing
**Requirements**:
- AI agents need reliable kubectl execution across all platforms
- Users should not need to install kubectl separately
- kubectl version should be consistent (no version skew errors)
- Work with multiple kubeconfig files (dev, staging, prod clusters)
**Alternatives Considered**:
1. **Use system kubectl (require manual install)**
   - ✅ No binary bundling needed
   - ❌ Poor UX: user must install kubectl separately
   - ❌ Version skew issues
   - ❌ PATH configuration required
   - ❌ Windows complexity (WSL vs native)
2. **Download kubectl at runtime (first use)**
   - ✅ No bloat in installer
   - ✅ Always latest version
   - ❌ Requires internet on first run
   - ❌ Download failure breaks the feature
   - ❌ Security risk: MITM unless every download is checksum-verified
3. **Bundle kubectl as resource file**
   - ✅ Works offline
   - ✅ Consistent version
   - ✅ No user setup required
   - ❌ Increases installer size (~50MB per platform)
   - ❌ Need to update kubectl periodically
4. **Kubernetes client library (k8s-openapi crate)**
   - ✅ No binary needed
   - ✅ Native Rust implementation
   - ❌ Complex API (YAML → Rust types)
   - ❌ Doesn't support `kubectl apply -f` directly
   - ❌ No support for kubectl plugins
   - ❌ AI agents know kubectl CLI, not k8s-openapi API
---
## Decision
**Bundle kubectl v1.30.0 binary for all platforms (Linux amd64/arm64, macOS arm64/Intel, Windows amd64) as a Tauri resource.**
### Implementation
**Build-time binary download**: `scripts/download-kubectl.sh`
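```bash
#!/usr/bin/env bash
set -euo pipefail
VERSION="1.30.0"
OS=$1    # linux, darwin, windows
ARCH=$2  # amd64, arm64
EXT=""
[ "$OS" = "windows" ] && EXT=".exe"   # the Windows release binary is kubectl.exe
BASE="https://dl.k8s.io/release/v${VERSION}/bin/${OS}/${ARCH}"
curl -fsSLO "${BASE}/kubectl${EXT}"
curl -fsSLO "${BASE}/kubectl${EXT}.sha256"
# The .sha256 file holds only the hash, so append the file name for --check
echo "$(cat "kubectl${EXT}.sha256")  kubectl${EXT}" | sha256sum --check
chmod +x "kubectl${EXT}"
mkdir -p binaries
mv "kubectl${EXT}" "binaries/kubectl-${OS}-${ARCH}${EXT}"
rm "kubectl${EXT}.sha256"
```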
**CI/CD Integration**: `.github/workflows/release.yml`
```yaml
- name: Download kubectl binaries
  run: |
    ./scripts/download-kubectl.sh linux amd64
    ./scripts/download-kubectl.sh linux arm64
    ./scripts/download-kubectl.sh darwin arm64
    ./scripts/download-kubectl.sh darwin amd64
    ./scripts/download-kubectl.sh windows amd64
```
**Tauri Resource Bundling**: `src-tauri/tauri.conf.json`
```json
{
  "tauri": {
    "bundle": {
      "resources": [
        "binaries/kubectl-*"
      ]
    }
  }
}
```
**Runtime Binary Resolution**: `src-tauri/src/shell/kubectl.rs`
```rust
use std::path::PathBuf;

pub fn get_kubectl_path() -> Result<PathBuf, String> {
    let resource_dir = tauri::api::path::resource_dir(...)
        .ok_or("Failed to get resource directory")?;

    #[cfg(target_os = "linux")]
    let binary_name = if cfg!(target_arch = "aarch64") {
        "kubectl-linux-arm64"
    } else {
        "kubectl-linux-amd64"
    };
    #[cfg(target_os = "macos")]
    let binary_name = if cfg!(target_arch = "aarch64") {
        "kubectl-darwin-arm64"
    } else {
        "kubectl-darwin-amd64"
    };
    #[cfg(target_os = "windows")]
    let binary_name = "kubectl-windows-amd64.exe";

    let kubectl_path = resource_dir.join(binary_name);

    // Ensure executable permissions on Unix
    #[cfg(unix)]
    {
        use std::os::unix::fs::PermissionsExt;
        let metadata = std::fs::metadata(&kubectl_path)
            .map_err(|e| format!("kubectl binary not found: {e}"))?;
        let mut perms = metadata.permissions();
        perms.set_mode(0o755);
        // io::Error has no conversion into String, so map it explicitly
        std::fs::set_permissions(&kubectl_path, perms)
            .map_err(|e| format!("failed to make kubectl executable: {e}"))?;
    }

    Ok(kubectl_path)
}
```
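The same OS/architecture selection can also be written as a runtime mapping from `std::env::consts::{OS, ARCH}` to the resource names the download script produces, which makes it unit-testable on any host. A sketch only; `kubectl_resource_name` is a hypothetical helper. Unlike the `cfg!` version, unsupported targets (for example 32-bit x86 or Windows on arm64) return `None` instead of falling back to an amd64 binary.

```rust
use std::env::consts::{ARCH, OS};

/// Map Rust target names to `kubectl-<os>-<arch>[.exe]` resource names.
fn kubectl_resource_name(os: &str, arch: &str) -> Option<String> {
    let k8s_os = match os {
        "linux" => "linux",
        "macos" => "darwin",
        "windows" => "windows",
        _ => return None,
    };
    let k8s_arch = match arch {
        "x86_64" => "amd64",
        "aarch64" if k8s_os != "windows" => "arm64", // no Windows arm64 binary is bundled
        _ => return None,
    };
    let ext = if k8s_os == "windows" { ".exe" } else { "" };
    Some(format!("kubectl-{k8s_os}-{k8s_arch}{ext}"))
}

/// Resource name for the platform this binary was compiled for.
fn current_kubectl_resource_name() -> Option<String> {
    kubectl_resource_name(OS, ARCH)
}
```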
**Execution with Custom Kubeconfig**: `src-tauri/src/shell/executor.rs`
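```rust
use std::process::Output;
use tokio::process::Command; // async Command: `output()` returns a future

pub async fn execute_kubectl(command: &str, kubeconfig_id: Option<String>) -> Result<Output, String> {
    let kubectl_path = kubectl::get_kubectl_path()?;
    let mut cmd = Command::new(kubectl_path);
    // Inject kubeconfig if provided (the temp file holds decrypted credentials)
    if let Some(id) = kubeconfig_id {
        let kubeconfig = kubeconfig::get_and_decrypt(id)?;
        let temp_path = write_temp_kubeconfig(kubeconfig)?;
        cmd.env("KUBECONFIG", temp_path);
    }
    // Note: split_whitespace() does not honor shell quoting
    cmd.args(command.split_whitespace());
    cmd.output()
        .await
        .map_err(|e| format!("failed to run kubectl: {e}"))
}
```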
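Because the temporary kubeconfig contains decrypted credentials, one option is to tie its lifetime to a guard value that deletes the file on `Drop`. This is a hypothetical stand-in for `write_temp_kubeconfig`, using only the standard library; a crate such as `tempfile` would handle unique naming more robustly. On Unix the file is created with mode `0o600` so only the owner can read it.

```rust
use std::fs;
use std::io::Write;
use std::path::PathBuf;

/// Hypothetical private temp kubeconfig, removed when the guard is dropped.
struct TempKubeconfig {
    path: PathBuf,
}

impl TempKubeconfig {
    fn write(contents: &str, unique_id: &str) -> std::io::Result<Self> {
        let path = std::env::temp_dir().join(format!("kubeconfig-{unique_id}.yaml"));
        let mut opts = fs::OpenOptions::new();
        opts.write(true).create_new(true); // fail rather than overwrite
        #[cfg(unix)]
        {
            use std::os::unix::fs::OpenOptionsExt;
            opts.mode(0o600); // owner read/write only
        }
        let mut file = opts.open(&path)?;
        file.write_all(contents.as_bytes())?;
        Ok(Self { path })
    }

    fn path(&self) -> &PathBuf {
        &self.path
    }
}

impl Drop for TempKubeconfig {
    fn drop(&mut self) {
        // Best effort: nothing useful to do if removal fails here
        let _ = fs::remove_file(&self.path);
    }
}
```

The caller would pass `temp.path()` to `cmd.env("KUBECONFIG", ...)` and keep `temp` alive until `cmd.output().await` completes.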
### Version Selection Rationale
**kubectl v1.30.0** (released April 2024):
- **Compatibility**: Supports Kubernetes v1.29, v1.30, v1.31 (n±1 version skew)
- **Stability**: 1.30 is a stable release (not beta)
- **Feature coverage**: Includes all common troubleshooting commands
- **Size**: ~50MB per platform (acceptable for installer)
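The n±1 rule above is simple arithmetic on minor versions: kubectl is supported within one minor version (older or newer) of the cluster's kube-apiserver. A small sketch with hypothetical helper names, handling only `v1.x` version strings:

```rust
/// Extract the minor version from strings like "v1.30.0" or "1.29".
fn minor_of(version: &str) -> Option<u32> {
    let mut parts = version.trim_start_matches('v').split('.');
    match (parts.next(), parts.next()) {
        (Some("1"), Some(minor)) => minor.parse().ok(),
        _ => None,
    }
}

/// kubectl is supported within one minor version of kube-apiserver.
fn kubectl_skew_ok(kubectl_minor: u32, server_minor: u32) -> bool {
    kubectl_minor.abs_diff(server_minor) <= 1
}
```

With the bundled v1.30.0, `kubectl_skew_ok(30, m)` holds for `m` in 29..=31, matching the supported cluster range listed above.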
---
## Consequences
### Positive
- **Zero-configuration**: kubectl works immediately after install
- **Consistent behavior**: Same kubectl version on all platforms
- **Offline capable**: No internet required for kubectl execution
- **Kubeconfig flexibility**: Users can upload multiple kubeconfig files
- **Security**: Binary checksum verified during CI build
- **Reliability**: No version skew errors with Kubernetes 1.29-1.31 clusters
### Negative
- **Installer size**: Increases by ~50MB per platform binary (~250MB across all five platform binaries)
- **Update lag**: kubectl version frozen until the next application release
- **Disk usage**: Each install includes kubectl binary (no sharing across users)
- **Maintenance**: Need to periodically update kubectl version
### Trade-offs
We chose **reliability and UX over installer size**. The 50MB increase is acceptable for a desktop application targeting IT engineers who likely have kubectl needs.
---
## Mitigation Strategies
**Installer size**:
- Compress binaries in bundle (reduces to ~15MB per platform)
- Document minimum disk space requirement in README
**kubectl version updates**:
- Add `scripts/update-kubectl.sh` to automate version bumps
- Schedule quarterly kubectl version reviews
- Document current version in CLAUDE.md and wiki
**Platform-specific issues**:
- Windows: Sign kubectl binary to avoid SmartScreen warnings
- macOS: Sign and notarize to pass Gatekeeper
- Linux: Verify `chmod +x` works across all distros
---
## Future Enhancements
1. **Optional system kubectl**: Add "Use system kubectl" toggle in Settings (falls back to bundled if not found)
2. **Version display**: Show kubectl version in Settings UI
3. **Auto-update**: Download newer kubectl if available (requires secure checksum verification)
4. **Plugin support**: Bundle common kubectl plugins (kubectx, kubens, stern)
---
## Related Decisions
- **ADR-007**: Three-Tier Shell Safety (kubectl commands classified as Tier 1/Tier 2)
- **ADR-008**: MCP Protocol Integration (alternative to bundling binaries — use MCP kubectl server)
---
## References
- **kubectl Releases**: https://kubernetes.io/releases/
- **Download Script**: `scripts/download-kubectl.sh`
- **Binary Management**: `src-tauri/src/shell/kubectl.rs`
- **Implementation PR**: #30 (Hackathon v1.0.0)
- **CI/CD**: `.github/workflows/release.yml` (kubectl download step)
- **Wiki**: `docs/wiki/Shell-Execution.md` (kubectl section)