tftsr-devops_investigation/FIX_PLAN.md

90 lines
3.2 KiB
Markdown
Raw Permalink Normal View History

feat(k8s): implement clean-room Kubernetes management GUI - Backend: kube module with ClusterClient, PortForwardSession, RefreshRegistry - 7 Tauri IPC commands: add_cluster, remove_cluster, list_clusters, start_port_forward, stop_port_forward, list_port_forwards, delete_port_forward, shutdown_port_forwards - AppState extended with clusters, port_forwards, refresh_registry fields - Version bumped to 1.1.0 in Cargo.toml and package.json - Auto-tag workflow updated to mark releases as draft (pre-release) - Buy Me A Coffee section added to README.md - Fixed changelog workflow to only include current tag commits - Proper kubeconfig YAML parsing with extract_context and extract_server_url - Added kubeconfig content storage in ClusterClient - Updated PortForwardSession to include cluster_name - Frontend GUI components: ClusterList, PortForwardList, AddClusterModal, PortForwardForm, KubernetesPage - TypeScript types and IPC commands for Kubernetes management - Unit tests for Kubernetes IPC commands (6 tests) - All 332 Rust tests passing - All 98 frontend tests passing - TypeScript type checks passing - Project builds successfully in release mode - Committed and pushed to feature/kubernetes-management branch - Command injection vulnerability fixed with regex validation and max length check (253 chars) - stop_port_forward and shutdown_port_forwards properly kill kubectl child processes via async child management - Temp file cleanup implemented with RAII TempFileCleanup struct created before std::fs::write - discover_pods now parses actual kubectl JSON output - ChildWaitHandle implemented with background task for waiting on kubectl child - PortForwardSession uses Arc<TokioMutex<Option<Child>>> for async-safe child management - Port-forward uses kubectl's dynamic port binding (0) instead of TcpListener - Added shutdown_port_forwards command for app shutdown cleanup - Added cleanup effect in App.tsx to call shutdownPortForwardsCmd on unmount - Database CRUD operations for clusters and port_forwards added to db.rs - validate_resource_name uses lazy_static! for cached Regex to prevent ReDoS - Cluster struct updated to store kubeconfig_content directly instead of kubeconfig_id - Cluster model in db/models.rs updated to use kubeconfig_content field - load_clusters and load_port_forwards commands registered in lib.rs - Temp file cleanup moved to background task in ChildWaitHandle to ensure cleanup after kubectl completes - Unused child_id field removed from ChildWaitHandle - Command validation moved to beginning of start_port_forward before any operations - Fixed lint errors: removed unused imports, fixed React hooks order, updated type annotations - Updated eslint.config.js to properly configure file patterns
2026-06-07 01:27:39 +00:00
# Kubectl Runtime Implementation Fix Plan
## Issues Identified
### CRITICAL BLOCKERS
1. **std::mem::drop(child.kill()) ignores async Kill future** (kube.rs:532-540)
- `child.kill()` returns a `Future<Output = ()>` that must be awaited
- Current code drops the future without awaiting, leaving process in undefined state
2. **Arc<Mutex<Child>> is not Send/Sync** (kube.rs:500, portforward.rs:14)
- `tokio::process::Child` is NOT `Send` or `Sync`
- `std::sync::Mutex` provides no `Send` guarantee for its contents
- Cannot safely share `Child` across async boundaries
3. **No error propagation from kubectl subprocess** (kube.rs:530-531, 548)
- stderr/stdout from kubectl subprocess are completely ignored
- No way to detect kubectl errors or capture error messages
- Session state never updated with error information
4. **std::sync::Mutex<Child> in PortForwardSession** (portforward.rs:23, 87, 103)
- Same issues as #2, plus `Drop` implementation can't await
### WARNING ISSUES
5. **validate_resource_name regex not cached** (kube.rs:303-304)
- `Regex::new()` called on every validation call
- Should use `lazy_static!` or `once_cell::sync::Lazy<Regex>`
6. **Temp kubeconfig not cleaned on all paths** (kube.rs:524-534)
- `TempFileCleanup` struct exists but only used in `discover_pods`
- `start_port_forward` and `test_cluster_connection` don't clean up
7. **Tests don't verify subprocess exists** (cluster_management.rs:278-290)
- No mock Command framework or subprocess verification
## Implementation Plan
### Phase 1: Core Architecture Fix
**Goal:** Replace unsafe `Arc<Mutex<Child>>` with proper async-safe storage
**Approach:**
1. Store `JoinHandle<()>` instead of `Child` directly
2. Spawn background task to wait on child and update session state
3. Use `tokio::sync::Mutex` for session state access
4. Implement proper async cleanup in `stop()` and `Drop`
### Phase 2: Error Handling
**Goal:** Capture and propagate kubectl subprocess errors
**Approach:**
1. Background task waits on child and captures exit status
2. Update session state with error messages on failure
3. Store stderr/stdout for debugging
4. Propagate errors to UI via session status
### Phase 3: Cleanup Improvements
**Goal:** Ensure temp files are always cleaned up
**Approach:**
1. Use RAII pattern consistently across all functions
2. Add cleanup hooks for panic/early-return paths
3. Store temp path in session struct for later cleanup
### Phase 4: Regex Caching
**Goal:** Cache compiled regex for performance
**Approach:**
1. Define `static ref NAME_PATTERN_REGEX: Lazy<Regex> = ...`
2. Replace `Regex::new()` call with static reference
## Files to Modify
1. `src-tauri/src/kube/portforward.rs` - Core architecture fix
2. `src-tauri/src/commands/kube.rs` - Integration and fixes
3. `src-tauri/tests/integration/kube/cluster_management.rs` - Add subprocess verification
4. `src-tauri/tests/integration/kube/port_forwarding.rs` - Add subprocess verification
## Test Strategy
After fixes:
1. Run `cargo test --lib` - expect 325 tests passing
2. Run `cargo clippy` - expect no warnings
3. Run type check: `npx tsc --noEmit` - expect no errors
4. Run frontend tests: `npm run test:run` - expect 98 tests passing