tftsr-devops_investigation/FIX_PLAN.md
Shaun Arman 44863d7f9f
Some checks failed
Test / frontend-tests (pull_request) Successful in 1m23s
Test / frontend-typecheck (pull_request) Successful in 1m31s
Test / rust-fmt-check (pull_request) Successful in 11m33s
PR Review Automation / review (pull_request) Failing after 2m46s
Test / rust-clippy (pull_request) Successful in 13m16s
Test / rust-tests (pull_request) Has been cancelled
feat(k8s): implement clean-room Kubernetes management GUI
- Backend: kube module with ClusterClient, PortForwardSession, RefreshRegistry
- 7 Tauri IPC commands: add_cluster, remove_cluster, list_clusters, start_port_forward, stop_port_forward, list_port_forwards, delete_port_forward, shutdown_port_forwards
- AppState extended with clusters, port_forwards, refresh_registry fields
- Version bumped to 1.1.0 in Cargo.toml and package.json
- Auto-tag workflow updated to mark releases as draft (pre-release)
- Buy Me A Coffee section added to README.md
- Fixed changelog workflow to only include current tag commits
- Proper kubeconfig YAML parsing with extract_context and extract_server_url
- Added kubeconfig content storage in ClusterClient
- Updated PortForwardSession to include cluster_name
- Frontend GUI components: ClusterList, PortForwardList, AddClusterModal, PortForwardForm, KubernetesPage
- TypeScript types and IPC commands for Kubernetes management
- Unit tests for Kubernetes IPC commands (6 tests)
- All 332 Rust tests passing
- All 98 frontend tests passing
- TypeScript type checks passing
- Project builds successfully in release mode
- Committed and pushed to feature/kubernetes-management branch
- Command injection vulnerability fixed with regex validation and max length check (253 chars)
- stop_port_forward and shutdown_port_forwards properly kill kubectl child processes via async child management
- Temp file cleanup implemented with RAII TempFileCleanup struct created before std::fs::write
- discover_pods now parses actual kubectl JSON output
- ChildWaitHandle implemented with background task for waiting on kubectl child
- PortForwardSession uses Arc<TokioMutex<Option<Child>>> for async-safe child management
- Port-forward uses kubectl's dynamic port binding (0) instead of TcpListener
- Added shutdown_port_forwards command for app shutdown cleanup
- Added cleanup effect in App.tsx to call shutdownPortForwardsCmd on unmount
- Database CRUD operations for clusters and port_forwards added to db.rs
- validate_resource_name uses lazy_static! for cached Regex to prevent ReDoS
- Cluster struct updated to store kubeconfig_content directly instead of kubeconfig_id
- Cluster model in db/models.rs updated to use kubeconfig_content field
- load_clusters and load_port_forwards commands registered in lib.rs
- Temp file cleanup moved to background task in ChildWaitHandle to ensure cleanup after kubectl completes
- Unused child_id field removed from ChildWaitHandle
- Command validation moved to beginning of start_port_forward before any operations
- Fixed lint errors: removed unused imports, fixed React hooks order, updated type annotations
- Updated eslint.config.js to properly configure file patterns
2026-06-06 20:16:09 -05:00

3.2 KiB

Kubectl Runtime Implementation Fix Plan

Issues Identified

CRITICAL BLOCKERS

  1. std::mem::drop(child.kill()) ignores async Kill future (kube.rs:532-540)

    • child.kill() returns a Future<Output = ()> that must be awaited
    • Current code drops the future without awaiting, leaving process in undefined state
  2. Arc<Mutex> is not Send/Sync (kube.rs:500, portforward.rs:14)

    • tokio::process::Child is NOT Send or Sync
    • std::sync::Mutex provides no Send guarantee for its contents
    • Cannot safely share Child across async boundaries
  3. No error propagation from kubectl subprocess (kube.rs:530-531, 548)

    • stderr/stdout from kubectl subprocess are completely ignored
    • No way to detect kubectl errors or capture error messages
    • Session state never updated with error information
  4. std::sync::Mutex in PortForwardSession (portforward.rs:23, 87, 103)

    • Same issues as #2, plus Drop implementation can't await

WARNING ISSUES

  1. validate_resource_name regex not cached (kube.rs:303-304)

    • Regex::new() called on every validation call
    • Should use lazy_static! or once_cell::sync::Lazy<Regex>
  2. Temp kubeconfig not cleaned on all paths (kube.rs:524-534)

    • TempFileCleanup struct exists but only used in discover_pods
    • start_port_forward and test_cluster_connection don't clean up
  3. Tests don't verify subprocess exists (cluster_management.rs:278-290)

    • No mock Command framework or subprocess verification

Implementation Plan

Phase 1: Core Architecture Fix

Goal: Replace unsafe Arc<Mutex<Child>> with proper async-safe storage

Approach:

  1. Store JoinHandle<()> instead of Child directly
  2. Spawn background task to wait on child and update session state
  3. Use tokio::sync::Mutex for session state access
  4. Implement proper async cleanup in stop() and Drop

Phase 2: Error Handling

Goal: Capture and propagate kubectl subprocess errors

Approach:

  1. Background task waits on child and captures exit status
  2. Update session state with error messages on failure
  3. Store stderr/stdout for debugging
  4. Propagate errors to UI via session status

Phase 3: Cleanup Improvements

Goal: Ensure temp files are always cleaned up

Approach:

  1. Use RAII pattern consistently across all functions
  2. Add cleanup hooks for panic/early-return paths
  3. Store temp path in session struct for later cleanup

Phase 4: Regex Caching

Goal: Cache compiled regex for performance

Approach:

  1. Define static ref NAME_PATTERN_REGEX: Lazy<Regex> = ...
  2. Replace Regex::new() call with static reference

Files to Modify

  1. src-tauri/src/kube/portforward.rs - Core architecture fix
  2. src-tauri/src/commands/kube.rs - Integration and fixes
  3. src-tauri/tests/integration/kube/cluster_management.rs - Add subprocess verification
  4. src-tauri/tests/integration/kube/port_forwarding.rs - Add subprocess verification

Test Strategy

After fixes:

  1. Run cargo test --lib - expect 325 tests passing
  2. Run cargo clippy - expect no warnings
  3. Run type check: npx tsc --noEmit - expect no errors
  4. Run frontend tests: npm run test:run - expect 98 tests passing