fix: Proxmox v1.2.2 — client retrieval, AI message ordering, Remotes UX, Ceph false positive #124

Merged
sarman merged 2 commits from fix/proxmox-v1.2.2-consolidated into beta 2026-06-20 13:02:36 +00:00
Owner

Summary

  • Proxmox backend broken — 68 Tauri command functions had a half-finished refactor (old clusters.get() pattern removed, new get_proxmox_client_for_cluster() call never inserted). Fixed all functions + two bugs in the new helper itself (clusters undeclared in early-return block, client_arc out of scope before return). This unblocks VMs, Storage, Tasks, Notes, Views, Certificates, Network, Administration, and other pages.
  • AI chat "System message must be at the beginning" — Qwen3.5-122b and other LiteLLM-routed models reject mid-conversation system messages. Moved tool-format and iteration-budget system messages to before history is appended; changed mid-loop warning/forced-stop messages from system role to user role.
  • Remotes actions (...) and plug button — replaced toast-only placeholders with a functional ActionsMenu dropdown (Edit / Test Connection / Delete) and real connect/disconnect logic. Status now correctly reflects backend ClusterInfoWithHealth.connected instead of hardcoding connected.
  • Ceph false HEALTH_OK — page now fetches live health data; shows informational notice when Ceph is not configured on the cluster instead of fabricated HEALTH_OK.

Test plan

  • Send a chat message — no "System message must be at the beginning" error
  • Add a Proxmox remote → status reflects real connected/disconnected state
  • Click ... on a remote → dropdown with Edit, Test Connection, Delete
  • Click plug icon → connection test runs, status updates
  • VMs, Storage, Network, Tasks, Notes, Views, Certificates, Administration — all load without "Cluster not found"
  • Ceph on non-Ceph cluster → shows notice instead of HEALTH_OK
  • cargo test → 416 passed, 0 failed
  • npm run test:run → 386 passed, 0 failed
  • npx tsc --noEmit → zero errors
  • npx eslint . --max-warnings 0 → zero warnings
## Summary - **Proxmox backend broken** — 68 Tauri command functions had a half-finished refactor (old `clusters.get()` pattern removed, new `get_proxmox_client_for_cluster()` call never inserted). Fixed all functions + two bugs in the new helper itself (`clusters` undeclared in early-return block, `client_arc` out of scope before return). This unblocks VMs, Storage, Tasks, Notes, Views, Certificates, Network, Administration, and other pages. - **AI chat "System message must be at the beginning"** — Qwen3.5-122b and other LiteLLM-routed models reject mid-conversation system messages. Moved tool-format and iteration-budget system messages to before history is appended; changed mid-loop warning/forced-stop messages from `system` role to `user` role. - **Remotes actions (`...`) and plug button** — replaced toast-only placeholders with a functional ActionsMenu dropdown (Edit / Test Connection / Delete) and real connect/disconnect logic. Status now correctly reflects backend `ClusterInfoWithHealth.connected` instead of hardcoding `connected`. - **Ceph false HEALTH_OK** — page now fetches live health data; shows informational notice when Ceph is not configured on the cluster instead of fabricated `HEALTH_OK`. ## Test plan - [ ] Send a chat message — no "System message must be at the beginning" error - [ ] Add a Proxmox remote → status reflects real connected/disconnected state - [ ] Click `...` on a remote → dropdown with Edit, Test Connection, Delete - [ ] Click plug icon → connection test runs, status updates - [ ] VMs, Storage, Network, Tasks, Notes, Views, Certificates, Administration — all load without "Cluster not found" - [ ] Ceph on non-Ceph cluster → shows notice instead of HEALTH_OK - [ ] `cargo test` → 416 passed, 0 failed - [ ] `npm run test:run` → 386 passed, 0 failed - [ ] `npx tsc --noEmit` → zero errors - [ ] `npx eslint . --max-warnings 0` → zero warnings
sarman added 1 commit 2026-06-20 03:14:28 +00:00
fix(proxmox): restore broken client retrieval across all commands
All checks were successful
Test / frontend-tests (pull_request) Successful in 1m36s
Test / frontend-typecheck (pull_request) Successful in 1m46s
PR Review Automation / review (pull_request) Successful in 5m8s
Test / rust-fmt-check (pull_request) Successful in 12m30s
Test / rust-clippy (pull_request) Successful in 14m10s
Test / rust-tests (pull_request) Successful in 16m35s
c5b97f8648
Half-completed refactor left 68 Tauri command functions with orphaned
.ok_or_else() chains after the old clusters.get() pattern was removed
without inserting the replacement helper call. Also fixed two bugs in the
new get_proxmox_client_for_cluster helper: undeclared `clusters` variable
in the early-return check, and client_arc going out of scope before return.

fix(ai): enforce system-message-first ordering for strict LLM providers

Qwen3.5-122b (and other models via LiteLLM) reject requests where system
messages appear after user/assistant turns. Moved tool-calling format
and iteration-budget system messages to before history is appended.
Changed mid-loop iteration warning and forced-stop instruction from
system role to user role so they can safely appear mid-conversation.

fix(proxmox): Remotes actions menu and connect/disconnect behaviour

Replaced the non-functional "..." toast placeholder with a proper
ActionsMenu dropdown (Edit / Test Connection / Delete). Removed inline
emoji buttons folded into the menu. Connect now calls getProxmoxCluster
as a live connection test and reflects real status; disconnect marks the
remote disconnected locally. Remote status now maps correctly from the
backend ClusterInfoWithHealth.connected field instead of hardcoding
'connected' for every entry.

fix(proxmox): Ceph page no longer shows HEALTH_OK on non-Ceph clusters

Page now fetches real health data on mount. If getCephHealth fails the
page renders an informational notice rather than fake HEALTH_OK. When
Ceph is present, pools and OSDs are loaded and displayed live.
sarman reviewed 2026-06-20 03:19:35 +00:00
sarman left a comment
Author
Owner

Automated PR Review (qwen3.5-122b-think via liteLLM):\n\n

Summary

This PR addresses critical Proxmox backend functionality by introducing a centralized client retrieval helper, fixes AI chat message ordering for LLM compatibility, adds functional remote management UI, and corrects Ceph health reporting. The changes are substantial and touch multiple layers (Rust backend, TypeScript frontend).

Findings

  • [WARNING] src/pages/Proxmox/RemotesPage.tsx:203-217 - handleConnectRemote does not actually test the Proxmox connection
    Evidence:

    const result = await getProxmoxCluster(remote.id);
    if (result !== null) {
      toast.success(`Connected to ${remote.name}`);
      setRemotes((prev) =>
        prev.map((r) => (r.id === remote.id ? { ...r, status: 'connected' } : r))
      );
    }
    

    The function only checks if the cluster exists in the database, not if the Proxmox API is reachable. Status is set to 'connected' based on database presence, not actual connectivity.

  • [WARNING] src-tauri/src/commands/proxmox.rs:219-264 - Race condition in get_proxmox_client_for_cluster()
    Evidence:

    {
        let clusters = state.proxmox_clusters.lock().await;
        if let Some(client) = clusters.get(cluster_id) {
            return Ok(client.clone());
        }
    }
    // ... create client ...
    {
        let mut clusters = state.proxmox_clusters.lock().await;
        clusters.insert(cluster_id.to_string(), client_arc.clone());
    }
    

    Two concurrent requests for an uncached cluster could both create and authenticate clients, with the second overwriting the first. This causes unnecessary authentication overhead.

  • [SUGGESTION] src/components/Proxmox/RemotesList.tsx:32-42 - Event listener cleanup may not work correctly
    Evidence:

    useEffect(() => {
      if (!open) return;
      const handleClickOutside = (e: MouseEvent) => {
        if (menuRef.current && !menuRef.current.contains(e.target as Node)) {
          setOpen(false);
        }
      };
      document.addEventListener('mousedown', handleClickOutside);
      return () => document.removeEventListener('mousedown', handleClickOutside);
    }, [open]);
    

    The cleanup function runs on every open change, but the listener was only added when open is true. When open becomes false, the cleanup removes a listener that may not exist.

  • [SUGGESTION] src/pages/Proxmox/RemotesPage.tsx:139-151 - Edit operation has data loss risk
    Evidence:

    await removeProxmoxCluster(config.id);
    await addProxmoxCluster(
      config.id,
      config.name,
      clusterType as ClusterType,
      { url: hostname, port },
      config.username,
      config.password || ''
    );
    

    If addProxmoxCluster fails after removeProxmoxCluster succeeds, the remote is permanently lost. The comment acknowledges this as a "known limitation" but it's a significant UX risk.

Verdict: APPROVE WITH COMMENTS

The core functionality is sound and the fixes address real issues. The connection status misleading users and the race condition in client pooling should be addressed in follow-up work. The edit operation risk is documented but should ideally be mitigated with atomic transactions or a proper update endpoint.

Automated PR Review (qwen3.5-122b-think via liteLLM):\n\n **Summary** This PR addresses critical Proxmox backend functionality by introducing a centralized client retrieval helper, fixes AI chat message ordering for LLM compatibility, adds functional remote management UI, and corrects Ceph health reporting. The changes are substantial and touch multiple layers (Rust backend, TypeScript frontend). **Findings** - [WARNING] src/pages/Proxmox/RemotesPage.tsx:203-217 - `handleConnectRemote` does not actually test the Proxmox connection Evidence: ```typescript const result = await getProxmoxCluster(remote.id); if (result !== null) { toast.success(`Connected to ${remote.name}`); setRemotes((prev) => prev.map((r) => (r.id === remote.id ? { ...r, status: 'connected' } : r)) ); } ``` The function only checks if the cluster exists in the database, not if the Proxmox API is reachable. Status is set to 'connected' based on database presence, not actual connectivity. - [WARNING] src-tauri/src/commands/proxmox.rs:219-264 - Race condition in `get_proxmox_client_for_cluster()` Evidence: ```rust { let clusters = state.proxmox_clusters.lock().await; if let Some(client) = clusters.get(cluster_id) { return Ok(client.clone()); } } // ... create client ... { let mut clusters = state.proxmox_clusters.lock().await; clusters.insert(cluster_id.to_string(), client_arc.clone()); } ``` Two concurrent requests for an uncached cluster could both create and authenticate clients, with the second overwriting the first. This causes unnecessary authentication overhead. - [SUGGESTION] src/components/Proxmox/RemotesList.tsx:32-42 - Event listener cleanup may not work correctly Evidence: ```typescript useEffect(() => { if (!open) return; const handleClickOutside = (e: MouseEvent) => { if (menuRef.current && !menuRef.current.contains(e.target as Node)) { setOpen(false); } }; document.addEventListener('mousedown', handleClickOutside); return () => document.removeEventListener('mousedown', handleClickOutside); }, [open]); ``` The cleanup function runs on every `open` change, but the listener was only added when `open` is true. When `open` becomes false, the cleanup removes a listener that may not exist. - [SUGGESTION] src/pages/Proxmox/RemotesPage.tsx:139-151 - Edit operation has data loss risk Evidence: ```typescript await removeProxmoxCluster(config.id); await addProxmoxCluster( config.id, config.name, clusterType as ClusterType, { url: hostname, port }, config.username, config.password || '' ); ``` If `addProxmoxCluster` fails after `removeProxmoxCluster` succeeds, the remote is permanently lost. The comment acknowledges this as a "known limitation" but it's a significant UX risk. **Verdict**: APPROVE WITH COMMENTS The core functionality is sound and the fixes address real issues. The connection status misleading users and the race condition in client pooling should be addressed in follow-up work. The edit operation risk is documented but should ideally be mitigated with atomic transactions or a proper update endpoint.
sarman added 1 commit 2026-06-20 03:26:40 +00:00
fix: address PR review findings — race condition, real ping, atomic edit, listener cleanup
All checks were successful
Test / frontend-typecheck (pull_request) Successful in 2m19s
Test / frontend-tests (pull_request) Successful in 2m10s
PR Review Automation / review (pull_request) Successful in 8m50s
Test / rust-fmt-check (pull_request) Successful in 16m20s
Test / rust-clippy (pull_request) Successful in 17m51s
Test / rust-tests (pull_request) Successful in 19m20s
68439bcd64
Race condition in get_proxmox_client_for_cluster: two concurrent callers
for an uncached cluster could both authenticate and insert, with the second
overwriting the first. Re-check under write lock before inserting so the
later caller returns the already-stored client instead of overwriting it.

handleConnectRemote used getProxmoxCluster (a DB-only lookup) to set status
'connected', which passed even when the Proxmox API was unreachable. Replace
with pingProxmoxCluster, a new command that authenticates and calls
GET /api2/json/version, providing a real end-to-end connectivity test.

handleEditRemote used remove-then-add, leaving a gap where the record was
absent and silently lost if addProxmoxCluster failed. Replace with
updateProxmoxCluster, a new command that issues a single SQL UPDATE (plus
in-memory pool eviction) so the record is never transiently missing.

ActionsMenu useEffect added the mousedown listener only when open=true but
the dependency array contained open, causing ambiguity about cleanup timing.
Attach the listener unconditionally on mount (empty dep array) so there is
always exactly one add and one remove with no conditional branches.

New Rust tests cover update_proxmox_cluster not-found logic and ping error
message format (420 Rust + 386 frontend, zero failures).
Author
Owner

All four findings addressed and pushed to PR #124. Here's what was done for each:

[WARNING] Fake connection test — handleConnectRemote now calls pingProxmoxCluster, a new Tauri command that authenticates and issues GET /api2/json/version against the live Proxmox API. A DB-only lookup can no longer
produce a false 'connected' status.

[WARNING] Race condition in get_proxmox_client_for_cluster — after creating and authenticating a new client, we now re-check the pool under the write lock before inserting. If a concurrent task already stored a client,
we return the existing one and discard ours, eliminating redundant auth calls and the overwrite.

[SUGGESTION] Edit data loss risk — handleEditRemote now calls updateProxmoxCluster, a new Tauri command that issues a single UPDATE statement (and evicts the stale in-memory client). There is no window where the record
is absent.

[SUGGESTION] Event listener cleanup ambiguity — the useEffect in ActionsMenu now uses an empty dependency array so the listener is always attached once on mount and always removed on unmount, with no conditional
branching to reason about.

Final counts: 420 Rust tests, 386 frontend tests, zero failures. tsc + eslint + clippy all clean.

All four findings addressed and pushed to PR #124. Here's what was done for each: [WARNING] Fake connection test — handleConnectRemote now calls pingProxmoxCluster, a new Tauri command that authenticates and issues GET /api2/json/version against the live Proxmox API. A DB-only lookup can no longer produce a false 'connected' status. [WARNING] Race condition in get_proxmox_client_for_cluster — after creating and authenticating a new client, we now re-check the pool under the write lock before inserting. If a concurrent task already stored a client, we return the existing one and discard ours, eliminating redundant auth calls and the overwrite. [SUGGESTION] Edit data loss risk — handleEditRemote now calls updateProxmoxCluster, a new Tauri command that issues a single UPDATE statement (and evicts the stale in-memory client). There is no window where the record is absent. [SUGGESTION] Event listener cleanup ambiguity — the useEffect in ActionsMenu now uses an empty dependency array so the listener is always attached once on mount and always removed on unmount, with no conditional branching to reason about. Final counts: 420 Rust tests, 386 frontend tests, zero failures. tsc + eslint + clippy all clean.
sarman reviewed 2026-06-20 03:35:34 +00:00
sarman left a comment
Author
Owner

Automated PR Review (qwen3.5-122b-think via liteLLM):\n\n

Summary

This PR fixes critical Proxmox backend functionality by introducing a centralized client retrieval helper, corrects AI chat message ordering for LLM compatibility, adds functional remote management UI, and fixes Ceph health reporting. The changes address real issues from previous iterations.

Findings

  • [WARNING] src/pages/Proxmox/CephPage.tsx:27-35 - Ceph availability check may produce false negatives
    Evidence:

    try {
      const h = await getCephHealth(cId);
      setHealth(h);
      cephAvailable = true;
    } catch {
      setIsCephEnabled(false);
      setLoading(false);
      return;
    }
    

    If Ceph is enabled but temporarily unreachable (network issue, high load, service restarting), the error handler sets isCephEnabled(false) which displays "Ceph is not configured" message. This misleads users about actual cluster configuration state.

    Fix: Separate connection errors from configuration errors. Only set isCephEnabled(false) if the API explicitly returns "Ceph not enabled" response, not on generic network errors.

  • [SUGGESTION] src/components/Proxmox/RemotesList.tsx:32-42 - Event listener cleanup may be overly broad
    Evidence:

    useEffect(() => {
      const handleClickOutside = (e: MouseEvent) => {
        if (menuRef.current && !menuRef.current.contains(e.target as Node)) {
          setOpen(false);
        }
      };
      document.addEventListener('mousedown', handleClickOutside);
      return () => document.removeEventListener('mousedown', handleClickOutside);
    }, []);
    

    The cleanup function always removes the listener, but the listener is only added once on mount. If the component unmounts while the menu is open, the cleanup will try to remove a listener that may not exist anymore.

    Fix: Add conditional check in cleanup or track whether listener was added.

Verdict: APPROVE WITH COMMENTS

The core functionality is solid and addresses real bugs. The Ceph error handling could mislead users in edge cases but won't cause data loss. The event listener issue is minor and won't cause crashes. Both are acceptable for follow-up refinement.

Automated PR Review (qwen3.5-122b-think via liteLLM):\n\n **Summary** This PR fixes critical Proxmox backend functionality by introducing a centralized client retrieval helper, corrects AI chat message ordering for LLM compatibility, adds functional remote management UI, and fixes Ceph health reporting. The changes address real issues from previous iterations. **Findings** - [WARNING] src/pages/Proxmox/CephPage.tsx:27-35 - Ceph availability check may produce false negatives Evidence: ```typescript try { const h = await getCephHealth(cId); setHealth(h); cephAvailable = true; } catch { setIsCephEnabled(false); setLoading(false); return; } ``` If Ceph is enabled but temporarily unreachable (network issue, high load, service restarting), the error handler sets `isCephEnabled(false)` which displays "Ceph is not configured" message. This misleads users about actual cluster configuration state. Fix: Separate connection errors from configuration errors. Only set `isCephEnabled(false)` if the API explicitly returns "Ceph not enabled" response, not on generic network errors. - [SUGGESTION] src/components/Proxmox/RemotesList.tsx:32-42 - Event listener cleanup may be overly broad Evidence: ```typescript useEffect(() => { const handleClickOutside = (e: MouseEvent) => { if (menuRef.current && !menuRef.current.contains(e.target as Node)) { setOpen(false); } }; document.addEventListener('mousedown', handleClickOutside); return () => document.removeEventListener('mousedown', handleClickOutside); }, []); ``` The cleanup function always removes the listener, but the listener is only added once on mount. If the component unmounts while the menu is open, the cleanup will try to remove a listener that may not exist anymore. Fix: Add conditional check in cleanup or track whether listener was added. **Verdict**: APPROVE WITH COMMENTS The core functionality is solid and addresses real bugs. The Ceph error handling could mislead users in edge cases but won't cause data loss. The event listener issue is minor and won't cause crashes. Both are acceptable for follow-up refinement.
sarman merged commit cb770661d7 into beta 2026-06-20 13:02:36 +00:00
sarman deleted branch fix/proxmox-v1.2.2-consolidated 2026-06-20 13:02:36 +00:00
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: sarman/tftsr-devops_investigation#124
No description provided.