Complete backport of all features from apollo_nxt-trcaa repository: - Three-tier shell execution safety system (Tier 1: auto, Tier 2: approve, Tier 3: deny) - Ollama function calling with tool use support - AI provider tool calling auto-detection - kubectl binary bundling and management - kubeconfig upload and context management - Shell approval modal with real-time UI - MCP protocol HTTP transport with custom headers - Enhanced security audit logging - Comprehensive test coverage (275+ tests) - Updated CI/CD workflows for Gitea Actions - Complete documentation (ADRs, wiki, release notes) Sanitization applied to all files: - Removed all MSI, Motorola, VNXT, Vesta references - Replaced internal infrastructure references with TFTSR equivalents - Updated all URLs and API endpoints - Sanitized commit history references in documentation Technical changes: - New modules: shell/classifier, shell/executor, shell/kubectl, shell/kubeconfig - Enhanced AI providers: ollama.rs, openai.rs with function calling - New Tauri commands: shell execution, kubeconfig management, tool calling detection - Database migrations: shell_execution_audit table - Frontend: ShellApprovalModal, ShellExecution, KubeconfigManager pages - CI/CD: kubectl bundling, multi-platform builds, Gitea Actions integration Version: 1.0.8 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
6.1 KiB
v1.0.5 Release Summary
Date: June 3, 2026
PR: #39
ADO: #727547
Status: In Review
Description
Post-hackathon fixes addressing agent output quality issues and provider compatibility documentation.
Acceptance Criteria
- Ollama no longer echoes raw JSON tool call payloads to users
- LiteLLM diagnostic queries execute actual commands instead of status JSON
- TFTSR GenAI incompatibility documented with recommendations
- All tests passing (280 Rust, 103 frontend)
- All linting clean (clippy, TypeScript)
Work Implemented
Issue 1: Verbose JSON Output (Ollama)
Problem: Agent was echoing tool call requests and responses to users in JSON format:
Let's execute a kubectl command:
{"requesting_agent": "devops-incident-responder", "request_type": "execute_shell_command", ...}
Response:
{"stdout": [...]}
Root Cause: Agent prompt didn't explicitly prohibit showing tool call JSON to users.
Fix: Added CRITICAL instruction in devops_incident_responder.md:
Never echo tool call requests or responses in your user-facing output. When you invoke execute_shell_command, DO NOT show the JSON request payload to the user. After receiving the tool result, present ONLY the meaningful output in natural language or formatted results.
Issue 2: No Actual Investigation (LiteLLM)
Problem: Diagnostic queries like "investigate telemetry issues" returned status JSON objects without executing commands:
{
"agent": "devops-incident-responder",
"status": "investigating",
"progress": {"phase": "Phase 1: Detection & Evidence Gathering", ...}
}
Root Cause: Agent treated diagnostic investigations as status updates rather than actionable tasks.
Fix: Strengthened Diagnostic Investigation section:
- Added CRITICAL: Actually execute the diagnostic commands via execute_shell_command tool
- Added explicit instruction: DO NOT just output status JSON
- Added warning: Outputting status JSON instead of executing commands is a critical failure
- Clarified examples to include "Investigate telemetry issues"
Issue 3: TFTSR GenAI Tool Calling Incompatibility
Problem: TFTSR GenAI gateway returns:
503 Service Unavailable: {"status":false,"msg":"Gemini Filter Triggered: UNEXPECTED_TOOL_CALL"}
Root Cause: Gateway-level content filtering blocks tool calls before they reach the client. The workaround parser in PR#38 cannot overcome this because the filtering happens at the gateway layer.
Fix: Documented in docs/wiki/AI-Providers.md:
- Created dedicated "TFTSR GenAI" section
- Documented limitations:
- ❌ Tool calling not supported
- ❌ Shell execution unavailable
- ✅ Basic chat works
- ✅ Workaround parser included (attempts to parse malformed responses)
- Recommended alternatives: LiteLLM + AWS Bedrock or Ollama
- Explained root cause: Gateway-level filtering cannot be worked around from client side
Testing Needed
Automated Tests
- Rust unit tests: 280 passing
- Frontend tests: 103 passing
- Clippy: clean
- TypeScript: clean
Manual Tests
-
Ollama Simple Query: Verify no JSON output shown to user
- Prompt: "What pods are running in default namespace?"
- Expected: Clean output without
{"requesting_agent": ...}JSON
-
LiteLLM Diagnostic Query: Verify commands are executed
- Prompt: "Investigate why telemetry data is not being collected"
- Expected: kubectl commands executed (get pods, describe, logs)
- Not expected: Status JSON object without command execution
-
TFTSR GenAI Error: Verify documented error appears
- Any prompt with configured TFTSR GenAI provider
- Expected: 503 error with "Gemini Filter Triggered"
- Check: Error message helps user understand limitation
Files Changed
| File | Changes |
|---|---|
src-tauri/src/ai/agents/devops_incident_responder.md |
Added 3 CRITICAL instructions to suppress JSON output and enforce command execution |
docs/wiki/AI-Providers.md |
Added TFTSR GenAI section documenting tool calling incompatibility |
src-tauri/Cargo.toml |
Version bump to 1.0.5 |
src-tauri/tauri.conf.json |
Version bump to 1.0.5 |
package.json |
Version bump to 1.0.5 |
docs/v1.0.5-summary.md |
This release summary document |
docs/2026-HACKATHON-SUMMARY.md |
Added v1.0.5 section, Challenges 11-12, updated metrics |
Total: 7 files, +268 lines, -17 lines
Impact Analysis
User Experience
- Positive: Cleaner, more readable agent responses (no raw JSON)
- Positive: Diagnostic queries now produce actual investigation results
- Positive: Clear documentation prevents TFTSR GenAI tool calling confusion
Performance
- Neutral: No performance impact (prompt changes only)
Security
- Neutral: No security implications
Compatibility
- Positive: All existing providers maintain compatibility
- Documentation: TFTSR GenAI limitations now clearly documented
Related Work
- v1.0.4 (PR #38): Graceful exit on tool iteration limit, TFTSR GenAI workaround parser
- v1.0.3 (PR #37): Query classification (Simple/Diagnostic/Incident)
- v1.0.2 (PR #31): LiteLLM integration, Ollama auto-start
- v1.0.0 (PR #27, #28): Initial agentic shell execution
Deployment Notes
No special deployment requirements. Changes are backward-compatible agent prompt updates.
Lessons Learned
- Explicit instructions required: Agent prompts need explicit prohibitions, not just positive instructions
- Status updates vs. actions: Agents may confuse reporting status with taking action unless clearly directed
- Gateway limitations: Some infrastructure limitations (TFTSR GenAI filtering) cannot be worked around at the client level
- Testing depth: Need better manual test cases for agent behavior quality beyond unit tests
Next Steps
After merge:
- Update hackathon summary with v1.0.5 details
- Test on macOS build when available
- Monitor for any remaining agent behavior issues
- Consider adding automated tests for agent output quality