Some checks failed
Test / rust-fmt-check (pull_request) Failing after 0s
Test / rust-clippy (pull_request) Failing after 1s
Test / rust-tests (pull_request) Failing after 0s
Test / frontend-typecheck (pull_request) Failing after 16s
Test / frontend-tests (pull_request) Failing after 18s
PR Review Automation / review (pull_request) Failing after 4m13s
Complete backport of all features from apollo_nxt-trcaa repository: - Three-tier shell execution safety system (Tier 1: auto, Tier 2: approve, Tier 3: deny) - Ollama function calling with tool use support - AI provider tool calling auto-detection - kubectl binary bundling and management - kubeconfig upload and context management - Shell approval modal with real-time UI - MCP protocol HTTP transport with custom headers - Enhanced security audit logging - Comprehensive test coverage (275+ tests) - Updated CI/CD workflows for Gitea Actions - Complete documentation (ADRs, wiki, release notes) Sanitization applied to all files: - Removed all MSI, Motorola, VNXT, Vesta references - Replaced internal infrastructure references with TFTSR equivalents - Updated all URLs and API endpoints - Sanitized commit history references in documentation Technical changes: - New modules: shell/classifier, shell/executor, shell/kubectl, shell/kubeconfig - Enhanced AI providers: ollama.rs, openai.rs with function calling - New Tauri commands: shell execution, kubeconfig management, tool calling detection - Database migrations: shell_execution_audit table - Frontend: ShellApprovalModal, ShellExecution, KubeconfigManager pages - CI/CD: kubectl bundling, multi-platform builds, Gitea Actions integration Version: 1.0.8 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
176 lines
6.1 KiB
Markdown
176 lines
6.1 KiB
Markdown
# v1.0.5 Release Summary
|
|
|
|
**Date**: June 3, 2026
|
|
**PR**: [#39](https://github.com/tftsr/apollo_nxt-trcaa/pull/39)
|
|
**ADO**: [#727547](https://dev.azure.com/tftsr/Apollo/_workitems/edit/727547)
|
|
**Status**: In Review
|
|
|
|
---
|
|
|
|
## Description
|
|
|
|
Post-hackathon fixes addressing agent output quality issues and provider compatibility documentation.
|
|
|
|
---
|
|
|
|
## Acceptance Criteria
|
|
|
|
- [x] Ollama no longer echoes raw JSON tool call payloads to users
|
|
- [x] LiteLLM diagnostic queries execute actual commands instead of status JSON
|
|
- [x] TFTSR GenAI incompatibility documented with recommendations
|
|
- [x] All tests passing (280 Rust, 103 frontend)
|
|
- [x] All linting clean (clippy, TypeScript)
|
|
|
|
---
|
|
|
|
## Work Implemented
|
|
|
|
### Issue 1: Verbose JSON Output (Ollama)
|
|
|
|
**Problem**: Agent was echoing tool call requests and responses to users in JSON format:
|
|
```
|
|
Let's execute a kubectl command:
|
|
|
|
{"requesting_agent": "devops-incident-responder", "request_type": "execute_shell_command", ...}
|
|
|
|
Response:
|
|
{"stdout": [...]}
|
|
```
|
|
|
|
**Root Cause**: Agent prompt didn't explicitly prohibit showing tool call JSON to users.
|
|
|
|
**Fix**: Added CRITICAL instruction in `devops_incident_responder.md`:
|
|
> Never echo tool call requests or responses in your user-facing output. When you invoke execute_shell_command, DO NOT show the JSON request payload to the user. After receiving the tool result, present ONLY the meaningful output in natural language or formatted results.
|
|
|
|
### Issue 2: No Actual Investigation (LiteLLM)
|
|
|
|
**Problem**: Diagnostic queries like "investigate telemetry issues" returned status JSON objects without executing commands:
|
|
```json
|
|
{
|
|
"agent": "devops-incident-responder",
|
|
"status": "investigating",
|
|
"progress": {"phase": "Phase 1: Detection & Evidence Gathering", ...}
|
|
}
|
|
```
|
|
|
|
**Root Cause**: Agent treated diagnostic investigations as status updates rather than actionable tasks.
|
|
|
|
**Fix**: Strengthened Diagnostic Investigation section:
|
|
- Added CRITICAL: Actually execute the diagnostic commands via execute_shell_command tool
|
|
- Added explicit instruction: DO NOT just output status JSON
|
|
- Added warning: Outputting status JSON instead of executing commands is a critical failure
|
|
- Clarified examples to include "Investigate telemetry issues"
|
|
|
|
### Issue 3: TFTSR GenAI Tool Calling Incompatibility
|
|
|
|
**Problem**: TFTSR GenAI gateway returns:
|
|
```
|
|
503 Service Unavailable: {"status":false,"msg":"Gemini Filter Triggered: UNEXPECTED_TOOL_CALL"}
|
|
```
|
|
|
|
**Root Cause**: Gateway-level content filtering blocks tool calls before they reach the client. The workaround parser in PR#38 cannot overcome this because the filtering happens at the gateway layer.
|
|
|
|
**Fix**: Documented in `docs/wiki/AI-Providers.md`:
|
|
- Created dedicated "TFTSR GenAI" section
|
|
- Documented limitations:
|
|
- ❌ Tool calling not supported
|
|
- ❌ Shell execution unavailable
|
|
- ✅ Basic chat works
|
|
- ✅ Workaround parser included (attempts to parse malformed responses)
|
|
- Recommended alternatives: LiteLLM + AWS Bedrock or Ollama
|
|
- Explained root cause: Gateway-level filtering cannot be worked around from client side
|
|
|
|
---
|
|
|
|
## Testing Needed
|
|
|
|
### Automated Tests
|
|
- [x] Rust unit tests: 280 passing
|
|
- [x] Frontend tests: 103 passing
|
|
- [x] Clippy: clean
|
|
- [x] TypeScript: clean
|
|
|
|
### Manual Tests
|
|
- [ ] **Ollama Simple Query**: Verify no JSON output shown to user
|
|
- Prompt: "What pods are running in default namespace?"
|
|
- Expected: Clean output without `{"requesting_agent": ...}` JSON
|
|
|
|
- [ ] **LiteLLM Diagnostic Query**: Verify commands are executed
|
|
- Prompt: "Investigate why telemetry data is not being collected"
|
|
- Expected: kubectl commands executed (get pods, describe, logs)
|
|
- Not expected: Status JSON object without command execution
|
|
|
|
- [ ] **TFTSR GenAI Error**: Verify documented error appears
|
|
- Any prompt with configured TFTSR GenAI provider
|
|
- Expected: 503 error with "Gemini Filter Triggered"
|
|
- Check: Error message helps user understand limitation
|
|
|
|
---
|
|
|
|
## Files Changed
|
|
|
|
| File | Changes |
|
|
|------|---------|
|
|
| `src-tauri/src/ai/agents/devops_incident_responder.md` | Added 3 CRITICAL instructions to suppress JSON output and enforce command execution |
|
|
| `docs/wiki/AI-Providers.md` | Added TFTSR GenAI section documenting tool calling incompatibility |
|
|
| `src-tauri/Cargo.toml` | Version bump to 1.0.5 |
|
|
| `src-tauri/tauri.conf.json` | Version bump to 1.0.5 |
|
|
| `package.json` | Version bump to 1.0.5 |
|
|
| `docs/v1.0.5-summary.md` | This release summary document |
|
|
| `docs/2026-HACKATHON-SUMMARY.md` | Added v1.0.5 section, Challenges 11-12, updated metrics |
|
|
|
|
**Total**: 7 files, +268 lines, -17 lines
|
|
|
|
---
|
|
|
|
## Impact Analysis
|
|
|
|
### User Experience
|
|
- **Positive**: Cleaner, more readable agent responses (no raw JSON)
|
|
- **Positive**: Diagnostic queries now produce actual investigation results
|
|
- **Positive**: Clear documentation prevents TFTSR GenAI tool calling confusion
|
|
|
|
### Performance
|
|
- **Neutral**: No performance impact (prompt changes only)
|
|
|
|
### Security
|
|
- **Neutral**: No security implications
|
|
|
|
### Compatibility
|
|
- **Positive**: All existing providers maintain compatibility
|
|
- **Documentation**: TFTSR GenAI limitations now clearly documented
|
|
|
|
---
|
|
|
|
## Related Work
|
|
|
|
- **v1.0.4 (PR #38)**: Graceful exit on tool iteration limit, TFTSR GenAI workaround parser
|
|
- **v1.0.3 (PR #37)**: Query classification (Simple/Diagnostic/Incident)
|
|
- **v1.0.2 (PR #31)**: LiteLLM integration, Ollama auto-start
|
|
- **v1.0.0 (PR #27, #28)**: Initial agentic shell execution
|
|
|
|
---
|
|
|
|
## Deployment Notes
|
|
|
|
No special deployment requirements. Changes are backward-compatible agent prompt updates.
|
|
|
|
---
|
|
|
|
## Lessons Learned
|
|
|
|
1. **Explicit instructions required**: Agent prompts need explicit prohibitions, not just positive instructions
|
|
2. **Status updates vs. actions**: Agents may confuse reporting status with taking action unless clearly directed
|
|
3. **Gateway limitations**: Some infrastructure limitations (TFTSR GenAI filtering) cannot be worked around at the client level
|
|
4. **Testing depth**: Need better manual test cases for agent behavior quality beyond unit tests
|
|
|
|
---
|
|
|
|
## Next Steps
|
|
|
|
After merge:
|
|
1. Update hackathon summary with v1.0.5 details
|
|
2. Test on macOS build when available
|
|
3. Monitor for any remaining agent behavior issues
|
|
4. Consider adding automated tests for agent output quality
|