dgx-spark-playbooks/nvidia/station-healthcare-agent/assets/RUNBOOK.md
2026-05-26 18:25:53 +00:00

450 lines
18 KiB
Markdown

# Technical Runbook
Installation, configuration, troubleshooting, and operational details for the Clinical Intelligence Playbook.
> [!IMPORTANT]
> The supported install path is **Docker Ollama via `make up`** (see `instructions.md` Step 3 of the parent playbook, or `SETUP-GUIDE.md`). The manual steps below are for advanced users who want to wire components by hand or substitute a host Ollama install. If host Ollama is already running on port 11434, stop it (`sudo systemctl stop ollama`) or override `OLLAMA_PORT` in `.env` before starting Docker Ollama.
---
## Quick Start (OpenShell Sandbox on GB300)
This sets up an isolated OpenShell sandbox with OpenClaw and a local Ollama inference backend. Everything the agent touches — filesystem, network, processes — is confined to the sandbox.
**Requirements:** DGX Station GB300, Python 3.10+, `uv`, Docker + NVIDIA Container Toolkit, OpenShell CLI ≥ 0.0.33, Node.js 22+ LTS.
### 1. Install OpenShell
```bash
uv pip install openshell --upgrade --pre \
--index-url https://urm.nvidia.com/artifactory/api/pypi/nv-shared-pypi/simple
```
### 2. Start Ollama (Docker, recommended)
The bundled `docker-compose.yml` runs Ollama in a container, binds it to `0.0.0.0:${OLLAMA_PORT:-11434}`, and pins it to a single GPU (`LLM_GPU` in `.env`):
```bash
make up # docker compose up -d ollama openfold3 + model-pull
make status # confirm Ollama and OpenFold3 are healthy
```
> [!TIP]
> **Host Ollama alternative.** If you prefer running Ollama on the host (e.g., to share with another playbook), `sudo systemctl stop ollama` first or set `OLLAMA_PORT` in `.env` to a free port. Then `OLLAMA_HOST=0.0.0.0 ollama serve` and `ollama pull nemotron-3-super:120b-a12b`. The host's systemd unit binds to `127.0.0.1` by default — override with `Environment="OLLAMA_HOST=0.0.0.0"` in `/etc/systemd/system/ollama.service.d/override.conf` so the sandbox can reach Ollama via the Docker bridge.
### 3. Pull Model (Docker path runs this automatically)
`make up` invokes the `model-pull` service, which pulls `${OLLAMA_MODEL:-nemotron-3-super:120b-a12b}` (~86 GB) on first run. Subsequent runs skip if the model is cached.
If you opted into host Ollama, run `ollama pull nemotron-3-super:120b-a12b` manually.
### 4. Configure OpenShell Inference Provider
`make setup` runs this for you (and detects the Docker bridge IP automatically). To do it manually, replace `<HOST_IP>` with your Docker bridge IP (`ip -4 addr show docker0 | grep -oP 'inet \K[\d.]+'`):
```bash
openshell provider create \
--name ollama-local \
--type openai \
--credential OPENAI_API_KEY=ollama \
--config OPENAI_BASE_URL=http://<HOST_IP>:${OLLAMA_PORT:-11434}/v1
openshell inference set \
--provider ollama-local \
--model nemotron-3-super:120b-a12b
```
> [!NOTE]
> Current OpenShell releases do not accept the `--base-url` shorthand for `provider create`. Use `--config OPENAI_BASE_URL=...` as shown above. The `setup_sandbox.sh` script uses this form.
### 5. Generate the Sandbox Policy
The repo's `sandbox-policy.yaml` is a **template**: it contains a `__DOCKER_BRIDGE_IP__` placeholder that must be substituted with the host's `docker0` IP before the policy is valid. The helper `scripts/gen_sandbox_policy.sh` does this automatically and writes `sandbox-policy-local.yaml`:
```bash
bash scripts/gen_sandbox_policy.sh # writes sandbox-policy-local.yaml
```
`make setup` invokes this generator. If you are running steps by hand, always pass the **generated** file (`sandbox-policy-local.yaml`) to `openshell sandbox create`, not the template.
Verify the policy is working after sandbox creation:
```bash
# These should succeed:
curl -sk https://r4.smarthealthit.org/Patient?_count=1 | head -3 # FHIR
curl -sk https://inference.local/v1/models | head -3 # LLM
# These should fail (blocked):
curl -sk --max-time 5 https://google.com # denied
curl -sk --max-time 5 https://api.openai.com # denied
ping 8.8.8.8 -c 1 # Operation not permitted
```
### 6. Create the Sandbox
```bash
# Stop any stale forwards from a prior sandbox using the same port — these
# block re-creation with a "port already forwarded" error.
for s in $(openshell forward list 2>/dev/null | awk '/:18789 /{print $NF}' | sort -u); do
openshell forward stop 18789 "$s" 2>/dev/null || true
done
openshell sandbox create \
--policy sandbox-policy-local.yaml \
--provider ollama-local \
--forward 18789 \
--keep
```
`--keep` prevents the sandbox from being torn down on exit so you can re-enter it later.
### 7. Inside the Sandbox: Set Up OpenClaw
> **Note:** These manual steps (7-9) are automated by `make setup`. Use them only if you need to customize individual steps.
Everything from here runs **inside** the sandbox shell that OpenShell drops you into.
```bash
git clone https://github.com/jaival-nvidia/clinical-intelligence.git
cd clinical-intelligence
uv pip install pandas matplotlib
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
export NVM_DIR="$HOME/.nvm" && [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
nvm install --lts
npm install -g openclaw@latest
```
### 8. Inside the Sandbox: Configure Inference Provider
OpenShell maps the host Ollama to `inference.local` inside the sandbox. Save this as `configure_inference_local.py` and run it to register a custom OpenClaw provider:
```python
#!/usr/bin/env python3
"""configure_inference_local.py — patches OpenClaw config for OpenShell sandbox."""
import json
from pathlib import Path
config_path = Path.home() / ".openclaw" / "openclaw.json"
config_path.parent.mkdir(parents=True, exist_ok=True)
config = json.loads(config_path.read_text()) if config_path.exists() else {}
config.setdefault("providers", {})
config["providers"]["local-ollama"] = {
"type": "openai-compatible",
"baseUrl": "https://inference.local/v1",
"apiKey": "ollama",
"models": ["nemotron-3-super:120b-a12b"]
}
defaults = config.setdefault("agents", {}).setdefault("defaults", {})
defaults["model"] = "local-ollama/nemotron-3-super:120b-a12b"
defaults["timeoutSeconds"] = 600
sub = defaults.setdefault("subagents", {})
sub.update({"maxSpawnDepth": 2, "maxConcurrent": 4, "runTimeoutSeconds": 600})
config_path.write_text(json.dumps(config, indent=2))
print(f"Wrote {config_path}")
print("Provider: local-ollama -> https://inference.local/v1")
print("Model: nemotron-3-super:120b-a12b")
```
```bash
python3 configure_inference_local.py
```
### 9. Inside the Sandbox: Install Skills and Agents
```bash
# Deploy all 7 skills to ~/.openclaw/workspace/skills/ (matching setup_sandbox.sh)
mkdir -p ~/.openclaw/workspace/skills
cp -r skills/fhir-basics skills/clinical-knowledge skills/analysis-methods \
skills/case-summary skills/cohort-compare skills/molecular-viz \
skills/clinical-delegation ~/.openclaw/workspace/skills/
# Register sub-agents
for agent in patient-data labs-vitals medications analyst molecular; do
mkdir -p ~/.openclaw/workspaces/$agent
cp agents/${agent}-agent.md ~/.openclaw/workspaces/$agent/AGENTS.md
openclaw agents add $agent \
--workspace ~/.openclaw/workspaces/$agent \
--model local-ollama/nemotron-3-super:120b-a12b \
--non-interactive
done
# TOOLS.md for sub-agents
for agent in patient-data labs-vitals medications analyst molecular; do
cat > ~/.openclaw/workspaces/$agent/TOOLS.md << 'TOOLSEOF'
# Tools
Use the `exec` tool to run Python scripts against the FHIR server.
Call exec with: cat > /tmp/script.py << 'PYEOF'
import subprocess, json
r = subprocess.run(["curl", "-sf", "--max-time", "30", "https://r4.smarthealthit.org/Patient?_count=1"],
capture_output=True, text=True, timeout=35)
data = json.loads(r.stdout)
# your code here
PYEOF
python3 /tmp/script.py
Always use `exec` to run code. Do NOT use `write` for scripts.
Use subprocess.run(["curl", ...]) + json.loads() for HTTP requests. Do NOT use the requests library.
Print all results to stdout.
TOOLSEOF
done
# Auth profiles (provider name must match: local-ollama)
AUTH='{"version":1,"profiles":{"ollama":{"type":"api_key","provider":"local-ollama","key":"ollama"}}}'
for agent in main patient-data labs-vitals medications analyst molecular; do
mkdir -p ~/.openclaw/agents/$agent/agent
echo "$AUTH" > ~/.openclaw/agents/$agent/agent/auth-profiles.json
done
# Workspace stubs
for agent in patient-data labs-vitals medications analyst molecular; do
echo "# Identity" > ~/.openclaw/workspaces/$agent/IDENTITY.md
echo "# User" > ~/.openclaw/workspaces/$agent/USER.md
done
echo -e "# Identity\nClinical Intelligence Coordinator" > ~/.openclaw/workspace/IDENTITY.md
echo "# User" > ~/.openclaw/workspace/USER.md
# Allow coordinator to spawn sub-agents + disable unused skills
python3 -c "
import json
with open('$HOME/.openclaw/openclaw.json') as f: d = json.load(f)
for a in d.get('agents', {}).get('list', []):
if a['id'] == 'main': a['subagents'] = {'allowAgents': ['*']}
d.setdefault('tools', {})['sessions'] = {'visibility': 'all'}
d.setdefault('skills', {})['entries'] = {
'weather': {'enabled': False}, 'tmux': {'enabled': False},
'healthcheck': {'enabled': False}, 'gh-issues': {'enabled': False},
'skill-creator': {'enabled': False}, 'github': {'enabled': False}
}
with open('$HOME/.openclaw/openclaw.json', 'w') as f: json.dump(d, f, indent=2)
"
```
### 10. Verify Inside the Sandbox
```bash
# Verify inference routing
curl -sk https://inference.local/v1/models | head -3
# Verify FHIR access
curl -sk https://r4.smarthealthit.org/Patient?_count=1 | head -3
# Verify deny (should timeout or fail)
curl -sk --max-time 5 https://google.com
# Run a test
openclaw agent --local --session-id smoke --thinking off \
--message "Say hello in one sentence" --timeout 120
```
---
## Compatible Models
Any Ollama model with native tool calling works. Tested:
- **nemotron-3-super:120b-a12b** — recommended, MoE, fast inference
- **qwen3:235b** — larger, slower, reliable
- **qwen3-coder:480b** — best for code generation
- **qwen3:32b** — for constrained environments
- **qwen2.5:72b** — proven stable, good tool calling
To switch: `ollama pull <model>`, then `openshell cluster inference set --provider ollama-local --model <model>`, then inside the sandbox: `openclaw config set agents.defaults.model local-ollama/<model>`.
---
## OpenShell Sandbox Security
OpenShell sandboxes use an **implicit-deny** model. All network egress, filesystem writes, and resource usage are blocked by default. The `sandbox-policy.yaml` explicitly whitelists what the sandbox can reach.
**In practice:**
- The agent **cannot** reach the internet except for the FHIR endpoint in the policy.
- The agent **cannot** exfiltrate data — no outbound HTTP, no DNS to arbitrary hosts.
- LLM inference routes through `inference.local`, a OpenShell-managed bridge to the host Ollama. This never leaves the machine.
- Filesystem writes are confined to `/tmp`, `/home/sandbox`, and `~/.openclaw`.
- Resource caps prevent runaway processes from consuming the host.
### Modifying the Policy
To allow additional FHIR endpoints (e.g., a hospital server), add entries under `network_policies` in `sandbox-policy.yaml`:
```yaml
hospital_fhir:
name: hospital_fhir
endpoints:
- host: fhir.your-hospital.org
port: 443
protocol: https
description: Hospital FHIR R4 endpoint
```
Then recreate the sandbox:
```bash
openshell sandbox delete <sandbox-name>
openshell sandbox create --policy sandbox-policy.yaml --provider ollama-local --forward 18789 --keep
```
### Audit
OpenShell logs all network connections attempted by the sandbox, including blocked ones:
```bash
openshell sandbox logs <sandbox-name> --filter network
```
---
## Running Workflows
All `openclaw` commands below run **inside the OpenShell sandbox**.
```bash
openshell sandbox connect <sandbox-name>
```
### CMS122 Diabetes Quality Gap
```bash
openclaw agent --local --session-id test-cms122 \
--thinking off \
--message "Find all diabetic patients, get their latest A1c and medications. Identify gap patients with A1c above 9% not on insulin or GLP-1. Show the A1c distribution as a histogram." \
--timeout 600
```
Validate: `python3 scripts/validate_and_run.py --validate-only /tmp/cms122.py`
### Patient Case Summary
```bash
openclaw agent --local --session-id test-case \
--thinking off \
--message "Look up the first patient. Compile a case summary: demographics, conditions, recent labs (flag abnormal), and medications." \
--timeout 600
```
### Ad-Hoc Cross-Reference
```bash
openclaw agent --local --session-id test-adhoc \
--thinking off \
--message "Which patients have both diabetes and hypertension? For the overlap, get their latest HbA1c and blood pressure." \
--timeout 600
```
### Conversational Follow-Up Test
Three turns in the same session (tests context persistence):
```bash
openclaw agent --local --session-id follow-up-test --thinking off \
--message "Find all diabetic patients. Print the count and list their IDs." \
--timeout 600
openclaw agent --local --session-id follow-up-test --thinking off \
--message "From those diabetic patients, which ones also have hypertension? Intersect the two groups." \
--timeout 600
openclaw agent --local --session-id follow-up-test --thinking off \
--message "For the patients with both diabetes and hypertension, get their latest HbA1c and eGFR. Flag anyone with eGFR below 60 as CKD risk." \
--timeout 600
```
### Bulk Validation
```bash
for f in /tmp/cms122.py /tmp/case.py /tmp/adhoc.py /tmp/step1.py /tmp/step2.py /tmp/step3.py; do
echo "--- $f ---"
python3 scripts/validate_and_run.py --validate-only "$f"
done
```
---
## Connecting a Hospital FHIR Endpoint
1. Get OAuth2 credentials from your integration team (client_id, token_endpoint, scopes `patient/*.read`).
2. Add the hospital host to `sandbox-policy.yaml` under `network_policies` and recreate the sandbox.
3. Specify the FHIR URL directly in your prompt instead of `https://r4.smarthealthit.org`.
4. Add an auth helper to `skills/fhir-basics/SKILL.md` for token-based requests.
5. Test: `python3 scripts/test-fhir.py --url https://your-hospital.org/fhir --token YOUR_TOKEN`
---
## Troubleshooting
### Sandbox Provisioning Failures
| Symptom | Fix |
|---------|-----|
| `sandbox create` hangs | Check host network; `openshell sandbox create --verbose` for pull progress |
| `failed to pull image` | GHCR rate limit — `docker login ghcr.io` with a PAT, or wait and retry |
| `policy validation error` | Check YAML syntax in `sandbox-policy.yaml`; recreate the sandbox |
| `provider not found` | `openshell provider list` — confirm `ollama-local` exists |
### inference.local Not Resolving
| Check | Fix |
|-------|-----|
| Ollama listening on 0.0.0.0? | `ss -tlnp \| grep 11434` on host. Docker Ollama (default): `make down && make up`. Host Ollama: add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service.d/override.conf` and `sudo systemctl restart ollama`. |
| Provider base URL correct? | `openshell provider show ollama-local` — must use host IP, not `127.0.0.1` |
| DNS working inside sandbox? | `nslookup inference.local` inside sandbox; recreate sandbox if broken |
| Firewall blocking bridge? | `curl -sk https://inference.local/v1/models` inside sandbox; check host firewall |
### LLM Connection Failures
| Check | Fix |
|-------|-----|
| Ollama running? | `curl -sk https://inference.local/v1/models` — start Ollama on host if down |
| Model pulled? | `ollama list` on host — `ollama pull nemotron-3-super:120b-a12b` if missing |
| Model loaded? | `ollama ps` on host — warmup: `curl -sk https://inference.local/v1/chat/completions -d '{"model":"nemotron-3-super:120b-a12b","messages":[{"role":"user","content":"ping"}]}'` |
### OpenClaw Auth Errors
"No API key found for provider" — Create auth profiles as shown in step 9. Provider name must match `local-ollama`.
"Profile ollama timed out" — Model unloaded. Warm it up from inside the sandbox: `curl -sk https://inference.local/v1/chat/completions -d '{"model":"nemotron-3-super:120b-a12b","messages":[{"role":"user","content":"ping"}]}' > /dev/null`
### FHIR Empty Results
| Check | Fix |
|-------|-----|
| Reachable from sandbox? | `curl -s https://r4.smarthealthit.org/metadata \| head -20` inside sandbox |
| Blocked by policy? | `openshell sandbox logs <sandbox-name> --filter network` — look for denied connections |
| SNOMED code format? | Use bare codes: `code=44054006`. The test server returns empty results with system-qualified URIs. |
| Page size? | Ensure `_count=200` in cohort queries |
### BP Values Always Null
Blood pressure is stored as a component Observation (LOINC 85354-9). The `fhir-basics` skill includes `get_bp()` that handles both panel and standalone formats. If the LLM queries `8480-6` directly without checking the panel, regenerate.
### Slow Performance
| Bottleneck | Fix |
|-----------|-----|
| LLM generation > 120s | Check GPU utilization with `nvidia-smi` on host; model may be on CPU |
| FHIR queries > 2s/patient | Test server can be slow; use offline mode for demos |
| Model keeps unloading | Set `keep_alive` to 120m in warmup curl |
| First sandbox command slow | One-time init cost; subsequent commands are fast |
---
## Managing Sandboxes
```bash
openshell sandbox list # list running sandboxes
openshell sandbox connect <sandbox-name> # re-enter existing sandbox
openshell sandbox logs <sandbox-name> # view sandbox logs
openshell sandbox delete <sandbox-name> # tear down (after policy changes, recreate)
```
---
## Performance Notes
Performance depends on the model, hardware, and FHIR server responsiveness. LLM generation time typically dominates over FHIR query time. Larger cohorts take longer due to per-patient REST queries; production deployments should use Bulk FHIR for populations over 200.