mirror of
https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-06-23 14:49:31 +00:00
450 lines
18 KiB
Markdown
450 lines
18 KiB
Markdown
# Technical Runbook
|
|
|
|
Installation, configuration, troubleshooting, and operational details for the Clinical Intelligence Playbook.
|
|
|
|
> [!IMPORTANT]
|
|
> The supported install path is **Docker Ollama via `make up`** (see `instructions.md` Step 3 of the parent playbook, or `SETUP-GUIDE.md`). The manual steps below are for advanced users who want to wire components by hand or substitute a host Ollama install. If host Ollama is already running on port 11434, stop it (`sudo systemctl stop ollama`) or override `OLLAMA_PORT` in `.env` before starting Docker Ollama.
|
|
|
|
---
|
|
|
|
## Quick Start (OpenShell Sandbox on GB300)
|
|
|
|
This sets up an isolated OpenShell sandbox with OpenClaw and a local Ollama inference backend. Everything the agent touches — filesystem, network, processes — is confined to the sandbox.
|
|
|
|
**Requirements:** DGX Station GB300, Python 3.10+, `uv`, Docker + NVIDIA Container Toolkit, OpenShell CLI ≥ 0.0.33, Node.js 22+ LTS.
|
|
|
|
### 1. Install OpenShell
|
|
|
|
```bash
|
|
uv pip install openshell --upgrade --pre \
|
|
--index-url https://urm.nvidia.com/artifactory/api/pypi/nv-shared-pypi/simple
|
|
```
|
|
|
|
### 2. Start Ollama (Docker, recommended)
|
|
|
|
The bundled `docker-compose.yml` runs Ollama in a container, binds it to `0.0.0.0:${OLLAMA_PORT:-11434}`, and pins it to a single GPU (`LLM_GPU` in `.env`):
|
|
|
|
```bash
|
|
make up # docker compose up -d ollama openfold3 + model-pull
|
|
make status # confirm Ollama and OpenFold3 are healthy
|
|
```
|
|
|
|
> [!TIP]
|
|
> **Host Ollama alternative.** If you prefer running Ollama on the host (e.g., to share with another playbook), `sudo systemctl stop ollama` first or set `OLLAMA_PORT` in `.env` to a free port. Then `OLLAMA_HOST=0.0.0.0 ollama serve` and `ollama pull nemotron-3-super:120b-a12b`. The host's systemd unit binds to `127.0.0.1` by default — override with `Environment="OLLAMA_HOST=0.0.0.0"` in `/etc/systemd/system/ollama.service.d/override.conf` so the sandbox can reach Ollama via the Docker bridge.
|
|
|
|
### 3. Pull Model (Docker path runs this automatically)
|
|
|
|
`make up` invokes the `model-pull` service, which pulls `${OLLAMA_MODEL:-nemotron-3-super:120b-a12b}` (~86 GB) on first run. Subsequent runs skip if the model is cached.
|
|
|
|
If you opted into host Ollama, run `ollama pull nemotron-3-super:120b-a12b` manually.
|
|
|
|
### 4. Configure OpenShell Inference Provider
|
|
|
|
`make setup` runs this for you (and detects the Docker bridge IP automatically). To do it manually, replace `<HOST_IP>` with your Docker bridge IP (`ip -4 addr show docker0 | grep -oP 'inet \K[\d.]+'`):
|
|
|
|
```bash
|
|
openshell provider create \
|
|
--name ollama-local \
|
|
--type openai \
|
|
--credential OPENAI_API_KEY=ollama \
|
|
--config OPENAI_BASE_URL=http://<HOST_IP>:${OLLAMA_PORT:-11434}/v1
|
|
|
|
openshell inference set \
|
|
--provider ollama-local \
|
|
--model nemotron-3-super:120b-a12b
|
|
```
|
|
|
|
> [!NOTE]
|
|
> Current OpenShell releases do not accept the `--base-url` shorthand for `provider create`. Use `--config OPENAI_BASE_URL=...` as shown above. The `setup_sandbox.sh` script uses this form.
|
|
|
|
### 5. Generate the Sandbox Policy
|
|
|
|
The repo's `sandbox-policy.yaml` is a **template**: it contains a `__DOCKER_BRIDGE_IP__` placeholder that must be substituted with the host's `docker0` IP before the policy is valid. The helper `scripts/gen_sandbox_policy.sh` does this automatically and writes `sandbox-policy-local.yaml`:
|
|
|
|
```bash
|
|
bash scripts/gen_sandbox_policy.sh # writes sandbox-policy-local.yaml
|
|
```
|
|
|
|
`make setup` invokes this generator. If you are running steps by hand, always pass the **generated** file (`sandbox-policy-local.yaml`) to `openshell sandbox create`, not the template.
|
|
|
|
Verify the policy is working after sandbox creation:
|
|
|
|
```bash
|
|
# These should succeed:
|
|
curl -sk https://r4.smarthealthit.org/Patient?_count=1 | head -3 # FHIR
|
|
curl -sk https://inference.local/v1/models | head -3 # LLM
|
|
|
|
# These should fail (blocked):
|
|
curl -sk --max-time 5 https://google.com # denied
|
|
curl -sk --max-time 5 https://api.openai.com # denied
|
|
ping 8.8.8.8 -c 1 # Operation not permitted
|
|
```
|
|
|
|
### 6. Create the Sandbox
|
|
|
|
```bash
|
|
# Stop any stale forwards from a prior sandbox using the same port — these
|
|
# block re-creation with a "port already forwarded" error.
|
|
for s in $(openshell forward list 2>/dev/null | awk '/:18789 /{print $NF}' | sort -u); do
|
|
openshell forward stop 18789 "$s" 2>/dev/null || true
|
|
done
|
|
|
|
openshell sandbox create \
|
|
--policy sandbox-policy-local.yaml \
|
|
--provider ollama-local \
|
|
--forward 18789 \
|
|
--keep
|
|
```
|
|
|
|
`--keep` prevents the sandbox from being torn down on exit so you can re-enter it later.
|
|
|
|
### 7. Inside the Sandbox: Set Up OpenClaw
|
|
|
|
> **Note:** These manual steps (7-9) are automated by `make setup`. Use them only if you need to customize individual steps.
|
|
|
|
Everything from here runs **inside** the sandbox shell that OpenShell drops you into.
|
|
|
|
```bash
|
|
git clone https://github.com/jaival-nvidia/clinical-intelligence.git
|
|
cd clinical-intelligence
|
|
uv pip install pandas matplotlib
|
|
|
|
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.1/install.sh | bash
|
|
export NVM_DIR="$HOME/.nvm" && [ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
|
|
nvm install --lts
|
|
npm install -g openclaw@latest
|
|
```
|
|
|
|
### 8. Inside the Sandbox: Configure Inference Provider
|
|
|
|
OpenShell maps the host Ollama to `inference.local` inside the sandbox. Save this as `configure_inference_local.py` and run it to register a custom OpenClaw provider:
|
|
|
|
```python
|
|
#!/usr/bin/env python3
|
|
"""configure_inference_local.py — patches OpenClaw config for OpenShell sandbox."""
|
|
import json
|
|
from pathlib import Path
|
|
|
|
config_path = Path.home() / ".openclaw" / "openclaw.json"
|
|
config_path.parent.mkdir(parents=True, exist_ok=True)
|
|
|
|
config = json.loads(config_path.read_text()) if config_path.exists() else {}
|
|
|
|
config.setdefault("providers", {})
|
|
config["providers"]["local-ollama"] = {
|
|
"type": "openai-compatible",
|
|
"baseUrl": "https://inference.local/v1",
|
|
"apiKey": "ollama",
|
|
"models": ["nemotron-3-super:120b-a12b"]
|
|
}
|
|
|
|
defaults = config.setdefault("agents", {}).setdefault("defaults", {})
|
|
defaults["model"] = "local-ollama/nemotron-3-super:120b-a12b"
|
|
defaults["timeoutSeconds"] = 600
|
|
sub = defaults.setdefault("subagents", {})
|
|
sub.update({"maxSpawnDepth": 2, "maxConcurrent": 4, "runTimeoutSeconds": 600})
|
|
|
|
config_path.write_text(json.dumps(config, indent=2))
|
|
print(f"Wrote {config_path}")
|
|
print("Provider: local-ollama -> https://inference.local/v1")
|
|
print("Model: nemotron-3-super:120b-a12b")
|
|
```
|
|
|
|
```bash
|
|
python3 configure_inference_local.py
|
|
```
|
|
|
|
### 9. Inside the Sandbox: Install Skills and Agents
|
|
|
|
```bash
|
|
# Deploy all 7 skills to ~/.openclaw/workspace/skills/ (matching setup_sandbox.sh)
|
|
mkdir -p ~/.openclaw/workspace/skills
|
|
cp -r skills/fhir-basics skills/clinical-knowledge skills/analysis-methods \
|
|
skills/case-summary skills/cohort-compare skills/molecular-viz \
|
|
skills/clinical-delegation ~/.openclaw/workspace/skills/
|
|
|
|
# Register sub-agents
|
|
for agent in patient-data labs-vitals medications analyst molecular; do
|
|
mkdir -p ~/.openclaw/workspaces/$agent
|
|
cp agents/${agent}-agent.md ~/.openclaw/workspaces/$agent/AGENTS.md
|
|
openclaw agents add $agent \
|
|
--workspace ~/.openclaw/workspaces/$agent \
|
|
--model local-ollama/nemotron-3-super:120b-a12b \
|
|
--non-interactive
|
|
done
|
|
|
|
# TOOLS.md for sub-agents
|
|
for agent in patient-data labs-vitals medications analyst molecular; do
|
|
cat > ~/.openclaw/workspaces/$agent/TOOLS.md << 'TOOLSEOF'
|
|
# Tools
|
|
Use the `exec` tool to run Python scripts against the FHIR server.
|
|
Call exec with: cat > /tmp/script.py << 'PYEOF'
|
|
import subprocess, json
|
|
r = subprocess.run(["curl", "-sf", "--max-time", "30", "https://r4.smarthealthit.org/Patient?_count=1"],
|
|
capture_output=True, text=True, timeout=35)
|
|
data = json.loads(r.stdout)
|
|
# your code here
|
|
PYEOF
|
|
python3 /tmp/script.py
|
|
Always use `exec` to run code. Do NOT use `write` for scripts.
|
|
Use subprocess.run(["curl", ...]) + json.loads() for HTTP requests. Do NOT use the requests library.
|
|
Print all results to stdout.
|
|
TOOLSEOF
|
|
done
|
|
|
|
# Auth profiles (provider name must match: local-ollama)
|
|
AUTH='{"version":1,"profiles":{"ollama":{"type":"api_key","provider":"local-ollama","key":"ollama"}}}'
|
|
for agent in main patient-data labs-vitals medications analyst molecular; do
|
|
mkdir -p ~/.openclaw/agents/$agent/agent
|
|
echo "$AUTH" > ~/.openclaw/agents/$agent/agent/auth-profiles.json
|
|
done
|
|
|
|
# Workspace stubs
|
|
for agent in patient-data labs-vitals medications analyst molecular; do
|
|
echo "# Identity" > ~/.openclaw/workspaces/$agent/IDENTITY.md
|
|
echo "# User" > ~/.openclaw/workspaces/$agent/USER.md
|
|
done
|
|
echo -e "# Identity\nClinical Intelligence Coordinator" > ~/.openclaw/workspace/IDENTITY.md
|
|
echo "# User" > ~/.openclaw/workspace/USER.md
|
|
|
|
# Allow coordinator to spawn sub-agents + disable unused skills
|
|
python3 -c "
|
|
import json
|
|
with open('$HOME/.openclaw/openclaw.json') as f: d = json.load(f)
|
|
for a in d.get('agents', {}).get('list', []):
|
|
if a['id'] == 'main': a['subagents'] = {'allowAgents': ['*']}
|
|
d.setdefault('tools', {})['sessions'] = {'visibility': 'all'}
|
|
d.setdefault('skills', {})['entries'] = {
|
|
'weather': {'enabled': False}, 'tmux': {'enabled': False},
|
|
'healthcheck': {'enabled': False}, 'gh-issues': {'enabled': False},
|
|
'skill-creator': {'enabled': False}, 'github': {'enabled': False}
|
|
}
|
|
with open('$HOME/.openclaw/openclaw.json', 'w') as f: json.dump(d, f, indent=2)
|
|
"
|
|
```
|
|
|
|
### 10. Verify Inside the Sandbox
|
|
|
|
```bash
|
|
# Verify inference routing
|
|
curl -sk https://inference.local/v1/models | head -3
|
|
|
|
# Verify FHIR access
|
|
curl -sk https://r4.smarthealthit.org/Patient?_count=1 | head -3
|
|
|
|
# Verify deny (should timeout or fail)
|
|
curl -sk --max-time 5 https://google.com
|
|
|
|
# Run a test
|
|
openclaw agent --local --session-id smoke --thinking off \
|
|
--message "Say hello in one sentence" --timeout 120
|
|
```
|
|
|
|
---
|
|
|
|
## Compatible Models
|
|
|
|
Any Ollama model with native tool calling works. Tested:
|
|
|
|
- **nemotron-3-super:120b-a12b** — recommended, MoE, fast inference
|
|
- **qwen3:235b** — larger, slower, reliable
|
|
- **qwen3-coder:480b** — best for code generation
|
|
- **qwen3:32b** — for constrained environments
|
|
- **qwen2.5:72b** — proven stable, good tool calling
|
|
|
|
To switch: `ollama pull <model>`, then `openshell cluster inference set --provider ollama-local --model <model>`, then inside the sandbox: `openclaw config set agents.defaults.model local-ollama/<model>`.
|
|
|
|
---
|
|
|
|
## OpenShell Sandbox Security
|
|
|
|
OpenShell sandboxes use an **implicit-deny** model. All network egress, filesystem writes, and resource usage are blocked by default. The `sandbox-policy.yaml` explicitly whitelists what the sandbox can reach.
|
|
|
|
**In practice:**
|
|
|
|
- The agent **cannot** reach the internet except for the FHIR endpoint in the policy.
|
|
- The agent **cannot** exfiltrate data — no outbound HTTP, no DNS to arbitrary hosts.
|
|
- LLM inference routes through `inference.local`, a OpenShell-managed bridge to the host Ollama. This never leaves the machine.
|
|
- Filesystem writes are confined to `/tmp`, `/home/sandbox`, and `~/.openclaw`.
|
|
- Resource caps prevent runaway processes from consuming the host.
|
|
|
|
### Modifying the Policy
|
|
|
|
To allow additional FHIR endpoints (e.g., a hospital server), add entries under `network_policies` in `sandbox-policy.yaml`:
|
|
|
|
```yaml
|
|
hospital_fhir:
|
|
name: hospital_fhir
|
|
endpoints:
|
|
- host: fhir.your-hospital.org
|
|
port: 443
|
|
protocol: https
|
|
description: Hospital FHIR R4 endpoint
|
|
```
|
|
|
|
Then recreate the sandbox:
|
|
|
|
```bash
|
|
openshell sandbox delete <sandbox-name>
|
|
openshell sandbox create --policy sandbox-policy.yaml --provider ollama-local --forward 18789 --keep
|
|
```
|
|
|
|
### Audit
|
|
|
|
OpenShell logs all network connections attempted by the sandbox, including blocked ones:
|
|
|
|
```bash
|
|
openshell sandbox logs <sandbox-name> --filter network
|
|
```
|
|
|
|
---
|
|
|
|
## Running Workflows
|
|
|
|
All `openclaw` commands below run **inside the OpenShell sandbox**.
|
|
|
|
```bash
|
|
openshell sandbox connect <sandbox-name>
|
|
```
|
|
|
|
### CMS122 Diabetes Quality Gap
|
|
|
|
```bash
|
|
openclaw agent --local --session-id test-cms122 \
|
|
--thinking off \
|
|
--message "Find all diabetic patients, get their latest A1c and medications. Identify gap patients with A1c above 9% not on insulin or GLP-1. Show the A1c distribution as a histogram." \
|
|
--timeout 600
|
|
```
|
|
|
|
Validate: `python3 scripts/validate_and_run.py --validate-only /tmp/cms122.py`
|
|
|
|
### Patient Case Summary
|
|
|
|
```bash
|
|
openclaw agent --local --session-id test-case \
|
|
--thinking off \
|
|
--message "Look up the first patient. Compile a case summary: demographics, conditions, recent labs (flag abnormal), and medications." \
|
|
--timeout 600
|
|
```
|
|
|
|
### Ad-Hoc Cross-Reference
|
|
|
|
```bash
|
|
openclaw agent --local --session-id test-adhoc \
|
|
--thinking off \
|
|
--message "Which patients have both diabetes and hypertension? For the overlap, get their latest HbA1c and blood pressure." \
|
|
--timeout 600
|
|
```
|
|
|
|
### Conversational Follow-Up Test
|
|
|
|
Three turns in the same session (tests context persistence):
|
|
|
|
```bash
|
|
openclaw agent --local --session-id follow-up-test --thinking off \
|
|
--message "Find all diabetic patients. Print the count and list their IDs." \
|
|
--timeout 600
|
|
|
|
openclaw agent --local --session-id follow-up-test --thinking off \
|
|
--message "From those diabetic patients, which ones also have hypertension? Intersect the two groups." \
|
|
--timeout 600
|
|
|
|
openclaw agent --local --session-id follow-up-test --thinking off \
|
|
--message "For the patients with both diabetes and hypertension, get their latest HbA1c and eGFR. Flag anyone with eGFR below 60 as CKD risk." \
|
|
--timeout 600
|
|
```
|
|
|
|
### Bulk Validation
|
|
|
|
```bash
|
|
for f in /tmp/cms122.py /tmp/case.py /tmp/adhoc.py /tmp/step1.py /tmp/step2.py /tmp/step3.py; do
|
|
echo "--- $f ---"
|
|
python3 scripts/validate_and_run.py --validate-only "$f"
|
|
done
|
|
```
|
|
|
|
---
|
|
|
|
## Connecting a Hospital FHIR Endpoint
|
|
|
|
1. Get OAuth2 credentials from your integration team (client_id, token_endpoint, scopes `patient/*.read`).
|
|
2. Add the hospital host to `sandbox-policy.yaml` under `network_policies` and recreate the sandbox.
|
|
3. Specify the FHIR URL directly in your prompt instead of `https://r4.smarthealthit.org`.
|
|
4. Add an auth helper to `skills/fhir-basics/SKILL.md` for token-based requests.
|
|
5. Test: `python3 scripts/test-fhir.py --url https://your-hospital.org/fhir --token YOUR_TOKEN`
|
|
|
|
---
|
|
|
|
## Troubleshooting
|
|
|
|
### Sandbox Provisioning Failures
|
|
|
|
| Symptom | Fix |
|
|
|---------|-----|
|
|
| `sandbox create` hangs | Check host network; `openshell sandbox create --verbose` for pull progress |
|
|
| `failed to pull image` | GHCR rate limit — `docker login ghcr.io` with a PAT, or wait and retry |
|
|
| `policy validation error` | Check YAML syntax in `sandbox-policy.yaml`; recreate the sandbox |
|
|
| `provider not found` | `openshell provider list` — confirm `ollama-local` exists |
|
|
|
|
### inference.local Not Resolving
|
|
|
|
| Check | Fix |
|
|
|-------|-----|
|
|
| Ollama listening on 0.0.0.0? | `ss -tlnp \| grep 11434` on host. Docker Ollama (default): `make down && make up`. Host Ollama: add `Environment="OLLAMA_HOST=0.0.0.0"` to `/etc/systemd/system/ollama.service.d/override.conf` and `sudo systemctl restart ollama`. |
|
|
| Provider base URL correct? | `openshell provider show ollama-local` — must use host IP, not `127.0.0.1` |
|
|
| DNS working inside sandbox? | `nslookup inference.local` inside sandbox; recreate sandbox if broken |
|
|
| Firewall blocking bridge? | `curl -sk https://inference.local/v1/models` inside sandbox; check host firewall |
|
|
|
|
### LLM Connection Failures
|
|
|
|
| Check | Fix |
|
|
|-------|-----|
|
|
| Ollama running? | `curl -sk https://inference.local/v1/models` — start Ollama on host if down |
|
|
| Model pulled? | `ollama list` on host — `ollama pull nemotron-3-super:120b-a12b` if missing |
|
|
| Model loaded? | `ollama ps` on host — warmup: `curl -sk https://inference.local/v1/chat/completions -d '{"model":"nemotron-3-super:120b-a12b","messages":[{"role":"user","content":"ping"}]}'` |
|
|
|
|
### OpenClaw Auth Errors
|
|
|
|
"No API key found for provider" — Create auth profiles as shown in step 9. Provider name must match `local-ollama`.
|
|
|
|
"Profile ollama timed out" — Model unloaded. Warm it up from inside the sandbox: `curl -sk https://inference.local/v1/chat/completions -d '{"model":"nemotron-3-super:120b-a12b","messages":[{"role":"user","content":"ping"}]}' > /dev/null`
|
|
|
|
### FHIR Empty Results
|
|
|
|
| Check | Fix |
|
|
|-------|-----|
|
|
| Reachable from sandbox? | `curl -s https://r4.smarthealthit.org/metadata \| head -20` inside sandbox |
|
|
| Blocked by policy? | `openshell sandbox logs <sandbox-name> --filter network` — look for denied connections |
|
|
| SNOMED code format? | Use bare codes: `code=44054006`. The test server returns empty results with system-qualified URIs. |
|
|
| Page size? | Ensure `_count=200` in cohort queries |
|
|
|
|
### BP Values Always Null
|
|
|
|
Blood pressure is stored as a component Observation (LOINC 85354-9). The `fhir-basics` skill includes `get_bp()` that handles both panel and standalone formats. If the LLM queries `8480-6` directly without checking the panel, regenerate.
|
|
|
|
### Slow Performance
|
|
|
|
| Bottleneck | Fix |
|
|
|-----------|-----|
|
|
| LLM generation > 120s | Check GPU utilization with `nvidia-smi` on host; model may be on CPU |
|
|
| FHIR queries > 2s/patient | Test server can be slow; use offline mode for demos |
|
|
| Model keeps unloading | Set `keep_alive` to 120m in warmup curl |
|
|
| First sandbox command slow | One-time init cost; subsequent commands are fast |
|
|
|
|
---
|
|
|
|
## Managing Sandboxes
|
|
|
|
```bash
|
|
openshell sandbox list # list running sandboxes
|
|
openshell sandbox connect <sandbox-name> # re-enter existing sandbox
|
|
openshell sandbox logs <sandbox-name> # view sandbox logs
|
|
openshell sandbox delete <sandbox-name> # tear down (after policy changes, recreate)
|
|
```
|
|
|
|
---
|
|
|
|
## Performance Notes
|
|
|
|
Performance depends on the model, hardware, and FHIR server responsiveness. LLM generation time typically dominates over FHIR query time. Larger cohorts take longer due to per-patient REST queries; production deployments should use Bulk FHIR for populations over 200.
|