chore: Regenerate all playbooks

GitLab CI 2026-06-17 21:19:47 +00:00
parent 45c915e144
commit 6a749bdcb0
2 changed files with 70 additions and 36 deletions

@ -125,12 +125,23 @@ Write a short README checklist for a Python project.
Expected output should show the model responding in the terminal. When you are done, type `/bye` or press `Ctrl+D` to exit the interactive session before continuing. Expected output should show the model responding in the terminal. When you are done, type `/bye` or press `Ctrl+D` to exit the interactive session before continuing.
## Step 5. Launch Claude Code with Ollama ## Step 5. Install and launch Claude Code with Ollama
**Description**: Use Ollama's built-in [launch method](https://ollama.com/blog/launch) to start [Claude Code](https://docs.claude.com/en/docs/claude-code) against your local model. No environment variables or config files are required. **Description**: Install [Claude Code](https://docs.claude.com/en/docs/claude-code), then use Ollama's built-in [launch method](https://ollama.com/blog/launch) to start Claude Code against your local model. No environment variables or config files are required.
```bash ```bash
ollama launch claude curl -fsSL https://claude.ai/install.sh | bash
claude --version
```
If Claude Code is already installed, just verify the version:
```bash
claude --version
```
```bash
ollama launch claude --model qwen3.6
``` ```
Expected output should show Claude Code starting and using the local Qwen3.6 model. Qwen3.6 ships with a 256K context window by default; adjust context length through Ollama's settings if you need to tune it further. Expected output should show Claude Code starting and using the local Qwen3.6 model. Qwen3.6 ships with a 256K context window by default; adjust context length through Ollama's settings if you need to tune it further.
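Before launching, it can help to confirm that the model has already been pulled. The sketch below is an illustration only, not part of the playbook: `has_model` is a hypothetical helper name, and it assumes `ollama list` prints a header row followed by one model per line with the name in the first column.

```shell
# Hypothetical helper: succeed if `ollama list` shows the given model name.
# Assumes the first output line is a header and column 1 is NAME[:TAG].
has_model() {
  ollama list | awk -v m="$1" '
    NR > 1 { split($1, parts, ":"); if ($1 == m || parts[1] == m) found = 1 }
    END { exit !found }'
}

# Usage: only launch when the model is present locally.
# has_model qwen3.6 && ollama launch claude --model qwen3.6
```

Matching on the name before the colon lets `qwen3.6` match a listed `qwen3.6:latest`, while a full `name:tag` argument must match exactly.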
@ -150,6 +161,8 @@ printf 'import math_utils\n\n\ndef test_add():\n assert math_utils.add(1, 2)
If you do not already have pytest installed: If you do not already have pytest installed:
```bash ```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest python3 -m pip install -U pytest
``` ```
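Sourcing `.venv/bin/activate` sets the `VIRTUAL_ENV` variable, so a quick check before running `pip` can confirm that the environment is actually active. A minimal sketch; `venv_status` is a hypothetical helper name, not something the playbook defines.

```shell
# Hypothetical helper: report whether a virtual environment is active,
# based on the VIRTUAL_ENV variable that the activate script sets.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "inactive"
  fi
}

# Usage:
# venv_status
```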
@ -165,7 +178,7 @@ Run the test:
python3 -m pytest -q python3 -m pytest -q
``` ```
Expected output should show the test passing. Expected output should show the test passing. When you are done, run `deactivate` to exit the virtual environment.
## Step 7. Cleanup and rollback ## Step 7. Cleanup and rollback
@ -259,7 +272,7 @@ Expected output should show the model responding. When you are done, type `/bye`
**Description**: Use Ollama's built-in [launch method](https://ollama.com/blog/launch) to start [OpenCode](https://opencode.ai) against your local model. No [`opencode.json`](https://opencode.ai/docs/config/) provider configuration is required. **Description**: Use Ollama's built-in [launch method](https://ollama.com/blog/launch) to start [OpenCode](https://opencode.ai) against your local model. No [`opencode.json`](https://opencode.ai/docs/config/) provider configuration is required.
```bash ```bash
ollama launch opencode ollama launch opencode --model qwen3.6
``` ```
If you want to pre-configure OpenCode without launching immediately: If you want to pre-configure OpenCode without launching immediately:
@ -285,6 +298,8 @@ printf 'import math_utils\n\n\ndef test_add():\n assert math_utils.add(1, 2)
If you do not already have pytest installed: If you do not already have pytest installed:
```bash ```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest python3 -m pip install -U pytest
``` ```
@ -300,7 +315,7 @@ Run the test:
python3 -m pytest -q python3 -m pytest -q
``` ```
Expected output should show the test passing. Expected output should show the test passing. When you are done, run `deactivate` to exit the virtual environment.
## Step 7. Cleanup and rollback ## Step 7. Cleanup and rollback
@ -394,7 +409,7 @@ Expected output should show the model responding. When you are done, type `/bye`
**Description**: Use Ollama's built-in [launch method](https://ollama.com/blog/launch) to start [Codex CLI](https://github.com/openai/codex) against your local model. No `~/.codex/config.toml` and no manual `npm install -g @openai/codex` are required — Ollama handles the Codex integration. **Description**: Use Ollama's built-in [launch method](https://ollama.com/blog/launch) to start [Codex CLI](https://github.com/openai/codex) against your local model. No `~/.codex/config.toml` and no manual `npm install -g @openai/codex` are required — Ollama handles the Codex integration.
```bash ```bash
ollama launch codex ollama launch codex --model qwen3.6
``` ```
Expected output should show Codex CLI starting with Ollama as the provider and Qwen3.6 as the model. Qwen3.6 ships with a 256K context window by default, which is well suited to Codex's agentic workflows. Expected output should show Codex CLI starting with Ollama as the provider and Qwen3.6 as the model. Qwen3.6 ships with a 256K context window by default, which is well suited to Codex's agentic workflows.
@ -414,6 +429,8 @@ printf 'import math_utils\n\n\ndef test_add():\n assert math_utils.add(1, 2)
If you do not already have pytest installed: If you do not already have pytest installed:
```bash ```bash
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -U pytest python3 -m pip install -U pytest
``` ```
@ -429,7 +446,7 @@ Run the test:
python3 -m pytest -q python3 -m pytest -q
``` ```
Expected output should show the test passing. Expected output should show the test passing. When you are done, run `deactivate` to exit the virtual environment.
## Step 7. Cleanup and rollback ## Step 7. Cleanup and rollback
@ -465,6 +482,10 @@ ollama rm qwen3.6
| `connection refused` to localhost:11434 | Ollama service not running | Start with `ollama serve` or `sudo systemctl start ollama` | | `connection refused` to localhost:11434 | Ollama service not running | Start with `ollama serve` or `sudo systemctl start ollama` |
| `ollama launch <agent>` exits immediately | Agent integration failed to initialize | Re-run `ollama launch <agent>`; if it persists, check `journalctl -u ollama` | | `ollama launch <agent>` exits immediately | Agent integration failed to initialize | Re-run `ollama launch <agent>`; if it persists, check `journalctl -u ollama` |
| Slow responses or OOM errors | Model variant too large for GPU memory | Switch to `qwen3.6:35b-a3b-nvfp4` or close other GPU workloads | | Slow responses or OOM errors | Model variant too large for GPU memory | Switch to `qwen3.6:35b-a3b-nvfp4` or close other GPU workloads |
| `python3 -m pip install -U pytest` reports `externally-managed-environment` | Ubuntu 24.04 protects the system Python environment | Create and activate a virtual environment first: `python3 -m venv .venv && source .venv/bin/activate` |
| `ollama pull` reports that a model tag is a sharded GGUF | The selected model tag is not supported by Ollama | Use the Qwen3.6 commands in Step 3 instead of sharded GGUF tags |
| `ollama run` fails with `CUDA error: context is destroyed` on a multi-GPU system | Ollama is initializing across a mixed-GPU topology | Pin Ollama to one GPU. For a foreground test, run `CUDA_VISIBLE_DEVICES=0 ollama serve`; for a system service, add `Environment="CUDA_VISIBLE_DEVICES=0"` to an Ollama systemd drop-in and restart Ollama |
| A direct Claude Code setup using an Anthropic-compatible Ollama endpoint produces prose but does not edit files | Some model/server combinations do not emit tool calls reliably | Use `ollama launch claude` with Qwen3.6 as shown in this playbook |
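The systemd drop-in mentioned in the multi-GPU row could look roughly like the sketch below. It assumes the unit is named `ollama.service` (consistent with `systemctl start ollama` above) and uses the standard drop-in directory for that unit; the function name and the file name `gpu-pin.conf` are hypothetical choices.

```shell
# Sketch: write a systemd drop-in that pins Ollama to GPU 0.
# The default directory assumes a unit named ollama.service;
# gpu-pin.conf is a hypothetical file name.
write_ollama_gpu_dropin() {
  dropin_dir="${1:-/etc/systemd/system/ollama.service.d}"
  mkdir -p "$dropin_dir"
  printf '[Service]\nEnvironment="CUDA_VISIBLE_DEVICES=0"\n' > "$dropin_dir/gpu-pin.conf"
}

# Usage (as root), followed by a reload and restart so the change takes effect:
# write_ollama_gpu_dropin && systemctl daemon-reload && systemctl restart ollama
```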
> [!NOTE] > [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing > DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing

@ -51,10 +51,10 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel
* Container startup can be resource-intensive and time-consuming with large model downloads * Container startup can be resource-intensive and time-consuming with large model downloads
* Network configuration conflicts if shared network already exists * Network configuration conflicts if shared network already exists
* Remote API endpoints may have rate limits or connectivity issues (hybrid deployment) * Remote API endpoints may have rate limits or connectivity issues (hybrid deployment)
* **Rollback:** Stop all containers with `scripts/dev-profile.sh down` * **Rollback:** Stop all containers with `deploy/docker/scripts/dev-profile.sh down`
* **Last Updated:** 3/16/2026 * **Last Updated:** 06/17/2026
* Update required OS and Driver versions * Update required OS and Driver versions
* Support for VSS 3.1.0 with Cosmos Reason 2 VLM * Support for VSS 3.2.0 with Cosmos Reason 2 VLM
## Instructions ## Instructions
@ -65,7 +65,7 @@ Check that your system meets the hardware and software [prerequisites](https://d
```bash ```bash
## Verify driver version ## Verify driver version
nvidia-smi | grep "Driver Version" nvidia-smi | grep "Driver Version"
## Expected output: Driver Version: 580.126.09 or higher ## Expected output: Driver Version: 580.95.05 or higher
## Verify CUDA version ## Verify CUDA version
nvcc --version nvcc --version
@ -106,10 +106,19 @@ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
Clone the Video Search and Summarization repository from NVIDIA's public GitHub. Clone the Video Search and Summarization repository from NVIDIA's public GitHub.
**Note** Install Git LFS if not already present on the system
```bash
sudo apt-get install -y git-lfs && git lfs install
```
```bash ```bash
## Clone the VSS AI Blueprint repository ## Clone the VSS AI Blueprint repository
git clone https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization.git git clone https://github.com/NVIDIA-AI-Blueprints/video-search-and-summarization.git
cd video-search-and-summarization cd video-search-and-summarization
git checkout tags/v3.2.0
git lfs install
git lfs pull
``` ```
## Step 4. Run the cache cleaner script ## Step 4. Run the cache cleaner script
@ -148,12 +157,11 @@ sudo -b /usr/local/bin/sys-cache-cleaner.sh
``` ```
> [!NOTE] > [!NOTE]
+> The above runs the cache cleaner in the current session only; it does not persist across reboots. To have the cache cleaner run across reboots, create a systemd service instead. The above runs the cache cleaner in the current session only; it does not persist across reboots. To have the cache cleaner run across reboots, create a systemd service instead.
+>
+> To stop the background cache cleaner: To stop the background cache cleaner:
+> ```bash ```bash
+> sudo pkill -f sys-cache-cleaner.sh sudo pkill -f sys-cache-cleaner.sh
+> ``` ```
## Step 5. Authenticate with NVIDIA Container Registry ## Step 5. Authenticate with NVIDIA Container Registry
@ -172,13 +180,13 @@ docker login nvcr.io
## Step 6. Choose deployment scenario ## Step 6. Choose deployment scenario
Choose between two deployment options based on your requirements: Choose the deployment options based on your requirements:
| Deployment Scenario | VLM (Cosmos-Reason2-8B)| LLM | | Deployment Scenario | VLM (Cosmos-Reason2-8B)| LLM |
|-------------------------------------------|------------------------|-------------------------------| |-------------------------------------------|------------------------|-------------------------------|
| Standard VSS (Base) | Local | Remote | | Standard VSS (Base) | Local | Remote |
| Standard VSS (Alert Verification) | Local | Remote | | Standard VSS (Alert Verification) | Local | Remote |
| Standard VSS deployment (Real-Time Alerts)| Local | Remote | | Standard VSS deployment (Real-Time Alerts)| Local | Remote |
## Step 7. Standard VSS ## Step 7. Standard VSS
@ -202,19 +210,21 @@ In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://bui
```bash ```bash
## Start Standard VSS (Base) ## Start Standard VSS (Base)
## Set NGC CLI API key and Hugging Face token (required for VA-MCP)
export NGC_CLI_API_KEY='your_ngc_api_key' export NGC_CLI_API_KEY='your_ngc_api_key'
export HF_TOKEN='hf_your_token_here'
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p base -H DGX-SPARK --use-remote-llm --llm <REMOTE LLM MODEL NAME> deploy/docker/scripts/dev-profile.sh up -p base -H DGX-SPARK --use-remote-llm --llm <REMOTE LLM MODEL NAME>
## Start Standard VSS (Alert Verification) ## Start Standard VSS (Alert Verification)
export NGC_CLI_API_KEY='your_ngc_api_key' export NGC_CLI_API_KEY='your_ngc_api_key'
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m verification -H DGX-SPARK --use-remote-llm --llm <REMOTE LLM MODEL NAME> deploy/docker/scripts/dev-profile.sh up -p alerts -m verification -H DGX-SPARK --use-remote-llm --llm <REMOTE LLM MODEL NAME>
## Start Standard VSS (Real-Time Alerts) ## Start Standard VSS (Real-Time Alerts)
export NGC_CLI_API_KEY='your_ngc_api_key' export NGC_CLI_API_KEY='your_ngc_api_key'
export LLM_ENDPOINT_URL=https://your-llm-endpoint.com export LLM_ENDPOINT_URL=https://your-llm-endpoint.com
scripts/dev-profile.sh up -p alerts -m real-time -H DGX-SPARK --use-remote-llm --llm <REMOTE LLM MODEL NAME> deploy/docker/scripts/dev-profile.sh up -p alerts -m real-time -H DGX-SPARK --use-remote-llm --llm <REMOTE LLM MODEL NAME>
``` ```
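Since each profile depends on the exported variables above, a small pre-flight check can catch a missing one before the containers start. This is a sketch under assumptions: `require_env` is a hypothetical helper, not part of `dev-profile.sh`, and which variables are mandatory for a given profile should be confirmed against the VSS documentation.

```shell
# Hypothetical helper: fail if any named variable is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before starting the base profile:
# require_env NGC_CLI_API_KEY HF_TOKEN LLM_ENDPOINT_URL || exit 1
```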
> [!NOTE] > [!NOTE]
@ -226,11 +236,11 @@ scripts/dev-profile.sh up -p alerts -m real-time -H DGX-SPARK --use-remote-llm -
> • **OPENAI_API_KEY** — (optional) For remote LLM/VLM endpoints that require it > • **OPENAI_API_KEY** — (optional) For remote LLM/VLM endpoints that require it
> • **VLM_CUSTOM_WEIGHTS** — (optional) Absolute path to a custom weights directory > • **VLM_CUSTOM_WEIGHTS** — (optional) Absolute path to a custom weights directory
> >
> Pass these additional flags to **`scripts/dev-profile.sh`** for remote LLM mode: > Pass these additional flags to **`deploy/docker/scripts/dev-profile.sh`** for remote LLM mode:
> • **`--use-remote-llm`** — (required) Use a remote LLM, the base URL is read from **`LLM_ENDPOINT_URL`** in the environment > • **`--use-remote-llm`** — (required) Use a remote LLM, the base URL is read from **`LLM_ENDPOINT_URL`** in the environment
> • **`--llm`** — (required) Remote LLM model name (for example: `nvidia/nvidia-nemotron-nano-9b-v2`). **Strongly recommended** for alert workflows (verification and real-time): use `nvidia/nvidia-nemotron-nano-9b-v2`. Omitting `--llm` may cause the script to use whatever model is returned by the remote endpoint. > • **`--llm`** — (required) Remote LLM model name (for example: `nvidia/nvidia-nemotron-nano-9b-v2`). **Strongly recommended** for alert workflows (verification and real-time): use `nvidia/nvidia-nemotron-nano-9b-v2`. Omitting `--llm` may cause the script to use whatever model is returned by the remote endpoint.
> >
> Run **`scripts/dev-profile.sh -h`** for a full list of supported arguments. > Run **`deploy/docker/scripts/dev-profile.sh --help`** for a full list of supported arguments.
**7.3 Validate Standard VSS deployment** **7.3 Validate Standard VSS deployment**
@ -241,7 +251,7 @@ Access the VSS UI to confirm successful deployment.
```bash ```bash
## Test Agent UI accessibility ## Test Agent UI accessibility
## If running locally on your Spark device, use localhost: ## If running locally on your Spark device, use localhost:
curl -I http://localhost:3000 curl -I http://localhost:7777
## Expected: HTTP 200 response ## Expected: HTTP 200 response
## If your Spark is running in Remote/Accessory mode, replace 'localhost' with the IP address or hostname of your Spark device. ## If your Spark is running in Remote/Accessory mode, replace 'localhost' with the IP address or hostname of your Spark device.
@ -250,20 +260,23 @@ hostname -I
## Or to get the hostname: ## Or to get the hostname:
hostname hostname
## Then test accessibility (replace <SPARK_IP_OR_HOSTNAME> with the actual value): ## Then test accessibility (replace <SPARK_IP_OR_HOSTNAME> with the actual value):
curl -I http://<SPARK_IP_OR_HOSTNAME>:3000 curl -I http://<SPARK_IP_OR_HOSTNAME>:7777
``` ```
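Because container startup can take a while, a single `curl -I` may fail simply because the UI is not up yet. The sketch below polls until the endpoint returns HTTP 200; `wait_for_ui` is a hypothetical helper, and the retry count and delay are arbitrary defaults rather than values from the VSS documentation.

```shell
# Hypothetical helper: poll a URL until it answers with HTTP 200.
# Arguments: URL, number of attempts (default 30), delay in seconds (default 10).
wait_for_ui() {
  url="$1"; tries="${2:-30}"; delay="${3:-10}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -w '%{http_code}' prints only the status code; failures are tolerated.
    code="$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)"
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $tries attempt(s)" >&2
  return 1
}

# Usage:
# wait_for_ui http://localhost:7777
```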
Open `http://localhost:3000` or `http://<SPARK_IP_OR_HOSTNAME>:3000` in your browser to access the Agent interface. Open `http://localhost:7777` or `http://<SPARK_IP_OR_HOSTNAME>:7777` in your browser to access the Agent interface.
## Step 8. Test video processing workflow ## Step 8. Test video processing workflow
Run a basic test to verify the video analysis pipeline is functioning based on your deployment. The UI comes with a few example videos pre-populated for uploading and testing Run a basic test to verify the video analysis pipeline is functioning based on your deployment.
**For Standard VSS deployment** **For Standard VSS deployment**
Follow the steps [here](https://docs.nvidia.com/vss/latest/quickstart.html#deploy) to navigate VSS Agent UI. Follow the steps [here](https://docs.nvidia.com/vss/latest/quickstart.html#deploy) to navigate VSS Agent UI.
- Access VSS Agent interface at `http://localhost:3000` - Access VSS Agent interface at `http://localhost:7777`
- Download the sample data from NGC [here](https://docs.nvidia.com/vss/latest/quickstart.html#download-sample-data-from-ngc) and upload videos and test features [here](https://docs.nvidia.com/vss/latest/quickstart.html#download-sample-data-from-ngc) - Download the sample data from NGC [here](https://docs.nvidia.com/vss/latest/quickstart.html#download-sample-data-from-ngc) and upload videos and test features
- Test Standard VSS deployment (Base) [here](https://docs.nvidia.com/vss/latest/quickstart.html#step-2-upload-a-video)
- Test Standard VSS deployment (Alert Verification) [here](https://docs.nvidia.com/vss/latest/agent-workflow-alert-verification.html#step-2-add-a-video-stream)
- Test Standard VSS deployment (Real-Time Alerts) [here](https://docs.nvidia.com/vss/latest/agent-workflow-rt-alert.html#step-2-add-a-video-stream)
## Step 9. Cleanup and rollback ## Step 9. Cleanup and rollback
@ -275,7 +288,7 @@ To completely remove the VSS deployment and free up system resources [Follow](ht
```bash ```bash
## For Standard VSS deployment ## For Standard VSS deployment
scripts/dev-profile.sh down deploy/docker/scripts/dev-profile.sh down
``` ```
## Step 10. Next steps ## Step 10. Next steps
@ -283,8 +296,8 @@ scripts/dev-profile.sh down
With VSS deployed, you can now: With VSS deployed, you can now:
**Standard VSS deployment:** **Standard VSS deployment:**
- Access full VSS capabilities at port 3000 - Access full VSS capabilities at port 7777
- Test video summarization and Q&A features - Test video and Q&A features
- Configure knowledge graphs and graph databases - Configure knowledge graphs and graph databases
- Integrate with existing video processing workflows - Integrate with existing video processing workflows