Compare commits

...

5 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| | 72f0abd1cf | Merge f75d5817aa into 8452a1c5b1 | 2026-04-08 03:10:21 +00:00 |
| GitLab CI | 8452a1c5b1 | chore: Regenerate all playbooks | 2026-04-08 02:41:59 +00:00 |
| GitLab CI | 9414a5141f | chore: Regenerate all playbooks | 2026-04-07 04:13:30 +00:00 |
| GitLab CI | 911ca6db8b | chore: Regenerate all playbooks | 2026-04-06 19:32:24 +00:00 |
| agolajko | f75d5817aa | nemotron reqs --trust-remote-code for vllm setup | 2026-01-26 08:16:23 -08:00 |
5 changed files with 19 additions and 8 deletions

View File

````diff
@@ -47,8 +47,8 @@ All necessary files for the playbook can be found [here on GitHub](https://githu
 * **Duration:** 45-90 minutes for complete setup and initial model fine-tuning
 * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
 * **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
-* **Last Updated:** 01/15/2026
-* Fix qLoRA fine-tuning workflow
+* **Last Updated:** 03/04/2026
+* Recommend running Nemo finetune workflow via Docker
 ## Instructions
````
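The rollback bullet in this hunk amounts to deleting what the playbook created. A minimal sketch, assuming a venv at `~/nemo-venv` and the default Hugging Face cache location (both paths are hypothetical; substitute the ones your setup actually used):

```bash
# Hypothetical venv path; substitute whatever the playbook created
rm -rf ~/nemo-venv
# Optionally reclaim disk from downloaded model weights (default HF cache location)
rm -rf ~/.cache/huggingface/hub
```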

View File

````diff
@@ -172,12 +172,15 @@ Verify the NVIDIA runtime works:
 docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
 ```
-If you get a permission denied error on `docker`, add your user to the Docker group and log out/in:
+If you get a permission denied error on `docker`, add your user to the Docker group and activate the new group in your current session:
 ```bash
 sudo usermod -aG docker $USER
+newgrp docker
 ```
+This applies the group change immediately. Alternatively, you can log out and back in instead of running `newgrp docker`.
 > [!NOTE]
 > DGX Spark uses cgroup v2. OpenShell's gateway embeds k3s inside Docker and needs host cgroup namespace access. Without `default-cgroupns-mode: host`, the gateway can fail with "Failed to start ContainerManager" errors.
````
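A quick sanity check on the `newgrp docker` step added above (the check itself is not part of the diff; `id -nG` is plain coreutils):

```bash
# "docker" should now appear in the current session's group list
id -nG
# The earlier smoke test should work without sudo
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```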
````diff
@@ -322,13 +325,21 @@ http://127.0.0.1:18789/#token=<long-token-here>
 **If accessing the Web UI from a remote machine**, you need to set up port forwarding.
+First, find your Spark's IP address. On the Spark, run:
+```bash
+hostname -I | awk '{print $1}'
+```
+This prints the primary IP address (e.g. `192.168.1.42`). You can also find it in **Settings > Wi-Fi** or **Settings > Network** on the Spark's desktop, or check your router's connected-devices list.
 Start the port forward on the Spark host:
 ```bash
 openshell forward start 18789 my-assistant --background
 ```
-Then from your remote machine, create an SSH tunnel to the Spark:
+Then from your remote machine, create an SSH tunnel to the Spark (replace `<your-spark-ip>` with the IP address from above):
 ```bash
 ssh -L 18789:127.0.0.1:18789 <your-user>@<your-spark-ip>
````
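Once that tunnel is up, the Web UI is reachable at `http://127.0.0.1:18789` on the remote machine. A small usage sketch, assuming standard OpenSSH (`-f -N` are ordinary OpenSSH flags that background the tunnel without running a remote command; the placeholders are the same ones the diff uses):

```bash
# Background the tunnel instead of holding an interactive shell open
ssh -f -N -L 18789:127.0.0.1:18789 <your-user>@<your-spark-ip>
# Then open the UI locally with the token the gateway printed:
#   http://127.0.0.1:18789/#token=<long-token-here>
```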

View File

````diff
@@ -27,8 +27,8 @@ services:
       # Ollama configuration
       - OLLAMA_BASE_URL=http://ollama:11434/v1
       - OLLAMA_MODEL=llama3.1:8b
-      # Disable vLLM
-      - VLLM_BASE_URL=http://localhost:8001/v1
+      # vLLM disabled in default Ollama mode
+      # - VLLM_BASE_URL=http://localhost:8001/v1
       - VLLM_MODEL=disabled
       # Vector DB configuration
       - QDRANT_URL=http://qdrant:6333
````
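To flip this compose file from Ollama to vLLM mode you would invert the change: uncomment the base URL and replace the `disabled` sentinel with a real model ID. A sketch under those assumptions (the URL is the one the diff comments out; the model ID is borrowed from the support table in the last file of this compare):

```yaml
# Hypothetical vLLM mode; the URL must match where your vLLM server actually listens
- VLLM_BASE_URL=http://localhost:8001/v1
- VLLM_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
```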

View File

````diff
@@ -108,7 +108,7 @@ export class TextProcessor {
     // Determine which LLM provider to use based on configuration
     // Priority: vLLM > NVIDIA > Ollama
-    if (process.env.VLLM_BASE_URL) {
+    if (process.env.VLLM_BASE_URL && process.env.VLLM_MODEL && process.env.VLLM_MODEL !== 'disabled') {
       this.selectedLLMProvider = 'vllm';
     } else if (process.env.NVIDIA_API_KEY) {
       this.selectedLLMProvider = 'nvidia';
````
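The intent of the changed condition is easier to see in isolation: vLLM is selected only when a base URL is set and the model is a real value rather than the `disabled` sentinel, which matches the compose change above. A self-contained sketch of that logic (the function name and return type are illustrative, not the repo's actual API):

```typescript
// Illustrative helper mirroring the changed priority logic: vLLM > NVIDIA > Ollama
type LLMProvider = 'vllm' | 'nvidia' | 'ollama';

function selectLLMProvider(env: NodeJS.ProcessEnv): LLMProvider {
  // vLLM wins only when configured with a usable model (not the 'disabled' sentinel)
  if (env.VLLM_BASE_URL && env.VLLM_MODEL && env.VLLM_MODEL !== 'disabled') {
    return 'vllm';
  }
  if (env.NVIDIA_API_KEY) {
    return 'nvidia';
  }
  return 'ollama';
}

// e.g. with the default compose environment above, selectLLMProvider(process.env)
// falls through to 'ollama' because VLLM_MODEL is 'disabled'
```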

View File

````diff
@@ -82,7 +82,7 @@ The following models are supported with vLLM on Spark. All listed models are ava
 | **Nemotron3-Nano** | FP8 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8) |
 > [!NOTE]
-> The Phi-4-multimodal-instruct models require `--trust-remote-code` when launching vLLM.
+> The Phi-4-multimodal-instruct and Nemotron3-Nano models require `--trust-remote-code` when launching vLLM.
 > [!NOTE]
 > You can use the NVFP4 Quantization documentation to generate your own NVFP4-quantized checkpoints for your favorite models. This enables you to take advantage of the performance and memory benefits of NVFP4 quantization even for models not already published by NVIDIA.
````
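A hedged example of what the amended note implies when serving Nemotron3-Nano (`vllm serve` and `--trust-remote-code` are standard vLLM CLI usage, but verify against your installed version; the port here is arbitrary):

```bash
# --trust-remote-code lets vLLM run the model repo's custom code, required per the note above
vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 \
  --trust-remote-code \
  --port 8001
```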