Compare commits

...

5 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| | 72f0abd1cf | Merge f75d5817aa into 8452a1c5b1 | 2026-04-08 03:10:21 +00:00 |
| GitLab CI | 8452a1c5b1 | chore: Regenerate all playbooks | 2026-04-08 02:41:59 +00:00 |
| GitLab CI | 9414a5141f | chore: Regenerate all playbooks | 2026-04-07 04:13:30 +00:00 |
| GitLab CI | 911ca6db8b | chore: Regenerate all playbooks | 2026-04-06 19:32:24 +00:00 |
| agolajko | f75d5817aa | nemotron reqs --trust-remote-code for vllm setup | 2026-01-26 08:16:23 -08:00 |
5 changed files with 19 additions and 8 deletions

View File

````diff
@@ -47,8 +47,8 @@ All necessary files for the playbook can be found [here on GitHub](https://githu
 * **Duration:** 45-90 minutes for complete setup and initial model fine-tuning
 * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations
 * **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
-* **Last Updated:** 01/15/2026
-* Fix qLoRA fine-tuning workflow
+* **Last Updated:** 03/04/2026
+* Recommend running Nemo finetune workflow via Docker
 ## Instructions
````
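The rollback bullet in this hunk amounts to deleting what the playbook created. A minimal sketch, assuming a venv at `~/nemo-venv` and the default Hugging Face cache location (both paths are hypothetical; substitute the ones your setup actually used):

```bash
# Hypothetical venv path; substitute whatever the playbook created
rm -rf ~/nemo-venv
# Optionally reclaim disk from downloaded model weights (default HF cache location)
rm -rf ~/.cache/huggingface/hub
```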

View File

````diff
@@ -172,12 +172,15 @@ Verify the NVIDIA runtime works:
 docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
 ```
-If you get a permission denied error on `docker`, add your user to the Docker group and log out/in:
+If you get a permission denied error on `docker`, add your user to the Docker group and activate the new group in your current session:
 ```bash
 sudo usermod -aG docker $USER
+newgrp docker
 ```
+This applies the group change immediately. Alternatively, you can log out and back in instead of running `newgrp docker`.
 > [!NOTE]
 > DGX Spark uses cgroup v2. OpenShell's gateway embeds k3s inside Docker and needs host cgroup namespace access. Without `default-cgroupns-mode: host`, the gateway can fail with "Failed to start ContainerManager" errors.
````
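A quick sanity check on the `newgrp docker` step added above (the check itself is not part of the diff; `id -nG` is plain coreutils):

```bash
# "docker" should now appear in the current session's group list
id -nG
# The earlier smoke test should work without sudo
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```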
````diff
@@ -322,13 +325,21 @@ http://127.0.0.1:18789/#token=<long-token-here>
 **If accessing the Web UI from a remote machine**, you need to set up port forwarding.
+First, find your Spark's IP address. On the Spark, run:
+```bash
+hostname -I | awk '{print $1}'
+```
+This prints the primary IP address (e.g. `192.168.1.42`). You can also find it in **Settings > Wi-Fi** or **Settings > Network** on the Spark's desktop, or check your router's connected-devices list.
 Start the port forward on the Spark host:
 ```bash
 openshell forward start 18789 my-assistant --background
 ```
-Then from your remote machine, create an SSH tunnel to the Spark:
+Then from your remote machine, create an SSH tunnel to the Spark (replace `<your-spark-ip>` with the IP address from above):
 ```bash
 ssh -L 18789:127.0.0.1:18789 <your-user>@<your-spark-ip>
````
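Once that tunnel is up, the Web UI is reachable at `http://127.0.0.1:18789` on the remote machine. A small usage sketch, assuming standard OpenSSH (`-f -N` are ordinary OpenSSH flags that background the tunnel without running a remote command; the placeholders are the same ones the diff uses):

```bash
# Background the tunnel instead of holding an interactive shell open
ssh -f -N -L 18789:127.0.0.1:18789 <your-user>@<your-spark-ip>
# Then open the UI locally with the token the gateway printed:
#   http://127.0.0.1:18789/#token=<long-token-here>
```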

View File

````diff
@@ -27,8 +27,8 @@ services:
       # Ollama configuration
       - OLLAMA_BASE_URL=http://ollama:11434/v1
       - OLLAMA_MODEL=llama3.1:8b
-      # Disable vLLM
-      - VLLM_BASE_URL=http://localhost:8001/v1
+      # vLLM disabled in default Ollama mode
+      # - VLLM_BASE_URL=http://localhost:8001/v1
       - VLLM_MODEL=disabled
       # Vector DB configuration
       - QDRANT_URL=http://qdrant:6333
````
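To flip this compose file from Ollama to vLLM mode you would invert the change: uncomment the base URL and replace the `disabled` sentinel with a real model ID. A sketch under those assumptions (the URL is the one the diff comments out; the model ID is borrowed from the support table in the last file of this compare):

```yaml
# Hypothetical vLLM mode; the URL must match where your vLLM server actually listens
- VLLM_BASE_URL=http://localhost:8001/v1
- VLLM_MODEL=nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8
```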

View File

````diff
@@ -108,7 +108,7 @@ export class TextProcessor {
     // Determine which LLM provider to use based on configuration
     // Priority: vLLM > NVIDIA > Ollama
-    if (process.env.VLLM_BASE_URL) {
+    if (process.env.VLLM_BASE_URL && process.env.VLLM_MODEL && process.env.VLLM_MODEL !== 'disabled') {
       this.selectedLLMProvider = 'vllm';
     } else if (process.env.NVIDIA_API_KEY) {
       this.selectedLLMProvider = 'nvidia';
````
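The intent of the changed condition is easier to see in isolation: vLLM is selected only when a base URL is set and the model is a real value rather than the `disabled` sentinel, which matches the compose change above. A self-contained sketch of that logic (the function name and return type are illustrative, not the repo's actual API):

```typescript
// Illustrative helper mirroring the changed priority logic: vLLM > NVIDIA > Ollama
type LLMProvider = 'vllm' | 'nvidia' | 'ollama';

function selectLLMProvider(env: NodeJS.ProcessEnv): LLMProvider {
  // vLLM wins only when configured with a usable model (not the 'disabled' sentinel)
  if (env.VLLM_BASE_URL && env.VLLM_MODEL && env.VLLM_MODEL !== 'disabled') {
    return 'vllm';
  }
  if (env.NVIDIA_API_KEY) {
    return 'nvidia';
  }
  return 'ollama';
}

// e.g. with the default compose environment above, selectLLMProvider(process.env)
// falls through to 'ollama' because VLLM_MODEL is 'disabled'
```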

View File

````diff
@@ -82,7 +82,7 @@ The following models are supported with vLLM on Spark. All listed models are ava
 | **Nemotron3-Nano** | FP8 | ✅ | [`nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8`](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8) |
 > [!NOTE]
-> The Phi-4-multimodal-instruct models require `--trust-remote-code` when launching vLLM.
+> The Phi-4-multimodal-instruct and Nemotron3-Nano models require `--trust-remote-code` when launching vLLM.
 > [!NOTE]
 > You can use the NVFP4 Quantization documentation to generate your own NVFP4-quantized checkpoints for your favorite models. This enables you to take advantage of the performance and memory benefits of NVFP4 quantization even for models not already published by NVIDIA.
````
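A hedged example of what the amended note implies when serving Nemotron3-Nano (`vllm serve` and `--trust-remote-code` are standard vLLM CLI usage, but verify against your installed version; the port here is arbitrary):

```bash
# --trust-remote-code lets vLLM run the model repo's custom code, required per the note above
vllm serve nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 \
  --trust-remote-code \
  --port 8001
```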