Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git (synced 2026-04-22 01:53:53 +00:00)

chore: Regenerate all playbooks

Commit 819ce6334c (parent fa1fc9f685)
@@ -77,8 +77,10 @@ to the DGX Spark device

## Step 1. Install NVIDIA Sync

NVIDIA Sync is a desktop app that connects your computer to your DGX Spark over the local network.
It gives you a single interface to manage SSH access and launch development tools on your DGX Spark.

Download and install NVIDIA Sync on your computer to get started.

::spark-download
@@ -115,30 +117,27 @@ interface for managing SSH connections and launching development tools on your D

## Step 2. Configure Apps

After starting NVIDIA Sync and agreeing to the EULA, select which development tools you want
to use. Apps are desktop programs installed on your laptop that NVIDIA Sync can configure and launch with an automatic connection to your Spark.

You can change your app selections anytime in the Settings window. Apps that are marked "unavailable" must be installed before you can use them.

**Default apps:**
- **DGX Dashboard**: Web application pre-installed on DGX Spark for system management and integrated JupyterLab access
- **Terminal**: Your system's built-in terminal with automatic SSH connection

**Optional apps (require separate installation):**
- **VS Code**: Download from https://code.visualstudio.com/download
- **Cursor**: Download from https://cursor.com/downloads
- **NVIDIA AI Workbench**: Download from https://www.nvidia.com/workbench
## Step 3. Add your DGX Spark device

> [!NOTE]
> You must know either your hostname or IP address to connect.
>
> - The default hostname can be found on the Quick Start Guide included in the box. For example, `spark-abcd.local`
> - If you have a display connected to your device, you can find the hostname on the Settings page of the [DGX Dashboard](http://localhost:11000).
> - If `.local` (mDNS) hostnames don't work on your network, you must use an IP address. This can be found in Ubuntu's network settings or by logging into the admin console of your router.

Finally, connect your DGX Spark by filling out the form:
@@ -159,7 +158,8 @@ Click "Add" and NVIDIA Sync will automatically:

4. Create an SSH alias locally for future connections
5. Discard your username and password information

> [!IMPORTANT]
> After completing system setup for the first time, your device may take several minutes to update and become available on the network. If NVIDIA Sync fails to connect, please wait 3-4 minutes and try again.
## Step 4. Access your DGX Spark

@@ -178,9 +178,10 @@ connection to your DGX Spark.
## Step 5. Validate SSH setup

NVIDIA Sync creates an SSH alias for your device for easy access, either manually or from other SSH-enabled apps.

Verify your local SSH configuration is correct by using the SSH alias. You should not be prompted for your
password when using the alias:

```bash
# Configured if you use mDNS hostname
```

@@ -207,12 +208,21 @@ Exit the SSH session

```bash
exit
```
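The alias lives in your OpenSSH client configuration. As a rough sketch, the entry NVIDIA Sync writes looks something like the following `~/.ssh/config` block (the alias name, username, and key path here are illustrative assumptions, not the exact values Sync generates):

```
# Illustrative ~/.ssh/config entry; Sync's generated values will differ
Host spark-abcd
    HostName spark-abcd.local
    User your-username
    IdentityFile ~/.ssh/id_ed25519
```

With an entry like this in place, `ssh spark-abcd` connects without spelling out the user, host, or key each time, which is why the alias also works from other SSH-enabled apps.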

## Step 6. Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Device name doesn't resolve | mDNS blocked on network | Use the IP address instead of hostname.local |
| Connection refused/timeout | DGX Spark not booted or SSH not ready | Wait for device boot to complete; SSH is available after updates finish |
| Authentication failed | SSH key setup incomplete | Re-run device setup in NVIDIA Sync; check credentials |
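For the first two rows, you can check name resolution and reachability from your laptop before re-running setup. The hostname and IP below are placeholders; substitute your device's values:

```
# Does the mDNS name resolve? (placeholder hostname)
ping -c 1 spark-abcd.local

# Fall back to the IP address if mDNS fails (placeholder IP)
ping -c 1 192.168.1.50

# Is SSH up yet? ConnectTimeout fails fast instead of hanging
ssh -o ConnectTimeout=5 spark-abcd.local true && echo "SSH reachable"
```

If the IP responds but the `.local` name does not, mDNS is blocked on your network and you should add the device by IP address.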

## Step 7. Next steps
Test your setup by launching a development tool:
- Click the NVIDIA Sync system tray icon.
- Select "Terminal" to open a terminal session on your DGX Spark.
- Select "DGX Dashboard" to use JupyterLab and manage updates.
- Try [a custom port example with Open WebUI](/spark/open-webui/sync)

## Connect with Manual SSH
@@ -171,6 +171,10 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

@@ -202,6 +202,7 @@ docker container prune -f
| Symptom | Cause | Fix |
|---------|-------|-----|
| CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check your internet connection, or set `HF_HUB_OFFLINE=1` to use cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust the `learning_rate` parameter or check dataset quality |
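To illustrate the first fix: lowering the per-device batch size while raising gradient accumulation keeps the effective batch size constant but cuts peak VRAM. The script name and flag spellings below are hypothetical, Hugging Face `transformers`-style conventions, not taken from this playbook:

```
# Hypothetical invocation; effective batch = 1 x 8 = 8, with much lower peak VRAM
python train.py \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 8
```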

@@ -142,7 +142,7 @@ docker volume rm "$(basename "$PWD")_postgres_data"

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@@ -213,6 +213,7 @@ environment.

|---------|-------|-----|
| "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or a smaller model |
| "Invalid HF token" error | Missing or expired HuggingFace token | Set a valid token: `export HF_TOKEN=<YOUR_TOKEN>` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download timeouts | Network issues or rate limiting | Retry the command or pre-download models |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
@@ -319,6 +319,7 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au

| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility with `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Build from source with ARM64 flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@@ -256,6 +256,7 @@ The quantized model is now ready for deployment. Common next steps include:

| Model files not found in output directory | Volume mount failed or wrong path | Verify that `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check your internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@@ -60,14 +60,18 @@ If you see a permission denied error (something like `permission denied while tr

```bash
sudo usermod -aG docker $USER
newgrp docker
```

> [!WARNING]
> After running usermod, you must log out and log back in to start a new
> session with updated group permissions.

Test Docker access again. In the terminal, run:

```bash
docker ps
```

## Step 2. Verify Docker setup and pull container

Pull the Open WebUI container image with integrated Ollama:

```bash
docker pull ghcr.io/open-webui/open-webui:ollama
```
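The pull only downloads the image; starting the container takes a `docker run` invocation. A minimal sketch is below, reusing the volume names this guide's cleanup step removes (`open-webui`, `open-webui-ollama`); the host port and restart policy are assumptions, and the playbook's own start script may use different flags:

```
# Sketch only - ports/flags are assumptions, check the playbook's start script
docker run -d \
  --name open-webui \
  --gpus=all \
  -p 8080:8080 \
  -v open-webui:/app/backend/data \
  -v open-webui-ollama:/root/.ollama \
  --restart always \
  ghcr.io/open-webui/open-webui:ollama
```

The two named volumes persist the Open WebUI database and downloaded models across container recreations, which is why the cleanup step later removes them explicitly.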
@@ -130,7 +134,8 @@ Press Enter to send the message and wait for the model's response.

Steps to completely remove the Open WebUI installation and free up resources:

> [!WARNING]
> These commands will permanently delete all Open WebUI data and downloaded models.
Stop and remove the Open WebUI container:

@@ -151,9 +156,6 @@ Remove persistent data volumes:

```bash
docker volume rm open-webui open-webui-ollama
```

To roll back the permission change: `sudo deluser $USER docker`
## Step 9. Next steps

Try downloading different models from the Ollama library at https://ollama.com/library.
@@ -168,7 +170,8 @@ docker pull ghcr.io/open-webui/open-webui:ollama

## Setup Open WebUI on Remote Spark with NVIDIA Sync

> [!TIP]
> If you haven't already installed NVIDIA Sync, [learn how here.](/spark/connect-to-your-spark/sync)
## Step 1. Configure Docker permissions

@@ -184,17 +187,18 @@ If you see a permission denied error (something like `permission denied while tr

```bash
sudo usermod -aG docker $USER
newgrp docker
```

> [!WARNING]
> After running usermod, you must close the terminal window completely to start a new
> session with updated group permissions.

Test Docker access again. In the terminal, run:

```bash
docker ps
```
## Step 2. Verify Docker setup and pull container

This step confirms Docker is working properly and downloads the Open WebUI container
image. This runs on the DGX Spark device and may take several minutes depending on network speed.

Open a new Terminal app from NVIDIA Sync and pull the Open WebUI container image with integrated Ollama on your DGX Spark:

```bash
docker pull ghcr.io/open-webui/open-webui:ollama
```
@@ -204,18 +208,15 @@ Once the container image is downloaded, continue to setup NVIDIA Sync.

## Step 3. Open NVIDIA Sync Settings

- Click on the NVIDIA Sync icon in your system tray or taskbar to open the main application window.
- Click the gear icon in the top right corner to open the Settings window.
- Click on the "Custom" tab to access Custom Ports configuration.

## Step 4. Add Open WebUI custom port configuration

This step creates a new entry in NVIDIA Sync that will manage the Open
WebUI container and create the necessary SSH tunnel.

Click the "Add New" button on the Custom tab.

Fill out the form with these values:
@@ -270,22 +271,23 @@ echo "Running. Press Ctrl+C to stop ${NAME}."

```bash
while :; do sleep 86400; done
```

Click the "Add" button to save the configuration to your DGX Spark.

## Step 5. Launch Open WebUI
This step starts the Open WebUI container on your DGX Spark and establishes the SSH
tunnel. The browser will open automatically if configured correctly.

Click on the NVIDIA Sync icon in your system tray or taskbar to open the main application window.

Under the "Custom" section, click on "Open WebUI".

Your default web browser should automatically open to the Open WebUI interface at `http://localhost:12000`.

> [!TIP]
> On first run, Open WebUI downloads models. This can delay server start and cause the page to fail to load in your browser. Simply wait and refresh the page.
> On future launches it will open quickly.
## Step 6. Create administrator account

To start using Open WebUI you must create an initial administrator account. This is a local account that you will use to access the Open WebUI interface.

In the Open WebUI interface, click the "Get Started" button at the bottom of the screen.
@@ -295,14 +297,14 @@ Click the registration button to create your account and access the main interfa

## Step 7. Download and configure a model

Next, download a language model with Ollama and configure it for use in
Open WebUI. This download happens on your DGX Spark device and may take several minutes.

Click on the "Select a model" dropdown in the top left corner of the Open WebUI interface.

Type `gpt-oss:20b` in the search field.

Click the `Pull "gpt-oss:20b" from Ollama.com` button that appears.

Wait for the model download to complete. You can monitor progress in the interface.
@@ -310,9 +312,6 @@ Once complete, select "gpt-oss:20b" from the model dropdown.

## Step 8. Test the model

This step verifies that the complete setup is working properly by testing model
inference through the web interface.

In the chat textarea at the bottom of the Open WebUI interface, enter:
@@ -331,11 +330,40 @@ Under the "Custom" section, click the `x` icon on the right of the "Open WebUI"

This will close the tunnel and stop the Open WebUI docker container.

## Step 10. Troubleshooting

Common issues and their solutions.
| Symptom | Cause | Fix |
|---------|-------|-----|
| Permission denied on `docker ps` | User not in docker group | Complete Step 1, including the terminal restart |
| Browser doesn't open automatically | Auto-open setting disabled | Manually navigate to http://localhost:12000 |
| Model download fails | Network connectivity issues | Check your internet connection and retry the download |
| GPU not detected in container | Missing `--gpus=all` flag | Recreate the container with the correct start script |
| Port 12000 already in use | Another application is using the port | Change the port in the Custom App settings or stop the conflicting service |
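For the last row, you can check from your laptop whether anything is already listening on port 12000 before changing the Custom App settings. A small sketch using `ss` (it reports the port as free if `ss` is unavailable):

```shell
# Check whether a local process is already listening on the forwarded port
PORT=12000
if ss -ltn 2>/dev/null | grep -q ":${PORT} "; then
  echo "port ${PORT} is already in use"
else
  echo "port ${PORT} looks free"
fi
```

If the port is in use, `ss -ltnp` (run with sudo) also shows which process holds it.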

## Step 11. Next steps

Try downloading different models from the Ollama library at https://ollama.com/library.

You can monitor GPU and memory usage through the DGX Dashboard available in NVIDIA Sync as you try different models.
If Open WebUI reports an update is available, you can pull the container image by running this in your terminal:

```bash
docker stop open-webui
docker rm open-webui
docker pull ghcr.io/open-webui/open-webui:ollama
```

After the update, launch Open WebUI again from NVIDIA Sync.
## Step 12. Cleanup and rollback

Steps to completely remove the Open WebUI installation and free up resources:

> [!WARNING]
> These commands will permanently delete all Open WebUI data and downloaded models.

Stop and remove the Open WebUI container:
@@ -356,24 +384,8 @@ Remove persistent data volumes:

```bash
docker volume rm open-webui open-webui-ollama
```

To roll back the permission change: `sudo deluser $USER docker`

Remove the Custom App from NVIDIA Sync by opening Settings > Custom tab and deleting the entry.
## Troubleshooting

## Common issues with manual setup
@@ -395,7 +407,8 @@ After the update, launch Open WebUI again from NVIDIA Sync.

| GPU not detected in container | Missing `--gpus=all` flag | Recreate the container with the correct start script |
| Port 12000 already in use | Another application is using the port | Change the port in the Custom App settings or stop the conflicting service |

> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
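The code block that followed this note is not shown in this hunk. For reference, the standard Linux command sequence for flushing the page/buffer cache (an assumption here, not necessarily the exact block from the playbook) is:

```
# Requires root; sync first so dirty pages are written before the cache is dropped
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
```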
@@ -117,6 +117,10 @@ python Llama3_3B_full_finetuning.py

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
@@ -163,7 +163,7 @@ docker stop <container_id>

| "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
| Container fails to start | Docker GPU support issues | Verify that `nvidia-docker` is installed and the `--gpus=all` flag is supported |
| Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Server doesn't respond | Port conflicts or firewall | Check that port 8000 is available and not blocked |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
@@ -729,6 +729,7 @@ Compare performance metrics between speculative decoding and baseline reports to

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` |
| "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction` (e.g., to 0.9), reduce the batch size, or use a smaller model |
| "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions |
@@ -741,6 +742,7 @@ Compare performance metrics between speculative decoding and baseline reports to

|---------|-------|-----|
| MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses |
| "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set a valid token: `export HF_TOKEN=<TOKEN>` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` |
| Container exits immediately | Missing entrypoint script | Ensure the `trtllm-mn-entrypoint.sh` download succeeded and the script has executable permissions. Also make sure the container is not already running on your node; if port 2233 is already in use, the entrypoint script will not start. |
@@ -382,6 +382,7 @@ http://192.168.100.10:8265

| Symptom | Cause | Fix |
|---------|-------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify the QSFP cable connection and check the IP configuration |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download fails | Authentication or network issue | Re-run `huggingface-cli login` and check internet access |
| CUDA out of memory with 405B | Insufficient GPU memory | Use the 70B model or reduce the `max_model_len` parameter |