mirror of
https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-22 01:53:53 +00:00
chore: Regenerate all playbooks
This commit is contained in:
parent
df8de8ce09
commit
fa1fc9f685
@ -23,7 +23,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [Comfy UI](nvidia/comfy-ui/)
- [Set Up Local Network Access](nvidia/connect-to-your-spark/)
- [CUDA-X Data Science](nvidia/cuda-x-data-science/)
- [DGX Dashboard](nvidia/dgx-dashboard/)
- [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/)
- [Optimized JAX](nvidia/jax/)
@ -77,10 +77,8 @@ to the DGX Spark device

## Step 1. Install NVIDIA Sync

NVIDIA Sync is a desktop app that connects your computer to your DGX Spark over the local network.
It gives you a single interface to manage SSH access and launch development tools on your DGX Spark.

Download and install NVIDIA Sync for your operating system to get started.

::spark-download
@ -117,27 +115,30 @@ Download and install NVIDIA Sync on your computer to get started.

## Step 2. Configure Apps

After starting NVIDIA Sync and agreeing to the EULA, select which development tools you want to use. Apps are desktop programs installed on your laptop that NVIDIA Sync can configure and launch with an automatic connection to your Spark.

You can change your app selections anytime in the Settings window. Apps marked "unavailable" must be installed on your laptop before you can use them.

**Default apps:**
- **DGX Dashboard**: Web application pre-installed on DGX Spark for system management and integrated JupyterLab access
- **Terminal**: Your system's built-in terminal with automatic SSH connection

**Optional apps (require separate installation):**
- **VS Code**: Download from https://code.visualstudio.com/download
- **Cursor**: Download from https://cursor.com/downloads
- **NVIDIA AI Workbench**: Download from https://www.nvidia.com/workbench
## Step 3. Add your DGX Spark device

> [!NOTE]
> **Find Your Hostname or IP**
>
> You must know either your hostname or IP address to connect.
>
> - The default hostname can be found on the Quick Start Guide included in the box. For example, `spark-abcd.local`
> - If you have a display connected to your device, you can find the hostname on the Settings page of the [DGX Dashboard](http://localhost:11000).
> - If `.local` (mDNS) hostnames don't work on your network, you must use an IP address. This can be found in Ubuntu's network settings or by logging into the admin console of your router.

Finally, connect your DGX Spark by filling out the form:
@ -158,8 +159,7 @@ Click add "Add" and NVIDIA Sync will automatically:

4. Create an SSH alias locally for future connections
5. Discard your username and password information

> [!IMPORTANT]
> **Wait for update:** After completing system setup for the first time, your device may take several minutes to update and become available on the network. If NVIDIA Sync fails to connect, please wait 3-4 minutes and try again.

## Step 4. Access your DGX Spark
@ -178,10 +178,9 @@ connection to your DGX Spark.

## Step 5. Validate SSH setup

NVIDIA Sync creates an SSH alias for your device for easy access, either manually or from other SSH-enabled apps.
Verify your local SSH configuration is correct by using the SSH alias. You should not be prompted for your
password when using the alias:

```bash
# Configured if you use mDNS hostname
@ -208,21 +207,12 @@ Exit the SSH session
exit
```
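NVIDIA Sync writes the alias into your local SSH configuration. As a sketch of what such an entry looks like (the alias name, user, and key path are illustrative placeholders, and a temporary file stands in for `~/.ssh/config` so the commands are safe to run anywhere):

```shell
# Hypothetical example of an SSH alias entry like the one NVIDIA Sync creates.
# Alias, user, and key path are placeholders; a temp file stands in for ~/.ssh/config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host spark-abcd
    HostName spark-abcd.local
    User nvidia
    IdentityFile ~/.ssh/id_ed25519
EOF

# List the alias names defined in the file (against your real config: ~/.ssh/config)
awk '/^Host /{print $2}' "$cfg"
rm -f "$cfg"
```

Running the same `awk` line against your real `~/.ssh/config` shows every alias available to `ssh`.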
## Step 6. Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Device name doesn't resolve | mDNS blocked on network | Use IP address instead of hostname.local |
| Connection refused/timeout | DGX Spark not booted or SSH not ready | Wait for device boot completion; SSH is available after updates finish |
| Authentication failed | SSH key setup incomplete | Re-run device setup in NVIDIA Sync; check credentials |
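The first row above can also be scripted: fall back from the mDNS name to the device's IP address when `.local` resolution is unavailable. The hostname and IP below are placeholders; substitute your own values.

```shell
# Sketch: choose a reachable SSH target. Hostname and IP are placeholders.
HOST=spark-abcd.local   # example hostname from the Quick Start Guide
IP=192.168.1.50         # example IP from your router's admin console
TARGET="$IP"
if command -v getent >/dev/null 2>&1 && getent hosts "$HOST" >/dev/null 2>&1; then
  TARGET="$HOST"        # mDNS resolution works on this network
fi
echo "connect with: ssh nvidia@$TARGET"
```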
## Step 7. Next steps

Test your setup by launching a development tool:
- Click the NVIDIA Sync system tray icon.
- Select "Terminal" to open a terminal session on your DGX Spark.
- Select "DGX Dashboard" to use JupyterLab and manage updates.
- Try [a custom port example with Open WebUI](/spark/open-webui/sync)

## Connect with Manual SSH
@ -1,6 +1,6 @@

# CUDA-X Data Science

> Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas and more with zero code changes.

## Table of Contents
@ -12,18 +12,25 @@

## Overview

CUDA-X Data Science (formerly RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. These libraries accelerate popular Python tools like scikit-learn and pandas with zero code changes. On DGX Spark, they maximize performance at your desk with your existing code.

This playbook includes two example notebooks that demonstrate the acceleration of key machine learning algorithms and core pandas operations using CUDA-X Data Science libraries:

- **NVIDIA cuDF:** Accelerates data preparation and core data processing of 8 GB of strings data, with no code changes.
- **NVIDIA cuML:** Accelerates popular, compute-intensive machine learning algorithms in scikit-learn (LinearSVC), UMAP, and HDBSCAN, with no code changes.

## What you'll accomplish
You will accelerate popular machine learning algorithms and data analytics operations on the GPU. You will understand how to accelerate popular Python tools, and the value of running data science workflows on your DGX Spark.

## Prerequisites
- Familiarity with pandas, scikit-learn, and machine learning algorithms such as support vector machines, clustering, and dimensionality reduction.
- Install conda
- Generate a Kaggle API key

## Time & risk
- Duration:
  - 20-30 minutes setup time.
  - 2-3 minutes to run each notebook.

## Instructions
@ -33,32 +40,34 @@ In this playbook, we will demonstrate the acceleration of key machine learning a

- Install conda using [these instructions](https://docs.anaconda.com/miniconda/install/)
- Create a Kaggle API key using [these instructions](https://www.kaggle.com/discussions/general/74235) and place the **kaggle.json** file in the same folder as the notebook
## Step 2. Installing Data Science libraries
- Use the following command to install the CUDA-X libraries (this will create a new conda environment):

```bash
conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia \
    rapids=25.10 python=3.12 'cuda-version=13.0' \
    jupyterlab hdbscan umap-learn
```
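A quick way to confirm the environment was created before moving on (guarded so the check is simply skipped on machines without conda):

```shell
# Sketch: verify the rapids-test environment exists before activating it.
if command -v conda >/dev/null 2>&1; then
  conda env list | grep -q rapids-test \
    && echo "rapids-test environment ready" \
    || echo "rapids-test not found; re-run the conda create command"
fi
```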
## Step 3. Activate the conda environment
- Activate the conda environment:

```bash
conda activate rapids-test
```
## Step 4. Cloning the playbook repository
- Clone the GitHub repository and go to the cuda-x-data-science/assets folder:

```bash
git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
```
- Place the **kaggle.json** file created in Step 1 in the assets folder
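Before launching Jupyter, it can help to confirm the Kaggle credentials landed in the right place. A minimal check, assuming the repository layout described above:

```shell
# Sketch: confirm kaggle.json is in the assets folder before launching Jupyter.
# The path assumes the clone location and folder layout from Step 4.
ASSETS=dgx-spark-playbook-assets/cuda-x-data-science/assets
if [ -f "$ASSETS/kaggle.json" ]; then
  echo "kaggle.json in place"
else
  echo "copy the kaggle.json created in Step 1 into $ASSETS"
fi
```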
## Step 5. Run the notebooks

There are two notebooks in the repository.
One runs an example of a large strings data processing workflow with pandas code on the GPU.
- Run the cudf_pandas_demo.ipynb notebook:

```bash
jupyter notebook cudf_pandas_demo.ipynb
```
The other walks through an example of machine learning algorithms including UMAP and HDBSCAN.
- Run the cuml_sklearn_demo.ipynb notebook:

```bash
jupyter notebook cuml_sklearn_demo.ipynb
```
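The zero-code-change idea the cuDF notebook demonstrates can be sketched outside Jupyter as well: the same pandas script runs unchanged, once on the CPU and once under the `cudf.pandas` accelerator module. The script name is illustrative, and each run is guarded so it is skipped where the library isn't installed:

```shell
# Sketch: one pandas script, two runtimes, zero code changes.
cat > /tmp/groupby_demo.py <<'EOF'
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "val": [1, 2, 3, 4]})
print(df.groupby("key")["val"].sum().to_dict())
EOF

# CPU run (plain pandas); skipped if pandas is absent
python3 -c "import pandas" 2>/dev/null && python3 /tmp/groupby_demo.py || true
# GPU run via the cudf.pandas accelerator; skipped if cuDF is absent
python3 -c "import cudf" 2>/dev/null && python3 -m cudf.pandas /tmp/groupby_demo.py || true
```

When the libraries are installed, both runs print the same grouped sums; only the interpreter invocation changes.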
@ -171,10 +171,6 @@ Unlike the base model, we can see that the fine-tuned model can generate multipl

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
@ -202,7 +202,6 @@ docker container prune -f

| Symptom | Cause | Fix |
|---------|-------|-----|
| CUDA out of memory during training | Batch size too large for GPU VRAM | Reduce `per_device_train_batch_size` or increase `gradient_accumulation_steps` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download fails or is slow | Network connectivity or Hugging Face Hub issues | Check internet connection; try `HF_HUB_OFFLINE=1` for cached models |
| Training loss not decreasing | Learning rate too high/low or insufficient data | Adjust `learning_rate` parameter or check dataset quality |
@ -142,7 +142,7 @@ docker volume rm "$(basename "$PWD")_postgres_data"

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@ -213,7 +213,6 @@ environment.

|---------|-------|-----|
| "CUDA out of memory" error | Insufficient VRAM for model | Use FP8/FP4 quantization or a smaller model |
| "Invalid HF token" error | Missing or expired HuggingFace token | Set a valid token: `export HF_TOKEN=<YOUR_TOKEN>` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download timeouts | Network issues or rate limiting | Retry the command or pre-download models |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
@ -319,7 +319,6 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au

| GPU not detected in training | CUDA driver/runtime mismatch | Verify driver compatibility with `nvidia-smi` and reinstall CUDA if needed |
| Out of memory during training | Model too large for available GPU memory | Reduce batch size, enable gradient checkpointing, or use model parallelism |
| ARM64 package compatibility issues | Package not available for ARM architecture | Use source installation or build from source with ARM64 flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@ -256,7 +256,6 @@ The quantized model is now ready for deployment. Common next steps include:

| Model files not found in output directory | Volume mount failed or wrong path | Verify `$(pwd)/output_models` resolves correctly |
| Git clone fails inside container | Network connectivity issues | Check internet connection and retry |
| Quantization process hangs | Container resource limits | Increase Docker memory limits or use `--ulimit` flags |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
@ -60,18 +60,14 @@ If you see a permission denied error (something like `permission denied while tr

```bash
sudo usermod -aG docker $USER
newgrp docker
```

Test Docker access again. In the terminal, run:

```bash
docker ps
```

> **Warning**: After running usermod, you must log out and log back in to start a new
> session with updated group permissions.
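Whether the re-login actually took effect can be checked from the new session. This is a small sketch; it only inspects the session's group list and changes nothing:

```shell
# Sketch: check current-session membership in the docker group.
if id -nG | tr ' ' '\n' | grep -qx docker; then
  echo "docker group active in this session"
else
  echo "not yet in the docker group; run the usermod command above and log out/in"
fi
```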
## Step 2. Verify Docker setup and pull container

Open a new terminal and pull the Open WebUI container image with integrated Ollama:

```bash
docker pull ghcr.io/open-webui/open-webui:ollama
```
@ -134,8 +130,7 @@ Press Enter to send the message and wait for the model's response.

Steps to completely remove the Open WebUI installation and free up resources:

> **Warning**: These commands will permanently delete all Open WebUI data and downloaded models.

Stop and remove the Open WebUI container:
@ -156,6 +151,9 @@ Remove persistent data volumes:
docker volume rm open-webui open-webui-ollama
```

To roll back the permission change: `sudo deluser $USER docker`

## Step 9. Next steps

Try downloading different models from the Ollama library at https://ollama.com/library.
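Models can also be pulled from the command line through the Ollama bundled in the container. A sketch, assuming the container is named `open-webui` and the bundled `ollama` binary is on the container's PATH (the model tag is just an example), guarded so nothing runs on machines without Docker:

```shell
# Sketch: pull an additional model inside the running open-webui container.
MODEL=llama3.2:3b   # example tag from ollama.com/library
if command -v docker >/dev/null 2>&1 \
   && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx open-webui; then
  docker exec open-webui ollama pull "$MODEL"   # assumes ollama is on the container PATH
else
  echo "open-webui container not running; launch it first"
fi
```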
@ -170,8 +168,7 @@ docker pull ghcr.io/open-webui/open-webui:ollama

## Setup Open WebUI on Remote Spark with NVIDIA Sync

> **Note**: If you haven't already installed NVIDIA Sync, [learn how here](/spark/connect-to-your-spark/sync).

## Step 1. Configure Docker permissions
@ -187,18 +184,17 @@ If you see a permission denied error (something like `permission denied while tr

```bash
sudo usermod -aG docker $USER
newgrp docker
```

Test Docker access again. In the terminal, run:

```bash
docker ps
```

> **Warning**: After running usermod, you must close the terminal window completely to start a new
> session with updated group permissions.
## Step 2. Verify Docker setup and pull container

This step confirms Docker is working properly and downloads the Open WebUI container
image. It runs on the DGX Spark device and may take several minutes depending on network speed.

Open a new Terminal app from NVIDIA Sync and pull the Open WebUI container image with integrated Ollama:

```bash
docker pull ghcr.io/open-webui/open-webui:ollama
```
@ -208,15 +204,18 @@ Once the container image is downloaded, continue to setup NVIDIA Sync.

## Step 3. Open NVIDIA Sync Settings

Click on the NVIDIA Sync icon in your system tray or taskbar to open the main application window.

Click the gear icon in the top right corner to open the Settings window.

Click on the "Custom" tab to access Custom Ports configuration.

## Step 4. Add Open WebUI custom port

This step creates a new entry in NVIDIA Sync that will manage the Open
WebUI container and create the necessary SSH tunnel.

Click the "Add New" button in the Custom section.

Fill out the form with these values:
@ -271,23 +270,22 @@ echo "Running. Press Ctrl+C to stop ${NAME}."
while :; do sleep 86400; done
```

Click the "Add" button to save the configuration.

## Step 5. Launch Open WebUI

This step starts the Open WebUI container on your DGX Spark and establishes the SSH
tunnel. The browser will open automatically if configured correctly.

Click on the NVIDIA Sync icon in your system tray or taskbar to open the main application window.

Under the "Custom" section, click on "Open WebUI".

Your default web browser should automatically open to the Open WebUI interface at `http://localhost:12000`.

> [!TIP]
> On first run, Open WebUI downloads models. This can delay server start and cause the page to fail to load in your browser. Simply wait and refresh the page.
> On future launches it will open quickly.

## Step 6. Create administrator account

This step sets up the initial administrator account for Open WebUI. This is a local account that you will use to access the Open WebUI interface.

In the Open WebUI interface, click the "Get Started" button at the bottom of the screen.
@ -297,14 +295,14 @@ Click the registration button to create your account and access the main interfa

## Step 7. Download and configure a model

This step downloads a language model through Ollama and configures it for use in
Open WebUI. The download happens on your DGX Spark device and may take several minutes.

Click on the "Select a model" dropdown in the top left corner of the Open WebUI interface.

Type `gpt-oss:20b` in the search field.

Click the `Pull "gpt-oss:20b" from Ollama.com` button that appears.

Wait for the model download to complete. You can monitor progress in the interface.
@ -312,6 +310,9 @@ Once complete, select "gpt-oss:20b" from the model dropdown.

## Step 8. Test the model

This step verifies that the complete setup is working properly by testing model
inference through the web interface.

In the chat textarea at the bottom of the Open WebUI interface, enter:

```
@ -330,40 +331,11 @@ Under the "Custom" section, click the `x` icon on the right of the "Open WebUI"

This will close the tunnel and stop the Open WebUI docker container.

## Step 11. Cleanup and rollback

Steps to completely remove the Open WebUI installation and free up resources:

> **Warning**: These commands will permanently delete all Open WebUI data and downloaded models.

Stop and remove the Open WebUI container:
@ -384,8 +356,24 @@ Remove persistent data volumes:
docker volume rm open-webui open-webui-ollama
```

To roll back the permission change: `sudo deluser $USER docker`

Remove the Custom App from NVIDIA Sync by opening Settings > Custom tab and deleting the entry.

## Step 12. Next steps

Try downloading different models from the Ollama library at https://ollama.com/library.

You can monitor GPU and memory usage through the DGX Dashboard available in NVIDIA Sync as you try different models.

If Open WebUI reports an update is available, you can update the container image by running:

```bash
docker pull ghcr.io/open-webui/open-webui:ollama
```

After the update, launch Open WebUI again from NVIDIA Sync.

## Troubleshooting

## Common issues with manual setup
@ -407,8 +395,7 @@ Remove the Custom App from NVIDIA Sync by opening Settings > Custom tab and dele

| GPU not detected in container | Missing `--gpus=all` flag | Recreate container with correct start script |
| Port 12000 already in use | Another application using port | Change port in Custom App settings or stop conflicting service |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

```bash
@ -117,10 +117,6 @@ python Llama3_3B_full_finetuning.py

## Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
@ -163,7 +163,7 @@ docker stop <container_id>

| "CUDA out of memory" error | Insufficient GPU memory | Reduce `kv_cache_free_gpu_memory_fraction` to 0.9 or use a device with more VRAM |
| Container fails to start | Docker GPU support issues | Verify `nvidia-docker` is installed and the `--gpus=all` flag is supported |
| Model download fails | Network or authentication issues | Check HuggingFace authentication and network connectivity |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Server doesn't respond | Port conflicts or firewall | Check if port 8000 is available and not blocked |

> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
@ -729,7 +729,6 @@ Compare performance metrics between speculative decoding and baseline reports to

| Symptom | Cause | Fix |
|---------|-------|-----|
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| OOM during weight loading (e.g., [Nemotron Super 49B](https://huggingface.co/nvidia/Llama-3_3-Nemotron-Super-49B-v1_5)) | Parallel weight-loading memory pressure | `export TRT_LLM_DISABLE_LOAD_WEIGHTS_IN_PARALLEL=1` |
| "CUDA out of memory" | GPU VRAM insufficient for model | Reduce `free_gpu_memory_fraction` to 0.9, reduce batch size, or use a smaller model |
| "Model not found" error | HF_TOKEN invalid or model inaccessible | Verify token and model permissions |
@ -742,7 +741,6 @@ Compare performance metrics between speculative decoding and baseline reports to

|---------|-------|-----|
| MPI hostname test returns single hostname | Network connectivity issues | Verify both nodes are on reachable IP addresses |
| "Permission denied" on HuggingFace download | Invalid or missing HF_TOKEN | Set a valid token: `export HF_TOKEN=<TOKEN>` |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| "CUDA out of memory" errors | Insufficient GPU memory | Reduce `--max_batch_size` or `--max_num_tokens` |
| Container exits immediately | Missing entrypoint script | Ensure `trtllm-mn-entrypoint.sh` downloaded successfully and has executable permissions; also ensure the container is not already running on your node. If port 2233 is already in use, the entrypoint script will not start. |
@ -382,7 +382,6 @@ http://192.168.100.10:8265

| Symptom | Cause | Fix |
|---------|-------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection; check IP configuration |
| Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |
| Model download fails | Authentication or network issue | Re-run `huggingface-cli login`; check internet access |
| CUDA out of memory with 405B | Insufficient GPU memory | Use the 70B model or reduce the `max_model_len` parameter |
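For the first row, reachability of the second node can be checked before starting Ray. The IP below is a placeholder in the same address range as the dashboard URL above; the ping is guarded so the snippet is inert on machines without `ping`:

```shell
# Sketch: confirm node 2 answers before starting the Ray cluster.
NODE2_IP=192.168.100.11   # placeholder; use your second node's cluster-link IP
if command -v ping >/dev/null 2>&1 && ping -c1 -W1 "$NODE2_IP" >/dev/null 2>&1; then
  echo "node 2 reachable"
else
  echo "node 2 unreachable: check the QSFP cable and IP configuration"
fi
```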