mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-23 02:23:53 +00:00

chore: Regenerate all playbooks

This commit is contained in:
parent 8f5d38151e
commit c5e890f836
@@ -158,7 +158,8 @@ Open a web browser and navigate to `http://<SPARK_IP>:8188` where `<SPARK_IP>` i
 
 If you need to remove the installation completely, follow these steps:
 
-> **Warning:** This will delete all installed packages and downloaded models.
+> [!WARNING]
+> This will delete all installed packages and downloaded models.
 
 ```bash
 deactivate
@@ -66,12 +66,9 @@ applications, and manage your DGX Spark remotely from your laptop.
 
 ## Time & risk
 
-**Time estimate:** 5-10 minutes
-
-**Risk level:** Low - SSH setup involves credential configuration but no system-level changes
-to the DGX Spark device
-
-**Rollback:** SSH key removal can be done by editing `~/.ssh/authorized_keys` on the DGX Spark.
+- **Time estimate:** 5-10 minutes
+- **Risk level:** Low - SSH setup involves credential configuration but no system-level changes to the DGX Spark device
+- **Rollback:** SSH key removal can be done by editing `~/.ssh/authorized_keys` on the DGX Spark.
 
 ## Connect with NVIDIA Sync
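As a sketch of the rollback described in that hunk (a temp directory stands in for `~/.ssh` on the DGX Spark, and the key comment `laptop-key` is a hypothetical placeholder for your key's trailing comment), removing a single public key from `authorized_keys` could look like:

```shell
# Hypothetical sketch: drop one entry from authorized_keys.
# A temp directory stands in for ~/.ssh on the DGX Spark, and
# "laptop-key" is a placeholder for your key's trailing comment.
DEMO=$(mktemp -d)
KEYS_FILE="$DEMO/authorized_keys"
printf 'ssh-ed25519 AAAAC3demo laptop-key\nssh-ed25519 AAAAC3demo desktop-key\n' > "$KEYS_FILE"

cp "$KEYS_FILE" "$KEYS_FILE.bak"                      # back up before editing
grep -v 'laptop-key' "$KEYS_FILE.bak" > "$KEYS_FILE"  # remove the laptop's key
chmod 600 "$KEYS_FILE"                                # sshd requires strict permissions
```

Keeping the `.bak` copy makes this rollback itself reversible until you are sure the right key was removed.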
@@ -146,9 +143,9 @@ Finally, connect your DGX Spark by filling out the form:
 - **Username**: Your DGX Spark user account name
 - **Password**: Your DGX Spark user account password
 
-**Note:** Your password is used only during this initial setup to configure SSH key-based
-authentication. It is not stored or transmitted after setup completion. NVIDIA Sync will SSH into your device and
-configure its locally provisioned SSH key pair.
+> [!NOTE]
+> Your password is used only during this initial setup to configure SSH key-based authentication. It is not stored or transmitted after setup completion. NVIDIA Sync will SSH into your device and
+> configure its locally provisioned SSH key pair.
 
 Click "Add" and NVIDIA Sync will automatically:
@@ -198,7 +198,8 @@ From the Settings page, under the "Updates" tab:
 2. Click "Update Now" to initiate the update process
 3. Wait for the update to complete and your device to reboot
 
-> **Warning**: System updates will upgrade packages, firmware if available, and trigger a reboot. Save your work before proceeding.
+> [!WARNING]
+> System updates will upgrade packages, firmware if available, and trigger a reboot. Save your work before proceeding.
 
 ## Step 7. Cleanup and rollback
@@ -207,7 +208,8 @@ To clean up resources and return system to original state:
 1. Stop any running JupyterLab instances via dashboard
 2. Delete the JupyterLab working directory
 
-> **Warning**: If you ran system updates, the only rollback is to restore from a system backup or recovery media.
+> [!WARNING]
+> If you ran system updates, the only rollback is to restore from a system backup or recovery media.
 
 No permanent changes are made to the system during normal dashboard usage.
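A minimal sketch of step 2 in that hunk. The working directory's location depends on how you created it, so the path below is a hypothetical placeholder (a temp directory stands in for the real one):

```shell
# Hypothetical sketch: remove the JupyterLab working directory.
# A temp directory stands in for the real path, which depends on your setup.
WORKDIR=$(mktemp -d)/jupyterlab          # placeholder for e.g. ~/jupyterlab
mkdir -p "$WORKDIR"
touch "$WORKDIR/Untitled.ipynb"          # stand-in for notebook content

rm -rf "$WORKDIR"                        # step 2: delete the working directory
```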
@@ -111,7 +111,8 @@ After playing around with the base model, you have 2 possible next steps.
 * If you already have fine-tuned LoRAs placed inside `models/loras/`, please skip to `Step 7. Fine-tuned model inference` section.
 * If you wish to train a LoRA for your custom concepts, first make sure that the ComfyUI inference container is brought down before proceeding to train. You can bring it down by interrupting the terminal with `Ctrl+C` keystroke.
 
-> **Note**: To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the ComfyUI server.
+> [!NOTE]
+> To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the ComfyUI server.
 ```bash
 sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
 ```
@@ -99,7 +99,8 @@ git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-
 
 ## Step 3. Build the Docker image
 
-> **Warning:** This command will download a base image and build a container locally to support this environment.
+> [!WARNING]
+> This command will download a base image and build a container locally to support this environment.
 
 ```bash
 cd jax/assets
@@ -183,7 +183,8 @@ llamafactory-cli export examples/merge_lora/llama3_lora_sft.yaml
 
 ## Step 11. Cleanup and rollback
 
-> **Warning:** This will delete all training progress and checkpoints.
+> [!WARNING]
+> This will delete all training progress and checkpoints.
 
 To remove all generated files and free up storage space:
@@ -266,7 +266,8 @@ You can now upload a chest X-ray image and ask questions directly in the chat in
 To stop and remove the containers and network, run the following commands. This will not
 delete your downloaded model weights.
 
-> **Warning:** This will stop all running containers and remove the network.
+> [!WARNING]
+> This will stop all running containers and remove the network.
 
 ```bash
 ## Stop containers
@@ -37,7 +37,8 @@ The setup includes:
 - No other processes running on the DGX Spark GPU
 - Enough disk space for model downloads
 
-> **Note**: This demo uses ~120 out of the 128GB of DGX Spark's memory by default.
+> [!NOTE]
+> This demo uses ~120 out of the 128GB of DGX Spark's memory by default.
 > Please ensure that no other workloads are running on your Spark using `nvidia-smi`, or switch to a smaller supervisor model like gpt-oss-20B.
@@ -104,7 +105,8 @@ watch 'docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"'
 
 Open your browser and go to: http://localhost:3000
 
-> **Note**: If you are running this on a remote GPU via an SSH connection, in a new terminal window, you need to run the following command to be able to access the UI at localhost:3000 and for the UI to be able to communicate to the backend at localhost:8000.
+> [!NOTE]
+> If you are running this on a remote GPU via an SSH connection, in a new terminal window, you need to run the following command to be able to access the UI at localhost:3000 and for the UI to be able to communicate to the backend at localhost:8000.
 
 > ```ssh -L 3000:localhost:3000 -L 8000:localhost:8000 username@IP-address```
@@ -128,7 +128,8 @@ python3 demo_txt2img_flux.py "a beautiful photograph of Mt. Fuji during cherry b
 
 Test the faster Flux.1 Schnell variant with different precision formats.
 
-> **Warning**: FP16 Flux.1 Schnell requires >48GB VRAM for native export
+> [!WARNING]
+> FP16 Flux.1 Schnell requires >48GB VRAM for native export
 
 **Substep A. FP16 precision (high VRAM requirement)**
@@ -190,7 +191,8 @@ python3 -c "import tensorrt as trt; print(f'TensorRT version: {trt.__version__}'
 
 Remove downloaded models and exit container environment to free disk space.
 
-> **Warning**: This will delete all cached models and generated images
+> [!WARNING]
+> This will delete all cached models and generated images
 
 ```bash
 ## Exit container
@@ -280,7 +280,8 @@ print('✅ Setup complete')
 
 Remove the installation and restore the original environment if needed. These commands safely remove all installed components.
 
-> **Warning:** This will delete all virtual environments and downloaded models. Ensure you have backed up any important training checkpoints.
+> [!WARNING]
+> This will delete all virtual environments and downloaded models. Ensure you have backed up any important training checkpoints.
 
 ```bash
 ## Remove virtual environment
@@ -152,7 +152,8 @@ Expected output should be a JSON response containing a completion field with gen
 
 Remove the running container and optionally clean up cached model files.
 
-> **Warning:** Removing cached models will require re-downloading on next run.
+> [!WARNING]
+> Removing cached models will require re-downloading on next run.
 
 ```bash
 docker stop $CONTAINER_NAME
@@ -226,7 +226,8 @@ curl -X POST http://localhost:8000/v1/chat/completions \
 
 To clean up the environment and remove generated files:
 
-> **Warning:** This will permanently delete all quantized model files and cached data.
+> [!WARNING]
+> This will permanently delete all quantized model files and cached data.
 
 ```bash
 ## Remove output directory and all quantized models
@@ -370,8 +370,8 @@ deactivate
 rm -rf openfold_env/
 ```
 
-> **Warning:** The following will delete downloaded databases (>3TB). Only run if you need to
-> free disk space and are willing to re-download.
+> [!WARNING]
+> The following will delete downloaded databases (>3TB). Only run if you need to free disk space and are willing to re-download.
 
 ```bash
 ## Remove all databases (requires re-download)
@@ -119,7 +119,8 @@ python Llama3_3B_full_finetuning.py
 |---------|--------|-----|
 | Cannot access gated repo for URL | Certain HuggingFace models have restricted access | Regenerate your [HuggingFace token](https://huggingface.co/docs/hub/en/security-tokens); and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) on your web browser |
 
-> **Note:** DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
+> [!NOTE]
+> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
 > With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
 > the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
 ```bash
@@ -151,7 +151,8 @@ Upload a custom dataset, adjust the Router prompt, and submit custom queries to
 
 This step explains how to remove the project if needed and what changes were made to your system.
 
-> **Warning:** This will permanently delete the project and all associated data.
+> [!WARNING]
+> This will permanently delete the project and all associated data.
 
 To remove the project completely:
@@ -203,7 +203,8 @@ Common issues and their resolutions:
 
 Stop and remove containers to clean up resources. This step returns your system to its
 original state.
 
-> **Warning:** This will stop all SGLang containers and remove temporary data.
+> [!WARNING]
+> This will stop all SGLang containers and remove temporary data.
 
 ```bash
 ## Stop all SGLang containers
@@ -108,7 +108,8 @@ The following models are supported with TensorRT-LLM on Spark. All listed models
 | **Llama-4-Scout-17B-16E-Instruct** | NVFP4 | ✅ | `nvidia/Llama-4-Scout-17B-16E-Instruct-FP4` |
 | **Qwen3-235B-A22B (two Sparks only)** | NVFP4 | ✅ | `nvidia/Qwen3-235B-A22B-FP4` |
 
-**Note:** You can use the NVFP4 Quantization documentation to generate your own NVFP4-quantized checkpoints for your favorite models. This enables you to take advantage of the performance and memory benefits of NVFP4 quantization even for models not already published by NVIDIA.
+> [!NOTE]
+> You can use the NVFP4 Quantization documentation to generate your own NVFP4-quantized checkpoints for your favorite models. This enables you to take advantage of the performance and memory benefits of NVFP4 quantization even for models not already published by NVIDIA.
 
 Reminder: not all model architectures are supported for NVFP4 quantization.
@@ -396,7 +397,8 @@ curl -s http://localhost:8355/v1/chat/completions \
 
 Remove downloaded models and containers to free up space when testing is complete.
 
-> **Warning:** This will delete all cached models and may require re-downloading for future runs.
+> [!WARNING]
+> This will delete all cached models and may require re-downloading for future runs.
 
 ```bash
 ## Remove Hugging Face cache
@@ -519,7 +521,8 @@ On your primary node, deploy the TRT-LLM multi-node stack by downloading the [**
 ```bash
 docker stack deploy -c $HOME/docker-compose.yml trtllm-multinode
 ```
-**Note:** Ensure you download both files into the same directory from which you are running the command.
+> [!NOTE]
+> Ensure you download both files into the same directory from which you are running the command.
 
 You can verify the status of your worker nodes using the following command:
 ```bash
@@ -534,7 +537,8 @@ oe9k5o6w41le trtllm-multinode_trtllm.1 nvcr.io/nvidia/tensorrt-llm/relea
 phszqzk97p83 trtllm-multinode_trtllm.2 nvcr.io/nvidia/tensorrt-llm/release:1.0.0rc3 spark-1b3b Running Running 2 minutes ago
 ```
 
-**Note:** If your "Current state" is not "Running", see troubleshooting section for more information.
+> [!NOTE]
+> If your "Current state" is not "Running", see the troubleshooting section for more information.
 
 ### Step 7. Create hosts file
@@ -603,7 +607,8 @@ docker exec \
 
 This will start the TensorRT-LLM server on port 8355. You can then make inference requests to `http://localhost:8355` using the OpenAI-compatible API format.
 
-**Note:** You might see a warning such as `UCX WARN network device 'enp1s0f0np0' is not available, please use one or more of`. You can ignore this warning if your inference is successful, as it's related to only one of your two CX-7 ports being used, and the other being left unused.
+> [!NOTE]
+> You might see a warning such as `UCX WARN network device 'enp1s0f0np0' is not available, please use one or more of`. You can ignore this warning if your inference is successful, as it's related to only one of your two CX-7 ports being used, and the other being left unused.
 
 **Expected output:** Server startup logs and ready message.
@@ -630,7 +635,8 @@ Stop and remove containers by using the following command on the leader node:
 docker stack rm trtllm-multinode
 ```
 
-> **Warning:** This removes all inference data and performance reports. Copy `/opt/*perf-report.json` files before cleanup if needed.
+> [!WARNING]
+> This removes all inference data and performance reports. Copy `/opt/*perf-report.json` files before cleanup if needed.
 
 Remove downloaded models to free disk space:
@@ -659,7 +665,8 @@ After setting up TensorRT-LLM inference server in either single-node or multi-no
 Run the following command on the DGX Spark node where you have the TensorRT-LLM inference server running.
 For multi-node setup, this would be the primary node.
 
-**Note:** If you used a different port for your OpenAI-compatible API server, adjust the `OPENAI_API_BASE_URL="http://localhost:8355/v1"` to match the IP and port of your TensorRT-LLM inference server.
+> [!NOTE]
+> If you used a different port for your OpenAI-compatible API server, adjust the `OPENAI_API_BASE_URL="http://localhost:8355/v1"` to match the IP and port of your TensorRT-LLM inference server.
 
 ```bash
 docker run \
@@ -696,10 +703,13 @@ You should see the Open WebUI interface at http://localhost:8080 where you can:
 
 You can select your model(s) from the dropdown menu on the top left corner. That's all you need to do to start using Open WebUI with your deployed models.
 
-**Note:** If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
+> [!NOTE]
+> If accessing from a remote machine, replace localhost with your DGX Spark's IP address.
 
 ### Step 3. Cleanup and rollback
-**Warning:** This removes all chat data and may require re-uploading for future runs.
+> [!WARNING]
+> This removes all chat data and may require re-uploading for future runs.
 
 Remove the container by using the following command:
 ```bash
 docker stop open-webui
@@ -89,7 +89,8 @@ docker exec ollama-compose ollama pull <model-name>
 
 Browse available models at [https://ollama.com/search](https://ollama.com/search)
 
-> **Note**: The unified memory architecture enables running larger models like 70B parameters, which produce significantly more accurate knowledge triples.
+> [!NOTE]
+> The unified memory architecture enables running larger models like 70B parameters, which produce significantly more accurate knowledge triples.
 
 ## Step 4. Access the web interface
@@ -244,7 +244,8 @@ Expected output includes a generated haiku response.
 
 ## Step 10. (Optional) Deploy Llama 3.1 405B model
 
-> **Warning:** 405B model has insufficient memory headroom for production use.
+> [!WARNING]
+> The 405B model has insufficient memory headroom for production use.
 
 Download the quantized 405B model for testing purposes only.
@@ -300,7 +301,8 @@ docker exec node nvidia-smi --query-gpu=memory.used,memory.total --format=csv
 
 Remove temporary configurations and containers when testing is complete.
 
-> **Warning:** This will stop all inference services and remove cluster configuration.
+> [!WARNING]
+> This will stop all inference services and remove cluster configuration.
 
 ```bash
 ## Stop containers on both nodes
@@ -104,7 +104,8 @@ sh launch.sh
 ## Enter the mounted directory within the container
 cd /vlm_finetuning
 ```
-**Note**: The same Docker container and launch commands work for both image and video VLM recipes. The container features all necessary dependencies, including FFmpeg, Decord, and optimized libraries for both workflows.
+> [!NOTE]
+> The same Docker container and launch commands work for both image and video VLM recipes. The container features all necessary dependencies, including FFmpeg, Decord, and optimized libraries for both workflows.
 
 ## Step 5. [Option A] For image VLM fine-tuning (Wildfire Detection)
@@ -129,7 +130,8 @@ cd ui_image/data
 
 For this fine-tuning playbook, we will use the [Wildfire Prediction Dataset](https://www.kaggle.com/datasets/abdelghaniaaba/wildfire-prediction-dataset) from Kaggle. Visit the Kaggle dataset page and click the download button. Select the `cURL` option in the `Download Via` dropdown and copy the curl command.
 
-> **Note**: You will need to be logged into Kaggle and may need to accept the dataset terms before the download link works.
+> [!NOTE]
+> You will need to be logged into Kaggle and may need to accept the dataset terms before the download link works.
 
 Run the following commands in your container:
@@ -235,7 +237,8 @@ dataset/
 
 #### 6.2. Model download
 
-> **Note**: These instructions assume you are already inside the Docker container. For container setup, refer to the section above to `Build the Docker container`.
+> [!NOTE]
+> These instructions assume you are already inside the Docker container. For container setup, refer to the `Build the Docker container` section above.
 
 ```bash
 hf download OpenGVLab/InternVL3-8B
@@ -262,7 +265,8 @@ Scroll down, enter your prompt in the chat box and hit `Generate`. Your prompt w
 
 If you are proceeding to train a fine-tuned model, ensure that the streamlit demo UI is brought down before proceeding to train. You can bring it down by interrupting the terminal with a `Ctrl+C` keystroke.
 
-> **Note**: To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the ComfyUI server.
+> [!NOTE]
+> To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the server.
 ```bash
 sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
 ```
@@ -294,7 +298,8 @@ You can monitor and evaluate the training progress and metrics, as they will be
 
 After training, ensure that you shut down the Jupyter kernel in the notebook and kill the Jupyter server in the terminal with a `Ctrl+C` keystroke.
 
-> **Note**: To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the ComfyUI server.
+> [!NOTE]
+> To clear out any extra occupied memory from your system, execute the following command outside the container after interrupting the Jupyter server.
 ```bash
 sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
 ```
@@ -152,7 +152,8 @@ Within VS Code:
 
 ## Step 8. Uninstalling VS Code
 
-> **Warning:** Uninstalling VS Code will remove all user settings and extensions.
+> [!WARNING]
+> Uninstalling VS Code will remove all user settings and extensions.
 
 To remove VS Code if needed:
 ```bash
@@ -128,7 +128,8 @@ Create a Docker network that will be shared between VSS services and CV pipeline
 docker network create vss-shared-network
 ```
 
-> **Warning:** If the network already exists, you may see an error. Remove it first with `docker network rm vss-shared-network` if needed.
+> [!WARNING]
+> If the network already exists, you may see an error. Remove it first with `docker network rm vss-shared-network` if needed.
 
 ## Step 6. Authenticate with NVIDIA Container Registry
@@ -369,7 +370,8 @@ Follow the steps [here](https://docs.nvidia.com/vss/latest/content/ui_app.html)
 
 To completely remove the VSS deployment and free up system resources:
 
-> **Warning:** This will destroy all processed video data and analysis results.
+> [!WARNING]
+> This will destroy all processed video data and analysis results.
 
 ```bash
 ## For Event Reviewer deployment