mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-22 01:53:53 +00:00

chore: Regenerate all playbooks

This commit is contained in: parent 3ed5b3b073, commit a6f94052b1
@ -48,7 +48,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [Install and Use vLLM for Inference](nvidia/vllm/)
- [Vision-Language Model Fine-tuning](nvidia/vlm-finetuning/)
- [VS Code](nvidia/vscode/)
- [Video Search and Summarization](nvidia/vss/)

## Resources

@ -52,7 +52,7 @@ All necessary files for the playbook can be found [here on GitHub](https://githu

## Step 1. Verify system requirements

Check that your NVIDIA Spark device meets the prerequisites for [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel) installation. This step runs on the host system to confirm CUDA toolkit availability and Python version compatibility.

```bash
## Verify CUDA installation
@ -169,6 +169,19 @@ uv run --frozen --no-sync python -c "import nemo_automodel; print('✅ NeMo Auto
## Check available examples
ls -la examples/

## Example output:
$ ls -la examples/
total 36
drwxr-xr-x  9 akoumparouli domain-users 4096 Oct 16 14:52 .
drwxr-xr-x 16 akoumparouli domain-users 4096 Oct 16 14:52 ..
drwxr-xr-x  3 akoumparouli domain-users 4096 Oct 16 14:52 benchmark
drwxr-xr-x  3 akoumparouli domain-users 4096 Oct 16 14:52 diffusion
drwxr-xr-x 20 akoumparouli domain-users 4096 Oct 16 14:52 llm_finetune
drwxr-xr-x  3 akoumparouli domain-users 4096 Oct 14 09:27 llm_kd
drwxr-xr-x  2 akoumparouli domain-users 4096 Oct 16 14:52 llm_pretrain
drwxr-xr-x  6 akoumparouli domain-users 4096 Oct 14 09:27 vlm_finetune
drwxr-xr-x  2 akoumparouli domain-users 4096 Oct 14 09:27 vlm_generate
```

## Step 8. Explore available examples

@ -193,36 +206,37 @@ First, export your HF_TOKEN so that gated models can be downloaded.

export HF_TOKEN=<your_huggingface_token>
```

> [!NOTE]
> Replace `<your_huggingface_token>` with your Hugging Face access token to access gated models (e.g., Llama).
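Before launching any gated-model download, it can help to confirm the token is actually visible to the process. A minimal sketch is below; it only checks that the variable is set, not that the token is valid on the Hub:

```python
import os

# Minimal sanity check: confirms HF_TOKEN is exported and non-empty.
# It does NOT validate the token against the Hugging Face Hub.
token = os.environ.get("HF_TOKEN", "")
status = "set" if token else "missing"
print(f"HF_TOKEN is {status}")
```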
**Full fine-tuning example:**

Once inside the `Automodel` directory you cloned from GitHub, run:

```bash
uv run --frozen --no-sync \
examples/llm_finetune/finetune.py \
-c examples/llm_finetune/llama3_2/llama3_2_1b_squad.yaml \
--step_scheduler.local_batch_size 1 \
--loss_fn._target_ nemo_automodel.components.loss.te_parallel_ce.TEParallelCrossEntropy \
--model.pretrained_model_name_or_path Qwen/Qwen3-8B
```
These overrides ensure the Qwen3-8B SFT run behaves as expected:
- `--model.pretrained_model_name_or_path`: selects the Qwen/Qwen3-8B model to fine-tune (weights fetched via your Hugging Face token).
- `--loss_fn._target_`: uses the TransformerEngine-parallel cross-entropy loss variant compatible with tensor-parallel training for large LLMs.
- `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit in memory; the overall effective batch size is still driven by gradient accumulation and data/tensor-parallel settings from the recipe.
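As a concrete illustration of that last point, the effective (global) batch size is the product of the micro-batch size, gradient-accumulation steps, and data-parallel world size. The accumulation and parallelism values below are illustrative assumptions, not the recipe's actual defaults:

```python
# Effective batch size = local batch × grad-accum steps × data-parallel ranks.
# grad_accum_steps and data_parallel_size are assumed values for illustration.
local_batch_size = 1     # --step_scheduler.local_batch_size
grad_accum_steps = 8     # hypothetical gradient-accumulation setting
data_parallel_size = 1   # a single DGX Spark GPU
effective_batch_size = local_batch_size * grad_accum_steps * data_parallel_size
print(effective_batch_size)  # 8 under these assumptions
```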
> Replace `<your_huggingface_token>` with your personal Hugging Face access token. A valid token is required to download any gated model.
>
> - Generate a token: [Hugging Face tokens](https://huggingface.co/settings/tokens); a guide is available [here](https://huggingface.co/docs/hub/en/security-tokens).
> - Request and receive access on each model's page (and accept the license/terms) before attempting downloads.
>   - Llama-3.1-8B: [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)
>   - Qwen3-8B: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
>   - Mixtral-8x7B: [mistralai/Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B)
>
> The same steps apply to any other gated model: visit its model card on Hugging Face, request access, accept the license, and wait for approval.

**LoRA fine-tuning example:**

Run a basic fine-tuning example to validate the complete setup. This demonstrates parameter-efficient fine-tuning using a small model suitable for testing.
For the examples below, we use YAML for configuration; parameter overrides are passed as command-line arguments.

```bash
## Run basic LLM fine-tuning example
uv run --frozen --no-sync \
examples/llm_finetune/finetune.py \
-c examples/llm_finetune/llama3_2/llama3_2_1b_squad_peft.yaml \
--model.pretrained_model_name_or_path meta-llama/Llama-3.1-8B \
--packed_sequence.packed_sequence_size 1024 \
--step_scheduler.max_steps 100
```

These overrides ensure the Llama-3.1-8B LoRA run behaves as expected:
- `--model.pretrained_model_name_or_path`: selects the Llama-3.1-8B model to fine-tune from the Hugging Face model hub (weights fetched via your Hugging Face token).
- `--packed_sequence.packed_sequence_size`: sets the packed sequence size to 1024 to enable packed-sequence training.
- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 100 for demonstration purposes; adjust this based on your needs.
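To make the packed-sequence idea concrete, here is a hedged sketch of what packing does conceptually: variable-length samples are greedily grouped into fixed-size buffers (here 1024 tokens) so each step processes a dense batch. The function and names are illustrative; NeMo AutoModel's actual implementation differs:

```python
# Conceptual sketch of packed-sequence batching (illustrative, not NeMo's code).
def pack(lengths, packed_size=1024):
    """Greedily group sample lengths into packs whose totals stay <= packed_size."""
    packs, current, used = [], [], 0
    for n in lengths:
        if used + n > packed_size:
            packs.append(current)
            current, used = [], 0
        current.append(n)
        used += n
    if current:
        packs.append(current)
    return packs

packs = pack([400, 700, 300, 600, 200], packed_size=1024)
print(packs)  # [[400], [700, 300], [600, 200]]
```

Packing reduces padding waste: three dense packs replace five mostly padded rows.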

**QLoRA fine-tuning example:**

We can use QLoRA to fine-tune large models in a memory-efficient manner.
@ -233,50 +247,61 @@ examples/llm_finetune/finetune.py \
-c examples/llm_finetune/llama3_1/llama3_1_8b_squad_qlora.yaml \
--model.pretrained_model_name_or_path meta-llama/Meta-Llama-3-70B \
--loss_fn._target_ nemo_automodel.components.loss.te_parallel_ce.TEParallelCrossEntropy \
--step_scheduler.local_batch_size 1 \
--packed_sequence.packed_sequence_size 1024 \
--step_scheduler.max_steps 100
```

These overrides ensure the 70B QLoRA run behaves as expected:
- `--model.pretrained_model_name_or_path`: selects the 70B base model to fine-tune (weights fetched via your Hugging Face token).
- `--loss_fn._target_`: uses the TransformerEngine-parallel cross-entropy loss variant compatible with tensor-parallel training for large LLMs.
- `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit 70B in memory; the overall effective batch size is still driven by gradient accumulation and data/tensor-parallel settings from the recipe.
- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 100 for demonstration purposes; adjust this based on your needs.
- `--packed_sequence.packed_sequence_size`: sets the packed sequence size to 1024 to enable packed-sequence training.
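A rough weights-only estimate shows why 4-bit quantization matters at this scale. The arithmetic below ignores activations, optimizer state, and the LoRA parameters themselves, so treat it as an assumption-laden sketch rather than a real memory budget:

```python
# Weights-only memory estimate for a 70B-parameter model (rough sketch).
params = 70e9
bf16_gib = params * 2 / 1024**3   # 16-bit weights: 2 bytes per parameter
nf4_gib = params * 0.5 / 1024**3  # 4-bit quantized weights: 0.5 bytes per parameter
print(f"bf16 ~{bf16_gib:.0f} GiB, 4-bit ~{nf4_gib:.0f} GiB")
```

The 4x reduction in weight storage is what brings a 70B base model within reach of a single device.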

**Full fine-tuning example:**

Once inside the `Automodel` directory you cloned from GitHub, run:

```bash
uv run --frozen --no-sync \
examples/llm_finetune/finetune.py \
-c examples/llm_finetune/qwen/qwen3_8b_squad_spark.yaml \
--model.pretrained_model_name_or_path Qwen/Qwen3-8B \
--step_scheduler.local_batch_size 1 \
--step_scheduler.max_steps 100 \
--packed_sequence.packed_sequence_size 1024
```
These overrides ensure the Qwen3-8B SFT run behaves as expected:
- `--model.pretrained_model_name_or_path`: selects the Qwen/Qwen3-8B model to fine-tune from the Hugging Face model hub (weights fetched via your Hugging Face token). Adjust this if you want to fine-tune a different model.
- `--step_scheduler.max_steps`: sets the maximum number of training steps. We set it to 100 for demonstration purposes; adjust this based on your needs.
- `--step_scheduler.local_batch_size`: sets the per-GPU micro-batch size to 1 to fit in memory; the overall effective batch size is still driven by gradient accumulation and data/tensor-parallel settings from the recipe.

## Step 10. Validate successful training completion

Validate the fine-tuned model by inspecting artifacts contained in the checkpoint directory.

```bash
## Test complete pipeline
uv run python -c "
import nemo_automodel
import torch
print('✅ NeMo AutoModel version:', nemo_automodel.__version__)
print('✅ CUDA available:', torch.cuda.is_available())
print('✅ GPU count:', torch.cuda.device_count())
print('✅ Setup complete')
"

## Inspect the logs and checkpoint output.
## LATEST is a symlink pointing to the latest checkpoint saved during training.
## Below is an example of the expected output (username and domain-users are placeholders).
ls -lah checkpoints/LATEST/

## $ ls -lah checkpoints/LATEST/
## total 32K
## drwxr-xr-x 6 akoumparouli domain-users 4.0K Oct 16 22:33 .
## drwxr-xr-x 4 akoumparouli domain-users 4.0K Oct 16 22:33 ..
## -rw-r--r-- 1 akoumparouli domain-users 1.6K Oct 16 22:33 config.yaml
## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 dataloader
## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 model
## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 optim
## drwxr-xr-x 2 akoumparouli domain-users 4.0K Oct 16 22:33 rng
## -rw-r--r-- 1 akoumparouli domain-users 1.3K Oct 16 22:33 step_scheduler.pt
```
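The same inspection can be automated. Below is a hedged sketch that checks a checkpoint directory for the entries shown in the listing above; the synthetic demo directory is purely illustrative, and the expected names are taken from the example output, not from an official schema:

```python
import os
import tempfile
from pathlib import Path

def check_checkpoint(ckpt_dir):
    """Return the expected checkpoint entries that are missing from ckpt_dir."""
    expected = ["config.yaml", "dataloader", "model", "optim", "rng", "step_scheduler.pt"]
    present = {p.name for p in Path(ckpt_dir).iterdir()}
    return [name for name in expected if name not in present]

# Demo against a synthetic directory layout mimicking checkpoints/LATEST/:
tmp = tempfile.mkdtemp()
for d in ["dataloader", "model", "optim", "rng"]:
    os.mkdir(os.path.join(tmp, d))
for f in ["config.yaml", "step_scheduler.pt"]:
    open(os.path.join(tmp, f), "w").close()
missing = check_checkpoint(tmp)
print(missing)  # []
```

On a real run you would point `check_checkpoint` at `checkpoints/LATEST/` instead of the temp directory.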

## Step 11. Cleanup and rollback (Optional)

Remove the installation and restore the original environment if needed. These commands safely remove all installed components.

@ -297,8 +322,42 @@ pip3 uninstall uv
## Clear Python cache
rm -rf ~/.cache/pip
```

## Step 12. Optional: Publish your fine-tuned model checkpoint on Hugging Face Hub

Publish your fine-tuned model checkpoint on Hugging Face Hub.
> [!NOTE]
> This is an optional step and is not required for using the fine-tuned model.
> It is useful if you want to share your fine-tuned model with others or reuse the checkpoint in other projects by cloning the published repository.
> Publishing requires the Hugging Face CLI, which is included in the `huggingface_hub` package (`pip install -U huggingface_hub`).

> [!TIP]
> You can use the `hf` command to upload the fine-tuned model checkpoint to Hugging Face Hub.
> For more information, refer to the [Hugging Face CLI documentation](https://huggingface.co/docs/huggingface_hub/en/guides/cli).

```bash
## Publish the fine-tuned model checkpoint to Hugging Face Hub.
## It will be published under the namespace <your_huggingface_username>/my-cool-model; adjust the name as needed.
hf upload my-cool-model checkpoints/LATEST/model
```

> [!TIP]
> The above command can fail if the HF_TOKEN you used does not have write permissions to the Hugging Face Hub.
> Sample error message:
> ```bash
> akoumparouli@1604ab7-lcedt:/mnt/4tb/auto/Automodel8$ hf upload my-cool-model checkpoints/LATEST/model
> Traceback (most recent call last):
>   File "/home/akoumparouli/.local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
>     response.raise_for_status()
>   File "/home/akoumparouli/.local/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
>     raise HTTPError(http_error_msg, response=self)
> requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://huggingface.co/api/repos/create
> ```
> To fix this, create an access token with *write* permissions; see the Hugging Face guide [here](https://huggingface.co/docs/hub/en/security-tokens) for instructions.


## Step 13. Next steps

Begin using NeMo AutoModel for your specific fine-tuning tasks. Start with the provided recipes and customize them based on your model requirements and dataset.

@ -310,7 +369,7 @@ cp recipes/llm_finetune/finetune.py my_custom_training.py
## Then run: uv run my_custom_training.py
```

Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Automodel) for more recipes, documentation, and community examples. Consider setting up custom datasets, experimenting with different model architectures, and scaling to multi-node distributed training for larger models.

## Troubleshooting

@ -324,8 +383,8 @@ Explore the [NeMo AutoModel GitHub repository](https://github.com/NVIDIA-NeMo/Au

| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your [Hugging Face token](https://huggingface.co/docs/hub/en/security-tokens) and request access to the [gated model](https://huggingface.co/docs/hub/en/models-gated#customize-requested-information) in your web browser |

> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

@ -167,10 +167,7 @@ Add additional model entries for any other Ollama models you wish to host remote

## Common Issues

**1. Ollama not starting**
- Verify Docker and GPU drivers are installed correctly.
  Run `nvidia-smi` in the terminal. If the command fails, check the DGX Dashboard for updates to your DGX Spark.
  If there are no updates, or updates do not correct the issue, create a thread on the DGX Spark/GB10 user forum:
  https://forums.developer.nvidia.com/c/accelerated-computing/dgx-spark-gb10/dgx-spark-gb10/
- Run `ollama serve` on the DGX Spark to view Ollama logs.

**2. Continue can't connect over the network**

@ -340,7 +340,7 @@ http://192.168.100.10:8265

| Container registry authentication fails | Invalid or expired GitLab token | Generate a new auth token |
| SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source |

## Common Issues for running on two Sparks

| Symptom | Cause | Fix |
|---------|--------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |

@ -1,4 +1,4 @@

# Video Search and Summarization

> Run the VSS Blueprint on your Spark

@ -30,8 +30,8 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel

## Prerequisites

- NVIDIA Spark device with ARM64 architecture and Blackwell GPU
- FastOS 1.81.38 or compatible ARM64 system
- Driver version 580.82.09 or higher installed: `nvidia-smi | grep "Driver Version"`
- CUDA version 13.0 installed: `nvcc --version`
- Docker installed and running: `docker --version && docker compose version`
- Access to NVIDIA Container Registry with [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys)
@ -278,10 +278,6 @@ Open these URLs in your browser:
|
||||
|
||||
In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://build.nvidia.com/). Alternatively, you can configure your own hosted endpoints by following the instructions in the [VSS remote deployment guide](https://docs.nvidia.com/vss/latest/content/installation-remote-docker-compose.html).
|
||||
|
||||
> [!NOTE]
|
||||
> Fully local deployment using smaller LLM (Llama 3.1 8B) is also possible.
|
||||
> To set up a fully local VSS deployment, follow the [instructions in the VSS documentation](https://docs.nvidia.com/vss/latest/content/vss_dep_docker_compose_arm.html#local-deployment-single-gpu-dgx-spark).
|
||||
|
||||
**9.1 Get NVIDIA API Key**
|
||||
|
||||
- Log in to https://build.nvidia.com/explore/discover.
|
||||
|
||||