Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
Synced 2026-04-23 02:23:53 +00:00

chore: Regenerate all playbooks

parent 064dc0c758
commit e9a3f2a759
@@ -40,9 +40,9 @@ a functional distributed computing environment.
 
 ## Ancillary files
 
-All required files for this playbook can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/connect-two-sparks/)
+All required files for this playbook can be found [here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/connect-two-sparks/)
 
-- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/connect-two-sparks/assets/discover-sparks) script for automatic node discovery and SSH key distribution
+- [**discover-sparks.sh**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/connect-two-sparks/assets/discover-sparks) script for automatic node discovery and SSH key distribution
 
 ## Time & risk
 
@@ -171,7 +171,7 @@ ip addr show enp1s0f1np1
 
 #### Option 1: Automatically configure SSH
 
-Run the DGX Spark [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/connect-two-sparks/assets/discover-sparks) script from one of the nodes to automatically discover and configure SSH:
+Run the DGX Spark [**discover-sparks.sh**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/connect-two-sparks/assets/discover-sparks) script from one of the nodes to automatically discover and configure SSH:
 
 ```bash
 bash ./discover-sparks
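The discovery step in this hunk assumes the high-speed interface (`enp1s0f1np1` in the hunk context) already has an address. A quick sanity check is to parse `ip addr` output for an `inet` line before running the script. A minimal sketch, where the sample output line is a hypothetical stand-in for what a real node would print:

```shell
#!/usr/bin/env bash
# Hypothetical sample line; on a real node you would capture it with:
#   sample="$(ip addr show enp1s0f1np1 | grep 'inet ')"
sample="    inet 169.254.1.10/16 scope link enp1s0f1np1"

# Extract the address/prefix field (second whitespace-separated token).
addr="$(echo "$sample" | awk '/inet /{print $2}')"
echo "$addr"
```

If the variable comes back empty, the interface has no IPv4 address yet and discovery would have nothing to work with.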
@@ -55,7 +55,7 @@ You will accelerate popular machine learning algorithms and data analytics opera
 
 ## Step 4. Cloning the playbook repository
 
 - Clone the GitHub repository and go to the assets folder located in the cuda-x-data-science folder
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 ```
 
 - Place the **kaggle.json** created in Step 1 in the assets folder
 
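One practical consequence of the repository rename in this commit: with no target argument, `git clone` checks out into a directory named after the URL's final path component, so follow-up `cd` commands now target `dgx-spark-playbooks` rather than the old assets-repo name. A small sketch of that derivation (the `nvidia/cuda-x-data-science/assets` path is an assumption inferred from the step text above):

```shell
#!/usr/bin/env bash
repo_url="https://github.com/NVIDIA/dgx-spark-playbooks"

# git clone, given no target directory, uses the URL basename.
repo_dir="${repo_url##*/}"
echo "$repo_dir"

# On a real machine the full step would be (not executed here):
#   git clone "$repo_url"
#   cd "$repo_dir/nvidia/cuda-x-data-science/assets"
```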
@@ -32,7 +32,7 @@ You will learn how to access and use the DGX Dashboard on your DGX Spark device.
 
 ## Ancillary files
 
-- Python code snippet for SDXL found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/dgx-dashboard/assets/jupyter-cell.py)
+- Python code snippet for SDXL found [here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/dgx-dashboard/assets/jupyter-cell.py)
 
 ## Time & risk
 
@@ -72,7 +72,7 @@ newgrp docker
 
 In a terminal, clone the repository and navigate to the flux-finetuning directory.
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 ```
 
 ## Step 3. Model download
 
@@ -50,12 +50,12 @@ GPU acceleration and performance optimization capabilities.
 
 ## Ancillary files
 
-All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main)
+All required assets can be found [here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main)
 
-- [**JAX introduction notebook**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/jax-intro.py) — covers JAX programming model differences from NumPy and performance evaluation
-- [**NumPy SOM implementation**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/numpy-som.py) — reference implementation of self-organized map training algorithm in NumPy
-- [**JAX SOM implementations**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/som-jax.py) — multiple iteratively refined implementations of SOM algorithm in JAX
-- [**Environment configuration**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/Dockerfile) — package dependencies and container setup specifications
+- [**JAX introduction notebook**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/jax/assets/jax-intro.py) — covers JAX programming model differences from NumPy and performance evaluation
+- [**NumPy SOM implementation**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/jax/assets/numpy-som.py) — reference implementation of self-organized map training algorithm in NumPy
+- [**JAX SOM implementations**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/jax/assets/som-jax.py) — multiple iteratively refined implementations of SOM algorithm in JAX
+- [**Environment configuration**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/jax/assets/Dockerfile) — package dependencies and container setup specifications
 
 ## Time & risk
 
@@ -93,7 +93,7 @@ newgrp docker
 
 ## Step 2. Clone the playbook repository
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 ```
 
 ## Step 3. Build the Docker image
 
@@ -72,7 +72,7 @@ newgrp docker
 
 ## Step 2. Clone the repository
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 cd multi-agent-chatbot/assets
 ```
 
@@ -121,7 +121,7 @@ space and network bandwidth.
 
 ```bash
 ## Clone OpenFold repository
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 cd nvidia/protein-folding/assets
 pip install -e .
 ```
 
@@ -34,7 +34,7 @@ Recipes are specifically for DIGITS SPARK. Please make sure that OS and drivers
 
 ## Ancillary files
 
-All files required for fine-tuning are included in the folder in [the GitHub repository here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/pytorch-fine-tune).
+All files required for fine-tuning are included in the folder in [the GitHub repository here](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/pytorch-fine-tune).
 
 ## Time & risk
 
@@ -92,7 +92,7 @@ huggingface-cli login
 
 ## Step 6: Clone the git repo with fine-tuning recipes
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 cd nvidia/pytorch-fine-tune/assets
 ```
 
@@ -44,7 +44,7 @@ vision-language tasks using models like DeepSeek-V2-Lite.
 
 ## Ancillary files
 
-- An offline inference python script [found here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/sglang/assets/offline-inference.py)
+- An offline inference python script [found here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/sglang/assets/offline-inference.py)
 
 ### Time & risk
 
@@ -168,7 +168,7 @@ print(f"Response: {response.json()['text']}")
 
 Launch a new container instance for offline inference to demonstrate local model usage without
 HTTP server. This runs entirely within the container for batch processing scenarios.
 
-TO DO: NEEDS TO HAVE SCRIPT FROM ASSETS PROPERLY INCORPORATED. [See here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/sglang/assets)
+TO DO: NEEDS TO HAVE SCRIPT FROM ASSETS PROPERLY INCORPORATED. [See here](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/sglang/assets)
 
 ## Step 8. Validate installation
 
@@ -76,11 +76,11 @@ inference through kernel-level optimizations, efficient memory layouts, and adva
 
 ## Ancillary files
 
-All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main)
+All required assets can be found [here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main)
 
-- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/discover-sparks.sh) — script to automatically discover and configure SSH between Spark nodes
-- [**trtllm-mn-entrypoint.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh) — container entrypoint script for multi-node setup
-- [**docker-compose.yml**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/docker-compose.yml) — Docker Compose configuration for multi-node deployment
+- [**discover-sparks.sh**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/trt-llm/assets/discover-sparks.sh) — script to automatically discover and configure SSH between Spark nodes
+- [**trtllm-mn-entrypoint.sh**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh) — container entrypoint script for multi-node setup
+- [**docker-compose.yml**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/trt-llm/assets/docker-compose.yml) — Docker Compose configuration for multi-node deployment
 
 ## Model Support Matrix
 
@@ -511,13 +511,13 @@ Run the command suggested by the docker swarm init on each worker node to join t
 docker swarm join --token <worker-token> <advertise-addr>:<port>
 ```
 
-On both nodes, download the [**trtllm-mn-entrypoint.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh) script into your home directory and run the following command to make it executable:
+On both nodes, download the [**trtllm-mn-entrypoint.sh**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh) script into your home directory and run the following command to make it executable:
 
 ```bash
 chmod +x $HOME/trtllm-mn-entrypoint.sh
 ```
 
-On your primary node, deploy the TRT-LLM multi-node stack by downloading the [**docker-compose.yml**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/docker-compose.yml) file into your home directory and running the following command:
+On your primary node, deploy the TRT-LLM multi-node stack by downloading the [**docker-compose.yml**](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/trt-llm/assets/docker-compose.yml) file into your home directory and running the following command:
 ```bash
 docker stack deploy -c $HOME/docker-compose.yml trtllm-multinode
 ```
 
@@ -61,7 +61,7 @@ The setup includes:
 
 In a terminal, clone the txt2kg repository and navigate to the project directory.
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 cd nvidia/txt2kg/assets
 ```
 
@@ -44,7 +44,7 @@ parameter-efficient fine-tuning methods like LoRA and QLoRA.
 
 ## Ancillary files
 
-The Python test script can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py)
+The Python test script can be found [here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py)
 
 ## Time & risk
 
@@ -96,10 +96,10 @@ pip install --no-deps bitsandbytes
 
 ## Step 6. Create Python test script
 
-Curl the test script [here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py) into the container.
+Curl the test script [here](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py) into the container.
 
 ```bash
-curl -O https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py
+curl -O https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py
 ```
 
 We will use this test script to validate the installation with a simple fine-tuning task.
 
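One caveat worth flagging on the curl step in this hunk: `curl -O` against a GitHub `/blob/` URL retrieves the HTML viewer page, not the script itself. The raw file is served from `raw.githubusercontent.com` with the `/blob` path segment dropped. A hedged sketch of that URL rewrite:

```shell
#!/usr/bin/env bash
blob_url="https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/unsloth/assets/test_unsloth.py"

# Swap the host, then drop the "/blob" path segment to get the raw-file URL.
raw_url="$(echo "$blob_url" \
  | sed -e 's#github\.com#raw.githubusercontent.com#' -e 's#/blob/#/#')"
echo "$raw_url"

# To actually fetch the script (not executed here):
#   curl -O "$raw_url"
```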
@@ -79,7 +79,7 @@ newgrp docker
 
 In a terminal, clone the repository and navigate to the VLM fine-tuning directory.
 
 ```bash
-git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
+git clone https://github.com/NVIDIA/dgx-spark-playbooks
 ```
 
 ## Step 3. Build the Docker container
 