From d551395a19e8a86e5c9b88bcdd88dcc40d8f0722 Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Mon, 13 Oct 2025 00:46:13 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 nvidia/connect-to-your-spark/README.md |  6 +++---
 nvidia/dgx-dashboard/README.md         |  2 +-
 nvidia/jax/README.md                   |  8 ++++----
 nvidia/protein-folding/README.md       |  2 +-
 nvidia/pytorch-fine-tune/README.md     |  4 ++--
 nvidia/sglang/README.md                |  4 ++--
 nvidia/stack-sparks/README.md          |  6 +++---
 nvidia/trt-llm/README.md               | 10 +++++-----
 nvidia/txt2kg/README.md                |  2 +-
 nvidia/unsloth/README.md               |  6 +++---
 10 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/nvidia/connect-to-your-spark/README.md b/nvidia/connect-to-your-spark/README.md
index db91b27..1affd51 100644
--- a/nvidia/connect-to-your-spark/README.md
+++ b/nvidia/connect-to-your-spark/README.md
@@ -344,6 +344,6 @@ With SSH access configured, you can:

 | Symptom | Cause | Fix |
 |---------|--------|-----|
-| Device name doesn't resolve | mDNS blocked on network | Use IP address instead of hostname.local |
-| Connection refused/timeout | DGX Spark not booted or SSH not ready | Wait for device boot completion; SSH available after updates finish |
-| Port forwarding fails | Service not running or port conflict | Verify remote service is active; try different local port |
+| `ssh: Could not resolve hostname` | mDNS not working | Use IP address instead of .local hostname |
+| `Connection refused` | Device not booted or SSH disabled | Wait for full boot; SSH available after system updates complete |
+| `Port forwarding fails` | Service not running or port conflict | Verify remote service is active; try different local port |
diff --git a/nvidia/dgx-dashboard/README.md b/nvidia/dgx-dashboard/README.md
index 4453e90..7640746 100644
--- a/nvidia/dgx-dashboard/README.md
+++ b/nvidia/dgx-dashboard/README.md
@@ -32,7 +32,7 @@ You will learn how to access and use the DGX Dashboard on your DGX Spark device.

 ## Ancillary files

-- Python code snippet for SDXL found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/jupyter-cell.py)
+- Python code snippet for SDXL found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/dgx-dashboard/assets/jupyter-cell.py)

 ## Time & risk

diff --git a/nvidia/jax/README.md b/nvidia/jax/README.md
index 8644000..8c5e6c4 100644
--- a/nvidia/jax/README.md
+++ b/nvidia/jax/README.md
@@ -52,10 +52,10 @@ GPU acceleration and performance optimization capabilities.

 All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main)

-- [**JAX introduction notebook**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/jax-intro.py) — covers JAX programming model differences from NumPy and performance evaluation
-- [**NumPy SOM implementation**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/numpy-som.py) — reference implementation of self-organized map training algorithm in NumPy
-- [**JAX SOM implementations**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/som-jax.py) — multiple iteratively refined implementations of SOM algorithm in JAX
-- [**Environment configuration**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/Dockerfile) — package dependencies and container setup specifications
+- [**JAX introduction notebook**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/jax-intro.py) — covers JAX programming model differences from NumPy and performance evaluation
+- [**NumPy SOM implementation**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/numpy-som.py) — reference implementation of self-organized map training algorithm in NumPy
+- [**JAX SOM implementations**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/som-jax.py) — multiple iteratively refined implementations of SOM algorithm in JAX
+- [**Environment configuration**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/jax/assets/Dockerfile) — package dependencies and container setup specifications

 ## Time & risk

diff --git a/nvidia/protein-folding/README.md b/nvidia/protein-folding/README.md
index e104978..b397f06 100644
--- a/nvidia/protein-folding/README.md
+++ b/nvidia/protein-folding/README.md
@@ -122,7 +122,7 @@ space and network bandwidth.
 ```bash
 ## Clone OpenFold repository
 git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
-cd ${MODEL}/assets
+cd nvidia/protein-folding/assets
 pip install -e .
 ```

diff --git a/nvidia/pytorch-fine-tune/README.md b/nvidia/pytorch-fine-tune/README.md
index f1ee844..7f19db4 100644
--- a/nvidia/pytorch-fine-tune/README.md
+++ b/nvidia/pytorch-fine-tune/README.md
@@ -34,7 +34,7 @@ Recipes are specifically for DIGITS SPARK. Please make sure that OS and drivers

 ## Ancillary files

-ALl files required for fine-tuning are included in the folder in [the GitHub repository here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}).
+ALl files required for fine-tuning are included in the folder in [the GitHub repository here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/pytorch-fine-tune).

 ## Time & risk

@@ -93,7 +93,7 @@ huggingface-cli login

 ```bash
 git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
-cd ${MODEL}/assets
+cd nvidia/pytorch-fine-tune/assets
 ```

 ## Step7: Run the fine-tuning recipes
diff --git a/nvidia/sglang/README.md b/nvidia/sglang/README.md
index 3135a78..ea83cb8 100644
--- a/nvidia/sglang/README.md
+++ b/nvidia/sglang/README.md
@@ -44,7 +44,7 @@ vision-language tasks using models like DeepSeek-V2-Lite.

 ## Ancillary files

-- An offline inference python script [found here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/offline-inference.py)
+- An offline inference python script [found here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/sglang/assets/offline-inference.py)

 ### Time & risk

@@ -168,7 +168,7 @@ print(f"Response: {response.json()['text']}")

 Launch a new container instance for offline inference to demonstrate local model usage without HTTP server. This runs entirely within the container for batch processing scenarios.

-TO DO: NEEDS TO HAVE SCRIPT FROM ASSETS PROPERLY INCORPORATED. [See here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets)
+TO DO: NEEDS TO HAVE SCRIPT FROM ASSETS PROPERLY INCORPORATED. [See here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/sglang/assets)

 ## Step 8. Validate installation

diff --git a/nvidia/stack-sparks/README.md b/nvidia/stack-sparks/README.md
index 5175765..3da9e6d 100644
--- a/nvidia/stack-sparks/README.md
+++ b/nvidia/stack-sparks/README.md
@@ -40,9 +40,9 @@ a functional distributed computing environment.

 ## Ancillary files

-All required files for this playbook can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/)
+All required files for this playbook can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/stack-sparks/)

-- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/discover-sparks) script for automatic node discovery and SSH key distribution
+- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/stack-sparks/assets/discover-sparks) script for automatic node discovery and SSH key distribution

 ## Time & risk

@@ -169,7 +169,7 @@ ip addr show enp1s0f1np1

 **Option 1: Automatically configure SSH**

-Run the DGX Spark [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/discover-sparks) script from one of the nodes to automatically discover and configure SSH:
+Run the DGX Spark [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/stack-sparks/assets/discover-sparks) script from one of the nodes to automatically discover and configure SSH:

 ```bash
 bash ./discover-sparks
diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md
index 14d8263..4fa2f95 100644
--- a/nvidia/trt-llm/README.md
+++ b/nvidia/trt-llm/README.md
@@ -78,9 +78,9 @@ inference through kernel-level optimizations, efficient memory layouts, and adva

 All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main)

-- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/discover-sparks.sh) — script to automatically discover and configure SSH between Spark nodes
-- [**trtllm-mn-entrypoint.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/trtllm-mn-entrypoint.sh) — container entrypoint script for multi-node setup
-- [**docker-compose.yml**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/docker-compose.yml) — Docker Compose configuration for multi-node deployment
+- [**discover-sparks.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/discover-sparks.sh) — script to automatically discover and configure SSH between Spark nodes
+- [**trtllm-mn-entrypoint.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh) — container entrypoint script for multi-node setup
+- [**docker-compose.yml**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/docker-compose.yml) — Docker Compose configuration for multi-node deployment

 ## Model Support Matrix

@@ -511,13 +511,13 @@ Run the command suggested by the docker swarm init on each worker node to join t
 docker swarm join --token :
 ```

-On both nodes, download the [**trtllm-mn-entrypoint.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/trtllm-mn-entrypoint.sh) script into your home directory and run the following command to make it executable:
+On both nodes, download the [**trtllm-mn-entrypoint.sh**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/trtllm-mn-entrypoint.sh) script into your home directory and run the following command to make it executable:

 ```bash
 chmod +x $HOME/trtllm-mn-entrypoint.sh
 ```

-On your primary node, deploy the TRT-LLM multi-node stack by downloading the [**docker-compose.yml**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/docker-compose.yml) file into your home directory and running the following command:
+On your primary node, deploy the TRT-LLM multi-node stack by downloading the [**docker-compose.yml**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/trt-llm/assets/docker-compose.yml) file into your home directory and running the following command:
 ```bash
 docker stack deploy -c $HOME/docker-compose.yml trtllm-multinode
 ```
diff --git a/nvidia/txt2kg/README.md b/nvidia/txt2kg/README.md
index ee71138..c3545a0 100644
--- a/nvidia/txt2kg/README.md
+++ b/nvidia/txt2kg/README.md
@@ -62,7 +62,7 @@ In a terminal, clone the txt2kg repository and navigate to the project directory

 ```bash
 git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets
-cd ${MODEL}/assets
+cd nvidia/txt2kg/assets
 ```

 ## Step 2. Start the txt2kg services
diff --git a/nvidia/unsloth/README.md b/nvidia/unsloth/README.md
index 179fb01..91bb848 100644
--- a/nvidia/unsloth/README.md
+++ b/nvidia/unsloth/README.md
@@ -44,7 +44,7 @@ parameter-efficient fine-tuning methods like LoRA and QLoRA.

 ## Ancillary files

-The Python test script can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/test_unsloth.py)
+The Python test script can be found [here on GitHub](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py)

 ## Time & risk

@@ -96,10 +96,10 @@ pip install --no-deps bitsandbytes

 ## Step 6. Create Python test script

-Curl the test script [here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/test_unsloth.py) into the container.
+Curl the test script [here](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py) into the container.

 ```bash
-curl -O https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/test_unsloth.py
+curl -O https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py
 ```

 We will use this test script to validate the installation with a simple fine-tuning task.
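
The download step in the unsloth hunk passes a `/-/blob/` URL to `curl -O`. On GitLab that path normally serves the HTML file viewer rather than the file itself, while the matching `/-/raw/` path serves the file contents. A minimal sketch of the rewrite follows, assuming the asset repository is publicly readable; `to_raw_url` is a hypothetical helper introduced here for illustration and is not part of the playbook or of this patch.

```shell
# Hypothetical helper (an assumption, not from the playbook):
# rewrite the first "/-/blob/" path segment of a GitLab URL to "/-/raw/".
to_raw_url() {
  printf '%s\n' "$1" | sed 's#/-/blob/#/-/raw/#'
}

# URL taken from the updated README line in the patch.
BLOB_URL="https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/nvidia/unsloth/assets/test_unsloth.py"

# Compute the raw-file URL; no network access happens here.
RAW_URL="$(to_raw_url "$BLOB_URL")"
echo "$RAW_URL"
```

Inside the container, `curl -fL -O "$RAW_URL"` would then save the script under its own file name; `-f` makes curl exit with an error on an HTTP failure instead of saving the error page, and `-L` follows redirects.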