Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git, synced 2026-04-26 03:43:52 +00:00
chore: Regenerate all playbooks
parent b0c028ed1f
commit 8b262929d3
@@ -45,7 +45,6 @@ GPU acceleration and performance optimization capabilities.
 [ ] Docker or container runtime installed
 [ ] NVIDIA Container Toolkit configured
 [ ] Verify GPU access: `nvidia-smi`
-[ ] Verify Docker GPU support: `docker run --gpus all --rm nvcr.io/nvidia/cuda:13.0.1-runtime-ubuntu24.04 nvidia-smi`
 [ ] Port 8080 available for marimo notebook access
 
 ## Ancillary files
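The checklist in this hunk asks for port 8080 to be free but does not say how to confirm it. One way to check is sketched below in Python, using only the standard library. The function name `port_is_free` and the default host `0.0.0.0` are assumptions made for illustration and do not come from the playbook.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # bind() raises OSError (EADDRINUSE) when another socket holds the port.
            sock.bind((host, port))
        except OSError:
            return False
    return True
```

Calling `port_is_free(8080)` before starting the container gives an early signal. The result is only a snapshot, since another process could still claim the port afterwards.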
@@ -56,7 +55,7 @@ All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx-
 - [**NumPy SOM implementation**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/numpy-som.py) — reference implementation of self-organized map training algorithm in NumPy
 - [**JAX SOM implementations**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/som-jax.py) — multiple iteratively refined implementations of SOM algorithm in JAX
 - [**Environment configuration**](https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-spark-playbook-assets/-/blob/main/${MODEL}/assets/Dockerfile) — package dependencies and container setup specifications
-- [**Course guide notebook**]() — overall material navigation and learning path
+
 
 ## Time & risk
 
@@ -85,12 +84,13 @@ uname -m
 docker run --gpus all --rm nvcr.io/nvidia/cuda:13.0.1-runtime-ubuntu24.04 nvidia-smi
 ```
 
-If the `docker` command fails with a permission error, you can either
+If the `docker` command fails with a permission error, you can either run the command with `sudo`, or add yourself to the `docker` group to use `docker` without `sudo`.
 
-1. run it with `sudo`, e.g., `sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:13.0.1-runtime-ubuntu24.04 nvidia-smi`, or
-2. add yourself to the `docker` group so you can use `docker` without `sudo`.
-
-To add yourself to the `docker` group, first run `sudo usermod -aG docker $USER`. Then, as your user account, either run `newgrp docker` or log out and log back in.
+```bash
+sudo usermod -aG docker $USER
+newgrp docker
+sudo systemctl restart docker
+```
 
 ## Step 3. Clone the playbook repository
 
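A group added with `usermod -aG` only applies to sessions started afterwards, which is why this hunk pairs it with `newgrp docker` (the removed text also offered logging out and back in). Below is a minimal Python sketch for checking whether the current process already holds the `docker` group. The helper name `process_in_group` and its optional `gids` parameter are hypothetical and are not part of the playbook.

```python
import grp
import os


def process_in_group(group_name, gids=None):
    """Return True if group_name exists and its GID is among gids.

    gids defaults to the effective and supplementary groups of this process.
    """
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this host
    if gids is None:
        gids = set(os.getgroups()) | {os.getegid()}
    return gid in gids
```

If `process_in_group("docker")` still returns `False` after running `usermod`, the current session probably predates the change and needs `newgrp docker` or a fresh login.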
@@ -101,7 +101,7 @@ git clone https://gitlab.com/nvidia/dgx-spark/temp-external-playbook-assets/dgx-
 ## Step 3. Build the Docker image
 
 
-> **Warning:** This command will download a base image and build a container locally to support this environment
+> **Warning:** This command will download a base image and build a container locally to support this environment.
 
 ```bash
 cd jax/assets
@@ -165,21 +165,7 @@ The notebooks will show you how to check the performance of each SOM training im
 
 Visually inspect the SOM training output on random color data to confirm algorithm correctness.
 
-## Step 10. Validate installation
+## Step 10. Troubleshooting
 
-Confirm all components are working correctly and notebooks execute successfully.
-
-```bash
-## Test GPU JAX functionality
-python -c "import jax; print(jax.devices()); print(jax.device_count())"
-
-## Verify JAX can access GPU
-python -c "import jax.numpy as jnp; x = jnp.array([1, 2, 3]); print(x.device())"
-```
-
-Expected output should show GPU devices detected and JAX arrays placed on GPU.
-
-## Step 11. Troubleshooting
-
 Common issues and their solutions:
 
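The validation step removed in this hunk relied on `jax.devices()` to confirm that a GPU is visible. A reader who still wants that check could use something like the Python sketch below. The helper names `gpu_devices` and `check_jax_gpu` are hypothetical, and treating a `platform` value of `"gpu"` or `"cuda"` as a GPU is an assumption about how JAX labels its devices.

```python
def gpu_devices(devices):
    """Keep only the devices whose platform attribute looks like a GPU."""
    # Assumption: JAX reports CUDA devices with platform "gpu" (or "cuda").
    return [d for d in devices if getattr(d, "platform", None) in ("gpu", "cuda")]


def check_jax_gpu():
    """Import JAX lazily and return the GPU devices it can see (may be empty)."""
    import jax  # imported here so gpu_devices stays usable without JAX installed

    return gpu_devices(jax.devices())
```

Inside the container, an empty list from `check_jax_gpu()` would suggest that JAX fell back to the CPU backend.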
@@ -191,24 +177,7 @@ Common issues and their solutions:
 | Port 8080 unavailable | Port already in use | Use `-p 8081:8080` or kill process on 8080 |
 | Package conflicts in Docker build | Outdated environment file | Update environment file for Blackwell |
 
-## Step 12. Cleanup and rollback
+## Step 11. Next steps
 
-Remove containers and reset environment if needed.
-
-> **Warning:** This will remove all container data and downloaded images.
-
-```bash
-## Stop and remove containers
-docker stop $(docker ps -q)
-docker system prune -f
-
-## Reset pipenv environment
-pipenv --rm
-```
-
-To rollback: Re-run installation steps from Step 2.
-
-## Step 13. Next steps
-
 Apply JAX optimization techniques to your own NumPy-based machine learning code.
 