Mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
Synced 2026-04-22 01:53:53 +00:00

chore: Regenerate all playbooks

This commit is contained in:
parent a5431dd77a
commit 430b1e685f
@@ -57,7 +57,7 @@ All required assets can be found in the Isaac Sim and Isaac Lab repositories on
 * Build process requires significant compilation time and may encounter dependency issues
 * Build artifacts consume substantial disk space
 * **Rollback:** Isaac Sim build directory can be removed to free space. Git repositories can be deleted and re-cloned if needed.
-* **Last Updated:** 1/06/2024
+* **Last Updated:** 01/02/2026
 * First Publication

 ## Run Isaac Sim
@@ -146,7 +146,7 @@ ln -sfn "${ISAACSIM_PATH}" "${PWD}/_isaac_sim"
 ls -l "${PWD}/_isaac_sim/python.sh"
 ```

-## Step 4. Install Isaac Lab.
+## Step 4. Install Isaac Lab

 ```bash
 ./isaaclab.sh --install
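The symlink step in the hunk above can be sanity-checked with a short snippet. This is a sketch, not part of the playbook: `ISAACSIM_PATH` is assumed to be set by an earlier step, and the `$HOME/isaacsim` fallback below is a hypothetical default for illustration only.

```bash
# Sketch only: recreate and verify the _isaac_sim symlink the playbook relies on.
# ISAACSIM_PATH is assumed set earlier; $HOME/isaacsim is a hypothetical fallback.
ISAACSIM_PATH="${ISAACSIM_PATH:-$HOME/isaacsim}"
ln -sfn "${ISAACSIM_PATH}" "${PWD}/_isaac_sim"
# -L tests that the path is a symlink; readlink shows where it points.
if [ -L "${PWD}/_isaac_sim" ]; then
  echo "symlink target: $(readlink "${PWD}/_isaac_sim")"
fi
```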
@@ -176,11 +176,11 @@ export LD_PRELOAD="$LD_PRELOAD:/lib/aarch64-linux-gnu/libgomp.so.1"

 ## Troubleshooting

-## Common issues for Isaac Sim
+## Common Issues for Isaac Sim

 | Symptom | Cause | Fix |
 |-----------------------------|--------------------------|-----------------------------------|
-| Isaac Sim error compilation | gcc+11 is not by default | Be sure that gcc+11 is by default |
+| Isaac Sim error compilation | gcc/g++ 11 is not by default | Be sure that gcc/g++ 11 is by default |
 | Isaac Sim not executes | Error libgomp.so.1 | Add export LD_PRELOAD |
 | Error in build | old installation | Remove .cache folder |
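The gcc fix in the troubleshooting table is terse. A minimal check of which compiler a build will pick up, assuming a typical Ubuntu toolchain; the `update-alternatives` commands echoed below are one common way to switch the default on Ubuntu, not a step taken from the playbook:

```bash
# Sketch only: report the default gcc major version; the Isaac Sim build
# expects gcc/g++ 11 to be the default compiler.
if command -v gcc >/dev/null 2>&1; then
  GCC_MAJOR="$(gcc -dumpversion | cut -d. -f1)"
else
  GCC_MAJOR="none"
fi
echo "default gcc major version: ${GCC_MAJOR}"
if [ "${GCC_MAJOR}" != "11" ]; then
  # One common Ubuntu remedy (assumes gcc-11/g++-11 are already installed):
  echo "consider: sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 110"
  echo "consider: sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-11 110"
fi
```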
@@ -79,7 +79,7 @@ The package will be installed directly via pip, so no additional files are requi
 * Port 8090 must be accessible for web interface functionality
 * Self-signed SSL certificate requires browser security exception
 * **Rollback:** Uninstall the Python package with `pip uninstall live-vlm-webui`. Ollama can be uninstalled with standard package removal. No persistent changes to DGX Spark configuration.
-* **Last Updated:** December 2025
+* **Last Updated:** 01/02/2026
 * First Publication

 ## Instructions
@@ -54,7 +54,7 @@ You will have a fully functional Nemotron-3-Nano-30B-A3B inference server runnin
 * **Risk level:** Low
 * Build process compiles from source but doesn't modify system files
 * Model downloads can be resumed if interrupted
-* **Rollback:** Delete the cloned llama.cpp directory and downloaded model files to fully remove the installation
+* **Rollback:** Delete the cloned `llama.cpp` directory and downloaded model files to fully remove the installation
 * **Last Updated:** 12/17/2025
 * First Publication
@@ -81,7 +81,7 @@ All required assets can be found [in the Portfolio Optimization repository](http

 * **Rollback:** Stop the Docker container and remove the cloned repository to fully remove the installation.

-* **Last Updated:** 1/05/2026
+* **Last Updated:** 01/02/2026
 * First Publication

 ## Instructions
@@ -51,10 +51,8 @@ ALl files required for fine-tuning are included in the folder in [the GitHub rep

 * **Time estimate:** 30-45 mins for setup and runing fine-tuning. Fine-tuning run time varies depending on model size
 * **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting.
-* **Last Updated:** 12/16/2025
-* Fix broken commands to access files from GitHub
-* Upgrade to latest pytorch container version nvcr.io/nvidia/pytorch:25.11-py3
-* Temporarily remove Llama3_70B_qLoRA_finetuning command to investigate slowness.
+* **Last Updated:** 01/02/2025
+* Add two-Spark distributed finetuning example

 ## Instructions
@@ -73,8 +73,8 @@ The following models are supported with SGLang on Spark. All listed models are a
 * **Estimated time:** 30 minutes for initial setup and validation
 * **Risk level:** Low - Uses pre-built, validated SGLang container with minimal configuration
 * **Rollback:** Stop and remove containers with `docker stop` and `docker rm` commands
-* **Last Updated:** 11/25/2025
-* First Publication
+* **Last Updated:** 01/02/2026
+* Add Model Support Matrix

 ## Instructions
@@ -24,7 +24,7 @@ This playbook shows an end-to-end GPU-powered workflow for scRNA-seq using [RAPI
 2. QC cells visually to understand the data
 3. Filter unusual cells
 4. Remove unwanted sources of variation
-5. Cluster and visualize PCA nd UMAP data
+5. Cluster and visualize PCA and UMAP data
 6. Batch Correction and analysis using Harmony, k-nearest neighbors, UMAP, and tSNE
 7. Explore the biological information from the data with differential expression analysis and trajectory analysis
@@ -50,16 +50,16 @@ The README elaborates on these steps.

 ## Ancillary files

-All required assets can be found [in the Single-cell RNA Sequencing repository](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/single-cell/). In the running playbook, they will all be found under the `playbook` folder.
+All required assets can be found [in the Single-cell RNA Sequencing repository](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/single-cell/). In the running playbook, they will all be found under the `playbook` folder.

 - `scRNA_analysis_preprocessing.ipynb` - Main playbook notebook.
 - `README.md` - Quick Start Guide to the Playbook Environment. It will also be found in the main directory of the Jupyter Lab. Please start there!
 - `/setup/start_playbook.sh` - Script to start the install of the playbook in a Docker container
-- `/setup/setup_playbook.sh` - Configures the Docker container before user enters jupyterlab environment
-- `/setup/requirements.txt` - used as a lists of libraries that commands in setup_playbook will install into the playbook environment
+- `/setup/setup_playbook.sh` - Configures the Docker container before user enters JupyterLab environment
+- `/setup/requirements.txt` - used as a list of libraries that commands in setup_playbook will install into the playbook environment

 ## Time & risk
-* **Estimated Time** ~15 minutes for first run
+* **Estimated Time:** ~15 minutes for first run

 - Total Notebook Processing Time: Approximately 2-3 minutes for the full pipeline (~130 seconds recorded in demo).
 - Data Loading: ~1.7 seconds.
@@ -73,7 +73,7 @@ All required assets can be found [in the Single-cell RNA Sequencing repository](
 - Kernel Management: You may need to kill/restart kernels to free up GPU resources between workflow stages.
 - Rollback: If an OOM error occurs, kill all kernels to free GPU memory and restart either the specific notebook or the entire playbook.

-* **Last Updated:** 01/06/2026
+* **Last Updated:** 01/02/2026
 * First Publication

 ## Instructions
@@ -90,7 +90,7 @@ docker --version

 - `nvidia-smi` will output information about your GPU. If it doesn't, your GPU is not properly configured.
 - `git --version` will print something like `git version 2.43.0`. If you get an error saying that git is not installed, please reinstall it.
-- `docker --version` will print something like `Docker version 28.3.3, build 980b856`. If you get an error saying that Docker is not installed, please reinstall it.
+- `docker --version` will print something like `Docker version 28.3.3, build 980b856`. If you get an error saying that Docker is not installed, please reinstall it. If you see a permission denied error, add your user to the docker group by running `sudo usermod -aG docker $USER && newgrp docker`.

 ## Step 2. Installation
 Open up Terminal, then copy and paste in the below commands:
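The three prerequisite checks in the hunk above can be rolled into one pass. This is a sketch for convenience, not part of the playbook:

```bash
# Sketch only: check each prerequisite tool in one loop and count misses.
MISSING_COUNT=0
for tool in git docker nvidia-smi; do
  if command -v "${tool}" >/dev/null 2>&1; then
    echo "${tool}: found"
  else
    echo "${tool}: MISSING"
    MISSING_COUNT=$((MISSING_COUNT + 1))
  fi
done
echo "missing prerequisites: ${MISSING_COUNT}"
```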
@@ -104,26 +104,26 @@ bash ./setup/start_playbook.sh
 start_playbook.sh will:

 1. pull the RAPIDS 25.10 Notebooks Docker container
-2. build all the environments needed for the playbook in the container using `setup_playbook.sh`
-3. start Jupyterlab
+2. build all the environments needed for the playbook in the container using setup_playbook.sh
+3. start JupyterLab

 Please keep the Terminal window open while using the playbook.

-You can access your Jupyterlab server in two ways
+You can access your JupyterLab server in two ways
 1. at `http://127.0.0.1:8888` if running locally on the DGX Spark.
 2. at `http://<SPARK_IP>:8888` if using your DGX Spark headless over your network.

-Once in Jupyterlab, you'll be greeted with a directory containing `scRNA_analysis_preprocessing.ipynb`, and the folders `cuDF`, `cuML`, `cuGraph`, and `playbook`.
+Once in JupyterLab, you'll be greeted with a directory containing scRNA_analysis_preprocessing.ipynb, and the folders `cuDF`, `cuML`, `cuGraph`, and `playbook`.

 - `scRNA_analysis_preprocessing.ipynb`is the playbook notebook. You will want to open this by double clicking on the file.
-- `cuDF`, `cuML`, `cuGraph` folders contain the standard RAPIDS libary example notebooks to help you continue exploring.
+- `cuDF`, `cuML`, `cuGraph` folders contain the standard RAPIDS library example notebooks to help you continue exploring.
 - `playbook` contains the playbook files. The contents of this folder are read-only inside of a rootless Docker Container.

 If you want to install any of the playbook notebooks on your own system, check out the readmes within the folder that accompanies the notebook

 ## Step 3. Run the notebook

-Once in jupyterlab, there all you have to do is run the `scRNA_analysis_preprocessing.ipynb`. You'll get both these playbook notebooks as well as the standard RAPIDS libary example notebooks to help you get going.
+Once in JupyterLab, there all you have to do is run the `scRNA_analysis_preprocessing.ipynb`. You'll get both these playbook notebooks as well as the standard RAPIDS library example notebooks to help you get going.

 You can use `Shift + Enter` to manually run each cell at your own pace, or `Run > Run All` to run all the cells.
@@ -131,19 +131,18 @@ Once you're done with exploring the `scRNA_analysis_preprocessing` notebook, you

 ## Step 4. Download your work

-Since the docker container cannot priviledged write back to the host system, you can use Jupyterlab to download any files you may want to keep once the docker container is shut down.
+Since the docker container cannot privileged write back to the host system, you can use JupyterLab to download any files you may want to keep once the docker container is shut down.

-Simply right click the file you want, in the browser, and click `Download` in the drop down.
+Simply right click the file you want, in the browser, and click `Download` in the dropdown.

 ## Step 5. Cleanup

-Once you have downloaded all your work, Go back to the Terminal window where you started running the playbook.
+Once you have downloaded all your work, go back to the Terminal window where you started running the playbook.

 In the Terminal window,
 1. Type `Ctrl + C`
 2. Quickly either enter `y` and then hit `Enter` at the prompt or hit `Ctrl + C` again
 3. The Docker container will proceed to shut down
-{When and why someone might need this step.}

 > [!WARNING]
 > This will delete ALL data that wasn't already downloaded from the Docker container. The browser window may still show cached files if it is still open.
@@ -162,8 +161,8 @@ FORMAT: Use the table format for easy scanning. Add detailed notes when needed.
 | Symptom | Cause | Fix |
 |---------|-------|-----|
 | Docker is not found. | Docker may have been uninstalled, as it is preinstalled on your DGX Spark | Please install Docker using their convenience script here: `curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh get-docker.sh`. You will be prompted for your password. |
-| Docker command unexpectedly exits with "permissions" error | Your user is not part of the `docker` group | Open Terminal and run these commands: `sudo groupadd docker $$ sudo usermod -aG docker $USER`. You will be prompted for your password. Then, close the Terminal, open a new one, and try again |
-| Docker container download, environment build, or data download fails | There was either a connectivity issue or a resource may be temporariliy unavailable. | You may need to try again later. If this persist, please reach out to us! |
+| Docker command unexpectedly exits with "permissions" error | Your user is not part of the `docker` group | Open Terminal and run these commands: `sudo groupadd docker && sudo usermod -aG docker $USER`. You will be prompted for your password. Then, close the Terminal, open a new one, and try again |
+| Docker container download, environment build, or data download fails | There was either a connectivity issue or a resource may be temporarily unavailable. | You may need to try again later. If this persists, please post on the Spark user forum for support |
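The group-membership fix in the table can be verified before rerunning Docker. A sketch assuming the standard `docker` group name; it only reports status and makes no changes:

```bash
# Sketch only: check whether the current user is in the docker group.
CURRENT_USER="$(id -un)"
if id -nG "${CURRENT_USER}" | tr ' ' '\n' | grep -qx docker; then
  DOCKER_GROUP_STATUS="member"
else
  DOCKER_GROUP_STATUS="missing"
fi
echo "docker group membership: ${DOCKER_GROUP_STATUS}"
# If missing, apply the fix from the table, then open a new Terminal
# (or run `newgrp docker`) so the group change takes effect.
```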