sarman/dgx-spark-playbooks

mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-04-22 18:13:52 +00:00

GitLab CI 430b1e685f chore: Regenerate all playbooks

2026-01-02 22:57:04 +00:00

Single-cell RNA Sequencing

An end-to-end GPU-powered workflow for scRNA-seq using RAPIDS

Overview
Instructions
Troubleshooting

Overview

Basic idea

Single-cell RNA sequencing (scRNA-seq) lets researchers study gene activity in each cell on its own, exposing variation, cell types, and cell states that bulk methods hide. But these large, high-dimensional datasets take heavy compute to handle.

This playbook shows an end-to-end GPU-powered workflow for scRNA-seq using RAPIDS-singlecell, a RAPIDS-powered library in the scverse® ecosystem. It follows the familiar Scanpy API and lets researchers run data preprocessing, quality control (QC) and cleanup, visualization, and investigation faster than CPU tools by working with sparse count matrices directly on the GPU.

What you'll accomplish

Load and preprocess data with GPU acceleration
Inspect cells visually for quality control (QC) to understand the data
Filter unusual cells
Remove unwanted sources of variation
Cluster the data and visualize it with PCA and UMAP
Correct batch effects and analyze the data using Harmony, k-nearest neighbors, UMAP, and t-SNE
Explore the biological information in the data with differential expression analysis and trajectory analysis

The README elaborates on these steps.

What to know before starting

The rapids-singlecell library mimics the Scanpy API from scverse, so users familiar with the standard CPU workflow can adapt easily to GPU acceleration through CuPy and NVIDIA RAPIDS cuML and cuGraph.
Algorithmic Precision: Unlike Scanpy's CPU implementation, which uses approximate nearest neighbor search, this GPU implementation computes the exact graph; consequently, small differences in results are expected and valid.
Parameter Sensitivity: When performing t-SNE, the number of nearest neighbors must be at least 3x the perplexity to avoid distortion.
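The t-SNE rule above comes down to simple arithmetic. The sketch below assumes the "3x" multiplier applies to the t-SNE perplexity, which is the usual convention but is an assumption here; `min_tsne_neighbors` is a hypothetical helper, not part of rapids-singlecell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: smallest neighbor count that satisfies the
# "at least 3x" rule, ASSUMING the multiplier applies to the perplexity.
min_tsne_neighbors() {
    local perplexity="$1"
    echo $(( 3 * perplexity ))
}

# Example: a perplexity of 30 would call for at least 90 neighbors.
min_tsne_neighbors 30
```

If the neighbor graph was built with fewer neighbors than this, one option is to recompute it with a larger neighbor count before running t-SNE.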

Prerequisites

Hardware Requirements:

NVIDIA Grace Blackwell GB10 Superchip System (DGX Spark)
At least 40 GB of free unified memory for the Docker container and GPU-accelerated data processing
At least 30 GB of available storage space for the Docker container and data files
High-speed network connectivity
High-speed internet connection recommended

Software Requirements:

NVIDIA DGX OS
Docker
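The memory and storage requirements above can be checked before you start. The following is a minimal sketch, assuming a Linux host that exposes `/proc/meminfo` and that the playbook will be cloned into the current directory; the helper names `kib_to_gib` and `report` are illustrative, and the thresholds are treated as GiB for simplicity.

```shell
#!/usr/bin/env bash
# Thresholds taken from the hardware requirements above (treated as GiB).
MIN_MEM_GIB=40
MIN_DISK_GIB=30

# Convert KiB to whole GiB (integer division, rounds down).
# An empty argument is treated as 0 so a failed lookup reports FAIL.
kib_to_gib() {
    echo $(( ${1:-0} / 1024 / 1024 ))
}

# Print PASS or FAIL for a measured value against a minimum.
# $1 = label, $2 = measured GiB, $3 = required GiB
report() {
    if [ "$2" -ge "$3" ]; then
        echo "PASS: $1 $2 GiB (need $3 GiB)"
    else
        echo "FAIL: $1 $2 GiB (need $3 GiB)"
    fi
}

# MemAvailable is the kernel's estimate of memory usable without swapping.
mem_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
disk_kib=$(df -Pk . 2>/dev/null | awk 'NR==2 {print $4}')

report "free memory" "$(kib_to_gib "$mem_kib")" "$MIN_MEM_GIB"
report "free disk" "$(kib_to_gib "$disk_kib")" "$MIN_DISK_GIB"
```

Because DGX Spark shares memory between CPU and GPU, `MemAvailable` is only a rough indicator of what the workflow can use.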

Ancillary files

All required assets can be found in the Single-cell RNA Sequencing repository. In the running playbook, they will all be found under the playbook folder.

scRNA_analysis_preprocessing.ipynb - Main playbook notebook.
README.md - Quick Start Guide to the playbook environment. It also appears in the main JupyterLab directory. Please start there!
/setup/start_playbook.sh - Script that starts the installation of the playbook in a Docker container.
/setup/setup_playbook.sh - Configures the Docker container before the user enters the JupyterLab environment.
/setup/requirements.txt - Lists the libraries that setup_playbook.sh installs into the playbook environment.

Time & risk

Estimated Time: ~15 minutes for first run
- Total Notebook Processing Time: Approximately 2-3 minutes for the full pipeline (~130 seconds recorded in demo).
- Data Loading: ~1.7 seconds.
- Preprocessing: ~21 seconds.
- Post-processing (Clustering/Diff Exp): ~104 seconds.
- Data: Internet access is required to download the Docker container, libraries, and demo dataset (dli_census.h5ad).
Risks
- GPU Memory Constraints: The workflow is very GPU memory intensive. Large datasets may trigger Out Of Memory (OOM) errors.
- Kernel Management: You may need to kill/restart kernels to free up GPU resources between workflow stages.
- Rollback: If an OOM error occurs, kill all kernels to free GPU memory and restart either the specific notebook or the entire playbook.
Last Updated: 01/02/2026
- First Publication
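When deciding which kernels to restart after an OOM error, it can help to see which processes currently hold GPU memory. The sketch below uses `nvidia-smi`'s compute-apps query and is meant to be run on the host; `list_gpu_processes` is an illustrative name, and per-process memory figures may be reported differently on a unified-memory system, so treat the output as a hint only.

```shell
#!/usr/bin/env bash
# List processes currently using the GPU, or explain why that is not possible.
list_gpu_processes() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; cannot list GPU processes"
        return 0
    fi
    # One CSV row per compute process: PID, process name, memory in use.
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv \
        || echo "nvidia-smi query failed"
}

list_gpu_processes
```

Jupyter kernels typically show up as Python processes; shutting them down from JupyterLab (Kernel > Shut Down All Kernels) is gentler than killing PIDs directly.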

Instructions

Step 1. Verify your environment

Let's first verify that you have a working GPU, git, and Docker. Open up Terminal, then copy and paste in the below commands:

nvidia-smi
git --version
docker --version

nvidia-smi will output information about your GPU. If it doesn't, your GPU is not properly configured.
git --version will print something like git version 2.43.0. If you get an error saying that git is not installed, please reinstall it.
docker --version will print something like Docker version 28.3.3, build 980b856. If you get an error saying that Docker is not installed, please reinstall it. If you see a permission denied error, add your user to the docker group by running sudo usermod -aG docker $USER && newgrp docker.
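The three checks above can also be run in one pass. This is a small sketch that only tests whether each command is on your `PATH`; it does not confirm that the GPU or the Docker daemon actually works. `check_tool` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Report whether a command is available on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# The three tools this step asks you to verify.
for tool in nvidia-smi git docker; do
    check_tool "$tool"
done
```

Any line that starts with `missing:` points to the tool that needs attention before you continue to Step 2.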

Step 2. Installation

Open up Terminal, then copy and paste in the below commands:

git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/single-cell/assets
bash ./setup/start_playbook.sh

start_playbook.sh will:

pull the RAPIDS 25.10 Notebooks Docker container
build all the environments needed for the playbook in the container using setup_playbook.sh
start JupyterLab

Please keep the Terminal window open while using the playbook.

You can access your JupyterLab server in two ways:

At http://127.0.0.1:8888 if running locally on the DGX Spark.
At http://<SPARK_IP>:8888 if using your DGX Spark headless over your network.
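If the page does not load, a quick reachability check from a terminal can separate a network problem from a server that has not finished starting. The sketch below assumes `curl` is installed and that JupyterLab listens on port 8888 as described above; `jupyter_url` and `jupyter_is_up` are illustrative helper names.

```shell
#!/usr/bin/env bash
# Build the JupyterLab URL for a host; the port defaults to 8888 as in the text.
jupyter_url() {
    echo "http://$1:${2:-8888}"
}

# Return 0 if the server answers within 5 seconds, non-zero otherwise.
jupyter_is_up() {
    curl --silent --fail --output /dev/null --max-time 5 "$(jupyter_url "$@")"
}

# Usage (run once start_playbook.sh reports that JupyterLab has started):
#   jupyter_is_up 127.0.0.1 && echo "JupyterLab is reachable"
jupyter_url 127.0.0.1
```

For headless use, replace `127.0.0.1` with the address of your DGX Spark; running `hostname -I` on the Spark itself lists its IP addresses.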

Once in JupyterLab, you'll be greeted with a directory containing scRNA_analysis_preprocessing.ipynb, and the folders cuDF, cuML, cuGraph, and playbook.

scRNA_analysis_preprocessing.ipynb is the playbook notebook. Open it by double-clicking the file.
cuDF, cuML, cuGraph folders contain the standard RAPIDS library example notebooks to help you continue exploring.
playbook contains the playbook files. The contents of this folder are read-only inside of a rootless Docker Container.

If you want to install any of the playbook notebooks on your own system, check out the READMEs in the folder that accompanies the notebook.

Step 3. Run the notebook

Once in JupyterLab, all you have to do is run scRNA_analysis_preprocessing.ipynb. Alongside this playbook notebook, you also get the standard RAPIDS library example notebooks to help you get going.

You can use Shift + Enter to manually run each cell at your own pace, or Run > Run All to run all the cells.

Once you're done with exploring the scRNA_analysis_preprocessing notebook, you can explore other RAPIDS notebooks by going into the folders, selecting other notebooks, and doing the same thing.

Step 4. Download your work

Because the Docker container is unprivileged and cannot write back to the host system, use JupyterLab to download any files you want to keep before the Docker container is shut down.

In the JupyterLab file browser, right-click the file you want and click Download in the dropdown.

Step 5. Cleanup

Once you have downloaded all your work, go back to the Terminal window where you started running the playbook.

In the Terminal window:

Press Ctrl + C
Quickly either enter y and press Enter at the prompt, or press Ctrl + C again
The Docker container will then shut down

Warning

This will delete ALL data that wasn't already downloaded from the Docker container. The browser window may still show cached files if it is still open.

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Docker is not found. | Docker is preinstalled on your DGX Spark, so it may have been uninstalled. | Install Docker using its convenience script: `curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh get-docker.sh`. You will be prompted for your password. |
| Docker command unexpectedly exits with a "permissions" error. | Your user is not part of the `docker` group. | Open Terminal and run: `sudo groupadd -f docker && sudo usermod -aG docker $USER` (`-f` lets `groupadd` succeed when the group already exists, so `usermod` still runs). You will be prompted for your password. Then close the Terminal, open a new one, and try again. |
| Docker container download, environment build, or data download fails. | There was either a connectivity issue or a resource was temporarily unavailable. | Try again later. If the problem persists, please post on the Spark user forum for support. |
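To confirm that a new terminal session has picked up the `docker` group, you can inspect the groups of the current shell. This is a minimal sketch; `session_has_group` is an illustrative helper name. It checks the running session rather than the account database, which is what matters for the "permissions" error above.

```shell
#!/usr/bin/env bash
# Return 0 if the current shell session belongs to the named group.
# `id -nG` with no user argument lists the groups of the running process.
session_has_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}

if session_has_group docker; then
    echo "this session is in the docker group"
else
    echo "this session is NOT in the docker group"
fi
```

If the group was just added and the check still fails, the session predates the change; open a new terminal or log out and back in.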

Note

DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:

sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
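To see how much memory the flush actually frees, you can compare the kernel's `MemAvailable` figure before and after. The sketch below wraps the command above; `mem_available_kib` and `flush_and_report` are illustrative helper names, and the function is left uncalled because it needs `sudo`.

```shell
#!/usr/bin/env bash
# Read MemAvailable (in KiB) from a meminfo-style file; defaults to /proc/meminfo.
mem_available_kib() {
    awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# Flush the buffer cache with the command from the note and report the change.
flush_and_report() {
    local before after
    before=$(mem_available_kib)
    sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
    after=$(mem_available_kib)
    echo "MemAvailable: ${before} KiB -> ${after} KiB (change: $(( (after - before) / 1024 )) MiB)"
}

# Usage (prompts for your password):
#   flush_and_report
```

Dropping caches discards only clean cached data, so it is safe to repeat, though disk reads may be slower for a short time afterwards.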

For the latest known issues, please review the DGX Spark User Guide.
