> An end-to-end GPU-powered workflow for scRNA-seq using RAPIDS
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
## Overview
## Basic idea
Single-cell RNA sequencing (scRNA-seq) lets researchers measure gene activity in each individual cell, exposing variation, cell types, and cell states that bulk methods average away. However, the resulting datasets are large and high-dimensional, and processing them takes substantial compute.
This playbook shows an end-to-end GPU-powered workflow for scRNA-seq using [RAPIDS-singlecell](https://rapids-singlecell.readthedocs.io/en/latest/), a RAPIDS-powered library in the [scverse® ecosystem](https://github.com/scverse). It follows the familiar [Scanpy API](https://scanpy.readthedocs.io/en/stable/) and lets researchers run data preprocessing, quality control (QC) and cleanup, visualization, and investigation faster than with CPU tools by working with sparse count matrices directly on the GPU.
6. Batch correction and analysis using Harmony, k-nearest neighbors, UMAP, and t-SNE
7. Explore the biological information in the data with differential expression analysis and trajectory analysis
The README elaborates on these steps.
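Working on sparse count matrices means only the nonzero counts are stored and touched. As a rough CPU-only illustration of what a normalize-then-log step does to a compressed sparse row (CSR) cell-by-gene matrix, here is a minimal pure-Python sketch. The function names are hypothetical, the target sum of 10,000 counts per cell is an assumed common default rather than a value from this playbook, and rapids-singlecell's own preprocessing runs on the GPU via CuPy, not with this code.

```python
import math
from typing import List, Tuple

# CSR layout: rows are cells, columns are genes; only nonzero counts are stored.
CSR = Tuple[List[float], List[int], List[int]]  # (data, indices, indptr)


def dense_to_csr(rows: List[List[float]]) -> CSR:
    """Convert a small dense cell-by-gene matrix to CSR form."""
    data: List[float] = []
    indices: List[int] = []
    indptr = [0]
    for row in rows:
        for col, value in enumerate(row):
            if value != 0:
                data.append(float(value))
                indices.append(col)
        indptr.append(len(data))  # end offset of this cell's nonzeros
    return data, indices, indptr


def normalize_log1p(csr: CSR, target_sum: float = 1e4) -> CSR:
    """Scale each cell to `target_sum` total counts, then apply log(1 + x).

    Only stored nonzeros are visited. Zeros stay zero because log1p(0) == 0,
    so the sparsity pattern (indices, indptr) is unchanged.
    """
    data, indices, indptr = csr
    out = list(data)
    for cell in range(len(indptr) - 1):
        start, end = indptr[cell], indptr[cell + 1]
        total = sum(data[start:end])
        if total == 0:
            continue  # empty cell: nothing to scale
        scale = target_sum / total
        for i in range(start, end):
            out[i] = math.log1p(data[i] * scale)
    return out, list(indices), list(indptr)


if __name__ == "__main__":
    counts = [[0, 3, 0, 1], [2, 0, 0, 0], [0, 0, 0, 0]]
    data, indices, indptr = normalize_log1p(dense_to_csr(counts))
    print(indices, indptr)  # same sparsity pattern as the input
    print([round(v, 3) for v in data])
```

Because the work scales with the number of nonzeros rather than cells × genes, this kind of per-row loop maps naturally onto parallel GPU kernels.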
## What to know before starting
- The rapids-singlecell library mimics the Scanpy API from scverse, so users familiar with the standard CPU workflow can easily adapt to GPU acceleration through CuPy and NVIDIA RAPIDS cuML and cuGraph.
- Algorithmic Precision: Unlike Scanpy's CPU implementation, which uses approximate nearest neighbor search, this GPU implementation computes the exact neighbor graph; consequently, small differences in results are expected and valid.
- Parameter Sensitivity: When performing t-SNE, the number of nearest neighbors must be at least 3x the perplexity to avoid distortion.
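Both technical points can be illustrated with a small CPU-only Python sketch. `exact_knn` compares every cell with every other cell (brute force), which is what an exact neighbor graph means, at O(n²) cost; approximate search skips most of those comparisons, which is why results can differ slightly. `min_tsne_neighbors` encodes the t-SNE rule of thumb of at least three neighbors per unit of perplexity. Both names are hypothetical and are not the rapids-singlecell or cuML API, which do this work on the GPU.

```python
import math
from typing import List, Sequence


def exact_knn(points: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """Brute-force exact k-nearest neighbors (Euclidean), excluding self.

    Every pair of points is compared, so the result is exact but costs
    O(n^2) distance computations; approximate methods avoid that cost.
    """
    neighbors = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()  # ties are broken by index, keeping results deterministic
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def min_tsne_neighbors(perplexity: float) -> int:
    """Smallest neighbor count satisfying the 'at least 3x perplexity' rule."""
    return math.ceil(3 * perplexity)


if __name__ == "__main__":
    cells = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    print(exact_knn(cells, k=1))
    print(min_tsne_neighbors(30))
```

With a perplexity of 30, for example, the rule calls for at least 90 neighbors.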
## Prerequisites
**Hardware Requirements:**
- NVIDIA Grace Blackwell GB10 Superchip System (DGX Spark)
- At least 40 GB of free unified memory for the Docker container and GPU-accelerated data processing
- At least 30 GB of available storage space for the Docker container and data files
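Before pulling the container, you can check these two thresholds from a script. The Python sketch below is an illustrative assumption, not part of the playbook: it reads `MemAvailable` from `/proc/meminfo` (Linux only) and uses `shutil.disk_usage`, and it assumes that on DGX Spark's unified memory the kernel-reported available memory is a reasonable proxy for what the container and GPU can use. The helper names are hypothetical.

```python
import shutil
from typing import List, Optional


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem that contains `path`, in GB (10^9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def available_memory_gb(meminfo_path: str = "/proc/meminfo") -> Optional[float]:
    """Read MemAvailable from /proc/meminfo (Linux only); None if unreadable."""
    try:
        with open(meminfo_path) as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    kib = int(line.split()[1])  # /proc/meminfo reports kB (KiB)
                    return kib * 1024 / 1e9
    except (OSError, ValueError, IndexError):
        pass
    return None


def check_requirements(
    mem_gb: Optional[float],
    disk_gb: float,
    min_mem_gb: float = 40.0,
    min_disk_gb: float = 30.0,
) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems: List[str] = []
    if mem_gb is None:
        problems.append("could not determine available memory")
    elif mem_gb < min_mem_gb:
        problems.append(f"{mem_gb:.1f} GB memory available, {min_mem_gb:.0f} GB needed")
    if disk_gb < min_disk_gb:
        problems.append(f"{disk_gb:.1f} GB disk free, {min_disk_gb:.0f} GB needed")
    return problems


if __name__ == "__main__":
    # "." is a placeholder: point this at the filesystem that holds Docker's
    # data and your datasets.
    issues = check_requirements(available_memory_gb(), free_disk_gb("."))
    print("Requirements met" if not issues else "\n".join(issues))
```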
All required assets can be found [in the Single-cell RNA Sequencing repository](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/single-cell/). In the running playbook, they will all be found under the `playbook` folder.
- `docker --version` will print something like `Docker version 28.3.3, build 980b856`. If you get an error saying that Docker is not installed, reinstall it. If a Docker command that talks to the daemon (for example, `docker ps`) fails with a permission denied error, add your user to the docker group by running `sudo usermod -aG docker $USER && newgrp docker`.
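If you prefer to script this check, here is a minimal Python sketch. It assumes `docker --version` prints the format shown above; the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches e.g. "Docker version 28.3.3, build 980b856"
VERSION_RE = re.compile(r"Docker version (\d+)\.(\d+)\.(\d+)")


def parse_docker_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `docker --version` output, or None."""
    match = VERSION_RE.search(output)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def installed_docker_version() -> Optional[Tuple[int, int, int]]:
    """Run `docker --version`; None if Docker is not on PATH or the call fails."""
    if shutil.which("docker") is None:
        return None
    result = subprocess.run(
        ["docker", "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return parse_docker_version(result.stdout)


if __name__ == "__main__":
    version = installed_docker_version()
    print("Docker not found" if version is None else "Docker %d.%d.%d" % version)
```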
Once in JupyterLab, you'll be greeted with a directory containing `scRNA_analysis_preprocessing.ipynb` and the folders `cuDF`, `cuML`, `cuGraph`, and `playbook`.
From there, all you have to do is run `scRNA_analysis_preprocessing.ipynb`. Alongside the playbook notebooks, you'll also find the standard RAPIDS library example notebooks to help you get going.
You can use `Shift + Enter` to manually run each cell at your own pace, or `Run > Run All` to run all the cells.
Once you're done exploring the `scRNA_analysis_preprocessing` notebook, you can explore other RAPIDS notebooks by opening the folders, selecting a notebook, and running it the same way.
Because the Docker container does not have privileged write access to the host system, use JupyterLab to download any files you want to keep before the container is shut down.
2. Quickly either enter `y` and press `Enter` at the prompt, or press `Ctrl + C` again
3. The Docker container will proceed to shut down
> [!WARNING]
> This will delete ALL data that wasn't already downloaded from the Docker container. The browser window may still show cached files if it is still open.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| Docker is not found. | Docker is preinstalled on DGX Spark, so it may have been uninstalled. | Install Docker using its convenience script: `curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh get-docker.sh`. You will be prompted for your password. |
| Docker command unexpectedly exits with a "permission denied" error | Your user is not part of the `docker` group | Open Terminal and run `sudo groupadd -f docker && sudo usermod -aG docker $USER`. You will be prompted for your password. Then log out and back in (or run `newgrp docker` in the current shell) and try again. |
| Docker container download, environment build, or data download fails | A connectivity issue occurred, or a resource is temporarily unavailable. | Try again later. If the problem persists, post on the Spark user forum for support. |
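To tell whether a permission error comes from missing group membership or from a session that has not picked up the group yet, you can inspect membership directly. The sketch below uses only Python's standard `grp`, `pwd`, and `os` modules (Unix only); the function name and status strings are hypothetical. A user who was added to the group but still gets errors usually needs a new login session or `newgrp docker`.

```python
import grp
import os
import pwd


def docker_group_status(group: str = "docker") -> str:
    """Classify docker-group access for the current user.

    Returns "missing-group", "not-member", "member-needs-new-login", or "active".
    """
    try:
        info = grp.getgrnam(group)
    except KeyError:
        return "missing-group"  # the group does not exist on this system

    # Groups held by the current process are what actually apply right now.
    if info.gr_gid in os.getgroups() or info.gr_gid == os.getegid():
        return "active"

    # Listed in the group database but not active in this session.
    try:
        user = pwd.getpwuid(os.getuid())
    except KeyError:
        return "not-member"  # uid has no passwd entry (e.g., some containers)
    if user.pw_name in info.gr_mem or user.pw_gid == info.gr_gid:
        return "member-needs-new-login"
    return "not-member"


if __name__ == "__main__":
    print(docker_group_status())
```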