dgx-spark-playbooks/nvidia/station-topic-modeling/endpoint-test.yaml

kind: Playbook
metadata:
  name: station-topic-modeling
  displayName: Topic Modeling
  shortDescription: Extract insights from massive text datasets using cuML's GPU-accelerated BERTopic

  publisher: nvidia
  description: |
    GPU-accelerated topic modeling with BERTopic and RAPIDS cuML: extract topics from roughly 40 million Amazon product reviews on an NVIDIA DGX Station.
    
  labelsV2:
  - gpuType:playbook:gpu_type_station
  - Data Science
  - Machine Learning
  - NLP
  - cuML
  - BERTopic
  
  attributes:
  - key: DURATION
    value: 45 MIN
  
spec:
  artifactName: station-topic-modeling
  nvcfFunctionId: None
  attributes:

    showUnavailableBanner: false
    apiDocsUrl: None
    termsOfUse: |
      
    cta:
      text: View on GitHub
      url: https://github.com/NVIDIA/dgx-station-playbooks/blob/main/nvidia/station-topic-modeling/
      

    tabs:
    - 
      id: overview
      
      label: Overview
      content: |
        # Basic idea
        
        Topic modeling helps you discover hidden themes in large document collections, but CPU-based methods slow to a crawl once datasets grow to millions of records. This playbook shows how to process **40 million Amazon product reviews in minutes** using GPU-accelerated BERTopic.
        
        BERTopic combines transformer embeddings with clustering to extract human-readable topics from text. By replacing CPU-based UMAP and HDBSCAN with GPU-accelerated versions from **RAPIDS cuML**, you run the same pipeline dramatically faster, with no code changes required.
        
        - **Drop-in GPU acceleration**: Load `cuml.accel` and your existing UMAP/HDBSCAN code runs on GPU automatically
        - **Scale to millions**: Process datasets that would take hours on CPU in minutes on GPU
        - **Interactive visualizations**: Explore topic distributions, relationships, and document clusters
        
        # What you'll accomplish
        
        You'll run a complete topic modeling pipeline on 40 million product reviews and generate interactive visualizations of discovered topics.
        
        By the end, you'll be able to:
        - Use cuML's drop-in accelerators for UMAP and HDBSCAN
        - Generate sentence embeddings at scale with SentenceTransformers
        - Create topic visualizations including heatmaps, barcharts, and document datamaps
        
        # What to know before starting
        
        - Experience with Python and Jupyter notebooks
        - Basic understanding of machine learning concepts (embeddings, clustering)
        - Familiarity with pandas DataFrames
        
        # Prerequisites
        
        **Hardware Requirements:**
        - NVIDIA DGX Station with GB300 GPU
        - Minimum 64GB GPU memory for processing 40M documents
        - At least 50GB available storage for dataset and embeddings
        
        **Software Requirements:**
        - Conda (Miniconda or Anaconda): `conda --version`
        - CUDA 13.0 compatible drivers: `nvidia-smi`
        - Network access to download the Amazon Reviews dataset (~14GB compressed)
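
        The checks above can also be scripted as a quick preflight. The following is a minimal sketch using only the Python standard library; the function name `preflight` and its parameters are illustrative assumptions and are not part of the playbook assets.

        ```python
        import shutil


        def preflight(path=".", min_free_gb=50, tools=("conda", "nvidia-smi")):
            """Report free disk space and whether required CLI tools are on PATH."""
            free_gb = shutil.disk_usage(path).free / 1e9
            report = {"free_gb": round(free_gb, 1), "enough_disk": free_gb >= min_free_gb}
            for tool in tools:
                # shutil.which returns None when the executable is not on PATH
                report[tool] = shutil.which(tool) is not None
            return report


        if __name__ == "__main__":
            for key, value in preflight().items():
                print(f"{key}: {value}")
        ```

        This only confirms that the tools exist and that the disk has room; it does not verify the driver version or GPU memory, which `nvidia-smi` reports directly.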
        
        # Ancillary files
        
        All required assets are in the playbook directory `nvidia/station-topic-modeling/assets` (see Step 7). Key file:
        
        - `video_notebook_for_GPU_Accelerated_Machine_Learning_BERTopic_RTX6000_40M.ipynb` - Complete Jupyter notebook with GPU-accelerated topic modeling pipeline (filename reflects original demo hardware; the notebook runs on GB300 and other NVIDIA GPUs)
        
        # Time & risk
        
        * **Estimated time:** 45 minutes (includes environment setup, dataset download, and embedding generation)
        * **Risk level:** Low
          * Large dataset download (~14GB) may take time depending on network speed
          * Embedding generation requires significant GPU memory
        * **Rollback:** Delete the downloaded dataset and any generated embedding files to restore state
        * **Last Updated:** 02/05/2026
          * First Publication
        
      
    - 
      id: instructions
      
      label: Instructions
      content: |
        # Step 1. (DGX Station) Hugging Face cache permissions
        
        On DGX Station, ensure the Hugging Face cache is writable so model downloads succeed:
        
        ```bash
        sudo chown -R $USER:$USER $HOME/.cache/huggingface 2>/dev/null || true
        sudo chmod -R u+rwX $HOME/.cache/huggingface 2>/dev/null || true
        mkdir -p $HOME/.cache/huggingface
        ```
        
        If you see "Permission denied" when downloading models later, rerun the `chown`/`chmod` lines, substituting your username for `$USER` (e.g. `nvidia`).
        
        # Step 2. Install RAPIDS cuDF and cuML
        
        Create a new conda environment with RAPIDS libraries for GPU-accelerated data processing.
        
        ```bash
        conda create -n rapids-25.10 \
          -c rapidsai -c conda-forge \
          cudf=25.10 cuml=25.10 python=3.11 'cuda-version=13.0'
        ```
        
        This installs cuDF (a GPU DataFrame library) and cuML (a GPU machine learning library), which provide drop-in acceleration for pandas and scikit-learn operations.
        
        # Step 3. Activate the conda environment
        
        ```bash
        conda activate rapids-25.10
        ```
        
        # Step 4. Install machine learning packages
        
        Install UMAP, HDBSCAN, BERTopic, and supporting libraries for topic modeling.
        
        ```bash
        pip install \
          transformers datasets sentence-transformers \
          umap-learn hdbscan==0.8.40 bertopic matplotlib \
          scikit-learn==1.4.2 datamapplot
        ```
        
        These packages provide:
        - **sentence-transformers**: Generate text embeddings
        - **umap-learn / hdbscan**: Dimensionality reduction and clustering (GPU-accelerated via cuML)
        - **bertopic**: Topic modeling framework
        - **datamapplot**: Document visualization
        
        > [!NOTE]
        > Pip may report dependency conflicts (e.g. dask/distributed downgraded, cuml/rapids-dask-dependency). BERTopic and the notebook can still run. If you need cuML and RAPIDS dask together, consider keeping the conda default dask versions and installing only the BERTopic stack via pip in a separate env; see **Troubleshooting**.
        
        # Step 5. Install visualization packages
        
        Install JupyterLab and visualization libraries for interactive topic exploration.
        
        ```bash
        conda install -c conda-forge \
            notebook=7.5.0 \
            jupyterlab=4.5.0 \
            ipywidgets=8.1.8 \
            jupyterlab-widgets=3.0.16 \
            bokeh=3.8.1 \
            colorcet=3.1.0 \
            datashader=0.18.2 \
            plotly=6.5.0
        ```
        
        If conda reports `PackagesNotFoundError` for `jupyterlab-widgets` (e.g. on some platforms), install it with pip:
        
        ```bash
        pip install jupyterlab-widgets
        ```
        
        # Step 6. Install compatible PyTorch
        
        Install PyTorch with CUDA 13.0 support for GPU-accelerated embedding generation.
        
        ```bash
        pip install torch==2.9.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
        ```
        
        # Step 7. Clone the repository and download the dataset
        
        Clone the playbook repository and download the Amazon Electronics Reviews dataset.
        
        ```bash
        git clone https://github.com/NVIDIA/dgx-station-playbooks
        cd dgx-station-playbooks/nvidia/station-topic-modeling/assets
        ```
        
        Download the dataset (~14GB compressed):
        
        ```bash
        wget https://mcauleylab.ucsd.edu/public_datasets/data/amazon_2023/raw/review_categories/Electronics.jsonl.gz
        ```
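
        Before launching the notebook, it can help to confirm that the archive is readable. The sketch below is one way to do that with the standard library; `peek_jsonl_gz` is a hypothetical helper name, and the code makes no assumption about which fields each review contains.

        ```python
        import gzip
        import itertools
        import json


        def peek_jsonl_gz(path, n=3):
            """Return the first n records of a gzip-compressed JSON Lines file."""
            with gzip.open(path, "rt", encoding="utf-8") as handle:
                # islice stops after n lines, so the archive is not fully decompressed
                return [json.loads(line) for line in itertools.islice(handle, n)]
        ```

        For example, `peek_jsonl_gz("Electronics.jsonl.gz")` returns the first three reviews as dictionaries. A truncated or corrupted download typically raises an error here instead of partway through the notebook.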
        
        # Step 8. Launch JupyterLab
        
        Start JupyterLab from the assets directory:
        
        ```bash
        jupyter lab
        ```
        
        # Step 9. Select the rapids-25.10 kernel
        
        In JupyterLab, open the notebook `video_notebook_for_GPU_Accelerated_Machine_Learning_BERTopic_RTX6000_40M.ipynb` (the file listed under Ancillary files in the Overview).
        
        Select the **rapids-25.10** kernel from the kernel selector in the top right corner of the notebook interface.
        
        # Step 10. Execute all cells
        
        Run all cells in the notebook sequentially. The notebook will:
        
        1. **Load data with cuDF**: GPU-accelerated pandas via `%load_ext cudf.pandas`
        2. **Preprocess text**: Clean and normalize review text
        3. **Generate embeddings**: Create sentence embeddings
        4. **Enable GPU acceleration**: Load cuML accelerators via `%load_ext cuml.accel`
        5. **Run BERTopic**: Cluster documents into topics using GPU-accelerated UMAP and HDBSCAN
        6. **Visualize results**: Generate interactive topic visualizations
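
        Outside a notebook, the core of these steps could look roughly like the sketch below. It is not the notebook's code: the helper names, the batch size, and the UMAP/HDBSCAN parameter values are illustrative assumptions, and `cuml.accel.install()` is used here as the script-side counterpart of `%load_ext cuml.accel`. Running `fit_topics` requires the environment built in Steps 2 to 6.

        ```python
        import re


        def clean_text(text):
            """Collapse whitespace and lowercase a review (illustrative preprocessing)."""
            return re.sub(r"\s+", " ", text).strip().lower()


        def fit_topics(docs, model_name="all-MiniLM-L6-v2"):
            """Embed docs, then cluster them into topics with BERTopic."""
            import cuml.accel

            # Installed before the imports below so UMAP and HDBSCAN are accelerated
            cuml.accel.install()

            from bertopic import BERTopic
            from hdbscan import HDBSCAN
            from sentence_transformers import SentenceTransformer
            from umap import UMAP

            docs = [clean_text(d) for d in docs]
            embeddings = SentenceTransformer(model_name).encode(
                docs, batch_size=256, show_progress_bar=True
            )
            topic_model = BERTopic(
                umap_model=UMAP(n_components=5, n_neighbors=15, min_dist=0.0),
                hdbscan_model=HDBSCAN(min_cluster_size=50, prediction_data=True),
            )
            # Passing precomputed embeddings skips re-encoding inside BERTopic
            topics, _ = topic_model.fit_transform(docs, embeddings)
            return topic_model, topics
        ```

        The heavy imports are deferred into the function so that the accelerator is active before UMAP and HDBSCAN load. The UMAP and HDBSCAN calls are written exactly as they would be for a CPU-only run.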
        
        # Step 11. Explore the results
        
        After the notebook completes, you'll have:
        
        - **Topic information table**: Discovered topics with keywords and document counts
        - **Topic visualization**: Interactive 2D map of topic relationships
        - **Barchart**: Top keywords for each topic
        - **Heatmap**: Topic similarity matrix
        - **Document datamap**: Visual clustering of documents by topic
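
        If you want to regenerate some of these views outside the notebook, the sketch below collects three of BERTopic's built-in figures from an already fitted model. The helper name `build_figures` is hypothetical, and the notebook may produce its plots differently.

        ```python
        def build_figures(topic_model, top_n_topics=10):
            """Collect BERTopic's built-in interactive figures in a dict."""
            return {
                "topics": topic_model.visualize_topics(),
                "barchart": topic_model.visualize_barchart(top_n_topics=top_n_topics),
                "heatmap": topic_model.visualize_heatmap(),
            }
        ```

        Each value is a Plotly figure, so you can call `.show()` to display it inline or `.write_html("topics.html")` to save it for sharing.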
        
        # Step 12. Cleanup (optional)
        
        Remove the conda environment when finished:
        
        ```bash
        conda deactivate
        conda env remove -n rapids-25.10
        ```
        
        Remove the downloaded dataset:
        
        ```bash
        rm Electronics.jsonl.gz
        ```
        
        # Next steps
        
        Apply this workflow to your own datasets:
        
        1. **Adjust data size**: Modify the `nrows` parameter when loading data to process smaller subsets
        2. **Tune clustering**: Experiment with `min_cluster_size` and `min_samples` in HDBSCAN
        3. **Try different embedding models**: Swap `all-MiniLM-L6-v2` for domain-specific models
        4. **Export topics**: Save the topic model using `topic_model.save()` for later analysis
        5. **Monitor GPU usage**: Run `nvidia-smi -l 1` to watch GPU utilization during processing
        
      
    - 
      id: troubleshooting
      
      label: Troubleshooting
      content: |
        # Common issues
        
        | Symptom | Cause | Fix |
        |---------|-------|-----|
        | "Permission denied" on `~/.cache/huggingface` or Hugging Face download fails | Cache dir owned by root or wrong permissions | Run `sudo chown -R $USER:$USER $HOME/.cache/huggingface` and `sudo chmod -R u+rwX $HOME/.cache/huggingface` (use your username if different). |
        | `PackagesNotFoundError` for `jupyterlab-widgets` with conda | Package not available for platform/channel | Install with pip: `pip install jupyterlab-widgets`. |
        | Pip reports dependency conflicts (dask, distributed, cuml, rapids-dask-dependency) after installing BERTopic stack | Pip downgrades dask/distributed; RAPIDS expects newer versions | BERTopic and the notebook typically still work. To avoid conflicts, install BERTopic/umap/hdbscan in a separate env, or accept the conflict if you do not need cuML + dask together. |
        | `CUDA out of memory` error during embedding generation | Insufficient GPU memory for batch size | Reduce batch size in `model.encode()` or process fewer documents by lowering `nrows` |
        | `ModuleNotFoundError: No module named 'cuml'` | cuML not installed or wrong environment | Activate the environment with `conda activate rapids-25.10`, select the matching kernel in JupyterLab, and run `%load_ext cuml.accel` before other imports |
        | Notebook kernel dies during UMAP | Out of memory during dimensionality reduction | Reduce dataset size or use `low_memory=True` in UMAP parameters |
        | `wget` download fails or hangs | Network issues or firewall blocking | Check internet connection, try with `--retry-connrefused --waitretry=1 --read-timeout=20` |
        | Kernel not found in JupyterLab | rapids-25.10 kernel not registered | With the environment activated, run `python -m ipykernel install --user --name rapids-25.10`, then restart JupyterLab |
        | `cudf.pandas` not accelerating operations | Extension not loaded before pandas import | Restart kernel and ensure `%load_ext cudf.pandas` runs before `import pandas` |
        | Topic model produces too many/few topics | HDBSCAN parameters need tuning | Adjust `min_cluster_size` (larger = fewer topics) and `min_samples` |
        | Plotly visualizations not rendering | Renderer not configured for JupyterLab | Set the default renderer after importing plotly: `import plotly.io as pio` followed by `pio.renderers.default = "notebook"` |
        | `ResolvePackageNotFound` during conda install | Package version conflict or missing channel | Ensure `-c rapidsai -c conda-forge` channels are specified |
        | PyTorch not using GPU | Wrong PyTorch version or CUDA mismatch | Reinstall with correct CUDA version: `pip install torch==2.9.0 --index-url https://download.pytorch.org/whl/cu130` |
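
        Several of the rows above come down to a package missing from the active environment. The following is a minimal diagnostic sketch, assuming nothing beyond the standard library; `diagnose` is a hypothetical helper, and the default module list is only a suggestion.

        ```python
        import importlib.util


        def diagnose(modules=("cuml", "cudf", "torch", "bertopic")):
            """Map each module name to True if it is importable in this environment."""
            status = {name: importlib.util.find_spec(name) is not None for name in modules}
            if status.get("torch"):
                import torch

                # True only when PyTorch was built with CUDA and a GPU is visible
                status["torch_cuda"] = torch.cuda.is_available()
            return status
        ```

        Run `diagnose()` in a notebook cell using the rapids-25.10 kernel. A `False` entry points to the install step to repeat, and `torch_cuda: False` suggests the PyTorch reinstall in the last row.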
        
      
    resources:
    - name: BERTopic Documentation
      url: https://maartengr.github.io/BERTopic/
      

    - name: RAPIDS cuML Documentation
      url: https://docs.rapids.ai/api/cuml/stable/