chore: Regenerate all playbooks

This commit is contained in:
GitLab CI 2025-10-14 00:40:26 +00:00
parent e17deb3167
commit 34239a8313
9 changed files with 0 additions and 4572 deletions

@@ -24,12 +24,10 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [Comfy UI](nvidia/comfy-ui/)
- [Set Up Local Network Access](nvidia/connect-to-your-spark/)
- [Connect Two Sparks](nvidia/connect-two-sparks/)
- [CUDA-X Data Science](nvidia/cuda-x-data-science/)
- [DGX Dashboard](nvidia/dgx-dashboard/)
- [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/)
- [Optimized JAX](nvidia/jax/)
- [LLaMA Factory](nvidia/llama-factory/)
- [MONAI Reasoning Model](nvidia/monai-reasoning/)
- [Build and Deploy a Multi-Agent Chatbot](nvidia/multi-agent-chatbot/)
- [Multi-modal Inference](nvidia/multi-modal-inference/)
- [NCCL for Two Sparks](nvidia/nccl/)
@@ -38,16 +36,13 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
- [NVFP4 Quantization](nvidia/nvfp4-quantization/)
- [Ollama](nvidia/ollama/)
- [Open WebUI with Ollama](nvidia/open-webui/)
- [Use Open Fold](nvidia/protein-folding/)
- [Fine-tune with PyTorch](nvidia/pytorch-fine-tune/)
- [RAG application in AI Workbench](nvidia/rag-ai-workbench/)
- [SGLang Inference Server](nvidia/sglang/)
- [Speculative Decoding](nvidia/speculative-decoding/)
- [Set up Tailscale on your Spark](nvidia/tailscale/)
- [TRT LLM for Inference](nvidia/trt-llm/)
- [Text to Knowledge Graph](nvidia/txt2kg/)
- [Unsloth on DGX Spark](nvidia/unsloth/)
- [Vibe Coding in VS Code](nvidia/vibe-coding/)
- [Install and Use vLLM for Inference](nvidia/vllm/)
- [Vision-Language Model Fine-tuning](nvidia/vlm-finetuning/)
- [VS Code](nvidia/vscode/)

@@ -1,73 +0,0 @@
# CUDA-X Data Science
> Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas and more with zero code changes
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
---
## Overview
## Basic Idea
This playbook includes two example notebooks that demonstrate the acceleration of key machine learning algorithms and core pandas operations using CUDA-X Data Science libraries:
- **NVIDIA cuDF:** Accelerates data preparation and core data processing operations over 8 GB of string data, with no code changes.
- **NVIDIA cuML:** Accelerates popular, compute-intensive machine learning algorithms in scikit-learn (LinearSVC), UMAP, and HDBSCAN, with no code changes.
CUDA-X Data Science (formerly RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. These libraries accelerate popular Python tools like scikit-learn and pandas with zero code changes. On DGX Spark, these libraries maximize performance at your desk with your existing code.
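The zero-code-change mechanism can be sketched in a few lines. This is a minimal sketch, assuming the conda environment created later in this playbook: outside Jupyter, `cudf.pandas.install()` plays the role of the `%load_ext cudf.pandas` notebook magic, and the `try/except` falls back to stock pandas when cuDF is not installed.

```python
# Minimal sketch of enabling the pandas accelerator from a plain Python script.
# Assumes the rapids-test environment from this playbook; without cuDF
# installed, the script simply reports the CPU fallback.
try:
    import cudf.pandas
    cudf.pandas.install()  # later "import pandas" statements now run on the GPU
    backend = "cudf.pandas (GPU)"
except ImportError:
    backend = "stock pandas (CPU fallback)"
print("pandas backend:", backend)
```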
## What you'll accomplish
You will accelerate popular machine learning algorithms and data analytics operations on the GPU. You will learn how to accelerate popular Python tools and see the value of running data science workflows on your DGX Spark.
## Prerequisites
- Familiarity with pandas, scikit-learn, and machine learning algorithms such as support vector machines, clustering, and dimensionality reduction.
- Install conda
- Generate a Kaggle API key
## Time & risk
- Duration:
- 20-30 minutes setup time.
- 2-3 minutes to run each notebook.
## Instructions
## Step 1. Verify system requirements
- Verify the system has CUDA 13 installed
- Verify the Python version is greater than 3.10
- Install conda using [these instructions](https://docs.anaconda.com/miniconda/install/)
- Create Kaggle API key using [these instructions](https://www.kaggle.com/discussions/general/74235) and place the **kaggle.json** file in the same folder as the notebook
## Step 2. Install the Data Science libraries
- Use the following command to install the CUDA-X libraries (this will create a new conda environment)
```bash
conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia \
rapids=25.10 python=3.12 'cuda-version=13.0' \
jupyterlab hdbscan umap-learn
```
## Step 3. Activate the conda environment
- Activate the conda environment
```bash
conda activate rapids-test
```
## Step 4. Clone the playbook repository
- Clone the GitHub repository and go to the assets folder inside the cuda-x-data-science folder
```bash
git clone https://github.com/NVIDIA/dgx-spark-playbooks
```
- Place the **kaggle.json** created in Step 1 in the assets folder
## Step 5. Run the notebooks
There are two notebooks in the GitHub repository.
One runs an example of a large string data processing workflow with pandas code on the GPU.
- Run the cudf_pandas_demo.ipynb notebook
```bash
jupyter notebook cudf_pandas_demo.ipynb
```
The other walks through machine learning algorithms including UMAP and HDBSCAN.
- Run the cuml_sklearn_demo.ipynb notebook
```bash
jupyter notebook cuml_sklearn_demo.ipynb
```
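As a rough, CPU-only illustration of the core analysis the first notebook performs at scale (per-posting summary lengths, then a grouped mean), here is a toy stdlib sketch; the column names follow the notebook, but the rows are made up:

```python
# Toy sketch of cudf_pandas_demo.ipynb's core analysis, on made-up rows:
# compute each posting's summary length, then the mean length per
# (company, job_title) group. The real notebook does this on ~4.8 GB of data.
from collections import defaultdict

rows = [
    {"company": "Acme", "job_title": "Engineer", "job_summary": "Build things."},
    {"company": "Acme", "job_title": "Engineer", "job_summary": "Build more things."},
    {"company": "Globex", "job_title": "Analyst", "job_summary": "Analyze data."},
]

lengths = defaultdict(list)
for r in rows:
    lengths[(r["company"], r["job_title"])].append(len(r["job_summary"]))

for key, vals in sorted(lengths.items()):
    print(key, sum(vals) / len(vals))
# → ('Acme', 'Engineer') 15.5
# → ('Globex', 'Analyst') 13.0
```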

@@ -1,969 +0,0 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "84635d55-68a2-468b-ac09-9029ebdab55f",
"metadata": {
"id": "84635d55-68a2-468b-ac09-9029ebdab55f"
},
"source": [
"# Accelerating large string data processing with cudf pandas accelerator mode (cudf.pandas)\n",
"<a href=\"https://github.com/rapidsai/cudf\">cuDF</a> is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating tabular data using a DataFrame style API in the style of pandas.\n",
"\n",
"cuDF now provides a <a href=\"https://rapids.ai/cudf-pandas/\">pandas accelerator mode</a> (`cudf.pandas`), allowing you to bring accelerated computing to your pandas workflows without requiring any code change.\n",
"\n",
"This notebook demonstrates how cuDF pandas accelerator mode can accelerate processing of datasets with large string fields (4 GB+) by simply adding a `%load_ext` command. This feature was introduced as part of the RAPIDS 24.08 release.\n",
"\n",
"**Author:** Allison Ding, Mitesh Patel <br>\n",
"**Date:** October 3, 2025"
]
},
{
"cell_type": "markdown",
"id": "bb8fe7ab-c055-40e9-897d-c62c72f28a16",
"metadata": {
"id": "bb8fe7ab-c055-40e9-897d-c62c72f28a16"
},
"source": [
"# ⚠️ Verify your setup\n",
"\n",
"First, we'll verify that you are running with an NVIDIA GPU."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "a88b8586-cfdd-4d31-9b4d-9be8508f7ba0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "a88b8586-cfdd-4d31-9b4d-9be8508f7ba0",
"outputId": "18525b64-b34b-40e3-ed3a-1ad56ae794b5"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fri Oct 3 23:16:52 2025 \n",
"+-----------------------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 580.82.09 Driver Version: 580.82.09 CUDA Version: 13.0 |\n",
"+-----------------------------------------+------------------------+----------------------+\n",
"| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|=========================================+========================+======================|\n",
"| 0 NVIDIA GB10 Off | 0000000F:01:00.0 Off | N/A |\n",
"| N/A 44C P0 10W / N/A | Not Supported | 0% Default |\n",
"| | | N/A |\n",
"+-----------------------------------------+------------------------+----------------------+\n",
"\n",
"+-----------------------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=========================================================================================|\n",
"| 0 N/A N/A 3405 G /usr/lib/xorg/Xorg 242MiB |\n",
"| 0 N/A N/A 3562 G /usr/bin/gnome-shell 53MiB |\n",
"| 0 N/A N/A 214921 C .../envs/rapids-25.10/bin/python 196MiB |\n",
"+-----------------------------------------------------------------------------------------+\n"
]
}
],
"source": [
"!nvidia-smi # this should display information about available GPUs"
]
},
{
"cell_type": "markdown",
"id": "5cd58071-4371-428b-8a02-9cd66e6cb91f",
"metadata": {
"id": "5cd58071-4371-428b-8a02-9cd66e6cb91f"
},
"source": [
"# Download the data"
]
},
{
"cell_type": "markdown",
"id": "9eb67713-7cf4-415a-bce7-ff4695862faa",
"metadata": {
"id": "9eb67713-7cf4-415a-bce7-ff4695862faa"
},
"source": [
"## Overview\n",
"The data we'll be working with summarizes job postings data that a developer working at a job listing firm might analyze to understand posting trends.\n",
"\n",
"We'll need to download a curated copy of this [Kaggle dataset](https://www.kaggle.com/datasets/asaniczka/1-3m-linkedin-jobs-and-skills-2024/data?select=job_summary.csv) directly from the Kaggle API.\n",
"\n",
"**Data License and Terms** <br>\n",
"As this dataset originates from a Kaggle dataset, it is governed by that dataset's license and terms of use, which is the Open Data Commons license. Review it here: https://opendatacommons.org/licenses/by/1-0/index.html. For each dataset a user elects to use, the user is responsible for checking whether the dataset license is fit for the intended purpose.\n",
"\n",
"**Are there restrictions on how I can use this data? </br>**\n",
"For each dataset a user elects to use, the user is responsible for checking whether the dataset license is fit for the intended purpose.\n",
"\n",
"## Get the Data\n",
"First, [please follow these instructions from Kaggle to download and/or update your Kaggle API token to get access to the dataset](https://www.kaggle.com/discussions/general/74235). \n",
"\n",
"Once generated, make sure the **kaggle.json** file is in the same folder as the notebook.\n",
"\n",
"Next, run the code below, which should take 1-2 minutes:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "406838c6-267c-423e-82ab-ea13d5fa9c90",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: kaggle in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (1.7.4.5)\n",
"Requirement already satisfied: bleach in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (6.2.0)\n",
"Requirement already satisfied: certifi>=14.05.14 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2025.8.3)\n",
"Requirement already satisfied: charset-normalizer in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (3.4.3)\n",
"Requirement already satisfied: idna in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (3.10)\n",
"Requirement already satisfied: protobuf in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (6.32.1)\n",
"Requirement already satisfied: python-dateutil>=2.5.3 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2.9.0.post0)\n",
"Requirement already satisfied: python-slugify in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (8.0.4)\n",
"Requirement already satisfied: requests in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2.32.5)\n",
"Requirement already satisfied: setuptools>=21.0.0 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (80.9.0)\n",
"Requirement already satisfied: six>=1.10 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (1.17.0)\n",
"Requirement already satisfied: text-unidecode in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (1.3)\n",
"Requirement already satisfied: tqdm in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (4.67.1)\n",
"Requirement already satisfied: urllib3>=1.15.1 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2.5.0)\n",
"Requirement already satisfied: webencodings in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (0.5.1)\n"
]
}
],
"source": [
"!pip install kaggle\n",
"!mkdir -p ~/.kaggle\n",
"!cp kaggle.json ~/.kaggle/\n",
"!chmod 600 ~/.kaggle/kaggle.json"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "3efacb3c-5f3d-4ff0-b32a-76bbb80b5f74",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3efacb3c-5f3d-4ff0-b32a-76bbb80b5f74",
"outputId": "5fe4a878-cf57-44f9-e40e-ed413035b150"
},
"outputs": [],
"source": [
"# Download the dataset through the Kaggle API\n",
"!kaggle datasets download -d asaniczka/1-3m-linkedin-jobs-and-skills-2024\n",
"# Unzip the file to access its contents\n",
"!unzip 1-3m-linkedin-jobs-and-skills-2024.zip"
]
},
{
"cell_type": "markdown",
"id": "2__ZMVe6LaBJ",
"metadata": {
"id": "2__ZMVe6LaBJ"
},
"source": [
"# Analysis with cuDF Pandas"
]
},
{
"cell_type": "markdown",
"id": "df47f304-2b30-4380-afd5-0613b63d103d",
"metadata": {},
"source": [
"The magic command `%load_ext cudf.pandas` enables GPU acceleration for pandas data processing in a Jupyter notebook, allowing most pandas operations to automatically execute on NVIDIA GPUs for improved performance. \n",
"\n",
"With this extension loaded before importing pandas, your code can use standard pandas syntax while gaining the benefits of GPU speedup, automatically falling back to CPU execution for operations not supported on the GPU. This provides a seamless way to accelerate existing pandas workflows with zero code changes, especially for large data analytics tasks or machine learning preprocessing."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e5cd2520-30a6-41c1-b7c5-5abe0eb90d82",
"metadata": {},
"outputs": [],
"source": [
"%load_ext cudf.pandas"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "eadb8d77-cb45-4c7c-ae9f-77e47a4f29b3",
"metadata": {
"id": "eadb8d77-cb45-4c7c-ae9f-77e47a4f29b3"
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "196268f2-6169-4ed7-a9e6-db9078caa6ab",
"metadata": {
"id": "196268f2-6169-4ed7-a9e6-db9078caa6ab"
},
"source": [
"We'll run a piece of code to get a feel for what GPU acceleration brings to pandas workflows."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "2688bfeb-58c4-4fc0-9233-8d7e2759ec46",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"start_time = time.time()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ae3b6a16-ff72-4421-b43c-06c33f57ec12",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ae3b6a16-ff72-4421-b43c-06c33f57ec12",
"outputId": "656acbf7-078f-42b3-832d-ad4e84e01c70"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 185 ms, sys: 2.08 s, total: 2.27 s\n",
"Wall time: 2.95 s\n",
"Dataset Size (in GB): 4.76\n"
]
}
],
"source": [
"%time job_summary_df = pd.read_csv(\"job_summary.csv\", dtype=('str'))\n",
"print(\"Dataset Size (in GB):\",round(job_summary_df.memory_usage(\n",
" deep=True).sum()/(1024**3),2))"
]
},
{
"cell_type": "markdown",
"id": "01c506e1-f135-4afb-8fc7-23e72c05d73c",
"metadata": {
"id": "01c506e1-f135-4afb-8fc7-23e72c05d73c"
},
"source": [
"The same dataset takes around 1.5 minutes to load with pandas. That's around a **5x speedup** with no changes to the code!"
]
},
{
"cell_type": "markdown",
"id": "d9d0a0e1-1d74-494d-bd12-b829f11eeede",
"metadata": {
"id": "d9d0a0e1-1d74-494d-bd12-b829f11eeede"
},
"source": [
"Let's load the remaining two datasets as well:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "12e4cf7e-8824-4822-9d30-46b81ba2acd7",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "12e4cf7e-8824-4822-9d30-46b81ba2acd7",
"outputId": "5ca1be17-09e3-40ab-928b-82176bf597bf"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 45.3 ms, sys: 199 ms, total: 244 ms\n",
"Wall time: 354 ms\n"
]
}
],
"source": [
"%%time\n",
"job_skills_df = pd.read_csv(\"job_skills.csv\", dtype=('str'))\n",
"job_postings_df = pd.read_csv(\"linkedin_job_postings.csv\", dtype=('str'))"
]
},
{
"cell_type": "code",
"execution_count": 38,
"id": "13c8f9da-121f-4311-8a79-274425363e5e",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 276
},
"id": "13c8f9da-121f-4311-8a79-274425363e5e",
"outputId": "a73599c1-05b2-4f56-a190-c69c017bb330"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 4.46 ms, sys: 3.1 ms, total: 7.56 ms\n",
"Wall time: 46.3 ms\n"
]
},
{
"data": {
"text/plain": [
"0 957\n",
"1 3816\n",
"2 5314\n",
"3 2774\n",
"4 2749\n",
"Name: summary_length, dtype: int32"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"job_summary_df['summary_length'] = job_summary_df['job_summary'].str.len()\n",
"job_summary_df['summary_length'].head()"
]
},
{
"cell_type": "markdown",
"id": "67b68792-5c64-4ebd-9d80-cf6ff55baeef",
"metadata": {
"id": "67b68792-5c64-4ebd-9d80-cf6ff55baeef"
},
"source": [
"That was lightning fast! We went from around 10+ seconds (with pandas) to a few milliseconds."
]
},
{
"cell_type": "code",
"execution_count": 39,
"id": "31e1cc84-debb-4da7-bc20-5c7139f786f7",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 504
},
"id": "31e1cc84-debb-4da7-bc20-5c7139f786f7",
"outputId": "2d89fc49-7e5b-41db-c25b-441d54480711"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 39.8 ms, sys: 30 ms, total: 69.8 ms\n",
"Wall time: 211 ms\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>job_link</th>\n",
" <th>last_processed_time</th>\n",
" <th>got_summary</th>\n",
" <th>got_ner</th>\n",
" <th>is_being_worked</th>\n",
" <th>job_title</th>\n",
" <th>company</th>\n",
" <th>job_location</th>\n",
" <th>first_seen</th>\n",
" <th>search_city</th>\n",
" <th>search_country</th>\n",
" <th>search_position</th>\n",
" <th>job_level</th>\n",
" <th>job_type</th>\n",
" <th>job_summary</th>\n",
" <th>summary_length</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>https://www.linkedin.com/jobs/view/account-exe...</td>\n",
" <td>2024-01-21 07:12:29.00256+00</td>\n",
" <td>t</td>\n",
" <td>t</td>\n",
" <td>f</td>\n",
" <td>Account Executive - Dispensing (NorCal/Norther...</td>\n",
" <td>BD</td>\n",
" <td>San Diego, CA</td>\n",
" <td>2024-01-15</td>\n",
" <td>Coronado</td>\n",
" <td>United States</td>\n",
" <td>Color Maker</td>\n",
" <td>Mid senior</td>\n",
" <td>Onsite</td>\n",
" <td>Responsibilities\\nJob Description Summary\\nJob...</td>\n",
" <td>4602</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>https://www.linkedin.com/jobs/view/registered-...</td>\n",
" <td>2024-01-21 07:39:58.88137+00</td>\n",
" <td>t</td>\n",
" <td>t</td>\n",
" <td>f</td>\n",
" <td>Registered Nurse - RN Care Manager</td>\n",
" <td>Trinity Health MI</td>\n",
" <td>Norton Shores, MI</td>\n",
" <td>2024-01-14</td>\n",
" <td>Grand Haven</td>\n",
" <td>United States</td>\n",
" <td>Director Nursing Service</td>\n",
" <td>Mid senior</td>\n",
" <td>Onsite</td>\n",
" <td>Employment Type:\\nFull time\\nShift:\\nDescripti...</td>\n",
" <td>2950</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>https://www.linkedin.com/jobs/view/restaurant-...</td>\n",
" <td>2024-01-21 07:40:00.251126+00</td>\n",
" <td>t</td>\n",
" <td>t</td>\n",
" <td>f</td>\n",
" <td>RESTAURANT SUPERVISOR - THE FORKLIFT</td>\n",
" <td>Wasatch Adaptive Sports</td>\n",
" <td>Sandy, UT</td>\n",
" <td>2024-01-14</td>\n",
" <td>Tooele</td>\n",
" <td>United States</td>\n",
" <td>Stand-In</td>\n",
" <td>Mid senior</td>\n",
" <td>Onsite</td>\n",
" <td>Job Details\\nDescription\\nWhat You'll Do\\nAs a...</td>\n",
" <td>4571</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>https://www.linkedin.com/jobs/view/independent...</td>\n",
" <td>2024-01-21 07:40:00.308133+00</td>\n",
" <td>t</td>\n",
" <td>t</td>\n",
" <td>f</td>\n",
" <td>Independent Real Estate Agent</td>\n",
" <td>Howard Hanna | Rand Realty</td>\n",
" <td>Englewood Cliffs, NJ</td>\n",
" <td>2024-01-16</td>\n",
" <td>Pinehurst</td>\n",
" <td>United States</td>\n",
" <td>Real-Estate Clerk</td>\n",
" <td>Mid senior</td>\n",
" <td>Onsite</td>\n",
" <td>Who We Are\\nRand Realty is a family-owned brok...</td>\n",
" <td>3944</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>https://www.linkedin.com/jobs/view/group-unit-...</td>\n",
" <td>2024-01-19 09:45:09.215838+00</td>\n",
" <td>f</td>\n",
" <td>f</td>\n",
" <td>f</td>\n",
" <td>Group/Unit Supervisor (Systems Support Manager...</td>\n",
" <td>IRS, Office of Chief Counsel</td>\n",
" <td>Chamblee, GA</td>\n",
" <td>2024-01-17</td>\n",
" <td>Gadsden</td>\n",
" <td>United States</td>\n",
" <td>Supervisor Travel-Information Center</td>\n",
" <td>Mid senior</td>\n",
" <td>Onsite</td>\n",
" <td>None</td>\n",
" <td>&lt;NA&gt;</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" job_link \\\n",
"0 https://www.linkedin.com/jobs/view/account-exe... \n",
"1 https://www.linkedin.com/jobs/view/registered-... \n",
"2 https://www.linkedin.com/jobs/view/restaurant-... \n",
"3 https://www.linkedin.com/jobs/view/independent... \n",
"4 https://www.linkedin.com/jobs/view/group-unit-... \n",
"\n",
" last_processed_time got_summary got_ner is_being_worked \\\n",
"0 2024-01-21 07:12:29.00256+00 t t f \n",
"1 2024-01-21 07:39:58.88137+00 t t f \n",
"2 2024-01-21 07:40:00.251126+00 t t f \n",
"3 2024-01-21 07:40:00.308133+00 t t f \n",
"4 2024-01-19 09:45:09.215838+00 f f f \n",
"\n",
" job_title \\\n",
"0 Account Executive - Dispensing (NorCal/Norther... \n",
"1 Registered Nurse - RN Care Manager \n",
"2 RESTAURANT SUPERVISOR - THE FORKLIFT \n",
"3 Independent Real Estate Agent \n",
"4 Group/Unit Supervisor (Systems Support Manager... \n",
"\n",
" company job_location first_seen \\\n",
"0 BD San Diego, CA 2024-01-15 \n",
"1 Trinity Health MI Norton Shores, MI 2024-01-14 \n",
"2 Wasatch Adaptive Sports Sandy, UT 2024-01-14 \n",
"3 Howard Hanna | Rand Realty Englewood Cliffs, NJ 2024-01-16 \n",
"4 IRS, Office of Chief Counsel Chamblee, GA 2024-01-17 \n",
"\n",
" search_city search_country search_position \\\n",
"0 Coronado United States Color Maker \n",
"1 Grand Haven United States Director Nursing Service \n",
"2 Tooele United States Stand-In \n",
"3 Pinehurst United States Real-Estate Clerk \n",
"4 Gadsden United States Supervisor Travel-Information Center \n",
"\n",
" job_level job_type job_summary \\\n",
"0 Mid senior Onsite Responsibilities\\nJob Description Summary\\nJob... \n",
"1 Mid senior Onsite Employment Type:\\nFull time\\nShift:\\nDescripti... \n",
"2 Mid senior Onsite Job Details\\nDescription\\nWhat You'll Do\\nAs a... \n",
"3 Mid senior Onsite Who We Are\\nRand Realty is a family-owned brok... \n",
"4 Mid senior Onsite None \n",
"\n",
" summary_length \n",
"0 4602 \n",
"1 2950 \n",
"2 4571 \n",
"3 3944 \n",
"4 <NA> "
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"df_merged=pd.merge(job_postings_df, job_summary_df, how=\"left\", on=\"job_link\")\n",
"df_merged.head()"
]
},
{
"cell_type": "code",
"execution_count": 40,
"id": "0160a559-2b17-40a6-ad9d-34ce746236d0",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 490
},
"id": "0160a559-2b17-40a6-ad9d-34ce746236d0",
"outputId": "e397c28b-a90d-42d2-8a9a-4c6260c45b38"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 33.2 ms, sys: 17.3 ms, total: 50.6 ms\n",
"Wall time: 120 ms\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th></th>\n",
" <th>summary_length</th>\n",
" </tr>\n",
" <tr>\n",
" <th>company</th>\n",
" <th>job_title</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>ClickJobs.io</th>\n",
" <th>Adolescent Behavioral Health Therapist - Substance Use Specialty (Entry Senior Level) Psychiatry</th>\n",
" <td>23748.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mt. San Antonio College</th>\n",
" <th>Chief, Police and Campus Safety</th>\n",
" <td>22998.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>CareerBeacon</th>\n",
" <th>Airside/Groundside Project Manager [Halifax International Airport Authority]</th>\n",
" <td>22938.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Tacoma Community College</th>\n",
" <th>Anthropology Professor - Part-time</th>\n",
" <td>22790.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>IRS, Office of Chief Counsel</th>\n",
" <th>Program Analyst (12-Month Roster)</th>\n",
" <td>22774.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th rowspan=\"4\" valign=\"top\">鴻海精密工業股份有限公司</th>\n",
" <th>HR Specialist - Payroll &amp; Benefit</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Material Planner</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>RFQ Specialist</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Supply Chain Program Manager</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>🌟Daniel-Scott Recruitment Ltd🌟</th>\n",
" <th>IT Manager</th>\n",
" <td>0.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>801276 rows × 1 columns</p>\n",
"</div>"
],
"text/plain": [
" summary_length\n",
"company job_title \n",
"ClickJobs.io Adolescent Behavioral Health Therapist - Substa... 23748.0\n",
"Mt. San Antonio College Chief, Police and Campus Safety 22998.0\n",
"CareerBeacon Airside/Groundside Project Manager [Halifax Int... 22938.0\n",
"Tacoma Community College Anthropology Professor - Part-time 22790.0\n",
"IRS, Office of Chief Counsel Program Analyst (12-Month Roster) 22774.0\n",
"... ...\n",
"鴻海精密工業股份有限公司 HR Specialist - Payroll & Benefit 0.0\n",
" Material Planner 0.0\n",
" RFQ Specialist 0.0\n",
" Supply Chain Program Manager 0.0\n",
"🌟Daniel-Scott Recruitment Ltd🌟 IT Manager 0.0\n",
"\n",
"[801276 rows x 1 columns]"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"df_merged.groupby(['company',\"job_title\"]).agg({\n",
" \"summary_length\":\"mean\"}).sort_values(by='summary_length', ascending = False).fillna(0)"
]
},
{
"cell_type": "markdown",
"id": "IME4urGYQ3qS",
"metadata": {
"id": "IME4urGYQ3qS"
},
"source": [
"We went down from around 5 seconds to less than a second here. This is in line with our speedups on other operations!"
]
},
{
"cell_type": "code",
"execution_count": 41,
"id": "adc00726-f151-41f4-8731-a1ce1f83eea2",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 458
},
"id": "adc00726-f151-41f4-8731-a1ce1f83eea2",
"outputId": "46423696-b167-4ffe-bb3b-9de7f3e6d668"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 13.7 ms, sys: 20.3 ms, total: 34 ms\n",
"Wall time: 156 ms\n"
]
},
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>job_title</th>\n",
" <th>job_location</th>\n",
" <th>summary_length</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>🔥Nurse Manager, Patient Services - Operating Room</td>\n",
" <td>Lake George, NY</td>\n",
" <td>7342.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>🔥Behavioral Health RN 3 12s</td>\n",
" <td>Glens Falls, NY</td>\n",
" <td>2787.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>🔥 Surgical Technologist - Evenings</td>\n",
" <td>Lake George, NY</td>\n",
" <td>2920.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>🔥 Physician Practice Clinical Lead RN</td>\n",
" <td>Saratoga Springs, NY</td>\n",
" <td>2945.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>🔥 Physican Practice LPN - Green</td>\n",
" <td>Lake George, NY</td>\n",
" <td>2969.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1104106</th>\n",
" <td>\"Attorney\" (Gov Appt/Non-Merit) Jobs</td>\n",
" <td>Kentucky, United States</td>\n",
" <td>2427.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1104107</th>\n",
" <td>\"Accountant\"</td>\n",
" <td>Shavano Park, TX</td>\n",
" <td>1497.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1104108</th>\n",
" <td>\"Accountant\"</td>\n",
" <td>Basking Ridge, NJ</td>\n",
" <td>1073.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1104109</th>\n",
" <td>\"Accountant\"</td>\n",
" <td>Austin, TX</td>\n",
" <td>1993.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1104110</th>\n",
" <td>\"A\" Softball Coach - Central Middle School</td>\n",
" <td>East Corinth, ME</td>\n",
" <td>718.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1104111 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" job_title \\\n",
"0 🔥Nurse Manager, Patient Services - Operating Room \n",
"1 🔥Behavioral Health RN 3 12s \n",
"2 🔥 Surgical Technologist - Evenings \n",
"3 🔥 Physician Practice Clinical Lead RN \n",
"4 🔥 Physican Practice LPN - Green \n",
"... ... \n",
"1104106 \"Attorney\" (Gov Appt/Non-Merit) Jobs \n",
"1104107 \"Accountant\" \n",
"1104108 \"Accountant\" \n",
"1104109 \"Accountant\" \n",
"1104110 \"A\" Softball Coach - Central Middle School \n",
"\n",
" job_location summary_length \n",
"0 Lake George, NY 7342.0 \n",
"1 Glens Falls, NY 2787.0 \n",
"2 Lake George, NY 2920.0 \n",
"3 Saratoga Springs, NY 2945.0 \n",
"4 Lake George, NY 2969.0 \n",
"... ... ... \n",
"1104106 Kentucky, United States 2427.0 \n",
"1104107 Shavano Park, TX 1497.0 \n",
"1104108 Basking Ridge, NJ 1073.0 \n",
"1104109 Austin, TX 1993.0 \n",
"1104110 East Corinth, ME 718.0 \n",
"\n",
"[1104111 rows x 3 columns]"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%time\n",
"# Group by job_title and job_location, and calculate the mean of summary_length\n",
"grouped_df = df_merged.groupby(['job_title', 'job_location']).agg({'summary_length': 'mean'})\n",
"\n",
"# Reset the index to sort by job_title and job_location\n",
"grouped_df = grouped_df.reset_index()\n",
"\n",
"# Sort by job_title, job_location, and summary_length\n",
"sorted_df = grouped_df.sort_values(by=['job_title', 'job_location','summary_length'],\n",
" ascending=False).reset_index(drop=True).fillna(0)\n",
"sorted_df"
]
},
{
"cell_type": "markdown",
"id": "08c97b81-64c5-48fb-8fe0-d36789cf3deb",
"metadata": {
"id": "08c97b81-64c5-48fb-8fe0-d36789cf3deb"
},
"source": [
"The acceleration is consistently 10x+ for complex aggregations and sorting that involve multiple columns."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "4560fe8f-61f9-4c23-bf43-ed6a82e5456e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5.182934522628784\n"
]
}
],
"source": [
"end_time = time.time()\n",
"execution_time = end_time - start_time\n",
"print(execution_time)"
]
},
{
"cell_type": "markdown",
"id": "9bcc719b-666a-4bc9-97d6-16f448b5c707",
"metadata": {
"id": "9bcc719b-666a-4bc9-97d6-16f448b5c707"
},
"source": [
"# Summary\n",
"\n",
"With cudf.pandas, you can keep using pandas as your primary dataframe library. When things start to get a little slow, just load the `cudf.pandas` extension and enjoy the incredible speedups.\n",
"\n",
"To learn more about cudf.pandas, we encourage you to visit https://rapids.ai/cudf-pandas."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}

# MONAI Reasoning Model
> Work with a MONAI-Reasoning-CXR-3B vision-language model through Open WebUI
## Table of Contents
- [Overview](#overview)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
## Overview
## Basic idea
The MONAI Reasoning CXR 3B model is a **medical AI model** designed for **chest X-ray (CXR) interpretation** with reasoning capabilities. It combines imaging analysis with large-scale language modeling:
- **Medical focus**: Built within the MONAI framework for healthcare imaging tasks.
- **Vision + language**: Takes CXR images as input and produces diagnostic text or reasoning outputs.
- **Reasoning layer**: Goes beyond simple classification to explain intermediate steps (e.g., opacity → pneumonia suspicion).
- **3B scale**: A moderately large multimodal model (~3 billion parameters).
- **Trust and explainability**: Aims to make results more interpretable and clinically useful.
## What you'll accomplish
You'll deploy the MONAI-Reasoning-CXR-3B model, a specialized vision-language model for chest X-ray
analysis, on an NVIDIA Spark device with Blackwell GPU architecture. By the end of this
walkthrough, you will have a complete system running with VLLM serving the model for
high-performance inference and Open WebUI providing an easy-to-use interface for interacting
with the model. This setup is ideal for clinical demonstrations and research that requires
transparent AI reasoning.
## What to know before starting
* Experience with the Linux command line and shell scripting
* A basic understanding of Docker, including running containers and managing images
* Familiarity with Python and using pip for package management
* Knowledge of Large Language Models (LLMs) and how to interact with API endpoints
* Basic understanding of NVIDIA GPU hardware and CUDA drivers
## Prerequisites
**Hardware Requirements:**
* NVIDIA Spark device with ARM64 (AArch64) architecture
* NVIDIA Blackwell GPU architecture
* At least 24GB of GPU VRAM
**Software Requirements:**
* **NVIDIA Driver**: Ensure the driver is installed and the GPU is recognized
```bash
nvidia-smi
```
* **Docker Engine**: Docker must be installed and the daemon running
```bash
docker --version
```
* **NVIDIA Container Toolkit**: Required for GPU access in containers
```bash
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi
```
* **Hugging Face CLI**: You'll need this to download the model
```bash
pip install -U huggingface_hub
huggingface-cli whoami
```
* **System Architecture**: Verify your system architecture for proper container selection
```bash
uname -m
## Should output: aarch64 for ARM64 systems like NVIDIA Spark
```
## Time & risk
* **Estimated time:** 20-35 minutes (not including model download)
* **Risk level:** Low. Most steps use publicly available containers and models; serving currently requires an internal VLLM container (see the note before Step 3)
* **Rollback:** The entire deployment is containerized. To roll back, you can simply stop and remove the Docker containers
## Instructions
## Step 1. Create the Project Directory
First, create a dedicated directory to store your model weights and configuration files. This
keeps the project organized and provides a clean workspace.
```bash
## Create the main directory
mkdir -p ~/monai-reasoning-spark
cd ~/monai-reasoning-spark
## Create a subdirectory for the model
mkdir -p models
```
## Step 2. Download the MONAI-Reasoning-CXR-3B Model
Use the Hugging Face CLI to download the model weights into the directory you just created.
The model is approximately 6GB and will take several minutes to download depending on your
internet connection.
```bash
huggingface-cli download monai/monai-reasoning-cxr-3b \
--local-dir ./models/monai-reasoning-cxr-3b \
--local-dir-use-symlinks False
```
**Verification Step:**
```bash
ls -la ./models/monai-reasoning-cxr-3b
## You should see model files including config.json and model weights
```
> [!IMPORTANT]
> Currently, a custom internal VLLM container is required until the sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`.
## Step 3. Verify System Architecture
Before proceeding, confirm your system architecture is ARM64 for proper container selection
on your NVIDIA Spark device:
```bash
## Check your system architecture
uname -m
## Should output: aarch64 for ARM64 systems like NVIDIA Spark
```
## Step 4. Create a Docker Network
Create a dedicated Docker bridge network to allow the VLLM and Open WebUI containers to
communicate with each other easily and reliably.
```bash
docker network create monai-net
```
## Step 5. Deploy the VLLM Server
Launch the VLLM container with ARM64 architecture support, attaching it to the network you
created and mounting your local model directory. This step configures the server for optimal
performance on NVIDIA Spark hardware.
```bash
## Stop and remove existing container if running
docker stop vllm-server 2>/dev/null || true
docker rm vllm-server 2>/dev/null || true
## Run the VLLM server with internal container
docker run --rm -d \
--name vllm-server \
--gpus all \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--network monai-net \
--platform linux/arm64 \
-v ./models/monai-reasoning-cxr-3b:/model \
-p 8000:8000 \
******:5005/dl/dgx/vllm:main-py3.31165712-devel \
vllm serve /model \
--host 0.0.0.0 \
--port 8000 \
--dtype bfloat16 \
--trust-remote-code \
--gpu-memory-utilization 0.5 \
--enforce-eager \
--served-model-name monai-reasoning-cxr-3b
```
**Wait for startup and verify:**
```bash
## Wait for the model to load (can take 1-2 minutes on Spark hardware)
sleep 90
## Check if container is running
docker ps
## Test the VLLM API
curl http://localhost:8000/v1/models
```
You should see JSON output showing the model is loaded and available.
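To check this programmatically rather than by eye, a short script can parse the `/v1/models` response and confirm the served model name is present. This sketch assumes the standard OpenAI-compatible response shape (`{"data": [{"id": ...}]}`) that VLLM returns:

```python
import json
from urllib.request import urlopen

def model_ids(models_json: str) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in json.loads(models_json).get("data", [])]

def model_ready(url: str = "http://localhost:8000/v1/models",
                name: str = "monai-reasoning-cxr-3b") -> bool:
    """Return True once the VLLM server reports the model as available."""
    try:
        return name in model_ids(urlopen(url, timeout=5).read().decode())
    except OSError:
        return False  # server still starting up, or not reachable yet
```

Call `model_ready()` in a polling loop instead of a fixed `sleep` if you script the deployment.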
## Step 6. Deploy Open WebUI
Launch the Open WebUI container with ARM64 architecture support for your NVIDIA Spark device.
```bash
## Define custom prompt suggestions for medical X-ray analysis
PROMPT_SUGGESTIONS='[
{
"title": ["Analyze X-Ray Image", "Find abnormalities and support devices"],
"content": "Find abnormalities and support devices in the image."
}
]'
## Stop and remove existing container if running
docker stop open-webui 2>/dev/null || true
docker rm open-webui 2>/dev/null || true
sleep 5
## Run Open WebUI with custom configuration
docker run -d --rm \
--name open-webui \
--network monai-net \
--platform linux/arm64 \
-p 3000:8080 \
-e WEBUI_AUTH=0 \
-e WEBUI_NAME=monai-reasoning \
-e ENABLE_SIGNUP=0 \
-e ENABLE_ADMIN_CHAT_ACCESS=0 \
-e ENABLE_VERSION_UPDATE_CHECK=0 \
-e OPENAI_API_BASE_URL="http://vllm-server:8000/v1" \
-e DEFAULT_PROMPT_SUGGESTIONS="$PROMPT_SUGGESTIONS" \
ghcr.io/open-webui/open-webui:main
```
**Verify deployment:**
```bash
## Wait for startup
sleep 15
## Check both containers are running
docker ps
## Test Open WebUI accessibility
curl -f http://localhost:3000 || echo "Still starting up"
```
## Step 7. Validate the Complete Deployment
Check that both containers are running properly and all endpoints are accessible:
```bash
## Check container status
docker ps
## You should see both vllm-server and open-webui containers running
## Test the VLLM API
curl http://localhost:8000/v1/models
## Should return JSON with model information
## Test Open WebUI accessibility
curl -f http://localhost:3000
## Should return HTTP 200 response
```
## Step 8. Configure Open WebUI
Configure the front-end interface to connect to your VLLM backend:
1. Open your web browser and navigate to **http://<YOUR_SPARK_DEVICE_IP>:3000**
2. Since authentication is disabled, you'll have direct access to the interface
3. The OpenAI API connection is pre-configured through environment variables
4. Go to the main chat screen, click **"Select a model"**, and choose **monai-reasoning-cxr-3b**
5. **Important:** Navigate to **Chat Controls** → **Advanced Params** and disable **"Reasoning Tags"** to get the full reasoning output from the model
You can now upload a chest X-ray image and ask questions directly in the chat interface. The custom prompt suggestion "Find abnormalities and support devices in the image" will be available for quick access.
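You can also query the model programmatically, bypassing the UI. The sketch below builds an OpenAI-style chat payload with the image inlined as a base64 data URL, the convention VLLM's OpenAI-compatible endpoint accepts for vision-language models. Treat the exact content shape as an assumption to verify against your VLLM version:

```python
import base64

def cxr_chat_payload(image_bytes: bytes, question: str,
                     model: str = "monai-reasoning-cxr-3b") -> dict:
    """Build an OpenAI-style chat request with one inline chest X-ray image."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

# POST the result as JSON to http://<YOUR_SPARK_DEVICE_IP>:8000/v1/chat/completions
```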
## Step 9. Cleanup and Rollback
To stop and remove the containers and network, run the following commands. This will not
delete your downloaded model weights.
> [!WARNING]
> This will stop all running containers and remove the network.
```bash
## Stop containers
docker stop vllm-server open-webui
## Remove network
docker network rm monai-net
## Optional: Remove model directory to free disk space
## rm -rf ~/monai-reasoning-spark/models
```
## Step 10. Next Steps
Your MONAI reasoning system is now ready for use. Upload chest X-ray images through the web
interface at http://<YOUR_SPARK_DEVICE_IP>:3000 and interact with the MONAI-Reasoning-CXR-3B model
for medical image analysis and reasoning tasks.
## Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| VLLM container fails to start | Insufficient GPU memory | Reduce `--gpu-memory-utilization` to 0.25 |
| Model download fails | Network connectivity or HF auth | Check `huggingface-cli whoami` and internet |
| Cannot access gated repo for URL | Certain Hugging Face models have restricted access | Regenerate your Hugging Face token and request access to the gated model in your web browser |
| Open WebUI shows connection error | Wrong backend URL | Verify `OPENAI_API_BASE_URL` is set correctly |
| Model doesn't show full reasoning | Reasoning tags enabled | Disable "Reasoning Tags" in Chat Controls → Advanced Params |
> [!NOTE]
> DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
> With many applications still updating to take advantage of UMA, you may encounter memory issues even when within
> the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```

# Use Open Fold
> Use OpenFold with TensorRT optimization
## Table of Contents
- [Overview](#overview)
- [Access through terminal](#access-through-terminal)
- [Step 7. Option B - Run locally with demo script](#step-7-option-b-run-locally-with-demo-script)
- [Using a custom FASTA file](#using-a-custom-fasta-file)
---
## Overview
## What you'll accomplish
You'll set up a GPU-accelerated protein folding workflow on NVIDIA Spark devices using OpenFold
with TensorRT optimization and MMseqs2-GPU. After completing this walkthrough, you'll be able to
fold proteins in under 60 seconds using either NVIDIA's cloud UI or running locally on your
RTX Pro 6000 or DGX Spark workstation.
## What to know before starting
- Installing Python packages via pip
- Using Docker and the NVIDIA Container Toolkit for GPU workflows
- Running basic Linux commands and setting environment variables
- Understanding FASTA files and basics of protein structure workflows
- Working with CUDA-enabled applications
## Prerequisites
- NVIDIA GPU (RTX Pro 6000 or DGX Spark recommended)
```bash
nvidia-smi # Should show GPU with CUDA ≥12.9
```
- NVIDIA drivers and CUDA toolkit installed
```bash
nvcc --version # Should show CUDA 12.9 or higher
```
- Docker with NVIDIA Container Toolkit
```bash
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi
```
- Python 3.8+ environment
```bash
python3 --version # Should show 3.8 or higher
```
- Sufficient disk space for databases (>3TB recommended)
```bash
df -h # Check available space
```
## Ancillary files
- OpenFold parameters (`finetuning_ptm_2.pt`) — pre-trained model weights for structure prediction
- PDB70 database — template structures for homology modeling
- UniRef90 database — sequence database for MSA generation
- MGnify database — metagenomic sequences for MSA generation
- Uniclust30 database — clustered UniProt sequences for MSA generation
- BFD database — large sequence database for MSA generation
- MMCIF files — template structure files in mmCIF format
- py3Dmol package — Python library for 3D protein visualization
## Time & risk
**Duration:** Initial setup takes 2-4 hours (mainly downloading databases). Each protein fold takes
<60 seconds on GPU vs hours on CPU.
**Risks:**
- Database downloads may fail due to network interruptions
- Insufficient disk space for full databases
- GPU memory limitations for very large proteins (>2000 residues)
**Rollback:** All operations are read-only after setup. Remove downloaded databases and output
directories to clean up.
## Access through terminal
## Step 1. Verify GPU and CUDA installation
Confirm your system has the required GPU and CUDA version for running OpenFold with TensorRT
optimization.
```bash
nvidia-smi
```
Expected output should show an NVIDIA GPU with CUDA capability ≥12.9. For DGX Spark or RTX Pro
6000, you should see the appropriate GPU model listed.
```bash
nvcc --version
```
This should display CUDA compilation tools, release 12.9 or higher.
## Step 2. Set up Python environment
Create a Python virtual environment and install the required packages for protein folding and
visualization.
```bash
python3 -m venv openfold_env
source openfold_env/bin/activate
pip install --upgrade pip
```
Install the py3Dmol visualization package:
```bash
pip install py3Dmol
```
## Step 3. Download OpenFold and databases
Download the OpenFold repository and required databases. This step requires significant disk
space and network bandwidth.
> TODO: Add specific download URLs for OpenFold repository from official GitHub
```bash
## Clone OpenFold repository
git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/protein-folding/assets
pip install -e .
```
Download the model parameters:
> TODO: Add direct download URL for finetuning_ptm_2.pt
```bash
mkdir -p openfold_params
wget -O openfold_params/finetuning_ptm_2.pt <PARAM_DOWNLOAD_URL>
```
## Step 4. Download sequence databases
Download all required databases for MSA generation. Each database serves a specific purpose in
the folding pipeline.
> TODO: Add specific download URLs for each database from official sources
```bash
## Create database directory
mkdir -p databases
cd databases
## Download PDB70 (for template structures)
wget <PDB70_DOWNLOAD_URL>
tar -xzf pdb70.tar.gz
## Download UniRef90 (for MSA)
wget <UNIREF90_DOWNLOAD_URL>
tar -xzf uniref90.tar.gz
## Download MGnify (metagenomic sequences)
wget <MGNIFY_DOWNLOAD_URL>
tar -xzf mgnify.tar.gz
## Download Uniclust30 (clustered sequences)
wget <UNICLUST30_DOWNLOAD_URL>
tar -xzf uniclust30.tar.gz
## Download BFD (large sequence database)
wget <BFD_DOWNLOAD_URL>
tar -xzf bfd.tar.gz
## Download MMCIF files (structure templates)
wget <MMCIF_DOWNLOAD_URL>
tar -xzf mmcif.tar.gz
cd ..
```
## Step 5. Configure environment variables
Set up environment variables pointing to your downloaded databases and parameters.
```bash
export OF_PARAM_PATH="$(pwd)/openfold_params/finetuning_ptm_2.pt"
export OF_DB_PDB70="$(pwd)/databases/pdb70"
export OF_DB_UNIREF90="$(pwd)/databases/uniref90"
export OF_DB_MGNIFY="$(pwd)/databases/mgnify"
export OF_DB_UNICLUST30="$(pwd)/databases/uniclust30"
export OF_DB_BFD="$(pwd)/databases/bfd"
export OF_DB_MMCIF="$(pwd)/databases/pdb_mmcif/mmcif_files"
export OF_DB_OBSOLETE="$(pwd)/databases/pdb_mmcif/obsolete.dat"
export OF_DEVICE="cuda:0"
export OF_OUTDIR="openfold_out"
export OF_JOB="demo"
```
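Before launching a fold, it's worth verifying that every variable points at something that actually exists; a missing database path otherwise fails deep inside the pipeline. A small preflight sketch (the variable names match the exports above):

```python
import os

# Required environment variables and what they point at
REQUIRED = {
    "OF_PARAM_PATH": "model parameters",
    "OF_DB_PDB70": "PDB70 templates",
    "OF_DB_UNIREF90": "UniRef90 MSA database",
    "OF_DB_MGNIFY": "MGnify MSA database",
    "OF_DB_UNICLUST30": "Uniclust30 MSA database",
    "OF_DB_BFD": "BFD MSA database",
    "OF_DB_MMCIF": "mmCIF template files",
    "OF_DB_OBSOLETE": "obsolete PDB list",
}

def missing_paths(env: dict) -> list[str]:
    """Return names of required variables that are unset or point nowhere."""
    return [name for name in REQUIRED
            if not env.get(name) or not os.path.exists(env[name])]

# Run before folding:  missing_paths(dict(os.environ)) should return []
```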
## Step 6. Option A - Use NVIDIA Build Portal (Cloud UI)
For quick testing without local setup, use NVIDIA's online demo interface.
1. Navigate to the OpenFold2 page on NVIDIA Build Portal
> TODO: Add specific URL for NVIDIA Build Portal OpenFold2 demo
2. Paste your protein sequence in FASTA format
3. Click "Run" to execute the folding pipeline
4. View results in the integrated Mol* or py3Dmol viewer
### Step 7. Option B - Run locally with demo script
Create and run the OpenFold demo script for local execution on your DGX Spark or RTX Pro 6000.
Create the demo script file:
```bash
cat > openfold_demo.py << 'EOF'
#!/usr/bin/env python3
"""
Single-file OpenFold runner + py3Dmol viewer.
"""
import os, subprocess as sp, glob, sys, tempfile, textwrap
## Paths (edit for your system)
PARAM = os.getenv("OF_PARAM_PATH", "/path/to/openfold_params/finetuning_ptm_2.pt")
PDB70 = os.getenv("OF_DB_PDB70", "/path/to/pdb70")
UNIREF90 = os.getenv("OF_DB_UNIREF90", "/path/to/uniref90")
MGNIFY = os.getenv("OF_DB_MGNIFY", "/path/to/mgnify")
UNICLUST30 = os.getenv("OF_DB_UNICLUST30", "/path/to/uniclust30")
BFD = os.getenv("OF_DB_BFD", "/path/to/bfd")
MMCIF = os.getenv("OF_DB_MMCIF", "/path/to/pdb_mmcif/mmcif_files")
OBSOLETE = os.getenv("OF_DB_OBSOLETE", "/path/to/pdb_mmcif/obsolete.dat")
DEVICE = os.getenv("OF_DEVICE", "cuda:0")
OUTDIR = os.getenv("OF_OUTDIR", "openfold_out")
JOB = os.getenv("OF_JOB", "demo")
SEQ = """>demo
MGSDKIHHHHHHENLYFQGAMASMTGGQQMGRGSMAAAAKKVVAGAAAAGGQAGD"""
def ensure_py3dmol():
try:
import py3Dmol
except ImportError:
sp.check_call([sys.executable, "-m", "pip", "install", "py3Dmol"])
def run_openfold(fasta_path):
cmd = [
sys.executable, "openfold/run_pretrained_openfold.py",
"--fasta_path", fasta_path,
"--job_name", JOB,
"--output_dir", OUTDIR,
"--model_device", DEVICE,
"--param_path", PARAM,
"--pdb70_database_path", PDB70,
"--uniref90_database_path", UNIREF90,
"--mgnify_database_path", MGNIFY,
"--uniclust30_database_path", UNICLUST30,
"--bfd_database_path", BFD,
"--template_mmcif_dir", MMCIF,
"--obsolete_pdbs_path", OBSOLETE,
"--skip_relaxation"
]
sp.check_call(cmd)
def visualize():
import py3Dmol
pdb = open(f"{OUTDIR}/{JOB}/ranked_0.pdb").read()
view = py3Dmol.view(width=800, height=520)
view.addModel(pdb, "pdb")
view.setStyle({"cartoon": {"arrows": True}})
view.zoomTo()
open(f"{OUTDIR}/{JOB}_view.html", "w").write(view._make_html())
print(f"Viewer written to {OUTDIR}/{JOB}_view.html")
def main():
ensure_py3dmol()
with tempfile.TemporaryDirectory() as td:
fasta_path = os.path.join(td, f"{JOB}.fasta")
open(fasta_path, "w").write(textwrap.dedent(SEQ).strip() + "\n")
run_openfold(fasta_path)
visualize()
if __name__ == "__main__":
main()
EOF
```
Make the script executable and run it:
```bash
chmod +x openfold_demo.py
python openfold_demo.py
```
## Step 8. Validate the output
Check that the folding completed successfully and view the generated structure.
```bash
## Verify PDB file was created
ls -la openfold_out/demo/ranked_0.pdb
```
The file should exist and be non-empty (typically >10KB for a small protein).
```bash
## Check the HTML viewer was generated
ls -la openfold_out/demo_view.html
```
Open the HTML file in a web browser to visualize the folded protein structure:
```bash
## On Linux with GUI
xdg-open openfold_out/demo_view.html
## Or copy the full path and open in browser manually
realpath openfold_out/demo_view.html
```
## Step 9. Run with custom sequences
To fold your own protein sequences, modify the demo script or create a new FASTA file.
### Using a custom FASTA file
```bash
## Create your FASTA file
cat > my_protein.fasta << 'EOF'
>my_protein
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQYMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLPARTVETRQAQDLARSYGIPYIETSAKTRQGVEDAFYTLVREIRQHKLRKLNPPDESGPGCMNCKCVIS
EOF
## Run OpenFold directly
python openfold/run_pretrained_openfold.py \
--fasta_path my_protein.fasta \
--job_name my_protein \
--output_dir openfold_out \
--model_device cuda:0 \
--param_path $OF_PARAM_PATH \
--pdb70_database_path $OF_DB_PDB70 \
--uniref90_database_path $OF_DB_UNIREF90 \
--mgnify_database_path $OF_DB_MGNIFY \
--uniclust30_database_path $OF_DB_UNICLUST30 \
--bfd_database_path $OF_DB_BFD \
--template_mmcif_dir $OF_DB_MMCIF \
--obsolete_pdbs_path $OF_DB_OBSOLETE \
--skip_relaxation
```
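A malformed FASTA file (stray characters, or nucleotides instead of amino acids) wastes an entire MSA run before OpenFold reports the problem, so a quick validation pass first is cheap insurance. The accepted alphabet below is the 20 standard residues plus common ambiguity and rare codes, which is an assumption you may want to tighten:

```python
# 20 standard amino acids plus ambiguity/rare codes (B, Z, J, X, U, O)
VALID_AA = set("ACDEFGHIKLMNPQRSTVWYBZJXUO")

def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) records, validating residues."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    for name, seq in records:
        bad = set(seq) - VALID_AA
        if bad:
            raise ValueError(f"{name}: invalid residue(s) {sorted(bad)}")
    return records

# parse_fasta(open("my_protein.fasta").read()) raises before you waste GPU time
```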
## Step 10. Troubleshooting common issues
| Symptom | Cause | Fix |
|---------|-------|-----|
| CUDA out of memory error | Protein too large for GPU | Reduce max_template_date or use smaller sequence |
| Database file not found | Incomplete download or wrong path | Verify all databases downloaded and paths in env vars |
| ImportError: No module named 'openfold' | OpenFold not installed | Run `pip install -e .` in openfold directory |
| nvidia-smi command not found | NVIDIA drivers not installed | Install NVIDIA drivers for your GPU |
| Folding takes hours instead of minutes | Running on CPU instead of GPU | Check OF_DEVICE="cuda:0" and GPU availability |
| py3Dmol viewer shows blank page | JavaScript blocked or path issue | Use absolute path to HTML file or check browser console |
## Step 11. Cleanup and rollback
Remove generated outputs and optionally remove downloaded databases.
```bash
## Remove output files only (safe)
rm -rf openfold_out/
## Remove virtual environment (reversible)
deactivate
rm -rf openfold_env/
```
> [!WARNING]
> The following will delete downloaded databases (>3TB). Only run if you need to free disk space and are willing to re-download.
```bash
## Remove all databases (requires re-download)
rm -rf databases/
## Remove OpenFold repository
rm -rf openfold/
```
## Step 12. Next steps
Test the installation with a well-known protein structure to verify accuracy:
```bash
## Test with ubiquitin (PDB: 1UBQ)
cat > test_ubiquitin.fasta << 'EOF'
>1UBQ
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
EOF
python openfold/run_pretrained_openfold.py \
--fasta_path test_ubiquitin.fasta \
--job_name ubiquitin_test \
--output_dir openfold_out \
--model_device cuda:0 \
--param_path $OF_PARAM_PATH \
--pdb70_database_path $OF_DB_PDB70 \
--uniref90_database_path $OF_DB_UNIREF90 \
--mgnify_database_path $OF_DB_MGNIFY \
--uniclust30_database_path $OF_DB_UNICLUST30 \
--bfd_database_path $OF_DB_BFD \
--template_mmcif_dir $OF_DB_MMCIF \
--obsolete_pdbs_path $OF_DB_OBSOLETE \
--skip_relaxation
```
For production use, consider:
- Enabling structure relaxation for higher accuracy (remove `--skip_relaxation`)
- Setting up batch processing for multiple sequences
- Integrating with drug discovery pipelines
- Scaling to full proteomes using DGX Spark clusters

# SGLang Inference Server
> Install and use SGLang on DGX Spark
## Table of Contents
- [Overview](#overview)
- [Time & risk](#time-risk)
- [Instructions](#instructions)
---
## Overview
## Basic Idea
SGLang is a fast serving framework for large language models and vision language models that makes
your interaction with models faster and more controllable by co-designing the backend runtime and
frontend language. This setup uses the optimized NVIDIA SGLang NGC Container on a single NVIDIA
Spark device with Blackwell architecture, providing GPU-accelerated inference with all dependencies
pre-installed.
## What you'll accomplish
You'll deploy SGLang in both server and offline inference modes on your NVIDIA Spark device,
enabling high-performance LLM serving with support for text generation, chat completion, and
vision-language tasks using models like DeepSeek-V2-Lite.
## What to know before starting
- Working in a terminal environment on Linux systems
- Basic understanding of Docker containers and container management
- Familiarity with NVIDIA GPU drivers and CUDA toolkit concepts
- Experience with HTTP API endpoints and JSON request/response handling
## Prerequisites
- NVIDIA Spark device with Blackwell architecture
- Docker Engine installed and running: `docker --version`
- NVIDIA GPU drivers installed: `nvidia-smi`
- NVIDIA Container Toolkit configured: `docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi`
- Sufficient disk space (>20GB available): `df -h`
- Network connectivity for pulling NGC containers: `ping nvcr.io`
## Ancillary files
- An offline inference python script [found here on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/sglang/assets/offline-inference.py)
### Time & risk
**Duration:** 15-30 minutes for initial setup and validation
**Risk level:** Low - Uses pre-built, validated NGC container with minimal configuration
**Rollback:** Stop and remove containers with `docker stop` and `docker rm` commands
## Instructions
## Step 1. Verify system prerequisites
Check that your NVIDIA Spark device meets all requirements before proceeding. This step runs on
your host system and ensures Docker, GPU drivers, and container toolkit are properly configured.
```bash
## Verify Docker installation
docker --version
## Check NVIDIA GPU drivers
nvidia-smi
## Test NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.9.0-base-ubuntu22.04 nvidia-smi
## Check available disk space
df -h /
```
## Step 2. Pull the SGLang NGC Container
Download the latest SGLang container from NVIDIA NGC. This step runs on the host and may take
several minutes depending on your network connection.
> TODO: Verify the exact container tag/version for SGLang NGC container
```bash
## Pull the SGLang container
docker pull nvcr.io/nvidia/sglang:<VERSION>-py3
## Verify the image was downloaded
docker images | grep sglang
```
## Step 3. Launch SGLang container for server mode
Start the SGLang container in server mode to enable HTTP API access. This runs the inference
server inside the container, exposing it on port 30000 for client connections.
```bash
## Launch container with GPU support and port mapping
docker run --gpus all -it --rm \
-p 30000:30000 \
-v /tmp:/tmp \
nvcr.io/nvidia/sglang:<VERSION>-py3 \
bash
```
## Step 4. Start the SGLang inference server
Inside the container, launch the HTTP inference server with a supported model. This step runs
inside the Docker container and starts the SGLang server daemon.
```bash
## Start the inference server with DeepSeek-V2-Lite model
python3 -m sglang.launch_server \
--model-path deepseek-ai/DeepSeek-V2-Lite \
--host 0.0.0.0 \
--port 30000 \
--trust-remote-code \
--tp 1 &
## Wait for server to initialize
sleep 30
## Check server status
curl http://localhost:30000/health
```
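The fixed `sleep 30` above is a guess; model load time varies with disk speed and model size. A more robust pattern polls the health endpoint until it answers. The helper below uses only the standard library; swap the URL if you changed the port:

```python
import time
from urllib.request import urlopen

def wait_for(probe, attempts: int = 30, delay: float = 2.0) -> bool:
    """Call probe() until it returns True, or give up after `attempts` tries."""
    for _ in range(attempts):
        if probe():
            return True
        time.sleep(delay)
    return False

def server_healthy(url: str = "http://localhost:30000/health") -> bool:
    """Return True when the SGLang /health endpoint answers with HTTP 200."""
    try:
        return urlopen(url, timeout=2).status == 200
    except OSError:
        return False

# wait_for(server_healthy) blocks until the SGLang server is ready
```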
## Step 5. Test client-server inference
From a new terminal on your host system, test the SGLang server API to ensure it's working
correctly. This validates that the server is accepting requests and generating responses.
```bash
## Test with curl
curl -X POST http://localhost:30000/generate \
-H "Content-Type: application/json" \
-d '{
"text": "What does NVIDIA love?",
"sampling_params": {
"temperature": 0.7,
"max_new_tokens": 100
}
}'
```
## Step 6. Test Python client API
Create a simple Python script to test programmatic access to the SGLang server. This runs on
the host system and demonstrates how to integrate SGLang into applications.
```python
import requests
## Send prompt to server
response = requests.post('http://localhost:30000/generate', json={
'text': 'What does NVIDIA love?',
'sampling_params': {
'temperature': 0.7,
'max_new_tokens': 100,
},
})
print(f"Response: {response.json()['text']}")
```
```
## Step 7. Test offline inference mode
Launch a new container instance for offline inference to demonstrate local model usage without
HTTP server. This runs entirely within the container for batch processing scenarios.
Copy the offline inference script from the assets directory ([see it on GitHub](https://github.com/NVIDIA/dgx-spark-playbooks/blob/main/nvidia/sglang/assets)) into the running container, then execute it with `python3 offline-inference.py`. The script instantiates `sgl.Engine` with DeepSeek-V2-Lite and generates a completion locally, without the HTTP server.
## Step 8. Validate installation
Confirm that both server and offline modes are working correctly. This step verifies the
complete SGLang setup and ensures reliable operation.
```bash
## Check server mode (from host)
curl http://localhost:30000/health
curl -X POST http://localhost:30000/generate -H "Content-Type: application/json" \
-d '{"text": "Hello", "sampling_params": {"max_new_tokens": 10}}'
## Check container logs
docker ps
docker logs <CONTAINER_ID>
```
## Step 9. Troubleshooting
Common issues and their resolutions:
| Symptom | Cause | Fix |
|---------|-------|-----|
| Container fails to start with GPU errors | NVIDIA drivers/toolkit missing | Install nvidia-container-toolkit, restart Docker |
| Server responds with 404 or connection refused | Server not fully initialized | Wait 60 seconds, check container logs |
| Out of memory errors during model loading | Insufficient GPU memory | Use a smaller model, or shard across more GPUs with a larger `--tp` value |
| Model download fails | Network connectivity issues | Check internet connection, retry download |
| Permission denied accessing /tmp | Volume mount issues | Use full path: -v /tmp:/tmp or create dedicated directory |
## Step 10. Cleanup and rollback
Stop and remove containers to clean up resources. This step returns your system to its
original state.
> [!WARNING]
> This will stop all SGLang containers and remove temporary data.
```bash
## Stop all SGLang containers
docker ps | grep sglang | awk '{print $1}' | xargs docker stop
## Remove stopped containers
docker container prune -f
## Remove SGLang images (optional)
docker rmi nvcr.io/nvidia/sglang:<VERSION>-py3
```
## Step 11. Next steps
With SGLang successfully deployed, you can now:
- Integrate the HTTP API into your applications using the `/generate` endpoint
- Experiment with different models by changing the `--model-path` parameter
- Scale up using multiple GPUs by adjusting the `--tp` (tensor parallel) setting
- Deploy production workloads using the container orchestration platform of your choice

# SGLang offline inference script
> The `offline-inference.py` asset referenced in the SGLang playbook

```python
#
# SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import sglang as sgl


def main():
    llm = sgl.Engine(model_path="deepseek-ai/DeepSeek-V2-Lite", trust_remote_code=True)
    prompt = "What does NVIDIA love?"
    sampling_params = {"temperature": 0.7, "max_new_tokens": 100}
    output = llm.generate(prompt, sampling_params)
    print(f"Output: {output}")


if __name__ == '__main__':
    main()
```

# Vibe Coding in VS Code
> Use DGX Spark as a local or remote Vibe Coding assistant with Ollama and Continue.dev
## Table of Contents
- [Overview](#overview)
- [What You'll Accomplish](#what-youll-accomplish)
- [Prerequisites](#prerequisites)
- [Requirements](#requirements)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)
---
## Overview
## DGX Spark Vibe Coding
This playbook walks you through setting up DGX Spark as a **Vibe Coding assistant** — locally or as a remote coding companion for VSCode with Continue.dev.
While NVIDIA NIMs are not yet widely supported, this guide uses **Ollama** with **GPT-OSS 120B** to provide a high-performance local LLM environment.
### What You'll Accomplish
You'll have a fully configured DGX Spark system capable of:
- Running local code assistance through Ollama.
- Serving models remotely for Continue.dev and VSCode integration.
- Hosting large LLMs like GPT-OSS 120B using unified memory.
### Prerequisites
- DGX Spark (128GB unified memory recommended)
- Internet access for model downloads
- Basic familiarity with the terminal
- Optional: firewall control for remote access configuration
### Requirements
- **Ollama** and an LLM of your choice (e.g., `gpt-oss:120b`)
- **VSCode**
- **Continue.dev** VSCode extension
## Instructions
## Step 1. Install Ollama
Install the latest version of Ollama using the following command:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Start the Ollama service:
```bash
ollama serve
```
Once the service is running, pull the desired model:
```bash
ollama pull gpt-oss:120b
```
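Before wiring up any editor integration, you can smoke-test the local endpoint directly. The example below targets Ollama's standard `/api/generate` route on the default port 11434; the prompt text is only an illustration:

```shell
# Build a non-streaming request body for Ollama's /api/generate endpoint.
BODY='{"model": "gpt-oss:120b", "prompt": "Reply with one short sentence.", "stream": false}'

# Confirm the payload is well-formed JSON before sending it.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload OK"

# Send the request to the local Ollama service.
curl -s http://localhost:11434/api/generate -d "$BODY"
```

If the service is up, the response is a JSON object containing the model's completion.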
## Step 2. (Optional) Enable Remote Access
To allow remote connections (e.g., from a workstation using VSCode and Continue.dev), modify the Ollama systemd service:
```bash
sudo systemctl edit ollama
```
Add the following lines beneath the commented section:
```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
```
Reload and restart the service:
```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
If using a firewall, open port 11434:
```bash
sudo ufw allow 11434/tcp
```
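To confirm the override took effect, query the API from another machine on the network. This is only a reachability check; replace `YOUR_SPARK_IP` with your Spark's address:

```shell
SPARK_IP=YOUR_SPARK_IP  # replace with the DGX Spark's IP address

# List the models the remote Ollama instance is serving. A refused or
# empty response means the OLLAMA_HOST override has not taken effect.
curl -s --max-time 5 "http://${SPARK_IP}:11434/api/tags" \
  | grep -o '"name":"[^"]*"'
```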
## Step 3. Install VSCode
For DGX Spark (ARM-based), download and install VSCode:
```bash
wget "https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64" -O vscode-arm64.deb
sudo apt install ./vscode-arm64.deb
```
If using a remote workstation, install VSCode appropriate for your system architecture.
## Step 4. Install Continue.dev Extension
Open VSCode and install **Continue.dev** from the Marketplace.
After installation, click the Continue icon on the right-hand bar.
Skip login and open the manual configuration via the **gear (⚙️)** icon.
This opens `config.yaml`, which controls model settings.
## Step 5. Local Inference Setup
- In the Continue chat window, use `Ctrl/Cmd + L` to focus the chat.
- Click **Select Model → + Add Chat Model**
- Choose **Ollama** as the provider.
- Set **Install Provider** to default.
- For **Model**, select **Autodetect**.
- Click **Connect**.
You can now select your downloaded model (e.g., `gpt-oss:120b`) for local inference.
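If you prefer to skip autodetect, an equivalent manual entry in Continue's `config.yaml` for a local model would look like the following sketch (field names follow Continue's Ollama provider schema; the model name assumes the `gpt-oss:120b` pull from Step 1):

```yaml
models:
  - model: gpt-oss:120b
    title: gpt-oss:120b (local)
    apiBase: http://localhost:11434/
    provider: ollama
```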
## Step 6. Remote Setup for DGX Spark
To connect Continue.dev to a remote DGX Spark instance, edit `config.yaml` in Continue and add:
```yaml
models:
- model: gpt-oss:120b
title: gpt-oss:120b
apiBase: http://YOUR_SPARK_IP:11434/
provider: ollama
```
Replace `YOUR_SPARK_IP` with the IP address of your DGX Spark.
Add additional model entries for any other Ollama models you wish to host remotely.
## Troubleshooting
## Common Issues
**1. Ollama not starting**
- Verify the NVIDIA GPU drivers are installed correctly and the `ollama` service is running (`systemctl status ollama`).
- Run `ollama serve` manually to view errors.
**2. VSCode can't connect**
- Ensure port 11434 is open and accessible from your workstation.
- Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf`.
**3. High memory usage**
- Use smaller models such as `gpt-oss:20b` for lightweight usage.
- Confirm no other large models or containers are running with `nvidia-smi`.