mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git
synced 2026-04-23 02:23:53 +00:00

chore: Regenerate all playbooks

This commit is contained in:
parent 0d9108cf14
commit 3ed5b3b073
@@ -24,6 +24,7 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [Comfy UI](nvidia/comfy-ui/)
 - [Set Up Local Network Access](nvidia/connect-to-your-spark/)
 - [Connect Two Sparks](nvidia/connect-two-sparks/)
+- [CUDA-X Data Science](nvidia/cuda-x-data-science/)
 - [DGX Dashboard](nvidia/dgx-dashboard/)
 - [FLUX.1 Dreambooth LoRA Fine-tuning](nvidia/flux-finetuning/)
 - [Optimized JAX](nvidia/jax/)
@@ -43,10 +44,11 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [TRT LLM for Inference](nvidia/trt-llm/)
 - [Text to Knowledge Graph](nvidia/txt2kg/)
 - [Unsloth on DGX Spark](nvidia/unsloth/)
+- [Vibe Coding in VS Code](nvidia/vibe-coding/)
 - [Install and Use vLLM for Inference](nvidia/vllm/)
 - [Vision-Language Model Fine-tuning](nvidia/vlm-finetuning/)
 - [VS Code](nvidia/vscode/)
-- [Video Search and Summarization](nvidia/vss/)
+- [Build a Video Search and Summarization (VSS) Agent](nvidia/vss/)

 ## Resources

82  nvidia/cuda-x-data-science/README.md  Normal file
@@ -0,0 +1,82 @@
# CUDA-X Data Science

> Install and use NVIDIA cuML and NVIDIA cuDF to accelerate UMAP, HDBSCAN, pandas and more with zero code changes

## Table of Contents

- [Overview](#overview)
- [Instructions](#instructions)

---

## Overview

## Basic idea

This playbook includes two example notebooks that demonstrate the acceleration of key machine learning algorithms and core pandas operations using CUDA-X Data Science libraries:
- **NVIDIA cuDF:** Accelerates data preparation and core data processing of 8 GB of string data, with no code changes.
- **NVIDIA cuML:** Accelerates popular, compute-intensive machine learning algorithms in scikit-learn (LinearSVC), UMAP, and HDBSCAN, with no code changes.

CUDA-X Data Science (formerly RAPIDS) is an open-source library collection that accelerates the data science and data processing ecosystem. These libraries accelerate popular Python tools like scikit-learn and pandas with zero code changes. On DGX Spark, they maximize performance at your desk with your existing code.

## What you'll accomplish

You will accelerate popular machine learning algorithms and data analytics operations on the GPU, understand how to accelerate popular Python tools, and see the value of running data science workflows on your DGX Spark.
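The zero-code-change claim is easiest to see with plain pandas. The sketch below uses a tiny made-up stand-in for the job-postings data processed in this playbook's notebooks; once `cudf.pandas` is loaded, the identical code runs on the GPU, because the accelerator intercepts the same pandas API.

```python
import pandas as pd

# Toy stand-in for the LinkedIn job-postings dataset used in the notebooks.
postings = pd.DataFrame({
    "job_link": ["a", "b", "c"],
    "job_title": ["Engineer", "Nurse", "Agent"],
})
summaries = pd.DataFrame({
    "job_link": ["a", "b"],
    "job_summary": ["Build things", "Care for patients"],
})

# The same two operations the demo notebook times: a left merge and a
# per-row string length. Under `%load_ext cudf.pandas` this code is unchanged.
merged = postings.merge(summaries, how="left", on="job_link")
merged["summary_length"] = merged["job_summary"].str.len()
print(merged[["job_title", "summary_length"]])
```

On CPU this is ordinary pandas; the playbook's point is that the exact same lines scale to the multi-gigabyte dataset when executed under the accelerator.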
## Prerequisites

- Familiarity with pandas, scikit-learn, and machine learning algorithms such as support vector machines, clustering, and dimensionality reduction
- Install conda
- Generate a Kaggle API key
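The Kaggle API key is just a small JSON credentials file. A side-effect-free sketch of what the kaggle.json handling in the notebook amounts to (the username and key values are placeholders, and a temp directory stands in for your home directory):

```python
import json
import tempfile
from pathlib import Path

# A temp directory stands in for ~ so this sketch has no side effects.
home = Path(tempfile.mkdtemp())
kaggle_dir = home / ".kaggle"
kaggle_dir.mkdir()

# Placeholder credentials -- substitute the values from your Kaggle account page.
creds = {"username": "your-kaggle-username", "key": "your-api-key"}

path = kaggle_dir / "kaggle.json"
path.write_text(json.dumps(creds))
path.chmod(0o600)  # the Kaggle client warns about world-readable credential files
print(oct(path.stat().st_mode & 0o777))
```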
## Time & risk

* **Duration:** 20-30 minutes of setup time and 2-3 minutes to run each notebook.
* **Risk level:**
  * Data download slowness or failure due to network issues
  * Kaggle API key generation failures requiring retries
* **Rollback:** No permanent system changes are made during normal usage.
## Instructions
## Step 1. Verify system requirements

- Verify that the system has CUDA 13 installed using `nvcc --version` or `nvidia-smi`
- Install conda using [these instructions](https://docs.anaconda.com/miniconda/install/)
- Create a Kaggle API key using [these instructions](https://www.kaggle.com/discussions/general/74235) and place the **kaggle.json** file in the same folder as the notebook
## Step 2. Install the Data Science libraries

- Use the following command to install the CUDA-X libraries (this creates a new conda environment):

```bash
conda create -n rapids-test -c rapidsai-nightly -c conda-forge -c nvidia \
    rapids=25.10 python=3.12 'cuda-version=13.0' \
    jupyter hdbscan umap-learn
```
## Step 3. Activate the conda environment

- Activate the conda environment:

```bash
conda activate rapids-test
```
## Step 4. Clone the playbook repository

- Clone the GitHub repository and go to the assets folder inside the cuda-x-data-science folder:

```bash
git clone https://github.com/NVIDIA/dgx-spark-playbooks
cd dgx-spark-playbooks/nvidia/cuda-x-data-science/assets
```

- Place the **kaggle.json** file created in Step 1 in the assets folder
## Step 5. Run the notebooks

There are two notebooks in the GitHub repository.

One runs an example of a large string data processing workflow with pandas code on the GPU.

- Run the cudf_pandas_demo.ipynb notebook and use `localhost:8888` in your browser to access it:

```bash
jupyter notebook cudf_pandas_demo.ipynb
```

The other walks through machine learning algorithms including UMAP and HDBSCAN.

- Run the cuml_sklearn_demo.ipynb notebook and use `localhost:8888` in your browser to access it:

```bash
jupyter notebook cuml_sklearn_demo.ipynb
```

If you are accessing your DGX Spark remotely, make sure to forward the necessary port so you can reach the notebook in your local browser. Use the following command for port forwarding:

```bash
ssh -N -L YYYY:localhost:XXXX username@remote_host
```

- `YYYY`: The local port you want to use (e.g. 8888)
- `XXXX`: The port you specified when starting Jupyter Notebook on the remote machine (e.g. 8888)
- `-N`: Prevents SSH from executing a remote command
- `-L`: Specifies local port forwarding
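Once an SSH tunnel is up, you can sanity-check that the forwarded local port is actually listening before opening the browser. A small sketch (the port number here is whatever you chose as the local port, e.g. 8888):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# After e.g. `ssh -N -L 8888:localhost:8888 username@remote_host`,
# the tunnel should be listening on the local port:
print(port_open("localhost", 8888))
```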
939  nvidia/cuda-x-data-science/assets/cudf_pandas_demo.ipynb  Normal file
@@ -0,0 +1,939 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "84635d55-68a2-468b-ac09-9029ebdab55f",
   "metadata": {
    "id": "84635d55-68a2-468b-ac09-9029ebdab55f"
   },
   "source": [
    "# Accelerating large string data processing with cudf pandas accelerator mode (cudf.pandas)\n",
    "<a href=\"https://github.com/rapidsai/cudf\">cuDF</a> is a Python GPU DataFrame library (built on the Apache Arrow columnar memory format) for loading, joining, aggregating, filtering, and otherwise manipulating tabular data using a DataFrame-style API in the style of pandas.\n",
    "\n",
    "cuDF now provides a <a href=\"https://rapids.ai/cudf-pandas/\">pandas accelerator mode</a> (`cudf.pandas`), allowing you to bring accelerated computing to your pandas workflows without requiring any code change.\n",
    "\n",
    "This notebook demonstrates how cuDF pandas accelerator mode can help accelerate the processing of datasets with large string fields (4 GB+) by simply adding a `%load_ext` command. We introduced this feature as part of our RAPIDS 24.08 release.\n",
    "\n",
    "**Author:** Allison Ding, Mitesh Patel <br>\n",
    "**Date:** October 3, 2025"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb8fe7ab-c055-40e9-897d-c62c72f28a16",
   "metadata": {
    "id": "bb8fe7ab-c055-40e9-897d-c62c72f28a16"
   },
   "source": [
    "# ⚠️ Verify your setup\n",
    "\n",
    "First, we'll verify that you are running with an NVIDIA GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a88b8586-cfdd-4d31-9b4d-9be8508f7ba0",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "a88b8586-cfdd-4d31-9b4d-9be8508f7ba0",
    "outputId": "18525b64-b34b-40e3-ed3a-1ad56ae794b5"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fri Oct 3 23:16:52 2025 \n",
      "+-----------------------------------------------------------------------------------------+\n",
      "| NVIDIA-SMI 580.82.09 Driver Version: 580.82.09 CUDA Version: 13.0 |\n",
      "+-----------------------------------------+------------------------+----------------------+\n",
      "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
      "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
      "| | | MIG M. |\n",
      "|=========================================+========================+======================|\n",
      "| 0 NVIDIA GB10 Off | 0000000F:01:00.0 Off | N/A |\n",
      "| N/A 44C P0 10W / N/A | Not Supported | 0% Default |\n",
      "| | | N/A |\n",
      "+-----------------------------------------+------------------------+----------------------+\n",
      "\n",
      "+-----------------------------------------------------------------------------------------+\n",
      "| Processes: |\n",
      "| GPU GI CI PID Type Process name GPU Memory |\n",
      "| ID ID Usage |\n",
      "|=========================================================================================|\n",
      "| 0 N/A N/A 3405 G /usr/lib/xorg/Xorg 242MiB |\n",
      "| 0 N/A N/A 3562 G /usr/bin/gnome-shell 53MiB |\n",
      "| 0 N/A N/A 214921 C .../envs/rapids-25.10/bin/python 196MiB |\n",
      "+-----------------------------------------------------------------------------------------+\n"
     ]
    }
   ],
   "source": [
    "!nvidia-smi # this should display information about available GPUs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5cd58071-4371-428b-8a02-9cd66e6cb91f",
   "metadata": {
    "id": "5cd58071-4371-428b-8a02-9cd66e6cb91f"
   },
   "source": [
    "# Download the data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9eb67713-7cf4-415a-bce7-ff4695862faa",
   "metadata": {
    "id": "9eb67713-7cf4-415a-bce7-ff4695862faa"
   },
   "source": [
    "## Overview\n",
    "The data we'll be working with summarizes job postings data that a developer working at a job listing firm might analyze to understand posting trends.\n",
    "\n",
    "We'll need to download a curated copy of this [Kaggle dataset](https://www.kaggle.com/datasets/asaniczka/1-3m-linkedin-jobs-and-skills-2024/data?select=job_summary.csv) directly via the Kaggle API.\n",
    "\n",
    "**Data License and Terms** <br>\n",
    "As this dataset originates from a Kaggle dataset, it's governed by that dataset's license and terms of use, which is the Open Data Commons license. Review it here: https://opendatacommons.org/licenses/by/1-0/index.html. For each dataset a user elects to use, the user is responsible for checking if the dataset license is fit for the intended purpose.\n",
    "\n",
    "**Are there restrictions on how I can use this data? </br>**\n",
    "For each dataset a user elects to use, the user is responsible for checking if the dataset license is fit for the intended purpose.\n",
    "\n",
    "## Get the Data\n",
    "First, [please follow these instructions from Kaggle to download and/or update your Kaggle API token to get access to the dataset](https://www.kaggle.com/discussions/general/74235).\n",
    "\n",
    "Once generated, make sure the **kaggle.json** file is in the same folder as the notebook.\n",
    "\n",
    "Next, run the code below, which should also take 1-2 minutes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "406838c6-267c-423e-82ab-ea13d5fa9c90",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: kaggle in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (1.7.4.5)\n",
      "Requirement already satisfied: bleach in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (6.2.0)\n",
      "Requirement already satisfied: certifi>=14.05.14 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2025.8.3)\n",
      "Requirement already satisfied: charset-normalizer in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (3.4.3)\n",
      "Requirement already satisfied: idna in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (3.10)\n",
      "Requirement already satisfied: protobuf in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (6.32.1)\n",
      "Requirement already satisfied: python-dateutil>=2.5.3 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2.9.0.post0)\n",
      "Requirement already satisfied: python-slugify in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (8.0.4)\n",
      "Requirement already satisfied: requests in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2.32.5)\n",
      "Requirement already satisfied: setuptools>=21.0.0 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (80.9.0)\n",
      "Requirement already satisfied: six>=1.10 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (1.17.0)\n",
      "Requirement already satisfied: text-unidecode in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (1.3)\n",
      "Requirement already satisfied: tqdm in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (4.67.1)\n",
      "Requirement already satisfied: urllib3>=1.15.1 in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (2.5.0)\n",
      "Requirement already satisfied: webencodings in /home/nvidia/miniconda3/envs/rapids-25.10/lib/python3.12/site-packages (from kaggle) (0.5.1)\n"
     ]
    }
   ],
   "source": [
    "!pip install kaggle\n",
    "!mkdir -p ~/.kaggle\n",
    "!cp kaggle.json ~/.kaggle/\n",
    "!chmod 600 ~/.kaggle/kaggle.json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "3efacb3c-5f3d-4ff0-b32a-76bbb80b5f74",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "3efacb3c-5f3d-4ff0-b32a-76bbb80b5f74",
    "outputId": "5fe4a878-cf57-44f9-e40e-ed413035b150"
   },
   "outputs": [],
   "source": [
    "# Download the dataset through the Kaggle API\n",
    "!kaggle datasets download -d asaniczka/1-3m-linkedin-jobs-and-skills-2024\n",
    "# Unzip the file to access the contents\n",
    "!unzip 1-3m-linkedin-jobs-and-skills-2024.zip"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2__ZMVe6LaBJ",
   "metadata": {
    "id": "2__ZMVe6LaBJ"
   },
   "source": [
    "# Analysis with cuDF Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "df47f304-2b30-4380-afd5-0613b63d103d",
   "metadata": {},
   "source": [
    "The magic command `%load_ext cudf.pandas` enables GPU acceleration for pandas data processing in a Jupyter notebook, allowing most pandas operations to automatically execute on NVIDIA GPUs for improved performance.\n",
    "\n",
    "With this extension loaded before importing pandas, your code can use standard pandas syntax while gaining the benefits of GPU speedup, automatically falling back to CPU execution for operations not supported on the GPU. This provides a seamless way to accelerate existing pandas workflows with zero code changes, especially for large data analytics tasks or machine learning preprocessing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "e5cd2520-30a6-41c1-b7c5-5abe0eb90d82",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext cudf.pandas"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "eadb8d77-cb45-4c7c-ae9f-77e47a4f29b3",
   "metadata": {
    "id": "eadb8d77-cb45-4c7c-ae9f-77e47a4f29b3"
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "196268f2-6169-4ed7-a9e6-db9078caa6ab",
   "metadata": {
    "id": "196268f2-6169-4ed7-a9e6-db9078caa6ab"
   },
   "source": [
    "We'll run a piece of code to get a feel for what GPU acceleration brings to pandas workflows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae3b6a16-ff72-4421-b43c-06c33f57ec12",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "ae3b6a16-ff72-4421-b43c-06c33f57ec12",
    "outputId": "656acbf7-078f-42b3-832d-ad4e84e01c70"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 185 ms, sys: 2.08 s, total: 2.27 s\n",
      "Wall time: 2.95 s\n",
      "Dataset Size (in GB): 4.76\n"
     ]
    }
   ],
   "source": [
    "%%time \n",
    "job_summary_df = pd.read_csv(\"job_summary.csv\", dtype=('str'))\n",
    "print(\"Dataset Size (in GB):\", round(job_summary_df.memory_usage(\n",
    "    deep=True).sum()/(1024**3), 2))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01c506e1-f135-4afb-8fc7-23e72c05d73c",
   "metadata": {
    "id": "01c506e1-f135-4afb-8fc7-23e72c05d73c"
   },
   "source": [
    "The same dataset takes around 1.5 minutes to load with plain pandas. That's around a **5x speedup** with no changes to the code!"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d9d0a0e1-1d74-494d-bd12-b829f11eeede",
   "metadata": {
    "id": "d9d0a0e1-1d74-494d-bd12-b829f11eeede"
   },
   "source": [
    "Let's load the remaining two datasets as well:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "12e4cf7e-8824-4822-9d30-46b81ba2acd7",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "12e4cf7e-8824-4822-9d30-46b81ba2acd7",
    "outputId": "5ca1be17-09e3-40ab-928b-82176bf597bf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 45.3 ms, sys: 199 ms, total: 244 ms\n",
      "Wall time: 354 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "job_skills_df = pd.read_csv(\"job_skills.csv\", dtype=('str'))\n",
    "job_postings_df = pd.read_csv(\"linkedin_job_postings.csv\", dtype=('str'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "13c8f9da-121f-4311-8a79-274425363e5e",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 276
    },
    "id": "13c8f9da-121f-4311-8a79-274425363e5e",
    "outputId": "a73599c1-05b2-4f56-a190-c69c017bb330"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 4.46 ms, sys: 3.1 ms, total: 7.56 ms\n",
      "Wall time: 46.3 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0 957\n",
       "1 3816\n",
       "2 5314\n",
       "3 2774\n",
       "4 2749\n",
       "Name: summary_length, dtype: int32"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "job_summary_df['summary_length'] = job_summary_df['job_summary'].str.len()\n",
    "job_summary_df['summary_length'].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67b68792-5c64-4ebd-9d80-cf6ff55baeef",
   "metadata": {
    "id": "67b68792-5c64-4ebd-9d80-cf6ff55baeef"
   },
   "source": [
    "That was lightning fast! We went from around 10+ seconds (with pandas) to a few milliseconds."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "31e1cc84-debb-4da7-bc20-5c7139f786f7",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 504
    },
    "id": "31e1cc84-debb-4da7-bc20-5c7139f786f7",
    "outputId": "2d89fc49-7e5b-41db-c25b-441d54480711"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 39.8 ms, sys: 30 ms, total: 69.8 ms\n",
      "Wall time: 211 ms\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       " job_link \\\n",
       "0 https://www.linkedin.com/jobs/view/account-exe... \n",
       "1 https://www.linkedin.com/jobs/view/registered-... \n",
       "2 https://www.linkedin.com/jobs/view/restaurant-... \n",
       "3 https://www.linkedin.com/jobs/view/independent... \n",
       "4 https://www.linkedin.com/jobs/view/group-unit-... \n",
       "\n",
       " last_processed_time got_summary got_ner is_being_worked \\\n",
       "0 2024-01-21 07:12:29.00256+00 t t f \n",
       "1 2024-01-21 07:39:58.88137+00 t t f \n",
       "2 2024-01-21 07:40:00.251126+00 t t f \n",
       "3 2024-01-21 07:40:00.308133+00 t t f \n",
       "4 2024-01-19 09:45:09.215838+00 f f f \n",
       "\n",
       " job_title \\\n",
       "0 Account Executive - Dispensing (NorCal/Norther... \n",
       "1 Registered Nurse - RN Care Manager \n",
       "2 RESTAURANT SUPERVISOR - THE FORKLIFT \n",
       "3 Independent Real Estate Agent \n",
       "4 Group/Unit Supervisor (Systems Support Manager... \n",
       "\n",
       " company job_location first_seen \\\n",
       "0 BD San Diego, CA 2024-01-15 \n",
       "1 Trinity Health MI Norton Shores, MI 2024-01-14 \n",
       "2 Wasatch Adaptive Sports Sandy, UT 2024-01-14 \n",
       "3 Howard Hanna | Rand Realty Englewood Cliffs, NJ 2024-01-16 \n",
       "4 IRS, Office of Chief Counsel Chamblee, GA 2024-01-17 \n",
       "\n",
       " search_city search_country search_position \\\n",
       "0 Coronado United States Color Maker \n",
       "1 Grand Haven United States Director Nursing Service \n",
       "2 Tooele United States Stand-In \n",
       "3 Pinehurst United States Real-Estate Clerk \n",
       "4 Gadsden United States Supervisor Travel-Information Center \n",
       "\n",
       " job_level job_type job_summary \\\n",
       "0 Mid senior Onsite Responsibilities\\nJob Description Summary\\nJob... \n",
       "1 Mid senior Onsite Employment Type:\\nFull time\\nShift:\\nDescripti... \n",
       "2 Mid senior Onsite Job Details\\nDescription\\nWhat You'll Do\\nAs a... \n",
       "3 Mid senior Onsite Who We Are\\nRand Realty is a family-owned brok... \n",
       "4 Mid senior Onsite None \n",
       "\n",
       " summary_length \n",
       "0 4602 \n",
       "1 2950 \n",
       "2 4571 \n",
       "3 3944 \n",
       "4 <NA> "
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "df_merged=pd.merge(job_postings_df, job_summary_df, how=\"left\", on=\"job_link\")\n",
    "df_merged.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "0160a559-2b17-40a6-ad9d-34ce746236d0",
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 490
    },
    "id": "0160a559-2b17-40a6-ad9d-34ce746236d0",
    "outputId": "e397c28b-a90d-42d2-8a9a-4c6260c45b38"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 33.2 ms, sys: 17.3 ms, total: 50.6 ms\n",
      "Wall time: 120 ms\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>summary_length</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>company</th>\n",
       "      <th>job_title</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ClickJobs.io</th>\n",
       "      <th>Adolescent Behavioral Health Therapist - Substance Use Specialty (Entry Senior Level) Psychiatry</th>\n",
       "      <td>23748.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Mt. San Antonio College</th>\n",
       "      <th>Chief, Police and Campus Safety</th>\n",
       "      <td>22998.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CareerBeacon</th>\n",
       "      <th>Airside/Groundside Project Manager [Halifax International Airport Authority]</th>
|
" <td>22938.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>Tacoma Community College</th>\n",
|
||||||
|
" <th>Anthropology Professor - Part-time</th>\n",
|
||||||
|
" <td>22790.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>IRS, Office of Chief Counsel</th>\n",
|
||||||
|
" <th>Program Analyst (12-Month Roster)</th>\n",
|
||||||
|
" <td>22774.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>...</th>\n",
|
||||||
|
" <th>...</th>\n",
|
||||||
|
" <td>...</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th rowspan=\"4\" valign=\"top\">鴻海精密工業股份有限公司</th>\n",
|
||||||
|
" <th>HR Specialist - Payroll & Benefit</th>\n",
|
||||||
|
" <td>0.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>Material Planner</th>\n",
|
||||||
|
" <td>0.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>RFQ Specialist</th>\n",
|
||||||
|
" <td>0.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>Supply Chain Program Manager</th>\n",
|
||||||
|
" <td>0.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>🌟Daniel-Scott Recruitment Ltd🌟</th>\n",
|
||||||
|
" <th>IT Manager</th>\n",
|
||||||
|
" <td>0.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" </tbody>\n",
|
||||||
|
"</table>\n",
|
||||||
|
"<p>801276 rows × 1 columns</p>\n",
|
||||||
|
"</div>"
|
||||||
|
],
|
||||||
|
"text/plain": [
|
||||||
|
" summary_length\n",
|
||||||
|
"company job_title \n",
|
||||||
|
"ClickJobs.io Adolescent Behavioral Health Therapist - Substa... 23748.0\n",
|
||||||
|
"Mt. San Antonio College Chief, Police and Campus Safety 22998.0\n",
|
||||||
|
"CareerBeacon Airside/Groundside Project Manager [Halifax Int... 22938.0\n",
|
||||||
|
"Tacoma Community College Anthropology Professor - Part-time 22790.0\n",
|
||||||
|
"IRS, Office of Chief Counsel Program Analyst (12-Month Roster) 22774.0\n",
|
||||||
|
"... ...\n",
|
||||||
|
"鴻海精密工業股份有限公司 HR Specialist - Payroll & Benefit 0.0\n",
|
||||||
|
" Material Planner 0.0\n",
|
||||||
|
" RFQ Specialist 0.0\n",
|
||||||
|
" Supply Chain Program Manager 0.0\n",
|
||||||
|
"🌟Daniel-Scott Recruitment Ltd🌟 IT Manager 0.0\n",
|
||||||
|
"\n",
|
||||||
|
"[801276 rows x 1 columns]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 40,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"df_merged.groupby(['company',\"job_title\"]).agg({\n",
|
||||||
|
" \"summary_length\":\"mean\"}).sort_values(by='summary_length', ascending = False).fillna(0)"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "IME4urGYQ3qS",
|
||||||
|
"metadata": {
|
||||||
|
"id": "IME4urGYQ3qS"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"We went down from around 5 seconds to less than a second here. This is in line with our speedups on other operations!"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 41,
|
||||||
|
"id": "adc00726-f151-41f4-8731-a1ce1f83eea2",
|
||||||
|
"metadata": {
|
||||||
|
"colab": {
|
||||||
|
"base_uri": "https://localhost:8080/",
|
||||||
|
"height": 458
|
||||||
|
},
|
||||||
|
"id": "adc00726-f151-41f4-8731-a1ce1f83eea2",
|
||||||
|
"outputId": "46423696-b167-4ffe-bb3b-9de7f3e6d668"
|
||||||
|
},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"CPU times: user 13.7 ms, sys: 20.3 ms, total: 34 ms\n",
|
||||||
|
"Wall time: 156 ms\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/html": [
|
||||||
|
"<div>\n",
|
||||||
|
"<style scoped>\n",
|
||||||
|
" .dataframe tbody tr th:only-of-type {\n",
|
||||||
|
" vertical-align: middle;\n",
|
||||||
|
" }\n",
|
||||||
|
"\n",
|
||||||
|
" .dataframe tbody tr th {\n",
|
||||||
|
" vertical-align: top;\n",
|
||||||
|
" }\n",
|
||||||
|
"\n",
|
||||||
|
" .dataframe thead th {\n",
|
||||||
|
" text-align: right;\n",
|
||||||
|
" }\n",
|
||||||
|
"</style>\n",
|
||||||
|
"<table border=\"1\" class=\"dataframe\">\n",
|
||||||
|
" <thead>\n",
|
||||||
|
" <tr style=\"text-align: right;\">\n",
|
||||||
|
" <th></th>\n",
|
||||||
|
" <th>job_title</th>\n",
|
||||||
|
" <th>job_location</th>\n",
|
||||||
|
" <th>summary_length</th>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" </thead>\n",
|
||||||
|
" <tbody>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>0</th>\n",
|
||||||
|
" <td>🔥Nurse Manager, Patient Services - Operating Room</td>\n",
|
||||||
|
" <td>Lake George, NY</td>\n",
|
||||||
|
" <td>7342.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>1</th>\n",
|
||||||
|
" <td>🔥Behavioral Health RN 3 12s</td>\n",
|
||||||
|
" <td>Glens Falls, NY</td>\n",
|
||||||
|
" <td>2787.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>2</th>\n",
|
||||||
|
" <td>🔥 Surgical Technologist - Evenings</td>\n",
|
||||||
|
" <td>Lake George, NY</td>\n",
|
||||||
|
" <td>2920.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>3</th>\n",
|
||||||
|
" <td>🔥 Physician Practice Clinical Lead RN</td>\n",
|
||||||
|
" <td>Saratoga Springs, NY</td>\n",
|
||||||
|
" <td>2945.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>4</th>\n",
|
||||||
|
" <td>🔥 Physican Practice LPN - Green</td>\n",
|
||||||
|
" <td>Lake George, NY</td>\n",
|
||||||
|
" <td>2969.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>...</th>\n",
|
||||||
|
" <td>...</td>\n",
|
||||||
|
" <td>...</td>\n",
|
||||||
|
" <td>...</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>1104106</th>\n",
|
||||||
|
" <td>\"Attorney\" (Gov Appt/Non-Merit) Jobs</td>\n",
|
||||||
|
" <td>Kentucky, United States</td>\n",
|
||||||
|
" <td>2427.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>1104107</th>\n",
|
||||||
|
" <td>\"Accountant\"</td>\n",
|
||||||
|
" <td>Shavano Park, TX</td>\n",
|
||||||
|
" <td>1497.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>1104108</th>\n",
|
||||||
|
" <td>\"Accountant\"</td>\n",
|
||||||
|
" <td>Basking Ridge, NJ</td>\n",
|
||||||
|
" <td>1073.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>1104109</th>\n",
|
||||||
|
" <td>\"Accountant\"</td>\n",
|
||||||
|
" <td>Austin, TX</td>\n",
|
||||||
|
" <td>1993.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" <tr>\n",
|
||||||
|
" <th>1104110</th>\n",
|
||||||
|
" <td>\"A\" Softball Coach - Central Middle School</td>\n",
|
||||||
|
" <td>East Corinth, ME</td>\n",
|
||||||
|
" <td>718.0</td>\n",
|
||||||
|
" </tr>\n",
|
||||||
|
" </tbody>\n",
|
||||||
|
"</table>\n",
|
||||||
|
"<p>1104111 rows × 3 columns</p>\n",
|
||||||
|
"</div>"
|
||||||
|
],
|
||||||
|
"text/plain": [
|
||||||
|
" job_title \\\n",
|
||||||
|
"0 🔥Nurse Manager, Patient Services - Operating Room \n",
|
||||||
|
"1 🔥Behavioral Health RN 3 12s \n",
|
||||||
|
"2 🔥 Surgical Technologist - Evenings \n",
|
||||||
|
"3 🔥 Physician Practice Clinical Lead RN \n",
|
||||||
|
"4 🔥 Physican Practice LPN - Green \n",
|
||||||
|
"... ... \n",
|
||||||
|
"1104106 \"Attorney\" (Gov Appt/Non-Merit) Jobs \n",
|
||||||
|
"1104107 \"Accountant\" \n",
|
||||||
|
"1104108 \"Accountant\" \n",
|
||||||
|
"1104109 \"Accountant\" \n",
|
||||||
|
"1104110 \"A\" Softball Coach - Central Middle School \n",
|
||||||
|
"\n",
|
||||||
|
" job_location summary_length \n",
|
||||||
|
"0 Lake George, NY 7342.0 \n",
|
||||||
|
"1 Glens Falls, NY 2787.0 \n",
|
||||||
|
"2 Lake George, NY 2920.0 \n",
|
||||||
|
"3 Saratoga Springs, NY 2945.0 \n",
|
||||||
|
"4 Lake George, NY 2969.0 \n",
|
||||||
|
"... ... ... \n",
|
||||||
|
"1104106 Kentucky, United States 2427.0 \n",
|
||||||
|
"1104107 Shavano Park, TX 1497.0 \n",
|
||||||
|
"1104108 Basking Ridge, NJ 1073.0 \n",
|
||||||
|
"1104109 Austin, TX 1993.0 \n",
|
||||||
|
"1104110 East Corinth, ME 718.0 \n",
|
||||||
|
"\n",
|
||||||
|
"[1104111 rows x 3 columns]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 41,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"%%time\n",
|
||||||
|
"# Group by company, job_title, and month, and calculate the mean of summary_length\n",
|
||||||
|
"grouped_df = df_merged.groupby(['job_title', 'job_location']).agg({'summary_length': 'mean'})\n",
|
||||||
|
"\n",
|
||||||
|
"# Reset index to sort by job_title and month\n",
|
||||||
|
"grouped_df = grouped_df.reset_index()\n",
|
||||||
|
"\n",
|
||||||
|
"# Sort by job_title and month\n",
|
||||||
|
"sorted_df = grouped_df.sort_values(by=['job_title', 'job_location','summary_length'],\n",
|
||||||
|
" ascending=False).reset_index(drop=True).fillna(0)\n",
|
||||||
|
"sorted_df"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "08c97b81-64c5-48fb-8fe0-d36789cf3deb",
|
||||||
|
"metadata": {
|
||||||
|
"id": "08c97b81-64c5-48fb-8fe0-d36789cf3deb"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"The acceleration is consistently 10x+ for complex aggregations and sorting that involve multiple columns."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"id": "9bcc719b-666a-4bc9-97d6-16f448b5c707",
|
||||||
|
"metadata": {
|
||||||
|
"id": "9bcc719b-666a-4bc9-97d6-16f448b5c707"
|
||||||
|
},
|
||||||
|
"source": [
|
||||||
|
"# Summary\n",
|
||||||
|
"\n",
|
||||||
|
"With cudf.pandas, you can keep using pandas as your primary dataframe library. When things start to get a little slow, just load the `cudf.pandas` extension and enjoy the incredible speedups.\n",
|
||||||
|
"\n",
|
||||||
|
"To learn more about cudf.pandas, we encourage you to visit https://rapids.ai/cudf-pandas."
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"accelerator": "GPU",
|
||||||
|
"colab": {
|
||||||
|
"gpuType": "T4",
|
||||||
|
"provenance": []
|
||||||
|
},
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3 (ipykernel)",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"file_extension": ".py",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"name": "python",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"version": "3.12.11"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5
|
||||||
|
}
|
||||||
2311 nvidia/cuda-x-data-science/assets/cuml_sklearn_demo.ipynb (new file)
File diff suppressed because one or more lines are too long

190 nvidia/vibe-coding/README.md (new file)
@@ -0,0 +1,190 @@
# Vibe Coding in VS Code

> Use DGX Spark as a local or remote Vibe Coding assistant with Ollama and Continue

## Table of Contents

- [Overview](#overview)
- [What You'll Accomplish](#what-youll-accomplish)
- [Prerequisites](#prerequisites)
- [Requirements](#requirements)
- [Instructions](#instructions)
- [Troubleshooting](#troubleshooting)

---

## Overview

## DGX Spark Vibe Coding

This playbook walks you through setting up DGX Spark as a **Vibe Coding assistant** — locally or as a remote coding companion for VSCode with Continue.dev.

This guide uses **Ollama** with **GPT-OSS 120B** for easy deployment of a coding assistant to VSCode. It also includes advanced instructions for making the DGX Spark and Ollama coding assistant available over your local network. This guide assumes a **fresh installation** of the OS. If your OS is not freshly installed and you run into issues, see the troubleshooting section at the bottom of this document.

### What You'll Accomplish

You'll have a fully configured DGX Spark system capable of:
- Running local code assistance through Ollama.
- Serving models remotely for Continue and VSCode integration.
- Hosting large LLMs like GPT-OSS 120B using unified memory.

### Prerequisites

- DGX Spark (128GB unified memory recommended)
- Internet access for model downloads
- Basic familiarity with the terminal
- Optional: firewall control for remote access configuration

### Requirements

- **Ollama** and an LLM of your choice (e.g., `gpt-oss:120b`)
- **VSCode**
- **Continue** VSCode extension
- Basic familiarity with opening the Linux terminal and copying and pasting commands
- sudo access

## Instructions

## Step 1. Install Ollama

Install the latest version of Ollama using the following command:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Once the service is running, pull the desired model:

```bash
ollama pull gpt-oss:120b
```
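After the pull completes, you can sanity-check the model from any language that speaks HTTP. Below is a minimal Python sketch against Ollama's `/api/generate` endpoint; the localhost URL and model name are assumptions matching this guide's defaults:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama endpoint used in this guide

def build_payload(model: str, prompt: str) -> dict:
    # Minimal non-streaming request body for Ollama's /api/generate endpoint
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    # POST the prompt and return the completed response text
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("gpt-oss:120b", "Write a haiku about GPUs.")` requires the Ollama service to be running and the model to be pulled; if it fails, see the troubleshooting section.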
## Step 2. (Optional) Enable Remote Access

To allow remote connections (e.g., from a workstation using VSCode and Continue), modify the Ollama systemd service:

```bash
sudo systemctl edit ollama
```

Add the following lines beneath the commented section:

```ini
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
```

Reload and restart the service:

```bash
sudo systemctl daemon-reload
sudo systemctl restart ollama
```

If using a firewall, open port 11434:

```bash
sudo ufw allow 11434/tcp
```

Verify that the workstation can connect to your DGX Spark's Ollama server:

```bash
curl -v http://YOUR_SPARK_IP:11434/api/version
```

Replace YOUR_SPARK_IP with your DGX Spark's IP address.
If the connection fails, see the troubleshooting section at the bottom of this document.
## Step 3. Install VSCode

For DGX Spark (ARM-based), download and install VSCode:
Navigate to https://code.visualstudio.com/download and download the Linux ARM64 version of VSCode. After
the download completes, note the package name and use it in the next command in place of DOWNLOADED_PACKAGE_NAME.

```bash
sudo dpkg -i DOWNLOADED_PACKAGE_NAME
```

If using a remote workstation, **install VSCode appropriate for your system architecture**.

## Step 4. Install Continue.dev Extension

Open VSCode and install **Continue.dev** from the Marketplace.
After installation, click the Continue icon on the right-hand bar.

## Step 5. Local Inference Setup

- Click **Or, configure your own models**
- Click **Click here to view more providers**
- Choose **Ollama** as the provider.
- For **Model**, select **Autodetect**.
- Test inference by sending a test prompt.

Your downloaded model (e.g., `gpt-oss:120b`) will now be the default for inference.
## Step 6. Setting up a Workstation to Connect to the DGX Spark's Ollama Server

To connect a workstation running VSCode to a remote DGX Spark instance, complete the following on that workstation:
- Install Continue from the marketplace.
- Click on the Continue icon on the left pane.
- Click **Or, configure your own models**
- Click **Click here to view more providers**
- Select **Ollama** from the provider list.
- Select **Autodetect** as the model.

Continue **will** fail to detect the model, as it is attempting to connect to a locally hosted Ollama server.
- Find the **gear** icon in the upper right corner of the chat window and click on it.
- On the left pane, click **Models**
- Next to the first dropdown menu under **Chat**, click the gear icon.
- Continue's config.yaml will open. Take note of your DGX Spark's IP address.
- Replace the configuration with the following. **YOUR_SPARK_IP** should be replaced with your DGX Spark's IP.

```yaml
name: Config
version: 1.0.0
schema: v1

assistants:
  - name: default
    model: OllamaSpark

models:
  - name: OllamaSpark
    provider: ollama
    model: gpt-oss:120b
    apiBase: http://YOUR_SPARK_IP:11434
    title: gpt-oss:120b
    roles:
      - chat
      - edit
      - autocomplete
```

Replace `YOUR_SPARK_IP` with the IP address of your DGX Spark.
Add additional model entries for any other Ollama models you wish to host remotely.
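Before editing config.yaml, it can help to confirm which models the Spark's Ollama server actually exposes. A small Python sketch against Ollama's `/api/tags` endpoint; the `api_base` value you pass is an assumption for your setup (substitute your Spark's address):

```python
import json
import urllib.request

def parse_model_names(tags_json: dict) -> list:
    # Extract model names from an /api/tags response body
    return [m["name"] for m in tags_json.get("models", [])]

def list_models(api_base: str) -> list:
    # GET /api/tags returns every model the Ollama server has pulled
    with urllib.request.urlopen(f"{api_base}/api/tags") as resp:
        return parse_model_names(json.load(resp))
```

`list_models("http://YOUR_SPARK_IP:11434")` should include `gpt-oss:120b` if Steps 1 and 2 succeeded; any name it returns is valid for the `model:` field in config.yaml.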
## Troubleshooting

## Common Issues

**1. Ollama not starting**
- Verify GPU drivers are installed correctly.
  Run `nvidia-smi` in the terminal. If the command fails, check DGX Dashboard for updates to your DGX Spark.
  If there are no updates, or updates do not correct the issue, create a thread on the DGX Spark/GB10 user forum here:
  https://forums.developer.nvidia.com/c/accelerated-computing/dgx-spark-gb10/dgx-spark-gb10/
- Run `ollama serve` on the DGX Spark to view Ollama logs.

**2. Continue can't connect over the network**
- Ensure port 11434 is open and accessible from your workstation.

```bash
ss -tuln | grep 11434
```

If the output does not show Ollama listening on `*:11434` (e.g., `tcp LISTEN 0 4096 *:11434 *:*`),
go back to Step 2 and re-run the configuration and firewall commands.
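The same reachability check can be run from the workstation side. A minimal Python sketch; the host you pass is an assumption for your setup, and 11434 is Ollama's default port from this guide:

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    # True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `port_open("YOUR_SPARK_IP", 11434)` returns `False`, the likely culprits are the firewall rule or the `OLLAMA_HOST` binding from Step 2.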
**3. Continue can't detect a locally running Ollama model**
- Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf`.
- If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly and detection still fails, also export those variables in your `.bashrc`.

**4. High memory usage**
- Use smaller models such as `gpt-oss:20b` for lightweight usage.
- Confirm no other large models or containers are running with `nvidia-smi`.
@@ -118,7 +118,7 @@ sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
 ## Step 1. Configure network connectivity
 
-Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/stack-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.
+Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/connect-two-sparks) playbook to establish connectivity between your DGX Spark nodes.
 
 This includes:
 - Physical QSFP cable connection

@@ -340,7 +340,7 @@ http://192.168.100.10:8265
 | Container registry authentication fails | Invalid or expired GitLab token | Generate new auth token |
 | SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source |
 
-## Common Issues for running on two Starks
+## Common Issues for running on two Sparks
 | Symptom | Cause | Fix |
 |---------|--------|-----|
 | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |

@@ -1,4 +1,4 @@
-# Video Search and Summarization
+# Build a Video Search and Summarization (VSS) Agent
 
 > Run the VSS Blueprint on your Spark
 

@@ -30,8 +30,8 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel
 ## Prerequisites
 
 - NVIDIA Spark device with ARM64 architecture and Blackwell GPU
-- FastOS 1.81.38 or compatible ARM64 system
+- NVIDIA DGX OS 7.2.3 or higher
-- Driver version 580.82.09 or higher installed: `nvidia-smi | grep "Driver Version"`
+- Driver version 580.95.05 or higher installed: `nvidia-smi | grep "Driver Version"`
 - CUDA version 13.0 installed: `nvcc --version`
 - Docker installed and running: `docker --version && docker compose version`
 - Access to NVIDIA Container Registry with [NGC API Key](https://org.ngc.nvidia.com/setup/api-keys)

@@ -278,6 +278,10 @@ Open these URLs in your browser:
 In this hybrid deployment, we would use NIMs from [build.nvidia.com](https://build.nvidia.com/). Alternatively, you can configure your own hosted endpoints by following the instructions in the [VSS remote deployment guide](https://docs.nvidia.com/vss/latest/content/installation-remote-docker-compose.html).
 
+> [!NOTE]
+> Fully local deployment using smaller LLM (Llama 3.1 8B) is also possible.
+> To set up a fully local VSS deployment, follow the [instructions in the VSS documentation](https://docs.nvidia.com/vss/latest/content/vss_dep_docker_compose_arm.html#local-deployment-single-gpu-dgx-spark).
 
 **9.1 Get NVIDIA API Key**
 
 - Log in to https://build.nvidia.com/explore/discover.