From b9c45a61a062f6f1a4d00fa9fab9fb438a683d7c Mon Sep 17 00:00:00 2001
From: GitLab CI
Date: Fri, 10 Oct 2025 17:27:37 +0000
Subject: [PATCH] chore: Regenerate all playbooks

---
 README.md                     |   3 +-
 nvidia/stack-sparks/README.md |   2 +-
 nvidia/txt2kg/README.md       |  24 +++---
 nvidia/vibe-coding/README.md  | 153 ----------------------------------
 4 files changed, 13 insertions(+), 169 deletions(-)
 delete mode 100644 nvidia/vibe-coding/README.md

diff --git a/README.md b/README.md
index 16d037c..ddff0e1 100644
--- a/README.md
+++ b/README.md
@@ -42,12 +42,11 @@ Each playbook includes prerequisites, step-by-step instructions, troubleshooting
 - [RAG application in AI Workbench](nvidia/rag-ai-workbench/)
 - [SGLang Inference Server](nvidia/sglang/)
 - [Speculative Decoding](nvidia/speculative-decoding/)
-- [Connect two Sparks](nvidia/stack-sparks/)
+- [Stack two Sparks](nvidia/stack-sparks/)
 - [Set up Tailscale on your Spark](nvidia/tailscale/)
 - [TRT LLM for Inference](nvidia/trt-llm/)
 - [Text to Knowledge Graph](nvidia/txt2kg/)
 - [Unsloth on DGX Spark](nvidia/unsloth/)
-- [Vibe Coding in VS Code](nvidia/vibe-coding/)
 - [Install and use vLLM](nvidia/vllm/)
 - [Vision-Language Model Fine-tuning](nvidia/vlm-finetuning/)
 - [Install VS Code](nvidia/vscode/)
diff --git a/nvidia/stack-sparks/README.md b/nvidia/stack-sparks/README.md
index 2b2213b..d3b8bcc 100644
--- a/nvidia/stack-sparks/README.md
+++ b/nvidia/stack-sparks/README.md
@@ -1,4 +1,4 @@
-# Connect two Sparks
+# Stack two Sparks
 
 > Connect two Spark devices and setup them up for inference and fine-tuning
 
diff --git a/nvidia/txt2kg/README.md b/nvidia/txt2kg/README.md
index 1473e67..ba3fb9c 100644
--- a/nvidia/txt2kg/README.md
+++ b/nvidia/txt2kg/README.md
@@ -1,6 +1,6 @@
 # Text to Knowledge Graph
 
-> Transform unstructured text using LLM inference into interactive knowledge graphs with GPU-accelerated visualization
+> Transform unstructured text into interactive knowledge graphs using local GPU-accelerated LLM inference and graph visualization
 
 ## Table of Contents
 
@@ -20,16 +20,16 @@ The unified memory architecture enables running larger, more accurate models tha
 This txt2kg playbook transforms unstructured text documents into structured knowledge graphs using:
 - **Knowledge Triple Extraction**: Using Ollama with GPU acceleration for local LLM inference to extract subject-predicate-object relationships
 - **Graph Database Storage**: ArangoDB for storing and querying knowledge triples with relationship traversal
-- **Vector Embeddings**: Local SentenceTransformer models for entity embeddings and semantic search
 - **GPU-Accelerated Visualization**: Three.js WebGPU rendering for interactive 2D/3D graph exploration
 
+> **Future Enhancements**: Vector embeddings and GraphRAG capabilities are planned enhancements.
+
 ## What you'll accomplish
 
 You will have a fully functional system capable of processing documents, generating and editing knowledge graphs, and providing querying, accessible through an interactive web interface. The setup includes:
 
 - **Local LLM Inference**: Ollama for GPU-accelerated LLM inference with no API keys required
 - **Graph Database**: ArangoDB for storing and querying triples with relationship traversal
-- **Vector Search**: Local Pinecone-compatible storage for entity embeddings and KNN search
 - **Interactive Visualization**: GPU-accelerated graph rendering with Three.js WebGPU
 - **Modern Web Interface**: Next.js frontend with document management and query interface
 - **Fully Containerized**: Reproducible deployment with Docker Compose and GPU support
@@ -67,7 +67,7 @@ cd ${MODEL}/assets
 
 ## Step 2. Start the txt2kg services
 
-Use the provided start script to launch all required services. This will set up Ollama, ArangoDB, local Pinecone, and the Next.js frontend:
+Use the provided start script to launch all required services. This will set up Ollama, ArangoDB, and the Next.js frontend:
 
 ```bash
 ./start.sh
@@ -77,7 +77,6 @@ The script will automatically:
 - Check for GPU availability
 - Start Docker Compose services
 - Set up ArangoDB database
-- Initialize local Pinecone vector storage
 - Launch the web interface
 
 ## Step 3. Pull an Ollama model (optional)
@@ -90,7 +89,7 @@ docker exec ollama-compose ollama pull
 
 Browse available models at [https://ollama.com/search](https://ollama.com/search)
 
-> **Note**: The unified memory architecture enables running larger models like 70B parameters, which produce significantly more accurate knowledge triples and deliver superior GraphRAG performance.
+> **Note**: The unified memory architecture enables running larger models like 70B parameters, which produce significantly more accurate knowledge triples.
 
 ## Step 4. Access the web interface
 
@@ -103,7 +102,6 @@ http://localhost:3001
 You can also access individual services:
 - **ArangoDB Web Interface**: http://localhost:8529
 - **Ollama API**: http://localhost:11434
-- **Local Pinecone**: http://localhost:5081
 
 ## Step 5. Upload documents and build knowledge graphs
 
@@ -114,19 +112,19 @@
 #### 5.2. Knowledge Graph Generation
 - The system extracts subject-predicate-object triples using Ollama
 - Triples are stored in ArangoDB for relationship querying
-- Entity embeddings are generated and stored in local Pinecone (optional)
 
 #### 5.3. Interactive Visualization
 - View your knowledge graph in 2D or 3D with GPU-accelerated rendering
 - Explore nodes and relationships interactively
 
-#### 5.4. Graph-based RAG Queries
+#### 5.4. Graph-based Queries
 - Ask questions about your documents using the query interface
 - Graph traversal enhances context with entity relationships from ArangoDB
-- The system uses KNN search to find relevant entities in the vector database (optional)
 - LLM generates responses using the enriched graph context
 
-## Step 7. Cleanup and rollback
+> **Future Enhancement**: GraphRAG capabilities with vector-based KNN search for entity retrieval are planned.
+
+## Step 6. Cleanup and rollback
 
 Stop all services and optionally remove containers:
 
@@ -141,11 +139,11 @@ docker compose down -v
 docker exec ollama-compose ollama rm llama3.1:8b
 ```
 
-## Step 8. Next steps
+## Step 7. Next steps
 
 - Experiment with different Ollama models for varied extraction quality
 - Customize triple extraction prompts for domain-specific knowledge
-- Explore advanced Graph-based RAG features
+- Explore advanced graph querying and visualization features
 
 ## Troubleshooting
diff --git a/nvidia/vibe-coding/README.md b/nvidia/vibe-coding/README.md
deleted file mode 100644
index 67a7627..0000000
--- a/nvidia/vibe-coding/README.md
+++ /dev/null
@@ -1,153 +0,0 @@
-# Vibe Coding in VS Code
-
-> Use DGX Spark as a local or remote Vibe Coding assistant with Ollama and Continue.dev
-
-## Table of Contents
-
-- [Overview](#overview)
-  - [What You'll Accomplish](#what-youll-accomplish)
-  - [Prerequisites](#prerequisites)
-  - [Requirements](#requirements)
-- [Instructions](#instructions)
-- [Troubleshooting](#troubleshooting)
-
----
-
-## Overview
-
-## DGX Spark Vibe Coding
-
-This playbook walks you through setting up DGX Spark as a **Vibe Coding assistant** — locally or as a remote coding companion for VSCode with Continue.dev.
-While NVIDIA NIMs are not yet widely supported, this guide uses **Ollama** with **GPT-OSS 120B** to provide a high-performance local LLM environment.
-
-### What You'll Accomplish
-
-You’ll have a fully configured DGX Spark system capable of:
-- Running local code assistance through Ollama.
-- Serving models remotely for Continue.dev and VSCode integration.
-- Hosting large LLMs like GPT-OSS 120B using unified memory.
-
-### Prerequisites
-
-- DGX Spark (128GB unified memory recommended)
-- Internet access for model downloads
-- Basic familiarity with the terminal
-- Optional: firewall control for remote access configuration
-
-### Requirements
-
-- **Ollama** and an LLM of your choice (e.g., `gpt-oss:120b`)
-- **VSCode**
-- **Continue.dev** VSCode extension
-
-## Instructions
-
-## Step 1. Install Ollama
-
-Install the latest version of Ollama using the following command:
-
-```bash
-curl -fsSL https://ollama.com/install.sh | sh
-```
-
-Start the Ollama service:
-
-```bash
-ollama serve
-```
-
-Once the service is running, pull the desired model:
-
-```bash
-ollama pull gpt-oss:120b
-```
-
-## Step 2. (Optional) Enable Remote Access
-
-To allow remote connections (e.g., from a workstation using VSCode and Continue.dev), modify the Ollama systemd service:
-
-```bash
-sudo systemctl edit ollama
-```
-
-Add the following lines beneath the commented section:
-
-```ini
-[Service]
-Environment="OLLAMA_HOST=0.0.0.0:11434"
-Environment="OLLAMA_ORIGINS=*"
-```
-
-Reload and restart the service:
-
-```bash
-sudo systemctl daemon-reload
-sudo systemctl restart ollama
-```
-
-If using a firewall, open port 11434:
-
-```bash
-sudo ufw allow 11434/tcp
-```
-
-## Step 3. Install VSCode
-
-For DGX Spark (ARM-based), download and install VSCode:
-
-```bash
-wget https://code.visualstudio.com/sha/download?build=stable&os=linux-deb-arm64 -O vscode-arm64.deb
-sudo apt install ./vscode-arm64.deb
-```
-
-If using a remote workstation, install VSCode appropriate for your system architecture.
-
-## Step 4. Install Continue.dev Extension
-
-Open VSCode and install **Continue.dev** from the Marketplace.
-After installation, click the Continue icon on the right-hand bar.
-
-Skip login and open the manual configuration via the **gear (⚙️)** icon.
-This opens `config.yaml`, which controls model settings.
-
-## Step 5. Local Inference Setup
-
-- In the Continue chat window, use `Ctrl/Cmd + L` to focus the chat.
-- Click **Select Model → + Add Chat Model**
-- Choose **Ollama** as the provider.
-- Set **Install Provider** to default.
-- For **Model**, select **Autodetect**.
-- Click **Connect**.
-
-You can now select your downloaded model (e.g., `gpt-oss:120b`) for local inference.
-
-## Step 6. Remote Setup for DGX Spark
-
-To connect Continue.dev to a remote DGX Spark instance, edit `config.yaml` in Continue and add:
-
-```yaml
-models:
-  - model: gpt-oss:120b
-    title: gpt-oss:120b
-    apiBase: http://YOUR_SPARK_IP:11434/
-    provider: ollama
-```
-
-Replace `YOUR_SPARK_IP` with the IP address of your DGX Spark.
-Add additional model entries for any other Ollama models you wish to host remotely.
-
-## Troubleshooting
-
-## Common Issues
-
-**1. Ollama not starting**
-- Verify Docker and GPU drivers are installed correctly.
-- Run `ollama serve` manually to view errors.
-
-**2. VSCode can’t connect**
-- Ensure port 11434 is open and accessible from your workstation.
-- Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf`.
-
-**3. High memory usage**
-- Use smaller models such as `gpt-oss:20b` for lightweight usage.
-- Confirm no other large models or containers are running with `nvidia-smi`.