Merge 48fc5eb30e into a0e917e6f5

chore: Regenerate all playbooks
2026-06-22 14:19:30 +00:00 · 2026-06-11 02:17:07 +03:00 · 2026-06-10 22:36:25 +00:00 · 2026-06-04 14:56:19 +00:00 · 2026-06-04 14:55:49 +00:00 · 2026-03-09 17:19:09 -06:00
12 changed files with 113 additions and 58 deletions
--- a/nvidia/llama-cpp/README.md
+++ b/nvidia/llama-cpp/README.md
@@ -1,7 +1,6 @@
 # Run models with llama.cpp on DGX Spark

-> Build llama.cpp with CUDA and serve models via an OpenAI-compatible API (Nemotron 3 Nano Omni as example)
-
+> Build llama.cpp with CUDA and serve models via an OpenAI-compatible API

 ## Table of Contents

--- a/nvidia/sglang/README.md
+++ b/nvidia/sglang/README.md
@@ -39,9 +39,9 @@ vision-language tasks using models like DeepSeek-V2-Lite.
 - NVIDIA Spark device with Blackwell architecture
 - Docker Engine installed and running: `docker --version`
 - NVIDIA GPU drivers installed: `nvidia-smi`
-- NVIDIA Container Toolkit configured: `docker run --rm --gpus all lmsysorg/sglang@sha256:ceaf8b16e02d165143633ac228bbb994a05fe77d7e0526cf035ae4bbf4eacc36 nvidia-smi`
+- NVIDIA Container Toolkit configured: `docker run --rm --gpus all lmsysorg/sglang:latest-cu130 nvidia-smi`
 - Sufficient disk space (>20GB available): `df -h`
-- Network connectivity for pulling containers: `docker pull lmsysorg/sglang@sha256:ceaf8b16e02d165143633ac228bbb994a05fe77d7e0526cf035ae4bbf4eacc36`
+- Network connectivity for pulling containers: `docker pull lmsysorg/sglang:latest-cu130`

 ## Ancillary files

@@ -103,7 +103,7 @@ docker --version
 nvidia-smi

 ## Verify Docker GPU support
-docker run --rm --gpus all lmsysorg/sglang@sha256:ceaf8b16e02d165143633ac228bbb994a05fe77d7e0526cf035ae4bbf4eacc36 nvidia-smi
+docker run --rm --gpus all lmsysorg/sglang:latest-cu130 nvidia-smi

 ## Check available disk space
 df -h /
@@ -124,7 +124,7 @@ several minutes depending on your network connection.

 ```bash
 ## Pull the SGLang container
-docker pull lmsysorg/sglang@sha256:ceaf8b16e02d165143633ac228bbb994a05fe77d7e0526cf035ae4bbf4eacc36
+docker pull lmsysorg/sglang:latest-cu130

 ## Verify the image was downloaded
 docker images | grep sglang
@@ -140,7 +140,7 @@ server inside the container, exposing it on port 30000 for client connections.
 docker run --gpus all -it --rm \
  -p 30000:30000 \
  -v /tmp:/tmp \
-  lmsysorg/sglang@sha256:ceaf8b16e02d165143633ac228bbb994a05fe77d7e0526cf035ae4bbf4eacc36 \
+  lmsysorg/sglang:latest-cu130 \
  bash
 ```

@@ -237,7 +237,7 @@ docker ps | grep sglang | awk '{print $1}' | xargs docker stop
 docker container prune -f

 ## Remove SGLang images (optional)
-docker rmi lmsysorg/sglang@sha256:ceaf8b16e02d165143633ac228bbb994a05fe77d7e0526cf035ae4bbf4eacc36
+docker rmi lmsysorg/sglang:latest-cu130
 ```

 ## Step 10. Next steps
--- a/nvidia/station-brev/README.md
+++ b/nvidia/station-brev/README.md
@@ -52,21 +52,18 @@ You will also need the following:

 ## Step 1. Log in to Brev

-Go to the [Brev UI](https://brev.nvidia.com), log in, and confirm you’re in the correct org (by clicking the org button on the top right-hand side of the page). Once logged in, go to the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section under the "GPU" tab in the main navigation.
+Go to the [Brev UI](https://brev.nvidia.com), log in, and confirm you’re in the correct org (by clicking the org button on the top right hand side of the page). Once logged in, go to the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section under the "GPU" tab in the main navigation.

 Click the “Register Compute” button and follow the instructions in the pop-up window.

-## Step 2. Complete Pop-up Instructions
+## Step 2. Complete Popup Instructions

 * Install the Brev CLI
 * Configure your compute
    * Add a name for compute
-    * To configure SSH, ensure the “Enable SSH access” toggle is on
+    * To configure ssh, ensure the “Enable SSH access” toggle is on
 * Run the registration command

-> [!IMPORTANT]
-> Run the Brev CLI install command **without `sudo`**. Prefixing the installer with `sudo` writes the `brev` binary into root's home directory, which is not on your user shell's `PATH` — the next command will fail with `brev: command not found`. Copy the install command from the pop-up and run it as your normal user.
-
 ## Step 3. Follow Registration Flow

 In the CLI, you’ll be walked through registration. Go through the flow until registration is complete.
@@ -83,14 +80,10 @@ Your DGX Station is now integrated into Brev as a secure, remotely accessible GP

 Now that your hardware is connected, you can:

-* **Access your machine from anywhere:** Open the [Brev UI](https://brev.nvidia.com) and launch a session from [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute).
-* **Share access with others:** Invite teammates to your DGX Station from the Brev UI:
-    * Go to the [Brev UI](https://brev.nvidia.com) and open [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute).
-    * Find your DGX Station in the list and open the row's three-dot (⋯) menu.
-    * Select **Share Access**.
-    * Enter the email address of the person you want to share with.
-    * Choose their role / permission level.
-    * Confirm to send the invitation.
+* **Share Access Anywhere:** Access your machine from anywhere and share access with others through the Brev UI by:
+    * Adding the user to your [Team](https://brev.nvidia.com/org/team)
+    * Navigating to your instance in the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section
+    * In **SSH Access** section of the instance, search for the user you wish to add and click **Modify Access** to enable access

 ## Step 6. Cleanup

@@ -105,7 +98,7 @@ brev deregister
 In the UI:
 * Go to the [Brev UI](https://brev.nvidia.com)
 * Navigate to the section listing “GPU Environments” and look under “Registered Compute”
-* Click the “Remove” menu item on the device you wish to delete from Brev.
+* Click the “Remove” menu item on the DGX Station you wish to delete from Brev.
 * Confirm your selection.

 ## Troubleshooting
--- a/nvidia/station-brev/endpoint-test.yaml
+++ b/nvidia/station-brev/endpoint-test.yaml
@@ -82,21 +82,18 @@ spec:
      content: |
        # Step 1. Log in to Brev
        
-        Go to the [Brev UI](https://brev.nvidia.com), log in, and confirm you’re in the correct org (by clicking the org button on the top right-hand side of the page). Once logged in, go to the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section under the "GPU" tab in the main navigation.
+        Go to the [Brev UI](https://brev.nvidia.com), log in, and confirm you’re in the correct org (by clicking the org button on the top right hand side of the page). Once logged in, go to the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section under the "GPU" tab in the main navigation.
        
        Click the “Register Compute” button and follow the instructions in the pop-up window.
        
-        # Step 2. Complete Pop-up Instructions
+        # Step 2. Complete Popup Instructions
        
        * Install the Brev CLI
        * Configure your compute
            * Add a name for compute
-            * To configure SSH, ensure the “Enable SSH access” toggle is on
+            * To configure ssh, ensure the “Enable SSH access” toggle is on
        * Run the registration command
        
-        > [!IMPORTANT]
-        > Run the Brev CLI install command **without `sudo`**. Prefixing the installer with `sudo` writes the `brev` binary into root's home directory, which is not on your user shell's `PATH` — the next command will fail with `brev: command not found`. Copy the install command from the pop-up and run it as your normal user.
-        
        # Step 3. Follow Registration Flow
        
        In the CLI, you’ll be walked through registration. Go through the flow until registration is complete.
@@ -113,14 +110,10 @@ spec:
        
        Now that your hardware is connected, you can:
        
-        * **Access your machine from anywhere:** Open the [Brev UI](https://brev.nvidia.com) and launch a session from [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute).
-        * **Share access with others:** Invite teammates to your DGX Station from the Brev UI:
-            * Go to the [Brev UI](https://brev.nvidia.com) and open [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute).
-            * Find your DGX Station in the list and open the row's three-dot (⋯) menu.
-            * Select **Share Access**.
-            * Enter the email address of the person you want to share with.
-            * Choose their role / permission level.
-            * Confirm to send the invitation.
+        * **Share Access Anywhere:** Access your machine from anywhere and share access with others through the Brev UI by:
+            * Adding the user to your [Team](https://brev.nvidia.com/org/team)
+            * Navigating to your instance in the [Registered Compute](https://brev.nvidia.com/org/environments?tab=registered-compute) section
+            * In **SSH Access** section of the instance, search for the user you wish to add and click **Modify Access** to enable access
        
        # Step 6. Cleanup
        
@@ -135,7 +128,7 @@ spec:
        In the UI:
        * Go to the [Brev UI](https://brev.nvidia.com)
        * Navigate to the section listing “GPU Environments” and look under “Registered Compute”
-        * Click the “Remove” menu item on the device you wish to delete from Brev.
+        * Click the “Remove” menu item on the DGX Station you wish to delete from Brev.
        * Confirm your selection.
        
      
--- a/nvidia/station-nanochat/endpoint-production.yaml
+++ b/nvidia/station-nanochat/endpoint-production.yaml
@@ -107,7 +107,7 @@ spec:
        
        # Time & risk
        
-        - **Estimated time:** ~30 minutes for setup. Full d24 training takes on the order of 16+ hours on a single GB300 Ultra.
+        - **Estimated time:** ~30 minutes for setup. Full d24 training takes on the order of 12+ hours on a single GB300 Ultra.
        - **Risk level:** Medium
          - Large downloads (FineWeb) can be slow; ensure stable network and disk space.
          - API keys (W&B, HF) must be set or `launch.sh` will exit immediately.
@@ -184,7 +184,7 @@ spec:
        3. **SFT** — downloads synthetic identity conversations, fine-tunes for chat
        4. **Report generation** — produces `report.md` with metrics and samples
        
-        Training on a single GB300 Ultra takes on the order of 16+ hours for the full d24 run.
+        Training on a single GB300 Ultra takes on the order of 12+ hours for the full d24 run.
        
        # Step 4. Monitor training
        
--- a/nvidia/txt2kg/assets/frontend/components/embeddings-generator.tsx
+++ b/nvidia/txt2kg/assets/frontend/components/embeddings-generator.tsx
@@ -226,6 +226,8 @@ export function EmbeddingsGenerator({ showTripleExtraction = false }: Embeddings
          const model = JSON.parse(selectedModel);
          if (model.provider === "ollama") {
            processingMethod = `Ollama ${model.model || 'qwen3:1.7b'}`;
+          } else if (model.provider === "vllm") {
+            processingMethod = `vLLM ${model.model || 'local model'}`;
          } else if (model.id?.startsWith("nvidia-")) {
            processingMethod = 'NVIDIA Nemotron';
          }
@@ -242,14 +244,36 @@ export function EmbeddingsGenerator({ showTripleExtraction = false }: Embeddings
      
      // Call processDocuments with the selected document IDs and processing options
      const useGraphTransformer = useLangChain && langChainMethod === 'graphtransformer';
-      await processDocuments(selectedDocs, {
+      const processingOptions: Parameters<typeof processDocuments>[1] = {
        useLangChain,
        useGraphTransformer,
        promptConfigs: options || undefined,
        chunkSize: options?.chunkSize,
        overlapSize: options?.overlapSize,
        chunkingMethod: options?.chunkingMethod
-      });
+      };
+
+      try {
+        const selectedModel = localStorage.getItem("selectedModel");
+        if (selectedModel) {
+          const model = JSON.parse(selectedModel);
+          if (model.provider === "ollama") {
+            processingOptions.llmProvider = "ollama";
+            processingOptions.ollamaModel = model.model || "qwen3:1.7b";
+            processingOptions.ollamaBaseUrl = model.baseURL || "http://localhost:11434/v1";
+          } else if (model.provider === "vllm") {
+            processingOptions.llmProvider = "vllm";
+            processingOptions.vllmModel = model.model;
+            processingOptions.vllmBaseUrl = model.baseURL || "http://localhost:8001/v1";
+          } else if (model.provider === "nvidia" || model.id?.startsWith("nvidia-")) {
+            processingOptions.llmProvider = "nvidia";
+          }
+        }
+      } catch (e) {
+        console.log("Could not parse selected model, using default extraction provider");
+      }
+
+      await processDocuments(selectedDocs, processingOptions);
      
      // Navigate to the edit tab after processing is complete
      setTimeout(() => {
@@ -1265,4 +1289,4 @@ function InfoIcon(props: React.SVGProps<SVGSVGElement>) {
      <path d="M12 8h.01" />
    </svg>
  )
-} 
+}
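
Review note on the hunk above: the provider-to-options mapping is written inline inside the component. A minimal sketch of the same mapping as a standalone pure function, which might be easier to unit-test, could look like the following. `resolveProviderOptions` and `ProviderOptions` are hypothetical names that do not exist in the txt2kg codebase; the default model and base URLs are copied from the diff.

```typescript
// Hypothetical helper; the names here are illustrative, not part of txt2kg.
type LLMProvider = "nvidia" | "ollama" | "vllm";

interface ProviderOptions {
  llmProvider?: LLMProvider;
  ollamaModel?: string;
  ollamaBaseUrl?: string;
  vllmModel?: string;
  vllmBaseUrl?: string;
}

// Maps the JSON string stored under "selectedModel" to provider options.
// Returns an empty object when the value is missing, unparsable, or names an
// unknown provider, so the caller falls back to its own defaults.
function resolveProviderOptions(raw: string | null): ProviderOptions {
  if (!raw) return {};
  try {
    const model = JSON.parse(raw);
    if (model.provider === "ollama") {
      return {
        llmProvider: "ollama",
        ollamaModel: model.model || "qwen3:1.7b",
        ollamaBaseUrl: model.baseURL || "http://localhost:11434/v1",
      };
    }
    if (model.provider === "vllm") {
      return {
        llmProvider: "vllm",
        vllmModel: model.model,
        vllmBaseUrl: model.baseURL || "http://localhost:8001/v1",
      };
    }
    if (model.provider === "nvidia" || model.id?.startsWith("nvidia-")) {
      return { llmProvider: "nvidia" };
    }
    return {};
  } catch {
    // Malformed JSON (or a non-object value): use the default provider.
    return {};
  }
}
```

Under this assumption the component would spread the result into its options object, for example `{ ...baseOptions, ...resolveProviderOptions(localStorage.getItem("selectedModel")) }`, where `baseOptions` is also a placeholder name.
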
--- a/nvidia/txt2kg/assets/frontend/components/model-selector.tsx
+++ b/nvidia/txt2kg/assets/frontend/components/model-selector.tsx
@@ -151,7 +151,9 @@ export function ModelSelector() {
      
      // Default to first available local model (vLLM or Ollama)
      const localModel = availableModels.find(m => m.provider === "vllm" || m.provider === "ollama")
-      setSelectedModel(localModel || availableModels[0])
+      const defaultModel = localModel || availableModels[0]
+      setSelectedModel(defaultModel)
+      localStorage.setItem("selectedModel", JSON.stringify(defaultModel))
    }
    
    setIsLoading(false)
--- a/nvidia/txt2kg/assets/frontend/contexts/document-context.tsx
+++ b/nvidia/txt2kg/assets/frontend/contexts/document-context.tsx
@@ -49,7 +49,7 @@ export type Document = {
  }
 }

-export type LLMProvider = 'nvidia' | 'ollama';
+export type LLMProvider = 'nvidia' | 'ollama' | 'vllm';

 export type ProcessingOptions = {
  useLangChain?: boolean;
@@ -58,6 +58,8 @@ export type ProcessingOptions = {
  llmProvider?: LLMProvider;
  ollamaModel?: string;
  ollamaBaseUrl?: string;
+  vllmModel?: string;
+  vllmBaseUrl?: string;
  chunkSize?: number;
  overlapSize?: number;
  chunkingMethod?: 'optimized' | 'pyg';
@@ -451,6 +453,8 @@ export function DocumentProvider({ children }: { children: React.ReactNode }) {
      llmProvider = 'ollama',
      ollamaModel = 'qwen3:1.7b',
      ollamaBaseUrl = 'http://localhost:11434/v1',
+      vllmModel,
+      vllmBaseUrl,
      chunkSize = 64000,
      overlapSize = 2000,
      chunkingMethod = 'optimized'
@@ -460,6 +464,8 @@ export function DocumentProvider({ children }: { children: React.ReactNode }) {
      llmProvider,
      ollamaModel,
      ollamaBaseUrl,
+      vllmModel,
+      vllmBaseUrl,
      chunkSize,
      overlapSize,
      chunkingMethod
@@ -485,6 +491,8 @@ export function DocumentProvider({ children }: { children: React.ReactNode }) {
      llmProvider?: LLMProvider;
      ollamaModel?: string;
      ollamaBaseUrl?: string;
+      vllmModel?: string;
+      vllmBaseUrl?: string;
      chunkSize?: number;
      overlapSize?: number;
      chunkingMethod?: 'optimized' | 'pyg';
@@ -673,6 +681,12 @@ export function DocumentProvider({ children }: { children: React.ReactNode }) {
                if (llmOptions.ollamaBaseUrl) {
                  requestBody.ollamaBaseUrl = llmOptions.ollamaBaseUrl;
                }
+                if (llmOptions.vllmModel) {
+                  requestBody.vllmModel = llmOptions.vllmModel;
+                }
+                if (llmOptions.vllmBaseUrl) {
+                  requestBody.vllmBaseUrl = llmOptions.vllmBaseUrl;
+                }
              }
              
              // Add prompt configs if available
@@ -1273,4 +1287,4 @@ export function useDocuments() {
    throw new Error("useDocuments must be used within a DocumentProvider")
  }
  return context
-}
+}
--- a/nvidia/txt2kg/assets/frontend/lib/langchain-service.ts
+++ b/nvidia/txt2kg/assets/frontend/lib/langchain-service.ts
@@ -290,9 +290,6 @@ export class LangChainService {
        configuration: {
          baseURL: baseURL,
          timeout: 120000, // 2 minute timeout for vLLM inference
-        },
-        modelKwargs: {
-          "response_format": { "type": "text" }
        }
      });
      
@@ -320,4 +317,4 @@ export class LangChainService {
 }

 // Export a singleton instance for convenience
-export const langChainService = LangChainService.getInstance(); 
+export const langChainService = LangChainService.getInstance();
--- a/nvidia/txt2kg/assets/frontend/lib/qdrant.ts
+++ b/nvidia/txt2kg/assets/frontend/lib/qdrant.ts
@@ -153,6 +153,16 @@ export class QdrantService {
        return true;
      }

+      const collectionsResponse = await fetch(`${this.hostUrl}/collections`, {
+        method: 'GET'
+      });
+
+      if (collectionsResponse.ok) {
+        console.log(`Qdrant server is reachable`);
+        this.isQdrantRunningCheck = false;
+        return true;
+      }
+
      console.log('Qdrant health check failed - server might not be running');
      this.isQdrantRunningCheck = false;
      return false;
@@ -534,6 +544,21 @@ export class QdrantService {
  public async getStats(): Promise<any> {
    try {
      console.log('Getting stats from Qdrant...');
+      const isRunning = await this.isQdrantRunning();
+      if (!isRunning) {
+        return {
+          totalVectorCount: 0,
+          source: 'qdrant',
+          httpHealthy: false,
+          url: this.hostUrl,
+          error: `Qdrant is not reachable at ${this.hostUrl}. Start vector search with ./start.sh --vector-search if you need Vector DB features.`
+        };
+      }
+
+      if (!this.initialized) {
+        await this.initialize();
+      }
+
      const response = await this.makeRequest(`/collections/${this.collectionName}`, 'GET');

      if (response && response.result) {
@@ -554,17 +579,19 @@ export class QdrantService {
        console.log(`Qdrant stats request failed`);
        return {
          totalVectorCount: 0,
-          source: 'error',
-          httpHealthy: false,
-          error: 'Failed to get stats'
+          source: 'qdrant',
+          httpHealthy: true,
+          url: this.hostUrl,
+          error: `Qdrant is reachable, but collection '${this.collectionName}' is not available.`
        };
      }
    } catch (error) {
      console.log('Qdrant connection failed - server may not be running');
      return {
        totalVectorCount: 0,
-        source: 'error',
+        source: 'qdrant',
        httpHealthy: false,
+        url: this.hostUrl,
        error: error instanceof Error ? error.message : String(error)
      };
    }
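
Review note on the first `qdrant.ts` hunk: the reachability check now falls back to `GET /collections` when the primary health probe does not succeed. A minimal sketch of that fallback pattern, with the HTTP call injected so it can be exercised without a running server, might look like this. `isReachable` and `Probe` are hypothetical names, and `/healthz` as the first endpoint is an assumption, since the primary probe is not visible in the hunk. Unlike the code in the diff, this sketch catches errors per endpoint.

```typescript
// Minimal fetch-like signature so the check can run without a network.
type Probe = (url: string) => Promise<{ ok: boolean }>;

// Hypothetical helper: tries the health endpoint first, then falls back to
// listing collections. A thrown error on one endpoint moves on to the next.
async function isReachable(
  hostUrl: string,
  probe: Probe,
  healthPath: string = "/healthz" // assumed primary endpoint
): Promise<boolean> {
  for (const path of [healthPath, "/collections"]) {
    try {
      const res = await probe(`${hostUrl}${path}`);
      if (res.ok) return true;
    } catch {
      // Network error on this endpoint; try the next one.
    }
  }
  return false;
}
```

In a browser or a recent Node.js runtime the probe could simply wrap the global `fetch`, for example `isReachable("http://localhost:6333", (url) => fetch(url))`.
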
--- a/nvidia/txt2kg/assets/start.sh
+++ b/nvidia/txt2kg/assets/start.sh
@@ -176,6 +176,10 @@ if [ "$USE_VECTOR_SEARCH" = true ]; then
  echo "  • Qdrant: http://localhost:6333"
  echo "  • Sentence Transformers: http://localhost:8000"
  echo ""
+else
+  echo "Vector Search Services: disabled"
+  echo "  • Start with --vector-search to enable Vector DB status and embedding search"
+  echo ""
 fi

 echo "Next steps:"
--- a/nvidia/vibe-coding/README.md
+++ b/nvidia/vibe-coding/README.md
@@ -171,10 +171,12 @@ Add additional model entries for any other Ollama models you wish to host remote

 | Symptom | Cause | Fix |
 |---------|-------|-----|
-|Ollama not starting|GPU drivers may not be installed correctly|Run `nvidia-smi` in the terminal. If the command fails check DGX Dashboard for updates to your DGX Spark.|
-|Continue can't connect over the network|Port 11434 may not be open or accessible|Run command `ss -tuln \| grep 11434`. If the output does not reflect ` tcp   LISTEN 0      4096               *:11434            *:*  `, go back to step 2 and run the ufw command.|
-|Continue can't detect a locally running Ollama model|Configuration not properly set or detected|Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf` file. If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly, add these lines to your `~/.bashrc` file.|
-|High memory usage|Model size too big|Confirm no other large models or containers are running with `nvidia-smi`. Use smaller models such as `gpt-oss:20b` for lightweight usage.|
+| **WiFi connection drops or becomes unreachable** (especially in headless mode) | Aggressive WiFi power-saving settings in NetworkManager | Edit `/etc/NetworkManager/conf.d/default-wifi-powersave-on.conf`, set `wifi.powersave = 2`, and run `sudo systemctl restart NetworkManager`. |
+| **Random reboots and "00" error code on the display** | Watchdog timer module (`sbsa_gwdt`) not loaded | Add `sbsa_gwdt` to `/etc/modules-load.d/watchdog.conf` and reboot to ensure the hardware watchdog is correctly managed by the kernel. |
+| Ollama not starting | GPU drivers may not be installed correctly | Run `nvidia-smi` in the terminal. If the command fails check DGX Dashboard for updates to your DGX Spark. |
+| Continue can't connect over the network | Port 11434 may not be open or accessible | Run command `ss -tuln \| grep 11434`. If the output does not reflect `tcp LISTEN 0 4096 *:11434 *:*`, go back to step 2 and run the ufw command. |
+| Continue can't detect a locally running Ollama model | Configuration not properly set or detected | Check `OLLAMA_HOST` and `OLLAMA_ORIGINS` in `/etc/systemd/system/ollama.service.d/override.conf` file. If `OLLAMA_HOST` and `OLLAMA_ORIGINS` are set correctly, add these lines to your `~/.bashrc` file. |
+| High memory usage | Model size too big | Confirm no other large models or containers are running with `nvidia-smi`. Use smaller models such as `gpt-oss:20b` for lightweight usage. |

 > [!NOTE]
 > DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU.
Author	SHA1	Message	Date
Omar Obando	6461873c40	Merge `48fc5eb30e` into `a0e917e6f5`	2026-06-11 02:17:07 +03:00
GitLab CI	a0e917e6f5	chore: Regenerate all playbooks	2026-06-10 22:36:25 +00:00
GitLab CI	2f703e1793	chore: Regenerate all playbooks	2026-06-04 14:56:19 +00:00
GitLab CI	9ce5aae4f3	chore: Regenerate all playbooks	2026-06-04 14:55:49 +00:00
Omar Obando	48fc5eb30e	Add troubleshooting tips for WiFi and watchdog issues	2026-03-09 17:19:09 -06:00