mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-04-22 18:13:52 +00:00

GitLab CI c20b49d138 chore: Regenerate all playbooks

2025-10-06 13:35:52 +00:00

Ollama

Install and use Ollama

Overview

Basic Idea

This playbook demonstrates how to set up remote access to an Ollama server running on your NVIDIA Spark device using NVIDIA Sync's Custom Apps feature. You'll install Ollama on your Spark device, configure NVIDIA Sync to create an SSH tunnel, and access the Ollama API from your local machine. Because API traffic travels through the SSH tunnel, you can run AI inference from your laptop without exposing any ports on your network.

What you'll accomplish

You will have Ollama running on your Blackwell-based NVIDIA Spark and accessible through API calls from your local laptop. Applications and tools on your local machine can then send large language model inference requests to the Ollama API and have them run on the Spark's GPU, with no network configuration beyond the SSH tunnel.

What to know before starting

Working with SSH connections and system tray applications
Basic familiarity with terminal commands and cURL for API testing
Understanding of REST API concepts and JSON formatting
Experience with container environments and GPU-accelerated workloads

Prerequisites

DGX Spark device set up and connected to your network
NVIDIA Sync installed and connected to your Spark
Terminal access to your local machine for testing API calls

Time & risk

Duration: 10-15 minutes for initial setup, 2-3 minutes for model download (varies with model size and network speed)

Risk level: Low - the only system-level change is the Ollama installation itself (binary and system service); the tunnel is easily reversed by stopping the custom app

Rollback: Stop the custom app in NVIDIA Sync and, if needed, uninstall Ollama with the commands in Step 10

Instructions

Step 1. Verify Ollama installation status

Description: Check whether Ollama is already installed on your NVIDIA Spark device. Run this on the Spark device (for example, in the NVIDIA Sync terminal) to determine whether installation is needed.

ollama --version

If you see version information, skip to Step 3. If you get "command not found", proceed to Step 2.
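If you automate this check in a provisioning script on the Spark, a minimal Python sketch might look like the following. It assumes only that the `ollama --version` output contains a dotted version number such as `0.12.3`. The exact wording of that output can vary between releases, and the helper names are illustrative, not part of Ollama.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches a dotted version such as "0.12.3" anywhere in the CLI output.
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def parse_version(text: str) -> Optional[str]:
    """Illustrative helper: return the first dotted version in `text`, or None."""
    match = _VERSION_RE.search(text)
    return match.group(0) if match else None


def installed_ollama_version() -> Optional[str]:
    """Return the installed Ollama version, or None if `ollama` is not on PATH."""
    if shutil.which("ollama") is None:
        return None  # same situation as "command not found"
    result = subprocess.run(
        ["ollama", "--version"], capture_output=True, text=True, check=False
    )
    # The CLI may print warnings (e.g. no running server) next to the version,
    # so search both output streams.
    return parse_version(result.stdout + result.stderr)


def next_step() -> str:
    """Mirror the decision above: skip to Step 3 if the binary is installed."""
    return "Step 3" if shutil.which("ollama") else "Step 2"
```

For example, `print(next_step(), installed_ollama_version())` prints which step to continue with and the detected version (or `None`).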

Step 2. Install Ollama on your Spark device

Description: Download and install Ollama using the official installation script. This runs on the Spark device and installs the Ollama binary and service components.

curl -fsSL https://ollama.com/install.sh | sh

Wait for the installation to complete. You should see output indicating successful installation.

Step 3. Download and verify a language model

Description: Pull a language model to your Spark device. This downloads the model files and makes them available for inference. The example uses Qwen2.5 32B (`qwen2.5:32b`), which is roughly an 18 GB download.

ollama pull qwen2.5:32b

Expected output:

pulling manifest
pulling 58574f2e94b9: 100% ████████████████████████████  18 GB
pulling 53e4ea15e8f5: 100% ████████████████████████████ 1.5 KB
pulling d18a5cc71b84: 100% ████████████████████████████  11 KB
pulling cff3f395ef37: 100% ████████████████████████████  120 B
pulling 3cdc64c2b371: 100% ████████████████████████████  494 B
verifying sha256 digest
writing manifest
success

Step 4. Access NVIDIA Sync settings

Description: Open the NVIDIA Sync configuration interface on your local machine to add a new custom application tunnel. This runs on your local laptop/workstation.

Click on the NVIDIA Sync logo in your system tray/taskbar
Click on the gear icon in the top right corner to open the Settings window
Click on the "Custom" tab

Step 5. Configure Ollama custom app in NVIDIA Sync

Description: Create a new custom application entry that will establish an SSH tunnel to the Ollama server running on port 11434. This configuration runs on your local machine.

Click the "Add New" button
Fill out the form with these values:
- Name: Ollama Server
- Port: 11434
- Auto open in browser: Leave unchecked (this is an API, not a web interface)
- Start Script: Leave empty
Click "Add"

The new Ollama Server entry should now appear in your NVIDIA Sync custom apps list.

Step 6. Start the SSH tunnel

Description: Activate the SSH tunnel so the remote Ollama server is reachable from your local machine. Requests sent to localhost:11434 on your machine are forwarded over SSH to port 11434 on your Spark device.

Click on the NVIDIA Sync logo in your system tray/taskbar
Under the "Custom" section, click on "Ollama Server"

The tunnel is active when you see the connection status indicator in NVIDIA Sync.
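To confirm the tunnel from a script instead of the tray icon, one option is to call Ollama's version endpoint (`GET /api/version`) through the forwarded port. This is a sketch using only the Python standard library. `check_tunnel` and its timeout are illustrative choices and are not part of NVIDIA Sync.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # local end of the NVIDIA Sync tunnel


def check_tunnel(base_url: str = OLLAMA_URL, timeout_s: float = 3.0) -> Optional[str]:
    """Illustrative helper: return the server version if reachable, else None."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout_s) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # OSError includes urllib's URLError, e.g. "connection refused" when the
        # tunnel is not active; ValueError covers a reply that is not JSON.
        return None
    return data.get("version") if isinstance(data, dict) else None
```

For example, `print(check_tunnel() or "Tunnel down: start Ollama Server in NVIDIA Sync")`.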

Step 7. Validate API connectivity

Description: Test the Ollama API connection from your local machine to ensure the tunnel is working correctly. This runs on your local laptop terminal.

curl http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:32b",
  "messages": [{
    "role": "user",
    "content": "Write me a haiku about GPUs and AI."
  }],
  "stream": false
}'

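Expected response format (abridged; the actual response also includes timing and token-count fields such as total_duration and eval_count):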

{
  "model": "qwen2.5:32b",
  "created_at": "2024-01-15T12:30:45.123Z",
  "message": {
    "role": "assistant",
    "content": "Silicon power flows\nThrough circuits, dreams become real\nAI awakens"
  },
  "done": true
}
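The same non-streaming request can be sent from Python with the standard library. This minimal sketch assumes the tunnel from Step 6 is active and that `qwen2.5:32b` has been pulled. `build_chat_request`, `extract_reply`, and `chat` are illustrative names. As in the response above, the reply text is in `message.content`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_chat_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST /api/chat request equivalent to the curl command in Step 7."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Pull the assistant's text out of a non-streaming /api/chat response."""
    return response["message"]["content"]


def chat(model: str, prompt: str, timeout_s: float = 300.0) -> str:
    """Send one prompt and block until the full reply has been generated."""
    with urllib.request.urlopen(build_chat_request(model, prompt), timeout=timeout_s) as resp:
        return extract_reply(json.load(resp))
```

For example, `print(chat("qwen2.5:32b", "Write me a haiku about GPUs and AI."))`. The generous timeout allows for the first call, which can be slow while the model loads on the Spark.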

Step 8. Test additional API endpoints

Description: Verify other Ollama API functionality to ensure full operation. These commands run on your local machine and test different API capabilities.

Test model listing:

curl http://localhost:11434/api/tags
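The listing endpoint returns a JSON object whose `models` array has one entry per local model, each with a `name` such as `qwen2.5:32b`. The following sketch checks whether a model is available before your code sends requests. It assumes that response shape, and the helper names are illustrative.

```python
import json
import urllib.request
from typing import List

OLLAMA_URL = "http://localhost:11434"


def model_names(tags: dict) -> List[str]:
    """Illustrative helper: extract model names from a parsed /api/tags response."""
    # Treat a missing or null "models" field as an empty list.
    return [m["name"] for m in tags.get("models") or []]


def list_models(base_url: str = OLLAMA_URL, timeout_s: float = 10.0) -> List[str]:
    """Fetch /api/tags through the tunnel and return the local model names."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout_s) as resp:
        return model_names(json.load(resp))
```

For example, `"qwen2.5:32b" in list_models()` should be `True` after Step 3. Names include the tag, so a model pulled without one is listed with `:latest`.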

Test streaming responses (the -N flag turns off curl's output buffering so each chunk prints as soon as it arrives):

curl -N http://localhost:11434/api/chat -d '{
  "model": "qwen2.5:32b",
  "messages": [{"role": "user", "content": "Count to 5 slowly"}],
  "stream": true
}'
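With `"stream": true`, Ollama returns newline-delimited JSON. Each line is a separate object carrying a text fragment in `message.content`, and the last object has `"done": true`. Below is a sketch of a consumer that prints fragments as they arrive. The function names are illustrative, and the parser is kept separate from the HTTP call so it can be checked without a server.

```python
import json
import urllib.request
from typing import Iterable, Iterator

OLLAMA_URL = "http://localhost:11434"


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from NDJSON chat chunks until the final 'done' chunk."""
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank lines between chunks
        chunk = json.loads(raw)
        fragment = chunk.get("message", {}).get("content", "")
        if fragment:
            yield fragment
        if chunk.get("done"):
            break


def stream_chat(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """Print the reply while it streams and return the full text."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}], "stream": True}
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),  # a body makes this a POST
        headers={"Content-Type": "application/json"},
    )
    parts = []
    with urllib.request.urlopen(req, timeout=300) as resp:
        # Iterating the HTTP response yields one line (bytes) at a time.
        for fragment in iter_content(resp):
            print(fragment, end="", flush=True)
            parts.append(fragment)
    print()
    return "".join(parts)
```

For example, `stream_chat("qwen2.5:32b", "Count to 5 slowly")` should print the count incrementally, the same way the curl command above does.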

Step 9. Troubleshooting

Description: Common issues and their solutions when setting up Ollama with NVIDIA Sync.

Symptom	Cause	Fix
"Connection refused" on localhost:11434	SSH tunnel not active	Start Ollama Server in NVIDIA Sync custom apps
Model download fails with disk space error	Insufficient storage on Spark	Free up space or choose smaller model (e.g., qwen2.5:7b)
Ollama command not found after install	Installation path not in PATH	Restart terminal session or run `source ~/.bashrc`
API returns "model not found" error	Model not pulled or wrong name	Run `ollama list` to verify available models
Slow inference on Spark	Model too large for GPU memory	Try smaller model or check GPU memory with `nvidia-smi`
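The first and fourth rows of the table can be checked from your local machine with a short script. This sketch assumes the tunnel URL from Step 6 and the `/api/version` and `/api/tags` endpoints used earlier. `diagnose` and `diagnose_from` are illustrative helpers whose messages restate the fixes in the table. They are not an Ollama or NVIDIA Sync tool.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"


def fetch_json(url: str, timeout_s: float = 5.0) -> Optional[dict]:
    """GET a JSON endpoint; return None if unreachable or not a JSON object."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def diagnose_from(reachable: bool, available: List[str], model: str) -> str:
    """Map observations to the matching row of the troubleshooting table."""
    if not reachable:
        return "Server unreachable (e.g. connection refused): start Ollama Server in NVIDIA Sync custom apps"
    # Names must match exactly, including the tag (e.g. ":latest").
    if model not in available:
        return f"Model '{model}' not found: run `ollama list` on the Spark, then `ollama pull {model}` if needed"
    return "OK"


def diagnose(model: str, base_url: str = OLLAMA_URL) -> str:
    """Check the tunnel first, then whether `model` has been pulled."""
    if fetch_json(f"{base_url}/api/version") is None:
        return diagnose_from(False, [], model)
    tags = fetch_json(f"{base_url}/api/tags") or {}
    names = [m["name"] for m in tags.get("models") or []]
    return diagnose_from(True, names, model)
```

For example, `print(diagnose("qwen2.5:32b"))` prints `OK` when both the tunnel and the model are in place.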

Step 10. Cleanup and rollback

Description: How to remove the setup and return to the original state.

To stop the tunnel:

Open NVIDIA Sync and click "Ollama Server" to deactivate

To remove the custom app:

Open NVIDIA Sync Settings → Custom tab
Select "Ollama Server" and click "Remove"

Warning: To completely uninstall Ollama from your Spark device:

sudo systemctl stop ollama
sudo systemctl disable ollama
sudo rm /etc/systemd/system/ollama.service
sudo rm /usr/local/bin/ollama
sudo rm -rf /usr/local/lib/ollama
sudo rm -rf /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama

This will remove all Ollama files and downloaded models.

Step 11. Next steps

Description: Explore additional functionality and integration options with your working Ollama setup.

Test different models from the Ollama library:

ollama pull llama3.1:8b
ollama pull codellama:13b
ollama pull phi3.5:3.8b

Monitor GPU and system usage during inference using the DGX Dashboard available through NVIDIA Sync.

Build applications on the Ollama API using the HTTP client library of your preferred programming language.
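One example of such an integration is a multi-turn chat. `/api/chat` does not keep conversation state between calls, so the client resends the accumulated `messages` list each time. The following minimal Python sketch assumes the tunnel is up. The `ChatSession` class and its injectable `send` function are illustrative design choices. Injecting `send` lets you test the history handling without a server.

```python
import json
import urllib.request
from typing import Callable, Dict, List

OLLAMA_URL = "http://localhost:11434"
Message = Dict[str, str]


def http_send(model: str, messages: List[Message]) -> Message:
    """POST the full history to /api/chat (non-streaming) and return the reply message."""
    payload = {"model": model, "messages": messages, "stream": False}
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["message"]


class ChatSession:
    """Illustrative helper that keeps the running message history for one conversation."""

    def __init__(self, model: str, send: Callable[[str, List[Message]], Message] = http_send):
        self.model = model
        self.send = send
        self.messages: List[Message] = []

    def ask(self, prompt: str) -> str:
        history = self.messages + [{"role": "user", "content": prompt}]
        reply = self.send(self.model, history)
        # Commit the turn only after the server has answered, so a failed
        # request leaves the history unchanged.
        self.messages = history + [{"role": "assistant", "content": reply["content"]}]
        return reply["content"]
```

For example, `s = ChatSession("qwen2.5:32b")`, then `s.ask("Name a GPU architecture.")` followed by `s.ask("When was it released?")`. The second question is answered with the first exchange in context. The history grows with every turn, so long sessions send progressively larger requests.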
