From b3a97461df29bbbfa5e6a7d32f18ed2d3ae0bca0 Mon Sep 17 00:00:00 2001 From: GitLab CI Date: Wed, 8 Oct 2025 22:00:07 +0000 Subject: [PATCH] chore: Regenerate all playbooks --- nvidia/comfy-ui/README.md | 17 +- nvidia/dgx-dashboard/README.md | 12 +- nvidia/flux-finetuning/README.md | 18 +- nvidia/jax/README.md | 14 +- nvidia/llama-factory/README.md | 15 +- nvidia/multi-agent-chatbot/README.md | 16 +- nvidia/nemo-fine-tune/README.md | 12 +- nvidia/nim-llm/README.md | 18 +- nvidia/nvfp4-quantization/README.md | 18 +- nvidia/open-webui/README.md | 16 +- nvidia/pytorch-fine-tune/README.md | 9 +- .../assets/Llama3_3B_full_finetuning.py | 195 +++++++++++++++ .../assets/Llama3_70B_qLoRA_finetuning.py | 228 ++++++++++++++++++ .../assets/Llama3_8B_LoRA_finetuning.py | 176 ++++++++++++++ nvidia/pytorch-fine-tune/assets/example | 0 nvidia/rag-ai-workbench/README.md | 13 +- nvidia/speculative-decoding/README.md | 12 +- nvidia/tailscale/README.md | 19 +- nvidia/trt-llm/README.md | 12 +- nvidia/unsloth/README.md | 19 +- nvidia/vllm/README.md | 12 +- nvidia/vlm-finetuning/README.md | 26 +- nvidia/vscode/README.md | 12 +- nvidia/vss/README.md | 18 +- 24 files changed, 770 insertions(+), 137 deletions(-) create mode 100644 nvidia/pytorch-fine-tune/assets/Llama3_3B_full_finetuning.py create mode 100644 nvidia/pytorch-fine-tune/assets/Llama3_70B_qLoRA_finetuning.py create mode 100644 nvidia/pytorch-fine-tune/assets/Llama3_8B_LoRA_finetuning.py create mode 100644 nvidia/pytorch-fine-tune/assets/example diff --git a/nvidia/comfy-ui/README.md b/nvidia/comfy-ui/README.md index ed76a04..906f4bd 100644 --- a/nvidia/comfy-ui/README.md +++ b/nvidia/comfy-ui/README.md @@ -58,14 +58,15 @@ All required assets can be found [in the ComfyUI repository on GitHub](https://g ## Time & risk -**Estimated time:** 30-45 minutes (including model download) - -**Risk level:** Medium -- Model downloads are large (~2GB) and may fail due to network issues -- Port 8188 must be accessible for web interface 
functionality - -**Rollback:** Virtual environment can be deleted to remove all installed packages. Downloaded models -can be removed manually from the checkpoints directory. +* **Estimated time:** 30-45 minutes (including model download) +* **Risk level:** Medium + * Model downloads are large (~2GB) and may fail due to network issues + * Port 8188 must be accessible for web interface functionality +* **Rollback:** Virtual environment can be deleted to remove all installed packages. Downloaded models can be removed manually from the checkpoints directory. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/dgx-dashboard/README.md b/nvidia/dgx-dashboard/README.md index 3751a67..9504437 100644 --- a/nvidia/dgx-dashboard/README.md +++ b/nvidia/dgx-dashboard/README.md @@ -36,11 +36,13 @@ You will learn how to access and use the DGX Dashboard on your DGX Spark device. ## Time & risk -**Duration:** 15-30 minutes for complete walkthrough including sample AI workload - -**Risk level:** Low - Web interface operations with minimal system impact - -**Rollback:** Stop JupyterLab instances through dashboard interface; no permanent system changes made during normal usage. +* **Duration:** 15-30 minutes for complete walkthrough including sample AI workload +* **Risk level:** Low - Web interface operations with minimal system impact +* **Rollback:** Stop JupyterLab instances through dashboard interface; no permanent system changes made during normal usage. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. 
With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/flux-finetuning/README.md b/nvidia/flux-finetuning/README.md index f74bac0..93ddde1 100644 --- a/nvidia/flux-finetuning/README.md +++ b/nvidia/flux-finetuning/README.md @@ -39,15 +39,17 @@ The setup includes: ## Time & risk -**Duration**: -- 30-45 minutes for initial setup model download time -- 1-2 hours for dreambooth LoRA training - -**Risks**: -- Docker permission issues may require user group changes and session restart -- The recipe would require hyperparameter tuning and a high-quality dataset for the best results - +* **Duration**: + * 30-45 minutes for initial setup and model download + * 1-2 hours for DreamBooth LoRA training +* **Risks**: + * Docker permission issues may require user group changes and session restart + * The recipe requires hyperparameter tuning and a high-quality dataset for the best results **Rollback**: Stop and remove Docker containers, delete downloaded models if needed. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark.
If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/jax/README.md b/nvidia/jax/README.md index 6a1dda6..450d4e1 100644 --- a/nvidia/jax/README.md +++ b/nvidia/jax/README.md @@ -59,13 +59,15 @@ All required assets can be found [here on GitHub](https://gitlab.com/nvidia/dgx- ## Time & risk -**Duration:** 2-3 hours including setup, tutorial completion, and validation - -**Risks:** -- Package dependency conflicts in Python environment -- Performance validation may require architecture-specific optimizations - +* **Duration:** 2-3 hours including setup, tutorial completion, and validation +* **Risks:** + * Package dependency conflicts in Python environment + * Performance validation may require architecture-specific optimizations **Rollback:** Container environments provide isolation; remove containers and restart to reset state. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/llama-factory/README.md b/nvidia/llama-factory/README.md index 9e4e199..8fa5fcc 100644 --- a/nvidia/llama-factory/README.md +++ b/nvidia/llama-factory/README.md @@ -63,14 +63,13 @@ model adaptation for specialized domains while leveraging hardware-specific opti ## Time & risk -**Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size -and dataset. - -**Risks:** Model downloads require significant bandwidth and storage. Training may consume -substantial GPU memory and require parameter tuning for hardware constraints. - -**Rollback:** Remove Docker containers and cloned repositories. 
Training checkpoints are -saved locally and can be deleted to reclaim storage space. +* **Duration:** 30-60 minutes for initial setup, 1-7 hours for training depending on model size and dataset. +* **Risks:** Model downloads require significant bandwidth and storage. Training may consume substantial GPU memory and require parameter tuning for hardware constraints. +* **Rollback:** Remove Docker containers and cloned repositories. Training checkpoints are saved locally and can be deleted to reclaim storage space. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/multi-agent-chatbot/README.md b/nvidia/multi-agent-chatbot/README.md index 6e3acc9..daef767 100644 --- a/nvidia/multi-agent-chatbot/README.md +++ b/nvidia/multi-agent-chatbot/README.md @@ -42,13 +42,15 @@ The setup includes: ## Time & risk -**Estimated time**: 30 minutes to an hour - -**Risks**: -- Docker permission issues may require user group changes and session restart -- Setup includes downloading model files for gpt-oss-120B (~63GB), Deepseek-Coder:6.7B-Instruct (~7GB) and Qwen3-Embedding-4B (~4GB), which may take between 30 minutes to 2 hours depending on network speed - -**Rollback**: Stop and remove Docker containers using provided cleanup commands. 
+* **Estimated time**: 30 minutes to an hour +* **Risks**: + * Docker permission issues may require user group changes and session restart + * Setup includes downloading model files for gpt-oss-120B (~63GB), Deepseek-Coder:6.7B-Instruct (~7GB) and Qwen3-Embedding-4B (~4GB), which may take between 30 minutes and 2 hours depending on network speed +* **Rollback**: Stop and remove Docker containers using provided cleanup commands. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/nemo-fine-tune/README.md b/nvidia/nemo-fine-tune/README.md index a965172..6430ab1 100644 --- a/nvidia/nemo-fine-tune/README.md +++ b/nvidia/nemo-fine-tune/README.md @@ -43,11 +43,13 @@ All necessary files for the playbook can be found [here on GitHub](https://githu ## Time & risk -**Duration:** 45-90 minutes for complete setup and initial model fine-tuning -**Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations -**Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations. +* **Duration:** 45-90 minutes for complete setup and initial model fine-tuning +* **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting, distributed training setup complexity increases with multi-node configurations +* **Rollback:** Virtual environments can be completely removed; no system-level changes are made to the host system beyond package installations.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/nim-llm/README.md b/nvidia/nim-llm/README.md index 4e26de8..3e4e8a4 100644 --- a/nvidia/nim-llm/README.md +++ b/nvidia/nim-llm/README.md @@ -60,14 +60,16 @@ completions. ### Time & risk -**Estimated time:** 15-30 minutes for setup and validation -**Risks:** -- Large model downloads may take significant time depending on network speed -- GPU memory requirements vary by model size -- Container startup time depends on model loading - -**Rollback:** Stop and remove containers with `docker stop && docker rm `. Remove cached models from `~/.cache/nim` if disk space recovery is needed. +* **Estimated time:** 15-30 minutes for setup and validation +* **Risks:** + * Large model downloads may take significant time depending on network speed + * GPU memory requirements vary by model size + * Container startup time depends on model loading +* **Rollback:** Stop and remove containers with `docker stop <container> && docker rm <container>`. Remove cached models from `~/.cache/nim` if disk space recovery is needed. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark.
If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/nvfp4-quantization/README.md b/nvidia/nvfp4-quantization/README.md index abd23ae..4ca1750 100644 --- a/nvidia/nvfp4-quantization/README.md +++ b/nvidia/nvfp4-quantization/README.md @@ -58,14 +58,16 @@ df -h . ## Time & risk -**Estimated duration**: 45-90 minutes depending on network speed and model size - -**Risks**: -- Model download may fail due to network issues or Hugging Face authentication problems -- Quantization process is memory-intensive and may fail on systems with insufficient GPU memory -- Output files are large (several GB) and require adequate storage space - -**Rollback**: Remove the output directory and any pulled Docker images to restore original state. +* **Estimated duration**: 45-90 minutes depending on network speed and model size +* **Risks**: + * Model download may fail due to network issues or Hugging Face authentication problems + * Quantization process is memory-intensive and may fail on systems with insufficient GPU memory + * Output files are large (several GB) and require adequate storage space +* **Rollback**: Remove the output directory and any pulled Docker images to restore original state. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/open-webui/README.md b/nvidia/open-webui/README.md index 5519f30..f02ebb6 100644 --- a/nvidia/open-webui/README.md +++ b/nvidia/open-webui/README.md @@ -38,13 +38,15 @@ for model management, persistent data storage, and GPU acceleration for model in ## Time & risk -**Duration**: 15-20 minutes for initial setup, plus model download time (varies by model size) - -**Risks**: -- Docker permission issues may require user group changes and session restart -- Large model downloads may take significant time depending on network speed - -**Rollback**: Stop and remove Docker containers using provided cleanup commands, remove custom port from NVIDIA Sync settings. +* **Duration**: 15-20 minutes for initial setup, plus model download time (varies by model size) +* **Risks**: + * Docker permission issues may require user group changes and session restart + * Large model downloads may take significant time depending on network speed +* **Rollback**: Stop and remove Docker containers using provided cleanup commands, remove custom port from NVIDIA Sync settings. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. 
If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/pytorch-fine-tune/README.md b/nvidia/pytorch-fine-tune/README.md index 4e679fa..858e339 100644 --- a/nvidia/pytorch-fine-tune/README.md +++ b/nvidia/pytorch-fine-tune/README.md @@ -37,9 +37,12 @@ All files required for fine-tuning are included in the folder in [the GitHub rep ## Time & risk -**Time estimate:** 30-45 mins for setup and runing fine-tuning. Fine-tuning run time varies depending on model size -**Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting. +* **Time estimate:** 30-45 minutes for setup and running fine-tuning. Fine-tuning run time varies depending on model size +* **Risks:** Model downloads can be large (several GB), ARM64 package compatibility issues may require troubleshooting. +* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with: +```bash +sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches' +``` ## Instructions diff --git a/nvidia/pytorch-fine-tune/assets/Llama3_3B_full_finetuning.py b/nvidia/pytorch-fine-tune/assets/Llama3_3B_full_finetuning.py new file mode 100644 index 0000000..f829a1b --- /dev/null +++ b/nvidia/pytorch-fine-tune/assets/Llama3_3B_full_finetuning.py @@ -0,0 +1,195 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +import torch +import argparse +from datasets import load_dataset +from trl import SFTConfig, SFTTrainer +from transformers import AutoModelForCausalLM, AutoTokenizer + + +# Define prompt templates +ALPACA_PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. +### Instruction: {} + +### Input: {} + +### Response: {}""" + +def get_alpaca_dataset(eos_token, dataset_size=500): + # Preprocess the dataset + def preprocess(x): + texts = [ + ALPACA_PROMPT_TEMPLATE.format(instruction, input, output) + eos_token + for instruction, input, output in zip(x["instruction"], x["input"], x["output"]) + ] + return {"text": texts} + + dataset = load_dataset("tatsu-lab/alpaca", split="train").select(range(dataset_size)).shuffle(seed=42) + return dataset.map(preprocess, remove_columns=dataset.column_names, batched=True) + + +def main(args): + # Load the model and tokenizer + print(f"Loading model: {args.model_name}") + model = AutoModelForCausalLM.from_pretrained( + args.model_name, + dtype=args.dtype, + device_map="auto", + trust_remote_code=True + ) + tokenizer = AutoTokenizer.from_pretrained(args.model_name, trust_remote_code=True) + tokenizer.pad_token = tokenizer.eos_token + + # Print model information + total_params = sum(p.numel() for p in model.parameters()) + trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad) + print(f"Total parameters: {total_params:,}") + print(f"Trainable parameters: {trainable_params:,} 
(100% - Full Fine-tuning)") + + # Load and preprocess the dataset + print(f"Loading dataset with {args.dataset_size} samples...") + dataset = get_alpaca_dataset(tokenizer.eos_token, args.dataset_size) + + # Configure the SFT config + config = { + "per_device_train_batch_size": args.batch_size, + "num_train_epochs": 0.01, # Warmup epoch + "gradient_accumulation_steps": args.gradient_accumulation_steps, + "learning_rate": args.learning_rate, + "optim": "adamw_torch", + "save_strategy": 'no', + "remove_unused_columns": False, + "seed": 42, + "dataset_text_field": "text", + "packing": False, + "max_seq_length": args.seq_length, + "torch_compile": False, + "report_to": "none", + "logging_dir": args.log_dir, + "logging_steps": args.logging_steps, + "gradient_checkpointing": args.gradient_checkpointing, # Save memory + } + + # Compile model if requested + if args.use_torch_compile: + print("Compiling model with torch.compile()...") + model = torch.compile(model) + + # Warmup for torch compile + print("Running warmup for torch.compile()...") + SFTTrainer( + model=model, + processing_class=tokenizer, + train_dataset=dataset, + args=SFTConfig(**config), + ).train() + + # Train the model + print(f"\nStarting full fine-tuning for {args.num_epochs} epoch(s)...") + config["num_train_epochs"] = args.num_epochs + config["report_to"] = "tensorboard" + + trainer = SFTTrainer( + model=model, + processing_class=tokenizer, + train_dataset=dataset, + args=SFTConfig(**config), + ) + + trainer_stats = trainer.train() + + # Print training statistics + print(f"\n{'='*60}") + print("TRAINING COMPLETED") + print(f"{'='*60}") + print(f"Training runtime: {trainer_stats.metrics['train_runtime']:.2f} seconds") + print(f"Samples per second: {trainer_stats.metrics['train_samples_per_second']:.2f}") + print(f"Steps per second: {trainer_stats.metrics['train_steps_per_second']:.2f}") + print(f"Train loss: {trainer_stats.metrics['train_loss']:.4f}") + print(f"{'='*60}\n") + + # Save model if requested 
+ if args.output_dir: + print(f"Saving model to {args.output_dir}...") + trainer.save_model(args.output_dir) + tokenizer.save_pretrained(args.output_dir) + print("Model saved successfully!") + + +def parse_arguments(): + parser = argparse.ArgumentParser(description="Llama 3.2 3B Full Fine-tuning (SFT)") + + # Model configuration + parser.add_argument("--model_name", type=str, default="meta-llama/Llama-3.2-3B-Instruct", + help="Model name or path") + parser.add_argument("--dtype", type=str, default="bfloat16", + choices=["float32", "float16", "bfloat16"], + help="Model dtype") + + # Training configuration + parser.add_argument("--batch_size", type=int, default=8, + help="Per device training batch size") + parser.add_argument("--seq_length", type=int, default=2048, + help="Maximum sequence length") + parser.add_argument("--num_epochs", type=int, default=1, + help="Number of training epochs") + parser.add_argument("--gradient_accumulation_steps", type=int, default=1, + help="Gradient accumulation steps") + parser.add_argument("--learning_rate", type=float, default=5e-5, + help="Learning rate") + parser.add_argument("--gradient_checkpointing", action="store_true", + help="Enable gradient checkpointing to save memory") + + # Dataset configuration + parser.add_argument("--dataset_size", type=int, default=500, + help="Number of samples to use from dataset") + + # Logging configuration + parser.add_argument("--logging_steps", type=int, default=1, + help="Log every N steps") + parser.add_argument("--log_dir", type=str, default="logs", + help="Directory for logs") + + # Compilation and saving + parser.add_argument("--use_torch_compile", action="store_true", + help="Use torch.compile() for faster training") + parser.add_argument("--output_dir", type=str, default=None, + help="Directory to save the fine-tuned model") + + return parser.parse_args() + + +if __name__ == "__main__": + args = parse_arguments() + print(f"\n{'='*60}") + print("LLAMA 3.2 3B FULL FINE-TUNING 
CONFIGURATION") + print(f"{'='*60}") + print(f"Model: {args.model_name}") + print(f"Training mode: Full SFT ") + print(f"Batch size: {args.batch_size}") + print(f"Gradient accumulation: {args.gradient_accumulation_steps}") + print(f"Effective batch size: {args.batch_size * args.gradient_accumulation_steps}") + print(f"Sequence length: {args.seq_length}") + print(f"Number of epochs: {args.num_epochs}") + print(f"Learning rate: {args.learning_rate}") + print(f"Dataset size: {args.dataset_size}") + print(f"Gradient checkpointing: {args.gradient_checkpointing}") + print(f"Torch compile: {args.use_torch_compile}") + print(f"{'='*60}\n") + + main(args) \ No newline at end of file diff --git a/nvidia/pytorch-fine-tune/assets/Llama3_70B_qLoRA_finetuning.py b/nvidia/pytorch-fine-tune/assets/Llama3_70B_qLoRA_finetuning.py new file mode 100644 index 0000000..c633636 --- /dev/null +++ b/nvidia/pytorch-fine-tune/assets/Llama3_70B_qLoRA_finetuning.py @@ -0,0 +1,228 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +import torch +import argparse +import os +from datasets import load_dataset +from trl import SFTConfig, SFTTrainer +from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig +from peft import get_peft_model, LoraConfig, TaskType, prepare_model_for_kbit_training + + +# Define prompt templates +ALPACA_PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. +### Instruction: {} + +### Input: {} + +### Response: {}""" + +def get_alpaca_dataset(eos_token, dataset_size=500): + # Preprocess the dataset + def preprocess(x): + texts = [ + ALPACA_PROMPT_TEMPLATE.format(instruction, input, output) + eos_token + for instruction, input, output in zip(x["instruction"], x["input"], x["output"]) + ] + return {"text": texts} + + dataset = load_dataset("tatsu-lab/alpaca", split="train").select(range(dataset_size)).shuffle(seed=42) + return dataset.map(preprocess, remove_columns=dataset.column_names, batched=True) + + +def main(args): + # Load the model and tokenizer + print(f"Loading model: {args.model_name}") + print(f"Training mode: QLoRA (4-bit quantization)") + + # Use balanced device map for QLoRA to avoid device placement issues + # "balanced" distributes model across available GPUs more reliably than "auto" + device_map_config = "balanced" if torch.cuda.device_count() > 1 else {"": 0} + + # Configure 4-bit quantization for QLoRA + quantization_config = BitsAndBytesConfig( + load_in_4bit=True, + bnb_4bit_use_double_quant=True, + bnb_4bit_quant_type='nf4', + bnb_4bit_compute_dtype=getattr(torch, args.dtype), + ) + + model = AutoModelForCausalLM.from_pretrained( + args.model_name, + quantization_config=quantization_config, + dtype=args.dtype, + device_map=device_map_config, + trust_remote_code=True + ) + tokenizer = AutoTokenizer.from_pretrained(args.model_name, trust_remote_code=True) + tokenizer.pad_token = tokenizer.eos_token 
+ + # Prepare model for QLoRA training + print(f"Preparing model for QLoRA (4-bit) with rank {args.lora_rank}...") + model = prepare_model_for_kbit_training(model) + + model = get_peft_model(model, LoraConfig( + r=args.lora_rank, + target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"], + lora_alpha=16, + lora_dropout=0, + task_type=TaskType.CAUSAL_LM + )) + + trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad) + total_params = sum(p.numel() for p in model.parameters()) + print(f"Trainable parameters: {trainable_params:,} ({100 * trainable_params / total_params:.2f}%)") + + # Load and preprocess the dataset + print(f"Loading dataset with {args.dataset_size} samples...") + dataset = get_alpaca_dataset(tokenizer.eos_token, args.dataset_size) + + # Configure the SFT config + config = { + "per_device_train_batch_size": args.batch_size, + "num_train_epochs": 0.01, # Warmup epoch + "gradient_accumulation_steps": args.gradient_accumulation_steps, + "learning_rate": args.learning_rate, + "optim": "adamw_torch", + "save_strategy": 'no', + "remove_unused_columns": False, + "seed": 42, + "dataset_text_field": "text", + "packing": False, + "max_seq_length": args.seq_length, + "torch_compile": False, + "report_to": "none", + "logging_dir": args.log_dir, + "logging_steps": args.logging_steps, + "gradient_checkpointing": args.gradient_checkpointing + } + + # Compile model if requested + if args.use_torch_compile: + print("Compiling model with torch.compile()...") + model = torch.compile(model) + + # Warmup for torch compile + print("Running warmup for torch.compile()...") + SFTTrainer( + model=model, + processing_class=tokenizer, + train_dataset=dataset, + args=SFTConfig(**config), + ).train() + + # Train the model + print(f"\nStarting QLoRA fine-tuning for {args.num_epochs} epoch(s)...") + config["num_train_epochs"] = args.num_epochs + config["report_to"] = "tensorboard" + + trainer = SFTTrainer( + model=model, + 
processing_class=tokenizer, + train_dataset=dataset, + args=SFTConfig(**config), + ) + + trainer_stats = trainer.train() + + # Print training statistics + print(f"\n{'='*60}") + print("TRAINING COMPLETED") + print(f"{'='*60}") + print(f"Training runtime: {trainer_stats.metrics['train_runtime']:.2f} seconds") + print(f"Samples per second: {trainer_stats.metrics['train_samples_per_second']:.2f}") + print(f"Steps per second: {trainer_stats.metrics['train_steps_per_second']:.2f}") + print(f"Train loss: {trainer_stats.metrics['train_loss']:.4f}") + print(f"{'='*60}\n") + + # Save model if requested + if args.output_dir: + print(f"Saving model to {args.output_dir}...") + trainer.save_model(args.output_dir) + tokenizer.save_pretrained(args.output_dir) + print("Model saved successfully!") + + +def parse_arguments(): + parser = argparse.ArgumentParser(description="Llama 3.1 70B Fine-tuning with QLoRA") + + # Model configuration + parser.add_argument("--model_name", type=str, default="meta-llama/Llama-3.1-70B-Instruct", + help="Model name or path") + parser.add_argument("--dtype", type=str, default="bfloat16", + help="Model dtype (e.g., float32, float16, bfloat16)") + + # Training configuration + parser.add_argument("--batch_size", type=int, default=8, + choices=[1, 2, 4, 8, 16, 32], + help="Per device training batch size") + parser.add_argument("--seq_length", type=int, default=2048, + choices=[256, 512, 1024, 2048, 4096, 8192], + help="Maximum sequence length") + parser.add_argument("--num_epochs", type=int, default=1, + help="Number of training epochs") + parser.add_argument("--gradient_accumulation_steps", type=int, default=1, + help="Gradient accumulation steps") + parser.add_argument("--learning_rate", type=float, default=1e-4, + help="Learning rate") + parser.add_argument("--gradient_checkpointing", action="store_true", + help="Enable gradient checkpointing to save memory") + + # LoRA configuration + parser.add_argument("--lora_rank", type=int, default=8, + help="LoRA 
rank") + + # Dataset configuration + parser.add_argument("--dataset_size", type=int, default=500, + help="Number of samples to use from dataset") + + # Logging configuration + parser.add_argument("--logging_steps", type=int, default=1, + help="Log every N steps") + parser.add_argument("--log_dir", type=str, default="logs", + help="Directory for logs") + + # Compilation and saving + parser.add_argument("--use_torch_compile", action="store_true", + help="Use torch.compile() for faster training") + parser.add_argument("--output_dir", type=str, default=None, + help="Directory to save the fine-tuned model") + + return parser.parse_args() + + +if __name__ == "__main__": + args = parse_arguments() + print(f"\n{'='*60}") + print("LLAMA 3.1 70B QLoRA FINE-TUNING") + print(f"{'='*60}") + print(f"Model: {args.model_name}") + print(f"Training mode: QLoRA (4-bit quantization)") + print(f"Batch size: {args.batch_size}") + print(f"Gradient accumulation: {args.gradient_accumulation_steps}") + print(f"Effective batch size: {args.batch_size * args.gradient_accumulation_steps}") + print(f"Sequence length: {args.seq_length}") + print(f"Number of epochs: {args.num_epochs}") + print(f"Learning rate: {args.learning_rate}") + print(f"LoRA rank: {args.lora_rank}") + print(f"Dataset size: {args.dataset_size}") + print(f"Gradient checkpointing: {args.gradient_checkpointing}") + print(f"Torch compile: {args.use_torch_compile}") + print(f"{'='*60}\n") + + main(args) \ No newline at end of file diff --git a/nvidia/pytorch-fine-tune/assets/Llama3_8B_LoRA_finetuning.py b/nvidia/pytorch-fine-tune/assets/Llama3_8B_LoRA_finetuning.py new file mode 100644 index 0000000..34b9e15 --- /dev/null +++ b/nvidia/pytorch-fine-tune/assets/Llama3_8B_LoRA_finetuning.py @@ -0,0 +1,176 @@ +# +# SPDX-FileCopyrightText: Copyright (c) 1993-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import torch
+import argparse
+from datasets import load_dataset
+from trl import SFTConfig, SFTTrainer
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import get_peft_model, LoraConfig, TaskType
+
+
+# Define prompt templates
+ALPACA_PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+
+### Instruction: {}
+
+### Input: {}
+
+### Response: {}"""
+
+
+def get_alpaca_dataset(eos_token, dataset_size=500):
+    # Preprocess the dataset
+    def preprocess(x):
+        texts = [
+            ALPACA_PROMPT_TEMPLATE.format(instruction, input, output) + eos_token
+            for instruction, input, output in zip(x["instruction"], x["input"], x["output"])
+        ]
+        return {"text": texts}
+
+    dataset = load_dataset("tatsu-lab/alpaca", split="train").select(range(dataset_size)).shuffle(seed=42)
+    return dataset.map(preprocess, remove_columns=dataset.column_names, batched=True)
+
+
+def main(args):
+    # Load the model and tokenizer
+    print(f"Loading model: {args.model_name}")
+    model = AutoModelForCausalLM.from_pretrained(
+        args.model_name,
+        dtype=args.dtype,
+        device_map="auto"
+    )
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    tokenizer.pad_token = tokenizer.eos_token
+
+    # Configure LoRA
+    model = get_peft_model(model, LoraConfig(
+        r=args.lora_rank,
+        target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
+        lora_alpha=16,
+        lora_dropout=0,
+        task_type=TaskType.CAUSAL_LM))
+    print(f"Trainable parameters = {sum(p.numel() for p in model.parameters() if p.requires_grad):,}")
+
+    # Load and preprocess the dataset
+    print(f"Loading dataset with {args.dataset_size} samples...")
+    dataset = get_alpaca_dataset(tokenizer.eos_token, args.dataset_size)
+
+    # Configure the SFT config
+    config = {
+        "per_device_train_batch_size": args.batch_size,
+        "num_train_epochs": 0.01,
+        "gradient_accumulation_steps": args.gradient_accumulation_steps,
+        "learning_rate": args.learning_rate,
+        "optim": "adamw_torch",
+        "save_strategy": 'no',
+        "remove_unused_columns": False,
+        "seed": 42,
+        "dataset_text_field": "text",
+        "packing": False,
+        "max_seq_length": args.seq_length,
+        "torch_compile": False,
+        "report_to": "none",
+        "logging_dir": args.log_dir,
+        "logging_steps": args.logging_steps
+    }
+
+    # Warmup for torch compile
+    model = torch.compile(model)
+    SFTTrainer(
+        model=model,
+        processing_class=tokenizer,
+        train_dataset=dataset,
+        args=SFTConfig(**config),
+    ).train()
+
+    # Train the model
+    print(f"\nStarting LoRA fine-tuning for {args.num_epochs} epoch(s)...")
+    config["num_train_epochs"] = args.num_epochs
+    config["report_to"] = "tensorboard"
+    trainer = SFTTrainer(
+        model=model,
+        processing_class=tokenizer,
+        train_dataset=dataset,
+        args=SFTConfig(**config),
+    )
+
+    trainer_stats = trainer.train()
+
+    # Print training statistics
+    print(f"\n{'='*60}")
+    print("TRAINING COMPLETED")
+    print(f"{'='*60}")
+    print(f"Training runtime: {trainer_stats.metrics['train_runtime']:.2f} seconds")
+    print(f"Samples per second: {trainer_stats.metrics['train_samples_per_second']:.2f}")
+    print(f"Steps per second: {trainer_stats.metrics['train_steps_per_second']:.2f}")
+    print(f"Train loss: {trainer_stats.metrics['train_loss']:.4f}")
+    print(f"{'='*60}\n")
+
+
+def parse_arguments():
+    parser = argparse.ArgumentParser(description="Llama 3.1 8B Fine-tuning with LoRA")
+
+    # Model configuration
+    parser.add_argument("--model_name", type=str, default="meta-llama/Llama-3.1-8B-Instruct",
+                        help="Model name or path")
+    parser.add_argument("--dtype", type=str, default="bfloat16",
+                        choices=["float32", "float16", "bfloat16"],
+                        help="Model dtype")
+
+    # Training configuration
+    parser.add_argument("--batch_size", type=int, default=4,
+                        help="Per device training batch size")
+    parser.add_argument("--seq_length", type=int, default=2048,
+                        help="Maximum sequence length")
+    parser.add_argument("--num_epochs", type=int, default=1,
+                        help="Number of training epochs")
+    parser.add_argument("--gradient_accumulation_steps", type=int, default=1,
+                        help="Gradient accumulation steps")
+    parser.add_argument("--learning_rate", type=float, default=1e-4,
+                        help="Learning rate")
+
+    # LoRA configuration
+    parser.add_argument("--lora_rank", type=int, default=8,
+                        help="LoRA rank")
+
+    # Dataset configuration
+    parser.add_argument("--dataset_size", type=int, default=500,
+                        help="Number of samples to use from dataset")
+
+    # Logging configuration
+    parser.add_argument("--logging_steps", type=int, default=1,
+                        help="Log every N steps")
+    parser.add_argument("--log_dir", type=str, default="logs",
+                        help="Directory for logs")
+
+    return parser.parse_args()
+
+
+if __name__ == "__main__":
+    args = parse_arguments()
+    print(f"\n{'='*60}")
+    print("LLAMA 3.1 8B LoRA FINE-TUNING CONFIGURATION")
+    print(f"{'='*60}")
+    print(f"Model: {args.model_name}")
+    print(f"Batch size: {args.batch_size}")
+    print(f"Sequence length: {args.seq_length}")
+    print(f"Number of epochs: {args.num_epochs}")
+    print(f"Learning rate: {args.learning_rate}")
+    print(f"LoRA rank: {args.lora_rank}")
+    print(f"Dataset size: {args.dataset_size}")
+    print(f"{'='*60}\n")
+
+    main(args)
\ No newline at end of file
diff --git a/nvidia/pytorch-fine-tune/assets/example b/nvidia/pytorch-fine-tune/assets/example
new file mode 100644
index 0000000..e69de29
diff --git a/nvidia/rag-ai-workbench/README.md b/nvidia/rag-ai-workbench/README.md
index 9d08abc..de13fa7 100644
--- a/nvidia/rag-ai-workbench/README.md
+++ b/nvidia/rag-ai-workbench/README.md
@@ -50,12 +50,13 @@ architectures.

 ## Time & risk

-**Estimated time:** 30-45 minutes (including AI Workbench installation if needed)
-
-**Risk level:** Low - Uses pre-built containers and established APIs
-
-**Rollback:** Simply delete the cloned project from AI Workbench to remove all components. No system
-changes are made outside the AI Workbench environment.
+* **Estimated time:** 30-45 minutes (including AI Workbench installation if needed)
+* **Risk level:** Low - Uses pre-built containers and established APIs
+* **Rollback:** Simply delete the cloned project from AI Workbench to remove all components. No system changes are made outside the AI Workbench environment.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/speculative-decoding/README.md b/nvidia/speculative-decoding/README.md
index 3f7e689..9b7474e 100644
--- a/nvidia/speculative-decoding/README.md
+++ b/nvidia/speculative-decoding/README.md
@@ -52,11 +52,13 @@ These examples demonstrate how to accelerate large language model inference whil

 ## Time & risk

-**Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed)
-
-**Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads
-
-**Rollback:** Stop Docker containers and optionally clean up downloaded model cache.
+* **Duration:** 10-20 minutes for setup, additional time for model downloads (varies by network speed)
+* **Risks:** GPU memory exhaustion with large models, container registry access issues, network timeouts during downloads
+* **Rollback:** Stop Docker containers and optionally clean up downloaded model cache.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/tailscale/README.md b/nvidia/tailscale/README.md
index 469f4d5..8d4f722 100644
--- a/nvidia/tailscale/README.md
+++ b/nvidia/tailscale/README.md
@@ -62,15 +62,16 @@ all traffic automatically encrypted and NAT traversal handled transparently.

 ## Time & risk

-**Duration**: 15-30 minutes for initial setup, 5 minutes per additional device
-
-**Risks**:
-- Potential SSH service configuration conflicts
-- Network connectivity issues during initial setup
-- Authentication provider service dependencies
-
-**Rollback**: Tailscale can be completely removed with `sudo apt remove tailscale`
-and all network routing automatically reverts to default settings.
+* **Duration**: 15-30 minutes for initial setup, 5 minutes per additional device
+* **Risks**:
+  * Potential SSH service configuration conflicts
+  * Network connectivity issues during initial setup
+  * Authentication provider service dependencies
+* **Rollback**: Tailscale can be completely removed with `sudo apt remove tailscale` and all network routing automatically reverts to default settings.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/trt-llm/README.md b/nvidia/trt-llm/README.md
index 08899fc..1030093 100644
--- a/nvidia/trt-llm/README.md
+++ b/nvidia/trt-llm/README.md
@@ -110,11 +110,13 @@ Reminder: not all model architectures are supported for NVFP4 quantization.

 ## Time & risk

-**Duration**: 45-60 minutes for setup and API server deployment
-
-**Risk level**: Medium - container pulls and model downloads may fail due to network issues
-
-**Rollback**: Stop inference servers and remove downloaded models to free resources.
+* **Duration**: 45-60 minutes for setup and API server deployment
+* **Risk level**: Medium - container pulls and model downloads may fail due to network issues
+* **Rollback**: Stop inference servers and remove downloaded models to free resources.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Single Spark
diff --git a/nvidia/unsloth/README.md b/nvidia/unsloth/README.md
index 1e5acbf..e541f7c 100644
--- a/nvidia/unsloth/README.md
+++ b/nvidia/unsloth/README.md
@@ -48,15 +48,16 @@ The Python test script can be found [here on GitHub](https://gitlab.com/nvidia/d

 ## Time & risk

-**Duration**: 30-60 minutes for initial setup and test run
-
-**Risks**:
-
-- Triton compiler version mismatches may cause compilation errors
-- CUDA toolkit configuration issues may prevent kernel compilation
-- Memory constraints on smaller models require batch size adjustments
-
-**Rollback**: Uninstall packages with `pip uninstall unsloth torch torchvision`.
+* **Duration**: 30-60 minutes for initial setup and test run
+* **Risks**:
+  * Triton compiler version mismatches may cause compilation errors
+  * CUDA toolkit configuration issues may prevent kernel compilation
+  * Memory constraints on smaller models require batch size adjustments
+* **Rollback**: Uninstall packages with `pip uninstall unsloth torch torchvision`.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/vllm/README.md b/nvidia/vllm/README.md
index 4b826d3..dc74bb8 100644
--- a/nvidia/vllm/README.md
+++ b/nvidia/vllm/README.md
@@ -48,11 +48,13 @@ support for ARM64.

 ## Time & risk

-**Duration:** 30 minutes for Docker approach
-
-**Risks:** Container registry access requires internal credentials
-
-**Rollback:** Container approach is non-destructive.
+* **Duration:** 30 minutes for Docker approach
+* **Risks:** Container registry access requires internal credentials
+* **Rollback:** Container approach is non-destructive.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/vlm-finetuning/README.md b/nvidia/vlm-finetuning/README.md
index c1658fc..3301fb8 100644
--- a/nvidia/vlm-finetuning/README.md
+++ b/nvidia/vlm-finetuning/README.md
@@ -43,18 +43,20 @@ The setup includes:

 ## Time & risk

-**Duration**:
-- 15-20 minutes for initial setup and model downloads
-- 30-60 minutes for image VLM training (depending on dataset size)
-- 1-2 hours for video VLM training (depending on video dataset size)
-
-**Risks**:
-- Docker permission issues may require user group changes and a session restart
-- Large model downloads and datasets may require significant disk space and time
-- Training requires sustained GPU usage and memory
-- Dataset preparation may require manual steps (Kaggle downloads, video processing)
-
-**Rollback**: Stop and remove Docker containers, delete downloaded models and datasets if needed.
+* **Duration**:
+  * 15-20 minutes for initial setup and model downloads
+  * 30-60 minutes for image VLM training (depending on dataset size)
+  * 1-2 hours for video VLM training (depending on video dataset size)
+* **Risks**:
+  * Docker permission issues may require user group changes and a session restart
+  * Large model downloads and datasets may require significant disk space and time
+  * Training requires sustained GPU usage and memory
+  * Dataset preparation may require manual steps (Kaggle downloads, video processing)
+* **Rollback**: Stop and remove Docker containers, delete downloaded models and datasets if needed.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/vscode/README.md b/nvidia/vscode/README.md
index d7837a5..26d2e0d 100644
--- a/nvidia/vscode/README.md
+++ b/nvidia/vscode/README.md
@@ -46,11 +46,13 @@ You will have Visual Studio Code running natively on your DGX Spark device with

 ## Time & risk

-**Duration:** 10-15 minutes
-
-**Risk level:** Low - installation uses official packages with standard rollback
-
-**Rollback:** Standard package removal via system package manager
+* **Duration:** 10-15 minutes
+* **Risk level:** Low - installation uses official packages with standard rollback
+* **Rollback:** Standard package removal via system package manager
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
diff --git a/nvidia/vss/README.md b/nvidia/vss/README.md
index e0729ab..19841b7 100644
--- a/nvidia/vss/README.md
+++ b/nvidia/vss/README.md
@@ -45,14 +45,16 @@ You will deploy NVIDIA's VSS AI Blueprint on NVIDIA Spark hardware with Blackwel

 ## Time & risk

-**Duration:** 30-45 minutes for initial setup, additional time for video processing validation
-
-**Risks:**
-- Container startup can be resource-intensive and time-consuming with large model downloads
-- Network configuration conflicts if shared network already exists
-- Remote API endpoints may have rate limits or connectivity issues (hybrid deployment)
-
-**Rollback:** Stop all containers with `docker compose down`, remove shared network with `docker network rm vss-shared-network`, and clean up temporary media directories.
+* **Duration:** 30-45 minutes for initial setup, additional time for video processing validation
+* **Risks:**
+  * Container startup can be resource-intensive and time-consuming with large model downloads
+  * Network configuration conflicts if shared network already exists
+  * Remote API endpoints may have rate limits or connectivity issues (hybrid deployment)
+* **Rollback:** Stop all containers with `docker compose down`, remove shared network with `docker network rm vss-shared-network`, and clean up temporary media directories.
+* DGX Spark uses a Unified Memory Architecture (UMA), which enables dynamic memory sharing between the GPU and CPU. With many applications still updating to take advantage of UMA, you may encounter memory issues even when within the memory capacity of DGX Spark. If that happens, manually flush the buffer cache with:
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```

 ## Instructions
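The fine-tuning scripts this patch adds under `nvidia/pytorch-fine-tune/assets/` all build their training text through the same Alpaca template and batched `preprocess()` mapping. As a standalone sketch of that transformation (the inline batch is hypothetical sample data standing in for the `tatsu-lab/alpaca` download, and the literal `"</s>"` stands in for the tokenizer's real EOS token):

```python
# Alpaca-style prompt template, as defined in the patch's fine-tuning scripts
ALPACA_PROMPT_TEMPLATE = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction: {}

### Input: {}

### Response: {}"""


def format_examples(batch, eos_token):
    # Mirrors the scripts' preprocess(): one formatted string per example,
    # terminated with the tokenizer's EOS token so training learns to stop.
    return {
        "text": [
            ALPACA_PROMPT_TEMPLATE.format(ins, inp, out) + eos_token
            for ins, inp, out in zip(batch["instruction"], batch["input"], batch["output"])
        ]
    }


# Hypothetical sample batch in the column layout datasets.map() passes when batched=True
batch = {
    "instruction": ["Translate to French"],
    "input": ["Hello"],
    "output": ["Bonjour"],
}
formatted = format_examples(batch, "</s>")
print(formatted["text"][0].endswith("Bonjour</s>"))  # -> True
```

In the real scripts this function runs via `dataset.map(..., batched=True, remove_columns=dataset.column_names)`, leaving only the `"text"` column that `SFTConfig`'s `dataset_text_field` points at.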