chore: Regenerate all playbooks

GitLab CI 2025-10-12 16:55:22 +00:00
parent 302c15b6cf
commit f4c0014bf5


@ -91,16 +91,7 @@ curl http://localhost:8000/v1/chat/completions \
Expected response should contain `"content": "204"` or similar mathematical calculation. Expected response should contain `"content": "204"` or similar mathematical calculation.
## Step 3. Troubleshooting ## Step 3. Cleanup and rollback
| Symptom | Cause | Fix |
|---------|--------|-----|
| CUDA version mismatch errors | Wrong CUDA toolkit version | Reinstall CUDA 12.9 using exact installer |
| Container registry authentication fails | Invalid or expired GitLab token | Generate new auth token |
| SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source |
## Step 4. Cleanup and rollback
For container approach (non-destructive): For container approach (non-destructive):
@ -116,7 +107,7 @@ To remove CUDA 12.9:
sudo /usr/local/cuda-12.9/bin/cuda-uninstaller sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
``` ```
## Step 5. Next steps ## Step 4. Next steps
- **Production deployment:** Configure vLLM with your specific model requirements - **Production deployment:** Configure vLLM with your specific model requirements
- **Performance tuning:** Adjust batch sizes and memory settings for your workload - **Performance tuning:** Adjust batch sizes and memory settings for your workload
@ -127,7 +118,7 @@ sudo /usr/local/cuda-12.9/bin/cuda-uninstaller
## Step 1. Configure network connectivity ## Step 1. Configure network connectivity
Follow the network setup instructions from the Connect two Sparks playbook to establish connectivity between your DGX Spark nodes. Follow the network setup instructions from the [Connect two Sparks](https://build.nvidia.com/spark/stack-sparks/stacked-sparks) playbook to establish connectivity between your DGX Spark nodes.
This includes: This includes:
- Physical QSFP cable connection - Physical QSFP cable connection
@ -339,6 +330,15 @@ http://192.168.100.10:8265
## Troubleshooting ## Troubleshooting
## Common issues for running on a single Spark
| Symptom | Cause | Fix |
|---------|--------|-----|
| CUDA version mismatch errors | Wrong CUDA toolkit version | Reinstall CUDA 12.9 using exact installer |
| Container registry authentication fails | Invalid or expired GitLab token | Generate new auth token |
| SM_121a architecture not recognized | Missing LLVM patches | Verify SM_121a patches applied to LLVM source |
## Common issues for running on two Sparks
| Symptom | Cause | Fix | | Symptom | Cause | Fix |
|---------|--------|-----| |---------|--------|-----|
| Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration | | Node 2 not visible in Ray cluster | Network connectivity issue | Verify QSFP cable connection, check IP configuration |