**Estimated time:** 15-30 minutes for setup and validation
**Risks:**
- Large model downloads may take significant time depending on network speed
- GPU memory requirements vary by model size
- Container startup time depends on model loading
**Rollback:** Stop the container with `docker stop <CONTAINER_NAME>`; because it is launched with `--rm`, it is removed automatically (run `docker rm <CONTAINER_NAME>` only if you started it without `--rm`). Remove cached models from `~/.cache/nim` if you need to reclaim disk space.
## Instructions
## Step 1. Verify environment prerequisites
Check that your system meets the basic requirements for running GPU-enabled containers.
```bash
nvidia-smi
docker --version
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi
```
## Step 2. Configure NGC authentication
Set up access to NVIDIA's container registry using your NGC API key.
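A typical way to authenticate (assuming your key is exported as `NGC_API_KEY`) is to log in to `nvcr.io` using the literal username `$oauthtoken`:

```shell
# Log in to NVIDIA's registry; the username is the literal string "$oauthtoken",
# not a shell variable -- keep the single quotes.
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
```

A successful login prints `Login Succeeded` and allows Docker to pull NIM images.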
## Step 4. Launch the NIM container
Start the containerized LLM service with GPU acceleration and proper resource allocation.
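The `docker run` command below references several shell variables. A minimal sketch of defining them — the container name, image tag, and cache path here are example values, substitute your own:

```shell
# Example values -- adjust the container name, image, and cache path as needed.
export CONTAINER_NAME="llama-3.1-8b-instruct"
export IMG_NAME="nvcr.io/nim/meta/llama-3.1-8b-instruct:latest"
export NGC_API_KEY="your-ngc-api-key"        # paste your actual NGC API key
export LOCAL_NIM_CACHE="$HOME/.cache/nim"    # persists downloaded model weights

# Create the cache directory so the container can write to it.
mkdir -p "$LOCAL_NIM_CACHE"
```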
```bash
docker run -it --rm --name=$CONTAINER_NAME \
--runtime=nvidia \
--gpus all \
--shm-size=16GB \
-e NGC_API_KEY=$NGC_API_KEY \
-v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
-u $(id -u) \
-p 8000:8000 \
$IMG_NAME
```
The container will download the model on first run and may take several minutes to start. Look for
startup messages indicating the service is ready.
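One way to confirm readiness from a second terminal is to poll the container's health endpoint (NIM services expose `/v1/health/ready`; the loop below is a sketch assuming the default port mapping of 8000):

```shell
# Poll until the service reports ready (Ctrl-C to abort).
until curl -sf http://localhost:8000/v1/health/ready > /dev/null; do
  echo "Waiting for NIM to become ready..."
  sleep 10
done
echo "Service is ready."
```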
## Step 5. Validate inference endpoint
Test the deployed service with a basic completion request to verify functionality. Run the following curl command in a new terminal.
```bash
curl -X 'POST' \
'http://0.0.0.0:8000/v1/chat/completions' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "meta/llama-3.1-8b-instruct",
"messages": [
{
"role":"system",
"content":"detailed thinking on"
},
{
"role":"user",
"content":"Can you write me a song?"
}
],
"top_p": 1,
"n": 1,
"max_tokens": 15,
"frequency_penalty": 1.0,
"stop": ["hello"]
}'
```
Expected output is a JSON response whose `choices` array contains an assistant `message` with the generated text.
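The response follows the OpenAI chat-completions schema. A minimal sketch of extracting the generated text from such a response — the JSON below is an illustrative, abridged example, not actual model output:

```python
import json

# Abridged example of a chat-completions response body (illustrative values).
sample = '''
{
  "id": "chat-abc123",
  "object": "chat.completion",
  "model": "meta/llama-3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Here is a short song..."},
      "finish_reason": "length"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 15, "total_tokens": 40}
}
'''

response = json.loads(sample)

# The generated text lives under choices[0].message.content;
# finish_reason "length" indicates the max_tokens limit was hit.
text = response["choices"][0]["message"]["content"]
print(text)
```

With `max_tokens` set to 15 as in the request above, expect `finish_reason` to be `"length"` rather than `"stop"`.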
## Step 6. Troubleshooting
| Symptom | Cause | Fix |
|---------|--------|-----|
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Install nvidia-container-toolkit and restart Docker |
| "Invalid credentials" during docker login | Incorrect NGC API key format | Verify API key from NGC portal, ensure no extra whitespace |
| Model download hangs or fails | Network connectivity or insufficient disk space | Check internet connection and available disk space in cache directory |
| API returns 404 or connection refused | Container not fully started or wrong port | Wait for container startup completion, verify port 8000 is accessible |