The MONAI Reasoning CXR 3B model is a **medical AI model** designed for **chest X-ray (CXR) interpretation** with reasoning capabilities. It combines imaging analysis with large-scale language modeling:
- **Medical focus**: Built within the MONAI framework for healthcare imaging tasks.
- **Vision + language**: Takes CXR images as input and produces diagnostic text or reasoning outputs.
## You should see model files including config.json and model weights
```
> **Important Note:** A custom internal VLLM container is currently required until sm121 support is available in the public image. The instructions below use the internal container `******:5005/dl/dgx/vllm:main-py3.31165712-devel`.
## Step 3. Verify System Architecture
Before proceeding, confirm that your system architecture is ARM64 so you select the correct
containers for your NVIDIA Spark device:
```bash
## Check your system architecture
uname -m
## Should output: aarch64 for ARM64 systems like NVIDIA Spark
```
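If you script the setup, the same check can gate the remaining steps. The `check_arch` helper below is a hypothetical convenience, not part of the official instructions:

```bash
## Hypothetical guard: stop early when the host is not ARM64
check_arch() {
  ## Expects the machine string from `uname -m`; succeeds only for aarch64
  [ "$1" = "aarch64" ]
}

if check_arch "$(uname -m)"; then
  echo "ARM64 host detected; continuing"
else
  echo "This guide targets aarch64 (NVIDIA Spark); detected $(uname -m)" >&2
fi
```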
## Step 4. Create a Docker Network
Create a dedicated Docker bridge network to allow the VLLM and Open WebUI containers to
communicate with each other easily and reliably.
```bash
docker network create monai-net
```
## Step 5. Deploy the VLLM Server
Launch the VLLM container with ARM64 architecture support, attaching it to the network you
created and mounting your local model directory. This step configures the server for optimal
performance on NVIDIA Spark hardware.
```bash
## Stop and remove existing container if running
docker stop vllm-server 2>/dev/null || true
docker rm vllm-server 2>/dev/null || true
## Run the VLLM server with internal container
docker run --rm -d \
--name vllm-server \
--gpus all \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
--network monai-net \
--platform linux/arm64 \
-v ./models/monai-reasoning-cxr-3b:/model \
-p 8000:8000 \
******:5005/dl/dgx/vllm:main-py3.31165712-devel \
vllm serve /model \
--host 0.0.0.0 \
--port 8000 \
--dtype bfloat16 \
--trust-remote-code \
--gpu-memory-utilization 0.5 \
--enforce-eager \
--served-model-name monai-reasoning-cxr-3b
```
**Wait for startup and verify:**
```bash
## Wait for the model to load (can take 1-2 minutes on Spark hardware)
sleep 90
## Check if container is running
docker ps
## Test the VLLM API
curl http://localhost:8000/v1/models
```
You should see JSON output showing the model is loaded and available.
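Rather than relying on a fixed `sleep 90`, you can poll the API until it answers. The `wait_for_endpoint` helper below is a sketch; the function name and retry defaults are illustrative, not part of the guide:

```bash
## Poll a URL until curl succeeds or the retry budget is exhausted
wait_for_endpoint() {
  local url=$1 retries=${2:-60} interval=${3:-5}
  local i=0
  while [ "$i" -lt "$retries" ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $url" >&2
  return 1
}

## Example: wait up to 5 minutes for the model to finish loading
## wait_for_endpoint http://localhost:8000/v1/models 60 5
```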
## Step 6. Deploy Open WebUI
Launch the Open WebUI container with ARM64 architecture support for your NVIDIA Spark device.
```bash
## Define custom prompt suggestions for medical X-ray analysis
PROMPT_SUGGESTIONS='[
{
"title": ["Analyze X-Ray Image", "Find abnormalities and support devices"],
"content": "Find abnormalities and support devices in the image."
}
]'

## Stop and remove existing container if running
docker stop open-webui 2>/dev/null || true
docker rm open-webui 2>/dev/null || true

## Run Open WebUI attached to the same network as the VLLM server
## (the flags below reflect a typical Open WebUI setup; adjust for your environment)
docker run --rm -d \
--name open-webui \
--network monai-net \
--platform linux/arm64 \
-p 3000:8080 \
-e OPENAI_API_BASE_URL=http://vllm-server:8000/v1 \
-e OPENAI_API_KEY=none \
-e WEBUI_AUTH=False \
-e DEFAULT_PROMPT_SUGGESTIONS="$PROMPT_SUGGESTIONS" \
ghcr.io/open-webui/open-webui:main
```
## Step 7. Verify the Deployment
Check that both containers are running properly and all endpoints are accessible:
```bash
## Check container status
docker ps
## You should see both vllm-server and open-webui containers running
## Test the VLLM API
curl http://localhost:8000/v1/models
## Should return JSON with model information
## Test Open WebUI accessibility
curl -f http://localhost:3000
## Should return HTTP 200 response
```
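For a quicker glance than reading raw curl output, the checks above can be wrapped into a one-shot status report. `check_http` is a small hypothetical helper, not part of the original steps:

```bash
## Print OK/FAIL per endpoint based on HTTP success (curl -f)
check_http() {
  if curl -sf "$1" >/dev/null 2>&1; then
    echo "OK   $1"
  else
    echo "FAIL $1"
  fi
}

check_http http://localhost:8000/v1/models   ## VLLM API
check_http http://localhost:3000             ## Open WebUI
```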
## Step 8. Configure Open WebUI
Configure the front-end interface to connect to your VLLM backend:
1. Open your web browser and navigate to **http://<YOUR_SPARK_DEVICE_IP>:3000**
2. Since authentication is disabled, you'll have direct access to the interface
3. The OpenAI API connection is pre-configured through environment variables
4. Go to the main chat screen, click **"Select a model"**, and choose **monai-reasoning-cxr-3b**
5. **Important:** Navigate to **Chat Controls** → **Advanced Params** and disable **"Reasoning Tags"** to get the full reasoning output from the model
You can now upload a chest X-ray image and ask questions directly in the chat interface. The custom prompt suggestion "Find abnormalities and support devices in the image" will be available for quick access.
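The chat interface calls the VLLM server's OpenAI-compatible endpoint under the hood, and you can hit it directly as well. The payload below is a text-only sketch with an illustrative prompt; in the OpenAI chat format, images are sent as base64-encoded `image_url` content parts alongside the text:

```bash
## Build an OpenAI-style chat request for the served model
PAYLOAD=$(cat <<'EOF'
{
  "model": "monai-reasoning-cxr-3b",
  "messages": [
    {"role": "user", "content": "Find abnormalities and support devices in the image."}
  ],
  "max_tokens": 512
}
EOF
)
echo "$PAYLOAD"

## Send it to the server:
## curl -s http://localhost:8000/v1/chat/completions \
##   -H "Content-Type: application/json" -d "$PAYLOAD"
```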