From 09c6bd900aea811fda3a41fce25b0b56bc5d8ed2 Mon Sep 17 00:00:00 2001
From: Cory Scott
Date: Thu, 16 Apr 2026 13:59:47 -0400
Subject: [PATCH] fix: correct gemma-4-31B-it model filename to bf16 in
 llama-cpp playbook

The download command and subsequent --model paths referenced
gemma-4-31B-it-f16.gguf, but the file published on Hugging Face is
gemma-4-31B-it-bf16.gguf. Without this fix the hf download step fails and
later steps point to a missing file.
---
 nvidia/llama-cpp/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/nvidia/llama-cpp/README.md b/nvidia/llama-cpp/README.md
index a801717..44a72d8 100644
--- a/nvidia/llama-cpp/README.md
+++ b/nvidia/llama-cpp/README.md
@@ -74,7 +74,7 @@ The following models are supported with llama.cpp on Spark. All listed models ar
 
 ## Step 1. Verify prerequisites
 
-This walkthrough uses **Gemma 4 31B IT** (`gemma-4-31B-it-f16.gguf`) as the example checkpoint. You can substitute another GGUF from [`ggml-org/gemma-4-31B-it-GGUF`](https://huggingface.co/ggml-org/gemma-4-31B-it-GGUF) (for example `Q4_K_M` or `Q8_0`) by changing the `hf download` filename and `--model` path in later steps.
+This walkthrough uses **Gemma 4 31B IT** (`gemma-4-31B-it-bf16.gguf`) as the example checkpoint. You can substitute another GGUF from [`ggml-org/gemma-4-31B-it-GGUF`](https://huggingface.co/ggml-org/gemma-4-31B-it-GGUF) (for example `Q4_K_M` or `Q8_0`) by changing the `hf download` filename and `--model` path in later steps.
 
 Ensure the required tools are installed:
 
@@ -127,7 +127,7 @@ llama.cpp loads models in **GGUF** format.
 **gemma-4-31B-it** is available in GGUF format:
 
 ```bash
 hf download ggml-org/gemma-4-31B-it-GGUF \
-  gemma-4-31B-it-f16.gguf \
+  gemma-4-31B-it-bf16.gguf \
   --local-dir ~/models/gemma-4-31B-it-GGUF
 ```
@@ -139,7 +139,7 @@ From your `llama.cpp/build` directory, launch the OpenAI-compatible server with
 
 ```bash
 ./bin/llama-server \
-  --model ~/models/gemma-4-31B-it-GGUF/gemma-4-31B-it-f16.gguf \
+  --model ~/models/gemma-4-31B-it-GGUF/gemma-4-31B-it-bf16.gguf \
   --host 0.0.0.0 \
   --port 30000 \
   --n-gpu-layers 99 \
@@ -195,7 +195,7 @@ Example shape of the response (fields vary by llama.cpp version; `message` may i
       }
     ],
     "created": 1765916539,
-    "model": "gemma-4-31B-it-f16.gguf",
+    "model": "gemma-4-31B-it-bf16.gguf",
     "object": "chat.completion",
     "usage": {
       "completion_tokens": 100,