dgx-spark-playbooks/SKILL.md at 34cd09b53ea666e609feba4e43d96be1bc014641

mirror of https://github.com/NVIDIA/dgx-spark-playbooks.git synced 2026-04-22 18:13:52 +00:00

github-actions[bot] 4d0d20d39f chore: regenerate skills/ from upstream playbooks [skip ci]

2026-04-19 09:25:00 +00:00

1.2 KiB


name	description
dgx-spark-multi-modal-inference	Set up multi-modal inference with TensorRT on NVIDIA DGX Spark. Use when setting up multi-modal inference on Spark hardware.

Set up multi-modal inference with TensorRT

Multi-modal inference combines different data types, such as text, images, and audio, within a single model pipeline to generate or interpret richer outputs.
Instead of processing one input type at a time, multi-modal systems use shared representations that support tasks such as text-to-image generation, image captioning, and vision-language reasoning.

On GPUs, the modalities can be processed in parallel, which gives faster, higher-fidelity results on tasks that combine language and vision.

Outcome: You'll deploy GPU-accelerated multi-modal inference on NVIDIA DGX Spark, using TensorRT to run the Flux.1 and SDXL diffusion models with optimized performance in several precision formats (FP16, FP8, FP4).
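
To give a rough sense of why the precision format matters, the sketch below estimates how much memory a model's weights alone would occupy at FP16, FP8, and FP4. It is plain arithmetic (bits per weight times parameter count), not part of the playbook: the `weight_bytes` helper and the parameter count are hypothetical, and real engines also need memory for activations, scaling factors, and workspace that this ignores.

```python
# Illustrative only: weight-storage arithmetic for the precision formats named above.
# The helper name and the parameter count are assumptions, not taken from the playbook.
BITS_PER_WEIGHT = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_bytes(num_params: int, precision: str) -> int:
    """Return the bytes needed to store num_params weights at the given precision."""
    return num_params * BITS_PER_WEIGHT[precision] // 8


if __name__ == "__main__":
    hypothetical_params = 10_000_000_000  # assumed size, chosen for round numbers
    for precision in BITS_PER_WEIGHT:
        gib = weight_bytes(hypothetical_params, precision) / 2**30
        print(f"{precision}: {gib:.1f} GiB of weights")
```

Each step down in precision halves the weight footprint, which is the main reason lower-precision variants can fit larger models or leave more memory for other stages of the pipeline.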

Duration: 45-90 minutes depending on model downloads and optimization steps

Full playbook: /home/runner/work/dgx-spark-playbooks/dgx-spark-playbooks/nvidia/multi-modal-inference/README.md


Multi-modal Inference
