NVIDIA NIM Deployment

Datature Vi includes built-in tools for deploying and running inference with NVIDIA NIM (NVIDIA Inference Microservices) containers. NIM containers are pre-built, GPU-accelerated Docker images that expose an OpenAI-compatible API, so you can move from a trained model to a production-ready inference endpoint without building custom serving infrastructure.

Key concepts

Docker container: A self-contained package that includes the model, runtime environment, and serving infrastructure. You start it with a single command, and the container handles model loading and serving on its own.

NVIDIA NIM: Pre-built containers optimized for GPU-accelerated model serving. NIM handles request batching, GPU memory management, and API routing.

OpenAI-compatible API: NIM uses the same API format as OpenAI's chat completion endpoint (/v1/chat/completions). Existing code that calls OpenAI can point at your NIM container with minimal changes.
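
As a quick illustration, any OpenAI client can target a running NIM container. Here is a minimal sketch using the official openai Python package; the localhost URL, port, and model name are assumptions based on the defaults and supported images shown later on this page:

from openai import OpenAI

# Point the standard OpenAI client at a local NIM container.
# NIM ignores the API key value, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="cosmos-reason2-2b",  # one of the supported images listed below
    messages=[{"role": "user", "content": "Hello from a NIM container"}],
)
print(response.choices[0].message.content)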

NGC (NVIDIA GPU Cloud): NVIDIA's registry for GPU-optimized container images. You need an NGC API key to pull NIM containers.

Before You Start

What NIM gives you

NIM handles the infrastructure work: pulling the container image, loading model weights, starting the server, and confirming readiness before handing control back to your code. The service exposes standard /v1/chat/completions endpoints, so it works with any OpenAI-compatible client.
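
If you want to verify readiness yourself, NIM containers expose a standard health endpoint. A small sketch assuming the requests package is available; the /v1/health/ready path follows NVIDIA's NIM convention and the port assumes the default used throughout this page:

import requests

# Poll the NIM readiness endpoint on the default port.
resp = requests.get("http://localhost:8000/v1/health/ready", timeout=5)
print("Service ready" if resp.ok else f"Not ready yet: HTTP {resp.status_code}")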

Supported NIM images:

| Image | Parameters | Video support |
| --- | --- | --- |
| `cosmos-reason1-7b` | 7B | No |
| `cosmos-reason2-2b` | 2B | Yes |
| `cosmos-reason2-8b` | 8B | Yes |

Supported task types:

- Visual Question Answering (VQA): answer questions about images and videos.
- Phrase Grounding: locate objects with bounding boxes.
- Freeform Text: open-ended image and video analysis.
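
Switching between these tasks only changes how the predictor is constructed. Here is a sketch for phrase grounding; note that "vqa" is the only task_type value shown in this guide's examples, so the identifier below is hypothetical (see the Configuration Reference for the real list):

from vi.deployment.nim import NIMPredictor

predictor = NIMPredictor(
    model_name="cosmos-reason2-8b",
    task_type="phrase_grounding",  # hypothetical identifier; confirm in the Configuration Reference
    port=8000,
)
result = predictor(
    source="image.jpg",
    user_prompt="Locate every forklift in the image",
    stream=False,
)
print(result.result)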

Model and LoRA Support

Architecture support: NVIDIA NIM currently provides container images for Cosmos-Reason1 and Cosmos-Reason2 only. Other architectures (Qwen3.5, Qwen3-VL, Qwen2.5-VL, InternVL3.5) are not available as NIM containers. This is an NVIDIA NIM limitation, not a Datature Vi restriction.

LoRA adapters: Models trained with LoRA deploy on NIM with the full base model weights only. NVIDIA NIM does not support PEFT adapters, so LoRA adapter weights are not applied during NIM inference. To serve LoRA-trained models with their adapter weights applied, use the Vi SDK for local inference, or download your weights in HuggingFace format and serve them with a framework like vLLM.
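
For illustration, a minimal vLLM sketch for applying adapter weights locally. Assumptions: the weights were downloaded in HuggingFace format to the paths shown, a text-only prompt is used for brevity, and Qwen2.5-VL is picked purely as an example from the architecture list above; LoRA support for a given multimodal architecture depends on your vLLM version:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA support enabled.
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct", enable_lora=True)

# Apply the downloaded adapter at request time.
adapter = LoRARequest("vi-adapter", 1, "/path/to/downloaded/adapter")
outputs = llm.generate(
    "Describe a busy warehouse floor.",
    SamplingParams(max_tokens=256),
    lora_request=adapter,
)
print(outputs[0].outputs[0].text)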

Coming soon: Datature Vi is developing a built-in vLLM-based local deployment server that will support all architectures (including Qwen3.5 and Qwen3-VL) with full LoRA adapter support. This will provide an OpenAI-compatible serving endpoint without the NIM container limitations.

Installation

Install the SDK with deployment dependencies:

pip install "vi-sdk[deployment]"

This pulls in the Docker SDK, OpenAI client library, and NIM-specific utilities.

Quick start

The core workflow has three steps: configure, deploy, predict. A final cleanup step stops the container when you're done.

from vi.deployment.nim import NIMDeployer, NIMPredictor, NIMConfig

# 1. Configure
config = NIMConfig(
    nvidia_api_key="nvapi-...",   # Your NGC API key
    run_id="YOUR_DATATURE_VI_RUN_ID",  # Omit to use base model weights
    port=8000
)

# 2. Deploy
deployer = NIMDeployer(config)
result = deployer.deploy()
print(f"Container running on port {result.port}")
print(f"Available models: {result.available_models}")

# 3. Predict
predictor = NIMPredictor(config=config)
output = predictor(source="image.jpg", stream=False)
print(output.caption)

# 4. Stop when done
NIMDeployer.stop(result.container_name)
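
These examples all pass stream=False. When streaming is enabled instead, a plausible consumption sketch looks like the following, assuming the call then yields incremental text chunks (the exact chunk type is not specified on this page):

# Hypothetical streaming loop: reuses the predictor from the quick start
# and assumes stream=True turns the call into a generator of text chunks.
for chunk in predictor(source="image.jpg", stream=True):
    print(chunk, end="", flush=True)
print()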

Deployment workflow

The NIMDeployer.deploy() call runs through these steps automatically:

  1. Pull NIM image: downloads from NVIDIA Container Registry using your NGC API key
  2. Download weights (optional): fetches custom model weights from Vi when run_id is provided
  3. Start container: launches Docker container with GPU access and volume mounts
  4. Health check: waits up to 10 minutes for the service to become ready
  5. Return result: hands back a NIMDeploymentResult with the container ID, name, port, and available model list
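
In code, step 5's fields map onto attributes of the returned object. Three of the names below appear in the quick start example; container_id is an assumed attribute name based on the "container ID" wording above:

result = deployer.deploy()

print(result.container_id)       # assumed attribute name
print(result.container_name)
print(result.port)
print(result.available_models)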

Environment variables

Store credentials in environment variables rather than in code:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `NGC_API_KEY` | string | NVIDIA NGC API key for container registry access | Yes |
| `DATATURE_VI_SECRET_KEY` | string | Vi SDK secret key for custom weight downloads | No |
| `DATATURE_VI_ORGANIZATION_ID` | string | Vi organization ID for custom weight downloads | No |

export NGC_API_KEY="nvapi-..."
export DATATURE_VI_SECRET_KEY="your-secret-key"
export DATATURE_VI_ORGANIZATION_ID="your-org-id"

from vi.deployment.nim import NIMDeployer, NIMConfig

# Credentials loaded from environment
config = NIMConfig(run_id="YOUR_DATATURE_VI_RUN_ID")
deployer = NIMDeployer(config)
result = deployer.deploy()

Common patterns

Batch image processing

from vi.deployment.nim import NIMPredictor
from pathlib import Path

# Assumes a NIM container is already deployed and listening on port 8000
predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000
)

image_dir = Path("./images")
results = []

for image_path in image_dir.glob("*.jpg"):
    result = predictor(
        source=str(image_path),
        user_prompt="What is in this image?",
        stream=False
    )
    results.append({"image": image_path.name, "result": result.result})

print(f"Processed {len(results)} images")

Video analysis

Cosmos-Reason2 models accept video files directly. In the sampling parameters below, media_io_kwargs controls frame sampling (here two frames per second) and mm_processor_kwargs sets the resize bounds for each frame:

from vi.deployment.nim import NIMPredictor, NIMSamplingParams

predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000
)

video_params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},
    mm_processor_kwargs={"shortest_edge": 336, "longest_edge": 672}
)

result = predictor(
    source="video.mp4",
    user_prompt="Describe what happens in this video",
    stream=False,
    sampling_params=video_params
)
print(result.result)

Error handling

from vi.deployment.nim import NIMDeployer, NIMConfig
from vi.deployment.nim.exceptions import (
    ContainerExistsError,
    InvalidConfigError,
    ModelIncompatibilityError
)

try:
    config = NIMConfig(nvidia_api_key="nvapi-...")
    deployer = NIMDeployer(config)
    result = deployer.deploy()
except InvalidConfigError as e:
    print(f"Invalid configuration: {e}")
except ContainerExistsError as e:
    print(f"Container already exists: {e.container_name}")
    # Reuse with use_existing_container=True, or remove with auto_kill_existing_container=True
except ModelIncompatibilityError as e:
    print(f"Model incompatible with {e.image_name}: {e.details}")
except Exception as e:
    print(f"Deployment failed: {e}")

Next steps

Deploy A Container

Pull NIM images, mount custom weights, manage container lifecycle, and configure deployment options.

Run Inference

Process images and videos, configure sampling parameters, and work with structured outputs.

Configuration Reference

Complete parameter reference for NIMConfig and NIMSamplingParams.