NVIDIA NIM Deployment

Datature Vi includes built-in tools for deploying and running inference with NVIDIA NIM (NVIDIA Inference Microservices) containers. NIM containers are pre-built, GPU-accelerated Docker images that expose an OpenAI-compatible API, so you can move from a trained model to a production-ready inference endpoint without building custom serving infrastructure.

Key concepts

Docker container: A self-contained package that includes the model, runtime environment, and serving infrastructure. You start it with a single command, and the container handles model loading and serving on its own.

NVIDIA NIM: Pre-built containers optimized for GPU-accelerated model serving. NIM handles request batching, GPU memory management, and API routing.

OpenAI-compatible API: NIM uses the same API format as OpenAI's chat completion endpoint (/v1/chat/completions). Existing code that calls OpenAI can point at your NIM container with minimal changes.
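
As a quick illustration, any OpenAI client can target a running NIM container. Here is a minimal sketch using the official openai Python package; the localhost URL, port, and model name are assumptions based on the defaults and supported images shown later on this page:

from openai import OpenAI

# Point the standard OpenAI client at a local NIM container.
# NIM ignores the API key value, but the client requires one.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="cosmos-reason2-2b",  # one of the supported images listed below
    messages=[{"role": "user", "content": "Hello from a NIM container"}],
)
print(response.choices[0].message.content)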

NGC (NVIDIA GPU Cloud): NVIDIA's registry for GPU-optimized container images. You need an NGC API key to pull NIM containers.

Before You Start

What NIM gives you

NIM handles the infrastructure work: pulling the container image, loading model weights, starting the server, and confirming readiness before handing control back to your code. The service exposes standard /v1/chat/completions endpoints, so it works with any OpenAI-compatible client.
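
If you want to verify readiness yourself, NIM containers expose a standard health endpoint. A small sketch assuming the requests package is available; the /v1/health/ready path follows NVIDIA's NIM convention and the port assumes the default used throughout this page:

import requests

# Poll the NIM readiness endpoint on the default port.
resp = requests.get("http://localhost:8000/v1/health/ready", timeout=5)
print("Service ready" if resp.ok else f"Not ready yet: HTTP {resp.status_code}")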

Supported NIM images:

| Image | Parameters | Video support |
| --- | --- | --- |
| `cosmos-reason1-7b` | 7B | No |
| `cosmos-reason2-2b` | 2B | Yes |
| `cosmos-reason2-8b` | 8B | Yes |

Supported task types:

- Visual Question Answering (VQA): answer questions about images and videos.
- Phrase Grounding: locate objects with bounding boxes.
- Freeform Text: open-ended image and video analysis.
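
Switching between these tasks only changes how the predictor is constructed. Here is a sketch for phrase grounding; note that "vqa" is the only task_type value shown in this guide's examples, so the identifier below is hypothetical (see the Configuration Reference for the real list):

from vi.deployment.nim import NIMPredictor

predictor = NIMPredictor(
    model_name="cosmos-reason2-8b",
    task_type="phrase_grounding",  # hypothetical identifier; confirm in the Configuration Reference
    port=8000,
)
result = predictor(
    source="image.jpg",
    user_prompt="Locate every forklift in the image",
    stream=False,
)
print(result.result)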

Model and LoRA Support

Architecture support: NVIDIA NIM currently provides container images for Cosmos-Reason1 and Cosmos-Reason2 only. Other architectures (Qwen3.5, Qwen3-VL, Qwen2.5-VL, InternVL3.5) are not available as NIM containers. This is an NVIDIA NIM limitation, not a Datature Vi restriction.

LoRA adapters: Models trained with LoRA deploy on NIM with the full base model weights only. NVIDIA NIM does not support PEFT adapters, so LoRA adapter weights are not applied during NIM inference. To serve LoRA-trained models with their adapter weights applied, use the Vi SDK for local inference, or download your weights in HuggingFace format and serve them with a framework like vLLM.
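
For illustration, a minimal vLLM sketch for applying adapter weights locally. Assumptions: the weights were downloaded in HuggingFace format to the paths shown, a text-only prompt is used for brevity, and Qwen2.5-VL is picked purely as an example from the architecture list above; LoRA support for a given multimodal architecture depends on your vLLM version:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model with LoRA support enabled.
llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct", enable_lora=True)

# Apply the downloaded adapter at request time.
adapter = LoRARequest("vi-adapter", 1, "/path/to/downloaded/adapter")
outputs = llm.generate(
    "Describe a busy warehouse floor.",
    SamplingParams(max_tokens=256),
    lora_request=adapter,
)
print(outputs[0].outputs[0].text)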

Coming soon: Datature Vi is developing a built-in vLLM-based local deployment server that will support all architectures (including Qwen3.5 and Qwen3-VL) with full LoRA adapter support. This will provide an OpenAI-compatible serving endpoint without the NIM container limitations.

Installation

Install the SDK with deployment dependencies:

pip install "vi-sdk[deployment]"

This pulls in the Docker SDK, OpenAI client library, and NIM-specific utilities.

Quick start

The core workflow has three steps: configure, deploy, predict. A final cleanup step stops the container when you're done.

from vi.deployment.nim import NIMDeployer, NIMPredictor, NIMConfig

# 1. Configure
config = NIMConfig(
    nvidia_api_key="nvapi-...",   # Your NGC API key
    run_id="YOUR_DATATURE_VI_RUN_ID",  # Omit to use base model weights
    port=8000
)

# 2. Deploy
deployer = NIMDeployer(config)
result = deployer.deploy()
print(f"Container running on port {result.port}")
print(f"Available models: {result.available_models}")

# 3. Predict
predictor = NIMPredictor(config=config)
output = predictor(source="image.jpg", stream=False)
print(output.caption)

# 4. Stop when done
NIMDeployer.stop(result.container_name)
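
These examples all pass stream=False. When streaming is enabled instead, a plausible consumption sketch looks like the following, assuming the call then yields incremental text chunks (the exact chunk type is not specified on this page):

# Hypothetical streaming loop: reuses the predictor from the quick start
# and assumes stream=True turns the call into a generator of text chunks.
for chunk in predictor(source="image.jpg", stream=True):
    print(chunk, end="", flush=True)
print()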

Deployment workflow

The NIMDeployer.deploy() call runs through these steps automatically:

  1. Pull NIM image: downloads from NVIDIA Container Registry using your NGC API key
  2. Download weights (optional): fetches custom model weights from Vi when run_id is provided
  3. Start container: launches Docker container with GPU access and volume mounts
  4. Health check: waits up to 10 minutes for the service to become ready
  5. Return result: hands back a NIMDeploymentResult with the container ID, name, port, and available model list
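
In code, step 5's fields map onto attributes of the returned object. Three of the names below appear in the quick start example; container_id is an assumed attribute name based on the "container ID" wording above:

result = deployer.deploy()

print(result.container_id)       # assumed attribute name
print(result.container_name)
print(result.port)
print(result.available_models)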

Environment variables

Store credentials in environment variables rather than in code:

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| `NGC_API_KEY` | string | NVIDIA NGC API key for container registry access | Yes |
| `DATATURE_VI_SECRET_KEY` | string | Vi SDK secret key for custom weight downloads | No |
| `DATATURE_VI_ORGANIZATION_ID` | string | Vi organization ID for custom weight downloads | No |

export NGC_API_KEY="nvapi-..."
export DATATURE_VI_SECRET_KEY="your-secret-key"
export DATATURE_VI_ORGANIZATION_ID="your-org-id"

from vi.deployment.nim import NIMDeployer, NIMConfig

# Credentials loaded from environment
config = NIMConfig(run_id="YOUR_DATATURE_VI_RUN_ID")
deployer = NIMDeployer(config)
result = deployer.deploy()

Common patterns

Batch image processing

from vi.deployment.nim import NIMPredictor
from pathlib import Path

# Assumes a NIM container is already deployed and listening on port 8000
predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000
)

image_dir = Path("./images")
results = []

for image_path in image_dir.glob("*.jpg"):
    result = predictor(
        source=str(image_path),
        user_prompt="What is in this image?",
        stream=False
    )
    results.append({"image": image_path.name, "result": result.result})

print(f"Processed {len(results)} images")

Video analysis

Cosmos-Reason2 models accept video files directly. In the sampling parameters below, media_io_kwargs controls frame sampling (here two frames per second) and mm_processor_kwargs sets the resize bounds for each frame:

from vi.deployment.nim import NIMPredictor, NIMSamplingParams

predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000
)

video_params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},
    mm_processor_kwargs={"shortest_edge": 336, "longest_edge": 672}
)

result = predictor(
    source="video.mp4",
    user_prompt="Describe what happens in this video",
    stream=False,
    sampling_params=video_params
)
print(result.result)

Error handling

from vi.deployment.nim import NIMDeployer, NIMConfig
from vi.deployment.nim.exceptions import (
    ContainerExistsError,
    InvalidConfigError,
    ModelIncompatibilityError
)

try:
    config = NIMConfig(nvidia_api_key="nvapi-...")
    deployer = NIMDeployer(config)
    result = deployer.deploy()
except InvalidConfigError as e:
    print(f"Invalid configuration: {e}")
except ContainerExistsError as e:
    print(f"Container already exists: {e.container_name}")
    # Reuse with use_existing_container=True, or remove with auto_kill_existing_container=True
except ModelIncompatibilityError as e:
    print(f"Model incompatible with {e.image_name}: {e.details}")
except Exception as e:
    print(f"Deployment failed: {e}")

Next steps

Deploy A Container

Pull NIM images, mount custom weights, manage container lifecycle, and configure deployment options.

Run Inference

Process images and videos, configure sampling parameters, and work with structured outputs.

Configuration Reference

Complete parameter reference for NIMConfig and NIMSamplingParams.