NVIDIA NIM Deployment

Overview

The Vi SDK provides tools for deploying NVIDIA NIM (NVIDIA Inference Microservices) containers and running inference against them. NIM containers offer optimized, GPU-accelerated inference for vision-language models.

📋

Prerequisites

Get started with Vi SDK →

Key features:

  • Deploy NIM containers with automatic setup, custom weights, and lifecycle management
  • Run predictions on images and videos with streaming and sampling control
  • OpenAI-compatible API for inference

📘

Currently supported

NIM Images:

  • cosmos-reason1-7b — 7B parameter model for visual reasoning
  • cosmos-reason2-2b — 2B parameter model with video support
  • cosmos-reason2-8b — 8B parameter model with video support

Task types:

  • vqa — visual question answering
  • phrase-grounding — localizing phrases as regions in an image

More NIM images and task types will be supported in future releases.

Learn about task types →

🚧

LoRA Adapter Limitation

Models trained on Vi with LoRA adapters will use only the base model weights when deployed with NIM. NVIDIA NIM does not currently support PEFT adapters, so LoRA adapter weights are ignored during inference.


Installation

Install the SDK with NIM deployment dependencies:

pip install "vi-sdk[deployment]"

This includes the Docker SDK, the OpenAI client, and NIM-specific dependencies.
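To verify the install, a quick import check is enough (a minimal sketch; it assumes the extras pull in the docker and openai packages, per the note above):

import docker             # Docker SDK, used to manage NIM containers
import openai             # OpenAI client, used for the OpenAI-compatible API
import vi.deployment.nim  # NIM deployment tooling

print("Deployment dependencies available")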

Complete installation guide →


Quick start

Deploy NIM container

from vi.deployment.nim import NIMDeployer, NIMConfig

# Create deployment config
config = NIMConfig(
    nvidia_api_key="nvapi-...",  # Your NGC API key
    port=8000,
    stream_logs=True
)

# Deploy container
deployer = NIMDeployer(config)
result = deployer.deploy()

print(f"Container running on port {result.port}")
print(f"Available models: {result.available_models}")

Deploy with custom weights

Deploy a model trained on Datature Vi:

from vi.deployment.nim import NIMDeployer, NIMConfig

# Configure with custom weights
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="YOUR_DATATURE_VI_SECRET_KEY",
    organization_id="YOUR_DATATURE_VI_ORGANIZATION_ID",
    run_id="YOUR_DATATURE_VI_RUN_ID",
    port=8000
)

# Deploy with custom weights
deployer = NIMDeployer(config)
result = deployer.deploy()

Run inference

from vi.deployment.nim import NIMPredictor, NIMSamplingParams

# Create predictor (task_type auto-inferred from metadata)
predictor = NIMPredictor(config=config)

# Or specify task_type explicitly
predictor = NIMPredictor(
    task_type="phrase-grounding",
    config=config
)

# Run inference
result = predictor(source="image.jpg", stream=False)
print(f"Caption: {result.caption}")

# With custom sampling parameters
params = NIMSamplingParams(
    temperature=0.7,
    max_tokens=512,
    top_p=0.95
)
result = predictor(
    source="image.jpg",
    stream=False,
    sampling_params=params
)

Stop container

from vi.deployment.nim import NIMDeployer

# Stop container by name
NIMDeployer.stop("cosmos-reason2-2b")
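To see which NIM containers are currently running, the Docker SDK installed with the deployment extras can list them. A minimal sketch, assuming container names contain the NIM image name as in the examples above:

import docker

client = docker.from_env()
for container in client.containers.list():
    if "cosmos-reason" in container.name:
        print(container.name, container.status)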

Core concepts

NIM containers

NVIDIA NIM containers are pre-optimized Docker containers that provide:

  • GPU-accelerated inference tuned for vision-language models
  • An OpenAI-compatible API for predictions
  • Packaged model weights, with optional custom weights from Datature Vi

Deployment workflow

  1. Pull NIM image — Download from NVIDIA Container Registry using your NGC API key
  2. Download weights (optional) — Fetch custom model weights from Datature Vi
  3. Start container — Launch with GPU support and volume mounts
  4. Health check — Wait for service readiness
  5. Run inference — Make predictions via the OpenAI-compatible API (see the sketch below)
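Because the container exposes an OpenAI-compatible API (step 5), you can also reach it directly with the standard OpenAI client rather than through NIMPredictor. A minimal sketch, assuming the usual NIM endpoint layout of http://localhost:<port>/v1; listing the served models doubles as a simple readiness check:

from openai import OpenAI

# Point the standard OpenAI client at the local NIM endpoint
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
models = client.models.list()  # succeeds once the container is ready
print([m.id for m in models.data])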

Custom weights

Deploy models trained on Datature Vi by providing your run_id, as shown in the Quick start above.

Note: LoRA adapters are not currently supported by NVIDIA NIM; only the base model weights are used.


Documentation

Deploy container

Deploy NIM containers with automatic setup, custom weights, and lifecycle management

Run inference

Run predictions on images and videos with streaming and sampling control


Environment variables

Set these environment variables for easier configuration:

Variable                      Description                                                  Required
NGC_API_KEY                   NVIDIA NGC API key for container registry (Get API key →)   Yes
DATATURE_VI_SECRET_KEY        Vi SDK secret key for custom weights                         No
DATATURE_VI_ORGANIZATION_ID   Vi organization ID for custom weights                        No

Example:

export NGC_API_KEY="nvapi-..."
export DATATURE_VI_SECRET_KEY="your-secret-key"
export DATATURE_VI_ORGANIZATION_ID="your-org-id"

Then initialize without explicit credentials:

from vi.deployment.nim import NIMDeployer, NIMConfig

# Credentials loaded from environment
config = NIMConfig(run_id="YOUR_DATATURE_VI_RUN_ID")
deployer = NIMDeployer(config)
result = deployer.deploy()

Common workflows

Deploy and predict

Complete workflow from deployment to inference:

from vi.deployment.nim import NIMDeployer, NIMPredictor, NIMConfig

# Step 1: Deploy container
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    run_id="YOUR_DATATURE_VI_RUN_ID",
    port=8000
)

deployer = NIMDeployer(config)
deployment = deployer.deploy()

# Step 2: Create predictor
predictor = NIMPredictor(config=config)

# Step 3: Run inference
prediction = predictor(source="image.jpg", stream=False)
print(prediction.caption)

# Step 4: Stop container when done (use the deployment result, not the prediction)
NIMDeployer.stop(deployment.container_name)

Batch processing

Process multiple images with a deployed container:

from vi.deployment.nim import NIMPredictor
from pathlib import Path

# Create predictor
predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000
)

# Process all images in directory
image_dir = Path("./images")
results = []

for image_path in image_dir.glob("*.jpg"):
    result = predictor(
        source=str(image_path),
        user_prompt="What is in this image?",
        stream=False
    )
    results.append({
        "image": image_path.name,
        "result": result.result
    })

print(f"Processed {len(results)} images")

Video analysis

Analyze videos with Cosmos-Reason2 models:

from vi.deployment.nim import NIMPredictor, NIMSamplingParams

# Create predictor for video
predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000
)

# Configure video sampling
video_params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},  # Sample at 2 FPS
    mm_processor_kwargs={
        "shortest_edge": 336,
        "longest_edge": 672
    }
)

# Analyze video
result = predictor(
    source="video.mp4",
    user_prompt="Describe what happens in this video",
    stream=False,
    sampling_params=video_params
)

print(f"Video analysis: {result.result}")

Performance tips

Container configuration

Optimize container resource allocation:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    port=8000,
    shm_size="32GB",  # Shared memory for large models
    max_model_len=8192,  # Maximum sequence length
    local_cache_dir="~/.cache/nim"  # Cache directory
)
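When choosing shm_size and max_model_len, it helps to know how much GPU memory is free. A small sketch that shells out to nvidia-smi (available wherever the NVIDIA driver is installed, which NIM requires anyway):

import subprocess

# Report total and free memory for each visible GPU
out = subprocess.run(
    ["nvidia-smi", "--query-gpu=name,memory.total,memory.free", "--format=csv"],
    capture_output=True,
    text=True,
    check=True,
)
print(out.stdout)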

Sampling optimization

Balance speed and quality with sampling parameters:

# Fast inference
fast_params = NIMSamplingParams(
    temperature=0.2,  # Lower temperature for more deterministic output
    max_tokens=512,   # Shorter outputs finish faster
    top_p=0.9
)

# High quality inference
quality_params = NIMSamplingParams(
    temperature=0.7,  # Higher temperature for diverse output
    max_tokens=2048,  # Longer output
    top_p=0.95,
    top_k=50
)
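Either parameter set is passed through sampling_params, exactly as in the inference examples above:

result = predictor(
    source="image.jpg",
    stream=False,
    sampling_params=fast_params  # or quality_params
)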

Reuse containers

Reuse existing containers to save startup time:

# First deployment - creates container
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=True  # Reuse if exists
)

deployer = NIMDeployer(config)
result1 = deployer.deploy()  # Creates container

# Second deployment - reuses container
result2 = deployer.deploy()  # Instant (reuses existing)

Best practices

Security

  • Load NGC_API_KEY and Datature Vi credentials from environment variables rather than hardcoding them in source code
  • Keep API keys out of version control

Resource management

  • Stop containers with NIMDeployer.stop() when you are done to free GPU memory (see the sketch below)
  • Set use_existing_container=True to reuse a running container instead of starting a new one
  • Increase shm_size when serving large models
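One way to guarantee cleanup is a try/finally around the inference work. A minimal sketch using only the API shown in this guide:

from vi.deployment.nim import NIMConfig, NIMDeployer, NIMPredictor

config = NIMConfig(nvidia_api_key="nvapi-...", port=8000)
deployment = NIMDeployer(config).deploy()
try:
    predictor = NIMPredictor(config=config)
    print(predictor(source="image.jpg", stream=False).caption)
finally:
    # Release the GPU-backed container even if inference raises
    NIMDeployer.stop(deployment.container_name)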

Error handling

Catch the NIM-specific exceptions to handle common failure modes:

from vi.deployment.nim import NIMDeployer, NIMConfig
from vi.deployment.nim.exceptions import (
    ContainerExistsError,
    InvalidConfigError,
    ModelIncompatibilityError
)

try:
    config = NIMConfig(nvidia_api_key="nvapi-...")
    deployer = NIMDeployer(config)
    result = deployer.deploy()
except InvalidConfigError as e:
    print(f"Invalid configuration: {e}")
except ContainerExistsError as e:
    print(f"Container already exists: {e.container_name}")
    # Reuse existing or stop and redeploy
except ModelIncompatibilityError as e:
    print(f"Model incompatible with {e.image_name}: {e.details}")
except Exception as e:
    print(f"Deployment failed: {e}")

Related resources