NVIDIA NIM Deployment
Overview
Vi SDK provides tools for deploying and running inference with NVIDIA NIM (NVIDIA Inference Microservices) containers. NIM containers offer optimized GPU-accelerated inference for vision-language models.
Prerequisites
- Vi SDK installed with NIM deployment dependencies
- NVIDIA NGC API key for container registry access (create an NGC account and generate an API key)
- Docker installed with GPU support
- NVIDIA GPU with appropriate drivers (a quick sanity check is sketched below)
- Secret key for deploying custom weights (optional)
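Before deploying, it can help to confirm that Docker and the GPU driver are actually reachable. Below is a minimal sanity-check sketch, using the Docker SDK (installed with the deployment extras) and the standard nvidia-smi CLI:
```python
import shutil
import subprocess

import docker

# Confirm the Docker daemon is reachable
client = docker.from_env()
client.ping()
print("Docker daemon: OK")

# Confirm the NVIDIA driver is installed and at least one GPU is visible
if shutil.which("nvidia-smi") is None:
    raise RuntimeError("nvidia-smi not found; install NVIDIA drivers first")
subprocess.run(["nvidia-smi", "-L"], check=True)  # lists visible GPUs
```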
Key features:
- Deploy NIM containers with automatic image pulling and setup
- Run inference with Vision-Language Models
- Custom model weights from Datature Vi
- Video processing support with Cosmos-Reason2
- Sampling parameters and guided decoding
- Container lifecycle management with automatic readiness checks
Currently supported NIM images:
- cosmos-reason1-7b — 7B parameter model for visual reasoning
- cosmos-reason2-2b — 2B parameter model with video support
- cosmos-reason2-8b — 8B parameter model with video support
Task types:
- Visual question answering — Answer questions about images and videos
- Phrase grounding — Detect and locate objects with bounding boxes
- Freeform — Open-ended image and video analysis
More NIM images and task types will be supported in future releases.
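As a quick illustration, the task type is chosen when constructing a predictor. The sketch below reuses the NIMPredictor constructor and task type strings that appear in the examples later in this guide:
```python
from vi.deployment.nim import NIMPredictor

# Phrase grounding: detect and localize objects with bounding boxes
grounding = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="phrase-grounding",
    port=8000,
)

# Visual question answering over images and videos
vqa = NIMPredictor(
    model_name="cosmos-reason2-8b",
    task_type="vqa",
    port=8000,
)
```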
LoRA Adapter Limitation
Models trained on Vi with LoRA adapters will only use the full base model weights when deployed with NIM. NVIDIA NIM does not currently support PEFT adapters, so LoRA adapter weights are not utilized during inference.
Installation
Install the SDK with NIM deployment dependencies:
```bash
pip install vi-sdk[deployment]
```
This includes the Docker SDK, OpenAI client, and NIM-specific dependencies.
Quick start
Deploy NIM container
```python
from vi.deployment.nim import NIMDeployer, NIMConfig
# Create deployment config
config = NIMConfig(
nvidia_api_key="nvapi-...", # Your NGC API key
port=8000,
stream_logs=True
)
# Deploy container
deployer = NIMDeployer(config)
result = deployer.deploy()
print(f"Container running on port {result.port}")
print(f"Available models: {result.available_models}")Deploy with custom weights
Deploy a model trained on Datature Vi:
```python
from vi.deployment.nim import NIMDeployer, NIMConfig
# Configure with custom weights
config = NIMConfig(
nvidia_api_key="nvapi-...",
secret_key="YOUR_DATATURE_VI_SECRET_KEY",
organization_id="YOUR_DATATURE_VI_ORGANIZATION_ID",
run_id="YOUR_DATATURE_VI_RUN_ID",
port=8000
)
# Deploy with custom weights
deployer = NIMDeployer(config)
result = deployer.deploy()
```
Run inference
```python
from vi.deployment.nim import NIMPredictor, NIMSamplingParams
# Create predictor (task_type auto-inferred from metadata)
predictor = NIMPredictor(config=config)
# Or specify task_type explicitly
predictor = NIMPredictor(
task_type="phrase-grounding",
config=config
)
# Run inference
result = predictor(source="image.jpg", stream=False)
print(f"Caption: {result.caption}")
# With custom sampling parameters
params = NIMSamplingParams(
temperature=0.7,
max_tokens=512,
top_p=0.95
)
result = predictor(
source="image.jpg",
stream=False,
sampling_params=params
)
```
Stop container
```python
from vi.deployment.nim import NIMDeployer
# Stop container by name
NIMDeployer.stop("cosmos-reason2-2b")
```
Core concepts
NIM containers
NVIDIA NIM containers are pre-optimized Docker containers that provide:
- Optimized inference — GPU-accelerated inference with NVIDIA optimizations
- OpenAI-compatible API — Standard /v1/chat/completions endpoints compatible with the OpenAI API format (see the client sketch below)
- Model serving — Automatic model loading and health checking
- Container lifecycle — Start, stop, and monitor containers with Docker
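Because the endpoints are OpenAI-compatible, a deployed container can also be queried directly with the standard openai client. A minimal sketch, assuming a container listening on localhost port 8000; the model id and image URL here are illustrative, so list the served models first:
```python
from openai import OpenAI

# Point the standard OpenAI client at the local NIM container.
# The local container typically ignores the key, so a placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# Discover which model ids the container is serving
for model in client.models.list():
    print(model.id)

# Call the /v1/chat/completions endpoint with an image and a question
response = client.chat.completions.create(
    model="nvidia/cosmos-reason2-2b",  # illustrative id; use one printed above
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```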
Deployment workflow
- Pull NIM image — Download from NVIDIA Container Registry using your NGC API key
- Download weights (optional) — Fetch custom model weights from Datature Vi
- Start container — Launch with GPU support and volume mounts
- Health check — Wait for service readiness (a polling sketch follows this list)
- Run inference — Make predictions via OpenAI-compatible API
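The deployer performs this readiness check for you; if you are scripting around a container yourself, you can poll the health endpoint directly. A rough sketch, assuming the standard NIM /v1/health/ready route on port 8000:
```python
import time

import requests

def wait_until_ready(port: int = 8000, timeout: float = 600.0) -> bool:
    """Poll the NIM health endpoint until the service reports ready."""
    url = f"http://localhost:{port}/v1/health/ready"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # container is still starting up
        time.sleep(5)
    return False

if wait_until_ready():
    print("NIM service is ready")
```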
Custom weights
Deploy models trained on Datature Vi by providing your run_id:
- Automatically downloads model weights using Vi SDK
- Mounts weights into container
- NIM loads custom weights at startup
- Falls back to base model if weights incompatible
Note: LoRA adapters are not currently supported by NVIDIA NIM.
Documentation
- Deploy container — Deploy NIM containers with automatic setup, custom weights, and lifecycle management
- Run inference — Run predictions on images and videos with streaming and sampling control
- Configuration — Complete reference for NIMConfig and NIMSamplingParams
- Troubleshooting — Common issues and solutions for NIM deployment
Environment variables
Set these environment variables for easier configuration:
| Variable | Description | Required |
|---|---|---|
| NGC_API_KEY | NVIDIA NGC API key for container registry access | Yes |
| DATATURE_VI_SECRET_KEY | Vi SDK secret key for custom weights | No |
| DATATURE_VI_ORGANIZATION_ID | Vi organization ID for custom weights | No |
Example:
```bash
export NGC_API_KEY="nvapi-..."
export DATATURE_VI_SECRET_KEY="your-secret-key"
export DATATURE_VI_ORGANIZATION_ID="your-org-id"
```
Then initialize without explicit credentials:
```python
from vi.deployment.nim import NIMDeployer, NIMConfig
# Credentials loaded from environment
config = NIMConfig(run_id="YOUR_DATATURE_VI_RUN_ID")
deployer = NIMDeployer(config)
result = deployer.deploy()
```
Common workflows
Deploy and predict
Complete workflow from deployment to inference:
```python
from vi.deployment.nim import NIMDeployer, NIMPredictor, NIMConfig
# Step 1: Deploy container
config = NIMConfig(
nvidia_api_key="nvapi-...",
run_id="YOUR_DATATURE_VI_RUN_ID",
port=8000
)
deployer = NIMDeployer(config)
result = deployer.deploy()
# Step 2: Create predictor
predictor = NIMPredictor(config=config)
# Step 3: Run inference
result = predictor(source="image.jpg", stream=False)
print(result.caption)
# Step 4: Stop container when done
NIMDeployer.stop(result.container_name)
```
Batch processing
Process multiple images with a deployed container:
```python
from vi.deployment.nim import NIMPredictor
from pathlib import Path
# Create predictor
predictor = NIMPredictor(
model_name="cosmos-reason2-2b",
task_type="vqa",
port=8000
)
# Process all images in directory
image_dir = Path("./images")
results = []
for image_path in image_dir.glob("*.jpg"):
    result = predictor(
        source=str(image_path),
        user_prompt="What is in this image?",
        stream=False
    )
    results.append({
        "image": image_path.name,
        "result": result.result
    })
print(f"Processed {len(results)} images")Video analysis
Analyze videos with Cosmos-Reason2 models:
```python
from vi.deployment.nim import NIMPredictor, NIMSamplingParams
# Create predictor for video
predictor = NIMPredictor(
model_name="cosmos-reason2-2b",
task_type="vqa",
port=8000
)
# Configure video sampling
video_params = NIMSamplingParams(
temperature=0.2,
max_tokens=4096,
media_io_kwargs={"fps": 2.0}, # Sample at 2 FPS
mm_processor_kwargs={
"shortest_edge": 336,
"longest_edge": 672
}
)
# Analyze video
result = predictor(
source="video.mp4",
user_prompt="Describe what happens in this video",
stream=False,
sampling_params=video_params
)
print(f"Video analysis: {result.result}")Performance tips
Container configuration
Optimize container resource allocation:
```python
config = NIMConfig(
nvidia_api_key="nvapi-...",
port=8000,
shm_size="32GB", # Shared memory for large models
max_model_len=8192, # Maximum sequence length
local_cache_dir="~/.cache/nim" # Cache directory
)
```
Sampling optimization
Balance speed and quality with sampling parameters:
```python
# Fast inference
fast_params = NIMSamplingParams(
temperature=0.2, # Lower temperature for faster, deterministic output
max_tokens=512, # Limit output length
top_p=0.9
)
# High quality inference
quality_params = NIMSamplingParams(
temperature=0.7, # Higher temperature for diverse output
max_tokens=2048, # Longer output
top_p=0.95,
top_k=50
)
```
Reuse containers
Reuse existing containers to save startup time:
```python
# First deployment - creates container
config = NIMConfig(
nvidia_api_key="nvapi-...",
use_existing_container=True # Reuse if exists
)
deployer = NIMDeployer(config)
result1 = deployer.deploy() # Creates container
# Second deployment - reuses container
result2 = deployer.deploy()  # Instant (reuses existing)
```
Best practices
Security
- Store the NGC API key in environment variables, not in code (see the sketch after this list)
- Use separate service accounts for production deployments
- Restrict Docker socket access to authorized users
- Keep NIM images updated with latest security patches
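For example, rather than hard-coding the key, read it from the environment at deploy time. A small sketch (as shown earlier, NIMConfig can also load NGC_API_KEY from the environment automatically):
```python
import os

from vi.deployment.nim import NIMConfig, NIMDeployer

# Fail fast with a clear error if the key is missing
api_key = os.environ.get("NGC_API_KEY")
if not api_key:
    raise RuntimeError("NGC_API_KEY is not set; export it before deploying")

config = NIMConfig(nvidia_api_key=api_key, port=8000)
deployer = NIMDeployer(config)
result = deployer.deploy()
```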
Resource management
- Use an appropriate shm_size for your model size (default: 32GB)
- Monitor GPU memory usage with nvidia-smi (see the sketch after this list)
- Stop containers when not in use to free resources
- Use container resource limits in production
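To check GPU memory from a script rather than an interactive shell, you can shell out to nvidia-smi in its CSV query mode. A minimal sketch:
```python
import subprocess

# Query per-GPU memory usage in machine-readable CSV form
output = subprocess.run(
    [
        "nvidia-smi",
        "--query-gpu=index,name,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ],
    capture_output=True,
    text=True,
    check=True,
).stdout

for line in output.strip().splitlines():
    index, name, used, total = [field.strip() for field in line.split(",")]
    print(f"GPU {index} ({name}): {used} MiB / {total} MiB used")
```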
Error handling
```python
from vi.deployment.nim import NIMDeployer, NIMConfig
from vi.deployment.nim.exceptions import (
    ContainerExistsError,
    InvalidConfigError,
    ModelIncompatibilityError,
)
try:
    config = NIMConfig(nvidia_api_key="nvapi-...")
    deployer = NIMDeployer(config)
    result = deployer.deploy()
except InvalidConfigError as e:
    print(f"Invalid configuration: {e}")
except ContainerExistsError as e:
    print(f"Container already exists: {e.container_name}")
    # Reuse the existing container, or stop it and redeploy
except ModelIncompatibilityError as e:
    print(f"Model incompatible with {e.image_name}: {e.details}")
except Exception as e:
    print(f"Deployment failed: {e}")
```
Related resources
- Vi SDK getting started — Quick start guide for the Vi SDK
- Deploy container — Deploy NIM containers with custom weights
- Run inference — Execute predictions on images and videos
- Configuration — Complete configuration reference
- Troubleshooting — Common problems and solutions
- Vi SDK inference — Alternative inference with ViModel
- Task types — VQA and phrase grounding explained
- API resources — Complete SDK reference documentation
Need help?
We're here to support your VLMOps journey. Reach out to the Datature team if you have questions or run into issues.