NVIDIA NIM Deployment
Datature Vi includes built-in tools for deploying and running inference with NVIDIA NIM (NVIDIA Inference Microservice) containers. NIM containers are pre-built, GPU-accelerated Docker images that expose an OpenAI-compatible API, so you can move from a trained model to a production-ready inference endpoint without building custom serving infrastructure.
Prerequisites
- Vi SDK installed with deployment extras: `pip install vi-sdk[deployment]`
- NVIDIA NGC API key: create an NGC account and generate a key
- Docker installed with GPU support (NVIDIA Container Toolkit); see the sanity check after this list
- NVIDIA GPU with compatible drivers
- Secret key if you want to deploy custom weights from Vi
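Before deploying, it can help to confirm that Docker can actually reach the GPU. Below is a minimal sketch using the Docker SDK (installed with the deployment extras); the CUDA image tag is illustrative.

```python
# Sanity check (sketch): run nvidia-smi in a throwaway container to confirm
# Docker's GPU passthrough works. The CUDA image tag is illustrative.
import docker

client = docker.from_env()
output = client.containers.run(
    "nvidia/cuda:12.4.1-base-ubuntu22.04",
    "nvidia-smi",
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    remove=True,
)
print(output.decode())
```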
What NIM gives you
NIM handles the infrastructure work: pulling the container image, loading model weights, starting the server, and confirming readiness before handing control back to your code. The service exposes standard /v1/chat/completions endpoints, so it works with any OpenAI-compatible client.
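Because the endpoint is OpenAI-compatible, you can query a running container with the stock OpenAI Python client. A minimal sketch; the base URL, API key placeholder, and model name are illustrative and should match your deployment.

```python
# Minimal sketch: query a local NIM endpoint through the OpenAI client.
# Base URL, api_key placeholder, and model name are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="cosmos-reason2-2b",
    messages=[{"role": "user", "content": "Describe this scene."}],
)
print(response.choices[0].message.content)
```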
Supported NIM images: the Cosmos-Reason1 and Cosmos-Reason2 container images (see the architecture note below).
Supported task types: vision-language tasks such as VQA and captioning, as used in the examples on this page.
Architecture support: NVIDIA NIM currently provides container images for Cosmos-Reason1 and Cosmos-Reason2 only. Other architectures (Qwen3.5, Qwen3-VL, Qwen2.5-VL, InternVL3.5) are not available as NIM containers. This is an NVIDIA NIM limitation, not a Datature Vi restriction.
LoRA adapters: Models trained with LoRA deploy on NIM with full base model weights only. NVIDIA NIM does not support PEFT adapters, so LoRA adapter weights are not applied during NIM inference. To serve LoRA-trained models from other architectures with adapter weights applied, use the Vi SDK for local inference or download your weights in HuggingFace format and serve them with a framework like vLLM.
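As a rough illustration of the vLLM route, the sketch below applies a LoRA adapter through vLLM's offline Python API. The base model name and adapter path are placeholders, and vision-language inputs need additional multimodal plumbing not shown here.

```python
# Sketch only: applying a LoRA adapter with vLLM's offline API.
# Model name and adapter path are placeholders; VLM inputs need extra plumbing.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(model="path/or/hub-id-of-base-model", enable_lora=True)
outputs = llm.generate(
    ["Describe the scene."],
    SamplingParams(max_tokens=128),
    lora_request=LoRARequest("my_adapter", 1, "path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```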
Coming soon: Datature Vi is developing a built-in vLLM-based local deployment server that will support all architectures (including Qwen3.5 and Qwen3-VL) with full LoRA adapter support. This will provide an OpenAI-compatible serving endpoint without the NIM container limitations.
Installation
Install the SDK with deployment dependencies:

```bash
pip install vi-sdk[deployment]
```

This pulls in the Docker SDK, the OpenAI client library, and NIM-specific utilities.
Quick start
The workflow has three steps: configure, deploy, predict. Stop the container when you're done.
```python
from vi.deployment.nim import NIMDeployer, NIMPredictor, NIMConfig

# 1. Configure
config = NIMConfig(
    nvidia_api_key="nvapi-...",        # Your NGC API key
    run_id="YOUR_DATATURE_VI_RUN_ID",  # Omit to use base model weights
    port=8000,
)

# 2. Deploy
deployer = NIMDeployer(config)
result = deployer.deploy()
print(f"Container running on port {result.port}")
print(f"Available models: {result.available_models}")

# 3. Predict
predictor = NIMPredictor(config=config)
output = predictor(source="image.jpg", stream=False)
print(output.caption)

# 4. Stop when done
NIMDeployer.stop(result.container_name)
```
Deployment workflow

The NIMDeployer.deploy() call runs through these steps automatically:
- Pull NIM image: downloads from the NVIDIA Container Registry using your NGC API key
- Download weights (optional): fetches custom model weights from Vi when run_id is provided
- Start container: launches the Docker container with GPU access and volume mounts
- Health check: waits up to 10 minutes for the service to become ready (see the polling sketch below)
- Return result: hands back a NIMDeploymentResult with the container ID, name, port, and available model list
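For reference, here is a sketch of the kind of readiness polling deploy() performs internally. The /v1/health/ready path follows NVIDIA's NIM convention; the timing values are illustrative.

```python
# Sketch: poll the NIM readiness endpoint until the service is up.
# The /v1/health/ready path follows NVIDIA's NIM convention.
import time
import requests

def wait_until_ready(port: int = 8000, timeout_s: int = 600) -> bool:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(f"http://localhost:{port}/v1/health/ready", timeout=5).ok:
                return True
        except requests.ConnectionError:
            pass  # Server not accepting connections yet
        time.sleep(10)
    return False
```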
Environment variables
Store credentials in environment variables rather than in code:
```bash
export NGC_API_KEY="nvapi-..."
export DATATURE_VI_SECRET_KEY="your-secret-key"
export DATATURE_VI_ORGANIZATION_ID="your-org-id"
```

```python
from vi.deployment.nim import NIMDeployer, NIMConfig

# Credentials loaded from environment
config = NIMConfig(run_id="YOUR_DATATURE_VI_RUN_ID")
deployer = NIMDeployer(config)
result = deployer.deploy()
```
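If you prefer to read the variables explicitly rather than rely on automatic loading, the equivalent sketch below passes them into NIMConfig by hand (nvidia_api_key is the same parameter used in the quick start).

```python
# Equivalent sketch: read the same environment variables yourself.
import os

from vi.deployment.nim import NIMConfig, NIMDeployer

config = NIMConfig(
    nvidia_api_key=os.environ["NGC_API_KEY"],
    run_id="YOUR_DATATURE_VI_RUN_ID",
)
result = NIMDeployer(config).deploy()
```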
Common patterns

Batch image processing
```python
from pathlib import Path

from vi.deployment.nim import NIMPredictor

predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000,
)

image_dir = Path("./images")
results = []
for image_path in image_dir.glob("*.jpg"):
    result = predictor(
        source=str(image_path),
        user_prompt="What is in this image?",
        stream=False,
    )
    results.append({"image": image_path.name, "result": result.result})

print(f"Processed {len(results)} images")
```
Video analysis

Cosmos-Reason2 models accept video files directly:
```python
from vi.deployment.nim import NIMPredictor, NIMSamplingParams

predictor = NIMPredictor(
    model_name="cosmos-reason2-2b",
    task_type="vqa",
    port=8000,
)

video_params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},
    mm_processor_kwargs={"shortest_edge": 336, "longest_edge": 672},
)

result = predictor(
    source="video.mp4",
    user_prompt="Describe what happens in this video",
    stream=False,
    sampling_params=video_params,
)
print(result.result)
```
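As a rough guide to what the fps setting implies, the arithmetic below shows how many frames a clip contributes at a given sampling rate (the clip length is illustrative).

```python
# Illustrative arithmetic: frames sampled from a clip at a given fps setting.
fps = 2.0          # matches media_io_kwargs={"fps": 2.0} above
clip_seconds = 30  # illustrative clip length
frames = int(fps * clip_seconds)
print(f"{frames} frames sent to the model for a {clip_seconds}s clip")  # 60
```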
Error handling

```python
from vi.deployment.nim import NIMDeployer, NIMConfig
from vi.deployment.nim.exceptions import (
    ContainerExistsError,
    InvalidConfigError,
    ModelIncompatibilityError,
)

try:
    config = NIMConfig(nvidia_api_key="nvapi-...")
    deployer = NIMDeployer(config)
    result = deployer.deploy()
except InvalidConfigError as e:
    print(f"Invalid configuration: {e}")
except ContainerExistsError as e:
    print(f"Container already exists: {e.container_name}")
    # Reuse with use_existing_container=True, or remove with auto_kill_existing_container=True
except ModelIncompatibilityError as e:
    print(f"Model incompatible with {e.image_name}: {e.details}")
except Exception as e:
    print(f"Deployment failed: {e}")
```
Next steps

Deploy A Container
Pull NIM images, mount custom weights, manage container lifecycle, and configure deployment options.
Run Inference
Process images and videos, configure sampling parameters, and work with structured outputs.
Configuration Reference
Complete parameter reference for NIMConfig and NIMSamplingParams.
