NIM Configuration
Complete reference for NIM deployment and inference configuration.
Overview
The Vi SDK provides two main configuration classes for NIM:
- NIMConfig — Deployment and container configuration
- NIMSamplingParams — Inference sampling parameters
NIMConfig
Configuration for NIM container deployment.
Class definition
from vi.deployment.nim import NIMConfig
config = NIMConfig(
nvidia_api_key="nvapi-...",
image_name="cosmos-reason2-2b",
port=8000,
# ... additional options
)
Required parameters
nvidia_api_key
Type: str
NVIDIA NGC API key for container registry authentication.
config = NIMConfig(nvidia_api_key="nvapi-...")
Environment variable: NGC_API_KEY
export NGC_API_KEY="nvapi-..."
# Loads from environment
config = NIMConfig()
Format: Must start with nvapi-
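Since a malformed key only surfaces later as a registry authentication failure, a small pre-flight check can fail fast. The helper below is illustrative, not part of the Vi SDK:

```python
# Hypothetical pre-flight check (not part of the Vi SDK): catch a
# malformed key before attempting to pull the container image.
import os


def looks_like_ngc_key(key: str) -> bool:
    """Return True if the key matches the expected nvapi- prefix format."""
    return key.startswith("nvapi-")


api_key = os.getenv("NGC_API_KEY", "")
if not looks_like_ngc_key(api_key):
    print("warning: NGC_API_KEY is missing or does not start with 'nvapi-'")
```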
Image selection
image_name
Type: str
Default: "cosmos-reason2-2b"
Name of the NIM image to pull (without registry prefix).
# Cosmos Reason1 7B
config = NIMConfig(
nvidia_api_key="nvapi-...",
image_name="cosmos-reason1-7b"
)
# Cosmos Reason2 2B (default)
config = NIMConfig(
nvidia_api_key="nvapi-...",
image_name="cosmos-reason2-2b"
)
# Cosmos Reason2 8B
config = NIMConfig(
nvidia_api_key="nvapi-...",
image_name="cosmos-reason2-8b"
)
Supported images:
- cosmos-reason1-7b
- cosmos-reason2-2b
- cosmos-reason2-8b
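A mistyped image name is only rejected once the container pull fails, so a local guard can save a slow round trip. This helper is illustrative, not part of the SDK:

```python
# Illustrative guard (not part of the Vi SDK): validate the image name
# against the supported images listed above before building the config.
SUPPORTED_IMAGES = {"cosmos-reason1-7b", "cosmos-reason2-2b", "cosmos-reason2-8b"}


def validate_image_name(image_name: str) -> str:
    """Return the image name unchanged, or raise if it is not supported."""
    if image_name not in SUPPORTED_IMAGES:
        raise ValueError(
            f"Unknown NIM image {image_name!r}; expected one of {sorted(SUPPORTED_IMAGES)}"
        )
    return image_name


image = validate_image_name("cosmos-reason2-2b")
```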
tag
Type: str
Default: "latest"
Docker image tag to pull.
# Use latest version
config = NIMConfig(
nvidia_api_key="nvapi-...",
tag="latest"
)
# Use specific version
config = NIMConfig(
nvidia_api_key="nvapi-...",
tag="1.0.0"
)
Network configuration
port
Type: int
Default: 8000
Port to expose the NIM service on.
config = NIMConfig(
nvidia_api_key="nvapi-...",
port=8080 # Custom port
)
Valid range: 1024-65535 (unprivileged ports)
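Once deployed, the NIM service answers HTTP on the configured port. The sketch below polls a readiness endpoint before sending inference requests; the `/v1/health/ready` path follows common NIM convention and should be verified against your specific image:

```python
# Sketch of a readiness poll against the configured port. The
# /v1/health/ready path follows common NIM convention; confirm it
# against your image's documentation before relying on it.
import time
import urllib.error
import urllib.request


def ready_url(port: int, host: str = "localhost") -> str:
    return f"http://{host}:{port}/v1/health/ready"


def wait_until_ready(port: int, timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll the health endpoint until it returns 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(ready_url(port), timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Service not up yet; keep polling.
        time.sleep(poll_s)
    return False
```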
Resource configuration
shm_size
Type: str
Default: "32GB"
Shared memory size for the container.
# Standard (default)
config = NIMConfig(
nvidia_api_key="nvapi-...",
shm_size="32GB"
)
# Large models
config = NIMConfig(
nvidia_api_key="nvapi-...",
shm_size="64GB"
)
# Limited resources
config = NIMConfig(
nvidia_api_key="nvapi-...",
shm_size="16GB"
)
max_model_len
Type: int
Default: 8192
Maximum model context length.
# Default context
config = NIMConfig(
nvidia_api_key="nvapi-...",
max_model_len=8192
)
# Long context
config = NIMConfig(
nvidia_api_key="nvapi-...",
max_model_len=16384
)
# Short context (faster)
config = NIMConfig(
nvidia_api_key="nvapi-...",
max_model_len=4096
)
local_cache_dir
Type: str | None
Default: None (uses ~/.cache/nim)
Local directory to mount for NIM cache.
# Custom cache directory
config = NIMConfig(
nvidia_api_key="nvapi-...",
local_cache_dir="/mnt/ssd/nim_cache"
)
# Default location
config = NIMConfig(
nvidia_api_key="nvapi-...",
local_cache_dir=None # Uses ~/.cache/nim
)
Container lifecycle
use_existing_container
Type: bool
Default: True
Whether to reuse an existing container with the same name.
# Reuse existing (instant deployment)
config = NIMConfig(
nvidia_api_key="nvapi-...",
use_existing_container=True
)
# Always create new (stops if exists)
config = NIMConfig(
nvidia_api_key="nvapi-...",
use_existing_container=False
)
auto_kill_existing_container
Type: bool
Default: False
Whether to automatically stop and remove existing containers.
# Automatically remove conflicts
config = NIMConfig(
nvidia_api_key="nvapi-...",
use_existing_container=False,
auto_kill_existing_container=True
Warning: Setting auto_kill_existing_container=True will stop and remove any existing container with the same name without confirmation.
Output options
stream_logs
Type: bool
Default: True
Whether to stream container logs during startup.
# Show logs (default)
config = NIMConfig(
nvidia_api_key="nvapi-...",
stream_logs=True
)
# Hide logs
config = NIMConfig(
nvidia_api_key="nvapi-...",
stream_logs=False
)
force_pull
Type: bool
Default: False
Whether to pull the image even if it exists locally.
# Use cached image
config = NIMConfig(
nvidia_api_key="nvapi-...",
force_pull=False
)
# Always pull latest
config = NIMConfig(
nvidia_api_key="nvapi-...",
force_pull=True
)
Custom weights
secret_key
Type: str | None
Default: None
Vi SDK secret key for downloading custom model weights.
config = NIMConfig(
nvidia_api_key="nvapi-...",
secret_key="your-secret-key"
)
Environment variable: DATATURE_VI_SECRET_KEY
export DATATURE_VI_SECRET_KEY="your-secret-key"
organization_id
Type: str | None
Default: None
Vi organization ID for downloading custom model weights.
config = NIMConfig(
nvidia_api_key="nvapi-...",
organization_id="your-org-id"
)
Environment variable: DATATURE_VI_ORGANIZATION_ID
export DATATURE_VI_ORGANIZATION_ID="your-org-id"
run_id
Type: str | None
Default: None
Run ID of the trained model to deploy from Datature Vi.
config = NIMConfig(
nvidia_api_key="nvapi-...",
secret_key="your-secret-key",
organization_id="your-org-id",
run_id="your-run-id" # Enables custom weights
)
ckpt
Type: str | None
Default: None
Optional checkpoint identifier for custom weights.
config = NIMConfig(
nvidia_api_key="nvapi-...",
run_id="your-run-id",
ckpt="checkpoint-1000"
)
model_save_path
Type: Path | str
Default: Path("~/.datature/vi/models")
Directory to save downloaded model files.
from pathlib import Path
config = NIMConfig(
nvidia_api_key="nvapi-...",
run_id="your-run-id",
model_save_path=Path("./models")
)
overwrite
Type: bool
Default: False
Whether to re-download model weights even if they exist locally.
# Use cached weights
config = NIMConfig(
nvidia_api_key="nvapi-...",
run_id="your-run-id",
overwrite=False
)
# Force re-download
config = NIMConfig(
nvidia_api_key="nvapi-...",
run_id="your-run-id",
overwrite=True
)
Advanced options
endpoint
Type: str | None
Default: None
Custom Vi API endpoint (for testing/development).
config = NIMConfig(
nvidia_api_key="nvapi-...",
endpoint="https://api-staging.datature.io"
)
NIMSamplingParams
Configuration for inference sampling and guided decoding.
Class definition
from vi.deployment.nim import NIMSamplingParams
params = NIMSamplingParams(
temperature=0.7,
max_tokens=1024,
top_p=0.95,
# ... additional options
)
Basic sampling
temperature
Type: float
Default: 0.7
Range: 0.0 - 2.0
Controls randomness of sampling.
# Deterministic (greedy)
params = NIMSamplingParams(temperature=0.0)
# Focused
params = NIMSamplingParams(temperature=0.2)
# Balanced (default)
params = NIMSamplingParams(temperature=0.7)
# Creative
params = NIMSamplingParams(temperature=1.0)
# Very creative
params = NIMSamplingParams(temperature=1.5)
Guidelines:
- 0.0 — Greedy decoding, fully deterministic
- 0.1-0.3 — Very focused, consistent output
- 0.5-0.8 — Balanced creativity (recommended)
- 0.9-1.5 — High creativity, more diverse
- 1.5+ — Very creative, potentially inconsistent
top_p
Type: float
Default: 0.95
Range: 0.0 - 1.0
Nucleus sampling threshold (cumulative probability).
# Very focused
params = NIMSamplingParams(top_p=0.8)
# Balanced (default)
params = NIMSamplingParams(top_p=0.95)
# Consider all tokens
params = NIMSamplingParams(top_p=1.0)
top_k
Type: int
Default: 50
Range: -1 or >= 1
Number of top tokens to consider.
# Very focused
params = NIMSamplingParams(top_k=10)
# Balanced (default)
params = NIMSamplingParams(top_k=50)
# No filtering
params = NIMSamplingParams(top_k=-1)
min_p
Type: float
Default: 0.05
Range: 0.0 - 1.0
Minimum probability for a token relative to the most likely token.
# Strict filtering
params = NIMSamplingParams(min_p=0.1)
# Balanced (default)
params = NIMSamplingParams(min_p=0.05)
# No filtering
params = NIMSamplingParams(min_p=0.0)
Length control
max_tokens
Type: int
Default: 1024
Range: >= 1
Maximum number of tokens to generate.
# Short responses
params = NIMSamplingParams(max_tokens=256)
# Balanced (default)
params = NIMSamplingParams(max_tokens=1024)
# Long responses
params = NIMSamplingParams(max_tokens=4096)
min_tokens
Type: int
Default: 0
Range: >= 0
Minimum number of tokens before EOS/stop can be generated.
# Ensure at least 100 tokens
params = NIMSamplingParams(
min_tokens=100,
max_tokens=2048
)
Repetition control
presence_penalty
Type: float
Default: 0.0
Range: -2.0 - 2.0
Penalizes tokens based on whether they appear in the generated text.
# Encourage repetition
params = NIMSamplingParams(presence_penalty=-0.5)
# Neutral (default)
params = NIMSamplingParams(presence_penalty=0.0)
# Discourage repetition
params = NIMSamplingParams(presence_penalty=0.5)
# Strongly discourage repetition
params = NIMSamplingParams(presence_penalty=1.0)
Effect:
- Positive values encourage new tokens
- Negative values encourage repetition
- Higher magnitude = stronger effect
frequency_penalty
Type: float
Default: 0.0
Range: -2.0 - 2.0
Penalizes tokens based on their frequency in generated text.
# Encourage repeated words
params = NIMSamplingParams(frequency_penalty=-0.5)
# Neutral (default)
params = NIMSamplingParams(frequency_penalty=0.0)
# Discourage repeated words
params = NIMSamplingParams(frequency_penalty=0.5)
Effect:
- Positive values discourage frequently used tokens
- Negative values encourage repetition
- Higher magnitude = stronger effect
repetition_penalty
Type: float
Default: 1.05
Range: 0.0 - 2.0
Penalizes tokens based on whether they appear in the prompt or the generated text.
# No penalty
params = NIMSamplingParams(repetition_penalty=1.0)
# Light penalty (default)
params = NIMSamplingParams(repetition_penalty=1.05)
# Strong penalty
params = NIMSamplingParams(repetition_penalty=1.2)
Effect:
- 1.0 — No penalty
- > 1.0 — Discourage repetition
- < 1.0 — Encourage repetition
Stop sequences
stop
Type: str | list[str] | None
Default: None
String(s) that stop generation when produced.
# Single stop sequence
params = NIMSamplingParams(stop="END")
# Multiple stop sequences
params = NIMSamplingParams(stop=["\n\n", "END", "STOP"])
Note: Output will not include the stop string(s).
Determinism
seed
Type: int | None
Default: 0
Random seed for reproducible generation.
# Reproducible
params = NIMSamplingParams(
seed=42,
temperature=0.7 # Still has randomness, but reproducible
)
# Different seed
params = NIMSamplingParams(seed=123)
# No seed (non-reproducible)
params = NIMSamplingParams(seed=None)
ignore_eos
Type: bool
Default: False
Whether to ignore end-of-sequence token and continue generating.
# Respect EOS (default)
params = NIMSamplingParams(ignore_eos=False)
# Ignore EOS (for benchmarking)
params = NIMSamplingParams(
ignore_eos=True,
max_tokens=1000
)
Use case: Performance benchmarking
Log probabilities
logprobs
Type: int | None
Default: None
Range: >= 0
Number of log probabilities to return per output token.
# No probabilities (default)
params = NIMSamplingParams(logprobs=None)
# Top 5 token probabilities
params = NIMSamplingParams(logprobs=5)
prompt_logprobs
Type: int | None
Default: None
Range: >= 0
Number of log probabilities to return per prompt token.
# Prompt token probabilities
params = NIMSamplingParams(
logprobs=5,
prompt_logprobs=5
)
Guided decoding
guided_json
Type: str | dict | None
Default: None
JSON schema to constrain output structure.
# Dict schema
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
}
params = NIMSamplingParams(guided_json=schema)
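Schemas are easy to get subtly wrong, and a mismatch only surfaces at request time. A local sanity check such as this (a hypothetical helper, not an SDK feature) catches required fields that are missing from properties:

```python
# Hypothetical schema sanity check (not part of the Vi SDK): every field
# listed in "required" should also be defined under "properties".
def check_required_fields(schema: dict) -> list[str]:
    """Return required field names missing from the schema's properties."""
    properties = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in properties]


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
missing = check_required_fields(schema)
if missing:
    raise ValueError(f"Schema requires undefined fields: {missing}")
```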
# String schema
import json
params = NIMSamplingParams(guided_json=json.dumps(schema))
guided_regex
Type: str | None
Default: None
Regular expression pattern to constrain output format.
# Email format
params = NIMSamplingParams(
guided_regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
)
# Date format (YYYY-MM-DD)
params = NIMSamplingParams(
guided_regex=r"\d{4}-\d{2}-\d{2}"
)
# Phone number
params = NIMSamplingParams(
guided_regex=r"\+?1?\d{9,15}"
)
guided_choice
Type: list[str] | None
Default: None
List of valid output choices.
# Binary choice
params = NIMSamplingParams(
guided_choice=["yes", "no"]
)
# Multiple options
params = NIMSamplingParams(
guided_choice=["positive", "negative", "neutral"]
)
guided_grammar
Type: str | None
Default: None
Context-free grammar in EBNF format.
# Simple grammar
grammar = """
root ::= "The answer is " answer "."
answer ::= "yes" | "no" | "maybe"
"""
params = NIMSamplingParams(guided_grammar=grammar)
# Complex grammar
grammar = """
root ::= sentence+
sentence ::= subject " " verb " " object "."
subject ::= "The cat" | "A dog" | "My friend"
verb ::= "sees" | "likes" | "knows"
object ::= "a bird" | "the moon" | "something"
"""
params = NIMSamplingParams(guided_grammar=grammar)
Video processing (Cosmos-Reason2)
media_io_kwargs
Type: dict[str, float | int] | None
Default: None
Video frame sampling parameters.
# Sample by FPS
params = NIMSamplingParams(
media_io_kwargs={"fps": 2.0} # 2 frames per second
)
# Sample by frame count
params = NIMSamplingParams(
media_io_kwargs={"num_frames": 16} # Exactly 16 frames
)
Options:
- fps (float) — Frames per second to sample
- num_frames (int) — Total number of frames to sample
Important: Use either fps or num_frames, not both.
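Frame count drives both token usage and latency, so it helps to estimate it before choosing an fps value. The arithmetic below is a back-of-envelope sketch; the actual sampler may round differently:

```python
# Back-of-envelope frame budgeting (illustrative only): how many frames a
# given fps setting yields for a clip of known duration. The real sampler
# may round differently, so treat this as an estimate.
import math


def frames_for_fps(duration_s: float, fps: float) -> int:
    """Approximate number of frames sampled from a clip at the given fps."""
    return max(1, math.ceil(duration_s * fps))


# A 30-second clip at 2 fps yields roughly 60 frames.
n = frames_for_fps(30.0, 2.0)
```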
mm_processor_kwargs
Type: dict[str, int] | None
Default: None
Frame dimension parameters for video processing.
# Standard resolution
params = NIMSamplingParams(
mm_processor_kwargs={
"shortest_edge": 336,
"longest_edge": 672
}
)
# High resolution
params = NIMSamplingParams(
mm_processor_kwargs={
"shortest_edge": 672,
"longest_edge": 1344
}
)
Options:
- shortest_edge (int) — Resize shortest edge to this value
- longest_edge (int) — Resize longest edge to this value
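The two constraints interact: scaling the shortest edge up can push the longest edge past its cap, in which case the longest edge wins. The sketch below illustrates that arithmetic; it is not the exact NIM preprocessing, which may also snap dimensions to patch multiples:

```python
# Illustrative resize arithmetic (not the exact NIM preprocessing): scale so
# the shortest edge reaches shortest_edge, then cap by longest_edge if the
# other dimension would overshoot.
def target_size(height: int, width: int, shortest_edge: int, longest_edge: int) -> tuple[int, int]:
    """Return the (height, width) a frame would be resized to."""
    scale = shortest_edge / min(height, width)
    if max(height, width) * scale > longest_edge:
        scale = longest_edge / max(height, width)
    return round(height * scale), round(width * scale)


# A 480x640 frame under shortest_edge=336, longest_edge=672:
size = target_size(480, 640, 336, 672)
```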
Configuration examples
Balanced configuration
config = NIMConfig(
nvidia_api_key="nvapi-...",
image_name="cosmos-reason2-2b",
port=8000,
use_existing_container=True,
stream_logs=True
)
params = NIMSamplingParams(
temperature=0.7,
max_tokens=1024,
top_p=0.95,
top_k=50
)
Production configuration
import os
from pathlib import Path
config = NIMConfig(
nvidia_api_key=os.getenv("NGC_API_KEY"),
secret_key=os.getenv("DATATURE_VI_SECRET_KEY"),
organization_id=os.getenv("DATATURE_VI_ORGANIZATION_ID"),
run_id="your-run-id",
image_name="cosmos-reason2-2b",
port=8000,
shm_size="64GB",
max_model_len=8192,
local_cache_dir="/mnt/ssd/nim_cache",
model_save_path=Path("/mnt/models"),
use_existing_container=True,
auto_kill_existing_container=False,
stream_logs=False,
force_pull=False,
overwrite=False
)
params = NIMSamplingParams(
temperature=0.3,
max_tokens=2048,
top_p=0.95,
repetition_penalty=1.05,
seed=42
)
Video analysis configuration
config = NIMConfig(
nvidia_api_key="nvapi-...",
image_name="cosmos-reason2-2b",
port=8000,
shm_size="64GB",
max_model_len=16384
)
params = NIMSamplingParams(
temperature=0.2,
max_tokens=4096,
media_io_kwargs={"fps": 2.0},
mm_processor_kwargs={
"shortest_edge": 336,
"longest_edge": 672
}
)
See also
- NIM Overview — Introduction to NVIDIA NIM deployment
- Deploy container — Deploy NIM containers
- Run inference — Execute predictions
- Troubleshooting — Common problems and solutions
