NIM Configuration

The Datature Vi SDK provides two configuration classes for NVIDIA NIM: NIMConfig controls container deployment, and NIMSamplingParams controls inference behavior. This page documents every parameter for both classes.

NIMConfig

NIMConfig specifies how the NIM container is deployed: which image to pull, which port to use, whether to load custom weights, and how to manage the container lifecycle.

```python
from vi.deployment.nim import NIMConfig

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    port=8000,
    # ... additional options
)
```

Credentials

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| nvidia_api_key | string | NVIDIA NGC API key for container registry authentication. Must start with nvapi-. Can also be set via the NGC_API_KEY environment variable; if the variable is set, this parameter may be omitted. | Required | — |
| secret_key | string | Vi SDK secret key for downloading custom model weights. Can also be set via DATATURE_VI_SECRET_KEY. Required when using run_id. | Optional | — |
| organization_id | string | Vi organization ID for downloading custom model weights. Can also be set via DATATURE_VI_ORGANIZATION_ID. Required when using run_id. | Optional | — |

```python
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="your-secret-key",
    organization_id="your-org-id"
)
```

```python
# Set in shell:
# export NGC_API_KEY="nvapi-..."
# export DATATURE_VI_SECRET_KEY="your-secret-key"
# export DATATURE_VI_ORGANIZATION_ID="your-org-id"

config = NIMConfig()  # reads from environment
```

Image selection

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| image_name | string | NIM image name (without registry prefix). Supported values: cosmos-reason1-7b, cosmos-reason2-2b, cosmos-reason2-8b. | Optional | "cosmos-reason2-2b" |
| tag | string | Docker image tag to pull. Use a specific version such as 1.0.0 to pin to a release. | Optional | "latest" |

```python
# 7B model, image reasoning only
config = NIMConfig(nvidia_api_key="nvapi-...", image_name="cosmos-reason1-7b")

# 2B model with video support (default)
config = NIMConfig(nvidia_api_key="nvapi-...", image_name="cosmos-reason2-2b")

# 8B model with video support
config = NIMConfig(nvidia_api_key="nvapi-...", image_name="cosmos-reason2-8b")

# Pin to a specific version
config = NIMConfig(nvidia_api_key="nvapi-...", image_name="cosmos-reason2-2b", tag="1.0.0")
```

Network

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| port | integer | Port to expose the NIM service on. Valid range: 1024–65535. | Optional | 8000 |
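For example, a deployment can expose the service on a non-default port (the port number here is illustrative):

```python
from vi.deployment.nim import NIMConfig

# Expose the NIM service on port 8080 instead of the default 8000
config = NIMConfig(nvidia_api_key="nvapi-...", port=8080)
```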

Resources

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| shm_size | string | Shared memory size for the container. Use 32GB for most models, 64GB for large models or high batch sizes, and 16GB on limited hardware. | Optional | "32GB" |
| max_model_len | integer | Maximum model context length in tokens. 4096 is faster but limits context; 16384 allows longer inputs at the cost of more GPU memory. | Optional | 8192 |
| local_cache_dir | string | Local directory to mount as the NIM cache. Useful when pointing to a fast SSD or shared storage. | Optional | null (uses ~/.cache/nim) |
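A resource-constrained setup might trade context length for memory; the values and cache path below are illustrative:

```python
from vi.deployment.nim import NIMConfig

# Smaller shared memory and context window for limited hardware
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    shm_size="16GB",
    max_model_len=4096,
    local_cache_dir="/mnt/ssd/nim_cache",  # fast local SSD (example path)
)
```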

Container lifecycle

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| use_existing_container | boolean | Whether to reuse a running container with the same name. When true, a second deploy() call returns the existing container immediately. When false, a fresh container must be created. | Optional | true |
| auto_kill_existing_container | boolean | Whether to stop and remove an existing container before creating a new one. Only relevant when use_existing_container is false. Use with caution in shared environments. | Optional | false |

```python
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=True   # default (instant on second deploy)
)

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=False,
    auto_kill_existing_container=True  # removes any existing container first
)
```

Output

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| stream_logs | boolean | Whether to stream container logs to the terminal during startup. Set to false for cleaner output in scripts. | Optional | true |
| force_pull | boolean | Whether to pull the image from the registry even if a local copy exists. Useful for picking up security patches. | Optional | false |
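A scripted deployment might suppress log streaming while still refreshing the image:

```python
from vi.deployment.nim import NIMConfig

# Quiet startup for scripts, but always pull the latest image
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    stream_logs=False,
    force_pull=True,
)
```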

Custom weights

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| run_id | string | Run ID of the trained model to deploy from Datature Vi. When provided, the SDK downloads the model weights before starting the container. Requires secret_key and organization_id. | Optional | null |
| ckpt | string | Checkpoint identifier for the custom weights. If omitted, the best available checkpoint is used. | Optional | null |
| model_save_path | string | Local directory where downloaded model weights are saved. | Optional | "~/.datature/vi/models" |
| overwrite | boolean | Whether to re-download model weights even if they exist locally. Set to true after a model retrain. | Optional | false |
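Putting these together, a custom-weights deployment combines run_id with the Vi credentials (the key and ID values below are placeholders):

```python
from vi.deployment.nim import NIMConfig

# Deploy custom weights from a Datature Vi training run
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="your-secret-key",
    organization_id="your-org-id",
    run_id="your-run-id",
    overwrite=True,  # re-download the weights after a retrain
)
```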

Advanced

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| endpoint | string | Custom Vi API endpoint URL. Use only for testing against staging or development environments. | Optional | null |

NIMSamplingParams

NIMSamplingParams controls how the model generates tokens during inference. Pass an instance to the sampling_params argument of NIMPredictor.__call__().

```python
from vi.deployment.nim import NIMSamplingParams

params = NIMSamplingParams(
    temperature=0.7,
    max_tokens=1024,
    top_p=0.95
)
```

Sampling

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| temperature | number | Controls output randomness. Range: 0.0–2.0. 0.0 = greedy (deterministic); 0.2–0.5 = focused and consistent; 0.7–1.0 = balanced (recommended); above 1.0 = more diverse and creative. | Optional | 0.7 |
| top_p | number | Nucleus sampling threshold. Range: 0.0–1.0. Considers only the most likely tokens whose cumulative probability reaches this value. 0.9 = focused; 0.95 = balanced; 1.0 = all tokens. | Optional | 0.95 |
| top_k | integer | Number of top tokens to consider. Range: -1 or ≥ 1. -1 disables top-k filtering; 10–20 is tightly focused; 50 is balanced. | Optional | 50 |
| min_p | number | Minimum token probability relative to the most likely token. Range: 0.0–1.0. Filters out low-probability tokens. | Optional | 0.05 |

```python
from vi.deployment.nim import NIMSamplingParams

# Deterministic output
params = NIMSamplingParams(temperature=0.0)

# Fast, focused
params = NIMSamplingParams(temperature=0.2, max_tokens=256)

# Balanced (recommended starting point)
params = NIMSamplingParams(temperature=0.7, top_p=0.95, top_k=50)

# Creative
params = NIMSamplingParams(temperature=1.0, max_tokens=2048)
```
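Conceptually, top_k, top_p, and min_p each prune the candidate token set before a token is sampled. The sketch below mimics that pruning on a toy distribution; it is an illustration of the filtering logic only, not the NIM implementation.

```python
def filter_candidates(probs, top_k=50, top_p=0.95, min_p=0.05):
    """Return the token indices that survive top-k, top-p, and min-p pruning.

    probs: list of token probabilities, assumed sorted in descending order.
    """
    # top-k: keep at most k tokens (-1 disables the filter)
    kept = list(range(len(probs))) if top_k == -1 else list(range(min(top_k, len(probs))))

    # top-p (nucleus): keep the smallest prefix whose cumulative probability
    # reaches the threshold
    cumulative, nucleus = 0.0, []
    for i in kept:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept = nucleus

    # min-p: drop tokens whose probability falls below min_p times the
    # probability of the most likely token
    threshold = min_p * probs[0]
    return [i for i in kept if probs[i] >= threshold]


# Toy distribution over 5 tokens, sorted descending
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]
print(filter_candidates(probs, top_k=4, top_p=0.8, min_p=0.2))  # [0, 1, 2]
```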

Length control

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| max_tokens | integer | Maximum number of tokens to generate. Range: ≥ 1. Increase for detailed descriptions; reduce for faster inference. | Optional | 1024 |
| min_tokens | integer | Minimum number of tokens to generate before the end-of-sequence token can appear. Use to ensure a minimum response length. | Optional | 0 |
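For example, to bound the response length on both sides (values are illustrative):

```python
from vi.deployment.nim import NIMSamplingParams

# Require at least 50 tokens, cap the response at 512
params = NIMSamplingParams(min_tokens=50, max_tokens=512)
```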

Repetition control

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| presence_penalty | number | Penalty applied to tokens that have already appeared in the output. Range: -2.0–2.0. Positive values encourage new vocabulary; negative values encourage repetition. | Optional | 0.0 |
| frequency_penalty | number | Penalty proportional to how often a token has already been generated. Range: -2.0–2.0. Higher positive values reduce word repetition. | Optional | 0.0 |
| repetition_penalty | number | Multiplicative penalty on tokens appearing in the prompt or output. Range: 0.0–2.0. 1.0 = no penalty; values above 1.0 discourage repetition. | Optional | 1.05 |
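The three penalties can be combined; the values below are an illustrative starting point for discouraging repeated phrases in long outputs:

```python
from vi.deployment.nim import NIMSamplingParams

# Gently push the model toward new vocabulary in long captions
params = NIMSamplingParams(
    presence_penalty=0.5,
    frequency_penalty=0.5,
    repetition_penalty=1.1,
)
```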

Stop sequences

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| stop | string or array | One or more strings that stop generation when produced. Accepts a single string or a list of strings. The stop string is not included in the output. | Optional | null |

```python
params = NIMSamplingParams(stop="END")
params = NIMSamplingParams(stop=["\n\n", "END", "STOP"])
```

Determinism

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| seed | integer | Random seed for reproducible generation. Set to null for non-reproducible output. | Optional | 0 |
| ignore_eos | boolean | Whether to ignore the end-of-sequence token and continue generating until max_tokens. Primarily for performance benchmarking. | Optional | false |
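A fully reproducible configuration pairs a fixed seed with greedy decoding (the seed value is arbitrary):

```python
from vi.deployment.nim import NIMSamplingParams

# Greedy decoding plus a fixed seed: identical inputs yield identical outputs
params = NIMSamplingParams(temperature=0.0, seed=42)
```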

Log probabilities

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| logprobs | integer | Number of log probabilities to return per output token. Range: ≥ 0. | Optional | null |
| prompt_logprobs | integer | Number of log probabilities to return per prompt token. Range: ≥ 0. | Optional | null |
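For example, to request the top alternatives considered at each generated token (the count is illustrative):

```python
from vi.deployment.nim import NIMSamplingParams

# Return the 5 highest log probabilities for each output token
params = NIMSamplingParams(logprobs=5)
```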

Guided decoding

Guided decoding constrains the model's output to a specific format. Only one guided decoding option can be active at a time.

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| guided_json | object | JSON schema (as a dict or JSON string) that the output must conform to. The model produces valid JSON matching the schema. | Optional | null |
| guided_regex | string | Regular expression pattern the output must match. | Optional | null |
| guided_choice | array | List of allowed output strings. The model picks one from this list. | Optional | null |
| guided_grammar | string | Context-free grammar in EBNF format that constrains the output structure. | Optional | null |

```python
# Constrain the output to one of three choices
params = NIMSamplingParams(
    temperature=0.2,
    guided_choice=["yes", "no", "maybe"]
)
```

```python
# Constrain the output to a JSON schema
schema = {
    "type": "object",
    "properties": {
        "objects": {"type": "array", "items": {"type": "string"}},
        "count": {"type": "integer"}
    },
    "required": ["objects", "count"]
}

params = NIMSamplingParams(temperature=0.3, guided_json=schema)
```

```python
# Constrain the output to a regular expression
params = NIMSamplingParams(
    temperature=0.1,
    guided_regex=r"\d{4}-\d{2}-\d{2}"  # YYYY-MM-DD
)
```

```python
# Constrain the output with an EBNF grammar
grammar = """
root ::= "The answer is " answer "."
answer ::= "yes" | "no" | "maybe"
"""
params = NIMSamplingParams(temperature=0.2, guided_grammar=grammar)
```
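Constrained outputs can also be sanity-checked client-side. The snippet below validates sample responses against the same date pattern and choice list used above; it is plain Python and does not depend on the SDK.

```python
import re

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")
CHOICES = ["yes", "no", "maybe"]

def is_valid_date(text: str) -> bool:
    # fullmatch mirrors guided_regex, which constrains the entire output
    return DATE_PATTERN.fullmatch(text) is not None

def is_valid_choice(text: str) -> bool:
    # guided_choice restricts the output to exactly one of the listed strings
    return text in CHOICES

print(is_valid_date("2024-07-15"))   # True
print(is_valid_choice("perhaps"))    # False
```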

Video processing

These parameters apply only to Cosmos-Reason2 models (cosmos-reason2-2b, cosmos-reason2-8b).

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| media_io_kwargs | object | Video frame sampling options. Use either fps (float, frames per second) or num_frames (int, total frames to sample), not both. | Optional | null |
| mm_processor_kwargs | object | Frame dimension options. Keys: shortest_edge (int) and longest_edge (int). Controls resize behavior before the model processes each frame. | Optional | null |

```python
params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},           # 2 frames per second
    mm_processor_kwargs={"shortest_edge": 336, "longest_edge": 672}
)
```

```python
params = NIMSamplingParams(
    temperature=0.2,
    media_io_kwargs={"num_frames": 16},     # exactly 16 frames
    mm_processor_kwargs={"shortest_edge": 336, "longest_edge": 672}
)
```

Use either fps or num_frames in media_io_kwargs, not both. Providing both may cause an error.


Configuration examples

Development (quick iteration)

```python
from vi.deployment.nim import NIMConfig, NIMSamplingParams

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    port=8000,
    use_existing_container=True,
    stream_logs=True
)

params = NIMSamplingParams(
    temperature=0.7,
    max_tokens=1024,
    top_p=0.95,
    top_k=50
)
```

Production (custom weights, stable output)

```python
import os
from pathlib import Path
from vi.deployment.nim import NIMConfig, NIMSamplingParams

config = NIMConfig(
    nvidia_api_key=os.getenv("NGC_API_KEY"),
    secret_key=os.getenv("DATATURE_VI_SECRET_KEY"),
    organization_id=os.getenv("DATATURE_VI_ORGANIZATION_ID"),
    run_id="your-run-id",
    image_name="cosmos-reason2-2b",
    port=8000,
    shm_size="64GB",
    max_model_len=8192,
    local_cache_dir="/mnt/ssd/nim_cache",
    model_save_path=Path("/mnt/models"),
    use_existing_container=True,
    auto_kill_existing_container=False,
    stream_logs=False,
    force_pull=False,
    overwrite=False
)

params = NIMSamplingParams(
    temperature=0.3,
    max_tokens=2048,
    top_p=0.95,
    repetition_penalty=1.05,
    seed=42
)
```

Video analysis

```python
from vi.deployment.nim import NIMConfig, NIMSamplingParams

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    port=8000,
    shm_size="64GB",
    max_model_len=16384
)

params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},
    mm_processor_kwargs={"shortest_edge": 336, "longest_edge": 672}
)
```

Related resources

Deploy A Container

Use NIMConfig to deploy a NIM container with custom weights and lifecycle options.

Run Inference

Pass NIMSamplingParams to NIMPredictor to control inference behavior.

Troubleshooting

Debug common NIM deployment and inference errors.