NIM Configuration

Complete reference for NIM deployment and inference configuration.

Overview

The Vi SDK provides two main configuration classes for NIM:

  • NIMConfig — container deployment (image, port, resources, lifecycle)
  • NIMSamplingParams — inference sampling and guided decoding

NIMConfig

Configuration for NIM container deployment.

Class definition

from vi.deployment.nim import NIMConfig

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    port=8000,
    # ... additional options
)

Required parameters

nvidia_api_key

Type: str

NVIDIA NGC API key for container registry authentication.

config = NIMConfig(nvidia_api_key="nvapi-...")

Environment variable: NGC_API_KEY

export NGC_API_KEY="nvapi-..."
# Loads from environment
config = NIMConfig()

Format: Must start with nvapi-
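A quick sanity check before constructing the config can catch a malformed key early. The helper below is a hypothetical illustration of the documented format rule, not the SDK's own validation:

```python
def validate_ngc_key(key: str) -> str:
    """Check the documented NGC key format: keys must start with 'nvapi-'."""
    if not key.startswith("nvapi-"):
        raise ValueError("NVIDIA API key must start with 'nvapi-'")
    return key

validate_ngc_key("nvapi-...")  # passes
```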


Image selection

image_name

Type: str Default: "cosmos-reason2-2b"

Name of the NIM image to pull (without registry prefix).

# Cosmos Reason1 7B
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason1-7b"
)

# Cosmos Reason2 2B (default)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b"
)

# Cosmos Reason2 8B
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-8b"
)

Supported images:

  • cosmos-reason1-7b
  • cosmos-reason2-2b
  • cosmos-reason2-8b

tag

Type: str Default: "latest"

Docker image tag to pull.

# Use latest version
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    tag="latest"
)

# Use specific version
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    tag="1.0.0"
)

Network configuration

port

Type: int Default: 8000

Port to expose the NIM service on.

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    port=8080  # Custom port
)

Valid range: 1024-65535 (unprivileged ports)


Resource configuration

shm_size

Type: str Default: "32GB"

Shared memory size for the container.

# Standard (default)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    shm_size="32GB"
)

# Large models
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    shm_size="64GB"
)

# Limited resources
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    shm_size="16GB"
)

max_model_len

Type: int Default: 8192

Maximum model context length.

# Default context
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    max_model_len=8192
)

# Long context
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    max_model_len=16384
)

# Short context (faster)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    max_model_len=4096
)

local_cache_dir

Type: str | None Default: None (uses ~/.cache/nim)

Local directory to mount for NIM cache.

# Custom cache directory
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    local_cache_dir="/mnt/ssd/nim_cache"
)

# Default location
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    local_cache_dir=None  # Uses ~/.cache/nim
)

Container lifecycle

use_existing_container

Type: bool Default: True

Whether to reuse an existing container with the same name.

# Reuse existing (instant deployment)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=True
)

# Always create new (stops if exists)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=False
)

auto_kill_existing_container

Type: bool Default: False

Whether to automatically stop and remove existing containers.

# Automatically remove conflicts
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=False,
    auto_kill_existing_container=True
)

Warning: Setting auto_kill_existing_container=True will stop and remove any existing container with the same name without confirmation.


Output options

stream_logs

Type: bool Default: True

Whether to stream container logs during startup.

# Show logs (default)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    stream_logs=True
)

# Hide logs
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    stream_logs=False
)

force_pull

Type: bool Default: False

Whether to pull the image even if it exists locally.

# Use cached image
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    force_pull=False
)

# Always pull latest
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    force_pull=True
)

Custom weights

secret_key

Type: str | None Default: None

Vi SDK secret key for downloading custom model weights.

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="your-secret-key"
)

Environment variable: DATATURE_VI_SECRET_KEY

export DATATURE_VI_SECRET_KEY="your-secret-key"

organization_id

Type: str | None Default: None

Vi organization ID for downloading custom model weights.

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    organization_id="your-org-id"
)

Environment variable: DATATURE_VI_ORGANIZATION_ID

export DATATURE_VI_ORGANIZATION_ID="your-org-id"

run_id

Type: str | None Default: None

Run ID of the trained model to deploy from Datature Vi.

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="your-secret-key",
    organization_id="your-org-id",
    run_id="your-run-id"  # Enables custom weights
)

ckpt

Type: str | None Default: None

Optional checkpoint identifier for custom weights.

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    run_id="your-run-id",
    ckpt="checkpoint-1000"
)

model_save_path

Type: Path | str Default: Path("~/.datature/vi/models")

Directory to save downloaded model files.

from pathlib import Path

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    run_id="your-run-id",
    model_save_path=Path("./models")
)

overwrite

Type: bool Default: False

Whether to re-download model weights even if they exist locally.

# Use cached weights
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    run_id="your-run-id",
    overwrite=False
)

# Force re-download
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    run_id="your-run-id",
    overwrite=True
)

Advanced options

endpoint

Type: str | None Default: None

Custom Vi API endpoint (for testing/development).

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    endpoint="https://api-staging.datature.io"
)

NIMSamplingParams

Configuration for inference sampling and guided decoding.

Class definition

from vi.deployment.nim import NIMSamplingParams

params = NIMSamplingParams(
    temperature=0.7,
    max_tokens=1024,
    top_p=0.95,
    # ... additional options
)

Basic sampling

temperature

Type: float Default: 0.7 Range: 0.0 - 2.0

Controls randomness of sampling.

# Deterministic (greedy)
params = NIMSamplingParams(temperature=0.0)

# Focused
params = NIMSamplingParams(temperature=0.2)

# Balanced (default)
params = NIMSamplingParams(temperature=0.7)

# Creative
params = NIMSamplingParams(temperature=1.0)

# Very creative
params = NIMSamplingParams(temperature=1.5)

Guidelines:

  • 0.0 — Greedy decoding, fully deterministic
  • 0.1-0.3 — Very focused, consistent output
  • 0.5-0.8 — Balanced creativity (recommended)
  • 0.9-1.5 — High creativity, more diverse
  • 1.5+ — Very creative, potentially inconsistent
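Conceptually, temperature rescales the model's logits before the softmax: lower values sharpen the distribution toward the top token, and 0.0 collapses to greedy argmax. A minimal sketch of that behavior (an illustration, not the NIM server's internal code):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax.

    temperature=0.0 is treated as greedy decoding: all probability
    mass goes to the argmax token.
    """
    if temperature == 0.0:
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For the same logits, a lower temperature assigns a strictly larger probability to the top token, which is why low-temperature output is more consistent.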

top_p

Type: float Default: 0.95 Range: 0.0 - 1.0

Nucleus sampling threshold (cumulative probability).

# Very focused
params = NIMSamplingParams(top_p=0.8)

# Balanced (default)
params = NIMSamplingParams(top_p=0.95)

# Consider all tokens
params = NIMSamplingParams(top_p=1.0)

top_k

Type: int Default: 50 Range: -1 or >= 1

Number of top tokens to consider.

# Very focused
params = NIMSamplingParams(top_k=10)

# Balanced (default)
params = NIMSamplingParams(top_k=50)

# No filtering
params = NIMSamplingParams(top_k=-1)

min_p

Type: float Default: 0.05 Range: 0.0 - 1.0

Minimum probability for a token relative to the most likely token.

# Strict filtering
params = NIMSamplingParams(min_p=0.1)

# Balanced (default)
params = NIMSamplingParams(min_p=0.05)

# No filtering
params = NIMSamplingParams(min_p=0.0)
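min_p keeps only tokens whose probability is at least min_p times the probability of the most likely token, so the cutoff adapts to how peaked the distribution is. A small sketch of the rule (an illustration, not the server's implementation):

```python
import math

def min_p_filter(logits, min_p):
    """Return indices of tokens whose probability is at least
    min_p * (probability of the most likely token)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]
```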

Length control

max_tokens

Type: int Default: 1024 Range: >= 1

Maximum number of tokens to generate.

# Short responses
params = NIMSamplingParams(max_tokens=256)

# Balanced (default)
params = NIMSamplingParams(max_tokens=1024)

# Long responses
params = NIMSamplingParams(max_tokens=4096)

min_tokens

Type: int Default: 0 Range: >= 0

Minimum number of tokens before EOS/stop can be generated.

# Ensure at least 100 tokens
params = NIMSamplingParams(
    min_tokens=100,
    max_tokens=2048
)

Repetition control

presence_penalty

Type: float Default: 0.0 Range: -2.0 - 2.0

Penalizes tokens based on whether they appear in the generated text.

# Encourage repetition
params = NIMSamplingParams(presence_penalty=-0.5)

# Neutral (default)
params = NIMSamplingParams(presence_penalty=0.0)

# Discourage repetition
params = NIMSamplingParams(presence_penalty=0.5)

# Strongly discourage repetition
params = NIMSamplingParams(presence_penalty=1.0)

Effect:

  • Positive values encourage new tokens
  • Negative values encourage repetition
  • Higher magnitude = stronger effect

frequency_penalty

Type: float Default: 0.0 Range: -2.0 - 2.0

Penalizes tokens based on their frequency in generated text.

# Encourage repeated words
params = NIMSamplingParams(frequency_penalty=-0.5)

# Neutral (default)
params = NIMSamplingParams(frequency_penalty=0.0)

# Discourage repeated words
params = NIMSamplingParams(frequency_penalty=0.5)

Effect:

  • Positive values discourage frequently used tokens
  • Negative values encourage repetition
  • Higher magnitude = stronger effect
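Both penalties are conventionally applied as subtractions from the raw logits: frequency_penalty scales with how often a token has already been generated, while presence_penalty applies a flat deduction once a token has appeared at all. A sketch of that standard formulation (not the NIM server's internal code):

```python
def apply_penalties(logits, output_counts, presence_penalty, frequency_penalty):
    """Adjust logits with additive presence/frequency penalties.

    output_counts maps token_id -> number of times the token appears
    in the generated text so far.
    """
    adjusted = []
    for token_id, logit in enumerate(logits):
        count = output_counts.get(token_id, 0)
        logit -= frequency_penalty * count   # scales with repetition count
        if count > 0:
            logit -= presence_penalty        # flat, once the token appeared
        adjusted.append(logit)
    return adjusted
```

Negative penalty values flip the sign of the adjustment, which is why they encourage repetition.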

repetition_penalty

Type: float Default: 1.05 Range: 0.0 - 2.0

Penalizes tokens that appear in the prompt or the generated text.

# No penalty
params = NIMSamplingParams(repetition_penalty=1.0)

# Light penalty (default)
params = NIMSamplingParams(repetition_penalty=1.05)

# Strong penalty
params = NIMSamplingParams(repetition_penalty=1.2)

Effect:

  • 1.0 — No penalty
  • > 1.0 — Discourage repetition
  • < 1.0 — Encourage repetition
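Unlike the additive penalties above, repetition_penalty is conventionally multiplicative (as in Hugging Face / vLLM samplers): logits of previously seen tokens are divided by the penalty when positive and multiplied by it when negative, so values above 1.0 always push seen tokens down. A sketch under that assumption:

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Multiplicative repetition penalty over tokens already seen
    in the prompt or generated text."""
    adjusted = list(logits)
    for t in seen_token_ids:
        if adjusted[t] > 0:
            adjusted[t] /= penalty   # shrink positive logits
        else:
            adjusted[t] *= penalty   # push negative logits further down
    return adjusted
```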

Stop sequences

stop

Type: str | list[str] | None Default: None

String(s) that stop generation when produced.

# Single stop sequence
params = NIMSamplingParams(stop="END")

# Multiple stop sequences
params = NIMSamplingParams(stop=["\n\n", "END", "STOP"])

Note: Output will not include the stop string(s).
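Generation halts at the earliest match of any stop string, and the match itself is excluded from the output. Conceptually this behaves like the following helper (an illustration, not the server's implementation):

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop string,
    excluding the stop string itself. stop may be None, a str,
    or a list of str, mirroring the parameter's type."""
    if stop is None:
        return text
    stops = [stop] if isinstance(stop, str) else stop
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]
```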


Determinism

seed

Type: int | None Default: 0

Random seed for reproducible generation.

# Reproducible
params = NIMSamplingParams(
    seed=42,
    temperature=0.7  # Still has randomness, but reproducible
)

# Different seed
params = NIMSamplingParams(seed=123)

# No seed (non-reproducible)
params = NIMSamplingParams(seed=None)

ignore_eos

Type: bool Default: False

Whether to ignore end-of-sequence token and continue generating.

# Respect EOS (default)
params = NIMSamplingParams(ignore_eos=False)

# Ignore EOS (for benchmarking)
params = NIMSamplingParams(
    ignore_eos=True,
    max_tokens=1000
)

Use case: Performance benchmarking


Log probabilities

logprobs

Type: int | None Default: None Range: >= 0

Number of log probabilities to return per output token.

# No probabilities (default)
params = NIMSamplingParams(logprobs=None)

# Top 5 token probabilities
params = NIMSamplingParams(logprobs=5)

prompt_logprobs

Type: int | None Default: None Range: >= 0

Number of log probabilities to return per prompt token.

# Prompt token probabilities
params = NIMSamplingParams(
    logprobs=5,
    prompt_logprobs=5
)

Guided decoding

guided_json

Type: str | dict | None Default: None

JSON schema to constrain output structure.

# Dict schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}

params = NIMSamplingParams(guided_json=schema)

# String schema
import json
params = NIMSamplingParams(guided_json=json.dumps(schema))

guided_regex

Type: str | None Default: None

Regular expression pattern to constrain output format.

# Email format
params = NIMSamplingParams(
    guided_regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
)

# Date format (YYYY-MM-DD)
params = NIMSamplingParams(
    guided_regex=r"\d{4}-\d{2}-\d{2}"
)

# Phone number
params = NIMSamplingParams(
    guided_regex=r"\+?1?\d{9,15}"
)

guided_choice

Type: list[str] | None Default: None

List of valid output choices.

# Binary choice
params = NIMSamplingParams(
    guided_choice=["yes", "no"]
)

# Multiple options
params = NIMSamplingParams(
    guided_choice=["positive", "negative", "neutral"]
)

guided_grammar

Type: str | None Default: None

Context-free grammar in EBNF format.

# Simple grammar
grammar = """
root ::= "The answer is " answer "."
answer ::= "yes" | "no" | "maybe"
"""

params = NIMSamplingParams(guided_grammar=grammar)

# Complex grammar
grammar = """
root ::= sentence+
sentence ::= subject " " verb " " object "."
subject ::= "The cat" | "A dog" | "My friend"
verb ::= "sees" | "likes" | "knows"
object ::= "a bird" | "the moon" | "something"
"""

params = NIMSamplingParams(guided_grammar=grammar)

Video processing (Cosmos-Reason2)

media_io_kwargs

Type: dict[str, float | int] | None Default: None

Video frame sampling parameters.

# Sample by FPS
params = NIMSamplingParams(
    media_io_kwargs={"fps": 2.0}  # 2 frames per second
)

# Sample by frame count
params = NIMSamplingParams(
    media_io_kwargs={"num_frames": 16}  # Exactly 16 frames
)

Options:

  • fps (float) — Frames per second to sample
  • num_frames (int) — Total number of frames to sample

Important: Use either fps or num_frames, not both.
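The two modes select frame indices differently: fps samples at a fixed rate relative to the video's native frame rate, while num_frames spaces a fixed count evenly across the clip. The sketch below illustrates the idea; the helper name and its exact rounding are assumptions, not the NIM's implementation:

```python
def sample_frame_indices(total_frames, video_fps, fps=None, num_frames=None):
    """Pick frame indices by a fixed sampling rate (fps) or an
    evenly spaced fixed count (num_frames). Exactly one must be given."""
    if (fps is None) == (num_frames is None):
        raise ValueError("Specify exactly one of fps or num_frames")
    if fps is not None:
        step = video_fps / fps  # e.g. 30 fps video at fps=2.0 -> every 15th frame
        return [int(i * step) for i in range(int(total_frames / step))]
    step = total_frames / num_frames  # spread num_frames across the whole clip
    return [int(i * step) for i in range(num_frames)]
```

Note that with fps the number of sampled frames grows with clip length, while num_frames keeps it constant regardless of duration.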

mm_processor_kwargs

Type: dict[str, int] | None Default: None

Frame dimension parameters for video processing.

# Standard resolution
params = NIMSamplingParams(
    mm_processor_kwargs={
        "shortest_edge": 336,
        "longest_edge": 672
    }
)

# High resolution
params = NIMSamplingParams(
    mm_processor_kwargs={
        "shortest_edge": 672,
        "longest_edge": 1344
    }
)

Options:

  • shortest_edge (int) — Resize shortest edge to this value
  • longest_edge (int) — Resize longest edge to this value
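A common interpretation of these two constraints (as in typical vision processors; the exact NIM behavior may differ) is to scale the frame, preserving aspect ratio, so its short side matches shortest_edge, unless that would push the long side past longest_edge, in which case the long side is pinned instead. A sketch:

```python
def fit_edges(width, height, shortest_edge, longest_edge):
    """Scale (width, height) preserving aspect ratio so the short
    side hits shortest_edge, capped so the long side never exceeds
    longest_edge."""
    short, long_side = min(width, height), max(width, height)
    scale = shortest_edge / short
    if long_side * scale > longest_edge:
        scale = longest_edge / long_side  # cap on the long side instead
    return round(width * scale), round(height * scale)
```

Wide frames therefore end up bounded by longest_edge, while near-square frames are bounded by shortest_edge.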

Configuration examples

Balanced configuration

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    port=8000,
    use_existing_container=True,
    stream_logs=True
)

params = NIMSamplingParams(
    temperature=0.7,
    max_tokens=1024,
    top_p=0.95,
    top_k=50
)

Production configuration

import os
from pathlib import Path

config = NIMConfig(
    nvidia_api_key=os.getenv("NGC_API_KEY"),
    secret_key=os.getenv("DATATURE_VI_SECRET_KEY"),
    organization_id=os.getenv("DATATURE_VI_ORGANIZATION_ID"),
    run_id="your-run-id",
    image_name="cosmos-reason2-2b",
    port=8000,
    shm_size="64GB",
    max_model_len=8192,
    local_cache_dir="/mnt/ssd/nim_cache",
    model_save_path=Path("/mnt/models"),
    use_existing_container=True,
    auto_kill_existing_container=False,
    stream_logs=False,
    force_pull=False,
    overwrite=False
)

params = NIMSamplingParams(
    temperature=0.3,
    max_tokens=2048,
    top_p=0.95,
    repetition_penalty=1.05,
    seed=42
)

Video analysis configuration

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    port=8000,
    shm_size="64GB",
    max_model_len=16384
)

params = NIMSamplingParams(
    temperature=0.2,
    max_tokens=4096,
    media_io_kwargs={"fps": 2.0},
    mm_processor_kwargs={
        "shortest_edge": 336,
        "longest_edge": 672
    }
)
