Deploy NIM Container

Deploy NVIDIA NIM containers for optimized GPU-accelerated inference with vision-language models.

Overview

The NIMDeployer class handles the complete lifecycle of NIM container deployment: pulling the NIM image, downloading custom model weights from Datature Vi, starting (or reusing) the container, and stopping it when you are done.

Basic deployment

Deploy with default settings

Deploy a NIM container using your NGC API key:

from vi.deployment.nim import NIMDeployer, NIMConfig

# Create config with NGC API key
config = NIMConfig(nvidia_api_key="nvapi-...")

# Deploy container
deployer = NIMDeployer(config)
result = deployer.deploy()

print(f"Container ID: {result.container_id}")
print(f"Container name: {result.container_name}")
print(f"Port: {result.port}")
print(f"Available models: {result.available_models}")

Using environment variables

For better security, set your NGC API key as an environment variable:

export NGC_API_KEY="nvapi-..."

Then deploy without explicit credentials:

from vi.deployment.nim import NIMDeployer

# API key loaded from NGC_API_KEY environment variable
deployer = NIMDeployer()
result = deployer.deploy()
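
If the variable might be missing (for example, in CI), you can fail fast with a plain standard-library guard before deploying; this check is illustrative, not part of the SDK:

import os

# Fail early with a clear message instead of a deeper deployment error
if "NGC_API_KEY" not in os.environ:
    raise RuntimeError("NGC_API_KEY is not set; export it before deploying")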

Custom port

Specify a custom port for the NIM service:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    port=8080  # Custom port
)

deployer = NIMDeployer(config)
result = deployer.deploy()

print(f"Service running on port {result.port}")

Deploying with custom weights

From Datature Vi

Deploy a model trained on Datature Vi with custom weights:

from vi.deployment.nim import NIMDeployer, NIMConfig

# Configure with Datature Vi credentials
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="your-vi-secret-key",
    organization_id="your-org-id",
    run_id="your-run-id"  # Model from Datature Vi
)

deployer = NIMDeployer(config)
result = deployer.deploy()

💡 Using environment variables for Vi credentials

Set Vi credentials as environment variables for better security:

export NGC_API_KEY="nvapi-..."
export DATATURE_VI_SECRET_KEY="your-secret-key"
export DATATURE_VI_ORGANIZATION_ID="your-org-id"

Then deploy with just the run ID:

config = NIMConfig(run_id="your-run-id")
deployer = NIMDeployer(config)
result = deployer.deploy()

Custom save path

Specify where to save downloaded model weights:

from pathlib import Path

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    secret_key="your-vi-secret-key",
    organization_id="your-org-id",
    run_id="your-run-id",
    model_save_path=Path("./my_models")  # Custom save location
)

deployer = NIMDeployer(config)
result = deployer.deploy()

Force re-download

Force re-download of model weights even if they exist locally:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    run_id="your-run-id",
    overwrite=True  # Force re-download
)

deployer = NIMDeployer(config)
result = deployer.deploy()

🚧 LoRA Adapter Limitation

NVIDIA NIM does not currently support PEFT adapters. Models trained with LoRA adapters therefore fall back to the full base model weights when deployed with NIM; the LoRA adapter weights are not used during inference.


Selecting NIM images

Available images

Choose from supported NIM images:

# Cosmos Reason1 7B (default)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason1-7b"
)

# Cosmos Reason2 2B (video support)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b"
)

# Cosmos Reason2 8B (video support)
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-8b"
)
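
If you select images programmatically, a hypothetical convenience mapping keyed by capability may help (the image names come from the list above; the mapping itself is not part of the SDK):

# Illustrative lookup table: capability -> supported NIM image name
NIM_IMAGES = {
    "image-only": "cosmos-reason1-7b",   # default image
    "video-small": "cosmos-reason2-2b",  # video support, smaller model
    "video-large": "cosmos-reason2-8b",  # video support, larger model
}

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name=NIM_IMAGES["video-small"],
)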

Image tags

Specify an image tag (the default is latest):

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason2-2b",
    tag="1.0.0"  # Specific version
)

Container lifecycle management

Reusing existing containers

Reuse an existing container instead of creating a new one:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=True  # Default: True
)

# First deployment - creates container
deployer = NIMDeployer(config)
result = deployer.deploy()

# Second deployment - reuses existing container (instant)
result = deployer.deploy()

Benefits:

  • Instant deployment (no image pull or startup time)
  • Maintains container state
  • Saves resources
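
To see the effect, you can time two consecutive deploys (timings depend on image size and hardware; this is illustrative only):

import time

start = time.perf_counter()
deployer.deploy()   # may pull the image and wait for startup
print(f"First deploy:  {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
deployer.deploy()   # reuses the running container
print(f"Second deploy: {time.perf_counter() - start:.1f}s")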

Auto-kill existing containers

Automatically stop and remove existing containers with the same name:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=False,  # Don't reuse
    auto_kill_existing_container=True  # Auto-remove conflicts
)

deployer = NIMDeployer(config)
result = deployer.deploy()  # Removes any existing container first

🚧 Destructive operation

Setting auto_kill_existing_container=True will stop and remove any existing container with the same name. Use with caution in shared environments.

Stopping containers

Stop a running NIM container:

from vi.deployment.nim import NIMDeployer

# Stop by container name
success = NIMDeployer.stop("cosmos-reason2-2b")

if success:
    print("Container stopped successfully")
else:
    print("Container not found")

Or use the container name from deployment result:

# Deploy
deployer = NIMDeployer(config)
result = deployer.deploy()

# Stop using result
NIMDeployer.stop(result.container_name)

Deployment options

Streaming logs

Stream container logs to terminal during startup:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    stream_logs=True  # Default: True
)

deployer = NIMDeployer(config)
result = deployer.deploy()  # Shows real-time logs

Disable log streaming for cleaner output:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    stream_logs=False
)

deployer = NIMDeployer(config)
result = deployer.deploy()  # No log output

Quiet mode

Suppress all console output:

config = NIMConfig(nvidia_api_key="nvapi-...")

# Enable quiet mode
deployer = NIMDeployer(config, quiet=True)
result = deployer.deploy()  # No output

Useful for:

  • Automated scripts
  • CI/CD pipelines
  • Logging to files instead of console (see the sketch below)
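
A sketch of the file-logging case using the standard logging module (the log file name and format are arbitrary examples):

import logging

# Route deployment info to a file instead of the console
logging.basicConfig(filename="nim_deploy.log", level=logging.INFO)

deployer = NIMDeployer(config, quiet=True)
result = deployer.deploy()
logging.info("Deployed %s on port %s", result.container_name, result.port)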

Force image pull

Always pull the latest image even if it exists locally:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    force_pull=True  # Always pull latest
)

deployer = NIMDeployer(config)
result = deployer.deploy()

When to use:

  • Testing image updates
  • Ensuring latest security patches
  • Debugging image issues

Resource configuration

Shared memory size

Configure shared memory for the container:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    shm_size="64GB"  # Default: 32GB
)

deployer = NIMDeployer(config)
result = deployer.deploy()

Guidelines:

  • 16GB — Small models on limited hardware
  • 32GB — Default; sufficient for most models
  • 64GB — Large models or high batch sizes

Maximum model length

Set maximum context length:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    max_model_len=16384  # Default: 8192
)

deployer = NIMDeployer(config)
result = deployer.deploy()

Options:

  • 4096 — Short contexts, faster inference
  • 8192 — Default, balanced performance
  • 16384 — Long contexts, higher memory usage

Cache directory

Specify local cache directory for NIM:

from pathlib import Path

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    local_cache_dir=str(Path.home() / ".cache" / "nim")
)

deployer = NIMDeployer(config)
result = deployer.deploy()

Default location: ~/.cache/nim


Deployment result

The deploy() method returns a NIMDeploymentResult object:

result = deployer.deploy()

# Access deployment information
print(f"Container ID: {result.container_id}")
print(f"Container name: {result.container_name}")
print(f"Port: {result.port}")
print(f"Available models: {result.available_models}")

Result attributes:

  • container_id — Full Docker container ID
  • container_name — Container name (e.g., cosmos-reason2-2b)
  • port — Port where service is running
  • available_models — List of model IDs available in the container
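
These attributes are enough to health-check the service yourself. A polling sketch, assuming NIM's standard /v1/health/ready readiness endpoint (the path may vary by image):

import time
import requests

def wait_until_ready(port: int, timeout: float = 600.0) -> None:
    """Poll the readiness endpoint until it returns 200 or the timeout expires."""
    url = f"http://localhost:{port}/v1/health/ready"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return
        except requests.ConnectionError:
            pass  # container still starting up
        time.sleep(5)
    raise TimeoutError(f"Service on port {port} not ready after {timeout}s")

wait_until_ready(result.port)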

Error handling

Handle deployment errors

Implement proper error handling:

from vi.deployment.nim import NIMDeployer, NIMConfig
from vi.deployment.nim.exceptions import (
    InvalidConfigError,
    ContainerExistsError,
    ModelIncompatibilityError
)

try:
    config = NIMConfig(nvidia_api_key="nvapi-...")
    deployer = NIMDeployer(config)
    result = deployer.deploy()

    print(f"✓ Deployed successfully on port {result.port}")

except InvalidConfigError as e:
    print(f"✗ Invalid configuration: {e}")
    # Check API key format, image name, etc.

except ContainerExistsError as e:
    print(f"✗ Container '{e.container_name}' already exists")
    # Either reuse with use_existing_container=True
    # Or remove with auto_kill_existing_container=True

except ModelIncompatibilityError as e:
    print(f"✗ Model incompatible with {e.image_name}")
    print(f"Details: {e.details}")
    # Check model architecture compatibility

except Exception as e:
    print(f"✗ Deployment failed: {e}")

Common errors

Invalid API key

# Error: InvalidConfigError
# Cause: API key doesn't start with 'nvapi-'
# Fix: Check your NGC API key format
config = NIMConfig(nvidia_api_key="nvapi-...")

Container already exists

# Error: ContainerExistsError
# Solution 1: Reuse existing container
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=True
)

# Solution 2: Auto-remove existing
config = NIMConfig(
    nvidia_api_key="nvapi-...",
    auto_kill_existing_container=True
)

Model incompatibility

# Error: ModelIncompatibilityError
# Cause: Custom model architecture not supported by container
# Fix: Use compatible NIM image or base model
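
For example, you could fall back to one of the supported images listed earlier (illustrative):

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    image_name="cosmos-reason1-7b"  # choose an image that matches the model
)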

Complete deployment example

Full example with all options:

from vi.deployment.nim import NIMDeployer, NIMConfig
from pathlib import Path
import os

# Configure deployment
config = NIMConfig(
    # NGC credentials
    nvidia_api_key=os.getenv("NGC_API_KEY"),

    # Image selection
    image_name="cosmos-reason2-2b",
    tag="latest",

    # Vi credentials for custom weights
    secret_key=os.getenv("DATATURE_VI_SECRET_KEY"),
    organization_id=os.getenv("DATATURE_VI_ORGANIZATION_ID"),
    run_id="your-run-id",

    # Model paths
    model_save_path=Path("./models"),
    overwrite=False,

    # Container configuration
    port=8000,
    shm_size="32GB",
    max_model_len=8192,
    local_cache_dir=str(Path.home() / ".cache" / "nim"),

    # Lifecycle options
    use_existing_container=True,
    auto_kill_existing_container=False,

    # Output options
    stream_logs=True,
    force_pull=False
)

# Deploy with error handling
try:
    deployer = NIMDeployer(config, quiet=False)
    result = deployer.deploy()

    print("\n" + "="*50)
    print("Deployment successful!")
    print("="*50)
    print(f"Container ID: {result.container_id}")
    print(f"Container name: {result.container_name}")
    print(f"Port: {result.port}")
    print(f"Available models: {', '.join(result.available_models or [])}")

except Exception as e:
    print(f"\nDeployment failed: {e}")
    raise

Best practices

1. Use environment variables

Store sensitive credentials in environment variables for security:

# .env file
NGC_API_KEY=nvapi-...
DATATURE_VI_SECRET_KEY=your-secret-key
DATATURE_VI_ORGANIZATION_ID=your-org-id

Then load them in Python:

import os
from dotenv import load_dotenv

load_dotenv()

config = NIMConfig(
    nvidia_api_key=os.getenv("NGC_API_KEY"),
    secret_key=os.getenv("DATATURE_VI_SECRET_KEY"),
    organization_id=os.getenv("DATATURE_VI_ORGANIZATION_ID"),
    run_id="your-run-id"
)

2. Reuse containers in development

Enable container reuse for faster iteration:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    use_existing_container=True  # Instant redeployment
)

3. Monitor deployment progress

Stream logs to monitor deployment:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    stream_logs=True  # Watch container startup
)

deployer = NIMDeployer(config)
result = deployer.deploy()

4. Clean up resources

Stop containers when done to free GPU resources:

try:
    # Deploy and use container
    result = deployer.deploy()
    # ... run inference ...

finally:
    # Clean up
    NIMDeployer.stop(result.container_name)
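
The same pattern wraps neatly into a context manager, so cleanup happens even on exceptions. An illustrative wrapper built on the NIMDeployer API shown above (not part of the SDK):

from contextlib import contextmanager

@contextmanager
def nim_deployment(config):
    """Deploy a NIM container and stop it when the block exits."""
    result = NIMDeployer(config).deploy()
    try:
        yield result
    finally:
        NIMDeployer.stop(result.container_name)

with nim_deployment(config) as result:
    print(f"Running on port {result.port}")
    # ... run inference ...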

5. Handle errors gracefully

Always use try-except for deployment error handling:

from vi.deployment.nim.exceptions import NIMDeploymentError

try:
    result = deployer.deploy()
except NIMDeploymentError as e:
    print(f"Deployment failed: {e}")
    # Handle error appropriately
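
For transient failures (for example, a registry hiccup during an image pull), a simple retry loop with backoff may help; the retry count and delays are arbitrary examples:

import time

for attempt in range(3):
    try:
        result = deployer.deploy()
        break
    except NIMDeploymentError as e:
        if attempt == 2:
            raise  # give up after the final attempt
        wait = 10 * 2 ** attempt  # 10s, then 20s
        print(f"Deployment failed ({e}); retrying in {wait}s")
        time.sleep(wait)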

View common deployment errors →


Troubleshooting

Deployment hangs

If deployment appears to hang:

# Check container logs
docker logs cosmos-reason2-2b

# Check GPU availability
nvidia-smi

# Check Docker daemon
docker info

Out of GPU memory

Reduce memory usage:

config = NIMConfig(
    nvidia_api_key="nvapi-...",
    max_model_len=4096,  # Reduce context length
    shm_size="16GB"      # Reduce shared memory
)

Image pull fails

Check registry authentication:

# Verify NGC API key
echo $NGC_API_KEY

# Test Docker login
docker login nvcr.io
# Username: $oauthtoken
# Password: <your-ngc-api-key>

More troubleshooting →


See also