Configure Generation

Configure how the model generates responses at inference time using temperature, sampling, and token limits.

Prerequisites

Learn how to run inference →

Overview

Generation configuration lets you fine-tune VLM decoding behavior during inference through parameters such as max_new_tokens, temperature, top_p, top_k, do_sample, repetition_penalty, and seed.


Basic Usage

Pass generation parameters via the generation_config dictionary:

from vi.inference import ViModel

# Load the model from a completed training run
model = ViModel(run_id="your-run-id")

# Inference returns a (result, error) tuple
result, error = model(
    source="image.jpg",
    user_prompt="Describe this image",
    generation_config={
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True
    }
)
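
A minimal sketch of handling the returned tuple (assuming error is None on success and result holds the generated text):

if error is None:
    print(result)
else:
    print(f"Inference failed: {error}")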

Parameter Reference

max_new_tokens

Maximum number of tokens to generate.

Type: int · Default: 1024 · Range: 1 to 4096 (model-dependent)

# Short response
generation_config={"max_new_tokens": 50}

# Medium response
generation_config={"max_new_tokens": 256}

# Long response
generation_config={"max_new_tokens": 1024}

temperature

Controls randomness in generation: lower values produce more deterministic output, higher values more varied and creative output.

Type: float · Default: 1.0 · Range: 0.0 to 2.0

# Deterministic (factual)
generation_config={"temperature": 0.0}

# Balanced
generation_config={"temperature": 0.7}

# Creative
generation_config={"temperature": 1.5}

top_p

Nucleus sampling threshold. The model samples only from the smallest set of tokens whose cumulative probability reaches top_p.

Type: float · Default: 1.0 · Range: 0.0 to 1.0

# More focused
generation_config={"top_p": 0.8}

# Balanced
generation_config={"top_p": 0.95}

# All tokens considered
generation_config={"top_p": 1.0}

top_k

Top-k sampling cutoff. Restricts sampling to the k most likely tokens at each step.

Type: int · Default: 50 · Range: 1 to 100

# Very focused
generation_config={"top_k": 10}

# Balanced
generation_config={"top_k": 50}

# Diverse
generation_config={"top_k": 100}

do_sample

Whether to use sampling (True) or greedy decoding (False).

Type: bool · Default: False

# Sampling (varied outputs)
generation_config={"do_sample": True}

# Greedy decoding (deterministic)
generation_config={"do_sample": False}
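
If generation follows standard Hugging Face semantics (an assumption here, not confirmed by this page), sampling parameters such as temperature, top_p, and top_k take effect only when do_sample is True; with do_sample set to False the model decodes greedily and ignores them. A sketch combining them (values are illustrative):

# Sampling parameters apply only when do_sample is True
generation_config={
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 50
}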

repetition_penalty

Penalty for repeating tokens. Higher values reduce repetition.

Type: float · Default: 1.05 · Range: 1.0 to 2.0

# No penalty
generation_config={"repetition_penalty": 1.0}

# Moderate penalty
generation_config={"repetition_penalty": 1.2}

# Strong penalty
generation_config={"repetition_penalty": 1.5}

seed

Random seed for reproducibility.

Type: int · Default: 0

# Fixed seed (reproducible)
generation_config={"seed": 42}

# Random generation
generation_config={"seed": -1}
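
With a fixed seed, repeating the same call should reproduce the same output, assuming the backend honors the seed across calls (an assumption, not confirmed by this page):

config = {"do_sample": True, "temperature": 0.7, "seed": 42}

# Run the identical request twice with the same seed
first, _ = model(source="image.jpg", user_prompt="Describe this image",
                 generation_config=config)
second, _ = model(source="image.jpg", user_prompt="Describe this image",
                  generation_config=config)

print(first)
print(second)  # should match first when the seed is honored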

Common Configurations

Short Captions

result, error = model(
    source="image.jpg",
    user_prompt="Provide a brief caption",
    generation_config={
        "max_new_tokens": 50,
        "temperature": 0.3,
        "do_sample": True
    }
)

Detailed Descriptions

result, error = model(
    source="image.jpg",
    user_prompt="Provide a detailed description",
    generation_config={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.95,
        "do_sample": True
    }
)

Deterministic Output

result, error = model(
    source="image.jpg",
    user_prompt="What objects are visible?",
    generation_config={
        "temperature": 0.0,
        "do_sample": False,
        "seed": 42
    }
)

Creative Descriptions

result, error = model(
    source="image.jpg",
    user_prompt="Create an artistic description",
    generation_config={
        "max_new_tokens": 300,
        "temperature": 0.9,
        "top_p": 0.95,
        "do_sample": True,
        "repetition_penalty": 1.2
    }
)
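
Comparing Configurations

To choose parameter values empirically, sweep one parameter and compare the outputs side by side. A sketch sweeping temperature (the specific values are illustrative):

# Compare outputs across a range of temperatures
for temp in (0.0, 0.7, 1.2):
    result, error = model(
        source="image.jpg",
        user_prompt="Describe this image",
        generation_config={
            "max_new_tokens": 128,
            "temperature": temp,
            "do_sample": temp > 0  # greedy decoding at temperature 0.0
        }
    )
    if error is None:
        print(f"temperature={temp}: {result}")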

See also