Configure Generation

Pass a generation_config dictionary to model(...) to control how Datature Vi generates responses during inference. You can tune output length, randomness, sampling behavior, and repetition.

Before You Start

Learn how to run inference →

Generation parameters control how the model writes its response, token by token. These are the same settings available during training evaluation, but here you can change them at inference time without retraining.

| Use case | Temperature | Top-p | Max tokens |
| --- | --- | --- | --- |
| Quality inspection (consistent) | 0.1 | 0.9 | 256 |
| Detailed description | 0.5 | 0.95 | 512 |
| Creative or exploratory | 0.8 | 1.0 | 1024 |
| Deterministic output | 0.0 | 1.0 | 256 |

Basic usage

from vi.inference import ViModel

model = ViModel(run_id="your-run-id")

result, error = model(
    source="image.jpg",
    user_prompt="Describe this image",
    generation_config={
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True
    }
)

Parameters

generation_config: Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| max_new_tokens | integer | Maximum number of tokens to generate. Range: 1 to 4096 (model-dependent). | Optional | 1024 |
| temperature | number | Controls randomness. Lower values produce more deterministic outputs; higher values produce more varied outputs. Range: 0.0 to 2.0. | Optional | 1.0 |
| top_p | number | Nucleus sampling threshold. The model considers only tokens whose cumulative probability reaches top_p. Range: 0.0 to 1.0. | Optional | 1.0 |
| top_k | integer | Top-k sampling. The model considers only the top k most likely tokens at each step. Range: 1 to 100. | Optional | 50 |
| do_sample | boolean | Whether to use sampling (true) or greedy decoding (false). Greedy decoding is deterministic; sampling introduces variation. | Optional | false |
| repetition_penalty | number | Penalty applied to repeated tokens. Values above 1.0 reduce repetition. Range: 1.0 to 2.0. | Optional | 1.05 |
| seed | integer | Random seed for reproducibility. Set to -1 for random generation. | Optional | 0 |

Video frame sampling (fps)

Video inputs use the same generation_config fields as images (temperature, max_new_tokens, and so on). The frame sampling rate for video is controlled separately: pass fps as a keyword argument on model(...), not inside the generation_config dictionary. It is optional and defaults to 4.0, the number of frames sampled per second of source video during preprocessing; it is ignored when the input is not a video.
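
For example, a video call might look like the following (the file path and prompt are illustrative):

# Sample 2 frames per second from the source video before inference.
# fps is a keyword argument on model(...), not a generation_config field.
result, error = model(
    source="video.mp4",
    user_prompt="Describe what happens in this clip",
    fps=2.0,
    generation_config={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "do_sample": True
    }
)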

Run inference on video files, URLs, and batch lists →

Chain-of-thought (cot)

Chain-of-thought decoding is enabled on the model(...) call, not inside generation_config. Pass cot=True together with your source and prompt. Behavior and token budget notes are covered in Run Inference.
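
A minimal sketch of a CoT call, following the same pattern as the basic usage example above (the prompt is illustrative):

# cot is passed on the call itself, not inside generation_config
result, error = model(
    source="image.jpg",
    user_prompt="How many defects are visible, and where are they?",
    cot=True,
    generation_config={
        "temperature": 0.3,
        "do_sample": True
    }
)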

Parameter details

max_new_tokens

Controls how long the response can be. For short captions, 50-100 tokens is plenty. For detailed descriptions or long CoT traces, you may need 512 or more (with cot=True and a dict generation_config that omits max_new_tokens, the SDK raises the cap automatically; see Run Inference).

# Short response
generation_config={"max_new_tokens": 50}

# Medium response
generation_config={"max_new_tokens": 256}

# Long response
generation_config={"max_new_tokens": 1024}

temperature

Lower temperatures make the model stick more consistently to its highest-probability token choices. At 0.0, the output is fully deterministic. At 1.5 or above, outputs become more varied and sometimes unpredictable.

# Deterministic: best for factual Q&A
generation_config={"temperature": 0.0}

# Balanced
generation_config={"temperature": 0.7}

# Creative: more varied outputs
generation_config={"temperature": 1.5}

top_p and top_k

These two parameters work together to limit which tokens the model considers at each step:

  • top_p (nucleus sampling) restricts candidates to the smallest set of tokens whose cumulative probability reaches top_p
  • top_k restricts candidates to the k most likely tokens

# Focused output
generation_config={"top_p": 0.8, "top_k": 10}

# Balanced output
generation_config={"top_p": 0.95, "top_k": 50}

# Consider all tokens
generation_config={"top_p": 1.0, "top_k": 100}

do_sample

Set do_sample=True when using temperature, top_p, or top_k. Sampling parameters only take effect when sampling is enabled. Without it, the model uses greedy decoding regardless of other settings.

# Sampling (varied outputs)
generation_config={"do_sample": True, "temperature": 0.7}

# Greedy decoding (fully deterministic)
generation_config={"do_sample": False}

repetition_penalty

A value of 1.0 applies no penalty. Increase it if the model repeats phrases in long outputs.

# No penalty
generation_config={"repetition_penalty": 1.0}

# Moderate: good for long descriptions
generation_config={"repetition_penalty": 1.2}

# Strong: aggressive anti-repetition
generation_config={"repetition_penalty": 1.5}

seed

Use a fixed seed to get the same output for the same input. Set to -1 for random behavior.

generation_config={"seed": 42}

Common configurations

# Short, consistent caption
result, error = model(
    source="image.jpg",
    user_prompt="Provide a brief caption",
    generation_config={
        "max_new_tokens": 50,
        "temperature": 0.3,
        "do_sample": True
    }
)

# Detailed description
result, error = model(
    source="image.jpg",
    user_prompt="Provide a detailed description",
    generation_config={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.95,
        "do_sample": True
    }
)

# Deterministic output
result, error = model(
    source="image.jpg",
    user_prompt="What objects are visible?",
    generation_config={
        "temperature": 0.0,
        "do_sample": False,
        "seed": 42
    }
)

# Creative or exploratory description
result, error = model(
    source="image.jpg",
    user_prompt="Create an artistic description",
    generation_config={
        "max_new_tokens": 300,
        "temperature": 0.9,
        "top_p": 0.95,
        "do_sample": True,
        "repetition_penalty": 1.2
    }
)

Next steps

Run Inference

Images, video, batch lists and folders, streaming, and error handling.

Prediction Schemas

Full reference for VQA, phrase grounding, and generic response types returned by the SDK.

How Does VLM Training Work?

Learn the concepts behind fine-tuning, LoRA, and how VLMs learn from your data.