Configure Generation
Configure how the model generates responses during inference using temperature, sampling strategies, and token limits.
Prerequisites
- A loaded model ready for inference
- Understanding of inference basics
- Familiarity with task types — VQA or phrase grounding
Overview
Generation configuration lets you tune VLM behavior at inference time:
- Output length — Control token generation limits with max_new_tokens
- Randomness — Adjust temperature for creativity vs precision
- Sampling — Configure sampling strategies
- Determinism — Enable reproducible outputs
- Penalties — Prevent repetition
Basic Usage
Pass generation parameters via the generation_config dictionary:
from vi.inference import ViModel

model = ViModel(run_id="your-run-id")

result, error = model(
    source="image.jpg",
    user_prompt="Describe this image",
    generation_config={
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True
    }
)
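The call returns a (result, error) tuple. A minimal handling sketch, assuming error is None on success and holds a message otherwise:

# Minimal result handling (assumes error is None when the call succeeds)
if error is not None:
    print(f"Inference failed: {error}")
else:
    print(result)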
Parameters Reference
max_new_tokens
Maximum number of tokens to generate.
Type: int
Default: 1024
Range: 1 to 4096 (model-dependent)
# Short response
generation_config={"max_new_tokens": 50}
# Medium response
generation_config={"max_new_tokens": 256}
# Long response
generation_config={"max_new_tokens": 1024}temperature
Controls randomness in generation. Lower values are more deterministic; higher values are more creative.
Type: float
Default: 1.0
Range: 0.0 to 2.0
# Deterministic (factual)
generation_config={"temperature": 0.0}
# Balanced
generation_config={"temperature": 0.7}
# Creative
generation_config={"temperature": 1.5}top_p
Nucleus sampling threshold. Only considers tokens with cumulative probability up to top_p.
Type: float
Default: 1.0
Range: 0.0 to 1.0
# More focused
generation_config={"top_p": 0.8}
# Balanced
generation_config={"top_p": 0.95}
# All tokens considered
generation_config={"top_p": 1.0}top_k
Top-k sampling parameter. Considers only the top k most likely tokens.
Type: int
Default: 50
Range: 1 to 100
# Very focused
generation_config={"top_k": 10}
# Balanced
generation_config={"top_k": 50}
# Diverse
generation_config={"top_k": 100}do_sample
Whether to use sampling (True) or greedy decoding (False).
Type: bool
Default: False
# Sampling (varied outputs)
generation_config={"do_sample": True}
# Greedy decoding (deterministic)
generation_config={"do_sample": False}repetition_penalty
repetition_penalty
Penalty for repeating tokens. Higher values reduce repetition.
Type: float
Default: 1.05
Range: 1.0 to 2.0
# No penalty
generation_config={"repetition_penalty": 1.0}
# Moderate penalty
generation_config={"repetition_penalty": 1.2}
# Strong penalty
generation_config={"repetition_penalty": 1.5}seed
Random seed for reproducibility.
Type: int
Default: 0
# Fixed seed (reproducible)
generation_config={"seed": 42}
# Random generation
generation_config={"seed": -1}Common Configurations
Common Configurations
Short Captions
result, error = model(
    source="image.jpg",
    user_prompt="Provide a brief caption",
    generation_config={
        "max_new_tokens": 50,
        "temperature": 0.3,
        "do_sample": True
    }
)
Detailed Descriptions
result, error = model(
    source="image.jpg",
    user_prompt="Provide a detailed description",
    generation_config={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.95,
        "do_sample": True
    }
)
Deterministic Output
result, error = model(
    source="image.jpg",
    user_prompt="What objects are visible?",
    generation_config={
        "temperature": 0.0,
        "do_sample": False,
        "seed": 42
    }
)
Creative Descriptions
result, error = model(
    source="image.jpg",
    user_prompt="Create an artistic description",
    generation_config={
        "max_new_tokens": 300,
        "temperature": 0.9,
        "top_p": 0.95,
        "do_sample": True,
        "repetition_penalty": 1.2
    }
)
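If you reuse these settings across calls, one option is to keep them as named presets. The preset names below are illustrative, not part of the library:

# Illustrative presets built from the configurations above
GENERATION_PRESETS = {
    "short_caption": {"max_new_tokens": 50, "temperature": 0.3, "do_sample": True},
    "detailed": {"max_new_tokens": 512, "temperature": 0.7, "top_p": 0.95, "do_sample": True},
    "deterministic": {"temperature": 0.0, "do_sample": False, "seed": 42},
}

result, error = model(
    source="image.jpg",
    user_prompt="Provide a brief caption",
    generation_config=GENERATION_PRESETS["short_caption"]
)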
See also
- Inference Overview — Getting started with inference
- Running Inference — Execute predictions on images
- Task Types — VQA and Phrase Grounding explained
- Result Handling — Process and visualize results