Configure Generation
Pass a generation_config dictionary to model(...) to control how Datature Vi generates responses during inference. You can tune output length, randomness, sampling behavior, and repetition.
Prerequisites
- A loaded model ready for inference
- Familiarity with inference basics
- Understanding of task types (VQA or phrase grounding)
Generation parameters control how the model writes its response, token by token. These are the same settings available during training evaluation, but here you can change them at inference time without retraining.
Basic usage
from vi.inference import ViModel

model = ViModel(run_id="your-run-id")

result, error = model(
    source="image.jpg",
    user_prompt="Describe this image",
    generation_config={
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9,
        "do_sample": True
    }
)
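Every call returns a (result, error) pair, so it is worth checking error before using the result. A minimal guard, assuming error is None on success (the exact types of result and error are not specified in this section):

# Check the error slot of the returned pair before using the result
if error is not None:
    raise RuntimeError(f"Inference failed: {error}")
print(result)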
Parameters

Video frame sampling (fps)
Video inputs use the same generation_config fields as images (temperature, max_new_tokens, and so on). The frame sampling rate for video is controlled separately: pass fps as a keyword argument on model(...), not inside the generation_config dictionary. fps is optional and defaults to 4.0 (frames sampled per second of source video during preprocessing); it is ignored when the input is not a video.
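For example, halving the default sampling rate on a video input might look like this (the file name is a placeholder):

# fps is a keyword argument on model(...), not a generation_config key
result, error = model(
    source="video.mp4",  # placeholder video path
    user_prompt="Describe this video",
    fps=2.0,  # sample 2 frames per second instead of the default 4.0
    generation_config={"max_new_tokens": 256}
)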
Run inference on video files, URLs, and batch lists →
Chain-of-thought (cot)
Chain-of-thought decoding is enabled on the model(...) call, not inside generation_config. Pass cot=True together with your source and prompt. Behavior and token budget notes are covered in Run Inference.
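A minimal call with chain-of-thought enabled, mirroring the basic usage above:

# cot sits alongside source and user_prompt, outside generation_config
result, error = model(
    source="image.jpg",
    user_prompt="What is happening in this scene?",
    cot=True
)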
Parameter details
max_new_tokens
Controls how long the response can be. For short captions, 50-100 tokens is plenty. For detailed descriptions or long CoT traces, you may need 512 or more (with cot=True and a dict generation_config that omits max_new_tokens, the SDK raises the cap automatically; see Run Inference).
# Short response
generation_config={"max_new_tokens": 50}

# Medium response
generation_config={"max_new_tokens": 256}

# Long response
generation_config={"max_new_tokens": 1024}
temperature
Lower temperatures make the model stick more consistently to its highest-probability token choices. At 0.0, the output is fully deterministic. At 1.5 or above, outputs become more varied and sometimes unpredictable.
# Deterministic: best for factual Q&A
generation_config={"temperature": 0.0}

# Balanced
generation_config={"temperature": 0.7}

# Creative: more varied outputs
generation_config={"temperature": 1.5}

top_p and top_k
These two parameters work together to limit which tokens the model considers at each step:
- top_p (nucleus sampling) restricts candidates to the smallest set of tokens whose cumulative probability mass reaches p
- top_k restricts candidates to the k most likely tokens
# Focused output
generation_config={"top_p": 0.8, "top_k": 10}

# Balanced output
generation_config={"top_p": 0.95, "top_k": 50}

# Permissive: top_p 1.0 keeps all probability mass, top_k still caps candidates at 100
generation_config={"top_p": 1.0, "top_k": 100}

do_sample
Set do_sample=True when using temperature, top_p, or top_k. Sampling parameters only take effect when sampling is enabled. Without it, the model uses greedy decoding regardless of other settings.
# Sampling (varied outputs)
generation_config={"do_sample": True, "temperature": 0.7}

# Greedy decoding (fully deterministic)
generation_config={"do_sample": False}

repetition_penalty
A value of 1.0 applies no penalty; values above 1.0 discourage the model from reusing tokens it has already generated. Increase it if the model repeats phrases in long outputs.
# No penalty
generation_config={"repetition_penalty": 1.0}

# Moderate: good for long descriptions
generation_config={"repetition_penalty": 1.2}

# Strong: aggressive anti-repetition
generation_config={"repetition_penalty": 1.5}

seed
Use a fixed seed to get the same output for the same input. Set to -1 for random behavior.
generation_config={"seed": 42}Common configurations
Common configurations

# Quick caption: short, lightly sampled output
result, error = model(
    source="image.jpg",
    user_prompt="Provide a brief caption",
    generation_config={
        "max_new_tokens": 50,
        "temperature": 0.3,
        "do_sample": True
    }
)

# Detailed description: longer, moderately varied output
result, error = model(
    source="image.jpg",
    user_prompt="Provide a detailed description",
    generation_config={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.95,
        "do_sample": True
    }
)

# Deterministic answers: greedy decoding with a fixed seed
result, error = model(
    source="image.jpg",
    user_prompt="What objects are visible?",
    generation_config={
        "temperature": 0.0,
        "do_sample": False,
        "seed": 42
    }
)

# Creative output: high temperature with a repetition penalty
result, error = model(
    source="image.jpg",
    user_prompt="Create an artistic description",
    generation_config={
        "max_new_tokens": 300,
        "temperature": 0.9,
        "top_p": 0.95,
        "do_sample": True,
        "repetition_penalty": 1.2
    }
)

Next steps

Run inference on video files, URLs, and batch lists →