Inference
The Vi SDK lets you load trained vision-language models (VLMs) and run inference on images or videos from Python. You get structured outputs for visual question answering and phrase grounding, with built-in support for batch processing, streaming, and quantized loading. Video uses the same call pattern as images; see Run Inference.
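Video calls look exactly like image calls. As a minimal sketch, assuming a model loaded as in the quick start below (the video path and prompt are placeholders):

# Video uses the same call pattern as images.
# Assumes `model` was created as in the quick start below.
result, error = model(
    source="/path/to/clip.mp4",
    user_prompt="Summarize what happens in this video"
)
if error is None:
    print(result.result)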
The Vi SDK works with models trained on the Datature Vi platform. Follow the quickstart to train your first model, or install the SDK if you already have one.
Prerequisites
- Vi SDK installed with inference dependencies (pip install vi-sdk[inference])
- A trained model or access to a HuggingFace model
- Your secret key and organization ID
- Familiarity with task types (VQA or phrase grounding)
Quick start
from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

result, error = model(
    source="/path/to/image.jpg",
    user_prompt="What objects are in this image?"
)

if error is None:
    print(f"Result: {result.result}")

For streaming mode with real-time token output:
stream = model(
    source="image.jpg",
    user_prompt="Describe this image",
    stream=True
)

for token in stream:
    print(token, end="", flush=True)

result = stream.get_final_completion()
print(f"\n\nFinal: {result.result}")

What Datature Vi supports
Supported models: Qwen3.5, Qwen3-VL, Qwen2.5-VL, NVILA-Lite, Cosmos-Reason1, Cosmos-Reason2, InternVL3.5
Coming soon: DeepSeek OCR, Gemma 4, LLaVA-NeXT
Task types: visual question answering (VQA) and phrase grounding
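Both task types use the same call; the structure of result.result follows the task the model was trained on. A hedged sketch of a grounding-style call (the prompt and output shape here are illustrative assumptions, not a documented schema):

# Illustrative sketch: phrase grounding uses the same call pattern as VQA.
# The prompt wording and the exact shape of result.result are assumptions;
# they depend on the task type the model was trained on.
result, error = model(
    source="street.jpg",
    user_prompt="Find all pedestrians"
)
if error is None:
    print(result.result)  # structured output for the trained task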
How inference works
Non-streaming (default)
Calling model(...) returns a (result, error) tuple. This pattern makes error handling explicit and works well for batch processing and automated workflows.
result, error = model(source="image.jpg", user_prompt="What's in this image?")

if error is None:
    print(result.result)
else:
    print(f"Error: {error}")

Streaming
Pass stream=True to get token-by-token output as the model generates it. Call stream.get_final_completion() when you need the complete structured result.
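For example, mirroring the quick start:

# Print tokens as they arrive, then collect the structured result.
stream = model(source="image.jpg", user_prompt="Describe this image", stream=True)

for token in stream:
    print(token, end="", flush=True)

result = stream.get_final_completion()  # complete structured result
print(f"\nFinal: {result.result}")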
Learn more about inference modes →
Model loading
Load from Datature Vi, HuggingFace, or a local path. The SDK caches models locally after the first download.
# From Datature Vi
model = ViModel(run_id="your-run-id")

# From HuggingFace
model = ViModel(pretrained_model_name_or_path="Qwen/Qwen2.5-VL-7B-Instruct")

# With 8-bit quantization (cuts memory ~50%)
model = ViModel(run_id="your-run-id", load_in_8bit=True, device_map="auto")

Batch processing
Pass a list of paths, a folder path, or a mix of both. The SDK processes each file and returns one (result, error) tuple per file. Folders are expanded to images only; to run batch inference on videos, pass their paths in an explicit list.
results = model(
    source="./images/",
    user_prompt="Describe this image",
    recursive=True,
    show_progress=True
)

for result, error in results:
    if error is None:
        print(result.result)

Common workflows
Dataset annotation
results = model(
    source="./unlabeled_images/",
    user_prompt="Describe this image concisely",
    recursive=True,
    show_progress=True
)

annotations = [r.result for r, e in results if e is None]

Quality control
test_cases = [
    {"image": "defect1.jpg", "expected": "defect"},
    {"image": "good1.jpg", "expected": "no defect"}
]

for test in test_cases:
    result, error = model(
        source=test["image"],
        user_prompt="Does this part have defects?"
    )
    match = test["expected"] in str(result.result).lower() if error is None else False
    print(f"{'PASS' if match else 'FAIL'}: {test['image']}")

Model comparison
models = {
    "v1": ViModel(run_id="run_v1"),
    "v2": ViModel(run_id="run_v2")
}

for name, m in models.items():
    result, error = m(source="test.jpg", user_prompt="Describe this image")
    if error is None:
        print(f"{name}: {result.result}")

Memory and GPU tips
# ~50% memory reduction
model = ViModel(run_id="your-run-id", load_in_8bit=True, device_map="auto")

# ~75% memory reduction
model = ViModel(run_id="your-run-id", load_in_4bit=True, device_map="auto")

# Faster inference on Ampere+ GPUs
model = ViModel(
    run_id="your-run-id",
    attn_implementation="flash_attention_2",
    dtype="float16",
    device_map="auto"
)

Next steps