Run Inference
Execute predictions on images using loaded VLM models with single or batch inference modes.
Prerequisites
- A loaded model from Datature Vi or HuggingFace
- Images in supported formats (.jpg, .jpeg, .png, .bmp, .gif, .tiff, .tif, .webp)
- Understanding of streaming vs non-streaming modes
Overview
Once you've loaded a model, you can run inference using the model() call. The Vi SDK automatically handles:
- Single or batch inference — Automatically detected based on the input (see the sketch after this list)
- Non-streaming by default — Returns a (result, error) tuple for explicit error handling; use stream=True for real-time token generation
- Error handling — Consistent (result, error) tuple pattern in non-streaming mode
- Progress tracking — Built-in progress bars for batch processing
- Folder processing — Process entire directories, with optional recursive search
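The same call signature covers both cases; whether you get back a single tuple or a list of tuples depends only on what you pass as source. A minimal sketch, where model is a loaded ViModel instance (loading is shown under Basic usage below):

# Single image: a string or path returns one (result, error) tuple
result, error = model(
    source="photo.jpg",
    user_prompt="Describe this image"
)

# Batch: a list (or a folder path) returns a list of (result, error) tuples
results = model(
    source=["photo1.jpg", "photo2.jpg"],
    user_prompt="Describe this image"
)
for result, error in results:
    if error is None:
        print(result.result)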
Streaming vs non-streaming
Non-streaming is the default mode
When you call model(...) without specifying stream, it defaults to non-streaming mode and returns a (result, error) tuple for explicit error handling. Set stream=True to enable real-time token generation.
Streaming mode
Enable streaming mode with stream=True for real-time token generation in visual question answering tasks:
# Streaming mode - set stream=True to enable
stream = model(
    source="image.jpg",
    user_prompt="Describe this image",
    stream=True  # Enable streaming
)
# Iterate through tokens
for token in stream:
    print(token, end="", flush=True)
# Get final result
result = stream.get_final_completion()
print(f"\n\n{result.caption}")
Use when:
- Building interactive applications with real-time feedback (see the loop sketch after this list)
- Displaying results in real-time to users
- Immediate feedback is important for VQA tasks
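For instance, the streaming iterator drops straight into an interactive question loop. A minimal sketch, assuming a loaded model and a local image file; the loop and prompt handling here are illustrative, not part of the SDK:

image_path = "image.jpg"
while True:
    question = input("\nAsk about the image (or type 'quit'): ")
    if question.strip().lower() == "quit":
        break
    stream = model(
        source=image_path,
        user_prompt=question,
        stream=True  # stream tokens back as they are generated
    )
    for token in stream:
        print(token, end="", flush=True)
    print()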
Non-streaming mode (default)
Non-streaming is the default mode. It returns complete results as (result, error) tuples for explicit error handling:
# Non-streaming mode (default) - returns (result, error) tuple
result, error = model(
    source="image.jpg",
    user_prompt="Describe this image"
    # stream=False is the default, no need to specify
)
if error is None:
    # Access result fields based on response type
    # See prediction schemas for all available fields
    print(result.result)  # Generic access
else:
    print(f"Error: {error}")
Use when:
- You need explicit error handling
- Processing batches of images
- Implementing automated workflows
- Storing results in a database (see the sketch after this list)
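As an example of the last point, the (result, error) tuples map cleanly onto a persistence layer. A minimal sketch using Python's built-in sqlite3 module; the predictions table and its columns are hypothetical, not part of the Vi SDK:

import sqlite3

# Hypothetical schema - adjust the table and columns to your application
conn = sqlite3.connect("inference_results.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS predictions (image TEXT, output TEXT, error TEXT)"
)

image_path = "image.jpg"
result, error = model(source=image_path, user_prompt="Describe this image")
conn.execute(
    "INSERT INTO predictions VALUES (?, ?, ?)",
    (
        image_path,
        str(result.result) if error is None else None,
        str(error) if error else None,
    ),
)
conn.commit()
conn.close()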
Single image inference
Basic usage
Run inference on a single image with non-streaming mode (default):
from vi.inference import ViModel
# Load model
model = ViModel(
run_id="your-run-id",
secret_key="your-secret-key",
organization_id="your-organization-id"
)
# Run inference (non-streaming is default)
result, error = model(
source="/path/to/image.jpg",
user_prompt="What objects are in this image?"
)
if error is None:
    print(f"Result: {result.result}")
else:
    print(f"Error: {error}")
# For detailed information on accessing result fields, see prediction schemas
For real-time token generation with streaming mode:
# Set stream=True for streaming
stream = model(
source="image.jpg",
user_prompt="What objects are in this image?",
stream=True # Enable streaming
)
# Iterate through tokens as they're generated
for token in stream:
    print(token, end="", flush=True)
# Get final result
result = stream.get_final_completion()
# Access result fields based on response type (see prediction schemas)
print(f"\n\n{result.result}")
With generation config
Control the output generation with generation parameters:
result, error = model(
    source="image.jpg",
    user_prompt="Describe this image in detail",
    generation_config={
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.9
    }
)
Learn more about generation config →
Different image sources
The Vi SDK supports various image formats and path styles:
# Local file path
result, error = model(source="./images/photo.jpg")
# Absolute path
result, error = model(source="/home/user/images/photo.jpg")
# User home directory
result, error = model(source="~/Pictures/photo.png")Supported formats: .jpg, .jpeg, .png, .bmp, .gif, .tiff, .tif, .webp
Batch inference
Process multiple images efficiently with automatic progress tracking and error handling.
Batch inference benefits
- Automatic progress bars — Track processing status in real-time
- Individual error handling — Failed images don't stop the batch
- Memory efficient — Processes images sequentially to manage GPU memory
- Flexible inputs — Mix files, folders, and paths in a single call
Process list of files
# List of image paths
image_paths = ["image1.jpg", "image2.jpg", "image3.jpg"]
# Run batch inference
results = model(
source=image_paths,
user_prompt="Describe this image",
show_progress=True # Shows progress bar (default)
)
# Process results - each item is a (result, error) tuple
for i, (result, error) in enumerate(results):
    if error is None:
        # Access result fields based on response type
        # See prediction schemas documentation for details
        print(f"Image {i+1}: {result.result}")
    else:
        print(f"Image {i+1} failed: {error}")
Process entire folder
Automatically process all images in a directory with supported formats:
# Process all images in a folder
results = model(
source="./my_images/",
user_prompt="Describe this image",
show_progress=True
)
# Count successes
successful = sum(1 for _, error in results if error is None)
total = len(results)
print(f"Processed {successful}/{total} images successfully")Recursive directory search
Search and process images in subdirectories recursively:
# Process all images in folder and all subdirectories
results = model(
source="./dataset/",
user_prompt="What's in this image?",
recursive=True, # Search subdirectories recursively
show_progress=True
)
print(f"Processed {len(results)} images across all subdirectories")Mix files and folders
Combine individual files and folders in a single batch inference call:
# Mix files and folders in the same call
results = model(
source=[
"./image1.jpg", # Single file
"./folder1/", # All images in folder1
"~/Pictures/photo.png", # User path
"./dataset/", # All images in dataset
],
user_prompt="Analyze this image",
recursive=False, # Only immediate folder contents
show_progress=True
)
Different prompts per image
Provide different user prompts for each image in batch processing:
images = ["car.jpg", "person.jpg", "building.jpg"]
prompts = [
"What color is the car?",
"How many people are visible?",
"What type of building is this?"
]
# Each image gets its own prompt
results = model(
source=images,
user_prompt=prompts,
show_progress=True
)
for image, prompt, (result, error) in zip(images, prompts, results):
    if error is None:
        # Access fields based on response type (see prediction schemas)
        print(f"{image} - {prompt}: {result.result}")
    else:
        print(f"{image} failed: {error}")
Prompt Length Matching
When providing a list of prompts, the length must match the number of images:
# ✅ Good - matching lengths
results = model(
    source=["img1.jpg", "img2.jpg"],
    user_prompt=["Prompt 1", "Prompt 2"]
)
# ❌ Bad - length mismatch (raises ValueError)
results = model(
    source=["img1.jpg", "img2.jpg"],
    user_prompt=["Prompt 1"]  # Wrong!
)
Progress tracking
Enable or disable progress bar
Control progress display for batch inference operations:
# With progress bar (default)
results = model(
source=image_list,
user_prompt="Describe this",
show_progress=True # Default
)
# Without progress bar (silent mode)
results = model(
source=image_list,
user_prompt="Describe this",
show_progress=False
)
Progress information
The progress bar shows:
- Current image number / total images
- Processing speed (images per second)
- Estimated time remaining
- Real-time success/failure counts
Running batch inference (45 / 100 images)... ━━━━━━╸━━━━━━━━ 45% 0:02:15
Performance tip
For large batch jobs, enable progress tracking to monitor processing speed and identify bottlenecks. Disable it in automated scripts to reduce overhead.
Error handling
Consistent error pattern
All inference calls return (result, error) tuples by default (non-streaming mode):
result, error = model(source="image.jpg")
if error is None:
    # Success - process result
    # See prediction schemas for accessing specific fields
    print(f"Success: {result.result}")
else:
    # Error - handle appropriately
    print(f"Failed: {error}")
    # Check error type for specific handling
    if isinstance(error, FileNotFoundError):
        print("Image file not found")
    elif "out of memory" in str(error).lower():
        print("GPU out of memory")
Batch error handling
Each image in a batch has its own error status:
images = ["img1.jpg", "missing.jpg", "img3.jpg"]
results = model(
    source=images,
    user_prompt="Describe this",
    show_progress=True
)
successful = []
failed = []
for img, (result, error) in zip(images, results):
    if error is None:
        successful.append((img, result))
    else:
        failed.append((img, error))
print(f"Successful: {len(successful)}")
print(f"Failed: {len(failed)}")
# Process failures
for img, error in failed:
    print(f"  {img}: {type(error).__name__} - {error}")
Graceful degradation
Continue batch processing even if some images fail:
results = model(
source="./images/",
user_prompt="Analyze this",
recursive=True,
show_progress=True
)
# Batch inference continues even if individual images fail
valid_results = [r for r, e in results if e is None]
print(f"Successfully processed {len(valid_results)} images")
# Handle failures separately
failures = [(r, e) for r, e in results if e is not None]
for result, error in failures:
    # Log or retry failed images
    print(f"Failed with: {error}")
Common workflows
Save results to JSON
Export inference results to JSON format for later analysis:
import json
from pathlib import Path
# Process folder
results = model(
source="./images/",
user_prompt="Describe this image",
show_progress=True
)
# Save results
from vi.inference.task_types.phrase_grounding import PhraseGroundingResponse
output_data = []
for result, error in results:
    output_data.append({
        "result": str(result.result) if error is None else None,
        "error": str(error) if error else None,
        "has_grounding": isinstance(result, PhraseGroundingResponse) if error is None else False
    })
with open("results.json", "w") as f:
    json.dump(output_data, f, indent=2)
print(f"Saved {len(output_data)} results to results.json")
Process with metadata
Track additional information alongside inference results:
from pathlib import Path
from datetime import datetime
from vi.inference.task_types.vqa import VQAResponse
from vi.inference.task_types.phrase_grounding import PhraseGroundingResponse
image_dir = Path("./test_images")
image_files = list(image_dir.glob("*.jpg"))
results = model(
source=image_files,
user_prompt="Describe this image",
show_progress=True
)
# Helper to extract text from any response type
def get_text(result):
    if isinstance(result, VQAResponse):
        return result.result.answer
    elif isinstance(result, PhraseGroundingResponse):
        return result.result.sentence
    else:
        return result.result
# Save with metadata
output = []
for img_path, (result, error) in zip(image_files, results):
    output.append({
        "filename": img_path.name,
        "path": str(img_path),
        "timestamp": datetime.now().isoformat(),
        "text": get_text(result) if error is None else None,
        "error": str(error) if error else None,
        "success": error is None
    })
# Calculate statistics
success_rate = sum(1 for item in output if item["success"]) / len(output)
print(f"Success rate: {success_rate:.1%}")Retry failed images
Implement retry logic for failed images with error handling:
def process_with_retry(model, images, max_retries=3):
    """Process images with retry logic."""
    results = {}
    remaining = list(images)
    for attempt in range(max_retries):
        if not remaining:
            break
        print(f"Attempt {attempt + 1}/{max_retries} - Processing {len(remaining)} images")
        batch_results = model(
            source=remaining,
            user_prompt="Describe this image",
            show_progress=True
        )
        new_remaining = []
        for img, (result, error) in zip(remaining, batch_results):
            if error is None:
                results[img] = result
            else:
                new_remaining.append(img)
        remaining = new_remaining
    return results, remaining

# Usage
successful, failed = process_with_retry(model, image_list)
print(f"Successful: {len(successful)}, Failed: {len(failed)}")
Chunked processing for large datasets
Process large datasets in chunks for better memory management:
from pathlib import Path
import torch

def process_in_chunks(model, image_dir, chunk_size=100):
    """Process images in chunks to manage memory."""
    image_dir = Path(image_dir)
    all_images = list(image_dir.glob("*.jpg"))
    print(f"Processing {len(all_images)} images in chunks of {chunk_size}")
    all_results = []
    for i in range(0, len(all_images), chunk_size):
        chunk = all_images[i:i+chunk_size]
        print(f"\nChunk {i//chunk_size + 1}: Processing {len(chunk)} images...")
        results = model(
            source=chunk,
            user_prompt="Describe this image",
            show_progress=True
        )
        all_results.extend(results)
        # Optional: Clear GPU cache between chunks
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
    return all_results

# Usage
results = process_in_chunks(model, "./large_dataset", chunk_size=100)
Best practices
Follow these best practices for efficient inference with the Vi SDK:
Use batch inference
Process multiple images at once with native batch support:
# ✅ Good - native batch inference
results = model(
source=["img1.jpg", "img2.jpg", "img3.jpg"],
user_prompt="Describe this",
show_progress=True
)
# ❌ Bad - manual loop (slower, no progress)
results = []
for image in ["img1.jpg", "img2.jpg", "img3.jpg"]:
    result, error = model(source=image)
    results.append((result, error))
Handle errors gracefully
Always check error status in your code:
# ✅ Good - proper error handling
import logging

result, error = model(source="image.jpg")
if error is None:
    print(result.result)
else:
    logging.error(f"Inference failed: {error}")
# ❌ Bad - assuming success
result, _ = model(source="image.jpg")
print(result.result)  # May crash if error occurred
Use progress bars
Enable progress tracking for batch jobs:
# ✅ Good - with progress tracking
results = model(
source="./images/",
user_prompt="Describe this",
show_progress=True # Default
)
# ❌ Bad - no feedback for long-running jobs
results = model(
source="./images/",
user_prompt="Describe this",
show_progress=False
)
Process folders directly
Use folder paths instead of manual file listing:
# ✅ Good - direct folder processing
results = model(
source="./images/",
user_prompt="Describe this",
recursive=True
)
# ❌ Bad - manual file listing
from pathlib import Path
images = [str(p) for p in Path("./images").rglob("*.jpg")]
results = model(source=images, user_prompt="Describe this")
Reuse model instance
Create model once, reuse many times for better performance:
# ✅ Good - reuse model
model = ViModel(run_id="your-run-id")
for image in images:
    result, error = model(source=image)
# ❌ Bad - recreate model each time
for image in images:
    model = ViModel(run_id="your-run-id")  # Wasteful!
    result, error = model(source=image)
Performance tips
Optimize inference performance with these techniques:
Memory management
Clear GPU cache periodically for long-running batch jobs:
import gc
import torch
for i, image in enumerate(images):
    result, error = model(source=image)
    # Clear cache every 100 images
    if i % 100 == 0:
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()
Optimal batch sizes
Balance speed and memory usage for your GPU with chunked processing:
# For GPUs with 8GB VRAM
small_batches = process_in_chunks(model, images, chunk_size=50)
# For GPUs with 16GB+ VRAM
large_batches = process_in_chunks(model, images, chunk_size=200)
Disable progress for scripts
Reduce overhead in automated scripts by disabling progress bars:
# In automated pipelines
results = model(
source=images,
user_prompt="Describe this",
show_progress=False # Reduce overhead
)
Related resources
- Inference overview — Getting started with Vi SDK inference
- Load models — Load models from Datature Vi or HuggingFace
- Task types — VQA and phrase grounding explained
- Prediction schemas — Complete reference for all response types and available fields
- Configure generation — Control temperature, max tokens, and sampling parameters
- Handle results — Process captions, bounding boxes, and visualize predictions
- Optimize performance — Memory management, GPU utilization, and quantization
- Troubleshoot issues — Common problems and solutions for inference
- Vi SDK getting started — Quick start guide for the SDK
- Quickstart: Deploy and test — End-to-end deployment workflow
Need help?
We're here to support your VLMOps journey. Reach out through any of our support channels.