Deploy and Test Your Model
Download your trained VLM and run quick inference tests using the Vi SDK.
Step 3 of 3: Deploy and Test
Part of the Vi quickstart workflow. Next: Explore advanced evaluation or manage training projects.
Now that your model is trained, it's time to download it and test its performance with the Vi SDK. This guide shows you how to quickly set up inference and validate your model's predictions on new images.
⏱️ Time: ~10 minutes (plus download time)
📚 What you'll learn: Download models, run inference, and validate performance
Prerequisites
Before you begin, ensure you have:
A completed training run (complete step 2 if you haven't)
Vi SDK installed with inference support (pip install vi-sdk[inference])
Your secret key and organization ID for authentication
A few test images that weren't in your training dataset
New to Vi SDK?
Check out the Vi SDK Getting Started guide for a quick introduction.
Why use Vi SDK for testing?
While you can test models in the Vi web interface, the Vi SDK gives you:
- ✅ Programmatic access — Automate testing workflows
- ✅ Batch processing — Test multiple images efficiently
- ✅ Local inference — Run predictions on your own infrastructure
- ✅ Integration ready — Easy to integrate into production systems
- ✅ Flexible testing — Customize prompts and evaluate various scenarios
Step 1: Set up your environment
First, create a Python script or Jupyter notebook for testing.
Install Vi SDK with inference support
If you haven't already installed the SDK with inference capabilities:
# Install with inference support (includes PyTorch, Transformers, etc.)
pip install vi-sdk[inference]
# Or install all features
pip install vi-sdk[all]
Verify installation
import vi
print(f"Vi SDK version: {vi.__version__}")
print("✓ Installation successful!")
GPU acceleration recommended
For faster inference, use a GPU-enabled environment. Vi SDK automatically detects and uses available GPUs.
Check GPU availability:
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")  # Apple Silicon
Step 2: Find your trained model
Get your run ID from the training project:
Option 1: From the web interface
- Go to your Training project
- Click on your completed run
- Copy the Run ID from the URL or run details
Option 2: List runs via SDK
import vi
# Initialize client
client = vi.Client(
secret_key="your-secret-key",
organization_id="your-organization-id"
)
# List recent runs
print("📊 Recent training runs:")
for run in client.runs:
    status = run.status.phase
    print(f" - {run.name} ({run.run_id})")
    print(f"   Status: {status}")
Learn more about managing runs with the SDK →
Step 3: Download your model
Download the trained model weights to your local machine:
import vi
# Initialize client
client = vi.Client(
secret_key="your-secret-key",
organization_id="your-organization-id"
)
# Download the model
print("📥 Downloading model...")
downloaded = client.get_model(
run_id="your-run-id",
save_path="./models"
)
print(f"✓ Model downloaded successfully!")
print(f" Model path: {downloaded.model_path}")
print(f" Config path: {downloaded.run_config_path}")Downloaded structure:
models/
└── your-run-id/
    ├── model_full/       # Full model weights
    ├── adapter/          # Adapter weights (if available)
    └── run_config.json   # Training configuration
Download options
- Checkpoint selection — Download specific training epochs
- Caching — Avoid re-downloading already cached models (see the sketch after this list)
- Progress tracking — Monitor large model downloads
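For caching, a simple existence check is enough to reuse a previous download between test runs; the complete workflow example later on this page uses the same pattern.
from pathlib import Path

run_id = "your-run-id"
model_dir = Path("./models") / run_id

# Only download when the weights aren't already on disk
if not model_dir.exists():
    downloaded = client.get_model(run_id=run_id, save_path="./models")
    print(f"✓ Downloaded to: {downloaded.model_path}")
else:
    print("✓ Using cached model")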
Need to manage your models?
For renaming, editing keys, deleting models, or other management operations, see Manage Models.
Step 4: Load model for inference
Initialize the inference model with your credentials and run ID:
from vi.inference import ViModel
# Load model for inference
print("🔄 Loading model...")
model = ViModel(
run_id="your-run-id",
secret_key="your-secret-key",
organization_id="your-organization-id"
)
print("✓ Model loaded and ready for inference!")
Memory optimization
For GPUs with limited memory, use quantization:
model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id",
    load_in_4bit=True  # 4-bit quantization (reduces memory by ~75%)
)
Step 5: Run inference on test images
Now test your model with new images to validate its performance.
Single image inference
# Run inference on a single image (streaming is default)
result, error = model(
source="path/to/test_image.jpg",
user_prompt="Describe what you see in this image",
stream=False # Use non-streaming mode for (result, error) tuple
)
if error is None:
    print(f"✓ Result: {result.caption}")
else:
    print(f"❌ Error: {error}")
Batch inference on multiple images
# Test multiple images
test_images = [
"test_images/image1.jpg",
"test_images/image2.jpg",
"test_images/image3.jpg"
]
print("\n🧪 Running batch inference...")
results = model(
source=test_images,
user_prompt="Describe this image in detail",
show_progress=True # Show progress bar
)
# Display results
for img, (result, error) in zip(test_images, results):
    print(f"\n📸 {img}")
    if error is None:
        print(f" ✓ {result.caption}")
    else:
        print(f" ❌ Error: {error}")
Process entire folder
# Process all images in a folder
results = model(
source="./test_images/",
user_prompt="Describe this image",
recursive=True, # Include subdirectories
show_progress=True
)
# Count successes
success_count = sum(1 for _, error in results if error is None)
total_count = len(results)
print(f"\n📊 Results: {success_count}/{total_count} successful")Complete inference API reference →
Step 6: Validate model performance
Evaluate your model's predictions to ensure it meets your requirements.
Visual inspection
# Save predictions with visualizations
from PIL import Image
import json
for img_path, (result, error) in zip(test_images, results):
    if error is None:
        # Load the image (call image.show() or display it in a notebook to inspect it)
        image = Image.open(img_path)
        print(f"\n📸 {img_path}")
        print(f" Prediction: {result.caption}")
        # Save prediction
        with open(f"{img_path}.prediction.json", "w") as f:
            json.dump({"prediction": result.caption}, f, indent=2)
Compare with expected results
# Test cases with expected outputs
test_cases = [
{
"image": "test_images/defect1.jpg",
"prompt": "Does this product have any defects?",
"expected": "defect" # Keywords to look for
},
{
"image": "test_images/good1.jpg",
"prompt": "Does this product have any defects?",
"expected": "no defect"
}
]
print("\n🧪 Validation tests:")
passed = 0
for test in test_cases:
    result, error = model(
        source=test["image"],
        user_prompt=test["prompt"],
        stream=False
    )
    if error is None:
        # Simple keyword matching
        prediction = result.caption.lower()
        expected = test["expected"].lower()
        if expected in prediction:
            print(f"✅ PASS: {test['image']}")
            passed += 1
        else:
            print(f"❌ FAIL: {test['image']}")
            print(f" Expected: {test['expected']}")
            print(f" Got: {result.caption}")
    else:
        print(f"❌ ERROR: {test['image']} - {error}")
print(f"\n📊 Test Results: {passed}/{len(test_cases)} passed")
Testing for different use cases
Customize your testing based on your model's task type:
Phrase Grounding (Object Detection)
Test your model's ability to locate and describe objects:
# Test phrase grounding
result, error = model(
source="test_image.jpg",
user_prompt="Identify and locate all objects in this image",
stream=False
)
if error is None:
    print(f"Grounding result: {result.caption}")
    # If your model outputs structured data
    if hasattr(result, 'grounded_phrases'):
        for phrase in result.grounded_phrases:
            print(f" - {phrase.phrase}: {phrase.bbox}")
Visual Question Answering (VQA)
Test your model's question-answering capabilities:
# Test VQA with multiple questions
questions = [
"What color is the product?",
"Are there any visible defects?",
"What is the approximate size?",
"Is this product properly aligned?"
]
print(f"\n🔍 Testing VQA:")
for question in questions:
    result, error = model(
        source="test_image.jpg",
        user_prompt=question,
        stream=False
    )
    if error is None:
        print(f"\nQ: {question}")
        print(f"A: {result.caption}")
Image Classification
Test classification accuracy:
# Test classification with categories
test_images_with_labels = [
("product_a.jpg", "Product A"),
("product_b.jpg", "Product B"),
("product_c.jpg", "Product C")
]
correct = 0
for img_path, expected_label in test_images_with_labels:
    result, error = model(
        source=img_path,
        user_prompt="What product category is this?",
        stream=False
    )
    if error is None:
        predicted = result.caption.lower()
        expected = expected_label.lower()
        if expected in predicted:
            print(f"✅ {img_path}: Correct")
            correct += 1
        else:
            print(f"❌ {img_path}: Wrong (expected {expected_label}, got {result.caption})")
accuracy = (correct / len(test_images_with_labels)) * 100
print(f"\n📊 Accuracy: {accuracy:.1f}%")
Defect Detection
Test defect detection capabilities:
# Test defect detection
defect_types = ["scratch", "dent", "discoloration", "crack"]
for img_path in test_images:
    result, error = model(
        source=img_path,
        user_prompt=f"Identify any defects in this image. Look for: {', '.join(defect_types)}",
        stream=False
    )
    if error is None:
        print(f"\n🔍 {img_path}")
        print(f" Detection: {result.caption}")
        # Check for specific defect types
        found_defects = [d for d in defect_types if d in result.caption.lower()]
        if found_defects:
            print(f" ⚠️ Found: {', '.join(found_defects)}")
        else:
            print(" ✓ No defects detected")
Complete testing workflow example
Here's a full end-to-end testing script combining all the steps:
import vi
from vi.inference import ViModel
from pathlib import Path
import json
from datetime import datetime
# 1. Initialize client and download model
print("🚀 Starting model testing workflow\n")
client = vi.Client(
secret_key="your-secret-key",
organization_id="your-organization-id"
)
# Download model (with caching)
model_dir = Path("./models/your-run-id")
if not model_dir.exists():
    print("📥 Downloading model...")
    downloaded = client.get_model(
        run_id="your-run-id",
        save_path="./models"
    )
    print(f"✓ Downloaded to: {downloaded.model_path}\n")
else:
    print("✓ Using cached model\n")
# 2. Load model for inference
print("🔄 Loading model...")
model = ViModel(
run_id="your-run-id",
secret_key="your-secret-key",
organization_id="your-organization-id"
)
print("✓ Model loaded\n")
# 3. Run inference on test set
test_images = list(Path("./test_images").glob("*.jpg"))
print(f"🧪 Testing on {len(test_images)} images...\n")
results = model(
source=test_images,
user_prompt="Describe this image in detail",
show_progress=True
)
# 4. Analyze results
success_count = 0
errors = []
for img, (result, error) in zip(test_images, results):
    if error is None:
        success_count += 1
        caption = result.caption if hasattr(result, 'caption') else str(result)
        print(f"✅ {img.name}: {caption[:100]}...")  # First 100 chars
    else:
        errors.append((img.name, str(error)))
        print(f"❌ {img.name}: {error}")
# 5. Save test report
report = {
"timestamp": datetime.now().isoformat(),
"run_id": "your-run-id",
"total_images": len(test_images),
"successful": success_count,
"success_rate": (success_count / len(test_images)) * 100,
"errors": errors
}
with open("test_report.json", "w") as f:
json.dump(report, f, indent=2)
# 6. Summary
print(f"\n📊 Test Summary:")
print(f" Total images: {len(test_images)}")
print(f" Successful: {success_count}")
print(f" Success rate: {report['success_rate']:.1f}%")
print(f"\n✓ Report saved to test_report.json")Performance tips
Speed up inference
Use GPU acceleration:
# Check GPU availability
import torch
print(f"Using device: {'GPU' if torch.cuda.is_available() else 'CPU'}")Batch processing:
# Process multiple images at once
results = model(
source=["img1.jpg", "img2.jpg", "img3.jpg"],
show_progress=True
)
Use quantization:
# Load model with 4-bit quantization
model = ViModel(
run_id="your-run-id",
secret_key="your-secret-key",
organization_id="your-organization-id",
load_in_4bit=True # Faster and uses less memory
)
Handle large test sets
Process in chunks:
from pathlib import Path
import json
test_dir = Path("./test_images")
all_images = list(test_dir.glob("*.jpg"))
# Process in batches of 100
batch_size = 100
for i in range(0, len(all_images), batch_size):
    batch = all_images[i:i+batch_size]
    print(f"\nProcessing batch {i//batch_size + 1}...")
    results = model(
        source=batch,
        user_prompt="Describe this image",
        show_progress=True
    )
    # Save batch results
    for img, (result, error) in zip(batch, results):
        if error is None:
            # Save result
            output_path = test_dir / f"{img.stem}_result.json"
            caption = result.caption if hasattr(result, 'caption') else str(result)
            with open(output_path, "w") as f:
                json.dump({"prediction": caption}, f)
Compare multiple models
Test different training runs:
# Compare two models
run_ids = ["run_abc123", "run_def456"]
test_image = "test_images/sample.jpg"
print("🔍 Comparing models:\n")
for run_id in run_ids:
    model = ViModel(
        run_id=run_id,
        secret_key="your-secret-key",
        organization_id="your-organization-id"
    )
    result, error = model(
        source=test_image,
        user_prompt="Describe this image",
        stream=False
    )
    print(f"Model {run_id}:")
    if error is None:
        print(f" {result.caption}\n")
    else:
        print(f" Error: {error}\n")
Automate testing in CI/CD
Create a test script:
#!/usr/bin/env python3
import os
import sys
from pathlib import Path
from vi.inference import ViModel
def run_tests(run_id: str, test_dir: str) -> bool:
    """Run inference tests and return pass/fail."""
    model = ViModel(
        secret_key=os.getenv("VI_SECRET_KEY"),
        organization_id=os.getenv("VI_ORG_ID"),
        run_id=run_id
    )
    test_images = list(Path(test_dir).glob("*.jpg"))
    results = model(source=test_images, user_prompt="Describe this image")
    # Check success rate
    success_count = sum(1 for _, error in results if error is None)
    success_rate = (success_count / len(results)) * 100
    print(f"Success rate: {success_rate:.1f}%")
    return success_rate >= 95  # Require 95% success

if __name__ == "__main__":
    passed = run_tests(sys.argv[1], sys.argv[2])
    sys.exit(0 if passed else 1)
Use in CI/CD:
# In your CI/CD pipeline
export VI_SECRET_KEY="your-secret-key"
export VI_ORG_ID="your-org-id"
python test_model.py run_abc123 ./test_images
Common questions
How do I test on images from my dataset?
Download your dataset and test on the validation split:
from vi.dataset.loaders import ViDataset
# Download dataset
client.get_dataset(dataset_id="your-dataset-id", save_dir="./data")
# Load validation split
dataset = ViDataset("./data/your-dataset-id")
# Test on validation images
for asset, annotations in dataset.validation.iter_pairs():
    result, error = model(
        source=asset.path,
        user_prompt="Describe this image",
        stream=False
    )
    if error is None:
        print(f"✓ {asset.filename}: {result.caption}")
Can I test models without downloading them?
Currently, local inference requires downloading model weights. However, you can:
- Cache downloads — Models are downloaded once and reused
- Use Vi Cloud for testing — Test via the web interface without downloading
- Deploy to production — Use cloud deployment for API-based inference
What if my model's predictions are poor?
If your model isn't performing well:
- Check training metrics — Review loss curves and validation metrics
- Increase training data — Add more annotated images
- Adjust training settings — Try different hyperparameters
- Refine system prompt — Improve your system prompt
- Try different architecture — Experiment with different base models
How do I integrate this into production?
After validating your model:
- Optimize for production — Use quantization and GPU acceleration
- Deploy as API — Use NIM deployment for cloud inference (a self-hosted sketch follows this list)
- Monitor performance — Track prediction quality and latency
- Version control — Keep track of model versions and training configs
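As an illustration of the "deploy as API" idea, the sketch below wraps the loaded model in a small HTTP service. FastAPI, the endpoint layout, and the temporary-file handling are assumptions for this example and are not part of the Vi SDK; NIM deployment is the managed alternative.
# Hypothetical minimal service around a trained Vi model (FastAPI is an
# assumption, not part of the Vi SDK). Run with: uvicorn serve_model:app
import os
import tempfile

from fastapi import FastAPI, File, UploadFile
from vi.inference import ViModel

app = FastAPI()

# Load the model once at startup; credentials come from environment variables
model = ViModel(
    run_id=os.getenv("VI_RUN_ID"),
    secret_key=os.getenv("VI_SECRET_KEY"),
    organization_id=os.getenv("VI_ORG_ID"),
)

@app.post("/predict")
async def predict(file: UploadFile = File(...), prompt: str = "Describe this image"):
    # Write the upload to a temporary file so it can be passed as a source path
    with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
        tmp.write(await file.read())
        tmp_path = tmp.name
    result, error = model(source=tmp_path, user_prompt=prompt, stream=False)
    if error is not None:
        return {"error": str(error)}
    return {"prediction": result.caption}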
Can I customize the inference prompt?
Yes! Customize prompts based on your use case:
# Generic description
result, _ = model(source="image.jpg", user_prompt="Describe this image", stream=False)
# Specific question
result, _ = model(source="image.jpg", user_prompt="What defects are visible?", stream=False)
# Structured output
result, _ = model(
source="image.jpg",
user_prompt="List all objects in this image with their locations",
stream=False
)
What's next?
Congratulations!
You've successfully downloaded and tested your trained VLM using the Vi SDK!
Continue your VLMOps journey:
- Deep dive into metrics, visualizations, and advanced testing
- Organize runs, workflows, and experiments
- Explore the complete SDK documentation
- Rename, edit keys, delete, and organize your models
Related resources
- Vi SDK Getting Started — Quick start guide for the SDK
- Vi SDK Inference — Complete inference documentation
- Vi SDK Models API — Model download API reference
- Manage Models — Rename, edit keys, and delete models
- Monitor Training Runs — Track training progress
- Evaluation Guide — Assess model performance
- Training Logs — Debug training issues
Need help?
We're here to support your VLMOps journey.