Deploy and Test Your Model

Download your trained VLM and run quick inference tests using the Vi SDK.

📍 Step 3 of 3: Deploy and Test

Part of the Vi quickstart workflow. Next: Explore advanced evaluation or manage training projects.

Now that your model is trained, it's time to download it and test its performance with the Vi SDK. This guide shows you how to quickly set up inference and validate your model's predictions on new images.

⏱️ Time: ~10 minutes (plus download time)

📚 What you'll learn: Download models, run inference, and validate performance

📋 Prerequisites

Before you begin, ensure you have:

  • A completed training run from the previous step of this quickstart
  • Your Vi secret key and organization ID
  • A Python environment where you can install the Vi SDK (GPU recommended for faster inference)

Why use Vi SDK for testing?

While you can test models in the Vi web interface, the Vi SDK gives you:

  • Programmatic access — Automate testing workflows
  • Batch processing — Test multiple images efficiently
  • Local inference — Run predictions on your own infrastructure
  • Integration ready — Easy to integrate into production systems
  • Flexible testing — Customize prompts and evaluate various scenarios

Step 1: Set up your environment

First, create a Python script or Jupyter notebook for testing.

Install Vi SDK with inference support

If you haven't already installed the SDK with inference capabilities:

# Install with inference support (includes PyTorch, Transformers, etc.)
pip install "vi-sdk[inference]"

# Or install all features
pip install "vi-sdk[all]"

Complete installation guide →

Verify installation

import vi

print(f"Vi SDK version: {vi.__version__}")
print("✓ Installation successful!")

💡 GPU acceleration recommended

For faster inference, use a GPU-enabled environment. Vi SDK automatically detects and uses available GPUs.

Check GPU availability:

import torch

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"MPS available: {torch.backends.mps.is_available()}")  # Apple Silicon

GPU setup guide →


Step 2: Find your trained model

Get your run ID from the training project:

Option 1: From the web interface

  1. Go to your Training project
  2. Click on your completed run
  3. Copy the Run ID from the URL or run details

Option 2: List runs via SDK

import vi

# Initialize client
client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

# List recent runs
print("📊 Recent training runs:")
for run in client.runs:
    status = run.status.phase
    print(f"   - {run.name} ({run.run_id})")
    print(f"     Status: {status}")

Learn more about managing runs with the SDK →


Step 3: Download your model

Download the trained model weights to your local machine:

import vi

# Initialize client
client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

# Download the model
print("📥 Downloading model...")
downloaded = client.get_model(
    run_id="your-run-id",
    save_path="./models"
)

print(f"✓ Model downloaded successfully!")
print(f"   Model path: {downloaded.model_path}")
print(f"   Config path: {downloaded.run_config_path}")

Downloaded structure:

models/
└── your-run-id/
    ├── model_full/          # Full model weights
    ├── adapter/             # Adapter weights (if available)
    └── run_config.json      # Training configuration

💡 Download options

  • Checkpoint selection — Download specific training epochs
  • Caching — Avoid re-downloading already cached models
  • Progress tracking — Monitor large model downloads
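
The options above correspond to extra arguments on the download call. The sketch below is illustrative only: run_id and save_path match the example above, but the commented-out keyword names are placeholders, not confirmed parameters, so check the model download API reference linked below for the actual signature.

# Illustrative only: the commented-out keyword names are placeholders,
# not the confirmed API. See the download API reference for real arguments.
downloaded = client.get_model(
    run_id="your-run-id",
    save_path="./models",
    # checkpoint="epoch-5",     # (placeholder) select a specific training checkpoint
    # show_progress=True,       # (placeholder) display a progress bar for large downloads
)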

Complete model download API →

📘 Need to manage your models?

For renaming, editing keys, deleting models, or other management operations, see Manage Models.


Step 4: Load model for inference

Initialize the inference model with your credentials and run ID:

from vi.inference import ViModel

# Load model for inference
print("🔄 Loading model...")
model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

print("✓ Model loaded and ready for inference!")

📘 Memory optimization

For GPUs with limited memory, use quantization:

model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id",
    load_in_4bit=True  # 4-bit quantization (reduces memory by ~75%)
)

Learn about inference optimization →


Step 5: Run inference on test images

Now test your model with new images to validate its performance.

Single image inference

# Run inference on a single image (streaming is default)
result, error = model(
    source="path/to/test_image.jpg",
    user_prompt="Describe what you see in this image",
    stream=False  # Use non-streaming mode for (result, error) tuple
)

if error is None:
    print(f"✓ Result: {result.caption}")
else:
    print(f"❌ Error: {error}")

Batch inference on multiple images

# Test multiple images
test_images = [
    "test_images/image1.jpg",
    "test_images/image2.jpg",
    "test_images/image3.jpg"
]

print("\n🧪 Running batch inference...")
results = model(
    source=test_images,
    user_prompt="Describe this image in detail",
    show_progress=True  # Show progress bar
)

# Display results
for img, (result, error) in zip(test_images, results):
    print(f"\n📸 {img}")
    if error is None:
        print(f"   ✓ {result.caption}")
    else:
        print(f"   ❌ Error: {error}")

Process entire folder

# Process all images in a folder
results = model(
    source="./test_images/",
    user_prompt="Describe this image",
    recursive=True,  # Include subdirectories
    show_progress=True
)

# Count successes
success_count = sum(1 for _, error in results if error is None)
total_count = len(results)

print(f"\n📊 Results: {success_count}/{total_count} successful")

Complete inference API reference →


Step 6: Validate model performance

Evaluate your model's predictions to ensure it meets your requirements.

Visual inspection

# Review each prediction and save it for later comparison
from PIL import Image
import json

for img_path, (result, error) in zip(test_images, results):
    if error is None:
        # Load the image (display it in your notebook or image viewer as needed)
        image = Image.open(img_path)

        print(f"\n📸 {img_path}")
        print(f"   Prediction: {result.caption}")

        # Save prediction
        with open(f"{img_path}.prediction.json", "w") as f:
            json.dump({"prediction": result.caption}, f, indent=2)

Compare with expected results

# Test cases with expected outputs
test_cases = [
    {
        "image": "test_images/defect1.jpg",
        "prompt": "Does this product have any defects?",
        "expected": "defect"  # Keywords to look for
    },
    {
        "image": "test_images/good1.jpg",
        "prompt": "Does this product have any defects?",
        "expected": "no defect"
    }
]

print("\n🧪 Validation tests:")
passed = 0

for test in test_cases:
    result, error = model(
        source=test["image"],
        user_prompt=test["prompt"],
        stream=False
    )

    if error is None:
        # Simple keyword matching
        prediction = result.caption.lower()
        expected = test["expected"].lower()

        if expected in prediction:
            print(f"✅ PASS: {test['image']}")
            passed += 1
        else:
            print(f"❌ FAIL: {test['image']}")
            print(f"   Expected: {test['expected']}")
            print(f"   Got: {result.caption}")
    else:
        print(f"❌ ERROR: {test['image']} - {error}")

print(f"\n📊 Test Results: {passed}/{len(test_cases)} passed")

Testing for different use cases

Customize your testing based on your model's task type:

Phrase Grounding (Object Detection)

Test your model's ability to locate and describe objects:

# Test phrase grounding
result, error = model(
    source="test_image.jpg",
    user_prompt="Identify and locate all objects in this image",
    stream=False
)

if error is None:
    print(f"Grounding result: {result.caption}")

    # If your model outputs structured data
    if hasattr(result, 'grounded_phrases'):
        for phrase in result.grounded_phrases:
            print(f"   - {phrase.phrase}: {phrase.bbox}")

Learn about Phrase Grounding →

Visual Question Answering (VQA)

Test your model's question-answering capabilities:

# Test VQA with multiple questions
questions = [
    "What color is the product?",
    "Are there any visible defects?",
    "What is the approximate size?",
    "Is this product properly aligned?"
]

print(f"\n🔍 Testing VQA:")
for question in questions:
    result, error = model(
        source="test_image.jpg",
        user_prompt=question,
        stream=False
    )

    if error is None:
        print(f"\nQ: {question}")
        print(f"A: {result.caption}")

Learn about Visual Question Answering →

Image Classification

Test classification accuracy:

# Test classification with categories
test_images_with_labels = [
    ("product_a.jpg", "Product A"),
    ("product_b.jpg", "Product B"),
    ("product_c.jpg", "Product C")
]

correct = 0
for img_path, expected_label in test_images_with_labels:
    result, error = model(
        source=img_path,
        user_prompt="What product category is this?",
        stream=False
    )

    if error is None:
        predicted = result.caption.lower()
        expected = expected_label.lower()

        if expected in predicted:
            print(f"✅ {img_path}: Correct")
            correct += 1
        else:
            print(f"❌ {img_path}: Wrong (expected {expected_label}, got {result.caption})")

accuracy = (correct / len(test_images_with_labels)) * 100
print(f"\n📊 Accuracy: {accuracy:.1f}%")

Defect Detection

Test defect detection capabilities:

# Test defect detection
defect_types = ["scratch", "dent", "discoloration", "crack"]

for img_path in test_images:
    result, error = model(
        source=img_path,
        user_prompt=f"Identify any defects in this image. Look for: {', '.join(defect_types)}",
        stream=False
    )

    if error is None:
        print(f"\n🔍 {img_path}")
        print(f"   Detection: {result.caption}")

        # Check for specific defect types
        found_defects = [d for d in defect_types if d in result.caption.lower()]
        if found_defects:
            print(f"   ⚠️ Found: {', '.join(found_defects)}")
        else:
            print(f"   ✓ No defects detected")

Complete testing workflow example

Here's a full end-to-end testing script combining all the steps:

import vi
from vi.inference import ViModel
from pathlib import Path
import json
from datetime import datetime

# 1. Initialize client and download model
print("🚀 Starting model testing workflow\n")

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

# Download model (with caching)
model_dir = Path("./models/your-run-id")
if not model_dir.exists():
    print("📥 Downloading model...")
    downloaded = client.get_model(
        run_id="your-run-id",
        save_path="./models"
    )
    print(f"✓ Downloaded to: {downloaded.model_path}\n")
else:
    print("✓ Using cached model\n")

# 2. Load model for inference
print("🔄 Loading model...")
model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)
print("✓ Model loaded\n")

# 3. Run inference on test set
test_images = list(Path("./test_images").glob("*.jpg"))
print(f"🧪 Testing on {len(test_images)} images...\n")

results = model(
    source=test_images,
    user_prompt="Describe this image in detail",
    show_progress=True
)

# 4. Analyze results
success_count = 0
errors = []

for img, (result, error) in zip(test_images, results):
    if error is None:
        success_count += 1
        caption = result.caption if hasattr(result, 'caption') else str(result)
        print(f"✅ {img.name}: {caption[:100]}...")  # First 100 chars
    else:
        errors.append((img.name, str(error)))
        print(f"❌ {img.name}: {error}")

# 5. Save test report
report = {
    "timestamp": datetime.now().isoformat(),
    "run_id": "your-run-id",
    "total_images": len(test_images),
    "successful": success_count,
    "success_rate": (success_count / len(test_images)) * 100,
    "errors": errors
}

with open("test_report.json", "w") as f:
    json.dump(report, f, indent=2)

# 6. Summary
print(f"\n📊 Test Summary:")
print(f"   Total images: {len(test_images)}")
print(f"   Successful: {success_count}")
print(f"   Success rate: {report['success_rate']:.1f}%")
print(f"\n✓ Report saved to test_report.json")

Performance tips

Speed up inference

Use GPU acceleration:

# Check GPU availability
import torch
print(f"Using device: {'GPU' if torch.cuda.is_available() else 'CPU'}")

Batch processing:

# Process multiple images at once
results = model(
    source=["img1.jpg", "img2.jpg", "img3.jpg"],
    show_progress=True
)

Use quantization:

# Load model with 4-bit quantization
model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id",
    load_in_4bit=True  # Cuts memory use by ~75%; can also speed up memory-bound inference
)

Complete inference optimization guide →

Handle large test sets

Process in chunks:

import json
from pathlib import Path

test_dir = Path("./test_images")
all_images = list(test_dir.glob("*.jpg"))

# Process in batches of 100
batch_size = 100
for i in range(0, len(all_images), batch_size):
    batch = all_images[i:i+batch_size]
    print(f"\nProcessing batch {i//batch_size + 1}...")

    results = model(
        source=batch,
        user_prompt="Describe this image",
        show_progress=True
    )

    # Save batch results
    for img, (result, error) in zip(batch, results):
        if error is None:
            # Save result
            output_path = test_dir / f"{img.stem}_result.json"
            caption = result.caption if hasattr(result, 'caption') else str(result)
            with open(output_path, "w") as f:
                json.dump({"prediction": caption}, f)
Compare multiple models

Test different training runs:

# Compare two models
run_ids = ["run_abc123", "run_def456"]
test_image = "test_images/sample.jpg"

print("🔍 Comparing models:\n")

for run_id in run_ids:
    model = ViModel(
        run_id=run_id,
        secret_key="your-secret-key",
        organization_id="your-organization-id"
    )

    result, error = model(
        source=test_image,
        user_prompt="Describe this image",
        stream=False
    )

    print(f"Model {run_id}:")
    if error is None:
        print(f"   {result.caption}\n")
    else:
        print(f"   Error: {error}\n")

Learn about evaluating multiple models →

Automate testing in CI/CD

Create a test script:

#!/usr/bin/env python3
import os
import sys
from pathlib import Path
from vi.inference import ViModel

def run_tests(run_id: str, test_dir: str) -> bool:
    """Run inference tests and return pass/fail."""
    model = ViModel(
        secret_key=os.getenv("VI_SECRET_KEY"),
        organization_id=os.getenv("VI_ORG_ID"),
        run_id=run_id
    )

    test_images = list(Path(test_dir).glob("*.jpg"))
    results = model(source=test_images, user_prompt="Describe this image")

    # Check success rate
    success_count = sum(1 for _, error in results if error is None)
    success_rate = (success_count / len(results)) * 100

    print(f"Success rate: {success_rate:.1f}%")
    return success_rate >= 95  # Require 95% success

if __name__ == "__main__":
    passed = run_tests(sys.argv[1], sys.argv[2])
    sys.exit(0 if passed else 1)

Use in CI/CD:

# In your CI/CD pipeline
export VI_SECRET_KEY="your-secret-key"
export VI_ORG_ID="your-org-id"
python test_model.py run_abc123 ./test_images

Common questions

How do I test on images from my dataset?

Download your dataset and test on the validation split:

from vi.dataset.loaders import ViDataset

# Download dataset
client.get_dataset(dataset_id="your-dataset-id", save_dir="./data")

# Load validation split
dataset = ViDataset("./data/your-dataset-id")

# Test on validation images
for asset, annotations in dataset.validation.iter_pairs():
    result, error = model(
        source=asset.path,
        user_prompt="Describe this image",
        stream=False
    )

    if error is None:
        print(f"✓ {asset.filename}: {result.caption}")

Learn about dataset loaders →

Can I test models without downloading them?

Currently, local inference requires downloading model weights. However, you can:

  1. Cache downloads — Models are downloaded once and reused
  2. Use Vi Cloud for testing — Test via the web interface without downloading
  3. Deploy to production — Use cloud deployment for API-based inference

Learn about deployment options →

What if my model's predictions are poor?

If your model isn't performing well:

  1. Check training metrics — Review loss curves and validation metrics
  2. Increase training data — Add more annotated images
  3. Adjust training settings — Try different hyperparameters
  4. Refine system prompt — Improve your system prompt
  5. Try different architecture — Experiment with different base models

Complete evaluation guide →

How do I integrate this into production?

After validating your model:

  1. Optimize for production — Use quantization and GPU acceleration
  2. Deploy as API — Use NIM deployment for cloud inference
  3. Monitor performance — Track prediction quality and latency
  4. Version control — Keep track of model versions and training configs

Learn about production deployment →

Can I customize the inference prompt?

Yes! Customize prompts based on your use case:

# Generic description
result, _ = model(source="image.jpg", user_prompt="Describe this image", stream=False)

# Specific question
result, _ = model(source="image.jpg", user_prompt="What defects are visible?", stream=False)

# Structured output
result, _ = model(
    source="image.jpg",
    user_prompt="List all objects in this image with their locations",
    stream=False
)

What's next?

Congratulations!

You've successfully downloaded and tested your trained VLM using the Vi SDK!

Continue your VLMOps journey:

  • Explore advanced evaluation
  • Manage your training projects

Related resources