Troubleshoot Issues

This page covers the most common errors when running inference with Datature Vi: out-of-memory failures, slow performance, model loading problems, runtime errors, and unexpected outputs.

Before You Start

Get started with inference →

Out of memory errors

GPU out of memory

Symptoms:

CUDA out of memory. Tried to allocate X GB...
RuntimeError: CUDA error: out of memory

Step 1: Switch to 8-bit quantization

from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    load_in_8bit=True,
    device_map="auto"
)

Step 2: If still OOM, switch to 4-bit

model = ViModel(
    run_id="your-run-id",
    load_in_4bit=True,
    device_map="auto"
)

Step 3: Enable low CPU memory usage

model = ViModel(
    run_id="your-run-id",
    load_in_4bit=True,
    low_cpu_mem_usage=True,
    device_map="auto"
)

Step 4: Clear GPU cache before loading

import torch
import gc

torch.cuda.empty_cache()
gc.collect()

model = ViModel(run_id="your-run-id", load_in_8bit=True)

Step 5: Process in smaller chunks

def process_in_chunks(model, images, chunk_size=25):
    results = []
    for i in range(0, len(images), chunk_size):
        chunk = images[i:i + chunk_size]
        batch_results = model(source=chunk)
        results.extend(batch_results)
        torch.cuda.empty_cache()
    return results
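If a chunk itself can still exceed memory, a variant that halves the chunk size on failure is more robust. This is a sketch: it assumes an OOM surfaces as a `RuntimeError` (PyTorch's `torch.cuda.OutOfMemoryError` is a `RuntimeError` subclass), and it omits the `torch.cuda.empty_cache()` call you would likely keep between chunks:

```python
def process_with_fallback(model, images, chunk_size=25, min_chunk=1):
    # Process images in chunks, halving the chunk size whenever a
    # chunk fails with an out-of-memory RuntimeError.
    results = []
    i = 0
    while i < len(images):
        chunk = images[i:i + chunk_size]
        try:
            results.extend(model(source=chunk))
            i += chunk_size
        except RuntimeError:
            if chunk_size <= min_chunk:
                raise  # cannot shrink further; surface the OOM
            chunk_size = max(min_chunk, chunk_size // 2)
    return results
```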

See Improve Performance for more memory management strategies.

CPU out of memory

Symptoms:

MemoryError
Killed (process terminated)
Try these fixes:

# Enable low CPU memory usage during loading
model = ViModel(run_id="your-run-id", low_cpu_mem_usage=True)

# Force model onto GPU instead of CPU
model = ViModel(run_id="your-run-id", device_map="cuda")

If your machine has a GPU, make sure the model is loading there rather than staying on CPU.

Model loading issues

Model download fails

Symptoms:

ValueError: Failed to download model
ConnectionError: Failed to connect

Check your credentials:

import os

print(f"Secret Key set: {bool(os.getenv('DATATURE_VI_SECRET_KEY'))}")
print(f"Org ID set: {bool(os.getenv('DATATURE_VI_ORGANIZATION_ID'))}")

Verify the run ID exists and training is complete:

import vi

client = vi.Client()
run = client.runs.get(run_id="your-run-id")
print(f"Status: {run.status.phase}")
# Status must be "completed" before you can download

Test network connectivity:

import requests

try:
    response = requests.get("https://vi.datature.com", timeout=5)
    print(f"Connection OK: {response.status_code}")
except Exception as e:
    print(f"Connection failed: {e}")
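If connectivity is intermittent, retrying the download with exponential backoff often helps. A minimal sketch, assuming the transient failure surfaces as `ConnectionError` (adjust the exception type to whatever `ViModel` actually raises in your environment):

```python
import time

def retry_with_backoff(load_fn, max_attempts=4, base_delay=1.0):
    # Call load_fn(), doubling the wait between attempts on failure.
    for attempt in range(max_attempts):
        try:
            return load_fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)
            print(f"Download failed, retrying in {delay:.1f}s...")
            time.sleep(delay)

# Usage (hypothetical):
# model = retry_with_backoff(lambda: ViModel(run_id="your-run-id"))
```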

Model loading hangs

Symptoms: Loading freezes with no progress for an extended period.

Check available disk space first:

import shutil

total, used, free = shutil.disk_usage("/")
print(f"Free space: {free // (2**30)} GB")

If you have insufficient disk space, the download may stall. Clear old cached models and retry:

# In a shell, remove the cached models directory:
rm -rf ~/.datature/vi/models/

Then re-download with overwrite enabled:

model = ViModel(
    run_id="your-run-id",
    overwrite=True
)

Slow inference

General slowness

Symptoms: Inference takes much longer than expected, or throughput is low.

Use a GPU:

import torch

print(f"CUDA available: {torch.cuda.is_available()}")

model = ViModel(run_id="your-run-id", device_map="cuda")

Enable FP16 and Flash Attention 2:

model = ViModel(
    run_id="your-run-id",
    dtype="float16",
    attn_implementation="flash_attention_2",
    device_map="auto"
)

Use batch inference instead of single-image loops:

# Good: batch inference
results = model(source=["img1.jpg", "img2.jpg", "img3.jpg"])

# Slow: sequential single-image calls
for img in ["img1.jpg", "img2.jpg", "img3.jpg"]:
    result, error = model(source=img)

First inference is slow

Symptom: The first prediction takes much longer than subsequent ones.

This is expected behavior. The model runs initialization work on the first call. Add a warm-up call before your timed measurements:

model = ViModel(run_id="your-run-id")

# Warm-up (discard result)
model(source="any_image.jpg")

# Subsequent calls will be faster
result, error = model(source="real_image.jpg")
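To confirm the warm-up is paying off, you can time the cold call separately from the warm ones with a small stdlib helper (`time_inference` and its arguments are illustrative, not part of the SDK):

```python
import time

def time_inference(model, source, runs=3):
    # Time the first (cold) call, then average several warm calls.
    start = time.perf_counter()
    model(source=source)
    cold = time.perf_counter() - start

    warm = []
    for _ in range(runs):
        start = time.perf_counter()
        model(source=source)
        warm.append(time.perf_counter() - start)
    return cold, sum(warm) / len(warm)
```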

Runtime errors

File not found

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'image.jpg'
Use an absolute path, or verify the file exists before calling the model:

import os
from pathlib import Path

image_path = os.path.abspath("image.jpg")
result, error = model(source=image_path)

# Or verify the file exists first
if Path("image.jpg").exists():
    result, error = model(source="image.jpg")
else:
    print("File not found")

Unsupported image format

Symptoms:

PIL.UnidentifiedImageError: cannot identify image file
Convert the image to a supported format first:

from PIL import Image

img = Image.open("image.webp")
img = img.convert("RGB")
img.save("image.jpg", "JPEG")

result, error = model(source="image.jpg")

Supported formats: .jpg, .jpeg, .png, .bmp, .gif, .tiff, .tif, .webp
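To avoid hitting this error partway through a batch, you can pre-filter inputs by extension. `split_by_support` below is an illustrative stdlib helper, not an SDK function:

```python
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".tif", ".webp"}

def split_by_support(paths):
    # Partition paths into (supported, unsupported) by file extension.
    ok, bad = [], []
    for p in paths:
        (ok if Path(p).suffix.lower() in SUPPORTED else bad).append(p)
    return ok, bad
```

Convert or skip anything in the unsupported list before calling the model.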

Prompt length mismatch

Symptoms:

ValueError: user_prompt length must match sources length
Pass exactly one prompt per source:

# Good: lengths match
results = model(
    source=["img1.jpg", "img2.jpg"],
    user_prompt=["Prompt 1", "Prompt 2"]
)

# Bad: one prompt for two images
results = model(
    source=["img1.jpg", "img2.jpg"],
    user_prompt=["Prompt 1"]  # raises ValueError
)
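If you intend the same prompt to apply to every image, replicate it yourself before the call. The helper below is a hypothetical convenience, not part of the SDK:

```python
def broadcast_prompt(prompt, sources):
    # Replicate a single prompt so its length always matches sources.
    if isinstance(prompt, str):
        return [prompt] * len(sources)
    if len(prompt) != len(sources):
        raise ValueError(
            f"Got {len(prompt)} prompts for {len(sources)} sources"
        )
    return list(prompt)

# Usage (hypothetical):
# results = model(
#     source=images,
#     user_prompt=broadcast_prompt("Describe this image", images),
# )
```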

Result issues

No grounded phrases

Symptoms: result.grounded_phrases attribute is missing or bounding boxes are not returned.

Check the model's task type:

info = ViModel.inspect(run_id="your-run-id")
print(f"Task type: {info.task_type}")

Not all models support phrase grounding. If info.task_type is VQA, the model will not return bounding boxes. Load a model trained for phrase grounding, or omit the prompt to let the model use its default detection behavior.

Also check that you are accessing the correct field:

from vi.inference.task_types.phrase_grounding import PhraseGroundingResponse

result, error = model(source="image.jpg")

if error is None and isinstance(result, PhraseGroundingResponse):
    print(f"Found {len(result.result.groundings)} objects")
    for grounding in result.result.groundings:
        print(f"  {grounding.phrase}: {grounding.grounding}")

Unexpected output format

Symptom: Accessing result.caption or result.grounded_phrases raises AttributeError.

The response field names depend on the response type. Always use isinstance() and access the correct fields:

from vi.inference.task_types.vqa import VQAResponse
from vi.inference.task_types.phrase_grounding import PhraseGroundingResponse

result, error = model(source="image.jpg")

if error is None:
    if isinstance(result, VQAResponse):
        print(f"Answer: {result.result.answer}")
    elif isinstance(result, PhraseGroundingResponse):
        print(f"Caption: {result.result.sentence}")
    else:
        print(f"Raw output: {result.result}")

See complete prediction schemas →

Poor quality results

Symptom: Predictions are inaccurate or vague.

Lower the temperature for factual tasks:

result, error = model(
    source="image.jpg",
    user_prompt="Describe this image",
    generation_config={
        "temperature": 0.0,
        "max_new_tokens": 256,
        "do_sample": False
    }
)

Write more specific prompts:

# Too vague
"Tell me about this"

# Specific
"What objects are visible in this image and where are they located?"

Check training completion:

client = vi.Client()
run = client.runs.get(run_id="your-run-id")
print(f"Status: {run.status.phase}")
# "completed" means training and export are done

Permission errors

Access denied

Symptoms:

PermissionError: [Errno 13] Permission denied
HTTPError: 403 Forbidden
Verify your API credentials can access the organization:

import vi

client = vi.Client()
org = client.organizations
print(f"Organization: {org.name}")

Check local file and directory permissions:

import os

file_path = "image.jpg"
if os.access(file_path, os.R_OK):
    print("File is readable")
else:
    print("Cannot read file, check permissions")

output_dir = "./results"
os.makedirs(output_dir, exist_ok=True)

Debugging tips

Enable detailed logging

import logging

logging.basicConfig(level=logging.DEBUG)

model = ViModel(run_id="your-run-id")
result, error = model(source="image.jpg")

Check system resources

import psutil
import torch

cpu_percent = psutil.cpu_percent()
ram_percent = psutil.virtual_memory().percent

print(f"CPU: {cpu_percent}%")
print(f"RAM: {ram_percent}%")

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(i) / 1e9
        reserved = torch.cuda.memory_reserved(i) / 1e9
        print(f"GPU {i}: Allocated: {allocated:.2f} GB, Reserved: {reserved:.2f} GB")

Build a minimal reproducible example

When reporting an issue, include a minimal script that shows the problem:

from vi.inference import ViModel
import vi, sys, torch

print(f"Vi SDK: {vi.__version__}")
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.version.cuda if torch.cuda.is_available() else 'N/A'}")

model = ViModel(run_id="your-run-id")
result, error = model(source="test.jpg")

if error:
    print(f"Error: {type(error).__name__}: {error}")
else:
    print(f"Success: {str(result.result)[:50]}...")

Getting help

If these steps don't resolve your issue:

1. Check result.raw_output for the full unprocessed model output
2. Review the Vi SDK changelog for known issues
3. Ask in the Datature community
4. Contact Datature support

When reaching out, include your Vi SDK version, Python version, GPU details, the complete error message, and a minimal reproducible script.

Related resources

Improve Performance

Memory management, quantization, GPU utilization, and batching strategies.

Load Models

Quantization options, device mapping, caching, and loading error handling.

Prediction Schemas

Field names and types for VQAResponse, PhraseGroundingResponse, and GenericResponse.