Troubleshoot Issues

This page covers the most common errors when running inference with Datature Vi: out-of-memory failures, slow performance, model loading problems, runtime errors, and unexpected outputs.

Before You Start

Get started with inference →

Out of memory errors

GPU out of memory

Symptoms:

CUDA out of memory. Tried to allocate X GB...
RuntimeError: CUDA error: out of memory

Step 1: Switch to 8-bit quantization

from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    load_in_8bit=True,
    device_map="auto"
)

Step 2: If still OOM, switch to 4-bit

model = ViModel(
    run_id="your-run-id",
    load_in_4bit=True,
    device_map="auto"
)

Step 3: Enable low CPU memory usage

model = ViModel(
    run_id="your-run-id",
    load_in_4bit=True,
    low_cpu_mem_usage=True,
    device_map="auto"
)

Step 4: Clear GPU cache before loading

import torch
import gc

torch.cuda.empty_cache()
gc.collect()

model = ViModel(run_id="your-run-id", load_in_8bit=True)

Step 5: Process in smaller chunks

def process_in_chunks(model, images, chunk_size=25):
    results = []
    for i in range(0, len(images), chunk_size):
        chunk = images[i:i + chunk_size]
        batch_results = model(source=chunk)
        results.extend(batch_results)
        torch.cuda.empty_cache()
    return results
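If a chunk itself can still exceed memory, a variant that halves the chunk size on failure is more robust. This is a sketch: it assumes an OOM surfaces as a `RuntimeError` (PyTorch's `torch.cuda.OutOfMemoryError` is a `RuntimeError` subclass), and it omits the `torch.cuda.empty_cache()` call you would likely keep between chunks:

```python
def process_with_fallback(model, images, chunk_size=25, min_chunk=1):
    # Process images in chunks, halving the chunk size whenever a
    # chunk fails with an out-of-memory RuntimeError.
    results = []
    i = 0
    while i < len(images):
        chunk = images[i:i + chunk_size]
        try:
            results.extend(model(source=chunk))
            i += chunk_size
        except RuntimeError:
            if chunk_size <= min_chunk:
                raise  # cannot shrink further; surface the OOM
            chunk_size = max(min_chunk, chunk_size // 2)
    return results
```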

See Improve Performance for more memory management strategies.

CPU out of memory

Symptoms:

MemoryError
Killed (process terminated)
Try these fixes:

# Enable low CPU memory usage during loading
model = ViModel(run_id="your-run-id", low_cpu_mem_usage=True)

# Force model onto GPU instead of CPU
model = ViModel(run_id="your-run-id", device_map="cuda")

If your machine has a GPU, make sure the model is loading there rather than staying on CPU.

Model loading issues

Model download fails

Symptoms:

ValueError: Failed to download model
ConnectionError: Failed to connect

Check your credentials:

import os

print(f"Secret Key set: {bool(os.getenv('DATATURE_VI_SECRET_KEY'))}")
print(f"Org ID set: {bool(os.getenv('DATATURE_VI_ORGANIZATION_ID'))}")

Verify the run ID exists and training is complete:

import vi

client = vi.Client()
run = client.runs.get(run_id="your-run-id")
print(f"Status: {run.status.phase}")
# Status must be "completed" before you can download

Test network connectivity:

import requests

try:
    response = requests.get("https://vi.datature.com", timeout=5)
    print(f"Connection OK: {response.status_code}")
except Exception as e:
    print(f"Connection failed: {e}")
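If connectivity is intermittent, retrying the download with exponential backoff often helps. A minimal sketch, assuming the transient failure surfaces as `ConnectionError` (adjust the exception type to whatever `ViModel` actually raises in your environment):

```python
import time

def retry_with_backoff(load_fn, max_attempts=4, base_delay=1.0):
    # Call load_fn(), doubling the wait between attempts on failure.
    for attempt in range(max_attempts):
        try:
            return load_fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)
            print(f"Download failed, retrying in {delay:.1f}s...")
            time.sleep(delay)

# Usage (hypothetical):
# model = retry_with_backoff(lambda: ViModel(run_id="your-run-id"))
```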

Model loading hangs

Symptoms: Loading freezes with no progress for an extended period.

Check available disk space first:

import shutil

total, used, free = shutil.disk_usage("/")
print(f"Free space: {free // (2**30)} GB")

If you have insufficient disk space, the download may stall. Clear old cached models and retry:

# In a shell, remove the cached models directory:
rm -rf ~/.datature/vi/models/

Then re-download with overwrite enabled:

model = ViModel(
    run_id="your-run-id",
    overwrite=True
)

Slow inference

General slowness

Symptoms: Inference takes much longer than expected, or throughput is low.

Use a GPU:

import torch

print(f"CUDA available: {torch.cuda.is_available()}")

model = ViModel(run_id="your-run-id", device_map="cuda")

Enable FP16 and Flash Attention 2:

model = ViModel(
    run_id="your-run-id",
    dtype="float16",
    attn_implementation="flash_attention_2",
    device_map="auto"
)

Use batch inference instead of single-image loops:

# Good: batch inference
results = model(source=["img1.jpg", "img2.jpg", "img3.jpg"])

# Slow: sequential single-image calls
for img in ["img1.jpg", "img2.jpg", "img3.jpg"]:
    result, error = model(source=img)

First inference is slow

Symptom: The first prediction takes much longer than subsequent ones.

This is expected behavior. The model runs initialization work on the first call. Add a warm-up call before your timed measurements:

model = ViModel(run_id="your-run-id")

# Warm-up (discard result)
model(source="any_image.jpg")

# Subsequent calls will be faster
result, error = model(source="real_image.jpg")
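To confirm the warm-up is paying off, you can time the cold call separately from the warm ones with a small stdlib helper (`time_inference` and its arguments are illustrative, not part of the SDK):

```python
import time

def time_inference(model, source, runs=3):
    # Time the first (cold) call, then average several warm calls.
    start = time.perf_counter()
    model(source=source)
    cold = time.perf_counter() - start

    warm = []
    for _ in range(runs):
        start = time.perf_counter()
        model(source=source)
        warm.append(time.perf_counter() - start)
    return cold, sum(warm) / len(warm)
```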

Runtime errors

File not found

Symptoms:

FileNotFoundError: [Errno 2] No such file or directory: 'image.jpg'
Use an absolute path, or verify the file exists before calling the model:

import os
from pathlib import Path

image_path = os.path.abspath("image.jpg")
result, error = model(source=image_path)

# Or verify the file exists first
if Path("image.jpg").exists():
    result, error = model(source="image.jpg")
else:
    print("File not found")

Unsupported image format

Symptoms:

PIL.UnidentifiedImageError: cannot identify image file
Convert the image to a supported format first:

from PIL import Image

img = Image.open("image.webp")
img = img.convert("RGB")
img.save("image.jpg", "JPEG")

result, error = model(source="image.jpg")

Supported formats: .jpg, .jpeg, .png, .bmp, .gif, .tiff, .tif, .webp
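To avoid hitting this error partway through a batch, you can pre-filter inputs by extension. `split_by_support` below is an illustrative stdlib helper, not an SDK function:

```python
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tiff", ".tif", ".webp"}

def split_by_support(paths):
    # Partition paths into (supported, unsupported) by file extension.
    ok, bad = [], []
    for p in paths:
        (ok if Path(p).suffix.lower() in SUPPORTED else bad).append(p)
    return ok, bad
```

Convert or skip anything in the unsupported list before calling the model.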

Prompt length mismatch

Symptoms:

ValueError: user_prompt length must match sources length
Pass exactly one prompt per source:

# Good: lengths match
results = model(
    source=["img1.jpg", "img2.jpg"],
    user_prompt=["Prompt 1", "Prompt 2"]
)

# Bad: one prompt for two images
results = model(
    source=["img1.jpg", "img2.jpg"],
    user_prompt=["Prompt 1"]  # raises ValueError
)
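If you intend the same prompt to apply to every image, replicate it yourself before the call. The helper below is a hypothetical convenience, not part of the SDK:

```python
def broadcast_prompt(prompt, sources):
    # Replicate a single prompt so its length always matches sources.
    if isinstance(prompt, str):
        return [prompt] * len(sources)
    if len(prompt) != len(sources):
        raise ValueError(
            f"Got {len(prompt)} prompts for {len(sources)} sources"
        )
    return list(prompt)

# Usage (hypothetical):
# results = model(
#     source=images,
#     user_prompt=broadcast_prompt("Describe this image", images),
# )
```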

Result issues

No grounded phrases

Symptoms: result.grounded_phrases attribute is missing or bounding boxes are not returned.

Check the model's task type:

info = ViModel.inspect(run_id="your-run-id")
print(f"Task type: {info.task_type}")

Not all models support phrase grounding. If info.task_type is VQA, the model will not return bounding boxes. Load a model trained for phrase grounding, or omit the prompt to let the model use its default detection behavior.

Also check that you are accessing the correct field:

from vi.inference.task_types.phrase_grounding import PhraseGroundingResponse

result, error = model(source="image.jpg")

if error is None and isinstance(result, PhraseGroundingResponse):
    print(f"Found {len(result.result.groundings)} objects")
    for grounding in result.result.groundings:
        print(f"  {grounding.phrase}: {grounding.grounding}")

Unexpected output format

Symptom: Accessing result.caption or result.grounded_phrases raises AttributeError.

The response field names depend on the response type. Always use isinstance() and access the correct fields:

from vi.inference.task_types.vqa import VQAResponse
from vi.inference.task_types.phrase_grounding import PhraseGroundingResponse

result, error = model(source="image.jpg")

if error is None:
    if isinstance(result, VQAResponse):
        print(f"Answer: {result.result.answer}")
    elif isinstance(result, PhraseGroundingResponse):
        print(f"Caption: {result.result.sentence}")
    else:
        print(f"Raw output: {result.result}")

See complete prediction schemas →

Poor quality results

Symptom: Predictions are inaccurate or vague.

Lower the temperature for factual tasks:

result, error = model(
    source="image.jpg",
    user_prompt="Describe this image",
    generation_config={
        "temperature": 0.0,
        "max_new_tokens": 256,
        "do_sample": False
    }
)

Write more specific prompts:

# Too vague
"Tell me about this"

# Specific
"What objects are visible in this image and where are they located?"

Check training completion:

client = vi.Client()
run = client.runs.get(run_id="your-run-id")
print(f"Status: {run.status.phase}")
# "completed" means training and export are done

Permission errors

Access denied

Symptoms:

PermissionError: [Errno 13] Permission denied
HTTPError: 403 Forbidden
Verify your API credentials can access the organization:

import vi

client = vi.Client()
org = client.organizations
print(f"Organization: {org.name}")

Check local file and directory permissions:

import os

file_path = "image.jpg"
if os.access(file_path, os.R_OK):
    print("File is readable")
else:
    print("Cannot read file, check permissions")

output_dir = "./results"
os.makedirs(output_dir, exist_ok=True)

Debugging tips

Enable detailed logging

import logging

logging.basicConfig(level=logging.DEBUG)

model = ViModel(run_id="your-run-id")
result, error = model(source="image.jpg")

Check system resources

import psutil
import torch

cpu_percent = psutil.cpu_percent()
ram_percent = psutil.virtual_memory().percent

print(f"CPU: {cpu_percent}%")
print(f"RAM: {ram_percent}%")

if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        allocated = torch.cuda.memory_allocated(i) / 1e9
        reserved = torch.cuda.memory_reserved(i) / 1e9
        print(f"GPU {i}: Allocated: {allocated:.2f} GB, Reserved: {reserved:.2f} GB")

Build a minimal reproducible example

When reporting an issue, include a minimal script that shows the problem:

from vi.inference import ViModel
import vi, sys, torch

print(f"Vi SDK: {vi.__version__}")
print(f"Python: {sys.version}")
print(f"PyTorch: {torch.__version__}")
print(f"CUDA: {torch.version.cuda if torch.cuda.is_available() else 'N/A'}")

model = ViModel(run_id="your-run-id")
result, error = model(source="test.jpg")

if error:
    print(f"Error: {type(error).__name__}: {error}")
else:
    print(f"Success: {str(result.result)[:50]}...")

Getting help

If these steps don't resolve your issue:

1. Check result.raw_output for the full unprocessed model output
2. Review the Vi SDK changelog for known issues
3. Ask in the Datature community
4. Contact Datature support

When reaching out, include your Vi SDK version, Python version, GPU details, the complete error message, and a minimal reproducible script.

Related resources

Improve Performance

Memory management, quantization, GPU utilization, and batching strategies.

Load Models

Quantization options, device mapping, caching, and loading error handling.

Prediction Schemas

Field names and types for VQAResponse, PhraseGroundingResponse, and GenericResponse.