What Is Datature Vi?

Datature Vi is a VLMOps platform for building custom vision-language models (VLMs) trained on your own data. You upload images, annotate them, fine-tune a VLM, and deploy it with the Vi SDK or NVIDIA NIM containers. Vi manages the GPU infrastructure and training pipeline so you can focus on your data.

New to AI? Start here.

You show Vi a few dozen labeled images of what you want it to find or answer, and Vi trains an AI model that can do the same thing on new images automatically. No coding required to get started.

Not sure where to begin? Follow the 30-minute quickstart.


Why build a custom model instead of using ChatGPT or Gemini?

General-purpose AI models can describe images, but production workflows need results that are reliable, consistent, and specific to your domain.

Domain knowledge: general-purpose AI (ChatGPT, Gemini) is trained on generic web data; a custom VLM on Datature Vi is trained on YOUR labeled images.

Consistency at scale: general-purpose AI may vary between identical requests; a custom VLM returns the same output for the same visual pattern.

Data control: general-purpose AI sends your images to a third-party API; with a custom VLM you download the weights and run them on your own servers.

Cost at volume: general-purpose AI's per-request pricing adds up; a custom VLM is a fixed cost regardless of volume.
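To make the cost-at-volume point concrete, here is a back-of-envelope comparison. The per-image price and the fixed hosting cost are illustrative assumptions for the example, not real pricing from any provider:

```python
# Illustrative break-even sketch for "cost at volume". The per-image
# API price and the fixed hosting cost are ASSUMPTIONS for the
# example, not actual pricing.

def api_cost(images: int, price_per_image: float = 0.01) -> float:
    """Total cost when every image goes through a pay-per-request API."""
    return images * price_per_image

def self_hosted_cost(monthly_fixed: float = 500.0) -> float:
    """Fixed monthly cost of running downloaded weights on your own server."""
    return monthly_fixed  # unchanged no matter how many images you process

for volume in (10_000, 50_000, 100_000):
    print(f"{volume:>7} images: API ${api_cost(volume):,.0f}"
          f" vs fixed ${self_hosted_cost():,.0f}")
```

Under these assumed numbers, the per-request API overtakes the fixed deployment somewhere between 10,000 and 100,000 images per month; the crossover point depends entirely on your actual prices and volume.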

What is a vision-language model?

A VLM takes an image and a text prompt as input and produces a text response. Traditional CV models detect objects from a fixed category list. VLMs understand both the image and what you're asking about it, in natural language.

For a full breakdown of how VLMs work, see What Are Vision-Language Models?.


What problems can Datature Vi solve?

Any task where a person looks at an image and makes a judgment call is a candidate for a custom VLM. Datature Vi supports three types of visual task, each suited to a different class of problem.

Phrase Grounding

Find and locate objects

Detect items by describing them in natural language. Use it for defect spotting on production lines, inventory counting on warehouse shelves, or identifying hazards in site photos.

VQA (Visual Question Answering)

Answer questions about images

Ask a question about an image and get a grounded answer. Use it for quality inspection reports, compliance checks, or triaging images before human review.

Freeform Text

Extract or generate text

Define any output format: medical reports, inspection checklists, structured JSON, or descriptions. The model produces your format from your training examples.
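As a conceptual sketch, the three task types differ mainly in the shape of their output. The dataclasses below are illustrative only and are not types from the Vi SDK:

```python
from dataclasses import dataclass

# Conceptual sketch only: these are NOT Vi SDK types, just a way to
# visualize what each task type takes in and returns.

@dataclass
class Box:
    # Bounding box in pixel coordinates.
    x: float
    y: float
    width: float
    height: float

@dataclass
class PhraseGroundingResult:
    phrase: str        # the natural-language description you searched for
    boxes: list[Box]   # one box per matched object

@dataclass
class VQAResult:
    question: str      # e.g. "Is the safety label legible?"
    answer: str        # grounded in the image content

@dataclass
class FreeformTextResult:
    text: str          # any format you trained on: JSON, report, checklist

result = PhraseGroundingResult(phrase="scratch", boxes=[Box(10, 20, 5, 5)])
print(f"{result.phrase}: {len(result.boxes)} match(es)")
```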


How does the workflow work?

You bring images and domain knowledge. Vi handles GPU infrastructure, training, and export. The four-step workflow:

1. Create a dataset and upload images

Choose a task type (phrase grounding, VQA, or freeform text) and upload your images. Datature Vi supports JPEG, PNG, TIFF, BMP, WebP, and more.

Create a dataset guide →

2. Annotate your images

Add labels that teach the model what to look for. For phrase grounding, draw bounding boxes and link them to text descriptions. For VQA, write question-answer pairs. For freeform text, write any text that describes the image.

Annotation guide →
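As a rough illustration, annotation records for the three task types might look like the following. The field names and layout are assumptions for the example, not Datature Vi's actual annotation schema or export format:

```python
# Illustrative annotation records, one per task type. Field names are
# ASSUMPTIONS for the example, not Datature Vi's actual schema.

phrase_grounding = {
    "image": "shelf_001.jpg",
    "annotations": [
        # Bounding box linked to a text description: x, y, width, height.
        {"phrase": "dented can", "bbox": [120, 45, 60, 80]},
    ],
}

vqa = {
    "image": "board_017.jpg",
    "question": "Are all solder joints intact?",
    "answer": "No, the joint at the top-left corner is cracked.",
}

freeform = {
    "image": "invoice_202.png",
    # Any target format you want the model to produce, e.g. JSON.
    "text": '{"vendor": "Acme Corp", "total": "1,240.00"}',
}

for record in (phrase_grounding, vqa, freeform):
    assert "image" in record
```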

3. Train a VLM

Configure a model architecture, set training parameters, and launch a training run. Datature Vi allocates GPU hardware, runs the training, and tracks metrics automatically.

Train a model guide →

4. Deploy and run inference

Download your trained model and run predictions on new images using the Vi SDK (local inference) or NVIDIA NIM containers (production deployment).

Deploy and test guide → · Vi SDK inference docs →


Key terms in plain language

If you are new to AI and VLMs, here are the terms you will see most often in these docs. Each is explained without technical jargon.

VLMOps
Short for "vision-language model operations." It describes the process of managing VLMs through their full lifecycle: preparing data, training, evaluating, and deploying. Think of it as the set of tools and workflows for building image-understanding AI.

Fine-tuning
Taking a pre-built AI model and teaching it your specific task by showing it your labeled images. The model already knows general things about images and language; fine-tuning teaches it YOUR domain (for example, what a defect looks like on YOUR product).

LoRA (low-rank adaptation)
A faster, cheaper way to fine-tune a model. Instead of updating all of the model's internal settings, LoRA updates a small add-on layer (about 1-5% of the total). This makes training faster and uses less GPU memory while producing similar results for most tasks.
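The "about 1-5%" figure comes from simple arithmetic: for a square weight matrix of width d, full fine-tuning updates all d × d parameters, while a rank-r adapter adds just two thin matrices of shapes (d, r) and (r, d). The layer width and rank below are illustrative, not tied to a specific model:

```python
# Why LoRA trains only a small fraction of the model. The layer width
# and adapter rank here are illustrative assumptions.

def lora_fraction(d: int, r: int) -> float:
    full = d * d          # parameters touched by full fine-tuning
    adapter = 2 * d * r   # parameters in the LoRA add-on matrices
    return adapter / full

# A 4096-wide layer with a rank-16 adapter trains under 1% of its weights.
print(f"{lora_fraction(4096, 16):.2%}")  # 0.78%
```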

Phrase grounding
A task type where the model finds objects in an image based on a text description and draws a box around them. "Find all scratches on this surface" would return the locations of each scratch.

Parameters
The internal settings a model learns during training. "4B" means 4 billion parameters. Larger models can learn more complex patterns but need more time and memory to train and run. For most tasks, start with a smaller model and scale up only if needed.

Inference
Running a trained model on a new image to get a prediction. After you train a model on defect images, running it on a new product photo to check for defects is inference.

Data rows and compute credits
Billing units in Datature Vi. A data row is consumed when you upload an image (5 data rows) or create an annotation (1 data row). A compute credit is consumed when you use GPU time for training. See Resource Usage for the full breakdown.
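The data-row rates above can be sanity-checked with a quick calculation:

```python
# Quick check of the billing rates above: 5 data rows per uploaded
# image, 1 data row per annotation.

def data_rows(images: int, annotations: int) -> int:
    return images * 5 + annotations * 1

# A 200-image dataset annotated with 3 annotations per image:
print(data_rows(images=200, annotations=200 * 3))  # 1600
```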

Epoch
One complete pass through your entire training dataset. If you train for 50 epochs, the model sees every image 50 times. More epochs can improve accuracy but risk overfitting (memorizing the training data instead of learning general patterns).

For a full glossary, see the searchable glossary.


Who is Datature Vi for?

Operations teams: Replace manual visual inspection. Train defect detection, inventory counting, and QC models in the browser. No code required.

Developers: Add visual AI via the Vi SDK. Object localization, visual Q&A, and text extraction through a Python API. Local or container deployment.

ML engineers: Fine-tune VLMs on domain data without managing GPU clusters. LoRA and full fine-tuning, side-by-side run comparison, downloadable weights.

Researchers: Experiment with VLM architectures, custom annotation schemas, and chain-of-thought reasoning. Freeform text datasets for novel tasks.

See it in action

These end-to-end guides walk through real industry workflows, from image collection to running inference. Each one is written for domain teams, not data scientists.

Manufacturing Inspection

Detect defects and surface anomalies on production line images.

Document Processing

Extract structured fields from invoices, receipts, and forms.

Agriculture

Assess crop health and detect disease from drone or ground-level images.

See all use cases →


Choose your starting point