What Is Datature Vi?

Datature Vi is a VLMOps platform for building custom vision-language models (VLMs) trained on your own data. You upload images, annotate them, fine-tune a VLM, and deploy it with the Vi SDK or NVIDIA NIM containers. Vi manages the GPU infrastructure and training pipeline so you can focus on your data.

New to AI? Start here.

You show Vi a few dozen labeled images of what you want it to find or answer, and Vi trains an AI model that can do the same thing on new images automatically. No coding required to get started.

Not sure where to begin? Follow the 30-minute quickstart.


Why build a custom model instead of using ChatGPT or Gemini?

General-purpose AI models can describe images, but production workflows need results that are reliable, consistent, and specific to your domain.

Domain knowledge: general-purpose AI (ChatGPT, Gemini) is trained on generic web data; a custom VLM on Datature Vi is trained on YOUR labeled images.

Consistency at scale: general-purpose AI may vary between identical requests; a custom VLM returns the same output for the same visual pattern.

Data control: general-purpose AI sends your images to a third-party API; with a custom VLM you download the weights and run them on your own servers.

Cost at volume: general-purpose AI's per-request pricing adds up; a custom VLM is a fixed cost regardless of volume.
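To make the cost-at-volume point concrete, here is a back-of-envelope comparison. The per-image price and the fixed hosting cost are illustrative assumptions for the example, not real pricing from any provider:

```python
# Illustrative break-even sketch for "cost at volume". The per-image
# API price and the fixed hosting cost are ASSUMPTIONS for the
# example, not actual pricing.

def api_cost(images: int, price_per_image: float = 0.01) -> float:
    """Total cost when every image goes through a pay-per-request API."""
    return images * price_per_image

def self_hosted_cost(monthly_fixed: float = 500.0) -> float:
    """Fixed monthly cost of running downloaded weights on your own server."""
    return monthly_fixed  # unchanged no matter how many images you process

for volume in (10_000, 50_000, 100_000):
    print(f"{volume:>7} images: API ${api_cost(volume):,.0f}"
          f" vs fixed ${self_hosted_cost():,.0f}")
```

Under these assumed numbers, the per-request API overtakes the fixed deployment somewhere between 10,000 and 100,000 images per month; the crossover point depends entirely on your actual prices and volume.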

What is a vision-language model?

A VLM takes an image and a text prompt as input and produces a text response. Traditional CV models detect objects from a fixed category list. VLMs understand both the image and what you're asking about it, in natural language.

For a full breakdown of how VLMs work, see What Are Vision-Language Models?.


What problems can Datature Vi solve?

Any task where a person looks at an image and makes a judgment call is a candidate for a custom VLM. Datature Vi supports three types of visual task, each suited to a different class of problem.

Phrase Grounding

Find and locate objects

Detect items by describing them in natural language. Use it for defect spotting on production lines, inventory counting on warehouse shelves, or identifying hazards in site photos.

VQA (Visual Question Answering)

Answer questions about images

Ask a question about an image and get a grounded answer. Use it for quality inspection reports, compliance checks, or triaging images before human review.

Freeform Text

Extract or generate text

Define any output format: medical reports, inspection checklists, structured JSON, or descriptions. The model produces your format from your training examples.
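As a conceptual sketch, the three task types differ mainly in the shape of their output. The dataclasses below are illustrative only and are not types from the Vi SDK:

```python
from dataclasses import dataclass

# Conceptual sketch only: these are NOT Vi SDK types, just a way to
# visualize what each task type takes in and returns.

@dataclass
class Box:
    # Bounding box in pixel coordinates.
    x: float
    y: float
    width: float
    height: float

@dataclass
class PhraseGroundingResult:
    phrase: str        # the natural-language description you searched for
    boxes: list[Box]   # one box per matched object

@dataclass
class VQAResult:
    question: str      # e.g. "Is the safety label legible?"
    answer: str        # grounded in the image content

@dataclass
class FreeformTextResult:
    text: str          # any format you trained on: JSON, report, checklist

result = PhraseGroundingResult(phrase="scratch", boxes=[Box(10, 20, 5, 5)])
print(f"{result.phrase}: {len(result.boxes)} match(es)")
```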


How does the workflow work?

You bring images and domain knowledge. Vi handles GPU infrastructure, training, and export. The four-step workflow:

1. Create a dataset and upload images

Choose a task type (phrase grounding, VQA, or freeform text) and upload your images. Datature Vi supports JPEG, PNG, TIFF, BMP, WebP, and more.

Create a dataset guide →

2. Annotate your images

Add labels that teach the model what to look for. For phrase grounding, draw bounding boxes and link them to text descriptions. For VQA, write question-answer pairs. For freeform text, write any text that describes the image.

Annotation guide →
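As a rough illustration, annotation records for the three task types might look like the following. The field names and layout are assumptions for the example, not Datature Vi's actual annotation schema or export format:

```python
# Illustrative annotation records, one per task type. Field names are
# ASSUMPTIONS for the example, not Datature Vi's actual schema.

phrase_grounding = {
    "image": "shelf_001.jpg",
    "annotations": [
        # Bounding box linked to a text description: x, y, width, height.
        {"phrase": "dented can", "bbox": [120, 45, 60, 80]},
    ],
}

vqa = {
    "image": "board_017.jpg",
    "question": "Are all solder joints intact?",
    "answer": "No, the joint at the top-left corner is cracked.",
}

freeform = {
    "image": "invoice_202.png",
    # Any target format you want the model to produce, e.g. JSON.
    "text": '{"vendor": "Acme Corp", "total": "1,240.00"}',
}

for record in (phrase_grounding, vqa, freeform):
    assert "image" in record
```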

3. Train a VLM

Configure a model architecture, set training parameters, and launch a training run. Datature Vi allocates GPU hardware, runs the training, and tracks metrics automatically.

Train a model guide →

4. Deploy and run inference

Download your trained model and run predictions on new images using the Vi SDK (local inference) or NVIDIA NIM containers (production deployment).

Deploy and test guide → · Vi SDK inference docs →


Key terms in plain language

If you are new to AI and VLMs, here are the terms you will see most often in these docs. Each is explained without technical jargon.

VLMOps
Short for "vision-language model operations." It describes the process of managing VLMs through their full lifecycle: preparing data, training, evaluating, and deploying. Think of it as the set of tools and workflows for building image-understanding AI.

Fine-tuning
Taking a pre-built AI model and teaching it your specific task by showing it your labeled images. The model already knows general things about images and language; fine-tuning teaches it YOUR domain (for example, what a defect looks like on YOUR product).

LoRA (low-rank adaptation)
A faster, cheaper way to fine-tune a model. Instead of updating all of the model's internal settings, LoRA updates a small add-on layer (about 1-5% of the total). This makes training faster and uses less GPU memory while producing similar results for most tasks.
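The "about 1-5%" figure comes from simple arithmetic: for a square weight matrix of width d, full fine-tuning updates all d × d parameters, while a rank-r adapter adds just two thin matrices of shapes (d, r) and (r, d). The layer width and rank below are illustrative, not tied to a specific model:

```python
# Why LoRA trains only a small fraction of the model. The layer width
# and adapter rank here are illustrative assumptions.

def lora_fraction(d: int, r: int) -> float:
    full = d * d          # parameters touched by full fine-tuning
    adapter = 2 * d * r   # parameters in the LoRA add-on matrices
    return adapter / full

# A 4096-wide layer with a rank-16 adapter trains under 1% of its weights.
print(f"{lora_fraction(4096, 16):.2%}")  # 0.78%
```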

Phrase grounding
A task type where the model finds objects in an image based on a text description and draws a box around them. "Find all scratches on this surface" would return the locations of each scratch.

Parameters
The internal settings a model learns during training. "4B" means 4 billion parameters. Larger models can learn more complex patterns but need more time and memory to train and run. For most tasks, start with a smaller model and scale up only if needed.

Inference
Running a trained model on a new image to get a prediction. After you train a model on defect images, running it on a new product photo to check for defects is inference.

Data rows and compute credits
Billing units in Datature Vi. A data row is consumed when you upload an image (5 data rows) or create an annotation (1 data row). A compute credit is consumed when you use GPU time for training. See Resource Usage for the full breakdown.
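The data-row rates above can be sanity-checked with a quick calculation:

```python
# Quick check of the billing rates above: 5 data rows per uploaded
# image, 1 data row per annotation.

def data_rows(images: int, annotations: int) -> int:
    return images * 5 + annotations * 1

# A 200-image dataset annotated with 3 annotations per image:
print(data_rows(images=200, annotations=200 * 3))  # 1600
```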

Epoch
One complete pass through your entire training dataset. If you train for 50 epochs, the model sees every image 50 times. More epochs can improve accuracy but risk overfitting (memorizing the training data instead of learning general patterns).

For a full glossary, see the searchable glossary.


Who is Datature Vi for?

Operations teams: Replace manual visual inspection. Train defect detection, inventory counting, and QC models in the browser. No code required.

Developers: Add visual AI via the Vi SDK. Object localization, visual Q&A, and text extraction through a Python API. Local or container deployment.

ML engineers: Fine-tune VLMs on domain data without managing GPU clusters. LoRA and full fine-tuning, side-by-side run comparison, downloadable weights.

Researchers: Experiment with VLM architectures, custom annotation schemas, and chain-of-thought reasoning. Freeform text datasets for novel tasks.

See it in action

These end-to-end guides walk through real industry workflows, from image collection to running inference. Each one is written for domain teams, not data scientists.

Manufacturing Inspection

Detect defects and surface anomalies on production line images.

Document Processing

Extract structured fields from invoices, receipts, and forms.

Agriculture

Assess crop health and detect disease from drone or ground-level images.

See all use cases →


Choose your starting point