Concepts
Understand the core vision-language model concepts that power Datature Vi. Start here if you're new to VLMs or computer vision.
Datature Vi is a platform for building custom vision-language models (VLMs): models that read an image and text together, for example to answer an inspection question or place a box on a defect. You upload images, annotate them, fine-tune a model, and run inference. VLMOps is the name for that full lifecycle in one product, from labels to trained weights to shipped predictions, much as teams use MLOps for traditional models.
If you are not technical, start with What Is Datature Vi? before this hub. The pages below explain how Datature Vi implements VLMs and their related settings at varying depth, so you can stop once you have enough context.
New to VLMs?
What Is Datature Vi?
Learn what the platform does, who it's for, and how the workflow works.
What Are Vision-Language Models?
Understand how VLMs work, what tokens and parameters are, and the difference between LoRA and full fine-tuning.
Context Windows and Token Budgets
How images and text share a fixed token budget. Why image resolution affects output length.
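The budget arithmetic behind that card can be sketched in a few lines. This is illustrative only: the context window and per-image token counts below are assumed numbers, not Datature Vi's actual limits.

```python
# Illustrative token-budget arithmetic (all numbers are assumptions,
# not Datature Vi's actual limits).
def remaining_output_tokens(context_window, image_tokens, prompt_tokens):
    """Tokens left for the model's answer after the inputs are counted."""
    return max(context_window - image_tokens - prompt_tokens, 0)

# A higher-resolution image consumes more of the shared budget,
# leaving fewer tokens for the generated answer.
low_res = remaining_output_tokens(4096, image_tokens=256, prompt_tokens=64)
high_res = remaining_output_tokens(4096, image_tokens=1024, prompt_tokens=64)
print(low_res, high_res)  # 3776 3008
```

The point of the sketch: image and text tokens draw from the same fixed pool, so a larger image directly shrinks the space available for the model's response.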
Dataset types
Choose a dataset type based on your task. This determines your annotation format, model output, and evaluation metrics.
Phrase Grounding
Locate objects with natural language. The model draws a box on the image for phrases like 'the red car on the left.'
Visual Question Answering
Answer questions about images in natural language. Ask 'Is there a defect?' and get a text response.
Freeform Text
Define custom annotation schemas for specialized use cases and research projects.
How Do I Choose a Dataset Type?
Compare phrase grounding, VQA, and freeform text to find the right fit for your task.
Training, evaluation, and inference
Understand how Datature Vi trains models, what the settings mean, how to assess results, and how inference generates output.
When you pick a base model by name (for example Qwen3.5 4B vs 9B, or NVILA-Lite for tight memory), the smaller option is usually the right first experiment: lower cost and faster runs while you prove the workflow. Move up when metrics stop improving or when deployment constraints point to a specific architecture. Compare options in Model architectures.
System Prompts
How to instruct your model: role, focus, output format, and hallucination guards.
Annotation Guide
What annotations are, how to create good ones, and how to enable chain-of-thought reasoning.
How Does VLM Training Work?
Epochs, batch size, learning rate, loss curves, and validation splits in plain language.
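A quick way to connect epochs and batch size is the step count of a run. The numbers here are made up for illustration; actual defaults in Datature Vi may differ.

```python
import math

# Rough training-run arithmetic (illustrative numbers only).
def total_steps(num_examples, batch_size, epochs):
    """One step = one batch of examples; one epoch = one full pass over the dataset."""
    return math.ceil(num_examples / batch_size) * epochs

print(total_steps(1_000, batch_size=8, epochs=3))  # 125 steps/epoch * 3 = 375
```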
LoRA and Quantization
How LoRA reduces training cost and quantization shrinks memory. NF4, FP4, QLoRA explained.
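The cost saving from LoRA comes down to simple arithmetic: instead of updating a full weight matrix, it trains two small low-rank factors. A back-of-envelope sketch (the 4096 dimension and rank 16 are assumed values, not the platform's settings):

```python
# Back-of-envelope LoRA parameter count for one weight matrix.
# A real adapter targets several matrices per layer; this is a sketch.
def lora_params(d_in, d_out, rank):
    """LoRA trains two factors: A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                        # full fine-tune: every weight updates
lora = lora_params(4096, 4096, rank=16)   # LoRA: only the low-rank factors update
print(full, lora)  # 16777216 vs 131072 trainable parameters
```

That is roughly a 128x reduction in trainable parameters for this one matrix, which is why LoRA runs fit on much smaller GPUs; quantization (NF4, FP4) then shrinks the frozen base weights in memory, and combining the two is what QLoRA refers to.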
How Do I Evaluate My Model?
IoU, F1, BLEU, BERTScore, and what good scores look like for phrase grounding and VQA.
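Of these metrics, IoU (intersection over union) is the easiest to show concretely: it scores how well a predicted box overlaps a ground-truth box. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` form:

```python
# Minimal IoU for axis-aligned boxes (x1, y1, x2, y2), the overlap
# metric used to score phrase-grounding predictions.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)  # 0 if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap; a common convention treats a prediction as correct when IoU clears a threshold such as 0.5.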
How Does Inference Work?
Token generation, temperature, top-p, top-k, and how to control model output.
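Temperature and top-p can be demystified with a toy sampler. This is a sketch of the general decoding technique, not Datature Vi's actual inference code, and the logits are made-up values:

```python
import math
import random

# Toy next-token sampler showing how temperature and top-p reshape
# the output distribution (illustrative, not the platform's decoder).
def sample(logits, temperature=1.0, top_p=1.0):
    scaled = [l / temperature for l in logits]     # low temp sharpens, high temp flattens
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]      # softmax (shifted for stability)
    total = sum(probs)
    probs = [p / total for p in probs]
    ranked = sorted(enumerate(probs), key=lambda kv: -kv[1])
    kept, mass = [], 0.0
    for i, p in ranked:                            # top-p: keep the smallest set of
        kept.append((i, p))                        # tokens whose mass reaches top_p
        mass += p
        if mass >= top_p:
            break
    r = random.uniform(0, mass)                    # draw from the truncated set
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]

token = sample([2.0, 1.0, 0.1], temperature=0.7, top_p=0.9)
```

Lowering temperature or top-p makes output more deterministic (good for structured answers); raising them increases variety, at the cost of more hallucination risk.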
Deployment and resources
Advanced capabilities
Reference
Next steps
