Concepts

Understand the core vision-language model concepts that power Datature Vi. Start here if you're new to VLMs or computer vision.

Datature Vi is a platform for building custom vision-language models (VLMs): models that read an image and text together (for example, answering an inspection question or placing a box on a defect). You upload images, annotate them, fine-tune a model, and run inference. VLMOps names that full lifecycle in one product, from labels to trained weights to shipped predictions, much as MLOps describes the equivalent lifecycle for traditional models.

If you are not technical, start with What Is Datature Vi? before this hub. The pages below explain how Datature Vi implements VLMs and the related settings, at increasing depth, so you can stop once you have enough context.

Start here for a product-oriented path: platform basics, data, training, then shipping and billing. Skip token budgets and training math until you need them.

  1. What Is Datature Vi? - Platform overview and workflow
  2. What Are Vision-Language Models? - How VLMs work
  3. How Do I Choose a Dataset Type? - Pick the right task type
  4. Annotation Guide - Create effective training data
  5. What Are System Prompts? - How to instruct your model
  6. How Does VLM Training Work? - Training settings and loss curves
  7. How Do I Evaluate My Model? - Metrics and benchmarks
  8. How Does Inference Work? - Generation settings and output control
  9. How Do I Deploy My Trained Model? - Deployment paths and production
  10. How Do Data Rows and Compute Credits Work? - Resource planning and costs

Read these after the first-pass list when you are sizing inputs, choosing training-efficiency techniques, or adding step-by-step reasoning.

  1. What Are Context Windows and Token Budgets? - How images and text share token capacity
  2. How Do LoRA and Quantization Work? - Training efficiency techniques
  3. Chain-of-Thought Reasoning - Step-by-step reasoning for complex tasks

New to VLMs?

What Is Datature Vi?

Learn what the platform does, who it's for, and how the workflow works.

What Are Vision-Language Models?

Understand how VLMs work, what tokens and parameters are, and the difference between LoRA and full fine-tuning.
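The LoRA versus full fine-tuning distinction comes down to parameter counts. A minimal sketch of the arithmetic for a single weight matrix (the hidden size and rank below are illustrative values, not Datature Vi defaults):

```python
# Full fine-tuning updates every entry of a d x d weight matrix,
# while LoRA trains two low-rank factors A (d x r) and B (r x d),
# i.e. 2 * r * d trainable parameters per adapted matrix.
def full_finetune_params(d: int) -> int:
    return d * d

def lora_params(d: int, r: int) -> int:
    return 2 * r * d

d, r = 4096, 16                   # hidden size and LoRA rank (illustrative)
full = full_finetune_params(d)    # 16_777_216
lora = lora_params(d, r)          # 131_072
print(f"LoRA trains {lora / full:.2%} of this matrix's parameters")
# → LoRA trains 0.78% of this matrix's parameters
```

The ratio shrinks as the hidden size grows, which is why LoRA makes fine-tuning large models tractable on modest hardware.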

Context Windows and Token Budgets

How images and text share a fixed token budget, and why higher image resolution leaves fewer tokens for output.
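The budget-sharing idea is simple subtraction. A minimal sketch, assuming illustrative numbers (the context size and per-image token counts below are not Datature Vi's actual encoder settings):

```python
# Illustrative token-budget arithmetic: image tokens, prompt tokens,
# and output tokens all draw from one fixed context window.
CONTEXT_WINDOW = 8192  # total tokens the model can attend to (assumed)

def remaining_output_tokens(image_tokens: int, prompt_tokens: int) -> int:
    """Tokens left for the model's answer after the image and prompt."""
    return max(CONTEXT_WINDOW - image_tokens - prompt_tokens, 0)

# A higher-resolution image encodes to more tokens, squeezing the answer:
low_res = remaining_output_tokens(image_tokens=576, prompt_tokens=200)
high_res = remaining_output_tokens(image_tokens=4096, prompt_tokens=200)
print(low_res, high_res)  # → 7416 3896
```

This is why cropping or downscaling inputs can be the fix when long answers get truncated.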


Dataset types

Choose a dataset type based on your task. This determines your annotation format, model output, and evaluation metrics.


Training, evaluation, and inference

Understand how Datature Vi trains models, what the settings mean, how to assess results, and how inference generates output.

When you pick a base model by name (for example Qwen3.5 4B vs 9B, or NVILA-Lite for tight memory), the smaller option is usually the right first experiment: lower cost and faster runs while you prove the workflow. Move up when metrics stop improving or when deployment constraints point to a specific architecture. Compare options in Model architectures.


Deployment and resources

How Do I Deploy My Trained Model?

Compare Vi SDK, NVIDIA NIM, and self-hosted options for getting your model into production.

Data Rows and Compute Credits

What consumes each resource, how to estimate costs, and what happens at limits.


Advanced capabilities

Chain-of-Thought Reasoning

Break complex visual tasks into step-by-step reasoning processes for more accurate and explainable results.


Reference

Glossary

Definitions of key terms used across the Datature Vi platform and documentation.


Next steps

Create A Dataset

Set up a dataset for phrase grounding, VQA, or freeform text tasks.

Follow The Quickstart

Complete a VLM workflow from data upload to model deployment in 30 minutes.

Vi SDK Reference

Run inference, manage datasets, and automate workflows with the Python SDK.