Concepts
Understand the core vision-language model concepts that power Datature Vi. Start here if you're new to VLMs or computer vision.
Datature Vi is a platform for building custom vision-language models (VLMs): models that read an image and text together, for example to answer an inspection question or place a box on a defect. You upload images, annotate them, fine-tune a model, and run inference. VLMOps is the name for that full lifecycle in one product, from labels to trained weights to shipped predictions, much as teams use MLOps for traditional models.
If you are not technical, start with What Is Datature Vi? before this hub. The pages below explain how Datature Vi implements VLMs and their related settings at varying depth, so you can stop once you have enough context.
New to VLMs?
What Is Datature Vi?
Learn what the platform does, who it's for, and how the workflow works.
What Are Vision-Language Models?
Understand how VLMs work, what tokens and parameters are, and the difference between LoRA and full fine-tuning.
Context Windows and Token Budgets
How images and text share a fixed token budget. Why image resolution affects output length.
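The budget arithmetic behind that card can be sketched in a few lines. This is illustrative only: the context window and per-image token counts below are assumed numbers, not Datature Vi's actual limits.

```python
# Illustrative token-budget arithmetic (all numbers are assumptions,
# not Datature Vi's actual limits).
def remaining_output_tokens(context_window, image_tokens, prompt_tokens):
    """Tokens left for the model's answer after the inputs are counted."""
    return max(context_window - image_tokens - prompt_tokens, 0)

# A higher-resolution image consumes more of the shared budget,
# leaving fewer tokens for the generated answer.
low_res = remaining_output_tokens(4096, image_tokens=256, prompt_tokens=64)
high_res = remaining_output_tokens(4096, image_tokens=1024, prompt_tokens=64)
print(low_res, high_res)  # 3776 3008
```

The point of the sketch: image and text tokens draw from the same fixed pool, so a larger image directly shrinks the space available for the model's response.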
Dataset types
Choose a dataset type based on your task. This determines your annotation format, model output, and evaluation metrics.
Phrase Grounding
Locate objects with natural language. The model draws a box on the image for phrases like 'the red car on the left.'
Visual Question Answering
Answer questions about images in natural language. Ask 'Is there a defect?' and get a text response.
Freeform Text
Define custom annotation schemas for specialized use cases and research projects.
How Do I Choose a Dataset Type?
Compare phrase grounding, VQA, and freeform text to find the right fit for your task.
Training, evaluation, and inference
Understand how Datature Vi trains models, what the settings mean, how to assess results, and how inference generates output.
When you pick a base model by name (for example Qwen3.5 4B vs 9B, or NVILA-Lite for tight memory), the smaller option is usually the right first experiment: lower cost and faster runs while you prove the workflow. Move up when metrics stop improving or when deployment constraints point to a specific architecture. Compare options in Model architectures.
System Prompts
How to instruct your model: role, focus, output format, and hallucination guards.
Annotation Guide
What annotations are, how to create good ones, and how to enable chain-of-thought reasoning.
How Does VLM Training Work?
Epochs, batch size, learning rate, loss curves, and validation splits in plain language.
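A quick way to connect epochs and batch size is the step count of a run. The numbers here are made up for illustration; actual defaults in Datature Vi may differ.

```python
import math

# Rough training-run arithmetic (illustrative numbers only).
def total_steps(num_examples, batch_size, epochs):
    """One step = one batch of examples; one epoch = one full pass over the dataset."""
    return math.ceil(num_examples / batch_size) * epochs

print(total_steps(1_000, batch_size=8, epochs=3))  # 125 steps/epoch * 3 = 375
```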
LoRA and Quantization
How LoRA reduces training cost and quantization shrinks memory. NF4, FP4, QLoRA explained.
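The cost saving from LoRA comes down to simple arithmetic: instead of updating a full weight matrix, it trains two small low-rank factors. A back-of-envelope sketch (the 4096 dimension and rank 16 are assumed values, not the platform's settings):

```python
# Back-of-envelope LoRA parameter count for one weight matrix.
# A real adapter targets several matrices per layer; this is a sketch.
def lora_params(d_in, d_out, rank):
    """LoRA trains two factors: A (d_in x rank) and B (rank x d_out)."""
    return rank * (d_in + d_out)

full = 4096 * 4096                        # full fine-tune: every weight updates
lora = lora_params(4096, 4096, rank=16)   # LoRA: only the low-rank factors update
print(full, lora)  # 16777216 vs 131072 trainable parameters
```

That is roughly a 128x reduction in trainable parameters for this one matrix, which is why LoRA runs fit on much smaller GPUs; quantization (NF4, FP4) then shrinks the frozen base weights in memory, and combining the two is what QLoRA refers to.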
How Do I Evaluate My Model?
IoU, F1, BLEU, BERTScore, and what good scores look like for phrase grounding and VQA.
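Of these metrics, IoU (intersection over union) is the easiest to show concretely: it scores how well a predicted box overlaps a ground-truth box. A minimal sketch for axis-aligned boxes in `(x1, y1, x2, y2)` form:

```python
# Minimal IoU for axis-aligned boxes (x1, y1, x2, y2), the overlap
# metric used to score phrase-grounding predictions.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])   # intersection corners
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)  # 0 if boxes don't overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```

An IoU of 1.0 means a perfect match and 0.0 means no overlap; a common convention treats a prediction as correct when IoU clears a threshold such as 0.5.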
How Does Inference Work?
Token generation, temperature, top-p, top-k, and how to control model output.
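Temperature and top-p can be demystified with a toy sampler. This is a sketch of the general decoding technique, not Datature Vi's actual inference code, and the logits are made-up values:

```python
import math
import random

# Toy next-token sampler showing how temperature and top-p reshape
# the output distribution (illustrative, not the platform's decoder).
def sample(logits, temperature=1.0, top_p=1.0):
    scaled = [l / temperature for l in logits]     # low temp sharpens, high temp flattens
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]      # softmax (shifted for stability)
    total = sum(probs)
    probs = [p / total for p in probs]
    ranked = sorted(enumerate(probs), key=lambda kv: -kv[1])
    kept, mass = [], 0.0
    for i, p in ranked:                            # top-p: keep the smallest set of
        kept.append((i, p))                        # tokens whose mass reaches top_p
        mass += p
        if mass >= top_p:
            break
    r = random.uniform(0, mass)                    # draw from the truncated set
    for i, p in kept:
        r -= p
        if r <= 0:
            return i
    return kept[-1][0]

token = sample([2.0, 1.0, 0.1], temperature=0.7, top_p=0.9)
```

Lowering temperature or top-p makes output more deterministic (good for structured answers); raising them increases variety, at the cost of more hallucination risk.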
Deployment and resources
Advanced capabilities
Reference
Next steps
