What Are Annotations and How Do I Create Good Ones?

Annotations are the labels that teach your VLM what to look for and how to respond. Each annotation pairs an image with text: a descriptive phrase linked to a bounding box (phrase grounding), a question-answer pair (VQA), or custom text (freeform). Datature Vi uses your annotations as training signal, so annotation quality directly determines model quality. This page covers the three annotation types, what separates good annotations from bad ones, how to add chain-of-thought reasoning to your training data, and which formats you can import.


Annotation types in Datature Vi

Datature Vi supports three annotation types, each designed for a different task. Choose the one that matches what you want the model to do.

For step-by-step annotation instructions, see Annotate for Phrase Grounding, Annotate for VQA, or Annotate for Freeform Text.

Quality vs quantity

Fifty accurate, specific annotations outperform 500 inconsistent ones. Datature Vi learns patterns from your training data, and noisy labels teach noisy patterns. A single annotation that says "box" gives the model almost nothing to work with. An annotation that says "the small red box on the top shelf" teaches the model about color, size, position, and the relationship between the object and its surroundings.

Consistency matters just as much as specificity. If you label the same type of defect as "scratch" in half your images and "surface damage" in the other half, the model has to learn two representations for the same concept. Pick one term and use it everywhere.
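One way to catch inconsistent terminology before training is to count the labels in your dataset after mapping known variants to a single term. The sketch below is illustrative only: `CANONICAL_TERMS`, `normalize_label`, and `label_counts` are hypothetical names, not part of Datature Vi, and the synonym map is something you would build for your own vocabulary.

```python
from collections import Counter

# Hypothetical synonym map: every variant points at the one term you chose to keep.
CANONICAL_TERMS = {
    "surface damage": "scratch",
    "scratches": "scratch",
}


def normalize_label(label: str) -> str:
    """Lowercase, collapse whitespace, and map known variants to the canonical term."""
    cleaned = " ".join(label.lower().split())
    return CANONICAL_TERMS.get(cleaned, cleaned)


def label_counts(labels: list[str]) -> Counter:
    """Count labels after normalization so leftover variants stand out."""
    return Counter(normalize_label(label) for label in labels)


labels = ["scratch", "Surface damage", "scratch ", "dent", "Scratches"]
print(label_counts(labels))
```

Any term that appears in the counts but not in your agreed vocabulary is a candidate for renaming before the next training run.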

The table below shows recommended annotation volumes for each task type. Start with the minimum to validate your approach, then scale up once you confirm the model is learning the right patterns.

Annotation volume guidelines

Task type | Minimum | Recommended | Production

Annotation quality checklist

Before starting a training run, review a random sample of 20-30 annotations against this checklist. Catching problems before training saves hours of wasted compute.
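If your annotations are available as a list (for example, loaded from an export file), the random sample can be drawn reproducibly. This is a minimal sketch, assuming each annotation is a plain dictionary; `sample_for_review` is a hypothetical helper, not a Datature Vi function.

```python
import random


def sample_for_review(annotations, k=25, seed=0):
    """Return up to k annotations chosen uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so every reviewer sees the same sample
    k = min(k, len(annotations))  # small datasets: review everything
    return rng.sample(annotations, k)


# Placeholder data standing in for real annotation records.
annotations = [{"id": i, "text": f"annotation {i}"} for i in range(200)]
review_set = sample_for_review(annotations)
print(len(review_set))
```

Sampling at random, rather than reviewing the first images in the dataset, avoids judging quality from whichever batch happened to be annotated first.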

Check | What to look for | Why it matters

Common annotation mistakes

These patterns appear often and degrade model performance. Each one is fixable by editing your annotations before retraining.

How to enable chain-of-thought reasoning in annotations

For complex reasoning tasks, you can include step-by-step reasoning in your VQA annotations. Prepend <datature_think> tags to the annotation answer text. During training, Datature Vi converts these to the model's native <think> tags, teaching the model to reason step by step before producing a final answer.
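As a rough illustration, the helper below assembles an answer string with the reasoning placed ahead of the final answer. The exact tag layout is an assumption: this sketch wraps the reasoning in a `<datature_think>` and `</datature_think>` pair, which this page does not spell out, so confirm the expected format in Chain-of-Thought Reasoning before using it. `build_cot_answer` is a hypothetical name.

```python
def build_cot_answer(reasoning: str, answer: str) -> str:
    """Place the reasoning, wrapped in <datature_think> tags, before the answer.

    Assumption: the reasoning is enclosed by an opening and a closing tag.
    """
    return f"<datature_think>{reasoning.strip()}</datature_think>{answer.strip()}"


text = build_cot_answer(
    "The label shows an expiry date of 2021, which is in the past.",
    "Expired",
)
print(text)
```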

At inference time, the model outputs <think>...</think> and <answer>...</answer> tags. The Vi SDK parses these into separate thinking and answer fields on the response object, so you can access the reasoning and the final answer independently.
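To show what that separation amounts to, here is a small standalone sketch that splits a raw output string on the two tag pairs. It is not the Vi SDK's implementation, and `split_reasoning` is a hypothetical name; in practice you would read the thinking and answer fields from the SDK response instead.

```python
import re
from typing import Optional, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (thinking, answer); either is None when its tag pair is missing."""
    think = THINK_RE.search(raw)
    answer = ANSWER_RE.search(raw)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )


raw = "<think>The label shows 2021.</think><answer>Expired</answer>"
thinking, answer = split_reasoning(raw)
print(thinking)
print(answer)
```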

For more on how chain-of-thought reasoning works and when to use it, see Chain-of-Thought Reasoning. For generation parameter tuning during inference, see Configure Generation. For cot, stream, and related call options, see Run Inference.

Annotation formats for upload

If you have existing annotations from another tool or platform, Datature Vi accepts several standard formats. The format you choose depends on where your annotations came from.

Format | Best for | Notes

Automatic Coordinate Conversion

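Datature Vi converts coordinate formats automatically during upload, so you do not need to convert between them yourself. The three formats differ as follows: COCO stores each box in absolute pixels as the top-left corner plus width and height, YOLO stores the box center plus width and height as fractions of the image size, and Pascal VOC stores the absolute pixel coordinates of the top-left and bottom-right corners.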
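For reference, the relationship between the three box conventions can be sketched in a few lines. These functions only illustrate the arithmetic; they are hypothetical helpers, not the conversion code Datature Vi runs during upload.

```python
def coco_to_voc(box):
    """COCO [x_min, y_min, width, height] in pixels -> VOC [x_min, y_min, x_max, y_max]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def voc_to_yolo(box, img_w, img_h):
    """VOC pixel corners -> YOLO [cx, cy, w, h], each normalized to the 0-1 range."""
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    ]


def yolo_to_voc(box, img_w, img_h):
    """YOLO normalized [cx, cy, w, h] -> VOC pixel corners."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * img_w,
        (cy - h / 2) * img_h,
        (cx + w / 2) * img_w,
        (cy + h / 2) * img_h,
    ]


# A 200x100 box whose top-left corner is at (100, 50) in a 400x200 image.
voc = coco_to_voc([100, 50, 200, 100])
yolo = voc_to_yolo(voc, img_w=400, img_h=200)
print(voc, yolo)
```

Note that YOLO coordinates cannot be converted without the image dimensions, which is why image width and height must be available for any YOLO import.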

For the full format specification with examples, see Upload Annotations.

AI-assisted annotation with IntelliScribe

Datature Vi includes IntelliScribe, an AI-assisted annotation tool that speeds up phrase grounding and freeform image annotation. IntelliScribe auto-generates captions and links phrases to bounding boxes. Press C to generate a caption for the current image, then press P to auto-link phrases to your drawn bounding boxes.

IntelliScribe works best on clear images with common objects. For specialized domains, treat the generated caption as a starting draft and edit it with domain-specific terminology before running phrase linking.

Learn more about AI-assisted annotation tools