VQA

Before You Start

- A visual question answering dataset with uploaded images
- A list of the questions you want your model to answer

Visual question answering (VQA) annotations teach your vision-language model (VLM) to answer specific questions about images. Each annotation is a question-answer pair: you write both the question and the correct answer, and your model learns from those examples. This guide walks through creating VQA annotations in Datature Vi.
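One way to picture the data behind each annotation is the Python sketch below. The class and field names (`QAPair`, `AnnotatedImage`, `image_name`) and the sample file name are hypothetical, chosen only for illustration. This is not Datature Vi's storage or export format.

```python
from dataclasses import dataclass, field


@dataclass
class QAPair:
    """One VQA annotation: a question and the correct answer you wrote for it."""
    question: str
    answer: str


@dataclass
class AnnotatedImage:
    """An image together with its question-answer pairs (hypothetical layout)."""
    image_name: str
    pairs: list[QAPair] = field(default_factory=list)


# Illustrative data only; the file name and questions are made up.
example = AnnotatedImage(
    image_name="part_0001.jpg",
    pairs=[
        QAPair("Is there a scratch on the surface?", "Yes"),
        QAPair("How many scratches are visible?", "Three"),
    ],
)

for pair in example.pairs:
    print(f"Q: {pair.question}  A: {pair.answer}")
```

Each image can carry several pairs, so a small set of images can still yield many training examples.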

Open the annotator

Go to your dataset, then click the Annotator tab. Click any image thumbnail in the bottom strip to load it.

Open the Annotator tab

From the Dataset Overview page, click the Annotator tab to open the labeling interface.

You should see

Figure: Dataset Overview showing image and annotation counts, with heatmaps of annotation patterns.

Your annotations are ready when the annotation count matches the image count and the heatmaps show annotation patterns.

Keyboard shortcuts

Key | Action

Question types

Train a well-rounded model by mixing question types. Each type teaches the model a different kind of visual reasoning.

Writing effective questions

Good questions are clear, specific, and have verifiable answers. These patterns produce well-structured annotations.

Good examples

Use case | Question | Answer

Avoid these patterns

Bad question | Problem | Better version

Keep answers short. The model learns better from "Yes" than "Yes, there are three scratches on the left side near the edge." Aim for answers under five words.
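If you draft answers outside the annotator, a quick length check can flag the long ones before you enter them. The sketch below is a minimal Python example. The function name `is_short_answer` and the whitespace-based word count are assumptions, not part of Datature Vi.

```python
# "Under five words" from the guidance above means at most four words.
MAX_ANSWER_WORDS = 4


def is_short_answer(answer: str, max_words: int = MAX_ANSWER_WORDS) -> bool:
    """Return True when the answer has at most max_words whitespace-separated words."""
    return len(answer.split()) <= max_words


short = "Yes"
long_answer = "Yes, there are three scratches on the left side near the edge."

print(is_short_answer(short))        # True
print(is_short_answer(long_answer))  # False
```

Splitting on whitespace is a rough proxy for word count, which is enough to catch sentence-length answers.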

Edit or delete annotations

To edit a question-answer pair: Click the pair in the Visual Question Answering panel to focus it. Click the pair again to enter edit mode. Make your changes; they save automatically.

To delete a question-answer pair: Click the three-dot menu on the right side of the pair, then click Delete. Confirm the deletion.

Deletions Are Immediate

Deleted question-answer pairs cannot be recovered. Export your dataset regularly as a backup.

Track annotation progress

The annotator's bottom-right corner shows word and character counts for the current image. Click any question-answer pair to see counts for that specific pair, which is useful for spotting answers that are too long.

For dataset-wide statistics, open the Annotations tab in the Explorer. It shows total annotated images, total question-answer pairs, and average questions per image.
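The same three statistics can be reproduced on your own records. The sketch below assumes a hypothetical mapping from image name to a list of (question, answer) tuples. It also assumes the average is taken over annotated images only, which may differ from how the Explorer computes it.

```python
def summarize(pairs_per_image: dict[str, list[tuple[str, str]]]) -> dict[str, float]:
    """Compute annotated-image count, total pairs, and average pairs per annotated image."""
    # Only images with at least one question-answer pair count as annotated.
    annotated = [pairs for pairs in pairs_per_image.values() if pairs]
    total_pairs = sum(len(pairs) for pairs in annotated)
    average = total_pairs / len(annotated) if annotated else 0.0
    return {
        "annotated_images": len(annotated),
        "total_pairs": total_pairs,
        "avg_questions_per_image": average,
    }


# Illustrative data only.
sample = {
    "a.jpg": [("Is there a scratch?", "Yes"), ("How many scratches?", "Three")],
    "b.jpg": [("Is there a scratch?", "No")],
    "c.jpg": [],  # not yet annotated
}

print(summarize(sample))
```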

How many annotations do you need?

Minimum: 50 question-answer pairs across 20+ images
Recommended: 200+ pairs across 50+ images
Good coverage: 500+ pairs with mixed question types
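The thresholds above can be expressed as a small check. This Python sketch uses a hypothetical function name, `coverage_tier`, and treats the tiers as cumulative. The guide gives no image count for the top tier, so the sketch reuses the recommended 50+ images there, which is an assumption.

```python
def coverage_tier(num_pairs: int, num_images: int, mixed_types: bool = False) -> str:
    """Map dataset size to the tiers listed above (tiers assumed cumulative)."""
    meets_recommended = num_pairs >= 200 and num_images >= 50
    # Assumption: good coverage also requires the recommended image count.
    if num_pairs >= 500 and mixed_types and meets_recommended:
        return "good coverage"
    if meets_recommended:
        return "recommended"
    if num_pairs >= 50 and num_images >= 20:
        return "minimum"
    return "below minimum"


print(coverage_tier(60, 25))
print(coverage_tier(520, 80, mixed_types=True))
```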

View the dataset overview for detailed analytics.
