VQA
Add question-answer pairs to your images to train a vision-language model for visual question answering tasks.
Before you start, make sure you have:
- A visual question answering dataset with uploaded images
- A list of the questions you want your model to answer
Visual question answering (VQA) annotations teach your vision-language model (VLM) to answer specific questions about images. Each annotation is a question-answer pair: you write both the question and the correct answer, and your model learns from those examples. This guide walks through creating VQA annotations in Datature Vi.
Open the annotator
Go to your dataset, then click the Annotator tab. Click any image thumbnail in the bottom strip to load it.
Open the Annotator tab

From the Dataset Overview page, click the Annotator tab to open the labeling interface.

Your annotations are ready when you see the annotation count matching the image count and heatmaps showing annotation patterns.
Keyboard shortcuts
Question types
Train a well-rounded model by mixing question types. Each type teaches the model a different kind of visual reasoning.
Writing effective questions
Good questions are clear, specific, and have verifiable answers. These patterns produce well-structured annotations.
Good examples:
Avoid these patterns:
Keep answers short. The model learns better from "Yes" than "Yes, there are three scratches on the left side near the edge." Aim for answers under five words.
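If you draft question-answer pairs outside the annotator, a quick script can flag answers that run past the five-word guideline before you enter them. The sketch below is illustrative only: the `long_answers` helper, the tuple layout, and the sample question are assumptions made for this example, not part of Datature Vi.

```python
def long_answers(pairs, max_words=4):
    """Return the (question, answer) pairs whose answer has more than max_words words.

    max_words=4 reflects the "under five words" guideline above.
    """
    return [(q, a) for q, a in pairs if len(a.split()) > max_words]


# Hypothetical drafts; the question text is made up for illustration.
pairs = [
    ("Is the surface scratched?", "Yes"),
    ("Is the surface scratched?",
     "Yes, there are three scratches on the left side near the edge."),
]

flagged = long_answers(pairs)
for question, answer in flagged:
    print(f"Too long ({len(answer.split())} words): {answer!r}")
```

Here only the second draft is flagged. Words are counted by splitting on whitespace, which may differ slightly from the counts shown in the annotator.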
Edit or delete annotations
To edit a question-answer pair: Click the pair in the Visual Question Answering panel to focus it. Click the pair again to enter edit mode. Make your changes; they save automatically.
To delete a question-answer pair: Click the three-dot menu on the right side of the pair, then click Delete. Confirm the deletion.
Deleted question-answer pairs cannot be recovered. Export your dataset regularly as a backup.
Track annotation progress
The annotator's bottom-right corner shows word and character counts for the current image. Click any question-answer pair to see counts for that specific pair, which is useful for spotting answers that are too long.
For dataset-wide statistics, open the Annotations tab in the Explorer. It shows total annotated images, total question-answer pairs, and average questions per image.
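To make the three statistics concrete, here is a minimal sketch of how they could be computed from a local copy of your annotations. The dictionary layout and the `annotation_stats` helper are assumptions for illustration and do not reflect Datature Vi's export format. The average is taken over annotated images only, which is also an assumption.

```python
def annotation_stats(annotations):
    """Summarize annotations given as {image_name: [(question, answer), ...]}."""
    # Only images with at least one question-answer pair count as annotated.
    annotated = {img: qa for img, qa in annotations.items() if qa}
    total_pairs = sum(len(qa) for qa in annotated.values())
    average = total_pairs / len(annotated) if annotated else 0.0
    return {
        "annotated_images": len(annotated),
        "total_pairs": total_pairs,
        "avg_questions_per_image": average,
    }


# Hypothetical data: two annotated images and one without annotations.
annotations = {
    "img_001.jpg": [("Is there a dent?", "No"), ("How many bolts?", "Four")],
    "img_002.jpg": [("Is there a dent?", "Yes")],
    "img_003.jpg": [],
}

stats = annotation_stats(annotations)
print(stats)
```

For this sample the result is 2 annotated images, 3 question-answer pairs, and an average of 1.5 questions per image.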
How many annotations do you need?
- Minimum: 50 question-answer pairs across 20+ images
- Recommended: 200+ pairs across 50+ images
- Good coverage: 500+ pairs with mixed question types
View the dataset overview for detailed analytics.
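The thresholds above can be expressed as a simple check. This is a rough sketch: the `coverage_tier` function and its `mixed_types` flag are hypothetical. It also assumes that "20+" and "50+" mean "at least", and that the top tier depends only on pair count and question-type mix, because the guideline gives no image count for it.

```python
def coverage_tier(num_pairs, num_images, mixed_types=False):
    """Map dataset counts to the guideline tiers listed above."""
    if num_pairs >= 500 and mixed_types:
        return "good coverage"
    if num_pairs >= 200 and num_images >= 50:
        return "recommended"
    if num_pairs >= 50 and num_images >= 20:
        return "minimum"
    return "below minimum"


print(coverage_tier(60, 25))                      # minimum
print(coverage_tier(250, 60))                     # recommended
print(coverage_tier(600, 80, mixed_types=True))   # good coverage
```

Whether your question types are mixed is a judgment you make yourself, so the function takes it as an input instead of trying to infer it.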
Chain-of-thought reasoning
