Annotate For VQA

Create Visual Question Answering annotations by adding question-answer pairs to your images.

Visual Question Answering (VQA) annotations teach your VLM to understand and answer questions about images. Create comprehensive question-answer datasets using Datature's visual annotator.

📋

Prerequisites

Create a dataset if you don't have one yet.

💡

What is VQA?

Visual Question Answering allows you to ask questions about images and get answers from your model. Examples: "What color is the car?", "How many defects are visible?", "Is this product damaged?"

Learn more about VQA →


Open the annotator

  1. Navigate to your dataset and click the Annotator tab
Annotator tab in dataset navigation
  2. The annotator will open with empty views

  3. Click on an image thumbnail at the bottom to load it

📘

Annotator interface

Annotator interface

The VQA annotator has four main areas:

  • Left side — Image Annotator (displays your selected image)
  • Right side — Visual Question Answering panel (shows question-answer pairs for the selected image)
  • Bottom — Image thumbnail navigation strip (click any thumbnail to load that image)
  • Bottom right — Word and character count statistics for your annotations

Create VQA annotations

VQA annotations consist of question-answer pairs that teach your model to understand and respond to visual queries. Each image can have multiple questions, and each question should have a clear, concise answer.

Add a question-answer pair

  1. On the right side, in the Visual Question Answering panel, click + Add Question
VQA question field
  2. Type your question in the Question field

  3. Type the answer in the Answer field

  4. Click Add to save the question-answer pair

The annotation appears in the Visual Question Answering panel on the right. You can edit or delete it later if needed.

Word and character count tracking:

The bottom right corner displays the total word and character count for all question-answer pairs on the current image. This helps you:

  • Monitor annotation length across your dataset
  • Ensure consistency in answer detail
  • Track overall annotation volume

To view individual QA pair statistics:

  • Click on any question-answer pair in the right panel
  • The bottom right counter updates to show words and characters for that specific pair
  • This helps identify verbose answers that should be kept concise
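The same word and character statistics can be reproduced offline, for example when reviewing exported annotations. A minimal sketch, assuming a simple question/answer record layout for illustration (not Datature's actual export schema):

```python
# Sketch: compute word and character counts for QA pairs, mirroring the
# annotator's bottom-right statistics. The pair structure below is an
# assumption for illustration, not a documented export format.
def pair_stats(question: str, answer: str) -> dict:
    text = f"{question} {answer}"
    return {"words": len(text.split()), "chars": len(text)}

def image_stats(pairs: list[dict]) -> dict:
    totals = {"words": 0, "chars": 0}
    for p in pairs:
        s = pair_stats(p["question"], p["answer"])
        totals["words"] += s["words"]
        totals["chars"] += s["chars"]
    return totals

pairs = [
    {"question": "How many defects are visible?", "answer": "3"},
    {"question": "Is the product acceptable?", "answer": "No"},
]
print(image_stats(pairs))  # total counts across both pairs
```

Per-pair output from `pair_stats` corresponds to the individual counter shown when you click a single QA pair; `image_stats` corresponds to the total shown by default.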

Best practices for VQA annotations

  • Be specific — Ask clear, focused questions (see Write effective questions)
  • Keep answers concise — Short answers work better for training (monitor using the word count)
  • Stay consistent — Use similar phrasing across similar questions
  • Vary question types — Include different question types (counting, colors, presence/absence, descriptions)

Review Annotation best practices for comprehensive guidelines.

Add multiple questions per image

You can add multiple question-answer pairs to the same image:

  1. After adding your first question-answer pair, click + Add Question again

  2. Enter another question and answer

  3. Click Add to save

  4. Repeat as needed

Example for a manufacturing image:

  • Q: "How many defects are visible?" → A: "3"
  • Q: "What type of defect is this?" → A: "Scratch"
  • Q: "Is the product acceptable?" → A: "No"

See Question types to include for more examples and guidance.
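If you keep working notes or plan to prepare annotations outside the annotator, the manufacturing example above can be written down as simple per-image records. The layout here is hypothetical, chosen for readability, and is not Datature's import/export schema:

```python
import json

# Hypothetical record layout for the manufacturing example above;
# the platform's actual schema may differ.
record = {
    "image": "part_0042.jpg",  # assumed filename for illustration
    "qa_pairs": [
        {"question": "How many defects are visible?", "answer": "3"},
        {"question": "What type of defect is this?", "answer": "Scratch"},
        {"question": "Is the product acceptable?", "answer": "No"},
    ],
}
print(json.dumps(record, indent=2))
```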

💡

How many questions per image?

Recommended: 2-5 question-answer pairs per image

More questions = more comprehensive training data. Vary the question types to teach your model different aspects of visual understanding.

Edit or delete annotations

You can modify or remove annotations at any time. This is useful for correcting mistakes, improving question clarity, or removing low-quality annotations.

To edit a question-answer pair:

  1. In the Visual Question Answering panel on the right, locate the question-answer pair you want to edit

  2. Click the question-answer pair to focus it (the word and character count will update on the bottom right to show this pair's statistics)

  3. Click the edit icon or the question-answer pair itself to open the edit mode

  4. Make your changes to improve clarity or accuracy (see Write effective questions)

  5. Changes save automatically as you type

The word and character count updates in real-time as you edit. Your edits are tracked in your annotation statistics.

To delete a question-answer pair:

  1. In the Visual Question Answering panel, locate the question-answer pair you want to remove

  2. Click the three-dot menu (⋮) on the right side of the question-answer pair

  3. Click Delete

  4. Confirm the deletion

❗️

Deletions are immediate

Deleted annotations cannot be recovered. Make sure you want to remove the annotation before confirming. Use filters to review annotations before deletion.

When to edit vs. delete:

  • Edit — Fix typos, improve clarity, correct answers, or rephrase questions
  • Delete — Remove duplicate questions, unclear questions, or annotations that don't follow your annotation best practices

Navigate between images

Efficient navigation helps you annotate large datasets quickly. Use keyboard shortcuts or mouse controls to move between images while maintaining your annotation workflow.

Move to the next image

After adding question-answer pairs, move to the next image:

Using keyboard (recommended for speed):

  • Press E → Next image
  • Press Q → Previous image

Using mouse:

  • Click the next image thumbnail in the bottom navigation strip
  • Click on any specific image thumbnail at the bottom to jump directly to it

The bottom thumbnail strip shows all images in your dataset with "Annotated" badges indicating which images already have question-answer pairs.

Pro tip: Use keyboard shortcuts to maintain annotation speed without switching between keyboard and mouse.

Save your work

Your annotations save automatically as you work.

Auto-save enabled

All changes save automatically and appear immediately in the Visual Question Answering panel on the right. You don't need to manually save your work. Use keyboard shortcuts to navigate efficiently through the bottom thumbnail strip.


Keyboard shortcuts

Speed up annotation with keyboard shortcuts. Master these to significantly improve your annotation efficiency.

Essential shortcuts

  • E — Next image (after adding questions to the current image)
  • Q — Previous image (review or edit previous annotations)
  • Tab — Move between input fields (quickly switch from Question to Answer)
  • Enter — Submit form (add the annotation without clicking the Add button)
  • ? — Show all shortcuts (view the complete keyboard shortcut reference)

Workflow optimization

Efficient annotation pattern:

  1. View image → Type question → Press Tab → Type answer → Press Enter
  2. Press E to move to next image
  3. Repeat

Alternative workflow:

  • Press Q to go back to previous images for review
  • Press E to continue forward through your dataset

This workflow keeps your hands on the keyboard for maximum speed. See Annotation best practices for more efficiency tips.

💡

Pro tip

Press ? in the annotator to see the complete list of keyboard shortcuts. Different annotation tools may have additional shortcuts.


Question writing guidelines

Writing effective questions is critical for training accurate VQA models. Well-structured questions lead to consistent answers and better model performance.

Write effective questions

Good VQA questions are clear, answerable, and useful for your use case. See Question types to include for specific categories and examples.

Good examples:

  • Manufacturing QC — "Is this part defective?" → "Yes"
  • Retail inventory — "How many items are on the shelf?" → "12"
  • Agriculture — "What is the crop condition?" → "Healthy"
  • Medical imaging — "Is there an abnormality present?" → "No"

Avoid these patterns:

  • "What do you see?" — too vague → "What type of vehicle is this?"
  • "Is this good or bad or maybe damaged?" — multiple questions in one → split into separate questions
  • "Describe everything in detail" — too broad → "What is the primary defect type?"

See Common questions for guidance on answer formatting and consistency.

Question types to include

Train a well-rounded model by including diverse question types. Use a mix of these categories across your dataset for comprehensive VQA training.

Counting questions

Format: "How many [objects] are visible?"

Counting questions teach your model to quantify objects or features in images. These are essential for inventory, quality control, and monitoring applications.

Examples:

  • "How many defects are present?" → "3"
  • "How many people are in the image?" → "7"
  • "How many red items are on the shelf?" → "12"
  • "How many complete products are visible?" → "5"
  • "How many safety violations are shown?" → "0"

Answer format: Use numbers only ("3", "0", "15")

Best practices:

  • Be specific about what to count
  • Use "0" for none, not "None" or "Zero"
  • For large numbers, ensure accuracy by counting carefully
  • Consider combining with presence/absence questions for validation

See Write effective questions for clarity guidelines.
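The "numbers only" rule above is easy to verify mechanically when reviewing exported counting answers. A minimal sketch (the answer list is illustrative):

```python
# Sketch: flag counting answers that aren't plain digits ("3", "0", "15").
# Catches variants like "None", "Zero", or "three" that break the
# numbers-only answer format.
def invalid_count_answers(answers: list[str]) -> list[str]:
    return [a for a in answers if not a.strip().isdigit()]

print(invalid_count_answers(["3", "0", "None", "three", "15"]))
# flags "None" and "three"
```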

Yes/No questions

Format: "Is [condition] true?"

Yes/No questions are simple binary classifications that help models make quick decisions. These are ideal for quality control, compliance checks, and condition assessment.

Examples:

  • "Is the product damaged?" → "Yes"
  • "Is the workspace clean?" → "No"
  • "Is protective equipment worn?" → "Yes"
  • "Is packaging intact?" → "Yes"
  • "Is there visible rust?" → "No"

Answer format: Use "Yes" or "No" consistently (not "yes"/"no", "Y"/"N", or "True"/"False")

Best practices:

  • Make questions unambiguous — avoid "maybe" situations
  • Phrase positively when possible ("Is X present?" vs. "Is X absent?")
  • Keep questions focused on one condition
  • Add multiple questions for complex scenarios

Pair with attribute questions for more detailed information.
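The consistency rule for Yes/No answers ("Yes"/"No" only, never "yes", "Y", or "True") can likewise be checked in a quick review pass. A minimal sketch with illustrative data:

```python
# Sketch: verify Yes/No answers use the exact strings "Yes"/"No",
# flagging casing and format variants.
def nonstandard_yes_no(answers: list[str]) -> list[str]:
    return [a for a in answers if a not in ("Yes", "No")]

print(nonstandard_yes_no(["Yes", "no", "Y", "No", "True"]))
# flags "no", "Y", and "True"
```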

Attribute questions

Format: "What [attribute] is the [object]?"

Attribute questions help models identify specific characteristics like color, size, shape, material, or condition. These are valuable for detailed object description and classification.

Examples:

  • "What color is the vehicle?" → "Blue"
  • "What size is the defect?" → "Large"
  • "What material is the surface?" → "Metal"
  • "What shape is the container?" → "Rectangular"
  • "What is the lighting condition?" → "Bright"

Answer format: Single word or short phrase ("Blue", "Large", "Metal")

Common attribute types:

  • Color: "Red", "Blue", "Green", "Silver"
  • Size: "Small", "Medium", "Large"
  • Shape: "Round", "Square", "Irregular"
  • Material: "Metal", "Plastic", "Wood", "Glass"
  • Condition: "New", "Worn", "Damaged"

Best practices:

  • Use consistent vocabulary across your dataset
  • Define size categories clearly for your team (see collaborative annotation)
  • Standardize color names (e.g., always "Blue" not "Light blue" or "Navy")
  • See Write effective questions for consistency guidelines
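A controlled vocabulary per attribute makes the standardization rules above enforceable. A minimal sketch, with an assumed vocabulary and illustrative answers:

```python
# Sketch: check attribute answers against a controlled vocabulary so the
# dataset uses one standard term per attribute (e.g. always "Blue",
# never "Navy" or "Light blue"). The vocabulary below is an example.
VOCAB = {
    "color": {"Red", "Blue", "Green", "Silver"},
    "size": {"Small", "Medium", "Large"},
}

def off_vocabulary(attribute: str, answers: list[str]) -> list[str]:
    allowed = VOCAB[attribute]
    return [a for a in answers if a not in allowed]

print(off_vocabulary("color", ["Blue", "Navy", "Light blue", "Red"]))
# flags "Navy" and "Light blue"
```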

Category questions

Format: "What type of [object] is this?"

Category questions teach models to classify objects or conditions into predefined groups. These are essential for sorting, routing, and decision-making applications.

Examples:

  • "What type of defect is visible?" → "Scratch"
  • "What category does this product belong to?" → "Electronics"
  • "What kind of damage occurred?" → "Water damage"
  • "What vehicle type is shown?" → "Sedan"
  • "What equipment is this?" → "Forklift"

Answer format: Category name ("Scratch", "Electronics", "Water damage")

Best practices:

  • Define your categories clearly before annotating
  • Use mutually exclusive categories when possible
  • Maintain a category list for consistency
  • For overlapping categories, use multiple questions

Category planning:

  1. List all possible categories for your use case
  2. Create examples of each category
  3. Share with your team if annotating collaboratively
  4. Review annotation statistics to ensure balanced coverage

Combine with Yes/No questions for hierarchical classification.
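Step 4 of the category-planning checklist (reviewing balance) can be approximated by tallying category answers across the dataset. A minimal sketch with illustrative answers:

```python
from collections import Counter

# Sketch: tally category answers across a dataset to spot imbalanced
# coverage before training.
def category_counts(answers: list[str]) -> Counter:
    return Counter(answers)

counts = category_counts(["Scratch", "Dent", "Scratch", "Scratch", "Discoloration"])
print(counts.most_common())  # "Scratch" dominates this toy dataset
```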

Presence/Absence questions

Format: "Is [object] present in the image?"

Presence/absence questions focus on detecting specific objects or features regardless of quantity or attributes. Similar to Yes/No questions but specifically about object existence.

Examples:

  • "Is a safety helmet visible?" → "Yes"
  • "Is packaging intact?" → "No"
  • "Is the label present?" → "Yes"
  • "Is there any text in the image?" → "Yes"
  • "Is liquid leaking?" → "No"

Answer format: "Yes" or "No" consistently

When to use:

  • Safety and compliance checks
  • Quality control verification
  • Completeness validation
  • Component detection

Presence/Absence vs. Counting:

  • Use presence/absence when you only care if something exists
  • Use counting questions when quantity matters
  • Combine both for comprehensive coverage

Example combination:

  • "Is protective equipment visible?" → "Yes" (presence)
  • "How many people are wearing helmets?" → "3" (counting)

See Write effective questions to ensure clarity between presence and counting.

Complete annotation strategies

Apply question types strategically based on your use case. Here are proven annotation patterns for common scenarios.

Manufacturing quality control

Goal: Detect and classify defects for automated quality inspection

Question strategy:

  1. Presence/Absence: "Is this product defective?" → "Yes"/"No"
  2. Counting: "How many defects are visible?" → "2"
  3. Category: "What type of defect is this?" → "Scratch"
  4. Attribute: "What is the defect severity?" → "Minor"
  5. Yes/No: "Should this be rejected?" → "Yes"

Why this works: Covers detection → classification → decision-making workflow

Retail inventory and shelf monitoring

Goal: Track products, stock levels, and planogram compliance

Question strategy:

  1. Counting: "How many items are on the shelf?" → "15"
  2. Category: "What product category is displayed?" → "Beverages"
  3. Presence/Absence: "Is the price tag visible?" → "Yes"
  4. Yes/No: "Is the shelf fully stocked?" → "No"
  5. Attribute: "What is the stock level?" → "Low"

Why this works: Enables inventory tracking and compliance verification

Safety and compliance monitoring

Goal: Verify safety equipment usage and workplace compliance

Question strategy:

  1. Counting: "How many people are in the workspace?" → "4"
  2. Counting: "How many people are wearing helmets?" → "3"
  3. Presence/Absence: "Is safety signage visible?" → "Yes"
  4. Yes/No: "Is the area compliant?" → "No"
  5. Category: "What type of violation is shown?" → "Missing PPE"

Why this works: Combines counting with compliance assessment

Agricultural monitoring

Goal: Assess crop health, growth stage, and pest detection

Question strategy:

  1. Attribute: "What is the crop condition?" → "Healthy"
  2. Presence/Absence: "Are pests visible?" → "No"
  3. Category: "What growth stage is shown?" → "Flowering"
  4. Yes/No: "Does the crop need irrigation?" → "Yes"
  5. Attribute: "What is the canopy color?" → "Green"

Why this works: Monitors multiple health indicators simultaneously

Use these patterns as templates and adapt to your specific needs. Track your progress to ensure balanced coverage across question types.


Track annotation progress

Monitor your annotation progress to ensure you have sufficient training data. Track both quantity (number of annotations) and quality (diverse question types).

View annotation statistics

Check your progress from the dataset Explorer page:

  1. Click the Annotations tab

  2. View statistics:

    • Total annotated images
    • Total question-answer pairs
    • Average questions per image
    • Distribution by question type

View dataset insights for detailed analytics on your annotations.
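The Explorer statistics above (annotated images, total pairs, average per image) can also be derived from your own annotation records when working offline. A minimal sketch, assuming an illustrative per-image structure rather than the platform's export format:

```python
# Sketch: derive Explorer-style statistics from a list of per-image
# annotations. The input structure is an assumption for illustration.
def dataset_stats(images: list[dict]) -> dict:
    annotated = [img for img in images if img["qa_pairs"]]
    total_pairs = sum(len(img["qa_pairs"]) for img in annotated)
    return {
        "annotated_images": len(annotated),
        "total_pairs": total_pairs,
        "avg_per_image": total_pairs / len(annotated) if annotated else 0.0,
    }

images = [
    {"image": "a.jpg", "qa_pairs": [{"q": "Is it damaged?", "a": "No"}] * 2},
    {"image": "b.jpg", "qa_pairs": [{"q": "How many?", "a": "3"}] * 4},
    {"image": "c.jpg", "qa_pairs": []},  # not yet annotated
]
print(dataset_stats(images))
```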

💡

How many annotations do I need?

Minimum: 50 question-answer pairs across 20+ images

Recommended: 200+ question-answer pairs across 50+ images

Optimal: 500+ question-answer pairs with diverse question types

More diverse questions = better model understanding. See Write effective questions for quality guidelines.

Filter annotated images

In the Annotator:

  • Check the bottom thumbnail navigation strip to see which images have annotations (marked with "Annotated" badges)
  • Monitor word and character count on the bottom right corner:
    • Total count — Shows combined statistics for all QA pairs on the current image
    • Individual count — Click any QA pair to see its specific word and character count
  • Quickly identify images that need additional questions

Use the word count feature to ensure answer consistency across your dataset.

In the Explorer:

  1. Click the Assets tab

  2. Use filters to show:

    • Images with annotations
    • Images without annotations
    • Images with specific answer values

Learn more about dataset management


Troubleshooting

Common issues and solutions when creating VQA annotations.

My annotations aren't saving

Possible causes and solutions:

  1. Network connectivity — Check your internet connection. Annotations require an active connection to save.

  2. Session timeout — If you've been idle, refresh the page and reopen the annotator.

  3. Browser issues — Clear cache or try a different browser. Supported browsers: Chrome, Firefox, Safari, Edge.

  4. Form incomplete — Ensure both Question and Answer fields are filled before clicking Add.

Verify your annotations saved by confirming the question-answer pair appears in the Visual Question Answering panel on the right.

I can't find the annotator

Solution steps:

  1. Navigate to your dataset from the main Datasets page

  2. Click the Annotator tab in the left sidebar (see Open the annotator)

  3. If the tab is missing or the annotator won't open, refresh the page and confirm you are viewing the correct dataset

The annotator interface shows your image on the left, question-answer pairs on the right, and thumbnails at the bottom.

The interface is slow or lagging

Performance optimization:

  1. Large images — Images over 10 MB may load slowly. Upload optimized images when possible.

  2. Browser resources — Close unnecessary tabs and applications.

  3. Use keyboard shortcuts — Keyboard navigation is faster than mouse clicks.

  4. Batch workflow — Annotate similar images together to reduce context switching.

My question-answer pairs disappeared

Recovery steps:

  1. Check the image — Ensure you're viewing the correct image. Check the bottom thumbnail strip to confirm which image is selected.

  2. Check the right panel — Question-answer pairs appear in the Visual Question Answering panel on the right side. Scroll down if you have many annotations.

  3. Check filters — Remove filters that might be hiding annotations.

  4. Review edit history — If annotations were deleted, they cannot be recovered. Re-annotate the image.

  5. Browser cache — Try refreshing the page or clearing your browser cache.

Prevention: Use annotation statistics to regularly verify your progress. The bottom thumbnail strip shows "Annotated" badges on images with question-answer pairs.

I accidentally deleted annotations

Unfortunately, deletions cannot be undone. Best practices to prevent this:

  1. Double-check before deleting — Review annotations carefully before clicking the three-dot menu and selecting Delete

  2. Use edit instead — Edit annotations rather than deleting and recreating (edits save automatically)

  3. Export backups — Regularly download your dataset as a backup

  4. Team coordination — If annotating collaboratively, communicate before bulk deletions

How do I handle special characters in questions or answers?

Special characters are supported:

  • Standard punctuation — Question marks, commas, periods work normally
  • Numbers — Use digits for counting questions ("3", not "three")
  • Unicode — Non-English characters and symbols are supported
  • Avoid — Excessive special characters may confuse the model

Best practice: Keep answers concise with minimal special characters for better model training.


Common questions

Should I ask the same questions for every image?

It depends on your use case:

Same questions — Best when you want the model to answer consistent questions across different images (e.g., "Is this product defective?")

Varied questions — Best when training a general-purpose VQA model that needs to handle diverse questions

Recommended approach: Use a core set of questions for all images, plus additional specific questions where relevant.

Review Question types to include to plan your annotation strategy and Write effective questions for consistency guidelines.

How specific should my answers be?

Keep answers concise:

  • ✅ Good: "3", "Red", "Damaged", "Top-left corner"
  • ❌ Too detailed: "There are three defects visible on the surface, including two scratches and one dent located near the edge"

Why shorter is better:

  • Easier for the model to learn
  • Faster to annotate
  • More consistent across annotators
  • Better training results

Monitor answer length:

Use the word and character count displayed on the bottom right of the annotator:

  • Click any QA pair to see its individual word count
  • Compare counts across your annotations to ensure consistency
  • Aim for answers with fewer than 5 words when possible

For specific formats by question type, see Question types to include.
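The "fewer than 5 words" guideline above can be checked in bulk when reviewing answers. A minimal sketch with illustrative data:

```python
# Sketch: flag answers longer than a word budget (here, 5 words) so
# verbose answers can be shortened for more consistent training data.
def verbose_answers(answers: list[str], max_words: int = 5) -> list[str]:
    return [a for a in answers if len(a.split()) > max_words]

print(verbose_answers([
    "3",
    "Scratch",
    "There are three defects visible on the surface near the edge",
]))  # only the long sentence is flagged
```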

Can I use the same answer for different questions?

Yes! It's fine (and often correct) to have the same answer for different questions:

  • Q: "Is the product damaged?" → A: "Yes"
  • Q: "Is this part acceptable?" → A: "No"
  • Q: "Should this be rejected?" → A: "Yes"

The model learns to understand different questions that relate to the same visual information. This is common with Yes/No questions and Category questions.

You can add multiple questions to the same image with overlapping answers.

What if I don't know the answer to a question?

Best practices:

  1. Skip unclear images — If you genuinely can't determine the answer, navigate to the next image
  2. Use 'Unknown' — For questions where the answer isn't visible, you can use "Unknown" or "Not visible" consistently across your dataset
  3. Refine your questions — If many images are unclear, revisit your question writing strategy

Avoid guessing — Incorrect annotations hurt model performance more than missing annotations. Focus on effective questions that have clear, verifiable answers.

Can I annotate images collaboratively?

Yes! Multiple team members can annotate different images in the same dataset.

Pro tip: Create annotation guidelines for your team based on Question types to include and Write effective questions to ensure consistency.

Track annotation progress across your team using dataset insights.

Can I import existing VQA annotations?

Yes! You can upload pre-existing VQA annotations.

After importing, use Track annotation progress to verify your uploaded annotations.

How do I handle multiple objects in one image?

Approach 1 — General questions:

  • "How many objects are in the image?" → "6"
  • "Are all items intact?" → "Yes"

Approach 2 — Specific questions:

  • "What color is the largest item?" → "Red"
  • "Is the leftmost item damaged?" → "No"

Both approaches work — Choose based on your use case. You can add multiple questions per image to combine both approaches.

Review Write effective questions for guidance on clarity and consistency.


Annotation best practices

Follow these guidelines to create high-quality VQA annotations that improve model performance.

Consistency is key
  • Use the same phrasing for similar questions across images
  • Standardize answer formats ("Yes"/"No", digits for counts)
  • Maintain a shared vocabulary for attributes and categories
Quality over quantity
  • Accurate annotations are more valuable than many low-quality ones
  • Take time to write effective questions
  • Review and edit annotations when you spot errors
  • Use filters to review your work
  • Monitor word and character counts (bottom right) to maintain consistent answer length
Diverse coverage
  • Include various question types in your dataset
  • Cover different scenarios and edge cases
  • Add multiple questions per image when appropriate
  • Balance question difficulty (easy, medium, challenging)
Efficient workflow
  • Learn keyboard shortcuts to speed up annotation (use E and Q to navigate)
  • Use navigation features to move quickly between images
  • Click on QA pairs to check individual word counts and identify verbose answers
  • Track your progress to stay motivated
  • Take breaks to maintain annotation quality

What's next?

Annotations complete!

Your VQA annotations are ready for training. Your model will learn to answer these types of questions on new images.

Review Annotation best practices to ensure your dataset is optimized for training.


Related resources