Annotate For VQA

Create Visual Question Answering annotations by adding question-answer pairs to your images.

Visual Question Answering (VQA) annotations teach your VLM to understand and answer questions about images. Create comprehensive question-answer datasets using Datature's visual annotator.

📋

Prerequisites

Create a dataset if you don't have one yet.

💡

What is VQA?

Visual Question Answering allows you to ask questions about images and get answers from your model. Examples: "What color is the car?", "How many defects are visible?", "Is this product damaged?"

Learn more about VQA →


Open the annotator

  1. Navigate to your dataset and click the Annotator tab
Annotator tab in dataset navigation
  2. The annotator will open with empty views

  3. Click on an image thumbnail at the bottom to load it

📘

Annotator interface

Annotator interface

The VQA annotator has four main areas:

  • Left side — Image Annotator (displays your selected image)
  • Right side — Visual Question Answering panel (shows question-answer pairs for the selected image)
  • Bottom — Image thumbnail navigation strip (click any thumbnail to load that image)
  • Bottom right — Word and character count statistics for your annotations

Create VQA annotations

VQA annotations consist of question-answer pairs that teach your model to understand and respond to visual queries. Each image can have multiple questions, and each question should have a clear, concise answer.

Add a question-answer pair

  1. On the right side, in the Visual Question Answering panel, click + Add Question
VQA question field
  2. Type your question in the Question field

  3. Type the answer in the Answer field

  4. Click Add to save the question-answer pair

The annotation appears in the Visual Question Answering panel on the right. You can edit or delete it later if needed.

Word and character count tracking:

The bottom right corner displays the total word and character count for all question-answer pairs on the current image. This helps you:

  • Monitor annotation length across your dataset
  • Ensure consistency in answer detail
  • Track overall annotation volume

To view individual QA pair statistics:

  • Click on any question-answer pair in the right panel
  • The bottom right counter updates to show words and characters for that specific pair
  • This helps identify verbose answers that should be kept concise
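The same word and character statistics can be reproduced offline, for example when reviewing exported annotations. A minimal sketch, assuming a simple question/answer record layout for illustration (not Datature's actual export schema):

```python
# Sketch: compute word and character counts for QA pairs, mirroring the
# annotator's bottom-right statistics. The pair structure below is an
# assumption for illustration, not a documented export format.
def pair_stats(question: str, answer: str) -> dict:
    text = f"{question} {answer}"
    return {"words": len(text.split()), "chars": len(text)}

def image_stats(pairs: list[dict]) -> dict:
    totals = {"words": 0, "chars": 0}
    for p in pairs:
        s = pair_stats(p["question"], p["answer"])
        totals["words"] += s["words"]
        totals["chars"] += s["chars"]
    return totals

pairs = [
    {"question": "How many defects are visible?", "answer": "3"},
    {"question": "Is the product acceptable?", "answer": "No"},
]
print(image_stats(pairs))  # total counts across both pairs
```

Per-pair output from `pair_stats` corresponds to the individual counter shown when you click a single QA pair; `image_stats` corresponds to the total shown by default.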

Best practices for VQA annotations

  • Be specific — Ask clear, focused questions (see Write effective questions)
  • Keep answers concise — Short answers work better for training (monitor using the word count)
  • Stay consistent — Use similar phrasing across similar questions
  • Vary question types — Include different question types (counting, colors, presence/absence, descriptions)

Review Annotation best practices for comprehensive guidelines.

Add multiple questions per image

You can add multiple question-answer pairs to the same image:

  1. After adding your first question-answer pair, click + Add Question again

  2. Enter another question and answer

  3. Click Add to save

  4. Repeat as needed

Example for a manufacturing image:

  • Q: "How many defects are visible?" → A: "3"
  • Q: "What type of defect is this?" → A: "Scratch"
  • Q: "Is the product acceptable?" → A: "No"

See Question types to include for more examples and guidance.
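If you keep working notes or plan to prepare annotations outside the annotator, the manufacturing example above can be written down as simple per-image records. The layout here is hypothetical, chosen for readability, and is not Datature's import/export schema:

```python
import json

# Hypothetical record layout for the manufacturing example above;
# the platform's actual schema may differ.
record = {
    "image": "part_0042.jpg",  # assumed filename for illustration
    "qa_pairs": [
        {"question": "How many defects are visible?", "answer": "3"},
        {"question": "What type of defect is this?", "answer": "Scratch"},
        {"question": "Is the product acceptable?", "answer": "No"},
    ],
}
print(json.dumps(record, indent=2))
```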

💡

How many questions per image?

Recommended: 2-5 question-answer pairs per image

More questions = more comprehensive training data. Vary the question types to teach your model different aspects of visual understanding.

Edit or delete annotations

You can modify or remove annotations at any time. This is useful for correcting mistakes, improving question clarity, or removing low-quality annotations.

To edit a question-answer pair:

  1. In the Visual Question Answering panel on the right, locate the question-answer pair you want to edit

  2. Click the question-answer pair to focus it (the word and character count will update on the bottom right to show this pair's statistics)

  3. Click the edit icon or the question-answer pair itself to open the edit mode

  4. Make your changes to improve clarity or accuracy (see Write effective questions)

  5. Changes save automatically as you type

The word and character count updates in real-time as you edit. Your edits are tracked in your annotation statistics.

To delete a question-answer pair:

  1. In the Visual Question Answering panel, locate the question-answer pair you want to remove

  2. Click the three-dot menu (⋮) on the right side of the question-answer pair

  3. Click Delete

  4. Confirm the deletion

❗️

Deletions are immediate

Deleted annotations cannot be recovered. Make sure you want to remove the annotation before confirming. Use filters to review annotations before deletion.

When to edit vs. delete:

  • Edit — Fix typos, improve clarity, correct answers, or rephrase questions
  • Delete — Remove duplicate questions, unclear questions, or annotations that don't follow your annotation best practices

Navigate between images

Efficient navigation helps you annotate large datasets quickly. Use keyboard shortcuts or mouse controls to move between images while maintaining your annotation workflow.

Move to the next image

After adding question-answer pairs, move to the next image:

Using keyboard (recommended for speed):

  • Press E → Next image
  • Press Q → Previous image

Using mouse:

  • Click the next image thumbnail in the bottom navigation strip
  • Click on any specific image thumbnail at the bottom to jump directly to it

The bottom thumbnail strip shows all images in your dataset with "Annotated" badges indicating which images already have question-answer pairs.

Pro tip: Use keyboard shortcuts to maintain annotation speed without switching between keyboard and mouse.

Save your work

Your annotations save automatically as you work.

Auto-save enabled

All changes save automatically and appear immediately in the Visual Question Answering panel on the right. You don't need to manually save your work. Use keyboard shortcuts to navigate efficiently through the bottom thumbnail strip.


Keyboard shortcuts

Speed up annotation with keyboard shortcuts. Master these to significantly improve your annotation efficiency.

Essential shortcuts

  • E — Next image (after adding questions to the current image)
  • Q — Previous image (review or edit previous annotations)
  • Tab — Move between input fields (quickly switch from Question to Answer)
  • Enter — Submit form (add the annotation without clicking the Add button)
  • ? — Show all shortcuts (view the complete keyboard shortcut reference)

Workflow optimization

Efficient annotation pattern:

  1. View image → Type question → Press Tab → Type answer → Press Enter
  2. Press E to move to next image
  3. Repeat

Alternative workflow:

  • Press Q to go back to previous images for review
  • Press E to continue forward through your dataset

This workflow keeps your hands on the keyboard for maximum speed. See Annotation best practices for more efficiency tips.

💡

Pro tip

Press ? in the annotator to see the complete list of keyboard shortcuts. Different annotation tools may have additional shortcuts.


Question writing guidelines

Writing effective questions is critical for training accurate VQA models. Well-structured questions lead to consistent answers and better model performance.

Write effective questions

Good VQA questions are clear, answerable, and useful for your use case. See Question types to include for specific categories and examples.

Good examples:

  • Manufacturing QC — "Is this part defective?" → "Yes"
  • Retail inventory — "How many items are on the shelf?" → "12"
  • Agriculture — "What is the crop condition?" → "Healthy"
  • Medical imaging — "Is there an abnormality present?" → "No"

Avoid these patterns:

  • "What do you see?" — too vague → "What type of vehicle is this?"
  • "Is this good or bad or maybe damaged?" — multiple questions in one → split into separate questions
  • "Describe everything in detail" — too broad → "What is the primary defect type?"

See Common questions for guidance on answer formatting and consistency.

Question types to include

Train a well-rounded model by including diverse question types. Use a mix of these categories across your dataset for comprehensive VQA training.

Counting questions

Format: "How many [objects] are visible?"

Counting questions teach your model to quantify objects or features in images. These are essential for inventory, quality control, and monitoring applications.

Examples:

  • "How many defects are present?" → "3"
  • "How many people are in the image?" → "7"
  • "How many red items are on the shelf?" → "12"
  • "How many complete products are visible?" → "5"
  • "How many safety violations are shown?" → "0"

Answer format: Use numbers only ("3", "0", "15")

Best practices:

  • Be specific about what to count
  • Use "0" for none, not "None" or "Zero"
  • For large numbers, ensure accuracy by counting carefully
  • Consider combining with presence/absence questions for validation

See Write effective questions for clarity guidelines.
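The "numbers only" rule above is easy to verify mechanically when reviewing exported counting answers. A minimal sketch (the answer list is illustrative):

```python
# Sketch: flag counting answers that aren't plain digits ("3", "0", "15").
# Catches variants like "None", "Zero", or "three" that break the
# numbers-only answer format.
def invalid_count_answers(answers: list[str]) -> list[str]:
    return [a for a in answers if not a.strip().isdigit()]

print(invalid_count_answers(["3", "0", "None", "three", "15"]))
# flags "None" and "three"
```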

Yes/No questions

Format: "Is [condition] true?"

Yes/No questions are simple binary classifications that help models make quick decisions. These are ideal for quality control, compliance checks, and condition assessment.

Examples:

  • "Is the product damaged?" → "Yes"
  • "Is the workspace clean?" → "No"
  • "Is protective equipment worn?" → "Yes"
  • "Is packaging intact?" → "Yes"
  • "Is there visible rust?" → "No"

Answer format: Use "Yes" or "No" consistently (not "yes"/"no", "Y"/"N", or "True"/"False")

Best practices:

  • Make questions unambiguous — avoid "maybe" situations
  • Phrase positively when possible ("Is X present?" vs. "Is X absent?")
  • Keep questions focused on one condition
  • Add multiple questions for complex scenarios

Pair with attribute questions for more detailed information.
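The consistency rule for Yes/No answers ("Yes"/"No" only, never "yes", "Y", or "True") can likewise be checked in a quick review pass. A minimal sketch with illustrative data:

```python
# Sketch: verify Yes/No answers use the exact strings "Yes"/"No",
# flagging casing and format variants.
def nonstandard_yes_no(answers: list[str]) -> list[str]:
    return [a for a in answers if a not in ("Yes", "No")]

print(nonstandard_yes_no(["Yes", "no", "Y", "No", "True"]))
# flags "no", "Y", and "True"
```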

Attribute questions

Format: "What [attribute] is the [object]?"

Attribute questions help models identify specific characteristics like color, size, shape, material, or condition. These are valuable for detailed object description and classification.

Examples:

  • "What color is the vehicle?" → "Blue"
  • "What size is the defect?" → "Large"
  • "What material is the surface?" → "Metal"
  • "What shape is the container?" → "Rectangular"
  • "What is the lighting condition?" → "Bright"

Answer format: Single word or short phrase ("Blue", "Large", "Metal")

Common attribute types:

  • Color: "Red", "Blue", "Green", "Silver"
  • Size: "Small", "Medium", "Large"
  • Shape: "Round", "Square", "Irregular"
  • Material: "Metal", "Plastic", "Wood", "Glass"
  • Condition: "New", "Worn", "Damaged"

Best practices:

  • Use consistent vocabulary across your dataset
  • Define size categories clearly for your team (see collaborative annotation)
  • Standardize color names (e.g., always "Blue" not "Light blue" or "Navy")
  • See Write effective questions for consistency guidelines
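A controlled vocabulary per attribute makes the standardization rules above enforceable. A minimal sketch, with an assumed vocabulary and illustrative answers:

```python
# Sketch: check attribute answers against a controlled vocabulary so the
# dataset uses one standard term per attribute (e.g. always "Blue",
# never "Navy" or "Light blue"). The vocabulary below is an example.
VOCAB = {
    "color": {"Red", "Blue", "Green", "Silver"},
    "size": {"Small", "Medium", "Large"},
}

def off_vocabulary(attribute: str, answers: list[str]) -> list[str]:
    allowed = VOCAB[attribute]
    return [a for a in answers if a not in allowed]

print(off_vocabulary("color", ["Blue", "Navy", "Light blue", "Red"]))
# flags "Navy" and "Light blue"
```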

Category questions

Format: "What type of [object] is this?"

Category questions teach models to classify objects or conditions into predefined groups. These are essential for sorting, routing, and decision-making applications.

Examples:

  • "What type of defect is visible?" → "Scratch"
  • "What category does this product belong to?" → "Electronics"
  • "What kind of damage occurred?" → "Water damage"
  • "What vehicle type is shown?" → "Sedan"
  • "What equipment is this?" → "Forklift"

Answer format: Category name ("Scratch", "Electronics", "Water damage")

Best practices:

  • Define your categories clearly before annotating
  • Use mutually exclusive categories when possible
  • Maintain a category list for consistency
  • For overlapping categories, use multiple questions

Category planning:

  1. List all possible categories for your use case
  2. Create examples of each category
  3. Share with your team if annotating collaboratively
  4. Review annotation statistics to ensure balanced coverage

Combine with Yes/No questions for hierarchical classification.
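Step 4 of the category-planning checklist (reviewing balance) can be approximated by tallying category answers across the dataset. A minimal sketch with illustrative answers:

```python
from collections import Counter

# Sketch: tally category answers across a dataset to spot imbalanced
# coverage before training.
def category_counts(answers: list[str]) -> Counter:
    return Counter(answers)

counts = category_counts(["Scratch", "Dent", "Scratch", "Scratch", "Discoloration"])
print(counts.most_common())  # "Scratch" dominates this toy dataset
```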

Presence/Absence questions

Format: "Is [object] present in the image?"

Presence/absence questions focus on detecting specific objects or features regardless of quantity or attributes. Similar to Yes/No questions but specifically about object existence.

Examples:

  • "Is a safety helmet visible?" → "Yes"
  • "Is packaging intact?" → "No"
  • "Is the label present?" → "Yes"
  • "Is there any text in the image?" → "Yes"
  • "Is liquid leaking?" → "No"

Answer format: "Yes" or "No" consistently

When to use:

  • Safety and compliance checks
  • Quality control verification
  • Completeness validation
  • Component detection

Presence/Absence vs. Counting:

  • Use presence/absence when you only care if something exists
  • Use counting questions when quantity matters
  • Combine both for comprehensive coverage

Example combination:

  • "Is protective equipment visible?" → "Yes" (presence)
  • "How many people are wearing helmets?" → "3" (counting)

See Write effective questions to ensure clarity between presence and counting.

Complete annotation strategies

Apply question types strategically based on your use case. Here are proven annotation patterns for common scenarios.

Manufacturing quality control

Goal: Detect and classify defects for automated quality inspection

Question strategy:

  1. Presence/Absence: "Is this product defective?" → "Yes"/"No"
  2. Counting: "How many defects are visible?" → "2"
  3. Category: "What type of defect is this?" → "Scratch"
  4. Attribute: "What is the defect severity?" → "Minor"
  5. Yes/No: "Should this be rejected?" → "Yes"

Why this works: Covers detection → classification → decision-making workflow

Retail inventory and shelf monitoring

Goal: Track products, stock levels, and planogram compliance

Question strategy:

  1. Counting: "How many items are on the shelf?" → "15"
  2. Category: "What product category is displayed?" → "Beverages"
  3. Presence/Absence: "Is the price tag visible?" → "Yes"
  4. Yes/No: "Is the shelf fully stocked?" → "No"
  5. Attribute: "What is the stock level?" → "Low"

Why this works: Enables inventory tracking and compliance verification

Safety and compliance monitoring

Goal: Verify safety equipment usage and workplace compliance

Question strategy:

  1. Counting: "How many people are in the workspace?" → "4"
  2. Counting: "How many people are wearing helmets?" → "3"
  3. Presence/Absence: "Is safety signage visible?" → "Yes"
  4. Yes/No: "Is the area compliant?" → "No"
  5. Category: "What type of violation is shown?" → "Missing PPE"

Why this works: Combines counting with compliance assessment

Agricultural monitoring

Goal: Assess crop health, growth stage, and pest detection

Question strategy:

  1. Attribute: "What is the crop condition?" → "Healthy"
  2. Presence/Absence: "Are pests visible?" → "No"
  3. Category: "What growth stage is shown?" → "Flowering"
  4. Yes/No: "Does the crop need irrigation?" → "Yes"
  5. Attribute: "What is the canopy color?" → "Green"

Why this works: Monitors multiple health indicators simultaneously

Use these patterns as templates and adapt to your specific needs. Track your progress to ensure balanced coverage across question types.


Track annotation progress

Monitor your annotation progress to ensure you have sufficient training data. Track both quantity (number of annotations) and quality (diverse question types).

View annotation statistics

Check your progress from the dataset Explorer page:

  1. Click the Annotations tab

  2. View statistics:

    • Total annotated images
    • Total question-answer pairs
    • Average questions per image
    • Distribution by question type

View dataset insights for detailed analytics on your annotations.
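The Explorer statistics above (annotated images, total pairs, average per image) can also be derived from your own annotation records when working offline. A minimal sketch, assuming an illustrative per-image structure rather than the platform's export format:

```python
# Sketch: derive Explorer-style statistics from a list of per-image
# annotations. The input structure is an assumption for illustration.
def dataset_stats(images: list[dict]) -> dict:
    annotated = [img for img in images if img["qa_pairs"]]
    total_pairs = sum(len(img["qa_pairs"]) for img in annotated)
    return {
        "annotated_images": len(annotated),
        "total_pairs": total_pairs,
        "avg_per_image": total_pairs / len(annotated) if annotated else 0.0,
    }

images = [
    {"image": "a.jpg", "qa_pairs": [{"q": "Is it damaged?", "a": "No"}] * 2},
    {"image": "b.jpg", "qa_pairs": [{"q": "How many?", "a": "3"}] * 4},
    {"image": "c.jpg", "qa_pairs": []},  # not yet annotated
]
print(dataset_stats(images))
```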

💡

How many annotations do I need?

Minimum: 50 question-answer pairs across 20+ images

Recommended: 200+ question-answer pairs across 50+ images

Optimal: 500+ question-answer pairs with diverse question types

More diverse questions = better model understanding. See Write effective questions for quality guidelines.

Filter annotated images

In the Annotator:

  • Check the bottom thumbnail navigation strip to see which images have annotations (marked with "Annotated" badges)
  • Monitor word and character count on the bottom right corner:
    • Total count — Shows combined statistics for all QA pairs on the current image
    • Individual count — Click any QA pair to see its specific word and character count
  • Quickly identify images that need additional questions

Use the word count feature to ensure answer consistency across your dataset.

In the Explorer:

  1. Click the Assets tab

  2. Use filters to show:

    • Images with annotations
    • Images without annotations
    • Images with specific answer values

Learn more about dataset management


Troubleshooting

Common issues and solutions when creating VQA annotations.

My annotations aren't saving

Possible causes and solutions:

  1. Network connectivity — Check your internet connection. Annotations require an active connection to save.

  2. Session timeout — If you've been idle, refresh the page and reopen the annotator.

  3. Browser issues — Clear cache or try a different browser. Supported browsers: Chrome, Firefox, Safari, Edge.

  4. Form incomplete — Ensure both Question and Answer fields are filled before clicking Add.

Verify your annotations saved by confirming the question-answer pair appears in the Visual Question Answering panel on the right.

I can't find the annotator

Solution steps:

  1. Navigate to your dataset from the main Datasets page

  2. Click the Annotator tab in the left sidebar (see Open the annotator)

  3. If the tab is missing or the annotator won't open, refresh the page and confirm you are viewing the correct dataset

The annotator interface shows your image on the left, question-answer pairs on the right, and thumbnails at the bottom.

The interface is slow or lagging

Performance optimization:

  1. Large images — Images over 10 MB may load slowly. Upload optimized images when possible.

  2. Browser resources — Close unnecessary tabs and applications.

  3. Use keyboard shortcuts — Keyboard navigation is faster than mouse clicks.

  4. Batch workflow — Annotate similar images together to reduce context switching.

My question-answer pairs disappeared

Recovery steps:

  1. Check the image — Ensure you're viewing the correct image. Check the bottom thumbnail strip to confirm which image is selected.

  2. Check the right panel — Question-answer pairs appear in the Visual Question Answering panel on the right side. Scroll down if you have many annotations.

  3. Check filters — Remove filters that might be hiding annotations.

  4. Review edit history — If annotations were deleted, they cannot be recovered. Re-annotate the image.

  5. Browser cache — Try refreshing the page or clearing your browser cache.

Prevention: Use annotation statistics to regularly verify your progress. The bottom thumbnail strip shows "Annotated" badges on images with question-answer pairs.

I accidentally deleted annotations

Unfortunately, deletions cannot be undone. Best practices to prevent this:

  1. Double-check before deleting — Review annotations carefully before clicking the three-dot menu and selecting Delete

  2. Use edit instead — Edit annotations rather than deleting and recreating (edits save automatically)

  3. Export backups — Regularly download your dataset as a backup

  4. Team coordination — If annotating collaboratively, communicate before bulk deletions

How do I handle special characters in questions or answers?

Special characters are supported:

  • Standard punctuation — Question marks, commas, periods work normally
  • Numbers — Use digits for counting questions ("3", not "three")
  • Unicode — Non-English characters and symbols are supported
  • Avoid — Excessive special characters may confuse the model

Best practice: Keep answers concise with minimal special characters for better model training.


Common questions

Should I ask the same questions for every image?

It depends on your use case:

Same questions — Best when you want the model to answer consistent questions across different images (e.g., "Is this product defective?")

Varied questions — Best when training a general-purpose VQA model that needs to handle diverse questions

Recommended approach: Use a core set of questions for all images, plus additional specific questions where relevant.

Review Question types to include to plan your annotation strategy and Write effective questions for consistency guidelines.

How specific should my answers be?

Keep answers concise:

  • ✅ Good: "3", "Red", "Damaged", "Top-left corner"
  • ❌ Too detailed: "There are three defects visible on the surface, including two scratches and one dent located near the edge"

Why shorter is better:

  • Easier for the model to learn
  • Faster to annotate
  • More consistent across annotators
  • Better training results

Monitor answer length:

Use the word and character count displayed on the bottom right of the annotator:

  • Click any QA pair to see its individual word count
  • Compare counts across your annotations to ensure consistency
  • Aim for answers with fewer than 5 words when possible

For specific formats by question type, see Question types to include.
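The "fewer than 5 words" guideline above can be checked in bulk when reviewing answers. A minimal sketch with illustrative data:

```python
# Sketch: flag answers longer than a word budget (here, 5 words) so
# verbose answers can be shortened for more consistent training data.
def verbose_answers(answers: list[str], max_words: int = 5) -> list[str]:
    return [a for a in answers if len(a.split()) > max_words]

print(verbose_answers([
    "3",
    "Scratch",
    "There are three defects visible on the surface near the edge",
]))  # only the long sentence is flagged
```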

Can I use the same answer for different questions?

Yes! It's fine (and often correct) to have the same answer for different questions:

  • Q: "Is the product damaged?" → A: "Yes"
  • Q: "Is this part acceptable?" → A: "No"
  • Q: "Should this be rejected?" → A: "Yes"

The model learns to understand different questions that relate to the same visual information. This is common with Yes/No questions and Category questions.

You can add multiple questions to the same image with overlapping answers.

What if I don't know the answer to a question?

Best practices:

  1. Skip unclear images — If you genuinely can't determine the answer, navigate to the next image
  2. Use 'Unknown' — For questions where the answer isn't visible, you can use "Unknown" or "Not visible" consistently across your dataset
  3. Refine your questions — If many images are unclear, revisit your question writing strategy

Avoid guessing — Incorrect annotations hurt model performance more than missing annotations. Focus on effective questions that have clear, verifiable answers.

Can I annotate images collaboratively?

Yes! Multiple team members can annotate different images in the same dataset.

Pro tip: Create annotation guidelines for your team based on Question types to include and Write effective questions to ensure consistency.

Track annotation progress across your team using dataset insights.

Can I import existing VQA annotations?

Yes! You can upload pre-existing VQA annotations.

After importing, use Track annotation progress to verify your uploaded annotations.

How do I handle multiple objects in one image?

Approach 1 — General questions:

  • "How many objects are in the image?" → "6"
  • "Are all items intact?" → "Yes"

Approach 2 — Specific questions:

  • "What color is the largest item?" → "Red"
  • "Is the leftmost item damaged?" → "No"

Both approaches work — Choose based on your use case. You can add multiple questions per image to combine both approaches.

Review Write effective questions for guidance on clarity and consistency.


Annotation best practices

Follow these guidelines to create high-quality VQA annotations that improve model performance.

Consistency is key
  • Use the same phrasing for similar questions across images
  • Standardize answer formats ("Yes"/"No", digits for counts)
  • Maintain a shared vocabulary for attributes and categories
Quality over quantity
  • Accurate annotations are more valuable than many low-quality ones
  • Take time to write effective questions
  • Review and edit annotations when you spot errors
  • Use filters to review your work
  • Monitor word and character counts (bottom right) to maintain consistent answer length
Diverse coverage
  • Include various question types in your dataset
  • Cover different scenarios and edge cases
  • Add multiple questions per image when appropriate
  • Balance question difficulty (easy, medium, challenging)
Efficient workflow
  • Learn keyboard shortcuts to speed up annotation (use E and Q to navigate)
  • Use navigation features to move quickly between images
  • Click on QA pairs to check individual word counts and identify verbose answers
  • Track your progress to stay motivated
  • Take breaks to maintain annotation quality

What's next?

Annotations complete!

Your VQA annotations are ready for training. Your model will learn to answer these types of questions on new images.

Review Annotation best practices to ensure your dataset is optimized for training.


Related resources