Annotate Images
Label your images with phrase grounding, VQA, or freeform text annotations to train a vision-language model.
Datature Vi supports three annotation types for images, each suited to a different vision-language model task. Choose based on what you need your model to do.
Prerequisites
A dataset with uploaded images. Create a dataset if you don't have one yet.
Annotation types
Phrase Grounding
Link text descriptions to bounding boxes. Teaches your model to locate objects by natural-language description.
Visual Question Answering
Create question-answer pairs about images. Teaches your model to answer specific questions about what it sees.
Freeform Text
Write open-ended text annotations in any structure. Teaches your model with flexible, unstructured text data.
Phrase grounding
Phrase grounding connects natural-language descriptions to specific regions in an image. You write a caption, draw bounding boxes around objects, then link each phrase to its box.
The result: a model that can answer "Find the large black chip" by returning bounding box coordinates, not just a class label.
Typical use cases:
- Object detection with flexible, natural-language descriptions
- Visual search and retrieval systems
- Zero-shot object detection (no fixed category list required)
- Image description and captioning tasks
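To make the caption-box-link relationship concrete, here is a minimal sketch of what a phrase grounding annotation could look like as a data structure. The field names (`caption`, `boxes`, `links`, `xyxy`, `span`) are illustrative assumptions, not Datature's actual export schema:

```python
# Hypothetical phrase-grounding annotation (illustrative only; these field
# names are NOT Datature's actual schema). A caption, bounding boxes, and
# links that tie each phrase in the caption to one box.
annotation = {
    "caption": "A large black chip next to a small white chip",
    "boxes": [
        {"id": 0, "xyxy": [120, 80, 260, 210]},  # large black chip
        {"id": 1, "xyxy": [300, 95, 380, 160]},  # small white chip
    ],
    # Each link maps a character span in the caption to a box id.
    "links": [
        {"phrase": "large black chip", "span": [2, 18], "box_id": 0},
        {"phrase": "small white chip", "span": [29, 45], "box_id": 1},
    ],
}

def linked_phrases(ann):
    """Resolve the links into (phrase, box) pairs."""
    boxes = {b["id"]: b["xyxy"] for b in ann["boxes"]}
    return [(link["phrase"], boxes[link["box_id"]]) for link in ann["links"]]
```

Whatever the storage format, the essential idea is the same: every grounded phrase carries both its position in the caption text and a pointer to a box in image coordinates.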
Annotate images for phrase grounding
Visual question answering
Visual question answering teaches your model to answer questions about images. You write question-answer pairs that cover the decisions your model needs to make.
The result: a model that can answer "Is this product defective?" or "How many safety violations are present?" from a camera feed.
Typical use cases:
- Quality control and inspection systems
- Inventory counting and shelf monitoring
- Compliance and safety verification
- Condition assessment for agriculture or maintenance
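A VQA dataset boils down to images paired with question-answer records. The sketch below shows one hypothetical record; the field names are assumptions for illustration, not Datature's actual export schema:

```python
# Hypothetical VQA record (illustrative; field names are assumptions, not
# Datature's actual schema). One image carries several question-answer
# pairs covering the decisions the model must learn to make.
vqa_record = {
    "image": "line_cam_0042.jpg",
    "qa_pairs": [
        {"question": "Is this product defective?", "answer": "no"},
        {"question": "How many safety violations are present?", "answer": "0"},
    ],
}

def answer_for(record, question):
    """Look up the annotated answer for a question, or None if absent."""
    for pair in record["qa_pairs"]:
        if pair["question"] == question:
            return pair["answer"]
    return None
```

Phrasing the questions consistently across images matters: the model learns to map a fixed question template to an answer, so "Is this product defective?" and "Any defects?" should not be mixed for the same check.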
Freeform text
Freeform text gives you an open text editor for each image. You can write structured or unstructured text, including JSON, plain descriptions, or custom schemas that match your training pipeline.
The result: a model trained on flexible text outputs tailored to your specific task.
Typical use cases:
- Custom JSON output schemas for specialized tasks
- Detailed image descriptions and scene analysis
- Multi-attribute annotations that don't fit predefined formats
- Research and experimental annotation workflows
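Because the editor accepts any text, a structured JSON payload round-trips cleanly through a freeform annotation. The schema below is invented for illustration (matching a hypothetical custom pipeline), not a Datature format:

```python
import json

# Hypothetical freeform annotation stored as JSON (illustrative; this schema
# is invented to match a custom training pipeline, not a Datature format).
freeform = {
    "scene": "warehouse aisle",
    "attributes": {"lighting": "dim", "occlusion": "partial"},
    "description": "Two pallets stacked against the left rack; aisle is clear.",
}

text = json.dumps(freeform, indent=2)  # the string you would paste into the editor
recovered = json.loads(text)           # your pipeline parses it back at training time
```

If you go this route, keep the schema identical across every image in the dataset so the model sees a consistent output format during training.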
Annotate images with freeform text
Which type should you use?
Choose based on what you want your model to output:
- Bounding boxes tied to natural-language phrases: use phrase grounding
- Answers to specific questions: use visual question answering
- Any other text structure, including custom JSON: use freeform text
You can run different annotation types in separate datasets within the same project.
Annotation workflow
The general flow for any annotation type:
1. Upload assets to your dataset
2. Open the annotator from your dataset's Annotate tab
3. Create annotations (manually, with AI assistance, or both)
4. Review coverage using the dataset overview
5. Train your model using the annotated dataset
Next steps
Annotate For Phrase Grounding
Step-by-step guide to creating phrase grounding annotations with captions, bounding boxes, and phrase links.
Annotate For VQA
Step-by-step guide to adding question-answer pairs to your images for visual question answering tasks.
Annotate With Freeform Text
Step-by-step guide to writing open-ended text annotations for flexible model training.
