Annotate Data
Add labels to your images and videos to teach a vision-language model what to recognize and how to respond.
Annotations are the labeled examples your vision-language model (VLM) learns from. Datature Vi supports annotations for both images and videos, with different annotation types suited to different tasks. This page helps you pick the right asset type and annotation format, and points you to the step-by-step guides.
Before you start, you need a dataset with uploaded images or videos. Create a dataset if you don't have one yet.
New to Datature Vi? Learn what it does or follow the quickstart.
Annotate by asset type
Image annotation types
Datature Vi supports three annotation types for images:
Phrase Grounding
Link text descriptions to bounding boxes. Teaches your model to locate objects by natural-language description.
Visual Question Answering
Create question-answer pairs about images. Teaches your model to answer specific questions about what it sees.
Freeform Text
Write open-ended text annotations in any structure for flexible model training.
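To make the differences between the three image annotation types concrete, here is a minimal sketch of how each one could be represented as a record. The class names, field names, and the normalized `(x_min, y_min, x_max, y_max)` box convention are assumptions made for illustration only; they are not Datature Vi's storage or export format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record types for illustration; not Datature Vi's actual schema.

@dataclass
class GroundedPhrase:
    text: str                                # phrase taken from the caption
    box: Tuple[float, float, float, float]   # assumed (x_min, y_min, x_max, y_max), normalized to 0..1

@dataclass
class PhraseGroundingAnnotation:
    caption: str
    phrases: List[GroundedPhrase] = field(default_factory=list)

@dataclass
class VQAAnnotation:
    question: str
    answer: str

@dataclass
class FreeformAnnotation:
    text: str  # any structure you choose, for example prose or serialized JSON

def validate_grounding(ann: PhraseGroundingAnnotation) -> List[str]:
    """Return a list of problems found in a phrase grounding record."""
    problems = []
    for phrase in ann.phrases:
        # Each linked phrase should appear verbatim in the caption.
        if phrase.text not in ann.caption:
            problems.append(f"phrase not in caption: {phrase.text!r}")
        # Each box should have positive area and stay inside the image.
        x0, y0, x1, y1 = phrase.box
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            problems.append(f"invalid box for {phrase.text!r}: {phrase.box}")
    return problems
```

A phrase grounding record ties parts of a caption to regions, a VQA record is a question paired with its answer, and a freeform record is a single block of text. A check like `validate_grounding` is one way to catch phrases that no longer match the caption after manual edits.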
Video annotation types
Datature Vi supports freeform text annotation for videos. Use the timeline scrubber to select frame sequences and write text annotations that describe actions, events, and temporal relationships.
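A video annotation of this kind pairs a frame sequence with a piece of text. The sketch below shows one possible way to model that pairing; `VideoSegmentAnnotation`, its fields, and the inclusive end-frame convention are hypothetical and do not describe how Datature Vi stores video annotations.

```python
from dataclasses import dataclass

@dataclass
class VideoSegmentAnnotation:
    """Hypothetical record: freeform text attached to a range of frames."""
    start_frame: int
    end_frame: int   # assumed inclusive
    text: str

    def __post_init__(self) -> None:
        # Reject ranges that cannot correspond to a selection on a timeline.
        if self.start_frame < 0 or self.end_frame < self.start_frame:
            raise ValueError("frame range must satisfy 0 <= start_frame <= end_frame")

    def duration_seconds(self, fps: float) -> float:
        # Number of frames covered (inclusive range) divided by the frame rate.
        return (self.end_frame - self.start_frame + 1) / fps


segment = VideoSegmentAnnotation(
    start_frame=30,
    end_frame=89,
    text="A worker lifts the box, then places it on the conveyor.",
)
```

Keeping the frame range next to the text makes it easy to check that a description of an action or event covers a plausible span of time for the video's frame rate.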
Which type should you use?
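Choose based on what you want your model to output:
- Locations of objects described in natural language: Phrase Grounding (images)
- Answers to specific questions about an image: Visual Question Answering (images)
- Open-ended text in a structure you define: Freeform Text (images)
- Descriptions of actions, events, and temporal relationships: Freeform Text (videos)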
You can use different annotation types in separate datasets within the same project.
Annotation workflow
The general flow for any annotation type:
- Upload assets to your dataset
- Open the annotator from your dataset's Annotate tab
- Create annotations (manually, with AI assistance, or both)
- Review coverage using the dataset overview
- Train your model using the annotated dataset
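The coverage review step in the flow above amounts to asking which assets still have no annotations. The following sketch illustrates that idea on plain Python data; the function name and the input shapes (a list of asset IDs and a mapping from asset ID to its annotations) are assumptions for illustration, and it does not call or reproduce the dataset overview in Datature Vi.

```python
from typing import Dict, List, Sequence, Tuple

def annotation_coverage(
    asset_ids: Sequence[str],
    annotations_by_asset: Dict[str, list],
) -> Tuple[float, List[str]]:
    """Return (fraction of assets with at least one annotation, unannotated asset IDs)."""
    # An asset counts as unannotated if it is missing from the mapping
    # or maps to an empty list.
    unannotated = [a for a in asset_ids if not annotations_by_asset.get(a)]
    total = len(asset_ids)
    covered = total - len(unannotated)
    ratio = covered / total if total else 0.0
    return ratio, unannotated


ratio, missing = annotation_coverage(
    ["img_001", "img_002", "img_003", "img_004"],
    {"img_001": ["caption"], "img_002": [], "img_003": ["q1", "q2"]},
)
```

In this example, `img_002` has an empty annotation list and `img_004` has no entry, so both are reported as unannotated. Closing gaps like these before training helps keep unlabeled assets out of the training set.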
AI-assisted annotation
Datature Vi includes IntelliScribe, an AI tool that speeds up phrase grounding and freeform image annotation. For phrase grounding, IntelliScribe can generate captions automatically (press C) and link phrases to bounding boxes (press P). For freeform image annotation, IntelliScribe generates text content (press C) that you can edit to match your schema.
AI assistance is most useful on large datasets with common objects. For domain-specific content, generate the AI text as a starting point, then edit for your vocabulary.
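If you work with annotation text outside the annotator, the "edit for your vocabulary" step can be partly scripted. The sketch below is a generic text-normalization helper, not an IntelliScribe feature: the function name, the example terms, and the requirement that mapping keys are lowercase are all assumptions for illustration.

```python
import re
from typing import Dict

def apply_vocabulary(text: str, vocabulary: Dict[str, str]) -> str:
    """Replace generic terms with domain terms. Keys in `vocabulary` must be lowercase."""
    if not vocabulary:
        return text
    # Try longer terms first so multi-word terms win over shorter ones.
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    # Look up each match case-insensitively and substitute the domain term.
    return pattern.sub(lambda m: vocabulary[m.group(0).lower()], text)


generated = "A crack on the metal plate"
edited = apply_vocabulary(
    generated,
    {"crack": "hairline fracture", "metal plate": "steel panel"},
)
```

A helper like this only handles simple term swaps. Generated text for domain-specific content still needs a manual review pass.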
