Phrase Grounding

Before You Start

A phrase grounding dataset with uploaded images
A plan for which objects you want to annotate and how you will describe them

Phrase grounding annotations teach your vision-language model (VLM) to locate objects by their text description. Each annotation pairs a written caption with bounding boxes, where specific phrases in the caption link to the boxes they describe. This guide walks through creating those annotations in Datature Vi.

Open the annotator

Go to your dataset, then click the Annotate tab. Click any image thumbnail in the bottom strip to load it onto the canvas.

Open the Annotator tab

From the Dataset Overview page, click the Annotator tab to open the labeling interface.

You should see

Dataset Overview showing image and annotation count, and heatmaps showing annotation patterns

Your annotations are ready when you see annotation count matching the image count, and heatmaps showing annotation patterns.

Keyboard shortcuts

Key

Action

Rectangle tool: draw bounding boxes

Text tool: write or edit the caption

Highlight tool: link phrases to boxes

Delete tool: remove boxes or phrase links

IntelliScribe Caption: AI-generate a caption

IntelliScribe Phrases: AI-link phrases to boxes

Next image

Previous image

Esc

Exit current tool mode

Show all shortcuts

Annotation guidelines

These guidelines produce annotations that train well. Inconsistent annotations train poorly regardless of quantity.

Captions

Be specific: "FPGA chip" over "chip", "blue safety helmet" over "helmet"
Include spatial info: "in the center", "on the left edge", "above the connector"
Use consistent terminology across your entire dataset

Bounding boxes

Include the full object with minimal empty space
Stay consistent about box tightness across images
For partial objects, box the visible portion and note it in the caption

Phrase links

Each highlighted phrase must be a distinct, non-overlapping region
One phrase per box; split complex descriptions into separate pairs
Aim for 5-10 phrase-box pairs per image (3-5 minimum, up to 15 for complex scenes)

Edit or delete annotations

You can modify or remove annotations at any time.

To edit a caption: Press T, click in the text area, make your changes. Changes save automatically.

To resize or reposition a box: Click the box, then drag it or drag a handle. The phrase link stays connected.

To delete a box: Press D, then click the box. The box and its phrase link are removed. The caption text remains.

To unlink a phrase: Press D, then click the highlighted phrase in the caption. The highlight and link are removed, but the box and caption text remain.

Deletions Cannot Be Undone

Deleted boxes and phrase links cannot be recovered. Export your dataset regularly as a backup.

Phrase Grounding

Open the annotator

Open the Annotator tab

Keyboard shortcuts

Keyboard shortcuts

Annotation guidelines

Captions

Bounding boxes

Phrase links

Edit or delete annotations

Chain-of-thought reasoning

Next steps

Train A Model

AI-Assisted Tools

Dataset Overview