Dataset Types

Dataset types in Datature Vi define the structure of your training data and the kind of output your vision-language model (VLM) produces. Choose a type before you create a dataset, as it determines your annotation format, model output, and evaluation metrics.

By the end of this guide, you will know which dataset type to choose when you create a dataset.

Available dataset types

Object localization

Phrase Grounding

Localize objects in images using natural language descriptions. Returns bounding boxes for phrases like 'the red car on the left.'

Image understanding

Visual Question Answering

Answer natural language questions about images. Returns text responses to questions like 'Is there a defect on the surface?'

Custom schemas

Freeform Text

Define custom annotation schemas for specialized use cases and research projects.

Need structured output?

If your application needs consistent, machine-readable output from image analysis (JSON, YAML, code, or any custom format), see Structured Data Extraction. It builds on the Freeform Text type by adding a defined annotation schema and a system prompt, so the model produces predictable output that your code can parse directly.
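As an illustration of what "output your code can parse directly" can look like on the application side, here is a minimal sketch, assuming your annotation schema asks the model to return a JSON object. The keys `defect_present` and `defect_type`, the helper `parse_structured_output`, and the sample response are all hypothetical and are not part of Datature Vi; the sketch uses only Python's standard `json` module.

```python
import json

# Hypothetical keys that an assumed annotation schema requires in every response.
REQUIRED_KEYS = {"defect_present", "defect_type"}


def parse_structured_output(response_text: str) -> dict:
    """Parse a (hypothetical) model response as JSON and check required keys."""
    try:
        data = json.loads(response_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model output is not valid JSON: {exc}") from exc
    # The assumed schema describes a top-level JSON object, not a list or scalar.
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


# Example with a made-up response string.
result = parse_structured_output('{"defect_present": true, "defect_type": "scratch"}')
print(result["defect_type"])  # scratch
```

Failing fast with a `ValueError` on malformed output keeps schema violations visible, which is useful while you are still refining the annotation schema and system prompt.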

Related resources

Phrase Grounding

Deep dive into visual grounding, annotation format, and best practices.

Visual Question Answering

Complete guide to VQA, question types, and annotation best practices.

Create A Dataset

Set up your first dataset in Datature Vi.