Prepare Your Dataset
Set up your dataset with images and annotations to train your VLM.
Quickstart: Step 1 of 3
This is the first step in the quickstart guide. After preparing your dataset, you'll train a VLM and deploy it.
Before you can train a VLM, you need a dataset with images and annotations. Follow these three focused steps to get your data ready for training.
⏱️ Time to complete: ~15 minutes
📚 What you'll learn: Dataset creation, image uploading, and annotation basics
Prerequisites
You'll need a Datature Vi account. Sign up for free if you haven't already.
Three steps to prepare your dataset
Follow these steps in order to set up your data for training:
1. Create a dataset: choose your task type and configure dataset settings
2. Upload images: add images via drag-and-drop or the Vi SDK (see the sketch after this list)
3. Add annotations: upload existing labels or create new ones
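If you prefer to script step 2, the Vi SDK supports programmatic uploads from Python (see the Vi SDK link under related resources). The sketch below is illustrative only: the package, client, and method names are assumptions, so consult the Vi SDK reference for the actual calls.

```python
# Minimal sketch of a programmatic image upload.
# NOTE: the package, client, and method names below are hypothetical;
# check the Vi SDK reference for the real import and API.
from pathlib import Path

import vi  # hypothetical package name for the Datature Vi SDK

client = vi.Client(secret_key="YOUR_SECRET_KEY")   # assumption: API-key auth
dataset = client.datasets.get("YOUR_DATASET_ID")   # assumption: dataset lookup by ID

image_dir = Path("data/images")
for image_path in sorted(image_dir.glob("*.jpg")):
    dataset.assets.upload(str(image_path))          # assumption: upload by local file path
    print(f"Uploaded {image_path.name}")
```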
Quick overview
What you'll do
- Create a dataset — Choose between Phrase Grounding, Visual Question Answering, or Freeform (coming soon), then configure storage
- Upload images — Use drag-and-drop or SDK to add your images
- Add annotations — Import existing labels or create them manually
What you'll need
- Your images in supported formats (.jpg, .png, etc.)
- Annotations (optional; you can create them in Datature)
- About 15 minutes
Need more detail?
This quickstart covers the essentials. For comprehensive guides, see the related resources at the end of this page.
Tips for success
- Start with 20-50 images for initial testing
- Ensure annotation filenames match your image filenames exactly (see the check script after this list)
- Use Multi-Region storage for best performance
- Check out our AI-assisted annotation tools to speed up labeling
- Learn about Phrase Grounding vs Visual Question Answering concepts
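Before uploading, it helps to verify two of these tips locally: that your files use a supported format and that every image has an annotation with a matching filename. Below is a minimal standard-library sketch; the folder paths and extension list are assumptions, so adjust them to your project.

```python
# Sanity-check local data before upload: keep only supported image formats
# and confirm every image has an annotation file with the same stem.
# Folder names and the extension set are assumptions; adjust to your project.
from pathlib import Path

IMAGE_DIR = Path("data/images")
ANNOTATION_DIR = Path("data/annotations")
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

images = [p for p in IMAGE_DIR.iterdir() if p.suffix.lower() in SUPPORTED_EXTENSIONS]
annotation_stems = {p.stem for p in ANNOTATION_DIR.iterdir() if p.is_file()}

missing = [p.name for p in images if p.stem not in annotation_stems]

print(f"{len(images)} supported images found")
if missing:
    print(f"{len(missing)} images have no matching annotation file:")
    for name in missing:
        print(f"  - {name}")
else:
    print("Every image has a matching annotation filename")
```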
Dataset types explained
Not sure which dataset type to choose? Here's a quick guide:
Phrase Grounding
Best for:
- Object detection
- Defect detection
- Product identification
- Counting objects
Output: Bounding boxes around objects with labels
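To make that output concrete, here is a generic illustration of a phrase-grounding record: each phrase in a caption is grounded to one or more labeled bounding boxes. The field names and structure below are illustrative assumptions, not Datature's import schema; see the Add annotations guide for the exact format.

```python
# Illustrative phrase-grounding record: each grounded phrase maps to one or
# more bounding boxes in pixel coordinates (x_min, y_min, x_max, y_max).
# This structure is a generic example, not the exact schema used by Datature Vi.
annotation = {
    "image": "factory_line_0001.jpg",
    "caption": "two scratched bottles on the conveyor belt",
    "groundings": [
        {"phrase": "scratched bottles", "boxes": [[120, 84, 210, 260], [240, 90, 330, 268]]},
        {"phrase": "conveyor belt", "boxes": [[0, 250, 640, 480]]},
    ],
}

# Counting objects for a phrase is simply the number of boxes it grounds to.
for grounding in annotation["groundings"]:
    print(f'{grounding["phrase"]}: {len(grounding["boxes"])} region(s)')
```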
What's next?
Once you've completed all three steps, your dataset will be ready for training.
Create a training workflow and start fine-tuning your vision-language model on your prepared dataset.
Related resources
- Create a dataset — First step: configure dataset settings
- Upload images — Second step: add images
- Add annotations — Third step: import labels
- Train a VLM — Next: train your model
- Annotate data — Create phrase grounding and VQA annotations
- Upload data guide — Detailed upload instructions
- Manage datasets — Organize and maintain datasets
- Concepts — Understanding VLM concepts
- Vi SDK — Programmatic uploads with Python
- View insights — Check dataset quality
- Quickstart overview — Back to main quickstart
- Contact us — Get help from the Datature team
Need help?
We're here to support your VLMOps journey. Reach out to the Datature team through the Contact us link in the related resources above.
