Create a Dataset

Datasets are the foundation of your computer vision projects. They store your images along with their annotations, enabling you to organize, manage, and prepare data for training.

This document explains how to create a new dataset in Datature Vi, from selecting your vision task type to configuring storage settings.

💡
Quick workflow
Create dataset (you are here) → Upload images → Add annotations → Train a model

📋
Prerequisites
Before creating a dataset, ensure you have:

An active Datature Vi account

Access to a workspace or organization

A clear understanding of your vision task requirements

Create your dataset

Follow these steps to create a new dataset:

In your Datature Vi dashboard, click Dataset in the left sidebar

Click Create Dataset

This opens the dataset creation wizard, which guides you through four configuration steps.

Choose your dataset type

Select the vision task you want to accomplish. Your choice determines how you'll annotate and train models with this dataset.

Phrase Grounding

Best for: Object detection, defect detection, product identification, inventory management

Phrase grounding enables you to locate and classify multiple objects within images using bounding boxes. This task type is ideal when you need to identify where objects are located and what they are.

Common use cases:

Manufacturing defect detection
Retail product recognition
Traffic monitoring and vehicle detection
Medical imaging for region identification

Learn more about Phrase Grounding →

Click Next after selecting your type.

💡
Can I change the dataset type later?
No, the dataset type cannot be changed after creation. If you need a different task type, you'll need to create a new dataset.

💡
Not sure which to choose?

Choose Phrase Grounding if you need to find and locate specific objects with bounding boxes

Choose VQA if you need to ask questions and get natural language answers

Choose Freeform (coming soon) if you need custom annotation schemas for specialized use cases

Learn more about Phrase Grounding | Learn more about VQA
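The decision guide above can be written as a small function. This is a sketch only: `DatasetType` and `recommend_dataset_type` are hypothetical names, not part of the Vi SDK. Rejecting a request for both grounding and question answering is our own design choice, based on the fact that each dataset has exactly one type that cannot be changed later.

```python
from enum import Enum


class DatasetType(Enum):
    # Hypothetical labels mirroring the three options shown in the wizard.
    PHRASE_GROUNDING = "Phrase Grounding"
    VQA = "VQA"
    FREEFORM = "Freeform"


def recommend_dataset_type(
    locate_objects: bool,
    answer_questions: bool,
    custom_schema: bool = False,
) -> DatasetType:
    """Map task requirements to a dataset type, following the guide above."""
    if custom_schema:
        # Freeform is listed as coming soon, so it may not be selectable yet.
        return DatasetType.FREEFORM
    if locate_objects and answer_questions:
        # A dataset has exactly one type, fixed at creation time.
        raise ValueError("Pick one task per dataset; the type cannot be changed later.")
    if locate_objects:
        return DatasetType.PHRASE_GROUNDING
    if answer_questions:
        return DatasetType.VQA
    raise ValueError("State at least one requirement to get a recommendation.")
```

For example, a defect-inspection project that needs bounding boxes calls `recommend_dataset_type(locate_objects=True, answer_questions=False)` and gets `DatasetType.PHRASE_GROUNDING`.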

Choose your data type

Select your input data format:

Choose Image for:

Individual photos or screenshots
Extracted frames from videos
Scanned documents or medical imagery
Any static visual content
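Because an Image dataset holds static content, raw video files need their frames extracted before upload. The sketch below separates static image files from everything else in a local folder listing. The extension set is an assumption for illustration only; check the upload documentation for the formats Datature Vi actually accepts. `split_static_images` is a hypothetical helper.

```python
from pathlib import Path

# Assumed set of static image extensions (illustrative, not Vi's official list).
STATIC_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".webp"}


def split_static_images(paths: list[str]) -> tuple[list[str], list[str]]:
    """Separate static image files from everything else (e.g. raw videos)."""
    images, others = [], []
    for p in paths:
        # Compare lowercased suffixes so "IMG.JPG" and "img.jpg" are treated alike.
        if Path(p).suffix.lower() in STATIC_IMAGE_EXTENSIONS:
            images.append(p)
        else:
            others.append(p)
    return images, others
```

Calling `split_static_images(["a.JPG", "clip.mp4", "scan.tiff"])` returns `(["a.JPG", "scan.tiff"], ["clip.mp4"])`, flagging the video as needing frame extraction first.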

Click Next to continue.

Configure settings

Enter your dataset details and configuration:

Dataset name

Choose a descriptive, memorable name for your dataset.

Best practices:

Use clear, descriptive names (e.g., "Factory Defects 2024" instead of "Dataset1")
Include version numbers if maintaining multiple iterations
Follow your organization's naming conventions
You can rename it later if needed
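If your team creates many datasets, a small helper can enforce these naming practices before anyone types a name into the wizard. This is a hypothetical sketch: the `v<number>` suffix and the list of "generic" words are assumptions, so adapt both to your organization's conventions.

```python
import re
from typing import Optional

# Hypothetical pattern for placeholder names such as "Dataset1" or "untitled 2".
GENERIC_NAME = re.compile(r"^\s*(dataset|untitled|test)\s*\d*\s*$", re.IGNORECASE)


def build_dataset_name(subject: str, year: int, version: Optional[int] = None) -> str:
    """Compose a descriptive name such as 'Factory Defects 2024 v2'."""
    name = f"{subject.strip()} {year}"
    if version is not None:
        # Assumed convention: append a version number for each iteration.
        name += f" v{version}"
    return name


def is_generic_name(name: str) -> bool:
    """Return True for names that say nothing about the dataset's contents."""
    return GENERIC_NAME.match(name) is not None
```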

Dataset description

(optional) Add context about your dataset's purpose, contents, or specifications.

Recommended information:

Data source and collection date
Annotation guidelines or standards
Expected use cases
Any special preprocessing applied
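To keep descriptions consistent across datasets, you can assemble them from the recommended fields. The sketch below is one possible layout, not a format Datature Vi requires; `build_description` and its field labels are hypothetical.

```python
from typing import Optional


def build_description(
    source: str,
    collected: str,
    guidelines: str,
    use_cases: list[str],
    preprocessing: Optional[str] = None,
) -> str:
    """Assemble a dataset description from the recommended fields."""
    lines = [
        f"Source: {source} (collected {collected})",
        f"Annotation guidelines: {guidelines}",
        f"Intended use: {', '.join(use_cases)}",
        # Record preprocessing explicitly, even when none was applied.
        f"Preprocessing: {preprocessing or 'none'}",
    ]
    return "\n".join(lines)
```

Writing "none" rather than omitting the preprocessing line tells future readers that the images were left untouched, instead of leaving them to guess.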

Dataset localization

Select your storage region preference:

Multi-Region — Recommended for best performance and reliability. Data is distributed across multiple regions for optimal access speed and redundancy.
Single Region — Data is stored in a specific geographic region (useful for compliance requirements)

⚡
Recommendation: Choose Multi-Region unless you have specific data sovereignty or compliance requirements.
This setting cannot be changed after dataset creation.
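The localization recommendation and the fixed-after-creation rule can be modeled locally, for example in an internal provisioning script. This sketch assumes nothing about how Vi stores settings: `Localization`, `choose_localization`, `DatasetSettings`, and `rename` are hypothetical names. A frozen dataclass rejects any attempt to reassign the type or localization, while `rename` returns a copy with only the name changed, mirroring the fact that datasets can be renamed later.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Localization(Enum):
    MULTI_REGION = "Multi-Region"
    SINGLE_REGION = "Single Region"


def choose_localization(has_residency_requirement: bool) -> Localization:
    """Default to Multi-Region unless data must stay in one geographic region."""
    if has_residency_requirement:
        return Localization.SINGLE_REGION
    return Localization.MULTI_REGION


@dataclass(frozen=True)
class DatasetSettings:
    # Hypothetical local record; fields mirror the wizard, not a real API.
    dataset_type: str
    localization: Localization
    name: str


def rename(settings: DatasetSettings, new_name: str) -> DatasetSettings:
    """Return a copy with a new name; type and localization carry over unchanged."""
    return replace(settings, name=new_name)
```

Assigning to `settings.localization` on an existing record raises `dataclasses.FrozenInstanceError`, which is the local analogue of having to create a new dataset to change these settings.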

Click Next to review.

Review and create

Verify your settings on the summary screen:

Dataset type
Data type
Dataset name and description
Localization settings
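If you prepare dataset settings ahead of time (for example, in a planning spreadsheet), a quick local check can catch missing fields before you reach this screen. This is a hypothetical pre-flight sketch, not part of Datature Vi; the field labels simply mirror the review items listed above, with the description treated as optional.

```python
REQUIRED = ("Dataset type", "Data type", "Dataset name", "Localization")


def review_summary(settings: dict) -> str:
    """Render the review items and fail fast if a required one is missing."""
    missing = [key for key in REQUIRED if not str(settings.get(key, "")).strip()]
    if missing:
        raise ValueError("Fill in before creating: " + ", ".join(missing))
    rows = [
        ("Dataset type", settings["Dataset type"]),
        ("Data type", settings["Data type"]),
        ("Dataset name", settings["Dataset name"]),
        # The description is optional, so show a placeholder when it is empty.
        ("Description", str(settings.get("Description", "")).strip() or "(none)"),
        ("Localization", settings["Localization"]),
    ]
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label.ljust(width)} : {value}" for label, value in rows)
```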

If everything looks correct, click Create Dataset to finish.

✅
Dataset created successfully!
Your dataset is now ready. You can start uploading assets and adding annotations.

Next steps

After creating your dataset, you can:

Upload assets — Add images to your dataset (required next step)
Upload annotations — Import existing annotations if you have them
Annotate data — Start labeling your data manually or with AI assistance
View dataset insights — Explore statistics and analytics about your dataset
Manage your dataset — Rename, delete, or download your dataset

Related resources

Manage datasets — Rename, delete, and organize your datasets
Download data — Export your datasets and annotations
Phrase grounding concepts — Deep dive into object detection tasks
Visual question answering concepts — Understanding VQA capabilities
Training workflows — Train VLMs with your dataset
Upload data — Add images and annotations to datasets
Annotate data — Create annotations for training
View dataset insights — Analyze dataset statistics and quality
Quickstart — Complete end-to-end workflow
Team settings — Add members to collaborate
Vi SDK — Programmatic dataset management
Create a training project — Set up training environment

Need help?

We're here to support your VLMOps journey. Reach out through any of these channels:

Contact Support

Get help from our team via our website or email us at [email protected]

Join Our Community

Connect with other Datature users, share ideas, and get community support on Slack

Explore Resources

Read our Blog
Check out GitHub
Watch Tutorials

Schedule a Demo

Book a personalized demo to see how Datature Vi can accelerate your vision AI projects
