View Dataset Insights

Analyze dataset statistics, annotation distributions, and quality metrics to check that your data is ready for training.

Dataset insights summarize the composition of your dataset so you can see how annotations are distributed across classes and assets, and spot quality issues such as unannotated assets before you train.

💡
Part of dataset workflow
Create dataset → Upload data → Annotate → View insights (you are here) → Train model

What insights are available?

Asset statistics

Total number of assets in the dataset
File format distribution
File size analytics
Upload timeline

Annotation metrics

Total annotation count
Phrase grounding pairs (if applicable)
VQA question-answer pairs (if applicable)
Annotations per asset

Distribution analysis

Class balance across annotations
Annotation type distribution

Quality indicators

Assets without annotations

Why dataset insights matter

Before training

Check insights before starting training to:

Verify sufficient annotations exist
Ensure balanced class distribution
Identify missing labels
Confirm dataset readiness

Start training →

During annotation

Monitor progress while annotating:

Track annotation completion
Spot coverage gaps
Balance class distribution
Plan remaining work

Annotate data →

Quality assurance

Regular checks help maintain quality:

Detect outliers or anomalies
Verify annotation consistency
Identify quality issues early
Plan data improvements

Manage assets →

How to access insights

Via web interface

Navigate to your dataset
Click the Insights or Analytics tab
Review the displayed metrics and visualizations
Use filters to focus on specific aspects

Via Vi SDK

import vi

# Authenticate with your secret key and organization ID
client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id"
)

# Fetch the dataset and print its headline counts
dataset = client.datasets.get("your-dataset-id")
print(f"Total assets: {dataset.asset_count}")
print(f"Annotations: {dataset.annotation_count}")

Learn more about Vi SDK →
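
The two counts printed in the SDK snippet are enough to derive the "annotations per asset" metric listed earlier. The helper below is a plain-Python sketch with a hypothetical name (`annotations_per_asset`); it takes the counts as arguments so it runs without SDK access, and the example numbers are made up.

```python
def annotations_per_asset(asset_count, annotation_count):
    """Average number of annotations per asset, or 0.0 for an empty dataset."""
    if asset_count <= 0:
        return 0.0
    return annotation_count / asset_count


# Made-up counts; in practice pass dataset.asset_count and
# dataset.annotation_count from the SDK snippet.
average = annotations_per_asset(asset_count=250, annotation_count=1000)
print(f"Annotations per asset: {average:.2f}")
```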

Best practices

Check regularly

Review insights weekly during active annotation

Before training

Always verify dataset health before starting runs

After cleanup

Confirm changes improved dataset quality

Share with team

Use insights to coordinate annotation efforts

Common scenarios

Insufficient annotations

Problem: The dataset has too few annotations for training

Action:

Add more annotations manually
Upload additional annotations if available
Use AI-assisted tools to speed up labeling

Class imbalance

Problem: Some classes are heavily overrepresented

Action:

Collect more data for underrepresented classes
Remove excess samples from overrepresented classes
Adjust dataset split strategy
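
To put a number on class imbalance, you can count class labels from an annotation export. The sketch below assumes you already have a flat list of class names, one per annotation; `class_distribution` and `imbalance_ratio` are hypothetical helper names, not SDK functions.

```python
from collections import Counter


def class_distribution(labels):
    """Map each class name to its share of all annotations (0.0 to 1.0)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: count / total for name, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most common class count to the least common one."""
    counts = Counter(labels)
    if not counts:
        return 0.0
    return max(counts.values()) / min(counts.values())


# Made-up labels: one class dominates the dataset
labels = ["car"] * 8 + ["truck"] * 2
print(class_distribution(labels))
print(f"Imbalance ratio: {imbalance_ratio(labels):.1f}")
```

A ratio near 1 means the classes seen in the export are evenly represented. What counts as too high depends on the task, so treat any cutoff as a judgment call. Classes with zero annotations do not appear in the list and need a separate check.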

Missing annotations

Problem: Many assets have no annotations

Action:

Complete annotation workflow
Upload missing annotations if they exist
Remove unannotated assets via bulk actions
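
Before choosing between annotating and removing, it helps to list exactly which assets are unannotated. The sketch below assumes a dictionary that maps asset filenames to their annotation lists, for example one built from a downloaded export; the function name and data shape are assumptions for illustration.

```python
def unannotated_assets(annotations_by_asset):
    """Return the names of assets whose annotation list is empty, sorted."""
    return sorted(
        name for name, annotations in annotations_by_asset.items()
        if not annotations
    )


# Made-up export: two of four assets have no annotations
export = {
    "img_001.jpg": ["car"],
    "img_002.jpg": [],
    "img_003.jpg": ["truck", "car"],
    "img_004.jpg": [],
}
missing = unannotated_assets(export)
print(f"{len(missing)} of {len(export)} assets are unannotated: {missing}")
```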

Next steps

Add or improve annotations based on insights

Clean up dataset based on quality analysis

Start training once dataset is ready

Export dataset for backup or processing

Upload annotations

Add more annotations to improve balance

Configure dataset

Adjust training split based on insights

Related resources

Manage datasets — Overview of dataset management operations
Create a dataset — Set up new datasets with proper configuration
Annotate data — Create quality annotations for training
Upload data — Add more images and labels
Manage assets — Delete, organize, and clean up images
Download data — Export dataset for backup or external processing
Configure your dataset — Adjust training splits and settings
Train a model — Use your dataset for VLM training
Resource usage — Monitor Data Rows and storage consumption
Vi SDK Datasets API — Manage datasets programmatically

