View Dataset Insights

Analyze dataset statistics, annotation distributions, and quality metrics to ensure optimal training data.

Dataset insights summarize the composition of your dataset so you can see how annotations are distributed, spot quality issues, and confirm that the data is ready for training.

Part of dataset workflow

Create dataset → Upload data → Annotate → View insights (you are here) → Train model


What insights are available?

Asset statistics

  • Total number of assets in the dataset
  • File format distribution
  • File size analytics
  • Upload timeline

Annotation metrics

  • Total annotation count
  • Phrase grounding pairs (if applicable)
  • VQA question-answer pairs (if applicable)
  • Annotations per asset

Distribution analysis

  • Class balance across annotations
  • Annotation type distribution
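Class balance can also be checked outside the interface if you have the annotation labels as a plain list. The following is a minimal sketch, not the platform's own calculation: the `class_balance` helper and the sample labels are hypothetical, and the max-to-min count ratio is just one common way to summarize imbalance.

```python
from collections import Counter


def class_balance(labels):
    """Return each class's share of all annotations and the max/min count ratio."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # No annotations at all: nothing to summarize.
        return {}, None
    shares = {name: count / total for name, count in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return shares, imbalance_ratio


# Hypothetical labels, one entry per annotation.
labels = ["car"] * 6 + ["truck"] * 3 + ["bus"] * 1
shares, ratio = class_balance(labels)
for name, share in sorted(shares.items()):
    print(f"{name}: {share:.0%}")
print(f"Imbalance ratio: {ratio:.1f}")
```

A ratio close to 1 means the classes are roughly even; a large ratio points to the overrepresented classes worth rebalancing.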

Quality indicators

  • Assets without annotations
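As an illustration of this indicator, the sketch below lists assets with no annotations and reports annotation coverage. It assumes you already have a mapping from asset name to its annotations; the `unannotated_assets` helper and the sample filenames are hypothetical and not part of any SDK.

```python
def unannotated_assets(annotations_by_asset):
    """Return the sorted names of assets whose annotation list is empty."""
    return sorted(name for name, anns in annotations_by_asset.items() if not anns)


# Hypothetical mapping from asset filename to its annotation labels.
annotations_by_asset = {
    "img_001.jpg": ["car", "truck"],
    "img_002.jpg": [],
    "img_003.jpg": ["bus"],
    "img_004.jpg": [],
}
missing = unannotated_assets(annotations_by_asset)
coverage = 1 - len(missing) / len(annotations_by_asset)
print(f"Unannotated: {missing}")
print(f"Coverage: {coverage:.0%}")
```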

Why dataset insights matter

Before training

Check insights before starting training to:

  • Verify sufficient annotations exist
  • Ensure balanced class distribution
  • Identify missing labels
  • Confirm dataset readiness
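The checklist above can be turned into a small pre-training check. This is a sketch under stated assumptions: the `readiness_issues` helper is hypothetical, and the thresholds (`min_annotations=100`, `max_imbalance=10.0`) are placeholder values to adjust for your task, not recommendations from the platform.

```python
def readiness_issues(class_counts, unannotated_count,
                     min_annotations=100, max_imbalance=10.0):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    issues = []
    total = sum(class_counts.values())
    # Verify sufficient annotations exist (threshold is an assumption).
    if total < min_annotations:
        issues.append(f"only {total} annotations (want at least {min_annotations})")
    # Identify classes that were defined but never labeled.
    nonzero = [count for count in class_counts.values() if count > 0]
    if len(nonzero) < len(class_counts):
        issues.append("some classes have no annotations")
    # Ensure class distribution is not heavily skewed (threshold is an assumption).
    if nonzero and max(nonzero) / min(nonzero) > max_imbalance:
        issues.append("class distribution is heavily imbalanced")
    # Identify missing labels at the asset level.
    if unannotated_count > 0:
        issues.append(f"{unannotated_count} assets have no annotations")
    return issues


# Hypothetical summary numbers read off the insights page.
problems = readiness_issues({"car": 50, "bus": 2}, unannotated_count=3)
for problem in problems:
    print(f"- {problem}")
```

Returning a list of issues rather than a single pass/fail flag keeps the output useful for planning the remaining annotation work.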

Start training →

During annotation

Monitor progress while annotating:

  • Track annotation completion
  • Spot coverage gaps
  • Balance class distribution
  • Plan remaining work

Annotate data →

Quality assurance

Regular checks help maintain quality:

  • Detect outliers or anomalies
  • Verify annotation consistency
  • Identify quality issues early
  • Plan data improvements

Manage assets →


How to access insights

Via web interface

  1. Navigate to your dataset
  2. Click the Insights or Analytics tab
  3. Review the displayed metrics and visualizations
  4. Use filters to focus on specific aspects

Via Vi SDK

import vi

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id"
)

# Get dataset information
dataset = client.datasets.get("your-dataset-id")
print(f"Total assets: {dataset.asset_count}")
print(f"Annotations: {dataset.annotation_count}")

Learn more about Vi SDK →
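The two counts printed above are enough to derive the annotations-per-asset metric. The sketch below is a plain-Python helper that makes no SDK calls; `annotations_per_asset` is a hypothetical name, and the literal numbers stand in for `dataset.asset_count` and `dataset.annotation_count`.

```python
def annotations_per_asset(asset_count, annotation_count):
    """Average number of annotations per asset; 0.0 for an empty dataset."""
    if asset_count <= 0:
        # Avoid division by zero when the dataset has no assets yet.
        return 0.0
    return annotation_count / asset_count


# Placeholder values; in practice pass dataset.asset_count and dataset.annotation_count.
average = annotations_per_asset(250, 1000)
print(f"Annotations per asset: {average:.2f}")
```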


Best practices

Check regularly

Review insights weekly during active annotation.

Before training

Always verify dataset health before starting training runs.

After cleanup

Confirm that your changes improved dataset quality.

Share with team

Use insights to coordinate annotation efforts.


Common scenarios

Insufficient annotations

Problem: Dataset has too few annotations for training

Action:

Class imbalance

Problem: Some classes are heavily overrepresented

Action:

Missing annotations

Problem: Many assets have no annotations

Action:

