View Dataset Insights
Analyze dataset statistics, annotation distributions, and quality metrics to ensure optimal training data.
Dataset insights provide comprehensive analytics about your dataset composition, helping you understand annotation distribution, identify quality issues, and ensure your data is ready for training.
Part of dataset workflow: Create dataset → Upload data → Annotate → View insights (you are here) → Train model
What insights are available?
Asset statistics
- Total number of assets in the dataset
- File format distribution
- File size analytics
- Upload timeline
Annotation metrics
- Total annotation count
- Phrase grounding pairs (if applicable)
- VQA question-answer pairs (if applicable)
- Annotations per asset
Distribution analysis
- Class balance across annotations
- Annotation type distribution
Quality indicators
- Assets without annotations
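To make these metrics concrete, here is a minimal sketch that computes a few of them from a plain Python mapping of asset IDs to class labels. The `summarize` function, the mapping format, and the sample file names are hypothetical stand-ins for illustration; they are not part of the Vi SDK, and the platform computes its own insights.

```python
from collections import Counter


def summarize(labels_by_asset):
    """Compute insight-style metrics from {asset_id: [class labels]}."""
    total_assets = len(labels_by_asset)
    total_annotations = sum(len(labels) for labels in labels_by_asset.values())
    # Count how often each class label appears across all assets
    class_counts = Counter(
        label for labels in labels_by_asset.values() for label in labels
    )
    # Assets whose label list is empty have no annotations
    unannotated = sorted(a for a, labels in labels_by_asset.items() if not labels)
    return {
        "total_assets": total_assets,
        "total_annotations": total_annotations,
        "annotations_per_asset": (
            total_annotations / total_assets if total_assets else 0.0
        ),
        "class_counts": dict(class_counts),
        "assets_without_annotations": unannotated,
    }


# Hypothetical sample data
sample = {
    "img_001.jpg": ["cat", "dog"],
    "img_002.jpg": ["cat"],
    "img_003.jpg": [],
}
summary = summarize(sample)
print(summary)
```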
Why dataset insights matter
Before training
Check insights before starting training to:
- Verify sufficient annotations exist
- Ensure balanced class distribution
- Identify missing labels
- Confirm dataset readiness
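One way to turn this checklist into an automated pre-training check is sketched below. The `readiness_issues` function and its thresholds (`min_annotations`, `min_per_class`) are assumptions chosen for illustration, not values recommended by the platform; pick limits that suit your task.

```python
def readiness_issues(asset_count, annotation_count, class_counts,
                     min_annotations=100, min_per_class=10):
    """Return a list of problems found; an empty list means none were detected.

    The default thresholds are placeholders, not platform recommendations.
    """
    issues = []
    if asset_count == 0:
        issues.append("dataset has no assets")
    if annotation_count < min_annotations:
        issues.append(
            f"only {annotation_count} annotations (minimum {min_annotations})"
        )
    # Flag every class that falls below the per-class minimum
    for name, count in sorted(class_counts.items()):
        if count < min_per_class:
            issues.append(
                f"class '{name}' has only {count} annotations "
                f"(minimum {min_per_class})"
            )
    return issues


# Hypothetical counts for a three-class dataset
issues = readiness_issues(
    asset_count=50,
    annotation_count=120,
    class_counts={"cat": 110, "dog": 10, "bird": 0},
)
for issue in issues:
    print(issue)
```

The asset and annotation totals passed in here are plain integers, so they can come from wherever you read your dataset statistics.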
During annotation
Monitor progress while annotating:
- Track annotation completion
- Spot coverage gaps
- Balance class distribution
- Plan remaining work
Quality assurance
Regular checks help maintain quality:
- Detect outliers or anomalies
- Verify annotation consistency
- Identify quality issues early
- Plan data improvements
How to access insights
Via web interface
- Navigate to your dataset
- Click the Insights or Analytics tab
- Review the displayed metrics and visualizations
- Use filters to focus on specific aspects
Via Vi SDK
import vi

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id"
)

# Get dataset information
dataset = client.datasets.get("your-dataset-id")
print(f"Total assets: {dataset.asset_count}")
print(f"Annotations: {dataset.annotation_count}")
Best practices
Review insights weekly during active annotation
Always verify dataset health before starting runs
Confirm changes improved dataset quality
Use insights to coordinate annotation efforts
Common scenarios
Insufficient annotations
Problem: Dataset has too few annotations for training
Action:
- Add more annotations manually
- Upload additional annotations if available
- Use AI-assisted tools to speed up labeling
Class imbalance
Problem: Some classes are heavily overrepresented
Action:
- Collect more data for underrepresented classes
- Remove excess samples from overrepresented classes
- Adjust dataset split strategy
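To quantify the imbalance before choosing an action, a rough sketch like the following may help. It assumes you already have per-class annotation counts as a dictionary; `imbalance_report` and the sample counts are hypothetical, and the ratio of largest to smallest class is only one of several possible imbalance measures.

```python
def imbalance_report(class_counts):
    """Return (ratio, deficits) for a {class_name: annotation_count} mapping.

    ratio    -- largest class count divided by smallest class count
    deficits -- annotations each class would need to match the largest class
    """
    if not class_counts:
        return None, {}
    largest = max(class_counts.values())
    smallest = min(class_counts.values())
    # A class with zero annotations makes the ratio unbounded
    ratio = largest / smallest if smallest else float("inf")
    deficits = {name: largest - count for name, count in class_counts.items()}
    return ratio, deficits


# Hypothetical class counts
ratio, deficits = imbalance_report({"cat": 900, "dog": 300, "bird": 100})
print(f"imbalance ratio: {ratio:.1f}")
print(deficits)
```

The deficits indicate roughly how much data to collect for each underrepresented class, or, read the other way, how many samples the largest class exceeds the others by.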
Missing annotations
Problem: Many assets have no annotations
Action:
- Complete annotation workflow
- Upload missing annotations if they exist
- Remove unannotated assets via bulk actions
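If you keep a local record of annotation counts per asset, a short sketch like this can list the unannotated assets and their share of the dataset, which helps when deciding whether to annotate or remove them. The `unannotated_share` function and the mapping format are assumptions for illustration and are not part of the Vi SDK.

```python
def unannotated_share(annotation_counts):
    """Given {asset_id: annotation count}, return (unannotated ids, share)."""
    missing = sorted(a for a, n in annotation_counts.items() if n == 0)
    share = len(missing) / len(annotation_counts) if annotation_counts else 0.0
    return missing, share


# Hypothetical per-asset annotation counts
missing, share = unannotated_share(
    {"a.jpg": 3, "b.jpg": 0, "c.jpg": 1, "d.jpg": 2}
)
print(f"{len(missing)} unannotated assets ({share:.0%} of dataset): {missing}")
```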
Next steps
Add or improve annotations based on insights
Clean up dataset based on quality analysis
Start training once dataset is ready
Export dataset for backup or processing
Add more annotations to improve balance
Adjust training split based on insights
Related resources
- Manage datasets — Overview of dataset management operations
- Create a dataset — Set up new datasets with proper configuration
- Annotate data — Create quality annotations for training
- Upload data — Add more images and labels
- Manage assets — Delete, organize, and clean up images
- Download data — Export dataset for backup or external processing
- Configure your dataset — Adjust training splits and settings
- Train a model — Use your dataset for VLM training
- Resource usage — Monitor Data Rows and storage consumption
- Vi SDK Datasets API — Manage datasets programmatically
