Manage Datasets
Organize, maintain, and optimize your datasets through renaming, deletion, downloading, and asset management.
Effective dataset management is essential for maintaining high-quality training data and keeping your workspace organized. Datature Vi provides comprehensive tools to rename, download, analyze, and clean up your datasets throughout the model development lifecycle.
This guide covers all aspects of dataset management, from basic operations like renaming and downloading to advanced maintenance tasks like bulk asset cleanup and insight analysis.
Complete dataset workflow: Create dataset → Upload data → Annotate → Manage datasets (you are here) → Train model
Core dataset operations
Datature Vi provides five essential categories of dataset management operations:
- Renaming — Update dataset names to reflect purpose and improve organization
- Deletion — Permanently remove datasets you no longer need
- Downloading — Export datasets, annotations, and assets for backup or external use
- Insights — Analyze dataset statistics, distributions, and quality metrics
- Asset management — Delete individual assets or perform bulk operations
Quick reference
Common dataset management tasks and where to find them:
| Task | Documentation | When to use |
|---|---|---|
| Change dataset name | Rename a dataset → | Project reorganization, improved clarity |
| Export for backup | Download full dataset → | Before major changes, periodic backups |
| Export annotations only | Download annotations → | Lightweight backup, format conversion |
| Remove poor quality images | Delete an asset → | Quality control, dataset cleanup |
| Bulk asset cleanup | Bulk actions → | Remove multiple assets efficiently |
| Check dataset statistics | View insights → | Analyze distribution, spot issues |
| Remove entire dataset | Delete a dataset → | Cleanup unused projects |
Renaming datasets
Keep your workspace organized with descriptive, meaningful dataset names.
When to rename
- Improved clarity — Make dataset purpose clear to team members
- Project evolution — Update names as content or scope changes
- Standardization — Apply consistent naming conventions
- Version management — Distinguish between dataset versions
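Standardization and version management are easier to sustain when names are checked automatically. Below is a minimal sketch of such a check, assuming a hypothetical convention of our own choosing (`<project>-<task>-v<version>`, e.g. `retail-vqa-v2`); adapt the pattern to whatever scheme your organization actually uses.

```python
import re

# Hypothetical convention: lowercase words joined by hyphens, ending in a
# version suffix such as "-v2". This is an illustrative scheme, not a
# Datature requirement.
NAME_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*-v\d+$")

def is_valid_dataset_name(name: str) -> bool:
    """Return True if the name follows the assumed team convention."""
    return bool(NAME_PATTERN.match(name))

print(is_valid_dataset_name("retail-vqa-v2"))   # True
print(is_valid_dataset_name("My Dataset (1)"))  # False
```

A check like this can run in a pre-commit hook or CI job so non-conforming names are caught before they spread through the workspace.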
Key features
- Safe operation — Dataset ID remains unchanged; all integrations continue working
- No downtime — Active training runs and workflows are unaffected
- Instant updates — Name changes appear immediately across the platform
- Unlimited changes — Rename as often as needed
Learn how to rename datasets →
Downloading data
Export your datasets and annotations for backup, local development, or external processing.
Export options
- Full dataset — Download complete dataset with all assets and annotations
- Annotations only — Export annotation data without asset files (lightweight)
Common use cases
| Use case | Recommended export | Format |
|---|---|---|
| Backup before deletion | Full dataset | Assets + Vi JSONL |
| Local training | Full dataset | Assets + TFRecord |
| Annotation analysis | Annotations only | Vi JSONL |
| Format conversion | Annotations only | Vi JSONL or TFRecord |
| External processing | Full dataset | Assets + Vi JSONL |
| Periodic backups | Full dataset | Assets + annotations |
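For annotation analysis, an annotations-only Vi JSONL export can be tallied with a few lines of standard-library Python. The sketch below assumes each line is a JSON object with a `"type"` field; that field name is an illustrative assumption, so inspect your actual export to find the real schema before relying on it.

```python
import json
from collections import Counter

def annotation_counts(jsonl_path: str) -> Counter:
    """Count records per annotation type in a JSONL export.

    The "type" key is a placeholder for whatever field your Vi JSONL
    export actually uses to distinguish annotation kinds.
    """
    counts = Counter()
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines defensively
            record = json.loads(line)
            counts[record.get("type", "unknown")] += 1
    return counts
```

This pattern (one `json.loads` per line) works for any JSONL file regardless of schema, so it is a safe starting point for format-conversion scripts as well.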
Download methods
- Web interface — Point-and-click export with progress tracking
- Vi SDK — Programmatic downloads with `client.get_dataset()`
- Automated workflows — Schedule periodic backups via SDK
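Scheduled backups accumulate quickly, so an automated workflow usually pairs the download step with a retention policy. Here is a small sketch of that cleanup half, assuming (our assumption, not a platform convention) that each backup lands in a timestamped directory under a common root, so lexicographic order matches chronological order:

```python
import shutil
from pathlib import Path

def prune_backups(backup_root: str, keep: int = 5) -> list[str]:
    """Delete all but the `keep` most recent backup directories.

    Assumes backups are saved as sortable, timestamped folder names
    (e.g. "2024-06-01T12-00-00"). Returns the names of removed folders.
    """
    dirs = sorted(
        (d for d in Path(backup_root).iterdir() if d.is_dir()),
        key=lambda d: d.name,
        reverse=True,  # newest first
    )
    removed = []
    for old in dirs[keep:]:
        shutil.rmtree(old)
        removed.append(old.name)
    return removed
```

Running this after each `client.get_dataset()` call keeps disk usage bounded while preserving the most recent snapshots.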
Viewing dataset insights
Analyze your dataset composition, annotation distribution, and quality metrics to ensure optimal training data.
Available insights
- Asset statistics — Total count, file types, size distribution
- Annotation counts — Phrase grounding pairs, VQA pairs, distribution
- Class balance — Check whether annotations are evenly distributed across classes
- Quality indicators — Identify potential issues or gaps
- Split information — Training vs validation set breakdown
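A quick way to act on the class-balance insight is to reduce per-class counts to a single imbalance ratio. The helper below is a generic sketch (the counts and the alert threshold of 3 are illustrative, not Datature defaults):

```python
def imbalance_ratio(class_counts: dict[str, int]) -> float:
    """Ratio of most to least frequent class; 1.0 means perfectly balanced."""
    counts = [c for c in class_counts.values() if c > 0]
    if not counts:
        return 0.0  # no annotated classes yet
    return max(counts) / min(counts)

counts = {"cat": 120, "dog": 115, "bird": 12}
ratio = imbalance_ratio(counts)
print(f"imbalance ratio: {ratio:.1f}")  # 10.0 — "bird" is underrepresented
if ratio > 3:
    print("Consider collecting more data for minority classes.")
```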
When to check insights
Before training
Review dataset insights before starting training runs:
- Verify you have sufficient annotations
- Check for class imbalance issues
- Ensure asset quality meets requirements
- Confirm train/validation split is appropriate
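The pre-training checklist above can be encoded as a single readiness check. The thresholds here (100 assets minimum, at least one annotation per asset on average, a 10-30% validation split) are illustrative assumptions, not platform requirements; tune them to your task.

```python
def ready_for_training(
    num_assets: int,
    num_annotations: int,
    val_fraction: float,
    min_assets: int = 100,
) -> tuple[bool, list[str]]:
    """Return (ok, problems) for the pre-training checks above."""
    problems = []
    if num_assets < min_assets:
        problems.append(f"only {num_assets} assets (< {min_assets})")
    if num_assets and num_annotations / num_assets < 1.0:
        problems.append("average annotations per asset is below 1")
    if not 0.1 <= val_fraction <= 0.3:
        problems.append(f"validation fraction {val_fraction:.0%} outside 10-30%")
    return (not problems, problems)

ok, problems = ready_for_training(num_assets=500, num_annotations=900, val_fraction=0.2)
print(ok, problems)  # True []
```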
During dataset cleanup
Monitor changes as you clean up your dataset:
- Track annotation count changes after deletions
- Verify balanced distribution after asset removal
- Confirm quality improvements from cleanup
Quality assurance
Regular insight reviews help maintain data quality:
- Spot outliers or unusual distributions
- Identify missing annotations
- Detect potential labeling errors
- Plan additional data collection
Managing assets
Maintain dataset quality by removing poor quality images, duplicates, or unnecessary assets.
Asset management operations
- Single deletion — Remove specific assets one at a time
- Bulk actions — Delete multiple assets simultaneously with selection
Common scenarios
| Scenario | Recommended action | Method |
|---|---|---|
| Poor quality images | Delete individual assets | Single deletion |
| Duplicate content | Remove duplicates | Bulk operations |
| Wrong dataset | Move or delete assets | Bulk operations |
| Privacy compliance | Remove sensitive data | Individual or bulk |
| Dataset optimization | Clean up test data | Bulk operations |
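For the duplicate-content scenario, exact duplicates in a local copy of your assets can be found by hashing file contents before you run a bulk deletion. A sketch using only the standard library (it catches byte-identical files, not near-duplicates):

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(asset_dir: str) -> list[list[str]]:
    """Group files with byte-identical content by their SHA-256 hash."""
    by_hash = defaultdict(list)
    for path in sorted(Path(asset_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(str(path))
    # Only hashes seen more than once indicate duplicates
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Each returned group lists files sharing identical content, so you can keep one and queue the rest for bulk deletion.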
Important considerations
Asset deletion is permanent
- Cannot be undone — Deleted assets are removed immediately
- Annotations lost — All annotations associated with deleted assets are also removed
- Always backup first — Download dataset before large-scale deletions
- No recovery option — Only way to restore is re-uploading from backups
Learn about asset management →
Deleting datasets
Permanently remove datasets you no longer need to keep your workspace organized and manage storage.
When to delete
- Completed projects — Remove datasets after project completion
- Test datasets — Clean up experimental or prototype datasets
- Duplicate data — Remove redundant datasets
- Storage optimization — Free up space for new projects
- Workspace cleanup — Maintain organized, clutter-free environment
Safety measures
Before deleting a dataset:
- Create a backup — Download the full dataset
- Check dependencies — Verify no active training runs depend on this data
- Inform team — Notify collaborators about the planned deletion
- Review carefully — Ensure you're deleting the correct dataset
Deletion is permanent and irreversible
- All assets (images, videos) are deleted
- All annotations are permanently removed
- Dataset ID becomes invalid
- Training runs lose reference to source data
- Cannot be recovered through any means
- SDK queries for deleted dataset ID will fail
Always export a backup first if there's any chance you'll need the data later.
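Because deletion is irreversible, scripted deletions benefit from a confirmation guard, similar to the "retype the name" pattern many tools use. This is a generic sketch: the actual deletion call is injected as `delete_fn` (for example, a call into the Vi SDK), so the guard itself assumes nothing about any API.

```python
from typing import Callable

def delete_with_confirmation(
    dataset_name: str,
    typed_confirmation: str,
    delete_fn: Callable[[], None],
) -> bool:
    """Call delete_fn only when the user retyped the exact dataset name."""
    if typed_confirmation != dataset_name:
        print("Confirmation does not match; nothing deleted.")
        return False
    delete_fn()  # the injected, irreversible operation
    return True
```

In a script, `delete_fn` might wrap the SDK's delete call while `typed_confirmation` comes from an `input()` prompt; a typo then aborts safely instead of destroying data.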
Learn how to delete datasets →
Best practices for dataset management
- Download datasets periodically, especially before major changes
- Use clear, consistent naming conventions across your organization
- Regularly check dataset insights to maintain data quality
- Remove poor quality assets as you discover them
- Keep notes on major dataset modifications for team reference
- Create dataset versions for significant changes by using versioned names
Dataset management workflow
Follow this recommended workflow for maintaining healthy datasets:
1. Regular monitoring
- Check insights weekly — Review statistics and distributions
- Track annotation progress — Monitor annotation completion rates
- Identify quality issues — Spot problems early
2. Periodic cleanup
- Remove poor quality data — Delete blurry, corrupted, or off-topic images
- Eliminate duplicates — Use bulk operations for efficiency
- Archive old versions — Download and remove superseded datasets
3. Backup strategy
- Before major changes — Always download before bulk deletions
- Monthly backups — Regular exports for important datasets
- Version snapshots — Download before significant modifications
4. Organization maintenance
- Standardize names — Apply consistent naming conventions
- Delete test data — Remove experimental datasets after use
- Document structure — Maintain team documentation on dataset organization
Programmatic dataset management
For advanced users, the Vi SDK provides programmatic access to all dataset management operations:
```python
import vi

# Authenticate with your organization credentials
client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id",
)

# List all datasets
for dataset in client.datasets:
    print(f"Dataset: {dataset.name} (ID: {dataset.dataset_id})")

# Download a dataset for backup
result = client.get_dataset(
    dataset_id="dataset-id",
    save_dir="./backups",
)

# Delete a dataset (permanent — back up first)
client.datasets.delete("dataset-id")
```

Troubleshooting
Cannot rename dataset
Potential causes:
- Insufficient permissions
- Browser caching issues
- Active training using the dataset
Solutions:
- Verify you have edit access to the dataset
- Refresh the page and try again
- Check if any training runs are actively using the dataset
Download fails or times out
Potential causes:
- Large dataset size
- Network connectivity issues
- Browser limitations
Solutions:
- Use Vi SDK for large datasets (more reliable)
- Split downloads into smaller chunks if possible
- Check your internet connection stability
- Try downloading during off-peak hours
Deleted wrong dataset
Unfortunately, deletion is permanent:
- No recovery option through the interface
- No undo or trash bin functionality
- Cannot restore from server backups
Your only options:
- Re-upload from local backup if available
- Recreate dataset from original source data
- Contact support for Enterprise plans (may have additional options)
Next steps
- Update dataset names for better organization
- Export datasets and annotations
- Analyze dataset statistics
- Delete and organize dataset assets
- Permanently remove datasets
- Start training with your managed dataset
Related resources
- Create a dataset — Set up new datasets for different task types
- Upload data — Add images and annotations to datasets
- Annotate data — Create and edit annotations
- Vi SDK getting started — Programmatic dataset management
- Resource usage — Monitor Data Rows consumption
- Rename a dataset — Update dataset names
- Delete a dataset — Remove unused datasets
- Download data — Export full datasets or annotations
- View insights — Analyze dataset statistics
- Manage assets — Delete assets and perform bulk actions
- Train a model — Use datasets for VLM training
- Quickstart — Complete training workflow
Need help?
We're here to support your VLMOps journey. Reach out to our support team if you run into issues.