Manage Datasets

Organize, maintain, and optimize your datasets through renaming, deletion, downloading, and asset management.

Effective dataset management is essential for maintaining high-quality training data and keeping your workspace organized. Datature Vi provides comprehensive tools to rename, download, analyze, and clean up your datasets throughout the model development lifecycle.

This guide covers all aspects of dataset management, from basic operations like renaming and downloading to advanced maintenance tasks like bulk asset cleanup and insight analysis.

💡

Complete dataset workflow

Create dataset → Upload data → Annotate → Manage datasets (you are here) → Train model


Core dataset operations

Datature Vi provides five essential categories of dataset management operations:


Quick reference

Common dataset management tasks and where to find them:

| Task | Documentation | When to use |
| --- | --- | --- |
| Change dataset name | Rename a dataset → | Project reorganization, improved clarity |
| Export for backup | Download full dataset → | Before major changes, periodic backups |
| Export annotations only | Download annotations → | Lightweight backup, format conversion |
| Remove poor quality images | Delete an asset → | Quality control, dataset cleanup |
| Bulk asset cleanup | Bulk actions → | Remove multiple assets efficiently |
| Check dataset statistics | View insights → | Analyze distribution, spot issues |
| Remove entire dataset | Delete a dataset → | Cleanup unused projects |

Renaming datasets

Keep your workspace organized with descriptive, meaningful dataset names.

When to rename

  • Improved clarity — Make dataset purpose clear to team members
  • Project evolution — Update names as content or scope changes
  • Standardization — Apply consistent naming conventions
  • Version management — Distinguish between dataset versions

Key features

  • Safe operation — Dataset ID remains unchanged; all integrations continue working
  • No downtime — Active training runs and workflows are unaffected
  • Instant updates — Name changes appear immediately across the platform
  • Unlimited changes — Rename as often as needed

Learn how to rename datasets →


Downloading data

Export your datasets and annotations for backup, local development, or external processing.

Export options

Common use cases

| Use case | Recommended export | Format |
| --- | --- | --- |
| Backup before deletion | Full dataset | Assets + Vi JSONL |
| Local training | Full dataset | Assets + TFRecord |
| Annotation analysis | Annotations only | Vi JSONL |
| Format conversion | Annotations only | Vi JSONL or TFRecord |
| External processing | Full dataset | Assets + Vi JSONL |
| Periodic backups | Full dataset | Assets + annotations |

Download methods

  • Web interface — Point-and-click export with progress tracking
  • Vi SDK — Programmatic downloads with client.get_dataset()
  • Automated workflows — Schedule periodic backups via SDK
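For scheduled backups, the SDK route is the practical one. Below is a minimal sketch: the `backup_dir` helper and its directory layout are our own illustration, not part of the Vi SDK, and the commented-out `client.get_dataset()` call assumes the client setup shown in the code sample later in this guide.

```python
from datetime import datetime
from pathlib import Path


def backup_dir(dataset_id: str, when: datetime) -> Path:
    """Build a date-stamped backup directory, e.g. backups/my-dataset/2024-01-15."""
    return Path("backups") / dataset_id / when.strftime("%Y-%m-%d")


# Hypothetical usage with the Vi SDK (client setup as shown later in this guide):
# client.get_dataset(
#     dataset_id="dataset-id",
#     save_dir=str(backup_dir("dataset-id", datetime.now())),
# )
```

Run from cron or a CI job, this gives each export its own dated folder, which keeps periodic backups from overwriting one another.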

Explore download options →


Viewing dataset insights

Analyze your dataset composition, annotation distribution, and quality metrics to ensure optimal training data.

Available insights

  • Asset statistics — Total count, file types, size distribution
  • Annotation counts — Phrase grounding pairs, VQA pairs, distribution
  • Class balance — See if your annotations are evenly distributed
  • Quality indicators — Identify potential issues or gaps
  • Split information — Training vs validation set breakdown

When to check insights

Before training

Review dataset insights before starting training runs:

  • Verify you have sufficient annotations
  • Check for class imbalance issues
  • Ensure asset quality meets requirements
  • Confirm train/validation split is appropriate

During dataset cleanup

Monitor changes as you clean up your dataset:

  • Track annotation count changes after deletions
  • Verify balanced distribution after asset removal
  • Confirm quality improvements from cleanup

Quality assurance

Regular insight reviews help maintain data quality:

  • Spot outliers or unusual distributions
  • Identify missing annotations
  • Detect potential labeling errors
  • Plan additional data collection
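Checks like class balance can also be run offline against an annotations export. A sketch under stated assumptions: the exact Vi JSONL schema is not shown in this guide, so the `label` field and the sample records below are hypothetical placeholders for whatever field your export actually carries.

```python
import json
from collections import Counter


def label_counts(jsonl_text: str, field: str = "label") -> Counter:
    """Count annotations per label in a JSONL export (one JSON object per line)."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            counts[record.get(field, "<missing>")] += 1
    return counts


def imbalance_ratio(counts: Counter) -> float:
    """Ratio of most- to least-common label; values far above 1 suggest imbalance."""
    most = counts.most_common()
    return most[0][1] / most[-1][1]


# Hypothetical records standing in for a real Vi JSONL export:
sample = '{"label": "cat"}\n{"label": "cat"}\n{"label": "dog"}'
print(label_counts(sample))                   # Counter({'cat': 2, 'dog': 1})
print(imbalance_ratio(label_counts(sample)))  # 2.0
```

A ratio that climbs after deletions is a sign the cleanup removed assets unevenly across classes.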

Learn how to view insights →


Managing assets

Maintain dataset quality by removing poor quality images, duplicates, or unnecessary assets.

Asset management operations

Common scenarios

| Scenario | Recommended action | Method |
| --- | --- | --- |
| Poor quality images | Delete individual assets | Single deletion |
| Duplicate content | Remove duplicates | Bulk operations |
| Wrong dataset | Move or delete assets | Bulk operations |
| Privacy compliance | Remove sensitive data | Individual or bulk |
| Dataset optimization | Clean up test data | Bulk operations |
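Whichever scenario applies, it helps to build the deletion list in a single pass and review it before acting. A minimal sketch of that pattern: the asset fields (`id`, `width`) are hypothetical, and the actual deletion is still performed through the platform's single or bulk deletion flows.

```python
def select_for_deletion(assets, predicate):
    """Return IDs of assets matching a cleanup predicate, for review before bulk deletion."""
    return [a["id"] for a in assets if predicate(a)]


# Hypothetical asset records; a real listing would come from your dataset.
assets = [
    {"id": "a1", "width": 64,   "filename": "blurry.jpg"},
    {"id": "a2", "width": 1920, "filename": "good.jpg"},
]

too_small = lambda a: a["width"] < 224
print(select_for_deletion(assets, too_small))  # ['a1']
```

Keeping selection separate from deletion means the list can be eyeballed (or logged) before anything irreversible happens.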

Important considerations

⚠️

Asset deletion is permanent

  • Cannot be undone — Deleted assets are removed immediately
  • Annotations lost — All annotations associated with deleted assets are also removed
  • Always backup first — Download dataset before large-scale deletions
  • No recovery option — Only way to restore is re-uploading from backups

Learn about asset management →


Deleting datasets

Permanently remove datasets you no longer need to keep your workspace organized and manage storage.

When to delete

  • Completed projects — Remove datasets after project completion
  • Test datasets — Clean up experimental or prototype datasets
  • Duplicate data — Remove redundant datasets
  • Storage optimization — Free up space for new projects
  • Workspace cleanup — Maintain organized, clutter-free environment

Safety measures

Before deleting a dataset:

  1. Create a backup — Download the full dataset
  2. Check dependencies — Verify no active training runs depend on this data
  3. Inform team — Notify collaborators about the planned deletion
  4. Review carefully — Ensure you're deleting the correct dataset
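The checklist above can be wrapped in a small guard so the backup always happens before the deletion. A sketch assuming the SDK calls shown in the code sample later in this guide (`client.get_dataset()` and `client.datasets.delete()`); the `confirm` flag is our own safety addition, not an SDK feature.

```python
def delete_with_backup(client, dataset_id: str, save_dir: str, confirm: bool = False):
    """Download a full backup, then delete the dataset. Refuses to run unless confirm=True."""
    if not confirm:
        raise ValueError(f"Refusing to delete {dataset_id!r} without confirm=True")
    client.get_dataset(dataset_id=dataset_id, save_dir=save_dir)  # backup first
    client.datasets.delete(dataset_id)                            # then delete
```

Because deletion is irreversible, forcing the explicit `confirm=True` keeps a stray script invocation from destroying data.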

🔒

Deletion is permanent and irreversible

  • All assets (images, videos) are deleted
  • All annotations are permanently removed
  • Dataset ID becomes invalid
  • Training runs lose reference to source data
  • Cannot be recovered through any means
  • SDK queries for deleted dataset ID will fail

Always export a backup first if there's any chance you'll need the data later.

Learn how to delete datasets →


Best practices for dataset management

Regular backups

Download datasets periodically, especially before major changes

Descriptive naming

Use clear, consistent naming conventions across your organization

Quality monitoring

Regularly check dataset insights to maintain data quality

Proactive cleanup

Remove poor quality assets as you discover them

Document changes

Keep notes on major dataset modifications for team reference

Version control

Create dataset versions for significant changes using naming


Dataset management workflow

Follow this recommended workflow for maintaining healthy datasets:

1. Regular monitoring

  • Check insights weekly — Review statistics and distributions
  • Track annotation progress — Monitor annotation completion rates
  • Identify quality issues — Spot problems early

2. Periodic cleanup

  • Remove poor quality data — Delete blurry, corrupted, or off-topic images
  • Eliminate duplicates — Use bulk operations for efficiency
  • Archive old versions — Download and remove superseded datasets

3. Backup strategy

  • Before major changes — Always download before bulk deletions
  • Monthly backups — Regular exports for important datasets
  • Version snapshots — Download before significant modifications

4. Organization maintenance

  • Standardize names — Apply consistent naming conventions
  • Delete test data — Remove experimental datasets after use
  • Document structure — Maintain team documentation on dataset organization

Programmatic dataset management

For advanced users, the Vi SDK provides programmatic access to all dataset management operations:

import vi

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id"
)

# List all datasets
for dataset in client.datasets:
    print(f"Dataset: {dataset.name} (ID: {dataset.dataset_id})")

# Download a dataset
result = client.get_dataset(
    dataset_id="dataset-id",
    save_dir="./backups"
)

# Delete a dataset
client.datasets.delete("dataset-id")

Learn more about Vi SDK →


Troubleshooting

Cannot rename dataset

Potential causes:

  • Insufficient permissions
  • Browser caching issues
  • Active training using the dataset

Solutions:

  • Verify you have edit access to the dataset
  • Refresh the page and try again
  • Check if any training runs are actively using the dataset

Download fails or times out

Potential causes:

  • Large dataset size
  • Network connectivity issues
  • Browser limitations

Solutions:

  • Use Vi SDK for large datasets (more reliable)
  • Split downloads into smaller chunks if possible
  • Check your internet connection stability
  • Try downloading during off-peak hours
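When a browser download keeps timing out, the SDK path can also be made more resilient with a simple retry wrapper. This is our own sketch, not an SDK feature; the commented `client.get_dataset()` call matches the code sample shown earlier in this guide.

```python
import time


def with_retries(fn, attempts: int = 3, delay: float = 2.0):
    """Call fn(), retrying on any exception with a fixed delay between attempts."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts; surface the original error
            time.sleep(delay)


# Hypothetical usage with the Vi SDK download shown earlier:
# with_retries(lambda: client.get_dataset(dataset_id="dataset-id", save_dir="./backups"))
```

Transient network failures then cost a retry rather than a restarted export.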

Deleted wrong dataset

Unfortunately, deletion is permanent:

  • No recovery option through the interface
  • No undo or trash bin functionality
  • Cannot restore from server backups

Your only options:

  • Re-upload from local backup if available
  • Recreate dataset from original source data
  • Contact support for Enterprise plans (may have additional options)
