Download Data

Export your datasets and annotations for backup, local development, or external processing.

Downloading data from Datature Vi enables you to create backups, develop locally, process data externally, or migrate to other platforms. You can export complete datasets with all assets and annotations, or download only the annotation data for lightweight transfers.

This guide helps you choose the right export method and format for your specific needs.

💡

Two ways to download data

  • Full dataset — Assets (images/videos) + annotations in organized folders
  • Annotations only — Just the annotation data in Vi JSONL or TFRecord format

Choose based on whether you need the actual image files or just the annotation metadata.


Export options


Which export method should I use?

Choose the right download method based on your use case:

| Use case | Recommended export | What you get | Format |
|---|---|---|---|
| Backup before deletion | Full dataset | Assets + annotations | Organized folders |
| Local model training | Full dataset | Assets + annotations | Training-ready structure |
| Data migration | Full dataset | Assets + annotations | Portable dataset |
| External processing | Full dataset | Assets + annotations | Complete data |
| Annotation analysis | Annotations only | Label data, no images | Vi JSONL or TFRecord |
| Format conversion | Annotations only | Annotation metadata | Vi JSONL or TFRecord |
| Lightweight backup | Annotations only | Compact annotation data | Vi JSONL or TFRecord |
| SDK integration | Annotations only | Programmatic access | Vi JSONL |

Download full dataset

Download your complete dataset including all assets (images, videos) and their annotations in a training-ready folder structure.

What's included

Assets

  • All original images or videos
  • Preserved file names and formats
  • Organized by train/validation split (optional)
  • Full resolution, no compression

Annotations

  • Complete annotation data
  • Phrase grounding pairs (captions + bounding boxes)
  • VQA pairs (question-answer data)
  • Freeform annotations (coming soon)
  • Metadata and dataset information

Best for

  • Backup and disaster recovery — Complete copy of your dataset
  • Local development — Train models on your own infrastructure
  • Data migration — Move datasets to other platforms or systems
  • External processing — Use custom tools or pipelines
  • Offline access — Work without internet connectivity
  • Version control — Create snapshots of dataset versions

Export formats

Full dataset exports include:

  • Assets — Original image/video files in their native formats (JPEG, PNG, etc.)
  • Annotations — Vi JSONL format with all annotation data
  • Folder structure — Organized directories for training workflows:
    dataset-name/
    ├── assets/
    │   ├── train/
    │   └── validation/
    └── annotations/
        └── annotations.jsonl
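
After extracting an export, it can help to confirm that the layout matches the tree above before pointing a training job at it. The following is a minimal sketch that assumes the export was downloaded with a train/validation split (the split is optional, so those folders may legitimately be absent otherwise). The helper name `missing_entries` is illustrative and not part of the Vi SDK.

```python
from pathlib import Path

# Expected entries, taken from the folder tree shown above.
EXPECTED_DIRS = ("assets/train", "assets/validation")
EXPECTED_FILES = ("annotations/annotations.jsonl",)


def missing_entries(dataset_root):
    """Return the expected relative paths that are absent under dataset_root."""
    root = Path(dataset_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty list means every expected folder and the annotation file were found; for example, `missing_entries("./data/dataset-name")` could run right after a download.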

Learn how to download full datasets →


Download annotations only

Export just the annotation data without asset files for lightweight transfers, analysis, or format conversion.

What's included

  • Annotation metadata — All phrase grounding and VQA annotation data (freeform coming soon)
  • Bounding box coordinates — Spatial information for phrase grounding
  • Text data — Captions, phrases, questions, and answers
  • Asset references — File paths and IDs (without the actual files)
  • Dataset metadata — Information about splits and structure

Best for

  • Lightweight backups — Small file size, fast transfer
  • Annotation analysis — Analyze labels without processing images
  • Format conversion — Convert Vi annotations to other formats
  • Label review — Check annotation quality and distribution
  • Programmatic access — Load and parse annotation data in scripts
  • Sharing labels — Send annotations to team members without large files

Export formats

Choose from two annotation-only formats:

Vi JSONL

Best for: Human-readable format, parsing, conversion

  • One JSON object per line
  • Easy to read and parse
  • Simple format conversion
  • Direct SDK integration
  • Supports all annotation types

TFRecord

Best for: TensorFlow training, performance

  • Binary format for efficiency
  • Optimized for TensorFlow
  • Faster I/O operations
  • Reduced file size
  • Training pipeline integration

Learn how to download annotations →


Comparison: Full dataset vs annotations only

| Feature | Full dataset | Annotations only |
|---|---|---|
| Assets included | ✅ Yes | ❌ No |
| Annotations included | ✅ Yes | ✅ Yes |
| File size | Large (GB) | Small (MB) |
| Download time | Minutes to hours | Seconds to minutes |
| Use for training | ✅ Direct training | ❌ Need to fetch assets |
| Backup completeness | ✅ Complete backup | ⚠️ Annotations only |
| Format conversion | Possible but slower | ✅ Fast and easy |
| Annotation analysis | Possible | ✅ Optimized for this |
| SDK loading | ViDataset() | Parse JSONL directly |

Download methods

You can download data through the web interface or programmatically with the Vi SDK:

Web interface downloads

Best for: One-time exports, small to medium datasets, visual workflows

Features:

  • Point-and-click export
  • Real-time progress tracking
  • Visual format selection
  • Browser-based downloads
  • No coding required

Access:

  1. Navigate to the dataset in Explorer
  2. Click the download button in the dataset menu
  3. Choose an export format and options
  4. Monitor the download progress

Programmatic downloads with Vi SDK

Best for: Large datasets, automated workflows, repeated exports, CI/CD integration

Features:

  • Efficient batch processing
  • Automated retry logic
  • Progress tracking in code
  • Integration with pipelines
  • Scheduled exports

Example:

import vi

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id"
)

# Download full dataset
result = client.datasets.download(
    dataset_id="dataset-id",
    save_dir="./data"
)

# Download annotations only (recommended convenience method)
result = client.annotations.download(
    dataset_id="dataset-id",
    save_dir="./annotations"
)

# Alternative: Download annotations via datasets API
result = client.datasets.download(
    dataset_id="dataset-id",
    save_dir="./annotations",
    annotations_only=True
)
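
For automated or CI/CD exports, hard-coding the secret key in a script is usually undesirable. One possible approach, sketched below, reads the two client arguments from environment variables. The variable names `VI_SECRET_KEY` and `VI_ORGANIZATION_ID` and the helper `client_kwargs_from_env` are assumptions made for this illustration; they are not defined by the Vi SDK.

```python
import os


def client_kwargs_from_env(env=None):
    """Build the keyword arguments for vi.Client from environment variables.

    VI_SECRET_KEY and VI_ORGANIZATION_ID are hypothetical variable names
    chosen for this sketch.
    """
    env = os.environ if env is None else env
    required = ("VI_SECRET_KEY", "VI_ORGANIZATION_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "secret_key": env["VI_SECRET_KEY"],
        "organization_id": env["VI_ORGANIZATION_ID"],
    }
```

With the SDK installed, the result could be passed straight to the constructor shown above, as in `client = vi.Client(**client_kwargs_from_env())`.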

Learn more about Vi SDK →


Export formats explained

Vi JSONL format

JSON Lines format with one annotation record per line:

{"asset_id": "asset-123", "filename": "image001.jpg", "annotations": [...]}
{"asset_id": "asset-124", "filename": "image002.jpg", "annotations": [...]}

Advantages:

  • Human-readable and easy to understand
  • Simple to parse with standard JSON libraries
  • One record per line makes streaming efficient
  • Easy to convert to other formats
  • Works with text editors and command-line tools
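
Because each line is an independent JSON object, a file can be read one record at a time without loading it all into memory. The sketch below uses only the standard library; the helper name `iter_records` is illustrative, and the only fields it relies on are the ones shown in the sample records above.

```python
import json


def iter_records(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as err:
                # Report the offending line so a damaged export is easy to locate.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({err.msg})") from err
```

A typical loop would look like `for record in iter_records("annotations.jsonl"): print(record["filename"])`.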

Use cases:

  • Annotation analysis and statistics
  • Format conversion to COCO, YOLO, etc.
  • Custom processing scripts
  • Quality assurance reviews
  • Label distribution analysis

TFRecord format

TFRecord is TensorFlow's binary record format, optimized for training pipelines.

Advantages:

  • Highly efficient binary format
  • Optimized for TensorFlow data pipelines
  • Faster I/O compared to text formats
  • Supports optional compression (for example GZIP) for large datasets
  • Built-in support in TensorFlow ecosystem

Use cases:

  • TensorFlow model training
  • Performance-critical data loading
  • Large-scale training workflows
  • Production ML pipelines
  • Distributed training systems
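
A quick sanity check on a TFRecord export is to count its records, which can be done without installing TensorFlow. The sketch below relies on the standard TFRecord framing (a little-endian 8-byte length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload). It assumes the file is uncompressed, skips checksum verification, and says nothing about the fields inside each record, since this page does not describe the schema of the export.

```python
import os
import struct


def count_tfrecord_records(path):
    """Count records in an uncompressed TFRecord file without TensorFlow.

    Checksums are skipped rather than verified; only the framing is walked.
    """
    size = os.path.getsize(path)
    count = 0
    offset = 0
    with open(path, "rb") as handle:
        while offset < size:
            handle.seek(offset)
            header = handle.read(12)  # 8-byte length + 4-byte length checksum
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            offset += 12 + length + 4  # header + payload + payload checksum
            if offset > size:
                raise ValueError("truncated record payload")
            count += 1
    return count
```

A `ValueError` here usually points to an interrupted download or a compressed file, so it is a prompt to investigate rather than a definitive diagnosis.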

Common use cases

Backup strategy before major changes

Scenario: You're about to delete assets or make significant changes to your dataset.

Recommended approach:

  1. Download full dataset for complete backup
  2. Store in cloud storage or version control
  3. Verify download completed successfully
  4. Proceed with changes
  5. Keep backup until changes are validated

Why: Complete backups ensure you can restore everything if needed.

Download full dataset guide →

Local model training and development

Scenario: You want to train models on your own infrastructure or develop locally.

Recommended approach:

  1. Download full dataset with train/validation split
  2. Use ViDataset loader from Vi SDK to load data
  3. Integrate with your training framework
  4. Visualize annotations to verify data quality
  5. Train and iterate locally

Why: Having complete data locally enables offline development and custom training pipelines.
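
Before training locally, it may be worth checking that every annotation record points at an image that actually exists on disk. The sketch below assumes the full-dataset layout described earlier (`assets/` and `annotations/annotations.jsonl`) and that each record's `filename` matches the name of a file somewhere under `assets/`. It does not use the ViDataset loader, and `find_missing_assets` is an illustrative name.

```python
import json
from pathlib import Path


def find_missing_assets(dataset_root):
    """Return filenames referenced in annotations.jsonl but absent under assets/."""
    root = Path(dataset_root)
    # Index every file below assets/ by name, whichever split folder it is in.
    on_disk = {p.name for p in (root / "assets").rglob("*") if p.is_file()}
    missing = []
    with open(root / "annotations" / "annotations.jsonl", encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            filename = json.loads(line)["filename"]
            if Path(filename).name not in on_disk:
                missing.append(filename)
    return missing
```

An empty result suggests the annotations and assets are in step; any names returned are candidates for a re-download.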

Vi SDK dataset loading →

Annotation format conversion

Scenario: You need to convert Vi annotations to COCO, YOLO, or other formats.

Recommended approach:

  1. Download annotations only in Vi JSONL format
  2. Parse the JSONL file with Python's built-in json module
  3. Write a conversion script for the target format
  4. Validate converted annotations
  5. Use with external training tools

Why: Lightweight annotation files are faster to download and easier to process for conversion.
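
As a rough illustration of such a conversion script, the sketch below turns already-parsed records into a COCO-style dictionary. This page does not document the structure of the items inside `annotations`, so the code assumes a hypothetical shape, `{"phrase": "dog", "bbox": [x, y, width, height]}` in pixels; the real export may use different keys or coordinate conventions and should be inspected first. Image width and height, which COCO tools often expect, are omitted because they are not in the sample records.

```python
def records_to_coco(records):
    """Convert parsed Vi JSONL records into a COCO-style dictionary.

    ASSUMPTION: each annotation looks like
    {"phrase": "dog", "bbox": [x, y, width, height]} with pixel units.
    """
    coco = {"images": [], "annotations": [], "categories": []}
    category_ids = {}  # phrase -> COCO category id
    for image_id, record in enumerate(records, start=1):
        coco["images"].append({"id": image_id, "file_name": record["filename"]})
        for ann in record.get("annotations", []):
            phrase = ann["phrase"]
            if phrase not in category_ids:
                category_ids[phrase] = len(category_ids) + 1
                coco["categories"].append({"id": category_ids[phrase], "name": phrase})
            x, y, width, height = ann["bbox"]
            coco["annotations"].append({
                "id": len(coco["annotations"]) + 1,
                "image_id": image_id,
                "category_id": category_ids[phrase],
                "bbox": [x, y, width, height],
                "area": width * height,
                "iscrowd": 0,
            })
    return coco
```

The result can be written out with `json.dump(coco, handle)`. Treating each distinct phrase as a category is a simplification that suits closed label sets better than free-text captions.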

Download annotations guide →

Data migration to another platform

Scenario: You're moving your dataset to another platform or storage system.

Recommended approach:

  1. Download full dataset to get all data
  2. Review folder structure and organization
  3. Convert annotations to target platform's format if needed
  4. Upload to new platform
  5. Verify data integrity after migration

Why: Full dataset export provides everything needed for complete migration.

Download full dataset guide →

Annotation quality analysis

Scenario: You want to analyze annotation distribution, check for class imbalance, or review label quality.

Recommended approach:

  1. Download annotations only in Vi JSONL format
  2. Write analysis scripts to parse JSONL
  3. Generate statistics (class counts, distributions)
  4. Identify potential issues or gaps
  5. Use insights to improve dataset

Why: Annotation-only downloads are lightweight and perfect for metadata analysis.
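
A first pass at these statistics can be made from the record-level fields alone. The sketch below counts annotations per asset and flags assets with none. It deliberately avoids per-class counts, because those depend on the inner annotation schema, which is not shown on this page. `annotation_stats` is an illustrative helper name.

```python
from collections import Counter


def annotation_stats(records):
    """Summarize how many annotations each asset carries."""
    counts = {r["asset_id"]: len(r.get("annotations", [])) for r in records}
    total = sum(counts.values())
    return {
        "assets": len(counts),
        "annotations": total,
        "mean_per_asset": total / len(counts) if counts else 0.0,
        # Assets with no annotations are often the first gap worth reviewing.
        "unannotated": sorted(a for a, n in counts.items() if n == 0),
        # Maps "annotations per asset" to "number of assets with that count".
        "histogram": dict(Counter(counts.values())),
    }
```

The `records` argument is any iterable of dicts parsed from the JSONL file, one per line.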

Download annotations guide →


Best practices

Download before deletion

Always export a backup before deleting datasets or performing bulk asset cleanup

Verify downloads

Check that downloaded files are complete and uncorrupted before relying on them
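
One way to make later corruption detectable is to record a checksum for every file right after a download and compare it again before relying on the copy. The sketch below is a generic standard-library approach, not a Vi SDK feature. It cannot tell whether the original download was complete; it only detects files that changed, vanished, or appeared since the manifest was built.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks so large assets do not need to fit in memory.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def changed_files(old, new):
    """Return paths that were added, removed, or whose digest differs."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

Saving the manifest as JSON next to the backup keeps the check reproducible on another machine.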

Use SDK for large datasets

For datasets over 1,000 assets, use Vi SDK for more reliable downloads

Organize exports

Use consistent naming and folder structures for downloaded datasets
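
A consistent naming scheme can be as simple as one folder per dataset with a UTC timestamp per export. The sketch below shows one possible convention; the layout, the `kind` labels, and the helper name `export_dir` are choices made for this example rather than anything the platform prescribes.

```python
from datetime import datetime, timezone
from pathlib import Path


def export_dir(base_dir, dataset_id, kind, when=None):
    """Return base_dir/dataset_id/<UTC timestamp>_<kind> as a Path."""
    if kind not in ("full", "annotations"):
        raise ValueError("kind must be 'full' or 'annotations'")
    when = when or datetime.now(timezone.utc)
    # Normalize to UTC so folder names sort chronologically across machines.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(base_dir) / dataset_id / f"{stamp}_{kind}"
```

The returned path can be used as the `save_dir` argument of an SDK download, so repeated or scheduled exports never overwrite each other.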

Store in version control

Track dataset versions by exporting snapshots after significant changes

Schedule periodic backups

Automate regular exports with Vi SDK for important production datasets


Download size and time estimates

Plan your downloads by understanding approximate sizes and times:

Full dataset downloads

| Dataset size | Number of assets | Typical download time | Storage needed |
|---|---|---|---|
| Small | 10-100 images | 30 seconds - 2 minutes | 50 MB - 500 MB |
| Medium | 100-1,000 images | 2-15 minutes | 500 MB - 5 GB |
| Large | 1,000-10,000 images | 15 minutes - 2 hours | 5 GB - 50 GB |
| Very large | 10,000+ images | 2+ hours | 50 GB+ |

Annotation-only downloads

| Dataset size | Number of annotations | Typical download time | Storage needed |
|---|---|---|---|
| Small | 10-100 annotations | < 5 seconds | < 1 MB |
| Medium | 100-1,000 annotations | 5-30 seconds | 1-10 MB |
| Large | 1,000-10,000 annotations | 30 seconds - 2 minutes | 10-100 MB |
| Very large | 10,000+ annotations | 2-5 minutes | 100 MB - 1 GB |

💡

Tip: Download times vary based on your internet connection speed, server load, and asset file sizes. Use Vi SDK for more reliable downloads of large datasets with automatic retry logic.
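
For a rough estimate tailored to a specific connection, the ideal transfer time is the size in bits divided by the bandwidth in bits per second. The sketch below applies that arithmetic and adds a free-space check. Both helpers are illustrative, the 20% headroom is an arbitrary default, and real downloads will be slower than the ideal figure because of server-side processing and protocol overhead.

```python
import shutil


def estimated_seconds(size_bytes, bandwidth_mbps):
    """Ideal transfer time: bits to move divided by bits per second."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    return size_bytes * 8 / (bandwidth_mbps * 1_000_000)


def has_room(save_dir, size_bytes, headroom=1.2):
    """True if the drive holding save_dir has size_bytes * headroom free."""
    return shutil.disk_usage(save_dir).free >= size_bytes * headroom
```

For example, a 5 GB export (5,000,000,000 bytes) over a 100 Mbit/s link gives `estimated_seconds(5_000_000_000, 100)`, which is 400 seconds, or a little under 7 minutes.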


Troubleshooting

Download fails or times out

Potential causes:

  • Large dataset size
  • Network connectivity issues
  • Browser limitations or restrictions
  • Server-side processing delays

Solutions:

  • Use Vi SDK for large datasets (more reliable with retry logic)
  • Split into smaller downloads if possible
  • Check internet connection stability
  • Try during off-peak hours for faster server response
  • Ensure sufficient local storage space
  • Disable browser extensions that might interfere

Downloaded files are incomplete or corrupted

Potential causes:

  • Download interrupted before completion
  • Disk space exhausted during download
  • File system limitations
  • Network errors during transfer

Solutions:

  • Verify download completed (check progress reached 100%)
  • Check available disk space before downloading
  • Re-download the dataset from scratch
  • Use Vi SDK, which includes file integrity checks
  • Try downloading to a different directory

Cannot find downloaded files

Potential causes:

  • Browser default download location
  • Custom download directory settings
  • File system permissions

Solutions:

  • Check browser's default download folder
  • Search for dataset name in your file system
  • Review browser download history for location
  • For SDK downloads, check the save_dir parameter path

Annotation format not compatible with my tools

Potential causes:

  • Tool expects different annotation format
  • Need conversion to COCO, YOLO, etc.

Solutions:

  • Download annotations in Vi JSONL format
  • Write or use conversion scripts to target format
  • Check if Vi SDK has utilities for your target format
  • Contact support for conversion assistance
  • See community tools and scripts for common conversions
