Download Data

Downloading data from Datature Vi enables you to create backups, develop locally, process data externally, or migrate to other platforms. You can export complete datasets with all assets and annotations, or download only the annotation data for lightweight transfers.

This guide helps you choose the right export method and format for your specific needs.

💡
Two ways to download data

Full dataset — Assets (images/videos) + annotations in organized folders

Annotations only — Just the annotation data in Vi JSONL or TFRecord format

Choose based on whether you need the actual image files or just the annotation metadata.

Export options

Download full dataset

Export complete dataset including all assets and annotations for backup, local training, or migration

Download annotations

Export only annotation data (Vi JSONL or TFRecord) without asset files for lightweight transfers

Which export method should I use?

Choose the right download method based on your use case:

Use case	Recommended export	What you get	Format
Backup before deletion	Full dataset	Assets + annotations	Organized folders
Local model training	Full dataset	Assets + annotations	Training-ready structure
Data migration	Full dataset	Assets + annotations	Portable dataset
External processing	Full dataset	Assets + annotations	Complete data
Annotation analysis	Annotations only	Label data, no images	Vi JSONL or TFRecord
Format conversion	Annotations only	Annotation metadata	Vi JSONL or TFRecord
Lightweight backup	Annotations only	Compact annotation data	Vi JSONL or TFRecord
SDK integration	Annotations only	Programmatic access	Vi JSONL

Download full dataset

Download your complete dataset including all assets (images, videos) and their annotations in a training-ready folder structure.

What's included

Assets

All original images or videos
Preserved file names and formats
Organized by train/validation split (optional)
Full resolution, no compression

Annotations

Complete annotation data
Phrase grounding pairs (captions + bounding boxes)
VQA pairs (question-answer data)
Freeform annotations (coming soon)
Metadata and dataset information

Best for

Backup and disaster recovery — Complete copy of your dataset
Local development — Train models on your own infrastructure
Data migration — Move datasets to other platforms or systems
External processing — Use custom tools or pipelines
Offline access — Work without internet connectivity
Version control — Create snapshots of dataset versions

Export formats

Full dataset exports include:

Assets — Original image/video files in their native formats (JPEG, PNG, etc.)
Annotations — Vi JSONL format with all annotation data

Folder structure — Organized directories for training workflows:

dataset-name/
├── assets/
│   ├── train/
│   └── validation/
└── annotations/
    └── annotations.jsonl
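
After extracting an export, a quick sanity check can confirm that it matches the layout above. The following is a minimal sketch using only the Python standard library; the function name `summarize_export` is hypothetical, and it assumes the export uses exactly the folder names shown (`assets/<split>/` and `annotations/annotations.jsonl`).

```python
from pathlib import Path


def summarize_export(root: str) -> dict:
    """Count asset files per split and check that the annotation file exists.

    Assumes the layout shown above: <root>/assets/<split>/ and
    <root>/annotations/annotations.jsonl.
    """
    root_path = Path(root)
    assets_dir = root_path / "assets"
    counts = {}
    if assets_dir.is_dir():
        for split_dir in sorted(p for p in assets_dir.iterdir() if p.is_dir()):
            # Count regular files only; nested folders are ignored in this sketch.
            counts[split_dir.name] = sum(1 for f in split_dir.iterdir() if f.is_file())
    annotations_file = root_path / "annotations" / "annotations.jsonl"
    return {
        "assets_per_split": counts,
        "has_annotations": annotations_file.is_file(),
    }
```

Calling `summarize_export("./data/dataset-name")` returns the per-split file counts, which can be compared against the asset counts shown in the web interface.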

Learn how to download full datasets →

Download annotations only

Export just the annotation data without asset files for lightweight transfers, analysis, or format conversion.

What's included

Annotation metadata — All phrase grounding and VQA annotation data (freeform coming soon)
Bounding box coordinates — Spatial information for phrase grounding
Text data — Captions, phrases, questions, and answers
Asset references — File paths and IDs (without the actual files)
Dataset metadata — Information about splits and structure

Best for

Lightweight backups — Small file size, fast transfer
Annotation analysis — Analyze labels without processing images
Format conversion — Convert Vi annotations to other formats
Label review — Check annotation quality and distribution
Programmatic access — Load and parse annotation data in scripts
Sharing labels — Send annotations to team members without large files

Export formats

Choose from two annotation-only formats:

Vi JSONL

Best for: Human-readable format, parsing, conversion

One JSON object per line
Easy to read and parse
Simple format conversion
Direct SDK integration
Supports all annotation types

TFRecord

Best for: TensorFlow training, performance

Binary format for efficiency
Optimized for TensorFlow
Faster I/O operations
Reduced file size
Training pipeline integration

Learn how to download annotations →

Comparison: Full dataset vs annotations only

Feature	Full dataset	Annotations only
Assets included	✅ Yes	❌ No
Annotations included	✅ Yes	✅ Yes
File size	Large (GB)	Small (MB)
Download time	Minutes to hours	Seconds to minutes
Use for training	✅ Direct training	❌ Need to fetch assets
Backup completeness	✅ Complete backup	⚠️ Annotations only
Format conversion	Possible but slower	✅ Fast and easy
Annotation analysis	Possible	✅ Optimized for this
SDK loading	`ViDataset()`	Parse JSONL directly

Download methods

You can download data through the web interface or programmatically with the Vi SDK:

Web interface downloads

Best for: One-time exports, small to medium datasets, visual workflows

Features:

Point-and-click export
Real-time progress tracking
Visual format selection
Browser-based downloads
No coding required

Access:

Navigate to the dataset in Explorer
Click the download button in the dataset menu
Choose the export format and options
Monitor the download progress

Programmatic downloads with Vi SDK

Best for: Large datasets, automated workflows, repeated exports, CI/CD integration

Features:

Efficient batch processing
Automated retry logic
Progress tracking in code
Integration with pipelines
Scheduled exports

Example:

import vi

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-org-id"
)

# Download full dataset
result = client.datasets.download(
    dataset_id="dataset-id",
    save_dir="./data"
)

# Download annotations only (recommended convenience method)
result = client.annotations.download(
    dataset_id="dataset-id",
    save_dir="./annotations"
)

# Alternative: Download annotations via datasets API
result = client.datasets.download(
    dataset_id="dataset-id",
    save_dir="./annotations",
    annotations_only=True
)

Learn more about Vi SDK →

Export formats explained

Vi JSONL format

JSON Lines format with one annotation record per line:

{"asset_id": "asset-123", "filename": "image001.jpg", "annotations": [...]}
{"asset_id": "asset-124", "filename": "image002.jpg", "annotations": [...]}

Advantages:

Human-readable and easy to understand
Simple to parse with standard JSON libraries
One record per line makes streaming efficient
Easy to convert to other formats
Works with text editors and command-line tools

Use cases:

Annotation analysis and statistics
Format conversion to COCO, YOLO, etc.
Custom processing scripts
Quality assurance reviews
Label distribution analysis
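
Because every line is an independent JSON object, the file can be read one record at a time without loading it all into memory. The sketch below uses only Python's standard `json` module; the helper name `iter_records` is hypothetical, and the only field names it relies on in the usage note are the ones shown in the sample records above.

```python
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the offending line so a corrupted export is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
```

For example, `for record in iter_records("annotations.jsonl"): print(record["filename"], len(record["annotations"]))` prints each file name with its annotation count.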

TFRecord format

TFRecord is TensorFlow's binary record format, optimized for training pipelines.

Advantages:

Highly efficient binary format
Optimized for TensorFlow data pipelines
Faster I/O compared to text formats
Supports optional compression (GZIP or ZLIB) for large datasets
Built-in support in TensorFlow ecosystem

Use cases:

TensorFlow model training
Performance-critical data loading
Large-scale training workflows
Production ML pipelines
Distributed training systems

Common use cases

Backup strategy before major changes

Scenario: You're about to delete assets or make significant changes to your dataset.

Recommended approach:

Download full dataset for complete backup
Store in cloud storage or version control
Verify download completed successfully
Proceed with changes
Keep backup until changes are validated

Why: Complete backups ensure you can restore everything if needed.
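
One way to make the "verify" step concrete is to record a checksum for every file in the backup and compare the manifest again after copying it to long-term storage. This is a minimal sketch with the standard library only; `build_manifest` and `changed_files` are hypothetical helper names and are not part of the Vi SDK.

```python
import hashlib
from pathlib import Path


def build_manifest(root: str) -> dict:
    """Map each file's relative POSIX path to its SHA-256 digest."""
    root_path = Path(root)
    manifest = {}
    for file_path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with open(file_path, "rb") as handle:
            # Read in 1 MiB chunks so large videos do not need to fit in memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[file_path.relative_to(root_path).as_posix()] = digest.hexdigest()
    return manifest


def changed_files(before: dict, after: dict) -> list:
    """Return paths that are missing from one manifest or differ between the two."""
    return sorted(
        path for path in set(before) | set(after)
        if before.get(path) != after.get(path)
    )
```

An empty list from `changed_files(build_manifest(local_dir), build_manifest(copied_dir))` indicates that the stored copy matches the original download byte for byte.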

Download full dataset guide →

Local model training and development

Scenario: You want to train models on your own infrastructure or develop locally.

Recommended approach:

Download the full dataset with the train/validation split
Use the ViDataset loader from the Vi SDK to load the data
Integrate with your training framework
Visualize annotations to verify data quality
Train and iterate locally

Why: Having complete data locally enables offline development and custom training pipelines.

Vi SDK dataset loading →

Annotation format conversion

Scenario: You need to convert Vi annotations to COCO, YOLO, or other formats.

Recommended approach:

Download annotations only in Vi JSONL format
Parse the JSONL file with Python's built-in json module
Write conversion script to target format
Validate converted annotations
Use with external training tools

Why: Lightweight annotation files are faster to download and easier to process for conversion.
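
A conversion script for the steps above might look like the following sketch, which builds a COCO-style dictionary from parsed records. Only `filename` and `annotations` come from the sample records in this guide. The inner annotation keys `phrase` and `bbox` (as `[x_min, y_min, x_max, y_max]` in pixels) are assumptions made for illustration, so check them against a real export before relying on this. COCO image entries normally also carry `width` and `height`, which are omitted here because the sample records do not show them.

```python
def records_to_coco(records):
    """Build a COCO-style dict from parsed Vi JSONL records.

    ASSUMPTION: each annotation looks like
    {"phrase": "dog", "bbox": [x_min, y_min, x_max, y_max]} in pixels.
    These inner keys are hypothetical; adjust them to the real export.
    """
    images, annotations, categories = [], [], {}
    for image_id, record in enumerate(records, start=1):
        images.append({"id": image_id, "file_name": record["filename"]})
        for ann in record.get("annotations", []):
            x_min, y_min, x_max, y_max = ann["bbox"]
            # Reuse the category id if this phrase was seen before.
            category_id = categories.setdefault(ann["phrase"], len(categories) + 1)
            width, height = x_max - x_min, y_max - y_min
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_id,
                "bbox": [x_min, y_min, width, height],  # COCO uses x, y, width, height
                "area": width * height,
                "iscrowd": 0,
            })
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": cid, "name": name} for name, cid in categories.items()],
    }
```

The result can be written out with `json.dump` and then checked with the target tool's own validator.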

Download annotations guide →

Data migration to another platform

Scenario: You're moving your dataset to another platform or storage system.

Recommended approach:

Download full dataset to get all data
Review folder structure and organization
Convert annotations to target platform's format if needed
Upload to new platform
Verify data integrity after migration

Why: Full dataset export provides everything needed for complete migration.

Download full dataset guide →

Annotation quality analysis

Scenario: You want to analyze annotation distribution, check for class imbalance, or review label quality.

Recommended approach:

Download annotations only in Vi JSONL format
Write analysis scripts to parse JSONL
Generate statistics (class counts, distributions)
Identify potential issues or gaps
Use insights to improve dataset

Why: Annotation-only downloads are lightweight and perfect for metadata analysis.
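
As a starting point for such an analysis, the sketch below summarizes how annotations are spread across assets. It takes a list of parsed records (one dictionary per JSONL line) and uses only the `annotations` field shown in the sample records; the function name `annotation_stats` is hypothetical. Per-class counts would need the inner annotation schema, which this guide does not specify.

```python
from collections import Counter


def annotation_stats(records):
    """Summarize how annotations are distributed across assets."""
    per_asset = [len(record.get("annotations", [])) for record in records]
    total_assets = len(per_asset)
    total_annotations = sum(per_asset)
    return {
        "assets": total_assets,
        "annotations": total_annotations,
        # Assets with no annotations are often the first gap worth reviewing.
        "unannotated_assets": sum(1 for n in per_asset if n == 0),
        "mean_per_asset": total_annotations / total_assets if total_assets else 0.0,
        # Maps "annotations per asset" to the number of assets with that count.
        "histogram": dict(Counter(per_asset)),
    }
```

A high `unannotated_assets` count or a heavily skewed `histogram` is a hint that parts of the dataset need more labeling attention.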

Download annotations guide →

Best practices

Download before deletion

Always export a backup before deleting datasets or performing bulk asset cleanup

Verify downloads

Check that downloaded files are complete and uncorrupted before relying on them

Use SDK for large datasets

For datasets over 1,000 assets, use Vi SDK for more reliable downloads

Organize exports

Use consistent naming and folder structures for downloaded datasets

Store in version control

Track dataset versions by exporting snapshots after significant changes

Schedule periodic backups

Automate regular exports with Vi SDK for important production datasets
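
For consistent naming across scheduled exports, one option is a timestamped directory per snapshot. The helper below is a hypothetical sketch, not part of the Vi SDK; it only builds the path, and the naming scheme is an illustrative choice.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def snapshot_dir(base_dir: str, dataset_id: str, now: Optional[datetime] = None) -> Path:
    """Return a path like <base_dir>/<dataset_id>/20240131T093000Z."""
    now = now or datetime.now(timezone.utc)
    # Normalize to UTC so snapshots sort correctly regardless of the machine's time zone.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return Path(base_dir) / dataset_id / stamp
```

The returned path can be passed as `save_dir` to the SDK download calls shown earlier, so that each scheduled run writes to its own folder and older snapshots are never overwritten.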

Download size and time estimates

Plan your downloads by understanding approximate sizes and times:

Full dataset downloads

Dataset size	Number of assets	Typical download time	Storage needed
Small	10-100 images	30 seconds - 2 minutes	50 MB - 500 MB
Medium	100-1,000 images	2-15 minutes	500 MB - 5 GB
Large	1,000-10,000 images	15 minutes - 2 hours	5 GB - 50 GB
Very large	10,000+ images	2+ hours	50 GB+

Annotation-only downloads

Dataset size	Number of annotations	Typical download time	Storage needed
Small	10-100 annotations	< 5 seconds	< 1 MB
Medium	100-1,000 annotations	5-30 seconds	1-10 MB
Large	1,000-10,000 annotations	30 seconds - 2 minutes	10-100 MB
Very large	10,000+ annotations	2-5 minutes	100 MB - 1 GB

💡
Tip: Download times vary based on your internet connection speed, server load, and asset file sizes. Use Vi SDK for more reliable downloads of large datasets with automatic retry logic.
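
For a rough estimate tailored to your own connection, transfer time is approximately the size in bytes divided by the effective throughput in bytes per second. The sketch below encodes that arithmetic; the function name and the default `efficiency` of 0.8 (a guess at protocol overhead) are assumptions, and server-side processing time is not modeled.

```python
def estimate_download_seconds(
    size_bytes: int, bandwidth_mbps: float, efficiency: float = 0.8
) -> float:
    """Estimate transfer time for size_bytes over a link of bandwidth_mbps (megabits/s)."""
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth_mbps must be > 0 and efficiency in (0, 1]")
    # Megabits per second -> bytes per second, scaled by the assumed efficiency.
    effective_bytes_per_second = bandwidth_mbps * 1_000_000 / 8 * efficiency
    return size_bytes / effective_bytes_per_second
```

For example, a 5 GB export over a 100 Mbps connection comes to about 500 seconds (roughly 8 minutes) under these assumptions, which falls within the range given for medium datasets in the table above.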

Troubleshooting

Download fails or times out

Potential causes:

Large dataset size
Network connectivity issues
Browser limitations or restrictions
Server-side processing delays

Solutions:

Use Vi SDK for large datasets (more reliable with retry logic)
Split into smaller downloads if possible
Check internet connection stability
Try during off-peak hours for faster server response
Ensure sufficient local storage space
Disable browser extensions that might interfere
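
If an export step in your own pipeline still fails intermittently, a generic retry wrapper with exponential backoff can help. This guide describes the Vi SDK as having its own retry logic, so the sketch below is not a description of the SDK internals; `with_retries` is a hypothetical helper that wraps any callable, such as a function that performs the download.

```python
import time


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** (attempt - 1))
```

In practice it is usually better to catch only the network-related exception types your download code raises, rather than every `Exception` as this sketch does.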

Downloaded files are incomplete or corrupted

Potential causes:

Download interrupted before completion
Disk space exhausted during download
File system limitations
Network errors during transfer

Solutions:

Verify download completed (check progress reached 100%)
Check available disk space before downloading
Re-download the dataset from scratch
Use the Vi SDK, which includes file integrity checks
Try downloading to a different directory
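
The disk space check can be automated before a download starts. This is a minimal sketch using the standard library's `shutil.disk_usage`; the function name `has_free_space` and the 20% safety margin are assumptions, and the expected size has to come from your own estimate.

```python
import shutil


def has_free_space(path: str, required_bytes: int, safety_factor: float = 1.2) -> bool:
    """Return True if the filesystem holding path has room for the download.

    path must already exist; safety_factor leaves headroom for temporary files.
    """
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes * safety_factor
```

For example, `has_free_space("./data", 5 * 10**9)` checks for about 6 GB of free space before a 5 GB export.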

Cannot find downloaded files

Potential causes:

Browser default download location
Custom download directory settings
File system permissions

Solutions:

Check browser's default download folder
Search for dataset name in your file system
Review browser download history for location
For SDK downloads, check the save_dir parameter path

Annotation format not compatible with my tools

Potential causes:

Tool expects different annotation format
Need conversion to COCO, YOLO, etc.

Solutions:

Download annotations in Vi JSONL format
Write or use conversion scripts to target format
Check if Vi SDK has utilities for your target format
Contact support for conversion assistance
See community tools and scripts for common conversions

Next steps

Download full dataset

Export complete dataset with assets and annotations

Download annotations

Export annotation data only in Vi JSONL or TFRecord

Use Vi SDK

Programmatic downloads and automation

Load datasets

Load and iterate through downloaded datasets

Manage datasets

Return to dataset management overview

Upload data

Upload new data or re-upload exported datasets

Related resources

Manage datasets — Overview of all dataset management operations
Rename a dataset — Update dataset names
Delete a dataset — Permanently remove datasets
Vi SDK Getting Started — Programmatic data access
Vi SDK Datasets API — Complete dataset API reference
