Download Data
Export your datasets and annotations for backup, local development, or external processing.
Downloading data from Datature Vi enables you to create backups, develop locally, process data externally, or migrate to other platforms. You can export complete datasets with all assets and annotations, or download only the annotation data for lightweight transfers.
This guide helps you choose the right export method and format for your specific needs.
Two ways to download data
- Full dataset — Assets (images/videos) + annotations in organized folders
- Annotations only — Just the annotation data in Vi JSONL or TFRecord format
Choose based on whether you need the actual image files or just the annotation metadata.
Export options
- Full dataset: Export the complete dataset, including all assets and annotations, for backup, local training, or migration
- Annotations only: Export only the annotation data (Vi JSONL or TFRecord), without asset files, for lightweight transfers
Which export method should I use?
Choose the right download method based on your use case:
| Use case | Recommended export | What you get | Format |
|---|---|---|---|
| Backup before deletion | Full dataset | Assets + annotations | Organized folders |
| Local model training | Full dataset | Assets + annotations | Training-ready structure |
| Data migration | Full dataset | Assets + annotations | Portable dataset |
| External processing | Full dataset | Assets + annotations | Complete data |
| Annotation analysis | Annotations only | Label data, no images | Vi JSONL or TFRecord |
| Format conversion | Annotations only | Annotation metadata | Vi JSONL or TFRecord |
| Lightweight backup | Annotations only | Compact annotation data | Vi JSONL or TFRecord |
| SDK integration | Annotations only | Programmatic access | Vi JSONL |
Download full dataset
Download your complete dataset including all assets (images, videos) and their annotations in a training-ready folder structure.
What's included
Best for
- Backup and disaster recovery — Complete copy of your dataset
- Local development — Train models on your own infrastructure
- Data migration — Move datasets to other platforms or systems
- External processing — Use custom tools or pipelines
- Offline access — Work without internet connectivity
- Version control — Create snapshots of dataset versions
Export formats
Full dataset exports include:
- Assets — Original image/video files in their native formats (JPEG, PNG, etc.)
- Annotations — Vi JSONL format with all annotation data
- Folder structure — Organized directories for training workflows:
      dataset-name/
      ├── assets/
      │   ├── train/
      │   └── validation/
      └── annotations/
          └── annotations.jsonl
Learn how to download full datasets →
Download annotations only
Export just the annotation data without asset files for lightweight transfers, analysis, or format conversion.
What's included
- Annotation metadata — All phrase grounding and VQA annotation data (freeform coming soon)
- Bounding box coordinates — Spatial information for phrase grounding
- Text data — Captions, phrases, questions, and answers
- Asset references — File paths and IDs (without the actual files)
- Dataset metadata — Information about splits and structure
Best for
- Lightweight backups — Small file size, fast transfer
- Annotation analysis — Analyze labels without processing images
- Format conversion — Convert Vi annotations to other formats
- Label review — Check annotation quality and distribution
- Programmatic access — Load and parse annotation data in scripts
- Sharing labels — Send annotations to team members without large files
Export formats
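Choose from two annotation-only formats:
- Vi JSONL: human-readable JSON Lines with one annotation record per line
- TFRecord: TensorFlow's binary format, optimized for training pipelines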
Learn how to download annotations →
Comparison: Full dataset vs annotations only
| Feature | Full dataset | Annotations only |
|---|---|---|
| Assets included | ✅ Yes | ❌ No |
| Annotations included | ✅ Yes | ✅ Yes |
| File size | Large (GB) | Small (MB) |
| Download time | Minutes to hours | Seconds to minutes |
| Use for training | ✅ Direct training | ❌ Need to fetch assets |
| Backup completeness | ✅ Complete backup | ⚠️ Annotations only |
| Format conversion | Possible but slower | ✅ Fast and easy |
| Annotation analysis | Possible | ✅ Optimized for this |
| SDK loading | ViDataset() | Parse JSONL directly |
Download methods
You can download data through the web interface or programmatically with the Vi SDK:
Web interface downloads
Best for: One-time exports, small to medium datasets, visual workflows
Features:
- Point-and-click export
- Real-time progress tracking
- Visual format selection
- Browser-based downloads
- No coding required
Access:
- Navigate to dataset in Explorer
- Click download button in dataset menu
- Choose export format and options
- Monitor download progress
Programmatic downloads with Vi SDK
Best for: Large datasets, automated workflows, repeated exports, CI/CD integration
Features:
- Efficient batch processing
- Automated retry logic
- Progress tracking in code
- Integration with pipelines
- Scheduled exports
Example:

    import vi

    client = vi.Client(
        secret_key="your-secret-key",
        organization_id="your-org-id"
    )

    # Download full dataset
    result = client.datasets.download(
        dataset_id="dataset-id",
        save_dir="./data"
    )

    # Download annotations only (recommended convenience method)
    result = client.annotations.download(
        dataset_id="dataset-id",
        save_dir="./annotations"
    )

    # Alternative: Download annotations via datasets API
    result = client.datasets.download(
        dataset_id="dataset-id",
        save_dir="./annotations",
        annotations_only=True
    )

Export formats explained
Vi JSONL format
JSON Lines format with one annotation record per line:

    {"asset_id": "asset-123", "filename": "image001.jpg", "annotations": [...]}
    {"asset_id": "asset-124", "filename": "image002.jpg", "annotations": [...]}

Advantages:
- Human-readable and easy to understand
- Simple to parse with standard JSON libraries
- One record per line makes streaming efficient
- Easy to convert to other formats
- Works with text editors and command-line tools
Use cases:
- Annotation analysis and statistics
- Format conversion to COCO, YOLO, etc.
- Custom processing scripts
- Quality assurance reviews
- Label distribution analysis
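As a minimal sketch of the "one record per line" idea, the snippet below streams a JSONL source with Python's standard `json` module. It relies only on the three top-level fields shown in the sample above (`asset_id`, `filename`, `annotations`); the helper name `iter_records` is hypothetical, and the inner annotation objects are empty placeholders because their schema is not shown here.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(lines: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSONL stream."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc


# Inline sample standing in for annotations.jsonl. The inner annotation
# objects are empty placeholders, not the real Vi annotation schema.
sample = io.StringIO(
    '{"asset_id": "asset-123", "filename": "image001.jpg", "annotations": [{}, {}]}\n'
    '{"asset_id": "asset-124", "filename": "image002.jpg", "annotations": []}\n'
)

records = list(iter_records(sample))
for record in records:
    print(record["filename"], len(record["annotations"]))
```

For a real export, pass an open file handle instead of the `StringIO` sample. Because the generator reads one line at a time, memory use stays flat even for large annotation files.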
TFRecord format
TensorFlow's binary record format, optimized for training.
Advantages:
- Highly efficient binary format
- Optimized for TensorFlow data pipelines
- Faster I/O compared to text formats
- Compressed storage for large datasets
- Built-in support in TensorFlow ecosystem
Use cases:
- TensorFlow model training
- Performance-critical data loading
- Large-scale training workflows
- Production ML pipelines
- Distributed training systems
Common use cases
Backup strategy before major changes
Scenario: You're about to delete assets or make significant changes to your dataset.
Recommended approach:
- Download full dataset for complete backup
- Store in cloud storage or version control
- Verify download completed successfully
- Proceed with changes
- Keep backup until changes are validated
Why: Complete backups ensure you can restore everything if needed.
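One way to approach the "verify download completed successfully" step is to record a checksum manifest right after the export and compare against it later. The sketch below uses only the Python standard library; the function names are hypothetical and nothing here is part of the Vi SDK.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large assets never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map every file under root (by relative path) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_paths(before: dict, after: dict) -> list:
    """List paths that were added, removed, or modified between two manifests."""
    return sorted(p for p in before.keys() | after.keys() if before.get(p) != after.get(p))


# Demo on a throwaway directory that mimics a small export.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "annotations").mkdir()
    target = root / "annotations" / "annotations.jsonl"
    target.write_text('{"asset_id": "asset-123"}\n', encoding="utf-8")
    baseline = build_manifest(root)

    target.write_text("", encoding="utf-8")  # simulate a truncated file
    damaged = changed_paths(baseline, build_manifest(root))

print(damaged)
```

Storing the baseline manifest next to the backup (for example as a JSON file) makes it possible to re-check the copy before relying on it for a restore.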
Local model training and development
Scenario: You want to train models on your own infrastructure or develop locally.
Recommended approach:
- Download full dataset with its train/validation split
- Use the ViDataset loader from the Vi SDK to load data
- Integrate with your training framework
- Visualize annotations to verify data quality
- Train and iterate locally
Why: Having complete data locally enables offline development and custom training pipelines.
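Before training, it can help to confirm that every annotated file is actually present on disk. The sketch below assumes the folder layout described under "Download full dataset" (`assets/` with split subfolders, plus `annotations/annotations.jsonl`) and further assumes that each record's `filename` matches the name of a file somewhere under `assets/`. Both assumptions should be checked against a real export; `missing_assets` is a hypothetical helper, not an SDK function.

```python
import json
import tempfile
from pathlib import Path


def missing_assets(dataset_dir: Path) -> list:
    """Return filenames referenced in annotations.jsonl with no file under assets/."""
    # Assumption: "filename" equals the basename of a file in some split folder.
    on_disk = {p.name for p in (dataset_dir / "assets").rglob("*") if p.is_file()}
    missing = []
    jsonl = dataset_dir / "annotations" / "annotations.jsonl"
    with jsonl.open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            filename = json.loads(line)["filename"]
            if filename not in on_disk:
                missing.append(filename)
    return missing


# Build a tiny fake export: one image on disk, two records in the JSONL.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "dataset-name"
    (root / "assets" / "train").mkdir(parents=True)
    (root / "assets" / "validation").mkdir()
    (root / "annotations").mkdir()
    (root / "assets" / "train" / "image001.jpg").write_bytes(b"")
    (root / "annotations" / "annotations.jsonl").write_text(
        '{"asset_id": "asset-123", "filename": "image001.jpg", "annotations": []}\n'
        '{"asset_id": "asset-124", "filename": "image002.jpg", "annotations": []}\n',
        encoding="utf-8",
    )
    missing = missing_assets(root)

print(missing)
```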
Annotation format conversion
Scenario: You need to convert Vi annotations to COCO, YOLO, or other formats.
Recommended approach:
- Download annotations only in Vi JSONL format
- Parse JSONL with Python JSON library
- Write conversion script to target format
- Validate converted annotations
- Use with external training tools
Why: Lightweight annotation files are faster to download and easier to process for conversion.
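The conversion step might be structured as in the sketch below, which builds a COCO-style skeleton from parsed records. Only the top-level `filename` and `annotations` fields are taken from the Vi JSONL sample; the structure of each inner annotation is not shown in this guide, so the per-annotation mapping is left to a caller-supplied `convert` function. The function name `to_coco_skeleton` is hypothetical, and a complete COCO file would also need image sizes and a `categories` list.

```python
import json
from typing import Callable, Iterable


def to_coco_skeleton(records: Iterable[dict], convert: Callable[[dict], dict]) -> dict:
    """Build the images/annotations lists of a COCO-style file."""
    images, annotations = [], []
    for image_id, record in enumerate(records, start=1):
        images.append({"id": image_id, "file_name": record["filename"]})
        for raw in record["annotations"]:
            converted = dict(convert(raw))  # copy so the caller's dict is untouched
            converted["image_id"] = image_id
            converted["id"] = len(annotations) + 1
            annotations.append(converted)
    return {"images": images, "annotations": annotations}


# Placeholder records: the {"n": ...} objects stand in for real Vi annotations.
records = [
    {"asset_id": "asset-123", "filename": "image001.jpg", "annotations": [{"n": 1}, {"n": 2}]},
    {"asset_id": "asset-124", "filename": "image002.jpg", "annotations": [{"n": 3}]},
]

# Pass-through converter; a real one would map Vi fields to bbox, category_id, etc.
coco = to_coco_skeleton(records, convert=lambda raw: {"vi_annotation": raw})
print(json.dumps(coco, indent=2))
```

Keeping the field mapping in a separate function means the same loop can target other formats by swapping the converter.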
Data migration to another platform
Scenario: You're moving your dataset to another platform or storage system.
Recommended approach:
- Download full dataset to get all data
- Review folder structure and organization
- Convert annotations to target platform's format if needed
- Upload to new platform
- Verify data integrity after migration
Why: Full dataset export provides everything needed for complete migration.
Annotation quality analysis
Scenario: You want to analyze annotation distribution, check for class imbalance, or review label quality.
Recommended approach:
- Download annotations only in Vi JSONL format
- Write analysis scripts to parse JSONL
- Generate statistics (class counts, distributions)
- Identify potential issues or gaps
- Use insights to improve dataset
Why: Annotation-only downloads are lightweight and perfect for metadata analysis.
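A first pass at the statistics step could look like the sketch below. It uses only the top-level fields from the Vi JSONL sample (`asset_id`, `annotations`), so it reports annotation counts per asset and flags unannotated assets. Class-level counts would need the inner annotation schema, which is not covered here. The records are placeholders.

```python
from collections import Counter
from statistics import mean

# Placeholder records; in practice these come from parsing the JSONL export.
records = [
    {"asset_id": "asset-123", "filename": "image001.jpg", "annotations": [{}, {}]},
    {"asset_id": "asset-124", "filename": "image002.jpg", "annotations": []},
    {"asset_id": "asset-125", "filename": "image003.jpg", "annotations": [{}]},
]

# Number of annotations attached to each asset.
per_asset = {r["asset_id"]: len(r["annotations"]) for r in records}

# How many assets have 0, 1, 2, ... annotations.
histogram = Counter(per_asset.values())

# Assets with no annotations are often the first gap worth reviewing.
unannotated = sorted(asset for asset, count in per_asset.items() if count == 0)

average = mean(list(per_asset.values()))

print("annotations per asset:", dict(histogram))
print("unannotated assets:", unannotated)
print("mean annotations per asset:", average)
```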
Best practices
- Always export a backup before deleting datasets or performing bulk asset cleanup
- Check that downloaded files are complete and uncorrupted before relying on them
- For datasets over 1,000 assets, use Vi SDK for more reliable downloads
- Use consistent naming and folder structures for downloaded datasets
- Track dataset versions by exporting snapshots after significant changes
- Automate regular exports with Vi SDK for important production datasets
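For consistent naming and snapshot tracking, one option is to derive each export directory from the dataset ID and a UTC timestamp. The helper below is a hypothetical sketch using only the standard library; the resulting path could then be passed as `save_dir` to the download calls shown earlier.

```python
from datetime import datetime, timezone
from pathlib import Path


def snapshot_dir(base: Path, dataset_id: str, when: datetime) -> Path:
    """Return base/<dataset_id>/<UTC timestamp>, which sorts chronologically by name."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return base / dataset_id / stamp


# Fixed timestamp for the demo; a scheduled job would use datetime.now(timezone.utc).
when = datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc)
target = snapshot_dir(Path("backups"), "dataset-id", when)
print(target.as_posix())
```

Because the timestamp is zero-padded and ordered from year to second, a plain alphabetical listing of the folder shows snapshots oldest to newest.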
Download size and time estimates
Plan your downloads by understanding approximate sizes and times:
Full dataset downloads
| Dataset size | Number of assets | Typical download time | Storage needed |
|---|---|---|---|
| Small | 10-100 images | 30 seconds - 2 minutes | 50 MB - 500 MB |
| Medium | 100-1,000 images | 2-15 minutes | 500 MB - 5 GB |
| Large | 1,000-10,000 images | 15 minutes - 2 hours | 5 GB - 50 GB |
| Very large | 10,000+ images | 2+ hours | 50 GB+ |
Annotation-only downloads
| Dataset size | Number of annotations | Typical download time | Storage needed |
|---|---|---|---|
| Small | 10-100 annotations | < 5 seconds | < 1 MB |
| Medium | 100-1,000 annotations | 5-30 seconds | 1-10 MB |
| Large | 1,000-10,000 annotations | 30 seconds - 2 minutes | 10-100 MB |
| Very large | 10,000+ annotations | 2-5 minutes | 100 MB - 1 GB |
Tip: Download times vary based on your internet connection speed, server load, and asset file sizes. Use Vi SDK for more reliable downloads of large datasets with automatic retry logic.
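To adapt these estimates to your own connection, a rough lower bound on transfer time is size in bits divided by bandwidth. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and ignores server-side preparation and protocol overhead, so real downloads will take longer. The function name is hypothetical.

```python
def transfer_seconds(size_gb: float, bandwidth_mbps: float) -> float:
    """Idealized transfer time: total bits divided by bits per second."""
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth_mbps must be positive")
    total_bits = size_gb * 1e9 * 8
    return total_bits / (bandwidth_mbps * 1e6)


# A 5 GB export over a 100 Mbps link: 40e9 bits / 100e6 bits per second.
seconds = transfer_seconds(size_gb=5, bandwidth_mbps=100)
print(f"{seconds:.0f} s (about {seconds / 60:.1f} min)")
```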
Troubleshooting
Download fails or times out
Potential causes:
- Large dataset size
- Network connectivity issues
- Browser limitations or restrictions
- Server-side processing delays
Solutions:
- Use Vi SDK for large datasets (more reliable with retry logic)
- Split into smaller downloads if possible
- Check internet connection stability
- Try during off-peak hours for faster server response
- Ensure sufficient local storage space
- Disable browser extensions that might interfere
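For custom scripts that wrap a download step, a generic retry loop with exponential backoff is one way to ride out transient network failures. The sketch below is not the Vi SDK's built-in retry logic. `with_retries` is a hypothetical helper, and the exception types it catches (`ConnectionError`, `TimeoutError`) are assumptions that may not match what a given client raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on a network-style error wait 1x, 2x, 4x... base_delay and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Demo: a fake download that fails twice, then succeeds.
calls = {"count": 0}


def flaky_download() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network drop")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky_download, attempts=4, sleep=delays.append)
print(result, delays)
```

In real use, `action` would be a small function or lambda that performs the download, and the default `time.sleep` would be left in place.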
Downloaded files are incomplete or corrupted
Potential causes:
- Download interrupted before completion
- Disk space exhausted during download
- File system limitations
- Network errors during transfer
Solutions:
- Verify download completed (check progress reached 100%)
- Check available disk space before downloading
- Re-download the dataset from scratch
- Use Vi SDK which includes file integrity checks
- Try downloading to a different directory
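The "check available disk space" step can be scripted with `shutil.disk_usage` from the standard library. In the sketch below, `has_room` is a hypothetical helper, and the 20% headroom factor is an arbitrary safety margin, not a documented requirement.

```python
import shutil
from pathlib import Path


def has_room(save_dir: Path, expected_bytes: int, headroom: float = 1.2) -> bool:
    """Return True if the volume holding save_dir has expected_bytes * headroom free."""
    # disk_usage needs an existing path, so walk up to the nearest existing parent.
    probe = save_dir.resolve()
    while not probe.exists():
        probe = probe.parent
    free = shutil.disk_usage(probe).free
    return free >= expected_bytes * headroom


# Check for a 5 GB export before starting the download.
five_gb = 5 * 10**9
if has_room(Path("./data"), five_gb):
    print("enough space for the export")
else:
    print("free up disk space or choose another save_dir")
```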
Cannot find downloaded files
Potential causes:
- Browser default download location
- Custom download directory settings
- File system permissions
Solutions:
- Check browser's default download folder
- Search for dataset name in your file system
- Review browser download history for location
- For SDK downloads, check the save_dir parameter path
Annotation format not compatible with my tools
Potential causes:
- Tool expects different annotation format
- Need conversion to COCO, YOLO, etc.
Solutions:
- Download annotations in Vi JSONL format
- Write or use conversion scripts to target format
- Check if Vi SDK has utilities for your target format
- Contact support for conversion assistance
- See community tools and scripts for common conversions
Next steps
- Export complete dataset with assets and annotations
- Export annotation data only in Vi JSONL or TFRecord
- Programmatic downloads and automation
- Load and iterate through downloaded datasets
- Return to dataset management overview
- Upload new data or re-upload exported datasets
Related resources
- Manage datasets — Overview of all dataset management operations
- Rename a dataset — Update dataset names
- Delete a dataset — Permanently remove datasets
- Vi SDK Getting Started — Programmatic data access
- Vi SDK Datasets API — Complete dataset API reference
