Download Annotations

Export annotation data in Vi JSONL or TFRecord formats without downloading asset files.

Downloading annotations exports only your annotation data, without the asset files. The export covers phrase grounding annotations (captions, bounding boxes, grounded phrases) and VQA annotations (question-answer pairs); freeform annotations are coming soon. This is useful for lightweight backups, annotation analysis, format conversion, or sharing annotations without large image files.

📋

Annotation export formats

  • Vi JSONL — Datature's JSON Lines format for annotations, one record per line
  • TFRecord — TensorFlow's binary format optimized for training pipelines

Both formats support phrase grounding and VQA annotations from your datasets. Freeform annotation support is coming soon.

💻

Programmatic access with Vi SDK

You can also download annotations programmatically using the Vi SDK:

  • Download annotations only — Use the client.annotations.download() convenience method
  • Multiple formats — Export in COCO, YOLO, Pascal VOC, or Vi formats
  • Automated exports — Integrate annotation downloads into your workflows
  • Parse and process — Directly load JSONL files in Python for analysis
  • Format conversion — Build custom scripts to convert Vi annotations to other formats

Learn more about Vi SDK annotations → | Vi SDK datasets →


Navigate to annotation export

  1. In your organization, navigate to the Explorer section from the sidebar

  2. Select the dataset containing the annotations you want to download

  3. Click the Annotations tab in the Dataset Explorer header

  4. The Annotations page displays the Export Annotations section with an Export button

📘

Figure: Dataset Explorer, Annotations tab, showing the Export Annotations section. Access the Export Annotations section to download annotation data in various formats.


Export Vi JSONL annotations

Vi JSONL (JSON Lines) is Datature's native annotation format, storing each annotation record on a separate line for efficient streaming and processing.

Configure Vi JSONL export

  1. Click the Export button in the Export Annotations section

  2. The Annotation Export dialog opens

  3. In the Export Format dropdown, select Vi JSONL

📘

Figure: Annotation Export dialog with the Vi JSONL format selected. Configure export settings for the Vi JSONL annotation format.

Configure export options

Test Split Ratio (Optional)

Split your annotations into training and testing subsets:

  1. Check the Enabled checkbox to activate test splitting

  2. Enter a decimal value between 0 and 1:

    • 0.2 — 20% test set, 80% training set
    • 0.1 — 10% test set, 90% training set
    • 0.3 — 30% test set, 70% training set
  3. Leave unchecked to export all annotations together
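To get a feel for what a given ratio means in record counts, here is a minimal sketch. It assumes the test set size is the rounded product of the ratio and the record count; this page does not state how the platform rounds or samples, so treat the numbers as an estimate. The function name `split_counts` is hypothetical.

```python
def split_counts(total_records: int, test_ratio: float) -> tuple[int, int]:
    """Return (train_count, test_count) for a given test split ratio.

    Assumption: the test set size is round(total_records * test_ratio).
    The platform's actual rounding rule may differ.
    """
    if not 0 <= test_ratio <= 1:
        raise ValueError("test_ratio must be between 0 and 1")
    test_count = round(total_records * test_ratio)
    return total_records - test_count, test_count


# Estimate the subsets for the ratios listed above
for ratio in (0.1, 0.2, 0.3):
    train, test = split_counts(1000, ratio)
    print(f"ratio={ratio}: {train} train, {test} test")
```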

Normalization

Select Normalized to export annotations in a standardized structure with consistent coordinate systems and formats. This helps ensure compatibility with training frameworks and processing tools.

Export preview

Review the export summary before downloading:

You are about to export [N] assets with [M] annotations.

Verify that both counts match your expectations before you export.

Download Vi JSONL

  1. Review your configuration:

    • Export Format: Vi JSONL
    • Test Split Ratio: Your chosen value or disabled
    • Normalization: Normalized
  2. Click the Export button to start the download

  3. The export processes and downloads a .jsonl file containing your annotations
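Once the .jsonl file has downloaded, the asset and annotation counts from the export preview can be cross-checked locally. The sketch below assumes one asset record per line, each with an `annotations` list, as in the parsing examples elsewhere on this page. The helper name `count_export` and the file path are hypothetical.

```python
import json
from pathlib import Path


def count_export(jsonl_path):
    """Return (asset_count, annotation_count) for a Vi JSONL export.

    Assumes one asset record per line with an 'annotations' list.
    """
    assets = 0
    annotations = 0
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            assets += 1
            annotations += len(record.get("annotations", []))
    return assets, annotations


if __name__ == "__main__":
    path = Path("annotations.jsonl")  # hypothetical download location
    if path.exists():
        n_assets, n_annotations = count_export(path)
        print(f"{n_assets} assets with {n_annotations} annotations")
```

If a test split was enabled, the export may not be a single file; in that case the counts would need to be summed across the files you received.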


Export TFRecord annotations

TFRecord is TensorFlow's binary format optimized for high-performance data loading in training pipelines. Learn more about TFRecord format →

Configure TFRecord export

  1. Click the Export button in the Export Annotations section

  2. In the Annotation Export dialog, select TFRecord from the Export Format dropdown

  3. Configure the same export options as for Vi JSONL:

    • Test Split Ratio — Optional train/test split
    • Normalization — Select Normalized for standardized output
  4. Review the export preview summary

  5. Click Export to download the TFRecord file

🔧

TFRecord use cases

  • TensorFlow training — Native format for TensorFlow data pipelines
  • Performance optimization — Binary format typically loads faster than parsing JSON
  • Large datasets — Efficient for datasets with thousands of annotations
  • Production pipelines — Ideal for production ML workflows

Vi JSONL format specification

Vi JSONL files contain one JSON object per line, making them efficient for streaming and parallel processing.

The complete Vi JSONL format specification covers:

  • Full record structure and field descriptions
  • Phrase grounding annotation format with examples
  • VQA annotation format with examples
  • Coordinate systems and normalization details

See the complete Vi JSONL specification →


Download annotations with Vi SDK

You can download annotations programmatically using the Vi SDK, which is useful for automated workflows and batch processing.

Download annotations only (recommended)

import vi

# Initialize client
client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

# Download annotations using the annotations API (recommended)
result = client.annotations.download(
    dataset_id="your-dataset-id",
    save_dir="./annotations"
)

print(result.summary())
print(f"Downloaded to: {result.save_dir}")

Alternative method:

# Download annotations using the datasets API
result = client.datasets.download(
    dataset_id="your-dataset-id",
    save_dir="./annotations",
    annotations_only=True
)

Both methods produce identical results. The client.annotations.download() method is a convenience wrapper that's more intuitive for annotation-specific downloads.

Download with custom export settings

from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat,
    DatasetExportOptions
)

# Configure JSONL export with train/test split
settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_JSONL,
    options=DatasetExportOptions(
        normalized=True,
        split_ratio=0.2  # 20% test, 80% training
    )
)

# Download annotations with custom settings
result = client.annotations.download(
    dataset_id="your-dataset-id",
    export_settings=settings,
    save_dir="./annotations",
    show_progress=True
)

Download in different formats

from vi.api.resources.datasets.types import DatasetExportFormat

# Download in COCO format (phrase grounding datasets only)
result = client.annotations.download(
    dataset_id="your-dataset-id",
    export_settings={"format": DatasetExportFormat.COCO},
    save_dir="./coco_annotations"
)

# Download in YOLO format (phrase grounding datasets only)
result = client.annotations.download(
    dataset_id="your-dataset-id",
    export_settings={"format": DatasetExportFormat.YOLO_DARKNET},
    save_dir="./yolo_annotations"
)

# Download in TFRecord format
result = client.annotations.download(
    dataset_id="your-dataset-id",
    export_settings={"format": DatasetExportFormat.VI_TFRECORD},
    save_dir="./tfrecord_annotations"
)

Supported formats by dataset type:

  • Phrase Grounding: COCO, YOLO, Pascal VOC, CSV variants, Vi JSONL, Vi TFRecord
  • VQA: Vi JSONL, Vi TFRecord
  • Freeform: Vi JSONL (coming soon)

Parse downloaded annotations

import json
from pathlib import Path

# Locate the JSONL file inside the download directory
annotations_file = Path(result.save_dir) / "dump" / "annotations" / "annotations.jsonl"

# Read one JSON record per line, skipping blank lines
with open(annotations_file, 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():
            continue
        record = json.loads(line)
        print(f"Asset: {record['filename']}")
        print(f"  Annotations: {len(record['annotations'])}")

Complete Vi SDK Annotations API → | Complete Vi SDK Datasets API →


Common use cases

Annotation backup

Create lightweight backups without large asset files:

  1. Export annotations in Vi JSONL format via web UI or Vi SDK
  2. Store in version control or cloud storage
  3. Restore annotations without re-uploading assets
  4. Track annotation changes over time
  5. Automate periodic backups using SDK scripts
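The storage step of a backup can be sketched with the standard library alone. The example copies an already-downloaded export into a backup directory under a UTC-timestamped name, so successive backups never overwrite each other. The function name `backup_export` and the paths are hypothetical; the download itself would come from the web UI or the SDK calls shown earlier.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_export(export_file, backup_dir):
    """Copy an export into backup_dir under a UTC-timestamped name."""
    export_file = Path(export_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Example name: annotations_20250101T120000Z.jsonl
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_dir / f"{export_file.stem}_{stamp}{export_file.suffix}"

    shutil.copy2(export_file, target)  # copy2 also preserves file metadata
    return target


if __name__ == "__main__":
    source = Path("annotations.jsonl")  # hypothetical export location
    if source.exists():
        print(f"Backed up to {backup_export(source, 'backups')}")
```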
Format conversion

Convert annotations for external tools:

  1. Export in Vi JSONL for readable format
  2. Write scripts to transform to target format
  3. Import to other annotation or training platforms
  4. Maintain annotation fidelity across conversions
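As an illustration of step 2, the sketch below flattens phrase grounding records into a CSV file with one row per grounded phrase. The field names (`filename`, `annotations`, `caption`, `grounded_phrases`, `phrase`, `bbox`) are taken from the parsing examples on this page; the CSV layout and the function name `jsonl_to_csv` are assumptions for illustration, not a Datature format. The bounding box is written unchanged as JSON text because its coordinate convention is defined in the Vi JSONL specification, not here.

```python
import csv
import json


def jsonl_to_csv(jsonl_path, csv_path):
    """Write one CSV row per grounded phrase; return the number of rows."""
    rows = 0
    with open(jsonl_path, "r", encoding="utf-8") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(["filename", "caption", "phrase", "bbox"])
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            for ann in record.get("annotations", []):
                # Records without grounded phrases (e.g. VQA) yield no rows
                for phrase in ann.get("grounded_phrases", []):
                    writer.writerow([
                        record.get("filename", ""),
                        ann.get("caption", ""),
                        phrase.get("phrase", ""),
                        json.dumps(phrase.get("bbox")),  # keep bbox as-is
                    ])
                    rows += 1
    return rows
```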
Annotation analysis

Analyze annotation statistics and quality:

  1. Export Vi JSONL for easy parsing
  2. Use Python or other tools to process JSON
  3. Calculate class distributions and statistics
  4. Identify annotation errors or gaps
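A simple version of steps 3 and 4 might look like the sketch below. It treats each grounded phrase's text as the class label, which is an assumption, and flags assets whose record has no annotations at all. Field names follow the parsing examples on this page; `summarize` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize(jsonl_path):
    """Return (phrase_counts, empty_assets) for a Vi JSONL export."""
    phrase_counts = Counter()
    empty_assets = []
    with open(jsonl_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            annotations = record.get("annotations", [])
            if not annotations:
                # Possible annotation gap worth reviewing
                empty_assets.append(record.get("filename"))
            for ann in annotations:
                for phrase in ann.get("grounded_phrases", []):
                    phrase_counts[phrase.get("phrase")] += 1
    return phrase_counts, empty_assets
```

Calling `phrase_counts.most_common(10)` on the result lists the ten most frequent phrases, which is often enough to spot a heavily skewed distribution.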
Training pipeline integration

Integrate annotations into ML pipelines:

  1. Export TFRecord for TensorFlow pipelines
  2. Export Vi JSONL for custom PyTorch dataloaders
  3. Automate periodic exports for continuous training
  4. Sync annotations with training infrastructure
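For step 2, a custom dataset over a Vi JSONL export can be quite small. The sketch below is a plain Python class that defines `__len__` and `__getitem__`, the protocol PyTorch's `DataLoader` expects from a map-style dataset; `torch` is deliberately not imported so the example runs without it. The class name `ViJsonlDataset` is hypothetical, and image loading is left out because annotation exports contain no asset files.

```python
import json


class ViJsonlDataset:
    """Map-style dataset over a Vi JSONL export (annotations only)."""

    def __init__(self, jsonl_path):
        # Load every record into memory; fine for annotation-sized files
        with open(jsonl_path, "r", encoding="utf-8") as f:
            self.records = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        record = self.records[index]
        # Return the filename so the caller can load the image separately
        return record["filename"], record["annotations"]
```

When passing such a dataset to a `DataLoader`, a custom `collate_fn` is most likely needed, since annotation lists vary in length from one asset to the next.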
Sharing annotations

Share labels without large asset files:

  1. Export lightweight annotation files
  2. Send to team members or collaborators
  3. Review and validate labels separately
  4. Reimport corrections or updates

Best practices

Choose the right format

Use Vi JSONL for general use, TFRecord for TensorFlow training

Include test splits

Enable test split ratio for training-ready exports

Version your annotations

Include version numbers or dates in export filenames

Validate exports

Verify annotation counts and sample records after export

Automate with SDK

Use Vi SDK to download and process annotations programmatically

Backup regularly

Export annotations at key project milestones


Working with exported annotations

Parse Vi JSONL in Python

import json

# Read one JSON record per line, skipping blank lines
with open('annotations.jsonl', 'r', encoding='utf-8') as f:
    for line in f:
        if not line.strip():
            continue
        record = json.loads(line)
        asset_id = record['asset_id']
        filename = record['filename']
        annotations = record['annotations']
        print(f"Asset {asset_id}: {filename}")

        for ann in annotations:
            # Process phrase grounding annotations
            if 'caption' in ann:
                print(f"Caption: {ann['caption']}")
                for phrase in ann.get('grounded_phrases', []):
                    print(f"  Phrase: {phrase['phrase']}")
                    print(f"  BBox: {phrase['bbox']}")

            # Process VQA annotations
            elif 'interactions' in ann:
                for qa in ann['interactions']:
                    print(f"Q: {qa['question']}")
                    print(f"A: {qa['answer']}")

Load TFRecord in TensorFlow

TFRecord files can be loaded with TensorFlow's tf.data pipeline for efficient training workflows:

import tensorflow as tf

# Load TFRecord dataset
dataset = tf.data.TFRecordDataset('annotations.tfrecord')

# Parse based on your annotation type (phrase grounding, VQA, or freeform)
# Feature descriptions will vary based on export format
# Refer to Vi SDK documentation for complete parsing examples



Monitor export progress

Track your annotation exports in the Annotation Job History section:

  • Job type — Shows "Export" for download operations
  • User — Who initiated the export
  • File count — Number of annotation records
  • Status — In Progress, Finished, or Failed
  • Completion time — When the export finished

Troubleshooting

Export file is empty
  • No annotations — Verify your dataset has completed annotations
  • Filter active — Check if filters are excluding all assets
  • Export failed — Look for error messages in job history
  • Retry export — Try exporting again with correct settings

Cannot parse JSONL file
  • File encoding — Ensure file is UTF-8 encoded
  • Line endings — Check for proper newline characters
  • JSON errors — Validate each line is valid JSON
  • Truncated download — Redownload if file was interrupted
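The checks above can be automated with a small validator. The sketch below reads the file as bytes so that encoding problems are reported per line instead of aborting the whole read, and it returns the line number of every record that fails to decode or parse. A truncated download usually shows up as an error on the last line. The function name `find_bad_lines` is hypothetical.

```python
import json


def find_bad_lines(jsonl_path):
    """Return a list of (line_number, error_message) for unparseable lines."""
    problems = []
    # Binary mode: decode each line ourselves to catch non-UTF-8 bytes
    with open(jsonl_path, "rb") as f:
        for number, raw in enumerate(f, start=1):
            if not raw.strip():
                continue  # blank lines are not treated as errors
            try:
                json.loads(raw.decode("utf-8"))
            except (UnicodeDecodeError, json.JSONDecodeError) as exc:
                problems.append((number, str(exc)))
    return problems


if __name__ == "__main__":
    import os

    path = "annotations.jsonl"  # hypothetical download location
    if os.path.exists(path):
        for number, message in find_bad_lines(path):
            print(f"line {number}: {message}")
```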
TFRecord not loading
  • Format version — Ensure TensorFlow version compatibility
  • Corrupted file — Redownload the export
  • Parser mismatch — Verify feature description matches export format
  • Dependencies — Install required TensorFlow packages
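To tell a corrupted or truncated download apart from a parser mismatch, the record framing of a TFRecord file can be walked without TensorFlow. Each record is stored as an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload. The sketch below assumes the export is a single uncompressed TFRecord file and does not verify the checksums; it only counts records and detects files that end mid-record. The function name `count_tfrecords` is hypothetical.

```python
import os
import struct


def count_tfrecords(path):
    """Count records in an uncompressed TFRecord file.

    Raises ValueError if the file ends in the middle of a record.
    Checksums are skipped, not verified.
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while f.tell() < size:
            header = f.read(12)  # 8-byte length + 4-byte length checksum
            if len(header) < 12:
                raise ValueError(f"truncated header after {count} records")
            (length,) = struct.unpack("<Q", header[:8])
            end = f.tell() + length + 4  # payload + 4-byte payload checksum
            if end > size:
                raise ValueError(f"truncated record after {count} records")
            f.seek(end)
            count += 1
    return count
```

If the count looks right and no error is raised, the file is structurally intact and the problem is more likely a feature description that does not match the export.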
Missing annotations
  • Incomplete annotations — Some assets may lack annotations
  • Export filters — Check if annotation types were filtered
  • Verification needed — Review dataset before export
  • Partial export — Ensure export completed successfully

Test split issues
  • Inconsistent splits — Re-exporting with the same ratio may assign records to different subsets
  • Small datasets — Very small datasets may have uneven splits
  • Class imbalance — Some classes may be missing from test set
  • Random seed — Splits use random sampling without fixed seeds
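If reproducible splits matter, one workaround is to export with the test split disabled and assign the subsets locally. The sketch below hashes each record's `asset_id` (the field used in the parsing example on this page) so that the same asset always lands in the same subset, regardless of export order or run. This is a local technique, not a platform feature; `assign_split` and `split_records` are hypothetical helpers, and the resulting test fraction only approximates the ratio.

```python
import hashlib


def assign_split(asset_id, test_ratio=0.2):
    """Deterministically map an asset ID to 'train' or 'test'."""
    digest = hashlib.sha256(str(asset_id).encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1)
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_ratio else "train"


def split_records(records, test_ratio=0.2):
    """Partition parsed JSONL records into (train, test) lists."""
    train, test = [], []
    for record in records:
        subset = assign_split(record["asset_id"], test_ratio)
        (test if subset == "test" else train).append(record)
    return train, test
```

Because the assignment depends only on the ID, newly added assets do not reshuffle existing ones, which keeps earlier evaluation results comparable.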

Next steps