Datasets

Access datasets through client.datasets in the Datature Vi SDK.

The datasets resource is the root container for all your training data. Datasets hold assets (images) and annotations, and they belong to your organization. Every other data-related operation (uploading images, creating annotations, exporting for training) starts here.

Before You Start

Get started with the Vi SDK →


Dataset Creation Is UI-Only

The SDK does not support creating new datasets. Use the Datature Vi web interface to create a dataset, then use the SDK to manage, export, and download it.

Methods

list()

List all datasets in your organization.

# Basic listing
datasets = client.datasets.list()

for dataset in datasets.items:
    print(f"{dataset.name}: {dataset.dataset_id}")

# Custom page size
from vi.api.types import PaginationParams

pagination = PaginationParams(page_size=50)
datasets = client.datasets.list(pagination=pagination)

# Iterate through all pages
for page in datasets:
    for dataset in page.items:
        print(f"{dataset.name}: {dataset.statistic.asset_total} assets")

# Collect every dataset across all pages
all_datasets = list(client.datasets.list().all_items())
print(f"Total datasets: {len(all_datasets)}")

Expected output

my-dataset: dat_abc123
training-set: dat_def456
Total datasets: 2

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| pagination | object | Pagination settings. See PaginationParams. | Optional | None |

Returns: PaginatedResponse[Dataset]
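
Every other method takes a dataset ID, so a common first step is looking one up by name. The sketch below is a hypothetical helper (`find_dataset_by_name` is not part of the SDK). It relies only on `list().all_items()` and the `name` attribute shown above, and it assumes `client` is an already-configured Vi client and that names are matched exactly.

```python
from typing import Any, Optional


def find_dataset_by_name(client: Any, name: str) -> Optional[Any]:
    """Return the first dataset whose name equals `name`, or None.

    Hypothetical helper: uses only client.datasets.list().all_items(),
    which iterates over the items of every page.
    """
    for dataset in client.datasets.list().all_items():
        if dataset.name == name:
            return dataset
    return None
```

Call it as `find_dataset_by_name(client, "my-dataset")` and read `dataset_id` from the result if it is not `None`.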


get()

Get a specific dataset by ID.

# Basic lookup
dataset = client.datasets.get("dataset_abc123")

print(f"Name: {dataset.name}")
print(f"Assets: {dataset.statistic.asset_total}")

# Formatted summary
dataset = client.datasets.get("dataset_abc123")
dataset.info()  # Prints formatted dataset summary

# Annotation statistics
dataset = client.datasets.get("dataset_abc123")

print(f"Total assets: {dataset.statistic.asset_total}")
print(f"Annotated: {dataset.statistic.asset_annotated}")
print(f"Annotations: {dataset.statistic.annotation_total}")

if dataset.statistic.asset_total > 0:
    rate = dataset.statistic.asset_annotated / dataset.statistic.asset_total * 100
    print(f"Annotation rate: {rate:.1f}%")

Expected output

Name: my-dataset
Assets: 150
Total assets: 150
Annotated: 120
Annotations: 340
Annotation rate: 80.0%

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |

Returns: Dataset
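
The annotation-rate calculation above can be wrapped in a small function and applied to several datasets at once, for example to see which ones still need labeling. This is a minimal sketch: `annotation_rate` and `least_annotated` are hypothetical names, and the code reads only the `name` and `statistic` fields documented on this page.

```python
from typing import Any, Iterable, List, Tuple


def annotation_rate(dataset: Any) -> float:
    """Percentage of assets counted as annotated (0.0 for an empty dataset)."""
    stats = dataset.statistic
    if stats.asset_total == 0:
        return 0.0
    return stats.asset_annotated / stats.asset_total * 100


def least_annotated(datasets: Iterable[Any], limit: int = 5) -> List[Tuple[str, float]]:
    """Return (name, rate) pairs for the datasets with the lowest annotation rate."""
    rated = [(d.name, annotation_rate(d)) for d in datasets]
    return sorted(rated, key=lambda pair: pair[1])[:limit]
```

One way to use it is `least_annotated(client.datasets.list().all_items())`, which ranks every dataset in the organization.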


create_export()

Create a new dataset export.

# Default settings
export = client.datasets.create_export(dataset_id="dataset_abc123")

# Full export with a train/validation split
from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat,
    DatasetExportOptions
)

export_settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_FULL,
    options=DatasetExportOptions(
        normalized=True,
        split_ratio=0.8  # 80% training, 20% validation
    )
)

export = client.datasets.create_export(
    dataset_id="dataset_abc123",
    export_settings=export_settings
)

print(f"Export ID: {export.dataset_export_id}")

# JSONL export
from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat
)

export_settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_JSONL
)

export = client.datasets.create_export(
    dataset_id="dataset_abc123",
    export_settings=export_settings
)

Expected output

Export ID: exp_xyz789

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |
| export_settings | object | Export configuration. See DatasetExportSettings. | Optional | |

Returns: DatasetExport
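
To get a feel for what a given `split_ratio` means before requesting an export, you can estimate the split sizes from `statistic.asset_total`. The sketch below is plain arithmetic, not SDK behavior: `estimate_split_counts` is a hypothetical helper, and both the rounding rule and the requirement that the ratio lies strictly between 0 and 1 are assumptions. The service may assign assets differently.

```python
from typing import Tuple


def estimate_split_counts(asset_total: int, split_ratio: float) -> Tuple[int, int]:
    """Estimate (train, validation) sizes for a given split_ratio.

    Assumption: the training share is rounded to the nearest whole asset
    and the remainder goes to validation.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    train = round(asset_total * split_ratio)
    return train, asset_total - train
```

For the 150-asset dataset used in the examples, `estimate_split_counts(150, 0.8)` gives 120 training and 30 validation assets.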


list_exports()

List all exports for a dataset.

# List exports and their formats
exports = client.datasets.list_exports("dataset_abc123")

for export in exports.items:
    print(f"Export: {export.dataset_export_id}")
    print(f"  Format: {export.spec.format}")

# Keep only exports that are ready to download
exports = client.datasets.list_exports("dataset_abc123")

ready_exports = [
    e for e in exports.items
    if e.status.download_url is not None
]

for export in ready_exports:
    print(f"Export {export.dataset_export_id} ready: {export.status.download_url.url}")

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |
| pagination | object | Pagination settings. See PaginationParams. | Optional | |

Returns: PaginatedResponse[DatasetExport]
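
If a dataset already has a finished export, reusing it can be preferable to creating a new one. The following sketch picks the ready export whose download URL expires last. `latest_ready_export` is a hypothetical helper. It assumes `expires_at` is populated whenever `download_url` is present, which this page does not guarantee.

```python
from typing import Any, Iterable, Optional


def latest_ready_export(exports: Iterable[Any]) -> Optional[Any]:
    """Return the ready export whose download URL expires last, or None.

    Assumption: expires_at is set on every non-empty download_url.
    """
    ready = [e for e in exports if e.status.download_url is not None]
    if not ready:
        return None
    return max(ready, key=lambda e: e.status.download_url.expires_at)
```

A typical call would be `latest_ready_export(client.datasets.list_exports("dataset_abc123").items)`.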


get_export()

Get a specific export by ID.

# Check whether an export is ready
export = client.datasets.get_export(
    dataset_id="dataset_abc123",
    dataset_export_id="export_xyz789"
)

if export.status.download_url:
    print(f"Export ready: {export.status.download_url.url}")
    print(f"Expires at: {export.status.download_url.expires_at}")
else:
    print("Export still processing...")

# Poll until the export is ready
import time

def wait_for_export(dataset_id: str, export_id: str, timeout: int = 300) -> str:
    """Wait for export to complete and return download URL."""
    start = time.monotonic()  # monotonic clock is unaffected by system clock changes
    while time.monotonic() - start < timeout:
        export = client.datasets.get_export(dataset_id, export_id)
        if export.status.download_url:
            return export.status.download_url.url
        time.sleep(5)  # check again every 5 seconds
    raise TimeoutError("Export not ready")

url = wait_for_export("dataset_abc123", "export_xyz789")

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |
| dataset_export_id | string | Export identifier. | Required | |

Returns: DatasetExport
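
For large exports, polling at a fixed interval can generate many requests. A variation on the polling loop above is to double the delay after each attempt. This is only a sketch under stated assumptions: `wait_for_export_url` and its `initial_delay`, `max_delay`, `sleep`, and `clock` parameters are hypothetical and not part of the SDK, and the only SDK call used is `client.datasets.get_export()` as documented here.

```python
import time
from typing import Any, Callable


def wait_for_export_url(
    client: Any,
    dataset_id: str,
    export_id: str,
    timeout: float = 300.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_export() until a download URL appears, doubling the delay each time."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        export = client.datasets.get_export(dataset_id, export_id)
        if export.status.download_url:
            return export.status.download_url.url
        # Give up if the next wait would run past the deadline.
        if clock() + delay > deadline:
            raise TimeoutError(f"Export {export_id} not ready after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` arguments exist so the function can be exercised in tests without real waiting. In normal use, leave them at their defaults.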


download()

Download a dataset with all assets and annotations.

# Basic download
downloaded = client.datasets.download(
    dataset_id="dataset_abc123",
    save_dir="./data"
)

print(downloaded.summary())

# Download with export settings
from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat,
    DatasetExportOptions
)

settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_FULL,
    options=DatasetExportOptions(
        normalized=True,
        split_ratio=0.8
    )
)

downloaded = client.datasets.download(
    dataset_id="dataset_abc123",
    export_settings=settings,
    save_dir="./data",
    show_progress=True
)

print(f"Saved to: {downloaded.save_dir}")
print(f"Size: {downloaded.size_mb:.2f} MB")
print(f"Splits: {downloaded.splits}")

# Annotations only
downloaded = client.datasets.download(
    dataset_id="dataset_abc123",
    save_dir="./annotations",
    annotations_only=True
)

# Client-level call
downloaded = client.get_dataset(
    dataset_id="dataset_abc123",
    save_path="./data"
)

Expected output

Saved to: ./data
Size: 24.50 MB
Splits: ['train', 'val']

Annotations-Only Downloads

For annotation-only downloads, client.annotations.download() is a cleaner alternative:

result = client.annotations.download(
    dataset_id="dataset_abc123",
    save_dir="./annotations"
)

This wraps client.datasets.download(annotations_only=True) with a more descriptive name. See the Annotations API →

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |
| dataset_export_id | string | Specific export ID to download. | Optional | None |
| export_settings | object | Export configuration. See DatasetExportSettings. | Optional | None |
| annotations_only | boolean | Download only annotation files, not image assets. | Optional | false |
| save_dir | string | Local directory to save the downloaded files. | Required | |
| overwrite | boolean | Overwrite existing files in the save directory. | Optional | false |
| show_progress | boolean | Show progress bars during download. | Optional | true |

Returns: DatasetDownloadResult
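
After a download finishes, it can be useful to check independently what landed on disk. The sketch below sums file sizes under the save directory using only the standard library. `directory_size_mb` is a hypothetical helper, and it counts a megabyte as 10**6 bytes. This page does not say which convention `size_mb` uses, so small differences from the SDK's figure are possible.

```python
from pathlib import Path


def directory_size_mb(save_dir: str) -> float:
    """Total size of all regular files under save_dir, in megabytes (10**6 bytes)."""
    total_bytes = sum(
        path.stat().st_size
        for path in Path(save_dir).rglob("*")
        if path.is_file()
    )
    return total_bytes / 1_000_000
```

Comparing `directory_size_mb(downloaded.save_dir)` with `downloaded.size_mb` gives a rough sanity check that the download is complete.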


delete()

Delete a dataset permanently.

# Basic deletion
deleted = client.datasets.delete("dataset_abc123")
print(f"Deleted: {deleted.dataset_id}")

# Confirm before deleting
dataset = client.datasets.get("dataset_abc123")
print(f"About to delete: {dataset.name}")
print(f"  Assets: {dataset.statistic.asset_total}")
print(f"  Annotations: {dataset.statistic.annotation_total}")

confirm = input("Delete? (yes/no): ")
if confirm.lower() == "yes":
    client.datasets.delete("dataset_abc123")
    print("Deleted.")

Expected output

Deleted: dataset_abc123

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |

Returns: DeletedDataset
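
Because deletion is permanent, scripts that cannot prompt a user may want a non-interactive guard. One possible approach, sketched below, is to require the caller to pass the dataset name they expect and refuse to delete on a mismatch. `delete_dataset_checked` is a hypothetical helper built only on `get()` and `delete()` as documented above.

```python
from typing import Any


def delete_dataset_checked(client: Any, dataset_id: str, expected_name: str) -> Any:
    """Delete a dataset only if its current name matches expected_name."""
    dataset = client.datasets.get(dataset_id)
    if dataset.name != expected_name:
        raise ValueError(
            f"Refusing to delete {dataset_id}: name is {dataset.name!r}, "
            f"expected {expected_name!r}"
        )
    return client.datasets.delete(dataset_id)
```

This protects against a mistyped or stale ID pointing at a different dataset than the one you meant to remove.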


bulk_delete_assets()

Bulk delete assets from a dataset using a filter query.

# Filter on status, with strict query mode enabled
response = client.datasets.bulk_delete_assets(
    dataset_id="dataset_abc123",
    filter_criteria='{"status": "error"}',
    strict_query=True
)

# Filter on annotation count
response = client.datasets.bulk_delete_assets(
    dataset_id="dataset_abc123",
    filter_criteria='{"metadata.annotations.total": 0}'
)

Parameters

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Dataset identifier. | Required | |
| filter_criteria | string | JSON filter query string. | Required | |
| strict_query | boolean | Enable strict query mode. | Optional | false |

Returns: BulkAssetDeletionSession
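
Since `filter_criteria` is a JSON string, building it with `json.dumps` from a dictionary avoids quoting mistakes that are easy to make in hand-written literals. The sketch below shows only the serialization step. `build_filter_criteria` is a hypothetical helper, and the keys that the filter actually accepts are not specified on this page.

```python
import json
from typing import Any, Dict


def build_filter_criteria(conditions: Dict[str, Any]) -> str:
    """Serialize a dict of filter conditions to a JSON string for filter_criteria."""
    return json.dumps(conditions)
```

With the default separators, `build_filter_criteria({"status": "error"})` produces the same string as the literal in the first example above.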


Response types

Dataset

from vi.api.resources.datasets.responses import Dataset

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| dataset_id | string | Unique identifier | Optional | |
| name | string | Dataset name | Optional | |
| owner | string | Owner identifier | Optional | |
| organization_id | string | Organization ID | Optional | |
| type | string | Dataset type (phrase-grounding, vqa) | Optional | |
| content | string | Content type | Optional | |
| create_date | integer | Creation timestamp (Unix ms) | Optional | |
| statistic | object | Dataset statistics (DatasetStatistic) | Optional | |
| users | object | Users with access | Optional | |
| tags | object | Tag counts | Optional | |
| status | integer | Status code | Optional | |
| last_accessed | integer | Last access timestamp | Optional | |
| is_locked | boolean | Lock status | Optional | |
| access | object | Access settings (DatasetAccess) | Optional | |
| asset_statuses | object | Asset status definitions | Optional | |
| self_link | string | API link | Optional | |
| etag | string | Entity tag | Optional | |
| description | string | Optional description | Optional | |

Methods: info() → prints a formatted dataset summary.
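
The `create_date` field is a Unix timestamp in milliseconds, which is not directly readable. A minimal conversion to a timezone-aware `datetime` using the standard library is sketched below. `ms_to_datetime` is a hypothetical helper. The unit of `last_accessed` is not stated on this page, so apply it to that field only after confirming the unit.

```python
from datetime import datetime, timezone


def ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
```

For example, `ms_to_datetime(dataset.create_date).isoformat()` gives an ISO 8601 string in UTC.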


DatasetStatistic

from vi.api.resources.datasets.responses import DatasetStatistic

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| asset_total | integer | Total number of assets | Optional | |
| annotation_total | integer | Total number of annotations | Optional | |
| asset_annotated | integer | Number of annotated assets | Optional | |
| tags_count | object | Tag distribution | Optional | |

DatasetAccess

from vi.api.resources.datasets.responses import DatasetAccess

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| is_public | boolean | Public accessibility | Optional | |
| is_read_only | boolean | Read-only mode | Optional | |
| is_hidden | boolean | Hidden from listings | Optional | |

AssetStatusDetail

from vi.api.resources.datasets.responses import AssetStatusDetail

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| description | string | Status description | Optional | |
| color | string | Color code | Optional | |
| create_date | integer | Creation timestamp | Optional | |

DatasetExport

from vi.api.resources.datasets.responses import DatasetExport

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| organization_id | string | Organization ID | Optional | |
| dataset_id | string | Dataset ID | Optional | |
| dataset_export_id | string | Export identifier | Optional | |
| spec | object | Export specification (DatasetExportSpec) | Optional | |
| status | object | Export status (DatasetExportStatus) | Optional | |
| metadata | object | Metadata | Optional | |
| self_link | string | API link | Optional | |
| etag | string | Entity tag | Optional | |

DatasetExportSpec

from vi.api.resources.datasets.responses import DatasetExportSpec

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| format | string | Export format | Optional | |
| options | object | Export options | Optional | |

DatasetExportStatus

from vi.api.resources.datasets.responses import DatasetExportStatus

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| conditions | array | Status conditions | Optional | |
| download_url | object | Download URL when ready (DatasetExportDownloadUrl) | Optional | |

DatasetExportDownloadUrl

from vi.api.resources.datasets.responses import DatasetExportDownloadUrl

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| url | string | Download URL | Optional | |
| expires_at | integer | Expiration timestamp | Optional | |

DatasetDownloadResult

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| save_dir | string | Save directory | Optional | |
| size_mb | number | Total size in MB | Optional | |
| splits | array | Available splits | Optional | |
| assets_count | integer | Number of assets | Optional | |
| annotations_count | integer | Number of annotations | Optional | |

Methods: summary() → returns a summary string. info() → prints detailed info.


DeletedDataset

from vi.api.resources.datasets.responses import DeletedDataset

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| kind | string | Resource kind | Optional | |
| user | string | User who deleted | Optional | |
| organization_id | string | Organization ID | Optional | |
| dataset_id | string | Deleted dataset ID | Optional | |
| self_link | string | API link | Optional | |
| etag | string | Entity tag | Optional | |
| metadata | object | Metadata | Optional | |
| status | object | Deletion status (DeletedDatasetStatus) | Optional | |

DeletedDatasetStatus

from vi.api.resources.datasets.responses import DeletedDatasetStatus

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| conditions | array | Deletion conditions | Optional | |

BulkAssetDeletionSession

from vi.api.resources.datasets.responses import BulkAssetDeletionSession

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| kind | string | Resource kind | Optional | |
| organization_id | string | Organization ID | Optional | |
| dataset_id | string | Dataset ID | Optional | |
| delete_many_assets_session_id | string | Session identifier | Optional | |
| self_link | string | API link | Optional | |
| etag | string | Entity tag | Optional | |
| metadata | object | Metadata | Optional | |
| spec | object | Deletion spec (BulkAssetDeletionSpec) | Optional | |
| status | object | Deletion status (BulkAssetDeletionStatus) | Optional | |

BulkAssetDeletionSpec

from vi.api.resources.datasets.types import BulkAssetDeletionSpec

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| filter | string | Filter criteria | Optional | |
| metadata_query | string | Metadata query | Optional | |
| rule_query | string | Rule query | Optional | |
| strict_query | boolean | Strict query mode | Optional | |

BulkAssetDeletionStatus

from vi.api.resources.datasets.responses import BulkAssetDeletionStatus

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| conditions | array | Deletion conditions | Optional | |

Request types

DatasetExportSettings

from vi.api.resources.datasets.types import DatasetExportSettings

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| format | string | Export format | Optional | VI_FULL |
| options | object | Export options | Optional | DatasetExportOptions() |

DatasetExportOptions

from vi.api.resources.datasets.types import DatasetExportOptions

Properties

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| normalized | boolean | Normalize coordinates to [0, 1] | Optional | True |
| split_ratio | number | Train/validation split ratio | Optional | None |
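
When `normalized` is enabled, coordinates are scaled to [0, 1], so drawing or cropping requires mapping them back to pixels using the image size. The sketch below shows that arithmetic for a bounding box. The `(xmin, ymin, xmax, ymax)` layout is an assumption made for illustration, since this page does not describe the exported annotation schema, and `denormalize_box` is a hypothetical helper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def denormalize_box(box: Box, width: int, height: int) -> Box:
    """Scale a normalized (xmin, ymin, xmax, ymax) box to pixel coordinates.

    Assumption: x values are fractions of the image width and
    y values are fractions of the image height.
    """
    xmin, ymin, xmax, ymax = box
    return (xmin * width, ymin * height, xmax * width, ymax * height)
```

For a 640x480 image, `denormalize_box((0.25, 0.5, 0.75, 1.0), 640, 480)` returns `(160.0, 240.0, 480.0, 480.0)`.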

Enums

DatasetType

from vi.api.resources.datasets.types import DatasetType

Values

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| PHRASE_GROUNDING | | Phrase grounding dataset | Optional | |
| VQA | | Visual question answering dataset | Optional | |

DatasetContent

from vi.api.resources.datasets.types import DatasetContent

Values

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| IMAGE | | Image content | Optional | |

DatasetExportFormat

from vi.api.resources.datasets.types import DatasetExportFormat

Values

| Name | Type | Description | Required | Default |
| --- | --- | --- | --- | --- |
| VI_FULL | | Full dataset with assets | Optional | |
| VI_JSONL | | JSONL annotations only | Optional | |
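
JSONL files store one JSON object per line, so a `VI_JSONL` export can be read with the standard library alone. The reader below is a generic sketch: `read_jsonl` is a hypothetical helper, and it makes no assumption about the fields inside each record because the record schema is not described on this page.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because it is a generator, records are parsed one at a time, which keeps memory use low for large annotation files.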

Related resources

Assets API

Upload, download, list, and delete image assets within a dataset.

Annotations API

Upload, list, and download annotations for phrase grounding and VQA.

Create A Dataset

UI guide for creating datasets in Datature Vi.

Download Data

UI guide for exporting datasets and annotations.