Datasets
Dataset resource
Access datasets through client.datasets.
Prerequisites
- Vi SDK installed with authentication configured
- A secret key for API authentication
- Understanding of Vi SDK basics
- Familiarity with dataset concepts
Methods
list()
List all datasets in the organization.
# Basic usage
datasets = client.datasets.list()
for dataset in datasets.items:
    print(f"{dataset.name}: {dataset.dataset_id}")

# With custom pagination
from vi.api.types import PaginationParams

pagination = PaginationParams(page_size=50)
datasets = client.datasets.list(pagination=pagination)

# Iterate through all pages
for page in datasets:
    for dataset in page.items:
        print(f"{dataset.name}: {dataset.statistic.asset_total} assets")

# Collect all datasets
all_datasets = list(client.datasets.list().all_items())
print(f"Total datasets: {len(all_datasets)}")

Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| pagination | PaginationParams | Pagination settings | None |
Returns: PaginatedResponse[Dataset]
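Listing is often followed by a lookup by name. The sketch below filters any iterable of dataset-like objects by a case-insensitive substring of `name`. `find_datasets_by_name` is a hypothetical helper, not part of the Vi SDK; it relies only on the `name` property documented for Dataset.

```python
from typing import Any, Iterable


def find_datasets_by_name(datasets: Iterable[Any], needle: str) -> list[Any]:
    """Return datasets whose name contains `needle`, ignoring case."""
    needle = needle.casefold()
    return [d for d in datasets if needle in d.name.casefold()]


# Assumed usage with the client from this page:
# matches = find_datasets_by_name(client.datasets.list().all_items(), "street")
```

Because the helper accepts any iterable, it can consume `all_items()` lazily without first collecting every page into a list.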
get()
Get a specific dataset by ID.
# Basic usage
dataset = client.datasets.get("dataset_abc123")
print(f"Name: {dataset.name}")
print(f"Assets: {dataset.statistic.asset_total}")

# Display detailed information
dataset = client.datasets.get("dataset_abc123")
dataset.info()  # Prints formatted dataset summary

# Access dataset statistics
dataset = client.datasets.get("dataset_abc123")
print(f"Total assets: {dataset.statistic.asset_total}")
print(f"Annotated: {dataset.statistic.asset_annotated}")
print(f"Annotations: {dataset.statistic.annotation_total}")

# Calculate annotation rate
if dataset.statistic.asset_total > 0:
    rate = dataset.statistic.asset_annotated / dataset.statistic.asset_total * 100
    print(f"Annotation rate: {rate:.1f}%")

Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier |
Returns: Dataset
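The three statistic fields can be combined into a small progress summary. This is a minimal sketch that takes plain integers, so it does not depend on the SDK; `annotation_progress` and its result keys are hypothetical names chosen for illustration.

```python
def annotation_progress(
    asset_total: int, asset_annotated: int, annotation_total: int
) -> dict[str, float]:
    """Summarize annotation coverage; empty datasets yield zeros, not errors."""
    rate = asset_annotated / asset_total * 100 if asset_total > 0 else 0.0
    density = annotation_total / asset_annotated if asset_annotated > 0 else 0.0
    return {
        "annotated_pct": rate,
        "unannotated_assets": asset_total - asset_annotated,
        "annotations_per_annotated_asset": density,
    }


# Assumed usage:
# s = client.datasets.get("dataset_abc123").statistic
# print(annotation_progress(s.asset_total, s.asset_annotated, s.annotation_total))
```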
create_export()
Create a new dataset export.
# Basic export with defaults
export = client.datasets.create_export(dataset_id="dataset_abc123")

# Custom export settings
from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat,
    DatasetExportOptions
)

export_settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_FULL,
    options=DatasetExportOptions(
        normalized=True,
        split_ratio=0.8  # 80% training, 20% validation
    )
)
export = client.datasets.create_export(
    dataset_id="dataset_abc123",
    export_settings=export_settings
)
print(f"Export ID: {export.dataset_export_id}")

# Export annotations only (JSONL format)
from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat
)

export_settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_JSONL
)
export = client.datasets.create_export(
    dataset_id="dataset_abc123",
    export_settings=export_settings
)

Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier |
| export_settings | DatasetExportSettings | Export configuration |
Returns: DatasetExport
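To get a feel for what a given `split_ratio` implies, the following sketch estimates the training and validation counts for a dataset of a known size. It assumes the ratio is the training fraction (as in the 0.8 example above), that it lies strictly between 0 and 1, and that counts are rounded to the nearest integer. This page does not state how the SDK actually rounds or assigns assets, so treat the numbers as an estimate. `estimate_split_sizes` is a hypothetical helper.

```python
def estimate_split_sizes(asset_total: int, split_ratio: float) -> tuple[int, int]:
    """Estimate (training, validation) counts for a given split ratio."""
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio is assumed to lie strictly between 0 and 1")
    train = round(asset_total * split_ratio)  # rounding rule is an assumption
    return train, asset_total - train


train, val = estimate_split_sizes(1000, 0.8)
print(f"train={train}, validation={val}")
```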
list_exports()
List all exports for a dataset.

# List all exports
exports = client.datasets.list_exports("dataset_abc123")
for export in exports.items:
    print(f"Export: {export.dataset_export_id}")
    print(f"  Format: {export.spec.format}")

# Find ready exports with download URLs
exports = client.datasets.list_exports("dataset_abc123")
ready_exports = [
    e for e in exports.items
    if e.status.download_url is not None
]
for export in ready_exports:
    print(f"Export {export.dataset_export_id} ready: {export.status.download_url.url}")

Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier |
| pagination | PaginationParams | Pagination settings |
Returns: PaginatedResponse[DatasetExport]
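When several exports are ready, the one whose download URL expires last is usually the one to use. A minimal sketch, assuming only the `status.download_url` and `expires_at` fields documented under the response types; `latest_ready_export` is a hypothetical helper, not an SDK method.

```python
from typing import Any, Iterable, Optional


def latest_ready_export(exports: Iterable[Any]) -> Optional[Any]:
    """Return the ready export whose download URL expires last, or None."""
    ready = [e for e in exports if e.status.download_url is not None]
    return max(ready, key=lambda e: e.status.download_url.expires_at, default=None)


# Assumed usage:
# best = latest_ready_export(client.datasets.list_exports("dataset_abc123").items)
# if best is not None:
#     print(best.status.download_url.url)
```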
get_export()
Get a specific export.
# Check export status
export = client.datasets.get_export(
    dataset_id="dataset_abc123",
    dataset_export_id="export_xyz789"
)
if export.status.download_url:
    print(f"Export ready: {export.status.download_url.url}")
    print(f"Expires at: {export.status.download_url.expires_at}")
else:
    print("Export still processing...")

# Poll until export is ready
import time

def wait_for_export(dataset_id: str, export_id: str, timeout: int = 300) -> str:
    """Wait for export to be ready and return download URL."""
    start = time.time()
    while time.time() - start < timeout:
        export = client.datasets.get_export(dataset_id, export_id)
        if export.status.download_url:
            return export.status.download_url.url
        time.sleep(5)
    raise TimeoutError("Export not ready")

url = wait_for_export("dataset_abc123", "export_xyz789")

Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier |
| dataset_export_id | str | Export identifier |
Returns: DatasetExport
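For large exports, a fixed five-second poll can generate many requests. One alternative is exponential backoff, sketched below. The function takes a zero-argument `fetch` callable so it stays independent of the SDK; `poll_until_ready`, its parameters, and the default delays are all hypothetical choices rather than SDK behavior.

```python
import time
from typing import Any, Callable


def poll_until_ready(
    fetch: Callable[[], Any],
    timeout: float = 300.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call `fetch` until the export has a download URL, then return the URL."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        export = fetch()
        if export.status.download_url:
            return export.status.download_url.url
        if clock() + delay > deadline:
            raise TimeoutError("Export not ready before timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # double the wait, up to max_delay


# Assumed usage:
# url = poll_until_ready(
#     lambda: client.datasets.get_export("dataset_abc123", "export_xyz789")
# )
```

Passing `sleep` and `clock` as parameters makes the loop testable without real waiting.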
download()
Download a dataset with all assets and annotations.
# Basic download
downloaded = client.datasets.download(
    dataset_id="dataset_abc123",
    save_dir="./data"
)
print(downloaded.summary())

# Download with custom export settings
from vi.api.resources.datasets.types import (
    DatasetExportSettings,
    DatasetExportFormat,
    DatasetExportOptions
)

settings = DatasetExportSettings(
    format=DatasetExportFormat.VI_FULL,
    options=DatasetExportOptions(
        normalized=True,
        split_ratio=0.8
    )
)
downloaded = client.datasets.download(
    dataset_id="dataset_abc123",
    export_settings=settings,
    save_dir="./data",
    show_progress=True
)
print(f"Saved to: {downloaded.save_dir}")
print(f"Size: {downloaded.size_mb:.2f} MB")
print(f"Splits: {downloaded.splits}")

# Download annotations only
downloaded = client.datasets.download(
    dataset_id="dataset_abc123",
    save_dir="./annotations",
    annotations_only=True
)

# Using the convenience method on client
downloaded = client.get_dataset(
    dataset_id="dataset_abc123",
    save_path="./data"
)

Annotations-only download convenience method
For annotations-only downloads, you can use the more intuitive client.annotations.download() method:

# Recommended for annotations-only downloads
result = client.annotations.download(
    dataset_id="dataset_abc123",
    save_dir="./annotations"
)

This is a convenience wrapper for client.datasets.download(annotations_only=True) that provides a cleaner API.
Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| dataset_id | str | Dataset identifier | Required |
| dataset_export_id | str | Specific export ID | None |
| export_settings | DatasetExportSettings | Export configuration | None |
| annotations_only | bool | Download only annotations | False |
| save_dir | str \| Path | Save directory | Required |
| overwrite | bool | Overwrite existing | False |
| show_progress | bool | Show progress bars | True |
Returns: DatasetDownloadResult
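After a download it can be useful to check the size on disk independently of `size_mb`. The sketch below sums file sizes under a directory using only the standard library. It divides by 1024 × 1024; whether the SDK's `size_mb` uses that convention or 10^6 bytes is not stated here, so small differences are possible. `directory_size_mb` is a hypothetical helper.

```python
from pathlib import Path
from typing import Union


def directory_size_mb(root: Union[str, Path]) -> float:
    """Total size of all regular files under `root`, in MiB (1024 * 1024 bytes)."""
    total_bytes = sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
    return total_bytes / (1024 * 1024)


# Assumed usage:
# downloaded = client.datasets.download(dataset_id="dataset_abc123", save_dir="./data")
# print(f"{directory_size_mb(downloaded.save_dir):.2f} MB on disk")
```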
delete()
Delete a dataset permanently.
# Delete a dataset
deleted = client.datasets.delete("dataset_abc123")
print(f"Deleted: {deleted.dataset_id}")

# Delete with confirmation
dataset = client.datasets.get("dataset_abc123")
print(f"About to delete: {dataset.name}")
print(f"  Assets: {dataset.statistic.asset_total}")
print(f"  Annotations: {dataset.statistic.annotation_total}")
confirm = input("Delete? (yes/no): ")
if confirm.lower() == "yes":
    client.datasets.delete("dataset_abc123")
    print("Deleted.")

Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier |
Returns: DeletedDataset
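Because deletion is permanent, a stricter confirmation than "yes" may be worthwhile in scripts. The sketch below only deletes when the caller types the exact dataset name. `delete_if_confirmed` is a hypothetical wrapper; the `delete` argument is expected to be a callable such as `client.datasets.delete`.

```python
from typing import Any, Callable


def delete_if_confirmed(
    dataset: Any, typed_name: str, delete: Callable[[str], Any]
) -> bool:
    """Delete the dataset only when `typed_name` matches its name exactly."""
    if typed_name.strip() != dataset.name:
        return False
    delete(dataset.dataset_id)
    return True


# Assumed usage:
# ds = client.datasets.get("dataset_abc123")
# typed = input(f"Type '{ds.name}' to delete it permanently: ")
# if not delete_if_confirmed(ds, typed, client.datasets.delete):
#     print("Name did not match; nothing deleted.")
```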
bulk_delete_assets()
Bulk delete assets from a dataset.
# Delete assets by status
response = client.datasets.bulk_delete_assets(
    dataset_id="dataset_abc123",
    filter_criteria='{"status": "error"}',
    strict_query=True
)

# Delete unannotated assets
response = client.datasets.bulk_delete_assets(
    dataset_id="dataset_abc123",
    filter_criteria='{"metadata.annotations.total": 0}'
)

Parameters:
| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | Dataset identifier |
| filter_criteria | str | Filter query |
| strict_query | bool | Strict query mode |
Returns: BulkAssetDeletionSession
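The examples above pass `filter_criteria` as a hand-written JSON string. Building it with `json.dumps` avoids quoting mistakes. In this sketch, `build_filter_criteria` is a hypothetical helper, and the refusal of an empty filter is a precaution based on the assumption that an empty query could match every asset.

```python
import json
from typing import Any, Mapping


def build_filter_criteria(conditions: Mapping[str, Any]) -> str:
    """Serialize filter conditions to a JSON string for filter_criteria."""
    if not conditions:
        # Precaution (assumption): an empty filter might match all assets.
        raise ValueError("Refusing to build an empty filter")
    return json.dumps(dict(conditions))


print(build_filter_criteria({"status": "error"}))

# Assumed usage:
# client.datasets.bulk_delete_assets(
#     dataset_id="dataset_abc123",
#     filter_criteria=build_filter_criteria({"metadata.annotations.total": 0}),
# )
```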
Response types
Dataset
Main dataset response object.
from vi.api.resources.datasets.responses import Dataset

| Property | Type | Description |
|---|---|---|
| dataset_id | str | Unique identifier |
| name | str | Dataset name |
| owner | str | Owner identifier |
| organization_id | str | Organization ID |
| type | DatasetType | Dataset type (phrase-grounding, vqa) |
| content | DatasetContent | Content type (Image) |
| create_date | int | Creation timestamp (Unix ms) |
| statistic | DatasetStatistic | Dataset statistics |
| users | dict[str, User \| str] | Users with access |
| tags | dict[str, int] | Tag counts |
| status | int | Status code |
| last_accessed | int | Last access timestamp |
| is_locked | bool | Lock status |
| access | DatasetAccess | Access settings |
| asset_statuses | dict[str, AssetStatusDetail] | Asset status definitions |
| self_link | str | API link |
| etag | str | Entity tag |
| description | str \| None | Optional description |
Methods:
| Method | Returns | Description |
|---|---|---|
| info() | None | Display formatted dataset information |
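Since `create_date` is a Unix timestamp in milliseconds, it must be divided by 1000 before conversion with the standard library. A minimal sketch; `ms_to_datetime` is a hypothetical helper. The unit of `last_accessed` is not stated in the table, so check it before applying the same conversion.

```python
from datetime import datetime, timezone


def ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert a Unix timestamp in milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


print(ms_to_datetime(1700000000000).isoformat())  # 2023-11-14T22:13:20+00:00

# Assumed usage:
# dataset = client.datasets.get("dataset_abc123")
# print(f"Created: {ms_to_datetime(dataset.create_date):%Y-%m-%d %H:%M} UTC")
```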
DatasetStatistic
from vi.api.resources.datasets.responses import DatasetStatistic

| Property | Type | Description |
|---|---|---|
| asset_total | int | Total number of assets |
| annotation_total | int | Total number of annotations |
| asset_annotated | int | Number of annotated assets |
| tags_count | dict[str, int] | Tag distribution |
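The `tags_count` mapping can be ranked to show the most frequent tags. A small sketch using only built-ins; `top_tags` is a hypothetical helper, and ties are broken alphabetically so the output is deterministic.

```python
def top_tags(tags_count: dict[str, int], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent tags as (tag, count), ties sorted by name."""
    return sorted(tags_count.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Assumed usage:
# stat = client.datasets.get("dataset_abc123").statistic
# for tag, count in top_tags(stat.tags_count):
#     print(f"{tag}: {count}")
```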
DatasetAccess
from vi.api.resources.datasets.responses import DatasetAccess

| Property | Type | Description |
|---|---|---|
| is_public | bool \| None | Public accessibility |
| is_read_only | bool \| None | Read-only mode |
| is_hidden | bool \| None | Hidden from listings |
AssetStatusDetail
from vi.api.resources.datasets.responses import AssetStatusDetail

| Property | Type | Description |
|---|---|---|
| description | str | Status description |
| color | str | Color code |
| create_date | int \| None | Creation timestamp |
DatasetExport
from vi.api.resources.datasets.responses import DatasetExport

| Property | Type | Description |
|---|---|---|
| organization_id | str | Organization ID |
| dataset_id | str | Dataset ID |
| dataset_export_id | str | Export identifier |
| spec | DatasetExportSpec | Export specification |
| status | DatasetExportStatus | Export status |
| metadata | ResourceMetadata | Metadata |
| self_link | str | API link |
| etag | str | Entity tag |
DatasetExportSpec
from vi.api.resources.datasets.responses import DatasetExportSpec

| Property | Type | Description |
|---|---|---|
| format | DatasetExportFormat | Export format |
| options | DatasetExportOptions | Export options |
DatasetExportStatus
from vi.api.resources.datasets.responses import DatasetExportStatus

| Property | Type | Description |
|---|---|---|
| conditions | list[ResourceCondition] | Status conditions |
| download_url | DatasetExportDownloadUrl \| None | Download URL when ready |
DatasetExportDownloadUrl
from vi.api.resources.datasets.responses import DatasetExportDownloadUrl

| Property | Type | Description |
|---|---|---|
| url | str | Download URL |
| expires_at | int | Expiration timestamp |
DatasetDownloadResult
Result returned after downloading a dataset.
| Property | Type | Description |
|---|---|---|
| save_dir | Path | Save directory |
| size_mb | float | Total size in MB |
| splits | list[str] | Available splits |
| assets_count | int | Number of assets |
| annotations_count | int | Number of annotations |
Methods:
| Method | Returns | Description |
|---|---|---|
| summary() | str | Get summary string |
| info() | None | Print detailed info |
DeletedDataset
from vi.api.resources.datasets.responses import DeletedDataset

| Property | Type | Description |
|---|---|---|
| kind | str | Resource kind |
| user | str | User who deleted |
| organization_id | str | Organization ID |
| dataset_id | str | Deleted dataset ID |
| self_link | str | API link |
| etag | str | Entity tag |
| metadata | ResourceMetadata | Metadata |
| status | DeletedDatasetStatus | Deletion status |
DeletedDatasetStatus
from vi.api.resources.datasets.responses import DeletedDatasetStatus

| Property | Type | Description |
|---|---|---|
| conditions | list[ResourceCondition] | Deletion conditions |
BulkAssetDeletionSession
from vi.api.resources.datasets.responses import BulkAssetDeletionSession

| Property | Type | Description |
|---|---|---|
| kind | str | Resource kind |
| organization_id | str | Organization ID |
| dataset_id | str | Dataset ID |
| delete_many_assets_session_id | str | Session identifier |
| self_link | str | API link |
| etag | str | Entity tag |
| metadata | ResourceMetadata | Metadata |
| spec | BulkAssetDeletionSpec | Deletion spec |
| status | BulkAssetDeletionStatus | Deletion status |
BulkAssetDeletionSpec
from vi.api.resources.datasets.types import BulkAssetDeletionSpec

| Property | Type | Description |
|---|---|---|
| filter | str \| dict \| None | Filter criteria |
| metadata_query | str \| None | Metadata query |
| rule_query | str \| None | Rule query |
| strict_query | bool \| None | Strict query mode |
BulkAssetDeletionStatus
from vi.api.resources.datasets.responses import BulkAssetDeletionStatus

| Property | Type | Description |
|---|---|---|
| conditions | list[ResourceCondition] | Deletion conditions |
Request types
DatasetExportSettings
from vi.api.resources.datasets.types import DatasetExportSettings

| Property | Type | Description | Default |
|---|---|---|---|
| format | DatasetExportFormat | Export format | VI_FULL |
| options | DatasetExportOptions | Export options | DatasetExportOptions() |
DatasetExportOptions
from vi.api.resources.datasets.types import DatasetExportOptions

| Property | Type | Description | Default |
|---|---|---|---|
| normalized | bool | Normalize coordinates | True |
| split_ratio | float \| None | Train/validation split ratio | None |
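When `normalized` is True, exported coordinates presumably need scaling back to pixels before drawing or cropping. The sketch below assumes that normalized coordinates lie in the range 0 to 1 relative to image width and height, and that a box is given as (x_min, y_min, x_max, y_max). This page does not document the export schema, so both points are assumptions to verify against a real export. `denormalize_box` is a hypothetical helper.

```python
def denormalize_box(
    box: tuple[float, float, float, float], width: int, height: int
) -> tuple[float, float, float, float]:
    """Scale an assumed (x_min, y_min, x_max, y_max) box from [0, 1] to pixels."""
    x_min, y_min, x_max, y_max = box
    return (x_min * width, y_min * height, x_max * width, y_max * height)


# A box covering the lower-middle region of a 640x480 image
print(denormalize_box((0.25, 0.5, 0.75, 1.0), width=640, height=480))
```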
Enums
DatasetType
from vi.api.resources.datasets.types import DatasetType

| Value | Description |
|---|---|
| PHRASE_GROUNDING | Phrase grounding dataset |
| VQA | Visual question answering dataset |
DatasetContent
from vi.api.resources.datasets.types import DatasetContent

| Value | Description |
|---|---|
| IMAGE | Image content |
DatasetExportFormat
from vi.api.resources.datasets.types import DatasetExportFormat

| Value | Description |
|---|---|
| VI_FULL | Full dataset with assets |
| VI_JSONL | JSONL format annotations |
Related resources
- Create a dataset — Manual guide to creating datasets
- Manage datasets — Rename, delete, and organize datasets
- Download data — Export datasets and annotations
- Upload data — Upload assets and annotations
- Vi SDK getting started — Quick start guide for the SDK
- Assets API — Manage assets within datasets
- Annotations API — Manage annotations programmatically
- Vi SDK installation — Install the Vi SDK
- API resources — Complete SDK reference
- View dataset insights — Analyze dataset statistics
- Secret keys — Manage API authentication
