Sync From External Buckets

Connect Datature Vi to your existing object storage and sync image and video assets without uploading them to our servers. Read-only access keeps your source data in place.

If your training data already lives in a cloud storage bucket, you do not need to copy it into Datature Vi. Connect the bucket once and Vi reads the asset metadata in place. Your image and video files stay in your storage account, and the platform tracks references to them so you can annotate, train, and evaluate the same dataset you keep in production.

Paid tiers only

External bucket connections are available on paid account tiers. Check your plan in Billing before you begin setup.

By the end of this guide

Connect a cloud storage bucket to a Datature Vi dataset and sync assets without copying files off your infrastructure.

How bucket sync works

Vi uses a read-only metadata sync. The platform fetches filenames, dimensions, and EXIF data, then renders thumbnails and previews from your bucket on demand. The actual image and video bytes stay where they already live.

A few practical consequences of this design:

  • Bucket changes flow one way. New objects appear in Vi after the next sync. Annotations and labels you create in Vi never write back to your bucket.
  • Deletions in Vi remove the reference, not the file. The original object stays in your bucket until you remove it there.
  • Synced assets count toward your monthly data row quota the same way uploaded assets do.
  • Connecting multiple buckets to the same dataset merges them. Objects with identical filenames overwrite each other in the dataset view.
  • Only image and video assets sync from buckets. Annotations are not pulled from your storage; upload them directly to Vi after the sync finishes.
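Because identically named objects overwrite each other when buckets are merged, it can help to look for clashes before connecting a second bucket. The sketch below is one possible pre-flight check, not part of Vi. It assumes you have exported one text file of object keys per bucket (one key per line) and that "filename" means the final path segment of the key. The function name `find_name_collisions` is hypothetical.

```shell
# find_name_collisions LISTING...
# Each LISTING is a text file with one object key per line (one file per bucket).
# Prints every filename (final path segment) that appears in more than one listing.
# Assumption: names are compared by final path segment only; remove the first
# sed expression if full keys should be compared instead.
find_name_collisions() {
  for listing in "$@"; do
    # Strip the folder path, drop empty names left by folder placeholder
    # keys, and dedupe within this bucket so only cross-bucket repeats remain.
    sed -e 's#.*/##' -e '/^$/d' "$listing" | sort -u
  done | sort | uniq -d
}
```

For S3, a listing could be produced with something like `aws s3api list-objects-v2 --bucket my-bucket --query 'Contents[].Key' --output text | tr '\t' '\n'`, where `my-bucket` is a placeholder. Repeats inside a single bucket are deliberately ignored here; whether those also clash in Vi is not covered by this guide.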

Choose your storage provider

What works the same across providers

Every connection follows the same four-step wizard inside the Dataset tab under Connect to External Buckets:

1. Bucket Details

Enter the connection name, bucket or container name, and an optional folder prefix to scope the sync to a subset of your data.

2. Access Credentials or Policy

Apply the IAM policy, role assignment, or access keys that Vi generates for you. Each provider has a different mechanism, but the goal is the same: grant read-only access to the bucket.

3. Connection Status

Vi tests the connection. A green status means Vi can list and read objects. A red status means a permission, region, or endpoint setting is wrong.

4. Sync Assets

Run an initial sync now or schedule it for later. Sync takes 5 to 40 minutes depending on object count.
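Since the folder prefix decides how many objects a sync touches, and synced assets use data rows, a rough count before step 4 can be useful. The sketch below is illustrative only. It assumes the prefix is matched as a plain string prefix of the object key, as S3-style prefixes are, and that keys ending in "/" are empty folder placeholders. The function name `count_under_prefix` is hypothetical, and the result is an upper bound because it does not filter out non-media files.

```shell
# count_under_prefix PREFIX
# Reads object keys on stdin (one per line) and prints how many start with PREFIX.
# An empty PREFIX counts every key. Keys ending in "/" are skipped on the
# assumption that they are zero-byte folder placeholders.
count_under_prefix() {
  awk -v prefix="$1" '
    (prefix == "" || index($0, prefix) == 1) && $0 !~ /\/$/ { n++ }
    END { print n + 0 }
  '
}
```

Piping a bucket listing into `count_under_prefix 'train/'` gives a number to compare against the remaining quota shown in Billing.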

Asset requirements

Vi reads metadata directly from your bucket, so files have to meet the same format requirements as direct uploads.

Asset format requirements for bucket sync

  • Images: No EXIF orientation tag, or an orientation value of 1 (upright). Other orientations cause display issues because Vi does not rotate pixels at sync time.
  • MP4 videos: Major brand mp42 and pixel format yuv420p. MinIO and S3-compatible services also accept isom, iso2, and mp41 major brands.
  • Other formats: See the supported formats list in Upload Images and Upload Videos. The same image and video formats supported for direct upload also work for bucket sync.
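To find images that would fail the orientation requirement, a local scan along these lines may help. This is a sketch that assumes exiftool is installed and that the images are available in a local directory. The function names `orientation_ok` and `list_rotated_images` are hypothetical, and only JPEG extensions are scanned here.

```shell
# orientation_ok VALUE
# Succeeds when VALUE is empty (no orientation tag) or 1 (upright).
orientation_ok() {
  [ -z "$1" ] || [ "$1" = "1" ]
}

# list_rotated_images DIR
# Assumes exiftool is installed. Prints JPEG files under DIR whose EXIF
# orientation is present and not equal to 1.
list_rotated_images() {
  find "$1" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \) | while IFS= read -r file; do
    # -n prints the numeric tag value, -s3 prints the value without the tag name
    value=$(exiftool -s3 -n -Orientation "$file" 2>/dev/null)
    orientation_ok "$value" || printf '%s\n' "$file"
  done
}
```

Running `list_rotated_images ./images` prints one path per offending file; an empty result suggests every scanned image is upright or untagged.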

If a video fails to sync, run ffprobe against the file locally to confirm the major brand and pixel format match the requirements above.

Check video major brand and pixel format
ffprobe -v error \
  -select_streams v:0 \
  -show_entries format_tags=major_brand \
  -show_entries stream=pix_fmt \
  -of default=noprint_wrappers=1 \
  your-video.mp4
Expected output for a compliant MP4
pix_fmt=yuv420p
TAG:major_brand=mp42
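When many videos need checking, the comparison can be scripted. The sketch below reads ffprobe's key=value output on standard input and reports whether it meets the requirements listed above. The function name `mp4_is_compliant` is hypothetical. It accepts the brand key with or without a `TAG:` prefix, and the optional argument lets you pass the wider brand list for MinIO and S3-compatible services.

```shell
# mp4_is_compliant [ALLOWED_BRANDS]
# Reads ffprobe key=value output on stdin. Succeeds when pix_fmt is yuv420p
# and major_brand is one of the space-separated ALLOWED_BRANDS (default: mp42).
mp4_is_compliant() {
  allowed=${1:-mp42}
  report=$(cat)
  # Accept both "major_brand=..." and "TAG:major_brand=..." spellings.
  brand=$(printf '%s\n' "$report" \
    | sed -n 's/^\(TAG:\)\{0,1\}major_brand=//p' | head -n 1 | tr -d '[:space:]')
  pix_fmt=$(printf '%s\n' "$report" \
    | sed -n 's/^pix_fmt=//p' | head -n 1 | tr -d '[:space:]')
  [ "$pix_fmt" = "yuv420p" ] || return 1
  for candidate in $allowed; do
    [ "$brand" = "$candidate" ] && return 0
  done
  return 1
}
```

Pipe the ffprobe command into it, for example `ffprobe ... your-video.mp4 | mp4_is_compliant "mp42 isom iso2 mp41"`, and act on the exit status.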

If either major_brand or pix_fmt differs from these values, re-encode the file with FFmpeg before syncing:

Re-encode to mp42 / yuv420p
ffmpeg -i your-video.mp4 \
  -c:v libx264 -pix_fmt yuv420p \
  -brand mp42 \
  -movflags +faststart \
  your-video-fixed.mp4

CORS allowlist

Image and video previews load directly from your bucket in the browser, so the bucket has to allow cross-origin requests from the Vi web app. Add the Vi origin to your CORS configuration:

  • https://vi.datature.com

Each provider page has the exact CORS payload to apply.
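After applying a CORS configuration, one way to confirm it took effect is to request an object with the Vi origin and inspect the response headers. The sketch below only parses headers; it is not a Vi tool. The function name `allows_vi_origin` and the `OBJECT_URL` variable are hypothetical, and the URL is assumed to be one your credentials can read, such as a presigned link.

```shell
# allows_vi_origin
# Reads HTTP response headers on stdin. Succeeds when an
# Access-Control-Allow-Origin header permits https://vi.datature.com,
# either as that exact origin or as the "*" wildcard.
allows_vi_origin() {
  tr -d '\r' \
    | grep -Eiq '^access-control-allow-origin:[[:space:]]*(\*|https://vi\.datature\.com)[[:space:]]*$'
}

# Example use (OBJECT_URL is a placeholder for a readable object URL):
#   curl -s -o /dev/null -D - -H "Origin: https://vi.datature.com" "$OBJECT_URL" \
#     | allows_vi_origin && echo "CORS allows Vi"
```

A failing check usually means the origin is missing from the bucket's CORS rules or the rules have not propagated yet.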

Common questions

Does Vi copy my files to its servers?

No. Vi reads file metadata such as filenames, dimensions, and EXIF data, then streams previews from your bucket on demand. The actual bytes stay in your storage account.

Do synced assets count toward my data row quota?

Yes. A synced asset uses one data row from your monthly quota, the same way a directly uploaded asset does. Check your remaining quota in Billing before a large sync.

What happens when I delete an object from my bucket?

The object disappears from the dataset on the next sync. Annotations attached to a deleted asset are also removed. To restore the asset, put the object back in the bucket and run another sync.

Can I connect more than one bucket to the same dataset?

Yes. Add multiple connections to the same dataset and Vi syncs all of them. Objects with identical filenames across buckets overwrite each other, so use folder prefixes or rename files if you need to keep both.

Why did my sync stop before all assets were imported?

The most common cause is hitting your monthly data row quota mid-sync. Check usage in Billing. The next most common cause is an asset that fails the format requirements above (rotated images, non-mp42 videos).

Can Vi sync annotation files from my bucket?

No. Bucket sync only pulls image and video metadata. Annotation files (COCO, YOLO, Pascal VOC, CSV, Vi JSONL) have to be uploaded directly to Vi using the annotation importer. Upload them after the asset sync finishes so the labels can match to the synced filenames.

Next steps

Connect AWS S3

Set up an S3 bucket connection using IAM roles.

Annotate Data

Label your synced images in the visual annotator.

Train A Model

Fine-tune a vision-language model on your synced dataset.