Sync From Google Cloud Storage

Connect a Datature Vi dataset to a Google Cloud Storage bucket using a service account and an IAM policy binding. The sync is read-only, and your objects stay in your GCP project.

Datature Vi reads images and videos from a Google Cloud Storage bucket using a service account that you authorise inside your GCP project. The service account has the Storage Object Viewer role on the bucket, which lets Vi list and read objects but nothing else. This guide walks through the bucket connection, the IAM binding, and the CORS rules.

Before You Start
  • A paid Datature Vi account. External bucket sync is not available on the free tier.
  • A GCP project with billing enabled and the bucket already created.
  • gcloud installed locally or access to Cloud Shell, signed in to your project.
  • Permission to add IAM bindings on the bucket.

Open the Explorer tab

In the left sidebar, click the Explorer tab on your dataset. This is where the synced assets will appear after the connection is set up.

You should see
Synced images appear in the dataset Explorer. The asset count in the header reflects the GCS objects that passed the format checks.

Step 1: Enter the bucket details

Open your dataset, then walk through the wizard.

The Bucket Details tab asks for three fields:

Bucket details

  • Connection Name (string, required). A label you choose for this connection. Used to identify the connection in the Connection Manager.
  • GCS Bucket Name (string, required). The exact name of your GCS bucket. It must follow the GCS naming rules: lowercase letters, numbers, hyphens, underscores, and dots only.
  • Folder Prefix (string, optional). A path prefix to scope the sync. Leave empty to sync the whole bucket. Useful for buckets that hold non-training data alongside your dataset.
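
A quick local check can catch a mistyped bucket name before you paste it into the wizard. The sketch below is only an approximation: the function name is_plausible_bucket_name is hypothetical, and the pattern covers a basic subset of the GCS rules (3 to 63 characters; lowercase letters, digits, hyphens, underscores, and dots; starting and ending with a letter or digit). It does not cover reserved prefixes or the longer limit that applies to names containing dots.

```shell
# Hypothetical helper: returns 0 when the name passes a basic subset of the
# GCS bucket naming rules, non-zero otherwise. Not a full validator.
is_plausible_bucket_name() {
  # LC_ALL=C keeps [a-z] from matching uppercase letters in some locales.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

# Example usage with an illustrative name:
if is_plausible_bucket_name "my-training-data"; then
  echo "name looks valid"
else
  echo "name breaks a basic naming rule"
fi
```

A name that passes this check can still be rejected by Vi or GCS, so treat a pass as a sanity check rather than a guarantee.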

Click Next. Vi shows you a service account email address for the next step.

Step 2: Grant Storage Object Viewer to the Vi service account

Run this command in Cloud Shell or a local terminal where gcloud is signed in. Replace {bucket name} with the bucket from Step 1, and use the service account email that Vi showed you as the --member value.

Add the Datature service account to the bucket IAM policy
gcloud storage buckets add-iam-policy-binding gs://{bucket name} \
  --member=serviceAccount:{service account email} \
  --role=roles/storage.objectViewer

The command grants the Vi service account read access to the bucket. The roles/storage.objectViewer role lets Vi list objects and read their content, but does not allow any write or delete actions.

Use the service account email Vi shows you

The service account email in the command above is a placeholder. The wizard shows you the exact service account email for your account; copy it from the UI before running the command.
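
To confirm that the binding landed on the bucket, you can read the bucket policy back and look for the Vi service account. The following is a minimal sketch, not an official Datature procedure: has_viewer_binding is a hypothetical helper name, the email in the usage comment is made up, and the --flatten/--filter/--format combination is the pattern commonly used with gcloud IAM policy output, assumed here to behave the same way for bucket policies.

```shell
# Hypothetical helper: succeeds when the service account ($2) holds
# roles/storage.objectViewer directly on the bucket ($1).
has_viewer_binding() {
  gcloud storage buckets get-iam-policy "gs://$1" \
    --flatten="bindings[].members" \
    --filter="bindings.role:roles/storage.objectViewer" \
    --format="value(bindings.members)" |
    grep -Fxq "serviceAccount:$2"
}

# Example usage (both arguments are illustrative; substitute your own):
# has_viewer_binding "my-bucket" "vi-sync@example-project.iam.gserviceaccount.com" \
#   && echo "binding found" || echo "binding missing"
```

Because the check reads the bucket-level policy, a role granted only on the parent project will not show up here.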

Step 3: Configure CORS on the bucket

Vi loads thumbnails and previews from the bucket in the browser, so the bucket has to allow cross-origin GETs from the Vi web app.

New buckets

Create a file called CORS_CONFIG_FILE with this content:

[
  {
    "maxAgeSeconds": 3600,
    "method": ["GET"],
    "origin": [
      "https://vi.datature.com"
    ],
    "responseHeader": [
      "Content-Type",
      "Access-Control-Allow-Origin"
    ]
  }
]

Apply the configuration:

Apply CORS to the bucket
gcloud storage buckets update gs://{bucket name} --cors-file=CORS_CONFIG_FILE

Existing buckets

If the bucket already has CORS rules, fetch the current configuration first, merge the https://vi.datature.com origin into the existing rule, then re-apply the merged file with the same gcloud storage buckets update command. Applying a file that contains only the Vi rule replaces the whole configuration and wipes any rules that other applications rely on.
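
Before merging, it helps to save the current rules to a local file so that you can edit a copy and restore the original if needed. The sketch below makes some assumptions: save_current_cors is a hypothetical helper name, the output file name is arbitrary, and the rules are assumed to be exposed under the cors_config key of the describe output. The merge itself is left as a manual edit.

```shell
# Hypothetical helper: save the bucket's current CORS rules to a local file
# so they can be merged by hand and restored if something goes wrong.
# $1 = bucket name, $2 = output file.
save_current_cors() {
  gcloud storage buckets describe "gs://$1" --format="json(cors_config)" > "$2"
}

# Example usage (bucket and file names are illustrative):
# save_current_cors "my-bucket" cors-backup.json
# Then copy the rule list from the saved file into CORS_CONFIG_FILE, add
# https://vi.datature.com to the "origin" array of the relevant rule, and
# re-apply it with:
# gcloud storage buckets update gs://my-bucket --cors-file=CORS_CONFIG_FILE
```

The saved file wraps the rules in an object, so only the list inside it belongs in the file you pass to --cors-file.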

Step 4: Sync your assets

Back in the Vi wizard, click Next. Vi tests the connection, then offers Sync Now or Sync Later. If you pick Sync Now, Vi walks through three more screens before the sync starts in earnest.

  1. Preview Files to Sync. Vi scans the bucket prefix and shows the file count alongside a sample of object paths. Confirm the preview matches what you expect, then click Sync.
  2. Sync Started. A confirmation appears letting you know the job is running in the background. Click I Understand to dismiss the dialog; the sync continues even after you close the wizard or the browser tab.
  3. Track progress. Open the Connected Bucket dropdown in the top-right of the Explorer to see the connection name, status, provider, bucket, prefix, asset count, and a live progress bar while assets are retrieved.
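
To sanity-check the preview count, you can count the objects under the same prefix yourself. This is a rough sketch: count_objects is a hypothetical helper name, and the number it reports includes every object under the prefix, so it can be higher than the count Vi shows if some files are in unsupported formats.

```shell
# Hypothetical helper: count objects under a bucket prefix.
# $1 = bucket name, $2 = folder prefix (may be empty for the whole bucket).
count_objects() {
  # The ** wildcard matches objects at any depth below the prefix.
  gcloud storage ls "gs://$1/$2**" | grep -c '^gs://'
}

# Example usage (bucket and prefix are illustrative):
# count_objects "my-bucket" "training/images/"
```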

The first sync takes 5 to 40 minutes depending on the bucket size.

Asset requirements

Objects in GCS must meet the same format rules as direct uploads.

GCS asset requirements

  • Images. No EXIF orientation tag, or an orientation value of 1.
  • MP4 videos. Major brand mp42 and pixel format yuv420p. Run ffprobe your-video.mp4 to verify.
  • Other formats. See Upload Images and Upload Videos for the full supported list.
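
The image and MP4 requirements can be checked locally before you upload to the bucket. The helpers below are a sketch under assumptions: the function names are hypothetical, exiftool and ffprobe must be installed, and only the first video stream is inspected. They mirror the requirements listed above rather than Vi's actual validation logic, which is not documented here.

```shell
# Hypothetical helpers. Each returns 0 when the file meets the requirement.

# Image: the EXIF orientation tag is absent or equal to 1.
image_orientation_ok() {
  # -n prints the numeric value, -s3 prints the value only (empty if absent).
  orientation=$(exiftool -n -s3 -Orientation "$1")
  [ -z "$orientation" ] || [ "$orientation" = "1" ]
}

# MP4: major brand is mp42 and the first video stream uses yuv420p.
mp4_format_ok() {
  brand=$(ffprobe -v error -show_entries format_tags=major_brand \
    -of default=noprint_wrappers=1:nokey=1 "$1")
  pix_fmt=$(ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt \
    -of default=noprint_wrappers=1:nokey=1 "$1")
  [ "$brand" = "mp42" ] && [ "$pix_fmt" = "yuv420p" ]
}

# Example usage (file names are illustrative):
# image_orientation_ok your-image.jpg || echo "fix orientation first"
# mp4_format_ok your-video.mp4 || echo "re-encode before uploading"
```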

Annotations are not part of the bucket sync. Vi reads only image and video metadata from GCS. If you have existing labels in COCO, YOLO, Pascal VOC, CSV, or Vi JSONL, upload them directly to Vi once the assets finish syncing.

Troubleshooting

If Vi cannot access the bucket, confirm two things. First, the --member flag uses the service account email shown in the Vi wizard, not the example email. Second, the binding is on the bucket itself, not on the parent project. The wrong scope causes a silent permission failure.

If a sync stops before every file is imported, the most common cause is hitting your monthly data row quota mid-sync. Check usage in Billing. Files that fail the format requirements above are also skipped during sync.

If thumbnails or previews fail to load in the browser, re-check the CORS configuration. The most common mistake is forgetting Content-Type in the responseHeader array, or replacing an existing CORS file rather than merging. Run gcloud storage buckets describe gs://{bucket name} --format="default(cors_config)" to inspect the current rules.

An image that fails the orientation requirement has an EXIF orientation tag other than 1. You have two options.

Option 1: Bake the orientation into the pixels with ImageMagick. This rotates the image data and resets the orientation tag to 1.

Auto-orient with ImageMagick
mogrify -auto-orient your-image.jpg

Option 2: Strip the orientation tag with exiftool. Use this when the pixels are already correct and only the tag is wrong.

Remove the EXIF orientation tag
exiftool -Orientation= -overwrite_original your-image.jpg

To process a whole folder, point either tool at the directory:

Batch fix every image in a folder
mogrify -auto-orient ./images/*.jpg
exiftool -Orientation= -overwrite_original -r ./images

Re-upload the fixed files to the bucket and run the sync again.
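
One way to push the fixed files back is gcloud storage rsync, which copies only files that differ from what is already in the bucket. This is a sketch, not the only option: upload_fixed_images is a hypothetical helper name, and the folder, bucket, and prefix in the usage comment are placeholders.

```shell
# Hypothetical helper: mirror a local folder of fixed images to the bucket.
# $1 = local folder, $2 = bucket name, $3 = folder prefix (may be empty).
upload_fixed_images() {
  gcloud storage rsync --recursive "$1" "gs://$2/$3"
}

# Example usage (all three arguments are illustrative):
# upload_fixed_images ./images "my-bucket" "training/images"
```

As written, the helper does not delete anything from the bucket; objects that exist only in GCS are left in place.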

If new or changed objects in the bucket do not appear in Vi, re-run the sync from the Connection Manager. GCS changes do not propagate automatically; Vi re-reads the metadata only when you start a new sync.

Next steps

Annotate Data

Label the synced images and videos in the visual annotator.

Sync From External Buckets

Compare GCS sync with the other supported storage providers.

Train A Model

Fine-tune a vision-language model on the synced dataset.