Sync From Azure Blob Storage

Connect a Datature Vi dataset to an Azure Blob Storage container using a service principal and a scoped role assignment. The sync is read-only, and your blobs stay in your own Storage Account.

Datature Vi reads images and videos from an Azure Blob Storage container using a service principal that you authorise inside your Azure tenant. The service principal gets the Storage Blob Data Reader role on a single container, scoped through a condition so it cannot read anything else in the Storage Account. This guide walks through the full setup.

Before You Start
  • A paid Datature Vi account. External bucket sync is not available on the free tier.
  • An active Azure subscription with Blob Storage configured.
  • Azure CLI installed locally and signed in (az login).
  • A Storage Account with at least one container and the assets you want to sync.
  • Permission to create service principals and assign IAM roles in your subscription.
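Before opening the wizard, you can sanity-check the CLI prerequisites from any shell. This is an optional local check, not part of the Vi setup:

```shell
# Confirm the Azure CLI is installed and which identity and subscription
# it is signed in with. If this fails, run: az login
az --version
az account show --query "{subscription: name, user: user.name}" -o table
```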

Open the Explorer tab

In the left sidebar, click the Explorer tab on your dataset. This is where the synced assets will appear after the connection is set up.

You should see
Synced images appear in the dataset Explorer. The asset count in the header reflects the blobs that passed the format checks.

Step 1: Enter the blob details

Open your dataset, then walk through the wizard.

The Blob Details tab asks for four fields:

Blob details

| Name | Type | Description | Required |
| --- | --- | --- | --- |
| Connection Name | string | A label you choose for this connection. Used to identify the connection in the Connection Manager. | Required |
| Storage Account Name | string | The name of the Azure Storage Account that holds your container. Three to 24 lowercase letters and numbers, no special characters. | Required |
| Container Name | string | The container inside the Storage Account that holds your assets. | Required |
| Folder Prefix | string | A path prefix to scope the sync to a subfolder. Leave empty to sync the whole container. | Optional |
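The Storage Account name rule above can be checked locally before you submit the form. This bash sketch mirrors Azure's constraint of three to 24 lowercase letters and digits; "mystorageacct" is a placeholder for your own account name:

```shell
# Placeholder; replace with your Storage Account name.
name="mystorageacct"

# Azure Storage Account names: 3-24 characters, lowercase letters and digits only.
if [[ "$name" =~ ^[a-z0-9]{3,24}$ ]]; then
  echo "valid"
else
  echo "invalid"
fi
# → valid
```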

Click Next. Vi generates a unique service principal ID and a role assignment identifier for the next step.

Step 2: Create the service principal

Vi shows you a unique application ID. Run this command in a shell where the Azure CLI is signed in, replacing the placeholder with the value Vi gave you:

Create the Datature service principal
az ad sp create --id <UNIQUE_ID_GENERATED_BY_VI>

This registers a service principal in your tenant for the Vi application. The command finishes with a JSON object describing the principal. You do not need to copy any of that output; the next step uses the role assignment identifier from the Vi wizard, not from this command.
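If you want to confirm the registration succeeded, you can look the principal up by the same ID. The placeholder below is the same value the Vi wizard gave you:

```shell
# Look up the newly registered service principal by its application ID.
az ad sp show --id <UNIQUE_ID_GENERATED_BY_VI> --query displayName -o tsv
```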

Step 3: Assign the Storage Blob Data Reader role

Grant the service principal read-only access to the container. The condition below restricts the role to your specific container, so the principal cannot list or read blobs anywhere else in the Storage Account.

1

Open Access Control on the Storage Account

In the Azure portal, open the Storage Account, then click Access Control (IAM).

Assign the role on the Storage Account, not on the container or an individual blob. The condition scopes access to a single container; assigning the role at the container level breaks the listing call.

2

Add a role assignment

Click Add, then Add role assignment. Search for and select Storage Blob Data Reader.

3

Set the member

Switch to the Members tab. Paste the IAM Role Assignment identifier that the Vi wizard generated.

4

Add the container condition

Switch to the Conditions tab and add a condition. Paste the expression below, replacing <YOUR_CONTAINER_NAME> with the container you entered in Step 1.

(
  (
    !(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})
  )
  OR
  (
    @Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals '<YOUR_CONTAINER_NAME>'
  )
)
5

Review and assign

Click Review + assign twice to apply the role.

Assignment scope matters

The role must be assigned at the Storage Account level with the container condition above. If you assign it directly on the container or on an individual blob, the listing API call fails and Vi reports a connection error.
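If you prefer the Azure CLI over the portal, the same assignment can be sketched as follows. The subscription, resource group, account, and principal values are placeholders you must replace; the condition string is the same expression as in Step 3:

```shell
# Sketch: assign Storage Blob Data Reader at the Storage Account scope,
# with an ABAC condition limiting blob reads to one container.
# <PRINCIPAL_ID>, <SUBSCRIPTION_ID>, <RESOURCE_GROUP>, <ACCOUNT_NAME>,
# and <YOUR_CONTAINER_NAME> are placeholders.
az role assignment create \
  --assignee "<PRINCIPAL_ID>" \
  --role "Storage Blob Data Reader" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>/providers/Microsoft.Storage/storageAccounts/<ACCOUNT_NAME>" \
  --condition "((!(ActionMatches{'Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read'})) OR (@Resource[Microsoft.Storage/storageAccounts/blobServices/containers:name] StringEquals '<YOUR_CONTAINER_NAME>'))" \
  --condition-version "2.0"
```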

Step 4: Configure CORS (videos only)

You only need this step if your container holds video assets. Image previews work without CORS.

1

Open Resource sharing (CORS)

In the Storage Account, click Settings, then Resource sharing (CORS).

2

Add the Vi origin

Add a CORS rule under the Blob service tab using GET as the allowed method.

Azure CORS entry

| Allowed origins | Allowed methods |
| --- | --- |
| https://vi.datature.com | GET |
3

Save

Click Save. The change takes effect immediately.
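The same rule can be added from the Azure CLI. The account name below is a placeholder, and the allowed headers and max age shown are illustrative defaults; depending on your login, you may also need to pass an account key or connection string:

```shell
# Add a Blob service CORS rule allowing GET requests from the Vi origin.
# your-account is a placeholder for your Storage Account name.
az storage cors add \
  --services b \
  --methods GET \
  --origins https://vi.datature.com \
  --allowed-headers "*" \
  --max-age 3600 \
  --account-name your-account
```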

Step 5: Confirm the connection

Back in the Vi wizard, click Next to test the connection. Azure can take up to five minutes to propagate the role assignment, so a connection failure right after Step 3 does not always mean something is wrong.

If the test fails, wait a few minutes and click Retry. If the failure persists, jump to the troubleshooting section.

Step 6: Sync your assets

Choose Sync Now to start the first sync immediately, or Sync Later to set up the connection without syncing. If you pick Sync Now, Vi walks through three more screens before the sync starts in earnest.

  1. Preview Files to Sync. Vi scans the container prefix and shows the file count alongside a sample of blob paths. Confirm the preview matches what you expect, then click Sync.
  2. Sync Started. A confirmation appears letting you know the job is running in the background. Click I Understand to dismiss the dialog; the sync continues even after you close the wizard or the browser tab.
  3. Track progress. Open the Connected Bucket dropdown in the top-right of the Explorer to see the connection name, status, provider, container, prefix, asset count, and a live progress bar while blobs are retrieved.

The first sync takes 5 to 40 minutes depending on the container size.

Asset requirements

Blobs in the container must meet the same format rules as direct uploads.

Azure asset requirements

| Asset type | Requirement |
| --- | --- |
| Images | No EXIF orientation tag, or an orientation value of 1. |
| MP4 videos | Major brand mp42 and pixel format yuv420p. Run ffprobe your-video.mp4 to verify. |
| Other formats | See Upload Images and Upload Videos for the full supported list. |
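To inspect both video properties at once, ffprobe can print the major brand and the pixel format directly; your-video.mp4 is a placeholder. The re-encode shown after it is one possible fix, assuming ffmpeg with libx264 is installed (the MP4 muxer's -brand option sets the major brand):

```shell
# Print the container major brand and the pixel format of the first video stream.
ffprobe -v error \
  -select_streams v:0 \
  -show_entries format_tags=major_brand:stream=pix_fmt \
  -of default=noprint_wrappers=1 \
  your-video.mp4

# If either value is wrong, a re-encode along these lines can fix both.
ffmpeg -i your-video.mp4 -c:v libx264 -pix_fmt yuv420p -c:a copy -brand mp42 fixed-video.mp4
```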

Annotations are not part of the bucket sync. Vi reads only image and video metadata from the Azure container. If you have existing labels in COCO, YOLO, Pascal VOC, CSV, or Vi JSONL, upload them directly to Vi once the assets finish syncing.

Troubleshooting

The connection test fails right after role assignment

Azure changes can take up to five minutes to propagate. Wait a few minutes and click Retry in the wizard.

The most common cause is assigning the role at the wrong scope. Confirm the role is on the Storage Account (not on the container or a blob) and that the condition references the correct container name. Re-paste the JSON if needed.

Some blobs are missing after the sync

Check your remaining data row quota in Billing. Synced blobs count toward the same monthly quota as direct uploads. Files that fail the format requirements (rotated images, non-mp42 videos) are skipped during sync.

Videos do not preview or stream

Confirm the CORS rules above are present on the Storage Account. Image previews work without CORS, but video streaming requires the cross-origin headers.

Verify the current CORS configuration with the Azure CLI:

List CORS rules on the Storage Account
az storage cors list \
  --account-name your-account \
  --services b

The output should include https://vi.datature.com under AllowedOrigins. If the rule is missing, re-add it from Step 4.

Rotated images are skipped

The image has an EXIF orientation tag other than 1. You have two options.
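To see the current tag value before fixing anything, exiftool can print it as a number; your-image.jpg is a placeholder:

```shell
# Print the numeric EXIF orientation value (1 means no rotation is applied).
exiftool -Orientation -n your-image.jpg
```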

Option 1: Bake the orientation into the pixels with ImageMagick. This rotates the image data and resets the orientation tag to 1.

Auto-orient with ImageMagick
mogrify -auto-orient your-image.jpg

Option 2: Strip the orientation tag with exiftool. Use this when the pixels are already correct and only the tag is wrong.

Remove the EXIF orientation tag
exiftool -Orientation= -overwrite_original your-image.jpg

To process a whole folder, point either tool at the directory:

Batch fix every image in a folder
mogrify -auto-orient ./images/*.jpg
exiftool -Orientation= -overwrite_original -r ./images

Re-upload the fixed files to the container and run the sync again.

New blobs do not appear in the dataset

Re-run the sync from the Connection Manager. Container changes do not propagate automatically; Vi re-reads the metadata only when you start a new sync.

Next steps

Annotate Data

Label the synced images and videos in the visual annotator.

Sync From External Buckets

Compare Azure Blob with the other supported storage providers.

Train A Model

Fine-tune a vision-language model on the synced dataset.