Sync From AWS S3

Connect a Datature Vi dataset to an Amazon S3 bucket using an IAM role and a trust policy. Step-by-step setup with the exact JSON payloads to apply.

Datature Vi reads images and videos from an Amazon S3 bucket using an IAM role that you create in your AWS account. The role has read-only access to the bucket, and a trust policy lets Vi assume the role from Datature's AWS account. This guide walks through the setup in the order the wizard asks for it.

Before You Start
  • A paid Datature Vi account. External bucket sync is not available on the free tier.
  • An AWS account with permission to create IAM policies and IAM roles.
  • An S3 bucket that already contains the assets you want to sync.
  • Permission to edit the bucket's CORS configuration.

Open the Explorer tab

In the left sidebar, click the Explorer tab on your dataset. This is where the synced assets will appear after the connection is set up.

You should see
Synced images appear in the dataset Explorer with thumbnails. The asset count in the header reflects the new objects pulled from S3.

Step 1: Enter the bucket details

Open your dataset, then walk through the wizard.

The Bucket Details tab asks for four fields:

Bucket details

| Name | Type | Description | Required |
|---|---|---|---|
| Connection Name | string | A label you choose for this connection. Used to identify the connection in the Connection Manager. | Required |
| AWS Bucket Name | string | The exact name of your S3 bucket. Follows AWS naming rules (lowercase, no underscores, no spaces). | Required |
| Folder Prefix | string | A path prefix to scope the sync. Leave empty to sync the whole bucket. Useful for buckets that hold non-training data alongside your dataset. | Optional |
| AWS Bucket Region | string | The region where the bucket lives, for example us-east-1 or ap-southeast-1. Find this in the AWS console under your bucket's properties. | Required |
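The naming rules mentioned for AWS Bucket Name can be checked locally before you submit the form. The sketch below is a rough check, assuming the general-purpose bucket naming rules published by AWS (3 to 63 characters; lowercase letters, digits, hyphens, and dots; starts and ends with a letter or digit; not shaped like an IP address). The function name is hypothetical and not part of Vi.

```python
import re

# Lowercase letters, digits, dots, and hyphens; 3 to 63 characters;
# must start and end with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names shaped like an IPv4 address are not allowed.
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Rough local check of an S3 bucket name (hypothetical helper, not a Vi API)."""
    if _BUCKET_RE.fullmatch(name) is None:
        return False
    if ".." in name or _IP_RE.fullmatch(name) is not None:
        return False
    return True
```

A name that passes this check can still be wrong for your account, so the bucket name shown in the AWS console remains the source of truth.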

Click Next. Vi generates two JSON files for the next step.

Connection error on this step

If you see a banner that reads "There was an error in creating the bucket connection," check that the bucket name and region match the AWS console exactly, and confirm your account has remaining data row quota.

Step 2: Apply the IAM policy and trust policy

Vi gives you two JSON payloads. Apply them in the order shown below.

IAM policy

This policy grants Vi read access to the bucket. Replace {bucket name} with your bucket name and {prefix} with your folder prefix, keeping the trailing *. To cover the whole bucket, remove {prefix} so that only * remains.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::{bucket name}",
            "Condition": {
                "StringLike": {
                    "s3:prefix": "{prefix}*"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectAttributes"
            ],
            "Resource": "arn:aws:s3:::{bucket name}/{prefix}*"
        }
    ]
}
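If you prefer to fill in the template programmatically rather than by hand, the substitution can be sketched as follows. This is a minimal illustration that mirrors the policy above; the function name and the example bucket and prefix are assumptions, and the JSON generated by the Vi wizard should take precedence if it differs.

```python
import json


def build_read_policy(bucket: str, prefix: str = "") -> dict:
    """Fill the read-only policy template for one bucket and an optional prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                # ListBucket targets the bucket itself: no trailing slash, no key.
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": f"{prefix}*"}},
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:GetObjectAttributes"],
                # Object-level actions target keys under the prefix.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(json.dumps(build_read_policy("my-training-data", "datasets/v1/"), indent=4))
```

With an empty prefix, the object resource becomes `arn:aws:s3:::<bucket>/*`, which covers the whole bucket.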

In the AWS console:

  1. Open IAM Policies. Go to IAM in the AWS console, then Policies, and click Create policy.
  2. Paste the JSON. Switch to the JSON tab and paste the policy above.
  3. Name and create the policy. Click Next, give the policy a recognisable name such as vi-s3-read, and click Create policy.

Trust policy

The trust policy lets Vi's AWS account assume the role you are about to create. Vi generates the exact Principal ARN and ExternalId for your connection. Copy those values from the wizard.

{
  "Version": "2012-10-17",
  "Statement": [
      {
          "Effect": "Allow",
          "Action": "sts:AssumeRole",
          "Principal": {
              "AWS": "arn:aws:iam::277710232653:role/ext-<your-connection-id>"
          },
          "Condition": {
              "StringEquals": {
                  "sts:ExternalId": "<your-external-id>"
              }
          }
      }
  ]
}

Use the values Vi generates

The example ARN and external ID above are placeholders. The wizard generates a unique pair for every connection. Copy them from the Vi UI before pasting into AWS.
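Because a leftover placeholder is an easy mistake to make here, a small guard can help before you paste the trust policy into AWS. The sketch below rebuilds the trust policy from the two wizard values and rejects anything that still contains angle brackets. The function name and the sample values are hypothetical; the real Principal ARN and ExternalId come only from the Vi wizard.

```python
def build_trust_policy(principal_arn: str, external_id: str) -> dict:
    """Build the trust policy, refusing empty or placeholder values."""
    for value in (principal_arn, external_id):
        # "<your-connection-id>" style placeholders must be replaced first.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"replace the placeholder value: {value!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {"AWS": principal_arn},
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
```

Raising an error early is a deliberate choice: a trust policy with a wrong Principal or ExternalId is accepted by AWS but blocks the assume-role call later.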

In the AWS console:

  1. Open IAM Roles. Go to IAM, then Roles, and click Create role.
  2. Choose Custom trust policy. Under Trusted entity type, select Custom trust policy and paste the JSON from Vi.
  3. Attach the policy you created. Click Next. On the Add permissions screen, search for vi-s3-read (or whatever you named the policy in the previous section) and select it.
  4. Name the role. Give the role a name such as vi-s3-sync, review, and click Create role.

Step 3: Configure CORS on the bucket

Vi loads thumbnails and previews from your bucket in the browser, so the bucket has to allow cross-origin requests.

In the S3 console, open your bucket, click Permissions, scroll to Cross-origin resource sharing (CORS), click Edit, and paste:

[
    {
        "AllowedHeaders": [],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "https://vi.datature.com"
        ],
        "ExposeHeaders": []
    }
]

If your bucket already has a CORS configuration, merge the vi.datature.com origin into the existing rule rather than replacing it.
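One way to merge the origin without touching the rest of an existing configuration is sketched below. It assumes the CORS rules are available as a parsed JSON list in the same shape as the example above, and it adds the origin to the first rule that already allows GET, appending a new rule only if none does. The function name is hypothetical, and if your existing GET rule also allows other methods you may prefer a separate rule for Vi.

```python
import copy

VI_ORIGIN = "https://vi.datature.com"


def merge_vi_origin(rules: list, origin: str = VI_ORIGIN) -> list:
    """Return a copy of the CORS rules in which `origin` is allowed for GET."""
    merged = copy.deepcopy(rules)  # never mutate the caller's configuration
    for rule in merged:
        if "GET" in rule.get("AllowedMethods", []):
            origins = rule.setdefault("AllowedOrigins", [])
            # Skip the append if the origin, or a wildcard, is already present.
            if origin not in origins and "*" not in origins:
                origins.append(origin)
            return merged
    # No rule allows GET yet, so add a dedicated rule for Vi.
    merged.append(
        {
            "AllowedHeaders": [],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": [origin],
            "ExposeHeaders": [],
        }
    )
    return merged
```

The function is idempotent, so running it twice on the same configuration does not duplicate the origin.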

Step 4: Paste the role ARN into Vi

Back in the AWS console, open the role you created and copy the ARN from the Summary section. Paste it into the Vi wizard and click Next.
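A common slip at this step is pasting the policy ARN or the bare role name instead of the role ARN. The sketch below is a rough shape check, assuming the standard IAM role ARN layout (`arn:<partition>:iam::<12-digit account id>:role/<name>`). The helper name is hypothetical, and a value that passes can still point at the wrong role.

```python
import re

# Partition (aws, aws-cn, aws-us-gov), 12-digit account id, then role/<path and name>.
_ROLE_ARN_RE = re.compile(r"arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+")


def looks_like_role_arn(value: str) -> bool:
    """Return True if the value has the shape of an IAM role ARN."""
    # Surrounding whitespace from copy-paste is ignored.
    return _ROLE_ARN_RE.fullmatch(value.strip()) is not None
```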

The Connection Status step shows one of two outcomes:

Connection status indicators

| Indicator | Meaning | What to do |
|---|---|---|
| Green heart | Vi can list objects in the bucket and read individual files. | Click Next to move on to syncing. |
| Broken heart | Vi cannot read the bucket. Either the role ARN is wrong, the trust policy points to a different external ID, or the IAM policy is missing the bucket. | Recheck the role ARN, trust policy, and IAM policy. Then click Retry. |

Step 5: Sync your assets

Choose Sync Now to start the first sync immediately, or Sync Later to set up the connection without syncing. If you pick Sync Now, Vi walks through three more screens before the sync starts in earnest.

  1. Preview Files to Sync. Vi scans the bucket prefix and shows the file count alongside a sample of object paths. Confirm the preview matches what you expect, then click Sync.
  2. Sync Started. A confirmation appears letting you know the job is running in the background. Click I Understand to dismiss the dialog; the sync continues even after you close the wizard or the browser tab.
  3. Track progress. Open the Connected Bucket dropdown in the top-right of the Explorer to see the connection name, status, provider, bucket, prefix, asset count, and a live progress bar while assets are retrieved.

The first sync takes 5 to 40 minutes depending on the bucket size.

Asset requirements

Files in S3 must meet the same format requirements as direct uploads.

S3 asset requirements

| Asset type | Requirement |
|---|---|
| Images | No EXIF orientation tag, or an orientation of 1. Rotated images render incorrectly because Vi does not rewrite pixels. |
| MP4 videos | Major brand mp42 and pixel format yuv420p. Run ffprobe your-video.mp4 to verify. |
| Other formats | See Upload Images and Upload Videos for the full supported list. |
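To check many videos at once, the ffprobe report can be read as JSON instead of by eye. The sketch below assumes ffprobe is installed and on your PATH, and that its JSON report has the usual `format.tags.major_brand` and per-stream `pix_fmt` fields. The function names are hypothetical and not part of Vi.

```python
import json
import subprocess


def check_probe(probe: dict) -> list:
    """Return a list of problems found in a parsed ffprobe JSON report."""
    problems = []
    brand = probe.get("format", {}).get("tags", {}).get("major_brand", "")
    if brand.strip() != "mp42":
        problems.append(f"major_brand is {brand!r}, expected 'mp42'")
    video = [s for s in probe.get("streams", []) if s.get("codec_type") == "video"]
    if not video:
        problems.append("no video stream found")
    for stream in video:
        pix_fmt = stream.get("pix_fmt")
        if pix_fmt != "yuv420p":
            problems.append(f"pix_fmt is {pix_fmt!r}, expected 'yuv420p'")
    return problems


def probe_file(path: str) -> dict:
    """Run ffprobe on one file and parse its JSON report."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-show_streams", "-of", "json", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)
```

For a single file, `check_probe(probe_file("your-video.mp4"))` returns an empty list when both requirements are met.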

Annotations are not part of the bucket sync. Vi reads only image and video metadata from S3. If you have existing labels in COCO, YOLO, Pascal VOC, CSV, or Vi JSONL, upload them directly to Vi once the assets finish syncing.

Troubleshooting

If the Connection Status step shows a broken heart, there are three things to check:

  1. The role ARN you pasted into Vi matches the role you created.
  2. The trust policy on the role uses the exact Principal ARN and ExternalId from the Vi wizard. A copy-paste error in either value blocks the assume-role call.
  3. The IAM policy attached to the role lists your bucket name in the Resource ARNs. A trailing slash or a typo in the bucket name causes a silent permission failure.
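The third check can be automated against a copy of the policy JSON. The sketch below assumes the policy has the same two-statement shape as the template in Step 2, with `Statement` as a list, and flags a bucket ARN with a trailing slash or a bucket name that does not match. The function name is hypothetical.

```python
def find_resource_problems(policy: dict, bucket: str) -> list:
    """Return mismatches between the policy's Resource ARNs and the bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_prefix = bucket_arn + "/"
    problems = []
    for statement in policy.get("Statement", []):
        # Action and Resource may each be a single string or a list of strings.
        actions = statement.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = statement.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        for resource in resources:
            if "s3:ListBucket" in actions and resource != bucket_arn:
                problems.append(f"ListBucket resource {resource!r} should be {bucket_arn!r}")
            if "s3:GetObject" in actions and not resource.startswith(object_prefix):
                problems.append(f"GetObject resource {resource!r} should start with {object_prefix!r}")
    return problems
```

An empty list means the Resource ARNs are consistent with the bucket name; it does not confirm that the policy is attached to the role.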

If a sync finishes but some assets are missing, there are two common causes. First, you may have hit the monthly data row quota mid-sync. Check usage in Billing. Second, individual files may have failed the asset format requirements above; rotated images and non-mp42 videos are skipped.

If a synced image appears rotated in Vi, the image has an EXIF orientation tag other than 1. You have two options.

Option 1: Bake the orientation into the pixels with ImageMagick. This rotates the image data and resets the orientation tag to 1.

Auto-orient with ImageMagick
mogrify -auto-orient your-image.jpg

Option 2: Strip the orientation tag with exiftool. Use this when the pixels are already correct and only the tag is wrong.

Remove the EXIF orientation tag
exiftool -Orientation= -overwrite_original your-image.jpg

To process a whole folder, point either tool at the directory:

Batch fix every image in a folder
mogrify -auto-orient ./images/*.jpg
exiftool -Orientation= -overwrite_original -r ./images

Re-upload the fixed files to the bucket and run the sync again.

If changes you make in the bucket do not show up in Vi, re-run the sync from the Connection Manager. Bucket changes do not propagate automatically; Vi re-reads the metadata only when you start a new sync.

Next steps

Annotate Data

Label the synced images and videos in the visual annotator.

Sync From External Buckets

Compare AWS S3 sync with the other supported storage providers.

Train A Model

Fine-tune a vision-language model on the synced dataset.