Upload Videos
Add videos to a Datature Vi dataset. Covers supported formats, file size limits, processing details, data row consumption, and how to plan your quota usage.
Videos in Datature Vi are processed frame-by-frame for annotation. Once processing is complete, each frame behaves like an individual image. This guide covers the upload steps, supported formats, what happens during processing, and how to manage your data row quota.
- A dataset created in Datature Vi. Create one now if you haven't yet.
- A video file in a supported format (see Supported video formats), under 512 GB.
- An estimate of how many data rows your video will consume (see Data row consumption).
Open your dataset's Explorer tab

Open your dataset and click the Explorer tab. This is where your uploaded videos will appear.

If you have just created the dataset, its page shows empty statistics until you upload data.
Supported video formats
For best compatibility and processing speed, use MP4 (H.264) or MOV formats.
File size limits
- Maximum: 512 GB per video
- Recommended: Under 100 MB for faster uploads and processing
For large videos, consider splitting them into shorter segments before uploading. This improves upload reliability and makes processing faster.
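One way to split a long recording is with ffmpeg's segment muxer. The sketch below uses hypothetical helpers that are not part of Datature Vi or the Vi SDK. It picks a segment length intended to keep each piece near the recommended 100 MB, assuming the bitrate is roughly constant across the clip and that "MB" means 1024 × 1024 bytes. It then builds the ffmpeg command as an argument list. Because `-c copy` avoids re-encoding, ffmpeg can only cut at keyframes, so actual segment sizes will vary somewhat.

```python
import math

RECOMMENDED_BYTES = 100 * 1024 * 1024  # 100 MB guideline (binary units assumed)

def segment_seconds(file_size_bytes: int, duration_s: float,
                    target_bytes: int = RECOMMENDED_BYTES) -> int:
    """Whole-second segment length so each piece is roughly target_bytes or less,
    assuming a roughly constant bitrate across the video."""
    if file_size_bytes <= target_bytes:
        return math.ceil(duration_s)  # already small enough: a single segment
    pieces = math.ceil(file_size_bytes / target_bytes)
    return max(1, math.floor(duration_s / pieces))

def ffmpeg_split_command(src: str, seconds: int,
                         pattern: str = "part_%03d.mp4") -> list[str]:
    """Build an ffmpeg command that splits src into ~seconds-long MP4 segments."""
    return [
        "ffmpeg", "-i", src,
        "-map", "0:v",               # keep video streams only; Vi strips audio anyway
        "-c", "copy",                # no re-encode: fast and lossless, cuts land on keyframes
        "-f", "segment",             # ffmpeg's segment muxer
        "-segment_time", str(seconds),
        "-reset_timestamps", "1",    # each segment starts at t = 0
        pattern,
    ]

secs = segment_seconds(1_500_000_000, 1800)  # hypothetical 1.5 GB, 30-minute recording
cmd = ffmpeg_split_command("recording.mp4", secs)
print(" ".join(cmd))
# To run it (requires ffmpeg on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```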
How Datature Vi processes videos
When you upload a video, the platform does three things before it is available for annotation:
- Frame extraction: The video is broken into individual frames. Each frame becomes an annotatable asset.
- Resolution optimization: Frames are resized so the longest dimension is 1024 pixels, and light lossy compression is applied to each frame. This keeps the annotator fast without affecting annotation accuracy for bounding boxes and text labels.
- Audio removal: Audio tracks are stripped. The platform focuses on visual content only.
Variable frame rate (VFR) conversion
Some recording devices (screen capture software, smartphones) produce videos with a variable frame rate, where the time gap between frames changes throughout the clip. Datature Vi converts VFR videos to a constant frame rate during processing, using the video's average frame rate. This ensures consistent frame spacing for annotation and training. The conversion happens automatically; you do not need to pre-process your videos.
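A VFR-to-CFR conversion can be pictured as resampling frames onto an evenly spaced time grid. The sketch below is a simplified illustration of that general technique, not Datature Vi's actual implementation. It assumes you already have each frame's presentation timestamp in seconds, sorted in ascending order. It computes the average frame rate over the clip, then picks the nearest source frame for each constant-rate slot. As a result, some source frames may be dropped and others repeated.

```python
from bisect import bisect_left

def average_fps(timestamps: list[float]) -> float:
    """Average frame rate: number of frame intervals divided by elapsed time."""
    if len(timestamps) < 2:
        raise ValueError("need at least two frames")
    span = timestamps[-1] - timestamps[0]
    if span <= 0:
        raise ValueError("timestamps must increase")
    return (len(timestamps) - 1) / span

def resample_to_cfr(timestamps: list[float]) -> list[int]:
    """For each slot on a constant-rate grid, return the index of the nearest source frame."""
    fps = average_fps(timestamps)
    start = timestamps[0]
    picks = []
    for k in range(len(timestamps)):
        t = start + k / fps  # time of the k-th evenly spaced slot
        i = bisect_left(timestamps, t)
        # Compare the source frames on either side of t and keep the closer one.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
        picks.append(min(candidates, key=lambda j: abs(timestamps[j] - t)))
    return picks

# A burst of three frames, then a long gap: frame 1 is dropped and frame 3 repeated.
print(resample_to_cfr([0.0, 0.01, 0.02, 0.2]))  # [0, 2, 3, 3]
```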
Resolution and compression
Frames are resized so the longest edge is 1024 pixels, and light lossy compression is applied. This reduces file size for faster loading in the web annotator. The quality is high enough that bounding box placement, text reading, and object identification are not affected. If you need to reference the original resolution for any reason, keep your source video files.
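To predict the frame size you will see in the annotator, scale the longest edge to 1024 pixels and keep the aspect ratio. The helper below is hypothetical and follows the rule as described. The rounding is an assumption. The `allow_upscale` switch is also hypothetical: it is included because it is not stated whether videos already smaller than 1024 pixels are enlarged.

```python
def annotator_frame_size(width: int, height: int, longest_edge: int = 1024,
                         allow_upscale: bool = True) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals longest_edge, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= longest_edge and not allow_upscale:
        return width, height  # leave small frames untouched
    # Multiply before dividing so common resolutions land on exact integers.
    new_w = max(1, round(width * longest_edge / longest))
    new_h = max(1, round(height * longest_edge / longest))
    return new_w, new_h

print(annotator_frame_size(1920, 1080))  # (1024, 576)
print(annotator_frame_size(1080, 1920))  # (576, 1024)
print(annotator_frame_size(4096, 2160))  # (1024, 540)
```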
Data row consumption
Each video frame consumes data rows, the same way a single image does. Frame count determines your total usage.
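Formula: frames = duration (seconds) × frame rate (FPS), and data rows = frames × 5, since each frame costs 5 data rows. For example, a 60-second clip at 30 FPS has 1,800 frames and consumes 9,000 data rows.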
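The formula translates directly into a small estimator. This is a hypothetical helper, not part of the Vi SDK. The 5-rows-per-frame figure comes from the formula. Rounding the frame count to the nearest whole frame is an assumption, and the platform's exact count for a partial final frame may differ.

```python
ROWS_PER_FRAME = 5  # data rows per frame, from the formula

def estimate_data_rows(duration_s: float, fps: float,
                       rows_per_frame: int = ROWS_PER_FRAME) -> tuple[int, int]:
    """Return (frames, data_rows) for a clip of duration_s seconds at fps."""
    # Assumption: round to the nearest whole frame, which also absorbs
    # floating-point noise from non-integer durations or frame rates.
    frames = round(duration_s * fps)
    return frames, frames * rows_per_frame

frames, rows = estimate_data_rows(90, 24)  # a 90-second clip at 24 FPS
print(f"{frames} frames -> {rows} data rows")  # 2160 frames -> 10800 data rows
```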
If your use case does not need high temporal resolution, consider a lower frame rate (for example, 15 FPS): halving the frame rate halves the data row cost. See resource usage for quota information.
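To plan around a fixed quota, you can invert the formula and ask how much footage fits at each candidate frame rate. The helper and the 100,000-row quota below are hypothetical examples for illustration. Check your actual remaining quota in resource usage.

```python
def max_duration_seconds(rows_available: int, fps: float, rows_per_frame: int = 5) -> float:
    """Longest clip, in seconds, whose frames fit in rows_available data rows."""
    frames_affordable = rows_available // rows_per_frame  # whole frames only
    return frames_affordable / fps

quota = 100_000  # hypothetical remaining data rows
for fps in (30, 15, 5):
    minutes = max_duration_seconds(quota, fps) / 60
    print(f"{fps:>2} FPS: about {minutes:.1f} minutes of video")
```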
Troubleshooting
Do this with the Vi SDK
```python
import vi

# Authenticate with your organization's credentials
client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-organization-id"
)

# Upload every video in ./videos/ to the dataset and wait for the job to finish
result = client.assets.upload(
    dataset_id="your-dataset-id",
    paths="./videos/",
    wait_until_done=True
)

print(f"Uploaded: {result.total_succeeded} assets")
```

For more details, see the full SDK reference.
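Before uploading a whole folder with the SDK, a quick local check can catch files that are too large or not in a recommended format. The `preflight` helper below is hypothetical and uses only the Python standard library; it is not part of the Vi SDK. It treats `.mp4` and `.mov`, the formats recommended above, as accepted. That is an assumption, since other formats may also be supported. It also assumes the 100 MB and 512 GB limits use binary units.

```python
from pathlib import Path

GB = 1024 ** 3                        # assumption: limits use binary units
MAX_BYTES = 512 * GB                  # hard limit per video
RECOMMENDED_BYTES = 100 * 1024 ** 2   # recommended size for faster uploads
RECOMMENDED_EXTENSIONS = {".mp4", ".mov"}

def preflight(folder: str) -> dict[str, list[Path]]:
    """Sort files in folder into ready / large / too_large / other_format buckets."""
    report: dict[str, list[Path]] = {
        "ready": [], "large": [], "too_large": [], "other_format": [],
    }
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        if path.suffix.lower() not in RECOMMENDED_EXTENSIONS:
            report["other_format"].append(path)
            continue
        size = path.stat().st_size
        if size > MAX_BYTES:
            report["too_large"].append(path)   # exceeds the 512 GB limit
        elif size > RECOMMENDED_BYTES:
            report["large"].append(path)       # allowed, but consider splitting
        else:
            report["ready"].append(path)
    return report
```

Call `preflight("./videos/")` and review the `too_large` and `other_format` lists before running the upload.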
Next steps
