Concepts

Understand the core vision-language model concepts that power Datature Vi.

Vision-language models (VLMs) combine computer vision and natural language understanding to enable flexible, intuitive interactions with images. Instead of detecting pre-defined object categories, VLMs understand natural language descriptions and questions about visual content.
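
To make this concrete, the sketch below contrasts a traditional detector's fixed vocabulary with the open-ended inputs a VLM accepts, and shows roughly what each capability returns. The record shapes and values are illustrative placeholders only; they do not represent Datature Vi's API or export formats.

```python
# Illustrative only: these shapes are hypothetical and do not reflect
# Datature Vi's API or export formats.

# A traditional detector is restricted to a fixed vocabulary of classes.
fixed_categories = ["person", "car", "dog"]

# Phrase grounding: a free-form phrase goes in, box coordinates come out.
grounding_result = {
    "phrase": "the red mug on the left side of the desk",
    "boxes": [{"xmin": 42, "ymin": 180, "xmax": 167, "ymax": 310, "score": 0.91}],
}

# Visual question answering: a question goes in, a text answer comes out.
vqa_result = {
    "question": "How many items are on the shelf?",
    "answer": "seven",
    "score": 0.84,
}

print(grounding_result["boxes"][0])
print(vqa_result["answer"])
```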

Datature Vi supports core VLM capabilities that form the foundation of modern vision AI applications.


Core VLM capabilities

Dataset types

Choose the right dataset type for your vision AI application:

Explore all dataset types →

Advanced capabilities


Which capability should I use?

Choose based on what you need to accomplish:

| Need | Use | Output |
| --- | --- | --- |
| Locate objects described in text | Phrase Grounding | Bounding boxes with locations |
| Get information about images | Visual Question Answering | Text answers to your questions |
| Flexible object detection without fixed categories | Phrase Grounding | Spatial locations |
| Conversational interaction with images | Visual Question Answering | Natural language responses |
| Custom annotation schemas for specialized needs | Freeform (coming soon) | User-defined formats |



Common use cases

Phrase Grounding applications

Perfect for scenarios requiring object localization with flexible descriptions:

  • Robotics — "Pick up the red mug on the left"
  • Image editing — Select objects using natural language
  • Autonomous vehicles — Identify "the pedestrian crossing from the right"
  • Warehouse automation — Find "the damaged box on the top shelf"
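
To experiment with the phrase grounding concept before building in Datature Vi, an open-source zero-shot detector can serve as a rough stand-in. The sketch below uses OWL-ViT through the Hugging Face transformers pipeline; the image path and query phrase are placeholders, and this is not Datature Vi's API.

```python
# Sketch: phrase-grounding-style detection with an open-source model
# (OWL-ViT via the Hugging Face transformers pipeline), used here only
# as a stand-in for experimentation.
# Requires: pip install transformers pillow torch
from transformers import pipeline

detector = pipeline(
    "zero-shot-object-detection",
    model="google/owlvit-base-patch32",
)

# "warehouse.jpg" and the phrase below are placeholder inputs.
results = detector(
    "warehouse.jpg",
    candidate_labels=["a damaged box on the top shelf"],
)

for r in results:
    box = r["box"]  # dict with xmin, ymin, xmax, ymax in pixels
    print(f'{r["label"]} ({r["score"]:.2f}): {box}')
```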

Explore detailed use cases and examples →

Visual Question Answering applications

Ideal for extracting information through conversational queries:

  • Quality inspection — "Is there a defect on the surface?"
  • Accessibility — Describe images for visually impaired users
  • Content moderation — "Does this image contain inappropriate content?"
  • Inventory management — "How many items are on the shelf?"
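
The visual question answering concept can be prototyped the same way with an open-source VQA model. The sketch below uses ViLT through the Hugging Face transformers pipeline; the image path and question are placeholders, and this is not Datature Vi's API.

```python
# Sketch: visual question answering with an open-source model
# (ViLT via the Hugging Face transformers pipeline), used here only
# as a stand-in for experimentation.
# Requires: pip install transformers pillow torch
from transformers import pipeline

vqa = pipeline(
    "visual-question-answering",
    model="dandelin/vilt-b32-finetuned-vqa",
)

# "shelf.jpg" and the question below are placeholder inputs.
answers = vqa(image="shelf.jpg", question="How many items are on the shelf?")

# The pipeline returns candidate answers ranked by confidence.
for a in answers:
    print(f'{a["answer"]} ({a["score"]:.2f})')
```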

Explore detailed use cases and examples →

Freeform applications

🚧 Coming soon — Perfect for specialized scenarios requiring custom annotation formats:

  • Research projects — Novel computer vision tasks and experimental approaches
  • Medical imaging — Custom diagnostic annotations and measurements
  • Scientific imaging — Domain-specific labels and specialized metadata
  • Hybrid requirements — Combining multiple annotation types for complex use cases
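
Because Freeform has not been released, no official schema exists yet. Purely as an illustration of the "user-defined formats" idea, a custom annotation record for a medical imaging project might look something like the sketch below; every field name here is hypothetical.

```python
# Purely hypothetical: Freeform is not yet released, so this record is an
# illustration of a user-defined annotation format, not a real schema.
custom_annotation = {
    "image_id": "scan_0042",             # placeholder identifier
    "task": "lesion_measurement",        # domain-specific task name
    "annotations": [
        {
            "region": {"xmin": 120, "ymin": 88, "xmax": 164, "ymax": 131},
            "diameter_mm": 6.4,          # custom measurement field
            "severity": "grade_2",       # domain-specific label
            "notes": "irregular margin", # free-text metadata
        }
    ],
}

print(custom_annotation["annotations"][0]["diameter_mm"])
```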

Explore freeform concepts and use cases →


Getting started

Ready to build with VLMs? Start here:

