Concepts
Understand the core vision-language model concepts that power Datature Vi.
Vision-language models (VLMs) combine computer vision and natural language understanding to enable flexible, intuitive interactions with images. Instead of detecting pre-defined object categories, VLMs understand natural language descriptions and questions about visual content.
Datature Vi supports core VLM capabilities that form the foundation of modern vision AI applications.
Core VLM capabilities
Dataset types
Choose the right dataset type for your vision AI application:
- Phrase Grounding — Localize objects in images using natural language descriptions. Find "the red car on the left" or "person wearing blue jacket" without pre-defined categories.
- Visual Question Answering (VQA) — Answer questions about images in natural language. Ask "What color is the car?" or "Is there a defect?" and get conversational answers.
- Freeform — 🚧 Coming soon. Define custom annotation schemas for specialized use cases and research projects.
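To make these dataset types concrete, here is a minimal sketch of what each output looks like. It is plain Python for illustration only; the field names are hypothetical and do not reflect Datature Vi's actual annotation format.

```python
# Hypothetical records, for illustration only (not Datature Vi's schema).

# Phrase Grounding: a free-text phrase paired with the box(es) it refers to.
grounding_result = {
    "phrase": "the red car on the left",
    "boxes": [
        {"x_min": 34, "y_min": 120, "x_max": 310, "y_max": 298, "score": 0.91},
    ],
}

# Visual Question Answering: a question paired with a natural-language answer.
vqa_result = {
    "question": "Is there a defect on the surface?",
    "answer": "Yes, there is a small scratch near the top-left corner.",
}
```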
Advanced capabilities
Beyond the core dataset types, Datature Vi supports Chain-of-Thought Reasoning: the model works through intermediate reasoning steps before committing to an answer, which improves accuracy on complex visual tasks (see Learn more below).
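As a generic illustration of the idea (the prompt wording below is our own, not a Datature-specific template), compare a direct question with a chain-of-thought version:

```python
# Generic illustration of chain-of-thought prompting for a visual task.
# The wording is illustrative; it is not a Datature Vi prompt template.
direct_prompt = "How many defective items are on the shelf?"

cot_prompt = (
    "How many defective items are on the shelf? "
    "Think step by step: first list every item you can see, "
    "then check each item for visible damage, "
    "and only then state the final count."
)
```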
Which capability should I use?
Choose based on what you need to accomplish:
| Need | Use | Output |
|---|---|---|
| Locate objects described in text | Phrase Grounding | Bounding boxes with locations |
| Get information about images | Visual Question Answering | Text answers to your questions |
| Flexible object detection without fixed categories | Phrase Grounding | Spatial locations |
| Conversational interaction with images | Visual Question Answering | Natural language responses |
| Custom annotation schemas for specialized needs | Freeform (coming soon) | User-defined formats |
Learn more:
- Phrase Grounding vs. other vision tasks →
- VQA vs. other vision tasks →
- Freeform for custom use cases →
Common use cases
Phrase Grounding applications
Perfect for scenarios requiring object localization with flexible descriptions:
- Robotics — "Pick up the red mug on the left"
- Image editing — Select objects using natural language
- Autonomous vehicles — Identify "the pedestrian crossing from the right"
- Warehouse automation — Find "the damaged box on the top shelf"
Explore detailed use cases and examples →
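To experiment with the underlying idea outside of Datature Vi, open-vocabulary detection is a closely related technique with open-source implementations. Below is a minimal sketch using the OWL-ViT model through the Hugging Face Transformers pipeline; the model choice and image file name are assumptions for the example, and this is not the Vi SDK.

```python
# Sketch: open-vocabulary object localization with OWL-ViT.
# Illustrates the phrase grounding idea generically; not the Vi SDK.
from transformers import pipeline
from PIL import Image

detector = pipeline(
    "zero-shot-object-detection",
    model="google/owlvit-base-patch32",
)

image = Image.open("warehouse.jpg")  # any test image of your own
results = detector(image, candidate_labels=["damaged box", "red mug"])

for r in results:
    # Each result carries a label, a confidence score, and a pixel-space box.
    print(r["label"], round(r["score"], 3), r["box"])
```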
Visual Question Answering applications
Ideal for extracting information through conversational queries:
- Quality inspection — "Is there a defect on the surface?"
- Accessibility — Describe images for visually impaired users
- Content moderation — "Does this image contain inappropriate content?"
- Inventory management — "How many items are on the shelf?"
Explore detailed use cases and examples →
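Similarly, you can try the VQA concept with an open-source model before building on Datature Vi. The sketch below uses the ViLT VQA pipeline from Hugging Face Transformers; again, the model choice and image file name are assumptions, and this is not the Vi SDK.

```python
# Sketch: visual question answering with ViLT; illustrative, not the Vi SDK.
from transformers import pipeline
from PIL import Image

vqa = pipeline(
    "visual-question-answering",
    model="dandelin/vilt-b32-finetuned-vqa",
)

image = Image.open("shelf.jpg")  # any test image of your own
answers = vqa(image=image, question="How many items are on the shelf?")

# The pipeline returns candidate answers ranked by confidence.
print(answers[0]["answer"], answers[0]["score"])
```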
Freeform applications
🚧 Coming soon — Perfect for specialized scenarios requiring custom annotation formats:
- Research projects — Novel computer vision tasks and experimental approaches
- Medical imaging — Custom diagnostic annotations and measurements
- Scientific imaging — Domain-specific labels and specialized metadata
- Hybrid requirements — Combining multiple annotation types for complex use cases
Explore freeform concepts and use cases →
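Because Freeform has not shipped yet, any concrete schema shown here is speculative. Purely as a hypothetical illustration (every field name below is invented for this example), a custom schema might combine a region, a measurement, and free-text notes in a single record:

```python
# Hypothetical freeform annotation, invented for illustration only.
# A medical-imaging record mixing a region, a measurement, and free text.
custom_annotation = {
    "schema": "lesion-report/v1",  # user-defined schema identifier
    "region": {"x_min": 102, "y_min": 88, "x_max": 164, "y_max": 141},
    "measurements": {"diameter_mm": 7.4},
    "notes": "Well-circumscribed lesion; recommend follow-up imaging.",
}
```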
Getting started
Ready to build with VLMs? Start here:
- Create a dataset — Set up datasets for Phrase Grounding, VQA, or Freeform tasks
- Quickstart — Complete VLM workflow from data to deployment
- Train a model — Fine-tune VLMs on your specific use case
Learn more
- Phrase Grounding explained → — Deep dive into visual grounding, how it works, and best practices
- Visual Question Answering explained → — Complete guide to VQA, architectures, and optimization tips
- Chain-of-Thought Reasoning explained → — Learn how step-by-step reasoning improves accuracy for complex visual tasks
- Freeform explained → — Understanding custom annotation schemas and specialized use cases (coming soon)
Related resources
- Phrase grounding — Deep dive into object localization with natural language
- Visual question answering — Understand VQA capabilities and use cases
- Chain-of-Thought reasoning — Learn step-by-step reasoning for complex visual tasks
- Freeform — Custom annotation schemas for specialized use cases (coming soon)
- Quickstart — End-to-end VLM training workflow
- Train a model — Complete training guide
- Annotate data — Create phrase grounding and VQA annotations
- Create a dataset — Set up datasets for training
- Glossary — Common VLM terminology and definitions
- Vi SDK — Python SDK for programmatic access
- Run inference — Use trained models for predictions
- Configure your model — Select model architecture
- Evaluate a model — Assess model performance
- Contact us — Get help from the Datature team
Need help?
We're here to support your VLMOps journey. Reach out through the Contact us page to get help from the Datature team.
