What Are System Prompts and How Do I Write One?

Learn what system prompts are, how they shape VLM behavior during training and inference, and how to write effective prompts for your domain.

A system prompt is a set of natural language instructions that tells your vision-language model (VLM) how to behave. It defines what the model looks for in images, how it formats responses, and what domain knowledge it applies. Datature Vi uses the same system prompt during both training and inference. This page explains what system prompts do, how to structure them, and what mistakes to avoid.


What does a system prompt do?

Think of a system prompt as a job description for your model. Before your model sees a single image, the system prompt tells it:

  • What role to play: "You are a quality control inspector for printed circuit boards."
  • What to focus on: "Identify solder bridges, missing components, and cold joints."
  • How to respond: "Return a JSON object with defect_found, defect_type, and severity fields."
  • What to avoid: "Do not speculate about components hidden under other parts."

Without a system prompt, the model falls back on its general training. It might describe an image in broad terms when you needed a specific defect report. The system prompt narrows the model's attention to your task.
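Put together, the four instruction types above fit in a few sentences. A minimal sketch in Python (the PCB wording reuses the bullet examples; it is illustrative, not a Datature Vi default):

```python
# A complete system prompt assembled from the four instruction types:
# role, focus, response format, and constraints. The PCB wording is
# an illustrative example, not a Datature Vi default.
SYSTEM_PROMPT = (
    "You are a quality control inspector for printed circuit boards. "
    "Identify solder bridges, missing components, and cold joints. "
    "Return a JSON object with defect_found, defect_type, and severity fields. "
    "Do not speculate about components hidden under other parts."
)

# Four sentences cover all four elements and stay far below the
# ~200-word guideline discussed later on this page.
print(len(SYSTEM_PROMPT.split()))
```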

New to Datature Vi?

System prompts are one of the first things you configure when setting up a training workflow. Follow the quickstart to see how prompts fit into the full pipeline.


The four elements of a system prompt

Every effective system prompt covers four areas. You don't need long paragraphs for each; a few precise sentences per element are enough.

1. Define the role

Tell the model what kind of expert it is. This sets the tone and vocabulary for all responses.

Weak: "You analyze images." Strong: "You are a radiologist assistant that describes findings in chest X-rays using standard medical terminology."

2. Specify what to look for

Name the specific objects, conditions, or patterns the model should identify. Generic instructions produce generic results.

Weak: "Find problems in the image." Strong: "Identify cracks, chips, and discoloration on ceramic tiles. Ignore surface dust and reflections."

3. Set the output format

Tell the model exactly how to structure its response. This is especially important for structured data extraction and programmatic workflows where downstream code parses the output.

Weak: "Describe what you see." Strong: "Answer in one sentence starting with Yes or No, followed by the defect type and location."
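When the prompt requests structured output, downstream code should still tolerate responses that drift from the format. A hedged sketch (the field names come from the PCB example earlier on this page; `parse_defect_report` is a hypothetical helper, not a Datature Vi API):

```python
import json

def parse_defect_report(response: str) -> dict:
    """Parse a response the prompt asked to format as JSON with
    defect_found, defect_type, and severity fields. Anything that
    does not match is flagged for review instead of crashing."""
    try:
        report = json.loads(response)
    except json.JSONDecodeError:
        return {"needs_review": True, "raw": response}
    if {"defect_found", "defect_type", "severity"} - report.keys():
        report["needs_review"] = True  # missing expected fields
    return report

report = parse_defect_report(
    '{"defect_found": true, "defect_type": "solder bridge", "severity": "high"}'
)
```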

4. Add hallucination guards

Instruct the model to only report what it can see. Without guardrails, VLMs may fill gaps with plausible-sounding information that isn't grounded in the image.

Example: "Only describe what is directly visible in the image. Do not speculate about areas outside the frame or occluded by other objects. If you cannot determine the answer, say 'Unable to determine from this image.'"

See What are hallucination guards? below for more detail.


Why the training prompt and inference prompt must match

Datature Vi uses your system prompt as part of the training data. The model learns to follow the specific instructions, formatting, and vocabulary in that prompt. If you change the prompt at inference time, the model receives instructions it never trained on.

The effect is similar to training a translator on French-to-English and then asking them to translate Spanish. They might produce something, but the quality drops.

| Scenario | What happens | Severity |
| --- | --- | --- |
| Same prompt for training and inference | Model performs as expected | No issue |
| Minor wording changes (synonyms, reordering) | Small performance drop, may not be noticeable | Low |
| Different output format requested | Model may ignore the new format or mix formats | Medium |
| Different task or domain | Model produces unreliable or irrelevant output | High |

If you need to change the system prompt after training, retrain the model with the updated prompt. Small adjustments to wording may work, but changes to the task, format, or domain require a new training run.
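One way to keep the two prompts in sync is to store a single copy and assert equality before running inference. A sketch under that assumption (the constant names and helper are illustrative, not a Datature Vi API):

```python
# Keep one copy of the prompt and reuse it in both phases so the
# training and inference instructions cannot silently drift apart.
TRAINING_PROMPT = (
    "You are a quality control inspector for printed circuit boards. "
    "Identify solder bridges, missing components, and cold joints."
)

def check_prompts_match(training: str, inference: str) -> None:
    """Fail fast instead of silently degrading model quality."""
    if training.strip() != inference.strip():
        raise ValueError(
            "Inference prompt differs from the training prompt; "
            "revert the change or retrain with the new prompt."
        )

inference_prompt = TRAINING_PROMPT  # reuse the same string, never retype it
check_prompts_match(TRAINING_PROMPT, inference_prompt)
```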


What are hallucination guards?

Hallucination occurs when a VLM generates information that isn't present in the image. The model might describe objects that don't exist, invent counts, or assign labels based on patterns from its pre-training data rather than what's actually visible.

This happens because VLMs are trained on large datasets of image-text pairs. When the model encounters ambiguity, it fills gaps with statistically likely content. A model trained on factory images might "see" a common defect type even when the image shows no defect at all.

Hallucination guards are instructions in the system prompt that constrain the model:

Manufacturing inspection: "Only report defects you can see in the image. If an area is obscured or out of focus, state that you cannot inspect it. Do not assume defect types based on location alone."

Medical imaging: "Describe visible findings only. Do not provide diagnoses or prognoses. State 'No visible abnormality' when the image appears normal rather than speculating."

Retail shelf analysis: "Count only products that are fully visible. If a product is partially hidden behind another, do not include it in the count. Report 'partially visible' items separately."

Document processing: "Extract only text that is legible in the image. For fields that are illegible or missing, return null rather than guessing the content."

Hallucination guards don't eliminate hallucination entirely, but they reduce it. Combining guards with high-quality annotations and fine-tuning on your specific data gives the strongest results.
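Guards also make outputs easier to post-process: fields the model honestly marks as unreadable can be routed differently from fields it read with confidence. A sketch for the document-processing guard above (`split_extraction` and the field names are hypothetical):

```python
import json

def split_extraction(response: str) -> tuple:
    """Separate confidently read fields from fields the model returned
    as null (illegible or missing), following the guard instruction
    'return null rather than guessing the content'."""
    data = json.loads(response)
    readable = {k: v for k, v in data.items() if v is not None}
    unreadable = [k for k, v in data.items() if v is None]
    return readable, unreadable

# A field the model could not read comes back as null, not a guess.
fields, unknown = split_extraction('{"invoice_no": "A-1042", "total": null}')
```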


Common mistakes

| Mistake | Why it hurts | Fix |
| --- | --- | --- |
| Prompt is too vague ("analyze this image") | Model has no specific direction and produces generic descriptions | Name the exact task, objects, and output format |
| Prompt is too long (500+ words) | Consumes context window tokens that could be used for image processing and responses | Keep prompts under 200 words; move reference material to annotations instead |
| Different prompt at inference vs. training | Model receives unfamiliar instructions and quality degrades | Copy the exact training prompt to your inference code |
| No hallucination guards | Model may fabricate objects or details not in the image | Add explicit constraints: "only describe what is visible" |
| Generic output format ("describe what you see") | Inconsistent response structure across images | Define the exact format: JSON fields, sentence structure, or categories |
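These mistakes can be caught with a few rough checks before training. A sketch (the keyword heuristics are illustrative, not an exhaustive validator):

```python
def lint_prompt(prompt: str) -> list:
    """Heuristic checks mirroring the common mistakes above.
    Keyword matching is a rough sketch, not a real validator."""
    warnings = []
    words = prompt.split()
    if len(words) > 200:
        warnings.append("too long: move reference material to annotations")
    if len(words) < 15:
        warnings.append("possibly too vague: name the task, objects, and format")
    lowered = prompt.lower()
    if "only" not in lowered and "do not" not in lowered:
        warnings.append("no hallucination guard detected")
    return warnings

print(lint_prompt("Analyze this image."))
```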

Frequently asked questions

Can I change the system prompt after training?

Minor wording changes (reordering sentences, adding a synonym) usually have no visible effect. But changing the task, output format, or domain requires retraining. The model learned to follow the specific instructions in the training prompt, and significant changes break that alignment.

If you're unsure whether a change is safe, run inference on a few test images with both prompts and compare the outputs.
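Such an A/B check can be scripted around whatever inference client you use. A sketch where `run_inference(image, prompt)` is a placeholder for your own call, not a Datature Vi API:

```python
def compare_prompts(run_inference, images, prompt_a, prompt_b):
    """Run the same test images under both prompts and pair up the
    outputs for side-by-side review. run_inference(image, prompt) -> str
    is a placeholder for your own inference call."""
    return [
        (image, run_inference(image, prompt_a), run_inference(image, prompt_b))
        for image in images
    ]
```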

How long should a system prompt be?

Between 50 and 200 words for most tasks. Shorter prompts waste less of the context window and give the model more room for image processing and response generation. If your prompt exceeds 200 words, consider whether some of that information belongs in annotations instead.

Do different dataset types use different default prompts?

Yes. Datature Vi provides different default prompts for each dataset type. Phrase grounding prompts focus on localization ("find the object described by the text"). VQA prompts focus on answering questions ("answer the question based on the image"). Freeform text prompts are fully custom.

You can modify these defaults to add domain context, but keep the core task instruction intact.

How do I write my first system prompt?

Start with the defaults. Train a first model, run inference on a few test images, and look at where the outputs fall short. Then refine the prompt to address those specific gaps. Prompt writing is iterative, not one-shot.

For examples by industry, see Configure Your System Prompt.


Related resources

Configure Your System Prompt

Step-by-step guide to setting up prompts in the workflow canvas.

Annotation Guide

How to create effective training data for your VLM.

Chain-of-Thought Reasoning

Add step-by-step reasoning to model outputs.