What Are System Prompts and How Do I Write One?
Learn what system prompts are, how they shape VLM behavior during training and inference, and how to write effective prompts for your domain.
A system prompt is a set of natural language instructions that tells your vision-language model (VLM) how to behave. It defines what the model looks for in images, how it formats responses, and what domain knowledge it applies. Datature Vi uses the same system prompt during both training and inference. This page explains what system prompts do, how to structure them, and what mistakes to avoid.
What does a system prompt do?
Think of a system prompt as a job description for your model. Before your model sees a single image, the system prompt tells it:
- What role to play: "You are a quality control inspector for printed circuit boards."
- What to focus on: "Identify solder bridges, missing components, and cold joints."
- How to respond: "Return a JSON object with defect_found, defect_type, and severity fields."
- What to avoid: "Do not speculate about components hidden under other parts."
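Putting those four bullets together, a complete system prompt is just a single block of text. Here is a minimal sketch in Python, reusing the printed-circuit-board example from above. The constant name `SYSTEM_PROMPT` is illustrative only and not part of any Datature Vi API:

```python
# Hypothetical example: a complete system prompt built from the four
# elements (role, focus, output format, things to avoid).
SYSTEM_PROMPT = (
    "You are a quality control inspector for printed circuit boards. "      # role
    "Identify solder bridges, missing components, and cold joints. "        # focus
    "Return a JSON object with defect_found, defect_type, and severity "    # format
    "fields. "
    "Do not speculate about components hidden under other parts."           # avoid
)

print(SYSTEM_PROMPT)
```

Each element is one or two sentences; the whole prompt stays short enough that every instruction carries weight.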
Without a system prompt, the model falls back on its general training. It might describe an image in broad terms when you needed a specific defect report. The system prompt narrows the model's attention to your task.
System prompts are one of the first things you configure when setting up a training workflow. Follow the quickstart to see how prompts fit into the full pipeline.
The four elements of a system prompt
Every effective system prompt covers four areas. You don't need long paragraphs for each. A few precise sentences per element are enough.
Define the role
Tell the model what kind of expert it is. This sets the tone and vocabulary for all responses.
Weak: "You analyze images." Strong: "You are a radiologist assistant that describes findings in chest X-rays using standard medical terminology."
Specify what to look for
Name the specific objects, conditions, or patterns the model should identify. Generic instructions produce generic results.
Weak: "Find problems in the image." Strong: "Identify cracks, chips, and discoloration on ceramic tiles. Ignore surface dust and reflections."
Set the output format
Tell the model exactly how to structure its response. This is especially important for structured data extraction and programmatic workflows where downstream code parses the output.
Weak: "Describe what you see." Strong: "Answer in one sentence starting with Yes or No, followed by the defect type and location."
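When downstream code parses the output, a strict format also makes failures detectable. The sketch below is a generic, hypothetical parser for the JSON format requested in the example prompt above (the field names `defect_found`, `defect_type`, and `severity` come from that example; nothing here is a Datature Vi API):

```python
import json

# Fields the system prompt told the model to return.
REQUIRED_FIELDS = {"defect_found", "defect_type", "severity"}

def parse_inspection(reply: str) -> dict:
    """Parse a model reply that should be a JSON object with the fields
    requested by the system prompt. Raises ValueError if any are missing."""
    record = json.loads(reply)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"reply missing fields: {sorted(missing)}")
    return record

# Example of a well-formed model reply.
reply = '{"defect_found": true, "defect_type": "solder_bridge", "severity": "high"}'
record = parse_inspection(reply)
```

If the model was trained with a prompt that specifies this format, replies that fail to parse are a signal worth logging rather than silently ignoring.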
Add hallucination guards
Instruct the model to only report what it can see. Without guardrails, VLMs may fill gaps with plausible-sounding information that isn't grounded in the image.
Example: "Only describe what is directly visible in the image. Do not speculate about areas outside the frame or occluded by other objects. If you cannot determine the answer, say 'Unable to determine from this image.'"
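Because the same guard sentences tend to be reused across prompts, it can help to keep them in one place and append them to every task prompt. This is a small hypothetical helper, not a built-in feature:

```python
# Reusable hallucination-guard sentences, taken from the example above.
GUARDS = [
    "Only describe what is directly visible in the image.",
    "Do not speculate about areas outside the frame or occluded by other objects.",
    "If you cannot determine the answer, say 'Unable to determine from this image.'",
]

def with_guards(prompt: str) -> str:
    """Append the standard guard sentences to a task-specific prompt."""
    return prompt.rstrip() + " " + " ".join(GUARDS)

guarded = with_guards("You are a quality control inspector for printed circuit boards.")
```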
See What are hallucination guards? below for more detail.
Why the training prompt and inference prompt must match
Datature Vi uses your system prompt as part of the training data. The model learns to follow the specific instructions, formatting, and vocabulary in that prompt. If you change the prompt at inference time, the model receives instructions it never trained on.
The effect is similar to training a translator on French-to-English and then asking them to translate Spanish. They might produce something, but the quality drops.
If you need to change the system prompt after training, retrain the model with the updated prompt. Small adjustments to wording may work, but changes to the task, format, or domain require a new training run.
What are hallucination guards?
Hallucination is when a VLM generates information that isn't present in the image. The model might describe objects that don't exist, invent counts, or assign labels based on patterns from its pre-training data rather than what's actually visible.
This happens because VLMs are trained on large datasets of image-text pairs. When the model encounters ambiguity, it fills gaps with statistically likely content. A model trained on factory images might "see" a common defect type even when the image shows no defect at all.
Hallucination guards are instructions in the system prompt that constrain the model to reporting only what it can actually observe, such as the example sentences shown above.
Hallucination guards don't eliminate hallucination entirely, but they reduce it. Combining guards with high-quality annotations and fine-tuning on your specific data gives the strongest results.
Common mistakes
Frequently asked questions
Related resources
