Manufacturing Quality Inspection
Train a VLM to detect defects and surface anomalies on product images. End-to-end walkthrough for production line quality control.
Manual visual inspection is one of the most common and most error-prone tasks in manufacturing. A line inspector checking hundreds of units per shift will miss defects, especially toward the end of a shift when fatigue sets in. Datature Vi lets you train an AI model on photos of your products so it can flag defects automatically, without writing code or hiring a data science team.
For an interactive overview of this application, visit the manufacturing inspection use case on vi.datature.com.
Why automate quality inspection?
Every production line depends on catching bad parts before they ship. Human inspectors get tired. Miss rates climb as shifts wear on. Rework and customer returns eat into margins.
Camera-based inspection with Datature Vi works differently. You take photos of your products, label the good ones and the bad ones, and train a model that learns to tell them apart. Once trained, the model checks every unit at the same standard, around the clock.
The result: fewer escapes to customers, faster feedback to the line, and inspectors freed up for the judgment calls that still need a human eye.
This guide is written for operations and engineering teams, not data scientists. All you need are product images. Budget about 30 minutes of active work to set up and launch your first training run (the same timeline as the quickstart); after that, GPU training usually takes 1–3 hours.
What you'll build
A model that takes a photo of a product and answers:
- Is there a defect? (yes/no)
- Where is it? (location on the product)
- What type of defect is it? (scratch, dent, discoloration, crack, etc.)
You can use this model to flag items for manual review, route defective parts off the line, or generate inspection logs automatically.
What you need
Product images from your line. You do not need ML experience, GPU hardware, or custom code to get started.
Step 1: Prepare your images
Before uploading, review your image set with these goals:
Lighting and angle consistency: Train on images taken the same way your production camera captures them. If your camera is mounted overhead, don't train on handheld photos.
Background standardization: A plain background (conveyor belt, inspection table) reduces noise. If your production setup has a cluttered background, include that in your training images.
Balance your dataset: Include roughly equal numbers of defective and non-defective images. If all your examples show defects, the model may over-predict them.
Diversity within defect types: If scratches appear in different locations, include examples from each location. Don't only show scratches on the left edge.
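The balance check above is easy to automate before uploading. The sketch below assumes a hypothetical folder layout with one directory of good images and one of defective images; adjust the paths and the target ratio to your own structure.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(folder):
    """Count image files in a folder (non-recursive)."""
    return sum(1 for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_EXTS)

def balance_report(good_dir, defect_dir):
    """Print class counts and warn on heavy imbalance."""
    good, defect = count_images(good_dir), count_images(defect_dir)
    total = good + defect
    ratio = defect / total if total else 0.0
    print(f"good: {good}, defective: {defect} ({ratio:.0%} defective)")
    # A roughly 40-60% split per class is a reasonable target.
    if total and not 0.4 <= ratio <= 0.6:
        print("Warning: dataset is imbalanced; add more of the minority class.")
    return good, defect

# Example: balance_report("images/good", "images/defective")
```

Run it once per dataset revision; if the warning fires, collect more images of the minority class before training.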
Step 2: Choose your task type
Datature Vi supports two main approaches for inspection:
Visual Question Answering (VQA) in plain terms: you write a question (for example "Is there a visible defect on this component?") and the model answers in text. Phrase grounding in plain terms: you write a short phrase (for example "the scratch near the logo") and the model returns a box on the image for that region.
Recommended starting point: VQA. It is easier to annotate and produces answers you can parse directly. Use phrase grounding when you need boxes for dashboards, cropping, or overlay visuals.
You can also combine both using freeform text with structured JSON output to get all fields in one response.
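If you go the structured route, it helps to fix the label schema before annotating. This sketch builds one hypothetical annotation whose field names mirror the structured output used in the inference example later in this guide; treat the exact fields as an assumption to adapt.

```python
import json

# Hypothetical annotation for one training image; field names mirror the
# structured output shown in the inference example later in this guide.
annotation = {
    "defect_found": True,
    "defect_type": "scratch",
    "location": "upper left",
    "severity": "low",
}

# Store the answer as a JSON string so the model learns to emit parseable output.
answer_text = json.dumps(annotation)
print(answer_text)

# Round-trip check: every stored label should parse back cleanly.
assert json.loads(answer_text) == annotation
```

Keeping every annotation on the same schema is what makes the later `json.loads` step reliable in production.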
Step 3: Create a dataset and annotate
- Create a dataset: choose Visual Question Answering as the type
- Upload your images
- Add annotations: for each image, write a question and answer pair
Annotation examples:
- Q: "Is there a visible defect on this component?" A: "Yes, there is a surface scratch on the upper left edge."
- Q: "Is there a visible defect on this component?" A: "No, there are no visible defects."
Tips:
- Use the same question for every image. The model learns from consistency.
- Be specific in your answers: "upper left edge" is better than "somewhere on the left"
- Use consistent terminology: pick one word for each defect type and stick with it ("scratch" not "scratch / scrape / mark")
You can use IntelliScribe to draft labels in the browser from each image (AI-assisted starting text). You should still review and edit every label so wording matches your line and safety rules.
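Terminology drift ("scratch" vs "scrape" vs "mark") is easy to miss by eye across hundreds of labels. The sketch below scans answer texts for stray synonyms; the synonym table is a hypothetical starting point you should extend with your own defect vocabulary.

```python
import re

# Map stray synonyms to the one canonical term you chose; extend this table
# for your own defect vocabulary (hypothetical starting values).
CANONICAL = {"scrape": "scratch", "mark": "scratch", "ding": "dent"}

def audit_terms(answers):
    """Return (index, found_word, canonical_word) for every non-canonical term."""
    issues = []
    for i, text in enumerate(answers):
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in CANONICAL:
                issues.append((i, word, CANONICAL[word]))
    return issues

answers = [
    "Yes, there is a surface scratch on the upper left edge.",
    "Yes, there is a scrape near the logo.",  # inconsistent wording
]
for idx, found, wanted in audit_terms(answers):
    print(f"answer {idx}: replace '{found}' with '{wanted}'")
```

Run this over your exported labels before training; a handful of corrections here is far cheaper than retraining later.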
Step 4: Configure and train
- Create a training project
- Create a workflow with these recommended settings. A workflow is the saved recipe inside your training project: system prompt, which images to train on, model choice, and training knobs. A run is one execution of that recipe on the GPU. In the workflow canvas, the main nodes are System Prompt (instructions to the model), Dataset (your images and labels), and Model (architecture and training method).
- Start the training run. Datature Vi allocates GPU hardware automatically.
Training typically takes 1–3 hours. Monitor progress via loss curves. A healthy run shows both training and validation loss decreasing steadily.
Step 5: Deploy and test
Once training completes, most teams validate on new photos with a short Python script. The Vi SDK is the Datature Vi Python package: it loads your trained weights and returns answers or boxes for each image file. Weights ship in SafeTensors (a common, safe weight format), so teams that already load transformer-style models with Hugging Face tooling can use the same files. If your IT stack standardizes on NVIDIA containers, you can also serve the same export through NVIDIA NIM (see the NIM deployment guide).
- Download your model
- Install the Vi SDK
- Run inference on new product images:
from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    secret_key=".your-secret-key.",
    organization_id="your-organization-id",
)

result, error = model(
    source="product_photo.jpg",
    user_prompt="Is there a visible defect on this component?"
)

if error is None:
    print(result.result.answer)
    # e.g. "Yes, there is a surface scratch on the upper left edge."

Get structured JSON output
For automated pipelines, use structured data extraction to get machine-readable results:
import json
from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    secret_key=".your-secret-key.",
    organization_id="your-organization-id",
)

result, error = model(
    source="product_photo.jpg",
    user_prompt="Inspect this component for defects.",
    generation_config={"temperature": 0.0, "do_sample": False}
)

if error is None:
    data = json.loads(result.result)
    # {"defect_found": true, "defect_type": "scratch", "location": "upper left", "severity": "low"}
    if data["defect_found"]:
        print(f"FAIL: {data['defect_type']} at {data['location']} (severity: {data['severity']})")
    else:
        print("PASS")

To use this pattern, train on a freeform text dataset with JSON annotations and a system prompt that specifies the schema. See the Structured Data Extraction guide for full setup instructions.
Improving accuracy
If your model misses defects or produces false positives:
Add more examples of edge cases: if the model misses subtle scratches, add 20–30 more images showing subtle scratches with clear annotations.
Use chain-of-thought reasoning: for multi-step checks (surface, then edges, then joints), ask the model to state each step in order before the final pass or fail. Letting the model show its intermediate reasoning before the verdict can reduce missed defects on complex checklists.
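If you adopt that pattern, the answer needs parsing on the way out. The sketch below assumes a hypothetical answer format (one check per line, ending with PASS or FAIL); adapt it to however you actually format your training answers.

```python
# Hypothetical parser for a chain-of-thought style answer. Assumes you trained
# the model to list its checks line by line and end with PASS or FAIL.
def parse_cot_answer(answer):
    lines = [ln.strip() for ln in answer.strip().splitlines() if ln.strip()]
    verdict = lines[-1].upper()
    if verdict not in ("PASS", "FAIL"):
        raise ValueError(f"no final verdict found: {verdict!r}")
    return lines[:-1], verdict

example = """\
Surface: one shallow scratch on the upper left edge.
Edges: no chips or burrs.
Joints: seams fully closed.
FAIL"""
steps, verdict = parse_cot_answer(example)
print(verdict)                      # FAIL
print(len(steps), "checks recorded")  # 3 checks recorded
```

Keeping the intermediate steps in your logs also makes it easier to audit why a unit failed.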
Check annotation consistency: if similar images get different answers, the model will learn inconsistent behavior. Review your annotations for disagreements.
Standardize your camera setup: if production images look different from training images (different lighting, angle, or distance), accuracy will drop. Retrain with images from your actual production camera.
