Manufacturing Quality Inspection

Train a VLM to detect defects and surface anomalies on product images. End-to-end walkthrough for production line quality control.

Manual visual inspection is one of the most common and most error-prone tasks in manufacturing. A line inspector checking hundreds of units per shift will miss defects, especially toward the end of a shift when fatigue sets in. Datature Vi lets you train an AI model on photos of your products so it can flag defects automatically, without writing code or hiring a data science team.

For an interactive overview of this application, visit the manufacturing inspection use case on vi.datature.com.


Why automate quality inspection?

Close-up of a printed circuit board used for automated visual defect inspection

Every production line depends on catching bad parts before they ship. Human inspectors get tired. Miss rates climb as shifts wear on. Rework and customer returns eat into margins.

Camera-based inspection with Datature Vi works differently. You take photos of your products, label the good ones and the bad ones, and train a model that learns to tell them apart. Once trained, the model checks every unit at the same standard, around the clock.

The result: fewer escapes to customers, faster feedback to the line, and inspectors freed up for the judgment calls that still need a human eye.

No ML experience required

This guide is written for operations and engineering teams, not data scientists. All you need are product images. Budget about 30 minutes of active work to set up and launch your first training run; after that, GPU training typically takes 1–3 hours.


What you'll build

A model that takes a photo of a product and answers:

  • Is there a defect? (yes/no)
  • Where is it? (location on the product)
  • What type of defect is it? (scratch, dent, discoloration, crack, etc.)

You can use this model to flag items for manual review, route defective parts off the line, or generate inspection logs automatically.


What you need

| Requirement | Details |
| --- | --- |
| Product images | 50–200 photos of your product (mix of good and defective units) |
| Image format | JPEG or PNG; consistent lighting and camera angle preferred |
| Defect examples | At least 20–30 images that show actual defects |
| Time | ~30 minutes to set up your first training run; GPU training typically 1–3 hours |

You do not need ML experience, GPU hardware, or custom code to get started.


Step 1: Prepare your images

Before uploading, review your image set with these goals:

Lighting and angle consistency: Train on images taken the same way your production camera captures them. If your camera is mounted overhead, don't train on handheld photos.

Background standardization: A plain background (conveyor belt, inspection table) reduces noise. If your production setup has a cluttered background, include that in your training images.

Balance your dataset: Include roughly equal numbers of defective and non-defective images. If all your examples show defects, the model may over-predict them.

Diversity within defect types: If scratches appear in different locations, include examples from each location. Don't only show scratches on the left edge.
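Before uploading, you can sanity-check the balance with a few lines of Python. This sketch assumes you have pre-sorted images into `good/` and `defective/` folders (a hypothetical layout, not a Vi requirement):

```python
from pathlib import Path

# Hypothetical layout: images pre-sorted into good/ and defective/ folders.
def dataset_balance(root: str) -> dict:
    """Count images per class and report the defective fraction."""
    exts = {".jpg", ".jpeg", ".png"}
    counts = {}
    for label in ("good", "defective"):
        folder = Path(root) / label
        counts[label] = sum(1 for p in folder.glob("*") if p.suffix.lower() in exts)
    total = sum(counts.values()) or 1  # avoid division by zero on an empty folder
    counts["defective_ratio"] = round(counts["defective"] / total, 2)
    return counts
```

A `defective_ratio` near 0.5 is a good target; whatever the split, make sure you meet the 20–30 defective-image minimum from the requirements table.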


Step 2: Choose your task type

Datature Vi supports two main approaches for inspection:

| Approach | Best for | Output |
| --- | --- | --- |
| Visual Question Answering (VQA) | Yes/no defect detection, defect classification | Text answer: "Yes, there is a scratch on the upper left edge." |
| Phrase Grounding | Pinpointing defect location visually | Bounding box around the defect area (the model draws a box where your phrase points) |
Phrase grounding in plain terms: you write a short phrase (for example "the scratch near the logo") and the model returns a box on the image for that region.

Recommended starting point: VQA. It is easier to annotate and produces answers you can parse directly. Use phrase grounding when you need boxes for dashboards, cropping, or overlay visuals.

You can also combine both using freeform text with structured JSON output to get all fields in one response.


Step 3: Create a dataset and annotate

  1. Create a dataset: choose Visual Question Answering as the type
  2. Upload your images
  3. Add annotations: for each image, write a question and answer pair

Annotation examples:

| Image | Question | Answer |
| --- | --- | --- |
| Defective unit (scratch) | Is there a visible defect on this component? | Yes, there is a surface scratch on the upper left edge. |
| Good unit | Is there a visible defect on this component? | No, the component surface appears intact with no visible defects. |
| Defective unit (dent) | Is there a visible defect on this component? | Yes, there is a dent near the center of the component face. |

Tips:

  • Use the same question for every image. The model learns from consistency.
  • Be specific in your answers: "upper left edge" is better than "somewhere on the left".
  • Use consistent terminology: pick one word for each defect type and stick with it ("scratch", not "scratch / scrape / mark").

You can use IntelliScribe to draft labels in the browser from each image (AI-assisted starting text). You should still review and edit every label so the wording matches your line's terminology and safety rules.
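If you have many images, you can pre-generate a draft annotation file and then hand-edit the defect answers. This sketch writes one question/answer record per image as JSONL; the record fields are illustrative, so adjust them to match the import format your dataset type actually expects:

```python
import json
from pathlib import Path

# Keep the question identical across every image (see tips above).
QUESTION = "Is there a visible defect on this component?"

def draft_annotations(image_dir: str, out_path: str, default_answer: str) -> int:
    """Write one {image, question, answer} JSONL record per image as a starting draft."""
    records = []
    for p in sorted(Path(image_dir).glob("*")):
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            records.append({"image": p.name, "question": QUESTION, "answer": default_answer})
    with open(out_path, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    return len(records)
```

Run it once per folder, e.g. good units with the "No, the component surface appears intact..." default, then edit the defective records individually.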


Step 4: Configure and train

  1. Create a training project
  2. Create a workflow with the recommended settings in the table below. A workflow is the saved recipe inside your training project: system prompt, which images to train on, model choice, and training knobs. A run is one execution of that recipe on the GPU. In the workflow canvas, the main nodes are System Prompt (instructions to the model), Dataset (your images and labels), and Model (architecture and training method).

| Setting | Recommended value |
| --- | --- |
| Model | Qwen3.5 4B first (faster, lower cost), then Qwen3.5 9B when you need higher accuracy |
| Fine-tuning method | LoRA (updates a small adapter instead of every weight; good default here) |
| Epochs | 50–100 for datasets under 100 images (one epoch is one full pass over the training split) |
| Validation split | 20% (default): held-back images used to score progress, not for weight updates |

  3. Start the training run. Datature Vi allocates GPU hardware automatically.

Training typically takes 1–3 hours. Monitor progress via loss curves. A healthy run shows both training and validation loss decreasing steadily.


Step 5: Deploy and test

Once training completes, most teams validate on new photos with a short Python script. The Vi SDK is the Datature Vi Python package: it loads your trained weights and returns answers or boxes for each image file. Weights ship in SafeTensors (a common, safe weight format). If your IT stack already standardizes on NVIDIA containers, you can also serve the same export through NVIDIA NIM (see the NIM deployment guide). Hugging Face compatibility means teams that already load transformer-style models can reuse their existing tooling.

  1. Download your model
  2. Install the Vi SDK
  3. Run inference on new product images:
from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id",
)

result, error = model(
    source="product_photo.jpg",
    user_prompt="Is there a visible defect on this component?"
)

if error is None:
    print(result.result.answer)
    # e.g. "Yes, there is a surface scratch on the upper left edge."
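The answer above is free text; to route parts automatically you will usually reduce it to a verdict first. A minimal sketch, where `to_verdict` is a hypothetical helper and the keyword check assumes your trained answers begin with "Yes" or "No" as in the annotation examples:

```python
def to_verdict(answer: str) -> str:
    """Map a trained VQA answer to FAIL / PASS / REVIEW from its leading yes/no."""
    words = answer.strip().lower().replace(",", " ").split()
    if not words:
        return "REVIEW"          # empty answer -> send to a human
    if words[0] == "yes":
        return "FAIL"            # defect reported
    if words[0] == "no":
        return "PASS"            # no defect reported
    return "REVIEW"              # unexpected phrasing -> send to a human
```

Feed each model answer through this before acting on it, so anything the model phrases unexpectedly falls back to manual review instead of silently passing.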

Get structured JSON output

For automated pipelines, use structured data extraction to get machine-readable results:

import json
from vi.inference import ViModel

model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id",
)

result, error = model(
    source="product_photo.jpg",
    user_prompt="Inspect this component for defects.",
    generation_config={"temperature": 0.0, "do_sample": False}
)

if error is None:
    data = json.loads(result.result)
    # {"defect_found": true, "defect_type": "scratch", "location": "upper left", "severity": "low"}

    if data["defect_found"]:
        print(f"FAIL: {data['defect_type']} at {data['location']} (severity: {data['severity']})")
    else:
        print("PASS")

To use this pattern, train on a freeform text dataset with JSON annotations and a system prompt that specifies the schema. See the Structured Data Extraction guide for full setup instructions.
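Before acting on structured output in a pipeline, it is worth validating it, since a model can occasionally emit malformed or incomplete JSON. A hedged sketch: `parse_inspection` is a hypothetical helper, and `REQUIRED_KEYS` assumes the schema shown in the example above.

```python
import json

# Assumed schema from the example output above; match it to your trained schema.
REQUIRED_KEYS = {"defect_found", "defect_type", "location", "severity"}

def parse_inspection(raw: str) -> dict:
    """Parse the model's JSON string, failing safe to manual review on bad output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"verdict": "REVIEW", "reason": "invalid JSON"}
    if not REQUIRED_KEYS <= set(data):
        return {"verdict": "REVIEW", "reason": "missing fields"}
    return {"verdict": "FAIL" if data["defect_found"] else "PASS", **data}
```

The fail-safe default matters on a production line: a parsing problem should queue the part for a human, not let it ship.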


Improving accuracy

If your model misses defects or produces false positives:

Add more examples of edge cases: if the model misses subtle scratches, add 20–30 more images showing subtle scratches with clear annotations.

Use chain-of-thought reasoning: for multi-step checks (surface, then edges, then joints), ask the model to state those steps in order before the final pass or fail. Letting the model show its intermediate reasoning before the verdict can reduce missed defects on complex checklists.

Check annotation consistency: if similar images get different answers, the model will learn inconsistent behavior. Review your annotations for disagreements.

Standardize your camera setup: if production images look different from training images (different lighting, angle, or distance), accuracy will drop. Retrain with images from your actual production camera.
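One part of the annotation-consistency check above can be automated: scanning your answer texts for mixed terminology. A small sketch, where the `SYNONYMS` groups are placeholders to extend with your own defect vocabulary:

```python
from collections import Counter

# Placeholder near-synonym groups; extend with your own defect vocabulary.
SYNONYMS = [{"scratch", "scrape", "mark"}, {"dent", "ding"}]

def terminology_report(answers: list) -> list:
    """Flag synonym groups where more than one term appears across the answers."""
    words = Counter(w.strip(".,").lower() for a in answers for w in a.split())
    flagged = []
    for group in SYNONYMS:
        used = {t for t in group if words[t] > 0}
        if len(used) > 1:
            flagged.append(used)  # mixed terms -> pick one and relabel
    return flagged
```

If a group is flagged, pick one canonical term, update the offending annotations, and retrain.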


Next steps

Structured Data Extraction

Get JSON output from inspections (defect type, location, severity) ready for your database.

Chain-of-Thought Reasoning

Improve accuracy on complex multi-step inspection tasks.

Phrase Grounding

Draw bounding boxes around defect locations for visual inspection dashboards.