CCTV Intelligence

Train a VLM to detect unauthorized access, tailgating, loitering, and perimeter breaches from security camera feeds.

[Image: Overhead security camera view of a shopping mall interior with pedestrians]

Security teams monitor dozens of camera feeds at once. A single operator watching 16 screens will miss events, especially during long shifts. Most incidents are caught after the fact, when someone reviews footage following a complaint or report.

Datature Vi trains a model on your facility's own camera feeds. You label frames showing normal activity and frames showing events of interest (unauthorized entry, tailgating through a secured door, loitering in restricted areas). The model learns the patterns specific to your layout and alerts in real time when it spots something unusual.

This is not a replacement for security staff. It is a tool that watches every feed simultaneously so your team can respond faster to the events that matter.

For an interactive overview of this application, visit the CCTV intelligence use case on vi.datature.com.


Common applications

| Task | What the model does |
| --- | --- |
| Unauthorized access | Detects people entering restricted areas without authorization |
| Tailgating | Flags multiple people passing through a secured door on one badge swipe |
| Loitering | Identifies individuals remaining in an area longer than expected |
| Perimeter breach | Detects people crossing fence lines or entering after hours |
| Natural language querying | Answers questions about camera feeds: "Is anyone in the server room?" |
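
Loitering depends on how long someone stays in a zone, which a single frame cannot show. One possible way to handle this downstream of the model is to track how long consecutive frames report the zone as occupied. The sketch below is a minimal illustration in plain Python: the `DwellTracker` class, the 60-second threshold, and the per-frame boolean input are all assumptions for illustration and are not part of Datature Vi. It also does not distinguish between individuals, so it measures zone occupancy rather than one person's dwell time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellTracker:
    """Tracks how long a zone has been continuously occupied (hypothetical helper)."""

    threshold_s: float = 60.0        # assumed loitering threshold, tune per zone
    _since: Optional[float] = None   # timestamp of the first occupied frame

    def update(self, timestamp_s: float, occupied: bool) -> bool:
        """Feed one frame's result; return True once occupancy reaches the threshold."""
        if not occupied:
            self._since = None       # zone cleared, reset the timer
            return False
        if self._since is None:
            self._since = timestamp_s
        return timestamp_s - self._since >= self.threshold_s


tracker = DwellTracker(threshold_s=60.0)
for ts, occupied in [(0.0, True), (30.0, True), (60.0, True)]:
    alert = tracker.update(ts, occupied)
print(alert)  # True: the zone has been occupied for 60 seconds
```

The `occupied` flag would come from whatever per-frame check you run, for example a yes/no answer from the model.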

Choose your task type

| Approach | Best for | Output |
| --- | --- | --- |
| Visual Question Answering (VQA) | Yes/no security checks, scene state queries | Text answer: "Yes, there is one person in the server room." |
| Phrase Grounding | Locating the person or activity of concern | Bounding box around the individual or area |
| Freeform Text (JSON) | Structured security event reports | JSON: {"event": "tailgating", "door": "east_entrance", "count": 2} |
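
If you choose the Freeform Text (JSON) approach, the model's output is still text, so it is worth validating before passing it to another system. The following is a minimal sketch using only the Python standard library. The `SecurityEvent` class and `parse_event` function are hypothetical names, and the three fields are taken from the example output above; your own schema may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class SecurityEvent:
    event: str
    door: str
    count: int


def parse_event(raw: str) -> SecurityEvent:
    """Parse a JSON event report; raise ValueError on malformed JSON or missing fields."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"event", "door", "count"} - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return SecurityEvent(str(data["event"]), str(data["door"]), int(data["count"]))


report = parse_event('{"event": "tailgating", "door": "east_entrance", "count": 2}')
print(report.event, report.door, report.count)
```

Rejecting malformed output early keeps a bad generation from turning into a bad alert.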

Annotation examples

| Image | Question | Answer |
| --- | --- | --- |
| Normal lobby traffic | Is there any unusual activity in this frame? | No. Normal foot traffic in the lobby area. |
| Person in server room | Is anyone present in the server room? | Yes. One person is standing near the server racks on the left side of the room. |
| Tailgating at door | Is someone tailgating through this door? | Yes. Two people are passing through the door, but only one badge scan was recorded. |

Tips:

  • Train on frames from each camera angle separately, as perspectives vary between cameras
  • Include frames from different times of day (daytime, nighttime, low-light conditions)
  • Label what "normal" looks like for each zone so the model learns context-specific baselines
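
The tips above come down to checking that every camera and lighting condition is represented in your labeled set. A small script can flag the gaps before you train. This is a sketch under assumptions: the record layout and the `camera` and `lighting` field names are invented for illustration and do not reflect Datature Vi's annotation format.

```python
from collections import Counter
from itertools import product

# Hypothetical annotation records; the field names are assumptions for illustration.
annotations = [
    {"camera": "lobby", "lighting": "day", "answer": "No. Normal foot traffic."},
    {"camera": "lobby", "lighting": "night", "answer": "No. Normal foot traffic."},
    {"camera": "server_room", "lighting": "day", "answer": "Yes. One person is present."},
]


def coverage_gaps(records, cameras, lightings, min_count=1):
    """Return (camera, lighting) pairs with fewer than min_count labeled frames."""
    counts = Counter((r["camera"], r["lighting"]) for r in records)
    return [pair for pair in product(cameras, lightings) if counts[pair] < min_count]


gaps = coverage_gaps(annotations, ["lobby", "server_room"], ["day", "night"])
print(gaps)  # [('server_room', 'night')]
```

Raising `min_count` turns the same check into a minimum-examples rule per camera and condition.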

Deploy and test

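from vi.inference import ViModel

# Load the trained model; replace the placeholders with your own credentials.
model = ViModel(
    run_id="your-run-id",
    secret_key="your-secret-key",
    organization_id="your-organization-id",
)

# Ask a question about a single camera frame.
result, error = model(
    source="camera_frame.jpg",
    user_prompt="Is anyone present in the server room?",
)

# The call returns a (result, error) pair; check the error before reading the answer.
if error is None:
    print(result.result.answer)
else:
    print(f"Inference failed: {error}")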
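
In practice you will want to ask the same question about frames from several cameras. The sketch below shows one way to loop over frames while tolerating per-frame failures. It assumes only what the snippet above shows: a callable that accepts `source` and `user_prompt` and returns a `(result, error)` pair with the answer at `result.result.answer`. The `check_frames` helper and the `fake_model` stand-in are hypothetical and exist so the example runs without credentials; pass your real `model` in place of `fake_model`.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Iterable


def check_frames(ask: Callable, frames: Iterable[str], prompt: str) -> Dict[str, str]:
    """Ask the same question about each frame; return answers for frames that succeeded."""
    answers = {}
    for frame in frames:
        result, error = ask(source=frame, user_prompt=prompt)
        if error is not None:
            # Report the failure and keep going so one bad feed does not stop the rest.
            print(f"{frame}: inference failed: {error}")
            continue
        answers[frame] = result.result.answer
    return answers


# Stand-in for the real model so the sketch runs offline (an assumption, not the SDK).
def fake_model(source: str, user_prompt: str):
    if source == "missing.jpg":
        return None, "file not found"
    answer = "Yes." if "server" in source else "No."
    return SimpleNamespace(result=SimpleNamespace(answer=answer)), None


answers = check_frames(
    fake_model,
    ["server_cam.jpg", "lobby_cam.jpg", "missing.jpg"],
    "Is anyone present in the server room?",
)
print(answers)
```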

Training tips

Train per camera or zone: each camera has a unique perspective and baseline. A model trained on lobby footage will not generalize well to parking lot footage. Train separate models or include examples from each zone.

Include environmental variation: weather, lighting, and seasonal changes affect outdoor cameras. Indoor cameras change less, but include examples with different occupancy levels.

Define "normal" clearly: the model learns from your labels. If you label a frame as "no unusual activity," make sure similar-looking frames are labeled consistently. Ambiguous labeling degrades performance.

Test on real shift footage: validate the model on footage from actual shifts, not curated test images. Real-world camera feeds include motion blur, compression artifacts, and partial occlusion.
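
When validating on shift footage, it helps to reduce the results to two numbers: precision (how many alerts were real events) and recall (how many real events were caught). The sketch below computes both from per-frame flags. The function name and the sample lists are illustrative assumptions, not measured results.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for per-frame event flags (True = event)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # alerts that were real events
    fp = sum(1 for p, a in pairs if p and not a)    # false alarms
    fn = sum(1 for p, a in pairs if a and not p)    # missed events
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative flags only: model output vs. a reviewer's labels for five frames.
predicted = [True, True, False, False, True]
actual = [True, False, False, True, True]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.67 recall=0.67
```

For security monitoring, low recall means missed incidents and low precision means operators start ignoring alerts, so track both per camera or zone.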


Next steps

Structured Data Extraction

Return structured JSON event reports for integration with security management systems.

Phrase Grounding

Highlight people or areas of concern in camera frames.

Chain-of-Thought Reasoning

Multi-step security assessment: scene state, then threat level, then recommended action.