How Do I Deploy My Trained Model?
Learn what VLM deployment means and compare three paths in Datature Vi: local inference with the Vi SDK, NVIDIA NIM containers, and self-hosted servers.
Deployment is the step where your trained model starts processing real images outside of Datature Vi. You download the model weights, load them into an inference environment, and send images with prompts to get predictions. Datature Vi supports three deployment paths: local inference with the Vi SDK, NVIDIA NIM containers for production, and self-hosted servers using frameworks like vLLM. This page explains when to use each and what you need to get started.
What does deployment mean for a VLM?
Training teaches your model to recognize patterns. Deployment puts that model to work. During deployment, you:
- Download the trained model weights from Datature Vi
- Load those weights into an inference runtime (Vi SDK, NIM container, or your own server)
- Send new images with prompts and receive predictions
The model runs on a GPU; quantized versions can run on smaller GPUs. It uses the same system prompt and generation settings you configured during training. Changing the system prompt at inference time can degrade the model's behavior; see system prompt consistency for details.
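The "send images with prompts" step usually means base64-encoding the image so it can travel in an HTTP request body. A minimal stdlib sketch (the file name and bytes below are placeholders, not a real image):

```python
import base64
from pathlib import Path

def encode_image(path: str) -> str:
    """Read an image file and return its base64 string for an HTTP request body."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")

# Placeholder file standing in for a real image (these are just the PNG magic bytes).
Path("sample.png").write_bytes(b"\x89PNG\r\n\x1a\n")
print(encode_image("sample.png"))  # → iVBORw0KGgo=
```

The same helper works for any of the three deployment paths, since all of them ultimately receive image bytes plus a text prompt.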
Three deployment paths
Vi SDK (local inference)
The Vi SDK downloads your model weights and runs inference on your local machine. You write a few lines of Python, pass in an image and a prompt, and get results back. This is the fastest way to test a trained model.
When to use it: During development, for testing on sample images, or for applications where a single machine handles all requests. Good for prototyping before committing to a production setup.
Limitations: Tied to one machine. No built-in load balancing, scaling, or API endpoint. You manage the GPU yourself.
NVIDIA NIM containers
NIM (NVIDIA Inference Microservice) packages your model into a Docker container with GPU acceleration and an OpenAI-compatible API endpoint. You deploy the container on any server with an NVIDIA GPU and send requests over HTTP.
When to use it: For production applications where multiple users or services need to call the model. NIM handles batching, GPU memory management, and provides a standard API that your backend code can call.
Limitations: Requires Docker and NVIDIA GPU drivers on the host. Container images are large (several GB).
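Because NIM exposes an OpenAI-compatible endpoint, requests follow the standard chat-completions schema. A minimal sketch that builds (but does not send) such a request; the endpoint URL, model name, prompt, and image bytes are placeholders:

```python
import base64
import json
import urllib.request

# Placeholder values -- substitute your NIM host, model name, and image.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
image_b64 = base64.b64encode(b"<image bytes>").decode("ascii")

payload = {
    "model": "my-finetuned-vlm",  # hypothetical model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the defects in this image."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
    "max_tokens": 256,
}

# Build the HTTP request; with a NIM container running, send it with
# urllib.request.urlopen(req) and read choices[0].message.content.
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
```

Any OpenAI-compatible client library can replace the raw `urllib` call; only the base URL changes.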
Self-hosted (vLLM, TGI)
For teams that want full control, you can load the downloaded model weights into inference frameworks like vLLM or Hugging Face Text Generation Inference (TGI). These frameworks provide optimized serving with features like continuous batching and speculative decoding.
When to use it: When you have specific infrastructure requirements, need custom optimization, or are already running vLLM/TGI for other models.
Limitations: You manage everything: GPU allocation, model loading, API routing, scaling, and monitoring. Requires ML infrastructure experience.
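As a sketch, assuming the weights were downloaded to a local directory (the path below is a placeholder), vLLM can serve them behind the same OpenAI-compatible API that NIM exposes:

```shell
# Serve downloaded weights with vLLM's OpenAI-compatible server.
# ./my-vi-model is a placeholder for your downloaded weights directory.
vllm serve ./my-vi-model --port 8000

# Older vLLM releases use the module entry point instead:
# python -m vllm.entrypoints.openai.api_server --model ./my-vi-model --port 8000
```

Once the server is up, client code written against the NIM endpoint works unchanged apart from the base URL.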
Which path should I choose?
Start with the Vi SDK
Every deployment starts here. Download your model, run inference on a few test images, and verify the outputs match your expectations. This validates your model before you invest in production infrastructure.
Move to NIM for production
When you need to serve predictions to an application, other team members, or customer-facing features, package the model into a NIM container. The OpenAI-compatible API makes integration with backend services straightforward.
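Backend code consuming the endpoint can rely on the standard chat-completions response shape. A sketch using a hard-coded sample body in that format (the field values are illustrative, not real output):

```python
import json

# Illustrative response body in the OpenAI chat-completions format
# returned by an OpenAI-compatible endpoint (values are made up).
raw = json.dumps({
    "choices": [
        {"message": {"role": "assistant",
                     "content": "Two scratches near the top edge."}}
    ],
    "usage": {"prompt_tokens": 812, "completion_tokens": 11},
})

# The prediction text lives at choices[0].message.content.
response = json.loads(raw)
prediction = response["choices"][0]["message"]["content"]
print(prediction)  # → Two scratches near the top edge.
```

Keeping to this standard shape is what makes swapping between NIM and a self-hosted vLLM/TGI server a one-line base-URL change.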
Self-host for custom needs
If NIM doesn't meet your infrastructure requirements (custom GPU clusters, specific latency SLAs, multi-model serving), use vLLM or TGI with the downloaded weights.
What you need before deploying
Before downloading and deploying your model, confirm you have the project role and secret key that export and inference require.
Datature Vi does not add a separate "release approval" button after training: export and inference access follow project roles and secret keys. Many teams assign an accountable approver in their own runbook; see Roles and RACI checklist for a template.
