Start a Training Run

Launch training and monitor your VLM's progress in real time.

📍

Step 3 of 3: Start a Training Run

Part of the training workflow quickstart. Next: Deploy and test.

Launch training with your configured workflow and monitor your VLM's learning progress.

⏱️ Time: ~2 minutes to start (training runs 1-3 hours)

📋

Prerequisites

A training workflow configured in the previous step.

Learn about training workflows →


Start your training run

From your workflow canvas, click Run Training to open the training configuration dialog.

Run Training dialog

You'll configure four steps before starting training.


1. Advanced Settings

Configure checkpoint and evaluation settings.

Advanced Settings configuration

Checkpoint Strategy — Set how often evaluation checkpoints are saved during training

Advanced Evaluation — Enable to view advanced evaluation metrics and previews during training
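If checkpointing is new to you, the sketch below illustrates the general idea behind a checkpoint strategy: evaluate the model at a fixed interval, save its state each time, and keep track of the best-scoring checkpoint. This is a generic, hypothetical Python illustration of the concept, not Vi's internal implementation; `train_one_epoch` and `evaluate` are placeholder stubs.

```python
import random

# Generic sketch of a checkpoint strategy (not Vi's internal logic):
# evaluate every N epochs, record a checkpoint, and remember the best one.

NUM_EPOCHS = 10
CHECKPOINT_EVERY = 2          # the "checkpoint frequency" setting

def train_one_epoch(epoch):
    pass                       # placeholder for a real training step

def evaluate(epoch):
    return random.random()     # placeholder for a real validation metric

checkpoints = []
best = {"epoch": None, "score": float("-inf")}

for epoch in range(1, NUM_EPOCHS + 1):
    train_one_epoch(epoch)
    if epoch % CHECKPOINT_EVERY == 0:
        score = evaluate(epoch)
        checkpoints.append({"epoch": epoch, "score": score})
        if score > best["score"]:          # keep the best-scoring checkpoint
            best = {"epoch": epoch, "score": score}

print(f"Saved {len(checkpoints)} checkpoints; best was epoch {best['epoch']}")
```

A lower checkpoint frequency saves time and storage; a higher one gives you more recovery points and finer-grained evaluation previews.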

📘

For quickstart

Keep default settings. You can adjust these in future training runs for more control.

Click Next to continue.


2. Hardware Configuration

Select your GPU type and quantity for training.


Choose between:

  • Vi Cloud — Train on Vi's GPU infrastructure (recommended for quickstart)
  • Custom Runner — Train on your own infrastructure (coming soon)

GPU Type — Select from available GPU models

Number of GPUs — Choose quantity (1, 2, 4, or more depending on GPU type)

📘

For quickstart

Start with 1× NVIDIA T4 GPU. Larger models and batch sizes require more GPU memory (VRAM).

View complete GPU list and pricing →
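To make the VRAM note above concrete, here is a rough back-of-envelope estimate of weight memory alone. It is illustrative only: real VRAM use also includes activations, gradients, and optimizer state, which grow with batch size. The parameter counts are examples, not specific Vi models.

```python
# Rough weight-memory estimate: parameters × bytes per parameter.
# Illustrative only; activations, gradients, and optimizer state add more,
# and that overhead scales with batch size.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for the model weights alone (fp16 ≈ 2 bytes/param)."""
    return num_params * bytes_per_param / 1024**3

for params in (2e9, 7e9):          # e.g. a 2B- vs 7B-parameter model
    print(f"{params / 1e9:.0f}B params ≈ {weight_memory_gb(params):.1f} GB of weights")
```

For reference, an NVIDIA T4 has 16 GB of VRAM, so larger models or batch sizes quickly call for a bigger GPU or more of them.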

Click Next to continue.


3. Dataset Validation

The system validates your dataset to ensure it's ready for training.


Wait for validation to complete. If issues are found, review and fix them before proceeding.
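Validation itself runs on the platform, but if you want to sanity-check a dataset locally before uploading, a quick script along these lines can catch the most common problems. This is a purely illustrative sketch, not the platform's validator, and it assumes a COCO-style `annotations.json` next to an `images/` folder; adapt it to your actual annotation format.

```python
import json
from pathlib import Path

# Local pre-upload sanity check (illustrative only, not the platform's validator).
# Assumes a COCO-style annotations.json and an images/ directory.

data = json.loads(Path("annotations.json").read_text())
image_dir = Path("images")

images_by_id = {img["id"]: img for img in data["images"]}
problems = []

for img in data["images"]:
    if not (image_dir / img["file_name"]).exists():
        problems.append(f"missing file: {img['file_name']}")

for ann in data["annotations"]:
    if ann["image_id"] not in images_by_id:
        problems.append(f"annotation {ann['id']} points to an unknown image")
    x, y, w, h = ann["bbox"]
    if w <= 0 or h <= 0:
        problems.append(f"annotation {ann['id']} has an empty bounding box")

print("\n".join(problems) or "No obvious issues found")
```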

Click Next when validation shows "Ready for Training".


4. Review Summary

Review your complete training configuration.

Training Summary

The summary shows:

  • System Prompt character count
  • Model architecture
  • Batch size
  • Training epochs
  • Usage Multiplier (Compute Credit consumption rate)

Click Run Training to start.


Monitor training progress

Once started, you can monitor your training run in real time. Training typically takes a few hours, depending on dataset size, model architecture, number of epochs, and GPU type.

📘

Training runs in the background

You can safely close the browser. Training continues, and you'll be notified when complete.

Learn about monitoring training →


Common questions

Can I stop training early?

Yes! You can cancel a run from the training dashboard. Saved checkpoints are preserved.

Learn about managing runs →

What if training fails?

Check the training logs for error details. Common issues:

  • Insufficient GPU memory — Reduce the batch size or use a larger GPU
  • Dataset errors — Verify that your annotations are correct
  • Configuration issues — Review your workflow settings

How do I know if my model is good?

After training:

  • Check validation metrics
  • Compare with baseline performance
  • Test on unseen data
  • Evaluate model predictions visually
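For the first two checks in that list, a simple comparison on the same held-out examples is often enough to tell whether fine-tuning helped. Here is a minimal, generic sketch; the label lists are placeholders for your own ground truth and model predictions, and accuracy stands in for whatever metric fits your task.

```python
# Compare a fine-tuned model against a baseline on the same held-out examples.
# The lists below are placeholders for your own ground truth and predictions.

ground_truth    = ["cat", "dog", "dog", "car", "cat", "car"]
baseline_preds  = ["cat", "cat", "dog", "car", "dog", "car"]
finetuned_preds = ["cat", "dog", "dog", "car", "cat", "dog"]

def accuracy(preds, truth):
    return sum(p == t for p, t in zip(preds, truth)) / len(truth)

base  = accuracy(baseline_preds, ground_truth)
tuned = accuracy(finetuned_preds, ground_truth)
print(f"baseline: {base:.2f}, fine-tuned: {tuned:.2f}, delta: {tuned - base:+.2f}")
```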

Learn about evaluation →

Can I run multiple trainings at once?

Yes! You can start multiple runs in parallel. Each run uses separate GPU resources and consumes Compute Credits independently.

How much will training cost?

Training costs depend on GPU type and duration. Each GPU type has a usage multiplier that determines Compute Credit consumption per minute.

Example: 1× NVIDIA T4 for 60 minutes = 60 Compute Credits
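In other words, credits scale with the usage multiplier, the number of GPUs, and how long the run lasts. A small sketch of that arithmetic, assuming the multiplier applies per GPU per minute; the T4 multiplier of 1 follows from the example above, while other values are illustrative, so check the pricing page linked below for the real ones.

```python
# Compute Credits ≈ usage multiplier × number of GPUs × minutes of training.
# Multiplier values here are illustrative; see the pricing page for real ones.

def training_credits(multiplier: float, num_gpus: int, minutes: float) -> float:
    return multiplier * num_gpus * minutes

print(training_credits(multiplier=1, num_gpus=1, minutes=60))   # 60 credits (T4 example above)
print(training_credits(multiplier=1, num_gpus=4, minutes=90))   # a longer multi-GPU run
```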

View GPU pricing and usage multipliers →


After training completes

Training complete!

Your model is trained and ready for evaluation and deployment.

Once training finishes, you can:

  1. Evaluate performance — Review metrics and test set results
  2. Compare runs — Analyze different configurations
  3. Download the model — Export for deployment
  4. Start new runs — Experiment with different settings

What's next?

Download your trained model and test it with new images to see how it performs.


Related resources