Evaluate a Model

Review training results, analyze metrics, and assess model performance across different checkpoints.

After training completes, evaluate your model's performance using comprehensive metrics, visual predictions, and detailed logs. Evaluation helps you understand how well your VLM learned from the training data and how it performs on validation examples.

📋 Prerequisites

Before evaluating a model, ensure you have at least one training run in your project, ideally one that has finished successfully.


Access training results

Training results are organized by run in the Runs section of your training project.

Navigate to runs

  1. Open your training project from the Training section
  2. Click Runs in the left sidebar

The Runs page displays all training sessions with key information:

  • Name — Workflow name and Run Session ID
  • Status — Current run state (see all statuses)
  • Started — How long ago the run began
  • Training Time — Total duration of the training session
  • Model & GPU — Architecture and hardware configuration used

Common statuses:

  • Finished — Training completed successfully
  • 🔄 Running — Currently training
  • ⚠️ Out of Memory — GPU memory exhausted; needs troubleshooting
  • Failed — Error occurred; check logs for details
  • 🚫 Killed — Manually stopped by user
  • 💳 Out of Quota — Compute Credits depleted; refill to resume

View complete status guide →

Open a training run

Click on any run to view its detailed results. Each run provides three tabs for analyzing different aspects of performance.


Evaluation components

Each run's results are organized into three tabs:

  • Metrics — quantitative measurements such as loss curves and evaluation scores
  • Predictions — visual comparisons of model outputs against ground truth
  • Logs — step-by-step training records for troubleshooting


Quick evaluation workflow

Follow these steps to thoroughly evaluate your trained model:

1. Check run status

Verify training completed successfully or identify errors:

View all statuses →

2. Review metrics

Analyze quantitative performance measurements:

  • Loss curves — Verify the model converged successfully (see the plotting sketch after this step)
  • Evaluation metrics — Assess accuracy for your task type
  • Hyperparameters — Document configuration for reproducibility

Complete metrics guide →
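
The sketch below shows one way to run the convergence check from a local export of run metrics. The file name metrics.csv and its column names (step, train_loss, val_loss) are assumptions; substitute whatever your training run actually exports.

```python
# Minimal sketch: plot training vs. validation loss to check convergence.
# "metrics.csv" and its column names are assumptions -- adjust to your export.
import pandas as pd
import matplotlib.pyplot as plt

metrics = pd.read_csv("metrics.csv")

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(metrics["step"], metrics["train_loss"], label="train loss")
ax.plot(metrics["step"], metrics["val_loss"], label="validation loss")
ax.set_xlabel("training step")
ax.set_ylabel("loss")
ax.set_title("Convergence check: both curves should flatten out")
ax.legend()
plt.tight_layout()
plt.show()
```

A healthy run shows both curves decreasing and flattening; a validation curve that turns upward while the training curve keeps falling is the overfitting pattern described later on this page.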

3. Inspect predictions visually

Compare model outputs against ground truth:

  • View side-by-side comparisons of ground-truth and predicted annotations (see the sketch after this step)
  • Navigate through evaluation checkpoints
  • Identify systematic errors and failure patterns

Advanced evaluation guide →
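
If you prefer to script the comparison outside the UI, the sketch below renders one validation image twice, titled with its ground-truth and predicted annotations. The predictions.json layout and its fields (image, gt, pred) are assumptions about how you exported predictions, not a platform format.

```python
# Minimal sketch: side-by-side view of ground truth vs. model prediction.
# "predictions.json" and its fields ("image", "gt", "pred") are assumptions.
import json
import matplotlib.pyplot as plt
from PIL import Image

with open("predictions.json") as f:
    records = json.load(f)  # e.g. [{"image": path, "gt": text, "pred": text}, ...]

sample = records[0]
image = Image.open(sample["image"])

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 5))
for ax, title, text in ((left, "Ground truth", sample["gt"]),
                        (right, "Prediction", sample["pred"])):
    ax.imshow(image)
    ax.set_title(title)
    ax.set_xlabel(text, wrap=True)
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()
```

Stepping through all records rather than just the first one makes systematic errors and failure patterns much easier to spot.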

4. Analyze logs (if issues found)

Dig deeper into training behavior:

  • Review training steps and epochs
  • Identify error patterns and warnings (see the log-scanning sketch after this step)
  • Troubleshoot configuration issues

Logs and troubleshooting →
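
When something looks off, a quick scan of the exported log often narrows the problem down faster than scrolling in the browser. The file name training.log is a placeholder for whichever log file you download from the run.

```python
# Minimal sketch: scan a downloaded training log for warnings and errors.
# "training.log" is a placeholder for your run's exported log file.
import re
from collections import Counter

pattern = re.compile(r"(ERROR|WARNING|CUDA out of memory)", re.IGNORECASE)
counts = Counter()

with open("training.log") as log:
    for line_no, line in enumerate(log, start=1):
        match = pattern.search(line)
        if match:
            counts[match.group(1).upper()] += 1
            print(f"{line_no}: {line.rstrip()}")

print("\nSummary:", dict(counts))
```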


Common evaluation tasks

Compare multiple runs

To identify the best configuration:

  1. Document metrics from each run in a comparison table (see the sketch after this list)
  2. Note which hyperparameters changed between runs
  3. Compare visual predictions on the same evaluation examples
  4. Select the configuration with the best performance for your requirements

Detailed comparison strategies →
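
A small script can build that comparison table for you. The runs/<name>/metrics.json layout and the field names below are assumptions about how you saved each run's results locally.

```python
# Minimal sketch: collect per-run metrics into one comparison table.
# The runs/<name>/metrics.json layout and field names are assumptions.
import json
from pathlib import Path

import pandas as pd

rows = []
for metrics_file in sorted(Path("runs").glob("*/metrics.json")):
    data = json.loads(metrics_file.read_text())
    rows.append({
        "run": metrics_file.parent.name,
        "learning_rate": data.get("learning_rate"),
        "final_train_loss": data.get("final_train_loss"),
        "final_val_loss": data.get("final_val_loss"),
        "val_accuracy": data.get("val_accuracy"),
    })

comparison = pd.DataFrame(rows).sort_values("val_accuracy", ascending=False)
print(comparison.to_string(index=False))
```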

Understand overfitting

Signs your model is overfitting:

  • Training loss decreases but validation metrics plateau or degrade
  • Large gap between train and validation performance (quantified in the sketch after this list)
  • Model performs perfectly on training images but poorly on new examples

Complete overfitting guide →
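
One rough way to quantify that gap is to compare average train and validation loss near the end of training. The column names and the 20% threshold below are illustrative assumptions, not platform defaults.

```python
# Minimal sketch: flag a possible overfitting pattern from exported metrics.
# "metrics.csv", its columns, and the 20% threshold are assumptions.
import pandas as pd

metrics = pd.read_csv("metrics.csv")
tail = metrics.tail(50)  # average over the last ~50 logged steps

train_loss = tail["train_loss"].mean()
val_loss = tail["val_loss"].mean()
gap = (val_loss - train_loss) / max(train_loss, 1e-8)

print(f"train loss {train_loss:.4f} | val loss {val_loss:.4f} | gap {gap:+.1%}")
if gap > 0.20:
    print("Validation loss is much higher than training loss -- possible overfitting.")
elif metrics["val_loss"].iloc[-1] > metrics["val_loss"].min() * 1.05:
    print("Validation loss has drifted above its best value -- consider an earlier checkpoint.")
else:
    print("No obvious overfitting signal in these curves.")
```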

Troubleshoot failed runs

Step-by-step diagnostic process:

  1. Check run status for error type
  2. View error details in Training Progress section
  3. Review logs for complete error messages
  4. Apply fixes based on error pattern
  5. Retry with adjusted configuration

Full troubleshooting guide →

Determine if model is production-ready

Evaluation criteria:

  • ✅ Metrics meet your accuracy requirements (see the threshold-check sketch after this list)
  • ✅ Visual predictions show consistent quality
  • ✅ Model generalizes to diverse validation examples
  • ✅ False positive/negative rates are acceptable
  • ✅ Performance is stable across multiple runs

Model evaluation best practices →
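
If you release models regularly, the first criterion can be turned into a small automated gate. The metric names and thresholds below are illustrative assumptions; set them to your own requirements.

```python
# Minimal sketch: check final metrics against acceptance thresholds.
# "final_metrics.json", the metric names, and the thresholds are assumptions.
import json

REQUIREMENTS = {
    "val_accuracy": 0.90,
    "precision": 0.85,
    "recall": 0.85,
}

with open("final_metrics.json") as f:
    metrics = json.load(f)

ready = True
for name, minimum in REQUIREMENTS.items():
    value = metrics.get(name, 0.0)
    if value < minimum:
        ready = False
        print(f"NOT READY: {name}={value} is below the required {minimum}")

print("Model meets all tracked thresholds." if ready else "Model is not production-ready yet.")
```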


What's next?

If your run failed, start with the troubleshooting workflow above: check the run status, read the error details in the Training Progress section, and review the logs before retrying with an adjusted configuration. If the run finished and meets your evaluation criteria, the model is ready for production use.

