Evaluate a Model

Review training results, analyze metrics, and assess model performance across different checkpoints.

After training completes, Datature Vi gives you three tabs to assess how well your vision-language model (VLM) learned: Metrics, Advanced Evaluation, and Logs. Each tab answers a different question about your run.

Before You Start

New to Datature Vi? Learn what it does or follow the quickstart.

By the end of this guide

You will be able to review training results and assess your model's performance using loss curves, evaluation metrics, and visual predictions.

What each tab shows

Training Metrics

Loss curves, evaluation metrics (F1, IoU, BLEU, BERTScore), and hyperparameters logged during the run (a quick IoU sketch follows these tab summaries).

Advanced Evaluation

Side-by-side comparison of ground truth annotations and model predictions across training checkpoints.

Training Logs

Full training output including step-level loss values, epoch markers, and error messages for debugging.
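Of these metrics, IoU (Intersection over Union) is the easiest to sanity-check by hand. The sketch below is a generic implementation, not Datature code: it computes IoU for a pair of bounding boxes in (x_min, y_min, x_max, y_max) format, the same overlap ratio the Metrics tab reports for phrase grounding. The box values are illustrative.

```python
# Generic IoU sketch for two axis-aligned boxes in
# (x_min, y_min, x_max, y_max) format. Not Datature code;
# the platform computes this for you on the Metrics tab.

def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((10, 10, 100, 100), (50, 50, 150, 150)))  # ~0.16: weak overlap
```

An IoU of 1.0 means the predicted box exactly matches the ground truth; values near 0 mean little or no overlap.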

Access training results

Open your training project from the Training section, then click Runs in the left sidebar. Click any run with a Finished status to open its results. For a full breakdown of run statuses and the Runs page layout, see Monitor a Run.

How to evaluate a run

Work through these four steps after a run finishes:

1. Confirm the run finished

Check the status on the Runs page. A Finished status means all checkpoints saved and evaluation data is available. For other statuses, see Monitor a Run.

2. Review the metrics

Open the Metrics tab. Check that total loss decreased over time and review task-specific metrics (F1 and IoU for phrase grounding, or BLEU and BERTScore for visual question answering). See Training Metrics for interpretation guidance.
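If you prefer to verify the trend outside the UI, the sketch below checks that total loss decreased between the first and last logged steps. It assumes you have saved the run's metrics to a CSV with step and total_loss columns; the file name and column names are placeholders, not a documented Vi export format.

```python
# Hedged sketch: offline check that total loss trended downward.
# "run_metrics.csv" and its columns are hypothetical placeholders,
# not a documented Datature Vi export.
import csv

with open("run_metrics.csv") as f:  # hypothetical export
    rows = sorted(csv.DictReader(f), key=lambda r: int(r["step"]))

losses = [float(r["total_loss"]) for r in rows]
first, last = losses[0], losses[-1]
print(f"loss: {first:.4f} -> {last:.4f} ({(1 - last / first):.0%} reduction)")
if last >= first:
    print("Warning: total loss did not decrease; inspect the Metrics tab.")
```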

3. Inspect predictions visually

Open the Advanced Evaluation tab. Compare ground truth annotations against model predictions on individual validation images. Use the checkpoint slider to see how predictions changed during training. See Advanced Evaluation.
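To keep a record of what you see in this tab, you can reproduce the overlay offline. The sketch below draws a ground truth box and a predicted box on the same image with Pillow; the file names and coordinates are placeholders you would replace with your own validation data.

```python
# Hedged sketch: overlay ground truth and predicted boxes on one image,
# mirroring what the Advanced Evaluation tab renders. File names and
# box coordinates are placeholders.
from PIL import Image, ImageDraw

img = Image.open("val_0001.jpg").convert("RGB")  # placeholder image
draw = ImageDraw.Draw(img)

ground_truth = (34, 50, 210, 180)  # placeholder (x1, y1, x2, y2)
prediction = (40, 62, 225, 190)    # placeholder model output

draw.rectangle(ground_truth, outline="green", width=3)  # ground truth
draw.rectangle(prediction, outline="red", width=3)      # prediction
img.save("val_0001_overlay.jpg")
```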

4. Check logs if something looks wrong

If metrics are unexpected or the run failed, open the Logs tab for the full training output. See Training Logs for common error patterns.

Common questions

Can I compare two runs side by side?

Vi does not have a built-in run comparison view. The recommended approach is to open each run in a separate browser tab, record the key metrics (final loss, F1, BLEU) in a table, then use Advanced Evaluation on each run to compare predictions on the same validation images.
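If you track several runs this way, a small script keeps the table tidy. The sketch below builds the comparison table with pandas; the run names and metric values are placeholders to be copied from each run's Metrics tab.

```python
# Hedged sketch: hand-built run comparison table, since Vi has no
# built-in comparison view. Run names and metric values below are
# placeholders -- copy the real numbers from each run's Metrics tab.
import pandas as pd

runs = pd.DataFrame(
    [
        {"run": "run-a", "final_loss": 0.42, "f1": 0.81, "bleu": None},
        {"run": "run-b", "final_loss": 0.38, "f1": 0.84, "bleu": None},
    ]
)
# Sort so the lowest-loss run appears first
print(runs.sort_values("final_loss").to_string(index=False))
```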

Why does my model do well on training data but poorly on validation data?

This is a classic overfitting pattern: the model memorized the training examples instead of learning general patterns. Check the loss curves in Training Metrics for a widening gap between the orange (training) and blue (validation) curves. Fixes include training for fewer epochs, adding more diverse training data, or adjusting hyperparameters.
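If you want a rough numeric check rather than eyeballing the curves, the sketch below flags a widening gap between per-epoch training and validation losses. The loss values are placeholders read off the curves; Vi does not document an API for pulling them programmatically.

```python
# Hedged sketch: flag a widening train/validation gap from logged losses.
# The per-epoch values below are placeholders read off the loss curves.
train_loss = [1.20, 0.80, 0.55, 0.40, 0.30]  # placeholder, per epoch
val_loss = [1.25, 0.95, 0.85, 0.88, 0.95]    # placeholder, per epoch

gaps = [v - t for t, v in zip(train_loss, val_loss)]
best_epoch = val_loss.index(min(val_loss)) + 1
if val_loss[-1] > min(val_loss) and gaps[-1] > gaps[0]:
    print(f"Likely overfitting: gap grew from {gaps[0]:.2f} to {gaps[-1]:.2f}, "
          f"and validation loss rose after epoch {best_epoch}.")
```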

My run failed. How do I find out why?

Open the Logs tab and scroll to the end of the output. The last few lines usually contain the error message. Common causes include GPU out-of-memory errors (reduce batch size) and dataset annotation format issues. See Training Logs for a full troubleshooting walkthrough.
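For long logs, scanning the tail with a script is faster than scrolling. The sketch below checks the last lines of a locally saved copy of the log output for common failure signatures; the file path and patterns are illustrative, not an exhaustive list.

```python
# Hedged sketch: scan the tail of a saved training log for common
# failure signatures. "training.log" is a placeholder path -- copy the
# Logs tab output into a local file first. Patterns are illustrative.
PATTERNS = {
    "CUDA out of memory": "GPU out of memory -- try a smaller batch size",
    "Traceback": "Python exception -- read the full traceback around it",
    "KeyError": "possible annotation format issue in the dataset",
}

with open("training.log") as f:
    tail = f.readlines()[-50:]  # errors usually sit in the last lines

for line in tail:
    for pattern, hint in PATTERNS.items():
        if pattern in line:
            print(f"{pattern!r} found: {hint}\n  {line.strip()}")
```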

Next steps

View Training Metrics

Interpret loss curves and task-specific evaluation metrics for your training run.

Inspect Predictions

Compare ground truth and model predictions visually across checkpoints.

Read Training Logs

Debug failed runs and trace errors with the full training log output.