Model Settings
Configure training mode, hyperparameters, and inference settings for your VLM in Datature Vi.
Prerequisites
Before configuring model settings, make sure you have:
- A workflow open in the workflow canvas with an architecture selected
- A configured dataset, so you understand your data size and training requirements
- Basic familiarity with your GPU resources (resource usage)
Model settings control three things: how the model trains (training mode and quantization), how it learns (hyperparameters), and how it generates output at inference (evaluation settings). You access all of them by clicking the Model node in the workflow canvas.
Model options
Architecture size
The number of parameters in the model, measured in billions (B). Larger models have more capacity to learn complex patterns but require more GPU memory and training time.
Start with a smaller model to validate your approach, then scale up if you need higher accuracy.
Training mode
Training mode determines which parameters are updated during training.
LoRA (Low-Rank Adaptation) inserts small trainable adapter layers into a frozen base model. Only the adapter weights are updated. This requires 3–5× less memory and trains 2–3× faster than full fine-tuning. The accuracy difference is small for most tasks.
Full fine-tuning (SFT) updates every parameter in the network. This requires more GPU memory and training time but gives maximum flexibility for tasks that differ substantially from the base model's training data.
Start with LoRA. Switch to full fine-tuning only if LoRA results fall short of your accuracy target and you have the GPU resources to support it.
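To see why LoRA is so much cheaper, here is a rough back-of-envelope sketch. The layer dimensions, adapter rank, and parameter count below are hypothetical illustrations, not Datature Vi defaults:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in one LoRA adapter pair: a (d_in x rank) and a (rank x d_out) matrix."""
    return d_in * rank + rank * d_out

# Hypothetical 2B-parameter model with 32 transformer layers, adapting the
# four attention projections (q, k, v, o), each 2048x2048, at rank 16.
full_params = 2_000_000_000
adapter_params = 32 * 4 * lora_trainable_params(2048, 2048, rank=16)

print(f"LoRA trains {adapter_params:,} params "
      f"({adapter_params / full_params:.3%} of full fine-tuning)")
```

Because only the adapter weights need gradients and optimizer state, the memory and speed savings follow directly from this tiny trainable fraction.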
Quantization
Quantization reduces the precision of model weights to save GPU memory. Both available 4-bit formats cut weight memory by approximately 4× relative to 16-bit weights:
- NF4 (Normalized Float 4): Optimized for transformer models. Better quality preservation than FP4 for VLMs. This is the recommended default.
- FP4 (4-bit Floating Point): Standard 4-bit format. Use if NF4 causes compatibility issues.
Precision type
The numeric format used for calculations during training:
- BFloat16 (recommended): Best numerical stability for large models on modern GPUs (NVIDIA Ampere+). Runs at ~2× the speed of Float32.
- Float16: Good for older GPUs without BFloat16 support. Slightly more prone to gradient instability on large models.
- Float32: Use only when debugging numerical stability issues. Slowest and uses the most memory.
Hyperparameters
Each hyperparameter control in the Model node maps 1:1 to a property under hyperparameters on the model block in the saved run.json file you get with a model download: same keys and values, no hidden renames. Use that JSON as the canonical reference when you compare UI training runs to Vi SDK flow specs.
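As a sketch of how you might read those values back programmatically, the snippet below parses a minimal, guessed run.json shape; the exact schema may differ, but the hyperparameters keys mirror the Model node controls one-to-one:

```python
import json

# Hypothetical minimal run.json; the real file may contain additional fields.
run = json.loads("""
{
  "blocks": [
    {
      "block": "model",
      "settings": {
        "hyperparameters": {
          "epochs": 10,
          "learningRate": 0.0001,
          "batchSize": 8
        }
      }
    }
  ]
}
""")

# Find the model block and print its hyperparameters, matching the UI controls.
for block in run["blocks"]:
    if "model" in block["block"]:
        print(block["settings"]["hyperparameters"])
```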
Epochs
The number of complete passes through your training data. One epoch means the model sees every training image once.
Choosing the right number depends on dataset size: smaller datasets generally need more epochs before the model converges, while larger datasets often need fewer.
Watch for overfitting: if validation performance stops improving or gets worse while training loss keeps falling, you have too many epochs. Monitor the loss curves in your training run to catch this early.
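The "validation stops improving while training loss keeps falling" signal can be checked mechanically. A minimal sketch, using a hypothetical patience-based rule (the loss values are made up for illustration):

```python
def is_overfitting(val_losses: list[float], patience: int = 3) -> bool:
    """Flag overfitting when validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    # Overfitting if every recent epoch is no better than the earlier best.
    return all(v >= best_before for v in val_losses[-patience:])

# Validation loss improves through epoch 4, then turns upward.
val = [1.10, 0.85, 0.72, 0.70, 0.74, 0.78, 0.83]
print(is_overfitting(val))  # True: no improvement in the last 3 epochs
```

The same logic underlies early stopping: once this condition trips, extra epochs only memorize the training set.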
Learning rate
Controls how much the model's parameters change with each training step. The typical range is 1e-5 to 1e-4.
- A rate that is too high causes unstable training (loss oscillates or diverges)
- A rate that is too low makes training converge slowly
Start with the default. Adjust only if training looks unstable (try lower) or converges unusually slowly (try slightly higher).
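Both failure modes are easy to see on a toy problem. This sketch runs plain gradient descent on f(x) = x², where the update is x ← x − lr·2x; the learning rates are illustrative, not recommendations for VLM training:

```python
def gradient_descent_losses(lr: float, steps: int = 20) -> list[float]:
    """Minimize f(x) = x^2 from x = 1.0; the gradient is 2x."""
    x, losses = 1.0, []
    for _ in range(steps):
        x -= lr * 2 * x
        losses.append(x * x)
    return losses

too_low = gradient_descent_losses(lr=0.01)[-1]   # still far from zero: slow convergence
well_tuned = gradient_descent_losses(lr=0.3)[-1]  # near zero after 20 steps
too_high = gradient_descent_losses(lr=1.1)[-1]    # loss grows every step: divergence

print(f"low: {too_low:.4f}  tuned: {well_tuned:.2e}  high: {too_high:.1f}")
```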
Batch size
The number of training images processed in each step before the model updates its weights. Larger batches make training faster but require more GPU memory.
- If you see out-of-memory errors, reduce batch size
- Use gradient accumulation to simulate a larger effective batch size when GPU memory is limited
Gradient accumulation steps
Accumulates gradients across multiple steps before applying a weight update. Setting this to N has a similar effect to multiplying your batch size by N, without the extra memory cost. Useful when GPU memory prevents you from using a larger batch size directly.
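A sketch of the scheduling this implies (simplified: real training loops also scale the loss by the accumulation count before backpropagation):

```python
def training_steps(num_samples: int, batch_size: int, accumulation_steps: int):
    """Yield (micro_batch_index, apply_update) pairs for one epoch."""
    num_batches = num_samples // batch_size
    for i in range(num_batches):
        # Only every `accumulation_steps`-th micro-batch triggers a weight update,
        # so the effective batch size is batch_size * accumulation_steps.
        yield i, (i + 1) % accumulation_steps == 0

updates = sum(apply for _, apply in
              training_steps(1024, batch_size=8, accumulation_steps=4))
print(updates)  # 32 weight updates, each from an effective batch of 32 samples
```

Memory stays at the micro-batch size (8 here), which is the whole point when GPU memory is the constraint.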
Optimizer
The algorithm that computes weight updates from gradients. The default optimizer is tuned for VLM fine-tuning. Change this only if you have a specific reason: optimizer choice rarely has more impact than learning rate and epoch count.
Evaluation settings
These settings control how the model generates output at inference time. They affect response length, diversity, and coherence.
Max new tokens
The maximum number of tokens the model can generate in a single response. Keep this high enough to accommodate your longest expected output. For short classification answers this can be small; for detailed descriptions it should be larger.
Temperature
Controls output randomness. Lower values (0.1–0.5) produce more consistent, deterministic responses. Higher values (0.7–1.0) produce more varied outputs. For structured outputs (JSON, fixed-format answers) keep temperature low.
Top-p (nucleus sampling)
Limits token selection to the smallest set of candidates whose cumulative probability exceeds the threshold. A value of 0.9 means the model only considers tokens that together account for 90% of the probability mass. Lower values make outputs more focused; higher values allow more variety.
Top-k results
Limits token selection to the k most likely candidates at each step. Works alongside top-p to control generation diversity.
Repetition penalty
Reduces the likelihood of the model repeating the same words or phrases. Values above 1.0 penalize repetition. If your model produces repetitive outputs, increase this slightly (try 1.1–1.3).
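Temperature, top-k, and top-p compose into one filtering pipeline over the model's next-token distribution. The sketch below is a simplified stand-in for how inference engines typically apply them, using a toy three-token vocabulary; it is illustrative, not Datature Vi's actual sampler:

```python
import math

def sample_filter(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Apply temperature, top-k, and top-p to logits; return final probabilities."""
    # Temperature: divide logits before softmax. Lower temperature sharpens.
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]

    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    # Top-k: keep only the k most likely tokens (0 disables the filter).
    keep = set(ranked[:top_k] if top_k else ranked)

    # Top-p: keep the smallest prefix of ranked tokens whose mass reaches top_p.
    cum, nucleus = 0.0, set()
    for i in ranked:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus

    masked = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(masked)
    return [p / z for p in masked]

# Low temperature plus top-k=2 concentrates mass on the two strongest tokens.
print(sample_filter([2.0, 1.0, 0.1], temperature=0.5, top_k=2, top_p=0.95))
```

Repetition penalty acts earlier in the same pipeline, dividing the logits of already-generated tokens before this filtering runs.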
Do this with the Vi SDK
```python
import vi

client = vi.Client(
    secret_key="your-secret-key",
    organization_id="your-organization-id",
)

flow = client.flows.get("your-flow-id")

blocks = []
for block in flow.spec.blocks:
    settings = dict(block.settings)
    if "model" in block.block:
        settings["trainingMode"] = {"type": "LoRA"}
        settings["quantization"] = {"enabled": True, "type": "NF4"}
        settings["compute"] = {"precisionType": "BFloat16"}
        settings["hyperparameters"] = {
            "epochs": 10,
            "learningRate": 0.0001,
            "batchSize": 8,
            "gradientAccumulation": 1,
            "optimizer": "AdamW",
        }
        settings["evaluation"] = {
            "maxNewTokens": 512,
            "topK": 50,
            "topP": 1,
            "temperature": 1,
            "repetitionPenalty": 1.05,
        }
    blocks.append({
        "block": block.block,
        "settings": settings,
        "style": block.style,
    })

client.flows.update(flow_id=flow.flow_id, spec={"blocks": blocks})
```

For more details, see the full SDK reference.
Next steps
Model Architectures
Review the seven available VLM architectures and choose the best fit for your task.
Start A Training Run
Configure training settings, select GPU hardware, and start a training run.
How Does VLM Training Work?
Learn the concepts behind fine-tuning, LoRA, and how VLMs learn from your data.
