QLoRA Training Guide
Understand how QLoRA works in Datature Vi and configure quantization settings for memory-efficient VLM fine-tuning.
Prerequisites
- A workflow open in the workflow canvas with a model architecture selected
- Familiarity with how LoRA and quantization work
- An understanding of your available GPU resources
QLoRA combines LoRA (Low-Rank Adaptation) with NF4 quantization. The base model weights are stored in 4-bit precision to save memory, while small adapter matrices train in BF16 for full gradient precision. This gives you the memory savings of both techniques at once, letting you fine-tune a 7B model on a single T4 GPU (16 GB VRAM).
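The memory saving on base weights is simple arithmetic: 4-bit storage uses a quarter of the bytes of FP16. A back-of-envelope sketch (illustrative only; real usage also includes activations, optimizer states, and framework overhead):

```python
# Base-weight VRAM for a 7B-parameter model, FP16 vs 4-bit.
params = 7e9

fp16_gb = params * 2 / 1024**3   # 2 bytes per weight
nf4_gb = params * 0.5 / 1024**3  # 4 bits = 0.5 bytes per weight

print(f"FP16 base weights: {fp16_gb:.1f} GB")  # ~13.0 GB
print(f"NF4 base weights:  {nf4_gb:.1f} GB")   # ~3.3 GB
```

This is why a 7B model's base weights fit comfortably in a T4's 16 GB, leaving room for the BF16 adapters, optimizer states, and activations.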
Datature Vi handles LoRA rank, alpha, target modules, and learning rate scheduling automatically. The one setting you control is the quantization format: NF4 or FP4.
When to use QLoRA
QLoRA is the recommended default for most Datature Vi training runs. It covers the widest range of model sizes on the most affordable hardware.
Start with QLoRA. Move to LoRA without quantization only if you measure quality degradation on your specific task. Move to full SFT only if LoRA results plateau and you have the GPU budget. See the Full SFT Training Guide for that path.
Configure quantization
In the workflow canvas, click the Model node. Under Quantization, select the format:
- NF4 (Normalized Float 4) -- the recommended default. NF4 maps its 16 possible values to match the bell-curve distribution of transformer weights, preserving more information where it matters.
- FP4 (4-bit Floating Point) -- a standard 4-bit format with uniform distribution. Use as a fallback if NF4 causes numerical stability issues with a specific model architecture.
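The intuition behind NF4 can be sketched in a few lines: place the 16 code points at evenly spaced quantiles of a standard normal so levels cluster near zero, where most transformer weights live. This is a toy illustration of the idea, not the exact NF4 code-point table used by real quantization libraries:

```python
from statistics import NormalDist

# Toy NF4-style levels: evenly spaced quantiles of a standard normal.
# Levels are dense near zero and sparse in the tails, unlike FP4's
# uniform spacing. (The real NF4 table is a fixed, slightly different set.)
nd = NormalDist()
levels = [nd.inv_cdf((i + 0.5) / 16) for i in range(16)]

def quantize(w, levels):
    """Map a weight to the nearest available 4-bit level."""
    return min(levels, key=lambda v: abs(v - w))

# Small weights land on finely spaced levels near zero...
print(quantize(0.05, levels))
# ...while outliers snap to the coarse tail levels.
print(quantize(2.5, levels))
```

Because the levels concentrate where the weight distribution does, NF4 wastes fewer of its 16 values on rarely occurring magnitudes, which is why it preserves more information than uniform FP4 on typical transformer weights.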
Both formats reduce memory by roughly 4x compared to FP16. With either enabled and LoRA selected as training mode, Datature Vi automatically configures QLoRA: base weights in 4-bit, adapter matrices in BF16.
See LoRA and Quantization for a deeper explanation of NF4 vs FP4 and when quantization affects quality.
To run standard LoRA without quantization, disable the quantization toggle in the Model node. This doubles the memory needed for base weights (FP16 instead of 4-bit) but preserves full weight precision. Only do this if you confirm through evaluation that NF4 causes measurable quality loss on your task.
How QLoRA works under the hood
Datature Vi configures LoRA internals automatically based on your selected model architecture and size. Understanding what happens behind the scenes helps you interpret training behavior and troubleshoot issues.
Adapter rank and alpha
LoRA inserts small trainable matrices (adapters) into specific model layers. The rank controls adapter capacity: how many dimensions the adapter uses to capture fine-tuning changes. Higher ranks give more capacity but use more memory. The alpha parameter scales the adapter's contribution to the final output.
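A quick parameter count shows why low ranks are so cheap. The dimensions below are illustrative (a hidden size typical of 7B models and an example rank; Datature Vi picks rank and alpha for you):

```python
# Trainable parameters for one LoRA adapter on a d x d projection
# layer: two small matrices, A (r x d) and B (d x r).
d = 4096   # example hidden size of a 7B-class model
r = 16     # example LoRA rank
alpha = 32 # example alpha; adapter output is scaled by alpha / r

full = d * d      # fine-tuning the layer's weights directly
lora = 2 * d * r  # A and B adapter matrices combined

print(f"full layer: {full:,} params")      # 16,777,216
print(f"LoRA r=16:  {lora:,} params")      # 131,072
print(f"reduction:  {full // lora}x")      # 128x
```

The adapter trains fewer than 1% of the layer's parameters, which is what keeps optimizer-state memory small even on large models.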
Datature Vi selects rank and alpha values tuned for each architecture and model size. You do not need to configure these manually.
Target modules
LoRA adapters are inserted into specific transformer layers. The default targets are the attention projection layers (q_proj, k_proj, v_proj, o_proj), which control how the model relates different parts of image and text inputs. For some architectures, Datature Vi also targets feed-forward layers (gate_proj, up_proj, down_proj) when the model size and task complexity warrant it.
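Conceptually, target-module selection is a name match against these projection suffixes. A minimal sketch with hypothetical module names (the actual selection logic inside Datature Vi is not exposed):

```python
# Match transformer submodule names against the chosen target suffixes.
ATTENTION = ("q_proj", "k_proj", "v_proj", "o_proj")
FEED_FORWARD = ("gate_proj", "up_proj", "down_proj")

modules = [
    "layers.0.self_attn.q_proj",
    "layers.0.self_attn.o_proj",
    "layers.0.mlp.gate_proj",
    "layers.0.input_layernorm",
]

targets = ATTENTION  # default; FEED_FORWARD is added for some architectures
adapted = [m for m in modules if m.endswith(targets)]
print(adapted)  # ['layers.0.self_attn.q_proj', 'layers.0.self_attn.o_proj']
```

Layers outside the target list (normalization layers here) keep their frozen base weights and receive no adapter.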
Learning rate scheduling
Datature Vi uses cosine annealing with warmup as the default learning rate schedule. The learning rate rises from near-zero during a warmup phase, then smoothly decays following a cosine curve. This gives the model time to make large updates early (when there is the most to learn) and small, precise updates later (when fine-tuning is finishing).
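The shape of that schedule can be written in a few lines. The step counts and peak learning rate below are illustrative, not the values Datature Vi selects:

```python
import math

def lr_at(step, total_steps, warmup_steps, peak_lr):
    """Cosine annealing with linear warmup (illustrative values)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear ramp to peak
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay

total, warmup, peak = 1000, 100, 2e-4
print(lr_at(0, total, warmup, peak))     # 0.0   (start of warmup)
print(lr_at(100, total, warmup, peak))   # 2e-4  (peak after warmup)
print(lr_at(1000, total, warmup, peak))  # 0.0   (fully decayed)
```

The ramp avoids destabilizing the freshly initialized adapters, and the cosine tail shrinks step sizes smoothly instead of cutting the learning rate abruptly.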
The learning rate range, warmup duration, and schedule type are set automatically based on the model architecture and training mode.
VRAM requirements
The VRAM estimates for each model size include model weights, adapter parameters, optimizer states, and activation memory with gradient checkpointing enabled. Actual usage varies by 10-15% depending on batch size and sequence length. Leave 20-30% headroom above the estimate when selecting your GPU.
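Applying that headroom guidance is a one-line calculation. The estimate below is a hypothetical number for your run, not a Datature Vi output:

```python
def min_gpu_vram(estimate_gb, headroom=0.25):
    """Smallest GPU VRAM (GB) to target, with headroom for variance."""
    return estimate_gb * (1 + headroom)

estimate_gb = 12.0                # example QLoRA VRAM estimate
print(min_gpu_vram(estimate_gb))  # 15.0 -> a 16 GB T4 fits
```

If the padded figure lands just above a common VRAM tier (16 GB, 24 GB, 40 GB), step up to the next tier rather than running at the margin.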
For a detailed VRAM estimation formula and full GPU comparison, see GPU and Compute Resources Guide.
Troubleshooting QLoRA training
Frequently asked questions
Next steps
