Screenshot to HTML
Train a VLM to convert design mockups and screenshots into production-ready HTML and CSS using your design system tokens and component patterns.
Frontend development often starts with a design mockup. A designer creates the visual layout, and a developer translates it into HTML and CSS. That translation step is time-consuming, especially when the design uses a custom design system with specific tokens, spacing, and component conventions.
Datature Vi trains a model on your own design system. You pair screenshots of UI components with the corresponding HTML and CSS code, and the model learns to generate code that matches your conventions. New mockups go in, production-ready markup comes out.
This does not replace a frontend developer. It handles the mechanical translation from visual layout to code, freeing developers to focus on interactivity, state management, and business logic.
For an interactive overview of this application, visit the screenshot to HTML use case on vi.datature.com.
Common applications
Task type: Freeform Text
This use case relies on the freeform text dataset type. Each training pair is a screenshot (image) and the corresponding HTML/CSS code (text annotation).
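The pairing described above can be sketched in a few lines of Python. This is only an illustration of how screenshots and code might be matched before upload, not part of the Vi SDK: the `collect_pairs` helper and the folder layout (a `.png` screenshot next to an `.html` file with the same name) are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def collect_pairs(root: Path) -> list[tuple[Path, str]]:
    """Pair each screenshot with the annotation text stored next to it.

    Assumed (hypothetical) layout: nav_bar.png sits beside nav_bar.html.
    """
    pairs = []
    for image in sorted(root.glob("*.png")):
        code_file = image.with_suffix(".html")
        if not code_file.exists():
            # Skip screenshots that have no matching annotation yet.
            continue
        pairs.append((image, code_file.read_text(encoding="utf-8")))
    return pairs


# Small demonstration with a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "nav_bar.png").write_bytes(b"")  # placeholder image
    (folder / "nav_bar.html").write_text("<nav></nav>", encoding="utf-8")
    (folder / "card.png").write_bytes(b"")  # screenshot without annotation
    pairs = collect_pairs(folder)

print([(image.name, code) for image, code in pairs])
```

Screenshots without a matching code file are skipped rather than paired with empty text, so an incomplete folder does not produce empty annotations.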
Annotation structure
Each image annotation contains the HTML and CSS that faithfully reproduce the screenshot:
<nav class="nav-bar">
  <a href="/" class="nav-logo">Aura Collective</a>
  <ul class="nav-links">
    <li><a href="/collections">Collections</a></li>
    <li><a href="/artisans">Artisans</a></li>
    <li><a href="/stories">Stories</a></li>
  </ul>
</nav>
<style>
  .nav-bar {
    display: flex;
    align-items: center;
    justify-content: space-between;
    padding: var(--spacing-4) var(--spacing-8);
    background: var(--color-surface);
  }
  .nav-links {
    display: flex;
    gap: var(--spacing-6);
    list-style: none;
  }
</style>
System prompt
You are a frontend code generator. Given a screenshot of a UI component or page, produce semantic HTML and CSS that reproduces the layout. Use the following design system conventions:
- CSS custom properties for colors (--color-*), spacing (--spacing-*), and typography (--font-*)
- BEM-style class naming
- Flexbox or Grid for layout
- No inline styles
Output only the HTML and CSS. Do not include JavaScript.
Deploy and test
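Generated markup can be checked against the conventions listed in the system prompt before it is used. The following is a minimal sketch, not part of the Vi SDK: `ConventionChecker` and `check_conventions` are hypothetical helpers built on Python's standard `html.parser`, and the allowed token prefixes are simply the ones named in the prompt.

```python
import re
from html.parser import HTMLParser

# Token prefixes named in the system prompt; adjust to your design system.
ALLOWED_TOKEN_PREFIXES = ("--color-", "--spacing-", "--font-")


class ConventionChecker(HTMLParser):
    """Record <script> elements and inline style attributes."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.problems.append("contains a <script> element")
        if any(name == "style" for name, _ in attrs):
            self.problems.append(f"inline style on <{tag}>")


def check_conventions(markup: str) -> list[str]:
    """Return a list of convention violations (empty if none were found)."""
    checker = ConventionChecker()
    checker.feed(markup)
    checker.close()
    problems = list(checker.problems)
    # Every CSS custom property referenced via var() should use a known prefix.
    for token in re.findall(r"var\(\s*(--[\w-]+)", markup):
        if not token.startswith(ALLOWED_TOKEN_PREFIXES):
            problems.append(f"unknown token {token}")
    return problems


good = '<nav class="nav-bar"></nav><style>.nav-bar { padding: var(--spacing-4); }</style>'
bad = (
    '<div style="color: red"><script>1</script></div>'
    "<style>.a { color: var(--brand-red); }</style>"
)
print(check_conventions(good))
print(check_conventions(bad))
```

Passing the model output from the inference call in this section through `check_conventions` returns an empty list when none of these rules are broken. The check is deliberately shallow: it does not verify BEM naming or layout choices.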
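from vi.inference import ViModel
# Load the trained model; replace the placeholders with your own credentials.
model = ViModel(
    run_id="your-run-id",
    secret_key=".your-secret-key.",
    organization_id="your-organization-id",
)
# Greedy decoding (no sampling) keeps the generated markup deterministic.
result, error = model(
    source="mockup_screenshot.png",
    user_prompt="Generate HTML and CSS for this UI.",
    generation_config={"temperature": 0.0, "do_sample": False},
)
if error is None:
    html_code = result.result
    with open("output.html", "w", encoding="utf-8") as f:
        f.write(html_code)
    print("HTML written to output.html")
else:
    # Surface the failure instead of exiting silently.
    print(f"Inference failed: {error}")
Training tips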
Use your own design system: the model should generate code that follows your team's conventions, not generic HTML. Train on screenshots paired with code that uses your actual CSS variables, class names, and component structure.
Start with components, not full pages: train on individual components first (nav bars, cards, forms, footers). Full-page generation works better once the model understands your component vocabulary.
Include multiple viewport sizes: if you want responsive output, include screenshots at mobile, tablet, and desktop widths, each paired with the responsive code.
Clean up training code: the code in your annotations should be the code you want the model to generate. Remove dead CSS, commented-out blocks, and framework boilerplate from training examples.
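The last tip can be partly automated. The sketch below is a rough heuristic, assuming annotations follow the HTML-plus-`<style>` structure shown earlier: it flags class selectors that are defined in the CSS but never used in the markup, and notes whether any HTML or CSS comments remain. `ClassCollector` and `audit_annotation` are hypothetical names introduced for this example.

```python
import re
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect class names used in markup and the text of <style> blocks."""

    def __init__(self):
        super().__init__()
        self.used_classes = set()
        self.css_chunks = []
        self.has_html_comment = False
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        for name, value in attrs:
            if name == "class" and value:
                self.used_classes.update(value.split())

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.css_chunks.append(data)

    def handle_comment(self, data):
        self.has_html_comment = True


def audit_annotation(annotation: str) -> dict:
    """Report unused class selectors and leftover comments in one annotation."""
    collector = ClassCollector()
    collector.feed(annotation)
    collector.close()
    css = "".join(collector.css_chunks)
    has_css_comment = "/*" in css
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
    # Only inspect selector text (the part before each "{"), not declarations.
    selectors = " ".join(re.findall(r"([^{}]+)\{", css))
    defined = set(re.findall(r"\.([A-Za-z_][\w-]*)", selectors))
    return {
        "unused_classes": sorted(defined - collector.used_classes),
        "has_comments": collector.has_html_comment or has_css_comment,
    }


sample = (
    '<nav class="nav-bar"><a class="nav-logo" href="/">Aura Collective</a></nav>'
    "<style>.nav-bar { display: flex; } .nav-old { color: red; } /* legacy */</style>"
)
report = audit_annotation(sample)
print(report)
```

Running a check like this over every annotation before training gives a quick list of examples that still carry dead CSS or commented-out blocks. It does not understand every CSS construct, so treat its output as a prompt for manual review.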
