High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753

October 28, 2025

52 min episode · 2 min read

Hung Bui

Topics

Productivity

AI-Generated Summary

Published Dec 31, 2025

Key Takeaways

✓ Model Size Reduction: A sub-4-billion-parameter Vietnamese language model outperformed the original 7-billion-parameter version. The team trained it for multiple passes over the same dataset and applied minor optimization adjustments, which suggests that a smaller model can beat a larger one when trained well.
✓ One-Step Diffusion: SwiftBrush replaces the typical 50-100 denoising steps of a diffusion model with a single step by distilling the multi-step teacher's knowledge into a one-step student network. It generates an image in under 0.25 seconds while keeping quality scores equal to or better than those of the teacher model.
✓ Image Editing Architecture: SwiftEdit enables one-step image editing by training an inversion network that maps an image back to noise and then applying the one-step generation model to that noise. Training uses both real data and synthetic data produced by the efficient one-step model, which keeps the loss functions intuitive.
✓ Test-Time Scaling Advantage: Small models that spend extra compute at inference time can outperform significantly larger models on specific tasks such as math, which makes them viable for on-device deployment despite the added compute cost. Limited device resources then become a reason to build efficient, specialized models and not only a constraint.
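
The test-time scaling takeaway can be illustrated with majority voting over repeated samples, one common form of inference-time scaling. This is only a sketch under stated assumptions: the summary does not say which scaling method was discussed in the episode, and `noisy_solver` is a hypothetical stand-in for a small model that answers a math question correctly 60% of the time. No real model or library API is involved.

```python
import random
from collections import Counter


def noisy_solver(correct_answer: int, accuracy: float, rng: random.Random) -> int:
    """Hypothetical stand-in for one sampled answer from a small model.

    Returns the correct answer with probability `accuracy`; otherwise
    returns a wrong answer chosen from a few nearby integers.
    """
    if rng.random() < accuracy:
        return correct_answer
    return correct_answer + rng.choice([-3, -2, -1, 1, 2, 3])


def majority_vote(correct_answer: int, accuracy: float,
                  n_samples: int, rng: random.Random) -> int:
    """Sample the solver n_samples times and return the most common answer."""
    answers = [noisy_solver(correct_answer, accuracy, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def estimate_accuracy(n_samples: int, accuracy: float = 0.6,
                      trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the voted answer matches the correct one."""
    rng = random.Random(seed)
    hits = sum(majority_vote(42, accuracy, n_samples, rng) == 42
               for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # More samples per question means more inference compute per answer.
    for n in (1, 5, 25):
        print(f"samples per question: {n:2d}  "
              f"estimated accuracy: {estimate_accuracy(n):.3f}")
```

In this toy setup, the wrong answers are spread across several values while the correct answer is the single most likely one, so voting over more samples recovers it more often. The cost grows linearly with the number of samples, which is the increased compute requirement the takeaway mentions.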

What It Covers

Hung Bui explains how VinAI Research achieved efficient on-device AI by training smaller models that match larger model performance, developing one-step diffusion for real-time image generation, and building Vietnam's top AI research lab.

Notable Moment

Vietnamese users complained that even the 7-billion parameter open-weight model was too large for their GPUs, prompting the team to halve the model size. The resulting sub-4-billion parameter version unexpectedly performed better than the original larger model.
