DiffusionGemma
A parallel text generation model from Google DeepMind, designed for fast, high-throughput AI experiences.
By replacing token-by-token generation with discrete text diffusion, DiffusionGemma generates up to 256 tokens in parallel and reaches up to 1,000 tokens per second. This makes it suited to real-time AI agents and high-throughput local deployment without compromising quality.
256 Tokens in Parallel
1,000 Tokens / Sec (Max)
26B Total Parameters
3.8B Active Parameters
⚡ Parallel Text Generation
Traditional LLMs decode autoregressively, predicting one token at a time, so each new token waits on the one before it. DiffusionGemma instead uses diffusion-based denoising to refine an entire block of text at once. By processing up to 256 tokens in parallel, it removes the sequential bottleneck and raises throughput, up to a stated maximum of 1,000 tokens per second.
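To make the contrast with one-token-at-a-time decoding concrete, here is a minimal sketch of block-wise iterative denoising. It assumes a generic masked-diffusion scheme in which every position starts masked and the most confident proposals are committed at each step. That scheme, the `toy_denoiser` stand-in, the step count, and the vocabulary size are illustrative assumptions, not DiffusionGemma's documented algorithm; only the 256-token block size comes from the text above.

```python
import random

MASK = -1          # hypothetical sentinel for a not-yet-decided position
BLOCK_SIZE = 256   # block length quoted in the text above


def toy_denoiser(block, vocab_size, rng):
    """Hypothetical stand-in for the model: one call proposes a
    (token, confidence) pair for every position in the block."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in block]


def denoise_block(num_steps=8, vocab_size=1000, seed=0):
    """Fill a fully masked block in `num_steps` parallel refinement passes.

    Returns the finished block and the number of model calls made.
    """
    rng = random.Random(seed)
    block = [MASK] * BLOCK_SIZE
    model_calls = 0
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(block) if tok == MASK]
        if not masked:
            break
        proposals = toy_denoiser(block, vocab_size, rng)
        model_calls += 1
        # Commit an even share of the remaining masked positions,
        # choosing the ones the (toy) model is most confident about.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        most_confident = sorted(
            masked, key=lambda i: proposals[i][1], reverse=True
        )[:k]
        for i in most_confident:
            block[i] = proposals[i][0]
    return block, model_calls


block, model_calls = denoise_block()
print(len(block), "tokens produced with", model_calls, "model calls")
```

In this toy setup the block is completed in 8 model calls, where a strictly autoregressive decoder would need one call per token, 256 in total. Real speedups depend on how many refinement steps the model needs and how expensive each step is.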
🧠 MoE Efficiency
Built on a Mixture-of-Experts (MoE) foundation, DiffusionGemma scales capability without sacrificing inference efficiency. The total model footprint is 25.2 billion parameters, but the design is sparse: only 3.8 billion parameters are active for computation at each step.
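The sparse design can be illustrated with a small sketch of top-k expert routing, together with the active-parameter arithmetic from the figures above. The routing function, the number of experts, the value of k, and the toy experts are all hypothetical choices for illustration; the text does not describe DiffusionGemma's actual router.

```python
import math

TOTAL_PARAMS_B = 25.2   # total footprint quoted above, in billions
ACTIVE_PARAMS_B = 3.8   # parameters active per step, in billions

# Share of the model that does work on any one step (about 15%).
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )[:k]
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Four toy "experts" (hypothetical): each just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
router_logits = [0.1, 2.0, -1.0, 2.0]

routes = top_k_route(router_logits, k=2)
output = moe_forward(1.0, experts, router_logits, k=2)
print(f"active fraction: {active_fraction:.1%}")
print("selected experts:", [i for i, _ in routes], "output:", output)
```

Only the selected experts are evaluated for a given input, so compute per step tracks the active parameter count rather than the total, while all experts still have to be held in memory.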