- Google released DiffusionGemma today, an open-weight text model that generates entire 256-token blocks at once via text diffusion, hitting 1,000+ tokens per second on an NVIDIA H100.
- This Apache 2.0-licensed model is four times faster than standard Gemma but trades quality for speed and currently lacks the drafter module needed for local inference on consumer setups.
- The 256K context model is preconfigured for just 8,192 tokens on NVIDIA NIM, blocking its use with agentic frameworks like Hermes Agent that require a 64,000-token minimum.
Google launched DiffusionGemma today, an open-weight AI model that generates text with the same kind of diffusion process used by image generators. According to Google’s announcement, the model exceeds 1,000 tokens per second on an NVIDIA H100.
Unlike autoregressive models, which write one token at a time, DiffusionGemma starts with noise and refines 256-token blocks in parallel. This approach provides bidirectional attention, allowing the beginning of a text to be influenced by its end.
Consequently, it excels at constrained tasks like code infilling and structured output. A fine-tuned version designed to solve Sudoku puzzles achieved an 80% success rate, a massive leap from the base model’s near-zero performance.
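To make the block-parallel idea concrete, here is a toy Python sketch. It is not DiffusionGemma's algorithm: it swaps in a simple masked-refinement loop, starts from a fully masked block rather than noise, and uses the known target string as a stand-in for the model's predictions. The names `toy_denoise` and `MASK` are hypothetical. What it does show is that positions are filled in arbitrary order over a few passes, rather than strictly left to right.

```python
import random

MASK = "_"  # hypothetical placeholder for a not-yet-decided position


def toy_denoise(target, steps=4, seed=0):
    """Fill a fully masked block over a fixed number of parallel passes.

    Assumption: the target string stands in for a real model's predictions.
    Returns the block's state before the first pass and after each pass.
    """
    rng = random.Random(seed)
    block = [MASK] * len(target)
    hidden = list(range(len(target)))
    rng.shuffle(hidden)  # positions are committed in arbitrary order

    history = ["".join(block)]
    for step in range(steps):
        remaining_steps = steps - step
        # Ceiling division: spread the remaining positions over the remaining passes.
        k = -(-len(hidden) // remaining_steps)
        commit, hidden = hidden[:k], hidden[k:]
        for i in commit:
            block[i] = target[i]  # every position in this pass is set together
        history.append("".join(block))
    return history


if __name__ == "__main__":
    for state in toy_denoise("def add(a, b): return a + b"):
        print(state)
```

An autoregressive decoder would need one pass per token for the same string, while this loop finishes in `steps` passes regardless of length. That is the intuition behind the speedup, though a real diffusion model does far more work in each pass.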
However, significant hurdles prevent immediate widespread adoption. Running DiffusionGemma locally requires a special drafter module that isn’t yet available in popular runtimes like mlx-lm or LM Studio.
Furthermore, the model launched on NVIDIA NIM with only 8,192 tokens of context by default. This is below the 64,000-token minimum required by frameworks like Hermes Agent, effectively blocking autonomous workflows.
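The size of that gap is easy to quantify. The snippet below is a minimal sketch using only the two figures quoted above. The constant and function names are illustrative and do not come from NVIDIA NIM or Hermes Agent.

```python
NIM_DEFAULT_CONTEXT = 8_192   # default context on NVIDIA NIM, per the article
HERMES_MIN_CONTEXT = 64_000   # minimum cited for Hermes Agent, per the article


def context_shortfall(available, required):
    """Return how many tokens of context are missing (0 if there is enough)."""
    return max(0, required - available)


shortfall = context_shortfall(NIM_DEFAULT_CONTEXT, HERMES_MIN_CONTEXT)
ratio = HERMES_MIN_CONTEXT / NIM_DEFAULT_CONTEXT

if __name__ == "__main__":
    print(f"Missing context: {shortfall} tokens")
    print(f"Required is {ratio:.1f}x the default")
```

The default falls 55,808 tokens short, so the framework needs roughly 7.8 times the context that the hosted configuration provides.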
The model is therefore aimed at developers building real-time tools on high-end NVIDIA hardware. Researchers are also intrigued by its potential for generating complex, interdependent sequences like protein structures.
Text diffusion has evolved from academic projects like LLaDA and Dream. DiffusionGemma represents the first major open release from a top-tier lab, building on the strategy of its predecessor, Gemma 4.
