Google DeepMind Launches Gemma 4: Open-Source Frontier Multimodal AI Comes to On-Device
31B model hits LMArena 1,452; 26B MoE scores 1,441 with just 4B active params — Apache 2.0 license unlocks full commercial use

- Google DeepMind launches Gemma 4 with 31B dense model scoring 1,452 and 26B MoE scoring 1,441 on LMArena.
- Full Apache 2.0 open-source release supports image, text, and audio multimodal inputs with on-device deployment.
- New architecture innovations — PLE, Shared KV Cache, Dual RoPE — improve memory efficiency and long-context handling.
Google DeepMind's On-Device Multimodal Breakthrough
Google DeepMind has officially launched the Gemma 4 family of open-source multimodal models through Hugging Face. Released under the Apache 2.0 license, the models support image, text, and audio inputs. The 31B dense model achieved an estimated LMArena score of 1,452 (text-only), while the 26B Mixture-of-Experts (MoE) model reached 1,441 with just 4 billion active parameters. The release supports all major inference engines including transformers, llama.cpp, MLX, WebGPU, and Rust.
Why Gemma 4 Matters
Gemma 4 is not merely a performance upgrade. Its significance lies in delivering frontier-level multimodal intelligence at on-device scale within the open-source ecosystem.
While previous open-source multimodal models were largely confined to image-text inputs, Gemma 4's smaller variants (E2B, E4B) also support audio — enabling real-time speech processing and text generation simultaneously on edge devices.
Support for variable aspect ratios and five configurable image token budgets (70, 140, 280, 560, 1,120) lets users fine-tune the trade-off between speed, memory, and quality. The same model family can serve use cases from mobile apps to server-side deployments.
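The budget trade-off above can be sketched as a tiny helper. The function name and the selection policy (pick the largest budget that fits a token ceiling) are illustrative, not part of any Gemma 4 API; only the five budget values come from the release notes:

```python
# Gemma 4's five configurable image token budgets (from the article).
BUDGETS = (70, 140, 280, 560, 1120)

def pick_budget(max_image_tokens: int) -> int:
    """Return the largest supported budget that fits the given ceiling.

    A larger budget spends more tokens per image (better quality, more
    memory and latency); a smaller one is faster and lighter.
    """
    eligible = [b for b in BUDGETS if b <= max_image_tokens]
    if not eligible:
        raise ValueError(f"ceiling must be at least {BUDGETS[0]} tokens")
    return max(eligible)
```

A mobile app might call `pick_budget(150)` to cap per-image cost at 140 tokens, while a server deployment could afford the full 1,120.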
Hugging Face noted that during pre-release testing, the out-of-the-box performance was so strong that finding meaningful fine-tuning examples proved difficult — a testament to the model's intrinsic quality.
What Changed from Previous Versions
| Feature | Gemma 3 | Gemma 3n | Gemma 4 | Change |
|---|---|---|---|---|
| Multimodal | Image+Text | Image+Text+Audio | Image+Text+Audio | Unchanged from 3n |
| Aspect Ratio | Fixed | Fixed | Variable | More flexible |
| Image Token Budget | Single | Single | 5 adjustable levels | Perf-efficiency balance |
| KV Cache | Standard | Standard | Shared KV Cache | Memory efficiency up |
| Embedding | Single | PLE introduced | PLE extended | Per-layer residual signal |
| LMArena Score | — | — | 1,452 (31B) / 1,441 (26B MoE) | Frontier-level achieved |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Unchanged |
Three Core Architecture Innovations
Per-Layer Embeddings (PLE): In standard transformers, each token receives a single embedding vector at input. PLE adds a second embedding table that injects a small residual signal into every decoder layer, enabling richer context-dependent representations at each depth.
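A minimal sketch of this idea, assuming the simplest reading of the description: a second embedding table holds one vector per (layer, token) pair, added as a residual at each decoder layer. All dimensions, the shared PLE width, and the `tanh` stand-in for a real decoder layer are illustrative:

```python
import numpy as np

# Toy PLE sketch: dimensions are illustrative, and the PLE width is tied
# to d_model here for simplicity (a real model might use a smaller,
# projected PLE vector).
vocab, n_layers, d_model = 100, 4, 16
rng = np.random.default_rng(0)

input_emb = rng.normal(size=(vocab, d_model))                      # standard table
per_layer_emb = rng.normal(size=(n_layers, vocab, d_model)) * 0.01  # PLE table

def forward(token_ids: np.ndarray) -> np.ndarray:
    h = input_emb[token_ids]                      # (seq, d_model)
    for layer in range(n_layers):
        h = h + per_layer_emb[layer, token_ids]   # PLE residual injection
        h = np.tanh(h)                            # stand-in for the decoder layer
    return h
```

The point is that each depth sees a token-specific signal of its own, rather than relying solely on the single input embedding propagated through the residual stream.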
Shared KV Cache: The last N layers reuse key-value states from earlier layers, eliminating redundant KV projections and reducing both memory usage and inference latency.
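One way to express that layer mapping, assuming the shared layers all read KV states from the last layer that still computes its own projections (the exact mapping in Gemma 4 is not specified here, so the layer counts are placeholders):

```python
def kv_source_layer(layer: int, n_layers: int = 12, n_shared: int = 4) -> int:
    """Return the layer whose key/value states `layer` reads from.

    The last `n_shared` layers skip their own KV projections and reuse
    the KV cache of the final non-shared layer, saving both the
    projection compute and the per-layer cache memory.
    """
    last_own = n_layers - n_shared - 1  # last layer with its own KV projection
    return layer if layer <= last_own else last_own
```

With 12 layers and 4 shared, layers 8 through 11 all read layer 7's cache, so only 8 of 12 layers store KV states.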
Dual RoPE Configuration: Standard RoPE is applied to sliding-window attention layers while pruned RoPE handles global attention layers, enabling efficient long-context processing. Small dense models use 512-token windows; larger models use 1,024.
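A sketch of how such a per-layer configuration might look. The local-to-global ratio (every sixth layer global) is an assumption for illustration; only the window sizes (512 tokens for small dense models, 1,024 for larger ones) come from the text:

```python
def layer_config(layer: int, small_model: bool = True) -> dict:
    """Illustrative dual-RoPE layer pattern (ratio assumed, not official).

    Sliding-window layers use standard RoPE over a local window; global
    attention layers use a pruned RoPE variant and no window.
    """
    window = 512 if small_model else 1024
    is_global = (layer + 1) % 6 == 0  # every 6th layer is global (assumed)
    return {
        "attention": "global" if is_global else "sliding_window",
        "rope": "pruned" if is_global else "standard",
        "window": None if is_global else window,
    }
```

Keeping most layers local bounds the KV cache to the window size, while the occasional global layer preserves long-range information flow.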
Gemma's Journey: Google's Open-Source AI Strategy
Google's serious entry into open-source AI is relatively recent. Following ChatGPT's explosive growth in 2023, Meta's LLaMA series and Mistral's open releases demonstrated that open-source models could rival proprietary ones.
Google DeepMind joined the open-source race in early 2024 with Gemma 1. Gemma 2 improved parameter efficiency for small-model markets; Gemma 3 added image-text multimodality; Gemma 3n focused on on-device optimization and first introduced PLE and audio support.
Gemma 4 integrates all these advances into a single cohesive family, proving the thesis that open-source models can achieve frontier performance. Official fine-tuning support via TRL, Unsloth Studio, and Vertex AI rounds out the ecosystem.
What Comes Next [Expert Analysis]
Gemma 4's release is likely to reshape the competitive landscape of open-source AI.
On-device AI goes mainstream: With small variants capable of on-device audio processing, privacy-first AI applications that reduce cloud API dependency are likely to proliferate rapidly.
New benchmark for cost-efficient enterprise deployment: The 26B MoE model's 1,441 LMArena score with only 4B active parameters signals dramatically lower inference costs. Compared to commercial APIs of similar capability, the total cost of ownership advantage will likely accelerate Gemma 4 adoption among startups.
Multimodal standard competition intensifies: Variable aspect ratio support and five-level image token control are likely to pressure competitors to adopt similar features. These capabilities could become standard in upcoming updates to GPT-4o, Claude, and other leading models.
Strategic implications of Apache 2.0: Full commercial licensing allows direct product integration, strengthening Google Cloud ecosystem ties while expanding Google's influence in the open-source AI community — a calculated dual strategy.
The pace at which open-source AI has been catching up to frontier models has accelerated sharply since 2024. Gemma 4 is likely to stand as a significant milestone in that trajectory.