
Google DeepMind Launches Gemma 4: Open-Source Frontier Multimodal AI Goes On-Device

31B model hits LMArena 1,452; 26B MoE scores 1,441 with just 4B active params — Apache 2.0 license unlocks full commercial use

유재민 · 6 min read
Welcome Gemma 4: Frontier multimodal intelligence on device
Summary
  • Google DeepMind launches Gemma 4 with 31B dense model scoring 1,452 and 26B MoE scoring 1,441 on LMArena.
  • Full Apache 2.0 open-source release supports image, text, and audio multimodal inputs with on-device deployment.
  • New architecture innovations — PLE, Shared KV Cache, Dual RoPE — improve memory efficiency and long-context handling.

Google DeepMind's On-Device Multimodal Breakthrough

Google DeepMind has officially launched the Gemma 4 family of open-source multimodal models through Hugging Face. Released under the Apache 2.0 license, the models support image, text, and audio inputs. The 31B dense model achieved an estimated LMArena score of 1,452 (text-only), while the 26B Mixture-of-Experts (MoE) model reached 1,441 with just 4 billion active parameters. The release supports all major inference engines including transformers, llama.cpp, MLX, WebGPU, and Rust.

Why Gemma 4 Matters

Gemma 4 is not merely a performance upgrade. Its significance lies in delivering frontier-level multimodal intelligence at on-device scale within the open-source ecosystem.

While previous open-source multimodal models were largely confined to image-text inputs, Gemma 4's smaller variants (E2B, E4B) also support audio — enabling real-time speech processing and text generation simultaneously on edge devices.

Support for variable aspect ratios and five configurable image token budgets (70, 140, 280, 560, 1,120) lets users fine-tune the trade-off between speed, memory, and quality. The same model family can serve use cases from mobile apps to server-side deployments.
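
To make the trade-off concrete, here is a back-of-envelope sketch of how the image token budget feeds into per-image KV-cache cost. The layer and head counts are assumed for illustration, not official Gemma 4 specifications.

```python
# Assumed decoder shape (illustrative only): 24 KV-bearing layers, 8 KV heads,
# head_dim 128, bf16 (2 bytes per element).
IMAGE_TOKEN_BUDGETS = [70, 140, 280, 560, 1120]  # the five budgets from the release

def kv_bytes_per_image(tokens, n_kv_layers=24, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """KV-cache bytes that one image's tokens contribute (K and V per layer)."""
    return 2 * n_kv_layers * tokens * n_kv_heads * head_dim * bytes_per_elem

for budget in IMAGE_TOKEN_BUDGETS:
    print(f"{budget:>5} image tokens -> ~{kv_bytes_per_image(budget) / 2**20:.1f} MiB of KV cache")
```

The point of the knob: at 70 tokens an image costs only a few MiB of cache and a short prefill, while 1,120 tokens buys finer visual detail at roughly 16x the memory and compute.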

Hugging Face noted that during pre-release testing, the out-of-the-box performance was so strong that finding meaningful fine-tuning examples proved difficult — a testament to the model's intrinsic quality.

What Changed from Previous Versions

| Feature | Gemma 3 | Gemma 3n | Gemma 4 | Change |
|---|---|---|---|---|
| Multimodal | Image + Text | Image + Text + Audio | Image + Text + Audio + Video | Video added |
| Aspect ratio | Fixed | Fixed | Variable | More flexible |
| Image token budget | Single | Single | 5 adjustable levels | Perf-efficiency balance |
| KV cache | Standard | Standard | Shared KV Cache | Memory efficiency up |
| Embedding | Single | PLE introduced | PLE extended | Per-layer residual signal |
| LMArena score | – | – | 1,452 (31B) / 1,441 (26B MoE) | Frontier-level achieved |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Unchanged |

Three Core Architecture Innovations

Per-Layer Embeddings (PLE): In standard transformers, each token receives a single embedding vector at input. PLE adds a second embedding table that injects a small residual signal into every decoder layer, enabling richer context-dependent representations at each depth.
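
A minimal PyTorch sketch of the idea (not DeepMind's implementation; the low-rank dimension and projection are assumptions):

```python
import torch
import torch.nn as nn

class PerLayerEmbedding(nn.Module):
    """Sketch of PLE: an extra, low-dimensional embedding table per decoder layer
    whose lookup is projected and added as a residual to that layer's hidden states."""
    def __init__(self, vocab_size, hidden_dim, n_layers, ple_dim=256):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(vocab_size, ple_dim) for _ in range(n_layers))
        self.proj = nn.ModuleList(nn.Linear(ple_dim, hidden_dim, bias=False) for _ in range(n_layers))

    def forward(self, input_ids, layer_idx):
        # small residual signal injected into decoder layer `layer_idx`
        return self.proj[layer_idx](self.tables[layer_idx](input_ids))

# Inside the decoder loop (schematic):
#   hidden = decoder_layer(hidden) + ple(input_ids, layer_idx)
```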

Shared KV Cache: The last N layers reuse key-value states from earlier layers, eliminating redundant KV projections and reducing both memory usage and inference latency.
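
Schematically, a layer that shares KV only projects queries and attends over keys/values produced by an earlier layer, as in this illustrative sketch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedKVAttention(nn.Module):
    """Sketch: a top layer reuses K/V computed by an earlier layer, so it has no
    K/V projections of its own and adds nothing new to the KV cache."""
    def __init__(self, hidden_dim, n_heads):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, hidden_dim // n_heads
        self.q_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.o_proj = nn.Linear(hidden_dim, hidden_dim, bias=False)
        # note: no k_proj / v_proj here

    def forward(self, hidden, shared_k, shared_v):
        # shared_k / shared_v: [batch, n_heads, seq, head_dim] from an earlier layer
        b, t, _ = hidden.shape
        q = self.q_proj(hidden).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        out = F.scaled_dot_product_attention(q, shared_k, shared_v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```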

Dual RoPE Configuration: Standard RoPE is applied to sliding-window attention layers while pruned RoPE handles global attention layers, enabling efficient long-context processing. Small dense models use 512-token windows; larger models use 1,024.
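
The per-layer wiring might look like the sketch below. The rotary bases and the ratio of local to global layers are assumptions for illustration; only the window sizes come from the release notes.

```python
def layer_rope_config(layer_idx, local_window=512):
    """Illustrative dual RoPE layout: sliding-window layers keep standard RoPE,
    while periodic global-attention layers use the pruned variant."""
    is_global = (layer_idx + 1) % 6 == 0          # assumed 5:1 local/global pattern
    if is_global:
        return {"attention": "global", "rope": "pruned", "rope_base": 1_000_000, "window": None}
    # local_window=512 for the small dense models, 1024 for the larger ones
    return {"attention": "sliding_window", "rope": "standard", "rope_base": 10_000, "window": local_window}

for i in range(12):
    print(i, layer_rope_config(i))
```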

Gemma's Journey: Google's Open-Source AI Strategy

Google's serious entry into open-source AI is relatively recent. Following ChatGPT's explosive growth in 2023, Meta's LLaMA series and Mistral's open releases demonstrated that open-source models could rival proprietary ones.

Google DeepMind joined the open-source race in early 2024 with Gemma 1. Gemma 2 improved parameter efficiency for small-model markets; Gemma 3 added image-text multimodality; Gemma 3n focused on on-device optimization and first introduced PLE and audio support.

Gemma 4 integrates all these advances into a single cohesive family, proving the thesis that open-source models can achieve frontier performance. Official fine-tuning support via TRL, Unsloth Studio, and Vertex AI rounds out the ecosystem.
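
For the TRL route, supervised fine-tuning would follow the usual SFTTrainer pattern, roughly as below; the model ID is a placeholder and the dataset is just a public chat-format example, not an official recipe.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Any chat-formatted dataset works; this public one is used only as an example.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="google/gemma-4-e4b-it",   # hypothetical hub ID for illustration
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="gemma4-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        bf16=True,
    ),
)
trainer.train()
```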

What Comes Next [Expert Analysis]

Gemma 4's release is likely to reshape the competitive landscape of open-source AI.

On-device AI goes mainstream: With small variants capable of on-device audio processing, privacy-first AI applications that reduce cloud API dependency are likely to proliferate rapidly.

New benchmark for cost-efficient enterprise deployment: The 26B MoE model's 1,441 LMArena score with only 4B active parameters signals dramatically lower inference costs. Compared to commercial APIs of similar capability, the total cost of ownership advantage will likely accelerate Gemma 4 adoption among startups.
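
A rough way to see the cost gap is the standard 2-FLOPs-per-active-parameter approximation for decoding; the figures below are that approximation only, since real throughput also depends on memory bandwidth, batching, and quantization.

```python
# Approximate compute per generated token: ~2 * active parameters.
dense_31b = 2 * 31e9   # ~62 GFLOPs/token for the 31B dense model
moe_active = 2 * 4e9   # ~8 GFLOPs/token for the 26B MoE with 4B active params
print(f"MoE active-compute ratio vs dense 31B: {moe_active / dense_31b:.1%}")  # ~12.9%
```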

Multimodal standard competition intensifies: Variable aspect ratio support and five-level image token control are likely to pressure competitors to adopt similar features. These capabilities could become standard in upcoming updates to GPT-4o, Claude, and other leading models.

Strategic implications of Apache 2.0: Full commercial licensing allows direct product integration, strengthening Google Cloud ecosystem ties while expanding Google's influence in the open-source AI community — a calculated dual strategy.

The pace at which open-source AI has been catching up to frontier models has accelerated sharply since 2024. Gemma 4 is likely to stand as a significant milestone in that trajectory.
