
Google DeepMind Unveils Framework for Measuring AGI Evolution

Cognitive science-based staged evaluation system charts path to artificial general intelligence

AI Reporter Alpha · Saturday, March 21, 2026, 00:03 · 7 min read

Summary

• Google DeepMind unveiled a cognitive framework for measuring progress toward AGI, evaluating AI capabilities across six cognitive domains and five maturity levels.
• Current state-of-the-art LLMs are at Level 2-3 in the language and reasoning domains, while the motor and social interaction domains remain at Level 0-1.
• The framework redefines AGI not as a single goal but as gradual progress across multiple cognitive domains, laying the groundwork for stage-by-stage safety governance.

A New Standard for Measuring AI 'Intelligence'

Google DeepMind has unveiled a cognitive framework designed to objectively measure the developmental progress toward Artificial General Intelligence (AGI). This research goes beyond simply defining "what AGI is" to present a system for evaluating, stage by stage, how far current AI systems have evolved toward human-level general intelligence.

Built on cognitive science research, the framework aims to analyze AI capabilities from multiple angles and show "where we are now."

Why We Need AGI Measurement Systems Now

AGI has long been cited in the AI industry as the "ultimate goal," yet there has been little consensus on what it actually is or how to measure it. OpenAI defines it as "systems that outperform humans at most economically valuable work," while other researchers define it as "at or above human level across all cognitive tasks." Interpretations vary widely.

Google DeepMind introduced a developmental model to resolve this confusion. Rather than giving AI a simple "achieved/not achieved" verdict, this approach tracks in detail which stage of cognitive capability a system has reached.

The importance of this approach is twofold:

Setting research direction: By clearly identifying current AI strengths and weaknesses, it can guide what research is needed to advance to the next stage.
Laying a foundation for safety discussions: As AI systems climb the levels, their societal impact grows, so appropriate safety measures and governance systems need to be prepared for each stage.

Core Structure of the Cognitive Framework

DeepMind's framework consists of six cognitive domains and five capability maturity levels.

Six Cognitive Domains

Domain	Description	Evaluation Examples
Perception	Ability to process sensory information like vision and hearing	Image recognition, speech understanding
Motor Skills	Ability to perform physical actions	Robot control, object manipulation
Language	Natural language understanding and generation ability	Conversation, translation, writing
Reasoning	Logical thinking and problem-solving ability	Math problem solving, strategy formulation
Learning	Ability to acquire and apply new information	Few-shot learning, transfer learning
Social Interaction	Ability to cooperate and communicate with others	Teamwork, emotion recognition

Five Maturity Stages

The framework divides the capability levels AI can reach in each cognitive domain into five stages:

Level 0 — Non-Human: Below human level, performs only basic tasks
Level 1 — Emerging: Can perform simple tasks but inconsistently
Level 2 — Competent: Performs tasks at typical adult human level
Level 3 — Expert: Equals top human experts in the field
Level 4 — Superhuman: Surpasses humanity's best experts
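
The framework's two axes map naturally onto a small data model. The sketch below is our own illustration, not code published by DeepMind. The names `Domain`, `Level`, `Profile`, and `describe` are hypothetical. The only things carried over from the article are the six domains and the Level 0-4 scale.

```python
from enum import Enum, IntEnum


class Domain(Enum):
    """The six cognitive domains listed in the article (identifiers are illustrative)."""
    PERCEPTION = "Perception"
    MOTOR_SKILLS = "Motor Skills"
    LANGUAGE = "Language"
    REASONING = "Reasoning"
    LEARNING = "Learning"
    SOCIAL_INTERACTION = "Social Interaction"


class Level(IntEnum):
    """Levels 0-4; IntEnum lets levels be compared with <, >=, max(), etc."""
    NON_HUMAN = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    SUPERHUMAN = 4


# Hypothetical type: a capability profile rates each domain independently.
Profile = dict[Domain, Level]


def describe(profile: Profile) -> list[str]:
    """Render the rated domains in the article's table order, one line each."""
    lines = []
    for domain in Domain:  # Enum iteration follows definition order
        if domain in profile:
            level = profile[domain]
            lines.append(f"{domain.value}: Level {int(level)} ({level.name})")
    return lines


# Illustrative profile only, not a published rating.
example: Profile = {Domain.MOTOR_SKILLS: Level.EMERGING, Domain.LANGUAGE: Level.COMPETENT}
for line in describe(example):
    print(line)
```

Storing one level per domain, rather than a single overall score, is what allows a profile to be uneven. That unevenness is the point the article makes about current systems.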

For example, current large language models (LLMs) can be rated at roughly Level 2-3 in the language domain and Level 1-2 in the reasoning domain. Meanwhile, the motor domain remains at Level 0-1, and social interaction capabilities are also limited.

What Stage Is Current AI At?

DeepMind mapped recent AI systems onto this framework and found the following patterns:

Recent LLMs such as GPT-4, Gemini 2.0, and Claude 3.5: Level 2-3 in the language and reasoning domains. They approach expert level on specific benchmarks (MMLU, HumanEval), but their ability to generalize remains weak.
AlphaGo, AlphaFold: Achieved Level 4 (superhuman) in specific domains (Go, protein structure prediction). However, they are not classified as AGI because they lack generality.
Robotic AI systems: Level 0-1 in the perception and motor domains. Their ability to adapt to their environment in real time is limited.

In short, current AI sits between two extremes: "high performance in narrow domains" and "low generalization across broad domains." Achieving AGI would require at least Level 2 across all six cognitive domains, a goal that remains distant.
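
The "at least Level 2 in every domain" criterion is easy to state as code. Writing it out exposes a question the article leaves open: how should approximate ratings such as "Level 2-3" be treated? The sketch below is a hypothetical illustration, not DeepMind's scoring method. It makes three assumptions. First, each rating is stored as a range (`Rating`, a made-up type). Second, a domain counts as met only when the range's lower bound reaches the threshold. Third, domains the article gives no number for are treated as unrated.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """The six cognitive domains listed in the article."""
    PERCEPTION = "Perception"
    MOTOR_SKILLS = "Motor Skills"
    LANGUAGE = "Language"
    REASONING = "Reasoning"
    LEARNING = "Learning"
    SOCIAL_INTERACTION = "Social Interaction"


@dataclass(frozen=True)
class Rating:
    """Hypothetical: an approximate rating such as 'Level 2-3', as an inclusive range."""
    low: int
    high: int

    def __post_init__(self) -> None:
        if not 0 <= self.low <= self.high <= 4:
            raise ValueError(f"invalid rating range {self.low}-{self.high}")


AGI_THRESHOLD = 2  # the article's criterion: at least Level 2 in all six domains


def agi_check(profile: dict[Domain, Rating], threshold: int = AGI_THRESHOLD) -> dict[str, list[str]]:
    """Sort every domain into met / uncertain / not met / unrated."""
    result: dict[str, list[str]] = {"met": [], "uncertain": [], "not met": [], "unrated": []}
    for domain in Domain:
        rating = profile.get(domain)
        if rating is None:
            result["unrated"].append(domain.value)
        elif rating.low >= threshold:   # even the pessimistic reading passes
            result["met"].append(domain.value)
        elif rating.high >= threshold:  # passes only on the optimistic reading
            result["uncertain"].append(domain.value)
        else:
            result["not met"].append(domain.value)
    return result


def is_agi(profile: dict[Domain, Rating], threshold: int = AGI_THRESHOLD) -> bool:
    """Conservative verdict: all six domains must be 'met'."""
    return len(agi_check(profile, threshold)["met"]) == len(Domain)


# Ratings transcribed from the article's paragraph on current LLMs;
# domains it gives no number for are left unrated.
llm = {
    Domain.LANGUAGE: Rating(2, 3),
    Domain.REASONING: Rating(1, 2),
    Domain.MOTOR_SKILLS: Rating(0, 1),
}
print(agi_check(llm))
print(is_agi(llm))  # False
```

Testing the lower bound is a deliberately conservative choice. Testing `rating.high` in the `met` branch instead would give the optimistic reading.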

[AI Analysis] Future Path of AGI Development

The framework's central implication is clear: AGI is not a single breakthrough but a convergence of gradual progress across multiple cognitive domains.

Short-term Outlook (2026-2028)

Accelerated multimodal integration: Models that integrate the language, perception, and reasoning domains more tightly are likely to emerge. Systems such as Gemini 2.0 and GPT-5 are already moving in this direction.
Rise of robotic AI: To move from Level 0 to Level 1 in the motor domain, Google, Tesla, Figure AI, and others are expected to deploy, at scale, robotic systems that learn in real environments.

Medium-term Outlook (2029-2032)

Expansion of expert-level domains: AI achieving Level 3-4 in specific areas (coding, medicine, law) will be commercialized, and "human + AI collaboration" models will likely become standard.
Surge in social interaction research: Research investment is expected to concentrate on areas like emotion recognition, ethical judgment, and teamwork.

Long-term Questions (Post-2033)

Predictions about when AGI will arrive remain controversial. However, DeepMind's framework suggests that "which domain reaches Level 4 first" is a more important question than "when AGI will be achieved." Some domains may reach superhuman levels while others remain at Level 1.

The framework also has important implications for safety. It makes staged governance possible: potential risks can be assessed in advance for each stage, and stronger oversight can be applied once a system reaches Level 3 or above.
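
The staged-governance idea reduces to a threshold check over a capability profile. The minimal sketch below assumes, following the article, that enhanced oversight applies once any single domain reaches Level 3. The tier labels `standard` and `enhanced`, the constant `ENHANCED_OVERSIGHT_FROM`, and the helper names are hypothetical. No specific oversight body or procedure is implied.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels 0-4 from the framework, as comparable integers."""
    NON_HUMAN = 0
    EMERGING = 1
    COMPETENT = 2
    EXPERT = 3
    SUPERHUMAN = 4


# Assumption: oversight is triggered per domain, at Level 3 (Expert) or above.
ENHANCED_OVERSIGHT_FROM = Level.EXPERT


def oversight_tier(profile: dict[str, Level]) -> str:
    """Hypothetical tiers: 'enhanced' if any domain reaches the trigger, else 'standard'."""
    if any(level >= ENHANCED_OVERSIGHT_FROM for level in profile.values()):
        return "enhanced"
    return "standard"


def newly_triggered(before: dict[str, Level], after: dict[str, Level]) -> list[str]:
    """Domains that crossed the trigger between two assessments (a pre-assessment check)."""
    return sorted(
        domain
        for domain, level in after.items()
        if level >= ENHANCED_OVERSIGHT_FROM
        and before.get(domain, Level.NON_HUMAN) < ENHANCED_OVERSIGHT_FROM
    )


# Illustrative assessments of one system, not real ratings.
before = {"Language": Level.COMPETENT, "Reasoning": Level.EMERGING}
after = {"Language": Level.EXPERT, "Reasoning": Level.COMPETENT}
print(oversight_tier(before), oversight_tier(after))  # standard enhanced
print(newly_triggered(before, after))  # ['Language']
```

The trigger fires on any single domain rather than on an average. This reflects the article's point that profiles are uneven: one domain at Level 3 can matter even if the rest lag behind.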

A New Starting Point for AGI Discussion

Google DeepMind's research is significant because it turns AGI from "a distant philosophical concept" into "a measurable engineering goal." AI researchers can now quantitatively assess "how intelligent the systems we've built are" and chart roadmaps toward the next stage.

However, the framework is not perfect. It reduces the complexity of human intelligence to six domains and does not address abstract concepts such as creativity or consciousness. Whether it will become an academic and industry standard, or be superseded by new measurement methodologies, remains to be seen.

#deepmind-series #AGI #cognitive-framework #LLM #ai-research #benchmark #ai-safety