Google's Nested Learning: "Attention Is All You Need V2"

Google just dropped what could be the sequel to the foundational “Attention is All You Need” paper. This research could solve AI’s most stubborn challenge: catastrophic forgetting.

When AI models learn something new, they tend to forget much of what they learned before. Humans don’t work this way, and now Google Research has a proposed solution.

What Is Nested Learning?

Nested Learning is a new machine learning paradigm that treats models as a system of interconnected optimization problems running at different speeds—just like how our brain processes information.

The Core Problem

Current LLMs don’t learn from experiences; they remain limited to what they learned during training. They can’t learn or improve over time without losing previous knowledge.

As research shows, when neural networks are trained sequentially on multiple tasks, weights important for Task A are changed to meet objectives of Task B—causing abrupt knowledge loss.
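The effect is easy to reproduce in miniature. The sketch below is a toy illustration (not from the paper): a one-parameter linear model is trained on Task A, then on Task B, and Task B’s training wipes out Task A performance.

```python
import numpy as np

def train(w, xs, ys, lr=0.1, steps=200):
    """Gradient descent on mean squared error for the 1-D linear model y = w * x."""
    for _ in range(steps):
        grad = np.mean(2 * (w * xs - ys) * xs)
        w -= lr * grad
    return w

def loss(w, xs, ys):
    return np.mean((w * xs - ys) ** 2)

xs = np.linspace(-1.0, 1.0, 32)
ys_a = 2.0 * xs     # Task A: y = 2x
ys_b = -1.0 * xs    # Task B: y = -x (conflicting objective)

w = 0.0
w = train(w, xs, ys_a)              # learn Task A; w converges near 2
loss_a_before = loss(w, xs, ys_a)   # Task A loss: essentially zero

w = train(w, xs, ys_b)              # learn Task B; w is dragged toward -1
loss_a_after = loss(w, xs, ys_a)    # Task A loss: now large -- forgetting
```

Because both tasks compete for the same weight, optimizing Task B destroys the solution to Task A; with separate or multi-tempo parameters this conflict can be softened.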

How It Works: The Brain-Inspired Approach

Nested Learning changes this by viewing the model’s architecture and training algorithm as the same thing—just different “levels” of optimization.

| Traditional AI | Nested Learning |
| --- | --- |
| Binary memory (short/long-term) | Spectrum of memory modules |
| Static after training | Continuously self-modifying |
| New learning overwrites old | Multi-tempo updates preserve both |
| Single optimization process | Nested, multi-level optimization |

Like the human brain, which runs fast circuits for immediate processing and slower ones for consolidating patterns, Nested Learning creates different update frequencies for different knowledge types.
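One way to picture “different update frequencies” is parameter groups stepped at different rates. The sketch below is our own illustration with invented names (`SLOW_PERIOD`, `fast`, `slow`), not the paper’s actual algorithm: a fast group updates every step, while a slow group only periodically consolidates what the fast group has learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter groups: a "fast" group updated every step and a
# "slow" group updated only every SLOW_PERIOD steps -- a crude stand-in
# for Nested Learning's multi-tempo levels.
SLOW_PERIOD = 8
fast = np.zeros(4)
slow = np.zeros(4)
fast_updates = slow_updates = 0

for step in range(1, 65):
    grad = rng.standard_normal(4)        # placeholder gradient
    fast -= 0.1 * grad                   # high-frequency level: every step
    fast_updates += 1
    if step % SLOW_PERIOD == 0:          # low-frequency level: consolidate
        slow = 0.9 * slow + 0.1 * fast   # slowly absorb the fast weights
        slow_updates += 1

print(fast_updates, slow_updates)  # 64 fast updates, 8 slow updates
```

Because the slow level changes rarely and only blends in a summary of the fast level, knowledge stored there is insulated from rapid, task-specific updates.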

Meet Hope: The Proof of Concept

The paper introduces Hope, a proof-of-concept architecture that demonstrates this approach:

  • Outperforms modern recurrent models on language modeling tasks
  • Handles long-context memory better than state-of-the-art models
  • Uses “continuum memory systems” that update at different frequencies
  • Features a self-modifying architecture that learns its own update rules

This is similar to how our brain manages short-term and long-term memory simultaneously—what Google calls a “Continuum Memory System” (CMS).
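A minimal caricature of the continuum idea, assuming only the multi-frequency behavior described above (the actual CMS uses learned memory modules; the names and blending rule here are hypothetical): a chain of memory levels where each slower level periodically consolidates from the faster one below it.

```python
# A toy "continuum memory system": a chain of buffers whose update
# frequencies span short-term to long-term (illustrative only).
periods = [1, 4, 16]              # update every 1, 4, and 16 steps
memories = [0.0 for _ in periods]
updates = [0 for _ in periods]

for step in range(1, 33):
    signal = float(step)          # stand-in for incoming information
    for level, period in enumerate(periods):
        if step % period == 0:
            # Each level consolidates from the level below it (or from
            # the raw signal at level 0), blending old and new content.
            source = signal if level == 0 else memories[level - 1]
            memories[level] = 0.5 * memories[level] + 0.5 * source
            updates[level] += 1

print(updates)  # [32, 8, 2]: fast, medium, and slow memory levels
```

The update counts make the “spectrum” concrete: level 0 acts like short-term memory, level 2 like slowly consolidated long-term memory.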

Real Performance Benchmarks

According to benchmark results, Hope demonstrates:

| Task Type | Metric | Result |
| --- | --- | --- |
| Language Modeling | Perplexity | Lower than Transformers |
| Common-sense Reasoning | Accuracy | Higher than recurrent models |
| Needle-in-Haystack | Memory retrieval | Superior long-context handling |
| Continual Learning | Knowledge retention | Minimal catastrophic forgetting |

Why This Matters Now

As AI researcher Andrej Karpathy has noted, AGI may still be a decade away in part because no one has built an AI system that keeps learning on the job, continually correcting its own limitations in a feedback loop.

Nested Learning directly addresses this gap. According to early analysis, Hope’s ability to mitigate catastrophic forgetting brings us closer to closing the gap between the static nature of current LLMs and the dynamic, adaptive intelligence of the human brain.

The Technical Breakthrough

What makes Nested Learning revolutionary is its reframing of fundamental concepts:

  • Backpropagation → Associative memory mapping data to errors
  • Attention mechanisms → Memory systems mapping tokens to context
  • Optimizers (Adam, SGD) → Memory modules compressing gradients
  • Model architecture → Nested optimization at multiple scales
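The attention-as-memory reading above is concrete: standard scaled dot-product attention already behaves like a soft key-value memory, with queries as probes, keys as addresses, and values as stored contents. A small self-contained sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention read as an associative memory:
    queries probe the memory, keys index stored items, values are contents."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores, axis=-1)   # soft addressing over memory slots
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 8))   # 2 queries
K = rng.standard_normal((5, 8))   # 5 stored keys
V = rng.standard_normal((5, 8))   # 5 stored values

out, weights = attention(Q, K, V)
print(out.shape)  # (2, 8): each query retrieves a blend of stored values
```

Each row of `weights` is a probability distribution over the five memory slots, which is what makes the “memory system mapping tokens to context” framing more than a metaphor.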

This unified view allows for “deep optimizers” whose internal objective moves beyond simple dot-product similarity to more robust formulations, such as ℓ2 regression losses.
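As a concrete instance of the “optimizers are memory modules” view, classical momentum can be read as a fixed linear memory that compresses the gradient stream. The sketch below shows only that reading; in the paper’s framing, a deep optimizer would replace this hand-coded rule with a learned memory module trained under a richer objective (e.g., an ℓ2 regression loss), which is our interpretation, not the paper’s exact construction.

```python
import numpy as np

def momentum_step(w, m, grad, lr=0.1, beta=0.9):
    """One step of SGD with momentum, read as a memory write:
    the buffer m is a lossy compression of the gradient history."""
    m = beta * m + grad   # memory update: blend old summary with new data
    w = w - lr * m        # memory read: use the summary to move parameters
    return w, m

w, m = np.zeros(3), np.zeros(3)
for grad in [np.array([1.0, 0.0, -1.0])] * 3:   # constant gradient stream
    w, m = momentum_step(w, m, grad)

print(m)  # [2.71, 0., -2.71]: the buffer has "memorized" the gradient trend
```

Seen this way, the exponential moving average is just one (very shallow) choice of memory; swapping in a deeper, trainable module is what makes the optimizer itself a learnable level of the nested system.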

What’s Next

We might finally be closing the gap between AI and the human brain’s ability to continually learn. The implications extend to healthcare, robotics, education, and conversational AI—any domain requiring systems that adapt without forgetting.

Read the full paper: Nested Learning: The Illusion of Deep Learning Architectures