Google just dropped what could be the sequel to the foundational “Attention is All You Need” paper. This research could solve AI’s most stubborn challenge: catastrophic forgetting.
When AI models learn something new, they tend to abruptly overwrite much of what they previously learned. Humans don't work this way, and Google Research now proposes a solution.
What Is Nested Learning?
Nested Learning is a new machine learning paradigm that treats a model as a system of interconnected optimization problems running at different speeds, much as the brain processes information on multiple timescales.
The Core Problem
Current LLMs don’t learn from experience; their knowledge is effectively frozen at training time. Updating them with new data tends to degrade what they already know.
As continual-learning research has long shown, when a neural network is trained sequentially on multiple tasks, the weights that mattered for Task A are overwritten to meet the objectives of Task B, causing abrupt knowledge loss.
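The failure mode is easy to reproduce. Here is a minimal, self-contained sketch (my own toy example, not from the paper): a single linear model trained with plain SGD on Task A and then on Task B, after which Task A performance collapses.

```python
# Toy demonstration of catastrophic forgetting (illustrative, not from the
# paper): one linear model, trained with plain SGD on Task A, then on Task B.
import numpy as np

rng = np.random.default_rng(0)

def make_task(true_w):
    """Noise-free linear regression task defined by its true weights."""
    X = rng.normal(size=(200, 4))
    return X, X @ true_w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

def sgd(w, X, y, lr=0.02, steps=500):
    """Plain per-sample SGD on squared error."""
    for _ in range(steps):
        i = rng.integers(len(X))
        w = w - lr * 2 * (X[i] @ w - y[i]) * X[i]
    return w

X_a, y_a = make_task(np.array([1.0, -2.0, 0.5, 3.0]))   # Task A
X_b, y_b = make_task(np.array([-3.0, 1.0, 2.0, -1.0]))  # Task B

w = sgd(np.zeros(4), X_a, y_a)       # learn Task A
loss_a_before = mse(w, X_a, y_a)     # low: Task A is solved
w = sgd(w, X_b, y_b)                 # now learn Task B with the same weights
loss_a_after = mse(w, X_a, y_a)      # high: Task A has been overwritten

print(f"Task A loss: {loss_a_before:.4f} -> {loss_a_after:.4f}")
```

The single set of weights has no way to serve both objectives: every gradient step toward Task B moves it away from the Task A solution.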
How It Works: The Brain-Inspired Approach
Nested Learning changes this by viewing the model’s architecture and training algorithm as the same thing—just different “levels” of optimization.
| Traditional AI | Nested Learning |
|---|---|
| Binary memory (short/long-term) | Spectrum of memory modules |
| Static after training | Continuously self-modifying |
| New learning overwrites old | Multi-tempo updates preserve both |
| Single optimization process | Nested, multi-level optimization |
Like the human brain, which runs fast circuits for immediate processing and slower ones for consolidating patterns, Nested Learning creates different update frequencies for different knowledge types.
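The multi-tempo idea can be sketched in a few lines (a toy construction of mine; names like `MultiTempoModel` are not from the paper): fast weights adapt at every step, while slow weights consolidate only once every K steps by absorbing part of the fast store.

```python
# Toy two-speed learner (illustrative only): fast weights update every step,
# slow weights consolidate every K steps by absorbing part of the fast store.
import numpy as np

class MultiTempoModel:
    def __init__(self, dim, consolidate_every=8):
        self.fast = np.zeros(dim)   # high-frequency, quickly rewritten
        self.slow = np.zeros(dim)   # low-frequency, slowly consolidated
        self.K = consolidate_every
        self.step = 0

    def predict(self, x):
        # Both memory levels contribute to the output.
        return x @ (self.fast + self.slow)

    def update(self, x, y, lr=0.02):
        err = self.predict(x) - y
        self.fast -= lr * 2 * err * x        # fast level: every step
        self.step += 1
        if self.step % self.K == 0:          # slow level: every K steps
            transfer = 0.2 * self.fast       # move a fraction of the fast
            self.slow += transfer            # memory into the slow store
            self.fast -= transfer            # (the prediction is unchanged)

# Fit a simple linear target and watch knowledge settle into the slow store.
rng = np.random.default_rng(1)
true_w = np.array([1.0, -1.0, 2.0])
model = MultiTempoModel(dim=3)
X = rng.normal(size=(400, 3))
y = X @ true_w
loss_start = float(np.mean((X @ (model.fast + model.slow) - y) ** 2))
for x_i, y_i in zip(X, y):
    model.update(x_i, y_i)
loss_end = float(np.mean((X @ (model.fast + model.slow) - y) ** 2))
```

Because consolidation moves weight mass between levels without changing the prediction, learning proceeds normally while stable knowledge gradually migrates into the slowly updated store, where rapid new updates cannot touch it.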
Meet Hope: The Proof of Concept
The paper introduces Hope, a proof-of-concept architecture that demonstrates this approach:
- Outperforms modern recurrent models on language modeling tasks
- Handles long-context memory better than state-of-the-art models
- Uses “continuum memory systems” that update at different frequencies
- Employs a self-modifying architecture that learns its own update rules
This is similar to how our brain manages short-term and long-term memory simultaneously—what Google calls a “Continuum Memory System” (CMS).
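The CMS idea can be caricatured in a short sketch (again my own toy construction, not the Hope architecture): a spectrum of simple linear associative memories, each refreshed at its own period, whose reads are averaged.

```python
# Toy "continuum memory system" (illustrative, not the paper's architecture):
# a spectrum of linear associative memories, each refreshed at its own period.
import numpy as np

class MemoryModule:
    def __init__(self, dim, period, lr=0.5):
        self.W = np.zeros((dim, dim))  # linear key -> value associative map
        self.period = period           # updates once per `period` steps
        self.lr = lr

    def maybe_write(self, step, key, value):
        if step % self.period == 0:
            # Delta-rule write: nudge W @ key toward the stored value.
            self.W += self.lr * np.outer(value - self.W @ key, key)

    def read(self, key):
        return self.W @ key

class ContinuumMemory:
    """Fast modules track recent context; slow ones consolidate
    whatever keeps recurring."""
    def __init__(self, dim, periods=(1, 4, 16)):
        self.modules = [MemoryModule(dim, p) for p in periods]
        self.step = 0

    def write(self, key, value):
        self.step += 1
        for m in self.modules:
            m.maybe_write(self.step, key, value)

    def read(self, key):
        return np.mean([m.read(key) for m in self.modules], axis=0)

# A persistent association eventually settles into every timescale.
cms = ContinuumMemory(dim=4)
key = np.array([1.0, 0.0, 0.0, 0.0])          # unit-norm key
value = np.array([0.5, -1.0, 2.0, 0.0])
for _ in range(64):
    cms.write(key, value)
recalled = cms.read(key)
```

A transient association would live only in the period-1 module and soon be overwritten; one that keeps recurring, as above, is eventually written into the period-16 module too, where later noise cannot easily erase it.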
Real Performance Benchmarks
According to benchmark results, Hope demonstrates:
| Task Type | Metric | Result |
|---|---|---|
| Language Modeling | Perplexity | Lower than Transformers |
| Common-sense Reasoning | Accuracy | Higher than recurrent models |
| Needle-in-Haystack | Memory retrieval | Superior long-context handling |
| Continual Learning | Knowledge retention | Minimal catastrophic forgetting |
Why This Matters Now
As AI researcher Andrej Karpathy has argued, AGI may still be a decade away largely because no one has yet built an AI system that learns continually from its own experience, correcting its weaknesses in a feedback loop the way a person does.
Nested Learning directly addresses this gap. According to early analysis, Hope’s ability to mitigate catastrophic forgetting brings us closer to closing the gap between the static nature of current LLMs and the dynamic, adaptive intelligence of the human brain.
The Technical Breakthrough
What makes Nested Learning revolutionary is its reframing of fundamental concepts:
- Backpropagation → Associative memory mapping data to errors
- Attention mechanisms → Memory systems mapping tokens to context
- Optimizers (Adam, SGD) → Memory modules compressing gradients
- Model architecture → Nested optimization at multiple scales
This unified view allows for “deep optimizers”: optimizers that are themselves learnable memory modules, replacing simple dot-product similarity with more robust objectives such as L2 regression losses.
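For intuition, here is how a standard momentum optimizer reads under this lens (a toy rendering of the framing, not code from the paper): the momentum buffer is a tiny memory that compresses the stream of gradients, and the weight update is a read from that memory.

```python
# Momentum viewed as a memory module (toy illustration of the framing,
# not the paper's implementation).
import numpy as np

def momentum_step(w, m, grad, lr=0.1, beta=0.9):
    # "Write": the buffer m compresses the gradient history into one vector,
    # a lossy exponential summary of everything seen so far.
    m = beta * m + grad
    # "Read": the outer optimization level consumes the memory to update w.
    w = w - lr * m
    return w, m

# Sanity check on a simple quadratic f(w) = 0.5 * ||w||^2, where grad = w.
w = np.array([1.0, 2.0])
m = np.zeros(2)
for _ in range(200):
    w, m = momentum_step(w, m, grad=w)
```

In the paper’s framing, such an optimizer is itself an associative memory with its own (very high) update frequency; a “deep optimizer” then replaces this fixed linear summary with a learned module trained under an L2 regression objective instead of raw dot-product similarity.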
What’s Next
We might finally be closing the gap between AI and the human brain’s ability to continually learn. The implications extend to healthcare, robotics, education, and conversational AI—any domain requiring systems that adapt without forgetting.
Read the full paper: Nested Learning: The Illusion of Deep Learning Architectures

