Nested Learning: How a New AI Paradigm Tackles Catastrophic Forgetting

Artificial intelligence faces a persistent challenge: as models learn new tasks, they often forget previous knowledge, a problem commonly known as catastrophic forgetting. Unlike the human brain, which flexibly adapts and retains information through neuroplasticity, most AI models struggle to balance learning with memory. A promising approach is emerging in the form of Nested Learning, which aims to bridge this gap by drawing inspiration from biological learning processes.
Where Traditional Models Fall Short
Despite their remarkable achievements, modern machine learning models, including large language models (LLMs), falter at continual learning. When these systems update their parameters on new data, they tend to overwrite what earlier training encoded. Most current remedies modify either the architecture or the optimization algorithm, but they treat the two as separate problems, and this division has slowed progress toward robust, self-improving AI.
The Nested Learning Approach: Unifying Architecture and Optimization
Nested Learning reimagines machine learning models as networks of interconnected, nested optimization problems. Each "level" in this system has its own context flow and update frequency, echoing the brain’s multi-frequency learning. By treating architecture and optimization as a unified process, Nested Learning enables more nuanced, persistent adaptation and opens new possibilities for designing resilient AI systems.
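To make the idea of levels with their own update frequencies concrete, here is a minimal sketch. It is my own toy illustration rather than the paper's algorithm: a "fast" parameter block updates on every example, while a "slow" block accumulates its context and commits an update only every few steps.

```python
import numpy as np

# Toy illustration (not the paper's algorithm): a two-level linear model
# y = (W_fast + W_slow) @ x, where the fast level updates every step and
# the slow level updates only every SLOW_EVERY steps on its accumulated
# (averaged) gradient -- each level has its own context flow and frequency.

rng = np.random.default_rng(0)
dim, SLOW_EVERY = 8, 10
W_fast = np.zeros((dim, dim))
W_slow = np.zeros((dim, dim))
slow_grad_acc = np.zeros_like(W_slow)
lr_fast, lr_slow = 0.05, 0.01

true_W = rng.normal(size=(dim, dim))      # target mapping to learn

for step in range(1, 501):
    x = rng.normal(size=dim)
    y_target = true_W @ x
    err = (W_fast + W_slow) @ x - y_target    # prediction error ("surprise")

    grad = np.outer(err, x)              # gradient of 0.5*||err||^2 w.r.t. W
    W_fast -= lr_fast * grad             # fast level: update every step
    slow_grad_acc += grad                # slow level: accumulate context

    if step % SLOW_EVERY == 0:           # slow level: lower update frequency
        W_slow -= lr_slow * slow_grad_acc / SLOW_EVERY
        slow_grad_acc[:] = 0.0

print("final error norm:", np.linalg.norm((W_fast + W_slow) - true_W))
```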
Associative Memory: Unpacking Model Components
- Backpropagation itself can be viewed as associative memory: it maps each data point to its local error, a measure of how surprising that input was.
- Key architectural elements, such as transformers’ attention modules, also act as associative memory, learning how different parts of data relate to each other.
Viewing models as collections of modular learning units, each with its own update rhythm, offers greater flexibility in balancing knowledge retention against adaptation.
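The sketch below is a minimal, hypothetical illustration of this associative-memory lens: a linear map trained by gradient steps to bind keys to values, whose prediction error on a pair can be read as that pair's "surprise." The class name, dimensions, and learning rate are my own choices, not anything taken from the paper.

```python
import numpy as np

# Toy associative memory: a linear map M trained to associate key vectors
# with value vectors. The squared error on a new key/value pair plays the
# role of that pair's "surprise" -- the same lens the Nested Learning view
# applies to backpropagation and to attention modules.

class LinearAssociativeMemory:
    def __init__(self, key_dim, val_dim, lr=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.M = rng.normal(scale=0.01, size=(val_dim, key_dim))
        self.lr = lr

    def surprise(self, key, value):
        # Squared error of the current association for this pair.
        err = self.M @ key - value
        return 0.5 * float(err @ err)

    def write(self, key, value):
        # One gradient step on 0.5 * ||M @ key - value||^2.
        err = self.M @ key - value
        self.M -= self.lr * np.outer(err, key)

    def read(self, key):
        return self.M @ key

rng = np.random.default_rng(1)
mem = LinearAssociativeMemory(key_dim=16, val_dim=8)
k, v = rng.normal(size=16), rng.normal(size=8)
print("surprise before writing:", mem.surprise(k, v))
for _ in range(50):
    mem.write(k, v)
print("surprise after writing: ", mem.surprise(k, v))
```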
Introducing Continuum Memory Systems
Nested Learning extends the idea of memory well beyond the short- and long-term memory found in standard transformers. It proposes a continuum memory system (CMS), where multiple memory modules update at different speeds. This design mirrors the layered nature of human memory, enabling AI to retain information over longer periods and at varying depths.
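As a rough illustration of the idea, and assuming nothing about the paper's actual parameterization, the sketch below chains several simple associative memories that see the same stream of key-value pairs but commit their updates at different frequencies: faster modules track recent context, slower ones consolidate over longer horizons.

```python
import numpy as np

# Illustrative sketch of a continuum of memory modules (structure and names
# are my assumptions, not the paper's design): each level accumulates what it
# has seen and commits an update only every `period` steps.

class MemoryLevel:
    def __init__(self, dim, period, lr):
        self.M = np.zeros((dim, dim))
        self.period = period                  # update frequency of this level
        self.lr = lr
        self.acc = np.zeros((dim, dim))
        self.count = 0

    def observe(self, key, value, step):
        residual = value - self.M @ key
        self.acc += np.outer(residual, key)   # accumulate context since last commit
        self.count += 1
        if step % self.period == 0:           # commit at this level's frequency
            self.M += self.lr * self.acc / self.count
            self.acc[:] = 0.0
            self.count = 0

dim = 8
levels = [MemoryLevel(dim, period=p, lr=lr)
          for p, lr in [(1, 0.05), (8, 0.2), (64, 0.5)]]   # fast -> slow

rng = np.random.default_rng(2)
target = rng.normal(size=(dim, dim))
for step in range(1, 257):
    k = rng.normal(size=dim)
    v = target @ k
    for level in levels:
        level.observe(k, v, step)

for name, level in zip(["fast", "mid", "slow"], levels):
    print(name, "memory error:", np.linalg.norm(level.M - target))
```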
Hope: A Practical Test of Nested Learning
To showcase Nested Learning in action, researchers developed Hope, a recurrent architecture that modifies its own memory system. Unlike earlier models limited to two levels of parameter updates, Hope supports an unbounded number of nested learning levels and uses CMS blocks to handle longer context windows. Its self-referential design lets it adapt and optimize its own memory management, effectively looping the learning levels in on themselves.
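The toy below only gestures at what "optimizing its own memory management" could look like in the simplest possible setting; it is a deliberately simplified, hypothetical sketch, not Hope's actual mechanism. A memory matrix is written at every step, while the scalar that controls how strongly it is written, part of the memory's own update rule, is adjusted on a slower schedule based on how surprising recent inputs have been.

```python
import numpy as np

# Hypothetical sketch in the spirit of self-referential memory (NOT Hope):
# the inner loop writes to memory M every step; the outer, slower loop tunes
# the write strength itself, raising it when surprise is rising and easing
# off when surprise is falling.

rng = np.random.default_rng(3)
dim, META_EVERY = 4, 25
M = np.zeros((dim, dim))            # fast memory state
write_lr = 0.05                     # parameter of the memory's own update rule
meta_step = 0.05
recent_surprise, lr_history = [], []

target = rng.normal(size=(dim, dim))
for step in range(1, 501):
    if step == 250:                 # distribution shift: the mapping to memorize changes
        target = rng.normal(size=(dim, dim))
    k = rng.normal(size=dim)
    v = target @ k
    err = M @ k - v
    recent_surprise.append(float(err @ err))

    # Inner (fast) loop: ordinary associative write, scaled by write_lr.
    M -= write_lr * np.outer(err, k)

    # Outer (slow) loop: adapt the update rule itself.
    if step % META_EVERY == 0:
        half = META_EVERY // 2
        trend = np.mean(recent_surprise[half:]) - np.mean(recent_surprise[:half])
        write_lr = float(np.clip(write_lr + meta_step * np.sign(trend), 0.01, 0.15))
        lr_history.append(round(write_lr, 3))
        recent_surprise = []

print("write strength over time:", lr_history)
```

In this toy, the write strength decays while the stream is predictable and briefly rises after the shift at step 250, which is the flavor of behavior the self-referential framing is after.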
How Hope Performs
- Hope surpassed leading recurrent models and transformers in language modeling and common-sense reasoning, achieving lower perplexity and higher accuracy.
- In long-context tasks such as the "Needle-In-a-Haystack" benchmark, Hope's CMS handled extended sequences of information efficiently and outperformed models like Titans and Samba.
- Nested Learning's deep optimizers, which treat the optimizer itself as an associative memory over gradients, proved more resilient to noisy data because the objective used to memorize incoming gradients can be made more robust (a toy sketch follows this list).
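As referenced above, here is a hedged toy sketch of a "deep optimizer" in this spirit: the momentum buffer is treated as a memory that regresses onto incoming gradients, and a Huber-style weight (my own choice of robust objective, not necessarily the paper's) limits how strongly outlier gradients are written into it.

```python
import numpy as np

# Toy sketch (my illustration, not the paper's exact optimizer): view the
# momentum buffer as an associative memory regressed onto the gradient
# stream. Standard momentum moves the buffer toward each new gradient; here
# that step is scaled by a Huber-style weight so outlier gradients (e.g.,
# from noisy samples) write into the memory less strongly.

def robust_momentum_step(params, grad, buf, lr=0.01, beta=0.9, delta=5.0):
    residual = grad - buf                               # how "surprising" this gradient is
    norm = np.linalg.norm(residual)
    weight = 1.0 if norm <= delta else delta / norm     # Huber-style influence factor
    buf = buf + (1.0 - beta) * weight * residual        # robust write to the memory
    params = params - lr * buf                          # usual parameter update
    return params, buf

# Usage on a noisy quadratic: minimize 0.5*||x||^2 with occasional corrupted gradients.
rng = np.random.default_rng(4)
x, buf = rng.normal(size=10), np.zeros(10)
for step in range(500):
    grad = x.copy()                                     # true gradient of 0.5*||x||^2
    if step % 20 == 0:
        grad += rng.normal(scale=50.0, size=10)         # simulated noisy gradient
    x, buf = robust_momentum_step(x, grad, buf)
print("||x|| after training:", np.linalg.norm(x))
```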
Implications for AI’s Future
The Nested Learning paradigm signals a major leap in AI design. By fusing architecture and optimization into a unified, multi-level system, it allows models like Hope to emulate the continuous, adaptive learning of biological brains. This not only addresses catastrophic forgetting but also lays the groundwork for more dynamic, self-improving AI.
Researchers encourage the AI community to explore Nested Learning as a foundation for next-generation models: ones that learn continuously, remember deeply, and adapt on their own.
