An interactive essay

Dropout.

Watch a neural network overfit, then watch it dream.

On Erik Hoel's Overfitted Brain Hypothesis — and what biological dreams and machine-learning regularization may have in common.


Act I

The Brain That Remembered Too Much

Erik Hoel begins with a question every neuroscientist has tried and failed to silence: why do we dream? The imagery is bizarre, the recombinations are sloppy, the half-faces look like nothing in particular. None of it pays its evolutionary rent in any obvious way, and after a century of careful science the field is still arguing about whether dreams do anything.

Hoel's answer borrows a vocabulary from machine learning. A student who memorizes every practice question fails the real exam. A network that minimizes training loss without restraint fits noise as eagerly as signal. The brain, Hoel argues, has the same problem — and may have evolved the same kind of fix.

The demonstration below is the failure mode in miniature. A small network is asked to separate two interlocking spirals. Give it too much capacity and too many epochs and the boundary becomes ornate, baroque, tortured — a curve that hugs every training point at the cost of any generality at all.
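The two-spiral task described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `make_two_spirals`, the linear radius growth, the number of turns, and the noise level are all hypothetical choices, not the parameters of the essay's own demo.

```python
import math
import random

def make_two_spirals(n_per_class=100, turns=1.5, noise=0.05, seed=0):
    """Return (points, labels) for two interlocking spiral arms.

    Assumed construction: each arm has radius growing linearly with the
    angle, and the second arm is the first rotated by 180 degrees.
    """
    rng = random.Random(seed)
    t_max = 2 * math.pi * turns
    points, labels = [], []
    for label in (0, 1):
        offset = math.pi * label  # half-turn rotation for the second arm
        for i in range(n_per_class):
            t = t_max * (i + 1) / n_per_class
            r = t / t_max  # radius grows linearly up to 1
            x = r * math.cos(t + offset) + rng.gauss(0, noise)
            y = r * math.sin(t + offset) + rng.gauss(0, noise)
            points.append((x, y))
            labels.append(label)
    return points, labels

points, labels = make_two_spirals()
```

Because the two arms wind around each other, no straight line separates them. A classifier has to bend its boundary, and a network with spare capacity can bend it around individual noisy points rather than around the arms themselves.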

[Interactive demo: a small network overfitting two interlocking spirals]

Act II

What Networks Forget

Training a model means showing it labeled examples and adjusting its weights so its predictions get closer to the answers. The quantity it tries to minimize is a loss — for a binary classifier, the binary cross-entropy:

$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log\left(1 - \hat{y}_i\right) \right]$

That number falls steadily on the data the network was shown. The interesting question is what happens to the same loss on data the network has not seen — the validation set. For a healthy network, the two curves track each other. For an overfitting one, they part ways: training loss keeps dropping while validation loss bottoms out and starts climbing back.
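A small sketch may make the parting of the two curves concrete. The function below computes the binary cross-entropy as defined above; the prediction values are invented for illustration and stand in for a model that is near-certain and correct on its training points but equally confident, and sometimes wrong, on held-out ones. The `eps` clipping is an implementation assumption that keeps the logarithm finite.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over paired labels and predicted probabilities.

    eps clips predictions away from 0 and 1 so log() stays finite;
    the clipping is an implementation detail, not part of the formula.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Hypothetical outputs of an overfitted model (made-up numbers).
train_labels, train_preds = [1, 0, 1, 0], [0.99, 0.01, 0.98, 0.02]
val_labels, val_preds = [1, 0, 1, 0], [0.97, 0.90, 0.05, 0.03]

train_loss = binary_cross_entropy(train_labels, train_preds)
val_loss = binary_cross_entropy(val_labels, val_preds)
gap = val_loss - train_loss  # a large positive gap is the overfitting signature
```

The loss punishes confident mistakes heavily: the two wrong validation predictions account for almost all of the validation loss, which is why an overconfident memorizer shows such a wide gap.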

Widen the network and the gap widens with it. Capacity that isn't restrained gets spent on memorization.

[Interactive demo: training loss versus validation loss]

Act III

The Hoel Hypothesis

So far the argument has been mechanical. Networks overfit; regularization stops them. The leap Hoel asks the reader to consider is whether the brain, faced with a structurally identical problem, may have arrived at a structurally identical solution, and whether that solution is what we experience, every night, as dreaming.

The argument turns on a peculiar property of any system that learns from experience. Such a system must extract regularities from a finite, noisy sample and then apply those regularities to a future it has not yet seen. The danger — call it the curse of intelligence — is that the more flexible the learner, the more easily it learns the noise along with the signal. A student who memorizes every example in their textbook will fail an exam built from new ones. A network trained too long on its training set will fit every datapoint and generalize nothing. The brain, Hoel argues, is exactly such a system, and faces exactly this problem.

The standard responses in neuroscience have framed dreaming as memory consolidation, threat simulation, emotional processing, or epiphenomenal noise from the housekeeping work of sleep. None has held up cleanly against the breadth of dream phenomenology — the bizarreness, the recombinations of unrelated experiences, the half-faces, the impossible architectures, the persistent thematic concerns. There is a tension in the consolidation view in particular: if dreams are about storing or rehearsing experience, why are they so unfaithful to it? Why is the dreaming brain so determined to mangle the very experiences it is supposed to be preserving?

Hoel's answer is that the mangling is the point.

Modern deep networks face the overfitting curse and have grown a toolkit to defeat it. Dropout silences random neurons during training, forcing the network to build representations that cannot rely on any single path through the architecture. Noise injection perturbs the inputs themselves, teaching the network to ignore variation that does not predict the label. Data augmentation generates rotated, scaled, distorted variants of the training examples so the network sees a wider distribution than was ever in the dataset. The three share a structural commitment: that learning generalizes better when the learner is forced to confront slightly broken versions of the world it expects.
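The three techniques named above are each only a few lines when stripped to their core. The sketch below is a minimal, framework-free illustration, not the code behind the essay's demos: the function names, the choice of "inverted" dropout (rescaling the surviving units), Gaussian input noise, and rotation as the augmentation are all assumptions made for the example.

```python
import math
import random

def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def add_noise(point, sigma, rng):
    """Noise injection: jitter each input coordinate with Gaussian noise."""
    return tuple(v + rng.gauss(0.0, sigma) for v in point)

def rotate(point, degrees):
    """Augmentation: rotate a 2-D input about the origin."""
    a = math.radians(degrees)
    x, y = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a))

rng = random.Random(0)
hidden = [0.5, 1.2, 0.0, 2.0, 0.7, 1.1]           # hypothetical hidden activations
dropped = dropout(hidden, rate=0.5, rng=rng)      # some units silenced, others doubled
noisy = add_noise((1.0, 0.0), sigma=0.1, rng=rng) # a slightly displaced input
turned = rotate((1.0, 0.0), 90)                   # approximately (0.0, 1.0)
```

One design caveat: an augmentation is only useful if it leaves the correct label unchanged. For a two-spiral task, a half-turn rotation would carry a point onto the opposite arm, so only small angles would be appropriate there. The perturbation has to be slight enough that the world is still recognizably the same world.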

On the Overfitted Brain Hypothesis, biological dreams discharge this same structural function. The wild distortions of dream content — the sudden topic-shifts, the people who are also other people, the geographies that fold in on themselves — are precisely the kind of perturbed input a sleeping brain would generate if its goal were to prevent its waking circuits from over-specializing on the day's experience. The brain dreams in order to keep generalizing.

The hypothesis makes specific predictions. Sensory deprivation, which narrows the variety of incoming experience, should drive the brain to compensate with more vivid imagery — and the prisoner's-cinema reports of solitary confinement, the hallucinations of long-haul truckers, and the imagery of flotation-tank subjects all fit. Sleep deprivation, which cancels the dream-augmentation pass, should impair generalization more than rote memory; experimentally, exactly that dissociation appears. Children, still building the structure of their world, dream proportionally more than adults; the elderly, whose models are settled, dream less. Even the within-night architecture of REM and non-REM sleep — concentrated dreaming late in the cycle, after consolidation has done its day's work — fits the picture of a generalization pass that runs once the storage pass is complete.

None of this proves the hypothesis. The neuroscience of dreaming is genuinely contested; the mechanisms of generalization in deep networks are themselves an active frontier; and a structural analogy between two systems is not, on its own, evidence that they evolved for the same reason. But the analogy is not idle. It is the kind of structural parallel that, in the history of science, has tended to be load-bearing. Vision evolved more than once. Wings evolved many times. Solutions that work get rediscovered.

The reader is now in a position to do something the author cannot do on their behalf. Scroll on to Act IV. Train the network without dreams; train it with them. Watch the boundary smooth. Then ask whether a brain — a hundred billion neurons, four hundred million years of selection pressure behind it — would really have failed to discover the same trick.

Act IV

Teaching the Network to Dream

The same network that memorized its training set can be coaxed back into generality. The intervention has names — dropout, noise injection, augmentation — but the structure is uniform. Inject a controlled hallucination. Let the network see a world that is almost-but-not-quite the world it knows. Watch the decision boundary smooth.

Each technique below is offered as a structural analogue to a property of biological dreaming. The point is not that the brain runs dropout in REM sleep. The point is that the shape of the fix (perturb the input or the representation just enough to force generalization) appears in both systems, and may be the reason both systems work at all.

[Interactive demo: training the network with and without regularization]

Act V

What This Means (And Doesn't)

The analogy is suggestive, not proof. The Overfitted Brain Hypothesis is contested; the neuroscience of dreaming is unsettled; and the way deep networks generalize is itself still under active investigation.

The strongest criticism is that the parallel may be too neat. Regularization in machine learning runs during training, on labeled inputs, against an explicit loss. Dreaming runs offline, on no external inputs at all, against no obvious objective. The brain's learning rule is not gradient descent in any tidy sense, and the cost function it might be minimizing, if there is one, does not behave like a deep-network loss. A structural analogy is not a mechanistic one, and the gap between the two is exactly the gap a future theory will have to close.

Even granting all of that, something in the framing seems worth keeping. Until recently the bizarreness of dreams was treated almost universally as a problem — the phenomenon any theory of dreaming had to explain away. The Overfitted Brain Hypothesis, whatever the eventual verdict on its specifics, reframes the bizarreness as evidence of a function rather than against one. If dreams were faithful replays, they would not work. The distortion is the work.

That reframing is the part most likely to survive. The structural parallel between a brain that over-specializes on the day's experience and a network that over-specializes on its training set is real even if dreaming turns out to do something other than fix it. Real enough, anyway, that watching a small network overfit and then watching it dream feels — to one reader, at least — like looking at the inside of an idea.

The rest is yours.