Notes · Observations · Chain of Thought

INFORMATION THEORY · COGNITION

Your Mind Is a Codec, Not a Camera

The hard math of lossy compression explains memory, trauma, product design, storytelling, education, and AI.


You are not who you were. You are what your brain could afford to keep.


The witness is sure about the broken glass.

She watched the cars stop. She heard the metal crunch. Later, a lawyer asks how fast the cars were going when they smashed into each other. The verb does quiet surgery. The collision becomes more violent in memory. Glass appears where none was ever filmed.

Nothing supernatural happened.

Her brain did what it always does when the world gives it too much data and too little time. It saved the useful shape of the event. Then it rebuilt the scene from that shape when asked to testify. The detail felt remembered because the reconstruction felt coherent.

This is the first insult cognitive science delivers to common sense.

Memory is not a shelf. Perception is not a feed. Attention is not a neutral spotlight. Your mind is not a camera whose pictures fade with age. It is closer to the software inside the camera, deciding what to throw away before you ever see the image.

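The effect is on record. In Loftus and Palmer (1974), people asked about cars that “smashed” into each other gave higher speed estimates than people asked about cars that “hit,” and a week later they were more likely to report broken glass the film never showed.

The compression is not a bug. It is the only reason the system runs at all.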


The Theorem That Explains the Mind

Claude Shannon turned communication into a science by separating a message from its meaning. A system does not need to understand a sentence to send it. It needs to represent enough of the possible messages so a receiver can recover the intended one. In 1948 he gave us entropy, channel capacity, and the bit. In 1959 he asked the sharper question. What if exact recovery is too expensive, and a good-enough version is allowed?

That question is rate-distortion theory.

Two terms do all the work.

Rate is the average number of bits you spend per symbol. Distortion is the error you permit, priced by a loss function.

The theorem states a brutal tradeoff. Lower distortion costs more bits. Lower rate costs more error. There is no exit.

Given a source $X$, a reconstruction $\hat{X}$, and a distortion measure $d(X, \hat{X})$, the rate-distortion function is:

$$R(D) = \min_{p(\hat{x}|x):\; \mathbb{E}[d(X, \hat{X})] \leq D} I(X; \hat{X})$$

Plain English. The minimum number of bits per symbol you must spend, if you refuse to let your average distortion exceed $D$.
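For two textbook sources the function has a closed form, which makes the tradeoff concrete. The sketch below assumes a Gaussian source scored by squared error, where the rate is half of log2(variance / D), and a coin-flip source scored by bit errors, where the rate is H(p) minus H(D). Both are standard results covered in Cover and Thomas. The function names are illustrative, not from any library.

```python
import math


def gaussian_rate(variance: float, distortion: float) -> float:
    """R(D) in bits per symbol for a Gaussian source under squared error."""
    if distortion <= 0:
        raise ValueError("distortion must be positive")
    if distortion >= variance:
        return 0.0  # guessing the mean already meets the target, so no bits are needed
    return 0.5 * math.log2(variance / distortion)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a coin that lands heads with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bernoulli_rate(p: float, distortion: float) -> float:
    """R(D) for a Bernoulli(p) source when distortion is the fraction of wrong bits."""
    if distortion >= min(p, 1 - p):
        return 0.0  # always guessing the likelier symbol is already good enough
    return binary_entropy(p) - binary_entropy(distortion)


for d in (1.0, 0.25, 0.0625):
    print(f"Gaussian, variance 1, D={d}: {gaussian_rate(1.0, d):.2f} bits")
print(f"Fair coin, 10% errors allowed: {bernoulli_rate(0.5, 0.1):.3f} bits")
```

In the Gaussian case every quartering of the allowed error costs exactly one more bit per symbol. For the fair coin, tolerating one wrong bit in ten cuts the bill from 1 bit to roughly 0.53.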

The choice of $d$ is everything.

A distortion function is not neutral. It is a map of what the system cares about. A missing pixel in a blue wall is cheap. A missing decimal in a bank balance is expensive. The same bit budget behaves completely differently once you decide which errors count.
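A toy sketch of that point, in Python. It assumes a budget that lets the system keep only some fields at full precision, and it scores each dropped field as its rounding error times a cost weight. The field names, error sizes, and weights are all invented for illustration.

```python
def fields_to_keep(rounding_error, cost_per_unit_error, budget):
    """Pick the `budget` fields whose loss would hurt most under this distortion measure."""
    penalty = {
        name: cost_per_unit_error[name] * err
        for name, err in rounding_error.items()
    }
    ranked = sorted(penalty, key=penalty.get, reverse=True)
    return set(ranked[:budget])


# Hypothetical error introduced if each field is stored coarsely.
rounding_error = {"wall_pixel": 4.0, "sky_pixel": 3.0, "bank_balance": 0.5}

# Two hypothetical distortion measures over the same data.
photo_viewer = {"wall_pixel": 1.0, "sky_pixel": 1.0, "bank_balance": 1.0}
accountant = {"wall_pixel": 0.01, "sky_pixel": 0.01, "bank_balance": 1000.0}

print(fields_to_keep(rounding_error, photo_viewer, budget=1))
print(fields_to_keep(rounding_error, accountant, budget=1))
```

Same data, same budget of one field. The first weighting keeps the wall pixel, the second keeps the bank balance. Nothing about the signal changed, only the price list.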

A JPEG is the household version. It does not save every photon from the room. It throws away distinctions your eye is unlikely to punish. It keeps edges and structure. It rebuilds an image that is useful enough. A bad JPEG reveals the bargain in blocks and smears. A good JPEG hides the loss.

The theorem does not say distortion is good. It says distortion is the price of finite storage, finite bandwidth, and finite time.

The question is never whether to lose information. The question is which information to lose, for what task, at what cost.

This is the math the brain runs on.


The Brain Is Not Receiving. It Is Choosing.

A camera receives. A codec chooses.

A codec asks what must be preserved, what can be blurred, and what errors will hurt the job. The brain looks much more like the second system.

Chris Sims has argued that rate-distortion theory gives a computational account of human perception. The perceiver minimizes costly error under channel-capacity constraints. Bates and Jacobs extended the same argument to perceptual memory. Barlow’s efficient-coding hypothesis got there decades earlier. Sensory pathways do not just pass signals along. They recode redundant input into more informative signals. Predictive coding and Friston’s free-energy principle add the active version. The brain predicts, compares, updates, and spends attention where the error matters.

The limits show up everywhere.

Working memory holds about four chunks under clean conditions (Cowan, 2001). Visual long-term memory can hold many objects with surprising detail, but not as raw image files. It stores categories, relations, task-shaped features. Brady, Konkle, and Alvarez kept emphasizing this point. Memory capacity cannot be understood by counting items alone. You have to ask what kind of representation is being stored.

This is why ordinary experience feels richer than experiments say it should.

You can look at a kitchen and feel as if you possess the whole scene. Then someone asks about the number of mugs, the position of the sponge, the title on the cookbook. The apparent photograph tears. Most of the scene was never stored as pixels. It was stored as kitchen, counter, objects, threat absent, action possible.

Sterling and Laughlin make the engineering pressure explicit. Nervous systems live under budgets of time, space, energy, and information. Spikes cost. Synapses cost. Attention costs. A full-fidelity model of the world would be too slow to save an animal from anything.

So the mind does what any competent compression machine does. It preserves the differences that change action. It lets the rest decay into gist.

Your perception is not a feed. It is a forecast you keep correcting.


Six Domains This Rewires

1. Memory

Your past keeps editing itself because recall is not playback. It is reconstruction under present constraints.

Bartlett’s 1932 “War of the Ghosts” study showed people retelling an unfamiliar Native American story by shortening it, normalizing strange details, and pulling it toward their own schemas. Schacter later organized memory’s failures into seven “sins,” among them misattribution, suggestibility, bias, and persistence. He argued these flaws are often by-products of adaptive memory. Reconsolidation research adds a biological mechanism. After retrieval, a consolidated memory can enter a labile state and be stored again.

The camera model says this is corruption.

The codec model says it is maintenance.

The most useful memory is not the one closest to yesterday. It is the one best indexed for tomorrow. It compresses repeated experience into warnings, affordances, names, and causes. A memory that cannot update is not faithful. It is dead storage.

2. Trauma

Trauma looks like a compression failure, but that phrase is too weak.

Call the sharper version the encoding-priority hypothesis. Trauma is high-distortion-cost data the codec refuses to compress in the ordinary way. Most pain can be reduced to gist. That person betrayed me. That road is dangerous. That room felt unsafe. But some events carry a prediction that bad compression may kill you. The system keeps fragments, sensations, postures, threat cues at high priority. They return not because they are perfect video, but because they are treated as too costly to blur.

Van der Kolk framed traumatic memory as something that leaves marks in brain and body. Predictive-processing accounts of PTSD describe a nervous system whose threat priors keep dominating incoming evidence.

The camera wants the past behind glass. The trauma codec keeps the file open.

This is also why bare reassurance can fail. The conscious sentence says the danger is over. The lower codec still prices the distortion as catastrophic. It will not downsample until new evidence arrives in the channels it trusts.

3. Product Design

A product fails when its interface mirrors the engineer’s full-fidelity model instead of the user’s lossy schema.

Take file permissions. The backend may know users, groups, inherited roles, organization policies, external domains, pending invites, audit states. The user knows one question.

Who can see this?

Show the database model and you create cognitive noise. Show the user’s compressed action model and the product becomes legible.

A good permissions screen is not a camera pointed at the architecture. It is a codec. It collapses the system into distinctions that match the user’s distortion function. Private. Shared with these people. Public to this link. Risky.
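A minimal sketch of that collapse, in Python. The record fields, the order of the rules, and the definition of “Risky” are assumptions made for illustration. They do not describe any real product’s policy.

```python
from dataclasses import dataclass, field


@dataclass
class BackendAccess:
    """Hypothetical full-fidelity access record; every field name is illustrative."""
    owner: str
    direct_users: set = field(default_factory=set)
    group_members: set = field(default_factory=set)
    link_sharing: bool = False
    external_domains: set = field(default_factory=set)


def summarize(access: BackendAccess) -> str:
    """Collapse the backend model into the one answer the user needs: who can see this?"""
    people = sorted((access.direct_users | access.group_members) - {access.owner})
    # Assumed rule: exposure outside the organization outranks every other state.
    if access.external_domains:
        return "Risky: visible outside the organization"
    if access.link_sharing:
        return "Public to this link"
    if people:
        return "Shared with " + ", ".join(people)
    return "Private"


doc = BackendAccess(owner="ana", direct_users={"bo"}, group_members={"cy", "ana"})
print(summarize(doc))
```

The user never sees whether access came from a direct grant or a group. That distinction matters to the backend, but under the user’s distortion function it costs nothing to drop.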

The best interface is not the one with the most truth visible. It is the one that preserves the truth users need to act, without making them reconstruct the machine.

Every dashboard faces the same choice. You can expose the warehouse, or you can compress it into the few state changes a decision-maker can safely act on. Most teams pick the warehouse because it feels honest. It is actually a refusal to do design.

4. Storytelling

Narratives win because they are pre-compressed cognition.

A story turns the blooming confusion of experience into agents, motives, obstacles, reversals, consequences. That is why Bartlett’s participants changed “War of the Ghosts” when they retold it. The original violated their schema, so the mind compressed it toward a familiar causal shape.

This is not just a laboratory curiosity. It is the reason myths travel, brands stick, and gossip outruns documentation.

A story is not a camera record of events. It is a codec that says: keep this conflict, keep this desire, keep this consequence, discard the rest.

Bad storytelling asks the listener to compress from scratch. Good storytelling delivers a small file that expands inside the listener’s existing model.

Stories also choose the acceptable distortion. The hero can stand for a generation. The dinner can stand for a marriage. The monster can stand for a fear no essay can point at cleanly.

5. Education

Teaching is choosing a distortion function.

The teacher decides which errors are acceptable at each stage. A beginner learning calculus does not need epsilon-delta rigor on day one. That precision may be true. It is also destructive. The learner first needs a compressed model. A derivative is a local rate of change. An integral is accumulated quantity.

Later, the codec can spend more bits.

This is why concept ladders beat content dumps. A dump mistakes education for high-resolution transfer. A ladder accepts controlled distortion, then tightens fidelity as the student’s channel capacity expands.

The aim is not to make ideas smaller. The aim is to make the first approximation useful enough that the next, better approximation has somewhere to attach.

A teacher is a codec designer. The curriculum is a sequence of lossy encodings that become less lossy as the student gains bandwidth. The sequence matters because early distortions become scaffolds, and scaffolds become habits.

6. AI

Large language models (LLMs) and variational autoencoders (VAEs) are not metaphors for the brain in the loose magazine sense. They are members of the same family of solutions to the same problem. Preserve useful structure under constraint.

Tishby’s Information Bottleneck formalizes learning as a short code that keeps information relevant to a target:

$$\min_{p(t|x)} \; I(X; T) - \beta \, I(T; Y)$$

Compress the input $X$ into a representation $T$ as much as possible, but keep the information about the target $Y$. That is one equation, and it describes a startling amount of what brains and models actually do.
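The objective can be scored by hand on a toy problem. The sketch below is a simplification under stated assumptions: the input is one of four equally likely values, the target is its high bit, and the encoders are deterministic lookups rather than the stochastic mappings the general method allows. The function and variable names are illustrative.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A;B) in bits, given a dict {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in joint.items()
        if p > 0
    )


def ib_objective(p_xy, encoder, beta):
    """I(X;T) - beta * I(T;Y) for a deterministic encoder mapping x to t."""
    p_xt, p_ty = defaultdict(float), defaultdict(float)
    for (x, y), p in p_xy.items():
        t = encoder[x]
        p_xt[(x, t)] += p
        p_ty[(t, y)] += p
    return mutual_information(p_xt) - beta * mutual_information(p_ty)


# Toy world: X is uniform on 0..3 and the target Y is the high bit of X.
p_xy = {(x, x // 2): 0.25 for x in range(4)}

identity = {x: x for x in range(4)}       # keep everything: 2 bits in, 1 bit relevant
high_bit = {x: x // 2 for x in range(4)}  # keep only what predicts Y: 1 bit in, 1 bit relevant
constant = {x: 0 for x in range(4)}       # keep nothing

for name, enc in [("identity", identity), ("high_bit", high_bit), ("constant", constant)]:
    print(name, ib_objective(p_xy, enc, beta=4.0))
```

With beta set to 4, the encoder that keeps only the high bit scores lowest, at -3, against -2 for keeping everything and 0 for keeping nothing. Hoarding the input is penalized and so is throwing away the part that predicts the target.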

A VAE sends high-dimensional data through a latent bottleneck and reconstructs a plausible output. A language model learns by predicting, and prediction has a deep equivalence with compression. Delétang and colleagues showed in 2024 that language modeling and compression are formally interchangeable lenses on the same objective.
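The link between prediction and compression can be seen in a few lines. A symbol the model assigned probability p costs about -log2(p) bits to encode, and an arithmetic coder can get within a couple of bits of that total for a whole sequence. The sketch below compares a model that knows nothing with one that knows letter frequencies. It is a toy: the frequency model is fit on the very text it encodes, and the cost of describing the model itself is ignored.

```python
import math
from collections import Counter


def code_length_bits(text, prob):
    """Ideal code length: each symbol costs -log2 of the probability the model gave it."""
    return sum(-math.log2(prob(ch)) for ch in text)


text = "abracadabra"
alphabet = sorted(set(text))
counts = Counter(text)


def uniform(ch):
    """A model with no knowledge: every letter in the alphabet is equally likely."""
    return 1 / len(alphabet)


def unigram(ch):
    """A model that has learned how often each letter occurs."""
    return counts[ch] / len(text)


uniform_bits = code_length_bits(text, uniform)
unigram_bits = code_length_bits(text, unigram)
print(f"uniform model: {uniform_bits:.1f} bits")
print(f"unigram model: {unigram_bits:.1f} bits")
```

The better predictor yields the shorter file, about 22.4 bits against 25.5. A language model extends the same idea by conditioning each prediction on everything that came before.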

This does not make machines conscious. It does not make brains transformers. It does dissolve the mystical gap.

A hallucination is what lossy reconstruction looks like when the codec is asked for exact provenance. The AI is not a camera either. It is a compression machine trained on a different distortion function.

That distinction matters more than people think.

A model optimized to predict the next token is not automatically optimized to preserve sources, causal truth, or moral stakes. Change the distortion function and you change the mind-like behavior. If you want a system that remembers correctly, you do not ask it to remember harder. You change what it is paid to keep.


The Same Insight Across Six Surfaces

It is worth pausing to see how tightly the six domains rhyme.

Memory throws away pixels and keeps gist, because tomorrow’s action is the loss function. Trauma refuses to downsample, because the codec has priced the error as fatal. A good interface throws away the schema and keeps the action. A good story throws away the timeline and keeps the cause. A good curriculum throws away precision early and adds it later. A good model throws away the input and keeps the prediction-relevant signal.

Six different surfaces. One question underneath all of them.

What is the loss function, and is the system spending its bits in the right places?

That is the only question worth asking once you take the codec view seriously.


The Uncomfortable Implication

Your certainty is not a checksum.

It is often the felt smoothness of a reconstruction. This should not turn you into a relativist, because the world still pushes back. Some compressions predict better, heal better, teach better, design better, and survive contact with reality better.

But it should make you suspicious of the inner glow that says, “I remember exactly,” “I saw it clearly,” or “that is just how I am.”

The mind hides the labor of compression from itself. It shows you the rebuilt image, not the missing bits.

Intelligence is not the fantasy of becoming lossless. It is the discipline of asking three questions, in order.

What is my current compression optimized for? What is it throwing away? Does that bargain still serve the life I am trying to build?

The next time you hold up a phone and take a picture of a sunset, remember that even the photograph is already a negotiation. The sensor catches more than the file keeps. The file keeps more than your memory will carry. Between those two losses sits agency. You choose what deserves higher fidelity, and what can be safely left as glow.

Then tomorrow you will not remember the sky.

You will remember the meaning your mind could afford to save.


The One-Line Version

The mind survives by losing information intelligently.

Every other claim in cognitive science is a footnote to that sentence.


Sources

Shannon (1948), A Mathematical Theory of Communication.
Shannon (1959), Coding Theorems for a Discrete Source With a Fidelity Criterion.
Cover and Thomas (1991), Elements of Information Theory, Ch. 13.
Sims (2016, 2018) on rate-distortion and human perception.
Bates and Jacobs (2020) on efficient data compression in perception and perceptual memory.
Barlow (1961) on efficient sensory coding.
Sterling and Laughlin (2015), Principles of Neural Design.
Friston (2010) on the free-energy principle.
Cowan (2001) on the magical number four.
Brady, Konkle, and Alvarez (2011) on visual memory capacity.
Bartlett (1932), Remembering.
Loftus and Palmer (1974).
Loftus (2005) on misinformation effects.
Schacter (1999), The Seven Sins of Memory.
Nader, Schafe, and LeDoux (2000) on reconsolidation.
Van der Kolk (2014), The Body Keeps the Score.
Wilkinson, Dodgson, and Meares (2017) on predictive processing and trauma.
Tishby, Pereira, and Bialek (2000), The Information Bottleneck Method.
Kingma and Welling (2013) on variational autoencoders.
Delétang et al. (2024), Language Modeling Is Compression.