You Are Not Focused. You Are Overfit.

The two ways to raise signal over noise look identical inside the room. They diverge violently the moment you leave it.


Most people who say they have built a focused life have not built focus. They have built isolation with productivity metrics layered on top. From inside the room, the two are indistinguishable. Output is high. Distractions are gone. The work moves. Everything reads as a success state.

Then the room ends.

A queue at the airport. A neighbor at the door. Ten minutes of small talk after a meeting that ran over. An afternoon with no agenda. And the system that looked so formidable inside the room collapses outside it. The person describes themselves as drained, overstimulated, in need of recovery. They go home and rebuild the room.

What happened is not a personality flaw.

It is a math problem.

The Two Ways

There are two ways to raise the signal-to-noise ratio of any adaptive system. You can subtract noise, or you can widen the channel. The two strategies produce identical numbers as long as the input distribution stays the same. They diverge the instant it changes. One survives the change. The other does not.

The first one is what most knowledge workers have been quietly building for the last decade.

It is not focus. It is overfit.

Where the Math Comes From

Claude Shannon put the signal-to-noise ratio at the center of information theory in 1948. He proved that the carrying capacity of a band-limited channel with Gaussian noise is bounded by its bandwidth and the ratio of signal power to noise power.

C = B \log_2\left(1 + \frac{S}{N}\right)

C is how much information the channel can carry. B is its bandwidth. S over N is the ratio of signal power to noise power. Hold the signal fixed and there are exactly two ways to push C up. Shrink N (suppress the noise) or grow B (widen the channel). The math is indifferent. Both work as long as the noise behaves the way you assumed.
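A quick numerical sketch of the two levers. It assumes arbitrary units and treats B, S, and N as independent knobs; in a real channel with white noise, N grows with B, which this toy ignores. The function name `capacity` and the numbers are illustrative choices, not Shannon's.

```python
import math

def capacity(bandwidth, signal, noise):
    """Channel capacity C = B * log2(1 + S/N), in bits per unit time."""
    return bandwidth * math.log2(1 + signal / noise)

# Baseline channel: B = 1, S = 1, N = 1 (arbitrary units).
baseline = capacity(bandwidth=1.0, signal=1.0, noise=1.0)

# Lever 1: halve the noise, leave the channel alone.
quieter = capacity(bandwidth=1.0, signal=1.0, noise=0.5)

# Lever 2: double the bandwidth, leave the noise alone.
wider = capacity(bandwidth=2.0, signal=1.0, noise=1.0)

print(f"baseline: {baseline:.3f}")
print(f"quieter:  {quieter:.3f}")
print(f"wider:    {wider:.3f}")
```

Both levers raise the number. Nothing in the formula says which one keeps working when the noise stops matching the assumption.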

The catch lives one floor down, in feedback theory. John Doyle showed in 1978 that the mathematically optimal feedback controller (LQG), the one that crushes the disturbances it knows about, has no guaranteed stability margin against the model errors it does not. The Bode sensitivity integral, a much older result, states the general constraint.

\int_0^\infty \ln |S(j\omega)| \, d\omega = \pi \sum_i \text{Re}(p_i)

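Here S is the sensitivity of the closed-loop system to disturbances at frequency ω (not the signal power from Shannon's formula), and the p_i are the unstable poles of the open loop. With no unstable poles the right-hand side is zero, so every decibel of suppression you buy at one frequency must reappear at another. Unstable poles add interest. Engineers call it the waterbed effect. Push down here. It pops up there.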
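The waterbed can be checked numerically. This is a minimal sketch under stated assumptions: the loop L(s) = 4 / (s + 1)^2 is a hypothetical example, picked because it has no unstable poles and rolls off fast enough for the integral to apply, so the right-hand side is zero. The names `log_sensitivity` and `bode_areas` are mine.

```python
import math

K = 4.0  # loop gain (hypothetical); the closed loop s^2 + 2s + 5 is stable

def log_sensitivity(omega):
    """ln|S(jw)| for L(s) = K / (s + 1)^2, with S = 1 / (1 + L)."""
    s = complex(0.0, omega)
    loop = K / (s + 1) ** 2
    return -math.log(abs(1 + loop))

def bode_areas(n=20000):
    """Integrate ln|S| over [0, inf) with the midpoint rule.

    Uses the substitution omega = tan(theta), theta in (0, pi/2).
    Returns (suppressed_area, amplified_area); their sum is the Bode integral.
    """
    h = (math.pi / 2) / n
    suppressed = 0.0
    amplified = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        omega = math.tan(theta)
        # d(omega) = d(theta) / cos(theta)^2
        value = log_sensitivity(omega) / math.cos(theta) ** 2 * h
        if value < 0:
            suppressed += value  # frequencies where disturbances are attenuated
        else:
            amplified += value   # frequencies where they are made worse
    return suppressed, amplified

suppressed, amplified = bode_areas()
print(f"suppressed area: {suppressed:.4f}")
print(f"amplified area:  {amplified:.4f}")
print(f"sum (should be close to zero): {suppressed + amplified:.6f}")
```

The two areas cancel. Whatever this controller suppresses at low frequency it hands back at high frequency.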

You cannot delete noise from a system. You can only choose where it breaks.

Statistical learning calls this trade-off bias-variance. A model that perfectly fits its training set has near-zero bias on that set and variance that can blow up under distributional shift. The fix is regularization. One common form injects noise during training so the model cannot afford to memorize. You accept worse training scores in exchange for survival on data it has never seen. Arjovsky’s Invariant Risk Minimization (2019) makes the trade explicit. Optimize for features that are stable across environments, not for features that maximize accuracy in one of them.
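A toy version of the trade, in plain Python. It is a sketch, not the regularization described above: instead of injecting noise, it limits capacity by fitting a straight line, and compares that against a polynomial that memorizes every training point. The ground truth y = 2x, the alternating ±0.1 "noise", and the shifted test range are all assumptions made up for the illustration.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate at x the unique polynomial passing through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def truth(x):
    return 2.0 * x

# Training set: 8 points on [0, 1] with deterministic alternating noise.
train_x = [i / 7 for i in range(8)]
train_y = [truth(x) + 0.1 * (-1) ** i for i, x in enumerate(train_x)]

# Shifted set: same rule, but just outside the training range.
shift_x = [1.0 + i / 7 for i in range(1, 4)]
shift_y = [truth(x) for x in shift_x]

a, b = fit_line(train_x, train_y)

def memorizer(x):
    return lagrange_predict(train_x, train_y, x)

def line(x):
    return a * x + b

train_memorizer = mse(memorizer, train_x, train_y)
train_line = mse(line, train_x, train_y)
shift_memorizer = mse(memorizer, shift_x, shift_y)
shift_line = mse(line, shift_x, shift_y)

print(f"train error: memorizer {train_memorizer:.4f}, line {train_line:.4f}")
print(f"shift error: memorizer {shift_memorizer:.1f}, line {shift_line:.4f}")
```

The memorizer wins inside the training range and loses by orders of magnitude one step outside it. The line is worse in the room and barely notices the move.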

Same trade-off. Different vocabulary.

Strategy A subtracts noise. Wins this epoch. Strategy B widens the channel. Survives the next one.

Pick one. The math will not let you have both.

The Same Shape Everywhere

Control theory. Doyle’s 1978 LQG counter-example is the cleanest case. You design a controller that minimizes a quadratic cost function under a known plant and Gaussian noise. The result is mathematically perfect inside the assumption. Outside it, arbitrarily small model errors can destabilize the loop. Robust control (Zhou and Doyle, 1996) refuses to be optimal. It accepts worse nominal performance to keep stability margins. The plane that survives turbulence is not the most efficient plane. It is the one that refused to be the most efficient.

Machine learning. A network trained to convergence on a clean benchmark posts loss numbers that look like a triumph. Run it on data with shifted lighting, shifted demographics, shifted vocabulary, and the loss explodes. Data augmentation deliberately damages the training set. It lowers the ceiling. It also raises the floor on out-of-distribution behavior. The model that wins benchmarks is rarely the model that ships.

Cybernetics. Ross Ashby’s Law of Requisite Variety (1956) is the most general form of the result. Only variety can absorb variety. A regulator with N internal states can stabilize an environment with up to N states of disturbance. Anything past N routes to system failure. There are two ways to keep N at least as large as the variety of the environment. Shrink the environment, or grow the regulator. Shrinking is cheaper at first. It is also a debt with compounding interest. Every state the regulator avoids learning shrinks its internal variety, and the environment it can handle shrinks with it.
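Ashby's bound is small enough to brute-force. In its formal version, the variety of outcomes cannot be pushed below the variety of disturbances divided by the variety of responses. The sketch below assumes a toy outcome table, outcome = (d - r) mod D, which is my choice for illustration and not Ashby's example. It tries every possible regulator strategy and reports the fewest distinct outcomes any of them can reach.

```python
from itertools import product

def min_outcome_variety(n_disturbances, n_responses):
    """Fewest distinct outcomes achievable by any regulator strategy.

    A strategy assigns one response r to each disturbance d.
    Toy outcome table (an assumption): outcome = (d - r) mod n_disturbances.
    """
    best = n_disturbances
    for strategy in product(range(n_responses), repeat=n_disturbances):
        outcomes = {(d - r) % n_disturbances for d, r in enumerate(strategy)}
        best = min(best, len(outcomes))
    return best

# Six possible disturbances, regulators of growing variety.
for n_responses in (1, 2, 3, 6):
    print(n_responses, min_outcome_variety(6, n_responses))
# prints:
# 1 6
# 2 3
# 3 2
# 6 1
```

Only the regulator with as many responses as there are disturbances holds the outcome to a single state. Every smaller regulator leaks variety, no matter how cleverly it plays.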

Physiology. The hygiene hypothesis is the same shape in a body. Children raised with very little microbial exposure tend to show higher rates of allergy, asthma, and autoimmune dysregulation. On this account, their immune systems were never trained on harmless inputs. Without training, they cannot tell harmless from hostile, and they attack both. Trained immunity (Netea, 2020) demonstrates the inverse mechanism. Controlled exposure to microbial noise epigenetically reprograms innate immune cells, expanding capacity for novel pathogens. The strong immune system is not the cleanest one. It is the one that has fought a thousand small fires.

Clinical psychology. The same shape appears in the place it matters most for a person. Hayes’ work on experiential avoidance (1996) names the pattern. When something uncomfortable arises, alter the environment to remove the input. Salkovskis showed that the same safety behaviors that produce immediate relief keep the brain from updating its threat model for as long as they stay in place. Foa and Kozak’s emotional processing theory and Craske’s inhibitory learning model (2014) describe the mechanism. Exposure without retreat builds a competing non-threat association. Exposure with retreat does not. Daniel Siegel’s window of tolerance is the operating envelope of the nervous system. Every act of avoidance narrows it. Every act of bounded exposure widens it.

Five disciplines. Same theorem in five accents.

You can shrink the world. Or you can grow yourself.

One Trade-off, Five Surfaces

The structural invariant underneath all of these is the bias-variance trade-off under non-stationary distributions. Every adaptive system has a finite resource budget. Every system that spends its budget driving bias to zero ends up with unbounded variance the first time the world moves. Every system that spends part of its budget controlling variance accepts more bias. The first one wins every static benchmark. The second one wins every distributional shift.
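For reference, the classical identity behind the vocabulary, written for squared error at a fixed input x. It assumes a stationary distribution in which y = f(x) plus noise of variance \sigma^2, so it is the textbook starting point the argument leans on, not a proof of the non-stationary claim. The symbols y, f, \hat{f}, and \sigma^2 are introduced here for the illustration.

```latex
\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2 + \text{Var}[\hat{f}(x)] + \sigma^2
```

The first term is squared bias. The second is variance. The third is noise no model can remove. Effort spent driving the first term down tends to show up in the second.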

Bode said it in feedback. Ashby said it in regulators. Hayes said it in nervous systems. Doyle said it in airplanes. Arjovsky said it in deep networks. Holling (2001) said it in ecosystems and called the failure mode the rigidity trap. Perrow (1984) said it in organizations and called it the normal accident.

It is one trade-off. Five surfaces.

The system that maximizes performance in a frozen environment cannot survive a changing one. This is not a moral claim. It is a theorem.

The Diagnostic Nobody Runs

If peak performance inside the room cannot distinguish a robust system from a fragile one, you need a different measurement. The measurement is portability: the relative change in performance when the distribution shifts, a finite-difference stand-in for the first derivative.

\text{Portability} = 1 - \frac{|P_{\text{in}} - P_{\text{out}}|}{P_{\text{in}}}

P_in is performance inside the controlled envelope. P_out is performance the instant you leave it. A Strategy A system shows a sharp delta and a near-zero score. A Strategy B system barely registers the move and scores close to one.
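The metric is a one-liner. A minimal sketch, assuming performance is measured as a positive number in whatever unit you trust (the scores below are hypothetical, not measurements):

```python
def portability(p_in, p_out):
    """Portability = 1 - |P_in - P_out| / P_in, defined for P_in > 0."""
    if p_in <= 0:
        raise ValueError("p_in must be positive")
    return 1 - abs(p_in - p_out) / p_in

# Hypothetical scores, e.g. useful output per hour inside and outside the room.
overfit = portability(p_in=100, p_out=10)  # high peak, collapses outside
robust = portability(p_in=80, p_out=72)    # lower peak, barely moves

print(f"Strategy A: {overfit:.2f}")
print(f"Strategy B: {robust:.2f}")
```

Two properties of the formula as written are worth knowing. The absolute value penalizes doing better outside the room as much as doing worse, and the score goes negative if P_out is more than double P_in.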

The metric refuses to be impressed by output you produced in a room you cannot leave.

I have come to think of portability as the only honest performance metric for a human operating system. Everything else is in-distribution accuracy, and in-distribution accuracy says almost nothing about a system designed to operate across a life. The interesting question is never how good you are inside the room. The interesting question is what your output looks like in the airport, in the unscheduled afternoon, in the conversation that has no agenda.

Four behavioral tests, ranked roughly by how much they hurt.

How long can you sit in an unstructured public environment with no task, no device, no schedule, without leaving?

How long does it take to reset after a ten-minute unscheduled conversation?

What is the activation cost of leaving the optimized environment for no instrumental reason?

Can you do moderate cognitive work in an airport, a café, a noisy room?

These are not personality tests. They are stress tests. They tell you whether you have built focus or whether you have built a clean room with focus-shaped output coming out of it.

I started running them on myself. The results were not flattering.

The Uncomfortable Part

The deep-work playbook of the last decade is Strategy A in evangelist clothing. Noise-cancelling headphones. Asynchronous communication. Single-tasking. Notifications off. Inputs controlled. Schedule defended like a fortress. Every one of these is a noise-suppression operation. Every one of them raises in-distribution SNR. Every one of them, in isolation, is correct.

The mistake is never any individual move. The mistake is letting Strategy A stop being a state and start being an identity.

A short, deliberate retreat into a controlled environment is rational specialization. The math endorses it. Newport endorses it. Every monastic tradition for two thousand years endorses it. The cost of robustness is real. H-infinity controllers do underperform LQG on the nominal plant. Adversarially trained networks do post lower clean accuracy. Robustness has a price tag, and refusing to pay it in stable conditions is not a sin.

A permanent retreat is something else.

A permanent retreat is not specialization. It is a rigidity trap.

The signal that the line has been crossed is not a feeling. It is a behavior. The voluntary becomes involuntary. The room shifts from a tool to a precondition. The metabolic cost of leaving rises. The metabolic cost of staying drops. The window of tolerance contracts every time it is not used. Salkovskis’ safety behaviors stop being a tactic and become an architecture.

The cognitive worker who can write production-grade code for six hours straight and then dysregulates at ten minutes of small talk is not a paradox. They are a textbook Strategy A system. High in-distribution SNR. Near-zero portability. They look formidable from the inside. From the outside they look like a high-end controller that has never been tested against unmodeled dynamics.

The fix is also not a feeling. It is a protocol. Inhibitory learning theory says new associations are built by exposure without retreat. The dose has to be graded. The contexts have to be varied so the nervous system does not overfit to a single “safe” public space. The exposure has to be sustained past the autonomic spike, or the spike itself becomes the lesson the system encodes.

Trust the portability metric. The feeling lags.

You do not need to become someone who loves chaos. You need to become someone who does not break in its presence. That is a smaller and more honest goal than the transformation rhetoric usually attached to it.

The point is not to romanticize noise. Most noise is worthless. Most clean rooms exist because clean rooms are correct (a product manager’s intuition, not a categorical claim about every life). What I am pointing at is the specific moment Strategy A stops being a deliberate choice and starts being the only thing the system knows how to do. That moment is invisible from inside the room.

The portability metric is the only thing that catches it.

The One-Line Version

You are not focused. You are overfit. Independence that cannot leave the room is not independence. It is a clean room with a person inside.


Sources

The mathematical backbone of the argument comes from Shannon (1948) on channel capacity, Doyle (1978) on the failure of LQG margins, Zhou and Doyle (1996) on H-infinity synthesis and the Bode integral, and Vapnik (1995) on statistical learning theory. Arjovsky et al. (2019) on Invariant Risk Minimization and Tsipras et al. (2019) on the accuracy-robustness frontier translate the same trade-off into modern deep learning. Ashby (1956) supplies the Law of Requisite Variety. Holling (2001) and Perrow (1984) extend the result into ecosystems and high-risk organizations, with Weick and Sutcliffe (2001) on high-reliability organizations and Uzzi (1996) on the embeddedness threshold. The clinical and physiological case rests on Hayes et al. (1996) on experiential avoidance, Salkovskis on safety behaviors, Foa and Kozak’s emotional processing theory, Craske et al. (2014) on inhibitory learning, Siegel (1999) on the window of tolerance, Porges (2011) on polyvagal regulation, Friston (2010) on the free-energy principle, Dickerson and Kemeny (2004) on social-evaluative threat and cortisol, and Netea et al. (2020) on trained immunity. Taleb (2012) and Newport (2016) frame the two extremes of the trade-off in popular vocabulary.