The Art of Doing Science and Engineering: Learning to Learn

Hamming’s playbook for staying productive in a field that will turn over multiple times beneath you - built on vision, fundamentals, engagement, and the courage to work on important problems.
Author: Imad Dabbura

Published: May 8, 2026

Introduction

Richard Hamming spent thirty years at Bell Labs alongside Shannon, Tukey, von Neumann, and Feynman before moving to the Naval Postgraduate School, where he taught a capstone course called Learning to Learn. The course was his answer to a question he carried for those thirty years of close observation: what habits separated the colleagues who consistently produced first-rate work from those who did not? The Art of Doing Science and Engineering is the book version of that course, edited from his lectures and published posthumously.

Important

Greatness, as Hamming frames it, is more a practice than a trait. The book is full of great people performing mighty deeds, but they are not there to be admired. They are to be aspired to, learned from, and surpassed.

The book is not a textbook on any one subject, nor is it a Hamming memoir. It is a 30-chapter pedagogical experiment in transmitting style of thinking - the patterns Hamming watched separate first-rate scientists from competent ones across his career, narrated through cases where he was a participant (digital filters, error-correcting codes), an observer (Shannon’s information theory, Tukey’s data analysis, Einstein’s late work, Planck’s quantum), or both. The technical chapters are vehicles for thinking moves that survive long after the specific techniques rotate. The meta-chapters - creativity, experts, measurement, selling work, you and your research - state the moves directly.

The question Hamming is answering is the one anyone with a long technical career will face whether they look at it or not: how do you stay productive and make first-rate contributions over your career when the field will turn over multiple times beneath you?

The Compounding Question

Hamming opens with a calculation. Knowledge doubles roughly every seventeen years, with a fifteen-year half-life on technical specifics. The number is approximate - he derives it from scientist-population growth - but it is approximately load-bearing. A long career sits on top of two to three full doublings of the field underneath it.
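The calculation is worth reproducing once yourself. A minimal sketch, using Hamming's 17-year doubling and 15-year half-life figures and assuming a 50-year career (my number, not his):

```python
career_years = 50      # assumption: a long technical career
doubling_years = 17    # Hamming's doubling estimate
half_life_years = 15   # Hamming's half-life for technical specifics

doublings = career_years / doubling_years            # ≈ 2.9 doublings underneath you
growth = 2 ** doublings                              # the field ends up ~7-8x larger
surviving = 0.5 ** (career_years / half_life_years)  # ~10% of early specifics still current
print(f"{doublings:.1f} doublings; field grows {growth:.1f}x; "
      f"{surviving:.0%} of early specifics survive")
```

Roughly a tenth of what you learn in year one is still current in year fifty, which is the whole argument for investing in what does not decay.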

Two consequences follow. First, most of what you learn early will be obsolete by the time you would otherwise be at peak influence. Second, the only way to stay relevant is to keep relearning, and the only way to keep relearning at scale is to invest in things that don’t decay. Knowing what doesn’t decay - that is the hard part.

The rest of the book is the answer.

Part I - Learning to Learn

Vision and the Directed Walk

A career without direction is a random walk. With \(n\) steps, an unguided walk reaches roughly \(\sqrt{n}\) from the origin; a directed one reaches a distance proportional to \(n\). The metaphor is only a metaphor - actual careers have correlated steps, since skill compounds even without explicit vision - but the relative gap is the point.

The Random-Walk Math

\[ \text{Distance after } n \text{ steps} \;\propto\; \begin{cases} \;\sqrt{n} & \text{without vision (random walk)} \\ \;n & \text{with vision (directed walk)} \end{cases} \] After \(1,000\) steps, the gap between drifting and directed is roughly \(30\)x. After \(10,000\), it is \(100\)x.
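The scaling is easy to verify empirically. A quick one-dimensional simulation (my own sketch, not Hamming's; the constant in front of \(\sqrt{n}\) is not exactly 1, but the scaling is what matters):

```python
import random

def mean_abs_distance(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average |position| after n random unit steps in one dimension."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pos = sum(rng.choice((-1, 1)) for _ in range(n))
        total += abs(pos)
    return total / trials

n = 1000
drift = mean_abs_distance(n)   # theory: sqrt(2n/pi) ≈ 25 for n = 1000
directed = n                   # every step in the same direction
print(f"drifting: ~{drift:.0f} steps out; directed: {directed}; "
      f"gap: {directed / drift:.0f}x")
```

The simulated gap lands in the same tens-of-x regime the formula predicts, and grows with \(n\).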

Hamming’s first practical move is the \(10\%\) rule. Spend roughly \(10\%\) of your time - Friday afternoons in his case, for decades - on what he called “great thoughts”: what is the future of computing? of my organization? of my field? Without this practice, you stay reactive: every problem in front of you looks important because it is in front of you. With it, you are rarely surprised and usually positioned.

The companion practice is to carry ten to twenty important problems mentally at all times. When a clue or technique appears that fits one of them, drop everything and pursue it. Hamming’s claim is that this is why great people seem to “come in first”: they were waiting for the gap.

A common objection is that the vision will be wrong. Hamming’s answer is that it almost certainly will be, and that the accuracy of the vision matters less than people think. Drift is fine; absence is not.

Education vs Training

The Distinction

Education answers what, when, and why. Training answers how.

Both are required. The two are often conflated, especially in industry, where training masquerades as education and produces people who can do today’s tasks but cannot learn tomorrow’s.

A trained programmer can ship in the current language and stack. An educated one can switch stacks because they understand the underlying primitives - control flow, memory model, type systems, computational complexity - well enough to map any new system onto things they already know. The trained engineer is paid until the stack rotates; the educated one is paid across rotations.

Hamming’s claim is that education is the harder open problem, and that most institutional teaching has it backwards: it focuses on training because training is easier to grade. Computer-aided instruction can handle most training; education must come from elsewhere.

The Fundamentals Test

Hamming’s test for what to invest in is two-pronged. A piece of knowledge is fundamental if and only if:

  1. It has lasted a long time, and
  2. The rest of the field can be derived from it using the field’s standard methods.

The conjunction is the load-bearing part. “Has lasted long” alone is nostalgia. “Field can be derived from it” alone overweights anything currently fashionable that happens to be expressive.

| Subject | Lasted long? | Field derives from it? | Fundamental? |
|---|---|---|---|
| Linear algebra | Yes | Yes | Yes |
| Probability | Yes | Yes | Yes |
| Optimization | Yes | Yes | Yes |
| Information theory | Yes | Yes | Yes |
| The framework I learned this year | No | Maybe | No |
| The CEO's favourite methodology | Sometimes | Rarely | No |

The mistake worth avoiding is to mistake mastery of the current vocabulary for fundamentals. Vocabulary rotates. The mathematical objects underneath do not.

One nuance the test misses: what counts as fundamental is partly perspectival - different practitioners carry different mental models and arrive at different answers about which ideas the field actually derives from. Deliberately reading across adjacent fields is itself a fundamental practice, not a luxury - wide exposure is what gives you multiple framings of the same object and lets you see which one makes the structure obvious.

Sometimes the only way forward is to go backward - re-examine the fundamentals you took for granted, especially the ones you absorbed without proof. Most capable people don’t, because they assume the foundations were settled before them; the rare person who actually checks sometimes finds a leap forward, and sometimes finds a contradiction to what everyone assumed was true.

Engagement Over Consumption

“What you learn from others you can use to follow; what you learn for yourself you can use to lead.” This is the line that does the most work in the book. Hamming’s claim is that passive learning compounds slowly, if at all, and that the only way to internalise an idea deeply enough to use it under pressure is to derive it yourself.

He tells the story of John Tukey, who could recall the relevant reference at the right moment when Hamming could not. Tukey did not have a better memory; he had more hooks - more places where new information had been actively connected to existing structure when he first encountered it. Hooks are built by deriving and applying, not by reading.

Build Hooks Deliberately

When you learn something new, ask:

  1. What’s the simplest case where this matters? Strip the example down until you see the structure.
  2. Which fundamental is this an instance of? Almost everything reduces to a small number of categories - entropy, MLE, gradient, sparsity, conservation, symmetry.
  3. Where else have I seen this structure? Past applications and future applications both count.

A new idea without three hooks is barely retained. With three, it is part of your retrieval index for the rest of your career.

The deeper move is to re-derive even what you already know. Hamming repeatedly works through Stirling’s approximation, Pythagoras in \(n\) dimensions, the gamma function - “I have found it very valuable in important situations to review all the basic derivations involved so I have a firm feeling for what is going on.” The slow path is the moat.
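The re-derivation habit pairs naturally with a numerical check. A sketch (my example, not from the book) comparing Stirling's approximation against the exact factorial:

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation: n! ≈ sqrt(2πn) · (n/e)^n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

for n in (5, 10, 20):
    ratio = stirling(n) / math.factorial(n)
    print(f"n={n:2d}  stirling/exact = {ratio:.5f}")  # climbs toward 1
```

The relative error shrinks like \(1/(12n)\), which you can read directly off the printed ratios - the kind of "firm feeling for what is going on" Hamming means.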

Back-of-Envelope as Daily Practice

Hamming uses numerical estimates constantly to test claims. The four functions of a back-of-envelope calculation:

  1. Get a feeling for the problem - numbers expose what verbal arguments hide.
  2. Surface hidden assumptions - you cannot estimate without naming what you are assuming.
  3. Retain the result - memory holds numbers you derived; it discards numbers you read.
  4. Falsify the claim - anything that doesn’t survive a 30-second estimate doesn’t deserve more time.

Apply to every claim, every paper, every meeting. The cost is tiny; the filter is enormous.

The deeper bet behind engagement is that any serious book is supposed to change you - and the change happens whether you agree with the author or not, because even disagreement forces you to reconstruct the argument well enough to refuse it. You get out what you put in: the reader who derives, adapts the result to their own domain and personality, and recombines it with ideas from elsewhere walks away with something the author did not write - and cross-domain recombination is where new theories actually live.

Hooks compound into speed, not just retention: the more places a new idea can attach, the faster the next one attaches after it - fundamentals pay back in learning velocity, not just memory. The hidden trap on the other side is the perpetual learner who reads everything and ships nothing; learning that does not feed back into a contribution is consumption with extra steps.

Part II - How Great Work Gets Done: Six Cognitive Methods

The book’s strongest claim is that style of thinking can be transmitted, and that the technical chapters are the vehicles. The point of working through error-correcting codes or digital filters is not to learn the codes or filters but to watch how first-rate scientists actually moved through hard problems - Shannon proving his coding theorem, Tukey building hooks, Planck stopping short of the limit, Einstein refusing to let go of his unified theory, and the colleagues whose habits Hamming spent thirty years cataloguing. Six recurring moves carry across the cases. Each is a method, not an attitude - practiceable, observable, and testable on the next problem you sit down with. The names below are the operational summary; the chapters supply the worked examples.

How Hamming Teaches Style

Style cannot be taught by rule. It transmits only through paired examples - good against bad, with the iteration from one to the other demonstrated. Every method in the book arrives with its failure mode, walked rather than asserted.

Strip to the Simplest Case

Hamming proves Shannon’s noisy coding theorem on the binary symmetric channel only - and notes immediately: “the very particular case used suffices to show you the true nature of the theorem.” In the digital filter chapters he fits five points at \(t \in \{-2, -1, 0, 1, 2\}\) - symmetric coordinates kill half the algebra - and designs a two-coefficient filter when more general forms exist. The discipline is to pick the smallest non-trivial case that still carries the structural claim, prove it there, and generalize by analogy.

The temptation in any technical field is to start from the most general formulation. Hamming’s instinct is the opposite: collapse everything you can while preserving the part of the problem that actually matters, then see what the structure tells you. Almost everything else is noise that obscures the lesson.
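The symmetric-coordinate trick can be shown in a few lines. A sketch with hypothetical data: fitting \(y = a + bt\) at \(t \in \{-2, -1, 0, 1, 2\}\), where \(\sum t = 0\) makes the least-squares normal equations decouple into two one-line formulas.

```python
# Straight-line least squares at symmetric coordinates.
# Because sum(t) = 0, the normal equations decouple:
#   a = mean(y)                      (intercept)
#   b = sum(t*y) / sum(t*t)          (slope)
ts = [-2, -1, 0, 1, 2]
ys = [3.1, 3.9, 5.2, 5.8, 7.1]   # hypothetical measurements

a = sum(ys) / len(ys)
b = sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)
print(f"y ≈ {a:.2f} + {b:.2f} t")
```

With general coordinates you would solve a coupled 2x2 system; the symmetric choice kills the cross terms, which is exactly the "half the algebra" Hamming is buying.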

Trace a Numerical Example First

Before stating the general theorem about digital filters, Hamming designs a specific filter and runs actual numbers through it: a table with two input frequencies and the resulting output, showing the second frequency vanishes exactly as predicted. “You have just seen a digital filter in action.” The general theory comes after.

A worked example provides a physical check on the theorem before you commit to its abstractions. If the example doesn’t behave the way the theory says it should, the theory is wrong (or you misunderstood it) - and you find out cheaply. The discipline is to refuse abstraction until you have verified at least one concrete instance.
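The experience is reproducible in a dozen lines. A sketch in the same spirit (my construction, not the filter from the book): the two-coefficient filter \(y_n = (x_n + x_{n-1})/2\) passes the zero-frequency component untouched and annihilates the Nyquist-frequency component exactly.

```python
def smooth2(x):
    """Two-coefficient filter: y[n] = (x[n] + x[n-1]) / 2."""
    return [(x[n] + x[n - 1]) / 2 for n in range(1, len(x))]

N = 16
low = [1.0] * N                          # zero-frequency component
high = [(-1.0) ** n for n in range(N)]   # Nyquist-frequency component
mixed = [a + b for a, b in zip(low, high)]

out = smooth2(mixed)
print(out)  # all 1.0: the alternating component has vanished exactly
```

You have just seen a digital filter in action - and if the output were anything other than a clean constant, you would know the theory (or your understanding of it) was wrong before committing to the abstraction.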

Find Multiple Independent Reasons

Hamming gives three reasons for using complex exponentials in digital filtering: time-invariance implies Fourier eigenfunctions; linear systems share the same eigenfunctions; Nyquist’s sampling theorem implies clean reconstruction from uniform samples. “On further digging into the matter I found yet a third reason.”

When several independent arguments converge on the same choice, the choice is robust to any one assumption failing. A choice supported by one reason is fragile; a choice supported by three is structural. The generalization is operational: triangulate justifications, never rest on a single argument when you can find another.

Recognize the Same Idea in Disguise

“Lo and behold, the famous transfer function is exactly the eigenvalues of the corresponding eigenfunctions! Upon asking various electrical engineers what the transfer function was, no one has ever told me that!” Different fields hide identical structure under different vocabularies. \(L_1\) distance on binary vectors is Hamming distance; \(L_2\) is Pythagoras; \(L_\infty\) is Chebyshev - same family, three names. Entropy appears in coding theory, thermodynamics, and statistical mechanics with the same mathematical form and three different interpretations.

The discipline is to get down to basics every time until the disguise drops. The test is whether you recognize the same integral \(\int_1^n \ln x \,dx\) across three classrooms, three notations, and three problem domains: if you can solve it in the calculus class but not the physics class, you have the symbol but not the form. Most “new” things in any field are old things wearing different clothes; the practitioners of one field are often the last to recognize that practitioners of another have been studying the same object for decades.
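The three-names-one-family point is checkable directly. A sketch (my example vectors): the same pair of points under \(L_1\), \(L_2\), and \(L_\infty\), plus Hamming distance on the corresponding bit strings.

```python
import math

def l1(u, v):   return sum(abs(a - b) for a, b in zip(u, v))
def l2(u, v):   return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
def linf(u, v): return max(abs(a - b) for a, b in zip(u, v))

def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(s, t))

u, v = [1, 0, 1, 1, 0], [0, 0, 1, 0, 1]
print(l1(u, v), l2(u, v), linf(u, v))   # 3, sqrt(3), 1
print(hamming("10110", "00101"))        # 3 - identical to L1 on the bit vectors
```

Once you see that the string-distance and the vector-norm are the same computation, "learning Hamming distance" costs nothing if you already own \(L_1\).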

The failure mode is to learn each disguise separately as if it were a new object - accumulating vocabulary instead of structure. Three weeks of mastering “transfer functions” gets you nothing you didn’t already have if you understood eigenvalues; the time was the cost.

Critique the Result Before Celebrating It

Hamming proves Shannon’s noisy coding theorem via random codebooks - and immediately undercuts it: “Let us critique the result. Again and again we said, ‘For sufficiently large \(n\).’ How large is this \(n\)? Very, very large indeed if you want to be both close to channel capacity and reasonably sure you are right!”

The theorem is not useless. But its limits are named in the same paragraph it is celebrated, and the practical engineering response - error-correcting codes, which trade Shannon-optimality for finite block length and computability - falls out naturally because the limits were named. The discipline: every theorem you prove or borrow ships with a “how does this fail in practice” paragraph attached.
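The critique is easy to make concrete. A sketch computing the capacity \(C = 1 - H(p)\) of the binary symmetric channel (the crossover probability \(p = 0.1\) is my choice): the theorem promises rates approaching \(C\), but only as block length \(n\) grows without bound.

```python
import math

def h2(p: float) -> float:
    """Binary entropy function, in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p = 0.1                   # crossover probability of the BSC
capacity = 1 - h2(p)
print(f"C = {capacity:.3f} bits per channel use")
# Rates up to C are achievable - but "achievable" hides the
# "for sufficiently large n" that Hamming's critique targets.
```

The number is honest; the fine print is where the engineering lives.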

Invert - Ask What the Answer Must Be

Hamming’s standard move when stuck is to ask: “if I had a solution, what would it look like?” What conservation laws must apply? What symmetries are forced? Which assumptions are doing real work? In the digital filter chapters he uses time-symmetry to halve the parameter space before solving for coefficients; in the noisy coding theorem he reasons about what an optimal codebook must look like - exponential growth in \(n\), large minimum distance - before exhibiting one.

The structural move is to constrain the answer space before searching it. Most problems shrink dramatically once you write down the properties the answer is forced to have - derivatives that must vanish, dimensions that must match, limits that must be finite. The remaining search is often mechanical.

The operational rule: when stuck for more than a short while, stop searching and write down the constraints the answer must satisfy. If you cannot write any, the problem is not yet well-posed and forward search will not help.

The failure mode is to skip the inversion out of impatience - fishing for a solution while the constraints are still unstated. Forward search feels like progress, which is the trap; the discipline is to refuse to start until you have at least three constraints on paper.

Part III - Doing Important Work

The Bell Labs Lunch Question

Hamming would sit down at the chemistry table at Bell Labs and ask the room two questions:

The Two-Question Test
  1. What are the important problems in your field?
  2. Why aren’t you working on them?

The chemists eventually stopped inviting him. One of them took the summer to think about it and became head of the group. Most scientists spend their lives on unimportant problems and tell themselves those problems are important because they happen to be working on them.

The kicker is asymmetric: a problem is important partly because there is a possible attack on it. Anti-gravity is important and nobody is working on it because there is no clue where to start. Importance plus tractability is the operative pair. Importance alone is wishful thinking; tractability alone is busywork.

Order-of-Magnitude Changes Are Qualitative

A 10× improvement is not a marginal win - it is a new category. “Typically a single order of magnitude change produces fundamentally new effects.” When evaluating which problem is important, the ones whose plausible attack offers an order-of-magnitude improvement deserve disproportionate weight; everything else is a refinement of what already exists.

Drop Wrong Problems

Once-creative people get stuck on the problem that made them. Einstein spent the last decades of his life on a unified theory and produced little. Hamming’s claim is that the ability to drop a wrong problem is rarer than the ability to start a new one, and that the very confidence that produces success is the same confidence that prevents abandonment.

The annual question worth asking yourself: which of my problems should I drop? If the honest answer is “none,” it is almost certainly wrong.

Tolerate Ambiguity

Hamming’s portrait of great scientists includes a feature most career advice avoids: they believe and disbelieve their theories simultaneously. Believe enough to do the work; disbelieve enough to see the openings. Either alone is fatal - pure belief produces zealots who cannot update; pure disbelief produces critics who cannot ship.

This is one of the few traits Hamming names as necessary for great work, not merely helpful.

The Creative Process

Hamming’s account of how creative work actually unfolds, drawn from introspection and from watching colleagues for thirty years, follows a recognizable pattern:

  1. Recognition - the problem comes into focus, often dimly.
  2. Refinement - a longer or shorter period of reformulating, often discarding the conventional framing. “Do not be too hasty at this stage, as you are likely to put the problem in the conventional form and find only the conventional solution.”
  3. Gestation - sustained thinking, including periods of explicit abandonment, that lets the subconscious work.
  4. Insight - a candidate solution arrives, often when you are not actively working on the problem.
  5. Revision - most insights are wrong on first inspection; the next round either rescues the candidate or replaces it.

Two operational tactics carry the cycle:

  • Saturate the subconscious with the problem. Avoid serious thinking about anything else for hours, days, or even weeks. The subconscious works on whatever you have fed it; deprive it of competing material.
  • When stuck, ask: “If I had a solution, what would it look like?” This sharpens the approach and may reveal new angles you had subconsciously ignored - what conservation laws must apply, what symmetries are present, which assumptions are actually necessary.

Insight is not free. The eureka moment is paid for by the saturation and the revisions. Most aspiring researchers do not have eureka moments not because they are unlucky, but because they are not sufficiently saturated.

Sell Your Work

| Mode | What it is | When to use it |
|---|---|---|
| Formal presentation | Talks, conference papers, public lectures | Reaching audiences you can't meet individually |
| Written report | Papers, memos, blog posts, documentation | Compounding your reach over time |
| Informal conversation | Hallway chats, lunch tables, calls | Where most actual influence happens |

Three further moves:

Open door beats closed door over a career, even though closed-door researchers do more work per year. The closed-door work is on slightly wrong problems because the researcher loses the input that would have re-pointed them. “Open door researchers come up with slightly less, but it is on the right problems.”

Don’t make yourself indispensable. The engineer who hoards a tool, codebase, or piece of infrastructure is not promoted, and the work does not spread. Tell ideas freely; they are rarely stolen, and the cost of withholding is paid by the work.

Engineer the close, not just the abstract. The endings of Hamming’s technical chapters are diagnostic of what he thinks the lesson actually is. The information theory chapter ends on Eddington’s fish-net and “definitions create artifacts” - not the coding theorem. The digital filter chapter ends on Pasteur and the Gibbs overshoot story - not the smoothing coefficients. The \(n\)-dimensional space chapter ends on three metric choices - not the sphere paradox itself. The technical result is the vehicle; the closing insight is the cargo. When you write or present, ask what the reader will still remember six months from now once the specifics have decayed, and engineer the close to deliver exactly that.

Self-Management

Compound Interest of Effort

\[ \text{Lifetime output} \approx \text{Daily effort} \times \text{Years} \times \text{Compounding} \] One extra hour per day, sustained over a career, more than doubles output. The math is unsentimental. The marginal hour is best spent on whichever pillar is weakest, not on whichever is most fun.
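The claim can be made numeric under an explicit (and generous) assumption: that the extra effort compounds rather than merely adds. A sketch with one extra hour on an eight-hour day:

```python
extra = 1 / 8          # one extra hour per eight-hour day: 12.5% more effort
years = 40             # assumption: a 40-year career

additive = 1 + extra                  # if effort merely adds: 1.125x lifetime output
compounding = (1 + extra) ** years    # if it compounds annually: ~111x
print(f"additive: {additive:.3f}x, compounding: {compounding:.0f}x")
```

The truth is somewhere between the two models, but even after discounting the compounding figure heavily, "more than doubles" is conservative.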

Hamming’s other self-management moves are operational and specific:

  • Manage yourself by external commitments. Promise a result by a date and become “a cornered rat.” Self-confidence plus deadlines beats willpower.
  • Practice changing yourself in small things first. Self-modification is hard; build the muscle on minor habits before attempting major ones. The small change you can sustain beats the heroic one you cannot.
  • Cultivate courage. Hamming repeated to himself a Shannon line - “I ain’t scared of nothing” - when stuck. Confidence is performative until it isn’t.

Part IV - The Traps

The other half of the book is a catalogue of failure modes. Knowing them by name is a non-trivial fraction of avoiding them.

You Get What You Measure

Eddington’s Fish-Net

A group of biologists studied the fish in a sea by repeatedly casting a net. Examining their catch, they concluded there was a minimum size of fish in the sea. Their conclusion arose from the tool used, not from reality. The net’s mesh size determined the smallest fish they could see.

Hamming’s claim is that measurement does not just record behaviour; it shapes it. IQ is normally distributed because the test was constructed normal - calibrated until it produced a Gaussian. Quarterly earnings reporting produces short-termism. “Lines of code” as a productivity metric produces bloated software. In every case, the second-order effects of a measure swamp its first-order intent.

The lesson is operational: plan the second-order effects of any new measure before you install it. Whatever you measure, behaviour will bend toward it, often in ways that defeat the original purpose.

Unreliable Data: The 90% Rule

Hamming’s 90% Rule

The next independent measurement will fall outside the previous 90% confidence limits more often than 10% of the time - sometimes much more.

Stated as exaggeration to be memorable. Treat advertised data accuracy as fiction until proven otherwise.

Two operational consequences:

  • Pretest data before processing it. Examine for consistency and outliers before running the analysis. Most “surprising results” turn out to be data problems.
  • Test the test equipment first. Life-test the life-tester. Do not assume measurement infrastructure is more reliable than what it measures.

Hamming’s own most-cited example: a Western Electric inventory study where he was burned by trusting the input data, learning afterwards that the people who collected it had every incentive to massage it.

The Expert as Paradigm-Blocker

Drawing on Kuhn’s Structure of Scientific Revolutions, Hamming names the dilemma directly: experts are necessary, and experts are also among the principal blockers of progress. Most innovations come from outside the field - continental drift, carbon dating, the first automatic telephone. The expert’s tacit assumptions, often inherited from training and never re-examined, become the wall that the next paradigm must climb over.

His operational rule: “If an expert says something can be done, they are probably right. If an expert says something is impossible, get a second opinion. All impossibility proofs rest on assumptions, and experts forget to inspect them.”

The personal corollary, which he flags as the harder one: when you become the expert, refuse to block the next generation. Hamming declined to participate in some computer-decision processes after he rose, writing: “Modesty? No, pride.”

Local Optimization Ruins the System

The cathedral parable: three stonecutters were asked what they were doing. The first said, “I am cutting stone.” The second said, “I am supporting my family.” The third said, “I am building a cathedral.” Only the third had the system in view.

```mermaid
flowchart TD
    A[Local optimization<br/>of one component] --> B[Component metric improves]
    B --> C[Side effects on<br/>adjacent components]
    C --> D[System metric degrades]
    D --> E[More local optimization<br/>to compensate]
    E --> A
```

Optimising a component reliably degrades the system. Cramming for individual courses produces a bad total education; optimising per-course curricula produces calculus without induction, complex numbers, or impossibility proofs. The component is not the system. Most reform attacks the most visible symptom and makes the underlying system worse.

The closely related principle is graceful degradation: “The closer you meet specifications, the worse performance is when overloaded.” Designs that fail safely beat designs that nail the spec exactly. The implication for any engineered system - physical, organisational, software - is that some slack at the spec is a feature, not waste.

Survivorship Bias

This is Hamming's own meta-failure, and an honest reader will notice it. He explicitly chooses to study successful scientists because there are more ways to fail than to succeed, and patterns of success transfer where patterns of failure don't.

That is correct for pattern transfer. It is silent on base rates and worse, silent on systemic selection. Hamming’s sample is not just survivors; it is survivors selected by:

  • Wartime US institutional infrastructure - Bell Labs, Los Alamos, Manhattan Project adjacency - that no longer exists at the same scale or with the same patience.
  • Hamming’s own taste in “great” - heavy on mathematicians and physicists, light on biologists and engineers who chose less legible problems.
  • Career structures with stable funding, lifetime employment, and unhurried timelines that the modern researcher does not have.

The reader is left to infer “if I do these things, I will succeed,” which the data does not support. The honest reading is: if I do these things and am also lucky and well-positioned in a system that rewards them, I might. The book does not pretend otherwise. It also does not foreground it.

Part V - Reading Hamming as an AI Scientist

The frame translates to AI research and engineering once two adjustments are made: the doubling time is shorter, and the technical content is younger.

What’s Fundamental When the Stack Rotates Annually

The fundamentals test passes for the same things in AI as in any other technical field: probability, linear algebra, optimisation, information theory, basic statistics of estimators. It fails - interestingly - for things that feel fundamental but aren’t: a specific architecture, a specific training technique, a specific framework. These will rotate. The honest taxonomy for AI specifically:

| Layer | Examples | Verdict |
|---|---|---|
| Fundamental | Probability, linear algebra, optimization, information theory, calculus, basic statistics | Will outlast every architecture |
| Probably fundamental | Statistical learning theory, high-dimensional geometry, dynamical systems for training, basic complexity theory | Lasted decades; field derives from them |
| Contested - too young to know | Mechanistic interpretability primitives (features, circuits, superposition), scaling-law formalism, RL-from-feedback theory | Plausibly load-bearing in 10 years; plausibly footnotes |
| Fashionable, not fundamental | Specific architectures (Transformers), specific training recipes (RLHF, DPO), specific frameworks (PyTorch), specific eval suites | Will rotate inside a decade |

The contested row is the one that actually matters for an AI scientist’s investment decisions, and it is the one Hamming’s framework cannot answer for you - only time can.

The rule worth internalising: for every paper you take seriously, derive one component from scratch - where a component is one self-contained mathematical block you could write on a blank notebook in under an hour. Multi-head attention. An optimizer step. A KV cache. A KL term in a loss. Concrete artifacts of this practice in my own work include tiny_pytorch, GPT-2 from scratch, an annotated LSTM implementation, a transformer explainer, neural net building blocks, and a gradient descent walkthrough. They are not for teaching. They are the engagement pillar, written down so I can re-enter the derivation when I need to.
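To make the under-an-hour rule concrete, here is the sort of component it means - one SGD-with-momentum optimizer step written directly from the update equations \(v \leftarrow \mu v + g\), \(p \leftarrow p - \eta v\) (a minimal sketch; production optimizers add dampening, weight decay, and tensor types):

```python
def sgd_momentum_step(params, grads, velocity, lr=0.1, mu=0.9):
    """One SGD-with-momentum update: v <- mu*v + g, then p <- p - lr*v."""
    new_v = [mu * v + g for v, g in zip(velocity, grads)]
    new_p = [p - lr * v for p, v in zip(params, new_v)]
    return new_p, new_v

# Sanity check on a problem with a known answer: minimize f(x) = x^2,
# whose gradient is 2x, starting from x = 5.
x, vel = [5.0], [0.0]
for _ in range(100):
    x, vel = sgd_momentum_step(x, [2 * x[0]], vel)
print(x[0])  # close to the minimum at 0
```

Deriving the block yourself - rather than calling `torch.optim.SGD` - is what builds the hook: you now know exactly where momentum enters and what the learning rate multiplies.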

The Creative Cycle Is Different Now

Hamming’s account of creative work - recognition, refinement, gestation, insight, revision - is the cycle of theorem-proving and experimental physics in the 1980s. The bottleneck was the subconscious: feed it the problem, wait for the connection. Modern AI research has a different bottleneck. The cycle is closer to hypothesis → small-scale experiment → ablate → scale → eval → repeat, and the rate-limiting step is GPU time, dataset construction, and eval design - not subconscious gestation.

The “saturate the subconscious” advice still applies; the cycle around it does not. Hamming’s portrait of the lone scientist staring at a problem for weeks fits a paper-and-pencil discipline. It fits less well when the next iteration costs hundreds of thousands of dollars of compute and several days of distributed training. The discipline that compounds is no longer just gestation; it is eval design - knowing what question your next run is allowed to answer, before it starts.

Where Hamming’s Traps Hit Hardest

Four traps from the book land with particular force in modern ML:

  • You get what you measure, applied to LLM benchmarks. MMLU got optimised. HumanEval got optimised. The ability of a benchmark score to predict deployment quality remains poor. Eddington’s net, exactly: the conclusion arises from the tool, not from reality.
  • Hamming’s 90% rule, applied to reported model improvements. Most published wins are single-run, against a fixed eval, on a fixed seed. Run the seed sweep. Check for eval contamination. Assume the next independent measurement will land outside the confidence interval - because more often than not it does.
  • The expert as paradigm-blocker, applied to scaling. The canonical case is the 2017–2019 mainstream consensus that “scale alone won’t get you to general capabilities.” That was experts forgetting to inspect their own assumptions, exactly as Hamming warned. The personal version of the rule is harder: when you are the expert claiming the next regime won’t work, the prior should shift to “get a second opinion” - including from people whose intuitions you usually discount.
  • Local optimization ruins the system, applied to overfit-to-eval models. Hamming’s graceful degradation line - “the closer you meet specifications, the worse performance is when overloaded” - is one of the strongest robustness claims in the book. A model that hits 95% on the eval and 60% under distribution shift is the cathedral with one perfect stone. The implication for AI safety and robustness is direct: leave slack at the spec; reward systems that fail safely over systems that nail the headline number.

Where the Frame Breaks

Hamming’s lens is on the individual scientist. It is largely silent on the four constraints that define modern frontier AI:

  • Coordination at scale. Long training runs need teams, not solo gestation. The “open door” rule still applies; the “informal lunch conversation” rule needs translation to async work in a distributed team - Slack, design docs, and code review are the new lunch table, with different latencies and different failure modes.
  • Capital and compute as research levers. Vision in 2026 is not just what is the future of computing? but what training runs are worth the GPU budget? and what experiments are cheap enough to run before committing capital? Hamming’s framework has no slot for this question.
  • Infrastructure as fundamentals. A scientist who cannot write a dataloader, eval harness, or distributed training script in 2026 is not just impractical - they have a missing fundamental that Hamming’s list does not anticipate. The fundamentals test passes for systems knowledge here, not just mathematical knowledge.
  • Safety and governance as career constraints. Hamming treats legal liability as friction. In modern AI it is closer to a first-class research direction. A serious AI scientist’s vision now includes which capabilities are worth shipping at all - a question Hamming’s lens does not pose.

Use Hamming where the unit is you. Use a different lens - closer to The Mythical Man-Month, closer to operations research, closer to safety engineering - where the unit is the team, the run, or the deployment.

Conclusion

The book is, finally, an argument that two things compound: knowledge in the form of fundamentals deeply understood, and reputation in the form of important problems publicly attempted. Both reward the long game. Both require the discipline to refuse the short one.

Underneath this is a values claim the book never quite states but everywhere assumes: the choice between coasting through one career comfortably and spending it on important problems is not a productivity question - it is a question of what the time was for, and the unexamined life is not worth living. The reward turns out to be in the work itself, not the outcome - joy lives in the striving and the process, not the achievement. The harder path is harder; it is also the only one that leaves a legacy.

Hamming closes - and so do I - with the line he attributes to Pasteur: “Luck favours the prepared mind.” The unfair part of careers is that luck is real. The fair part is that the preparation is not.
