Desirable Difficulty

Arnab Mondal

I was recently watching a talk by Jeremy Howard, and one phrase kept sticking in my head: desirable difficulty.

That phrase comes from learning science. The idea is simple: some kinds of difficulty make learning stronger, not weaker. If recall is effortful, if understanding takes some wrestling, if the problem pushes back, the thing you learn tends to stay with you longer.

The more I think about AI coding tools, the more I think this is the real issue.

Not because AI is useless. It clearly is not. I use it, and in the right context it is absurdly helpful. But there is a real difference between removing pointless friction and removing the friction that actually builds skill. That second kind is dangerous.

That is also why the recent Anthropic skill-formation study caught my attention. I do not need to re-explain the whole paper here. The short version is enough: when people leaned on AI too much while learning something new, they tended to learn less. The people who used it best used it for explanation and feedback, not as a total substitute for thinking.

The Work Has To Push Back

People talk about productivity as if friction is obviously bad. That is only half true.

If you have ever debugged a distributed systems issue, trained a model, or tried to understand a complex research paper properly, you know growth does not come from smoothness. It comes from contact with reality. Your mental model predicts one thing, the system does another, and now you have to revise your understanding.

In programming that pushback shows up when a test fails for a reason you did not expect, when an API behaves differently from the docs, when a bug only appears under load, when a model looks good offline and collapses in the real world, or when a design that felt elegant starts breaking the moment you try to extend it.

Those moments are frustrating, but they are also where a lot of real learning comes from.

If a tool gives you answers too quickly, too cleanly, and too early, you can skip the exact moments that would have forced you to understand the system. That is the paradox of AI coding right now: the workflow can feel smoother while your actual capability grows more slowly.

And this pressure does not only come from the individual level. In a lot of companies now, the push is organizational: ship faster with AI, increase output, compress timelines, and do not be the person who looks slower than the rest of the team. Once speed becomes the only visible metric, it becomes very easy to optimize for short-term throughput and slowly lose the harder kinds of learning that good engineering depends on.

I feel grateful that my role in my current company still gives me room for a different kind of work. I am not only building features and APIs on a fixed loop every day. I also get to research new ideas, push myself in different directions, learn things I would not have touched otherwise, and think about how an existing system can be bent into something better suited to the problem we are actually trying to solve. That space matters to me, because it keeps the learning loop alive instead of replacing it.

The Gap Is Judgment

What Jeremy Howard is really pointing at, I think, is not some basic "coding versus software engineering" slogan. Everybody serious already knows typing code is only one layer of the problem. The real question is where the bottleneck actually is.

LLMs are often strong at producing code-shaped output. They can translate between libraries, scaffold boilerplate, and fill in familiar patterns very quickly. But that is not the part that usually decides whether a system will hold up.

The harder part is judgment: choosing the right abstraction, seeing where the boundaries should be, noticing the edge case before it becomes an outage, and understanding when two problems only look similar on the surface. That is why so many "AI built the whole thing" demos feel impressive right until you imagine maintaining them six months later.

Fred Brooks made a related point decades ago in No Silver Bullet: the biggest gains usually come from reducing accidental complexity, but the deepest difficulties are essential. AI can absolutely help with the accidental part. What it does not automatically give you is the mental model needed for design, judgment, and long-term reasoning.

Why This Matters Even More In AI And ML

This matters even more when the work is genuinely hard.

If you are solving a well-known problem with well-known patterns, AI can be a huge advantage. But AI/ML work and serious software engineering are often not like that.

A lot of the real work in ML is not "write some PyTorch." It is figuring out why the data distribution shifted, why training is unstable, where leakage entered the pipeline, why a retrieval system looks good offline and bad online, or which hidden assumption inside the evaluation setup is rewarding the wrong behavior.
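To make one of these concrete: a common way leakage enters a pipeline is exact-duplicate contamination between train and test splits. A minimal sketch of that kind of check, using only the standard library (the data and names here are illustrative, not from any real pipeline):

```python
import hashlib

def row_fingerprints(rows):
    """Hash each row so exact duplicates can be compared across splits."""
    return {hashlib.sha256(repr(tuple(r)).encode()).hexdigest() for r in rows}

# Hypothetical splits: one test row accidentally also appears in train.
train = [(0.1, 0.2, 1), (0.3, 0.4, 0), (0.5, 0.6, 1)]
test = [(0.5, 0.6, 1), (0.7, 0.8, 0)]

overlap = row_fingerprints(train) & row_fingerprints(test)
print(f"{len(overlap)} duplicated row(s) across train/test")
```

A check like this only catches the crudest form of leakage; the harder cases, like a feature computed with future information, still require the slow investigative work the paragraph above describes.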

Likewise, a lot of the real work in software engineering is not "produce code for feature X." It is deciding what should be reusable and what should not, designing around failure modes you have not seen yet, understanding temporal behavior rather than just static code, and making systems observable enough that future debugging is possible.

These are not just coding tasks. They are investigations. You inspect logs, run experiments, falsify assumptions, isolate constraints, and update your model of the system. That process is slow, but the slowness is often where the expertise comes from.

The Risk Is Understanding Debt

The obvious risk of overusing AI is bad code. Hallucinated APIs, fragile abstractions, wrong edge-case handling. Everybody talks about that.

I think the deeper risk is understanding debt.

You can ship something that works and still be in trouble if nobody on the team really understands why it works, where it is brittle, or how it should evolve. You can pass the tests today and still have accumulated future confusion.

That debt compounds, especially for people still forming their skills. You can look productive in the short term while quietly not developing the debugging, abstraction, and systems reasoning you will need later.

Some of what I do about that is mechanical, on purpose. In AGENTS.md or CLAUDE.md I keep explicit rules so that the model does not jump straight to code: it has to teach and to force me to consider other directions first. For review, I rely on specialized agent skills that spell out a different bar than "check for bugs." I want architecture tradeoffs named, alternatives we did not pick, and perspectives on scale and failure that a narrow correctness review would never surface. Reading those reviews seriously, not skimming them for green lights, has helped me grow as an engineer more than any quick pass for obvious mistakes.
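My own rules files vary, but the shape is something like this hypothetical sketch of an AGENTS.md (the wording is illustrative, not my actual file):

```markdown
# Working rules (illustrative sketch)

- Before writing any code, explain the problem back to me and list at
  least two alternative approaches with their tradeoffs.
- In reviews, name the architecture tradeoffs: what we optimized for,
  what we gave up, and where it breaks at scale.
- Flag failure modes and edge cases that a narrow correctness pass
  would miss.
- Do not produce a final implementation until I have confirmed a
  direction.
```

The point is not the exact wording. It is that the friction is reintroduced deliberately, at the step where it teaches the most.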

That is why desirable difficulty is more than a learning-science phrase here. It is an engineering principle. Some struggle is not inefficiency. Some struggle is the mechanism by which capability gets formed.

So What Does Good AI Use Look Like?

The answer is not "never use AI." The better question is: how do you use AI without deleting the learning loop?

For me, the good pattern is to use AI to compress lookup, not to compress thinking. I want it to explain tradeoffs, compare approaches, surface assumptions, or generate small pieces I can fully inspect. I do not want it to replace debugging, tracing, testing, or architectural judgment. And when I am learning a new library or concept, I would rather use it for explanation than for total delegation.

I am not the person to hand down universal rules on this; I am still experimenting like everyone else. But one place where I have leaned on AI hard, and deliberately, is my own study of deep learning and the scientific side of the field. I use it to understand research papers in depth, not as a shortcut to a lossy summary, but to work through claims until I can restate what is actually going on. When something blocks me, usually dense math or notation I never internalized, I try to clear that first, often with the model walking me through prerequisites, before I let myself "understand" the paper at only the headline level.

In other words, use AI as a collaborator in the loop, not as a substitute for having one.

That is also why strong engineers do not just ask the model to "build the thing." They keep interrogating it: why this design, what invariant are you relying on, what breaks under load, what is the failure mode here, can you explain this in terms of the underlying model and not just the syntax?

That is a very different workflow from vibe coding your way into a repo you can no longer reason about.

I suspect the best teams in this era will not be the ones that remove every ounce of friction. They will be the ones that learn to distinguish between wasteful friction and skill-building friction.

That is why I keep coming back to desirable difficulty. Not because programming should be painful, but because in hard technical work, the right kind of difficulty is often the thing doing the teaching. If AI smooths over every rough edge, it might also smooth over the places where engineers are made.
