You Still Need to Think

As coding agents become more capable and long-running, they don’t remove the human job of thinking. Someone still has to direct the work—set goals, choose constraints, and judge outputs.

Everyone seems to recognize this in the abstract. But nobody seems to talk much about how the product shape fundamentally shifts the type of thinking required to write code.

I'd argue that seemingly minor UX differences between coding agents end up having a massive impact on how users spend their "thinking budget."

With a remote-first product like Codex Cloud (not the CLI), the product encourages you to think less while the agent is working and more about the end result it produces. We intentionally didn't want users watching the agent invoke terminal commands, because the model works very differently than a human would.

With an interactive product like Claude Code or Codex CLI, you naturally spend more time thinking about specs and the high-level approach the agent is taking as you follow its chain of thought in the terminal. But you need some other means of looking at diffs (editor / GitHub), so reviewing them tends to be less a part of the workflow than following the plan the tool has created and the commands it runs to verify its work as it implements.

With an IDE-focused product like Cursor, you accept most diffs as they come in. Your thinking window is relatively short because you are deciding whether to accept each piece of code. You need to do relatively little thinking to supply the right context, because it's already in your editor. The trade-off is that you have probably broken the problem down ahead of time: you need to spend more time coming up with the plan and approach yourself.

All of these products need to shift the "active thinking" cycles around between...

  1. supplying the right context
  2. coming up with a plan
  3. implementing the code
  4. verifying and reviewing it

If I were to guess, LLMs today are strongest at 3 (implementation), followed by 4 (verification), then 2 (planning), then 1 (context). Until we deploy tools for searching across an organization, eliciting information from the user, and understanding its global context, supplying the right context still seems to be the area where humans can provide the most value.

Conversely, LLMs excel at taking well-specified plans and implementing them. They handle race conditions, error handling, and complex technical details remarkably well.

If you accept this idea of a thinking budget, it's easy to understand why engineers might have very different experiences (positive and negative) using the different tools.

Some problems require only a clear spec—you know the implementation completely. Others require writing code to "think through" the problem, then refactoring iteratively. Some engineers prefer reviewing a first pass to writing from scratch.

Whatever your preference, different UX requires you to think in different ways. I think it's unlikely that a 'single workflow' will satisfy all users. You still have to think... but the best products will let users choose how they want to do that.¹

Footnotes

  1. The new Codex IDE extension actually does a very good job of this. You can decide to kick off tasks locally, or hand them off to the cloud. Disclosure: I was working on this team, so I am biased.