How much of a manager are you?
Every few weeks I see someone comment on Hacker News: "sure, LLMs are great, but they really couldn't do my job". And I've been trying to square that opinion with the impressive output I see everywhere else. What accounts for the difference in perspective between the two camps?
This debate has become top-of-mind for me again as I've been trying out a bunch of new AI coding assistants. I've been doing a lot of what Andrej Karpathy calls 'vibe coding' on various side projects:
There's a new kind of coding I call "vibe coding", where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It's possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like "decrease the padding on the sidebar by half" because I'm too lazy to find it. I "Accept All" always, I don't read the diffs anymore. [...] I'm building a project or webapp, but it's not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.
Coding this way feels a bit like cheating. It's so easy that I just start ignoring the code altogether and focusing on the outputs. And it works well. Really well. I can generate full-blown webapps with relatively complex functionality.
Clearly LLMs are doing something right here. So why do we feel reticent about turning over the keys entirely to an LLM in a production codebase?
Fundamentally, I think a lot of success or failure comes down to exactly how you as an engineer choose to augment yourself with LLMs. And in particular, how much of the task requires being a manager vs a software engineer.
The 'what' vs the 'how'
When I first started managing the engineering team at Segment, an engineering mentor gave me the following advice...
as a manager, it's your job to think about the what, not the how.¹
The manager should be responsible for clearly describing the problem, setting the team goals, defining guardrails, and being specific about what 'good' looks like. And then the engineering team should be responsible for coming up with the actual tech for how to achieve those goals.
There are a few reasons for this...
- it's difficult for someone in an engineering leadership position to come up with great technical solutions if they aren't deep in the codebase.
- from a systems perspective, the individual engineers should have ownership over the codebase. if an engineer is getting paged by a system at 3am, they should be able to change that system.
But more generally, it's just an instance of the principal-agent relationship. The principals define the rules of the game and the goals, and then the agents are free to solve them however they see fit.
The tools are getting good enough that while using them, I've been feeling less like a software engineer and more like a manager. I'm specifying the requirements, and the model fills in the details.
I think this is the key to using LLMs well. You have to have some sense for where they shine and where they fall short. And you have to be okay with the fact that you are managing the LLM, not writing code.
The art of managing well
As I've gone down the rabbit-hole in my own side projects, there are a few things that stand out as making vibe coding successful. Tactically, I've been using Cursor with Sonnet 3.5 to build various sites, but I think they'd work in almost any context.
Be clear about what you want
The first, and maybe most obvious thing is that you have to be clear about the behavior you want to achieve. Instead of saying "build me a chat app", it's worth spending some extra time on the prompt to describe the pages and UI elements you want.
I've started writing down a spec doc ahead of doing any coding. I will typically scaffold a few different things just to give the LLM a place to get started:
- what framework I'm using (e.g. running `npx create-next-app` to give it some boilerplate)
- specs for the data structures I want (e.g. "create the following Typescript types with the following fields", as in the sketch below)
- what I want the API to look like (or pages for a frontend app)
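As a concrete sketch, the data-structure spec for a hypothetical chat app might be as small as this (the type and field names here are illustrative, not from any real project):

```typescript
// Hypothetical spec handed to the model: "create the following
// Typescript types with the following fields."
type User = {
  id: string;
  name: string;
  avatarUrl?: string;
};

type Message = {
  id: string;
  channelId: string;
  authorId: string; // references User.id
  body: string;
  createdAt: Date;
};
```

A dozen lines like this is usually enough to anchor everything the model generates afterwards: the API routes, the database methods, and the frontend pages all end up consistent with the same shapes.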
In areas where I don't have great ideas (e.g. design), I'll give the agent a little bit more leash. I'll give it more general instructions like "add a logout button to the layout" and see what it comes up with. Often it finds a better path than I would have on my own.
Tighten your feedback loops to instantaneous
For all of my projects, having the ability to instantly reload the page is key. It means that I spend less time looking at the code the models are generating, and more time actually just looking at the output.
I'm typically running `npm run dev` in a terminal and then just reloading the page to test functionality. But asking the models to write and re-run unit tests also works.
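A sketch of what that looks like, assuming a vitest setup and a hypothetical `formatTimestamp` helper in the project:

```typescript
import { describe, it, expect } from "vitest";
import { formatTimestamp } from "./utils"; // hypothetical helper from this project

describe("formatTimestamp", () => {
  it("produces a short, human-readable string", () => {
    const out = formatTimestamp(new Date("2025-01-01T12:00:00Z"));
    expect(out).toContain("Jan");
  });
});
```

The test itself isn't the point; it's that `npx vitest run` gives the model a cheap, automatic pass/fail signal it can iterate against without me in the loop.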
While using Cursor, any applied changes automatically go through a lint check, and the model is typically able to resolve any errors on its own. Having some automated step (like a CI/CD pipeline) that is cheap to run and check will get you a lot more mileage.
Start small, and iterate
Sonnet is remarkably good at following multi-part instructions, but I've still had the most success by starting small and iterating.
For example, if I'm building a webapp, I'll first ask to create the database types and methods. Then I'll ask it to add the API routes. Then I'll ask to add the frontend pages for it. Then I'll critique the styling.
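To make that ordering concrete, here's roughly the shape of what the second step produces in a Next.js app, assuming the `Message` type from the earlier spec and a hypothetical `db` module from the "database methods" step:

```typescript
// app/api/messages/route.ts (hypothetical path)
import { NextResponse } from "next/server";
import { db } from "@/lib/db"; // hypothetical module built in step one

// Step two: thin API routes layered on top of the step-one database methods.
export async function GET() {
  const messages = await db.listMessages();
  return NextResponse.json(messages);
}

export async function POST(request: Request) {
  const { channelId, authorId, body } = await request.json();
  const message = await db.createMessage({ channelId, authorId, body });
  return NextResponse.json(message, { status: 201 });
}
```

Each step is small enough that I can sanity-check the output before asking for the next layer.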
Most of today's models don't nail a "one-shot" implementation. But if they have a starting point, they are much more likely to get it right.
You could probably do this in a different order, but I find working 'bottom-up' tends to give fairly accurate results and catch errors as they happen.
Give the model a toe-hold
With Cursor + Sonnet in particular, I find that having a good 'toe-hold' makes all the difference. If there are some norms for things like "use shadcn components" or "use tailwind classes", the model is more likely to mirror these when generating new outputs.
Rather than starting with a totally blank canvas and asking the model to do everything, starting with a few opinions on basic styles will help a lot. Sonnet is quite good at matching what is already there.
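As a sketch of what a toe-hold looks like in practice, a single existing component like this (using the conventional shadcn/ui import path and tailwind classes; the component name is illustrative) is usually enough for the model to mirror:

```typescript
// components/confirm-button.tsx: an existing component that sets the norms.
import { Button } from "@/components/ui/button";

export function ConfirmButton({ onConfirm }: { onConfirm: () => void }) {
  return (
    <Button onClick={onConfirm} className="w-full rounded-lg text-sm font-medium">
      Confirm
    </Button>
  );
}
```

Once one component like this exists, new components tend to come back with the same imports, the same styling approach, and the same file conventions.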
Make failure cheap
I suspect that the reason LLMs work so well for vibe coding is that making mistakes is incredibly cheap. You aren't writing critical infrastructure or production code, you are just trying out ideas.
This creates a bit of a paradox though... most of the work that software engineers do today isn't trying out ideas. It's writing production-grade code: code that has to work, be debugged, and be maintained for years to come.
To use LLMs effectively (today anyway), engineers need to understand the difference between:
- cheap use cases: scaffolding new boilerplate, writing unit tests, fixing small regressions, adding documentation, scripting one-off tasks
- expensive use cases: big refactoring, mission-critical APIs, core data models, database migrations
Note that this is a moving target! And more and more "expensive" tasks are becoming cheaper.
Dictating your requirements
Like Karpathy, I've found that dictating what I want really forces me to think more about the requirements. I was really skeptical about people building software by just recording their voices (isn't voice really lossy?).
But recently I became a new parent, and found a high premium on using my hands. I'm relying on voice a lot more often, for everything from "implement this database call" to "make these two buttons the same size". It works shockingly well, and I find myself relying more naturally on describing the what vs the how.
What's missing?
I painted a bit of a rosy view of how to use LLMs well, but there are a lot of things they won't do...
Image transfer – I haven't had luck uploading an image and then asking the model to attempt to replicate it in code. If this is what you are looking for, you'll have a much better time exporting some base CSS from Figma or describing what you want in your own words. Since there's an intermediate text step for most models, image -> code tends to be lossy.
Refactor on their own – typically LLMs will just spit out more and more code without considering how to consolidate functionality as part of one shared component. You have to explicitly nudge the model to simplify code and pull out pieces of shared functionality.
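When I do nudge it, the ask is usually as mundane as "pull the duplicated fetch logic into one shared helper", which lands as something like this (hypothetical names):

```typescript
// lib/api.ts: shared helper extracted after a "please consolidate" nudge.
// Before this, each page had its own copy-pasted fetch + error handling.
export async function fetchJson<T>(url: string): Promise<T> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Request to ${url} failed with status ${res.status}`);
  }
  return res.json() as Promise<T>;
}
```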
Sweating the details – My experience with LLMs is that they can implement many small steps in isolation (consider the security implications, improve memory performance, update the API so it's more idiomatic). But coordinating all of these at the same time doesn't tend to work so well; it's hard to get a model to focus on every area at once. If there are particular features which require a lot of performance-critical work, I think we need to wait a bit for models to improve.
System design – Using a model is no substitute for good architectural decisions. You probably have a better sense of how the system should evolve than the model does. I wouldn't hand off these tasks to a model quite yet.
Implementing complex pieces of functionality – in my experience, most LLMs still lack the ability to think carefully about things like race conditions, advanced edge cases, and really high-performance code.
Understanding the system as a whole – context windows are increasing, but they still aren't large enough to include a whole codebase. Cursor + Sonnet do extremely well at searching codebases for the relevant context, but retrieval isn't the same as holding the full system in mind.
Admittedly, this is a long list, and there are a lot of things here that seem like dealbreakers. But I think it's worth recognizing that a lot of these limitations will go away over time. Reasoning is improving, instruction-following is improving, and context windows are increasing.
The engineering divide
Back to the original question... why do some engineers love LLMs and others don't?
I think this is mostly accounted for by whether you care about the what or the how. For every engineer, there's some level of comfort with 'handing over the keys' to an agent... and that level depends a lot on what you are trying to get done.
If you don't know any frontend code, you are much more likely to lean on the LLM to produce something that 'looks good', even if it isn't optimized for a later refactor. If you don't know how to fix some random devops issue on a server, you are probably happy to run all the machine-generated bash you can get.
On the flip side, if you spend most of your time doing performance engineering and know everything there is to know about how Go does memory allocations, you'll probably be unsatisfied with an LLM that generates working (but maybe inefficient) code.
If I were just starting out in software and giving advice to my younger self, I'd say there are a few skills that are really worth developing...
- being able to clearly articulate what you want to get done. thinking more like a PM than a software engineer. understanding at any given time "what's the most important thing to build".
- being able to architect systems in a way that they are flexible later on. it's easy to lock yourself into a poorer data model or a bad API with an LLM in the loop. if you think clearly about the system, you can use the LLM as a colleague. the outputs you get will be better as the system evolves, because you did some pre-work.
- understanding how systems work. you get a lot more value from LLMs if you understand at least some of the underlying components and have intuition for how they fit together. in my experience, this subtly changes your ability to prompt and gets you much better outputs. I don't think this goes away even as reasoning + context windows improve.
- building intuition for what's expensive vs cheap. we haven't begun to explore the capability overhang that exists with today's models. understanding their limits is one of the best things you can do as a software engineer.
As one parting thought... my hypothesis is that a large philosophical divide on using LLMs comes from the enjoyment engineers derive from writing code. Many of us (myself included) originally started coding because it seemed like a fun way to solve puzzles.
As 'solving puzzles' becomes the domain of machines, that can't be the main reason we still write code. Solve problems instead.
Footnotes
1. This was a time before 'founder mode', though I think much of this advice still holds. Obviously this philosophy is no substitute for deep inspection. As a leader, you must still be thinking critically about how the team is doing and diving deep to understand and critique the 'how'. ↩