An Elegant Puzzle

JAN 18, 2022

An Elegant Puzzle is one of those classic engineering leadership books that I wish I'd read far earlier than I did. It's written by Will Larson, the CTO of Calm.

It's perhaps one of the best primers on growing and scaling yourself as an eng leader (with a focus on people management). In my experience, all leaders at scale need to adopt the systems-level thinking that Will so thoughtfully articulates.

If you are a founder and can read one book this year, An Elegant Puzzle would be a good place to start.

Organization Design

Makeup

Will argues that effective engineering teams should be "abstracted" away from their individual members. I hadn't heard this phrasing before, but it definitely resonates. A high-functioning team needs enough people that a single one leaving, taking vacation, or having a bad week won't affect the team's overall output.

His general rules...

  1. teams should be six-to-eight during steady state
  2. to create a new team, grow an existing team to eight to ten
  3. never create empty teams
  4. never leave managers supporting more than eight individuals

For managers of managers, the director should be able to support 4-6 different managers.

On-call rotations should be 8 people. If you ever have fewer than 8 people on an on-call rotation, it's worth trying to split it between two teams.
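The sizing rules above are numeric enough to write down as a quick check. This is only a sketch: the function name `sizing_warnings` and its parameters are hypothetical, the thresholds are the ones listed in these notes, and the director rule (4-6 managers) is left out to keep it short.

```python
def sizing_warnings(team_size: int, manager_reports: int, on_call_size: int) -> list[str]:
    """Return a warning for each sizing rule of thumb the inputs break."""
    warnings = []
    if team_size == 0:
        warnings.append("empty team: never create empty teams")
    elif team_size < 6:
        warnings.append("team is below the steady-state size of six to eight")
    elif team_size > 10:
        warnings.append("team is above ten: it should already have split")
    if manager_reports > 8:
        warnings.append("manager supports more than eight individuals")
    if on_call_size < 8:
        warnings.append("on-call rotation has fewer than 8 people: consider sharing it across two teams")
    return warnings


# A five-person team that runs its own rotation trips two of the rules.
for warning in sizing_warnings(team_size=5, manager_reports=5, on_call_size=5):
    print(warning)
```

Sizes of nine or ten pass on purpose, since the rules treat eight to ten as the point where a team is about to split.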

This goes a little bit against what we saw at Segment. I'd say optimal teams were something like ~5 engineers, but admittedly that tended to leave room for leaky abstractions (individual engineers mattered a lot more) and a spiky maintenance load.

Operating modes

Will has a good framing for the modes that teams tend to exist in, and proposes methods for getting them to stage four.

  1. Falling behind (add people)
  2. Treading water (reduce scope)
  3. Repaying debt (add time)
  4. Innovating (add slack)

If a team is falling behind, you need to add people to it. That helps it get out of a feeling of never making progress.

If a team is treading water, then reduce the amount of in-progress work, or try and shift the work to another team.

If a team is repaying debt, that is generally a good place to be. The cycle of repaying debt compounds.

If a team is innovating, then you are at the happy part of the spectrum! In this case, ensure you have enough slack to keep innovating, and keep prioritization front and center in your work.

Will has the same perspective we tried to achieve at Segment: you shouldn't have some teams focused solely on innovation and others focused solely on maintenance. Every team should "eat its vegetables".

He explicitly advises hiring new people vs moving people from within the org. He argues that people aren't really fungible, and moving existing hires tends to become political.

In all of these operating modes, the shifts are slow, so you shouldn't expect one mode to switch to another overnight. Instead, it's like a board game: be confident in your bets and your strategy.

Long-lived and shifting scope

Will makes a case that teams should also be long-lived and stable. Several managers at Segment (particularly Gerhard Esterhuizen) really leaned into this idea too.

It's harder to get teams to re-gel than it is to shift scope or workloads between different teams. Instead of shifting people, shift responsibilities.

If many teams are behind, it's not that useful to do a "peanut butter" approach and spread focus across all of them. Instead, try and get one team to a great spot, and then focus on the next team. This will help you get earlier return on investment.

He also advises that hiring happen on one team at a time to limit the team's re-gelling process. This is pretty different from the techniques we employed at Segment, and I'm curious to know how it shifts the culture. But I can see some merit in it in terms of onboarding.

Building in slack

Will has an interesting idea that most teams should always strive for a certain amount of slack. He argues that teams generally know what to focus on, and having some amount of time which is unspecified allows them to steadily improve tools and processes they own.

It reminds me of the SlateStarCodex post on the same concept.

Hiring and onboarding

During periods of rapid hiring, it's worth taking a "systems approach" to how your eng team is performing. Will uses the model that each new engineer is 30% effective until six months, and each existing engineer must spend 10h per week training a new engineer.

The result is sort of astonishing. If each existing engineer is training 2 others, you end up getting about 1.16 output from 3 people! No wonder it can feel like hiring slows things down, and creates some scary situations with your burn.
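The arithmetic behind that number is easy to reproduce. The sketch below assumes a 40-hour work week, which is not stated above, and uses a hypothetical helper named `team_output`. With new engineers counted at exactly 30%, three people produce 1.1 engineers' worth of output. The 1.16 figure matches counting each new engineer at roughly one third instead, so treat the exact percentage as approximate.

```python
HOURS_PER_WEEK = 40  # assumption: a standard 40-hour week


def team_output(
    trained: int,
    untrained: int,
    new_hire_effectiveness: float,
    training_hours_per_hire: float = 10.0,
) -> float:
    """Team output, measured in fully productive engineers."""
    # Hours the trained engineers lose to training each week.
    training_hours = untrained * training_hours_per_hire
    trained_hours = max(trained * HOURS_PER_WEEK - training_hours, 0.0)
    return trained_hours / HOURS_PER_WEEK + untrained * new_hire_effectiveness


# One trained engineer training two new hires.
print(round(team_output(1, 2, 0.30), 2))   # 1.1
print(round(team_output(1, 2, 1 / 3), 2))  # 1.17
```

Either way, the trained engineer drops to half output and the new hires do not make up the difference until they ramp up.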

Managing organizational debt

Will calls out that at any given time, it's best to have 2-3 things you're making progress on. If a team isn't performing, fix the team, then fix the next team, then fix the org.

What should you do with everything else? Write it down, identify it, and explicitly treat it as a "non-issue" that you won't pay any attention to. I like this mode of thinking because it explicitly calls out areas you won't be working on, and allows you to ignore them for the time being.

Minimizing distractions

There's a section on trying to minimize distractions. The biggest systems-level advantages you can take are...

  1. trying to make documentation a norm (both in writing and search)
  2. making it someone's explicit duty to answer questions

I'd say we only ever did a so-so job of this at Segment. Will admits that there's never a silver bullet here, but it does make me wonder what "best-in-class" documenting companies look like. My guess is that GitLab, Sourcegraph, and other fully remote companies come close.

Tools

Systems thinking

Model everything as "stocks and flows". If there's some bit of complexity around what you are doing, try and model it as a cloud. The canonical book here is Thinking in Systems, but it's easy to get the broad strokes.

He models a good diagram for "developer productivity" which basically combines all of the progress around making changes and getting them reviewed with reverts that need to happen due to some sort of production issue.

It's easy to see in this model where something breaks down! If you don't have enough new pull requests, it doesn't matter how quickly they are reviewed. If you have too many production issues, you won't be getting ahead.
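A toy version of the stocks-and-flows idea can be written in a few lines. This is a rough sketch and not the diagram from the book: the function `simulate`, its parameters, and all the numbers are made up for illustration, and reverted changes are simply dropped rather than fed back in as rework.

```python
def simulate(
    weeks: int,
    new_prs_per_week: float,
    review_capacity_per_week: float,
    revert_rate: float,
) -> dict[str, float]:
    """Track two stocks: PRs awaiting review and changes that stayed shipped."""
    awaiting_review = 0.0  # stock
    shipped = 0.0          # stock
    for _ in range(weeks):
        awaiting_review += new_prs_per_week                        # inflow
        reviewed = min(awaiting_review, review_capacity_per_week)  # capped flow
        awaiting_review -= reviewed
        reverted = reviewed * revert_rate                          # lost to production issues
        shipped += reviewed - reverted
    return {"awaiting_review": awaiting_review, "shipped": shipped}


# Too few new PRs: doubling review capacity changes nothing.
print(simulate(4, 5, 20, 0.0)["shipped"])   # 20.0
print(simulate(4, 5, 40, 0.0)["shipped"])   # 20.0

# Plenty of PRs, but half of them get reverted.
print(simulate(4, 20, 20, 0.5)["shipped"])  # 40.0
```

Changing one input at a time shows which flow is the bottleneck.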

Building product

Building product goes through four cycles...

  1. Problem discovery. Begin by looking at users' pain and purpose: what are they trying to do? Narrow in on your competitive advantages and moats.
  2. Problem selection. Can you win the next round, win in future rounds, and get compounding returns on investment?
  3. Solution validation. Find the fast path, look at prior art and reference users. Prefer experimentation over analysis (this is a really good tip that we did not execute to the extent we should have at Segment!).
  4. Execution

Rinse... and... repeat. As an aside, this is an interesting space that I'm in currently!

Strategies vs Visions

Strategies are concrete plans to solve some sort of problem. Visions are aspirational. The former should be your operating plan, while the latter are a little more vague and help independent teams eventually reach alignment.

To craft a strategy doc, a good framework comes from Good Strategy, Bad Strategy.

  • Diagnose -- this is the problem statement. It should be exceedingly clear and already prompt some ideas about where to go.
  • Policies -- exactly what you plan to do about the problem. This should piss some people off or else you are just codifying the status quo. It should generally guide your strategy (What gets prioritized? What gets ignored? Do we prefer the short-term gain or the long-term investment?)
  • Actions -- these should fall out of the policies and be concrete plans that you expect to do.

Visions, on the other hand, are aspirational. They govern "here's what we wish we could do if we didn't have any constraints." Visions should be detailed, but the details shouldn't limit you! Instead they should be illustrative and clarifying to help the org think more broadly.

Visions should have...

  1. A vision statement (1-2 sentences)
  2. A value proposition (how to be valuable)
  3. Capabilities: what your customers need
  4. Solved constraints: what problems exist today that you will have solved
  5. Future constraints: what problems will exist in the future
  6. Reference materials
  7. Narrative: a one-pager on what you want to build

You'll know it's working if people are using it and referencing it! And you'll know it isn't if they aren't.

Goals

A good goal has four parts:

  • a baseline (where you are today)
  • a target
  • a trendline
  • a time period

A good example of a goal would be something like this...

“In Q3, we will reduce time to render our frontpage from 600ms (p95) to 300ms (p95). In Q2, render time increased from 500ms to 600ms.”

With goals, you typically want to bake in constraints as well (e.g. we can lower the time to process our data by 50% while maintaining a 30% cost of revenue as infrastructure).
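The four parts of a goal map neatly onto a small data structure. The `Goal` class below is only an illustration (its name and fields are made up for this sketch), filled in with the numbers from the frontpage example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    metric: str
    unit: str
    period: str               # time period the goal covers
    baseline: float           # where the metric is today
    target: float             # where it should be at the end of the period
    previous_baseline: float  # value one period earlier, used for the trendline

    def trend(self) -> float:
        """Change over the previous period (positive means the metric grew)."""
        return self.baseline - self.previous_baseline

    def required_change(self) -> float:
        """Change needed during the goal period to hit the target."""
        return self.target - self.baseline


render_time = Goal(
    metric="frontpage render time (p95)",
    unit="ms",
    period="Q3",
    baseline=600,
    target=300,
    previous_baseline=500,
)

print(render_time.trend())            # 100
print(render_time.required_change())  # -300
```

Seeing the trend next to the required change makes the size of the ask obvious: the metric got 100ms worse last quarter, and the goal asks for a 300ms improvement.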

Metrics

Always start with metrics and a dashboard when deciding goals. Explore them to get a feel for them, figure out what your baseline is, understand what teams need to be involved, then nudge teams to fix them.

I saw this done successfully a few times at Segment: infra cost, reliability, digital new customers.

Migrations

Will argues that migrations are the #1 tool to create additional leverage for engineering teams. They help teams stop running at a "standstill" and get out of debt.

The first thing you have to do in any migration is de-risk it: make sure you have a path towards a credible solution. Then you can stop the bleeding using the new approach, and generate tracking tickets for the migration (ideally automatically at first, then manually for any stragglers). Last but not least, you should finish it yourself, and never leave a migration half-done (I learned this lesson the hard way)!

We were fairly good at these at Segment, and I think in large part it was due to following a procedure similar to the one outlined in the book.

Running a reorg

Only do it when you absolutely have to. Do it for structural reasons, not people or management reasons. Project out what the world will look like in 12 months.

Controls

Will has an interesting concept of creating controls. These are tools a manager can use to drive a specific outcome, where they also control the process. Too often, we forget that these controls are explicitly "tools for a job". Whether they are metrics, sprint planning, or demos, each tool serves some explicit purpose. Each control should be paired with a degree of alignment indicating who is doing the task or what the alignment needs are. Do you want to do the thing yourself, be consulted, make the final call, or just be in the loop?

Career Planning and avoiding Artificial Competition

Too often, individuals will view themselves as focusing on a particular career path (e.g. become the head of engineering). The trouble here is that it puts them in competition vs everyone else at the company, and only one person will actually be able to achieve that goal.

Instead, you should focus on acquiring all the skills to be a great head of engineering, whether at your current company or elsewhere. Figure out whether you are weak at recruiting, strategy, technical direction, supporting your reports, etc. You can work to maximize these skills to make yourself better without limiting yourself to the fishbowl of peers you are with.

Talking to press

Three simple rules for talking to press (from his time at Digg).

  1. Respond to the question you want to answer, not necessarily the question you were asked. (I think Paul Graham has a similar idea.)
  2. Stay positive
  3. Speak in threes. Narrow your message to a few points.

Model, document, share

When instituting a new process, first model out your team's health via various metrics. Then, try the process as an experiment. When you are confident it works, document it and share it. This tends to work better than mandates: people can pick up the technique if they'd like.

Work the system, not the exceptions

When creating new policies, you want them to be fair, evenly applied, and to limit 'exceptions' which bubble up outside the system. That said, there will always be exceptions. It's best to attempt to re-do the policies every few months in batch.

Management is an ethical profession

Previously I hadn't thought of managers as being mostly ethics-forward, but in retrospect, this is exactly right. It's on the manager to be fair, even, and explain things well.

  1. What's good for the company?
  2. What's good for the team?
  3. What's good for me?

Three ways to handle a request

  1. Close out the issue, make a decision and communicate it so it never comes back
  2. Solve, design a solution so that you never need to spend time on this problem again
  3. Delegate, give it to someone who has the skills or can work in the system.

"With the right people, any process works. With the wrong people, no process works"

Managing in the growth plates

Management can be very different depending on whether you are in the "growth plates" (the new part of the company) or the piece which has solidified. It's worth being aware of which part you are in. Is it a young part of the company where it's important to just go ahead and execute? Or is it more stagnant, where ideas are treasured more?

Sprint Planning

A good sprint process ensures the following...

  • the team knows what to work on
  • the team knows why the work is valuable
  • the team can determine if the work is complete
  • the team knows how to figure out what to work on next
  • stakeholders can learn what the team is working on
  • stakeholders can learn what the team plans to work on next
  • stakeholders know how to influence a team's plans

Culture

Make your peers your first team

I'd never heard it put this way, but Larson advises making your peers your 'first' team. This is the group of people who will be pursuing the same sets of problems you are, and they won't be awkwardly separated from you by a layer of management.

In order to have a good group of peers, you need to know what everyone is working on, and know who they are.

Killing your heroes

Don't keep working harder! If you have hero programmers, try and get away from that culture. Either the heroes will get burnt out, or else the project will fail.

I've seen this a few times at Segment. We had some remarkable people who could do truly astounding things. But we didn't quite have this dichotomy of 'haves' and 'have nots'.

I think the key here is being able to split the project outside of one person's brain. Once that happens, the documentation, testing, runbooks, etc. all get much better.