<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom"><title>Calvin French-Owen</title><link href="https://calv.info/atom.xml" rel="self"/><link href="https://calv.info"/><updated>2026-03-10T06:22:07.443Z</updated><id>https://calv.info/</id><author><name>Calvin French-Owen</name></author><entry><title>Coding Agents in Feb 2026</title><link href="https://calv.info/agents-feb-2026"/><id>https://calv.info/agents-feb-2026</id><updated>2026-02-17T12:00:00.000Z</updated><author><name>Calvin French-Owen</name></author><summary>My point-in-time snapshot of using Claude Code and Codex together. Opus shines at context management and tool use, Codex writes fewer bugs. Here&#x27;s how I use both.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I recently joined my friends Diana, Harj, Garry, and Jared on the <a href="https://www.youtube.com/watch?v=qwmmWzPnhog">YC Lightcone Podcast</a> to discuss coding agents. I had a lot of fun with the conversation, but afterwards I couldn&#x27;t help but feel like I hadn&#x27;t gotten into the details of how I&#x27;m using the different agents. And ultimately, the details matter a lot!</p>
<p>In this post, I&#x27;d like to dive into some more of the nuance around the models.</p>
<p>As quick background: I helped launch the <a href="https://calv.info/openai-reflections">Codex web product</a>, and have worked pretty extensively with <a href="https://calv.info/coding-agent-metagame">Claude Code</a> in the intervening months. I&#x27;ve come to the conclusion that both models have different strengths and weaknesses related to their training mix.</p>
<p>The biggest change to my workflow is that <em>my time</em> is now the biggest consideration. I&#x27;m primarily picking my coding agent <em>as a function of how much time I have</em>, and how long I want it to run autonomously (is it better to get an 80% draft done overnight, or to work collaboratively with the model during the day?). Depending on the type of feature and how mission-critical it seems, I&#x27;ll reach for different tools. I pay for all of Claude Max, ChatGPT Pro, and Cursor Pro+, and it&#x27;s some of the best money I spend anywhere.</p>
<p>Here&#x27;s my point-in-time snapshot of what I&#x27;m doing today (February 2026). It all changes quite quickly, so I don&#x27;t expect this to stay relevant for long.</p>
<h2>Guiding principles</h2>
<p>To use coding agents well, you must <strong>understand context</strong>.</p>
<p>It&#x27;s easy to fool yourself into thinking that the coding agents are magical models. They have read all of the internet, developed deep levels of intuition for the structure of codebases, and been trained to write extremely correct code.</p>
<p>But at the end of the day, the agent is doing next token prediction. And each token <em>must</em> fit in a context window.</p>
<p>There are a bunch of corollaries that fall out of that...</p>
<ul>
<li><strong>Your work needs to somehow be chunked.</strong> If the problem you are trying to solve is &#x27;too big&#x27; for the context window, the agent is going to spin on it for a long time and give you poor results.</li>
<li><strong>Compaction is a lossy technique.</strong> When deciding what to compact and how, the agent is going to make choices on which information to include and omit. Maybe it does a good job, maybe it doesn&#x27;t! In my experience, more compaction tends to lead to more degradation in performance.</li>
<li><strong>Externalizing context into the filesystem</strong> (e.g. a plan doc with stages which are checked or not) allows agents to selectively read and remember without filling up the full context of the conversation. This is helpful for resuming tasks and continuing to be context-efficient.</li>
<li><strong>Stay in the &#x27;smart&#x27; half of the context window.</strong> It&#x27;s generally easier to train on short-context data vs long-context data. Results will tend to be better when the context window is &#x27;less full&#x27;. Dex Horthy calls this staying out of the <a href="https://www.youtube.com/watch?v=rmvDxxNubIg&amp;t=355s">dumb zone</a>.</li>
<li><strong>You don&#x27;t know what you don&#x27;t know.</strong> If the agent somehow misses a relevant file or package, it might really go in a direction you didn&#x27;t anticipate. If it&#x27;s not in the context window, there&#x27;s no way to know. Your codebase&#x27;s structure can help this, as can &#x27;progressive disclosure&#x27; of parts of the architecture. OpenAI has a nice <a href="https://openai.com/index/harness-engineering/">blog post</a> about structuring many different markdown files to do this well.</li>
</ul>
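<p>To make the filesystem point concrete, here&#x27;s the shape of a staged plan file an agent can re-read across context windows (the filename and stages here are purely illustrative):</p>

```shell
# A sketch of writing a staged plan doc the agent can re-read and tick
# off across context windows. The filename and stages are made up.
mkdir -p plans
printf '%s\n' \
  '# 00034 - Add user auth' \
  '' \
  '- [x] Stage 1: add session table and migrations' \
  '- [ ] Stage 2: login/logout API routes' \
  '- [ ] Stage 3: wire up frontend session handling' \
  > plans/00034-add-user-auth.md
```

<p>A fresh context window only needs to re-read this file to know exactly where the task stands.</p>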
<p>As a result, model performance and speed are governed <em>both</em> by the pure capability of the model and by how well it can manage multiple context windows and delegate to sub-agents or teams of agents.</p>
<h2>Opus: context whiz, better tool-use, more human</h2>
<p>Claude Code is my regular driver for planning, orchestrating my terminal, and managing git/github operations. I&#x27;ll typically use it for creating an initial plan, scripting against things on my laptop, and also for explaining how parts of the codebase work.</p>
<p>Opus has been trained to work across context windows extremely efficiently, so using it with Claude Code <em>feels</em> much faster than using Codex. You&#x27;ll notice Opus frequently spinning up multiple sub-agents simultaneously, whether it&#x27;s to <code>Explore</code> parts of the codebase or handle multiple simultaneous <code>Task</code> calls. The explore tool uses Haiku so it is very fast at processing a lot of tokens, and handing the relevant context back to Opus.</p>
<p>What&#x27;s more, Opus has been trained well to use the tools on my laptop: <code>gh</code>, <code>git</code>, various MCP servers. Occasionally I will use the <code>/chrome</code> extension to verify a bug, which works pretty well, but can be slow and buggy.</p>
<p>I also find Claude Code&#x27;s permission model easier to understand: you can allowlist prefixes for the commands it may run. With the Codex models it&#x27;s harder to whitelist individual CLI tools, because the model will often &quot;script&quot; against those commands in bash (<code>for ... gh</code>).</p>
<p>I&#x27;ve talked a lot about Claude Code being an incredible harness for the model, and there are a lot of small things that it does which make it nice to use: updating the terminal title to be task-relevant, showing the current PR in the statusline, the little status messages.</p>
<p>Finally, a lot of folks talk about the &quot;personality&quot; of the model. I don&#x27;t buy into this too much, but I do find Opus more willing to give me human-readable PR descriptions and detailed architecture diagrams.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/agents-feb-2026/claude-code-pr.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Fig 1: Claude Code</span></div></div>
<div class="mt-7 md:mx-0"><img src="https://calv.info/agents-feb-2026/codex-pr.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Fig 2: Codex</span></div></div>
<p>If I am asking for an explanation about how some piece of the code is structured, I&#x27;ll typically use Claude Code.</p>
<p>I also find Opus is a little more &#x27;creative&#x27; in terms of suggesting things that I may have forgotten to mention. This is actually a very nice property when creating plans: I&#x27;ll invariably leave some aspect out, and Opus can help point out areas of ambiguity.</p>
<h2>Codex: way fewer bugs</h2>
<p>Where Codex shines is <strong>correctness of code</strong>. Other folks who are pushing the models a lot <a href="https://x.com/steipete/status/2018032296343781706">seem</a> <a href="https://x.com/antirez/status/2022045607385596411">to</a> <a href="https://x.com/thdxr/status/2022301118458462647">agree</a>.</p>
<p>I run on GPT-5.3-Codex-xhigh or high, and the Codex code just straight up has fewer bugs. Opus will frequently add a set of React components which pass unit tests, but then the model just straight-up forgets to add them to the top-level <code>&lt;App&gt;</code> which gets rendered. There are obvious off-by-one errors which don&#x27;t get caught, and dangling references or race conditions which are subtle and hard to spot.</p>
<p>For a long time, I thought there was a negligible difference between the two models. But after seeing enough PRs with automated reviews from Codex and Cursor&#x27;s Bugbot, I&#x27;ve realized that OpenAI&#x27;s model is superior in terms of the code it writes. If you&#x27;d like to A/B test this yourself, just check out a branch and run <code>/code-review</code> in Claude Code vs <code>/review</code> in Codex.</p>
<p>Unfortunately, Codex is <em>slow</em>.</p>
<p>The biggest reason for this is that it&#x27;s not delegating tasks across context windows, though I get the sense that the latency between tokens might also be higher. I have been running with the experimental subagent support (toggle it in <code>/experimental</code>), which does manage to delegate and works well, though perhaps not as seamlessly as Claude. The parallelism isn&#x27;t quite there yet.</p>
<p>The net result between the two is that I&#x27;ll <em>start</em> with Claude Code and keep that open as a pane, then flip to Codex when I&#x27;m ready to actually start coding.</p>
<p>Every so often I will still reach for Opus, but it mostly comes down to tool use (e.g. <code>/chrome</code> to debug, or making stylized frontend changes).</p>
<h2>Useful things</h2>
<p>A few caveats: I am working on greenfield codebases. They are much <em>much</em> smaller than any production codebase I&#x27;ve worked in, and are relatively token-efficient.</p>
<p><strong>Repo structure</strong> -- all of my repos have a <code>plans/</code> folder, with numbered plans as I ask the agents to implement things. Typically I&#x27;ll also have an <code>apps/</code> folder for different services I&#x27;m running. I&#x27;m using turborepo to manage monorepos in typescript, and <code>bun</code> for fast installs.</p>
<p><strong><a href="https://ghostty.org/">Ghostty</a></strong> -- Mitchellh&#x27;s terminal is fantastic. Fast, native, and constantly improving. For a little while I was running multiple Claude/Codex instances in tmux, but now I just use multiple panes within the same terminal tab.</p>
<p><strong>Next.js on Vercel, APIs on Cloudflare Durable Objects</strong> -- I&#x27;ve mostly been leveraging Cloudflare Durable Objects for APIs and storage. The idea that you can partition your database ahead of time, wake it up on demand, and not have to worry a ton about concurrent writes is a really wonderful property. In a time of agents acting on small pieces of data, I can&#x27;t help but feel that it just <em>makes sense</em> from an infra perspective. Cloudflare is also leaning into this with their <a href="https://github.com/cloudflare/actors">cloudflare/actors</a> library which bundles up compute with small bits of co-located storage.</p>
<p><strong>Worktrees</strong> -- since all of my code is relatively lightweight and small, I&#x27;m able to leverage parallel worktrees by using <code>bun install</code> and then <code>bun run dev</code> across each one to verify locally. I have a <code>worktree</code> skill which copies any relevant plans, env vars, and other updates, and starts a new branch. I didn&#x27;t really use worktrees prior to coding agents (I was mostly a branch guy) but having them work in parallel and letting Claude Code manage the syntax is a godsend.</p>
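<p>The core of that worktree skill can be sketched as a small shell function (the paths and naming below are illustrative, not my exact setup, and the real skill copies more state than this):</p>

```shell
# Sketch of a /worktree-style helper: new branch + working copy keyed
# off a plan file like plans/00034-add-user-auth.md. Paths and naming
# are illustrative; the real skill also carries env vars and updates.
new_worktree() {
  local plan="$1"                          # e.g. plans/00034-add-user-auth.md
  local name dir
  name="$(basename "$plan" .md)"           # -> 00034-add-user-auth
  dir="../worktrees/$name"

  mkdir -p "$(dirname "$dir")"             # ensure the worktrees dir exists
  git worktree add -b "$name" "$dir" >/dev/null   # branch + checkout in one step
  mkdir -p "$dir/plans"
  cp "$plan" "$dir/plans/"                 # carry the plan into the worktree
  [ ! -f .env ] || cp .env "$dir/.env"     # copy local env vars if present
  echo "$dir"
}
```

<p>From there, running <code>bun install</code> and <code>bun run dev</code> inside the new directory gives each agent its own running copy.</p>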
<p><strong>Plan, Implement, Review</strong> -- I almost always have the model start with a plan. This is useful for two reasons: 1) it externalizes the context beyond a single context window 2) it allows me to review or interrogate what was done. If an agent stops for some reason, I can always start a new context window and tell it to resume the plan.</p>
<p><strong>Preview deploys</strong> -- every branch gets a fresh Web + API deploy. It makes running in parallel and quickly testing branches a breeze. It&#x27;s way harder to work without this.</p>
<p><strong>Cursor Bugbot and Codex Code Review</strong> -- I still spot-check the code, and make sure I understand it at an architecture level, but increasingly I am not reading every line of every PR. I rely on agents to spot subtle bugs (they are far better at it than I am). For a while I was using Claude Code, Cursor&#x27;s Bugbot, and Codex. Claude Code didn&#x27;t seem to catch real issues for me. Since then, I&#x27;ve settled mostly on Cursor as the default option since you can tell when it&#x27;s working, though I find the results from Codex are good too.</p>
<h2>Skills</h2>
<p>Today I have a bunch of <strong>skills</strong> and a shared AGENTS.md/CLAUDE.md defined in a repo I call claudefiles. My rule for adding skills is not to do it prematurely, but only if I find myself settling into a workflow after a few times.</p>
<p>While I find the AGENTS.md/CLAUDE.md handy for steering the model overall, skills serve two specific purposes:</p>
<ol>
<li><strong>They let you chain and automate workflows.</strong> This is my (and, I suspect, generally the) most common use case for skills. I&#x27;ll typically want to start with a plan, then implement in stages, then review. Why not have skills for each of these processes? Then I can have a meta skill which invokes them all in order (see below).</li>
<li><strong>They help you split context windows</strong> -- at least in Claude Code, invoking a skill can happen in a new context window if you set <a href="https://code.claude.com/docs/en/skills#frontmatter-reference"><code>context: fork</code></a>. This is really handy if you have a single context window for the &#x27;master orchestrator&#x27;, and then sub-agents which go and do parts of a task. The orchestrator can keep sub-agents working, and process outputs as they finish.</li>
</ol>
<p>Skills are also great because they are <em>very</em> context efficient. Unlike MCP calls (which take up thousands of tokens), skills tend to be ~50-100 tokens.</p>
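<p>For reference, here&#x27;s roughly what a minimal skill looks like on disk, including the <code>context: fork</code> frontmatter mentioned above (the skill name and body are made up; check the frontmatter reference for the full field list):</p>

```shell
# Minimal SKILL.md sketch. `name`, `description`, and `context: fork`
# come from the Claude Code skills docs; the skill body itself is
# illustrative.
mkdir -p .claude/skills/codex-review
printf '%s\n' \
  '---' \
  'name: codex-review' \
  'description: Run a Codex review pass over the current branch' \
  'context: fork' \
  '---' \
  '' \
  'Spawn `codex --review` against the current branch and summarize findings.' \
  > .claude/skills/codex-review/SKILL.md
```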
<h2>Automating with skills</h2>
<p>Early on in my journey with Claude Code, I was intrigued by the idea of skills marketplaces: the notion that you could just install great frontend-design / security-review / architecture-review skills.</p>
<p>After working this way for a while, I&#x27;ve mostly abandoned skills that anyone else has written. Instead, I start doing things manually, then think about how I want to automate them. Over time, I&#x27;ve ended up building a lot of time-saving automation.</p>
<p>Here&#x27;s how my use of skills has evolved. I say this not to prescribe a &quot;golden path&quot; of which skills to use, but as an illustrative example of how you might automate more of what you&#x27;re already doing.</p>
<p>The first skill I added was obvious: rather than telling the model to commit and push (in half a dozen different ways), I added a <code>/commit</code> skill which is borrowed directly from Claude Code.</p>
<p>Then I realized if I wanted agents working in separate worktrees, I should probably just rely on Claude Code to manage it. So I added a <code>/worktree</code> skill which creates a new worktree, keyed off the plan&#x27;s number (e.g. 00034-add-user-auth).</p>
<p>The next thing I noticed myself doing was always running a plan step, then implementing the plan in stages. I&#x27;d first clear the context window, and then say &quot;implement the next stage of the plan, then <code>/commit</code>&quot;. But it became clear this was a good candidate for a skill: <code>/implement</code>, which does basically exactly that.</p>
<p>Of course, just typing <code>/implement</code> in succession is annoying, so I added an <code>/implement-all</code> which looks at the current worktree path, ties it to the plan number, and then implements everything in stages. Sometimes when I&#x27;m running overnight, I&#x27;ll leverage the <code>/ralph-loop</code> just to keep it going until all stages are done. I also added a local <code>/codex-review</code> which basically spawns a <code>codex --review</code> process to run the review.</p>
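<p>A rough sketch of the outer loop behind something like <code>/implement-all</code> (the real logic lives inside the skill prompt; the agent CLI invocation here is an assumption, with Claude&#x27;s <code>-p</code> print mode as one option):</p>

```shell
# Illustrative sketch of an /implement-all-style outer loop: keep
# dispatching one stage at a time until no unchecked stages remain.
# The agent CLI is an argument so the loop itself can be exercised.
implement_all() {
  local plan="$1"
  local agent="${2:-claude}"               # command that implements one stage
  while grep -q '^- \[ \]' "$plan"; do     # unchecked stages remain
    "$agent" -p "Implement the next unchecked stage of $plan, check it off, then /commit"
  done
}
```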
<p><code>/implement-all</code> was working pretty well, but I wasn&#x27;t really closing the loop on addressing feedback from the models in CI. I&#x27;d get these bug reports from Cursor + Codex indicating that there was still more work to do, even after each phase of the plan had succeeded.</p>
<p>So I added an <code>/address-bugs</code> skill which hits the GitHub API, searches for Cursor + Codex comments since the last commit, and then attempts to verify and fix them.</p>
<p>Finally I realized this was just working in a loop, so I added a <code>/pr-pass</code>, which runs at the end of <code>/implement-all</code>. It essentially 1) pushes to the remote, 2) waits for all CI to pass, 3) runs <code>/address-bugs</code>, and then loops back to step 1 until it&#x27;s finished.</p>
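<p>In shell terms, that loop looks something like this (the step commands are illustrative defaults, not the actual skill; <code>gh pr checks --watch</code> is one way to wait on CI):</p>

```shell
# Sketch of a /pr-pass-style loop: push, wait for CI, address bot
# feedback, repeat. Each step is passed in as a command, with
# illustrative defaults; the fix step is expected to exit nonzero
# once no new bot comments remain.
pr_pass() {
  local push="${1:-git push}"
  local ci="${2:-gh pr checks --watch}"
  local fix="${3:-claude -p /address-bugs}"
  for pass in 1 2 3; do    # bounded so an overnight run always terminates
    $push
    $ci
    $fix || break          # nonzero: nothing left to address
  done
}
```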
<p>These were all nice speedups, and I realized they were helping me with a lot of bookkeeping. So I also added a <code>/focus</code> skill which looks at my <code>plans</code> dir, my outstanding PRs, and my worktrees to help refresh my memory and keep me on track.</p>
<p>Importantly, I don&#x27;t think I would&#x27;ve had any success trying to <em>start</em> with this process. But by building it over time and noticing little areas where I could automate, it&#x27;s significantly improved my workflow.</p>
<h2>Stuff I didn&#x27;t mention here</h2>
<p>I gave the Codex App a shot recently, and I was pleasantly surprised at the attention to detail and the little touches. I have yet to move my workflow over fully from the CLI tools since I appreciate the flexibility of the terminal. Still, I like the idea. I tried giving Cowork a spin as well, and had a hard time getting it to work properly. In each case I think the sandboxing model makes a big difference!</p>
<p>Occasionally I will use the web interface for async jobs, though I find right now I&#x27;m more and more tied to the CLI. This is different from what I was doing 6 months ago, where I was mostly using Cursor and the built-in agent or extensions.</p>
<p>I&#x27;ve picked up <a href="https://www.pencil.dev/">pencil.dev</a> for working on frontend UI. The deployment model is fascinating if nothing else: it shells out to your local Claude Code (and is able to re-use your existing subscription).</p>
<p>I&#x27;m still feeling like I should be using a more well-defined issue tracker. David Cramer&#x27;s <a href="https://github.com/dcramer/dex">Dex</a> seems promising, in a similar spirit to Steve Yegge&#x27;s <a href="https://github.com/steveyegge/beads">beads</a>. Both feel like a little more than I need, but perhaps I&#x27;m just not in the right workflows.</p>
<p>I am not really using Playwright or other automated e2e MCPs.</p>
<h2>Free advice to the labs</h2>
<p>No one asked, but here it is :)</p>
<h3>Anthropic</h3>
<p><strong>Model:</strong> Like I mentioned before, the Opus models totally shine on feeling human, working with engineering tools, splitting context in the right ways to achieve good parallelism, and taking liberties on things &quot;you might have forgotten&quot;. Where they don&#x27;t shine is code correctness. I&#x27;d love to see an &#x27;Opus Strict&#x27; mode, where some base model has really been RL&#x27;d to achieve better correctness. Opus is where I start, but Codex writes all my code. If I were budget-constrained, I&#x27;d probably pick Codex.</p>
<p><strong>Product harness:</strong> This is the one area where I basically have no notes. Boris and Cat mostly have better ideas than I do. My two requests:</p>
<ol>
<li>Adopt <a href="https://agentskills.io/home">agent skills</a> so I don&#x27;t have to do this dumb symlinking between a bunch of directories. I think Anthropic has little incentive to make this happen, but it&#x27;d be nice for those of us who flex between the two CLIs.</li>
<li>Publish the output format for <code>--stream-json</code>. I&#x27;m probably not alone in terms of being interested in running Claude Code in a sandbox on behalf of users. But I am worried the format will change out from underneath me. Depending on the sandbox, it&#x27;s also been annoying to set up the right pathing properly for Claude Code, whereas the other CLI tools (Codex, Cursor, Gemini) seem to install without issue.</li>
</ol>
<h3>OpenAI</h3>
<p><strong>Model:</strong> The number one thing the OpenAI models can do to improve is figure out how to split across context windows and delegate to sub-agents. I&#x27;m using the experimental sub-agent version. There&#x27;s also this &quot;more than what you asked for&quot; concept that Opus manages to accomplish during planning that would be useful.</p>
<p><strong>Product harness:</strong> I have a lot of small feedback here that I think would go a long way (and maybe some of this is out of date)...</p>
<ol>
<li>I still don&#x27;t understand the sandboxing model vs Claude Code&#x27;s, and because the model often tends to script, I end up needing to give a lot of approvals. Since the models are so determined, I worry a bit about running in <code>--yolo</code> mode.</li>
<li>Like Claude Code, add some sort of &quot;user guide&quot; that ships with the CLI so I can ask questions about things like where to put skills, which fields are supported, etc. I&#x27;d love to be able to tell Codex what sort of sandboxing model I want and have it automatically configure that without needing to have it fetch the repo and look at the source.</li>
<li>Make <code>/review</code> a regular skill rather than the odd sort of packaged command it is right now. I want the model to be able to invoke it dynamically.</li>
<li>Nit: change the title of my terminal tab to something related to the task when executing. I frequently lose track of dozens of tabs all titled <code>codex</code>.</li>
<li>Have some sort of training specific to PR descriptions and commit descriptions. I generally like Codex&#x27;s terse personality, but these could be expanded.</li>
<li>Support <code>context: fork</code> in skill definitions.</li>
<li>If a link overflows the line in the pane, it should still be clickable.</li>
<li>Show my current worktree/PR/branch name at the bottom of the status bar.</li>
</ol>
<h2>Where this all goes</h2>
<p>A few weeks ago, a friend sent me the post on <a href="https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04">Gas Town by Steve Yegge</a>. It&#x27;s still one of the wildest things I&#x27;ve read.</p>
<p>If you haven&#x27;t seen it, Steve basically makes the case that you should just always be maxing out tokens. You should have a pool of workers, who are hungry to accept more work, and they should be going 24/7. You should make lots of plans. You should expect to throw them away.</p>
<p>Whatever your take on whether the abstractions are correct or not, <em>directionally</em>, I think Steve is absolutely right.</p>
<p>The dream is to have your laptop (or cloud sandbox or whatever) constantly churning on ideas in the background, and have you be able to nudge it in directions, go off and do research, review its output and come back. Working with coding agents has made me feel much more like an engineering manager again when it comes to coordination, but without worrying about motivating the agents or the personalities involved.</p>
<p>Today it feels like we&#x27;re quite a bit closer to that future. This is over-hyped on Twitter, but I do really try and kick off 3-4 tasks in Codex before going to bed, so they&#x27;ll be ready for review in the morning. But I&#x27;m still not at the point where I feel like I can have agents running 24/7.</p>
<p>I think there are two barriers to more progress here...</p>
<ol>
<li><strong>Context window size/coordination</strong> (see above). Agents can&#x27;t endlessly compact/recycle in the same context window. We need either smarter harnesses or something which provides more delegation.</li>
<li><strong>Resistance to prompt injection.</strong> Sometimes the agents will only run for a few minutes before asking for an escalated approval. If I&#x27;m going to let them run overnight, I don&#x27;t really trust them to run in <code>--yolo</code> mode, but there&#x27;s a subset of sane permissions/domains I&#x27;d allow.</li>
</ol>
<p>On the first point, Cursor has been <a href="https://cursor.com/blog/long-running-agents">pushing the bounds</a> of what <a href="https://cursor.com/blog/scaling-agents">swarms of agents can do</a> across many context windows. I still haven&#x27;t seen great answers to the second, and it&#x27;s an active area of research. Running in a sandbox seems like the best workaround for the moment, but this is still more painful to configure than it should be, and if there&#x27;s privileged data that your agent has access to with open internet access, it&#x27;s vulnerable to the <a href="https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/">Lethal Trifecta</a>.</p>
<p>As a solo engineer working on projects, I&#x27;m already finding that I am the bottleneck when it comes to the right ideas. More and more, <strong>ideas</strong>, <strong>architecture</strong>, and <strong>project sequencing</strong> are going to become the limiting factors for building great products.</p></div></content></entry><entry><title>How I Use Obsidian</title><link href="https://calv.info/how-i-use-obsidian"/><id>https://calv.info/how-i-use-obsidian</id><updated>2026-01-07T22:00:00.000Z</updated><author><name>Calvin French-Owen</name></author><summary>Obsidian has become my personal operating system for thinking and writing. Here&#x27;s the golden rule I follow and the systems I use.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Recently, I&#x27;ve found myself introducing more and more friends to <a href="https://obsidian.md/">Obsidian</a>.</p>
<p>For those unfamiliar: Obsidian is a nice markdown editor for taking structured notes, sort of like a local-first hybrid of Notion and Roam Research. Because it works with plain markdown files on your laptop, it has a bunch of nice properties: 1) it&#x27;s always available, 2) it&#x27;s fast, 3) it works remarkably well with Claude Code.</p>
<p>I&#x27;ve been a daily Obsidian user since November 2021 (I checked my daily notes), and it&#x27;s gradually become my &#x27;personal operating system&#x27; for a lot of my own thinking and writing. In that time, I&#x27;ve become more opinionated about how it should work.</p>
<p>This post serves as a sort of guide for how I work with it.</p>
<h2>The golden rule of note-taking</h2>
<p>I have one rule of working with Obsidian and other note-taking apps:</p>
<blockquote>
<p><strong>The primary goal of the tool should be eliciting thought. Nothing else matters.</strong></p>
</blockquote>
<p>It is easy to get caught up in the &#x27;most optimal note-taking setup&#x27;. Automations, tagging, etc, etc. I&#x27;ve now come to believe that the only thing that matters is whether your system causes you to have better (and probably more) thoughts.</p>
<p>There are all sorts of corollaries that fall out of this:</p>
<ul>
<li><strong>Don&#x27;t think too hard about where to put notes</strong>. If you need to make a decision every time you take a note, you probably are recording fewer thoughts!</li>
<li><strong>Don&#x27;t obsess over organizing old notes.</strong>  It&#x27;s tempting to want to spend time &#x27;tending to your note-taking garden&#x27;. Maybe you want it all formatted the same way or to follow some sort of standard. But I don&#x27;t think this makes much sense. 99% of the time I never re-visit my notes, and that is <em>fine</em>. The number one goal is to elicit my thoughts. The only time I use folders now is for automations (daily/ or clippings/), where the folder structure is done for me.</li>
<li><strong>Notes are for you.</strong> The other benefit I like from Obsidian is that the state is all local (encrypted if you use Obsidian sync or iCloud). I originally didn&#x27;t think this would matter that much, but there&#x27;s something freeing about knowing your notes will not leak.</li>
</ul>
<p>When in doubt, respect the golden rule. It&#x27;s almost always better to just start writing.</p>
<h2>Systems I use</h2>
<p><strong>Daily note</strong> - the backbone of my notes is the &#x27;daily note&#x27;. I leverage the &#x27;daily note&#x27; plugin and make sure to set a hotkey for it (Ctrl+Shift+D). I set a template for it (stored in templates/Daily.md) just to get started, which I refresh every six months or so. Here&#x27;s what my current template looks like.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/how-i-use-obsidian/daily-template.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/></div>
<p>Again, the point of this is so I don&#x27;t have a blank page and I can just get started writing. I&#x27;ll typically journal in the &quot;What&#x27;s happening?&quot; section, and then add any goals I&#x27;d like to finish to the goals/calendar. I have some standard links I keep to remind myself of longer term projects, but I haven&#x27;t been using these a ton.</p>
<p>Beyond the daily note, I have a few types of long-lived notes: <strong>Projects</strong>,  <strong>Nouns</strong>, <strong>Lists</strong>, and <strong>Docs</strong>.</p>
<p><strong>Project</strong> notes revolve around some sort of longer-lived project. I am pretty free-form with these (see rule #1). Rather than worry about organizing them, I&#x27;ll typically just come up with a single-word name for the project and then add an em-dash (colons aren&#x27;t allowed in filenames) plus whatever descriptor fits. If it&#x27;s some point-in-time doc, I&#x27;ll typically add the month it&#x27;s related to (e.g. Dec &#x27;25).</p>
<p>Right now, I&#x27;m hacking on an app called <strong>Tacit</strong> so this is what that looks like:</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/how-i-use-obsidian/project-notes.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/></div>
<p>I&#x27;ll sometimes create a &quot;Captain&#x27;s Log&quot; doc for running notes. Again, I&#x27;m not too precious about structure here.</p>
<p><strong>Nouns</strong> typically reference some long-lived thing that I&#x27;ll come into contact with again and again: people, companies, restaurants, concepts / mental models. I tend to give these their own note and take notes directly there. I have a short meeting template which basically is just a header with the current date which I&#x27;ll insert whenever I interact with that entity.</p>
<p><strong>Lists</strong> are almost always things that I want to keep track of and pretty self explanatory: book recs, goals, etc.</p>
<p><strong>Docs</strong> are kind of the &#x27;catch-all&#x27; for when an idea seems worth exploring. Rather than fire up an entirely new note, my typical workflow is to add the stub to my daily note as one of my TODOs, then go in and flesh out the doc. I&#x27;m not sure why, but something about seeing a stub that needs to be written causes me to want to flesh it out.</p>
<p><strong>Hotkeys + Slash Commands</strong>. Using Obsidian well means learning the hotkeys. You don&#x27;t need all of them, but I do use a handful on a very regular basis:</p>
<ul>
<li><em>Daily notes</em>: ctrl+shift+d (next: ctrl+shift+n, prev: ctrl+shift+p)</li>
<li><em>New note</em>: ctrl+n</li>
<li><em>Formatting</em>: bold (cmd+b), italic: (cmd+i)</li>
<li><em>Open/Search</em>: cmd+o</li>
<li><em>Add/Complete TODO:</em> cmd+l</li>
<li>Slash command: insert template, insert today&#x27;s date</li>
</ul>
<p><strong>MacWhisper</strong>. If I ever get stuck writing, I&#x27;ve often found some use in &#x27;changing modes&#x27;. Transcription software has gotten good enough that I&#x27;ll just start speaking, and that usually allows me to go from there. I like <a href="https://goodsnooze.gumroad.com/l/macwhisper">MacWhisper</a>, which (like Obsidian) works locally and stays out of the way. I&#x27;ve configured the right option key to toggle it on and off. I use NVIDIA&#x27;s Parakeet model, which I find is both fast and accurate. I also like that the transcribed text lands wherever my cursor is.</p>
<p><strong>Theme: Flexoki</strong>. For the longest time I didn&#x27;t care much for themes and just used the default. I&#x27;ve recently fallen in love with Steph Ango&#x27;s (Obsidian creator) <a href="https://stephango.com/flexoki">Flexoki</a> theme, so I use that. I find it feels a little more personal, which gets me to write slightly more on the margin.</p>
<p><strong>Templates</strong>. I use a couple of templates. There&#x27;s the Daily template which is the whole note (shown above). I have a Weekplan template which I use to <a href="/the-secret-of-the-weekplan">plan my weeks</a>. And I have a &#x27;meeting&#x27; template which I insert into one of the &quot;noun notes&quot; whenever I have a meeting.</p>
<h2>Stuff I use occasionally</h2>
<p><strong>Web Clipper</strong>. One of the officially supported ways to integrate with Obsidian is the <a href="https://obsidian.md/clipper">web clipper</a>. Every so often, I&#x27;ll find myself starting to copy/paste from some website, and then remember I can just click a button and it will create a new note in <code>clippings/</code>. Useful for saving things like recipes or referencing specific blog/reddit posts.</p>
<p><strong>Claude Code</strong>. Because it&#x27;s all markdown, Claude Code is <em>very</em> good at things like semantic search or updating parts of your vault. I&#x27;ve also used it to keep track of stuff that I&#x27;m doing (e.g. &quot;put this in my daily note&quot;).</p>
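<p>As a sketch of why a plain-markdown vault is so tool-friendly: the whole thing is just files on disk, so even a few lines of ordinary code can search it, with no API or export step required. The note names in the example are illustrative, not my actual vault.</p>

```python
# Minimal sketch: an Obsidian vault is just a folder of .md files,
# so any tool (or agent) can search it with ordinary file operations.
from pathlib import Path


def search_vault(root: str, term: str) -> list[str]:
    """Return vault-relative paths of notes mentioning `term` (case-insensitive)."""
    hits = []
    for note in Path(root).rglob("*.md"):
        if term.lower() in note.read_text(encoding="utf-8").lower():
            hits.append(str(note.relative_to(root)))
    return sorted(hits)
```

<p>That&#x27;s only the trivial keyword case, of course; Claude Code layers semantic understanding on top, but the underlying accessibility is the same.</p>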
<p><strong>Mobile app</strong>. If there&#x27;s one place where I have a slightly mixed bag, it&#x27;s the mobile app. Don&#x27;t get me wrong, Obsidian&#x27;s mobile app is really well designed. But the fact that it&#x27;s local-first means it effectively downloads a bunch of markdown while you&#x27;re using the app, leading to the occasional sync-skew issue. I wish there were a better way to do voice transcription → notes on the device, but everything I&#x27;ve tried is sort of kludge-y due to the way Apple has set up the ecosystem on iPhone.</p>
<p>Then there&#x27;s stuff I tried and don&#x27;t really use much anymore: tagging, folder organization, complicated templates, community plugins. I am sure these are great, but I don&#x27;t find them that useful. I don&#x27;t use Obsidian for obsessively managing TODOs anymore. I treat each day as a new todo list, and then I keep moving.</p>
<h2>Interesting Outcomes</h2>
<p>When I first started using Obsidian, I tried to add too much structure. Now I find I just start writing about whatever is on my mind. Daily notes have become a habit; they&#x27;re pretty much the first thing I look at when I start my day.</p>
<p>My friend Titiaan asked me if there were any downsides to using Obsidian. The main one I&#x27;ve found is that I now <em>need</em> to write in order to think. It&#x27;s sort of hard for me to deal with a complex decision without spending some time writing about it.</p></div></content></entry><entry><title>You Still Need to Think</title><link href="https://calv.info/you-still-need-to-think"/><id>https://calv.info/you-still-need-to-think</id><updated>2025-09-26T15:15:04.674Z</updated><author><name>Calvin French-Owen</name></author><summary>When it comes to coding agents, the product shape fundamentally shifts how you think.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>As coding agents become more capable and long-running, they don’t remove the human job of thinking. Someone still has to direct the work—set goals, choose constraints, and judge outputs.</p>
<p>Everyone seems to recognize this in the abstract. But nobody seems to talk much about how the product shape fundamentally shifts the type of thinking required to write code.</p>
<p>I&#x27;d argue that seemingly minor UX differences between different coding agents end up having a massive impact on how users spend their &quot;thinking budget.&quot;</p>
<p>With a <strong>remote-first product like Codex Cloud</strong> (not the CLI), the product encourages you not to spend time thinking while the agent is <em>working</em>, and to spend more time thinking about the <em>end result</em> the agent produced. We intentionally didn&#x27;t want users watching the agent invoke terminal commands, because the model works very differently than a human might.</p>
<p>With an <strong>interactive product like Claude Code or Codex CLI</strong>, you naturally spend more time thinking about specs and the high-level approach the agent is taking as you follow the chain of thought in the terminal. But you need some other means of looking at diffs (editor / GitHub), so reviewing them tends to be less a part of the workflow than following the plan the tool has created and the commands it runs to verify as it implements.</p>
<p>With an <strong>IDE-focused product like Cursor</strong>, you accept most diffs as they come in. Your thinking window is relatively short because you are accepting code (or not). You need to do relatively little thinking to supply the right context, because it&#x27;s already in your editor. The trade-off is that you probably need to break down the problem ahead of time, and to spend more time coming up with the plan/approach yourself.</p>
<p>All of these products need to shift the &quot;active thinking&quot; cycles around between...</p>
<ol>
<li>supplying the right context</li>
<li>coming up with a plan</li>
<li>implementing the code</li>
<li>verifying and reviewing it</li>
</ol>
<p>If I were to guess, LLM strength right now ranks as <strong>3 (implementation) &gt; 4 (verification) &gt; 2 (planning) &gt; 1 (context)</strong>. Until we deploy tools for searching across the organization, eliciting info from the user, and understanding an org&#x27;s global context, providing the right context still seems to be the area where humans can add the most value.</p>
<p>Conversely, LLMs excel at taking well-specified plans and implementing them. They handle race conditions, error handling, and complex technical details remarkably well.</p>
<p>If you accept this idea of a thinking budget, it&#x27;s easy to understand why engineers might have very different experiences (positive and negative) using the different tools.</p>
<p>Some problems require only a clear spec—you know the implementation completely. Others require writing code to &quot;think through&quot; the problem, then refactoring iteratively. Some engineers prefer reviewing a first pass to writing from scratch.</p>
<p>Whatever your preference, different UX requires you to think in different ways. I think it&#x27;s unlikely that a &#x27;single workflow&#x27; will satisfy all users. You still have to think... but the best products will let users choose <em>how</em> they want to do that. <sup id="footnote-fnref-1"><a href="#footnote-fn-1">1</a></sup></p>
<section data-footnotes="true" class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="footnote-fn-1">
<p>The new Codex IDE extension actually does a very good job of this. You can decide to kick off tasks locally, or hand them off to the cloud. Disclosure: I was working on this team, so I am biased. <a href="#footnote-fnref-1">↩</a></p>
</li>
</ol>
</section></div></content></entry><entry><title>The Coding Agent Metagame</title><link href="https://calv.info/coding-agent-metagame"/><id>https://calv.info/coding-agent-metagame</id><updated>2025-08-26T15:15:04.674Z</updated><author><name>Calvin French-Owen</name></author><summary>Using Claude Code feels more like playing the piano than shuffling tickets in Jira. I get the vague sense that by using the tool differently, I could become a virtuoso. Here&#x27;s my best attempt to explain why that is.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>There&#x27;s an endless debate around coding agents. What&#x27;s the best way to prompt? How about reviewing the outputs? Are the results garbage, and do agents really waste more time in the end?</p>
<p>But the one thing that all my engineering friends seem to agree on is that &quot;coding today feels much more <em>fun</em>&quot;. And the majority of them point to the same tool: <a href="https://www.anthropic.com/claude-code">Claude Code</a>. <sup id="footnote-fnref-1"><a href="#footnote-fn-1">1</a></sup></p>
<p>After working on <a href="https://calv.info/openai-reflections">Codex at OpenAI</a> <sup id="footnote-fnref-2"><a href="#footnote-fn-2">2</a></sup>, I wanted to do a closer study of Claude Code to understand exactly what makes developers so excited about it.</p>
<p>Initially, I was very skeptical about using a CLI tool to manage coding edits. My entire programming career has been IDE-centric. But after using Claude Code for a few weeks, it&#x27;s hard not to feel that sense of fun.</p>
<p>I could say many great things about the way Claude Code is built–but the most underrated aspect is the feeling that you can <strong>endlessly hack it to your liking</strong>.</p>
<p>Using Claude Code feels more like playing the piano than shuffling tickets in Jira. I get the vague sense that by using the tool differently, I could become a virtuoso.</p>
<p>There&#x27;s not just the game of writing code, but a metagame that we&#x27;re all playing by using it in weird and wonderful ways.</p>
<h2>A retro gaming aesthetic</h2>
<p>When I first ran <code>claude</code>, I was quickly struck by 1) the attention to detail and 2) the inspiration from old-school video games.</p>
<p>Here&#x27;s the first thing I see when I launch Claude Code.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/cleanshot-2025-08-06-at-1222462x-7jrv29.png" class="border-none" style="transform:scale(1.15);transform-origin:center"/></div>
<p>Unlike your average CLI tool, it actually has... UI design?</p>
<p>Small accent colors make the UI playful and interesting. The three shades of text color draw your eye to specific sections (header &gt; input box &gt; changelog). Animated text subtly changes color. There are tasteful unicode icons that feel vaguely reminiscent of the Anthropic logo. There&#x27;s a fun retro aesthetic. And I&#x27;m greeted with a clear call to action: <code>Try &quot;write a test for InboxList.tsx&quot;</code>.</p>
<p>The very first screen also <strong>tells me how to get more from the tool</strong>. It gives me a changelog, a prompt, and a tip of the day. Expanding the shortcut menu shows me how to use the advanced features.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/cleanshot-2025-08-18-at-0938202x-1-9gvz29.png" class="border-none" style="transform:scale(1.15);transform-origin:center"/></div>
<p>When making a choice, Claude Code gives me a handful of options for things like follow-ups.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/cleanshot-2025-08-18-at-0950022x-1-6fhpg6.png" class="border-none" style="transform:scale(1.15);transform-origin:center"/></div>
<p>I can select option 1, 2, or 3 just like I would in an RPG. The hotkeys all work the way I&#x27;d expect.</p>
<p>Claude Code is also partially defined by what it is not–<strong>it&#x27;s not an IDE</strong>.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/pasted-image-20250818094902-tsrw94.png" class="border-none" style="transform:scale(1.15);transform-origin:center"/><div class="text-center -mt-6 italic"><span>I&#x27;m picking on an old version of JetBrains here which is maybe the worst offender, but also emblematic of the desire to stack IDEs with every feature under the sun.</span></div></div>
<p>This is important for two big reasons:</p>
<ol>
<li>It feels <strong>lightweight</strong>. I&#x27;m not signing up to learn an entirely new editor. I&#x27;m just running a tool in my terminal.</li>
<li>It is <strong>unique</strong> and <strong>differentiated</strong>. Authoring code outside an IDE feels like a weird new paradigm–but so is writing AI-generated code. Rather than compete with IDEs on their terms, a CLI tool embraces a different paradigm.</li>
</ol>
<p>The resulting product feels <strong>easy to try</strong> and also has the <strong>freedom to optimize</strong> for a future where humans aren&#x27;t the ones writing code.</p>
<p>Claude Code can devote more screen real estate to things like subagents or hooks because I don&#x27;t expect to use it to view my folder tree or open files.</p>
<h2>Building trust</h2>
<p>After onboarding users, the biggest hurdle for any agentic tool is <strong>building trust</strong>. Users need to answer questions like: <em>&quot;What complexity of tasks can the model one-shot?&quot;</em> and <em>&quot;How should I verify the outputs without needing to examine every single line?&quot;</em></p>
<p>Vibe-coding apps often struggle with building trust. On the one hand, the user can instantly verify that the behavior works by clicking through the app (great!). On the other, the surface area makes it harder to inspect what is going on when things <em>do</em> go wrong.</p>
<p>When we were building the cloud version of <a href="https://openai.com/codex/">Codex</a> at OpenAI, we didn&#x27;t want the user checking in on the model and trying to &quot;steer&quot; it.</p>
<p>This tends to be good from a correctness perspective: the model is surprisingly adept at failure recovery, and giving the agent a &quot;longer leash&quot; to execute code dramatically improves performance. But it doesn&#x27;t do as much to instill confidence in the user that the model is really doing the right thing.</p>
<p>We knew we needed <em>some</em> solution here, so we showed users the terminal commands that the model was executing in its own environment. This felt like a nice compromise, in addition to having the model output the results of tests and lint steps.</p>
<p>The one problem is that terminal commands are far less intuitive than TODO lists—completed TODOs are straightforward, but <code>sed</code> commands... less so.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/cleanshot-2025-08-19-at-1031382x-dp80oj.png" class="border-none" style="transform:scale(1);transform-origin:center"/></div>
<p>I&#x27;m not sure whether leveraging the TODO list tool actually improves eval scores or is just a courtesy to the user. But it certainly helps the person in front of the keyboard understand what&#x27;s going on.</p>
<p>There&#x27;s also the question of <strong>context management</strong>. In an IDE, it&#x27;s less clear which files and tabs are being pulled into the context window. Creating a new chat might erase the model&#x27;s prior thinking, but will it still include the file you&#x27;re looking at?</p>
<p>Claude Code makes this easier to reason about: everything invoked within the terminal session is included in the context. If Claude Code is compacting the context window, it will tell you that. If you want to run a compaction manually, <a href="https://docs.anthropic.com/en/docs/claude-code/costs#reduce-token-usage">the docs explain how</a>.</p>
<h2>Speed and momentum</h2>
<p>A large part of what contributes to that feeling of flow is Claude Code&#x27;s speed.</p>
<p>Sampling from the models feels fast. When inspecting the network traffic of the CLI, I see it switching between all three models (Haiku, Sonnet, Opus) depending on what the use case is.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/cleanshot-2025-08-07-at-1045212x-k73g9i.png" class="border-none" style="transform:scale(1.15);transform-origin:center"/></div>
<p>Looking carefully at the interface, I also notice a bunch of different small cues which tell me that Claude Code is <strong>doing something</strong>:</p>
<ul>
<li>animated icons and flavor text (Contemplating / Flibbertigibetting / Symbioting); they change as work progresses and the colors subtly &#x27;pulse&#x27;</li>
<li>a current runtime counter that counts up every second</li>
<li>a token counter paired with an icon that switches between &quot;upload&quot;, &quot;download&quot;, and &quot;working&quot;.</li>
</ul>
<p>The sum total of all of it is that the interface feels <strong>responsive and alive</strong>. I know that work is happening as I use it.</p>
<p>An OpenAI friend of mine once remarked <em>&quot;I get stressed out when &lt;INTERNAL_AGENT&gt; is not running on my laptop&quot;</em>.</p>
<p>In a time when new SOTA models appear every week and VCs are funding millions of GPUs, not using coding agents feels wasteful. It&#x27;s like leaving free intelligence on the ground.</p>
<p>Claude Code taps deeply into this mentality. Rather than trying to obscure the number of tokens I&#x27;m using, it encourages me to maximize them. And if I pay for the &quot;Claude Max&quot; plan, the tokens I get <em>feel</em> unlimited, even if the per-token cost ends up being high.</p>
<h2>The machine that builds the machine</h2>
<p>This brings me to the last magic piece of Claude Code: the notion that through dedicated practice, I can learn to become 10x as productive.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/coding-agent-metagame/agent-loops.svg" class="rounded-lg border" style="transform:scale(1.05);transform-origin:center"/></div>
<p>When using Claude Code, there are effectively <em>two parts of the loop</em> I can optimize:</p>
<ol>
<li><strong>core agent loop</strong>: building the core product I&#x27;m working on (adding new features, fixing bugs, etc)</li>
<li><strong>product harness</strong>: the tools, environment, memory, and prompts that the CLI uses to run the core agent loop</li>
</ol>
<p>It turns out that optimizing the product harness is almost as much fun as actually building product.</p>
<p>All coding agents involve an element of gambling. When I submit a prompt, I never know where it&#x27;s going to end up. I can keep spinning the wheel a bunch of times to see how it pans out.</p>
<p>But there&#x27;s a key difference with Claude Code. Instead of just changing the way that I <em>prompt</em> the model, Claude Code encourages me to <strong>change the product harness itself</strong>.</p>
<p>That means that whenever Claude Code does the <em>wrong thing</em>, instead of blaming the tool, I find myself asking: <strong>&quot;What could I have done better?&quot;</strong> Could changes in the product harness give me a better result?</p>
<p>This &quot;metagame&quot; ends up being the #1 marketing tool for Claude Code, because it&#x27;s something <strong>everyone does</strong>. It doesn&#x27;t matter whether you write embedded Rust frameworks or TypeScript frontends; you&#x27;re going to be experimenting with how best to use all of the features of Claude Code.</p>
<p>Because users can &quot;own&quot; their product harness, everyone wants to show off their tips and tricks. My twitter feed is filled with Claude Code automations and best practices.</p>
<p>The product shape helps this too; CLI tools are naturally composable. It&#x27;s easy to start imagining the possibilities: <em>&quot;Could I kick off tasks from a GitHub issue?&quot;</em>, <em>&quot;Could I chain agents together? And have them invoke the CLI in series?&quot;</em>, <em>&quot;Could I use git worktrees and run a bunch of sessions in parallel?&quot;</em> <sup id="footnote-fnref-3"><a href="#footnote-fn-3">3</a></sup></p>
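<p>The last of those questions is concrete enough to sketch. Assuming a hypothetical agent CLI (swap in whatever command your tool actually exposes), parallel sessions over git worktrees might look something like this; the branch names and prompts are purely illustrative:</p>

```python
# Hedged sketch: one git worktree per task, one agent session per worktree,
# all running concurrently. `agent_cmd` is an assumption -- substitute the
# real CLI invocation for the tool you use (e.g. ["claude", "-p"]).
import subprocess
from pathlib import Path


def run_parallel_tasks(repo: str, tasks: dict[str, str],
                       agent_cmd: list[str]) -> dict[str, int]:
    """Create a worktree per task branch, run `agent_cmd + [prompt]` in each,
    and return each task's exit code."""
    procs = {}
    for branch, prompt in tasks.items():
        worktree = Path(repo).parent / f"wt-{branch}"
        # Each worktree is an isolated checkout on its own branch.
        subprocess.run(
            ["git", "-C", repo, "worktree", "add", str(worktree), "-b", branch],
            check=True,
        )
        # Launch the session without waiting, so tasks overlap.
        procs[branch] = subprocess.Popen(agent_cmd + [prompt], cwd=worktree)
    return {branch: proc.wait() for branch, proc in procs.items()}
```

<p>As the footnote below notes, a cloud product can sidestep the worktree bookkeeping entirely by giving each task its own environment; this is the same idea done by hand.</p>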
<p>The end result resembles an <a href="https://www.factorio.com/">automation game</a>, but when I&#x27;m done playing, I&#x27;ve brought something useful into the world.</p>
<h2>A feeling of flow</h2>
<p>A friend pointed out that what makes Claude Code fun isn&#x27;t &quot;gamification&quot; in the traditional sense. There are no badges, levels, streaks, or anything like that.</p>
<p>Instead, it&#x27;s the feeling of <strong>flow state</strong> and <strong>mastery</strong> that you get from learning to use the tool well.</p>
<p>There&#x27;s <strong>a low barrier to entry</strong>–start by running a command and typing a prompt into the box. The product <strong>gradually reveals new ways of using the CLI</strong> through daily tips and suggestions. There&#x27;s a <strong>sense of momentum</strong>, paired with <strong>clear, immediate feedback</strong> on what the agent is doing.</p>
<p>And most importantly, there are <strong>opportunities to improve your skill</strong> by using more advanced features and automations. Claude Code encourages you to keep honing that &quot;product harness loop&quot;.</p>
<p>While coding is the perfect Petri dish for these sorts of ideas, these same techniques can be applied to other domains. At the end of the day, most knowledge work still involves critical thinking, resource allocation, and strategic planning. Perhaps enterprise software should borrow more ideas from Minecraft and fewer from 1950s-era accounting.</p>
<p>If coding is any indication, the tools that seem to be winning allow the user to <strong>hack their own workflows</strong> and <strong>optimize for that &quot;feeling of flow&quot;</strong>.</p>
<p>My biggest takeaway is that it&#x27;s not just the evals that matter–the product harness matters too.</p>
<p>Does all of this tweaking and customization actually make us more productive, or is it just fun to do? There&#x27;s probably some efficient frontier in between. And we&#x27;ll need <a href="https://www.inkandswitch.com/essay/malleable-software/">malleable tools</a> to find it.</p>
<section data-footnotes="true" class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="footnote-fn-1">
<p>I&#x27;m discussing Claude Code in this post since it pioneered the agentic CLI concept. In my (admittedly biased) opinion of my former employer, the <a href="https://github.com/openai/codex">Codex CLI</a> team is doing great work and is incorporating a bunch of these same ideas. They have shipped a <strong>ton</strong> of updates, so check it out if you haven&#x27;t in a while (be sure to upgrade to the latest version!). <a href="https://github.com/google-gemini/gemini-cli">Gemini</a> also followed suit here as well, but I haven&#x27;t had the chance to test it nearly as much. <a href="#footnote-fnref-1">↩</a></p>
</li>
<li id="footnote-fn-2">
<p>The cloud version, not the CLI. <a href="#footnote-fnref-2">↩</a></p>
</li>
<li id="footnote-fn-3">
<p>This was actually the &quot;killer feature&quot; bet we made with Codex. The fact that it operates asynchronously means that you can kick off as many tasks as you&#x27;d like in parallel, no worktrees required. In the fullness of time I think it will also be the way users <em>want</em> to interact with the models. But it does require some &quot;re-tooling&quot; on the user&#x27;s side. <a href="#footnote-fnref-3">↩</a></p>
</li>
</ol>
</section></div></content></entry><entry><title>Reflections on OpenAI</title><link href="https://calv.info/openai-reflections"/><id>https://calv.info/openai-reflections</id><updated>2025-07-15T15:49:56.330Z</updated><author><name>Calvin French-Owen</name></author><summary>There&#x27;s a lot of smoke and noise around what OpenAI is doing, but not a lot of first-hand accounts of what the culture of working there actually feels like. Here&#x27;s my view on what it was like to work there.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>I left <a href="https://openai.com/">OpenAI</a> three weeks ago. I had joined the company back in May 2024.</p>
<p>I wanted to share my reflections because there&#x27;s a lot of smoke and noise around what OpenAI is doing, but not a lot of first-hand accounts of what the culture of working there actually <em>feels like</em>.</p>
<p><a href="https://nabeelqu.co/">Nabeel Qureshi</a> has an amazing post called <a href="https://nabeelqu.substack.com/p/reflections-on-palantir">Reflections on Palantir</a>, where he ruminates on what made Palantir special. I wanted to do the same for OpenAI while it&#x27;s fresh in my mind. You won&#x27;t find any trade secrets here, more just reflections on this current iteration of one of the most fascinating organizations in history at an extremely interesting time.</p>
<p>To put it up-front: there wasn&#x27;t any personal drama in my decision to leave–in fact I was deeply conflicted about it. It&#x27;s hard to go from being a <a href="https://segment.com">founder of your own thing</a> to an employee at a 3,000-person organization. Right now I&#x27;m craving a fresh start.</p>
<p>It&#x27;s entirely possible that the quality of the work will draw me back. It&#x27;s hard to imagine building anything as impactful as AGI, and LLMs are easily the technological innovation of the decade. I feel lucky to have seen some of the developments first-hand and also been a part of the <a href="https://openai.com/index/introducing-codex/">Codex launch</a>.</p>
<p>Obviously these aren&#x27;t the views of the company–as observations they are my own. OpenAI is a big place, and this is my little window into it.</p>
<h2>Culture</h2>
<p>The first thing to know about OpenAI is <strong>how quickly it&#x27;s grown</strong>. When I joined, the company was a little over 1,000 people. One year later, it was over 3,000, and I was in the top 30% by tenure. Nearly everyone in leadership is doing a drastically different job than they were ~2-3 years ago. <sup id="footnote-fnref-1"><a href="#footnote-fn-1">1</a></sup></p>
<p>Of course, <strong>everything breaks</strong> when you scale that quickly: how to communicate as a company, the reporting structures, how to ship product, how to manage and organize people, the hiring processes, etc. Teams vary significantly in culture: some are sprinting flat-out all the time, others are babysitting big runs, some are moving along at a much more consistent pace. There&#x27;s no single OpenAI experience, and <strong>research</strong>, <strong>applied</strong>, and <strong>GTM</strong> operate on very different time horizons.</p>
<p>An unusual part of OpenAI is that <strong>everything, and I mean <em>everything</em>, runs on Slack</strong>. There is no email. I maybe received ~10 emails in my entire time there. If you aren&#x27;t organized, you will find this incredibly distracting. If you curate your channels and notifications, you can make it pretty workable.</p>
<p>OpenAI is <strong>incredibly bottoms-up, especially in research</strong>. When I first showed up, I started asking questions about the roadmap for the next quarter. The answer I got was: &quot;this doesn&#x27;t exist&quot; (though now it does). Good ideas can come from anywhere, and it&#x27;s often not really clear which ideas will prove most fruitful ahead of time. Rather than a grand &#x27;master plan&#x27;, progress is iterative and uncovered as new research bears fruit.</p>
<p>Thanks to this bottoms-up culture, OpenAI is also <strong>very meritocratic</strong>. Historically, leaders in the company have been promoted primarily based upon their ability to have good ideas and then execute upon them. Many incredibly competent leaders weren&#x27;t very good at things like presenting at all-hands or political maneuvering. That matters less at OpenAI than it might at other companies. The best ideas do tend to win. <sup id="footnote-fnref-2"><a href="#footnote-fn-2">2</a></sup></p>
<p>There&#x27;s a <strong>strong bias to action</strong> (you can just do things). It wasn&#x27;t unusual for similar but unrelated teams to converge on the same ideas. I started out working on a parallel (but internal) effort similar to <a href="https://help.openai.com/en/articles/11487775-connectors-in-chatgpt">ChatGPT Connectors</a>. There must&#x27;ve been ~3-4 different <a href="https://openai.com/index/introducing-codex/">Codex</a> prototypes floating around before we decided to push for a launch. These efforts were usually undertaken by a small handful of individuals without asking permission. Teams tend to quickly form around them as they show promise.</p>
<p>Andrey (the Codex lead) used to tell me that you should think of researchers as their <strong>own &quot;mini-executive&quot;</strong>. There is a strong bias to work on your own thing and see how it pans out. There&#x27;s a corollary here–most research gets done by nerd-sniping a researcher into a particular problem. If something is considered boring or &#x27;solved&#x27;, it probably won&#x27;t get worked on.</p>
<p><strong>Good research managers</strong> are insanely impactful and in incredibly short supply. The best ones manage to connect the dots between many different research efforts and bring them together into a bigger model training run. The same goes for great PMs (shoutout ae).</p>
<p>The <strong>ChatGPT EMs</strong> I worked with (Akshay, Rizzo, Sulman) were some of the coolest customers I&#x27;ve ever seen. It really felt like they had seen everything at this point <sup id="footnote-fnref-3"><a href="#footnote-fn-3">3</a></sup>. Most of them were relatively hands-off, but hired good people and tried to make sure they were set up for success.</p>
<p>OpenAI <strong>changes direction on a <em>dime</em></strong>. This was a thing we valued a lot at Segment–it&#x27;s much better to do the <em>right</em> thing as you get new information, vs deciding to stay the course just because you had a plan. It&#x27;s remarkable that a company as large as OpenAI still maintains this ethos–Google clearly doesn&#x27;t. The company makes decisions quickly, and when deciding to pursue a direction, goes all in.</p>
<p>There is a <strong>ton of scrutiny</strong> on the company. Coming from a b2b enterprise background, this was a bit of a shock to me. I&#x27;d regularly see news stories broken in the press that hadn&#x27;t yet been announced internally. I&#x27;d tell people I work at OpenAI and be met with a pre-formed opinion on the company. A number of Twitter users run automated bots which check to see if there are new feature launches coming up.</p>
<p>As a result, OpenAI is a <strong>very secretive place</strong>. I couldn&#x27;t tell anyone what I was working on in detail. There&#x27;s a handful of slack workspaces with various permissions. Revenue and burn numbers are more closely guarded.</p>
<p>OpenAI is also <strong>a more serious place</strong> than you might expect, in part because the <strong>stakes feel really high</strong>. On the one hand, there&#x27;s the goal of building AGI–which means there is a lot to get right. On the other hand, you&#x27;re trying to build a product that hundreds of millions of users leverage for everything from medical advice to therapy. And on the other, other hand, the company is competing in the biggest arena in the world. We&#x27;d pay close attention to what was happening at Meta, Google, and Anthropic–and I&#x27;m sure they were all doing the same. All of the major world governments are watching this space with a keen interest.</p>
<p>As often as OpenAI is maligned in the press, everyone I met there is <strong>actually trying to do the right thing</strong>. Given the consumer focus, it is the most visible of the big labs, and consequently there&#x27;s a lot of slander for it.</p>
<p>That said, you probably <strong>shouldn&#x27;t view OpenAI as a single monolith</strong>. I think of OpenAI as an organization that started like Los Alamos. It was a group of scientists and tinkerers investigating the cutting edge of science. That group happened to accidentally spawn the most viral consumer app in history. And then grew to have ambitions to sell to governments and enterprises. People of different tenure and different parts of the org subsequently have very different goals and viewpoints. The longer you&#x27;ve been there, the more you probably view things through the &quot;research lab&quot; or &quot;non-profit for good&quot; lens.</p>
<p>The thing that I <strong>appreciate most is that the company &quot;walks the walk&quot; in terms of distributing the benefits of AI</strong>. Cutting edge models aren&#x27;t reserved for some enterprise-grade tier with an annual agreement. Anybody in the world can jump onto ChatGPT and get an answer, even if they aren&#x27;t logged in. There&#x27;s an API you can sign up and use–and most of the models (even if SOTA or proprietary) tend to quickly make it into the API for startups to use. You could imagine an alternate regime that operates <em>very differently</em> from the one we&#x27;re in today. OpenAI deserves a ton of credit for this, and it&#x27;s still core to the DNA of the company.</p>
<p><strong>Safety is actually more of a thing than you might guess</strong> if you read a lot from <a href="https://thezvi.wordpress.com/">Zvi</a> or <a href="https://www.lesswrong.com/">Lesswrong</a>. There&#x27;s a large number of people working to develop safety systems. Given the nature of OpenAI, I saw more focus on practical risks (hate speech, abuse, manipulating political biases, crafting bio-weapons, self-harm, prompt injection) than theoretical ones (intelligence explosion, power-seeking). That&#x27;s not to say that nobody is working on the latter; there are definitely people focusing on the theoretical risks. But from my viewpoint, it&#x27;s not the focus. Most of the work which is done <em>isn&#x27;t published</em>, and OpenAI really should do more to get it out there.</p>
<p>Unlike other companies which freely hand out their swag at every career fair, OpenAI <strong>doesn&#x27;t really give much swag</strong> (even to new employees). Instead there are &#x27;drops&#x27; where you can order in-stock items. The first one had so much demand that it brought down the Shopify store. There was an internal post which circulated on how to POST the right JSON payloads and circumvent it.</p>
<p>Nearly <strong>everything is a rounding error compared to GPU cost</strong>. To give you a sense: a niche feature that was built as part of the Codex product had the same GPU cost footprint as our entire <a href="https://segment.com/infrastructure/">Segment infrastructure</a> (not the same scale as ChatGPT but saw a decent portion of internet traffic).</p>
<p>OpenAI is perhaps the most <a href="https://www.paulgraham.com/ambitious.html"><strong>frighteningly ambitious org</strong></a> I&#x27;ve ever seen. You might think that having one of the top consumer apps on the planet might be enough, but there&#x27;s a desire to compete across dozens of arenas: the API product, deep research, hardware, coding agents, image generation, and a handful of others which haven&#x27;t been announced. It&#x27;s a fertile ground for taking ideas and running with them.</p>
<p>The company <strong>pays a lot of attention to twitter</strong>. If you tweet something related to OpenAI that goes viral, chances are good someone will read about it and consider it. A friend of mine joked, &quot;this company runs on twitter vibes&quot;. As a consumer company, perhaps that&#x27;s not so wrong. There&#x27;s certainly still a lot of analytics around usage, user growth, and retention–but the vibes are just as important.</p>
<p>Teams at OpenAI are <strong>much more fluid</strong> than they might be elsewhere. When launching Codex, we needed some help from a few experienced ChatGPT engineers to hit our launch date. We met with some of the ChatGPT EMs to make the request. The next day we had two badass folks ready to dive in and help. There was no &quot;waiting for quarterly planning&quot; or &quot;re-shuffling headcount&quot;. It moved really quickly.</p>
<p><strong>Leadership is quite visible and heavily involved</strong>. This might be obvious at a company such as OpenAI, but every exec seemed quite dialed in. You&#x27;d see gdb, sama, kw, mark, dane, et al chime in regularly on Slack. There are no absentee leaders.</p>
<h2>Code</h2>
<p>OpenAI uses a <strong>giant monorepo</strong> which is ~mostly Python (though there is a growing set of Rust services and a handful of Golang services sprinkled in for things like network proxies). This creates a lot of strange-looking code because there are so many ways you can write Python. You will encounter both libraries designed for scale from 10-year Google veterans and throwaway Jupyter notebooks from newly-minted PhDs. Pretty much everything operates around FastAPI to create APIs and Pydantic for validation. But there&#x27;s no style guide enforced writ large.</p>
<p>OpenAI <strong>runs everything on Azure</strong>. What&#x27;s funny about this is there are exactly three services that I would consider trustworthy: Azure Kubernetes Service, CosmosDB (Azure&#x27;s document storage), and BlobStore. There are no true equivalents of Dynamo, Spanner, Bigtable, BigQuery, Kinesis, or Aurora. People think less in terms of auto-scaling units. The IAM implementations tend to be <em>way</em> more limited than what you might get from AWS. And there&#x27;s a strong bias to implement in-house.</p>
<p>When it comes to personnel (at least in eng), there&#x27;s a <strong>very significant Meta → OpenAI pipeline</strong>. In many ways, OpenAI resembles early Meta: a blockbuster consumer app, nascent infra, and a desire to move really quickly. Most of the infra talent I&#x27;ve seen brought over from Meta + Instagram has been quite strong.</p>
<p>Put these things together, and you see a lot of <strong>core parts of infra</strong> that feel reminiscent of Meta. There was an in-house reimplementation of <a href="https://engineering.fb.com/2013/06/25/core-infra/tao-the-power-of-the-graph/">TAO</a>. An effort to consolidate auth identity at the edge. And I&#x27;m sure a number of others I don&#x27;t know about.</p>
<p><strong>Chat runs really deep</strong>. Since ChatGPT took off, a <em>lot</em> of the codebase is structured around the idea of chat messages and conversations. These primitives are so baked in at this point that you ignore them at your own peril. We did deviate from them a bit in Codex (leaning more into learnings from the <a href="https://platform.openai.com/docs/api-reference/responses">responses API</a>), but we leveraged a lot of prior art.</p>
<p><strong>Code wins.</strong> Rather than having some central architecture or planning committee, decisions are typically made by whichever team plans to do the work. The result is that there&#x27;s a strong bias for action, and often a number of duplicate parts of the codebase. I must&#x27;ve seen half a dozen libraries for things like queue management or agent loops.</p>
<p>There were a few areas where <strong>having a rapidly scaled eng team and not a lot of tooling created issues</strong>. sa-server (the backend monolith) was a bit of a dumping ground. CI broke a lot more frequently than you might expect on master. Even running in parallel and factoring in only a subset of dependencies, the test suite could take ~30m to run on GPUs. These weren&#x27;t unsolvable problems, but it&#x27;s a good reminder that these sorts of problems exist everywhere, and they are likely to get worse when you scale super quickly. To the credit of the internal teams, there&#x27;s a <em>lot</em> of focus going into improving this story.</p>
<h2>Other things I learned</h2>
<p><strong>What a big consumer brand looks like.</strong> I hadn&#x27;t really internalized this until we started working on Codex. Everything is measured in terms of &#x27;pro subs&#x27;. Even for a product like Codex, we thought of onboarding primarily in terms of individual usage rather than teams. It broke my brain a bit, coming from a predominantly B2B / enterprise background. You flip a switch and you get traffic from day 1.</p>
<p><strong>How large models are trained (at a high-level)</strong>. There&#x27;s a spectrum from &quot;experimentation&quot; to &quot;engineering&quot;. Most ideas start out as small-scale experiments. If the results look promising, they then get incorporated into a bigger run. Experimentation is as much about tweaking the core algorithms as it is tweaking the data mix and carefully studying the results. On the large end, doing a big run almost looks like giant distributed systems engineering. There will be weird edge cases and things you didn&#x27;t expect. It&#x27;s up to you to debug them.</p>
<p><strong>How to do GPU-math.</strong> We had to forecast out the load capacity requirements as part of the Codex launch, and doing this was the first time I&#x27;d really spent time benchmarking GPUs. The gist is that you should actually start from the latency requirements you need (overall latency, # of tokens, time-to-first-token) rather than doing a bottoms-up analysis of what a GPU can support. Every new model iteration can change the load patterns wildly.</p>
<p><strong>How to work in a large Python codebase</strong>. Segment was a collection of microservices, mostly Golang and Typescript. We didn&#x27;t really have the breadth of code that OpenAI does. I learned a lot about how to scale a codebase based upon the number of developers contributing to it. You have to put in a lot more guardrails for things like &quot;works by default&quot;, &quot;keep master clean&quot;, and &quot;hard to misuse&quot;.</p>
<h2>Launching Codex</h2>
<p>A big part of my last three months at OpenAI was launching <a href="https://chatgpt.com/codex">Codex</a>. It&#x27;s unquestionably one of the highlights of my career.</p>
<p>To set the stage, back in November 2024, OpenAI had set a 2025 goal to launch a coding agent. By February 2025 we had a few internal tools floating around which were using the models to great effect. And we were feeling the pressure to launch a coding-specific agent. Clearly the models had gotten to the point where they were really useful for coding (as seen in the explosion of vibe-coding tools in the market).</p>
<p>I returned early from my paternity leave to help participate in the Codex launch. A week after I returned, we had a (slightly chaotic) merger of two teams, and began a mad-dash sprint. From start (the first lines of code written) to finish, the whole product was built in just <strong>7 weeks</strong>.</p>
<p>The Codex sprint was probably the hardest I&#x27;ve worked in nearly a decade. Most nights I was up until 11 or midnight. Waking up to a newborn at 5:30 every morning. Heading to the office again at 7a. Working most weekends. We all pushed hard as a team, because every week counted. It reminded me of being back at YC.</p>
<p>It&#x27;s hard to overstate how incredible this level of pace was. I haven&#x27;t seen organizations large or small go from an idea to a fully launched + freely available product in such a short window. The scope wasn&#x27;t small either; we built a container runtime, made optimizations on repo downloading, fine-tuned a custom model to deal with code edits, handled all manner of git operations, introduced a completely new surface area, enabled internet access, and ended up with a product that was generally a delight to use. <sup id="footnote-fnref-4"><a href="#footnote-fn-4">4</a></sup></p>
<p>Say what you will, OpenAI still has that launching spirit. <sup id="footnote-fnref-5"><a href="#footnote-fn-5">5</a></sup></p>
<p>The good news is that the right people can make magic happen. We were a senior team of ~8 engineers, ~4 researchers, 2 designers, 2 GTM and a PM. Had we not had that group, I think we would&#x27;ve failed. Nobody needed much direction, but we did need a decent amount of coordination. If you get the chance to work with anyone on the Codex team, know that every one of them is fantastic.</p>
<p>The night before launch, five of us stayed up until 4a trying to deploy the main monolith (a multi-hour affair). Then it was back to the office for the 8a launch announcement and livestream. We turned on the flags, and started to see the traffic pour in. I&#x27;ve never seen a product get so much immediate uptick just from appearing in a left-hand sidebar, but that&#x27;s the power of ChatGPT.</p>
<p>In terms of the product shape, we settled on a form factor which was entirely asynchronous. Unlike tools like <a href="https://cursor.com/en">Cursor</a> (at the time, it now supports <a href="https://cursor.com/blog/agent-web">a similar mode</a>) or <a href="https://docs.anthropic.com/en/docs/claude-code/cli-reference">Claude Code</a>, we aimed to allow users to kick off tasks and let the agent run in its own environment. Our bet was in the end-game, users should treat a coding agent like a co-worker: they&#x27;d send messages to the agent, it gets some time to do its work, and then it comes back with a PR.</p>
<p>This was a bit of a gamble: we&#x27;re in a slightly weird state today where the models are <em>good</em>, but not <em>great</em>. They can work for <em>minutes</em> at a time, but not yet <em>hours</em>. Users have widely varying degrees of trust in the models&#x27; capabilities. And we&#x27;re not even clear what the true capabilities of the models are.</p>
<p>Over the long arc of time, I do believe most programming will look more like Codex. In the meantime, it&#x27;s going to be interesting to see how all the products unfold.</p>
<p>Codex (maybe unsurprisingly) is really good at working in a large codebase and understanding how to navigate it. The biggest differentiator I&#x27;ve seen vs other tools is the ability to kick off multiple tasks at once and compare their output.</p>
<p>I recently saw that <a href="https://github.com/aavetis/PRarena">there are public numbers</a> comparing the PRs made by different LLM agents. Looking just at the public numbers, Codex has generated <strong>630,000 PRs</strong>. That&#x27;s about <strong>78k public</strong> PRs per engineer in the <strong>53 days</strong> since launch (<em>you can make your own guesses about the multiple of private PRs</em>). I&#x27;m not sure I&#x27;ve ever worked on something so impactful in my life.</p>
<h2>Parting thoughts</h2>
<p>Truth be told, I was originally apprehensive about joining OpenAI. I wasn&#x27;t sure what it would be like to sacrifice my freedom, to have a boss, to be a much smaller piece of a much larger machine. I kept it fairly low-key that I had joined, just in case it wasn&#x27;t the right fit.</p>
<p>I did want to get three things from the experience...</p>
<ul>
<li>to build intuition for how the models were trained and where the capabilities were going</li>
<li>to work with and learn from amazing people</li>
<li>to launch a great product</li>
</ul>
<p>In reflecting on the year, I think it was one of the best moves I&#x27;ve ever made. It&#x27;s hard to imagine learning more anywhere else.</p>
<p>If you&#x27;re a founder and feeling like your startup really isn&#x27;t going anywhere, you should either 1) deeply re-assess how you can take more shots on goal or 2) go join one of the big labs. Right now is an incredible time to build. But it&#x27;s also an incredible time to peer into where the future is headed.</p>
<p>As I see it, the path to AGI is a three-horse race right now: OpenAI, Anthropic, and Google. Each of these organizations is going to take a different path to get there based upon its DNA (consumer vs business vs rock-solid-infra + data). <sup id="footnote-fnref-6"><a href="#footnote-fn-6">6</a></sup> Working at any of them will be an eye-opening experience.</p>
<hr/>
<p>Thank you to Leah for being incredibly supportive and taking the majority of the childcare throughout the late nights. Thanks to PW, GDB, and Rizzo for giving me a shot. Thanks to the SA teammates for teaching me the ropes: Andrew, Anup, Bill, Jeremy, Kwaz, Ming, Simon, Tony, and Val. And thanks to the Codex core team for giving me the ride of a lifetime: Albin, AE, Andrey, Bryan, Channing, DavidK, Gabe, Gladstone, Hanson, Joey, Josh, Katy, KevinT, Max, Sabrina, SQ, Tibo, TZ and Will. I&#x27;ll never forget this sprint.</p>
<p><strong>Wham.</strong></p>
<section data-footnotes="true" class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="footnote-fn-1">
<p>It&#x27;s easy to try and read into a lot of drama whenever there&#x27;s a departing leader, but I would chalk ~70% of them up to this fact alone. <a href="#footnote-fnref-1">↩</a></p>
</li>
<li id="footnote-fn-2">
<p>I do think we&#x27;re in a slight phase change here. There are a lot of senior leadership hires being made from outside the company. I&#x27;m generally in favor of this; I think the company benefits a lot from infusing new external DNA. <a href="#footnote-fnref-2">↩</a></p>
</li>
<li id="footnote-fn-3">
<p>I get the sense that scaling the fastest growing consumer product ever tends to build a lot of muscle. <a href="#footnote-fnref-3">↩</a></p>
</li>
<li id="footnote-fn-4">
<p>Of course, we were also standing on the shoulders of giants. The CaaS team, core RL teams, human data, and general applied infra made this all possible. <a href="#footnote-fnref-4">↩</a></p>
</li>
<li id="footnote-fn-5">
<p>We <a href="https://help.openai.com/en/articles/11428266-codex-changelog">kept it going too</a>. <a href="#footnote-fnref-5">↩</a></p>
</li>
<li id="footnote-fn-6">
<p>We saw some big hires at Meta a few weeks ago. xAI launched Grok 4, which performs well on benchmarks. Mira and Ilya both have great talent. Maybe that will change things (the people are good), but they have some catching up to do. <a href="#footnote-fnref-6">↩</a></p>
</li>
</ol>
</section></div></content></entry><entry><title>Upsides and Downsides</title><link href="https://calv.info/upsides-and-downsides"/><id>https://calv.info/upsides-and-downsides</id><updated>2025-02-27T00:00:00.000Z</updated><author><name>Calvin French-Owen</name></author><summary>When to shift mentality from upsides to downsides. And why most AI demos don&#x27;t make great products.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Every startup founder knows about <a href="https://en.wikipedia.org/wiki/Crossing_the_Chasm">Geoffrey Moore&#x27;s concept of &quot;crossing the chasm&quot;</a>–that you have to change your marketing and sales approach as you gain marketshare to fit a more conservative buyer. But most fail to internalize what crossing the chasm means when it comes to their product.</p>
<p>I recently stumbled upon <a href="https://www.experimental-history.com/p/science-is-a-strong-link-problem">Adam Mastroianni&#x27;s post on strong-link problems</a>, and realized that it&#x27;s the perfect framework for thinking about this shift.</p>
<p>In essence, Adam says there are two types of problems: <strong>strong-link</strong> and <strong>weak-link</strong> problems. Strong-link problems are solved by looking for excellence in a single dimension. When building a startup or doing drug discovery, it doesn&#x27;t matter how many times you are wrong, you just have to be right <em>once</em>. Weak-link problems are solved by eliminating failure in all dimensions–it&#x27;s why the FDA puts standards on the internal temperature of your meat or why you might study the p95 latency of an endpoint.</p>
<p>When you have a strong-link problem, you increase variance because you benefit from outliers. You focus on <strong>upsides</strong>. When you have a weak-link problem, you decrease variance because outliers will destroy you. You focus on <strong>downsides</strong>.</p>
<p>Early stage startups tend to benefit primarily from solving problems of <strong>upside</strong>. Early adopters choose a startup <em>because</em> it provides some quantum of utility that no other product does. It doesn&#x27;t really matter how much downside that startup might have, customers pick it because there is effectively no replacement.</p>
<p>But as a startup matures, new problems come into focus: uptime requirements, security and access controls, audit logging, cost + performance, etc etc. The <strong>downside</strong> problems.</p>
<p><img src="https://calv.info/strong-links/adoption-curve.svg" alt=""/></p>
<p>Many startup teams fail to realize that as they gain marketshare, the problems of their customers shift. The early adopters who once valued variance are replaced by late adopters who care about minimizing risk.</p>
<p>Making this change can be really difficult–revenue stalls at 5-10m, product velocity falls off a cliff, churn is up. All because the new breed of customers care more about the <strong>downsides</strong> than the <strong>upsides</strong>.</p>
<p>This model also helps explain why founders from big companies often fail to get off the ground. They don&#x27;t understand that the early phase of a startup is all about exploration. The downsides don&#x27;t matter until they begin to find product market fit. The only things to do are actively ship and cut scope.</p>
<h3>Balancing upsides and downsides</h3>
<p>Where this gets really tricky is when a company has <em>matured</em> but still needs to launch another product. Founders need to simultaneously balance the &quot;production-grade&quot; infrastructure needs of their existing customers while incubating the high-variance labs to launch the next big product.</p>
<p>Doing this well takes skill and practice. You can&#x27;t just ignore your customers when they say &quot;the product doesn&#x27;t work&quot;. But you should still be taking new bets, even as a mature company.</p>
<p>At Segment, we&#x27;d think about balancing these needs via the McKinsey three horizons framework:</p>
<ul>
<li><strong>60% core</strong> (existing product)</li>
<li><strong>30% emerging bets</strong> (<a href="https://segment.com/product/protocols/">products which would generate revenue in the next year</a>)</li>
<li><strong>10% new bets</strong> (<a href="https://segment.com/personas">more speculative bets</a>)</li>
</ul>
<p>We&#x27;d typically start new bets with a very small percentage of customers who were likely to pay a premium (e.g. only enterprise). These bets weren&#x27;t expected to have rock-solid infrastructure from day one. It was only once they had some measure of product-market fit that they would start the transition.</p>
<p>There were times where we&#x27;d focus a lot more on downsides as well. After a particularly bad few months of uptime, we did a &quot;reliability reset&quot; to double down on improving our core data pipelines. We had some big security up-levels after a handful of scary incidents. But all of those came years down the line.</p>
<h3>LLMs and the &quot;upside&quot; phase</h3>
<p>The second reason this idea is top-of-mind today is that I&#x27;ve been trying out a lot of the AI-powered products out there.</p>
<p>When it comes to the models themselves, my belief is that most models are still optimizing for the &quot;upside&quot; phase. They are impressive, but don&#x27;t handle cases of nuanced judgement, adversarial inputs, or uncertainty well enough for a mature business to trust them entirely. Many models will crush various advanced reasoning evals, but you probably wouldn&#x27;t trust them with a credit card.</p>
<p>The same goes for AI products. In a strange twist, most AI demos nail the &quot;upside&quot; phase, but that&#x27;s where they get stuck: just a demo, not a product.</p>
<p>There are a handful of products which really manage to make the leap (Copilot, Cursor, Midjourney, etc). My belief is that these tools can cross the chasm primarily <em>because</em> they aren&#x27;t flaky.</p>
<p>To get a toe-hold, the product needs to be amazing. But to get to mass adoption, it needs to just work.</p>
<hr/>
<p>As your product matures, it&#x27;s worth asking every few months: does my customer care more about the upside or the downside?</p>
<p>If it&#x27;s the latter, it might be time to shift focus.</p></div></content></entry><entry><title>How much of a manager are you?</title><link href="https://calv.info/how-much-of-a-manager"/><id>https://calv.info/how-much-of-a-manager</id><updated>2025-02-11T12:00:00.000Z</updated><author><name>Calvin French-Owen</name></author><summary>There&#x27;s a lot of debate around how much &#x27;good code&#x27; LLMs can write for you. Some engineers claim coding with LLMs is amazing, while others think they are trash. Ultimately, it depends on how much of a manager you are.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Every few weeks I see someone comment on Hacker News: &quot;sure, LLMs are great, but they really couldn&#x27;t do <em>my job</em>&quot;. And I&#x27;ve been trying to square that opinion with all the impressive levels of output that I see everywhere else. What accounts for the difference in perspective between the two camps?</p>
<p>This debate has become top-of-mind for me again as I&#x27;ve been trying out a bunch of new AI coding assistants. I&#x27;ve been doing a lot of what <a href="https://x.com/karpathy/status/1886192184808149383">Andrej Karpathy calls &#x27;vibe coding&#x27;</a> on various side projects:</p>
<blockquote>
<p>There&#x27;s a new kind of coding I call <strong>&quot;vibe coding&quot;</strong>, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It&#x27;s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good. Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like &quot;decrease the padding on the sidebar by half&quot; because I&#x27;m too lazy to find it. I &quot;Accept All&quot; always, I don&#x27;t read the diffs anymore. [...] I&#x27;m building a project or webapp, but it&#x27;s not really coding - I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.</p>
</blockquote>
<p>Coding this way feels a bit like cheating. It&#x27;s so easy to do that I just start ignoring the code altogether and focus on the outputs. And it works well. Really well. I can generate full-blown webapps with relatively complex functionality.</p>
<p>Clearly LLMs are doing <em>something</em> right here. So why do we feel reticent about turning over the keys entirely to an LLM in a production codebase?</p>
<p>Fundamentally, I think a lot of success or failure comes down to exactly how you as an engineer choose to augment yourself with LLMs. And in particular, how much of the task requires being a <strong>manager</strong> vs a <strong>software engineer</strong>.</p>
<h2>The &#x27;what&#x27; vs the &#x27;how&#x27;</h2>
<p>When I first started managing the engineering team at <a href="https://segment.com/">Segment</a>, an engineering mentor gave me the following advice...</p>
<blockquote>
<p>as a manager, it&#x27;s your job to think about the <strong>what</strong>, not the <strong>how</strong>. <sup id="footnote-fnref-1"><a href="#footnote-fn-1">1</a></sup></p>
</blockquote>
<p>The <strong>manager</strong> should be responsible for clearly describing the problem, setting the team goals, defining guardrails, and being specific about what &#x27;good&#x27; looks like. And then the <strong>engineering team</strong> should be responsible for coming up with the actual tech for how to achieve those goals.</p>
<p>There are a few reasons for this...</p>
<ol>
<li>it&#x27;s difficult for someone in an engineering leadership position to come up with great technical solutions if they aren&#x27;t deep in the codebase.</li>
<li>from a systems perspective, the individual engineers should have ownership over the codebase. if an engineer is getting paged by a system at 3am, they should be able to change that system.</li>
</ol>
<p>But more generally, it&#x27;s just an instance of <a href="https://en.wikipedia.org/wiki/Principal%E2%80%93agent_problem">principals and agents</a>. The principals define the rules of the game and the goals, and then the agents are free to solve them however they see fit.</p>
<p>The tools are getting good enough that while using them, I&#x27;ve been feeling <em>less like a software engineer</em> and <em>more like a manager</em>. I&#x27;m specifying the requirements, and the model fills in the details.</p>
<p>I think this is the key to using LLMs well. You have to have some sense for where they shine and where they fall short. And you have to be okay with the fact that you are <em>managing</em> the LLM, not <em>writing code</em>.</p>
<h2>The art of managing well</h2>
<p>As I&#x27;ve gone down the rabbit-hole in my own side projects, there are a few things that stand out as making vibe coding successful. Tactically, I&#x27;ve been using Cursor with Sonnet 3.5 to build various sites, but I think they&#x27;d work in almost any context.</p>
<p><strong>Be clear about what you want</strong></p>
<p>The first, and maybe most obvious thing is that you have to be clear about the behavior you want to achieve. Instead of saying &quot;build me a chat app&quot;, it&#x27;s worth spending some extra time on the prompt to describe the pages and UI elements you want.</p>
<p>I&#x27;ve started writing down a spec doc ahead of doing any coding. I will typically scaffold a few different things just to give the LLM a place to get started:</p>
<ul>
<li>what framework I&#x27;m using (e.g. running <code>npx create-next-app</code> to give it some boilerplate)</li>
<li>specs for the data structures I want (e.g. &quot;create the following Typescript types with the following fields&quot;)</li>
<li>what I want the API to look like (or pages for a frontend app)</li>
</ul>
<p>In areas where I don&#x27;t have great ideas (e.g. design), I&#x27;ll give the agent a little bit more leash. I&#x27;ll give it more general instructions like &quot;add a logout button to the layout&quot; and see what it comes up with. Often it finds a better path than I would have on my own.</p>
<p><strong>Tighten your feedback loops to instantaneous</strong></p>
<p>For all of my projects, having the ability to instantly reload the page is key. It means that I spend less time looking at the code the models are generating, and more time actually just looking at the output.</p>
<p>I&#x27;m typically running <code>npm run dev</code> in a terminal and then just reloading the page to test functionality. But asking the models to write and re-run unit tests also works.</p>
<p>While using Cursor, any changes which are applied will automatically go through a lint check, and the model is typically able to resolve them on its own. Having some automated step (like a CI/CD pipeline) that is cheap to run and check will get you a lot more mileage.</p>
<p><strong>Start small, and iterate</strong></p>
<p>Sonnet is remarkably good at following multi-part instructions, but I&#x27;ve still had the most success by starting small and iterating.</p>
<p>For example, if I&#x27;m building a webapp, I&#x27;ll <em>first</em> ask to create the database types and methods. <em>Then</em> I&#x27;ll ask it to add the API routes. Then I&#x27;ll ask to add the frontend pages for it. <em>Then</em> I&#x27;ll critique the styling.</p>
<p>Most of today&#x27;s models don&#x27;t nail a &quot;one-shot&quot; implementation. But if they have a starting point, they are much more likely to get it right.</p>
<p>You could probably do this in a different order, but I find working &#x27;bottoms-up&#x27; tends to give fairly accurate results and catch errors as they happen.</p>
<p><strong>Give the model a toe-hold</strong></p>
<p>With Cursor + Sonnet in particular, I find that having a good &#x27;toe-hold&#x27; makes all the difference. If there are some norms for things like &quot;use shadcn components&quot; or &quot;use tailwind classes&quot;, the model is more likely to mirror these when generating new outputs.</p>
<p>Rather than starting with a totally blank canvas and asking the model to do everything, starting with a few opinions on basic styles will help a lot. Sonnet is quite good at matching what is already there.</p>
<p><strong>Make failure cheap</strong></p>
<p>I suspect that the reason LLMs work so well for vibe coding is that making mistakes is incredibly cheap. You aren&#x27;t writing critical infrastructure or production code, you are just trying out ideas.</p>
<p>This creates a bit of a paradox though... most of the work that software engineers do today isn&#x27;t trying out ideas. It&#x27;s writing production-grade code. Code that has to work, be debugged, and be maintained for years to come.</p>
<p>To use LLMs effectively (today anyway), engineers need to understand the difference between:</p>
<ul>
<li><strong>cheap use cases</strong>: scaffolding new boilerplate, writing unit tests, fixing small regressions, adding documentation, scripting one-off tasks</li>
<li><strong>expensive use cases</strong>: big refactoring, mission-critical APIs, core data models, database migrations</li>
</ul>
<p>Note that this is a moving target! And more and more &quot;expensive&quot; tasks are becoming cheaper.</p>
<p><strong>Dictating your requirements</strong></p>
<p>Like Karpathy, I&#x27;ve found that dictating what I want really forces me to think more about the requirements. I was really skeptical about people building software by just recording their voices (isn&#x27;t voice really lossy?).</p>
<p>But recently I became a new parent, and found a high premium on using my hands. I&#x27;m relying on voice a lot more often, for everything from &quot;implement this database call&quot; to &quot;make these two buttons the same size&quot;. It works shockingly well, and I find myself relying more naturally on describing the <em>what</em> vs the <em>how</em>.</p>
<h2>What&#x27;s missing?</h2>
<p>I painted a bit of a rosy view of how to use LLMs well, but there are a lot of things they <em>won&#x27;t</em> do...</p>
<p><strong>Image transfer</strong> – I haven&#x27;t had luck uploading an image and then asking the models to replicate it in code. If this is what you are looking for, you&#x27;ll have a much better time exporting some base CSS from Figma or describing what you want in your own words. Since there&#x27;s an intermediate text step for most models, image -&gt; code tends to be lossy.</p>
<p><strong>Refactor on their own</strong> – typically LLMs will just spit out more and more code without considering how to consolidate functionality as part of one shared component. You have to explicitly nudge the model to simplify code and pull out pieces of shared functionality.</p>
<p><strong>Sweating the details</strong> – My experience with LLMs is that they can implement many small steps in isolation (consider the security implications, improve memory performance, update the API so it&#x27;s more idiomatic). But coordinating them all at the same time doesn&#x27;t tend to work so well. It&#x27;s hard to get a model to focus on all of these areas at once. If there are particular features which <em>really</em> require a lot of performance-critical work, I think we need to wait a bit for models to improve.</p>
<p><strong>System design</strong> – Using a model is no substitute for good architectural decisions. You probably have a better sense of how the system <em>should</em> evolve than the model does. I wouldn&#x27;t hand off these tasks to a model quite yet.</p>
<p><strong>Implementing complex pieces of functionality</strong> – in my experience, most LLMs still lack the ability to think carefully about things like race conditions, advanced edge cases, and really high-performance code.</p>
<p><strong>Understanding the system as a whole</strong> – context windows are increasing, but they really aren&#x27;t big enough to include the whole codebase. Cursor + Sonnet do extremely well at searching codebases for the relevant context, but targeted search is still no substitute for a view of the entire system.</p>
<p>Admittedly, this is a long list, and there are a lot of things here that seem like dealbreakers. But I think it&#x27;s worth recognizing that a lot of these limitations will go away over time. Reasoning is improving, instruction-following is improving, and context windows are increasing.</p>
<h2>The engineering divide</h2>
<p>Back to the original question... why do some engineers love LLMs and others don&#x27;t?</p>
<p>I think this is <em>mostly</em> accounted for by whether you care about <strong>what</strong> or <strong>how</strong>. For every engineer, there&#x27;s some level of comfort with &#x27;handing over the keys&#x27; to an agent... and that level depends a lot on what you are trying to get done.</p>
<p>If you don&#x27;t know any frontend code, you are much more likely to lean on the LLM to produce something that &#x27;looks good&#x27; even if it doesn&#x27;t optimize for a refactor. If you don&#x27;t know how to fix some random devops issue on a server, you are probably happy to run all the machine-generated bash you can get.</p>
<p>On the flip side, if you spend most of your time doing performance engineering and know everything there is to know about how Go does memory allocations, you&#x27;ll probably be unsatisfied with an LLM that generates working (but maybe inefficient) code.</p>
<p>If I were just starting out in software and giving advice to my younger self, I&#x27;d say there are a few skills that are really worth developing...</p>
<ol>
<li><strong>being able to clearly articulate <em>what</em> you want to get done</strong>. thinking more like a PM than a software engineer. understanding at any given time &quot;what&#x27;s the most important thing to build&quot;.</li>
<li><strong>being able to architect systems in a way that they are flexible later on</strong>. it&#x27;s easy to lock yourself into a poorer data model or a bad API with an LLM in the loop. if you think clearly about the system, you can use the LLM as a colleague. the outputs you get will be better as the system evolves, because you did some pre-work.</li>
<li><strong>understanding how systems work</strong>. you get a lot more value from LLMs if you understand at least some of the underlying components and have intuition for how they fit together. in my experience, this subtly changes your ability to prompt and gets you much better outputs. I don&#x27;t think this goes away even as reasoning + context windows improve.</li>
<li><strong>build intuition for what&#x27;s expensive vs cheap</strong>. we haven&#x27;t begun to explore the capability overhang that exists with today&#x27;s models. understanding their limits is one of the best things you can do as a software engineer.</li>
</ol>
<p>As one parting thought... my hypothesis is that a large philosophical divide on using LLMs comes from the enjoyment engineers derive from writing code. Many of us (myself included) originally started coding because it seemed like a fun way to solve puzzles.</p>
<p>As &#x27;solving puzzles&#x27; becomes the domain of machines, that can&#x27;t be the main reason we still write code. Solve <em>problems</em> instead.</p>
<section data-footnotes="true" class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="footnote-fn-1">
<p>This was a time before <a href="https://paulgraham.com/foundermode.html">&#x27;founder mode&#x27;</a>, though I think much of this advice still holds. Obviously this philosophy is no substitute for deep inspection. As a leader, you must still be thinking critically about how the team is doing and diving deep to understand and critique the &#x27;how&#x27;. <a href="#footnote-fnref-1">↩</a></p>
</li>
</ol>
</section></div></content></entry><entry><title>Heat Pumps, More Than You Wanted to Know</title><link href="https://calv.info/heat-pumps"/><id>https://calv.info/heat-pumps</id><updated>2023-02-06T18:38:13.003Z</updated><author><name>Calvin French-Owen</name></author><summary>Heat Pumps are one of our best tools to fight climate change, but the process of getting one is broken at every step. Here&#x27;s why.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Over the last few months, <a href="https://www.linkedin.com/in/bakershogry/">Baker</a> and I have been digging in deep on <a href="https://heatpumpshooray.com/">Heat Pumps</a>. I&#x27;m putting my explorations here on pause but wanted to write up everything we&#x27;ve learned over the last few months.</p>
<p>Heat pumps are unique in that they are a tool for fighting climate change, where <strong>we have the technology <em>today</em></strong>. It&#x27;s the same tech that we&#x27;ve been using in refrigerators and AC units for decades. Switching to heat pumps would reduce <a href="https://www.epa.gov/ghgemissions/sources-greenhouse-gas-emissions">hundreds of MT of CO2 emissions per year</a>. And yet, they are only present in <a href="https://www.eia.gov/consumption/residential/data/2020/hc/pdf/HC%206.1.pdf">15% of US single-family homes</a>.</p>
<p>Unlike more speculative areas of climate tech, adopting heat pumps en masse is more a question of <strong>deployment</strong>. There are barriers with <strong>financing</strong>, the <strong>purchase journey</strong>, the <strong>large up-front install cost</strong>, and the <strong>lack of differentiated outcomes</strong>. I&#x27;ll get to all of these down below.</p>
<p>Overall, it feels like a space which is ripe for disruption, which is why you see a lot of startup founders starting to investigate this space. Every step in the process today is broken, which means that there is a lot to improve.</p>
<p><em>Author&#x27;s note: I&#x27;m not a grizzled heat pump or HVAC expert, so it&#x27;s certainly possible I got some stuff wrong. What follows is the result of about six months of research. If you have a correction, please reach out.</em></p>
<h2>Types of heat pumps</h2>
<p>If you&#x27;re totally new to the space: <strong>heat pumps are a high-efficiency, climate-friendly option to heat and cool your home</strong>. They run on pure electricity, and are 3-4x more efficient than existing electric options. They are one of our best tools in the fight against climate change, and should be deployed in every home in America.</p>
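<p>For intuition on that &#x27;3-4x more efficient&#x27; figure: heat pump efficiency is usually expressed as a coefficient of performance (COP), the units of heat delivered per unit of electricity consumed. Here is a back-of-the-envelope sketch; the seasonal load and the 3.5 COP are illustrative assumptions I picked, not numbers from this post.</p>

```python
# Rough seasonal heating energy: resistive electric heat vs. a heat pump.
# The load and COP figures below are illustrative assumptions.

def annual_kwh(heat_load_btu: float, cop: float) -> float:
    """Electricity needed to deliver a season's heat at a given COP.

    3,412 BTU of heat = 1 kWh; COP is heat out per unit electricity in.
    """
    return heat_load_btu / 3412 / cop

SEASON_LOAD_BTU = 40_000_000  # assumed seasonal load for a mid-size home

resistive_kwh = annual_kwh(SEASON_LOAD_BTU, cop=1.0)  # baseboard / heat strips
heat_pump_kwh = annual_kwh(SEASON_LOAD_BTU, cop=3.5)  # typical air-source COP

print(f"resistive: {resistive_kwh:,.0f} kWh")
print(f"heat pump: {heat_pump_kwh:,.0f} kWh")
print(f"ratio: {resistive_kwh / heat_pump_kwh:.1f}x")
```

<p>The ratio is just the COP itself, which is where the 3-4x range comes from.</p>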
<div class="mt-7 md:mx-0"><img src="https://calv.info/heat-pumps/heat-pump-cycle.jpg" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>The heat pump cycle. Via https://airteamltd.com/</span></div></div>
<p>All heat pumps consist of the same core set of components...</p>
<p><strong>An outdoor unit</strong> which either collects heat from the outdoors to be brought inside, or releases heat collected from indoors. It will typically have a fan to move air across a set of metal coils which conduct heat from the refrigerant. There are two primary styles of these: &quot;barrel-style&quot; which look more like a traditional AC unit and &quot;box-style&quot; which are the boxes you see on <a href="http://heatpumpshooray.com/">Heat Pumps, Hooray!</a>.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/heat-pumps/mitsubishi-unit.jpg" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>An outdoor unit</span></div></div>
<p><strong>An indoor unit (or indoor heads)</strong> which either pulls heat in from inside the home or disperses it out. Think of it as functionally the same thing as the outdoor unit, but it sits <em>inside</em> your house and looks different based on how heat is distributed throughout the home (see below).</p>
<p><strong>Refrigerant lines</strong> which exchange heat between the indoors and outdoors. These are metal tubes that carry a substance known as refrigerant, which has an extremely low boiling point.</p>
<p><strong>A compressor</strong> which is used to modulate the pressure of the refrigerant.</p>
<p><strong>A reversing-valve</strong> which changes the directional flow of refrigerant to either heat or cool your home. When you adjust the thermostat &#x27;mode&#x27;, it is effectively changing the direction of the reversing-valve.</p>
<p><strong>Sometimes: heat strips / resistive heating</strong>: in <em>very</em> cold climates below -5 degrees F, heat pump capacity will go down to the point where it may have trouble meeting the heating load. To combat this, certain heat pumps ship with resistive elements (imagine a toaster oven that generates heat by running electricity through a wire).</p>
<p>Additionally, there are a few other &#x27;flavors&#x27; of heat pumps depending on your home setup, but all of these consist of the same core parts.</p>
<h3>Heat pump flavors: how heat is distributed</h3>
<p>If you want to install a heat pump in your home, you have several options for how that process works.</p>
<p><strong>Option 1: Central-ducted</strong> - most US homes have duct-work to move heat (and air!) throughout the house. In these systems, there&#x27;s a central set of tubes which run air throughout the house, kind of like blood vessels carry blood throughout the body. Centrally ducted heat pumps are the most popular, since about 60% of US homes already have ducts. If you have ducts, using that ductwork is typically the cheapest and most effective way to distribute heat.</p>
<p><strong>Option 2: Mini-split aka ductless</strong> - for homes without duct-work, or cases where you want to condition a specific room (e.g. think of a garage), you can use a mini-split unit. This works in the same way as the central units, but instead of hiding the indoor unit in a basement or closet, they are instead mounted visibly on the walls. Refrigerant lines will run directly from an outdoor unit to an indoor one. Because these can&#x27;t heat the whole home, they are typically smaller. In some cases, you can route refrigerant to multiple &#x27;heads&#x27; inside the home.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/heat-pumps/mini-split.jpg" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Mini-split units don&#x27;t use central ducts and instead sit on your wall</span></div></div>
<p><strong>Option 3: Mono-block or &#x27;packaged&#x27;</strong> - there&#x27;s one other type of heat pump which is generally less popular because it&#x27;s not as flexible or performant for many housing configurations, the &#x27;mono-block&#x27; unit. These combine indoor and outdoor units into a single form factor that sits outside the house, and then moves air inside. It benefits from simplicity of install (no refrigerant lines), but won&#x27;t be nearly as efficient as a split system and has a bigger outdoor footprint.</p>
<p><strong>Option 4: Air-to-water</strong> - for households who have radiant heating, heat pumps (or heat pump water heaters) can exchange heat with water. This is less popular in the US, but a popular way to heat in Europe.</p>
<h3>Heat pump flavors: heating sources</h3>
<p>In addition to how heat is distributed inside, there are also differences in how heat pumps source their heat.</p>
<p><strong>Air-source heat pumps</strong> are the most common heat pump variety, and the kind you see everywhere. These exchange heat with the outside air, just like an AC unit.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/heat-pumps/air-source-heat-pump.jpg" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Air-source heat pumps sit outside your home</span></div></div>
<p><strong>Dual-fuel</strong> - a dual fuel heat pump setup is basically the same as centrally ducted, but it just means that you <em>don&#x27;t</em> replace your existing furnace, and have the heat pump installed alongside it. This is a popular route for homeowners looking to gradually transition. With dual-fuel, the heat pump and the gas furnace can never run simultaneously, so it won&#x27;t save much money on heating. But it can be useful for gradually moving to electric heating.</p>
<p><strong>Air-to-ground (aka ground-source)</strong> - in extreme climates (think the northeast), heat pump heating and cooling both become lower efficiency. pulling heat out of the very cold air is hard to do, as is pushing heat into hot humid days. it turns out that if you <a href="https://www.energy.gov/energysaver/geothermal-heat-pumps">drill down ~30-40 feet, the soil temperature will remain more or less constant throughout the year</a>. this gives a heat pump much better efficiency.</p>
<h2>The market</h2>
<p>Today&#x27;s market participants are split into a few different layers: <strong>consumers, contractors and technicians, wholesalers, and manufacturers</strong>.</p>
<p>Let&#x27;s start with <strong>consumers</strong>. Furnace equipment typically has a lifetime of 20 years, while AC equipment has a lifetime of more like 15 years. When your furnace or AC dies (or perhaps you want to get an AC for the first time), you have a magical window of opportunity to upgrade to a heat pump.</p>
<p>Today, most consumers will either call the number on the side of the unit left by the original installer (an ingenious and basically zero-cost lead-flow technique), or will search on Yelp, Angie&#x27;s List, or Google to find a contractor. Typically these lead-flow businesses will get ~$60-80 per lead.</p>
<p>Typically the breakdown is...</p>
<ul>
<li>2-4k equipment cost (pure hardware)</li>
<li>4-15k labor cost (removing old equipment, running refrigerant lines, install)</li>
<li>10-20k (maybe) duct upgrades, panel upgrades</li>
</ul>
<p>That labor cost has a wide variance. It&#x27;s mostly a function of supply and demand. In areas where many people have AC, it&#x27;s relatively easy to retrofit a heat pump, and it happens all the time.</p>
<p>Most of the time, heat pump upgrades will be done under <em>duress</em>. About 80% of furnace replacements happen during a &quot;break-fix&quot; scenario, where the existing equipment has stopped working and new equipment needs to be installed. It&#x27;s much more difficult to ask for a bigger replacement (both in terms of time and cost) when the heating is out and your family is cold! Homeowners will call a <strong>contractor</strong> (also referred to as an HVAC technician) who comes in to investigate the issue.</p>
<p><strong>Contracting firms</strong> are typically both small and fragmented. There are <a href="https://www.bls.gov/oes/current/oes499021.htm">300,000 HVAC technicians in the US</a>, but <a href="https://www.ibisworld.com/industry-statistics/number-of-businesses/heating-air-conditioning-contractors-united-states/">145,000 companies</a>! That means many of them are either sole proprietors, or just a few individuals with a truck.</p>
<p>It&#x27;s a good business to be in, generally speaking. Individual shops may be making anywhere from $500k to $10m per year depending on the size of operations. The <a href="https://www.bls.gov/oes/current/oes499021.htm">mean hourly wage for a technician is $26/hour</a>.</p>
<p>Most contractors that we&#x27;ve talked with indicate that there&#x27;s a severe labor shortage. Contractors have to be experts at system design, understanding heating and cooling loads, some electrical work, pressure variations, and construction. It’s a skilled trade that takes a long time to master.</p>
<p>New contractors typically learn via the <a href="https://en.wikibooks.org/wiki/Mentor_teacher/Apprenticeship_model"><strong>apprenticeship model</strong></a>, but in the last few years, fewer individuals have entered these trades. In some cases, contractors aren&#x27;t even taking new jobs for months. Demand for contractors spikes twice per year; the first hot day in summer, and the first cold day of the winter.</p>
<p>Over the last few years, there&#x27;s been an increase in private equity roll-up vehicles buying up these firms. Many of them are trying to centralize and bring technology to key functions (scheduling, financing, invoicing) and allow the technicians to spend more time on their specialized work.</p>
<p>Contractors have not historically adopted a lot of software, though there are companies trying to serve them (<a href="https://www.servicetitan.com/">ServiceTitan</a>).</p>
<p>Finally, there are the <strong>manufacturers</strong>. These are the only large players in the market: a handful of publicly traded companies, like <a href="https://en.wikipedia.org/wiki/Carrier_Global">Carrier</a>, <a href="https://en.wikipedia.org/wiki/Trane">Trane</a>, <a href="https://en.wikipedia.org/wiki/Lennox">Lennox</a>, <a href="https://en.wikipedia.org/wiki/Mitsubishi_Group">Mitsubishi</a> and <a href="https://en.wikipedia.org/wiki/Daikin">Daikin</a>. Each of them brings in <a href="https://d18rn0p25nwr6d.cloudfront.net/CIK-0001466258/45e3cbd7-53cc-416d-9ed7-6e84d9b7c7ce.pdf#page=3">$1b+ in hardware, and typically maintains a 20% margin</a>.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/heat-pumps/trane-finances.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Trane&#x27;s financials from their 10-K filing. 4.3b in revenue, 20% margin</span></div></div>
<p>Many technicians have agreements directly with wholesalers or the manufacturer. As a result, it&#x27;s tough to buy a heat pump unit <em>even if you really wanted to</em>. Similar to the way car dealerships worked 20 years ago, you have to go through the dealer (aka the contractor or wholesaler) who is a licensed retailer for the manufacturer.</p>
<p>These manufacturers are slightly disincentivized from pushing heat pump adoption for two reasons: 1) many of them <em>also</em> sell furnaces and A/C units, and have whole divisions responsible for those P&amp;L statements 2) they want to maintain a margin on equipment. Rather than provide equipment that is higher-cost but easier to install, they will try and ship the lowest possible cost, and then pass the labor costs on to the end consumer.</p>
<p>There are <a href="https://www.eia.gov/consumption/residential/data/2020/hc/pdf/HC%202.1.pdf"><strong>77 million single-family homes in the US</strong></a>, and roughly <a href="https://rpsc.energy.gov/tech-solutions/hvac"><strong>3 million</strong> furnace or AC units installed per year</a>.</p>
<h2>Installing a heat pump</h2>
<p>Okay, so what&#x27;s a heat pump install look like? There&#x27;s a few different pieces...</p>
<p><strong>On-site visit</strong>: the purpose of this is two-fold. One part of it is to actually assess the quote and ensure that there aren&#x27;t any issues. The second is to help sell the homeowner on the desired solution. As part of this on-site visit, the contractor will want to do some initial system sizing to determine what sort of system you need.</p>
<p>In repair cases, the first visit ends up being the most important. When a technician says that the existing equipment is beyond repair and needs to be replaced, they set the framing for the entire journey.</p>
<p>The output of the system design should include a <strong>load sizing report for the home</strong>. It should indicate how many BTUs you need to heat your home in the winter, and how many tons of cooling you need in the summer.</p>
<p>As it turns out, load sizing is a surprisingly tricky problem to get right. Design will depend on your room configuration, insulation, and even position of the home relative to the sun. The “gold standard” way to do this is by running a <a href="https://www.acca.org/standards/technical-manuals/manual-j"><strong>Manual J calculation</strong></a> on the home, and doing a <a href="https://en.wikipedia.org/wiki/Blower_door"><strong>blower door test</strong></a> which assesses the actual leakiness of the home.</p>
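<p>For a feel of what goes into load sizing, here is a drastically simplified sketch of just the conduction term. A real Manual J calculation covers far more (infiltration, solar gain, duct losses); the U-values, areas, and design temperatures below are hypothetical example inputs, not recommendations.</p>

```python
# Simplified envelope heat loss: Q = U * A * dT, summed per assembly.
# A real Manual J calculation covers far more (infiltration, solar
# gain, ducts); all values here are hypothetical example inputs.

def conduction_loss_btu_hr(u_value: float, area_sqft: float, delta_t_f: float) -> float:
    """Heat loss through one assembly in BTU/hr."""
    return u_value * area_sqft * delta_t_f

delta_t = 70 - 10  # indoor 70F, assumed winter design temperature of 10F

surfaces = [
    # (name, U-value in BTU/hr/sqft/F, area in sqft)
    ("walls",   0.07, 1500),
    ("windows", 0.30,  200),
    ("roof",    0.03, 1000),
]

total = sum(conduction_loss_btu_hr(u, a, delta_t) for _, u, a in surfaces)
print(f"design conduction loss: {total:,.0f} BTU/hr")
```

<p>The blower door test is what replaces guesswork for the infiltration term this sketch leaves out entirely.</p>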
<p>Many contractors will skip this step if replacing the existing equipment with a newer version of the same equipment. This is another reason that contractors will prefer to stick with replacing existing systems vs swapping out for a heat pump.</p>
<p>There&#x27;s a few things that will blow up a quote...</p>
<ul>
<li><strong>new ductwork</strong> - if you live in a climate where it doesn&#x27;t get too hot, you might have small ductwork. The ducts don&#x27;t need to be very big because gas furnaces heat at a high temperature (140F). Heat pumps heat at a much lower temperature (100-110F), which means they need to move more volume of air to work. Getting to this increased volume requires increasing the duct size, which tends to add 10-20k to the up-front cost. In practice, if existing ductwork is insufficient, the installer will instead recommend a ductless system, which also tends to be more expensive (remember, if you have ducts, they are the cheapest way to deliver heat).</li>
<li><strong>panel upgrade and new 240v outlets</strong> - older homes were wired in a time before we had all these electric appliances (solar panels, EVs, etc). A heat pump might require upgrading your panel to accommodate increased amperage. Doing a panel upgrade isn&#x27;t terribly expensive, but it might add 2-3k to rewire the home. It also increases the end-to-end time for the job, as the utility may have to upgrade service to the home.</li>
<li><strong>new insulation</strong> - insulating a home is expensive, but probably the thing you should do before making any other upgrades. in most cases, insulation will be the #1 thing you can do to make your home more efficient.</li>
</ul>
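<p>The ductwork point follows from the sensible-heat relation, BTU/hr = 1.08 x CFM x dT: roughly halve the supply-to-room temperature difference and you need roughly twice the airflow. A small sketch, where the 60,000 BTU/hr design load is an assumed example value:</p>

```python
# Why heat pumps can need bigger ducts. Sensible heat delivered by air:
#   BTU/hr = 1.08 * CFM * (supply temp - room temp)
# A lower supply temperature means more airflow for the same heat.
# The 60,000 BTU/hr design load here is an assumed example value.

def required_cfm(load_btu_hr: float, supply_f: float, room_f: float) -> float:
    return load_btu_hr / (1.08 * (supply_f - room_f))

LOAD_BTU_HR = 60_000
ROOM_F = 70

furnace_cfm = required_cfm(LOAD_BTU_HR, supply_f=140, room_f=ROOM_F)
heat_pump_cfm = required_cfm(LOAD_BTU_HR, supply_f=105, room_f=ROOM_F)

print(f"furnace:   {furnace_cfm:,.0f} CFM")
print(f"heat pump: {heat_pump_cfm:,.0f} CFM")  # exactly double the airflow here
```

<p>Moving double the air through the same ducts means much higher velocity and noise, which is why the installer reaches for bigger ducts or a ductless system instead.</p>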
<p><strong>Install</strong>: a heat pump should be able to be installed in about a day. The technicians will have to remove any old units, and run any new refrigerant lines (aka drilling holes in the walls of your home). If you have an existing AC, the technicians can likely re-use the refrigerant lines already in your walls. The technicians will install the outdoor unit by placing it on a concrete slab and hooking it up to the electrical outlets, as well as the different indoor units. Often, technicians will recommend installing new thermostats as part of this process. Finally, they&#x27;ll need to &quot;charge&quot; the refrigerants, filling the lines with R-410A (or another refrigerant).</p>
<p><strong>Follow-up / maintenance:</strong> most installers offer a 1-year warranty on any work that they do. Callbacks with heat pumps are most frequently due to leaking refrigerant (which also is bad from a climate-perspective, more on that below).</p>
<h2>Utilities</h2>
<p>There&#x27;s one other player in this market who we didn&#x27;t talk about yet: <strong>utilities</strong> and <strong>government funding</strong>.</p>
<p>The government has set aside budgets to help consumers drive the adoption of green technology. Typically these budgets are set at the federal level, and then flow down to the &#x27;state energy office&#x27;. Each state energy office then works with local utilities (<a href="https://www.coned.com/en/">ConEd</a>, <a href="https://www.pge.com/">PG&amp;E</a>) or NGOs (<a href="https://www.bayren.org/">BayRen</a>) to provide programs for consumers to adopt high-efficiency devices.</p>
<p>While the large budget numbers are set at the federal level, actually disbursing that money is very much a local issue.</p>
<p>I&#x27;ve personally found the way utilities work to be a hard thing to reason about, but in general, here&#x27;s how people have described their incentives to me...</p>
<ul>
<li>utilities charge each consumer a <strong>&#x27;fixed service fee&#x27;</strong> which is something like $20/mo. this costs them basically nothing to service (think of it like a gym membership)</li>
<li>utilities generally <strong>want to decrease peak demand</strong> on their systems, while simultaneously keeping customers connected and paying the service fee. the higher the demand, the more expensive it is for a utility to deliver that energy, so utilities will try to minimize spiky, lower-margin &quot;peak load&quot;</li>
<li>utilities are <strong>&#x27;procurement engines&#x27;</strong>: they are very good at funding new capital improvements if they can show a return on those projects. they tend to build little in-house</li>
<li><strong>electric</strong> utilities will incentivize <strong>switching to all-electric</strong>. <strong>gas</strong> utilities will incentivize <strong>high-efficiency devices</strong>. in some cases, utilities which do both will figure out the cost to serve individual homes, and try and switch load to whatever method is cheapest to serve.</li>
</ul>
<p>Despite all this, the software and data utilities provide is an area that is ripe for startups to dig into. It&#x27;s also a slog; the most well-known company here is <a href="https://www.oracle.com/industries/utilities/opower-energy-efficiency/">Opower</a>, but there aren&#x27;t many others willing to brave the high-touch sales cycle.</p>
<h2>Incentives and rebates</h2>
<p>A key result of that government funding is that utilities often offer $2k-3k <em>per heat pump install</em> to bring higher-efficiency devices into the home. With the <a href="https://www.rewiringamerica.org/app/ira-calculator">Inflation Reduction Act this number has expanded dramatically for low and middle income families</a>.</p>
<p>There&#x27;s a lot of problems that exist with these incentives...</p>
<p><strong>Discovery</strong> - before they can apply for incentives, homeowners have to know about them! The unfortunate thing is that these incentives are spread across many different sites. In many cases, incentives are &quot;good until the money runs out&quot;, which means a consumer might be expecting <em>more</em> money back than is actually still available.</p>
<p><strong>Eligibility</strong> - woof, incentive eligibility is a complex thing. Often it depends on a multitude of factors: the exact unit which was installed, the sizing vs existing equipment, the household income, and more. It is difficult for homeowners and contractors alike to wade through to know how much money they are eligible for.</p>
<p><strong>Application</strong> - applying for rebates requires an annoying amount of paperwork. In cases of things like EV chargers, consumers can fill out the applications themselves. For a heat pump, it will probably require 2-3 hours of contractor time... which is why many contractors just don&#x27;t bother!</p>
<p><strong>Financing</strong> - the biggest incentives have historically come in the form of tax credits that you can claim off your tax bill at the end of the year. This means that the consumer has to &#x27;float&#x27; the cost of the install up-front, until they can claim the rebate months later. More progressive utilities will partner with banks to <a href="https://www.velocitycu.com/loans/austin-energy/">provide interest-free loans</a>, but there&#x27;s still a big gap here.</p>
<h2>Heat pump economics (am I getting ripped off?)</h2>
<p>You might be wondering how money flows as part of this process...</p>
<ul>
<li>the <strong>wholesalers</strong> will typically charge <strong>3-5k</strong> per heat pump unit. Higher efficiency units will be more expensive (<strong>6-8k</strong>). The manufacturer will try and maintain a <strong>30% margin</strong>, with higher-efficiency units having a greater margin.</li>
<li>the <strong>installers</strong> will pass that cost onto the consumer, and then typically <em>double the cost of the unit for labor</em>. For a 5k system, there will be minimum 5k in labor costs. In certain geos with a more competitive labor market (SF, Seattle), we&#x27;ve even seen labor account for <strong>70-90% of the bill</strong>.</li>
<li>the average consumer with a high-efficiency gas furnace will save something like <strong>~$200/year on fuel costs</strong>. the consumers using more expensive forms of energy (e.g. fuel oil) will save closer to <strong>$2,000 per year</strong>.</li>
<li><strong>contractors</strong> will usually include some buffer room in their quotes</li>
</ul>
<p>The net-result is that <strong>in a heat-pump popular market with many AC units</strong> and mild climate (think Austin, TX), the cost will be something like <strong>$8-10k to switch to a heat pump</strong>. In a market where AC usage is much rarer (SF), or the climate is more extreme (e.g. Seattle, Boston), <strong>a heat pump install might run you $20k.</strong></p>
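<p>Putting those numbers together, here is a rough sketch of net cost and simple payback. The equipment cost, rebate, and savings figures are illustrative midpoints of the ranges discussed above, not actual quotes.</p>

```python
# Back-of-the-envelope net cost and simple payback for a heat pump install.
# Equipment cost, rebate, and savings are illustrative midpoints of the
# ranges discussed in the text, not actual quotes.

def install_cost(equipment: float, labor_multiple: float = 2.0,
                 rebate: float = 0.0) -> float:
    """Installers roughly double the equipment cost for labor; rebates come off the top."""
    return equipment * labor_multiple - rebate

def simple_payback_years(net_cost: float, annual_savings: float) -> float:
    return net_cost / annual_savings

net = install_cost(equipment=4_500, rebate=2_000)  # mild-climate, AC-heavy market
print(f"net cost: ${net:,.0f}")
print(f"payback vs. gas furnace ($200/yr saved): {simple_payback_years(net, 200):.0f} years")
print(f"payback vs. fuel oil ($2,000/yr saved):  {simple_payback_years(net, 2_000):.1f} years")
```

<p>This is why fuel-oil households are the economic sweet spot today: the same install pays back roughly ten times faster than it does against a high-efficiency gas furnace.</p>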
<h2>A word on Refrigerants</h2>
<p>While I&#x27;m generally bullish on heat pumps and convinced that we must adopt them to prevent climate change, today&#x27;s heat pumps aren&#x27;t <em>all</em> sunshine and rainbows. There&#x27;s one big problem with rolling them out: <strong>refrigerants</strong>.</p>
<p>Remember, <a href="https://en.wikipedia.org/wiki/Refrigerant">refrigerants are substances which transfer heat from outside in (or vice versa)</a>. Refrigerants have a very low boiling point (think -40 degrees F). The low boiling point makes it easy to change the pressure to adjust whether a refrigerant is liquid or gas, which is good for heat transfer, but can create problems elsewhere.</p>
<p>Notably, many refrigerants have a high <a href="https://en.wikipedia.org/wiki/Global_warming_potential"><strong>global-warming-potential</strong> (GWP)</a>. This number is measured relative to CO2 (which has a GWP of 1), and expresses how much worse a molecule of refrigerant is when it comes to global warming.</p>
<p>There have been thousands of refrigerants categorized over time. They are typically numbered by <a href="https://en.wikipedia.org/wiki/List_of_refrigerants">R-{Ccount - 1}{Hcount + 1}{Fcount}</a> when viewed as pure chains, but when they are combined the numbering scheme changes to be more sequential.</p>
<p>Today&#x27;s most popular refrigerant is <a href="https://en.wikipedia.org/wiki/R-410A">R-410A</a>. It is a mixture of CHF<sub>2</sub>CF<sub>3</sub> and CH<sub>2</sub>F<sub>2</sub>, and has a GWP of 1,400. (It should be noted that even with some refrigerant leakage, <a href="https://heatpumpshooray.com/">we still see that heat pumps come out ahead when it comes to GWP</a>.)</p>
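The numbering rule for pure chains can be sketched in a few lines, and the two components of R-410A make good checks (the sketch assumes each digit stays below 10; blends like R-410A itself are numbered sequentially instead):

```python
def refrigerant_number(carbons, hydrogens, fluorines):
    """ASHRAE-style number for a pure chain: R-{C-1}{H+1}{F}."""
    return f"R-{(carbons - 1) * 100 + (hydrogens + 1) * 10 + fluorines}"

# The two components of the R-410A blend:
print(refrigerant_number(2, 1, 5))  # CHF2CF3 (pentafluoroethane)  -> R-125
print(refrigerant_number(1, 2, 2))  # CH2F2   (difluoromethane)    -> R-32
# And a flammable alternative mentioned below:
print(refrigerant_number(3, 8, 0))  # C3H8    (propane)            -> R-290
```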
<p>Over time, we&#x27;ve been steadily outlawing high global-warming-potential refrigerants (remember CFCs and HFCs?). <a href="https://www.epa.gov/newsreleases/us-will-dramatically-cut-climate-damaging-greenhouse-gases-new-program-aimed-chemicals">And in 2024, the industry is gearing up to phase out R-410A</a>.</p>
<p>The one problem is that there aren&#x27;t a lot of other <em>good</em> alternatives that exist. Refrigerants fall into two classes...</p>
<ul>
<li><strong>flammable</strong>: these are refrigerants like <a href="https://en.wikipedia.org/wiki/Propane_refrigeration">Propane (R-290)</a> or <a href="https://en.wikipedia.org/wiki/Isobutane">Isobutane</a>. We can&#x27;t roll them out without updating a bunch of building codes. <em>As an aside, it&#x27;s sort of crazy to me that we decide to limit propane in building codes, when we pipe natural gas into homes all the time to literally light it on fire.</em></li>
<li><strong>high-pressure</strong>: among the leading non-flammable refrigerants is <a href="https://www.danfoss.com/en/about-danfoss/our-businesses/cooling/refrigerants-and-energy-efficiency/refrigerants-for-lowering-the-gwp/carbon-dioxide-co2/">CO2</a>. It&#x27;s not flammable (good!), has a GWP of 1 (also good!), but requires a <em>lot</em> more pressure to turn from vapor to liquid (bad!). This means more expensive compressors.</li>
</ul>
<p>Finding new refrigerants is an active area of research, and will probably require a sea change. Most of the big manufacturers seem to be investing heavily here.</p>
<h2>Buckets of problems</h2>
<p><strong>Consumer awareness</strong> - most consumers don&#x27;t know (or care much) about how they get their heat. Despite being available for many years, heat pumps have only recently entered the popular discourse. Unlike solar or EVs, heat pumps are pretty much invisible, and it&#x27;s not yet a status symbol in any dimension.</p>
<p><strong>Urgency of problem</strong> - most HVAC systems are replaced when the old system has broken. Consumers typically don&#x27;t have a ton of appetite to do a lengthy heat pump install when they can&#x27;t keep their home warm! On the flip side, when an HVAC system is working smoothly, consumers have little desire to switch it out and add risk to the process.</p>
<p><strong>Contractor incentives</strong> - contractors typically want to avoid any sort of callback. The easiest way to do this is to replace the existing equipment. It minimizes risk, and additional labor hours. You don&#x27;t have to worry about running into issues which might lead to a drastically higher quote (duct sizing or panel upgrades).</p>
<p><strong>Regional markets</strong> - one reason HVAC is tricky for startups to tackle is that it&#x27;s a highly fragmented, very regional market. The pitch for consumers is actually different depending on geo. In SF or Seattle, heat pumps might be viewed as &#x27;adding an AC for the first time&#x27;. In the southeast, it&#x27;s a way of reducing your electricity bill by 2-3x. In the northeast, heat pumps are best positioned to replace expensive sources of fuel like fuel oil.</p>
<h2>Areas of opportunity</h2>
<p><strong>Financing</strong>. Solar has a rich set of options for financing the big up-front cost. These largely do not exist with heat pumps today (though local utilities may sometimes run 0% APR loan programs). Applying for rebates and incentives is an arduous process, and often <em>still</em> requires that you spend money up-front.</p>
<p><strong>System design</strong>. Designing a system for a home is tricky, and often involves a visit to the home. This is both annoying for consumers, and costs the contractor as it requires an extra 2-3h of transit time to prepare a quote that they may not even be paid for! The holy grail here would be working in the same way that solar does: going directly from address -&gt; design. We&#x27;ve built a lot of this modeling into <a href="http://heatpumpshooray.com/">Heat Pumps, Hooray!</a>, but there&#x27;s still internals of the system which require more work to surface (ductwork, panel, refrigerant lines). There are some companies focused exclusively on system design (<a href="https://www.coolcalc.com/">CoolCalc</a>, <a href="https://www.getarch.com/">Arch</a>, <a href="https://www.getconduit.com/">Conduit</a>).</p>
<p><strong>New heat pumps and refrigerants</strong>. There&#x27;s a number of startups trying to build new heat pumps: <a href="https://www.gradientcomfort.com/">Gradient</a>, <a href="https://stow.energy/">Stow</a>, <a href="https://dandelionenergy.com/">Dandelion Energy</a>, and others. A system like Gradient&#x27;s window unit doesn&#x27;t require any sort of contractor install and works for multi-family apartment buildings. Dandelion takes a differentiated approach by focusing on ground-source heat pumps, which are more difficult to purchase.</p>
<p><strong>Purchase journey.</strong> Right now, it&#x27;s fairly difficult to get a heat pump even if you are highly motivated and not budget constrained. There&#x27;s a number of companies trying to help the purchase journey: <a href="https://www.woltair.com/">Woltair</a> in Poland, <a href="https://lun.energy/">Lun</a> in the EU. Each of them are working to make buying a heat pump more similar to buying a dishwasher or washer/dryer online.</p>
<p>Additionally, there are some other pathways folks are exploring that I&#x27;m a little less bullish on, but which are worth mentioning here.</p>
<p><strong>Luxury products?</strong> A number of folks are investigating whether heat pumps might become a luxury product. I&#x27;m personally skeptical here. Heating and cooling are a lot less tangible than an EV or having solar on your roof. That said, if you could find a way to turn a heat pump into a futuristic technological device or a high-status symbol, I think you&#x27;d have a lot of luck. <a href="https://www.quilt.com/">Quilt</a> is one such company, applying high industrial design to an antiquated market. For products like an induction stove where you use it daily, I think this approach can work much better (a la <a href="https://www.impulselabs.com/">Impulse</a>).</p>
<p><strong>Whole-home electrification</strong>. It&#x27;s far more common to upgrade a single component at a time. But there are a few companies trying to do whole-home electrification by providing better hardware and software: <a href="https://www.lunarenergy.com/">Lunar Energy</a>, <a href="https://www.elephantenergy.org/">Elephant Energy</a>.</p>
<h2>Thanks</h2>
<p>First and foremost, I wanted to thank <a href="https://www.linkedin.com/in/bakershogry/">Baker</a> for being an incredible thought partner throughout all of this and building something really cool and useful.</p>
<p>Additionally, there have been a handful of folks we&#x27;ve had conversations with who fundamentally changed my understanding of the Heat Pump market: <a href="https://www.linkedin.com/in/shaylekann/">Shayle</a> (if you have not checked out <a href="https://www.canarymedia.com/podcasts/catalyst-with-shayle-kann">his podcast</a>, you owe it to yourself to do so), <a href="https://www.linkedin.com/in/andy-lubershane-0ab3bb24/">Andy</a>, and Gregory from EIP, <a href="https://www.innovationendeavors.com/team/sam-smith-eppsteiner/">Sam</a> and <a href="https://www.innovationendeavors.com/team/josh-rapperport/">Josh</a> from Innovation Endeavors, <a href="https://fiftyyears.com/team/seth-bannon">Seth</a> and <a href="https://fiftyyears.com/team/alex-teng">Alex</a> from 50y, and <a href="https://twitter.com/KVibhor">Vibhor</a> and <a href="https://www.linkedin.com/in/david-cahn-60150793/">David</a> from Coatue, <a href="https://erikareinhardt.com/">Erika Reinhardt</a></p>
<p>Thank you demo crew: <a href="https://www.linkedin.com/in/kevin-niparko-5ab86b54/">Kevin Niparko</a> and <a href="https://www.sequoiacap.com/people/lauren-reeder/">Lauren Reeder</a> for keeping us going with feedback and encouragement throughout.</p></div></content></entry><entry><title>Designs for Thought</title><link href="https://calv.info/designs-for-thought"/><id>https://calv.info/designs-for-thought</id><updated>2022-10-31T21:26:30.091Z</updated><author><name>Calvin French-Owen</name></author><summary>Why did it take us 50 years to figure out better tools for thought? And how do we shortcut that next time around?</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>Everybody has that one friend who&#x27;s always trying to hack their productivity. For me, that friend is Andy.</p>
<p>Andy is the type of guy who obsessively logs his meals and meticulously tracks his sleep. He&#x27;s experimented with keto and intermittent fasting. He loves his Eight Sleep mattress and Oura Ring. And every week, Andy has a new note-taking system.</p>
<p>One day, Andy posted about <a href="https://roamresearch.com/">Roam</a>. I was at an exploratory phase in life and my curiosity was piqued, so I decided to check it out.</p>
<p>I have to say I was pretty skeptical going in: new note-taking apps seem to become &#x27;in vogue&#x27; and then fade on a predictable cycle.</p>
<p>But Roam (and now <a href="https://obsidian.md/">Obsidian</a>) changed my mind. They have completely altered the way I go about my day.</p>
<p>If you&#x27;re concerned that this will be a puff piece, don&#x27;t be. This isn&#x27;t a tools-for-thought fanboy post.</p>
<p>Instead, I&#x27;d like to dig into a handful of the little <strong>design decisions</strong> that make Roam a really novel product.</p>
<p>This is important because we&#x27;ve had the computing power, network stack, and all the software we&#x27;d need to build these tools for thought for decades. It&#x27;s all just files, markdown, and a basic UI.</p>
<p>It&#x27;s not entirely clear to me why it took so long for us to build these sorts of tools. But what is clear is that we need to do more of it (more on that later).</p>
<h2>Judging a book by its cover</h2>
<p>I took too long to give Roam a shot because the UI looks... kind of ugly?</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/designs-for-thought/header.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/></div>
<p>Odd typography, uneven spacing, bulletpoints and brackets everywhere. What even is this tool?</p>
<p>But in many ways, Roam <em>is</em> actually quite well-designed. Just not designed to be looked at.</p>
<p>Roam is designed to be <em>engaged</em> with, each part of the interface silently nudging you to write a little more, and put down a few more connections.</p>
<p>That&#x27;s no accident. There&#x27;s a ton of small features that add up to a transformational experience.</p>
<h2>Time-based vs subject-based</h2>
<p>Most note-taking tools either focus on being <strong>time-based</strong> (a journal that has one new entry per day) or <strong>subject-based</strong> (let me create a new <a href="https://www.notion.so/">Notion</a> page for each project).</p>
<p>Critically, you have to <em>choose</em> which approach you prefer! Most note-taking apps require that you buy into one approach or the other... but you can&#x27;t do both!</p>
<p>Here&#x27;s the problem though: I&#x27;ll often have uncategorized musings that pop up from day-to-day across all variety of subject matter <em>(e.g. I read an <a href="https://slatestarcodex.com/2019/06/04/book-review-the-secret-of-our-success/">interesting blog post on culture</a>, or a friend told me about <a href="https://slimemoldtimemold.com/2021/07/07/a-chemical-hunger-part-i-mysteries/">the obesity epidemic</a>)</em>. At the same time, I&#x27;ll have highly structured &#x27;streams&#x27; of projects (what are the next steps on the <a href="https://heatpumpshooray.com">Heat Pump calculator I&#x27;m building</a>?).</p>
<p>With most notetaking apps, <strong>before I start writing anything, I have to figure out where my writing should go!</strong> Sometimes I lose the thought before I even start a new note.</p>
<p>Roam&#x27;s default view gives me a new time-based note every single day. It&#x27;s the default &#x27;homepage&#x27; where I&#x27;ll start cataloging my day.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/designs-for-thought/time-based.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/></div>
<p>Down below, yesterday&#x27;s note is right there. I don&#x27;t even have to &#x27;reach for the note&#x27; to see what I was doing yesterday. It&#x27;s already right there in front of me.</p>
<p>This does two things:</p>
<ol>
<li>It&#x27;s easy to see what I was doing on a particular date (I can jump straight to the February 1st, 2022 note)</li>
<li>It&#x27;s easy for me to scroll back in time to get context on what I was just doing recently.</li>
</ol>
<p>In short, it&#x27;s the best of both worlds: I can still jump to a particular day, <em>and</em> it feels like it&#x27;s part of a long-running note. But I can also jump into an individual idea or topic as I see fit.</p>
<p>The nice thing about these subject-based notes is that they feel very free. I don&#x27;t have to worry about losing them because I can search for them, or find them temporally.</p>
<h2>The &#x27;blank page&#x27; problem</h2>
<p>Most note-taking apps don&#x27;t do anything to help you <em>start</em> taking a note. Here&#x27;s what the Apple Notes interface looks like.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/designs-for-thought/apple-notes.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/></div>
<p>It&#x27;s pretty... bare. There&#x27;s not a lot here that encourages me to write.</p>
<p>Roam does a few very clever things to solve this. I already touched on the &#x27;daily notes&#x27;: every day there&#x27;s a new note prompting me to start writing. [1]</p>
<p>What&#x27;s more, any time you write a line and hit &#x27;enter&#x27;, a new line appears. But that new line isn&#x27;t so daunting that it&#x27;s totally free form and blank. Instead, a helpful bullet appears, waiting for your input.</p>
<p><img src="https://calv.info/designs-for-thought/prompt.png" alt=""/></p>
<p>I used to hate the bullets, until I realized how much new information I was adding because of them.</p>
<p>Something about bulleted writing feels less scary, and more information dense. It encourages work-in-progress thoughts that can be edited later.</p>
<h2>Losing context</h2>
<p>Another big issue I have with most tools (note-taking, browsers, etc) is that I tend to perform an action that makes me lose context about what I was doing in the first place.</p>
<p>A good example is when I&#x27;m going to look up someone&#x27;s contact info on LinkedIn. I&#x27;d guess that about 30% of the time, I&#x27;m immediately distracted by the feed and forget for a moment why I&#x27;m there. [2]</p>
<p>Roam solves that by letting you open notes &quot;in-context&quot;. If you shift+click a given page that you&#x27;ve linked to, it will open up in the &#x27;sidepane&#x27;.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/designs-for-thought/sidepane.gif" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>In-line panes help maintain flow and keep context</span></div></div>
<p>What&#x27;s great about this is that you can edit <em>either</em> pane, and continue to keep the flow of your thoughts. There&#x27;s no losing flow state as you switch tabs.</p>
<h2>Progress</h2>
<p>Another area most note-taking tools miss out on is that they don&#x27;t give you any feeling of &#x27;progress&#x27;. There&#x27;s not a ton of incentive to add more to what you already have.</p>
<p>Contrast that to Roam: there&#x27;s a view that shows you every single connection you&#x27;ve made. If you&#x27;ve seen any marketing material of the tool, you&#x27;ve probably seen it. Here&#x27;s my Obsidian graph:</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/designs-for-thought/obsidian-graph.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Would this be a real &#x27;tools for thought&#x27; post if I didn&#x27;t show off a graph?</span></div></div>
<p>Now, the graph is basically useless. I never look at it on a daily basis. But it does give me one thing: <strong>a sense of progress</strong>.</p>
<p>The more I write, the more intricate and exciting my graph looks. It feels like I&#x27;m getting smarter just by writing. Every few months, I can check back in with it.</p>
<p>Connections in Roam and Obsidian aren&#x27;t just more visible, it&#x27;s <em>easier</em> to make them too.</p>
<p>Instead of a bunch of nested sub-menus and button presses, it&#x27;s 4-5 keystrokes (double bracket, a few characters of the name of the connection you&#x27;d like to make, enter).</p>
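That double-bracket syntax is simple enough that pulling the connections out of a note takes only a regex. (A sketch, not Roam or Obsidian&#x27;s actual parser, which handles many more edge cases.)

```python
import re

def extract_links(note: str) -> list[str]:
    """Find [[wiki-style]] page links in a block of text."""
    return re.findall(r"\[\[([^\[\]]+)\]\]", note)

print(extract_links("Met [[Andy]] about [[Heat Pumps, Hooray!]] next steps"))
# -> ['Andy', 'Heat Pumps, Hooray!']
```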
<p>Connections are useful in a second way: they make me feel like I&#x27;m doing &quot;behavior-driven development&quot;.</p>
<p>Instead of going through 2-3 step workflow to create a new note and typing it out, I&#x27;ll start my day by creating the stubs of all the writing I&#x27;d like to do (right in-line of what I&#x27;m doing), and then fill them in over the course of the day.</p>
<h2>UI makes the difference</h2>
<p>I&#x27;d like to close with a broader thought around why any of these little UI patterns matter.</p>
<p>Think about text editors writ large. They have existed since the dawn of computing. <a href="https://en.wikipedia.org/wiki/Vi">Vi</a> was created 46 years ago. But up until the last few years, these sorts of &#x27;networked tools for thought&#x27; had never existed.</p>
<p>What was the limiting factor? Could we have built these tools in the past?</p>
<p>Absolutely!</p>
<p>We weren&#x27;t limited by computation speed, or size of data. The new &quot;innovation&quot; here is just a curated set of good ideas for interacting with text. It could have happened at literally any time in the age of the personal computer... and it happened today rather than 20 years ago.</p>
<p>Right now, we&#x27;re on the edge of incredible capabilities that exist with AI: <a href="https://openai.com/api/">GPT-3</a>, <a href="https://openai.com/dall-e-2/">Dall-e 2</a>, <a href="https://huggingface.co/spaces/stabilityai/stable-diffusion">Stable Diffusion</a>, etc. There&#x27;s a lot of research going into these fundamental capabilities (as there should be).</p>
<p>Where I think there&#x27;s a ton of whitespace is in how we interact with and leverage these tools. We need the layer that sits on top of AI to really unlock the true potential of the tech.</p>
<p>I&#x27;m happy to see a number of startups pop up in this space: <a href="https://everyprompt.com">Everyprompt</a> (disclosure: investor), <a href="https://playgroundai.com/">PlaygroundAI</a>, <a href="https://www.midjourney.com/">Midjourney</a>, <a href="https://lex.page/">Lex</a> and more. But it&#x27;s time for even more.</p>
<p>The future won&#x27;t come from people typing into the GPT-3 textarea box. It won&#x27;t come from &quot;bolting AI onto an existing product&quot;.</p>
<p>The big paradigm shifts will be the ones that design new AI-native interfaces. <em>Those</em> are the tools I&#x27;m excited to use.</p>
<hr/>
<p>[1]: <a href="https://paper.dropbox.com">Dropbox Paper</a> does this as well
[2]: My friend <a href="https://jamie-wong.com/">Jamie Wong</a> has created <a href="https://github.com/jlfwong/dedistract">dedistract</a> to deal with this problem</p>
<hr/>
<p><em>Thanks to Peter Reinhardt and Lauren Reeder for giving feedback on this post.</em></p></div></content></entry><entry><title>Visiting Quaise</title><link href="https://calv.info/visiting-quaise"/><id>https://calv.info/visiting-quaise</id><updated>2022-05-22T22:14:57.646Z</updated><author><name>Calvin French-Owen</name></author><summary>Quaise is a new startup aiming to make widespread geothermal energy a reality. They utilize a high-powered mm-wave energy beam to vaporize rock 10km beneath the earth&#x27;s surface. I got to tour their facilities, and watch them pulverize rock before my eyes.</summary><content type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml"><p>A few weeks ago, I had the incredible experience touring the <a href="https://www.quaise.energy/">Quaise</a> facilities at <a href="https://www.ornl.gov/">Oak Ridge National Laboratory</a>. Unlike other lab tours I&#x27;ve done, the folks at Quaise were actually so kind as to run their machine end-to-end. Thanks to a $30 webcam (that sadly malfunctioned due to high heat in the process), I could watch the whole thing before my eyes.</p>
<p><em>Disclosure #1: I am a small-time investor in Quaise. I am biased, but what they are doing is incredibly cool.</em></p>
<p><em>Disclosure #2: I am an armchair physicist at best. Some of these descriptions may be a little butchered.</em></p>
<h2>What is Quaise?</h2>
<p>Historically, geothermal has met less than 0.1% of the US energy needs. And yet, the total amount of thermal energy stored in the earth&#x27;s mantle is <em>massive</em>: it&#x27;s over two billion times our annual energy consumption! In fact, <a href="https://elidourado.com/blog/geothermal/">Eli Dourado estimates that there is 23,800x the amount of energy in the earth&#x27;s crust as from coal, oil, gas, and methane <em>combined</em></a>.</p>
<p><a href="https://www.quaise.energy/">Quaise</a> is a new startup out of MIT which is trying to unlock that extra energy. The goal is to produce cheap, clean (and practically limitless) geothermal electricity.</p>
<p>The tech is still early, but incredibly promising. I think we&#x27;ll start seeing geothermal generation begin in earnest in the next decade.</p>
<p><strong>How does Geothermal work?</strong></p>
<p>If you haven&#x27;t spent a lot of time thinking about geothermal, here&#x27;s the easiest way to think about it:</p>
<ol>
<li>we drill two holes, one for cold water, one for hot water down to an ambient temperature of about ~400 degrees C (enough for water to become <a href="https://en.wikipedia.org/wiki/Supercritical_fluid">supercritical</a>), and we connect them via fissuring/fracking [1].</li>
<li>we fill the holes with water (or some other working fluid) and create a closed loop</li>
<li>we pump water from the surface down the cold borehole</li>
<li>the water absorbs heat as it pushes towards the hot borehole</li>
<li>supercritical water rises as steam up the hot borehole, spinning a turbine and generating clean electricity</li>
</ol>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/catf-geothermal.gif" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Superhot Rock Geothermal, courtesy CATF: https://www.catf.us/</span></div></div>
<p>This process is known as <a href="https://www.catf.us/work/superhot-rock/">superhot rock geothermal</a>, and it could be one of our best sources of clean electricity.</p>
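To get a feel for the scale of the loop described above, here is a back-of-the-envelope estimate of its thermal power. The flow rate is a made-up figure, and water&#x27;s heat capacity is treated as a constant ~4.2 kJ/kg·K, which is a rough simplification for supercritical water:

```python
def thermal_power_mw(flow_kg_s, t_hot_c, t_cold_c, cp_kj_per_kg_k=4.2):
    """Rough loop thermal power: P = m_dot * cp * dT, converted kW -> MW."""
    return flow_kg_s * cp_kj_per_kg_k * (t_hot_c - t_cold_c) / 1_000

# 100 kg/s of water heated from 20 C surface temperature to ~400 C at depth:
print(thermal_power_mw(100, 400, 20))  # ~160 MW thermal, before turbine losses
```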
<p>There are a handful of (mostly volcanic) places where temperatures near the surface are quite high and you only need to go a few kilometers deep: Iceland, Italy, and New Zealand to name a few. But these are very limited in number.</p>
<p>Everywhere else, we need to go deep. Very deep. Deeper-than-the-Mariana-trench deep.</p>
<p>As you might guess, drilling this deep gets really expensive. Most oil &amp; gas drilling rigs use mechanical drill bits to drill down 1-3km. The deepest hole ever bored was the <a href="https://en.wikipedia.org/wiki/Kola_Superdeep_Borehole">Kola Borehole dug by the Russians</a>, and it took 10+ years to drill down to 12km depth.</p>
<p>For widespread deployment of super-hot geothermal energy, we need to drill to a depth closer to 10km, typically going through crystalline basement rock. This would be cost-prohibitive using traditional techniques from the oil and gas industry. Most of your time is spent pulling out broken mechanical drill-bits and replacing them.</p>
<p>The more we can efficiently drill low-cost, super-deep holes, the more potential sites there are for geothermal. That&#x27;s where Quaise comes in.</p>
<p><strong>Quaise: throwing away the drill-bit</strong></p>
<p>Instead of having to constantly repair mechanical drill bits working at high temperatures and pressure, Quaise hopes to provide low-cost drilling to 10km depths via a high-energy millimeter-wave beam.</p>
<p>You can think of this as a high-powered energy beam that will melt and vaporize rock (even at great depths). The plan is to then blow the vapor up to the surface using pumped gases.</p>
<p>The beauty of this approach is that it can all be done using energy generation from Earth&#x27;s surface.</p>
<p>If you&#x27;d like to learn more about the mm-wave approach directly from the source, I recommend checking out Paul Woskov&#x27;s 2015 talk <a href="https://www.youtube.com/watch?v=J0Zk6sVxKbI">&quot;Into the Bedrock by Full Bore Millimeter-Waves&quot;</a>.</p>
<h2>Oak Ridge National Laboratory</h2>
<p>I arrived Monday night at Knoxville airport, grabbed one of the last rental cars, and then drove from Knoxville to Oak Ridge. After a late-night McDonald&#x27;s and check-in to the Holiday Inn Express, I woke up and headed out to Oak Ridge National Labs.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/quaise-badge.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>When you left the &#x27;company&#x27; field blank</span></div></div>
<p>Why Oak Ridge?</p>
<p>One reason is that Quaise raised money from <a href="https://arpa-e.energy.gov/">ARPA-E</a> and the <a href="https://www.energy.gov/">DOE</a> to demonstrate a live drilling test (more on the milestones below).</p>
<p>The other reason is that ORNL is also one of the few places in the US that has working <a href="https://en.wikipedia.org/wiki/Gyrotron">Gyrotrons</a>. These are high-powered devices that basically drive a stream of resonant electrons. They&#x27;ve been used extensively with fusion research as a way of heating hydrogen into a plasma, and as part of non-lethal weapons research.</p>
<h2>Architecture</h2>
<p>Quaise&#x27;s architecture consists of a few parts...</p>
<ol>
<li>the Gyrotron generates a high-energy mm-wave</li>
<li>the beam travels many kilometers along a waveguide</li>
<li>the beam exits the guide and heats the rock to temperatures where it vaporizes</li>
<li>injected gas blows the vapor out of the hole to the surface</li>
</ol>
<p>The demo we saw consisted of three of these pieces with a ~40ft waveguide.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/quaise-architecture.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/></div>
<p><strong>The Gyrotron</strong></p>
<p>The gyrotron is the device that&#x27;s responsible for generating the mm-wave energy beam. There&#x27;s a set of transformers which create a very large voltage differential (80kV in the machine we saw, but it could be up to 1MW of beam power) that then drives DC current through a filament. The filament emits electrons which are then focused via a series of electromagnets.</p>
<p>The moving electrons in a magnetic field will generate electromagnetic waves at approximately 100 GHz. The electrons themselves become coherent due to resonant standing waves in the magnetic cavity (sort of like an invisible laser beam).</p>
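The &#x27;millimeter-wave&#x27; name follows directly from that frequency: at ~100 GHz, the free-space wavelength is about 3 mm.

```python
C = 299_792_458  # speed of light, m/s

def wavelength_mm(freq_ghz):
    """Free-space wavelength in millimeters for a frequency in GHz."""
    return C / (freq_ghz * 1e9) * 1_000

print(round(wavelength_mm(100), 2))  # 3.0 mm -> hence "millimeter-wave"
```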
<p>Here&#x27;s a picture of the Gyrotron. Note the three electromagnet bands around it, and the cooling tubes at the bottom. The wave will travel vertically up the Gyrotron. I&#x27;d guess this one was about 8ft tall or so.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/gyrotron.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Power supply and cooling tubes feed into the bottom and the wave travels vertically</span></div></div>
<p>The voltage is supplied by giant power supplies sitting nearby...</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/gyrotron.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>There were four of these in the lab</span></div></div>
<p>Historically there hasn&#x27;t been a very large market for Gyrotrons: on the order of tens per year are produced, and each one costs something like $700k-2.5m. The kind Quaise is using today gets about 30-35% efficiency on output, but the Quaise team guessed they could reach something like 60% power efficiency if they can switch to superconducting magnets.</p>
<p><strong>The Waveguide</strong></p>
<p>Once the energy beam leaves the Gyrotron, it needs to be directed to the borehole. I got the impression that this is a relatively well-understood idea in the lab, but a non-trivial problem when drilling at depth. After all, how do you send an energy beam down tens of kilometers into the ground?</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/waveguide.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>A corrugated waveguide</span></div></div>
<p>Quaise is using &#x27;corrugated&#x27; waveguides (see the little ridges in the guide). This helps avoid electric field buildup on the walls of the waveguides. Today&#x27;s waveguide costs about $1,000/m, which is incredibly expensive. The good news is that these waveguides can be re-used and can pretty easily &#x27;reflect&#x27; the waves themselves around corners.</p>
<p>The wave guide will also act as the conduit for the gas, pumping it down into the hole so that the rock vapor can rise to the surface.</p>
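At $1,000/m, the waveguide alone is a meaningful line item for a single deep hole, which is why re-use matters so much (depths and per-meter cost here are the figures from the post; everything else is illustrative arithmetic):

```python
def waveguide_cost_usd(depth_km, cost_per_m=1_000):
    """Waveguide cost for one borehole at ~$1,000/m."""
    return depth_km * 1_000 * cost_per_m

print(waveguide_cost_usd(10))  # 10,000,000 -> $10M of guide for one 10km hole
```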
<p><strong>The window</strong></p>
<p>Finally there&#x27;s a window at the end to ensure that the mm-wave can drill into the rock, but that rock can&#x27;t come back up the tube.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/window.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>A cracked window. The wave can travel freely through it, but it should keep particulate out.</span></div></div>
<p>These windows are tricky in that they 1) have to let the mm-wave pass through them and 2) have to withstand very high pressures and temperatures to avoid damaging any part of the gyrotron with reflected energy or arcing from the plasma.</p>
<h2>In Action</h2>
<p>The Quaise team were kind enough to test the product end-to-end.</p>
<p>They started by locking up the room. In order to test, multiple people had to retrieve their keys from the lockbox. When the team engaged the power supply, you could hear a noticeable &quot;thunk&quot; from the machinery next door.</p>
<p>On the lefthand side (next to Tim) you can see the power controls for the Gyrotron. On the right is the system monitoring, manned by Matthew.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/starting-the-gyrotron.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>The left-hand controls manage the gyrotron. The &#x27;arc detectors&#x27; sit above.</span></div></div>
<p>The upper two units you see are to protect against any arcing from the plasma. The minute that happens, these will cut the power and the current flow to avoid damaging equipment.</p>
<p>Here&#x27;s a closer look at the metrics being monitored:</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/labview.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Labview in action</span></div></div>
<p>The most important metrics to watch here are the forward power and rock temperature.</p>
<p>Here&#x27;s a shot of the first test run with the system fully engaged. On the right you can see the webcam inside the drum. The left-hand monitor is showing off an outside high-speed camera for diagnostics, as well as a bunch of labview charts inspecting the input voltage, forward power, and temperatures of the rock.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/first-run.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>The left-hand monitor shows the important metrics to watch. The right-hand monitor shows a view from the webcam inside the drum.</span></div></div>
<p>We made 3 attempts. In each one, the power would get to about 12kW, and then some sort of arc would trip the detectors and cut the power. [2]</p>
<h2>Milestones</h2>
<p>Quaise views their milestones in terms of width:depth ratio. Most of the milestones are trying to prove out that the technology is actually able to reliably drill a hole, using nothing more than electricity.</p>
<p><strong>Vaporized hole</strong></p>
<p>Today, Quaise is able to melt rock. The next goal is to fully vaporize a core sample all the way through. You can see discarded samples from previous attempts below. Each one is about the size of a dinner plate and a few inches thick.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/previous-samples.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Previous runs. The gray rock is crystalline basalt, and it&#x27;s surrounded by concrete.</span></div></div>
<p>The whitish surround is concrete, poured to ensure the sample holds its shape. The muted gray is the crystalline form of the rock (basalt). The shiny pieces are rock in its amorphous form, which has rapidly cooled without time to crystallize.</p>
<p>The next step is to try and apply more pressure (something like 70psi) to see if that will help.</p>
<p><strong>1:10</strong></p>
<p>This milestone will happen in the lab. Quaise engineers will keep the gyrotron stationary and &#x27;feed&#x27; rock up into it at a constant distance. The goal will be to drill through about 10cm of rock.</p>
<p><strong>1:100</strong></p>
<p>Once the vaporization is working in the lab, it&#x27;s time to move outside.</p>
<p>Right outside the lab, there&#x27;s a 40ft deep well, about 10in across. The end goal of the ARPA-E funding is to fill the well with rocks and drill all the way to the bottom with nothing more than electricity and a purge gas.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/outer-well.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Hold on to your phones...</span></div></div>
<p>To do this, the Quaise team will keep the Gyrotrons in their main building, but run the waveguide through a separate building and then out to the well site.</p>
<p><strong>1:1000</strong></p>
<p>Once the above milestones are complete, it&#x27;s time to actually test at a real site. The Quaise team are looking at sites in NM and CO, where there&#x27;s plenty of crystalline basalt available to test with.</p>
<h2>Interesting Problems</h2>
<p><strong>Mode</strong></p>
<p>One problem with using old Gyrotrons is that the <a href="https://en.wikipedia.org/wiki/Mode_(electromagnetism)">mode frequency</a> has some skew. In practice, that means there are hot spots in the drilling area rather than a single focused beam channeling the power directly from the Gyrotron into the rock.</p>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/mode-frequency.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>The right screen shows where the power is dissipated. It should be a single focused beam rather than a &#x27;four-leaf clover&#x27; shape.</span></div></div>
<p>You can see this in the above picture on the right-hand screen. Instead of having a &#x27;single focused beam&#x27;, there are 4 different hot spots. In order to achieve maximal power output, the team will have to figure out how to consolidate these frequency modes.</p>
<p><strong>Containing plasma</strong></p>
<p>Modeling plasma is notoriously difficult.</p>
<p>Generally speaking, this is a relatively cold plasma (only around 3,000 deg C). It <em>should</em> work sort of like a candle. There&#x27;s heat which is applied to the rock. The rock starts melting and turns into a liquid. The liquid starts turning into plasma and vapor.</p>
<p>The trouble, though, is that plasma in practice tends to be very difficult to model and maintain. If not kept as a thin layer, the plasma will tend to just suck up all of the energy applied to it.</p>
<p>Additionally, the plasma has a tendency to &#x27;arc&#x27;, sort of like lightning: it will fly back up into the waveguide, following the stronger electric field coming from the Gyrotron. The arc protectors immediately cut the current when this happens, but it&#x27;s currently an unsolved problem.</p>
<p><strong>Window material</strong></p>
<p>The window which keeps unwanted particles from coming up the waveguide has a lot of requirements. It needs to be incredibly strong to resist the pressure, heat, and plasma coming up at it. It also needs to be made of a material that won&#x27;t interfere with the wave.</p>
<p>Today, the team is using a mixture of quartz and sapphire windows. There&#x27;s some talk of using a pure diamond plate... but doing so would easily cost hundreds of thousands of dollars.</p>
<p><strong>Reflective substances</strong></p>
<p>I hadn&#x27;t really thought about this before talking with the team, but hitting reflective substances can be really tricky. If the substance is <em>too</em> reflective, the energy won&#x27;t be absorbed, and will instead be sent back up the tube! It&#x27;s the opposite of what we want.</p>
<p>Different types of substances also behave differently when exposed to high-energy mm-waves. Limestone, for example, has a tendency to crumble, and won&#x27;t provide a consistent vaporization.</p>
<p>There&#x27;s a lot more work to be done when it comes to testing out how various substances behave.</p>
<p><strong>Hole siding</strong></p>
<p>The hope is that as the Gyrotron bores the hole, it will begin to &quot;vitrify&quot; the sides of the borehole, turning them to glass. These conditions haven&#x27;t been fully tested yet, and there&#x27;s some question of whether the borehole will need additional pipes along the walls of the rock.</p>
<p><strong>Better with pressure?</strong></p>
<p>As the waveguide gets deeper and deeper into the ground, there&#x27;s some thinking that the additional heat and pressure will actually make it <em>easier</em> to drill into the rock. I get the impression that we don&#x27;t know a ton about the conditions that deep in the earth.</p>
<h2>Tailwinds and thanks</h2>
<p>If it isn&#x27;t clear already, Quaise is doing incredible work. I&#x27;m extremely bullish on their ability to make widespread geothermal a reality. There are a few other tailwinds...</p>
<ul>
<li><strong>regulatory ease</strong>: we&#x27;re already very used to drilling holes in the ground, thanks to oil &amp; gas. it&#x27;s a very straightforward regulatory pathway</li>
<li><strong>abundant land</strong>: most of the western US has sufficient ground temperatures and crystalline rock for Quaise to work (see following chart). specialized sites aren&#x27;t really required</li>
<li><strong>falling cost of electricity</strong>: combined with the falling cost of solar, Quaise&#x27;s drills will become cheaper and cheaper to power as the cost of electricity falls</li>
<li><strong>demand for renewable baseload power:</strong> there&#x27;s increasing demand to provide renewable &#x27;baseload&#x27; energy at all times of day. only fission and hydroelectric power manage to do this right now. fission is more difficult from a regulatory standpoint (though it should get more support!). hydroelectric is limited in terms of potential sites. geothermal can navigate both.</li>
</ul>
<div class="mt-7 md:mx-0"><img src="https://calv.info/visiting-quaise/shr-sites.png" class="rounded-lg border" style="transform:scale(1);transform-origin:center"/><div class="text-center -mt-6 italic"><span>Sites where super-hot-rock geothermal is viable, via CATF: https://www.catf.us/work/superhot-rock/</span></div></div>
<p>Personally, I&#x27;m quite excited about a future where clean renewable energy is abundant. Quaise&#x27;s approach seems to be one of the best shots at that future.</p>
<p>Many thanks to <a href="https://www.linkedin.com/in/carlos-araque-quaise/">Carlos Araque</a>, <a href="https://www.linkedin.com/in/henry-phan-2026243/">Henry Phan</a>, and <a href="https://www.linkedin.com/in/matt-houde-71a4215b/">Matthew Houde</a> for organizing the tour, and to <a href="https://elidourado.com/blog/geothermal/">Eli Dourado for turning me onto Quaise in the first place</a>. If these problems seem interesting to you, Quaise is hiring <a href="https://www.quaise.energy/company">across the board</a>.</p>
<hr/>
<p>[1]: Technically, Quaise drills three holes, because the water coming out is less dense than the water being pumped underground.</p>
<p>[2]: Carlos tells me they&#x27;ve now hit 20kW</p></div></content></entry></feed>