Articles

Why did it take us 50 years to figure out better tools for thought? And how do we shortcut that next time around?

MAY 22, 2022

Quaise is a new startup aiming to make widespread geothermal energy a reality. They utilize a high-powered mm-wave energy beam to vaporize rock 10km beneath the earth's surface. I got to tour their facilities, and watch them pulverize rock before my eyes.

Recently I've talked to a bunch of different founders about how they should think about the CTO role as they scale. They aren't sure what they should be doing. Here's the framework I wish I'd had in Segment's early years.

As a founder, you want to strive for what I call 'high bit-rate' communication. For every minute of speaking, is your listener getting a lot of information?

In Segment’s early days, we hit countless problems as a founding team. And at the time, I thought those problems were unique to our own special snowflake of a founding journey. I chalked it up to us being new grads and first-time founders.

NOV 11, 2020

As they say... some personal news. After nearly a decade of building Segment, I'm leaving to start a new adventure. My last day was November 2nd.

“Any sufficiently advanced technology is indistinguishable from magic.” — Arthur C. Clarke

How much time should I be spending on my UI/UX? Does it matter at all for a B2B company?

The best advice often seems blindingly obvious in hindsight. After taking it, you wonder how you ever survived without it.

Lately, a bunch of early stage founders have emailed me to ask for go-to-market advice. They have a real product, it solves a real problem for their handful of customers... so why isn't it selling?

For B2B companies, the most valuable products more or less offer a single value prop: you don’t need to ask permission to get your job done.

I gave a talk at Google Cloud NEXT, highlighting our use of BigQuery and Bigtable for building out our Personas architecture. In this session, I share a little bit about Personas, and dig into the Dremel and Bigtable papers to share why they work for our use case.

I recently gave a talk at our annual Synapse conference about ctlstore (control store), a new database that we’ve been developing internally at Segment. It’s designed to give good consistency guarantees to the writer, while providing good availability guarantees for readers.

In this talk, I share an actual walkthrough of a production incident. The tooling we used. What we thought was going on. How we diagnosed it over time. How we eventually got to the root cause. And how we postmortemed it afterward.

Everyone wants to build the next “rocketship.” We all want to start businesses with an incredible growth trajectory and massive impact. But when you look at the most successful startups, a curious phenomenon emerges: a lot of early success often hurts a startup in the long run.

NOV 20, 2016

“What’s the biggest thing you’ve learned while building Segment?” I wish we’d written everything down.

This talk covers how we manage tfstate, separate environments, specific module definitions, and how we use terraform to boot new services in production. I also discuss the challenges we’re currently facing, and how we plan to attack them going forward.

Recently, there’s been a lot of commotion on twitter and in #node.js about the new streams2 API. The official stream docs leave a lot to be desired, which has led to general confusion. That’s too bad, because using new streams can really simplify your code once you understand how they work. Let me take you there…

A little over a year ago, I tore my ACL playing ultimate frisbee. I made a quick cut to lose my defender, and as I planted my foot to change directions, I felt a sudden pop in my knee. Next thing I knew, I was on the ground because my knee had given out.

At Segment.io, we deal with a lot of important user data on a daily basis. Consequently, our top priorities are that we don’t lose your visitors’ data and that our incoming API stays available at all times. As you might guess, all of the data that comes in has to be validated against our database. Our API servers maintain a connection to the DB to check that incoming requests are actually good.

When my co-founders and I first started on our startup a little over a year ago, we asked other startups what database they were using. Nine out of ten people had the same response: “Just go with Mongo.”

At Segment.io, we’ve been using node.js with the express framework for about 8 months now. It’s simple, doesn’t prescribe too much, and is way less verbose than the same java code. Over that time, I’ve discovered a few patterns and conventions which make my code significantly cleaner and easier to follow. Here they are.

One of the best parts about mongo is its entirely schemaless nature. We can store any kind of data we’d like without having to worry about where that data comes from or how it should be laid out. However, that same schemaless nature sometimes makes it tricky to index the data in a way that will be performant.

Today, that subject is upstream caching: hints in the HTTP headers that developers can give to the routers and content delivery systems that sit between client and server.

German computer scientist H.P. Luhn came up with a method for automatically generating abstracts from scientific papers, along with many cool information theoretic ideas. I'd like to give you a brief rundown of how the algorithm works.

When you load a website, your browser sends packets to the site requesting information, the server creates a response addressed to you, and then the browser recreates all this disparate information into a single webpage.

JUN 6, 2010

As we talked about in our networking class, LZW compression is about as close as you can get to an optimal compression rate for large compression objects. The single biggest advantage of LZW is that unlike traditional Huffman Encoding, very little information needs to be transmitted from the compressor to the decompressor.
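
To make that last point concrete, here is a minimal Python sketch of LZW (my own illustration, not the code from the post). It assumes byte-oriented input and an unbounded dictionary, and the names `lzw_compress` and `lzw_decompress` are hypothetical. Notice that the decompressor only ever receives the list of codes: it rebuilds the same dictionary on its own as it goes, so no code table has to be sent alongside the data.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW codes (dictionary size is unbounded here)."""
    # Both sides start from the same 256 single-byte entries.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the match while the dictionary knows it.
            current = candidate
        else:
            # Emit the longest known match and learn the new sequence.
            codes.append(table[current])
            table[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes from the codes alone."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry the compressor had only just created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the compressor: learn previous match + first byte of this one.
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


sample = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(sample)
restored = lzw_decompress(codes)
```

A Huffman coder, by contrast, typically has to ship its code table (or the symbol frequencies) along with the compressed data so the receiver can decode it.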

I'd like to talk a little about, and give an implementation of, B-Trees, perhaps the best example of an algorithmic data structure designed specifically for 'real computers' rather than the theory of academia. Given that, it took me a while to figure out why on earth B-Trees are used. It's sort of like the Binary tree's uncoordinated half-cousin who looks slow, but turns out to be speedy when tested.