In Segment’s early days, we hit countless problems as a founding team. And at the time, I thought those problems were unique to our own special snowflake of a founding journey. I chalked it up to us being new grads and first-time founders.
As they say... some personal news. After nearly a decade of building Segment, I'm leaving to start a new adventure. My last day was November 2nd.
The best advice often seems blindingly obvious in hindsight. After taking it, you wonder how you ever survived without it.
Lately, a bunch of early stage founders have emailed me to ask for go-to-market advice. They have a real product, it solves a real problem for their handful of customers... so why isn't it selling?
For B2B companies, the most valuable products more or less offer a single value prop: you don’t need to ask permission to get your job done.
I gave a talk at Google Cloud NEXT, highlighting our use of BigQuery and Bigtable for building out our Personas architecture. In this session, I share a little bit about Personas, and dig into the Dremel and Bigtable papers to share why they work for our use case.
I recently gave a talk at our annual Synapse conference about ctlstore (control store), a new database that we’ve been developing internally at Segment. It’s designed to give good consistency guarantees to the writer, while providing good availability guarantees for readers.
In this talk, I share an actual walkthrough of a production incident. The tooling we used. What we thought was going on. How we diagnosed it over time. How we eventually got to the root cause. And how we postmortemed it afterward.
Everyone wants to build the next “rocketship.” We all want to start businesses with an incredible growth trajectory and massive impact.But when looking at the most successful startups, there’s a curious phenomenon that happens. A lot of early success often hurts a startup in the long run.
“What’s the biggest thing you’ve learned while building Segment?” I wish we’d written everything down.
This talk covers how we manage tfstate, separate environments, specific module definitions, and how use terraform to boot new services in production. I also discuss the challenges we’re currently facing, and how we plan to attack them going forward.
Recently, there’s been a lot of commotion on twitter and in #node.js about the new streams2 API. The official stream docs leave a lot to be desired, which has lead to general confusion. That’s too bad, because using new streams can really simplify your code once you understand how they work. Let me take you there…
A little over a year ago, I tore my ACL playing ultimate frisbee. I made a quick cut to lose my defender, and as I planted my foot to change directions, I felt a sudden pop in my knee. Next thing I know, I’m on the ground because my knee had given out.
At Segment.io, we deal with a lot of important user data on a daily basis. Consequently, our top priorities are that we don’t lose your visitor’s data and that our incoming API stays available at all times. As you might guess, all of the data that comes in has to be validated against our database. Our API servers maintain a connection to the DB to check that incoming requests are actually good.
When my co-founders and I first started on our startup a little over a year ago, we asked other startups about what to database they were using. Nine out of ten people all had the same response: “Just go with Mongo.”
At Segment.io, we’ve been using node.js with the express framework for about 8 months now. It’s simple, doesn’t prescribe too much, and is way less verbose than the same java code. Over that time, I’ve discovered a few patterns and conventions which make my code significantly cleaner and easier to follow. Here they are.
One of the best parts about mongo is its entirely schemaless nature. We can store any kind of data we’d like without having to worry about where that data comes from or how it should be laid out. However, it’s sometimes tricky to index that data in a way that will be performant using mongo’s schema-less nature.
Today, that subject is upstream caching, hints in the HTTP headers that developers can give to routers and content delivery systems that intermediate between client and server.
German computer scientist H.P. Luhn came up with a method for automatically generating abstracts from scientific papers, along with many cool information theoretic ideas. I'd like to give you a brief rundown of how the algorithm works.
When you load a website, your browser sends packets to the site requesting information, the server creates a response addressed to you, and then the browser recreates all this disparate information into a single webpage.
As we talked about in our networking class, LZW compression is about as close as you can get to an optimal compression rate for large compression objects. The single biggest advantage of LZW is that unlike traditional Huffman Encoding, very little information needs to be transmitted from the compressor to the decompressor.
I'd like to talk a little about and give an implementation of B-Trees, perhaps the best example of an algorithmic data structure designed specifically 'real computers' rather than the theory of academia. Given that, it took me a while to figure out the why on earth B-Trees are used. It's sort of like the Binary tree's uncoordinated half-cousin who looks slow, but turns out to be speedy when tested.