The history of servers, the cloud, and what’s next – with Oxide

Dec 17, 2025 · 1 hr 39 min

Summary

This episode features Bryan Cantrill, co-founder and CTO of Oxide, tracing the evolution of computing infrastructure from the dot-com boom and bust to the modern cloud era. He delves into how Sun Microsystems dominated early web infrastructure, the unexpected innovations spurred by economic downturns, and the profound impact of open source, x86, and AWS. The conversation also explores the strategic shift towards custom, on-premise hardware for hyperscalers, Oxide's unique approach to designing systems from first principles, and their transparent, uniform compensation model. Bryan also shares his perspective on the practical uses and limitations of AI tools in both software and hardware engineering, emphasizing the enduring need for human ingenuity, diverse teams, and continuous learning in a rapidly changing tech landscape.

Episode description

Brought to You By:

• Statsig – The unified platform for flags, analytics, experiments, and more.

• Linear – The system for modern product development.

How have servers and the cloud evolved in the last 30 years, and what might be next? Bryan Cantrill was a distinguished engineer at Sun Microsystems during both the Dotcom Boom and the Dotcom Bust. Today, he is the co-founder and CTO of Oxide Computer, where he works on modern server infrastructure.

In this episode of The Pragmatic Engineer, Bryan joins me to break down how modern computing infrastructure evolved. We discuss why the Dotcom Bust produced deeper innovation than the Boom, how constraints shape better systems, and what the rise of the cloud changed and did not change about building reliable infrastructure.

Our conversation covers early web infrastructure at Sun, the emergence of AWS, Kubernetes and cloud neutrality, and the tradeoffs between renting cloud space and building your own. We also touch on the complexity of server-side software updates, experimenting with AI, the limits of large language models, and how engineering organizations scale without losing their values.

If you want a systems-level perspective on computing that connects past cycles to today’s engineering decisions, this episode offers a rare long-range view.

Timestamps

(00:00) Intro

(01:26) Computer science in the 1990s

(03:01) Sun and Cisco’s web dominance

(05:41) The Dotcom Boom

(10:26) From Boom to Bust 

(15:32) The innovations of the Bust

(17:50) The open source shift

(22:00) Oracle moves into Sun’s orbit

(24:54) AWS dominance (2010–2014)

(28:15) Kubernetes and cloud neutrality

(30:58) Custom infrastructure 

(36:10) Renting the cloud vs. buying hardware

(45:28) Designing a computer from first principles 

(50:02) Why everyone is paid the same salary at Oxide

(54:14) Oxide’s software stack 

(58:33) The evolution of software updates

(1:02:55) How Oxide uses AI 

(1:06:05) The limitations of LLMs

(1:11:44) AI use and experimentation at Oxide 

(1:17:45) Oxide’s diverse teams

(1:22:44) Remote work at Oxide

(1:24:11) Scaling company values

(1:27:36) AI’s impact on the future of engineering 

(1:31:04) Bryan’s advice for junior engineers

(1:34:01) Book recommendations

The Pragmatic Engineer deepdives relevant for this episode:

Startups on hard mode: Oxide. Part 1: Hardware

Startups on hard mode: Oxide, Part 2: Software & Culture

Three cloud providers, three outages: three different responses

Inside Uber’s move to the Cloud

Inside Agoda’s private Cloud

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@pragmaticengineer.com.



Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Transcript

Intro

Can you tell us about the dot-com boom? We did much more technically interesting work in the bust than we did in the boom. There's a degree to which innovation requires some level of desperation, and in good economic times it's kind of hard to summon that desperation.

How have AI tools changed how you're working at Oxide? Certainly we're using Claude Code a bunch and people are doing that. But for a lot of the work that we're doing, it is helpful as maybe a polishing tool, but less at the epicenter of its creation.

Can you tell me what it actually means to design or build? Oh, it's very involved. Yeah, it's very involved. So first of all...

How have servers and cloud infrastructure evolved since the late 1990s, and what is next? Bryan Cantrill was a distinguished engineer at Sun Microsystems during the dot-com boom and dot-com bust, built a small competitor to AWS called Joyent, and is now the co-founder of Oxide. Today, we go into the history of servers and the cloud from the late 1990s to today, the challenges of building hardware like the Oxide computer from scratch, how the Oxide team uses AI and where they find it practically useless for hardware engineering challenges, why Oxide builds everything as open source, how they manage to work remotely as a hardware startup, and much more. If you'd like to understand more about how the cloud works and learn how a nimble hardware plus software startup operates, this episode is for you.

This podcast episode is presented by Statsig, the unified platform for flags, analytics, experiments, and more. Check out the show notes below to learn more about them and our other season sponsor. So Bryan, welcome to the podcast. Oh, it's great to be with you. Thanks for having me. I'd love to jump back in time a lot, back to the 1990s, because you're someone who's been around the block. And back then, you worked at some interesting companies, including at Sun.

Computer science in the 1990s

And if you could give us listeners and viewers a sense of what it was like in the 90s in terms of software and servers: what was the vibe like? Yeah, it was an interesting inflection point because I was interviewing in 1995. I started in 1996. So I would say that the internet, I mean... HTTP had been developed in like 93, 94. We had kind of the first web browsers, but it was still very, very, very new.

And the internet was just kind of primed for takeoff. Java had come out in maybe 1995. Java had kind of taken off immediately. So there was a lot of really exciting energy, but it was nowhere near what it would become even a couple of years later. It became very frothy, of course. And it was exciting. It was very clear to me. I went to school, actually, on the East Coast.

Just coming out here to Silicon Valley, the energy was extraordinary, and I really knew that I wanted to come out here for my career. So at Sun, those next couple of years, I mean, I got very lucky, really, because Sun was in the right place at the right time with the right technology, which, you know, sometimes you only appreciate in hindsight because it was so explosive. And if you wanted to build a website as part of that dot-com boom, you were buying Sun servers, you were buying Cisco switches.

Now, why was this the case? Because again, just taking myself back, just being a bit naive, I would assume that, let's say I'm in 1995 and I want to build a website.

Sun and Cisco's web dominance

Could I not just use a PC and spin up a server? Did it not work like that? Or how did it work? I mean, a PC, like, maybe, but you didn't really have an operating system, right? Because Linux is... Linux is very, very new. Linux is not... I'll back down. Oh, yeah, definitely. Linux is, you know, what would be like Haiku today, which is an operating system you haven't heard of for a reason. It's kind of like a hobbyist operating system. You know what I mean? You'd be like...

What? No, you wouldn't, right? And then you kind of had the BSDs. FreeBSD was certainly out there, also still very much under the shadow, though, of this lawsuit from AT&T. So among the Unixes, there were not really open source operating system options. And actually, this was kind of funny, because where was the GNU option?

It was going to be the Hurd operating system. So Hurd was kind of like the Duke Nukem Forever of its time. It was the operating system that was constantly coming next year and next year and next year. And it was going to be microkernel-based. So, you know, it's kind of amazing, but you really couldn't do it on PCs because of the lack of system software. And actually, part of my attraction to Sun was...

I had used Solaris on SPARC, and I knew Solaris existed on x86, but I had never used it. So I was excited to use Solaris on x86. And so what did Sun build? You mentioned Solaris. That was the operating system, right? Solaris is the operating system. We built servers. So we built SPARC-based servers. We built desktop machines. Sun was a computer company. It was a systems company.

So we built desktop machines, built some ill-advised laptops. So basically desktop machines, workstations. But then at that time in the 90s, what was really exploding was everything from those kind of workgroup-style servers up to bigger and bigger servers, up to very large machines. Machines that are physically the same size as what Oxide makes today. And I remember vividly, in what would have been like 97, 98 maybe, Greg Papadopoulos, then the CTO of Sun, addressing the entire company, saying: here are the top three applications for Sun Microsystems: databases, databases, and databases. So that gives you an idea of how it was being used. And this is, again, in that knee of the dot-com build-out, where if you wanted to really build a web presence, you were going to use Java, you were going to do it on Solaris, you were going to do it on Sun servers.

And it was a wild time, for sure. And can you tell us about the dot-com boom? Because right now, I know AI is pretty exciting, and it feels like we're in a special time. But what was it like, especially working at Sun? It sounds like it was the epicenter of it.

The Dotcom Boom

And you know what was funny is, it was frenetic in a way that was not always positive. So one of the things that is just a point of fact, and one can take from it what one will: we did much more technically interesting work in the bust than we did in the boom. Because I think that when you're in boom times, you know, everyone kind of secretly believes that this is because of me.

It is because of the thing that I am working on. One of the early technologists behind Java once told me with a straight face, every server that Sun sells, they sell because of Java. And I'm like...

You know what's most amazing? That you believe that is actually the more interesting fact. I mean, it is obviously false, especially with, you know, databases, databases, databases being the top three applications. But that kind of reflects the zeitgeist of the time, that everyone believes that this is...

You know, if I work on the microprocessor, it's because the microprocessor is perfect. If I work on the operating system, it's because, oh, this is the operating system that people are buying the machine for. And that doesn't really lend itself to real innovation, I think. I think there's a degree to which innovation requires some level of desperation, and in good economic times it's kind of hard to summon that desperation sometimes.

So during the boom, it was frothy, and it felt like there was a period of time where I'm like, this obviously can't go on forever.

And, you know, The Economist is having these very, like, gloomy covers about how this is all going to end and it's going to be an apocalypse, which I believed. And then I just stopped believing it. I'm like, well, maybe The Economist is right. It just went on longer. And, you know, one of my early life lessons from the boom and bust is that these things go on longer than you think possible.

In terms of the growth? In terms of the boom. When you're in frothy times, that boom will go on longer than you think possible. And when it switches, it will collapse faster than you can fathom. In the boom, do I understand correctly that customers were just wanting to buy your servers? They were flying off the shelves? Oh, absolutely. Everything. And in day-to-day work, what would it have meant for you?

So I'll tell you, day-to-day, it meant, first of all, that traffic was terrible. You couldn't get housing. Everything was in short supply. Customers are, you know, they are...

We had a customer that was going to buy 19,000 servers, which is obviously a very big number. And these were these massive big servers, right? Yeah. Well, in that case, those were actually 1U servers to build out a broadband initiative. That actually was a company called Enron. You know, I remember vividly we were at a dinner here in the city at a restaurant called Aqua, which is a very kind of fancy restaurant, long since out of business.

And I don't think Aqua survived the bust. And we were at Aqua with a bank who was a customer of Sun's. And they were spending a galactic amount of money every year with Sun. And we were at a dinner. And I just remember, I mean, it was the kind of like 19th century Gilded Age kind of dinner. People are ordering, you know, nine courses. What I remember is at the end of that having Château d'Yquem, which is a Sauternes.

So I know very little about wine. I know nothing about Sauternes. What I did know is there was someone who knew wine, and it's like, we are all going to drink the 1952 Château d'Yquem Sauternes, which is...

And I remember, I'm not much of a drinker, but I was too drunk at that point to really appreciate it. So I have had this Sauternes that, you know, oenophiles kind of live their life to drink. And I'm sad to inform you that there's one less bottle of this precious vintage because it was poured down the gullet of a 20-something dot-commer who really had... And I just remember being back in my apartment in Potrero Hill, being literally drunk on Château d'Yquem, and thinking to myself, this can't last. This is not sustainable.

And I swear, the dot-com boom turned to a bust, like, that night. That is September of 2000. So Pets.com had kind of busted out and a bunch of the NASDAQ had busted out early in 2000. The traffic got lighter early in 2000. Anyone who was here would be like that. The absolute spookiest thing is it went from, like, gridlock to, like, COVID-like traffic in the span of like a month.

Without COVID happening. Without COVID happening. With only the NASDAQ collapsing. And you're like, okay, that's very odd. And then 2000 kind of muddled along. And then that dinner was in September of 2000.

From Boom to Bust

And what really stopped was the telco build-out. So there was a lot of telco build-out because people are like, the internet is the future. And telco build-out, meaning the towers, the servers? The servers, the infrastructure, and then all of the concomitant, the fiber. Like, JDS Uniphase was a huge company. You had these companies like, you know, Global Crossing and MCI WorldCom.

And all these companies were explosive. And everyone believed that the internet is the future. And this is like an important thing. And they were right. They were right. Bryan just said how an important lesson from the dot-com boom was that the people who believed the internet would be the future were right. Today, we're in a similar stage with AI. It's pretty likely that AI will be part of the software stack in the future, even if timing is harder to predict.

The latest shift is how AI agents are becoming a lot more commonly used for development. And this is a great time to talk about our season sponsor, Linear, and how they think about collaborating with agents. Linear has taken an interesting approach here. Instead of building one proprietary AI assistant and locking you into it, they built an open API and SDK that lets any agent plug into your issue tracker.

That means you don't need to wait for Linear to build the features that you need. You can connect the best coding agents on the market, like Cursor, GitHub Copilot, OpenAI Codex, and Devin, or you can build your own agent for your team-specific workflow. It's a fundamentally different approach from most issue tracking and project management tools on the market. You get optionality. And the experience is surprisingly natural. You assign an issue to an agent the same way you assign it to a teammate.

Or you can simply mention the agent in an issue thread. Cursor can then pick up a bug, understand the context from the issue, and open a PR. Codex can explore a fix while you're focused on something else. Sentry can run root cause analysis when something breaks. It's pretty powerful what you can get these agents to do. And here's what I like: you, the human, stay the accountable owner. The agent works for you, not instead of you. You review the work.

You decide when it's good and when it ships. If agents are going to be a part of the tools that are building software, and it feels to me they increasingly are, you'll want a system that's actually designed for them. Linear is a system like this. To learn more, head to linear.app/agents. And with this, let's get back to the point where Bryan was saying how those believing the internet would be the future back in 2001 were right.

This is the other thing. It's like, they're right. And so a very famous impact crater from the dot-com boom is Webvan, right? Webvan was delivering groceries, which many people today are going to get their groceries delivered, right? Right, right. And Instacart. It's like they weren't wrong. But their timing was off, and they lost track of the underlying economics completely. And so when it busted out, in the fall of 2000, in November of 2000 in particular, there were zero orders from telecoms at Sun. Like, it went to zero. And, you know, you're kind of used to ups and downs, but that's just off a cliff. And from that point, you know, going 2000 and then... it was then very, very grim. I would say that the thing that happened through the bust was layoff after layoff after layoff, because companies had kind of built themselves and geared themselves around these fat times lasting forever.

And now they were gone. And as frothy as expectations were during the boom, they were that much more negative in the bust. People were like, it's the end of days. And were you a software engineer back then? Yeah, a software engineer. Yep. So as a software engineer, both you and also thinking about your colleagues back at the time or friends, how did it impact you? Were you kind of just chugging along? So I would say that lots of people left.

And you had, like, the statistic that, you know, the U-Hauls were 10 to 1 out of the Bay Area. So you... They moved away. They moved away. And the thing that I noticed is that the people that had moved out to Silicon Valley because they really had an interest in the technology all were there, all stayed, and were not adversely affected, honestly. I mean, yes, we...

Every one of us, if you had equity in your company, which of course you all did, you try not to overthink it, right? You just try to remind yourself, like, I never had it to begin with. So, you know, but it's definitely gone. Sun lost 98% of its value. So it's definitely gone. And, you know, there was some thinking, and I think also a boom can get you to care about things that you actually don't care about. Because in a boom, everyone is so financially driven that it's hard not to become financially driven. But it's like, that's actually not why I got into this.

And so during the bust, you know, I was definitely able to put a meal on my table and keep a roof over my head. But it was really a reminder about what's important. And again, we did do better technical work in the bust than we did in the boom. And I think it's because in the bust, it's like, okay, now we really have to focus. We have fewer resources, and the fewer resources actually forced more creativity.

The innovations of the Bust

So, you know, all of the things that we did, certainly speaking of Sun and system software, so ZFS and DTrace and service manageability, all these things that were really revolutionary for the operating system all happened in the same kind of post-bust period of time. So all of those things happened from 2001 to, say, 2005. And so...

What were these specific innovations? So I'd gone to work at Sun to work with Jeff Bonwick. And as long as I'd known Jeff, from the mid-90s, Jeff had wanted to rethink file systems. And now finally in the early 2000s, he and Matt Ahrens were able to really take a clean sheet of paper to the file system, and that's ZFS. I had a chip on my shoulder about the way we understand and debug systems, about the way we observe systems.

So I, along with two other colleagues, did DTrace, which allowed us to dynamically instrument systems. And you can kind of go down the line, and there were a bunch of things like this. And I don't know that all of this is related to the bust. It's just that the timing lined up such that it was all happening during the bust. And what we ended up with was a whole bunch of interesting technology coming together, actually, in a single version of the operating system. And then...

Very fortunately for us, and I do think this is a bit of a consequence of the bust, because Sun was definitely open to new approaches, we open sourced the whole operating system. So that happened in 2005. And that was very important to give these kind of technologies eternal life. But I think we can never predict the future. But to me, it is pretty positive, in the sense that even in the bust, hearing these stories, innovation did not stop. Sure, you know, it sounds like it was probably harder to get jobs and there might have been fewer of them. But, you know, the industry kept innovating. And what you said, I didn't expect to hear: that it was a bit easier to innovate. It's just less manic. We were able to focus more. And so, I mean, not that one should necessarily pine for a bust, because busts are brutal.

But there is a clarity that you get too. So, I mean, ideally you would like to have just, like, can we just be normal economically? But like, nope, apparently in high tech, we've got to be on or off. So bust aside, in the early 2000s, leading up to this internet boom, the way most companies went about it was buying Sun servers with Solaris installed. Hardware and software came together, it was beautiful, it worked well. Again, I heard this from folks who did it.

The open source shift

What happened then? Because when I got into tech in 2000, I did not hear about Solaris, and that was not how it was done. What was the shift? So the shift was, first of all, open source. So as we said, in the mid-90s, Linux was still very much a hobby project.

Not so by the 2000s, right? It grew up. It grew up, absolutely. And it grew up because you had a bunch of companies that really backed up the truck. And, you know, at first, IBM and SGI, Data General, some other companies, those companies were very important.

Because they decided to contribute their technologies, like XFS, right? XFS, which many people still use today on Linux, that's from SGI. XFS was SGI's, on IRIX. That was happening in kind of the late 90s. And then in the 2000s, I mean, Google was always built on Linux, right? And so the companies that became that next boom were all built on open source and indeed needed to be built on open source. So they economically relied on open source to be able to build what they built.

So then it became much more practical to certainly run Linux, and I think the BSDs, and we open sourced Solaris. So there were a lot of options that were now available. So that shifted. I think the other thing that shifted is that, I mean, SPARC bluntly lost to x86. And SPARC is a hardware architecture? SPARC is a microprocessor. And there was a time in the 90s when if you wanted the fastest microprocessor, it was a RISC microprocessor.

It was a SPARC microprocessor, or it was MIPS, or it was Alpha. And x86 was a commodity, obviously available with a personal computer, but was not faster than those RISC microprocessors.

That shifted in the late 90s. And, you know, because we ran the operating system, Solaris, on both SPARC and x86, we could see how fast these x86 machines were, and could see, frankly, how, you know, you talk to the microelectronics folks, they kind of dismissed x86 and dismissed Intel. And you shouldn't do that. In particular, Intel was very focused and architected their way around what was called the memory wall. And in part because they used speculative execution, they were able to actually make these microprocessors that became much faster than the RISC microprocessors. So by the time, say, you are in 2004, 2005, if you want a leading-edge microprocessor, it's x86.

So that was a big and important shift. So by the time you're coming up, it's like, okay, yeah, if I want this, I'll just, I don't know, get like a Dell box or a Supermicro box.

And then I'll put Linux on it or maybe FreeBSD and away I go. Then the next big and important shift started in 2006. You could argue with S3, but then especially in those next kind of '07, '08, '09, with the introduction of EC2, now you have the cloud that starts to come into play, and now people are like, why would I even screw around with the server at all? I mean, it was so great to be able to just spin up infrastructure.

Yeah, I remember at one of my early companies, mid-2000s, we had a server room, we had server administrators.

The server room was always hot. And this was a small company, mind you. This was not a big one. Every company needed to do that. It's kind of amazing to think that every single company, no matter if you were a website, you had your own server room. And if you were a dev, you wanted to be friends with the server admin, because when you wanted to deploy your stuff, they could do stuff for you. They could do stuff for you. That's it, totally.

And so I think that cloud computing was really important. This is not a deep thought, that elastic infrastructure was really important. The ability to have API-driven infrastructure. And so for me personally, I was at Sun, and then in 2006, I started a storage group inside of Sun, which was great, a really successful group.

Oracle moves into Sun's orbit

But so successful that it actually attracted Oracle as a customer for the first time in a long time. This is like a little bit of residual shame that I have: did I attract the marine apex predator that ate the company?

Oracle bought Sun, and that closed in early 2010. I left shortly thereafter because I could see what Oracle was. Well, I never heard the story of your potential role here. Right. So, yeah. And maybe a year later, in 2011, I gave a talk with some rather unvarnished opinions about Oracle and Larry Ellison. In particular, I cautioned people about anthropomorphizing Larry Ellison. You have to treat Larry Ellison as a machine.

Like a lawnmower. You stick your hand in the lawnmower, it'll chop it off. All right, so I'm giving this talk in 2011. Again, this is after I've left what was then Oracle. And, you know, I was just saying things that I felt were obvious, but the audience is kind of gasping, and people are coming up to me after the talk, like, do you think there's going to be retribution from Oracle?

No, no, you're misunderstanding. The lawnmower's not angry at you. It's a machine. It doesn't have the mirror neurons to be... It would almost show me that I'm wrong for Oracle to resent what I'm saying about them. Anyway, so all the videos for that conference go up, and my video does not go up. Oh, right.

Okay, and so my colleagues were like, this is an Oracle conspiracy. I'm like, this is not an Oracle conspiracy, which it wasn't. It wasn't orchestrated by Oracle. But what I underestimated was the fear of the conference organizers. They themselves were terrified of offending Oracle. Yes, even though it probably would have been fine. Now, the talk did finally go up. Before the talk starts, there is a disclaimer: the views in this talk do not represent the views of the USENIX Association.

And you're like, all right, I get it. Like, I've never seen this disclaimer before, but fine. Then during the talk, you know, the format of the talk is you've got a slide and then you've got like a little blank strip and then you've got this talking head in the little right corner. So there's like kind of this dead space above the speaker.

They took this disclaimer and they re-justified it and they put it above my head the entire time I'm speaking. And maybe in this regard they were prescient, because to this day, if Ellison is mentioned on Hacker News or Oracle is mentioned on Hacker News, someone will immediately cite minute 33 of this talk, which is when I go on this kind of Oracle...

Again, I don't view it as a rant. I view it as just me describing what is obviously true, that we all know. But anyway, I had left. I left Oracle after they bought Sun. We're now around, like, the 2010s. So cloud has taken off. The x86 architecture is everywhere. Linux is now winning both for small-time servers and on the cloud.

AWS dominance (2010-2014)

And then what happens? This was an interesting time when Google started to figure out that, hey, they could do something interesting on their cloud, right? Yeah, that's right. So this is still a little bit before that. So this is in kind of from, I would say, from 2010 to about 2014 is when... is a period of relentless execution from AWS. AWS is executing so extremely well.

there are not really other public cloud options there's like kind of azure's kind of drifting out there people forget that that you know like gcp on paper has been around from 2009 but up to like 2014 it was like It was almost like a joke. It was a joke. I would say before it existed, but it was a joke. And in particular, at every single re-invent, Amazon would announce a new price cut. And if you were a competitor... to aws you are like dreading reinvent because here comes another price cut

If you are a partner of AWS, you're dreading reInvent because here comes the announcement of a new service that competes with what you're making. I think people who have not been around have forgotten, but it really has happened because it's not been the norm the last, like, let's say five, 10 years or so.

And in particular, they did a couple of things that are just like, man, you got to tip your hat to just, I mean, Jeff Bezos is the apex predator of capitalism. Like Larry Ellison may be the lawnmower, but Bezos is ultimately the apex predator, because the thing that was so impressive is they were able to give people the idea that this was a terrible business. So in particular, they did not break out their financials. So everyone's like,

Oh my God, what an awful business. Like they're cutting the price every year. Like you do not want to, like this is a classic red ocean. It's bloody. You don't want to compete. And so we were at Joyent. We were actually competing head to head with AWS. So you were offering a public cloud. So we have a public cloud and then, unlike AWS, taking the software that we'd used to run the public cloud and making it available for people that wanted to run a cloud on-prem on their own hardware.

So people that would buy Dell or HP or Supermicro, they would buy our software and they would run it on there and get a cloud. So we ran a public cloud and we knew what the economics of a public cloud were. Namely, pretty good. Margins were good.

And so what we knew that Amazon wasn't volunteering, but what we knew is that AWS S3 was underwriting a war on big box retail. S3 was paying for your Prime shipping. It was a genius move. And so there's also some insider information that you had because you did your own thing. Well, we did know that the margins were very good. And then, of course, I mean, we did,

you will be unsurprised to learn that several of Joyent's most prominent customers were retailers. This was not lost on retailers. Retailers were like, gee, I wonder what's happening. Retailers are like, if you think I'm going to take my dollars and spend them on AWS so Amazon can go to war with me? Like, no thank you. There was a period of time when it felt like in order to be in the cloud, you had to implement every AWS API.

So there's this idea that you had to be API compatible with EC2. There's a company called Eucalyptus that tried to do this. It was just a disaster. And part of the reason it was thought that GCP and Azure could never compete with AWS was that they could never be API compatible. And so I am convinced that the, because what changes? What changes is like 2015? What starts in 2015? Kubernetes. And I think that part of that initial attraction to Kubernetes

Kubernetes and cloud neutrality

is that people wanted to get some optionality around their cloud. And they felt locked into AWS. They're like, I'm not using all this stuff. I'm not using Elastic Beanstalk. I'm not using Greengrass. I'm not using Redshift. What I actually want is this kind of basic infrastructure, and Kubernetes now gives me this layer.

upon which I can deploy and get some sort of true cloud neutrality. So multi-cloud didn't really exist, I would say, before Kubernetes. And I think a lot of that, especially the early momentum behind Kubernetes, is around this idea of like, I need to get some optionality in here. I want to actually be able to go to GCP. So I think, you know, and I don't.

I think it's giving Google slightly too much credit, but only slightly too much credit, to say that it was a masterstroke. On the podcast, I had Kat Cosgrove, who's a release project manager on Kubernetes. And she's been in the project for a long time. And I asked her, she was never a Google employee.

But I asked her, why do you think Google open sourced Kubernetes, which, you know, they have Borg, which is amazing. And they kind of built on a better version for external. And they just released it just like that. They put a lot of work in it. And to me, it didn't really compute. Like, why would Google?

Like, what is the business reason? And she told me that she thought, again, speculation from the outside, that they probably thought that it would help Google Cloud. That's right. To have a container which is now portable, and now you can give the promise that if you run this on Azure, and especially AWS,

you could come over. So it kind of makes sense. Is this your thinking? Yeah, absolutely. But I think that is definitely the argument that Kubernetes proponents would make inside of Google. In terms of like why they did it, nobody prevented it. You know what I mean? It's like they kind of open sourced it. Google was a pretty cool place, but in the sense that it was very bottom-up, as I understand, back then still. And then I think part of their, you know, it was Craig McLuckie who really

pushed for the CNCF, the formation of the CNCF around Kubernetes to give it kind of a foundation home. I do remember one conversation with Craig early on, as he's contemplating the CNCF. And he's like, well, I think this is going to allow Kubernetes to get the marketing dollars that it needs.

Don't you work for the most profitable company on earth? Like, isn't it just gushing cash over there, and you can't get, you know, a couple million bucks for marketing for this thing? But no, apparently you can't. So I think that the argument that people were making internally was about:

We should be encouraging cloud neutrality because we are the ones that have something to win. And they're right. And they did. And GCP is now not an afterthought. GCP is very important. It's a very big business. And I think that they've got Kubernetes to thank. Solely, no, but I think it's played an important role for sure.

And where are we today in terms of the hardware and the software stack, specifically thinking of these big clouds? What's happening inside the likes of Meta, these giants? As I understand, you know, they're no longer just, like, ordering servers from Dell or, or where...

Custom infrastructure

Never were. Never were. Never were. Never were. What did they do? So it's kind of funny because for all of these folks, they took a somewhat similar path. They never were, because in Google's earliest days, they were assembling machines from Fry's, you know, RIP Fry's. Fry's being a local electronics shop that has long since disappeared.

But they were kind of famously Velcroing machines together and finding. So they bought like the processor, the different networking switch, whatever. And they had this idea that like it doesn't matter what junk we run on because, you know, our software is going to run as a distributed system. It actually doesn't matter. We don't need ECC protected memory because it doesn't matter if your DIMMs fail. And so I think they learned, well.

It does matter a little bit, if your DIMMs have rampant data corruption. Like DIMMs failing, that's actually not a problem. DIMMs, your memory, returning the wrong thing, like, that is a problem. You turn that, like, next thing you know, your software inserts that into a row in a database. And, like, yeah, now you got it. Yeah, but this correctness is a problem. Yeah, correctness is a problem. It's like, okay, overshot the mark. So by the time they're like, okay.

We're not going to Velcro machines together. But by that point in time, the business was established enough that they actually built the machines that were fit for scale. They have a great book that was written in kind of the mid-2000s, The Warehouse-Scale Computer, where they talk about all the things they did, the DC bus bar, really thinking about power across the entire DC. So they went from being kind of too cheap for Dell or even Supermicro

to then being much better engineered than those systems ever were. So they were never really meaningful customers. And ditto for Facebook, Meta. They were never really meaningful. I mean, they kicked them out very early and did their own stuff.

Bryan just talked about how Facebook built their own servers because off-the-shelf solutions didn't work at their scale. And what's interesting is that companies like Meta and Google didn't just build better hardware, they also built incredible internal tools:

tools for safe deployments, feature flagging, experimentation, debugging, analytics, the whole stack that lets teams ship fast and with confidence. Most companies never get access to this level of infrastructure. You either build it yourself, which takes years and large engineering teams,

or you make do with scattered tools that don't talk to each other. That's exactly where Statsig comes in. Statsig is our presenting partner for the season, and they give every engineering team access to the kind of tooling that only the biggest tech companies used to have internally.

At its core, Statsig is a toolkit for safer deployments and experimentation. You ship a new feature to 10% of users behind a feature gate. You validate that it behaves correctly, watch the metrics, and expand to the remaining 90% only when you're confident. And if something goes wrong, you can turn it off instantly, long before it affects everyone. And safe deployments require visibility. Statsig includes analytics,

both product analytics and infrastructure analytics, so you can actually see what your code is doing in production. Errors, performance changes, funnels, user behavior. Because you cannot ship safely if you can't see what's happening. Companies like Microsoft and Notion run hundreds of experiments per quarter with Statsig, velocity that used to require entire platform teams to build and maintain. This used to be infrastructure available to maybe 10 or 15 tech giants.

Now startups and mid-sized teams use Statsig to ship quickly without breaking things. If you want to give your engineering team world-class tooling from day one, go to statsig.com/pragmatic. There's a generous free tier, a $50,000 starter program, and affordable enterprise plans.
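The staged rollout described in the sponsor read (ship to 10% of users behind a gate, then expand) can be sketched in a few lines. This is a generic illustration, not Statsig's API: the function name `in_rollout`, the gate name `new_checkout`, and the hashing scheme are all assumptions made for the example.

```python
import hashlib

def in_rollout(user_id: str, gate_name: str, percent: float) -> bool:
    """Hypothetical percentage gate: hash the user into one of 10,000
    buckets and admit the first `percent` percent of buckets."""
    digest = hashlib.sha256(f"{gate_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100

# Hypothetical user population, only for illustration.
users = [f"user-{i}" for i in range(10_000)]
at_10 = {u for u in users if in_rollout(u, "new_checkout", 10)}
at_50 = {u for u in users if in_rollout(u, "new_checkout", 50)}
at_100 = {u for u in users if in_rollout(u, "new_checkout", 100)}
print(len(at_10), len(at_50), len(at_100))
```

Because the bucket depends only on the gate name and the user ID, a user who saw the feature at 10% keeps seeing it as the percentage grows, and setting the percentage to 0 turns the feature off for everyone.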

And now let's get back to the conversation about the history of computing and what might be coming next. And this was independent. So like both Google and Meta both came to the conclusion of like we should just build our own stuff. And Microsoft and Amazon all came to the independent conclusion. Because the scale at which they needed to run.

was not at all the scale at which Supermicro and Dell and HP were geared. What they were geared to do was to run the servers in your server room where you needed to know the devs, right? Where it's like, I'm going to have a little rack. It's going to have six servers. Then maybe it's got 12 servers. Okay, maybe we...

out of 24 servers, that's what they were designed to do. If you're like, no, I want to buy servers by the thousands because I've got a public cloud business. Like if you want to buy servers by the thousands, there is no product from those companies for you. And in very, very basic ways, well, like the DC bus bar. At every juncture, they've been designed to be a personal computer that you happen to be slapping many personal computers together.

but they're not designed to actually run infrastructure at scale. So, and that was happening inside of effectively all the hyperscalers. And Joyent, meanwhile, was bought by Samsung in 2016. Joyent was bought by Samsung because their cloud bill was off the charts. They bought you to bring it in-house. Yeah, and there was not a product they could go buy, so they went to go buy a company.

So you're like, wow. And then it's like, wow, that's a big AWS bill. It's like, yes, very big AWS bill. But then that was not a product or a company that was available for the next, what does the next Samsung do? It's like, well, that's one less company available to buy. So when we were contemplating the next thing in 2019, one of the things that we had seen is that – and we felt – we earnestly believed that one, cloud computing is the future of all computing, not a deep thought.

Renting the cloud vs. buying hardware

That elastic infrastructure, API-driven infrastructure, that is modernity, one. Two, you shouldn't be able to only rent that. You should be able to buy that, own it.

run it in your own data center. Why would you want to do that? Well, you might want to do that for risk management, for security, or for economics, because, you know, if you're at a certain scale, you'd rather own it than rent it. And I think, you know, before Oxide, or like in 2019 or even in, like, 2020, 2021, if you were a mid-sized company, you know, not big enough to build out your own custom cloud and build everything that the hyperscalers did,

you could buy some off-the-shelf, like HP or Dell, a bunch of them. I think that's what Basecamp did. I think they posted that they bought a bunch of these things, they rented space in one of these shared, or I think two different, locations, they put in their boxes with all the memory, and then, you know, they kind of set it up and put it together. So I guess

those were the two options, right? Yeah, those are the two options. And I think that, you know, Basecamp ended up being a real poster child for the economic advantage because, I mean, DHH, you know, obviously outspoken, and the economic advantage was really, really, really clear. They're also at a scale which is like not the scale that we're targeting, right? The scale we're looking at is a much larger scale.
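The own-versus-rent argument here comes down to a break-even calculation. As a rough sketch, with every number below invented purely for illustration (none of them come from the conversation, from Oxide, or from Basecamp):

```python
from typing import Optional

def breakeven_months(purchase_cost: float,
                     monthly_owned_cost: float,
                     monthly_cloud_bill: float) -> Optional[float]:
    """Months until buying hardware is cheaper than renting.

    Returns None when owning never pays off, i.e. when the monthly cost
    of running owned hardware is not below the monthly cloud bill.
    """
    monthly_savings = monthly_cloud_bill - monthly_owned_cost
    if monthly_savings <= 0:
        return None
    return purchase_cost / monthly_savings

# Hypothetical figures: $1.2M of hardware, $20k/month to run it
# (space, power, people), versus a $70k/month cloud bill.
print(breakeven_months(1_200_000, 20_000, 70_000))  # 24.0
```

The shape of the formula is the point: the larger the gap between the cloud bill and the cost of running owned hardware, the sooner the purchase pays for itself.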

And so the economic argument is actually even more compelling when you're at that larger scale. I love it when, you know, the VCs that passed on us because they felt there was no market then would send me like the DHH blog post. It's like, why are you sending this to me? I should be sending this to you. Like, I know this. We just knew the economics of it. And we knew, couldn't predict exactly what the trends would look like,

but believed that there would be folks that were born on the public cloud that would outgrow the economics of the public cloud and want to go on-prem. Economics aside, what does it take to build one of these things? And I saw one of these things. We'll put in a picture of it. It's like a proper, you know, nine-feet-tall rack. It's big. It's

It feels like you're putting like, I don't know, like 16 or 32 of those Dell things in terms of size, just to get a sense of it. Yeah, we put 32 compute sleds in there. And what does it take? What did it take to actually build? What did you need to design in terms of hardware, and then software? Yeah, and we knew this, too, going into the company. We knew we were taking a clean sheet of paper, right? And so we were deliberately like, no, we're going to start with a problem.

We're not going to build it out of Dell HP Supermicro. We are going to start with a problem. And how do you best solve the problem? And as it turns out, like there were a whole bunch, there's a lot of technical debt that had been accrued by this kind of PC ecosystem. I mean, you know, where do you start? Just on the environmentals, like on power, right? The fact that you've got AC power in each of these Dell, HP, Supermicro. Yeah, so if you like put 16, you have like 16 separate AC power.

Times two, because you have two power supplies per 1U or 2U chassis. By the way, there are two fans sitting on those power supplies. And those fans are actually what wear out. In terms of the whirring fans, it's not just coming from the computer, it's coming from the power supplies. Because those power supplies are dense, they're packed with stuff.

So they've got to overcome a huge amount of static pressure. So that's not the way anyone does it at scale. What people do at scale is you've got a DC bus bar. You've got a power shelf that is much more efficient, that rectifies from AC to DC, and then you run DC up and down, and then you blind mate into that. So we knew we were going to do that. So that's a lot of electronics engineering right there. Yeah, yeah, yeah. Power engineering for sure. And we knew we were going to do that.
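To put numbers on the power point: with the 16 servers and two power supplies per chassis mentioned above, the count of separate AC-to-DC conversion points adds up quickly. A small sketch, where the power-shelf rectifier count of 6 is an assumption for illustration and not a figure from the conversation:

```python
from dataclasses import dataclass

@dataclass
class RackPower:
    description: str
    ac_to_dc_supplies: int  # separate AC-to-DC conversion points in the rack

def per_server_supplies(servers: int, supplies_per_server: int = 2) -> RackPower:
    """Conventional rack: every chassis carries its own redundant AC supplies."""
    return RackPower("per-server AC supplies", servers * supplies_per_server)

def shared_power_shelf(rectifiers: int) -> RackPower:
    """Bus-bar rack: one shelf rectifies AC to DC for every sled.
    The rectifier count passed in is a hypothetical value."""
    return RackPower("shared power shelf + DC bus bar", rectifiers)

print(per_server_supplies(16))  # 32 supplies, each with its own fans
print(shared_power_shelf(6))    # 6 is an assumed rectifier count
```

Each of the per-server supplies is also a small, dense box with its own fans, which is where the wear described above comes from.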

We also knew that by taking a clean sheet of paper that we would have opportunity made available to us that we weren't necessarily thinking of. And that manifested pretty early. So we blind mate into.

power, which is to say that when you feed a sled in, that power connector, you don't see it. It's at the back. You lock the sled in and it blind mates into power. And we had assumed that we were going to do what Facebook and Google and others had done, Amazon had done, and had networking out the front in the cold aisle.

But as we were, you know, taking a clean sheet of paper, talking to some connectivity vendors, they asked us like, why are you, wait a minute, you guys are like taking a clean sheet of paper. Why are you putting cabling in the front? Like, why wouldn't you also blind mate in the network and the network connection?

And we were like, can you do that? They're like, oh, you can definitely do that. It's like, well, why don't the hyperscalers do that? It's like, oh, they would all tell you that if they could start over today, they would blind mate the networking.

And they're just too afraid to do it at this point, which is like, I mean, that was like catnip for us. You know, like they're too afraid to do it. Like, okay, we got it. And one of the very early, holy God, we're going to bet the company decisions was blind mating networking.

Because if blind mating networking doesn't work, you've got nothing. You don't have a product. And so what is the difference in blind mating networking versus... It means there is no cabling in the system at all. So when you've got a sled, you are blind mating into a cabled backplane. So it's cabled in the factory. So the operator... So when the box comes in, that's why I didn't see any cables. It's inside. It runs inside. It runs down the back.

And so versus when I look at the pictures of a data center, let's say Google, you see they're very neatly organized. It's like, I love organization. So it's like beautiful, but it's cables everywhere. And you can see. So you don't have that. We don't have that. And in particular, so because there's no cabling, there's also no miscabling, right? So every computer is not actually on just one network. It actually needs to be on three. It's on a power detect, a presence detect network.

It is on a service processor network. And then it's on that high-speed network that you really care about, like the actual network. In any facility, you need another network for power, environmentals, and so on. It's very easy to have miscabling there. That's got to go to a different router. There's a bunch of just complexity that we eliminate because we do that.
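The miscabling argument is easy to quantify. Taking the 32 sleds and three networks per sled described above, and assuming (for illustration only) one hand-made connection per sled per network in a conventionally cabled rack:

```python
# The three per-sled networks named in the conversation.
NETWORKS = ("presence_detect", "service_processor", "high_speed")

def field_connections(sleds: int, cabled_in_factory: bool) -> int:
    """Connections an operator makes by hand in the data center.

    Assumes one connection per sled per network when cabling is done in
    the field, and none when sleds blind mate into a backplane that was
    cabled in the factory.
    """
    if cabled_in_factory:
        return 0
    return sleds * len(NETWORKS)

print(field_connections(32, cabled_in_factory=False))  # 96
print(field_connections(32, cabled_in_factory=True))   # 0
```

Every one of those hand-made connections is a chance to plug into the wrong port, and with a factory-cabled backplane there are none left for the operator to get wrong.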

Part of that decision came out of an arguably earlier bet the company decision, which was we did our own switch. So we also did, in addition to doing our own compute sled, we did our own switch. And last time you told me about this in our deep dive, we did a little bit that, like at first you said we did our own switch and I was like...

Yeah, okay, cool, you did your own switch. And then you told me that actually, that is a second computer to build. Can you tell me why? It's funny, because when we went through Sand Hill initially raising money for the company, nobody asked us. And they were definitely like, I've got a technical question for you. And you're like, God, here it comes. Switch. It's the switch question. But then, no, they'd ask some other random question. Like, all right, that's not a very good question.

But nobody was asking us about the switch. And we were concerned about the switch because we'd already come to the conclusion that in order to make this thing really work, we had to do our own switch.

And the reason we had to do our own switch: if we didn't do our own switch, it would be a third-party integration nightmare, and we wouldn't be able to actually solve the problem that we're trying to solve, which is when this thing shows up in your data center, we want this thing to come out of the crate. We want you to wheel it up. We want you to put in power and networking and go. We do not want you to have to cable anything. The level of operator involvement should be really minimal.

So we'd already come to the conclusion that in order to make this thing operable and manageable, we needed to do our own switch. And so you're saying that, like, buying... because the switch to me sounds like a somewhat

Simple component. And you're going to tell me why it's not. Oh, yeah, it's definitely not. No, but that attitude is very important. If you want to go build your own switch, I encourage you to have that attitude as long as you possibly can, because otherwise you won't go do it. So what is your switch? What is a switch being?

obviously, the networking switch. What does your networking switch do that made it so important for you to build it, as opposed to going to one of the many suppliers and saying, you know, let's get yours? Not many suppliers. Oh. If you actually go to it, the actual switching silicon is coming from, like,

it was like one and a half providers. Oh. It's all Broadcom. And so what you're actually talking about is Broadcom silicon. What we discovered is this actually interesting piece of Intel silicon from a company they had bought called Barefoot. And we found Intel Tofino, which allowed us to have true programmable networking. So we use Intel Tofino. Intel later killed Tofino.

Complicated relationship with Intel over this. We fortunately have procured enough Tofino that we bought ourselves the time we need to kind of design our next-gen switch. But that programmability was very, very important for us, and that we were not going to get from Broadcom. Broadcom is a very proprietary company.

We were not going to get a bunch of the things that we needed in building that switch. We were not going to get them out of Broadcom. So it ended up being very important. We were concerned. I mean, again, another one of these kind of bet the company decisions. Very, very concerned about having our own switch, integrating our own switch. And what we found is that was a win

in so many dimensions, so many dimensions that we did not anticipate. It is now, you can't imagine the company without having it. I guess sometimes you do stuff and you might get some wins. Absolutely. Well, I think also, like, whenever you're deliberating something big like that,

The fact that it is big kind of forces you to really deliberate. And then once you commit to it, to taking that big risk, you often see unexpected dividends. Well, as long as we're going to do this, as long as we are taking a clean sheet of paper, as long as we're doing our own switch, we can blind mate the networking. If we were not doing our own switch,

We really couldn't blind mate the networking. We really needed to be able to own both sides of that in order to be able to do our own switch. A lot of our listeners, viewers are software engineers, so we don't know as much about hardware. Obviously, we know how the things work.

Designing a computer from first principles

Can you tell me a bit on what it actually means to design or build a computer? Because, you know, I'll give you the novice approach, which is obviously going to be wrong. But the novice approach is like, oh.

Here's a processor. Here's a few chips. Here's a mainboard. I'll just put it on there and I'm done. But when I was in your lab at Oxide, you told me that one of the first engineers turned out to be a radio frequency engineer. You told me how this is great because of all the FCC approvals and all these things. And I was like, okay, this is way more involved than I ever imagined. Yeah.

It's very involved. How do you build a new computer? First of all, it's all, I mean, it would be a lot easier if it were all slower, right? The problem is it's very fast. It's high speed. So the connection to memory via now DDR5, double data rate 5,

is ridiculously high throughput, and from a signal integrity perspective is really complicated. These boards, by the way, ultimately, this is all analog. We think of it as digital, and it is digital, but digital is like a lie that double-Es allow us to tell ourselves. You are actually talking about signals that are racing through a substrate. And with PCIe or DDR5, those signals are very complicated to lay out. That's complicated. The actual, like,

How does the computer start? Like, this computer is like, it's like a 777. You know, 747 used to be my favorite jet to kind of pick on, but now the 747s are retired, so I've got to pick something else. And I'm not going to pick another Boeing aircraft, I don't think. An A380, I guess. I should pick an Airbus. But you think about, like, the...

Okay, an Airbus doesn't just come by itself. It needs an airport. It needs a runway. It needs all the infrastructure to feed it. Well, so too for a microprocessor. Just the power sequencing for those things is very complicated. It needs a surround that manages the power distribution network, that actually manages its power-on sequencing, that manages all of its environmentals, that manages its connection to memory, to I/O. So it's just fractally complicated.

To the point that people often just take reference designs and iterate on them. They don't actually really innovate on this stuff because it takes so long. And you told me this was really interesting last time that, as I understand, reference design means, correct me if I'm wrong, that...

you're an electronics engineer or hardware engineer, and you want to build a new hardware, and you take an existing reference that has been tested, measure it out, like it doesn't create accidental, like all sorts of radio frequency things, and then you implement that. But you told me that this is not what you did.

You also told me that it's pretty hard to find electronics engineers who are used to not doing reference design, but who are brave enough to like... Who are brave, yes. I would say that in computer design in particular, the high-speed designs are so hard. People got very accustomed to taking the reference designs, and it was harder to find folks that were willing to take a clean sheet of paper. And we ultimately found them. I mean, and we've got a double-E team that is extraordinary.

And double E is electronic engineer, right? Yeah, and absolutely fearless. And in part because, like, they didn't spend their careers at Dell and HPE. No, they're coming from, like, GE Medical, where they worked on CT systems. Wow. How did that happen? How did they come to Oxide? It feels like such a different field. I would have assumed naively that if you're building a computer, you'll...

Try to get electronics engineers who have built computers. You would think. And that was probably our thought as well. And then we discovered that we were not getting along with those engineers. We didn't hire them, but we were just finding there's a lot of friction because

There wasn't a real first principles approach from those folks. And this is where you get to, especially you get to talk to folks that like been at Dell for a generation. And like for any design, they're used to calling what's called the FAE, which is the field applications engineer.

for, you know, the voltage regulator. It's like, well, the FAE gives me the design. It's like, all right, well, how do you know that it's the right design? Well, no, he... So it's like, all right, so let's go hire that person, then. Let's forget you. And we were really just, we were struggling. I was struggling to get outside of my own personal network to find the right engineers. And we were kind of brainstorming, like, how can we get people

to see the company who wouldn't otherwise see it. And specifically for hardware engineers like we're talking about. Yeah, and just in general, but especially for double-e's. Yeah, for double-e's, it was feeling especially acute. One of the...

Why everyone is paid the same salary at Oxide

things we were kind of brainstorming as a team, and, you know, one of our engineers said, you know, the values are very important to us at Oxide, which they are, and I relay Oxide's values and our principles to people outside of Oxide, and they're like, that's just... And I explain that, like, you know, normally I would agree with you. But it's when I get to the compensation that people's heads turn, because our compensation is transparent and uniform.

And people were like, wait, what? And I'm like, I can write a blog entry on it. Like, yeah, that'd be great. I'm like, okay. And so up to that point, we had not talked about it at all. We had not talked about it publicly at all. I just came up with the idea that, like, compensation is just private. It's just not something you talk about with people, you know? And you go to Levels.fyi or some of the forums, you're anonymously asking, people are anonymously sharing. That's how you get

information that's good information. And so I kind of had this idea that it just is not something you talk about. And so we wrote this blog entry in March of 2021, and it sent our hiring nonlinear. It wasn't that people were like,

oh my God, I want to work for a company where everyone's paid the same. Like that is like, that's like. Yeah, because your compensation was both the same and you also put the number specifically. I think it was something like $200,000 back then. Yeah, it was a little bit less back then, but it is a little bit more than that. Yeah.

Now we just got another raise. So now I've lost track. It was 207, but now it's more than that. I actually don't know, because the one thing is when compensation is uniform, you don't keep total track of it. Like, literally people are like, wait a minute, there's an error in my paycheck. I got paid more. And we'd be like, no, no, we got a raise. Like, when was that? Like, no, it was at the last all hands. Like, oh, you know, I did have to go to the bathroom at the end of the last

All hands, I didn't listen to the recording. I guess I missed my raise. Like, yeah, you got to pay attention around here. But it was more that what drew attention was that people, engineers in particular, but just in general, people were drawn to a company that would be so nuts as to do that. And ultimately, like that engineer that made the suggestion was absolutely right.

It was the compensation that convinced people that we take our values really seriously, that we're a really principled company. Which is you're paying everyone the same base salary. That's right. Exactly the same. Yeah. They're making the same as you, the electronics engineer. Software engineer, the whatever other role you might have. That's right. And I don't know if...

You should just go ahead and say it if you want to. But many people are like, would you pay support engineers the same amount? It's like, why are people always like pick on support? They would ask. Exactly. Answer to that is yes. And the answer to that is, if you do that, you find superlative support engineers. And so we have got, I think we've got the best support engineers in the business. I think we've got really, really phenomenal.

folks in support. I heard a small company called Gumroad do this, where they paid their support staff really high, again, about the same as software engineers, and then they got support staff who were software engineers. And they could fix the code or like write tools for themselves. And you get people for whom, because, you know, there's a certain thrill in support that because you've got someone with a problem.

It's technical. You get to come up, you get to be technical, you get to solve a hard problem. And then immediately you get such gratitude, you know, and like, that's a rush. If there are people that are really drawn to that, like I love helping other people. I love that feeling that I get when I resolve a problem for someone that immediacy. So one of the things that we've heard repeatedly from, from several of our support engineers is.

My heart was always in support, but my career path was forcing me into a different career path. And I love the fact that I can get back to where my heart is. That's nice, because now, yeah, you're not going to make more by doing something that you're not as into. I love that. So going back to where we were, which is, you build the hardware, you build this really complicated piece, and you went through electronics engineering putting it together.

Oxide's software stack

Let's move to software, because that's super exciting. What does it take to build software for this? Let's talk from the low level. Did you start from scratch, from the operating system? Did you have to, or could you use... Yes, and there are kind of different answers at different levels of the stack. So on our service processor, we did start from

scratch. We did our own de novo operating system in Rust, appropriately called Hubris because we had the hubris to do it. The debugger for Hubris, by the way, is called Humility, which feels appropriate for a debugger. So that was de novo. And this is open source, right?

Yeah. The entire stack is open source. Everything we've done is open source. We can go on GitHub and check it out. Go on GitHub and check it out. And yeah, I mean, we've got God's own revenue model because like, you're like, well, what if somebody like can download it, run it on a different computer? It's like.

Knock yourself out because, you know, we think the best way to run this is on the machines that we make. And those are not free. An Oxide machine is not, you know, that's not freely downloadable. But all open source. That was for the service processor. For the host CPU, we were really kind of in a quandary. Like, what are we going to do on the host CPU? Which is to say, on the actual, like, what was then AMD Milan, now AMD Turin silicon.

We knew that in the product we would do our own hypervisor and our own control plane. It was very... So this is not something that you run? The control plane, is that controlling multiple... like, you have a bunch of processors and memory and all that, and the control plane controls them? You plug this thing in, you power it on, you put in networking. What you get is

a console that looks a lot like, well, what AWS would look like if AWS looked better. I mean, it's a console. I mean, look, not disparaging AWS, but we know that design is not really the strong suit. We agree with that. Yeah, exactly. So it looks gorgeous, of course.

And it's also got, you've got your API, you've got your CLI, and you're provisioning instances. Where are those instances provisioned? It's the control plane that makes those decisions. You are attaching virtual storage to those instances.

Where does that storage live? It's the control plane that makes that decision. So just like with AWS, you don't need to know that stuff. That's just happening. You're using Terraform to spin up your cluster. You're running Kubernetes on it. You're knocking yourself out.
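
To make the idea of "the control plane makes that decision" concrete, here is a minimal sketch of a placement decision. It is a toy under stated assumptions, not how Oxide's control plane works: the `Node` class, the `place_instance` function, and the "most free memory wins" rule are all hypothetical names and policies chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One compute node in a rack (hypothetical model, not Oxide's)."""
    name: str
    free_cpus: int
    free_mem_gib: int
    instances: list = field(default_factory=list)


def place_instance(nodes, name, cpus, mem_gib):
    """Pick a node for a new instance; the caller never chooses one.

    Assumed policy: among nodes that can fit the request, take the one
    with the most free memory. Returns the node name, or None if no
    node has room.
    """
    candidates = [
        n for n in nodes
        if n.free_cpus >= cpus and n.free_mem_gib >= mem_gib
    ]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda n: (n.free_mem_gib, n.free_cpus))
    # Reserve the resources so later requests see the updated capacity.
    chosen.free_cpus -= cpus
    chosen.free_mem_gib -= mem_gib
    chosen.instances.append(name)
    return chosen.name


rack = [Node("node-0", 64, 512), Node("node-1", 64, 256)]
first = place_instance(rack, "web-1", cpus=8, mem_gib=300)   # fits only node-0
second = place_instance(rack, "web-2", cpus=8, mem_gib=300)  # nothing fits now
third = place_instance(rack, "db-1", cpus=4, mem_gib=128)    # node-1 has more free memory
```

The point of the sketch is only the shape of the interface: the user asks for an instance of a given size, and where it lands is the control plane's problem.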

We are delivering all of the software from that lowest layer, that service processor, the operating system that's running on the host CPU, and then that distributed system. Very importantly, that distributed system, which we called Omicron before the Omicron variant of COVID, which was feeling very, like, ill-timed for a very

brief period of time. It was feeling ill-timed. And now I feel like the Omicron variant of COVID is just forgotten and now it's a good name again. So it's like, you know, we just... It was really short-lived. It was short-lived, yeah. So, you know, we lived longer than the Omicron variant of COVID. And that is our control plane. And that is a very sophisticated body of software, because it's

not enough just to provision an instance, right? You need to do that robustly. You need to do that via API, CLI, and so on. But then all of the software that does that and keeps track of your instance and so on, it's very important that you can actually update that software. That whole distributed system, you need to be able to update to a new version of the software. And this gets really thorny, right? Because in a public cloud, you do that with a runbook.

I mean, even the, you know, they don't feature it prominently, but even in GCP and AWS, yes, there's a lot of automation, but there's also humans involved. And there are humans that are taking the responsibility for actually updating software. For sure? Really? Yeah. I mean, again, for the most part, yeah. I mean, there's a lot of automation involved, but in particular, if something goes wrong in an update, you know, you've got DevOps that can hop in and figure out what's going on and get it rectified.

We are shipping a distributed system across an air gap in an Oxide rack that's potentially running in a secure facility. We cannot be there if it goes wrong. Especially because a lot of your customers are buying it because they want to do it themselves.

So in many ways, the thorniest software problem for us, we had actually several thorny problems between them because they're all thorny for different reasons. One of the very, very thorny problems was how do we ship a distributed system that we can then update?

The evolution of software updates

And one of the things we did that was important was like, okay, because it's very easy to paint a roadmap that is very complicated for update. You'll never ship anything. So what we need to ship in that first product that we shipped when you were in Emeryville two years ago.

We needed the minimum viable update. We needed an update where the software could be updated, even if it was painful. So what we did is we had this thing called Mupdate, which is the minimum update. And Mupdate, in particular...

required the control plane to be parked. So we're going to take this rack that's running instances, take it offline. We're going to update it and then bring it back online. And that was robust. It was great. And we got that working. That's great. It's great in that you can update it. That's actually not what you want in a cloud, right? You're like, sorry, I'm using this thing 24/7. Like, these instances need to remain up while I update it.

But that gave us the platform to go build that update functionality into the software. Extraordinarily sophisticated and really an extraordinary body of work. And actually just recently, at our internal meetup, the engineer who led the charge on that, Dave Pacheco, gave a presentation looking back on two years of update. And I got to tell you, I think this is one of the best single talks on software you'll ever see. And we will link this, but

Can you give me just a short overview of like why this update is so difficult? Because like some listeners will be used to just building applications, for example, on the iPhone. And an update there, what it means, obviously, I know this is way more complicated, but an update is there's a new binary version and it replaces the old binary version. Now, of course, you're saying this is an operating system update or...

or, you know, like with a car. And of course you might think, well, you could just replace the old version with the new version and there's some downtime. But where is the complexity that actually puts in all these thorns? Because I'm sensing I am missing something, something very obvious. Because it's a distributed system. When you've got an app on an iPhone, it's not a distributed system.

Oh, and distributed system, meaning that you've got a bunch of different nodes. Components that are going to speak to one another. Those might need updating as well. Oh, they definitely need updating. Oh, they all need updating. Yeah, the whole thing needs to be updated. You got to be able to update all of the software in the rack. Oh.

This is not just updating the operating system. This is updating absolutely everything. So you might need to update some parts or all parts. You need to update the service processor, the root of trust, the drive firmware, the host operating system, and then all of the components that speak to one another.
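
One way to see why updating components that speak to one another is harder than swapping a binary: while the update is in progress, old and new versions have to coexist. The sketch below is a toy model of that, not Oxide's update system. The `compatible` rule (versions may differ by at most one), the component names, and the `rolling_update` function are all assumptions made for illustration.

```python
from itertools import combinations


def compatible(a, b):
    """Assumed rule: two components interoperate if versions differ by <= 1."""
    return abs(a - b) <= 1


def rolling_update(versions, target):
    """Update components one at a time, checking every mixed state.

    `versions` maps component name -> current version. Returns the list
    of intermediate states. Raises RuntimeError if any intermediate
    state contains a pair of components that cannot talk to each other.
    The input mapping is not modified.
    """
    state = dict(versions)
    history = []
    for name in sorted(state):
        state[name] = target
        for x, y in combinations(state, 2):
            if not compatible(state[x], state[y]):
                raise RuntimeError(
                    f"{x} (v{state[x]}) cannot talk to {y} (v{state[y]})"
                )
        history.append(dict(state))
    return history


rack = {"service_processor": 3, "host_os": 3, "control_plane": 3}

# Moving one version forward: every intermediate mix of v3 and v4 is fine.
steps = rolling_update(rack, 4)

# Skipping a version: the very first step mixes v5 with v3 and is rejected.
try:
    rolling_update(rack, 5)
    skip_allowed = True
except RuntimeError:
    skip_allowed = False
```

Even in this toy, the final state is never the problem. The intermediate states are, and every one of them has to be a system that still works.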

Okay. And then it's like, okay, so, I mean, this challenge is fractally complicated. One of the very basic ways it's complicated is, when we're updating, we are moving the system from one version to another version. In between, it's going to kind of be in both versions. Like, what does it mean to have a system that's operable while you've got some new components and some old components? What if you change your database schema

from one version to the next version, which we definitely have? Like, you have to have a method of doing that. And for every one of these components, how is it... We've got to reason about the system when it's in this hybrid state. And then it needs to be done in a way that's very, very robust. First and foremost, we had to develop the foundation that allowed us to do this absolutely robustly. And so the way Dave and team did this is,

you know, with that foundation, and then very slowly lighting up different aspects of the system and making it more and more automatic over time. And, you know, when we first started running that on what we call our dogfood rack, and did our first automatic update on the dogfood rack, it was a really great feeling for that team, because this has been a very long software road. And it has been one that has been very deliberate. And ultimately, like...

And, you know, full credit to Dave and team, it took us about the amount of time that we thought it would, which is kind of very rare for software, because I think software is so fractally complicated. But that's only because they've been very carefully managing scope versus schedule, because quality has got to be the constraint. And Dave's talk goes into that in detail in a way that I think is just extraordinary. So I'd like to talk about

the topic that is, you know, a lot of people's mind is AI, specifically AI tools. How have AI tools changed?

How Oxide uses AI

how you're working at Oxide, specifically think about software engineering, maybe even hardware. Are you using these tools? Are you experimenting with them? For sure. We've been early on in terms of using them. Yeah, I mean, you use them for different, and people are using them in different ways. No part of the Oxide stack is vibe-coded. I think that that is safe to say. But we are using it, and we're using it to, and again, different people are using it in different ways. We are...

you know, using it to do things that are tedious. We're using it to generate test cases, you know, generate the... I use it because I think the thing that it is just unmatched at is document comprehension. We've got a very writing-intensive culture. We've got a lot of documents. It is great. You always had that. Yeah, I always had that.

And if you've got a writing-intensive culture, you're LLM-ready, not to generate those documents, but to consume them. And, you know, one of the things that I've always wanted to do, and now it is possible, I still haven't quite found the time to do. Early on, I wanted to make an RFD glossary. So RFDs are requests for discussion. We've got a lot of technical terms. I want to make a glossary. I tried to do that for like three hours. This is like in 2020. I'm like, this would...

This spreads to the horizon. Just making a glossary is so complicated. A glossary is something that an LLM can just turn out. And so there are lots of things that we're doing to use LLMs. In particular, it is clearly a very real, very big shift in lots of different aspects of software engineering. I think that, you know, but of course there are people that are being kind of reductive about it.

I am definitely not a doomer. There are a lot of doomers that are out there. And, you know, I tried to give this talk about building Oxide itself, the Oxide rack, and in particular the problems that we had along the way that an LLM was never going to be of any assistance on. And so the title of the talk was Intelligence is Not Enough.

And one of the prominent doomers actually did a reaction video to my talk. It's like the only time I've ever had someone do that. And my daughter, who was then like 11, just thought it was hilarious that someone had held their own time in such low regard that they would spend it recording a reaction video to my talk. And so she was like, I want to watch this. I'm like, oh God, I do not want to watch this again. Ultimately, it was really frustrating.

This person obviously disagrees with what I was saying, but then when I was giving these very concrete examples of here are the specific technical problems that required more than intelligence to resolve, that an LLM was not going to be able to resolve. He literally fast forwarded through those parts. He's like, we just don't need this. This is like, this is just, you're like, bro.

This is the talk. Like, you can't do this. You're fast forwarding over the actual meat of the talk. Can you give an example of a problem which you felt was like this, even, you know, if we fast forward to, like, the arbitrary future? Yeah, yeah, yeah. So yeah, super simple. I mean, we've had many, many scary problems, but...

The limitations of LLMs

Um, we had a, uh, the CPU when we did our first bring-up of our first machine. And what does bring-up mean? Bring-up means taking a board and powering it up and trying to get it to work for the first time. I think you mentioned that the term smoke tests comes from electronics engineers. Oh, I mean, smoke tests, I always think of a smoke test more from aerodynamic engineers, but yes, I mean, aeronautical engineers, but yes, I mean, you're...

Smoke is definitely a possibility, and a very bad one. You do not want smoke. That is bad. No smoke, please. So, bring-up: we are doing bring-up and we are unable to get the CPU out of reset. And after 1.25 seconds, the CPU resets itself. What's going on? Is the power network bad? We're doing all... and like, when you have something like that happen, it's like, well, what's happening? It's like, I mean,

it's just not working. I mean, what do you tell your LLM? Like, it's not working? I mean, it can maybe give you some suggestions, but in this case it wouldn't. So we are going deep into this, understanding, like,

maybe the power network is marginal? No, no, no, we resolved that. We've got a man, actually, we were working with at AMD at the time, and he's like, no, these power numbers are amazing. Like, your margin is very good. You're measuring it out. You're eliminating that one, eliminating that one, you're going through to eliminate, eliminate, eliminate. And we couldn't get it. And this was weeks. And you're like, we don't have a company. Like, wow, we are absolutely dead.

And I feel like this is the kind of thing where, you know, you get desperate. Like, we're going to try kind of anything. And the engineer who was working on this actually looked at the protocol between the CPU and the voltage regulators. There's a protocol that goes back and forth and says, hey, I need this voltage and this voltage. And one of the things he notices is that there is no acknowledgement packet

from the regulator. So the CPU asks for a voltage to be set to a certain level. And he's noticing that there's no acknowledgement packet back from the regulator. Which should come. Which should come. And the test, they've got something called SDLE, which is this great...

test goober that you take the CPU off, you put on the SDLE, and it will measure the power for you. Well, the SDLE didn't care whether it got an acknowledgement packet or not. The CPU definitely did. And the CPU, so the CPU says, I want you to go to 0.9 volts.

It never gets an acknowledgement back. And meanwhile, it's sitting at 0.9 volts, and it's just like, well, I never got an acknowledgement, so we're going to reset, and I'll do it again. And that was due to a firmware bug on the Renesas controller. And so we got a firmware update from Renesas, and... Done. I mean, to be fair, the Renesas FAE is great. It was like, well, you guys should have reached out a lot sooner. Like, yeah, I know. We really wanted to make sure that we got like everything.
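
The failure described above can be captured in a few lines. This is a toy model built only from what is said in the episode: the regulator reaches the requested voltage but never acknowledges, the test load only measures the rail, and the CPU resets after 1.25 seconds without an acknowledgement. The class and function names (`Regulator`, `load_emulator_check`, `cpu_boot`) are hypothetical, and the real CPU-to-regulator protocol is not modeled.

```python
RESET_TIMEOUT_S = 1.25  # from the episode: the CPU reset itself after 1.25 s


class Regulator:
    """Voltage regulator model; `sends_ack=False` mimics the firmware bug."""

    def __init__(self, sends_ack):
        self.sends_ack = sends_ack
        self.volts = 0.0

    def set_voltage(self, volts):
        self.volts = volts       # the rail really does move to the target
        return self.sends_ack    # but the acknowledgement may never arrive


def load_emulator_check(regulator, volts):
    """Test fixture that only measures the rail and ignores acknowledgements."""
    regulator.set_voltage(volts)
    return abs(regulator.volts - volts) < 0.01


def cpu_boot(regulator, volts, max_attempts=3):
    """CPU that insists on an acknowledgement before leaving reset.

    Returns (booted, resets, seconds_waited).
    """
    resets = 0
    for _ in range(max_attempts):
        if regulator.set_voltage(volts):
            return True, resets, resets * RESET_TIMEOUT_S
        # No acknowledgement within the timeout: reset and ask again.
        resets += 1
    return False, resets, resets * RESET_TIMEOUT_S


buggy = Regulator(sends_ack=False)
fixed = Regulator(sends_ack=True)

emulator_passes = load_emulator_check(buggy, 0.9)  # power looks perfect
buggy_result = cpu_boot(buggy, 0.9)                # yet the CPU never boots
fixed_result = cpu_boot(fixed, 0.9)                # after the firmware fix
```

The model shows why measuring power kept coming back clean: the check that passed and the check that failed were looking at different things.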

And that's the kind of problem. And there were many, many problems like this where it's not merely intelligence. It's not – building a board is not an IQ test. It's more – I mean –

You need to be intelligent to do it, but intelligence is not enough. You need these other kind of characteristics. And I feel we also need a team in this case, right? You absolutely need a team. 100%. 100% you need a team. Like you're going to solve these problems with, you know, you had that engineer who just like thought of. Measuring the South. Right. An engineer who was desperate.

Because we were all getting desperate. And again, we've had many of these over the history of the company. And you're right. You absolutely need a team. You need a team. And you see also the value. When you have a team, people have different ways of approaching a problem. That diversity is really important because you need – and actually sometimes this has happened more than once at the company where –

somebody kind of like is just kind of like walking through the problem. And like, someone's like, Hey, I'm just joining, you know, that's about a remote company. Anyone joins them, you know, they're joining the Google meet. Yeah. I'm just joining because, you know, I think that I'm following along and you get someone will be like, just make an.

Like, hey, I got a dumb question. Are those virtual addresses? Like, those look like similar virtual addresses. You get something where someone's making an observation, and you need someone to kind of come and make that observation that is maybe less grounded in it. And people are like, oh, wait a minute, no, that's actually something to go check. And so you need that different kind of approach that really a team kind of uniquely summons. And you know, I think you

I might have alluded to it, but on the previous podcast, Armin Ronacher mentioned this to me. He's the creator of Flask. He's been around the block for quite a while, and he's now doing a startup. And he said that right now it's just him and his co-founder, and he's got an army of

AI interns right now that are prototyping for him. But he told me, I'd like to start to hire people soon because people bring energy and you need energy for a company to live and thrive. And I'm kind of sensing the same thing. Oh, for sure. No, for sure. And I just listened to this great piece with Richard Sutton, who was the inventor of reinforcement learning. And I think rightfully, and I agree with him, it's like, you guys are conflating an LLM with artificial intelligence. It doesn't have...

This is really important. So like a prompt is not a goal and guessing the next word is not a goal. And, but like us together as a startup and like wanting to make it together, not wanting to die here together. That's a goal. And so we can use that creativity. Maybe we use an LLM certainly as a tool to help us achieve our goal. But I do think that that's a very important distinction.

And can you tell me what kind of tools you use and what are the areas where you find it helpful? I understand you're experimenting with stuff and, you know, this is all work in progress, but where are areas that...

AI use and experimentation at Oxide

And you mentioned summarizing was one example, or glossaries. Oh, yeah. I mean, I use LLMs as an editor all the time. I find it to be a really, I mean, actually, it was funny. I had a blog entry that went on Hacker News and someone's like, oh, this is LLM. I'm like, actually,

it is LLM edited, but the only thing that I did based on the LLM is I deleted an entire paragraph. So there's a paragraph that wasn't working. And the LLM was like, this paragraph is not working. And I'm like, you know what? I'm just going to delete the paragraph. So it's like, I don't know. You want to say that's LLM edited? Because every word there is written by me, but there were some words written by me that, on the LLM's analysis, I deleted.

So, I mean, I use it in writing for sure. I mean, I also like to use it, and this is like a stupid reason, stupid thing. But when you're writing Rust, and we write a lot of Rust, especially when you're new to Rust, you wonder, like, the way I just phrased this, is this idiomatic? Is there a better way to do this? That's a great little problem. I've got this small little snippet of code.

Is this an idiomatic way of doing this? Is there a better way of doing this? And that's a great thing for an LLM to make a suggestion on or not, or tell you, no, that's an idiomatic way of doing it. Maybe I would make this small adjustment. So I find it really valuable. I find LLMs to be more valuable in the small than in the large. So like, again, this kind of, I might, you know, hats off to people who want to

spend their lives acting as a middle management for robots. But like, that's not necessarily for me. Certainly at Oxide, I mean, our belief is that people take responsibility for their own work. So if you want to have an LLM help you out on that, that's fine. But ultimately, like if there's a bug in this, like you can't blame the LLM. The LLM broke my code is like not interesting. LLMs don't have accountability.

And so one thing that is starting to spread across, I think, a lot of engineering is engineers using LLMs either inside your IDE with autocomplete and also now kicking off agents. Now there's more advanced ones, like Claude Code and Codex, where

it can actually run commands and run your tests. Are you seeing engineers use some of these tools? And there's a little bit of back and forth as well. It's very clear that when you're doing kind of more boilerplate things that are so-called in-distribution, which is what they've learned, like React or TypeScript, it can spit out a bunch of stuff. But you strike me as someone who's doing a lot more

nuanced things. Yeah, I mean, you're writing a bunch of C code in the operating system kernel. It is less valuable.

Yeah, but so what are you seeing across the team in terms of... I think, you know, across the team, I encourage people to experiment. And I would say we're seeing a wide variety of experimentation. Certainly we're using Claude Code a bunch, and people are doing that. But I would say, you know, broadly speaking, for a lot of the work that we're doing, it is helpful as maybe a polishing tool

but less as a kind of at the epicenter of its creation. It's not true of everything. There's some software for... No, but that's also nice to hear, because I'm kind of asking you more to put on your CTO hat, and you're also very hands-on and you know what's going on with the industry, because a lot of non-hands-on executives are kind of licking their finger and thinking, oh, we must be 10 or 20 or 30% more productive. But what I'm hearing is...

like, things are kind of the same as before, right? Yeah. I mean, my big belief is it's a tool. It's a powerful tool. I mean, I will say, you know, occasionally people are like, well, I don't want to use it at all. And I'm like, you should. So like, you should try, right? Yeah. Like, let me get you off of that position. And, you know, we had Simon Willison on our podcast. Simon's delightful. And, you know, one of the lines that he has that I really love

is people should run these LLMs on their own laptop where they run slowly and poorly so they can see the bad output that they generate so they can understand what some of the limitations are. So I definitely, I love that. I do think that people should use them enough to know where they are valuable. It's a very important tool in the toolbox. You want to be aware of it, but it's definitely reductive to think it's the only tool in the toolbox, because it isn't.

Now, you're in such an interesting company because, like, you know, you don't not just do software, but you do a lot of hardware. Yeah. Have you found any use? No. No. No. Zero. I mean, okay, zero is a bit reductive. I have found it to be useful when, for example, you know, you've got a waveform of an I2C transaction.

It actually, amazingly, you can send that to an LLM and have it interpret this, like, hey, am I seeing I2C-compliant behavior? And it can help you out on that a little bit, but it's absolutely at the edges. Okay, so that's a 0.01. Also, I think people don't realize there are already tools for that. That's what EDA is. You spend a lot of money

We're not laying this stuff out by hand with graph paper. When you do layout for a board, there are a bunch of rules that are automatically checked for SI. We do a bunch of simulation work. Like, we're not doing that by hand. We're using software. Yeah, and I saw you have those machines in there. Like, I saw that. I think it's a bit reassuring to hear because I think it's very clear. Like, maybe we don't realize software engineers, but...

programming is such a great use case for LLMs. It's a simple grammar. You can validate it. And I think it's sometimes nice to just, you know, touch sand of like an area that is very, very different. Yes. But it's cool that you're checking and, you know, you're seeing if...

If it changes over time, I guess you always keep checking. Yeah, for sure. And I think that it is frustrating to me because programming is such a good use case for certain kinds of programs. So as a result, you end up with certain kinds of programmers.

who just, in part because of their own self-centric view of the universe, believe that, oh, this is just going to replace every job. And it's like, no, not even close. Not even close. And you need to spend more time. You need to get outside a little bit more.

So speaking of getting outside and meeting different people, what I noticed when I went to Oxide is just like, it was great. We had double E's, as you say, software engineers, people who used to work on Virtual Reality at Oculus, all in the same room. Can you tell me about...

Oxide's diverse teams

how big is the team? What's the composition? Yeah, so, you know, I think we've got some more offers going out tonight. So I think we'll be at, on the order of, like 85. I should probably keep better mental track of it, but we've got like 85, plus or minus. And we, you know, we've been very blessed.

We've really put a beacon out there. We've got a lot of people rooting for the company. And as a result, we've got a lot of people who want to work for the company. So, you know, as we talked about last time, we really put a lot on folks to describe the work they've done, what's important to them, why they want to work for Oxide. I mean, a lot of my LLM use is...

I will look at someone's materials. You can imagine we've started to see materials that are heavily LLM authored. Potential applicants to Oxide: please do not do this. We get people who human-author their entire materials, and then they get to the last question: Why do you want to work for Oxide? Why do you want to work in this role? And they have an LLM spit that out.

And you're like, do you think you want to work here? Like, let's leave aside whether this is, you know, right or wrong or cheating or not. It's like, fine, I guess. But I don't think you want to work here. You're not going to get a job here, because I don't think you actually want to work here. Put it in your own words. But that process really has allowed us to attract people who themselves are attracted to the company

and attracted to the culture, the problem, the team. And it's just extraordinary. I mean, I just feel so lucky to be with such an... unbelievable group of people across more and more and more and more disciplines. I mean, the great thing about our approach is it brings people in who are, you know, God, it's like, I love this approach for, we talked about support engineering.

people who are like, God, I love this approach. Like finally QA can stand on its own two feet. I feel that QA has been kind of subjugated by these other disciplines. Now QA is kind of really thought to be. as important as anything else in the company. And it is because at some like monetary perspective, it is as important as anything else. Yeah, but I remember like when I worked at Microsoft back like 15 years ago or so.

the QAs were just on a lower pay grade, you know, like the senior QA was at the same as like, I think software engineer two or something, which just kind of implied. Yep. You're less important. You're less important. You're just less important. And so like, if you tell the world. That we think it's as important. You know who you get? You get people who are extraordinary at QA. You get the best of the best. And so that has been really exciting. And now we've got people coming. I mean, I do love.

how many different companies. My belief is that like every company has something to teach us, that there is something positive you can take from every company. Now there are some companies just like. Oof, you're really scraping the bottom of the barrel. Maybe not Enron, although they did buy some. Yeah, yeah, that's right. There are, like, even Oracle, you can find, there are, that may be a bit of a challenge. Let's not do that one. But you know what?

And at the time, I thought this was a negative, but now I'm like, I see it. Larry Ellison makes every hiring decision at Oracle. So what's positive about that? Exactly. But you'd be like, what's up? I... Really, I really think that the kind of the founder mode, the Paul Graham essay on founder mode is talking about founders that lost track of their own hiring.

So now, I don't like the way Ellison does it. I think that you want to trust a team to make a decision, but ultimately I believe that the CEO of a company bears responsibility for every single hire. And I think they should be looking at every single hire coming into the company. And to me, that is a very important check

on these kinds of companies. So there you go. Something positive I get to take from Oracle. And it's telling that your immediate reaction is like, wait, what's positive about that? Yeah. I'm not sure. Like,

I'm not sure you undid that talk on Oracle. Yeah, fair enough. Exactly. Yeah. And they're from some companies more than others, but I think that there are. And so I love having all of these different experiences present at Oxide because I do think that there's so much. to learn and we're trying, you know, you want to take all the positive things. Cause I also think that every company, including, you know, people, I had actually one of the questions I love that I got once is like,

What do you not want to emulate from Sun? I'm like, oh, thank God. Because people think of Oxide as kind of the second coming of Sun Microsystems. And there are lots of things I love about Sun. There are lots of things I did not love about Sun that I did not want to emulate.

And so I think for any, also any company, there are things we want to leave behind. And, you know, I think when you've got a big diverse team, you get to go do that. And one thing that really surprised me last time I was at your office is it turns out that.

most people are not in the office and they work remote. And I would understand that for software, but how do you make that work for hardware development, where physically you do need to, you know, be at the hardware sometimes? I understand you measure stuff. I saw a lot of, like,

Remote work at Oxide

you know, units. Sometimes you need to go and check on manufacturing. How does that part work? Yeah. So, I mean, a lot in people's basements. So, you know, fortunately, this is the advantage of making a server and not making, you know, a tractor or, I don't know, like a wind turbine or something.

You know, this is something that people can actually model in their basements. So that helps. But then a lot of even hardware engineering is using these software tools, using EDA tools, you're using SolidWorks, you're using Altium. You're kind of putting this thing together. When you're doing layout, for example.

which is a very important task when you're laying out a board. All of that can be done anywhere. That's all just software. Okay. So there are things where that physicality is very important. And then when you're doing bring up, you actually need to be at your manufacturer when you do that.

So like that is also not an office. You would need to travel anyway. Yeah, you need to travel anyway. And anyone coming to the electronics industry is like, okay, I'm interested in Oxide, but please tell me I never have to go spend any time in Taipei or Beijing, because you go out there, you know, or Shenzhen or wherever, and you're out there for

two weeks in a windowless office trying to get this thing brought up. And all of our assembly is done here in the United States, in Minnesota. So in fact, we've got a bunch of folks out there this week at Benchmark Electronics in Rochester. Oh, this is wonderful. One thing that you told me is one of the things that's on top of your mind right now. As Oxide is growing, you still have this culture of the same compensation,

Scaling company values

full remote. So it's kind of been the same since the start. What will be the challenge in maintaining it? Because again, you worked at large companies. You've seen how it goes. It can get tricky. What are the things that you're seeing and what are the things that you're trying to do to keep this?

kind of startup vibe, even as you might be just bigger. Yeah, so I think that the thing that is top of mind right now for me is, and especially because, you know, we raised a big Series B, which is great. I think, much more importantly, we're seeing a lot of customer traction, which is great. Wonderful. It's paying off. Yeah, I know. It really is. It's really great. And we kind of knew that it was going to happen in the abstract, but it's fun to actually see it happen and fun to actually see

the customers that are, you know, like, I bought one rack and I mentioned it, but now I want to buy a lot more racks. I love what I'm seeing. And, you know, that's great. Very, very, very exciting stuff. That means we're growing the company a bunch. And one of the things that's very important to me, because I've seen this happen so many times, is

companies take their eye off the ball when it comes to hiring in particular. And it is very important to me that we continue to have absolute discipline in the way we hire. And we're doing that. And fortunately, you know, the nice thing about our hiring process is every single Oxide employee has gone through it. So it's like I'm not having to persuade anyone about the importance of our process.

because everybody has gone through it. And the thing that we've got overwhelmingly in our favor is because we've used our values as a lens for that hiring, Oxide's culture is important to every single person at Oxide. That's what it takes to really preserve that. And it doesn't mean that it won't change at all, but the bones aren't changing. Like what will change is it will be bigger and it will be, I think.

You know, and I love the fact that, you know, even at like 85, we're already so big that, you know, Steve and I know everybody at the company, but very few other people know everybody at the company. So when we get everyone together, it's like.

the best party you've ever been to. Because in college, I used to throw the best parties in college. And the reason I threw the best parties in college is not because of me. It was because of the roommates that I had. So I was a computer science student who played Ultimate. My roommate was an engineer who was on the water polo team. My other roommate was a history student who was in the chorus.

That's six different demographics that don't normally overlap. And then, very importantly, we made sure that the women's swim team was always invited. The women's swim team, they were like the foundation. Water polo player. Yeah, exactly. Water polo player. You always check their calendar to make sure they can make whatever.

People loved the parties we had. Why? Because they would meet people that they'd never met before who were really interesting. And what I love about Oxide is we've got this, when we get the whole team together. People get all these delightful surprises. So people take me aside and be like, God, you know, Rye is awesome. I'm like, yeah, no, I know. I know. You know too now. That's great. But like, you know, or whomever it is, it's just, it's really exhilarating. And I think that.

Also serves to reinforce how important what we've got is, as I tell the team, like we have lightning in a bottle and we cannot take it for granted. And that means that every single one of us, we need to rise to the moment. We need to do what our customers need us to do, but we need to do it in a way that protects and preserves what got us here. So thinking a little bit.

ahead, let's assume that, you know, these AI tools will just get better. Eventually they'll be able to, you know, help more, even on your kind of low-level things. You've been in the industry for quite a while. You've seen a lot of shifts.

AI's impact on the future of engineering

What do you think are some of the things, both in software engineering or in hardware engineering or just in general engineering, that will probably not change, even if we predict these things being more capable? Yeah, I think that it's certainly a revolution. I think it's going to allow us all to do more. I do think that we are going to hit a point

where people understand that this is a tool, because there's a little bit where we still have this tension of like, oh, is this going to be AGI? Is this going to replace all jobs? And this is, as far as I'm concerned, distracting kind of nonsense. And we actually need to get back to putting the tools in the toolbox of the human that's building it. Now, these tools have become

much more powerful. And I think it's extraordinary. I think it's important. I think that also, you know, we've got a lot of experiments right now, we, humanity, that I'm not sure are going to make economic sense. So, you know, we'll be figuring that out as well. But I think that, you know, one of the things I am a little bit worried about is a little bit of despair from younger software engineers in particular,

who are like, what's the point? Like an AI can do all this. Well, and there's also the news, even for more experienced software engineers. In the mainstream media, there's this news that company X is laying off

half their workforce because of AI. And by the way, when we look closer, it's not because of AI, but it is coming across, and it does give not just younger people a lot of anxiety. Tons. Even mid-level folks or even some more experienced, it does give a sense of, I think it's the first time in computer history that most of us remember that there is this thing that could threaten my job.

And I think we've just never had to deal with this. I think, you know, there are other industries that might have been a bit more used to it. Yeah, I would say that we, I mean, there have been busts before. The dot-com bust was a bust. Like a lot of jobs did disappear, right? So I think that we, but this has really come in what feels to be a broader and more permanent way. I mean, my view is like this is an opportunity.

For, I mean, I think one of the things we should be societally really encouraging is new company formation. Because now, I mean, just like you're talking to Armin about how, you know, just a small group, just Armin and his co-founder, were able to do so much together, right? We should be really encouraging that. And what are some of the gaps that we can all go fill? Because ultimately, like, we all need to find a livelihood. We need to find meaning.

And the way we do that as engineers is we build useful things. And so we can now build many more useful things. What would we go build? If you could build anything, what would you go build? And that's kind of the question people need to ask themselves. It's scarier. It's scarier than like, go to this school, get concentrated in this, and then mama Google will hire you and take care of you and feed you breakfast. It's like, no, that's not what's going to happen. And it feels a lot scarier.

Because it feels like there's at some level like less security, less job security. But yeah, that's true. And that's scarier, but there's also a lot more opportunity. And for a college student or someone in school or with little experience who says like, look, my goal would be one day in like five years time to be as good.

Bryan's advice for junior engineers

that I could get a job at a place like Oxide. It doesn't need to be Oxide, but again, a place that has a high bar, they often hire experienced people, but I want to get there. And yeah, there's all this AI stuff is happening. What would you advise them in terms of... what to focus on, what areas to study, what things to do, or how to think about like, you know, like they have the goal is there. What advice would you have them part with? Yeah.

So I think that you need to have a different mindset. And that mindset needs to be not around how do I create as much as possible, but rather how do I get better? How am I getting better every day? And I think LLMs are a great tool to get better. How can I learn about something new, go deeper, go into something that I wouldn't go into before?

get over that kind of that fear. And one needs to, especially if you're in school now, you want to work at a place like Oxide. It's like, you kind of have to view it as like, all right, like you want to play major league baseball. That's great. Like you're a, you're a great high school player. You want to play major league baseball. It's very hard. Got to get better every single day. And you're going to be, need to be really.

focused on getting better, and you need to be like really realistic about like what I need to go do to get better. And it's hard, and it's chancy because you might not get there, but you could get there. And you're certainly not going to get there if you don't focus on that kind of self-improvement. So I really think that there is a shift in mindset that needs to happen or that one needs to have. I would put it that way. You really got to have a mindset towards

getting better, understanding more. What do you not understand? There is lots that you don't understand. I mean, I think one of the challenges of modernity is that we delude ourselves into thinking that we understand it all. You don't. I don't. Like one of the things that I've learned, I've joked at Oxide that like I keep waiting for the day that I know how computers work. And it like.

like, it wasn't today, definitely wasn't yesterday. You understand how they work. But I mean that earnestly, in that the amount of complexity that I, I mean, I knew but also didn't know. It's like every day I feel I'm still learning new facets, and not just of a computer but of actually delivering a computer to people. Like there's so much to learn out there. So many, and now,

the way you've got to view LLMs is not like this thing is coming for my job. You've got to view it as like, no, I've now got this private coach, tutor, what have you, that I can ask any question to. That's not to say, I've got to fact check its answers, for sure. But now you've got the opportunity.

And it is easier to get into this domain than it ever has been. And that's great and it's powerful, but it can also be scary. And as closing, what's a book or two books that you would recommend to folks, and why? Oh, so many good books. You know, I've got a 21-year-old, an 18-year-old, and a 13-year-old. And when the 18-year-old, he's now a freshman in college, was a high school senior, he got this assignment.

Book recommendations

Great assignment from his English teacher. Namely, go to someone that you know and ask them for three books that they would recommend that you read. And I'm going to assign you one of those three books to read and you're going to read it.

And then you're going to talk with him about that book. And I'm like, oh, I love this assignment. So he's like, dad, I'm coming to you. And I'm like, oh, you have, thank you so much. And then of course my wife was like, why didn't you come to me? Like, hey, look, I'm, you know, I, sorry. You know, look, it was great. So.

Yeah, I'll give you those three books that I gave to him. And I think that each of these is really terrific. First is The Soul of a New Machine by Tracy Kidder. So this one won the Pulitzer Prize in 1980 or 1981. It's about the building of a new computer at Data General. And it's extraordinarily well-written.

And even if folks think, well, what do I have to do with a computer company in the late 70s and early 80s? Any engineer will see something of themselves in that book. It is just masterfully told. Tom West is kind of a complicated figure, but Soul is still, I mean, it's literature for us. So I would absolutely, Soul of a New Machine, every engineer should read Soul of a New Machine by Tracy Kidder.

For me personally, very influential was Skunk Works by Ben Rich. So about the history of Skunk Works, Clarence "Kelly" Johnson was kind of the originator of Skunk Works at Lockheed Martin. Extraordinary story about what engineers can do when they kind of task themselves on the impossible. It's such a good book. It's such a good book. Amazing book. And then the other one is Steve Jobs and the NeXT Big Thing by Randall Stross.

So Steve Jobs is kind of like lionized by the industry, but people forget about a very important chapter of his life, namely NeXT. And I believe it was just an anniversary. Maybe it was the 30th anniversary, it must have been, or maybe the 40th anniversary, Jesus, of the announcement of the NeXT machine. So Steve Jobs left Apple, was fired from Apple, started a computer company called NeXT. Really interesting company in a lot of ways.

He was at NeXT for a very long time. It was a 13-year journey before NeXT was bought by Apple. NeXT is bought by Apple. Steve Jobs returns to Apple when they buy NeXT. This book, Steve Jobs and the NeXT Big Thing, is written before Apple buys NeXT. And it is at Steve Jobs' lowest moment. It is not here to praise him. It is here to bury him. And it is very interesting about all the missteps at NeXT.

And the thing that we cannot know, because Jobs obviously died, but I believe, having read the book, which gets basically, NeXT gets essentially no treatment in the Isaacson biography. NeXT is like six pages of glory. It's like, that's not what it was. But Randall Stross's book is masterful. And in particular, I believe that Jobs' failures at NeXT were essential for the resurrection of Apple. Because you look at the way he handled himself

coming back to Apple was very different from the Jobs that got fired from Apple. And I think that like when people look at Jobs, like they don't really take him apart. And I think you should, because I think he's a really interesting guy. He's enigmatic. He's someone who did things that I think are really fascinating and also things that I really strongly disagree with. So just to be clear, I'm not like, but I think that he's indisputably an important figure and that

book is by far the best book on Steve Jobs. No, I'm adding that. I actually want to read that. Oh, it's extraordinary. It's very good. Well, Bryan, this was such a fun discussion. Oh, my pleasure. I mean, we knew this was going to be long and wide-ranging, so hopefully it delivered, but I really appreciate that. We went from the 90s all the way to the future. There you go. Awesome. Well, thank you so much for having me, Gergely. It was terrific. I've got to say,

Oxide is one of my favorite companies, and I say this as someone who has zero affiliation with them. It's just so rare to find a startup that built both hardware and software and are world-class in doing both of these, and are so open about talking exactly how they do it all. Honestly, the only downside I can think about Oxide is how their server racks are built for pretty large companies and are definitely out of reach for hobby devs.

In this episode, I really appreciate how much of a straight shooter Bryan was, especially about the impact of AI tools. Yes, everyone at Oxide uses them and they do find use cases for coding and working with documents,

but it's eye-opening how it gives them basically zero help with hardware engineering. This is a good reminder that LLMs might be the single best fit for coding-related tasks. And as devs, we should know that these tools might be more specialized than many people think. I hope you enjoy the stories in this

episode as much as I did. If you'd like to learn more about Oxide, I did a two-part deep dive about the company and you can read it linked in the show notes below. If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. This helps the podcast a lot. A special thank you if you also leave a rating on the show. Thanks, and I'll see you in the next one in the next year.

This transcript was generated by Metacast using AI and may contain inaccuracies.