Mark Zuckerberg - Llama 3, Open Sourcing $10b Models, & Caesar Augustus | Dwarkesh Podcast

00:00

Mark, welcome to the podcast. Big fan of your podcast. Okay, so let's start by talking about the releases that will go out when this interview goes out. Tell me about the models, tell me about Meta AI, what's new, what's exciting about them. Yeah, sure. I think the main thing that most people in the world are going to see is the new version of Meta AI.

00:22

The most important thing about what we're doing is the upgrade to the model. We're rolling out Llama 3. We're doing it both as open source for the dev community and it is now going to be powering Meta AI. So there's a lot that I'm sure we'll go into around Llama 3. But I think the bottom line on this is that with Llama 3, we now think that Meta AI is the most intelligent AI assistant that people can use that's freely available. We're also integrating Google

00:48

and Bing for real-time knowledge. We're going to make it a lot more prominent across our apps. So basically at the top of WhatsApp and Instagram and Facebook and Messenger, you'll just be able to use the search box right there to ask any question. And there's a bunch of new creation features that we added that I think are pretty cool and that I think people will enjoy. I think animations is a good one. You can basically just take

01:14

any image and animate it. But I think one that people are going to find pretty wild is that it now generates high quality images so quickly. I don't know if you've gotten a chance to play with this, but it actually generates the image as you're typing and updates it in real time. So you're typing your query and it's honing in on it. It's like, okay, here: show me a picture of a cow in a field with mountains in the background, it's drinking beer, and

01:44

it's updating the image in real time. It's pretty wild. I think people are going to enjoy that. So yeah, I think that's what most people are going to see in the world, right? We're rolling that out. Not everywhere, but we're starting in a handful of countries and we'll do more over the coming weeks and months. So that, I think, is going to be a pretty big deal. And I'm really excited to get that in people's hands. It's a big step forward for

02:09

Meta AI. But I think, if you want to get under the hood a bit, the Llama 3 stuff is obviously the most technically interesting. So for the first version, we're training three versions: an 8 billion and a 70 billion, which we're releasing today, and a 405 billion dense model, which is still training. So we're not releasing that today. But the 8 and 70, I mean, I'm pretty excited about how they turned out.

02:39

I mean, they're leading for their scale. We'll release a blog post with all the benchmarks so people can check it out themselves. And obviously, it's open source so people get a chance to play with it. We have a roadmap of new releases coming that are going to bring multimodality, more multilinguality, and bigger context windows to those as well.

03:03

And then, hopefully sometime later in the year, we'll get to roll out the 405, which is still training, but for where it is right now in training, it is already at around 85 MMLU. And we expect that it's going to have leading scores on a bunch of the benchmarks. So I'm pretty excited about all of that. I mean, the 70 billion is great too. We're releasing that today. It's around 82 MMLU

03:37

and has leading scores on math and reasoning. So I think just getting this in people's hands is going to be pretty wild. Oh, interesting. Yeah, that's the first I'm hearing of this. That's super impressive. Yeah, and the 8 billion is nearly as powerful as the biggest version of Llama 2 that we released. So the smallest Llama 3 is basically as powerful as the biggest Llama 2. Okay, so before we dig

04:01

into these models, I actually want to go back in time. 2022 is, I'm assuming, when you started acquiring these H100s, or you can tell me when. The stock price was getting hammered, people were like, what's happening with all this capex, people aren't buying the metaverse, and presumably you're spending that capex to get these H100s. Back then, how did you know to get the H100s? How did you know you'd need the GPUs? I think it was because we were

04:25

working on Reels. So we got into this situation where, you know, we always want to have enough capacity to build something that we can't quite see on the horizon yet. And we got into this position with Reels where we needed more GPUs to train the models, right? It was this big evolution for our services, where instead of just ranking content from people who you follow

04:53

or your friends and whatever pages you follow, we made this big push to basically start recommending what we call unconnected content, basically content from people or pages that you're not following. So now the corpus of content candidates that we could potentially show you expanded from on the order of thousands to on the order of hundreds of millions. So, completely

05:19

different infrastructure. And we started working on doing that, and we were constrained by the infrastructure that we had in catching up to what TikTok was doing as quickly as we would have wanted to. So I basically looked at that and I was like, hey, we have to make sure that we're never in this situation again. So let's order enough GPUs to do what we need to do

05:42

on Reels and ranking content and feed. But let's also double that, right? Because again, our normal principle is there's going to be something on the horizon that we can't see yet. Did you know it would be AI? Well, we thought it was going to be something that had to do with training large models, right? But at the time, I thought it was probably going to be something that had to do with content. But I don't know. I mean, it's almost just

06:05

the pattern matching of running the company: there's always another thing, right? So I'm not even sure I had that. At that time I was so deep in just trying to get the recommendations working for Reels and other content. Because that's just such a big unlock for Instagram and Facebook, to now be able to show people content that's interesting to them from people that they're not even following. But yeah, that ended up being a very

06:31

good decision in retrospect. Yeah. Yeah. Okay. And it came from being behind. So it wasn't like, oh, I was so far ahead. Actually, most of the times where we make some decision that ends up seeming good, it's because we messed something up before and just didn't want to repeat the mistake. This is a total detour, but I actually want to ask about this while we're on it. We'll get back to AI in a second. So you didn't

06:53

sell for one billion. But presumably there's some amount you would have sold for, right? Did you write down in your head, like, I think the actual valuation of Facebook at the time is this, and they're not actually getting the valuation right? If they'd offered five trillion dollars, of course you would have sold. So how did you think about that choice? Yeah, I don't know. I mean, look, I think some of these things are just personal. I don't know that at the time I was sophisticated

07:17

enough to do that analysis. But I had all these people around me who were making all these arguments for a billion dollars, you know, like, here's the revenue that we need to make and here's how big we need to be, and it's clearly so many years in the future. And it was very far ahead of where we were at the time. And I don't know, I didn't really have the financial sophistication to really even engage with that kind of debate. I just, I think I

07:44

sort of deep down believed in what we were doing. And I did some analysis. I was like, okay, well, what would I go do if I wasn't doing this? It's like, well, I really like building things and I like helping people communicate and I like understanding what's going on with people and the dynamics between people. So I think if I sold this company, I'd just go build another company like this. And

08:09

I kind of like the one I have. So, I mean, why? Right? But I don't know. I think a lot of the biggest bets that people make are often just based on conviction and values. It's actually usually very hard to do the analyses trying to connect the dots forward. Yeah. So you've had Facebook AI Research for a long time. Now it's become seemingly central to your company. At what point did making AGI, or however you consider that mission...

08:44

At what point did that become a key priority of what Meta is doing? Yeah, I mean, it's been a big deal for a while. So we started FAIR about 10 years ago. And the idea was that along the way to general intelligence or AI, like full AI, whatever you want to call it, there can be all these different innovations and that's going to just improve everything that we do.

09:08

So we didn't conceive of it as a product. It was more of a research group. And over the last 10 years, it has created a lot of different things that have basically improved all of our products and advanced the field and allowed other people in the field to create things that have improved our products too. So I think that's been great. But there's obviously been a big change in the last few years, when ChatGPT comes out and the diffusion models around image creation come out.

09:39

And, I mean, this is some pretty wild stuff, right? That I think is pretty clearly going to affect how people interact with every app that's out there. So at that point, we started a second group, the Gen AI group, with the goal of basically bringing that stuff into our products and building leading foundation models that would power all these different products. And initially, when we started doing that, the theory at first was, hey, a lot of the stuff that we're doing is

10:13

pretty social, right? So it's helping people interact with creators, helping people interact with businesses so the businesses can sell things or do customer support, or basic assistant functionality, whether it's for our apps or the smart glasses or VR or all these different things. So initially, it wasn't completely clear that you were going to need full AGI to be able to support those use cases. But

10:44

then through working on them, I think it's actually become clear that you do, right? In all these subtle ways. So for example, for Llama 2, when we were working on it, we didn't prioritize coding. And the reason why we didn't prioritize coding is because people aren't going to ask Meta AI a lot of coding questions in WhatsApp. Now they will. Right? Well, I don't know. I'm not sure that WhatsApp is the UI where people are going to be asking a lot of coding

11:04

questions. Or Facebook or Instagram or those different services. Maybe the website, right? Meta.ai, that we're launching, I think. But the thing that has been a somewhat surprising result over the last 18 months is that it turns out

11:25

that coding is important for a lot of domains, not just coding, right? So even if people aren't asking coding questions to the models, training the models on coding helps them be more rigorous in answering the question and helps them reason across a lot of different types of domains.

11:41

Okay, so that's one example where it's like, all right, for Llama 3, we really focused on training it with a lot of coding, because that's going to make it better on all these things, even if people aren't primarily asking coding questions. Reasoning, I think, is another example. It's like, okay, maybe you want to chat with a creator, or you're a business and you're trying to interact with a customer. That interaction

12:02

is not just like, okay, the person sends you a message and you just reply, right? It's a multi-step interaction where you're trying to think through, how do I accomplish the person's goals? And a lot of times when a customer comes, they don't necessarily know exactly what they're looking for or how to ask their questions. So it's not really the job of the AI to just respond to the question. You need to think about it more holistically. It really

12:25

becomes a reasoning problem, right? So if someone else solves reasoning or makes good advances on reasoning, and we're sitting here with a basic chatbot, then our product is lame compared to what other people are building. So at the end of the day, we basically realized we've got to solve general intelligence.

12:43

And we just upped the ante and the investment to make sure that we could do that. So the version of Llama that's going to solve all these use cases for users, is that the version that will be powerful enough to replace a programmer you might have in this building? I mean, I just think that all this stuff is going to be progressive over time. But in the end case, Llama 10? I mean, I think that there's a lot baked into that question. I'm not sure that we're replacing people as much as

13:16

giving people tools to do more stuff. Is the programmer in this building 10x more productive after Llama 10? I would hope more. But no, I mean, look, I don't believe that there's a single threshold of intelligence for humanity, because people have different skills. And at some point, I think that AI is probably going to surpass people at most of those things, depending on how powerful the models are. But I think it's progressive.

13:43

And I don't think AI is one thing. I think it's, you're basically adding different capabilities. So multi-modality is kind of a key one that we're focused on now, initially with photos and images and text, but eventually with videos. And then because we're so focused on the metaverse, kind of 3D type stuff is important. One modality that I'm pretty focused on that I haven't seen as many other people

14:04

in the industry focus on is emotional understanding. I mean, so much of the human brain is just dedicated to understanding people and understanding your expressions and emotions, and I think that's its own whole modality, right? I mean, you could say, okay, maybe it's just video or image, but it's clearly a very

14:25

specialized version of those two. So there's all these different capabilities that I think you want to basically train the models to focus on as well as getting a lot better at reasoning, getting a lot better at memory, which I think is kind of its own whole thing. I don't think we're going to be primarily shoving context or kind of things into a query context window in the future to ask more complicated questions. I think that there will be kind of different stores of memory or different

14:52

custom models that are maybe more personalized to people. But I don't know, I think that these are all just different capabilities. And then obviously making them big and small, we care about both, because if you're running something like Meta AI, that's pretty server based. But we also want it running on smart glasses, and there's not a lot of space in smart glasses. So you want to have something that's very

15:15

efficient for that. What is the use case if you're doing tens of billions of dollars' worth of inference, or even eventually hundreds of billions of dollars' worth of inference, using intelligence at an industrial scale? What is the use case? Is it simulations? Is it the AIs that will be in the metaverse? What will we be using the data centers for? I mean, our bet is that this is basically going to change all of the products, right? So I think that there's going to be a

15:42

kind of Meta AI general assistant product. And I think that will shift from something that feels more like a chatbot, where you just ask a question and it formulates an answer, to things where you're increasingly giving it more complicated tasks and it goes away and does them. So I think that's going to take a lot of inference. It's going to take a lot of compute

16:01

in other ways too. Then I think that there's a big part of what we're going to do that is like interacting with other agents for other people, so whether it's businesses or creators. I guess a big part of my theory on this is that there's not just going to be like one singular AI that you interact with because I think every business is going to want an AI that represents their interests. They're not going to want to primarily interact with you through an AI that

16:30

is going to sell their competitors' customers. Sorry, their competitors' products. So yeah, I think creators is going to be a big one. I mean, there are about 200 million creators on our platforms. They all basically have the pattern where they want to engage their community, but they're limited by hours in the day, and their community generally wants to engage them,

16:53

but they know the creator is limited by hours in the day. So if you could create something where that creator can basically own the AI and train it in the way that they want and it can engage their community, I think that's going to be super powerful too. So I think there's going to be a ton of engagement across all these things. But these are just the consumer

17:16

use cases. I mean, when you think about stuff like, I run our foundation, right, the Chan Zuckerberg Initiative, with my wife, and we're doing a bunch of stuff on science, and there's obviously a lot of AI work that I think is going to advance science and healthcare and all these things too. So I think this is going to end up affecting basically every area of the products and the economy.

17:41

The thing you mentioned about an AI that can just go out and do something for you that's multi-step: is that a bigger model? Or with Llama 4, will there be a version that's still 70B, but you'll just train it on the right data and that will be super powerful? What does the progression look like? Is it scaling? Is it just the same size, but different banks

18:01

like you were talking about? I don't know that we know the answer to that. So I think one thing that seems to be a pattern is that you have the Llama model, and then you build some kind of other application-specific code around it, right? So some of it is the fine-tuning for the use case, but some of it is just logic for, okay, how Meta AI should integrate, how it should work with tools like Google or Bing to bring in real-time knowledge. I mean,

18:37

that's not part of the base Llama model, that's like part of it. Okay, so for Llama 2, we had some of that and it was a little more hand engineered, and then part of our goal for Llama 3 was to bring more of that into the model itself. But for Llama 3, as we start getting into more of these agent-like behaviors, I think some of that is going to be more hand engineered, and then I think

19:00

our goal for Llama 4 will be to bring more of that into the model. So I think at each step along the way, you have a sense of what's going to be possible on the horizon, you start messing with it and hacking around it, and then I think that helps you hone your intuition for what you want to try to train into the next version of the model itself. Interesting. Which makes it more general, because obviously with anything that you're hand-coding, you can unlock some

19:26

use cases, but it's just inherently brittle and non-general. Hey everybody, real quick, I want to tell you about a tool that I wish more applications used. So obviously you've noticed every single company is trying to add an AI chatbot to their website, but as a user, I usually find them really

19:44

annoying because they give these long, generic, often useless answers. CommandBar is a user assistant that you can just embed into your website or application, and it feels like you're talking to a friendly human support agent who's browsing with you and for you, and it's much more personalized than a regular chatbot. It can actually look up a user's history and respond differently based on that. It can use APIs to perform actions. It can even proactively nudge users to explore new features.

20:14

One thing that I think is really cool is that instead of just outputting text, CommandBar can kind of just say, here, let me show you, and start browsing alongside the user. Anyways, they're in a bunch of great products already. You can learn more about them at commandbar.com. Thanks to them for sponsoring this episode, and now back to Mark. When you say into the model itself, you train it on the thing that you want in the model itself,

20:40

what do you mean by into the model itself? Well, I think like the example that I gave for Llama 2, where the tool use was very specific, whereas Llama 3 has the ability to have much better tool use. So we don't have to hand code all the stuff to have it use Google to go do a search. It just can do that. And similarly for coding and running code and just a bunch of stuff like that. But I think once you get that capability,

21:18

then you get a peek of, okay, well, what can we start doing next? Well, I don't necessarily want to wait until Llama 4 is around to start building those capabilities. So let's start hacking around it. So you do a bunch of hand coding, and that makes the product better for the interim. But then that also helps show the way of what we want to try to build into the next version of the model. What is the community fine-tune of Llama 3 you're most excited by? Maybe not the one that will

21:41

be most useful to you, but just the one you'll enjoy playing with the most. Like, they fine-tune it on antiquity, and you'll just be talking to Virgil or something. What are you excited about? I don't know. I mean, I think the nature of the stuff is that you get surprised, right? So I think any specific thing that I thought would be valuable, we'd probably be building, right? But I think you'll get distilled versions. I think you'll get

22:12

smaller versions. I mean, one thing is that 8 billion, I don't think, is quite small enough for a bunch of use cases, right? I think over time, I'd love to get a billion parameter model, or a two billion parameter model, or even, I don't know, maybe a 500 million parameter model and see what you can do with that. Because if with 8 billion parameters we're basically nearly as powerful as the largest Llama 2 model, then with a billion

22:41

parameters, you should be able to do something that's interesting, right? And faster. It'd be good for classification, or a lot of basic things that people do before feeding a user query to the most powerful model, like understanding the intent of the query and honing what the prompt should be. So, I don't know, I think that's one thing that maybe the community can help fill in. But we're also thinking about getting around to distilling some of these

23:07

ourselves, but right now the GPUs are pegged training the 405. Okay, so you have all these GPUs, I think you said 350,000 by the end of the year. That's the whole fleet. I mean, we built two, I think it's like 22,000 or 24,000 GPU, clusters that are the single clusters that we have for training the big models. I mean, obviously across a lot of the stuff that we do, a lot of our stuff goes towards training Reels models and Facebook News Feed and Instagram feed,

23:38

and then inference is a huge thing for us, because we serve a ton of people, right? So our ratio of inference compute required to training is probably much higher than most other companies that are doing this stuff, just because of the sheer volume of the community that we're serving. Yeah,

23:55

yeah. Yeah, that was really interesting in the material they shared with me before, that you trained it on more data than is compute optimal just for training, because the inference is such a big deal for you guys and also for the community that it makes sense to just have this thing have trillions of tokens in there. Yeah, yeah, although one of the interesting things about it, that we saw even with the 70 billion, is that we thought it would get more saturated.

24:18

We trained it on around 15 trillion tokens. I guess our prediction going in was that it was going to asymptote more, but even by the end it was still learning, right? We probably could have fed it more tokens and it would have gotten somewhat better. But I mean, at some point, you're running a company, you need to do these meta reasoning questions of like, all right, do I want to spend our GPUs on training this 70 billion model further? Do we

24:45

want to get on with it so we can start testing hypotheses for Llama 4? So we needed to make that call, and I think we got to a reasonable balance for this version of the 70 billion. There will be others in the future, you know, the 70 billion multimodal one that'll come over the next period. But yeah, I mean, that was fascinating, that the architectures at this point can just take

25:11

so much data. Yeah, that's really interesting. So what does this imply about future models? You mentioned that the Llama 3 8B is better than the Llama 2 70B. No, no, it's nearly as good. I don't want to overstate it. It's in a similar order of magnitude. Does that mean the Llama 4 70B will be as good as the Llama 3 405B? Like, what does the future of this look like? It's a great question.
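
To put the "more data than is compute optimal" remark in rough numbers, here is a back-of-the-envelope sketch. Only the 70 billion parameter count and the roughly 15 trillion tokens come from the conversation. The 20-tokens-per-parameter rule of thumb (often attributed to the Chinchilla scaling work) and the 6-FLOPs-per-parameter-per-token estimate are assumptions brought in for illustration, not figures stated by either speaker.

```python
# Back-of-the-envelope arithmetic only; not Meta's actual training recipe.

PARAMS = 70e9            # 70 billion parameters (from the conversation)
TOKENS_TRAINED = 15e12   # ~15 trillion training tokens (from the conversation)

# Assumption: rule of thumb of ~20 training tokens per parameter
# for a compute-optimal run.
TOKENS_PER_PARAM_OPTIMAL = 20

# Assumption: common estimate of ~6 FLOPs per parameter per training token.
FLOPS_PER_PARAM_TOKEN = 6


def compute_optimal_tokens(params: float) -> float:
    """Token budget suggested by the assumed tokens-per-parameter heuristic."""
    return params * TOKENS_PER_PARAM_OPTIMAL


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs under the assumed 6*N*D estimate."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


optimal_tokens = compute_optimal_tokens(PARAMS)
overtraining_ratio = TOKENS_TRAINED / optimal_tokens
total_flops = training_flops(PARAMS, TOKENS_TRAINED)

print(f"Heuristic compute-optimal tokens: {optimal_tokens:.2e}")
print(f"Tokens actually used vs heuristic: {overtraining_ratio:.1f}x")
print(f"Approximate training FLOPs: {total_flops:.2e}")
```

Under these assumptions the heuristic budget is about 1.4 trillion tokens, so 15 trillion tokens is roughly 10.7 times that, at a cost on the order of 6.3e24 FLOPs. The extra training compute is spent once, while a smaller model is cheaper on every inference call, which is the trade-off described in the exchange above.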

25:31

That's right. I think no one knows. It's one of the trickiest things in the world to plan around: when you have an exponential curve, how long does it keep going for? And I think it's likely enough that it will keep going that it is worth investing the tens of billions, or 100 billion plus, in building the infrastructure, and to assume that if it keeps going, you're going to get some really amazing things that are just going to make amazing

26:02

products. But I don't think anyone in the industry can really tell you that it will continue scaling at that rate for sure. Right? In general, in history, you hit bottlenecks at certain points, and now there's so much energy on this that maybe those bottlenecks get knocked over pretty quickly. But I don't know, I think that's an interesting question. What does the world look like where there aren't these bottlenecks? You know, suppose like

26:28

progress just continues at this pace, which seems plausible. Like, zooming out... Well, there are going to be different bottlenecks. Right. So if not training, then like, oh yeah, go ahead. Well, I think over the last few years, there's been this issue of GPU production.

26:47

Yeah. Right. So even companies that had the money to pay for the GPUs couldn't necessarily get as many as they wanted because there were all these supply constraints. Yeah. Now I think that's sort of getting less. So now, I think you're seeing a bunch of companies think about, wow, we should just really invest a lot of money in building out these things.

27:10

And I think that will go on for some period of time. I think there is a capital question of, okay, at what point does it stop being worth it to put the capital in, but I actually think before we hit that, you're going to run into energy constraints. Right. Because, I mean, I don't think anyone's built a gigawatt single training cluster yet. Right. And then you run into these things that just end up being slower in the world. Like getting energy permitted is like a

27:43

very heavily regulated government function. Right. So you're going from, on the one hand, software, which is somewhat regulated. I'd argue that it is more regulated than I think a lot of people in the tech community feel, although it's obviously different. If you're starting a small company, maybe you feel that less. If you're a big company, you know, we just interact with different governments and regulators, and we have lots of rules that we need

28:09

to follow and make sure we do a good job with around the world. But I think that there's no doubt about energy. If you're talking about building large new power plants or large buildouts and then building transmission lines that cross other private or public land, that is just a heavily regulated thing. So you're talking about many years of lead time. So if we wanted to stand up some massive facility, I think that powering that is a very

28:41

long term project. Right. And so I don't know. I think people will do it. But I don't think that this is something that can be quite as magical as just, okay, you get a level of AI and you get a bunch of capital and you put it in, and then all of a sudden the models are just going to kind of... Like, I think you do hit different bottlenecks along the way. Yeah. Is there something, a project, maybe AI-related, maybe not, that even a company like Meta

29:06

doesn't have the resources for? Like if your R&D budget or capex budget was 10x what it is now, then you could pursue it. Like it's in the back of your mind, but with Meta today, you can't even issue stock or bonds for it. It's just 10x bigger than your budget. Well, I think energy is one piece. Yeah. Right. I think we would probably build out bigger clusters than we currently can if we could get the energy to do it. So I think that's... That's

29:35

fundamentally money-bottlenecked in the limit? Like if you had a trillion dollars... I think it's time. Yeah. Right. Well, it depends on how far the exponential curves go. Right. Like I think a number of companies are working on, you know, right now I think a lot of data centers are on the order of 50 megawatts or 100 megawatts, or a big one might be 150 megawatts. Okay. So you take a whole data center and you fill it up with just all the

30:01

stuff that you need to do for training and you build the biggest cluster you can. I think a bunch of companies are running at stuff like that. But then when you start getting into building a data center that's like 300 megawatts or 500 megawatts or a gigawatt, I mean, just no one has built a single gigawatt data center yet. So I think it will happen. Right. I mean, this is only a matter of time, but it's not going to be like

30:27

next year. Right. I think that some of these things will take, I don't know, some number of years to build out. And then the question is, okay, well, if you, I mean, just to I guess put this in perspective, I think a gigawatt, it's like around the size of like a meaningful nuclear power plant only going towards training a model. Didn't Amazon do this? There's like, they have a 950 megawatt... Yeah, I'm not exactly sure what they did. You'd have to

30:58

ask them. But it doesn't have to be in the same place, right? If distributed training works, it can be... That I think is a big question. Yeah. Right. It's basically how that's going to work. And I do think in the future, it seems quite possible that more of what we call training for these big models is actually more along the lines of inference, generating synthetic data to then go feed into the model. So I don't know what that ratio is going to be, but I consider the generation

31:28

of synthetic data to be more inference than training today. But obviously if you're doing it in order to train a model, it's part of the broader training process. So I don't know, that's an open question as to kind of what the balance of that is and how that plays out. If that's the case, would that potentially also be the case with Llama 3? And maybe like Llama 4 onwards, where you put this out and if somebody has a ton of compute, then using the models that you've put out, you can

31:54

just keep making these things arbitrarily smarter. Like some Kuwait or UAE or some random country has a ton of compute and they can just actually use Llama 4 to just make something much smarter. I do think that there are going to be dynamics like that, but I also think that there is a fundamental limitation on kind of the network architecture, or the model architecture. So I think like a 70 billion model that kind of we trained with the Llama 3 architecture can get

32:31

better, right? It can keep going. Like I was saying, it's, you know, we felt like if we kept on feeding it more data or rotated the high value tokens through again, then you know, it would continue getting better. But and we've seen a bunch of other people around the world, you know, different companies basically take the Llama 2 70 billion base, like take that model architecture

32:54

and then build a new model. It's still the case that when you make a generational improvement to the kind of Llama 3 70 billion or the Llama 3 405, there's nothing open source anything like that today, right? Like it's not, I think that's like it's a big step function, and what people are going to be able to build on top of that I don't think can go infinitely from there. I think there can be some optimization in that until you get to the next step function.

33:21

Yeah. Okay. So let's zoom out a little bit from specific models and even the many-year lead times you would need to get energy approvals and so on. Like big picture, these next couple of decades. Sure. What's happening with AI? Does it feel like another technology, like metaverse or social, or does it feel like a fundamentally different thing in the course of human history? I think it's going to be pretty fundamental. I think it's going to be more like

33:50

the creation of computing in the first place, right? So you'll get all these new apps in the same way that when you got the web or you got mobile phones, you got like people basically rethought all these experiences and a lot of things that weren't possible before now became possible. So I think that will happen, but I think it's a much lower level innovation. It's going to be more like going from people didn't have computers to people have computers. It's my sense.

34:24

But it's also it's a, I don't know, it's very hard to reason about exactly how this goes. I tend to think that, in terms of like the cosmic scale, obviously it'll happen quickly, over a couple of decades or something, but I do think that there is some set of people who are afraid of like, you know, it really just kind of spins out and goes from being like somewhat intelligent to extremely intelligent overnight. And I just think that there's all these physical constraints that make that

34:55

so that that's unlikely to happen. I just don't really see that playing out. So I think you'll have time to kind of acclimate a bit, but it will really change the way that we work and give people all these creative tools to do different things that they, yeah, I think it's going to really enable people to do the things that they want a lot more, is my view. Okay, so maybe not overnight, but is it your view that like on a cosmic scale if you think like

35:24

humans evolved, and then like AI happened, and then they like went out through the galaxy, or maybe it takes many decades, maybe it takes a century, but like, is that like the grand scheme of what's happening right now in history? Sorry, in what sense? I mean, in the sense that there were other technologies like computers and even like fire, but like the AI happening is as significant as

35:46

humans evolving in the first place. I think that's tricky. I think people like to, yeah, I mean, the history of humanity, I think, has been people basically, you know, thinking that certain aspects of humanity are like really unique in different ways. And then coming to grips with the fact that that's not true, but humanity is actually still super special, right? So it's, it's like we thought that the earth was the center of the universe. And it's like it's not, but like,

36:23

it's like humans are still pretty awesome, right? And pretty unique. I think that another bias that people tend to have is thinking that intelligence is somehow kind of fundamentally connected to life. And it's not actually clear that it is, right? I think like people think that, I mean, I don't know that we have a clear enough definition of consciousness or life to kind of fully interrogate this, but I think there's all the science fiction about, okay, you create intelligence and now it

37:02

starts taking on all these human like behaviors and things like that. But I actually think that the current incarnation of all this stuff, at least, kind of feels like it's going in a direction where intelligence can be pretty separated from consciousness and agency and things like that. That,

37:19

I think just makes it a super valuable tool. So I don't know, I mean, obviously it's, it's, it's very difficult to predict what direction the stuff goes in over time, which is why I, I don't think anyone should be dogmatic about, you know, how they plan to develop it or what they plan to do. I think you want to kind of look at like each release. You know, it's like we're obviously very pro-open source. Yeah, but I haven't committed that we're going to like release every single

37:40

thing that we do. But it's basically, we, like I'm just generally very inclined to thinking that open sourcing it is going to be good for the community and also good for us, right? Because we'll benefit from the innovations. But if at some point, like there's some qualitative change in what the thing is capable of and we feel like it's just not responsible to open source, then we won't. But, um, so I don't know, it's all very difficult to

38:06

predict. Yeah. Um, what is a kind of qualitative change, a specific thing? You're training Llama 5 or Llama 4 and you've seen this and like, you know, I'm not sure about open sourcing it. Um, I think that it's a little hard to answer that in the abstract because there are negative behaviors that any product can exhibit that as long as you can mitigate it, it's like, it's okay, right? So, um, I mean, there's bad things about social media that we work

38:37

to mitigate, right? There's bad things about Llama 2 that we spend a lot of time trying to make sure that it's not like, you know, helping people commit violent acts or things like that, right? I mean, that doesn't mean that it's like kind of an autonomous or intelligent agent. It just means that it's learned a lot about the world and it can answer a set of questions that we think it would be unhelpful for it to answer. Um, so I, um, I don't know, I think the question

39:04

isn't really what behaviors would it show? It's what things would we not be able to mitigate after it shows that and, um, and I don't know, I think that there's so many ways in which something can

39:18

be good or bad that it's hard to actually enumerate them all up front. If you even look at like, what we've had to deal with in, um, you know, social media and like the different types of harms, we've basically gotten to, it's like, there's like 18 or 19 categories of harmful things that people do, and we've basically built AI systems to try to go identify what those things are that people are doing and try to make sure that that doesn't happen on our network as much as possible.

39:42

So, um, yeah, I think over time you'll be able to break down, um, this into more of a taxonomy too. And I think this is a thing that we spend time researching too because we want to make sure that we understand that. So one of the things I asked Mark is what industrial scale use of LLMs would look like? You see this in previous technological revolutions where at first, they're thinking in a very small scale way about what's enabled. And I think that's what chatbots

40:07

might be for LLMs. And I think the large scale use case might look something like what V7 Go is. And by the way, it's made by V7 Labs, who's sponsoring this episode. So it's like a spreadsheet, you put in raw information like documents, images, whatever, and they become rows and the columns are populated by an LLM of your choice. And in fact, I used it to prepare for Mark. So I fed in a bunch of blog posts and papers from Meta's AI research. And as you can see, if you're on YouTube,

40:36

it summarizes and extracts exactly the information I want as columns. And obviously mine is a small use case. But you can imagine, for example, a company like FedEx has to process half a million documents a day. Obviously a chatbot can't do that. A spreadsheet can, because this is just like a fire hose of intelligence in there, right? Anyways, you can learn more about them at v7labs.com/go, or the link in the description. Back to Mark. Yeah. Like it seems to me it would be a good

41:04

idea. I would be disappointed in a future where AI systems aren't broadly deployed and everybody doesn't have access to them. At the same time, I want to better understand the mitigations. Because if the mitigation is the fine tuning, well, the whole thing about open weights is that you can then remove the fine tuning, which is often superficial on top of these capabilities. Like, if it's like talking on Slack with a biology researcher, I think like models are very far from

41:30

this. Right now they're like Google search. But it's like I can show them my Petri dish and they can explain: here's why your smallpox sample didn't grow, here's what to change. How do you mitigate that? Because somebody can just fine tune that in there, right? Yeah. I mean, that's true. I think a lot of people will basically use the off-the-shelf model and some people who have basically bad faith are going to try to strip out all the bad stuff.

41:56

So I do think that that's an issue. The flip side of this is that, and this is one of the reasons why I'm kind of philosophically so pro-open source, is I do think that a concentration of AI in the future has the potential to be as dangerous as kind of it being widespread. So I think a lot of people are, they think about the questions of, okay, well, if we can do this stuff, is it bad

42:25

for it to be out in the wild, like just kind of widely available? I think another version of this is like, okay, well, it's probably also pretty bad for one institution to have an AI that is way more powerful than everyone else's AI. So if you look at like, I guess one security analogy that I think of is, you know, it doesn't take AI to basically, okay, there's security holes in so many different things. And if you could travel back in time a year or two years, right? It's like,

43:02

that's not AI. It's like, let's say you just have like one year or two years more knowledge of the security holes. You can pretty much hack into like any system, right? So it's not that far-fetched to believe that a very intelligent AI would probably be able to identify some holes, and basically be like a human who could potentially go back in time a year or two and compromise

43:23

all these systems. Okay, so how have we dealt with that as a society? Well, one big part is open source software that makes it so that when improvements are made to the software, it doesn't just kind of get stuck in one company's products, but it can kind of be broadly deployed to a lot of different systems, whether it's banks or hospitals or government stuff and like just everyone can kind of like as the software gets hardened, which happens because more people can see it and more people

43:48

can bang on it and there are standards on how the stuff works. The world can kind of get upgraded together pretty quickly. And I kind of think that a world where AI is very widely deployed in a way where it's gotten hardened progressively over time is one where all the different systems will be in check in a way that seems like it is fundamentally more healthy to me than one where this is more concentrated. So there are risks on all sides, but I think that's one risk that I think

44:23

people, I don't hear them talking about quite as much. I think like there's sort of the risk of like, okay, well, what if the AI system does something bad? I am more like, you know, I stay up at night more worrying, well, what if like some actor that, whatever, it's like from wherever you sit,

44:39

there's going to be some actor who you don't trust. If they're the ones who have like the super strong AI, whether it's some like other government that is sort of like an opponent of our country or some company that you don't trust or whatever it is, like I think that that's potentially a much bigger risk. As in they could like overthrow our government because they have a weapon

45:05

that like nobody else has. Or just cause a lot of mayhem, right? I think it's like, I think the intuition is that this stuff ends up being pretty kind of important and valuable for both kind of economic and kind of security and other things. And I don't know, I just think, yeah, if like someone who you don't trust or is an adversary of you gets something that is more powerful, then I think that that could be an issue. And I think probably the best way to

45:32

mitigate that is to have good open source AI that basically becomes the standard. And in a lot of ways kind of can become the leader. And in that way, it just ensures that it's a much more kind of even and balanced playing field. Yeah, that seems plausible to me. And if that works out, that would be the future I prefer. I guess I want to understand mechanistically how, if somebody was going to cause mayhem with AI systems, how the fact that there are other open source systems

46:02

in the world prevents that. Like this specific example of like somebody coming with a bio weapon. Is it just that we'll do a bunch of like R&D in the rest of the world to like figure out the vaccines really fast? Like what's happening? If you take like the computer security one that I was talking about, I think someone with a weaker AI trying to hack into a system that is like protected by a stronger AI will succeed less. Right. So I think that that's, I mean, that's like in terms

46:27

of how do you know everything in the world is like that? Like what if bio weapons aren't like that? No, I mean, I don't know that everything in the world is like that. I think that, I guess, bio weapons are one of the areas where I think the people who are most worried about this stuff are focused, and I think that makes a lot of sense to think about. I think that there are certain mitigations. You can try to not train certain knowledge into

46:58

the model. Right. There's different things, but yeah, I mean, at some level, if you get a sufficiently bad actor and you don't have other AI that can sort of balance them and understand what's going on and what the threats are, then that could be a risk. So I think that that's one

47:18

of the things that we need to watch out for. Is there something you could see in the deployment of these systems where you observe, like you're training Llama 4 and it's like lied to you because it thought you weren't noticing or something, and you're like, whoa, what's going on here? Not that, this is probably not likely with a Llama 4 type of system, but is there something you can imagine like that where you'd like be really concerned about deceptiveness and if like

47:43

billions of copies of things are out in the wild. Yeah, I mean, I think that that's not necessarily, I mean, right now it's where you see a lot of hallucinations. Yeah, it's more that. I think it's an interesting question how you would tell the difference between a hallucination

48:02

and deception. But yeah, I mean, I look I mean, I think there's a lot of risks and things to think about the um the flip side of all this is that there are also a lot of I try to in in running our company at least balance what I think of as these longer term theoretical risks with what I actually think are quite real risks that exist today. So like when you talk about deception, the form of that that I worry about most is people using this to generate misinformation

48:35

and then like pump that through whether it's our networks or others. So the way that we've basically combatted a lot of this type of harmful content is by building AI systems that are smarter than the adversarial ones. And like this is part of this kind of informs part of my theory on this right is if you look at like the different types of harm that people do or try to do through

48:57

social networks. There are ones that are not very adversarial. So for example, I guess hate speech I would say is not super adversarial in the sense that like people aren't getting better at being racist, right? They're just like, it's, you just like, okay, if you kind of... That's one where I think the AIs are generally just getting way more sophisticated faster than people are at those issues. So we have, and we have issues both ways. It's like people do bad things, whether

49:31

they're trying to incite violence or something. But we also have a lot of false positives, right, so where we basically censor stuff that we shouldn't and I think understandably make a lot of people annoyed. So I think having the AI that just gets increasingly precise on that, that's going to be good over time. But let me give you another example, which is like nation states trying to interfere in elections. That's an example where they absolutely have cutting edge technology and

49:55

absolutely get better each year. So we block some technique, they learn what we did, they come at us with a different technique, right? It's not like a person trying to, you know, I don't know, say mean things, right? It's like they basically have a goal, they're sophisticated, they have a lot of technology. In those cases I still think the ability to kind of have our AI systems grow in sophistication at a faster rate than theirs do... it's an arms race, but I think we're

50:24

at least currently winning that arms race. So I don't know, I think that that's, but this is like a lot of the stuff that I spend time thinking about, is like, okay, yes, it is possible that whether it's Llama 4 or Llama 5 or Llama 6, yeah, we need to think about like what behaviors we're observing, and it's not just us. I think part of the reason why you make this open source

50:45

is that there are a lot of other people who study this too. So yeah, we want to see what other people are observing, what we're observing, what we can mitigate, and then we'll make our assessment on whether we can make it open source. But I think for the foreseeable future I'm optimistic we will be able to, and in the near term I don't want to take our eye off the ball of what are actual bad things that people are trying to use the models for today, even if they're not existential, but they're

51:12

like they're like pretty bad kind of day-to-day harms that we're familiar with in running our services. That's actually a lot of what we have to, I think, spend our time on as well. Yeah, yeah. Actually, I found this synthetic data thing really curious. I'm actually interested in why you don't think, like, current models... it makes sense why there might be an asymptote with just doing this synthetic

51:35

data again and again. Yeah. They get smarter and you use the kind of techniques you talk about in the paper or the blog post that's coming out on the day this will be released, where it goes to the thought chain that is the most correct. Why wouldn't this like lead to a loop? Of course it wouldn't be overnight, but over many months or years of training, potentially with a smarter model, it gets smarter, makes better output, gets smarter, and so forth.

52:01

Well, I think it could within the parameter of whatever the model architecture is. It's just that like at some level, I don't know, I think like today's 8 billion parameter models, I just don't think you're going to be able to get to be as good as the state-of-the-art multi-hundred billion parameter models that are incorporating new research into the architecture itself. But those will be open source as well, right? Well yeah, but I think that that's, if, I mean,

52:33

subject to all the questions that we just talked about. Sure. Yes. I mean we would we would hope that that'll be the case but I think that at each point I don't know it's like when you're building software there's like a ton of stuff that you can do with software but then at some level you're constrained by the chips that it's running on.

52:52

So there are always going to be different physical constraints, and it's like how big the models are is going to be constrained by how much energy you can get and use for inference. So I guess I'm simultaneously very optimistic that this stuff will continue to improve quickly and also a little more measured than I think some people are about kind of, it's, I just don't

53:27

think the runaway case is like a particularly likely one. I think it makes sense to keep your options open. Like there's so much we don't know. There's a case in which like it's really important to keep the balance of power so that nobody becomes like a totalitarian dictator. There's a case

53:42

in which like you don't want to open source the architecture because like China can use it to catch up to America's AIs, and like there is an intelligence explosion and they like win that. Yeah, a lot of things can be possible. Just like keeping your options open, considering all of them,

53:55

seems reasonable. Yeah. Let's talk about some other things. Go for it. Okay, metaverse. What time period in human history would you be most interested in going into? 100,000 BCE to now, you just want to see what it was like. For the past? Yeah, it has to be the past. Yeah, I don't know, I mean, I have the

54:19

periods of time that I'm interested in. I mean, I'm really interested in American history and classical history, and I'm really interested in the history of science too. So I actually think seeing and trying to understand more about how some of the big advances came about... I mean, all we have are like somewhat limited writings about some of that stuff. I'm not sure the metaverse is going to let you do that because, I mean, you know, it's going to be hard to kind of go back in time

54:49

for things that we don't have records of. But I'm actually not sure that going back in time is going to be that important a thing. I mean, I think it's going to be cool for like history classes and stuff, but that's probably not the use case that I'm most excited about for the

55:04

metaverse overall. I mean, the main thing is just the ability to feel present with people no matter where you are. I think that's going to be killer. I mean, in the AI conversation that we're having, I mean, you know, so much of it

55:21

is about physical constraints that kind of underlie all of this, right? And you want to move, I mean, one lesson of technology is you want to move things from the physical constraint realm into software as much as possible, because software is so much easier to build and evolve, and

55:37

like you can democratize it more, because like not everyone is going to have a data center, but like a lot of people can kind of write code and take open source code and modify it. The metaverse version of this is I think enabling realistic digital presence

55:55

is going to be just an absolutely huge difference for making it so that people don't feel like they have to physically be together for as many things. Now, I mean, I think that there are going to be things that are better about being physically together. So it's not, I mean,

56:13

these things aren't binary. It's not going to be like, okay, now you don't need to do that anymore. But overall, I mean, I think that it's just going to be really powerful for socializing, for feeling connected with people, for working, for, I don't know, parts of industry, for medicine,

56:34

for, like, so many things. I want to go back to something you said at the beginning of the conversation, where you didn't sell the company for a billion dollars, and like with the metaverse, you knew you were going to do this even though the market was hammering you for it.

56:45

And then I'm actually curious, like, what is the source of that edge? And you said like, oh, values, I have this intuition, but like everybody says that, right? Like if you had to say something that's specific to you, how would you express what that is? Like why are you so

56:58

convinced about the metaverse? Well, I think that those are different questions. So what are the things that kind of power me? I think we've talked about a bunch of the themes. So I mean, I just really like building things. I specifically like building things around

57:24

how people communicate, and sort of understanding how people express themselves and how people work, right? When I was in college I studied computer science and psychology. I think a lot of other people in the industry studied computer science, right? So it's always

57:38

been sort of the intersection of those two things for me. But I think it's also sort of this like really deep drive. I don't know how to explain it, but I just feel like, constitutionally, like I'm doing something wrong if I'm not building something new, right? And so I think that there's like,

58:05

you know, even when we're putting together the business case for, you know, investing like a hundred billion dollars in AI or some huge amount in the metaverse, it's like, yeah, I mean, we have plans that I think make it pretty clear that if our stuff works it'll be a good investment,

58:21

but like you can't know for certain from the outset. And there's all these arguments that people have, you know, whether it's like with advisors or different folks. It's like, well, how are you confident enough to do this? And it's like, well,

58:38

the day I stop trying to build new things, I'm just done. I'm gonna go build new things somewhere else, right? It's like I'm fundamentally incapable of running something, or in my own life, and like not trying to build new things that I think are interesting. It's like that's not even

59:00

a question for me, right? It's like whether we're gonna go take a swing at like building the next thing, it's like I'm just incapable of not doing that. And I don't know, I'm kind of like this in like all the different aspects of my life, right? It's like

59:18

we built this, like, you know, our family built this ranch in Kauai and like I just like worked to design all these buildings, and then we started raising cattle and I'm like, all right, well, I want to make like the best cattle in the world, right? So it's

59:34

like how do we architect this so that way we can figure this out and like build all the stuff up that we need to try to do that. So I don't know, that's me. What was the other part of the question? Look, Meta is just a really amazing tech company, right? They have all

59:51

these great software engineers, and even they work with Stripe to handle payments. And I think that's just a really notable fact, that Stripe's ability to engineer these checkout experiences is so good that big companies like Ford, Zoom, Meta, even OpenAI, they work with Stripe to handle payments,

01:00:08

because just think about how many different possibilities you have to handle. If you're in a different country, you'll pay a different way, and if you're buying a certain kind of item, that might affect how you decide to pay. And Stripe is able to test these fine-grained optimizations across

01:00:20

tens of billions of transactions a day to figure out what will convert people, and obviously conversion means more revenue for you. And look, I'm not a big company like Meta or anything, but I've been using Stripe since long before they were advertisers. Stripe Atlas was just the easiest way for me to set up an LLC, and they have these payments and invoicing features that make it super convenient for me to get money from advertisers. And obviously without that it would have been much harder for me

01:00:44

to earn money from the podcast, and so it's been great for me. Go to Stripe.com to learn more. Thanks to them for sponsoring the episode. Now back to Mark. I'm not sure, but I'm actually curious about something else, which is, so a 19 year old Mark reads a bunch of like antiquity and classics in high school and college. What important lesson did you learn from it? Not just interesting things you found, but like there aren't that many tokens you've consumed by the time you're 19. A bunch of

01:01:10

them were about the classics. Clearly that was important in some way. I don't know, that's a good question. I mean, one of the things that I thought was really fascinating is, so when Augustus first became emperor and he was trying to establish peace, there was no real

01:01:44

conception of peace at the time. Like people's understanding of peace was it is the temporary time between when your enemies will inevitably attack you again, so you get like a short rest. And he had this view, which is like, look, we want to change the economy from, instead of

01:02:02

being so mercenary and like kind of militaristic, to like actually this positive-sum thing. It's like a very novel idea at the time. I don't know, I think that there's like something that's just really fundamental about that. It's like in terms of the bounds on like what people can

01:02:25

conceive at the time of like what are rational ways to work. And going back to like, I mean, this applies to both the metaverse and the AI stuff, but like a lot of investors and just different people just can't wrap their head around why we would open source this. And it's like,

01:02:43

like I don't understand it's like open source that must just be like the temporary time between which you're making things proprietary right and it's um but but I actually think it's like this very profound thing in tech that has actually it it creates a lot of winners right and it's

01:03:03

and and um so I don't know I don't want to strain the analogy too much but but I do think that there's um there's a lot of times I think ways where you can that are just like models for building things that people can't even like they just like often can't

01:03:22

wrap their head around how that would be a valuable thing for people to go do or like a reasonable state of the world that it's I mean it's uh I think there's more reasonable things than people think that's super fascinating um could I give you my answer what I was thinking what you might

01:03:40

have gotten from it um this is probably totally off but um just how young some of these people are who have very important roles in the empire like Caesar Augustus like by the time he's 19 he's actually incredibly one of the most prominent people in Roman politics and he's like leading battles

01:03:55

and forming the Second Triumvirate I wonder if you're like the 19 year old is like I can actually do this because like Caesar Augustus did this I think that's an interesting example both from a lot of history and American history yeah I mean it's um

01:04:07

um I mean one of my favorite quotes is it's this Picasso quote that all children are artists and the challenge is how do you remain an artist when you grow up and it's like basically I think because when you're younger I think it's just easier to have kind of wild ideas and you're not you know you

01:04:27

have no there are all these analogies to the innovator's dilemma that exist in your life as well as your company or whatever you've built right so you know you're kind of early on in your trajectory it's easier to pivot and just take in new ideas without it disrupting other commitments that you've made to to different things and um so I don't know I think that's an interesting part of of running a company is like how do you how do you kind of stay dynamic?

01:04:53

hmm um going back to the investors in open source uh the 10 billion dollar model suppose it's totally safe you've done these evaluations and unlike in this case the evaluators can also fine tune the model um which hopefully will be the case in future models uh would you open source the 10 billion dollar model? well I mean as long as it's helping us then yeah but would it like

01:05:14

10 billion dollars of R&D and then now it's like open source anyway? well I think here's I think a question which we'll we'll have to evaluate this as time goes on too but um um we have a long history of open sourcing software right we don't tend to open source our product

01:05:31

right so it's not like we take we don't take like the code for Instagram and make it open source but we take like a lot of the low level infrastructure and we make that open source right the the the probably the biggest one in our history was Open Compute Project where we took the designs for

01:05:48

kind of all of our um our servers and network switches and data centers and made it open source and ended up being super helpful because now I mean a lot of people can design servers but now like the industry standardized on our design which meant that the supply chains basically all got built

01:06:03

out around our design the volumes went up so it got cheaper for everyone and saved us billions of dollars so awesome right okay so there's multiple ways where open source I think could be helpful for us one is if people figure out how to run the models more cheaply well we're going to be spending

01:06:18

tens or like a hundred billion dollars or more over time um on all this stuff so if we can do that 10 percent more effectively we're saving billions or tens of billions of dollars okay that's probably worth a lot by itself um especially if there's other competitive models out there it's not like our

01:06:34

thing is like we'll be giving away some kind of crazy advantage um so is your view that the training will be commodified um I think there's a bunch of ways that this could play out that's one the um the other is is that so commodity kind of implies that it's going to get very cheap because

01:06:57

because there's lots of options the other direction that this could go in is qualitative improvements so um so you mentioned fine tuning right it's like right now it's it's um you know it's pretty limited what you can do with fine tuning major other models out there and there's some options but

01:07:13

generally not for the biggest models um so I think being able to do that and and be able to kind of do different app specific things or use case specific things or build them into specific tool chains I think will not only enable kind of more efficient development it could enable qualitatively

01:07:33

different things um here's one analogy on this is um so one thing that I think generally sucks about the mobile ecosystem is that like you have these two gatekeeper companies Apple and Google that can tell you what you're allowed to build and there are lots of times in our history so there's

01:07:52

the economic version of that which is like all right we build something and they're just like I'm going to take a bunch of your money but then there's the there's the um the qualitative version which is actually what kind of upsets me more which is there's a bunch of times when we've launched or

01:08:05

wanted to launch features and then Apple's just like nope you're not launching that which is like that sucks right and um so the question is what is like are we kind of set up for a world like that with AI where like you're going to get a handful of companies that run these closed

01:08:24

models that are going to be in control of the APIs and therefore are going to be able to tell you what you can build um well for one I can say for for us it is worth it to go build a model ourselves to make sure that we're not in that position right like I don't want any of those other companies

01:08:40

telling us what we can build um but from an open source perspective I think a lot of developers don't want those companies telling them what they can build either um so the question is what is the ecosystem that gets built out around that what are interesting new things how much does that

01:08:55

improve our products um I think that there's a lot of cases where if this ends up being like you know like our databases or caching systems or architecture we'll get valuable contributions from the community that will make our stuff better and then our app specific work that we do

01:09:12

will still be so differentiated that it won't really matter right it's like we'll we'll be able to do what we do we'll benefit and all the systems ours and the community's will be better because it's open source there is one world where um maybe it's not that I mean maybe the model just ends up

01:09:27

being more of the product itself in that case then I think it's um it's a trickier economic calculation about whether you open source that because then you you are kind of commoditizing yourself a lot but I don't from what I can see so far it doesn't seem like we're in that zone um

01:09:42

do you expect to earn significant revenue from licensing your model to the cloud providers so they have to pay you a fee to actually serve the model um we want to have an arrangement like that but I don't know how significant it'll be and we have this um this is basically our license for

01:10:00

Llama yeah um you know in in a lot of ways it's it's like a very permissive open source license except that we have a limit for the largest companies using it and this is why we put that limit in is we're not trying to prevent them from using it um we just want them to come

01:10:15

talk to us because if they're going to just basically take what we built and resell it and make money off of it then it's like okay well if if you're like you know Microsoft Azure or Amazon then yeah if you're going to be reselling the model then we should have some revenue share on that so just come

01:10:29

talk to us before you go do that and that's how that's played out so for Llama 2 it's um I mean we basically just have deals with all these major cloud companies and Llama 2 is available as a hosted service on all those clouds and um I assume that as we as we release bigger and bigger models that

01:10:47

will become a bigger thing it's not the main thing that we're doing but I just think if others are if those companies are going to be selling our models it makes sense that we should you know share the upside of that somehow yeah um with regards to the other open source dangers I think I

01:10:59

think you have genuine legitimate points about the balance of power stuff um and potentially like the harms you can get rid of because we have better alignment techniques or something um I wish there was some sort of framework that Meta had like other labs have this where they say like

01:11:12

if we see this exact concrete thing then that's a no-go on the open source or like even potentially on deployment um just like writing it down so like uh the company is ready for it uh people have expectations around it and so forth yeah no I think that that's a fair point

01:11:27

on the existential risk side yeah right now we focus more on the types of risks that we see today which are more of these content risks so you know we have lines on we don't want the model to be basically doing things that are helping people commit violence or fraud or I don't know just

01:11:45

harming people in different ways so um in practice for today's models and I would guess the next generation then maybe even the generation after that I think while it is somewhat more maybe intellectually interesting to talk about the the existential risks I actually think the

01:12:05

the real harms that need more energy being mitigated are things that are going to like have someone take a model and do something to hurt a person with today's parameters and kind of the types of kind of more mundane harms that we see today like people kind of

01:12:23

committing fraud against each other things like that so um that I just don't want to shortchange that I think we we have a responsibility to make sure we do a good job on that yeah Meta is a big company you can handle both yeah um uh okay so as far as the open source goes I'm actually curious if

01:12:40

you think the impact of the open source from PyTorch React Open Compute these things has been bigger for the world than even the social media aspects of Meta because I've like talked to people who use these services and think like it's plausible because a big part of the internet runs on

01:12:52

these things um it's an interesting question I mean I think almost half the world uses our yeah that's true so I think it's hard to beat that but no I think I think open source is it's really powerful as a new way of building things and yeah I mean it's possible I mean it's

01:13:18

you know it may be one of these things where I don't know like Bell Labs right where they you know it's like they were working on the transistor because they wanted to enable long distance calling and they did and it ended up being really profitable for them that they were able to

01:13:36

enable long distance calling and if you ask them five to ten years out from that um what was the most useful thing that they invented so okay well we enabled long distance calling and now all these people are long distance calling but if you ask a hundred years later maybe it's a different

01:13:51

question so um I think that's true of a lot of the things that we're building right Reality Labs some of the AI stuff some of the open source stuff I think it's like the specific products evolve and to some degree come and go but I think like the advances for humanity persist and that's like a

01:14:12

I don't know a cool part of what we all get to do um by when will the Llama models be trained on your own custom silicon um soon not not Llama 4 um the approach that we took is first we we basically built custom silicon that could handle inference for our ranking and recommendation

01:14:35

type stuff so Reels News Feed ads and um that was consuming a lot of GPUs but when we were able to move that to our own silicon we now we're able to use the more expensive NVIDIA GPUs only for training so um at some point we will hopefully have silicon ourselves that we can be using for

01:15:02

probably first training some of the simpler things then eventually training these like really large models um but in the meantime I'd say the program is going quite well and we're just rolling it out methodically and have a long-term roadmap for it uh final question this is totally out of left field but um if you were made CEO of Google Plus could you have made it work

01:15:25

Google Plus? Oof well I don't know I don't know that's that's that's a that's a very difficult very difficult counterfactual okay then the real final question will be when Gemini was launched did you uh was there any chance that somebody in the office uttered Carthago delenda est? No I think we're tamer now Google I'm not mad yeah I don't know it's a good question I don't know the the problem is there was no CEO of Google Plus it was just like a division within a company

01:16:00

I think it's like and you asked before about what are the kind of scarcest commodities but you asked about it in terms of dollars and I actually think for most companies it's um of this scale at least it's focus right it's like when you're a startup maybe you're more constrained on

01:16:15

capital um you know you you just are working on one idea and you you might not have all the resources I think you cross some threshold at some point where the nature of what you're doing you you're building multiple things and you're creating more value across them but you become more constrained on

01:16:31

what you can direct to go well and like there's always the cases where something just random awesome happens in the organization I don't even know about it and those are that's great but like but I think in general the organization's capacity is largely limited by what like the CEO and

01:16:53

the management team are able to kind of oversee and and kind of manage it's I think that that's just been a big focus for us it's like all right keep the as as I guess Ben Horowitz says keep the main thing the main thing right and and try to kind of stay focused on your key priorities

01:17:14

yeah yeah all right awesome that was excellent Mark thanks so much that was a lot of fun yeah really fun thanks for having me yep absolutely hey everybody I hope you enjoyed that episode with Mark as you can see I'm now doing ads so if you're interested in advertising on the podcast go to

01:17:29

the link in the description otherwise as you know the most helpful thing you can do is just share the podcast with people who you think might enjoy it you know your friends group chats Twitter I guess Threads yeah hope you enjoyed and I'll see you on the next one

This transcript was generated by Metacast using AI and may contain inaccuracies.
