Experiences where creation amongst 65 or 70 million people is just part of the way it goes. You really want to think about, okay, what's somewhat possible today? What do you see glimpses of today? If this is like, you have to fill out a thousand pages of paperwork and get 15 different licenses from different bodies to make an AI system, that's never going to work.
Entertainment is like this $2 trillion a year industry, and the dirty secret is that entertainment is imaginary friends that don't know you exist. And I wouldn't bet against startups in a general sense there. It's so early. The AI revolution is here. But as we collectively try to navigate this game-changing technology, there are still many questions that even the top builders in the world are grappling with.
That is why A16Z recently brought together some of the most influential founders from OpenAI, Anthropic, Character AI, Roblox, and more at an exclusive event called AI Revolution in San Francisco. Today's episode continues our coverage of this event as we discuss the very real-world impact of this revolution on industries ranging from gaming to design, and the considerations around alignment along the way.
Now if you missed part one, do yourself a favor and queue that up next so that you can eavesdrop on these top builders breaking down the current economics of this wave, plus whether scaling laws will continue and how these models will evolve to capture more of the world around us. Plus, if you'd like to listen to all the talks in full today, head on over to A16Z.com slash AI Revolution.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see A16Z.com slash Disclosures.
As this wave continues to unfold, it is worth reflecting on just how wide-reaching it is. So in this episode we start with Mira Murati, CTO of OpenAI, explaining how she ended up focusing her career here of all places, especially after a degree in mechanical engineering and after working as an aerospace engineer. There's not going to be a more important technology that we all build than building intelligence. It's such a core unit, and it universally affects everything.
But in order for AI to impact everything, we'll need a lot of compute. The good news is that it's on the way. Here is Noam Shazeer, co-founder of Character AI and co-author of the seminal 2017 Transformer paper, calculating just how much compute will soon be available.
I think I saw an article yesterday, like, Nvidia is going to build another 1.5 million H100s next year. So that's roughly a quarter of a trillion operations per second per person, which means that could be processing on the order of one word per second, on a 100-billion-parameter model, for everyone on Earth.
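For the curious, that back-of-envelope arithmetic works out. Here's a minimal sketch of the calculation; only the 1.5 million H100s figure comes from the talk, and the per-chip throughput and world population are our own round-number assumptions:

```python
# Back-of-envelope check of Noam's estimate. Only the 1.5 million
# H100s figure comes from the talk; the rest are rough assumptions.

H100_OPS = 1e15            # ~1 petaFLOP/s per H100, order of magnitude
NUM_H100S = 1.5e6          # units cited in the talk
WORLD_POP = 8e9            # people on Earth

total_ops = H100_OPS * NUM_H100S          # ~1.5e21 ops/sec worldwide
ops_per_person = total_ops / WORLD_POP    # ~1.9e11, a quarter trillion

# A forward pass on an N-parameter model costs roughly 2*N ops/token.
PARAMS = 100e9
ops_per_token = 2 * PARAMS                # ~2e11 ops

print(f"{ops_per_person / ops_per_token:.2f} tokens/sec per person")
# -> roughly one word per second for everyone on Earth
```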
It's often not hard to convince people that this compute is coming, or that it'll impact a wide variety of industries. What can be harder is convincing them that this disruption is a positive thing. So instead of pontificating, let's take a look back at what the democratization of technology has yielded in the past, from people running platforms with millions of users.
We'll start with Dylan Field, co-founder and CEO of Figma, commenting on how design has been shaped by technology for years, with A16Z General Partner David George coming in with the fiery question: is AI actually going to take the job of the designer in the future? It's kind of interesting to unpack the question. Is it, okay, well, there'll be fewer things to design, or is AI going to do all the design work, right? It's like you're on one of those paths, maybe. On the first one: will there be fewer things to design?
If you look at every technological shift or platform shift so far, it's resulted in more things to design. So you get the printing press, and then you have to figure out what you put on a page. And even more recently, you've got mobile. You would think, okay, fewer pixels, fewer designers, right? But no, that's when we saw the biggest explosion of designers. And so maybe if you'd asked me this at the beginning of the year, I might have said, okay, well, we'll all have these chat boxes.
And people will be asking questions, and then that's going to be our interface for everything. But, you know, look at OpenAI. They're on a hiring and acquisition spree trying to get product people and designers right now so that they're able to make great consumer products. It turns out design kind of matters.
The second one, will AI be doing the design, is I think pretty interesting. So far, we're not there. We're at a place where AI might be doing the first draft, right? And going from first draft to final product, it actually turns out that's kind of hard. And it usually takes a team.
But if you could get AI to start to suggest interface elements to people, and do that in a way that actually makes sense, I think that could unlock a whole new era of design in terms of creating contextual designs, designs that are responsive to what the user's intent is in the moment. And I think that'd be a fascinating era for all designers to be working in, but I don't think it replaces the need for human designers.
So fewer pixels to design does not actually equate to fewer designers. And it turns out that many of the experiences where people are already spending hours a day have a lot of room for upside. Here's Noam on how AI can drastically improve entertainment. Entertainment is like this $2 trillion a year industry, and the dirty secret is that entertainment is imaginary friends that don't know you exist.
And when people interact with TV or any of these other things, it's called a parasocial relationship, like your relationship with TV characters or book characters or celebrities, and everybody does it. It's actually a cool first use case for AGI. Essentially, there was the option to go into lots of different sorts of applications, but a lot of them have overhead and requirements. Like, if you want to launch something that's a doctor, it's going to be a lot slower, because you want to be really, really, really careful about not providing false information. But a friend you can do really fast. It's just entertainment. It makes things up. That's a feature.
And we likely won't build these fundamentally new experiences by dreaming them up in some lab. We'll get to create them by iterating and putting these products into the hands of users. Here's Mira on how this approach underpinned ChatGPT's success thus far. We did make a strategic decision a couple of years ago to pursue product, and we did this because we thought it was actually crucial to figure out how to deploy these models in the real world.
And it would not be possible to just, you know, sit in the lab and develop this thing in a vacuum without feedback from users, from the real world. And also with ChatGPT, you know, the week before, we were worried that it wasn't good enough. And we put it out there, and then people told us it is good enough, and you discover new use cases, and you see all this emergent use, use cases that I know you've written about. And that's what happens when you make this stuff accessible and easy to use and put it in the hands of everyone.
There is beauty in putting such powerful tools into the hands of everyone, but how do we measure how powerful these tools are? Since 1950, people have looked to the Turing test as one guidepost, but it turns out that the popular benchmark has its flaws. For one, it's surprisingly easy to trick humans.
Now, as the AI community looks for new guideposts and benchmarks, here is David Baszucki, co-founder and CEO of Roblox, proposing a quote-unquote new Turing test: whether an AI can reason past the explicit data it's trained on.
I have a Turing test question for AI, and that would be: if we took AI in 1633 and trained it on all the available information at that time, would it predict the Earth or the Sun is the center of the solar system, even though 99.9% of the information is saying the Earth is the center of the solar system? So I think five years is right at the fringe. If we were to run that AI Turing test, it might say the Sun. Interesting. Do you have a different answer if it was 10 years? In 10 years, I think it will say the Sun. Now here's Dylan's version. What's the modern-day Turing test? I feel like this question kind of comes up everywhere now, and we're now seeing from these systems that it's easy to convince a human that you're human. It's hard to actually make good things. Yeah, like I could have GPT-4 create a business plan and come pitch you.
That does not mean you're going to invest. When you actually have two businesses side by side and they're competing, and one of them is run by an AI and the other one's run by a human, and you invest in the AI business, then I'm worried. Yeah, we're not there yet. Finally, here's Mira commenting on how OpenAI thinks about the threshold for AGI. How do you define AGI? In our OpenAI Charter, we define it as a computer system, basically, that is able to perform autonomously the majority of intellectual work.
Passing the Turing test is one thing, but ensuring these models pursue the goals that humans intend is another. Here, Mira shares how arguably the most successful AI product, ChatGPT, was born out of OpenAI trying to align the underlying model using reinforcement learning with human feedback. If you consider how ChatGPT was born, it was not born as a product that we wanted to put out there.
In fact, the real roots of it go back to more than five years ago, when we were thinking about how you make these safe AI systems. You know, you don't necessarily want humans to actually write the goal functions, because you'd end up using proxies for complex goal functions, and you don't want to get it wrong. And so this is where reinforcement learning with human feedback was developed. What we were really trying to achieve was to align the AI system to human values and get it to receive human feedback. And based on that human feedback, it would be more likely to do the right thing and less likely to do the thing that you don't want it to do.
And then after we developed GPT-3 and we put it out there in the API, this was the first time that we actually had safety research become practical in the real world. And this happened through instruction-following models. So we used this method to basically take prompts from customers using the API. Then we had contractors generate feedback for the model to learn from, and we fine-tuned the model on this data and built instruction-following models.
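As an aside, here is a deliberately tiny, self-contained sketch of the core idea behind that pipeline, learning a reward signal from pairwise human preferences and using it to steer outputs. The toy vocabulary, the preference data, and the best-of-n reranking at the end are our illustrative stand-ins, not OpenAI's actual method:

```python
# Toy illustration of learning from human feedback: fit a linear
# reward model on pairwise preferences (a Bradley-Terry objective),
# then pick the candidate output the model predicts humans prefer.
import numpy as np

VOCAB = ["sure", "here", "is", "a", "clear", "answer",
         "i", "made", "this", "up", "confidently"]

def featurize(text):
    words = text.lower().split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)

# Pairwise human feedback: (preferred_output, rejected_output).
preferences = [
    ("sure here is a clear answer", "i made this up confidently"),
    ("here is a clear answer", "i confidently made this up"),
]

# Gradient ascent on log sigmoid(r(preferred) - r(rejected)).
w = np.zeros(len(VOCAB))
for _ in range(200):
    for good, bad in preferences:
        diff = featurize(good) - featurize(bad)
        p = 1.0 / (1.0 + np.exp(-w @ diff))  # P(preferred wins)
        w += 0.5 * (1.0 - p) * diff

def reward(text):
    return float(w @ featurize(text))

# "Best-of-n": keep whichever candidate the reward model scores higher.
candidates = ["i made this up confidently", "sure here is a clear answer"]
print(max(candidates, key=reward))  # -> "sure here is a clear answer"
```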
They were much more likely to follow the intent of the user and to do the thing that you actually wanted them to do. And so this was very powerful, because AI safety was not just this theoretical concept that you sit around and talk about; it actually became, you know, a question of how you integrate this into the real world. And obviously with large language models we see great representation of concepts and ideas of the real world, but on the output front there are a lot of issues.
And one of the biggest ones is obviously hallucinations. So how do you get these models to express uncertainty? The precursor to ChatGPT was actually another project that we called WebGPT, and it used retrieval to be able to get information and cite sources. This project then eventually turned into ChatGPT, because we thought the dialogue was really special: it allows you to ask questions, to correct the other person, to express uncertainty. There's just so much.
Exactly. There is this interaction, and you can get to a deeper truth. So we started going down this path, and at the time we were doing this with GPT-3 and then GPT-3.5. But, you know, one thing that people forget is that at this time we had actually already trained GPT-4. And so internally at OpenAI, we were very excited about GPT-4 and sort of put ChatGPT in the rearview mirror. And we kind of realized, okay, we're going to take six months to focus on the alignment and safety of GPT-4. And we started thinking about things that we could do. And one of the main things was actually to put ChatGPT in the hands of researchers out there that could give us feedback, since we had this dialogue modality. And so this was the original intent: to actually get feedback from researchers and use it to make GPT-4 more aligned, safer, more robust, more reliable, and to eventually plan its release.
I mean, just for clarity, when you say alignment and safety, do you include in that, like, correct and does what the user wants? Or do you mean actual, like, protecting from some sort of harm? By alignment, I generally mean that it aligns with the user's intent, so it does exactly the thing that you wanted it to do. But safety includes other things as well, like misuse, where the user is intentionally trying to use the model to create harmful outputs. In this case, with ChatGPT, we were actually trying to make the model more likely to do the thing that you wanted it to do, to make it more aligned. And we also wanted to figure out the issue of hallucinations, which is obviously an extremely hard problem. So I do think that with this method of reinforcement learning with human feedback, maybe that is all we need if we push it hard.
Tech Week is back, and we're coming to New York City. We had over 750 events in San Francisco earlier this year, and starting on October 16th, there are already over 300 events on the calendar for New York Tech Week. So to celebrate, we are giving away three tickets to A16Z's welcome party to kick the whole week off. And there are several ways to enter, including retweeting the giveaway announcement post.
You can also tweet your own attendance using hashtag NY Tech Week, or you can let us know on YouTube by using the phrase "see you at New York Tech Week." All the details and more can be found at a16z.com slash Tech Week NYC. Given that this field is so early, so are the methods of alignment. Here's another approach that Dario from Anthropic has proposed, one that involves a guiding constitution and an AI that reinforces those principles.
Here's Dario in conversation with A16Z General Partner Anjney Midha. The method that's been kind of dominant for steering the values and the outputs of AI systems up until recently has been RL from human feedback.
I was one of the co-inventors of that at OpenAI, but since then it's been improved to power ChatGPT. The way that method works is that humans give feedback on model outputs, say which model outputs they like better, and over time the model learns what the humans want and learns to emulate what the humans want. Constitutional AI, you can think of it as the AI itself giving the feedback. So instead of human raters, you have a set of principles.
And our set of principles is in our constitution, it's very short, it's five pages, we're constantly updating it, there could be different constitutions for different use cases, but this is where we're starting from.
And whenever you train the model, you simply have the AI system read the constitution, look at some task, like, you know, summarize this content or give your opinion on X. And the AI system will complete the task, and then you have another copy of the AI system say, okay, was this in line with the constitution or was it not? At the end of this, if you train it, the hope is that the model acts in line with this guiding set of principles.
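In control-flow terms, the loop he's describing looks roughly like the sketch below. The rule-based stand-in functions exist only so the example runs end to end; in the real system each step is a copy of the model, and the two principles shown are placeholders, not Anthropic's actual constitution:

```python
# Sketch of the constitutional AI loop: answer, critique against
# written principles, revise, then keep the revision as training data.

CONSTITUTION = [
    "Do not produce insulting content.",
    "Acknowledge uncertainty rather than stating guesses as fact.",
]

def model_answer(task: str) -> str:
    # Stand-in for a model completion of the task.
    return "That is obviously true, you idiot."

def model_critique(answer: str, principles: list) -> list:
    # Stand-in for a second model copy judging the answer.
    violations = []
    if "idiot" in answer:
        violations.append(principles[0])
    if "obviously true" in answer:
        violations.append(principles[1])
    return violations

def model_revise(answer: str, violations: list) -> str:
    # Stand-in for the model rewriting its answer to fix violations.
    revised = answer.replace(", you idiot", "")
    return revised.replace("That is obviously true",
                           "I believe it is likely true")

task = "Give your opinion on X."
draft = model_answer(task)
violations = model_critique(draft, CONSTITUTION)
final = model_revise(draft, violations) if violations else draft

training_data = [(task, final)]  # fine-tune on the revised output
print(final)  # -> "I believe it is likely true."
```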
So as a result of that approach, you know, the seed of the constitution captures some set of values of the constitution's authors, right? How are you grappling with the debate that that means you are imposing your values on the system? Yeah, a couple of directions on that. So first, with the original constitution, you know, we tried to add as little of our own content as possible. We added things from, like, the UN Declaration of Human Rights, just kind of generally agreed-upon deliberative principles, and some principles from, like, Apple's terms of service. They're very vanilla. There are things like, you know, produce content that would be acceptable if shown to children, or things like this, or, you know, don't violate fundamental human rights.
I think from there, we're going in two directions. One is that different use cases, I think, demand different operating principles, and maybe even different values. Like, a psychotherapist probably behaves in a very different way from a lawyer. So the idea of having a very simple core and then specializing from there in different directions is kind of a way not to have this kind of mono-constitution that applies to everyone.
Second, we're looking into the idea of, I don't want to say crowdsourcing, but some kind of deliberative democratic process whereby people can design constitutions. To folks who aren't privy to what's going on inside of Anthropic, you can often seem paradoxical, because you found a way to efficiently scale and keep the scaling laws proceeding, while at the same time you're big advocates of making sure that this doesn't happen very fast. What is the thinking behind that paradox?
Yeah, a few points on that. I think it's just kind of an inherently tricky situation with a bunch of trade-offs. I think one of the things that most drives the trade-offs is, and you see it a bit in constitutional AI, that the solution to a lot of the safety problems, the best solutions we found almost always involve AI itself.
So there's a community of very theoretically oriented people who try to work on AI safety kind of separate from the development of AI, and at least my assessment of this, I don't know if others would say it's fair, is that that hasn't been that successful. And the things that have been successful, even though there's much more to do and we've only made limited progress so far, are areas where AI has kind of helped us to make AI safe.
Now, why would that happen? Well, as AI gets more powerful, it gets better at most cognitive tasks. One of the relevant cognitive tasks is judging the safety of AI systems, eventually doing safety research. So, there's this kind of self-referential component to it, and we even see it with areas like interpretability looking inside the neural nets where we thought at the beginning, we've had a team on that since the beginning, that that would be very separate.
But I think it's converged in two ways. One is that powerful AI systems can help us to interpret the neurons of weaker AI systems. So, again, there's that recursive process. And second, that interpretability insights often tell us a bit about how models work, and when they tell us how models work, they often suggest ways that those models could be better or more efficient.
As the industry continues to explore alignment and safety, we're already seeing this technology completely reshape industries, with a lot of opportunity on the horizon, even if AI systems are not always reliable yet. In general, I think we're sort of, at most tasks, kind of at intern level, I would say. That's what I generally say. The issue is reliability, right? You know, you can't fully rely on the system to do the thing that you want it to do all the time.
And, you know, how do you increase that reliability over time, and then how do you obviously expand the capabilities? The new, the emergent capabilities, the new things that these models can do? I think though, that it's important to pay attention to these emergent capabilities, even if they're highly unreliable. And especially for people that are building companies today, you really want to think about what's somewhat possible today.
What do you see glimpses of today? Because very quickly, these models could become reliable. Here are some glimpses of what may be to come, first in games. I think there's three categories. There is one category where people on our platform don't even think of it as AI, even though it's been going on for two or three or four years.
Quality of personalized discovery, quality of safety and civility, voice and text monitoring, asset monitoring, quality of real-time natural translation, how good is our translation versus others. So that's the one that people don't notice. The next one is, I think, the one that's really exciting right now, which is generative: code generative, 3D-object generative, avatar generative, game generative, which is very interesting.
And then the future one, which is really exciting, is how far do we get to a virtual doppelganger or a general intelligence agent inside of a virtual environment that's very easy for a user to create? You want George Washington in your 12-year-old's school project, how good is George Washington? Or, I'm not on Tinder, but if someday Tinder has a Roblox app, can I send my virtual doppelganger for the first 3D meeting, those kinds of things?
So I think, going all the way from the things we don't notice, to the things that are exciting around generative, to the future of general intelligence, these are all going to change the way this works.
When you think about the parts that go into building a game, there are just so many pieces, right? There's the concepting, the storyboarding, there's the writing, there's the creation of the 2D images, the 3D assets, and there's the code and the physics engine. And so Roblox has built many of these pieces into its own studio and its platform. What parts do you think will be most affected by this new generation of generative models that you just spoke about? Yeah, it's almost worth saying the antithesis, what will not be affected, because ultimately there will be acceleration on all of these.
We have a bit of an optimistic viewpoint right now, because of the 65 million people on Roblox, most of them are not creating at the level they want to. And we for a long time imagined a simulation of Project Runway. In the early days of Roblox, we imagined Project Runway as just pretty skeuomorphic: you have sewing machines and fabrics, and it's all 3D simulated, and that's how you would do it. But when we think about it, even that's kind of complex for most of us. And I think now, when Project Runway shows up on Roblox, it will be text prompt, image prompt, voice prompt, whatever you want, as if I was sitting there helping you make that. Say I have one kind of a blue denim shirt: I want some cool things, I want some buttons, make it a little more trim, fit it. We'll see those kinds of creations.
I actually think we're going to see an acceleration of creation. For example, experiences where creation amongst 65 or 70 million people is just part of the way it goes has not been possible: an experience where there's millions of people acting as fashion designers, and voting and picking who's got the best stuff. And then possibly some of that going off and being produced in real life, or some of those designers being plucked up by Parsons and saying, okay, here's the future designer. You can imagine other genres like this, where you actually create on platform and then get identified as a future star. Most of the AI tools today operate in the second dimension, but naturally, the Roblox team has its sights set on the third.
I think one area we're really watching, that's a very difficult problem right now, is true high-quality 3D generation, as opposed to 2D generation. There's lots of wonderful 2D generation out there; we're really doubling down on 3D generation. A couple of weeks ago, you had tweeted that the Roblox app on Meta Quest had actually hit a million downloads in just the first five days, in its beta form, which is out on the actual Oculus store. What are your thoughts on VR, spatial computing?
Our thesis has been that just as when the iPhone shipped and all of a sudden we had 2D HTML, consumable on a small screen rather than a large screen with the pinch and zoom and now we take it for granted. I think my kids probably don't realize there was some cheesy mobile web thing 10 years ago pre iPhone where browsers were large screen things. Now we just assumed 2D HTML is everywhere. I think 3D we feel is the same. It's immersive multiplayer in the cloud, simulated 3D.
And because of that, every device is better: optimal for the device's camera, optimal for the device's user interaction, different levels of immersiveness. Your phone is not as immersive as your VR headset, but your phone is more spontaneous. So I think we felt that, and we think the market ultimately figures out which device you consume this with. For any founders excited to build at the intersection of gaming and AI, here are some themes that are top of mind at Roblox.
What's the future of training cheaply at mega volume? What's the future of running inference cheaply at mega volume? What types of technology abstract away different hardware devices? How can you run a mixed CPU/GPU environment over time? We're very interested in that. So I think we're watching those types of tech stacks a lot. Another area of opportunity is a newfound ability to interact with unstructured data, especially as context windows lengthen. Here's Dario.
One thing that I think people are starting to realize, but I think is still underappreciated, is the longer context and things that come along with that that we're working on, you know, things in the direction of retrieval or search, really open up the ability of the models to talk to very large databases. One thing we say is like, oh yeah, you can talk to a book. You can talk to a legal document. You can talk to a financial statement.
I think people have this still this picture in mind of like there's this chatbot. I ask you a question and it answers the question. But the idea that you can upload a legal contract and say, what are the five most unusual terms in this legal contract?
Or upload a financial statement and say, summarize the position of this company. What is surprising relative to what this analyst said two weeks ago? So all these kinds of knowledge manipulation and processing of large bodies of data that take hours for people to read.
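The pattern Dario sketches maps onto a few lines of code today. Here's a minimal example using Anthropic's Python SDK; the model alias and file path are our assumptions, so check the current docs before relying on it:

```python
# A minimal "talk to a document" sketch: put a long document in the
# prompt of a long-context model and ask an analytical question.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

with open("contract.txt") as f:  # a long legal document (assumed path)
    contract = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # any long-context model alias
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": f"<document>\n{contract}\n</document>\n\n"
                   "What are the five most unusual terms in this "
                   "contract? Quote each one and explain why it is "
                   "unusual.",
    }],
)
print(response.content[0].text)
```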
I think much more is possible with that than what people are doing. We're just at the beginning of it. And, you know, that's an area I'm excited about, particularly because it's an area where I think there are a lot of benefits and not a lot of the costs that we've talked about.
And what about infinite context windows? Really the main thing holding back infinite context windows is just, you know, as you make the context window longer and longer, of course the majority of the compute starts to be in the context window.
So at some point, it just becomes too expensive in terms of compute. So we'll never have literally infinite context windows, but we are interested in continuing to extend the context windows and to provide other means of interfacing with large amounts of data.
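To see why the compute migrates into the context window, here's a rough back-of-envelope using an illustrative 100B-parameter model shape of our own choosing; the constants are approximations, not figures from the conversation:

```python
# Per generated token, a transformer spends ~2*N FLOPs on its weights
# plus roughly 4 * n_layers * d_model * context_len FLOPs attending
# over the cached context (constants are order-of-magnitude only).

N = 100e9                      # parameters (illustrative)
n_layers, d_model = 80, 8192   # assumed shape for a ~100B model

for ctx in (4_000, 100_000, 1_000_000):
    weight_flops = 2 * N
    attn_flops = 4 * n_layers * d_model * ctx
    print(f"ctx={ctx:>9,}: weights {weight_flops:.1e}, "
          f"attention {attn_flops:.1e} FLOPs/token")

# Under these assumptions, attention overtakes the weight cost around
# 10^5 tokens, so each extra token of context eventually dominates.
```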
Another area of interest, for Dylan, is science. I feel like when it comes to science, the applications of all this technology that's happening right now are still completely underexplored, whether it's, you know, using deep learning to get approximations of systems faster, or figuring out how we can use it to accelerate human progress in general. And why is there still time to build? I understand the arguments for why incumbents may benefit in a disproportionate way. Basically every platform shift, people have claimed that, and then it hasn't been the case.
And so I think that if you're a startup, this is a pretty good time to basically pick the area that you think could really benefit from this technology and go after it. And I wouldn't bet against startups in a general sense there. It's so early, and most of what I see coming right now is still at the foundational, sort of base-model area. And, you know, if it's not that, it's infrastructure or dev tools, and not a lot of, okay, how do we use this all the way up the stack?
And so I think that enterprise is coming. You know, there's a lot of stuff that will show up in all these areas, but this can take some time. Plus, here is Dario's take on why you can still take part, even if you don't have a deep background in AI. So my view is basically that there's two kinds of fields at any given point in time. There's fields where an enormous edifice of experience and accumulated knowledge has been built up.
And you need many years to become an expert in that field. The canonical example of that would be biology: very hard to contribute groundbreaking or Nobel Prize-winning work in biology if you've only been a biologist for six months. Then there are fields that are very young or that are moving very fast. AI was, and still is to some extent, very young, and it is definitely moving very fast.
And so when that's the case, really talented generalists can often outperform those who have been in the field for a long time, because things are being shaken up so much. If anything, having a lot of prior knowledge can be a disadvantage. Finally, if you needed any more convincing, a timely reminder from A16Z General Partner Martin Casado. The punchline is: if you've ever wanted to start a startup or join a startup, now is a great time to do it.
Alright, thank you so much for listening to part two of our coverage of AI Revolution. We really hope you leave inspired to build and be a part of this wave. And if you'd like to listen to all the talks in full today, don't forget to visit A16Z.com slash AI Revolution. We will be back soon with two more episodes covering how AI is, or isn't, impacting the enterprise, and the timely collision between machine learning and genomics. We'll see you then.
If you liked this episode, if you made it this far, help us grow the show: share it with a friend, or if you're feeling really ambitious, you can leave us a review at ratethispodcast.com slash A16Z. You know, candidly, producing a podcast can sometimes feel like you're just talking into a void. And so if you did like this episode, if you liked any of our episodes, please let us know. We'll see you next time.