Hey, welcome back to the Next Wave Podcast. I'm Matt Wolfe. I'm here with Nathan Lanz. Today, we're joined by Logan Kilpatrick, who is a Senior Product Manager over at Google DeepMind. And the day we're recording this episode is the same day that Google... Google just released a whole bunch of new AI tools: Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, Gemini 2.0 Pro.
all sorts of really, really cool stuff coming out of Google right now. And Logan's going to break it all down and you're going to get a pretty grand overview of where the AI world is headed, according to Google. So let's just go ahead and dive right in with Logan Kilpatrick.
Thank you so much for joining us. It's probably a really busy day, so I really appreciate you taking the time to join us today. Yeah, I'm excited to catch up with you both and talk about all things Gemini and what's happening in the AI world.
Well, so this is actually your second time on the show, so we've already kind of dove into some of the backstory and introduced people to you in a past episode. So let's just jump straight into it. Can you break down what is 2.0 Flash, 2.0 Flash-Lite, 2.0 Pro? Like, what are the differences? What's better about these models than what was out prior to them?
I think this is an exciting moment for us just because of the amount of effort and work that's gone into bringing Gemini 2.0 actually into the world. And Matt, you were there. Nathan, I don't remember if you were there at IO last year, but we announced 1.5 Flash and the...
long context stuff and a bunch of other things last May, so literally less than a year ago. And for the last year-ish, Flash has been this wild success story for us of building a model that developers really love. And a lot of that is rooted in
the right trade-offs of cost, intelligence, performance, capabilities. And if you look at the 1.5 Flash model and you think about how do we do this better? It's like, you have to make it more powerful. You have to give it more capabilities. You have to do that all while not...
making it cost a lot more money for developers. And it feels like we pulled a rabbit out of the hat to a certain extent with 2.0 Flash, because of the actual cost for developers: the input price was historically about seven and a half cents per million tokens, and now it's 10 cents.
So the blended cost is actually less for this model. And we did all that while this model is actually better than Pro. It has all these capabilities. It's natively agentic. It has search built into it. It has code execution built into it. Yeah, it's just exciting for me as a developer. I think ultimately you remove the cost barriers and all these other things for people to build really cool stuff. And like, that's what enables the world to make really cool products. So I'm super excited. So that's the sort of headline for Flash: better, faster, cheaper, which continues to be sort of the tagline. I need to get my Gemini t-shirts that say better, faster, cheaper on them. I read that it's better than GPT-4o, but like 20 times cheaper. Is that like roughly correct? Yeah.
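For listeners who build with the API, here's a rough idea of what "search and code execution built in" looks like from the developer side. This is a minimal sketch, assuming the google-genai Python SDK and a placeholder API key; the exact model ID and config fields are assumptions to verify against the current docs, not official guidance.

```python
# Sketch: Gemini 2.0 Flash with the built-in Google Search and code execution
# tools via the google-genai SDK (pip install google-genai). The API key is a
# placeholder, and the model ID/config fields are assumptions to double-check.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

# Ground an answer with the built-in Google Search tool.
grounded = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What did Google announce for the Gemini API this week?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(grounded.text)

# Let the model decide to run code instead of doing the math "in its head".
computed = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="What is the sum of the first 50 prime numbers? Run code to check.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(computed.text)
```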
Which is just crazy to me. And I think we've got a lot of work to do. I think one of the dimensions of the Gemini story is we continue to put out really great models. I think we need to do a great job as well of going and telling the world about this technology that we're building. Because I don't think people...
really understand. And actually for a lot of developers, the cost is the reason in many cases they don't build the stuff that they want to build. It's like, I can't afford to put this thing into production. It's too expensive. So I think Flash is really important in that story. But we also landed 2.0 Pro. We also landed an even cheaper version of Flash, Flash-Lite, which sort of has the capabilities pared down a little bit but makes it so that we can keep delivering on that frontier cost-performance trade-off. That model's in preview, and it'll go GA in the next few weeks as we iron out the last few bugs. And I think Pro gives me a lot of excitement about, you know, this whole narrative of
pre-training being dead. An interesting sort of realization I had after a conversation with Jack Rae, who's one of the co-leads for the reasoning models in DeepMind, is there's this non-linear amount of extra effort it takes to make the models continue to get better. You look at, like, okay, what does 3% mean on some benchmark? You think 3%, and we think of the normal world where 3% is actually 3%. And in the model world, 3% is actually a 25% increase in the amount of effort that went into this. But also, that three percent is the difference between unlocking a bunch of capabilities and a bunch of use cases that just didn't work before, because the thing failing three to four percent of the time versus not is the
difference between you putting AI into production at your company and not. So it actually matters a lot. And that's why I think we continue to push on that frontier. So what's the big difference between these new models and the last ones, as far as how they were created? Is it more parameters being trained on? Obviously, you know, the big narrative, like you just mentioned, with things like DeepSeek and o3 and things like that from OpenAI, is the sort of thing that happens at inference, right? When somebody enters a prompt, it does all of this thinking, and that's really what they're pushing on as the next sort of breakthrough. So what sort of breakthroughs, what changed between the last models and this one to make this one so much better?
Yeah, there's two dimensions of this. One, it's a story of the really difficult work of doing algorithmic improvements and breakthroughs. And I think the team at DeepMind does stuff way beyond my understanding as far as how they're able to make this continue to work. So I think there's core fundamental research advancements that are happening. There's a lot of data efficiency wins as well, which is also exciting. But as far as new capabilities of these models, I think the two big ones are: when Gemini was first announced, it was announced as this model that's natively multimodal. And it was natively multimodal in the sort of input sense, that it could really understand the videos, audio, images that it was being given. And that was one of the main differentiators. Today, the model is actually capable of doing that on the output side as well, which I think was a huge jump for us. And it actually requires, again, a non-trivial amount of engineering work in order to make the models capable of doing that. I had an interesting conversation a few weeks ago with someone on our team who
reminded me of this. Someone asked a question of, like, why does it matter if the models are capable of natively outputting these multimodal capabilities? We have really great text-to-speech models. We have great speech-to-text models. We have great image generation models. Why does it matter that the model can do this natively? And there's all of these really great examples, like, you know, a calculator versus, I don't know, an AI model that has access to code execution. The code execution version can really solve these problems in a really complicated way that you wouldn't otherwise be able to, or at least without that effort being required of you, the user of the model. And I think that's the world of these custom domain-specific models, like image generation and audio generation, versus the native capability, which really feels like the model can just do the heavy lifting for you, which is really interesting. So right now, can Gemini actually output an image? If I give it a prompt to generate an image, does it generate an image right now?
Not accessible to everyone yet. And I think this is the gap. So we have it internally and folks are using it in our early access program. And we should get you both early access to play around with it and test it out. And we'll roll it out more broadly soon, which I'm excited about.
That sort of same line of thinking is what takes us to native tool use as well. And native tool use is available to everyone. The model was trained knowing how to differentiate questions that it should go and search the internet for, or questions that it needs to use a tool like code execution for. So you get all of those silly examples where the model would be like, let me try to solve this math problem, which I know I'm not going to be able to solve, just because you asked me to. With code execution, it knows it needs to use that tool. And there's a whole bunch of verticals where the performance goes up significantly because of that. So Gemini is actually generating the image? It's not going and calling upon, like, Imagen 3 to generate the image. It's actually Gemini that's creating that image when it does generate an image. Exactly. And I'll push on getting you both access after this conversation, because I think the world knowledge piece really highlights why this matters. And there's a bunch of examples that I played around with of, like, pictures of a room and having the image change
based on these really complex, nuanced prompts around moving objects in certain ways, things it couldn't do if it didn't have world knowledge. It understands physics and understands all these things that, again, require the world knowledge piece. I think there's actually some interesting trends around what the outcome is of being able to take other domain-specific models and bring them into these LLMs that have world knowledge. I think there'll be some really cool capabilities that we're not thinking of. Yeah, it seems like that's kind of required for this to also work in everything from gaming to robotics, for it to actually have an understanding of the world. Oh, yeah, 100%.
Yeah, yeah. I mean, I actually had the opportunity to go out to London, go visit the DeepMind offices, and got to play with Astra on the phone. And I mean, that was the first taste I got of an actual useful AI assistant out in the real world. And I don't know if that was using Gemini 2.0 Flash or if that was actually using the full Gemini 2.0 yet at the time, but it was definitely a very, very impressive model. To actually see the tool use in real life and just be able to walk around and have it understand images and understand video and understand audio and understand text. And it was all built from the ground up to understand that stuff, as opposed to, like, look at an image, use OCR to figure out the text on the image and then pull in the text, or listen to the audio, transcribe the audio to text and then use the text. It's actually understanding what it's seeing, what it's hearing, which I think is one of the major differentiators about what Gemini is doing that you don't see the other models doing yet. So super, super impressive.
Yeah, I think the other piece of this, other than just the raw capabilities, is from a complexity standpoint, it means that developers building stuff don't have to go and do a ton of scaffolding work in order to make this happen. The overall complexity of your application, when the model is just able to take in a bunch of things and put out a bunch of things, makes life incredibly easy, and you don't have to deal with frameworks on frameworks on frameworks. So much of the agent world is like, hey, the models actually aren't that good at doing some of these things, and the way that we get around that is by building a bunch of scaffolding and frameworks. That's where a lot of developers are focused today. And actually, there's going to be a moment where all of a sudden the model capabilities are just good enough that it kind of works. And then people are going to be like, well, why do I have all this scaffolding that's doing these things
that the models can just do out of the box now. So it'll be interesting to see how that plays out. Yeah, I know like 2025 is going to be sort of the year of the agent, right? That term has already been thrown around quite a bit. But I also feel like everybody kind of has a different definition of an agent. You know, some people will look at something that like you can build over on make.com or Zapier, where it's tying different tools together using APIs as an agent.
But I'm curious, does Google and DeepMind, do they actually have like an internal definition of an agent that they're shooting for? Do you think we actually have agents now based on their definition? Where does Google stand on agents?
I actually don't know what our formal definition of an agent is. I have tried a bunch of the agent products and historically haven't been super impressed at what they're capable of. I think we're just not there yet. The thing that I want, and I think a lot of users want this as well, is just the models to be proactive. All of the products of today that build on AI require me to basically change my workflows or put in extra work or put in extra effort, through this sort of guise of, oh, this is actually going to save you time if you do this thing. And really what I want is the models to just be looking at the stuff that I give them access to and coming up with ways to be useful and save me time. And yeah, it's going to get some things wrong, but I don't want to have to be the one in the driver's seat all the time. And it feels like today's AI agents, again because the models aren't good enough, require the proactiveness of the human. And I think once that role reversal switches, that's where we see billion-agent-scale deployments all of a sudden just happening and working. And this is also where things get crazy
again with compute, because if you look at how compute is being used today, it's in a lot of cases this one-to-one correlation between a human input and the token output. And I think the future is thousands and thousands of X more usage of AI happening by the agents themselves than by humans, which will be fascinating to see play out. We'll be right back to the show.
But first, I want to tell you about another podcast I know you're going to love. It's called Entrepreneurs on Fire, and it's hosted by John Lee Dumas, available now on the HubSpot Podcast Network. Entrepreneurs on Fire stokes inspiration and shares strategies to fire up your entrepreneurial journey and create the life you've always dreamed of.
The show is jam-packed with unlimited energy, value, and consistency. And really, you know, if you like fast-paced, value-packed stories and you love entrepreneurship, this is the show for you. And recently they had a great episode about how women are taking... So listen to Entrepreneurs on Fire wherever you get your podcasts. Yeah, no, I think what you described is sort of what I envision an agent to be: almost predictive, like it sort of figures out what you need before you need it and then makes suggestions based on that. So I think that's a world that I'm really excited to get into. But I do want to touch on the word compute that you just mentioned for a second, because obviously there was a bit of, you know, a freak-out, so to speak, in the U.S. when that DeepSeek model came out, and everybody thought that, well,
this DeepSeek model uses a lot less compute than these other models that have been trained, so therefore, you know, NVIDIA GPUs are no longer necessary. And then we saw NVIDIA sort of lose a chunk of its market cap as a result of it.
But I'm just curious, like, what are your thoughts on the compute? Because I saw all of that happening and I thought it was so bizarre. I was like, this seems like a bullish sign for Nvidia to me, not a bearish sign. Like, what's going on here? But I'm curious, like, what's your take on what happened there?
Yeah, there's a lot of complexity to that story and parts of the story that I don't want to touch on. But the thing that I do want to touch on is like, if you look around the world, I think this example that I just gave of like, who is in the driver's seat of using AI today?
Today, it's humans that are in the driver's seat. And we're just inherently bounded by the number of humans who are using AI, because it just takes a while for technology to assimilate into culture and real use cases and all that stuff. We're on this exponential right now. I think as soon as agents start to take off, that exponential becomes a straight line up and to the right. I think it's going to be pretty profound, because again, the challenge is the human process just doesn't scale. And the agent process is going to scale, which is going to be really interesting. Like, I have 10,000 emails I haven't read in the last three months. Emails were the first thing I was thinking of. I want it to handle all of that for me. Yeah, it's going to be wonderful. And really go in and find the things, like, I know there's things that I should be doing that would create value. Missed opportunities. 100%. And there's so many of those things where, if you think about it, what's the economic value of all that work? One of the crazy frames of mind that I look at the world through is, you look at the world and the world is just filled with all this inefficiency. It's beautiful in many ways, but it's also this really cool opportunity:
you can be the one to create something that makes people more productive, and explicitly makes them more productive at maybe the things they don't want to do. And I feel like that's the other part of this agent story and this compute story, which is a lot of the products that I see people building are actually going after the things that people really like doing. And maybe shopping is this sort of tongue-in-cheek example of this, because some people really like shopping and some people don't. But if I told my girlfriend that we could never go shopping together again, and we could never go, you know, try out different experiences and go check out the vibe of different stores... So much of this is such a fundamental part of the human condition, actually, of going and seeing these different places. It's really baked into who we are, and that's such a traditional example of what agents are going to do for people. And it's odd to me. I feel like people have kind of misconstrued what the value creation is going to be in some of these examples.
I mean, I think that's the bias from, you know, Silicon Valley nerds building this stuff, right? Yeah. I don't want to shop. I just want it to be automated for me. And that's just a small subset of the world. Like, you know, when I shop, I just go in and get what I want. There's, like, two options. Okay, I like this one better, I get it. But my wife, she just loves checking out stores and different shops, and she would hate the idea of skipping all of that. It would just make no sense to her at all. Yeah, I think there's an underlying story here, which is, first of all, there's a lot of variance in human preference, but also there's ways of going about a certain task that make it interesting or not. And I like shopping too, if it's me getting to do it on the terms that I want. And it will be interesting to see how agents and some of these products can actually help create that experience.
And this goes back to this DeepSeek narrative around the value creation happening at the application layer. And it really does feel like this is true. If you look back two years ago, the narrative was, you know, all these companies are just wrappers on top of AI, there's no value creation, all the value creation is in the tokens, don't spend your time thinking about these companies. And it's so funny how quickly this flip-flops back and forth, between now all the value creation is at the application layer and LLMs are this commodity thing that no one should think about. I enjoy watching it all play out. It's fun. So how do you think it's going to play out? Because right now, Google's kind of focused on developers, right? More than consumers.
So on one hand, I spend all my time thinking about developer stuff. Google's got a ton of other people who are doing consumer stuff. I think a good example of this is Gemini is, both in Search and through the Gemini app, deployed at billion-user scale. There are literally billions of people who are interacting with the outputs and the models themselves, which is crazy to think about. And it's a very consumer-forward use case for those folks. And I think it's also still incredibly early for Gemini in Search. And there's some interesting stories around that stuff. But yeah, I mean, I personally am incredibly bullish on the infrastructure layer and the infra tooling. And I think actually a good example of this, and you see this in some of Sam's recent tweets about OpenAI and the Pro subscription, is
builders at the application layer have a lot of tension with building more AI. And actually, back to the thread of having AI be more proactive, this is why I believe in Flash so much, and I believe in the direction that we're going as far as reducing the cost for developers while continuing to push the performance frontier. It's because the story of AI to me is a story of the actual infrastructure incentivizing developers to not use it. You literally have an economic incentive not to use AI because it costs you money. And the more AI you build into your product, the more expensive it is,
the more margin pressure you have as an application builder. It's scary too, right? To build an app and all of a sudden you get a gigantic bill because people are using your thing and you haven't figured out how to properly monetize it yet. As someone who creates companies, it's kind of intimidating.
Yeah, 100%. That is the realest reaction, and also just the truest reaction, developers have to the cost of the technology. So back to the point of where the value creation happens: the nice thing for infrastructure providers is you have a fixed margin, so you know exactly how much money you're going to make by providing some infrastructure. At the application layer, you're constantly incentivized to almost not add additional stuff. And I think this has been the story for the ChatGPT Plus and Pro subscriptions. They built a subscription for $20 a month and they realized, hey, we actually can't give people all of these things anymore, we have to make something different. And even at $200 a month, it's not a break-even scenario for them yet. So it's super interesting to see that play out. And it's a lot of food for thought for people who are building stuff, to ask, are there new economic incentive mechanisms that you can create as you're building product, more so than just charging a $20-a-month subscription? And the one example I can think of that has made me think about this:
I don't know if y'all are familiar with OpenRouter, but it's a surface that lets you sort of swap different language models in and out. OpenRouter has a product leaderboard, and I'm pretty sure they give you discounts on tokens and stuff like that in exchange for some metadata passing back and forth, so they can understand how people are generally using AI models. So some interesting things like that. Alex Atallah, who's the CEO and previously worked on OpenSea, has this quote which always rings in my head, which is: usage is the ultimate benchmark. How many people are using your model or your thing is the proof point of success, not all these other benchmarks that people are chasing after. So, super interesting platform. Do they actually publish a leaderboard similar to what LMSYS does? Exactly. Okay, checking it out right now. It's openrouter.ai, forward slash rankings. Oh, cool.
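As a concrete illustration of the "swap models in and out" idea, OpenRouter exposes an OpenAI-compatible endpoint, so switching models is roughly just changing a string. A minimal sketch, with a placeholder key and model slugs that are assumptions to check against OpenRouter's own listings:

```python
# Sketch: sending the same prompt to different models through OpenRouter's
# OpenAI-compatible API. The key is a placeholder and the model slugs are
# assumptions; check openrouter.ai/rankings for what's actually listed.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder key
)

def ask(model: str, prompt: str) -> str:
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content

prompt = "In two sentences, why might usage numbers matter more than benchmarks?"
for model in ("google/gemini-2.0-flash-001", "deepseek/deepseek-chat"):  # assumed slugs
    print(model, "->", ask(model, prompt))
```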
I'm curious, a little bit of a topic shift here, but I know you're a proponent of open source and Google obviously has their Gemma models. Are there any updates, any idea of what's going on with Gemma and what we can expect next out of the open source side of things?
Yeah, I think this is also the piece that makes me excited about what we're doing at Google, which is it really is the exact same research that powers the Gemini models that ends up making the Gemma models. And Gemma 2 was, I think, the second most downloaded open source model in existence, which is awesome to see. And Gemma 3 is definitely going to happen. I think the timeline is soon, so you'll hear more. And y'all should do an episode with some of the Gemma folks, because there's lots of cool stuff coming: a bunch of interesting fine-tunes for different use cases. They have a version for RAG, they have a version for vision. I think they're probably going to do some agent stuff as well. So there's lots of really cool explorations happening on making those open models. Super cool. I want to talk about Imagen 3, which...
I just learned is actually pronounced "imagine." I've been calling it "Imogen." I always thought it was like image generator, so, like, "image gen," but I heard you pronounce it "imagine." So now I'll start saying it that way. But you mentioned that there's some updates with that as well. Can you tell us about that? Yeah. And you're in good company, don't worry. I swear, 50% of the meetings I'm in, I hear "imagine," and 50% I hear "image gen." So yeah, there's no conclusive answer. I think it's "imagine," but someone should correct me if that's not the case. So we released the Imagen 3 model across a couple of services late December, and then have been doing a bunch of work in the last few months to bring that to developers. So Imagen 3 should be available to developers in the API, and it's the frontier image generation model
across quality and a bunch of human-ranking benchmarks. Which, as an aside, it's super interesting: if you look at text models, I think one of the reasons the world has had so much success hill-climbing on making models better is there's a definitive source of truth in some of the tasks that the models can perform. I think with image models, that's actually not the case. It's really hard to eval, and you actually need humans in the loop to do a lot of those evals, or it's artistic and stylistic stuff where it's hard to put a finger on which of these two things is better. So a lot of those evals use human raters and human benchmarks, so there's some degree of error. But yeah, it's been exciting to see the model available in the Gemini app. It's available to enterprise customers, and it'll be available to developers to build with, which is awesome. I think this gen media future is going to be super exciting, and Veo hopefully sometime in the future as well.
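For developers curious what "available in the API" might look like, here's a minimal sketch assuming the google-genai Python SDK's image generation surface and an assumed Imagen 3 model ID; the exact model name and response fields should be confirmed against the current docs.

```python
# Sketch: generating an image with Imagen 3 through the Gemini API.
# The API key is a placeholder, and the model ID and response fields are
# assumptions based on the google-genai SDK's image generation surface.
from io import BytesIO

from google import genai
from google.genai import types
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_images(
    model="imagen-3.0-generate-002",  # assumed Imagen 3 model ID
    prompt="A watercolor painting of a robot holding a red skateboard",
    config=types.GenerateImagesConfig(number_of_images=1),
)

# Save the first generated image to disk.
image_bytes = response.generated_images[0].image.image_bytes
Image.open(BytesIO(image_bytes)).save("robot_skateboard.png")
```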
Yeah, yeah. Veo is awesome. I have early access to it. It's super fun to play with. Is the best way to use Imagen 3 inside of ImageFX? Is that still kind of the easiest way for a consumer to just go and play around with it? Exactly. I think it's also available for free to folks in the Gemini app. I think if you ask it to generate an image, it'll just do it through the Gemini app. But ImageFX gives you a little bit more controllability and stuff like that. So there's a few more features that are built into ImageFX. So that's definitely a place where it's publicly available. Super cool. Yeah, I know you and Nathan, sort of before we hit record, were nerding out a little bit about, you know, the whole text-to-application concept. He wished there was a way that you could have Unity open on the screen and then actually have an AI assist you with where to click and what to do next while you're building a game in Unity. And I think both you and I in unison went, you can do that in AI Studio right now. So I thought it might be kind of cool to pick up where we left off on that conversation and talk about some of the cool stuff that's available inside of AI Studio that maybe a lot of people don't even realize exists, and probably definitely don't realize you can use most of it for free still right now, too.
Everything in AI Studio is free, which I don't think people realize. The entire product experience, there is no paid version of it. There's a paid version of the API, which hopefully developers can scale with and do all that fun stuff. But all of our latest models end up in AI Studio for free, including the experience that powers this real-time multimodal live experience. If folks haven't played around with it, aistudio.google.com/live lets you do things like share your screen or show your camera and ask all these different questions and interact with the model. There's a bunch of different voices. There's a bunch of different modalities to choose from. But back to the conversation of, what will agents look like? What do we want out of agents? One of the limitations for agents is you have to build all this scaffolding for the agent to be able to see the things that you do: to see my email and my texts and my etc., etc., my personal laptop and my work laptop, or my phone and my watch. It's an incredible amount of work to make that happen. Except, if you have a camera, all of a sudden, all of it just works. And you can sort of make the determination of being able to show the information you want to show and share the stuff that you want to share. It's definitely more of a showcase of what's possible.
And that's why we put it in the API, because we don't have the answers, ultimately. Developers should go and build these products. But I do think, Matt, you mentioned the Astra experience earlier. I think the Multimodal Live API gets to what Astra does at the core, which is being able to be this co-presence with tools, and again, through a simple API, which is really exciting. I think the piece that the Multimodal Live API doesn't have, that we'll build, that I think the Astra experience did have, is this notion of memory, which again is critical for agents. I don't want agents to just forget that I prefer sitting in window seats instead of aisle seats, or whatever it is. You want all that context to be retained as agents are making decisions for you in the future. And I think that's going to require this sort of memory layer, which we're working on building, which is exciting.
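For anyone who wants to poke at the Multimodal Live API Logan describes, here's a heavily hedged, text-only sketch assuming the google-genai SDK as documented around this time. The live surface has been changing quickly, so the experimental model ID, the v1alpha api_version, and the connect/send/receive call shapes are all assumptions to verify against the current docs.

```python
# Sketch: a minimal text-only session against the Multimodal Live API using
# the google-genai SDK. The API key is a placeholder; the model ID and the
# send/receive call shapes are assumptions and may have changed since.
import asyncio

from google import genai

client = genai.Client(
    api_key="YOUR_API_KEY",  # placeholder key
    http_options={"api_version": "v1alpha"},  # assumed requirement for the live API
)

async def main() -> None:
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(model="gemini-2.0-flash-exp", config=config) as session:
        await session.send(input="Describe what a screen-sharing agent could do.", end_of_turn=True)
        async for chunk in session.receive():
            if chunk.text:  # stream the model's text as it arrives
                print(chunk.text, end="")

asyncio.run(main())
```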
Yeah, yeah. And I mean, Project Astra even sort of remembered between sessions, too, that I was in London, right? So I was talking to it while in London, asking for, you know, restaurants to go check out, things like that. I closed that session, started another session later on, and it remembered that I was, you know, over in London. It remembered all the previous conversations. So it wasn't just memory in the sense that you can plug something into the custom instructions on OpenAI and it'll remember your name and stuff like that. It was actually remembering the past conversations and bringing that in as additional context, which I thought was really cool, because that's really helpful.
I think this is an infrastructure problem. And we didn't talk about this explicitly, but one of the other narratives over the last year and a half has been: not enough AI in production. You know, it's kind of this demo, toy thing and no one really uses it. I think a lot of this is because it's just taken a while for companies to build the infrastructure to actually put AI into production. And I think memory is this example where there aren't a bunch of companies that are building this memory-as-a-service. If you're building this, let's talk. I'd love to hear about it and hear about what you're building. But I think there's a lot of opportunity still to be built around that, around memory as a service for folks. You could also start to think about, there's so many interesting ways to explore this: where does your personal context already live today? How does that, whoever that provider is, plug into the world of where all the other memory services are going to be? So I think there's a lot of really, really interesting directions that need to be built for memory specifically.
Logan, I'm curious. So you were talking earlier about the models, and a lot of people don't realize that all of it's free, like in AI Studio. Like, why do you guys hide it in AI Studio? Recently, I talked to a bunch of different people about DeepSeek and they were talking about how amazed they were by it. And I was like, yeah, but you can get the same stuff, but better, for free on AI Studio right now. And they didn't know. There's a lot of people who don't know. And so I was like, you've got to communicate that better somehow. Or, like, I think you guys should have its own website or something outside of Google, like a new product where you guys just say, hey, here's the new frontier and here's what we're pushing. And Google is still there and it uses some of the tech, but we have a new thing. That's my personal opinion.
Yeah, yeah. No, I think you're spot on. And for what it's worth, I think we get this feedback pretty consistently. I think some of this is a factor of just the state of the world and the challenges that we have as a product. I think on one hand, we are a developer platform. We're not building the front door for Google's AI surfaces. That's not the product that I'm signed up to build. That's not the product that we're directionally building towards. We're really focused on how do we enable
builders to get the latest AI technology. The Gemini app, formerly Bard, is the sort of front door to Google's AI technology, from a consumer standpoint and also from an enterprise standpoint, in Workspace and other places.
There's all this interesting organizational work that's happened at Google over the last couple of years. I think one of the cool stories is operationalizing Google DeepMind from doing this foundational research to being an organization that builds the world's strongest generative AI models and actually delivers those to the world as a product. And then now bringing the product surfaces that are the front line of delivering those to the world, the Gemini app and Google AI Studio, into DeepMind so that we can continue to accelerate. All those things, to me, are directionally the right stuff for us to do. I agree with you. I think we need to put the models in front of the world as soon as possible. And I think having a single place to do that makes sense. And it should probably be the Gemini app. It probably shouldn't be AI Studio.
But at the same time, I say that while we also want to be a surface to sort of showcase what's possible. So there's a lot of tension points, but I fundamentally do think we're going to get there. The Gemini app is moving super quickly to get the latest models: they just shipped 2.0 Flash, they shipped 2.0 Pro today, they shipped the thinking model today. So I think that delta between the Gemini app and AI Studio is sort of going away, which, yeah, I'm excited about, because the consistent feedback is people don't like that delta. They want to have a single place to go to to sort of see the future.
I always saw AI Studio as sort of the playground to test what the APIs are capable of, in the same way OpenAI has their playground and you can kind of go and mess with some of the settings and see what the output will look like before using it in your own API. That's kind of how I always saw AI Studio. Because once you get into it, if you're not very technically inclined, you might get a little overwhelmed seeing things like: what model should I be using? What is the temperature? You know, things like that. People aren't necessarily going to know how to play with that. Like, what should I set my token limit to? Things like that, I don't really feel like general consumers want to mess with. They just want to go to a chatbot and ask their question, right? It feels very tailored towards developers to me. And then the Gemini app feels like, all right, this is the front user interface that they want, you know, the general public to go be using, at least.
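For what it's worth, the knobs Matt is describing in AI Studio map pretty directly onto generation parameters in the API. A minimal sketch, assuming the google-genai SDK, with values that are purely illustrative rather than recommendations:

```python
# Sketch: the AI Studio "playground" settings (temperature, token limits, etc.)
# expressed as generation config in the API. The key is a placeholder and the
# values are illustrative, not recommendations.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Explain what temperature does to sampling, in one sentence.",
    config=types.GenerateContentConfig(
        temperature=0.7,        # higher values sample more diverse outputs
        top_p=0.95,             # nucleus sampling cutoff
        max_output_tokens=128,  # hard cap on response length
    ),
)
print(response.text)
```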
Yeah, I actually got a ping yesterday from someone asking, why does the chat experience have all this stuff in it? Why are there all these settings and stuff? And I was like, you're in the wrong place. The Gemini app is the place that you need. And they responded right away on Twitter and they're like, ah, yes, this is much better for me, I don't want to see all that complexity. But I do think about this a lot: how do people actually show up? How are they finding their way into these products? I do think the Gemini app is this very large front door, so it tends to capture most of these folks. It's literally built into the Google app on iOS and all this other stuff, versus you actually kind of have to do a little
bit of searching to find AI Studio, which probably makes sense in some cases. Awesome. You know, Logan, you said you were super excited about text-to-application. You were talking about Lovable, Bolt, other, you know, companies like that. What are you excited about in the space, and where do you think that kind of stuff is going? Yeah. I think just being able to democratize access to people building software and creating things. There's a ton of people in my life, and I don't live in the Bay Area, so there's a disproportionate amount of people who aren't in tech where I live. But the proportion of people with interesting ideas, I actually think, is the same. It's just the actual tools themselves that they have to go and execute on those ideas that I think are much less distributed in places outside the Bay Area, New York, and other places like that. So I think this frontier of text-to-app creation is going to be so, so interesting to see play out. And yeah, there's a ton of companies that are having lots of actual real early commercial success and traction today, which I think, again, this is
one of those examples where sometimes there's use cases that don't work, and then all of a sudden the model quality just gets good enough, or you build the right couple of things from a product experience standpoint, and then all of a sudden it clicks and now this thing is possible. And to me, it feels like text-to-app creation has had that moment and it's now possible. And I think it'll take a while and there'll still be a bunch of other things to hill-climb on. But I think especially now with reasoning models and the ability for them to keep thinking and writing more code and doing all that work, I think the complexity of the apps is also going to continue to go up on this exponential. And actually, Replit just had their launch, I think today or yesterday, of a similar sort of text-to-app product. I think there's more and more players showing up in this space. I would assume that probably 50% of products, or something like that, that are building with AI have this type of experience. And you could think about how that translates to someone who's doing something very, very domain-specific. There's a lot of products that try to build extension ecosystems or connectors or all these other sidecars to their product. You can imagine you just let your users create those. Here's the sort of generic set of APIs that talk to your email client, for example, and here's a text box: go build the sort of product experience you want. It's sort of in your hands. And that's a crazy world, that you can totally customize that however you want to. Yeah, it doesn't feel that far away. My new email client that's got, you know, '80s
video game stuff, you know, mixed in with the email client. Yeah, no, I've been loving that concept. We've talked about this a couple of times on this show: I've gotten in the habit now of, when I have a little problem or a bottleneck that I need to solve, instead of going and searching out whether there's a SaaS company that already exists that has that product for me, if it's simple enough, I'll just go prompt that software into existence. And I have a little Python script that runs on my computer to solve the problem for me, right? Like, I made a little script where I can input my one-minute short-form videos into it. It automatically transcribes them and then cleans up the transcription and adds proper punctuation and stuff. I created another mini app where I can drag and drop any image file, it doesn't matter what file format it is, and it'll convert it to a JPEG for me so I can use it inside of my video editing. And these are probably softwares that exist. I could go and hunt them down on the internet and maybe pay five bucks a month to use them. But I can just go use an AI tool, prompt the tool that I need, and 15 minutes later I have something on my desktop that I don't need to go pay anybody else for anymore. Maybe it connects to an API, like my transcription one connects to the OpenAI Whisper API, so it is costing me like a penny every time I use it, but so what? I just love this concept of, when I have a bottleneck in my business, I can just go prompt an app into existence that solves that bottleneck.
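To make the idea concrete, here's a sketch of the two throwaway utilities Matt describes: one that sends a short clip to the OpenAI Whisper API for a transcript, and one that converts any image to JPEG with Pillow. File paths and the API key are placeholders, and the details of Matt's own scripts are, of course, his own.

```python
# Sketch: two small personal utilities of the kind described above.
# Requires: pip install openai pillow. Paths and the key are placeholders.
from pathlib import Path

from openai import OpenAI
from PIL import Image

def transcribe(media_path: str) -> str:
    """Send a short audio/video file to the Whisper API and return its text."""
    client = OpenAI(api_key="YOUR_OPENAI_KEY")  # placeholder key
    with open(media_path, "rb") as f:
        result = client.audio.transcriptions.create(model="whisper-1", file=f)
    return result.text

def to_jpeg(image_path: str) -> Path:
    """Convert any Pillow-readable image to a JPEG saved next to the original."""
    src = Path(image_path)
    out = src.with_suffix(".jpg")
    Image.open(src).convert("RGB").save(out, "JPEG")
    return out

if __name__ == "__main__":
    print(transcribe("short_video.mp4"))  # placeholder file
    print(to_jpeg("thumbnail.webp"))      # placeholder file
```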
Yeah, and I think that, carried out one step farther, gets you towards this infinite app store where truly everyone is creating and contributing to this thing and remixing. This is the stuff that gets me excited about the future, because there's so much cool stuff to be created. And really, I think the lens of all of this is, how do you democratize access and make it so that anyone can go and build this stuff? And as someone who can program, but also knows how painful it is in a lot of ways, it's just so cool that more folks are going to be able to participate in that. It's going to be awesome. Yeah. Coincidentally, someone from my hometown in Alabama messaged me today, like, hey, I have this idea for an app. I get this kind of stuff all the time: I have an idea for an app, and who can I hire to build it, and all that stuff. And I'm like, I'm about to send him a link to Replit. Have you tried this yet? You know, just go try that.
And instead of paying someone $5,000, which is probably a ton of money for him, right? Instead, just go try Replit, sign up for one month, cancel it if you don't like it after that, and just see what you can get. And it's going to get better and better. I've tried Replit and all of them, and they're pretty good. It feels like there's something that's slightly missing, but every time I check, it's better than the last time. And it feels like, probably within the next year or two, you're just going to make any kind of software you want just by talking. That's going to be such a magical moment. Like, in the early days of the internet, the internet I feel like was more fun, because there were always different websites and different kinds of things. Or, like, you'd have Winamp and you'd put a skin on your Winamp. There were all these different things in terms of customization happening, more than there is now on the internet. And it feels like this kind of stuff might bring that back, where, yeah, you can kind of customize how you interact with the internet through creating your own custom software with AI. Yeah. As you were describing a fun internet, I was thinking of my personal website, which is a blank HTML page with no styling or anything like that. But if I didn't have to shoulder the cost,
and the LLM ran on someone's computer that they could just kind of talk to and say, remix this site and do it in all types of crazy ways, that would be so fun: every time someone shows up, it's a different experience to see this content.
I think there's a lot of interesting threads to pull on that. Is there anything else happening at Google right now? Any other things that you're working on that you're allowed to talk about? Are there any avenues we haven't gone down that you really want to talk about, that you're allowed to talk about, I guess?
Yeah, I think the only other thread, and we alluded to it a couple of times, is reasoning model stuff. It feels like, and I tweeted this the other day, it feels like the GPT-2 era for these models. There is so much new capability and so much progress being squeezed out of the models in such a short time. We released our first reasoning model back in December, right after the Gemini 2.0 Flash moment. One month later, we'd made, like, a normal six months' worth of progress, honestly, on a bunch of the benchmarks that matter for this stuff. We released an updated version January 21st, a couple of weeks ago. And if you look at the chart, it's literally linear progress up and to the right across a bunch of them. And it's just crazy to think that, again, a month ago the narrative was, the models are hitting a wall, there's no more progress to be had. And it's funny how much nuance some of the conversation lacks, because these innovations are deeply intertwined. I was having a conversation earlier today about long context and how long context is actually a fundamental enabler of the reasoning models.
By itself, with the long context innovation, the model's okay at pulling out certain pieces of information. It can do needle-in-a-haystack well. It can find a couple of things in a million tokens, but it's really hard for the models to attend to the context of, you know, find a hundred things in this million-token context window. Reasoning models are the unlock for this, because with reasoning models, the model can really just continue to go through that process and think through all the content and really do the due diligence. And it's almost uncanny how similar it is to how you would go about this. Like, I couldn't watch a two-hour movie and then, if you quizzed me on a hundred random little things in it, get those things right. It's going to be really hard to do that. But if you let me go through the movie and, you know, watch it in iMovie and add little inserts and clip things and cut things and do all this, then I'd be able to find those things if you asked me those questions. And it feels like that's kind of what reasoning is doing, is actually being able to do that. So I think we're super early in this progress, and it's going to be a lot of fun to see
both the progress continue for us, but again, through this narrative of how all this innovation trickles into the hands of people who are building stuff. And there's going to be a ton of new products that get built. Maybe text-to-app just gets 10x better in the next year because of reasoning models. That's possible, which is just crazy to think about.
Yeah, yeah. The reasoning models, they almost feel like they're double-checking, triple-checking themselves in real time. It'll sort of start to give a response and be like, let me actually double-check what I just said. And when it comes to coding, that seems like the ideal use case almost, right? Because it can almost look back at its code and be like, oh, I think I made a mistake there, and sort of continually fix its code before it finally even gives you an output, which I've just found to be really, really cool. But also, from what I understand, that's where a lot of the cost in the future is going to come in: the cost of the inference to do all of this analysis in real time as it's giving its output. Yeah, and the other thing to think about, which is interesting, is we're seeing all of this progress with the reasoning models, and they
are doing the most naive version of thinking. They really are. If you were to think about the human example of this, you're sort of sitting in a box thinking to yourself. You have no interaction with the outside world. You're not able to test your hypotheses, use a calculator, search the internet, any of those things. You have to sort of form your thoughts independent of the outside world. And you imagine what starts to happen when you give these things tools. It really does feel like that's the agent that we've been promised: all of these tools in a sandbox interacting with the model, letting it sort of have that feedback loop of trying things and seeing what doesn't work. So I couldn't be more excited about that.
Yeah, that's really interesting to think about. So like right now, it's just sort of thinking through things and sort of double checking itself. But in the future, it could actually be working with other tools that can also like assist in the double checking and things like that and get even smarter in those ways.
Yeah, and I think put a different way, to be more extreme, I think it has to do that. I think the version of the future that we're going towards is we're not going to be able to see the progress continue to scale unless the models can do that. And again, this goes back to this thread of there's lots of hard problems to solve in the world.
Making it so the models can do that efficiently and securely and safely, and have that sort of sandbox to do that type of thinking and work, is going to have to happen. And it's probably a lot of work that hasn't been solved today, which is interesting and full of opportunity.
Yeah, it's crazy to think that probably soon AI is going to be helping create all those tools as well. So that's when we'll see things just go exponential. It already is, whether it's people with AI or AI itself creating the tools and feeding that back into the system. And it's going to be wild how fast things are going to get better, with all the engineers being powered by Cursor. It's crazy. It's happening today. So many people are... I feel this way myself: I write more software now than I did when I was a software engineer, because I have AI tools and I can do all this crazy stuff. Yeah. How far do you think we are from AI actually being able to update its own weights based on conversation? So it actually learns based on new input that it gets through conversations that it has.
I think in the small-scale example sense, you could probably already do this to a certain extent. In the real frontier use cases, we're probably far from that. Some of the OpenAI Operator stuff was talking about this, around the need for having evals of basically creating economic value, like actually creating money, and where we are on that. And you probably don't want the models to do things that have a high cost today, because if they get it wrong, it costs you a lot of money. And training frontier models is definitely on the list of things that would cost you a lot of money if you got it wrong. You don't want a bunch of training runs that are just wasted compute. That's millions of dollars of potentially lost money. So I think there'll be a human in the driver's seat for those things for a while. But I do think you can sort of accelerate this small-scale feedback loop. And I think that's why small models matter. This innovation that's happening of being able to compress the frontier capabilities down into small models, I think it enables that rapid iteration loop, where maybe AI is more a co-pilot in that example.
Gotcha. Well, cool, Logan. This has been absolutely amazing. If people want to follow you, what's the best platform to pay attention to what you're doing and to keep up with what Google and DeepMind are up to? Yeah, I'm on Twitter. I'm on LinkedIn. I'm... on email. So whichever one of those three is easiest to get ahold of me, would love to chat with folks about Gemini stuff or the like.
Yeah, you're pretty active over on X slash Twitter, whatever you want to call it. Whenever there's a new like Google or DeepMind rollout, you're pretty much either tweeting about it or retweeting about it. So very, very good resource to keep up with what's going on in the world of AI with Google.
And Logan, thank you so much for hanging out again with us today. I'm sure we'll have you back in the future if you want, but this has been an absolutely fascinating conversation. So thanks again for hanging out. Yeah, this was a ton of fun. I'll see you both at IO, I hope. Hopefully we'll get the gang back together and we'll spend time in person at IO. It's going to be fun. Would love to do it. Thanks. Thank you.