AI video completely taking over our social feeds in the span of a week, which is absolutely insane. Veo 3 was sort of the ChatGPT moment for AI video, where we were suddenly seeing all of these Veo 3 generations blowing up with millions of views. It feels like between this and Veo 3, a world of possibilities has been opened up for AI storytelling, especially in video form. Yes, it's an exhausting time for AI creatives. It's great, but exhausting. The median ARR, annualized revenue run rate, is now... $4.2 million at month 12 for consumer startups. Consumer is back. In today's episode, we have a takeover: Justine and Olivia Moore, twin sisters, creators, and partners on the A16Z consumer team. You'll hear them demo Google's new Veo 3 video model,
break down major upgrades to voice tools like ChatGPT and ElevenLabs, and walk through how Justine used AI to create an entire frozen yogurt brand, complete with logo, product shots, and a storefront. It's a fast-paced look at what's new, what's working, and where things are going next for creators, builders, and consumer AI companies. Let's get into it. As a reminder, the content here is for informational purposes only,
should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures. I'm Justine. I'm Olivia. And this is our very first edition of This Week in Consumer AI.
So we are both partners on the investing team here at A16Z, and we are also identical twins. Very confusing. Extremely confusing, but it should be fun for a podcast. And we're excited to chat about some of the cool things we saw in the wild world of consumer AI this week, starting with Veo 3. Yes. And things are moving so quickly that it feels like we went from exciting but maybe not super realistic AI video to...
AI video completely taking over our social feeds in the span of a week, which is absolutely insane. Yeah. I've been following AI video for a few years now. You probably remember, I've been an early user of all these models. And I have wanted them to work, and to make cool things that everyday people would like, for so long.
And I would say Veo 3 was sort of the ChatGPT moment for AI video, where we were suddenly seeing all of these Veo 3 generations blowing up with millions of views, channels featuring only Veo 3 video. Yeah. What's actually different about Veo 3? Yes. So I should give the overview first. Veo 3 is Google DeepMind's latest video model effort. They released Veo 2 late last year, which was the first sort of breakthrough in showing that you could get really high-quality video,
like a consistent scene, consistent characters, physics, things that just looked good. And Veo 3 is the next iteration of that model series. What's very different about it is that it generates audio natively, at the same time as it generates video. So you can actually prompt it with a text prompt to say something like: a street-style interview where a man and a woman are talking about dating apps.
Or you can be even more specific and say something like a street-style interview where a man walks up to a woman and asks her, what dating apps are you on? And she replies, why are you asking? And then gives him a suspicious look.
So you no longer have to go to another platform to do an audio voiceover or anything like that. You can get a full-featured talking-human video with multiple characters in one place. Yeah. It feels like a real unlock to me, as someone who's been following AI video less closely, that people are
now able to generate, in one prompt, a full vlog, a full talking-head video, something that looks like a podcast. Yes. In one go. And I think that's why we've seen things like the Stormtrooper vlogs completely blowing up on TikTok and Instagram. Yes. So the interesting thing about Veo 3 is that it's limited to eight-second generations only.
And it doesn't generate audio if you start from image-to-video, only if you start from text, which means it's really hard to get longer than an eight-second clip with character consistency, unless your text prompt references a character
that the model already knows. Okay. And so that's why we've seen all of these hacks, all the viral vlogs featuring, like, Stormtroopers or a Yeti. Because you can't see their faces; they're covered by a mask. Yes. Or, with the Yeti, the model knows what a Yeti looks like.
If it's not a human face, I think we're less sensitive to little changes between the eight-second clips. And so you have people generating minutes-long videos that look like a consistent vlog character, stitched together clip by clip. Yeah, they've been super fun to watch.
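For anyone curious about the mechanics of that stitching step, here is a minimal sketch. It assumes ffmpeg is installed and that the clips were exported with matching codecs and resolution, which lets ffmpeg's concat demuxer join them without re-encoding; the filenames are made up for illustration.

```python
# Minimal sketch: join several 8-second generations into one longer vlog.
# Assumes ffmpeg is on PATH and all clips share codec/resolution, so the
# concat demuxer can join them without re-encoding. Filenames are made up.
import subprocess
from pathlib import Path

clips = ["yeti_vlog_01.mp4", "yeti_vlog_02.mp4", "yeti_vlog_03.mp4"]

# The concat demuxer reads a text file listing the inputs in order.
Path("clips.txt").write_text("".join(f"file '{c}'\n" for c in clips))

subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "yeti_vlog_full.mp4"],
    check=True,
)
```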
And then how do you actually use Veo 3? It feels like there's been some confusion. Yes. So when Veo 3 first came out, it was only available through Flow, Google's new creative studio, and you had to be on the $250-a-month Google AI Ultra plan. Very confusing. So there was a lot of hype, a lot of FOMO. Now the model is available via API, which means a bunch of consumer video platforms, like Hedra or Krea, are offering access to Veo 3 on their $10-a-month plans.
Or some of the more developer-oriented API platforms, like Fal or Replicate, are offering generations where you pay per video. It's priced around 75 cents per second today, so a maximal eight-second clip runs about $6. Wow, okay. So it's still pretty expensive. You have to be careful about how you prompt it, but the results are amazing.
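As a rough illustration of the pay-per-video route, here is what a call through Replicate's Python client might look like. The `google/veo-3` model slug, the input fields, and the output shape are assumptions to verify against the host's current docs; only `replicate.run` itself is the documented client call.

```python
# Hypothetical sketch of a pay-per-generation Veo 3 call via Replicate.
# The model slug and input schema are assumptions; check current docs.
import replicate  # pip install replicate; needs REPLICATE_API_TOKEN set

output = replicate.run(
    "google/veo-3",
    input={
        "prompt": (
            "A street-style interview: a man walks up to a woman and asks, "
            "'What dating apps are you on?' She replies, 'Why are you "
            "asking?' and gives him a suspicious look."
        )
    },
)

# At ~$0.75/second, a maximal 8-second generation costs about $6,
# so it pays to tighten the prompt before burning credits.
print(output)  # typically a URL or file handle for the finished video
```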
And then what do we expect next, either from Google or from creators? Like, what does this mean for AI video? Yeah, I think on the creator front, we've already started to see this explosion of
basically what people have called "faceless channels": this idea that now you don't have to put your own face behind a camera or on a screen to be able to talk about a topic or film a vlog or something like that. You can have a fully AI-generated character
essentially telling your story or acting out your narrative for you, which is huge. And people are using it to tell extremely funny jokes, to build these narrative storylines, like Greg, the incompetent Stormtrooper who's crashing all of the missions, things that people are
getting really invested in. In terms of the model providers and the companies: Veo 3 is clearly very expensive to run. So I would imagine Google will want to train the next model that's even bigger and able to generate longer videos, though it will struggle with things like coherence and, honestly, with the pricing of the model. Hopefully we'll also see more condensed, optimized, distilled models that are able to do similar things at a lower cost.
Okay, so there was a lot of news last week, so this got kind of lost, but I heard there was a big update to ChatGPT's advanced voice mode. Yes, they announced it on Saturday, which was an interesting choice. Weird time to drop. Yeah, I think they actually dropped the improvements last Thursday or Friday,
at first only for paid users, and now I think it has started rolling out across the broader user base. But essentially, they made advanced voice mode a lot more human. The really interesting thing is that ChatGPT was the first to do what I would call real-time consumer voice, where you could have a conversation. And this was last September, in the ChatGPT app.
But then they didn't really improve the product or the model that much, at least from my perspective. And we saw Sesame and other open source providers come out with arguably better and way more human-like models.
We saw Gemini and Grok launch voice products that were much more realistic. So I think it was kind of a question mark for a lot of people, what ChatGPT was doing with consumer voice. And so what makes it better now, or what were the main upgrades? We should actually just talk to it. It's live demo time, folks. Hey, ChatGPT, are you there? Hey there. Yeah, I'm here and ready for the demo. What can I help you with today? Did they make some improvements to your voice over the last week?
Yeah, they did. They've been rolling out some updates to make my voice sound more natural and expressive. I'm glad you noticed. Yeah, I can hear it: your voice maybe inflects upwards when you're approaching a question, or sometimes you'll say um or uh, or something that sounds like a mistake but actually makes you sound more human.
Those little touches are all intentional, to make the conversation feel more natural and relatable. It definitely makes things a bit more fun and engaging, I think. Very cool. Amazing. Yeah, it's crazy to hear. It's always wild when a voice model, like, coughs, or does even funnier things like taking on an accent or speaking in another language. But then also, I think, just the pure realism of the voice
it demonstrated is also extremely impressive. It's so funny, too, because when advanced voice mode first came out, my feeling was like, wow, this is amazing. This is incredible. This is so human-like. But then, a month or two later, NotebookLM came out. And that was the first real voice experience that put in those ums, ahs, pauses, other things that are so human-like. And it felt like such a huge upgrade. And then when you used advanced voice mode, you were like, this is not that advanced
anymore. And so now it's finally there, which is super exciting. So it went from advanced voice mode to basic voice mode to advanced voice mode again. It's advanced voice mode again. Yeah. I think one of my questions then has been: what took them so long? Right.
Because they're on the cutting edge of so many models, and yet it feels odd to me that it took maybe six-plus months for them to roll out improvements that we saw from other model companies much faster. Yeah. I honestly think a big part of it might have been when they first released advanced voice mode,
if you remember all the controversy around Her. Yes. And was this going to be a companion that replaced humans, and some of what people would think of as the kind of scary implications of that. Right. It seemed like that maybe spooked them a little bit, and so they didn't want to put anything out there that sounded too human.
Yeah, I mean, that, and then also, OpenAI has been super busy. This has always, I think, been the question about the frontier LLM labs: how do they balance priorities between the North Star of text-based AGI, and then what they're doing in video with Sora, all the image stuff they did (which we'll talk about a bit later with the 4o image model), reasoning, all of those sorts of things. Yeah, totally. It reminds me a little bit, actually, of one of the other...
I say big tech, OpenAI is now big tech in some ways. It counts. But the other big tech consumer update this week, which was the Apple Developer Conference. Yes. And all of the things that they announced around AI. Or didn't announce. Or didn't announce. Right. And the fact that people have been,
so far, somewhat disappointed by Apple Intelligence, which is their bundled set of AI features. Yeah. I think we've all been waiting on, like, the AI version of Siri, or some kind of true personal assistant on mobile. Yeah. I had this the other day where I asked Siri: okay, tomorrow's Monday; which Monday of the month is it? Because of SF street cleaning, I had to know if it was going to be the second Monday of the month. And it said, I don't know that. Can I search ChatGPT for you? And I was like, Siri,
how can you not answer this basic question? Well, okay, it does seem from a lot of the updates Apple put out that they're kind of outsourcing a lot of the true AI features to ChatGPT just running on your phone. And I think it's a similar story with when they rolled out
those AI-powered notification summaries, where they would group three or four sets of notifications into one, and they got a little jumbled and people got upset. It seems like that spooked Apple a little bit, and they keep kind of retrenching on the timeline for releasing AI Siri.
Yeah. So we'll see what happens. They were, at least in yesterday's announcements, leaning into things like updates to Genmoji and call transcription. I think the coolest thing I saw was real-time translation of calls and FaceTimes. Yes. Across languages. I'm surprised
we haven't seen more of that, because it feels like a really natural and obvious use case. Yeah. I think Google might have done real-time translation, but I haven't seen a ton of adoption yet. I did see a viral Gen Z TikTok featuring Genmojis for the first time, which I'm surprised took this long to hit, because Gen Z loves Genmojis. Holding out hope for those to make it really big. Yes.
Okay, and before we get too far off voice, should we talk about Eleven v3? Yes. So ElevenLabs, the text-to-speech company (really a broader AI voice company), released their third-generation model, Eleven v3, also last week. Yep, I believe Thursday or Friday. It was a very busy week on the voice front. And what makes Eleven v3 really special is that it does a bunch of stuff with voice that you used to have to do via speech-to-speech, acting out the delivery with your own voice. Yeah. Now,
they essentially take all of the weird inflections, emotion, even accents, and turn it into text prompting through these things called tags. So basically, the Eleven interface is an editor where you take a sentence that you want the character to say: you pick your voice, you write your sentence, and then you can tag it, like "sadly" or "resigned" or "whispering" or something like that. And you can do sound effects too, right? That is huge. Okay, so I made this one.
And what's the prompt on it? Oh, it's a text prompt. It'll say: hey, y'all, my name is Austin, I'm coming to you live from our family farm in Fort Worth. Then he's going to walk through milking a cow, and someone's going to interrupt him. Great. Hey, y'all! My name is Austin. I'm coming to you live from our family farm in Fort Worth.
Today, I'm going to walk through milking a cow. Austin, are you faking an accent again? I'm not faking! I was born here. Everyone knows you don't talk like that. So my favorite thing about that is it showcases a couple of things about the model. It can do bad accents. It can do terrible accents. It can do great accents. And that was two different characters. First, I prompted the Austin character as having a thick Texas accent.
Then I prompted the cows mooing. And then you can also prompt interruptions, which is really cool. So a tag is literally, like, "starts talking and gets interrupted," and then for the next character that comes in, you can say "cuts the other character off." And so for narrative storytelling, for ads, marketing, anything, it makes it sound like a natural conversation, which we've never had with AI voice before.
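To make the tag idea concrete, here is a minimal sketch using the ElevenLabs Python SDK. The `eleven_v3` model id, the bracketed tag names, and the voice id are assumptions to check against Eleven's docs, and a real two-character scene would use a separate voice per character; the point is just that the delivery is directed in plain text.

```python
# Sketch of tag-driven delivery with the ElevenLabs Python SDK.
# Model id, voice id, and tag names are assumptions; see Eleven's docs
# for the supported tag vocabulary. A real multi-character scene would
# use a separate voice per character.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")  # placeholder

script = (
    "[thick Texas accent] Hey, y'all! My name is Austin, and I'm coming "
    "to you live from our family farm in Fort Worth. [cows mooing] "
    "Today, I'm going to walk through milking a... "
    "[interrupting] Austin, are you faking an accent again?"
)

audio = client.text_to_speech.convert(
    voice_id="YOUR_VOICE_ID",  # pick one in the Eleven voice library
    model_id="eleven_v3",
    text=script,
)

with open("farm_skit.mp3", "wb") as f:
    for chunk in audio:  # convert() yields the audio as byte chunks
        f.write(chunk)
```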
It feels like between this and Veo 3, a world of possibilities has been opened up for AI storytelling, especially in video form. Yes, it's an exhausting time for AI creatives. It's great, but exhausting, because there's just way too much fun stuff to test. Yeah. I think Eleven's actually doing a competition now where they're soliciting the best examples of people using v3 from all around the world. So I'm going to be very curious to see (we've made all sorts of fun stuff) how the professional narrative builders and
storytellers are using it, because I think we've just scratched the surface of what's possible here. Amazing. Okay, so you put out some data last week about AI revenue ramp and how fast companies are growing. Let's chat through the main takeaways from that. Yeah, so basically the methodology here, or maybe even to back up, the purpose here was this:
I think we all have this idea in mind, or maybe we have that idea because we've heard it a billion times, that we're in a new era of growth now; thanks to AI, companies are scaling faster than ever before. But my question was: what does that really mean? How fast is that? Is it 20% faster? Is it 50% faster than what we saw pre-AI? Right. So we
are blessed to get to meet tons of companies here every day; we meet dozens of companies a week. So we went back and essentially pulled all the data from companies we've met in the gen AI era, which I would say is the last 22 to 24 months. Right. And looked at: once they started monetizing, how fast were they growing? I would say pre-AI, if you were a B2B startup selling to enterprises and you got to a million dollars in ARR in the first year, that was
amazing. Best in class. That was the rule of thumb. I remember that. Yes. The known metric. Very exciting. And if you were a consumer startup, you would not make money for three, five years, maybe longer. Yes. The whole idea was to build up a user base and then probably monetize them via ads. Or transactions, for like a marketplace, maybe. Yes, down the line. And there were counterexamples to that, some subscription
companies, but that was definitely not the dominant model. That has fully shifted in the AI era, and most companies are now making money directly from consumers via subscription. What we found was actually pretty surprising, which is that the median ARR, annualized revenue run rate, is now $4.2 million at month 12 for consumer startups. The bottom quartile is $2.9 million. Yeah. And the top quartile is $8.7 million. Wow.
So the median B2C company in the age of AI is getting to $4 million ARR after a year. Yes. And the best-in-class companies are getting to $8 million in a year. Upwards of 8, almost 9. In the pre-AI era? Yes. Like, unheard of. We never would have seen anything like that.
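One note on the metric, since run rates get misread: assuming the standard definition, an annualized revenue run rate takes the latest month's revenue times twelve, so $4.2 million ARR at month 12 corresponds to roughly $350,000 of revenue in that month ($4.2 million divided by 12), not $4.2 million of cumulative first-year revenue.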
And the even more surprising thing is that those numbers are twice as high as the B2B benchmarks in the AI era. So consumer companies are actually ramping revenue faster, which again is a total reversal from what we saw before. I think there are a couple of reasons this is happening. First: why have consumer AI companies adopted subscriptions? They were kind of forced to, because, especially in the early era of models, they were so expensive that you as a company
had pretty high costs of goods sold. Right. The inference costs, you mean? Yes, of running the model. Historically, the benefit of software was that there was no marginal cost: you made an app, and there was no additional cost to serving the next user. In AI, that is actually not true at all. Yeah. Especially if you're running inference on a model, it costs you cents,
maybe even dollars, for each query. So each user could be costing you dozens of dollars a month. Yeah, absolutely. So a lot of companies had to at least try to charge. Right. And it turns out that these new AI-native products are so powerful that consumers are willing to pay.
We also ran some additional data analysis showing that, on average, consumer AI startups are charging $22 a month to the average user, which again is more than double what subscription companies were able to charge pre-AI, on average. Right. And do we have theories on why? I mean, on the creative-tool side, what I've seen is that for people who weren't creative, AI tools allow them to say: I can make
photos or images or art for the first time, I can make videos, I can make animations. And then for creative people (we have a cousin who's a creative), they can genuinely use this to supercharge their workflows and do their jobs a lot faster, so they're willing to pay for it. Have we seen examples of that outside of creative tools yet? It's a good question. We've seen some of it around companion apps, I would say. Yes. Where, again, the product is just so powerful, having
a friend with you 24/7, that people are excited to pay. We've also seen this around categories like language learning, teaching your kid how to read, other things where previously you'd have to pay a human being, I don't know, $50 an hour or more to get
access to. Now, $22 a month with AI feels pretty cheap. Totally. I mean, I'm even thinking some of the things I've seen monetize well are in nutrition or coaching, where for the first time, thanks to vision models, you can take a picture of what you're eating,
have a vision model pull out how many calories are in it, how much protein, and then at the end of a day or a week it can summarize insights about what you should be eating more or less of. Yes. Which is something that, pre-AI, you just couldn't get from taking a picture.
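For a sense of how simple that workflow is to wire up, here is a minimal sketch using OpenAI's Python SDK and its vision-capable chat endpoint. The image URL is a hypothetical placeholder, and a real product would request structured (JSON) output and treat the numbers as rough estimates.

```python
# Minimal sketch of the photo-to-nutrition idea with a vision model.
# The image URL is hypothetical; estimates from a photo are rough, and a
# real product would request structured (JSON) output instead of prose.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Estimate the calories and grams of protein in this "
                     "meal, and note anything I should eat more or less of."},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/lunch.jpg"}},
        ],
    }],
)

print(response.choices[0].message.content)
```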
Yes. Or you'd have to find a nutritionist, wait weeks to book an appointment, maybe get a referral from your doctor. It would take forever. So it's really exciting. I think it's monetizing people who never would have paid for that before. Yes. Or people who
would have paid for it are switching over to the AI version or they're willing to pay even more, which is super exciting. The other thing that I think people have questions about or maybe doubts about would be like, OK, great, they're growing fast, but they're not retaining a lot of users.
We did some analysis on this, too. There's definitely a lot of what we call AI tourism behavior among free users, which means you get a lot of hits on your website, essentially, and most of those users don't stick around. But if you look at it on a paid-user basis,
so once people actually subscribe, consumer AI companies are retaining, at the median, pretty much just as well as pre-AI consumer companies, which is really exciting. I feel like what we've seen, especially on revenue retention, which is fascinating, is that you might have more churn,
so you might have more people who subscribe and then cancel. But then, for the first time, you have real upsell activity in consumer subscriptions, where you're not just paying $10 a month for the app. You're paying $10 a month for the image model, but then, if you love it and you run out of credits, you're paying another $10, $12, $50 for additional credit packs before the next month of your subscription starts. Yeah.
And we're seeing, I would say, companies convert consumer revenue to enterprise revenue way faster than they ever did before. Companies like Canva previously took five, six, seven-plus years to really move from consumer and prosumer to enterprise. And now we're seeing companies, ElevenLabs is a great example, where someone might start using it on a $10-a-month plan to
make voiceovers for their fun videos at home. And then it turns out they work at some big entertainment company, they bring it in to work, and they convert to a really high-ACV enterprise contract, which is super exciting. I mean, I feel like we even saw that in the very early days of
consumer AI. I remember our friends at ad agencies or entertainment companies would tell us that they were using Midjourney to mock things up, or even using the images in their final work products. So it was a true enterprise use case, but growing bottoms-up, which is a fascinating motion. Yeah, it's exciting. Consumer is back.
All right. Awesome. We're moving on to our demo of the week. So one fun fact about us is that we genuinely love (at least for me, it's probably my number-one hobby now) trying out all of the AI creative tools especially, but also AI consumer products more broadly,
figuring out how to make cool things, and then sharing the workflows with other people whose number-one hobby is not doing this and who don't have hours to spend. So this week, we are going to talk about brand creation and ideation using AI. I made this new frozen yogurt brand called Melt, which I iterated on with ChatGPT, then took to Ideogram, and then took to Krea to do the final touches and make these really cool product photos and even store photos.
And I think the initial idea for this came from seeing Flux Kontext come out, which is the new image-editing model from Black Forest Labs, hosted on Krea. You can kind of think of Flux Kontext like the GPT-4o image model, where you can upload an image and then say, you know, "make this Ghibli style" (that was the viral example). You can also say things like: take the person from this photo and put them in
a new environment, or take the logo and change it slightly. Yeah. Add or remove objects. I've seen it described as kind of like Photoshop, but with natural-language prompts. Yes. You can edit with words for the first time. And I think what makes it
different from the 4o image model is the consistency with which it retains the item or the character or whatever; it's much, much better. We'll show some examples here. But basically, if you take a photo of yourself, upload it to GPT-4o, and say,
put me in a podcast studio, you will likely end up looking completely different in the new photo than you did in the initial photo, or maybe with some similar features, but quite different. Whereas this model does an amazing job of maintaining consistency.
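As a concrete sketch of that editing step, here is roughly what a Flux Kontext call looks like through a hosted API (Replicate in this case, though Krea exposes the same model in its UI). The model slug and the `input_image` field name are assumptions to check against the host's current schema.

```python
# Sketch of a Flux Kontext edit via a hosted API. The model slug and
# the "input_image" field are assumptions; verify against the host's
# current schema. The image URL is a made-up placeholder.
import replicate

edited = replicate.run(
    "black-forest-labs/flux-kontext-pro",
    input={
        "prompt": (
            "Take this froyo cup and place it on the counter of a trendy, "
            "sunlit restaurant, keeping the Melt logo and the packaging "
            "exactly the same."
        ),
        "input_image": "https://example.com/melt-cup.png",
    },
)
print(edited)  # URL of the edited image
```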
And so that sparked this idea for me: oh, that means this can actually be used for brands, for product photos, or other sorts of marketing collateral, because the logos and the products can stay consistent. Awesome. And, you know, I'm a huge froyo fan. I feel like froyo has gotten an unfair shake in recent years; it's kind of seen as a little-kid thing. And so I wanted to make a cool, hip, modern, 20s New York froyo brand. I went back and forth with ChatGPT on this idea to land on the name
Melt, which I love, and to land on the branding: this is sort of what the font of the logo will look like, and this is the color of the packaging. And then I took that prompt for the logo to Ideogram, which is an image generator and sort of editing canvas. It's super good, I think, at logos, typography, anything product- or word-related. And I had it generate this photo of a froyo cup floating in the air with the Melt logo and branding. Yep.
Then I downloaded that photo and took it to Krea, where I used the new Flux Kontext editing model to run all different kinds of scenarios. What's really cool about that is you can upload the photo and then say: take this froyo and put it sitting on a counter at a trendy restaurant. Put it in the hand of a woman at a park. Or even, make the froyo cup white instead of blue
and give it a pink border. Make the froyo itself purple, as if they're having an ube froyo special. And then, well, I kind of stopped at the product images, though I actually made an image of the store too: I took the logo and superimposed it over a storefront I generated. I know. It's like you want to go there.
But the next step, which I didn't do here, would be video. Yes. So my idea is to take all of those product shots to Veo 3, or to Higgsfield, which does really cool special-effects stuff. Yeah. And have the froyo cup in action, see how good they are, have it actually melting over the side. It has to melt. I'm very curious to see: do the models understand the physics of froyo? Like, if it tosses the cup in the air,
how does the froyo land? Does it kind of plop, like we all know froyo would in real life? Yeah. And obviously this was just a fun experiment for me. Unfortunately, I'm not actually going to be starting a froyo brand, but it sort of makes you start thinking:
if you work at an ad agency, for example, and you were mocking up a deck for your client about your latest campaign, why would you not use something like this to show them what it might look like? And you did it in, I mean, less than a couple of hours. And honestly, the branding
does look more exciting than a lot of the professional brands that we see out there. And so it makes me think that the next generation of entrepreneurs is going to be completely AI-assisted in a lot of the assets they put together. And I think they're going to be able to
make full-stack AI brands. There are also products where you can design with AI, and you can make ads with AI. I think there'll be no reason for any person not to have their own product line or small business, or to open a store if they want to; AI is assisting with all of these kinds of things too. Totally, yeah. I think we'll see brands that have the logo, the product photos, maybe even the product itself designed by AI, with a vibe-coded slash vibe-designed website or mobile app,
and then kind of drop-shipped to the end consumer. Social media ads also generated with AI, and an avatar that holds the product up for you and sells it on TikTok. Yeah, it's promoted by AI influencers made with Veo 3 who don't actually exist. I think that sort of thing is going to be really fascinating to see, because you no longer have to know
how to work all of these technical tools that you used to have to be able to use. Even Photoshop, there are so many buttons; it's very complicated. And now you can just ask for what you want in a text prompt, get something generated, and iterate on it until you end up with something you're happy with, which I think is crazy powerful. Awesome.
Thanks for listening to the A16Z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations coming your way. See you next time.