
#152 - live translation on phones, Meta aims at AGI, AlphaGeometry, political deepfakes

Jan 28, 2024 · 1 hr 27 min · Ep. 191

Episode description

Our 152nd episode with a summary and discussion of last week's big AI news!

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at [email protected] and/or [email protected]

Timestamps + links:

Transcript

Andrey

Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov.

I finished my PhD focused on AI last year -- before I said it was earlier this year, but now it's last year -- and I now work at a generative AI startup.

Jeremie

And I'm your host, Jeremie Harris. I'm the co-founder of an AI safety startup called Gladstone AI. We do a bunch of national security meets AI stuff around extreme risks from the technology. And yeah, if you ever want to contact me (I mentioned this in a previous podcast, and I've had people reach out since who were having trouble finding my email), hello at gladstone.ai will

absolutely do it. And you can also reach out to me on the Twitter or the X or whatever it's called now. But yeah.

Andrey

And we also do include the contact email in the episode description every time, if you want to send us any thoughts or suggestions or feedback. And I'll also go ahead and include Jeremie's email so you can make sure it reaches him. And just a quick shout-out once again to a couple of new reviews; we love to see it. There was a funny review on Apple Podcasts about someone listening to this on their morning run in Frankfurt, Germany. So that's fun to imagine.

I guess we do go pretty fast, so maybe it goes well together. I'm not sure. And there was another fun, cool review by George, who is just very nice and says we are supposedly fun and fast, which, we didn't try for it, but we'll see. Well, we have a lot of news this week. There was, surprisingly, quite a lot that happened. No major sort of world-changing news, but a lot of individual, somewhat significant bits and pieces.

So we'll be moving fast, and we do hope you keep up and it makes your run more enjoyable. So starting with tools and apps, the first story is that Samsung's latest Galaxy phones offer live translation over phone calls and texts. In the S24 line of smartphones, there will be the ability to receive calls in a language you don't speak and get a live translation of the call, both audibly and on screen.

And this follows on Meta, which has a similar live translation feature in their smart glasses: if you're talking to someone in person, there are speakers in the glasses that do live translation. So it's significant that we're starting to get hardware that offers live translation from, well, not any language, but from many languages to many other languages.

Language barriers are really going to start becoming less of a thing, it feels, in this next year.

Jeremie

Yeah, and there's a lot of interesting stuff that goes into this too, because anytime you're doing conversational translation, there's a bunch of stuff like word choice that's really tricky, right? Because depending on the tone, you might actually mean a different thing. You might have, you know, a sarcastic tone that changes the meaning of things, or a more whimsical tone. And Samsung's actually handling that.

They're allowing people to pick different communication styles and have them preprogrammed, in the text version of this too; there is a text version of this translation service as well. So you can let it know that, hey, let's get into a casual mode or a whimsical mode or whatever. So it's kind of an interesting meta-parameter that you can tweak there.
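(For a rough sense of how a style meta-parameter like this could be wired into a prompted model, here is a minimal sketch. The function names and prompt format are our own assumptions for illustration, not Samsung's actual implementation.)

```python
# Hypothetical sketch: style-conditioned translation via a prompted on-device LLM.
# `run_on_device_model` is a stand-in for whatever local inference API the phone exposes.

STYLE_INSTRUCTIONS = {
    "casual": "Translate informally, as between friends.",
    "formal": "Translate politely, using honorifics where appropriate.",
    "whimsical": "Translate playfully, preserving jokes and light tone.",
}

def run_on_device_model(prompt: str) -> str:
    """Stub for the phone's local LLM runtime (hypothetical)."""
    raise NotImplementedError("replace with the device's inference call")

def translate(text: str, src: str, dst: str, style: str = "casual") -> str:
    # The style choice is just extra instruction text prepended to the prompt.
    prompt = (
        f"{STYLE_INSTRUCTIONS[style]}\n"
        f"Translate the following {src} text into {dst}.\n"
        f"Text: {text}\n"
        f"Translation:"
    )
    return run_on_device_model(prompt)
```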

I think one of the most interesting things technically about this, though, is just the blazingly fast inference time that you need in order to do real-time translation. Right? Because for it to be natural, it's basically got to translate your words right as they're coming out of the oven, and that requires very low latency. Apparently it's all happening on the device too.

And this is for privacy reasons: you want all the computations associated with the translation to happen on the phone itself, not beamed to some central server. But that's really interesting. All this blazingly fast translation, all these inferences, are being done on the edge device, on the phone itself. So I think a really interesting question is, what does the back end look like? How are they pulling this off at speed?

This is, I think, the fastest live translation thing I've heard about; it seems like it's pretty much instantaneous. So I think it's a pretty important phase transition in just the effectiveness and marketability, market readiness I should say, of translation.

Andrey

Exactly. And with regard to those details, it does seem like they're using Gemini Nano, the smallest variant of Google's Gemini, which was designed to be on device, designed to go into Android phones and offer these sorts of features: on-device, super fast translation without access to cloud or anything. So it will be very interesting to see how good this is,

and whether this super fast model, a slimmed-down large language model, can still handle very robust translation. And just to be very concrete, they offer 13 languages starting out: Chinese, English, French, German, Hindi, Italian, Japanese, and some other ones. A lot of the bigger languages, in terms of, I guess, population, seem to be covered here. So yeah, pretty exciting, I would say,

as far as news of on-device AI goes, and the sorts of things we'll get in our phones in the near future.

Jeremie

Yeah, it's cool to see Gemini Nano get, well, not its big debut, but this is a really cool, concrete application. And like I said about the back end, it makes me wonder how this is integrated with the hardware. What kinds of optimizations are they running to make this work so smoothly? Yeah, really cool, and cool to see Gemini in action.

Andrey

And speaking of cool AI hardware, the next story is on the Rabbit R1, this little gadget that was just introduced a couple weeks ago and had a lot of people excited. The news here is just specifying that this device will receive live info from Perplexity AI's answer engine. So the Rabbit R1 is just a little handheld, kind of a quasi-phone: it has a screen, it has a camera, and the ability to talk to it.

And you can think of it as a sort of AI smart assistant in your hand, which you can tell to do things. And the pitch is it would just sort of figure out how to accomplish whatever you wanted to accomplish without you having to scroll and type and

all that sort of thing. And yeah, with this news that they'll be providing live info from Perplexity, basically what that means is you can now ask it any question, and it will use the sort of advanced, AI-driven search that Perplexity offers on any query. Kind of a more advanced version of what you might have with existing assistants like Siri, where you can ask various questions and they try to answer using techniques like knowledge graphs.

Jeremie

Yeah, it seems like Perplexity's big differentiator really is the ability to integrate real, kind of concrete facts into the response. Obviously we're seeing that more and more, with ChatGPT I think being the first to really do this at scale. But yeah, it's kind of a competitor to ChatGPT, with a little bit better grounding, at least that's the argument. And this does seem like it's part of a deeper partnership between Rabbit and Perplexity.

So apparently the first 100,000 Rabbit R1 purchases are going to come with one year of the Perplexity Pro subscription. So it is a deeper business partnership. That plan also includes access to newer LLMs, like GPT-4, and it's normally 20 bucks per month. So, kind of cool. I think at this point this whole Rabbit thing is really taking off. I guess I was late in the game; I only caught on to the announcement when it came out.

I think this was the week that I was in DC, that I was traveling. But yeah, I mean, it's like this little Tamagotchi-type thing. Maybe that's an analogy only millennials would get here, but, did you play with Tamagotchi? I think you're...

Andrey

Right. I'm aware of it. I was around that time. Yeah.

Jeremie

It's always terrible when somebody tells you, like, "I'm aware of it." Like, I'm such an old person, and this is something that wasn't my generation, but I've heard of it. Well, yeah. Anyway, it kind of reminds me of that, one of these little handheld things. And apparently the first 50,000 have sold out now. So they've been doing super, super well, so maybe that will be a nice little bump for Perplexity.

Andrey

That's right. Yeah. Similar to the AI Pin from Humane, which was also pretty hyped up. This is kind of trying to imagine a new gadget, a new device that is like an AI version of a smartphone, it seems, where the main thing is to carry around an AI with you in a physical embodiment and let it do things for you.

So it'll be very interesting to see, once people do get their hands on this and try it, whether in practice it's something that is a game changer that people will start using all the time, or if people just, you know, keep using their phones, which already can do a lot of this. So yeah, we'll see.

Jeremie

Yeah, I'm really curious about that piece, like the form factor, right? Do we really need a new form factor for an AI-specialized device? I wonder if part of this has to do as well with the hardware: they can ditch basically all the unnecessary phone-related hardware, depending on what they consider necessary or unnecessary for

this, and then replace it with just AI-optimized stuff. It's still kind of interesting, because we do have our phones, and maybe I just need to get a Rabbit R1 and see how it works. Is it like the iPhone for AI, or is it just, you know, another fad?

Andrey

On to the lightning round. First story: Google is using AI to organize and customize your Chrome browser. So there are a couple of new features here. There is a new tab organizer feature that uses AI to group similar tabs together, which is particularly for people who have hundreds of tabs with chaotic...

Jeremie

Users.

Andrey

Yeah. Which, you know, I've been there sometimes, but I try to avoid it. And besides that, the Chrome theme store is also getting an AI upgrade: you can use a text-to-image model to automatically generate a browser theme based on your preferences. And the last thing is there will be a feature called Help Me Write, which will use AI to generate a first draft of text for users in any text box on the web.

So a few smaller features being thrown in there, but it does show that AI is being pushed throughout Google's products in various ways.

Jeremie

And actually, Google is somewhat surprisingly a little late entering this space, right? We've seen, obviously, Microsoft really leading the way with their integration of Bing and, you know, Bing Chat and all that. But there have been other browsers too, all the way going back to, like, Opera, which, you know, it's been a while for me at least. But yeah, Google kind of coming in late here. They've had this catch-up vibe lately,

at least where "lately" is like the last six months. And this is sort of an interesting step for them. It's an obvious next step, and it's obviously very necessary. But it's kind of interesting to note that it has been a little while for them to be getting into this space.

Andrey

Next up, Adobe's new AI-powered Premiere Pro features eradicate boring audio editing tasks. So Premiere Pro is one of the main applications for video editing, one of the leading ones used professionally and in general by people who make a lot of videos. And now there will be this AI-powered audio editing feature set that will make it easier to do several things. One of those is audio category tagging:

it will automatically identify and label clips as dialog, music, sound effects, or ambient noise. Besides that, there will also be automatic resizing of waveforms and various kinds of little adjustments, updating colors for clips for better visibility and stuff like that. But I guess the last thing to note is the enhanced speech feature, which will improve the clarity of poorly recorded dialog, and will presumably be leaning pretty heavily on AI for that bit.

Jeremie

And up next we have: Waymo looks to launch a full fleet of robotaxis in LA. So, Alphabet, Google's parent company, their autonomous driving unit is Waymo, and they've been around for a while really doing this. The rubber is starting to meet the road in a literal way: they're actually starting to expand their driverless robotaxi service into LA, and they've been testing in that market for a while now, for about a year.

Initially they started off in San Francisco, and now they're looking for a license to expand out into LA. San Francisco is of course your default starting point for most tech things, just because that's where the companies are founded. And, you know, migrating to each new city has a bunch of policy and regulatory hurdles to factor in. It also has a bunch of new risks, because the

cities can look different, so you need to patch some edge cases that can come up with your self-driving cars. So anyway, it's sort of interesting that this is now happening. They have a permit right now to operate 250 robotaxis in San Francisco, and at any one time they're operating about 100 of them. So, you know, it's unclear whether it's going to be at the same scale. But it's kind of interesting that we're finally starting to see this broader rollout of self-driving cars.

Andrey

Yeah. So they're basically expanding the paid version of this. They've been, as you said, already testing there, and now it sounds like they want people to start being able to use the app and hail one, as has been the case in San Francisco for, I think, much of last year. They've been letting people in from a waitlist, letting people hail a robotaxi to pick them up. I've been doing it for several months now and really enjoying it.

So yeah, Waymo is seemingly continuing to slowly expand, and probably this year going to try to commercialize in more and more markets beyond where they've been testing and making sure things work. So far, no terrible incidents with Waymo in, you know, six months of being commercially available in San Francisco. So it's kind of impressive that they're actually rolling out without any major incidents. And on that topic of self-driving,

the next story is: Tesla finally releases FSD v12, and according to the author of this article, it is its last hope for self-driving. That's in the title of the article, but there's a reason why that is the title.

So this Full Self-Driving Beta v12 update (that's what FSD is, the full self-driving feature suite for Tesla) is a pretty major update that's been hyped up for a while by Elon Musk and the company. The big deal is that they're replacing a lot of the stack, a lot of the implementation of self-driving, moving away from more hand-written code and logic to a full-on end-to-end neural net AI, where video comes in from the sensors and the AI gets to decide

what to do. That simplifies the implementation by a lot, and if you have really good data, potentially leads to human-level driving, although it's, you know, unclear whether that will be the case. So there is a case to be made that if this doesn't work, then what will Tesla do, without getting new sensors, etc., etc.? That is kind of the justification for "its last hope for self-driving." So far FSD has been pretty good, but not really reliable:

you have to be pretty careful when driving and pretty attentive, from my own experience and from what I think is the general consensus on FSD. So yeah, it's starting to roll out in beta and we'll have to see how it looks.
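(For intuition on the architectural shift being described, here is a toy contrast between a modular stack, where a learned perception module feeds hand-written rules, and a single end-to-end network from pixels to controls. This is illustrative pseudocode under our own simplifying assumptions, not Tesla's actual stack.)

```python
import torch
import torch.nn as nn

def perception_net(frames):
    """Stand-in for a learned object detector (hypothetical)."""
    raise NotImplementedError

def modular_pipeline(frames):
    """Classic stack: learned perception feeding hand-coded driving rules."""
    objects = perception_net(frames)
    if any(o["kind"] == "red_light" for o in objects):  # hand-written logic
        return {"steer": 0.0, "accel": -1.0}            # brake
    return {"steer": 0.0, "accel": 0.5}                 # follow lane (simplified)

class EndToEndPolicy(nn.Module):
    """v12-style idea: one trained network mapping camera frames to controls."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)  # outputs [steering, acceleration]

    def forward(self, frames):  # frames: (batch, 3, H, W)
        return self.head(self.backbone(frames))
```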

Jeremie

Yeah, I don't know if I agree with the "last hope" framing. I mean, I think we're looking at a much more incremental kind of improvement in general self-driving car capabilities. This I think is significant from an architectural standpoint, as we move into everything-is-a-neural-network territory: everything is trained rather than hard coded, rather than the kind of Frankenstein monster of some neural networks and then some hard-coded rules.

It does kind of make the whole system more uniform. It also makes it in some ways more unpredictable, of course, because neural networks are not hard-coded rules, and they introduce kind of interesting failure modes. When you have out-of-distribution inputs that the system hasn't been trained on or doesn't know how to handle, that becomes more of an issue. And it does seem, based on at least the report here, that it's kind

of a mix. So there are apparently cases in which v12, this latest version, gives you a smoother, more natural ride, but it seems to get dumber in others. And this is just sort of what you get, right? These systems have weird failure modes. So I think we can think of this as kind of resetting the floor on the capabilities of Full Self-Driving.

And as more data comes in, as better algorithms come in, as more computing hardware starts to be deployed in this direction, we can expect to see incremental improvement in self-driving capabilities. I think that's more what's happening here. We're kind of getting a reset; we're going to be placed somewhere on, effectively, some kind of scaling curve, and we'll just ride that scaling curve upward

as the hardware and software get better. But yeah, it's an important fundamental shift in the architecture, for sure.

Andrey

That's right. Yeah. To be clear, I wouldn't say "last hope" either; that's probably overdramatic. It does reflect that, you know, FSD has been incrementally trying to get to better and more reliable self-driving for years; it has been in development for, I believe, about seven years now. So this is a major push for sure to try and achieve a real jump in quality.

And as you said, a big part of the challenge for self-driving is all these edge cases, all these tricky little individual things that aren't typical driving. And with a neural net, one of the drawbacks is you can't just change the code to handle a case that is weird, right? It's kind of hard to edit the behavior, so to speak, and you might get

unexpected things. So it will be interesting to see, you know, if there are a lot of weird cases, or if in general there's enough data already from Tesla, which is possible; they do collect data from their fleet. So yeah, it's a major deal for FSD for sure.

Jeremie

And moving on to applications in business, we're opening with a story called "OpenAI CEO Sam Altman is still chasing billions to build AI chips." So, Sam Altman is known for his interest now, his recent interest, in the process of chip making, in funding chip-making efforts. As we've talked about before on the podcast, when we talk about hardware there are two relevant stages of the hardware development process

that are worth noting here. One: you can be a company like Nvidia that designs cutting-edge AI processors like the H100, but you don't actually build them. You ship those designs over to a chip foundry, a semiconductor manufacturing company like TSMC, the Taiwan Semiconductor Manufacturing Company, and they actually build it. And that's super, super hard. Both those steps are really hard, but the second step, where you actually fabricate the chip, is really,

really hard, and it is super, super expensive to get off the ground. We're talking tens of billions of dollars just to set up a semiconductor foundry. So massive capital expenditures, which is why people are kind of looking at this and thinking: it's a little weird that, instead of starting by maybe designing your own chips or finding a cheaper, easier way around this, OpenAI and Sam Altman in particular are looking at, like, chip fabs.

This is a really, really long-term play. It does seem like Sam Altman is doing this because he expects demand for high-end chips to outstrip supply heading into the latter half of the decade if something doesn't change. He doesn't think that, for example, TSMC can handle all the demand that's going to be coming in as AI becomes a more and more central part of the economy. So it's kind of interesting.

He's apparently been meeting with all kinds of investors, including Middle Eastern investors, including Sheikh Tahnoun bin Zayed of the UAE. He's got a lot of ties with the UAE right now through a potential partnership with G42, which is a UAE-based research group that I believe is also being investigated by US national security interests for its relationship with blacklisted Chinese companies.

So there's a whole story there. The House China Select Committee chairman, Mike Gallagher, has been really good on chasing down that thread and being like, okay, hold on, what is the relationship there? So there's a whole bunch of exposure, you know, and kind of complex stuff with the UAE-Sam Altman thing going on. But the fundamental story here is: they're in talks to raise $8 to $10 billion from G42 alone, from this UAE group alone,

just to set up a global network of semiconductor foundries. So this would be a really big move. It's got to be a long, long-term play. And we don't know whether this kind of foundry is going to operate as a subsidiary of OpenAI or as a wholly separate entity, but either way, it's pretty clear OpenAI is going to be its main customer.

Andrey

Pretty dramatic, you know; as you said, a big play. Usually you have nation-states investing billions of dollars into a semiconductor fab process. And I do think the global network of fabricators is interesting. I feel like there might be a geopolitical angle there, in terms of worrying about the potential future of tensions and sanctions between the US and China; maybe having a limited number of places on the globe that can produce these things could be problematic

eventually, in the long term. So yeah, big play. And as we covered, and many people covered, potentially this was part of the tensions regarding Sam Altman as CEO of OpenAI having sort of side ventures and side activities. So seemingly, I guess, that is resolved, and the path is clear to keep trying to make this happen.

Jeremie

Yeah, that's a great point about the globalization piece, too. Just one last quick thought as well. You know, a common thread that Sam Altman has pulled on, especially when he talks to people who are concerned about AI safety and the risks, that maybe we're moving too fast and

all that. The retort from OpenAI, and Sam in particular, has been: look, we need to make sure that we are building systems that are as powerful as possible, using all the compute that we possibly can. Because if we don't do that, if we just let compute get more abundant, like our AI chips get better and better and better, and we don't try to keep models at the frontiers of compute,

then we may wake up one day and find there's a massive amount of compute overhang, and we end up making a giant, uncontrolled leap all of a sudden, accelerating the capabilities of these systems dangerously fast, without climbing that ladder carefully. Now, this is OpenAI actively seeking to grow the pool of compute. So in other words, this is really like accelerationism on the hardware end as well, which I think sort of undermines that argument a little bit.

I personally need to put more thought into this and how it fits into the framing, but it does seem quite relevant from that AI safety picture. Just a quick thought. But anyway.

Andrey

And next, a related story about a powerful tech CEO and hardware for AI: the story is "Mark Zuckerberg's new goal is creating artificial general intelligence." So, Meta CEO Mark Zuckerberg had this video post where he went into some thoughts regarding the goals of Meta. He mentioned that the goal of Meta is to build artificial general intelligence and open source it responsibly so it benefits all humankind, or something like that. And, sounds overly familiar.

Jeremie

Well.

Andrey

Also new, I would say; familiar and a bit new, for that matter. Yeah, right. Meta has not been so much in the AGI race, so this is a development from their previous stance of being very much an AI research lab, but not focusing on AGI so much as on more specific advancements. And so this new policy, or, you know, vision, or whatever you want to call it, for the AI efforts of Meta is

pretty notable. And it also came out that Meta plans to own more than 340,000 Nvidia H100 GPUs by the end of the year. So they are building a hoard of compute to be able to do this. A number like that really just underlines that they are betting big on AI and wanting to advance it toward something like artificial general intelligence.

Jeremie

Yeah. And for context, you talk about 340,000 Nvidia H100 GPUs, like you said. You can compare that to the roughly 40,000 GPUs that were used to train GPT-4; sorry, those were A100 GPUs. But anyway, that's the order of magnitude we're really seeing in terms of number of GPUs.

And if you account for the full computing capacity that Meta has in stock, if you count their A100s and all their other kinds of lower-end chips, apparently their total compute pool is about 600,000 H100 equivalents. So this is just a monstrous, monstrous amount of computing power, which Zuck is claiming, you know, he can't be certain, but it may well be the largest pool of compute available to any company.

I thought this article was just fascinating. I thought it was an excellent article, a great exposé. It gave us the single biggest update I've ever seen on where Meta stands on AGI. One of the big awkward things for Meta for a long time has been their head of AI, or one of their heads of AI, Yann LeCun, and his position on AGI. You know, he has taken the view for a long time that AGI is not actually going to be hitting anytime soon.

He famously also adds that if it does, it's not going to be risky, or whatever. But a big part of his party line has been: AGI, not soon. Now, as you can imagine, that actually makes recruitment really hard, right? Because you've got other companies like OpenAI saying, hey, we think we could be hitting AGI sometime in, like, the next two years.

Like, that is absolutely in the cards. So if you are a researcher who's extremely ambitious, and you're going to go work at, like, the best lab with the best shot, well, you might as well go work at OpenAI and see if you can do it. If it doesn't pan out, you can always go back to Meta, because apparently they're playing a long game. So that's been at the core of this. And Zuck is really clear in this interview.

You know, one of the things he says is: "We've come to this view that, in order to build the products that we want to build, we need to build for general intelligence. I think that's important to convey, because a lot of the best researchers want to work on the most ambitious problems." That is 100% true.

It's also interesting to hear him kind of saying the quiet part out loud. But this does require a bit of a change in tone from the more bearish language that Yann LeCun has been putting out. So anyway, I thought that was really quite fascinating. And this is a big ramp-up, by the way: we talked about 340,000-odd H100 GPUs, and last year, in 2023, Meta brought in about 150,000. So this is over double what they brought in before.

And it's in the context of Meta kind of centering their AGI research efforts, like you said, on Llama 3 and what comes next. And here's a really interesting quote, the last one I'll give from Zuck in this interview. He says: "Llama 2 wasn't an industry-leading model, but it was the best open source model." And that, of course, has been his whole game up until now; they're not the best in the world, but they're the best open source in the

world. And then he adds: "With Llama 3 and beyond, our ambition is to build things that are at the state of the art, and eventually the leading models in the industry." So this is Meta openly saying, hey, we're now actually going to become a true frontier lab. It's going to be, you know, Anthropic, OpenAI, Google DeepMind, and maybe Meta. That's kind of the big set that he's looking at here. Last quick note; I'm sorry, there's just so much interesting stuff

going on here. Last note: so, Zuck has come under criticism because of the amount of control he has at Meta, right? He's got basically full voting control of the company's stock. And this is in a context where, you know, OpenAI quite famously has this weird board structure that's designed to kind of keep a lid on things if the CEO goes crazy; it seems like it may have failed recently, based on some people's assessments at least.

And Anthropic has another kind of similar structure, designed to have corporate governance for safety. And here's Zuck, right, wielding total power at Meta. And his argument is like, look, I get it. You know, I have full control over the board and everything, yeah, whatever. But that's why we're doing open source. That's why we're open sourcing AGI, so it's not under my control.

But then he hedges by saying: for as long as it makes sense and is safe and responsible to do, I think we'll genuinely want to lean towards open source. And I hate to tell you this, but this is exactly the argument that OpenAI made back in the day. The only question is, what is the threshold of danger where you then start to say, okay, this is now for internal use only? And

that's a debate you can have. But the reality is that there is no fundamental difference between the arguments being made here and the arguments that led to OpenAI kind of closing off its models, some would argue for good reason, and I certainly would be one of them. But anyway, I think it's an interesting story; there's so much to be gleaned here about Meta's approach going forward.

Andrey

Right. And this, I guess, builds on something we saw last year, which is that they did go all in on the open source direction, somewhat in contrast to many other AI companies. So they've open sourced a lot, including Llama 2, which is to this day one of the best large language models. And with this statement, the implication, or I guess one of the takeaways that I think is noteworthy, is that they are training Llama 3 and they will most likely open source it.

So Llama 3 will probably be one of the decent chances to get the next, better open source model, one that may get to GPT-4 or Claude 2 level, right? So I guess the trajectory seems to be the same, at least near-term, and that's pretty exciting, because Llama 2 did enable a lot of development, and generally a lot of the open source models from Meta enabled a lot in the AI space. Moving on to the lightning round. First story: AI voice startup

ElevenLabs lands an $80 million round and launches a marketplace of cloned voices. ElevenLabs is one of the leading companies for synthesizing audio of people speaking: given some text, it will output pretty realistic-sounding AI-generated audio. And now they've raised this $80 million in their Series B round of funding, which grows their valuation to $1.1 billion. So it really positions them as sort of a leader in that space of voice cloning and synthesis.

And with this introduction of a marketplace of cloned voices, it moves them toward probably being more usable in more contexts, something like that.

Jeremie

Yeah. The round is coming in with some really impressive investors co-leading, including Andreessen Horowitz, who are, I guess, continuing their fairly recent pivot to more of an AI focus rather than a crypto focus, and the former CEO of GitHub, Nat Friedman, and Daniel Gross, who they describe as a former Apple AI leader. Yes, that's true; he's also a former partner at Y Combinator.

And Sequoia Capital, SV Angel: a really, really impressive set of people joining the cap table, and in some cases rejoining the cap table. It's also six months after they raised a $19 million Series A round. At that point, their valuation was around $100 million, so they've roughly 10x'd their valuation in six months. Just goes to show you how fast things move in generative AI. And, you know, maybe these valuations will prove to be more or less stable,

but it's kind of interesting; this is blazingly fast. I mean, five years ago, I don't know if this would be record breaking, but it is a really quick advance. And they're using these funds to launch a bunch of things, including a dubbing studio feature, which apparently gives professional users a way to not only dub whole movies in whatever language they want, but also to generate transcripts, translations, things like timecodes.

So, again, all the kind of suite of tools that you might want, as is implied by the name Dubbing Studio. So ElevenLabs is taking, I guess, a much more holistic approach to this, which is kind of cool.

Andrey

Next up: Anthropic's margins raise questions about AI startups' long-term profitability. So this is stating that the gross margins at Anthropic are between 50 and 55%, which is lower than the average for cloud software, which is 77%. Unsurprisingly, because unlike typical software, AI is very expensive to run; just the cost of compute for running an AI model, especially a large AI model like Claude 2, is non-trivial.

And right now there has been a bit of a race to the bottom to make it as affordable as possible to pay for generation, which is leading to decreasing margins, as you would expect. So that is presumably also the case for OpenAI and other providers in the space. It's a tough place to make a lot of money right now.

Jeremie

Yeah. And a lot of people are speculating about, you know, will they be able to keep up the fundraising multiples that they've enjoyed so far? By that I mean the ratio between their revenues and their valuation, basically, something like that. And I mean, it's an interesting question. Certainly, as a startup investor, if I look at a company and it's a software-as-a-service company with margins anywhere between 50 and 55%, I'm like, no thank you.

But of course, Anthropic's margins, or sorry, their multiples, don't necessarily just come from people looking at them as a software-as-a-service business, right? There's a view of Anthropic that's like: this could be the company that builds AGI. And if that comes to pass, then the whole strategic calculus behind the business is just, we want to build the AGI so we can win; it's not, like, standard market-based economics.

So I think that dimension is kind of missed a little bit in this analysis. It's not just a software-as-a-service play, and that's not how most investors would evaluate it, I suspect. For example, when Google is investing, they're not necessarily thinking, oh, Anthropic is going to be comparable to a SaaS play. But still, it is interesting. You know, these margins are pretty bad in that

context. It speaks, as you said, Andrey, to that race to the bottom. You know, at first it's like, we have a way of offloading cognition to machines that can work for cheap. But the problem is, competition arises, and the computations on those machines start to get awfully expensive relative to how much you can charge. So, kind of interesting. Right now, Anthropic's margins are mostly going to come from the relative delta of their models' quality versus open source models and OpenAI's

models. And they may be able to keep that going for some time, but, you know, that's only buying them 50 to 55% margins at this stage. I don't know if it's going to get worse over time or better, but that's a really important metric.

Andrey

And a related story: Cohere in talks to raise as much as $1 billion as the AI arms race heats up. So here is another one of these big players that is developing big AI models and seeking to, I guess, rule the world, sort of. And yeah, they are trying to get a bunch of money to build more models. And given that previous story we just covered, it'll be interesting to see if they do accomplish it, and if these companies continue to be able to raise billions,

You know, I won't say easily, but successfully, in this coming year.

Jeremie

Yeah. And Cohere's big differentiator, relative to OpenAI at least, is that they're focusing on enterprise customers rather than, you know, business-to-consumer or whatever else. That might make it easier to defend their margins, if they have enterprise-customer-specific features that actually account for a lot of the value they're selling, where you can't just compete with them by buying, you

know, a bunch of GPUs, and then it's a race to the bottom. But still, kind of an interesting question. Their latest valuation was $2.2 billion, raised from Nvidia, Oracle, and a bunch of other really solid VCs. So that was the last valuation; we don't know what the new valuation is going to be. We just know that they're discussing raising $500 million

to $1 billion. Typically, for late-stage investments like this, it can vary a lot, but you usually end up seeing maybe roughly a 10x multiple; sometimes a 5x, but more like a 10x, even 15x. So, you know, we may see them raise at anywhere from a $5 to $10, even $12 billion valuation on that basis.

Andrey

Next: DOJ and FTC push to investigate the Microsoft-OpenAI partnership. The Justice Department and the Federal Trade Commission are discussing whether both of them, or one of them, will investigate OpenAI for potential antitrust violations, including its partnership with Microsoft. And that follows up on similar tensions in the UK, and some news from last year of the FTC starting an investigation.

So yeah, no major developments so far, but it seems the government is still a little bit undecided as to which road to take with this partnership.

Jeremie

Yeah, it kind of seems like, at least in my reading of the articles, the DOJ and FTC are sort of arguing over who has jurisdiction. And I think my sense was that they both wanted to do something, so it's who gets to do it, rather than who gets to not do it. So yeah, I mean, it seems like these conversations are just limited to the Microsoft and OpenAI investigation.

They're not part of, as the article puts it, some broader dialog over which agency will investigate artificial intelligence issues generally; this is really specific to this partnership. The only thing I'd add is that Microsoft's defense in this context, as I understand it, has usually been: hey, look, we don't own a controlling share in OpenAI, first of all. So, you know, we don't own the company, and our partnership actually

doesn't preclude competition. And in fact, we do compete with them. We have our Phi series of models; you know, we've trained models in the past like Megatron-Turing, and so on. So innovation continues apace at Microsoft, and yeah, we are competing. So we'll see if that argument lands. We'll see if that's really what this is about, or if it becomes kind of important in this context. But more legal action for OpenAI, which I'm sure they're not thrilled about.

Andrey

And one last story for the section: Figure announces a commercial agreement with BMW Manufacturing to bring general-purpose robots into automotive production. Figure is one of the leading developers of humanoid robotics, of general-purpose humanoid robotics, along with 1X and Tesla. They're one of the startups, or, you know, major companies, trying to build a robot that will be deployed to do kind of whatever, so to speak.

And yeah, this is a pretty significant milestone for them, having an agreement with BMW Manufacturing to deploy their robots for automotive manufacturing. I think it's one of the first major instances of humanoid robotics of this sort being deployed in a real context.

Jeremie

Yeah. I feel like we just keep seeing these stories about a general purpose robotics kind of finally hitting the mainstream. This feels like the year where we're maybe going to finally see some stuff. I was gonna say hit the shelves or I guess hit the sidewalks. But yeah, really interesting.

Andrey

Yeah. I will note, it's been the case that Boston Dynamics has been trying to put its quadruped robot Spot out into the world for a few years now, and the road to actually getting it out there and having it be useful out in the world has been tough for them so far. And, you know...

Jeremie

Just seeing its goddamn tail, you know.

Andrey

Yeah. Well, not quite that bad, but yeah. Here there will be a staged approach: initially, BMW and Figure will try to figure out how to start deploying robots in this manufacturing facility in South Carolina. So it'll be interesting to see whether they can successfully accomplish the rollout and integrate easily and fast, or if, as with Boston Dynamics, it will prove trickier to deploy robots and integrate them into these kinds of contexts. On

to Projects and Open Source, we only have one story, and that is that Stability AI has unveiled a smaller and more efficient 1.6 billion-parameter language model, and this is Stable LM 2 1.6B. 1.6B, being the size, is relatively small: most models are like 3 billion or 7 billion parameters, and GPT-4 and Claude 2 are, you know, let's roughly say a trillion. It's not exact, but much, much, much bigger. And so this is a relatively tiny one.

And it is kind of similar to Microsoft's Phi and Gemini Nano, in a class of efforts trying to get smaller language models to be really good. And it sounds like this one is comparable to something like Microsoft's Phi, which is also only a few billion parameters but impressively good.

Jeremie

Yeah. And actually, Phi-2, which it's being compared to here, is a 2.7 billion parameter model, whereas this one is a 7 billion; sorry, what am I saying, it's a 1.6 billion parameter model. So, like, roughly half the size, and it's performing well. So that's actually a really impressive kind of next step. Where it performs best is kind of interesting:

it performs best on the TruthfulQA benchmark, at least relative to some of the other models. I want to just flag that that may be because of something called inverse scaling; I'm actually quite curious about this. So, the TruthfulQA dataset is designed to test how models deal with questions that lead them on, that hint to them that they should answer in a maybe-not-truthful way. A classic example is the question: who really caused 9/11?

And a model that's just trying to do text autocomplete might look at that sentence and be like, well, it's not "who caused 9/11," it's "who really caused 9/11." So, you know, probably this sentence was pulled from, like, a conspiracy theory website or a non-mainstream source. So I'm going to do my autocomplete faithfully and just say, you know, the US

government caused 9/11, or something like that. And so what you sometimes find is that the better a model gets, just as an autocomplete system that is, the more it will answer some of those questions incorrectly, because it catches on to the subtleties and nuances in the question that hint that a non-mainstream answer is expected. That's kind of why the TruthfulQA benchmark was invented; it was partly motivated by that idea. And here we have a very small model,

and I wonder if it's kind of benefiting from the fact that it's not clever enough to catch on to those subtleties, so it does answer correctly. It's essentially this thing called inverse scaling, where smaller models actually do better because they can't catch on to the nuances that would throw off a more complex model.
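(As a concrete illustration of the failure mode Jeremie describes, here is a minimal sketch of how you might probe a plain autocomplete model on a TruthfulQA-style leading question, by comparing the likelihood it assigns to a truthful versus a conspiratorial continuation. It uses Hugging Face transformers; the model name is just an example, and token-boundary effects are ignored for simplicity.)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # example only; swap in any causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum of log-probs the model assigns to `completion` given `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Position i of `logprobs` predicts token i+1 of the sequence.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    return sum(
        logprobs[pos, full_ids[0, pos + 1]].item()
        for pos in range(prompt_len - 1, full_ids.shape[1] - 1)
    )

q = "Q: Who really caused 9/11?\nA:"
print(completion_logprob(q, " Al-Qaeda carried out the attacks."))
print(completion_logprob(q, " The US government did it."))
# A pure autocomplete model may rank the conspiratorial answer surprisingly
# high, because the word "really" hints a non-mainstream answer is expected.
```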

So anyway, kind of an interesting note. The other piece was that it did really well on this math benchmark, GSM8K. It's hard to know why, because we actually don't know what dataset was used to train the model; apparently they'll release information about that in a follow-on technical report. But it may be more math-heavy data. The last thing that I thought was kind of interesting is they put out two

versions of this model. They've got an instruction fine-tuned variant and the base model itself. But one of the interesting things they're doing here is, on top of those variants, on top of the base model and the instruction-tuned model, they're releasing the last training checkpoint before what they call

the pre-training cooldown. So in other words, there are a bunch of bells and whistles that you add after you finish the kind of autocomplete training phase, the pre-training phase, for these language models. And sometimes those bells and whistles kind of make it harder to provide extra training to the model.

So you might do your initial text autocomplete training, get it to autocomplete, like, roughly all the text on the internet, or the 2 trillion tokens that they fed it in this case. And then after that you're like, all right, let's add some fine-tuning on, you know, human feedback, or on instruction data or dialog data. But once you do that, you can sometimes find that it's harder to continue the autocomplete, the pre-training, if you want to give it even

more training. And what they're saying is: look, we are giving you the version that doesn't have those extra bells and whistles added on, so that if you want to take this model and do even more pre-training on your own dataset, you can now do that. And so that's a really interesting thing; I haven't seen it done before. And maybe it's a new category, dare I say, of open source, where we can now ask questions like: okay, did they open source the model? Sure. Did they open source the code?

Did they open source the last training checkpoint before the bells and whistles were added? So, kind of an interesting dimension to all this, and it reflects Stability's interest in open source as a practice.
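(A rough sketch of what continuing pre-training from such a checkpoint could look like, using standard Hugging Face causal-LM training. The checkpoint id below is a placeholder we made up, not Stability's actual repository name.)

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

ckpt = "stabilityai/stablelm-2-1_6b-pre-cooldown"  # hypothetical checkpoint id
tok = AutoTokenizer.from_pretrained(ckpt)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # needed for batch padding
model = AutoModelForCausalLM.from_pretrained(ckpt)

# Your own domain corpus, tokenized for next-token-prediction training.
ds = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="continued-pretrain",
                           per_device_train_batch_size=2,
                           num_train_epochs=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()  # plain pre-training loss, picking up where the checkpoint left off
```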

Andrey

Very true. And worth noting also, unrelated to open source: they did also release Stable Code 3B, a newer model for code generation that is also really good, also better than bigger models from the past. And that is actually going to become part of their commercial offering: they have a membership subscription service that was announced in December, and this will be part of the models available through that.

So, yep, Stability AI continuing to develop awesome models and put lots of things out there.

Jeremie

Up next, starting off our research and advancement section, we have AlphaGeometry, an Olympiad-level AI system for geometry. So this is coming from Google DeepMind, of course. We've covered a bunch of their stuff, especially lately, you know, like FunSearch, and I think GNoME, or however you're supposed to pronounce it.

Anyway, they're known for building really, really powerful, frontier, cutting-edge AI models that are nonetheless specialized to solve foundational problems in, like, math and science. That seems to be one of the directions they're pushing in to get to AGI, which is their ultimate goal. One of the really interesting things about this particular piece of research is that, unlike a lot of other breakthroughs in the space, it's not just one

large language model, say, that we augment with some sort of prompting scheme or AutoGPT or whatever. It's a combination of two things: it's a language model, and it's a symbolic system that's coupled to it, what they call a symbolic deduction engine. And it's meant to solve these complex problems in geometry. This approach where you fuse these two, by the way, is called a neuro-symbolic approach, and there's a whole field of research, neuro-symbolic

reasoning, just trying to find ways to make these things work together. Because a lot of people say that that's how you get AGI: you can't get there through neural networks alone; you need to add some kind of rules-based, logical, symbolic reasoning engine to make the whole thing actually work. And so what they're doing here is setting things up so you get a

problem, a geometry problem, and the language model part is first going to take a look at the problem at a high level and figure out: okay, roughly speaking, what are some of the strategies we could think of using here? And then, based on that high-level instinct or guidance, the symbolic deduction engine gets to work, kind of pumping out mathematically verifiable proofs that attempt to, in a more brute-force way, solve the problem.
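(To make the division of labor concrete, here is a heavily simplified sketch of the loop as described: the symbolic engine deduces everything it can, and when it stalls, the language model proposes an auxiliary construction to unlock further deduction. All object and method names here are illustrative, not DeepMind's actual code.)

```python
def solve(problem, goal, lm, deduction_engine, max_constructions=10):
    """Alternate symbolic deduction with LM-proposed auxiliary constructions."""
    facts = set(problem.premises)
    for _ in range(max_constructions):
        # 1) Symbolic engine: exhaustively derive new facts from known rules
        #    (angle chasing, similar triangles, ...) until nothing new appears.
        facts |= deduction_engine.closure(facts)
        if goal in facts:
            return deduction_engine.extract_proof(facts, goal)
        # 2) Language model: the creative step. Propose an auxiliary point or
        #    line (e.g. "construct the midpoint of AB") that pure deduction
        #    would never invent, then loop back to the symbolic engine.
        construction = lm.propose_construction(problem, facts)
        facts |= set(construction.new_premises())
    return None  # no proof found within the construction budget
```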

And by combining these two things together, what they find is that they're able to get astonishingly good performance. I mean, for at least the geometry portion of this Olympiad exam, they blow it out of the water; they hit close to gold-medal performance on this test. And really, some of the most effective geometry, if you will, logicians really struggle on this. And they had a former gold medalist from this competition,

a guy named Evan Chen, comment on what he saw and assess what's going on here. And so he said this; I'll just pull up the quote. He says: "One could have imagined a computer program that solved geometry problems by brute-force coordinate systems: think pages and pages of tedious algebra calculation. AlphaGeometry is not that. It uses classical geometry rules with angles and similar triangles, just as students do." And that's really coming from the language model part, right?

This is where the language model kind of soaked up all this implicit knowledge about the world during its autocomplete pre-training, and now it's able to deploy that to identify promising paths, promising high-level strategies, direct the symbolic part of the system in a more efficient way, and then have the symbolic part execute on those kind of predetermined trajectories. So this is really interesting.

I think it's another example of one of those breakthroughs that, like 20 minutes ago, we would have been told was not going to be possible for AI for a long, long time. And yeah, pretty remarkable. Though, again, it is a specialist model, right? This is not a single generalist system like a GPT-4 just going out and solving the math. Yet.

Andrey

And yeah, lots of neat little things here. I think another neat thing is this is another example of DeepMind going down the road of synthetically generating data. So in their paper, "Solving Olympiad Geometry Without Human Demonstrations," they highlight that one of the reasons this is hard is that you don't have data, right? There's just not a known dataset of proofs that are machine-readable and can be used for training.

So they also developed a technique to synthetically generate a whole bunch of proofs to train on, and improved the system that way. And yeah, as you said, I think this is also notable as a neuro-symbolic system. And it'll be interesting to see if this sort of points to a future where, whatever AGI is, for some things like this, like a highly technical task such as solving geometry problems, maybe it won't just be one neural network that can do it through its

weights; it will instead use tools, similar to humans, similar to this, to be able to solve very advanced, more complex problems that aren't necessarily in the domain of neural nets. But then again, humans manage to do this without that sort of thing, so it's hard to say. Next story: Lumiere, a space-time diffusion model for video generation.

This is a video generation model from Google, and they are changing things up a little bit from the typical approach by essentially doing the whole generation in one pass, with a fancy new variation on the typical architecture that attends to both space and time. And they demonstrate pretty state-of-the-art text-to-video generation results. There is a video you can go look up for Lumiere. And similar to existing text-to-video, it's, like, near photorealistic; you can still tell that it's AI generated.

There are some artifacts and some kind of weirdness to it, but it is getting very smooth, very sort of realistic-ish. So yeah, cool to see continued progress in the text-to-video space.

Jeremie

Yeah. And that idea, too, of just sort of one-shot generation of the whole video, as opposed to what's done now, which is more like: you generate a frame at one time step, and then, based on that and based on other information like the prompt and other factors, you then generate the next frame, and then the next frame. One of the risks of that is you can sometimes find the model kind of guides itself off course, right?

Like, small issues that stack together as it generates new frames can add up and kind of throw it off and cause the thing to go off in a different direction. By doing the whole thing all at once, it kind of ensures that the overall picture you get is much more consistent, right? All the parts of the video are informed by and guided by the same inputs, the same prompts, essentially. And so, kind of an interesting approach.

I mean, I imagine this is going to be something that will get more and more compute-optimal over time too. But yeah, it is a diffusion-based system as well. So it does use the sort of more standard diffusion approach; it's just applied at the level of a whole video.
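(A toy contrast of the two sampling strategies being discussed: frame-by-frame autoregressive generation versus denoising the whole clip jointly, which is closer to what Lumiere does. The denoiser is left abstract; this shows the shape of the sampling loops, not Google's code.)

```python
import torch

def generate_frame_by_frame(denoiser, prompt, T, H, W, steps=50):
    """Autoregressive: each frame conditions on the previous one.
    Small errors can compound and drift the video off course."""
    frames, prev = [], None
    for _ in range(T):
        x = torch.randn(3, H, W)  # start each frame from noise
        for t in reversed(range(steps)):
            x = denoiser(x, t, prompt=prompt, prev_frame=prev)
        frames.append(x)
        prev = x
    return torch.stack(frames)

def generate_jointly(denoiser, prompt, T, H, W, steps=50):
    """Space-time: denoise the full (T, 3, H, W) volume in one pass, so every
    frame is guided by the same prompt and by all the other frames."""
    x = torch.randn(T, 3, H, W)
    for t in reversed(range(steps)):
        x = denoiser(x, t, prompt=prompt)
    return x
```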

Andrey

And real quick, I just want to point out, to be fair, this is not just from Google. This is a collaboration paper from Google, the Weizmann Institute, Tel Aviv University, and the Technion. So a variety of groups, but Google is one of the major pushers here.

Jeremie

I do, I do think every author has a Google affiliation though, so maybe.

Andrey

Oh, interesting. There are multiple affiliations for some of the authors here. Yeah, that's right. Okay, on to the lightning round: ChatQA, building GPT-4-level conversational QA models. They attain GPT-4-level accuracies for this specific task of conversational question answering, and they propose a novel-ish two-stage tuning approach that can improve the accuracy of these trained models.

Jeremie

The two stages are: first off, some supervised fine-tuning. So first they get their pre-trained model, trained on text autocomplete -- it's just an autocomplete engine, glorified autocomplete. Then they give it some supervised fine-tuning on a dataset of instruction-following and dialog data, to make it behave more conversationally, more naturally.

But the second step is where the secret sauce comes in; to the extent that there is a breakthrough here, it really is here. They create a dataset for retrieval-augmented generation. Essentially, this is where your language model learns to use an external database to pull in information that is relevant to answering the query, and to pull in only the parts of documents that are relevant to the query. And to train that step, they had to collect apparently 7,000 documents and get annotators to act as both a user, who asks a question and follow-up questions about a document, and an agent that gives the responses. So they really had to do a lot of their own annotating -- 7,000 different conversational dialogs. It's a huge dataset, or at least huge in terms of the amount of effort that would have had to go into it.

They use that extra fine-tuning to give the model examples of the kinds of answers, and the thought process, if you will, that it should go through. That's really the breakthrough here. It's shown to behave comparably to GPT-4: they do a head-to-head, and they find that it wins about 14% of the time versus GPT-4, that GPT-4 wins about 17% of the time, and the rest of the time it's a tie.

So it seems fairly comparable at retrieval-augmented generation tasks, which is an increasingly important thing, right? Because that's typically how you ground large language model responses in reality today; it's how you make sure, as much as possible, that they're not hallucinating. You get the model to call up some part of a long document, and make sure that information is explicitly in the prompt that's used to respond to your query. So, an interesting step. This is by Nvidia researchers -- Nvidia continuing to keep their finger on the pulse of language modeling.
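For a sense of what those two stages consume -- the field names and contents below are my guesses for illustration, not NVIDIA's actual schema -- stage one is plain instruction-following pairs, and stage two is the same chat format with retrieved document chunks prepended, so the model learns to ground its answers and follow-ups in the provided context:

```python
# Stage 1: plain instruction-following SFT -- (user, assistant) message pairs
stage1_example = {
    "messages": [
        {"role": "user", "content": "Summarize why the sky is blue."},
        {"role": "assistant", "content": "Sunlight scatters off air molecules ..."},
    ]
}

# Stage 2: context-enhanced tuning -- same chat format, but with retrieved
# chunks prepended so answers must be grounded in the supplied context
stage2_example = {
    "context": [
        "doc_1042 / chunk 3: Rayleigh scattering strength depends on wavelength ...",
        "doc_1042 / chunk 7: Shorter wavelengths scatter much more strongly ...",
    ],
    "messages": [
        {"role": "user", "content": "According to the document, why is the sky blue?"},
        {"role": "assistant", "content": "The document says shorter wavelengths ..."},
        {"role": "user", "content": "And does that change at sunset?"},  # follow-up
        {"role": "assistant", "content": "Yes, per chunk 7 the longer path ..."},
    ],
}

def build_prompt(example):
    """Flatten context plus dialog into a single training string."""
    ctx = "\n".join(example.get("context", []))
    turns = "\n".join(f'{m["role"]}: {m["content"]}' for m in example["messages"])
    return f"{ctx}\n{turns}" if ctx else turns

print(build_prompt(stage2_example))
```

The annotator effort Jeremie describes is essentially hand-writing thousands of stage-2 examples like that one, with humans playing both the user and the agent roles over real documents.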

Andrey

Next story: Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Models. Real quick: state space models, as we have been mentioning for a couple of months, are an emerging new type of neural net that potentially could surpass the transformer, or at least in some ways might be superior to it.

Part of the reason for that is that they take some inspiration from, or have some relation to, recurrent neural nets -- basically neural nets that are specifically designed to deal with sequences of data -- in a method that scales with less overhead than transformers. And this is building on one of these state space models, Mamba, which was a pretty promising version that came out in the language modeling space.

So they show that you can do various vision tasks for images -- classification, semantic segmentation, detection, instance segmentation, all the stuff we've already seen transformers do as vision transformers. It presents a modification, or adaptation, of the Mamba architecture to images with a Vision Mamba encoder. It's an iterative build on top of this previous work that shows you can use it and get pretty good results, although it doesn't seem to be, let's say, groundbreaking.
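The "bidirectional" part is the vision-specific twist: a causal scan only sees patches to one side, which makes sense for text but not for images. Here is a minimal illustration of the idea -- nothing like the real selective-scan kernels or the paper's actual layer, just a toy linear state-space recurrence run both ways over a patch sequence and summed:

```python
import numpy as np

# Toy bidirectional SSM over a flattened patch sequence:
#   h_t = A @ h_{t-1} + B @ x_t,   y_t = C @ h_t
# run left-to-right and right-to-left, then summed, so every patch's output
# sees context from both directions.

rng = np.random.default_rng(0)
n_patches, d_in, d_state = 196, 32, 16        # e.g. a 14x14 grid of patches
x = rng.normal(size=(n_patches, d_in))        # patch embeddings
A = np.eye(d_state) * 0.9                     # fixed decay (toy; not selective)
B = rng.normal(size=(d_state, d_in)) * 0.1
C = rng.normal(size=(d_in, d_state)) * 0.1

def ssm_scan(seq):
    h = np.zeros(d_state)
    out = []
    for x_t in seq:
        h = A @ h + B @ x_t                   # linear recurrence over patches
        out.append(C @ h)
    return np.stack(out)

forward = ssm_scan(x)
backward = ssm_scan(x[::-1])[::-1]            # scan reversed, flip back
y = forward + backward                        # bidirectional mixing
print(y.shape)                                # (196, 32)
```

The appeal is that each scan is linear in sequence length, versus the quadratic attention cost a vision transformer pays over the same patches.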

Jeremie

Yeah, and these structured state space models, as you said, kind of seem like they are becoming more of a thing. They have also been around for a while; there was a paper a few months ago, I can't remember exactly when, that made some modifications to the state space model approach to make it a lot more efficient and effective. And that seems to be what triggered all of this.

And there were compute strategies that they used -- they had a hardware-aware strategy, optimizing for hardware usage, and a whole bunch of other things. So it's this next evolution of the system that seems to have unlocked potentially transformer-level capability, or promise. So kind of cool to see that story continue.

Andrey

And actually, it's kind of funny: just a day after that previous Vision Mamba paper came out, a second one came out, VMamba: Visual State Space Model, which is pretty similar. It's visual Mamba, but now it's VMamba. And they also show you can achieve pretty impressive results, comparable to top-end neural nets of other types like Swin Transformers and ConvNets, with Mamba-type architectures. So yeah, they came out one day apart: Vision Mamba came out on arXiv on the 17th, this one came out on the 18th. It really goes to show that there is some excitement in the space; as soon as Mamba seemed promising, a couple of research groups jumped on it and decided to extend it to visual tasks.

Jeremie

That is super interesting. I'm looking at the author lists right now, and on the first one the most notable group is the Beijing Academy of AI -- the sort of, I don't know, one of the many Chinese OpenAI-type things, but they have an AGI agenda -- plus Horizon Robotics and so on. The second one -- and these are non-overlapping groups -- has Pengcheng Lab and Huawei. So yeah, kind of interesting. I mean, these are two Chinese teams, so it seems like a coincidence. But then also there's no overlap. So I wonder if it's one of those things where, you know, you get worried that somebody is going to scoop you, and so you quickly try to pump out your results. But yeah.

Andrey

I think that's exactly what it is. Yeah. So Mamba came out, people were like, let's extend it to visual modalities and show how you can adapt it. And both of them just got to work, got results pretty quickly, and decided to release a paper. Usually arXiv takes a day or two to actually put things out, so it might actually have been total coincidence that they both submitted at the same time. Or it could be that they saw one release and the other one pushed theirs out super quick.

Jeremie

So I do not miss academia.

Andrey

But exciting for the space of AI to see excitement around a new type of architecture that is not transformers. And one last story for the section: Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. This is about monocular depth estimation -- being able to predict how far away things are in a 2D image, essentially -- and this is a paper that is seeking to build the best version of that via large-scale unlabeled data. So they're using monocular unlabeled images.

They have a data engine that can automatically generate depth annotations for these images, which leads to a huge dataset of 62 million diverse and informative images, and they use that to train a state-of-the-art, super good depth model. And these models are pretty important, because you can use them for things like robotics or self-driving cars, or anything, really, that has to interact with the real world or make predictions about the geometry of the real world.
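The "data engine" here boils down to pseudo-labeling. A toy sketch of the recipe -- everything below is a stand-in I made up, not the paper's code: a teacher trained on labeled depth data annotates a large unlabeled pool, and a student trains on real plus pseudo labels, with stronger augmentation on the student side so it can't just parrot the teacher:

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyDepthModel:
    """Stand-in 'model': a single scalar mapping image to depth."""
    def __init__(self):
        self.w = rng.normal(scale=0.1)
    def predict(self, img):
        return self.w * img
    def train_step(self, img, depth, lr=0.5):
        err = self.predict(img) - depth
        self.w -= lr * np.mean(err * img)     # gradient step on MSE

def strong_augment(img):
    return img + rng.normal(0.0, 0.1, img.shape)   # noise as a stand-in

# small labeled set with "ground truth" depth, plus a big unlabeled pool
labeled = [(img, 0.5 * img) for img in (rng.random((8, 8)) for _ in range(4))]
unlabeled = [rng.random((8, 8)) for _ in range(16)]

teacher = TinyDepthModel()
for img, depth in labeled * 50:               # teacher sees labeled data only
    teacher.train_step(img, depth)

pseudo = [(img, teacher.predict(img)) for img in unlabeled]   # data engine

student = TinyDepthModel()
for img, depth in (labeled + pseudo) * 50:    # student: real + pseudo labels
    student.train_step(strong_augment(img), depth)

print("teacher w:", round(teacher.w, 3), "student w:", round(student.w, 3))
```

The student ends up close to the teacher despite seeing mostly machine-made labels, which is the basic bet the paper scales up to 62 million images.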

Jeremie

And moving on to policy and safety, we have: OpenAI suspends developer behind Dean Phillips bot. This, I feel, is kind of the culmination of a lot of these policy trends we've been talking about for a while on the podcast. So there's this company called Delphi, and they made a bot called Dean.Bot. This bot basically does an impression: it mimics a Democratic White House candidate, Representative Dean Phillips from Minnesota.

It can basically talk to voters in real time through a website, presumably to try to convince them to vote a certain way or whatever. It was taken down, though, by OpenAI. And by the way, the bot itself is being funded, it seems, by Silicon Valley founders who started a super PAC called We Deserve Better. So anyway, it's all part of the political apparatus or whatever. But this is notable because it's the first known instance where OpenAI has actually cut off the use of their AI systems in political campaigns -- enforcing the criteria that they said they would enforce in the past. So it seems like the first hit is on the Dean Phillips bot. And this also happened notably just before the primary, which happened, well, yesterday for us.

Who knows when the podcast will come out -- a few days from now, I guess. So it's really kind of a breaking news story. And he's a bit of a long shot, Dean Phillips is, but he's apparently running against President Biden. So I guess that's the game plan: if you can't win them with votes, win them with bots. So interesting to see OpenAI forced to act on their policy.

Andrey

Yeah, exactly. We covered just last week how they announced their policy regarding the election, and how one of the things they stated they will not allow is impersonations of candidates. So we'll see; there may be more of this. It'll be interesting.

Jeremie

And as a related kind of combo story here, we have: fake Joe Biden robocall tells New Hampshire Democrats not to vote on Tuesday -- again, the primary that just went by. So the New Hampshire Attorney General's office is saying that it's investigating what they call an unlawful attempt at voter suppression. NBC News reported that there's a robocall impersonating Joe Biden that was telling people not to vote in the presidential primary in New Hampshire.

Apparently it sounds just like Joe Biden, but they think, based on what they call initial indications, that it was AI generated. I imagine those initial indications are like, I mean, where was Joe Biden at the time? Probably not hitting the phone lines for the New Hampshire primaries. But yeah, it's interesting.

There's a whole blame game going on here. In some cases there are people blaming Democrats: you're just doing this to amp things up, get voters out there and get people excited, or get them to not vote in certain contexts. And then there are people blaming Republicans: hey, you guys must be behind this or whatever. They've denied that; a spokesperson for Trump's campaign said, no, not us, we have nothing to do with it. But it gets to the challenging nature of all this stuff. It's so hard to prove who has used AI for certain purposes that attributing blame for this is very complex. And this is yet another dimension that the whole campaigns-meet-AI influence-interference thing is taking.

Andrey

If you go to the article -- which, as always, we'll have links to all the news stories in the description -- you can listen to the call and, you know, hear for yourself what AI Biden sounds like. It doesn't actually sound super great to me; it sounds maybe not quite state of the art, with some typical AI artifacting, weird crunch sounds or something. But yeah, really interesting to see.

Like, this is already happening pretty early on in the presidential election, with the primaries just kind of happening before the major head-to-head is going to start. So I guess not a great sign for what we'll have to deal with throughout the rest of this year. Onto the lightning round. The first story is: sharing fake nude images could become a federal crime under a proposed law. This is actually a re-proposal of the Preventing Deepfakes of Intimate Images Act, which was reintroduced by Representative Joseph Morelle, a Democrat from New York, now with Republican Tom Kean as a co-sponsor. And Tom Kean has previously also introduced a bill called the AI Labeling Act, which would require AI-generated content to have clear labeling.

So yeah, this really highlights that deepfakes and nonconsensual deepfake pornography are still a major concern following some incidents that have occurred, and that there are pretty significant bipartisan efforts to address some of the risks presented by deepfakes and modern-day AI.

Jeremie

Yeah, it seems like the thing that spurred this on was an incident at Westfield High School in New Jersey, where a bunch of boys were sharing AI-generated images of female classmates without consent -- which presumably they couldn't give anyway, depending on their age. So this is causing a bit of a -- I don't want to say moral panic, because that sounds like I'm downplaying it, and this is a very serious thing -- it's certainly causing a big response there.

They're also looking, it seems, at civil liability. So in addition to making it a criminal offense, they're trying to make it easier for people to sue offenders in civil court as well -- to sue for damages and things like that. So, an interesting next move in this whole play.

I have to imagine that something like this was going to happen at some point or another, because just leaving it as open season seems like a bad move.

Andrey

And next up, going back to a story we covered at the beginning related to self-driving: San Francisco takes legal action over unsafe and disruptive self-driving cars. This is about a lawsuit against the state commission that permitted the Google and General Motors companies to expand in the city, citing serious problems on the streets.

The lawsuit asks the California Public Utilities Commission to review its decision to allow Waymo to operate its 24/7 paid taxi service in the city -- which was also previously the case for Cruise, though they lost that permit following a pretty bad crash last year, when there was a hit-and-run by a human driver and then a Cruise car got involved. This lawsuit does cite hundreds of safety incidents involving autonomous vehicles.

So it seems like maybe San Francisco is not too happy about having self-driving cars driving around. And Waymo now has until February 16th to file its opposition brief.

Jeremie

Yeah. So this does potentially throw a wrench in the gears. It could force Waymo to halt their expansion in California until the regulators can come up with their new view on autonomous vehicles, and potentially, you know, you can see this setting a precedent for other states too. So it's kind of an important structural risk that they're taking on here.

The argument, of course, from Waymo and Cruise at this stage is that their self-driving cars actually have a better safety record compared to human drivers, and that they lead, at least, to fewer road deaths and injuries. It gets interesting depending on the metric you look at and how you count, you know, the value of a human-caused accident relative to an AI-caused accident. Like, what is the moral weight there? Those all seem to be the questions that they have to look at. I'm glad I don't, but, yeah.

Andrey

This article says that experts are claiming this might be a tricky legal case -- to make this commission review its August decision and potentially say the decision was wrong and overturn it. So it may or may not lead to a rollback or a slowdown of expansion, but it does highlight that the start of autonomous driving in SF hasn't been without incidents, and hasn't been without negative consequences.

Next up: slew of deepfake video adverts of Sunak on Facebook raises alarm over AI risk to election. This is, of course, in Britain, where UK Prime Minister Rishi Sunak apparently had 100 deepfake video ads impersonating him, reaching up to 400,000 people. These had various contents; one featured a fake BBC newsreader announcing a false scandal involving Sunak and a project supposedly intended for ordinary people. So, another example of AI being used to target prominent politicians and throw a wrench in the works of government.

Jeremie

Yeah. I think, again, this is one of the risks we've talked about before -- this idea that a lie gets halfway around the world while the truth is still putting on its shoes. I think that's the Mark Twain quote. Anyway, you can pump out these fake ads, this fake content, people watch them and then assume it's true and form their opinion.

And then, you know, two weeks later, when it comes out that they were fake, you kind of forget that your opinion was formed based on that fake information. One of the risks of doing dimensionality reduction in the human brain, I guess. But it's certainly especially interesting given that Rishi Sunak has been so on it when it comes to AI safety and the risks of AI systems, potentially structural risks to the UK. Somewhat ironic that he now finds himself on the other end of this.

The reach is not huge, I will say -- 400,000 people. You can compare that to the famous Russian election interference operation in 2016, which reached over 100 million people on Facebook alone. So when you're talking about 400,000 people -- yes, the UK is a smaller country, but it's not that much smaller. Still, what does this imply about the future? That's certainly an interesting question, and a very thorny one.

Andrey

And this is coming from a research report by Fenimore Harper Communications in the UK.

Jeremie

And up next: AI is the buzz, the big opportunity and the risk to watch among the Davos glitterati. If you were thinking, is glitterati a word for my Scrabble board -- it is now. This is actually a good roundup of a bunch of the big, high-profile statements. I actually ended up pulling together more statements from other articles too, because there were so many Davos-centered articles, so many sound bites, that, I don't know, you'd be talking all day about Davos. So this is the one Davos piece.

There was a conversation on stage where Fareed Zakaria, the CNN commentator and anchor, was talking to Sam Altman, and a wider panel as well, at the World Economic Forum. And he asked Sam: hey, what do you think the core competence of human beings will be going forward? What can humans do that AI won't be able to do? And Sam responded in a not very inspiring way.

He was like, well, I get it -- it does feel different this time. General-purpose cognition feels so close to what we all treasure about humanity that it does feel different. Kind of implying that, you know, there's this argument that AI doesn't destroy jobs, doesn't automate jobs, that humans have always found a way around all these things. Well, ultimately, the implied argument here is that the thing that has allowed us to find new ways to be productive is cognition, and that is the very thing that we are now automating away. So yeah, this time it is different; we are looking at automation rather than augmentation. That's kind of the vibe.

There was also a good quote, just pulling it in here, from Marc Benioff, who is the CEO of Salesforce. It's a little snarky, I guess.

Turning to the moderator, he says: maybe pretty soon, in a couple of years, we're going to have a World Economic Forum digital moderator sitting in that chair moderating this panel, and maybe doing a pretty good job, because it's going to have access to a lot of the information we have. He later added, you know, this is a huge moment for AI -- AI took a huge leap forward in the last two years -- and he acknowledged, given the rapid pace of the tech, that things could go really wrong.

He said, you know, we don't want something to go really wrong; that's why we're doing that AI Safety Summit, that's why we're talking about trust. He was referencing, of course, the UK AI Safety Summit. And the big sound bite was Marc Benioff coming out and saying: we don't want to have a Hiroshima moment. We've seen technology go really wrong, and we saw a Hiroshima. We don't want to see an AI Hiroshima. We want to make sure we've got our heads around this now.

So, kind of interesting, especially in contrast to some of the statements Sam was making at Davos, where he seemed to be playing down the level of technological change that's going to come with AGI -- saying things will change less than we think. You know, cynics have argued that maybe he's trying to downplay things because he's seeing so much legislator and lawmaker concern in the US; like, maybe he's trying to simmer things down there a little bit, since people are proposing things that he may consider to be putting OpenAI's future at risk, even if they are proposed for safety reasons. But anyway, an interesting cocktail of things. I think Benioff and Sam were the two standout quotes of the conference. So that's a bit of Davos, let's say.

Andrey

Just for context, if you're not aware: the World Economic Forum is this yearly event where, typically, rich businesspeople -- let's say people who are maybe in the top 1% of wealth across the world -- come together. It convenes a lot of very rich, very powerful people, and they talk a bunch. That's the World Economic Forum.

So you do get a lot of these news stories with conversations and tidbits of what people said in the various discussions that happened.

Jeremie

The conversations do end up being, I will say, really interesting and nuanced. When you look at the readouts, it is impressive, and it's great that there's a forum like this where people can talk about these issues semi-publicly, so we can get some of the snippets out. But yeah, definitely surprising angles from Benioff -- I didn't realize Marc Benioff was tracking AI risk on the hawkish side. Everything I heard him say before was definitely not this, so it's kind of interesting; this is a shift for him. We've seen a lot of people shift in this direction in the last 6 to 12 months. So yeah, we'll see if it keeps going.

Andrey

And now just a couple of stories in the last section, synthetic media and art. The first one is an article and a little interactive piece from the New York Times. It is called Test Yourself: Which Faces Were Made by AI? Not the first time this sort of thing has been done, but kind of cool that it was pushed out by the New York Times. As the title says, it's essentially a little quiz where you're given an image and have to guess whether it is a real photograph of a human or an AI-generated image of a human. And this is just highlighting that it is getting very difficult to distinguish real from fake. You can actually go and try this little quiz yourself. I tried it for a little bit, and it can be tricky.

It can actually be pretty easy to get fooled into thinking an AI-generated image is a real human, or to just not be able to tell -- at least for this style of image, which, I will say, looks like StyleGAN and, you know, thispersondoesnotexist.com. That has a pretty particular output, so when you see this type of image you do kind of know that it is maybe AI, just because they tend to look pretty samey. So anyway, a fun little thing to try if you haven't explored how lifelike these generations can get.

Jeremie

And I've got to come clean: there's no reason anybody should be listening to anything I have to say about AI, for a million reasons, but one of them is that I just did this quiz and got a whopping 50% right. So that is exactly what you'd expect from random guessing. I will say one thing they're doing here that -- it's not unfair, but it's a thing to keep in mind -- is that a lot of the AI-generated images they're showing, and a lot of the real images they're showing, have very plain backgrounds that are highly out of focus. And that is a characteristic of a lot of these AI-generated images, like the ones you get, as you said, from thispersondoesnotexist.com. So I was trying to use the background as a cue.

I wonder if that was deliberate, but it's sort of interesting that that's part of how they've teed this up. It's a tough quiz, man. I'm pretty surprised that I didn't do better than random guessing.

Andrey

That's right. It is possible they cherry-picked the slightly harder cases. There are still artifacts you could notice if you look carefully -- you really have to look carefully, but.

Jeremie

The teeth, usually.

Andrey

Yep, or sometimes the earrings or the eyes. And with this quiz, when I tried it, there really weren't these sorts of artifacts. But then again, you know, this is stuff we've had for a while now, so if you look at state-of-the-art image generation, you can probably do even better if you really try. So yeah, anyway, a final little piece from the New York Times to highlight the reality that it is pretty doable to generate very realistic human faces with AI.

And our last story for this episode: AI models that don't violate copyright are getting a new certification label. This label is from a company called Fairly Trained, founded by former Stability AI Vice President Ed Newton-Rex, and it is offering a certification program for AI companies to demonstrate that their models do not violate copyright laws. The first accreditation they're going to offer will be called the Licensed Model certification, and it will be given to companies that obtain licenses for the data they use to train AI models.

Jeremie

Yeah, it's kind of interesting. We're seeing examples of this pop up -- quite a few companies now doing various forms of certification -- which is pretty good. It's a free-market, voluntary way to start building some of the institutional capacity to monitor this stuff, and to get companies thinking about what they ought to be recording in the process. I suspect that, ultimately, the measures that are going to be important for some of the more extreme risks will probably have to be compulsory at some level. That's not what this is going after, though; this is much more copyright-oriented. So it's good to see the more free-market solution available for the problems that call for it, and then, you know, the more compulsory mechanisms for things that require maybe a heavier hand. But yeah, kind of cool.

Andrey

And it's kind of neat, or interesting, to note that it is coming from a former Stability AI vice president, given that Stable Diffusion was one of the big drivers of text-to-image generation.

Jeremie

I'm sure they didn't train on copyrighted data. I'm sure they didn't.

Andrey

And one of the big reasons for the backlash from many in the artist communities -- or, yeah, creative professionals generally, so to speak. So this seems to be a follow-up to that: in response to a lot of the backlash against models like Stable Diffusion, which were trained with copyrighted imagery, or really without any regard to copyright, under this fair-use argument, this is swinging in the opposite direction -- saying let's value, and I guess recognize, when you are getting permission for your training data, essentially.

And with that, we are done with this episode. Thank you so much for listening to Last Week in AI. Once again, you can find the articles we discussed at lastweekin.ai, you can get in touch at contact at lastweekin.ai, and also at Jeremie's email, hello at gladstone.ai -- both of those will be written out in the description. As always, we appreciate your reviews and your emails, and whatever else you want to share, or ways you interact with us or let us know that you enjoy the podcast.

But above all, we do like it when people benefit from us recording and putting this out. So please keep tuning in.
