#162 - Udio Song AI, TPU v5, Mixtral 8x22, Mixture-of-Depths, Musicians sign open letter

#162 - Udio Song AI, TPU v5, Mixtral 8x22, Mixture-of-Depths, Musicians sign open letter

Apr 15, 2024 · 2 hr 45 min · Ep. 201

Episode description

Our 162nd episode with a summary and discussion of last week's big AI news!

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at [email protected] and/or [email protected]

Timestamps + links:

Transcript

Andrey

Hello and welcome to the latest episode of Last Week in AI, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD at Stanford last year and I now work at a generative AI startup.

Jeremie

And I'm your other host, Jeremie Harris. I am the co-founder of Gladstone AI, which is an AI national security company. We did the rounds, I guess, in the media recently, and that's been fun dealing with all of that. And yeah, by the way, I think we were talking about this. I'm not sure if I'll actually do it this episode, but, with Udio having just come out, we were talking about using a song that we generated in Udio

as the podcast intro song. You may have just heard that for the first time, which is kind of cool.

Andrey

That's right. You may have. We'll see. When I edit it, I will probably do it and that'll be quite fun. So yeah. Good example of how these kinds of things are actually going to be useful for people, I guess. Right.

Jeremie

I was, dude, so impressed with all those tracks. Like, I don't know, I've spent an ungodly amount of time playing with it when I should have been working, but yeah, it's amazing. And then another thing, another weird piece of personal-ish news. So I was in New York City the last couple days, and I ran into, for the first time, Jon freaking Krohn from the Super Data Science podcast that we've been plugging for the last little bit.

And Sadie St. Lawrence, who does the Data Bytes podcast. She's the founder of Women in Data. And, you know, I've known the two of them for a long time, Jon Krohn especially. You know, we're big fans of Jon. We call ourselves his Krohnies.

Andrey

Haha.

Jeremie

Anyway, so, friend of the show, Jon Krohn. And yeah, he's taller than I expected. He said that I was, and this is interesting, he said I was taller than he expected. And the difference is, I thought he was, you know, at least six foot, like six two. Small difference. But he looked at me like, I thought you were much shorter. So I wonder, really, am I giving off short guy energy?

I don't know what this is. Maybe, Andrey, if you at some point want to guess my height, that'd be great, and I'm sure the listeners would appreciate it. But we can get on with the show. That's it for my personal update.

Andrey

That's right. On to the show. Just one quick disclaimer: we did miss last week, just scheduling stuff that wound up not working out. But we are back this episode, and we're going to cover some of the news we missed last week, plus a lot of the exciting news that's been going on this past week; there was a lot of stuff. So let us go ahead and just dive in, starting with the Tools and Apps section. And of course, we have to start with the big news here, which is, as Jeremie mentioned briefly, Udio.

So there's been a lot of excitement in the music generation space. We covered Suno and how they generate really, really, you know, high quality, almost indistinguishable from real songs. And now there's a new entrant in that space called Udio that was founded just in December by four former employees at DeepMind. And they just came out with their model. We just got some samples of the songs it produces, and

they are really good. It's, you know, again, really hard to catch any sort of AI weirdness in the tracks; they just sound really good. You can start with 30 seconds and then extend them, so they produce about two-minute songs, roughly. And there are a lot of backers for this one, you know, some heavyweight investors like a16z, and also some notable people in the music space like Common and will.i.am among the investors.

So very much a big deal in the commercial music space with this new competitor.

Jeremie

Yeah, apparently, so they've raised, and this is going back to just a couple days ago, apparently raised $10 million, so that, you know, seems small, but I guess that's pre-launch. The product is super impressive. The one thing is, we get so spoiled so fast. I'm generating, like, incredible quality music, jazzy beats, whatever, with lyrics too, right, that are really impressive. That's one of the wild things about this: you get lyrics that almost always make sense. You get some weird

aberrations, and I ran into a couple where, you know, it sounds like the lyrics are saying nonsense and maybe you hear people in the background, and it kind of gets stuck in a local gutter or something, but really, really impressive. It takes, in my experience, about five minutes to actually generate the 30 seconds of audio. And insanely, my brain got so hedonically adapted to, like, I can do this music generation now, and my immediate next response is like, wait, why

is it so slow? Like, literally, easy bro, 20 seconds ago you couldn't do this. Let's just give it a sec. So it's just wild how quickly we adapt. But yeah, super impressive product. This is probably going to be a game changer for a lot of different things. It's hard to think of all the things, but just the ability to make, you know, catchy beats that help you maybe memorize things for educational purposes, that's one application I saw flagged on Twitter.

A lot of people were talking about, you know, the marketing and commercial applications of this stuff, but, you know, definitely valid to call this a ChatGPT moment, I think, for music generation.

Andrey

Yeah. And on the competition now, I mean, generally people say the quality is similar between this and Suno, and, you know, you could say Udio's is maybe even a little cleaner. So there are now, yeah, two big competitors there. And Udio is trying to distinguish itself a little bit by saying they're aiming more to cater to musicians, so they have this ability to control a little bit more what the generation is like, tweak it, and so on.

The article here in Rolling Stone also mentioned that they found some examples of songs with vocals that sounded a lot like the vocalist Tom Petty. So there is, yeah, I guess, more flagging of the fact that it seems very likely that both Udio and Suno are training on copyrighted data, which is a bit of a gray area for sure, and something that some people, as we'll get to, are unhappy about. But that's just how it is right now, I guess.

They are hitting the ground running, getting these models trained, and the results are pretty mind blowing right now.

Jeremie

Yeah, I'm also really curious about just, like, the business model and how sustainable it ends up being. Again, you know, we've seen so many companies like this get gobbled up as scale, you know, hits the next beat, right? What can GPT-5 do? What can GPT-6 do? Do they eventually just end up being able to do these things trivially, and the kind of most general purpose foundation model eats the rest of the world? Interesting question. Let's see how it plays out.

Andrey

Yeah. And next up, the next story is Anthropic launches external tool use for Claude AI, enabling stock ticker integrations and more. So Anthropic has launched the beta of this tool use functionality, allowing Claude to use third-party tools. Basically, API users can insert simple code snippets in the API interface, and then Claude will just go ahead and use them. This is something that, of course, has been around for a while in stuff

like ChatGPT. So, yeah, this is another, you know, rollout of features for Claude that makes it more competitive. Apparently all Claude AI models can handle choosing from over 250 tools with pretty strong accuracy. So, yeah, Claude, really, you know, based on its performance and cost, is, I think, starting to be a pretty strong competitor to OpenAI.

Jeremie

Yeah. When you talk about agent-like models, agency, that sort of thing, you know, the things that are preventing us from getting, you know, to AGI or something like it through agents, part of it goes through this ability to choose tools and then use those tools really accurately. You know, this article cites the over 90% accuracy in choosing tools from, like, a list of 250 tools that Anthropic has; 90% sounds really high.

But if you have to chain the use of many tools in a row, and then you have to actually use those tools correctly, you not only have to select them correctly, you have to call them, you know, with the right sort of API and the right arguments. You know, that could really quickly eat away at the overall successful percentage completion of a long series of tasks or complex tasks. So I think that's actually something that is going to have to change over time. It's clearly improved a lot.

Claude 3 is really, really good at this. GPT-4 has just gotten a little bit better at this; we might talk about that later today. But, you know, this is, I think, a key metric if you're interested in tracking progress towards AGI: the successful completion of, you know, tool selection and then tool use, because when you chain those things together in a coherent way, you get to much more general purpose capabilities.

And another thing to flag, just to be specific about what's going on here: we are no longer just relying on web lookup, which is what, you know, Claude 3 would previously have had to resort to if you needed to find out something about, I don't know, stock prices or something like that. Right. So now there's a dedicated tool for that. So, you know, you have ground truth. The error that comes from the web lookup stage is now hopefully no longer present.

You're relying instead on a firm tool whose accuracy can be verified, as they put it here, at the source level. So, yeah, better for the sort of verifiability and accuracy of these models. And then they have shown themselves to be capable of actually using these tools, selecting these tools properly. So interesting. Next up here.

Andrey

And I just want to flag, so that the article doesn't mislead: there is a legacy tool use format that has been in Claude, but it was not optimized for Claude 3, and it's kind of, I guess, less API friendly. You had to provide the tool definitions in the prompt. Here they added it as a separate input: when you make an API call, you can specify the tools as one of the inputs, not in the plain text.

So it's not entirely accurate that this is the first time this is coming to Claude, but they very definitely expanded how it's used and made it a little more, I guess, cleanly formatted. And interestingly, I've been testing Claude a bit more, and their API has evolved a lot since last year and has become a lot closer to OpenAI's. So again, it seems like they are really coming out of just testing and towards trying to get people to adopt and use Claude.
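
To make that concrete, here is a minimal sketch of what passing tools as structured inputs rather than prompt text looks like, assuming Anthropic's Python SDK; the get_stock_price tool and its schema are hypothetical examples made up for illustration.

```python
# A minimal sketch of tool use via Anthropic's Python SDK (assumed API shape);
# the get_stock_price tool and its schema are hypothetical examples.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    # Tools are passed as a structured input, not pasted into the prompt text.
    tools=[
        {
            "name": "get_stock_price",
            "description": "Look up the latest trading price for a ticker symbol.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string", "description": "e.g. GOOG"}
                },
                "required": ["ticker"],
            },
        }
    ],
    messages=[{"role": "user", "content": "What is Alphabet trading at right now?"}],
)

# If Claude decides to call the tool, the response includes a tool_use block
# naming the tool and its arguments; your code runs the tool and returns the
# result in a follow-up message.
print(response.content)
```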

Now, on to a lightning round with some quicker stories. The first one is Building LLMs for Code Repair, from Replit, and this covers how they are integrating AI tools specifically for code repair, so basically fixing bugs. Apparently it uses a mixture of source code and natural language data, and the tool will aim to pretty much fix issues that come up in your code.

Jeremie

Yeah, this is coming from Replit, of course; we've covered them a lot. On the company: their CEO, Amjad Masad, it's actually a Y Combinator company, and he's really famous in the YC ecosystem, just because Replit is such a kickass company in terms of its growth. I think Paul Graham himself is, like, personally invested in them. This is also partly a play for AGI; like, he's kind of indicated his interest in turning this into a bit of an

AGI play. And so what they're doing here is leveraging a kind of data that they have that, you know, few other people have: they essentially have these long kind of histories. They don't just have the code bases of the people who develop on their platform, they also have the history of the edits that have been made to that script. And so they're leveraging that in their training process.

This model is built off DeepSeek Coder, which is sort of an open source, instruction-tuned model that has been really effective in the past, specifically for code. So, you know, this is an open source thing in a sense, and they are being quite open about the techniques that they've used to fine-tune it. They used eight H100 GPUs, and they tell you, you know, the kind of number of shots that they use in their training process.

It's actually fairly open, and it reflects their interest in the open source movement more broadly. So yeah, definitely an interesting result, and we'll have to keep track of Replit as well and see if they keep moving more and more towards the center of this sort of middle tier of companies, in terms of, I think, you know, Meta arguably leaving that pack.

But certainly companies like Mistral and Cohere, where it's like, yeah, they're building impressive models, maybe not on the frontier, but definitely contributing to open source in important ways.

Andrey

Next up, early reviews of the Humane AI Pin aren't impressed. So we've covered the Humane AI Pin, which is this little wearable, kind of square thing that you can talk to. It also has a projector and a camera and costs $700. So the idea here is, like, a new type of hardware that is AI-first and that can in some ways replace your phone. You can tell it, you know, to do stuff, and it will use AI to intelligently do it. It has been released, it started being sent out, and people started reviewing it.

And the consensus, at least according to this article, is that it maybe could use more work. It seems like it's often slow to respond, and it is a little bit buggy, so sometimes it seems to do the wrong thing. If you, you know, tell it to play a song, it may instead talk to you about its instructions regarding that. If you ask it for the weather, it could take 10 seconds to reply, things like that.

So in general, it seems like maybe not too surprising, given that this is a first generation and it's just rolled out. Of course, the company is saying that there are a lot of updates slated to improve it, but I guess worth highlighting, because there is some excitement around these AI-driven devices as a new category of hardware.

Jeremie

Yeah, we've seen the Rabbit R1, we know OpenAI is working on potentially their own thing with, you know, the Jony Ive project. So yeah, another entry, let's say, in that long and growing list of hardware projects. I'm curious if this ends up taking off. It's a weird form factor; it's not what I would have expected. Apparently, like, do you ever watch Star Trek? You're, like, meant to tap it like a communicator badge,

you know, that they have on their chests. Basically, you just hit a button and then you ask it whatever. It's, I don't know, a little bit unusual. Doesn't strike me as the kind of thing that I would design, but, hey, that's why I'm not getting paid billions of dollars to do this.

Andrey

So next up, Microsoft's 365 Copilot gets a GPT-4 Turbo upgrade and improved image generation. And that is pretty much the full story: Microsoft business subscribers now have priority access to GPT-4 Turbo.

Jeremie

For sure, and maybe the buried lede here is that, yeah, GPT-4 Turbo is now out; OpenAI has put that out there. It's got a lot of features that are similar to the old GPT-4, or the kind of prior version: it's got a 128,000-token context window, that hasn't grown, but it does have a more up-to-date knowledge cutoff, December 2023, as opposed to April of last year, which was the previous knowledge cutoff.

So that's helpful. OpenAI claimed that it has, like, way, way better reasoning capabilities. It's been reported to have somewhat better reasoning capabilities on simple problems; on complex problems, more advanced problems, that's where it shines, it seems.

And there have been people who've kind of suggested, hey, maybe there's a little bit of metric hacking here, where, you know, OpenAI is getting really used to the benchmarks that they're trying to hit and maybe over-focusing on those benchmarks to some degree, some people suggested. And so you don't, as an everyday user, tend to see the value as much. I don't know that that's necessarily the case. It seems to me that the focus here really is on that more advanced

reasoning, and so for most practical use cases you're not going to be leaning on those abilities. Maybe that's why a lot of people just haven't seen it materialize in their kind of experiments with it. But, yeah, on a benchmark basis it does do better, and it's what OpenAI needed, right? To climb ahead on all those leaderboards: Claude 3 Opus came out, Google's Gemini is nipping at its heels, right? So they had to find a way to get back on top.

Make no mistake, though, this is not the latest and greatest model that OpenAI has, right? They internally absolutely are going to have more advanced models that they are not yet releasing, that they're testing, refining, and so on. But if you think about what the situation looks like for OpenAI, they're sitting back waiting for the next model to kind of beat them on the leaderboard and then saying, all right, you know, let's just go reclaim that number one spot.

That at least seems like it could very well be the play here, but, hard to be 100% sure. That's at least my suspicion at this point.

Andrey

And one last story for the section: AI editing tools are coming to all Google Photos users. The gist of it is that features such as Magic Eraser, Photo Unblur, and Portrait Light, which used to require a subscription to Google One, will now be coming to all users and extending to more devices.

So I think it's interesting to highlight just because this is now one of the ways to compete in the smartphone space, to provide AI features, and more and more phones are starting to get, especially for photo editing, things that are AI-powered. Now on to Applications and Business, starting with Google announces the Cloud TPU v5p, its most powerful AI accelerator yet. So a TPU, or tensor processing unit, has been something that Google has worked on since 2016.

And as the title says, we are now on v5p, where a pod consists of 8,960 chips and has the fastest interconnect yet at 4,800 gigabits per second. So lots of claims here, saying that these are faster than the v4 TPUs, featuring apparently a 2X improvement in FLOPS and a 3X improvement in high-bandwidth memory.

They do say that apparently Google DeepMind and Google Research users have observed significant speedups on large language model training workloads, which is, of course, pretty significant, because a lot of that is presumably happening at DeepMind.

Jeremie

Yeah. And this really is a scaling play, right? So the v5p TPU, and by the way, just as background, TPU as opposed to GPU: the TPU is kind of a Google-only architecture. The GPU is the graphics processing unit; the TPU is a tensor processing unit. It's designed explicitly for AI workloads from the ground up, and it takes advantage of certain properties of the kind of matrix multiplication process that just run a lot faster on their architecture.

The previous version, the v5e, Viperlite was the kind of codename for it; this new one is the Viperfish. So the distinction between them, in addition to the basic specs like interconnect and FLOP performance, is the connectivity and the scalability of that connectivity. So, you know, Andrey, you talked about this idea that it can connect into a pod of basically almost 9,000 other units. So this is a big difference from the previous version, which could connect

to, you know, 256 chips at a time. With this one, you first connect up to 64, and then to the rest of a pod of almost 9,000. So it's just got a lot more of an upper bound in terms of what it can accommodate, and that's going to matter way, way more as Google moves more and more in the direction of super, super large scale training

runs, right. The big differentiator when you look at what Google has done in its latest papers, I mean, they're almost like a hardware, not a hardware company, but their advances are disproportionately driven by hardware, partly because of Google's outrageous scale. They have way more scale than Microsoft. It's easy to forget when we look at OpenAI's progress, which is driven by their access to Microsoft hardware; Google has way, way, way more.

So when you actually look at their most recent papers, usually the big breakthroughs have to do with, like, oh, we just figured out this new way to connect way more GPUs than ever before. Famously, we talked about this on the podcast, but, like, connecting GPUs from across different data centers and getting them to do training runs together. So it really is kind of a hardware-focused effort there. Not that it isn't for OpenAI and Microsoft, but, you know, Google

just does it at a whole other level of scale. So this is a reflection of that focus, and we'll see what kinds of training runs it actually empowers.

Andrey

That's right. And I think the fact that DeepMind researchers have access to this just kind of made me think of how they also benefit from their ability to experiment at scale. So, for instance, we've covered recurrent architectures like Griffin, and in these papers they usually do have, you know, 2 billion parameter, 8 billion parameter models that they present results for. And as we'll get to in a bit, they actually now have released a model based on that research.

So having access to this very powerful hardware and being able to run training runs so fast with large language models is a pretty big advantage for their R&D efforts, for sure. And the second main story is also about hardware, so I figured we could combine the two. This one is about Meta, which has also unveiled a new version of its custom AI chip. This is the Meta Training and Inference Accelerator, or MTIA, and they now have a successor to the first version, which was from last year.

And as with the TPU v5p, of course, there are a lot of claims here about its performance. So, for instance, they say that it delivers up to three times better overall performance compared to MTIA v1. And they say that this is especially good at running models for ranking and recommending display ads on their platforms. Apparently they're not using this for training yet, and they are just starting to roll it out in 16 of their data center regions.

So definitely not as far along as something like the TPU, but this showcases that Meta is continuing to push in this direction and, I guess, is also trying to have the sort of edge that Google has as, you know, one of the only players that really has one.

Jeremie

Yeah, and Meta is kind of playing catch-up here on hardware as well as software, right? Like you mentioned, the TPU at Google being much, much more mature; I mean, they've been on this for, you know, the better part of a decade now. You know, Microsoft as well, with their Athena chip that they're designing. This is basically all a bunch of people saying, hey, whoa, wait a minute, Nvidia, like, your profit margins are insane. You're able to charge extraordinary prices for your GPUs.

That's starting to change, by the way, and for very interesting reasons. But increasingly, you know, there's a desperation to find other choices, and a sense as well, by the way, we talked about this almost a year ago when ChatGPT came out and we were talking about the margins, of like, where does the profit end up getting stuffed in the stack, right? What are the actual parts of the generative AI stack that are

profitable long term? There's an argument that says it's not actually the model developers, because that ends up being commoditized. Certainly at the open source level, there's so much competition. Like you're not going to make money by making the best open source model for the weak, right people. The cost of switching between platforms is just so high. And but but at the same time, like, you just you can't justify. So so you're not going to appeal people over data quality at

that stage. And those advantages end up being defensible for such short periods of time. And so a lot of, well, my own personal thesis and I think this is now bearing out is, you know, the, the profits to be made at the hardware level, the big heavy capital expenditure bottlenecks are in, you know, semiconductor fabrication. So I talk about that a lot on the show. But also on the design of new generation chips. And that's where meta is desperately trying to catch up again.

They're behind Microsoft, they're behind potentially Amazon as well. You look at Gaudi 3 that's coming out too. So a lot of activity in this space, and it's not clear who's going to win. But if you're going to be a hyperscaler in the next, you know, five, ten years, not having a homegrown chip design play is probably not going to be an option. There's just too much profit to be made at that level of the stack. So I think that's really where this is starting to happen.

And then we're even seeing some of the chip developers, the fabs, like for example Intel, try to venture into the design space as well. So everybody's trying to climb different levels of the ladder to own more and more of that stack. And I think we end up seeing a lot of fully integrated stacks within the next, well, couple of years; it's going

to take a while for these things to get off the ground. But this play by Meta is just sort of another one in a long series of FAANG companies making similar bets.

Andrey

And moving on to the Lightning Round, actually, why not start with that story you mentioned with Intel. The story is that it has unveiled its new AI accelerator, the Gaudi 3 chip, which is meant to enhance performance in training AI systems and in inference. They say it will be widely available in the third quarter, so it's just about unveiling for now.

They claim that it will be faster and more power efficient than the Nvidia H100, saying that apparently it trains certain types of AI models 1.7 times more quickly and runs inference 1.5 times faster. So some pretty big claims, with the H100 being one of the leading chips, pretty much the leading accelerator that people use so far, until of course the new Nvidia one becomes the main one. So, yeah, Intel is really coming for Nvidia with this one, and it'll be exciting to see if they're able to compete.

Jeremie

Yeah, I think this is a good opportunity to call this out and start to maybe build, you know, some of our listeners' intuition about what matters when you look at hardware and these big announcements that say, oh, well, we've got this thing, it's going to be better than the H100. The big question you always want to ask is: when will the production run actually be scaled?

Right? When are we going to start to see Gaudi 3 chips coming off the production line in quantities that actually matter? Because remember, a lot of this stuff is bottlenecked by semiconductor fab capacity over at TSMC. How much capacity has Intel bought out for that production run? How fast can they get the designs finalized, shipped, and then fabbed and packaged and sent back?

So right now, the problem that Intel has is that by the time the Gaudi 3 comes on the market in a meaningful sense, in large enough quantities to make a difference, you know, Nvidia will already have juiced out the H100. They're already on the H200, and the B100 is going to be imminently coming on the market. So, you know, the window for profitability here looks like it actually may be fairly narrow.

So even though, yeah, it's impressive from Intel, if true, and they'll be hitting, you know, pretty solid performance relative to the H100, it may not actually matter all that much by the time they can actually get this to market. Still important for Pat Gelsinger, the CEO of Intel, who's betting big on this kind of big hardware play. It's really, really important for them to flex those muscles and start getting better at this, so this is a good step forward for them.

But, you know, they absolutely need to catch up. Gelsinger, by the way, did not give pricing for this new chip. He said it would be very cheap, "a lot below", were his words, the cost of Nvidia's current and future chips. So, you know, we'll have to see. He said that they provide a really good, an "extremely good", he said, total cost of ownership. This is the total cost of owning the chip over its lifetime: the cost of running it, you know, maintaining it, buying it in the first place.

That's the thing you really care about, right? What is the total cost of ownership versus how much profit can I make from the chip if it's running, you know, at a reasonable level of use, let's say, over its lifetime. And so the claim here is, yeah, this is going to be a really, you know, dollar-efficient option, but we don't know how it'll stack up, not against the H100, but, again, the relevant thing at that point may

well be the Blackwell line; you know, it may well be something much more powerful. So we'll just have to see.

Andrey

Next up, Adobe is buying videos for $3 per minute to build AI models. So apparently they're offering around $120 to their network of photographers and artists to submit videos of people performing everyday actions or expressing emotions. They're, yeah, asking for these short clips with people showing emotions, shots of different anatomy, and also people interacting with objects.

The pay per submission on average apparently will work out to $2.62 per minute, but could also be more. So this showcases that Adobe is committed to building an AI video generation model, and that they are still committed to their very safe, conservative route, or you could say tactical route, of not leveraging potentially copyrighted data, and instead gathering data that they own and can safely train on.

Jeremie

Yeah, it's actually really interesting. They are implicitly placing bets essentially against what OpenAI and other companies are doing, right? To the extent that OpenAI, for example, is just, like, training straight off of YouTube, which was the allegation that was brought forth, that we talked about on the last episode, I think, you know, if that ends up not being kosher, well, then

the advantage goes to Adobe. But if the opposite is true, then Adobe's kind of wasting its resources here to some degree. Not that this is going to be a huge spend. Yeah, interesting that Adobe's doubling down on this. Obviously, they were really, I think, the first company to bring in indemnification offers for their users,

right, saying, you know, we are so sure that you're not going to face copyright issues by using the output of our models, because they were trained on our own proprietary content, that we are going to defend you in court if anything happens. So, you know, this is them doubling down, presumably, on that dimension; they're trying to make that their differentiator. I think this is a really good play. I mean, it definitely is a legit differentiator.

They've forced other companies to try to catch up and do similar things. So yeah, these kinds of pay-to-play datasets are something we may see more and more of in the future.

Andrey

And speaking of data and getting access to data, the next story is OpenAI transcribed over a million hours of YouTube videos to train GPT-4. So this is reportedly what happened; this was just a bit of info that came out, and the headline says it all. Apparently they had to transcribe a lot of YouTube videos to get more useful data to train with. And of course, if that is the case, Google has stated that it would be against their policies.

Jeremie

Yeah. And we called this out at the time that Whisper came out, you know, this famous speech-to-text model that OpenAI put together. But, you know, it was pretty clear at the time that this could well be a strategy to find new sources of data, to collect new sources of text data, basically, as they were starting to run out of the text they could crawl on the internet.

It looks like this was kind of legally questionable behavior, but OpenAI's position was that it was fair use. This is all a gray area, so nobody really knows the answers to these thorny legal questions about what is and isn't fair use in this context. Notable, though, that OpenAI President Greg Brockman was personally involved in collecting the videos that were used. So this is very much a, you know, all the way to the top type thing.

Now, what was interesting about this, too, is Google was reached out to for comment on this story. And, you know, OpenAI claims that they respect robots.txt files, which are these files on a website that tell crawlers, like the one that OpenAI presumably, or may have been, using, what

they can and can't do on the site. And the Google spokesperson said, well, you know, both our robots.txt files and our terms of service prohibit unauthorized scraping or downloading of YouTube content. And so the implication is, if that's what happened here, like, that is a violation. So anyway, it seems like this may well have been a thing. But Google itself has actually, it's worth noting, trained its models on

YouTube content. The spokesperson said this; they said it was in accordance with the terms of their agreement with YouTube creators, which of course makes sense, but is still kind of worth noting. There's an asymmetry, if that's the case, in terms of legal exposure here: Google is able to use YouTube content, OpenAI can't, or may not be able to. They're doing it anyway, but they may or may not be legally

allowed to do that. So it's kind of an interesting way in which the land lies with AI training runs these days.

Andrey

Yeah. And it really makes me wonder, like, when will we finally know? Because it's been going on for a while. You know, we know that image generators, of course, use copyrighted data, and I guess the legal process will drag on for a while with all of these things. But at some point, right, this question will have to be answered: is the fair use argument legit? We still don't really have an answer.

Jeremie

Yeah. Maybe lawyers who listen to the show, I know there are a couple at least, please let us know if you're tracking anything. I know there's one, actually, who reached out to me; I gotta get back to him, actually. But he gave me a rundown on this a couple months ago, and there are some cases that seem like they're starting to kind of create the precedent that might establish the answer to these questions. But that was a few

months ago, so who knows. Kind of an interesting thing to track. Next.

Andrey

Moving away from that sort of thing, we have the story Waymo will launch paid robotaxi service in Los Angeles on Wednesday. So that's it: they have been offering free tour rides in Los Angeles over the past year, they received regulatory approval to expand to a paid service just last month, and they are going to start rolling it out to the over 50,000 people on the waitlist to use the service. They'll currently just cover a 63-square-mile area from Santa Monica to downtown L.A.

So yeah, exciting to see them starting to expand the paid service to another city. They have primarily been in San Francisco so far, and they are also testing in Phoenix.

Jeremie

Yeah, you can finally go to LA, Andrey.

Andrey

Yeah, because, of course, I will not go anywhere except in a Waymo.

Jeremie

That's right. Your no-human-drivers policy is intact.

Andrey

A couple more quick stories. This next one is OpenAI removes Sam Altman's ownership of its startup fund. We covered this a little while back when it was reported, this weird ownership structure where Sam Altman was the owner of the startup fund out of OpenAI. Well, I guess after that came to light, they decided to go ahead and change that up. So there was a filing with the SEC showing that it is no longer the case.

Jeremie

Yeah. This is, like, you know, if you're thinking, hey, this seems super weird, and you've never heard of a situation where somebody was just, like, put in charge of an entire fund, and apparently the intent was not for them to just keep it, but, like, they're just being trusted to hold on to the bag until the situation could be worked out: yeah, that is really weird. I've never run into anything like this in my

whole life in Silicon Valley. I mean, this is highly unusual, but everything about OpenAI is highly unusual, right? They had this weird capped-profit structure, this weird nonprofit board. So, you know, I'm not a lawyer, I'm not sure what might justify this, but part of me is tempted to say, yeah, it just seems like something that maybe should not have been done this way. It's a $175 million fund.

So, large, but, you know, relative to OpenAI's valuation, relative to how much they've raised, relative to how much they make, not actually all that big. But it has been now moved over to Ian Hathaway, who has been a partner at the fund since 2021; this is according to that filing. And Sam is no longer going to be a general partner at the fund either. So I'd be more curious to dive in specifically to understand what his exposure is to the upside from those investments.

Is there any? Like, how does that work? But in any case, at this point a spokesperson from OpenAI is just saying, like, look, the general partner structure was never supposed to be the way this went long term, it was always a temporary arrangement, and they hope that this change provides further clarity. Which, as ever with the intrigue of the OpenAI board, or now the OpenAI fund, it seems like further clarity is the last thing that the resolution seems

to bring. I don't know, I kind of feel like for something this important, a bit more clarity would be nice, and it seems like they're not keen to share, so I guess we'll just have to see.

Andrey

And on to Projects and Open Source, with the first story coming back to Mistral, one of our favorite releasers of models. And this headline, I guess, is fairly accurate: Mistral AI stuns with surprise launch of new Mixtral 8x22B model. Stuns is a big term, but this is a pretty big deal. They've launched this 8x22B, so that's a step up from their previous model, which was, I think, 8x7B, and this is the biggest model they've released so far.

It has 176 billion parameters and a context window of 65,000 tokens. And it of course outperforms the previous 8x7B model, which was already really quite performant, and of course also Llama 2. In addition, the model has an Apache 2.0 license, so once again you can pretty much do whatever you want with it: no restrictions for commercial use, no restrictions at all. And this just happened,

so we don't have too much information, but it's probably a top-of-the-line open source model at this point, for sure.
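
As a rough illustration of what "the weights are available" means in practice, here is a sketch of loading the model with the Hugging Face transformers library; the checkpoint name is an assumption, and running the full 8x22B realistically requires several high-memory GPUs (the download alone is roughly 280 GB).

```python
# A sketch, not a recipe: loading Mixtral-8x22B-style open weights with
# Hugging Face transformers. The checkpoint ID is assumed; in practice this
# needs multiple large GPUs (or heavy quantization) just to fit in memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed Hub ID for the release

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard the eight experts across available GPUs
    torch_dtype="auto",  # use the dtype stored in the checkpoint
)

prompt = "A mixture-of-experts model works by"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```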

Jeremie

Yeah, you know, Mistral keeps doing this, coming out with, like, the next big thing. This is really impressive. One thing to note: I mean, it is a big honkin' model, right? 22 billion parameters for each expert, there are eight experts, 176 billion parameters total. This is not the thing that fits on your laptop, right? It's a 281-gigabyte file just to download.

So you're going to need a lot of horsepower, a lot of actual AI hardware, to run this, which, you know, raises the usual questions like, what is open source, really? Right? Again, we talked about this last episode, but, like, is there a sense in which, if you release a model that is so big that no one can use it without, like, you know, tens of thousands of dollars of advanced AI hardware, how should we think about that fitting in the open source ecosystem?

I mean, I don't know, I think it's a little bit pedantic, but it is being asked. I think the real answer here is, like, well, they made the model, what do you want? It's open source, the weights are available. This does have, by the way, a 65,000-token context window, so you could fit a book in that. I'm trying to think, I think that may be the largest context window available in the open source right now.

I can't easily think of another. And it is comparable to, or actually outperforms, GPT-3.5 on a number of benchmarks. So we're barreling towards a world where we have GPT-4 equivalent models out in the open, and where the context windows are getting long enough that, you know, you start imagining these things doing some really impressive sort of task chaining.

I think that this is probably going to be another big bump for, you know, agent-like models and agent systems, because that task coherence length is indexed to context window size to some degree, and certainly to scale and overall capabilities at reasoning, which this thing very much seems to have. So, an impressive result from Mistral, which, as I like to say, is a French company. And yeah, is that it for this one?

Andrey

Yeah, they come in strong. And I just looked this up out of curiosity on the Hugging Face leaderboard, and they are topping the benchmarks, you know, beating a lot of the other open source models like Qwen and so on. So, as before, a very strong model and one of the biggest ones; I think Grok is still bigger, if I remember correctly. But yeah, I think that's it.

Jeremie

Three hundred-some billion, if I recall, yes.

Andrey

Yeah, that's right. Next, some smaller models now, coming from Google. So Google has announced some new additions to its Gemma family of lightweight open source AI models. The first one is CodeGemma, which, as per the title, is meant for coding. It was trained on 500 billion tokens of English data from web documents and code, and apparently it performs quite well. It doesn't beat the best models out there like DeepSeek Coder, but it is quite fast to do inference on and gets pretty good results.

Pretty good numbers. The more interesting one is RecurrentGemma, which is an efficient model with a recurrent structure, linked to this paper we discussed a while back, Griffin, which mixes gated linear recurrences with efficient local attention. And yeah, not too much information on this one. They basically say that it has the nice property of being able to basically scale

linearly, so its throughput on longer sequences degrades a lot more slowly than in non-recurrent architectures. They do hint that it may not be as performant; the announcement doesn't especially highlight performance, so it's not too clear how well it performs. But they are saying that this is released primarily for research purposes and for people to build upon.
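
To give a flavor of why recurrence scales this way, here is a toy sketch of a gated linear recurrence, the kind of block Griffin-style models mix with local attention; it is purely illustrative and not the actual RecurrentGemma implementation.

```python
# Toy gated linear recurrence: cost grows linearly with sequence length and
# the state stays a fixed size, unlike full attention. Illustrative only.
import torch

def gated_linear_recurrence(x, w_gate, w_in):
    """x: (seq_len, dim) inputs; returns (seq_len, dim) recurrent states."""
    seq_len, dim = x.shape
    h = torch.zeros(dim)            # fixed-size state, regardless of seq_len
    states = []
    for t in range(seq_len):
        a = torch.sigmoid(x[t] @ w_gate)       # per-channel "keep" gate
        h = a * h + (1.0 - a) * (x[t] @ w_in)  # blend old state with new input
        states.append(h)
    return torch.stack(states)

x = torch.randn(16, 64)
w_gate = torch.randn(64, 64) * 0.1
w_in = torch.randn(64, 64) * 0.1
print(gated_linear_recurrence(x, w_gate, w_in).shape)  # torch.Size([16, 64])
```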

Jeremie

Yeah. And again, you know, that focus on throughput is really, really important on a number of levels.

One, of course, it's more of that kind of focus on hardware: like, let's get, you know, these systems to be able to use our hardware as efficiently as possible, kind of get those tokens pumped through. That really, really matters for code especially, because when you're talking about, you know, agent-like models that can do useful things, very often a coding model is the kind of model you're using for that, right? You want to build apps, you want to, you know, interact with websites

in certain ways. So the kind of robustness of the logic, but also the ability to do inference really fast and efficiently by having a lot of throughput, that's going to allow you to do more thinking, if you will, at inference time, which is exactly what agents are all about, right? Agents are just a way to move compute resources from the training phase to inference time, to reallocate your budget, and that often gives you a big lift as a result.

So getting really good throughput means you can do complex tasks in short periods of time, which matters for things like user experience and experimentation and all that good stuff. So, yeah, really interesting, the recurrence play, something that we've definitely seen before. And when you think about state space models like Mamba, recurrence definitely is philosophically aligned with that; that's very much what those are about.

So, you know, all of these ideas seem to be bubbling up more and more to the surface, both in terms of throughput but also context window expansion, and indeed all the way to, like, infinite context windows, which we may end up talking about. So yeah, a very Google result here.

Andrey

And just one more story in the section, and this one is pretty different and pretty interesting. It's Aurora-M, the first open source multilingual language model red-teamed according to the US executive order. This is a 15-billion-parameter model trained on English, Finnish, Hindi, and so on, trained on over 2 trillion tokens, so pretty significant. Apparently it started off from StarCoderPlus and then trained on some extra data.

And as per the title, I guess the big deal is that this was rigorously evaluated across various tasks and languages. They say that the model was fine-tuned on human-reviewed safety instructions, and that apparently has aligned it with the red-teaming considerations and specific concerns articulated in the Biden-Harris executive order. You know, we haven't really seen people calling out the idea that their model has been developed in accordance

with executive orders before, so I guess they are trying to stand out with that pointer, and it'll be interesting to see whether others like Anthropic and so on will also start highlighting that aspect of their models.

Jeremie

Yeah, no, you're absolutely right. I get some vibes from this that they really want us to focus on that US executive order red-teaming piece. The model itself, you know, is nothing to write home about. It's a 15 billion parameter model, and Llama 2, which by now is I don't know how many months old, but is a pretty old model, at 13 billion parameters outperforms this 15 billion parameter model on most benchmarks.

Right. But of course, that's not what this is about. It's really about that red-teaming piece, really about showing that open source models can adhere to this kind of new regulatory framework that's being proposed. So that's kind of interesting. I will say, one notable omission: you know, they do a whole bunch of evaluations for what's known as the CBRN portfolio, chem, bio, radiological, and nuclear risk.

Interestingly, they call it CBR, which I've never seen before; CBRN is usually what this is called. So they have a bunch of tests there. One thing to flag is you cannot, cannot, cannot do these tests in the open source world thoroughly enough. Think about the level of access you would need to actually know what the right evaluations even are to run for nuclear risk, for biological weapons risk.

This is not the sort of thing that you tend to know as an AI developer, certainly not an open source one, and if you did, that would come with its own risks. So there's always this challenge when it comes to evaluating open source models, and this is an open problem: like, how do you come up with evals that are safe to even publish, right? So, just as a baseline piece of background information, these are going to be leaky in some

sense. Still an important contribution. The eval dataset consisted of 5,000 red-teaming instructions: 4,000 they pulled from Anthropic, 1,000 they made themselves, and they cover that whole smattering of concerns. But one interesting omission: there's nothing in these evals about self-replication or self-propagation, right? This idea that models may be able to, you know, replicate themselves or whatever, usually associated with catastrophic risk scenarios, that, by the way, is in the executive order.

The fact that it's absent here, I thought, was actually really interesting. In some sense, this is not a complete test; it's not a complete red-teaming according to the executive order, it's missing a fairly significant and important set of components, so I'm not sure why that happened. Self-replication and propagation are a much more difficult set of evals to design and execute, so, you know, it's maybe understandable that they want to wait for later or not do

it. But I thought that was kind of interesting, because it is the sort of thing that, you know, will have to change over time if we're going to start to fully adhere to these executive orders.

Andrey

One last interesting thing to mention is that this is coming out from a huge group, a collaboration of people from across 33 different institutions, led by the Tokyo Institute of Technology and by a person at the MIT-IBM Watson AI Lab. So a real mishmash of universities, groups, and so on, I guess, contributing data and contributing to the red teaming. Definitely, as you said, if you look at the paper, it doesn't outperform Llama 2 13B on benchmarks.

It's not, you know, the top-of-the-line model as far as open source models go. But it's still worth appreciating that they put out a paper, they put this out in the open, and they did kind of make a point of saying that they went through this route of red teaming according to the U.S. executive order.

Jeremie

Yeah, it'd be great to see more stuff like this in the future, for sure.

Andrey

Yeah, and on to the Research and Advancements section, with the first story being from DeepMind, and I think one of the big ones from last week: it is Mixture-of-Depths: Dynamically Allocating Compute in Transformer-Based Language Models. So we know about mixture of experts: it's where you have a model that, as it does its forward pass, can essentially pick different branches to go down and say, you know, for this output I will have this subset of weights be active and they will produce

the output. And this paper extends that in a kind of similar but different direction, where instead they say different inputs can choose to skip layers in the network. So instead of saying, I will go to this branch and not this one, they'll say, I will skip this layer and just, you know, use less compute to produce the output. And it works, you know, fundamentally similarly to mixture of experts.

They say that, you know, you can train the model to select, you know, it ranks which tokens to keep and which tokens to skip. And, as with mixture of experts, what you end up with if you use this, especially if you combine it with mixture of experts, is the ability to get the same loss for less cost. So using fewer FLOPS, you can achieve the same loss. And so this to me seemed pretty exciting given how significant mixture of experts has been with things like

Mixtral and, allegedly, GPT-4. It seems like mixture of depths can do that as well, and the two can be combined to have even more benefit.
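
As a toy sketch of the routing idea: a small per-layer router scores the tokens, only the top-scoring fraction goes through the layer's attention/MLP block, and the rest ride the residual stream past it. This is an illustration of the concept, with made-up module names, not DeepMind's code.

```python
# Toy mixture-of-depths layer: route only the top-k scored tokens through the
# expensive block; skipped tokens pass through unchanged via the residual path.
import torch
import torch.nn as nn

class MixtureOfDepthsLayer(nn.Module):
    def __init__(self, dim, capacity_frac=0.5):
        super().__init__()
        self.router = nn.Linear(dim, 1)          # scores each token
        self.block = nn.Sequential(              # stand-in for attention + MLP
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.capacity_frac = capacity_frac       # fraction of tokens to process

    def forward(self, x):                        # x: (batch, seq, dim)
        scores = self.router(x).squeeze(-1)      # (batch, seq)
        k = max(1, int(self.capacity_frac * x.shape[1]))
        top = scores.topk(k, dim=1).indices      # tokens that get full compute
        out = x.clone()                          # skipped tokens stay as-is
        for b in range(x.shape[0]):
            chosen = x[b, top[b]]                                  # (k, dim)
            gate = torch.sigmoid(scores[b, top[b]]).unsqueeze(-1)  # (k, 1)
            out[b, top[b]] = chosen + gate * self.block(chosen)    # residual
        return out

layer = MixtureOfDepthsLayer(dim=64, capacity_frac=0.5)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```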

Jeremie

Yeah. And I think, again, I mean, you see Google focusing so, so much on how can we optimize the use of our hardware, right? And we're not seeing the same publications coming out of OpenAI, probably just because this is being done internally and we don't hear about it, but just the fact that Google has so much hardware means, naturally, they're going to orient towards this really, really hard. And it's an interesting result.

It's an important one. You know, alternatives to this historically have focused on methods like early exiting. Basically, think of your input that gets fed to your model, and it goes through layer by layer by layer. And then eventually, for a given token, the model might decide, okay, you know, it's not worth investing more resources into massaging this one further, so I'm just going to route that straight to my output, right? It could do early exiting for that token.

In this case, instead of early exiting, where you have to make a binary decision, either you continue down the rest of the model to the next layer, or you leave and can't interact with any of the future layers, this allows you to skip specific layers. And they speculate in the paper that this might actually be very desirable for scaling reasons that they don't go into much detail on.

So kind of interesting. They do highlight that they're able to improve by 1.5% on essentially the log probability of the training objective, so basically, let's say, the loss function you're using to train the system, while using the same number of

training FLOPS. So, same amount of training compute and you get a 1.5% improvement in the loss function, which is not necessarily tied in a transparent way to performance, but it's certainly indicative. They're also able to train models to parity with sort of vanilla transformers, if they use the same amount of training compute, in a context where they can save upwards of 50% of their compute at inference during a forward pass.

So it makes it a lot cheaper to actually run, not just train. So these are a lot of really interesting advances from DeepMind. Obviously we saw their heavy-duty mixture of experts paper come out a couple of weeks ago, we covered it, where, you know, the token would be routed to a path; anyway, it was like a more augmented version of mixture of experts. This is instead breaking it down, as you said, Andrey, layer by layer, thinking of each layer almost as a kind of submodel.

So they're definitely exploring a lot of acrobatics with their parallelization schemes.

Andrey

That's right. I will say, one thing I found disappointing in the paper is they didn't really compare. They do cite and discuss, you know, related prior work and similar ideas, but there are no comparisons here; they just show results using this exact technique. So it's a little hard to say whether similar ideas that have been proposed are also as promising as this. But of course, the exciting bit also is that they are evaluating this at scale.

So they scale up to 3-billion-parameter trained models and do show pretty significant benefits. So, you know, regardless of whether something like this or something similar already existed, here we see that this general idea seems like it could potentially be used together with mixture of experts to keep scaling, which is, of course, probably what people intend to do. And on to the next main paper, this one from Google, not DeepMind, but I guess also Google.

The paper is leave no context behind efficient infinite contact to summarize with infinite attention. And that is the idea that we propose a way to scale transformer based LMS to infinitely long inputs with bounded memory and computation that is done by incorporating a compressive memory into a standard attention mechanism that includes both local attention over at memory and long term linear attention, and they compare to some other

variants of this. So there were prior papers along these lines from the past couple of years; people have done this with kNN lookups and other variations. Here they are, I guess, different in that they focus on compression at every time step, and they show, according to their evaluations, that relative to these alternative ways of achieving essentially the same thing, they are able to do better on things like 500K-token-length book summarization.

And they evaluate this with 1 billion and 8 billion parameter LLMs. So yeah, another, you know, exciting and potentially useful way to vary the architecture of a transformer coming from Google.

Jeremie

Yeah, I actually thought this scheme was surprisingly simple. And also, I haven't heard people say this, and maybe it's wrong, but it reminds me a lot of a state space model. Like, there's a lot here that philosophically is like that. So, okay: in a standard vanilla transformer, you take in your inputs, let's say one input at a time, one sentence at a time, one whatever at a time.

And you're going to, you know, train this thing to do text autocomplete on the sequence that you fed it, essentially. That's kind of the training process. And okay, that's all there is to it. The model's weights get adjusted as it learns more context from doing that, and then it eventually gets good at writing. But each time it looks at an input, it is just looking at that input. It's just looking at, you know, whatever the sequence is that's just been fed to it, or, as they put it in this paper, the segment that's just been fed to it. In this case, though, it's kind of like if it's reading a long document, it'll, you know, read whatever the segment is that it's been fed. And then as it proceeds to the next segment, it keeps a compressed representation of all the stuff it's read before. It kind of maintains that in a sort of memory.

It's, again, a compressed representation of the keys and values, if you're familiar with the architecture. And then it combines that with the input from the current segment that it's reading, using a vanilla transformer, and glues them together, basically combines those two things. So you have the memory piece and you have the immediate segment that

you're looking at. And then, based on those inputs, you do your final kind of prediction for the next word or whatever it is you're predicting, the next token. Again, this strikes me as being very state space model.

Like, you essentially have this explicit memory that's being updated on a regular basis as you move from one segment to the next in the text, and that gets combined with your immediate sort of short-term memory focus, or kind of causal memory focus, if you will, of the thing that you're looking at in the moment. So, yeah, I thought this is a really simple idea. It seems to work really well. And it's no coincidence, I don't think, that this is coming out

after Google Gemini, as we started to learn about, like, these infinite context windows. You know, we know Google has a research-only version of the transformer of some kind that can do up to 10 million tokens of context window size, and in fact more. So, you know, those two ideas may be related, right? This may be the way that this is done. We talked about another hypothesis, that maybe this is ring attention, as

well, in a previous episode. It's a bit unclear which, you know, which possibilities are being deployed here. But actually, so I saw this, there's a great video where AI Explained covers

this at a high level. He doesn't go into the architecture like we just did, but he highlights that one of the authors of this paper is actually one of the authors of the Gemini paper as well, the one that came out with a sort of very large context window, which sort of leads one to suspect that this may actually be the thing that's powering those very large context windows. Nothing known for sure, but there it is.
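To make the segment-by-segment picture concrete, here is a rough sketch of the compressive-memory idea. The gating, normalization, and single-head setup are simplifications of ours rather than the paper's code, but the structure, local softmax attention plus a linear-attention readout from a running memory that gets updated after every segment, follows the description above.

```python
# Rough sketch (assumptions ours) of an Infini-attention-style compressive memory:
# process a long sequence segment by segment, keep an associative memory M of past
# key/value pairs, retrieve from it with the current queries, and blend that with
# ordinary softmax attention over the current segment.
import torch

def infini_attention_sketch(segments, Wq, Wk, Wv, beta=0.5):
    d_k = Wk.shape[1]
    M = torch.zeros(d_k, Wv.shape[1])         # compressive memory (keys bound to values)
    z = torch.zeros(d_k)                      # running normalization term
    outputs = []
    for seg in segments:                      # seg: (seg_len, d_model)
        Q, K, V = seg @ Wq, seg @ Wk, seg @ Wv
        # 1) local softmax attention within the current segment (masking omitted here)
        local = torch.softmax(Q @ K.T / d_k ** 0.5, dim=-1) @ V
        # 2) long-term retrieval from the compressive memory (linear attention)
        sigma_q = torch.nn.functional.elu(Q) + 1
        mem = (sigma_q @ M) / (sigma_q @ z + 1e-6).unsqueeze(-1)
        # 3) blend memory and local outputs (the paper learns this gate; fixed here)
        outputs.append(beta * mem + (1 - beta) * local)
        # 4) update the memory with this segment's keys/values before moving on
        sigma_k = torch.nn.functional.elu(K) + 1
        M = M + sigma_k.T @ V
        z = z + sigma_k.sum(dim=0)
    return torch.cat(outputs, dim=0)
```

The point of the sketch is the shape of the computation: memory size stays fixed no matter how many segments stream through, which is where the "bounded memory and computation" claim comes from.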

Andrey

Could definitely be related. And, again, it's worth mentioning that this basic kind of idea is not new. There are various papers, for instance AutoCompressors, RMT, Memorizing Transformers, the Compressive Transformer. The idea of compressing what you've computed so far and using it going forward is not new. So I think the details are really more in how the compression is done and the specific math behind it. But regardless, it's again coming from Google.

They evaluate at scale and they show pretty decent improvements over the prior state of the art. So again, another kind of idea worth keeping in mind, this idea of infinite compression and kind of adding recurrence to our LLMs, which is not dissimilar from Griffin or Mamba, as you said. Next up, moving on to the Lightning Round with some

faster stories. The first one is Octopus v2: On-device Language Model for Super Agent, and the gist is that they have a method that enables a model with 2 billion parameters to outperform GPT-4 in terms of accuracy and latency, and to reduce the context size by 95%. Yeah, it reduces latency in particular to a level suitable for deployment across various edge devices in production environments. And, yeah, it's all about that on-device LLM kind of stuff.

Jeremie

Yeah. And actually, so what they're doing here that's a little bit different, or one of the key differences, is they represent the functions that they want this model to call as their own tokens; they give them their own specific tokens during training. So they're going to, you know, pre-train this model, or take a pre-trained model. In this case they're taking Google's Gemma, I think the 2 billion

parameter version of that model. And they're going to give it a little extra training, a little bit of fine-tuning. They're going to train it on a data set in which tools are given, again, their own tokens. So when we talk about tokens, right, we're talking about usually parts of words or whole words that are essentially part of the dictionary, the fundamental list of foundational entities that this model is able to reason about.

Right? So it's, you know, syllables or characters sometimes, as it was in the old days, or whole words, whatever that may be. So instead of having the model, like, spell out the function name, for example, that it might have to call, they're going to assign the whole function one token. So this is one thing, one idea, that the model has to hold in its head to call on. And that gives it a level of concreteness that makes it much

more reliable. Like, the model no longer has to solve many different problems at the same time. To call a function properly, it just has to, like, reach for the right token instead of spelling out whatever the function is or reasoning about what its characters might look like. And so, yeah, they end up integrating the two-stage process that is usually used by an agent to call a function. So this is usually: one, pick the right tool,

right, usually using a classifier. And then, second, generate the right parameters to call that function, to use that tool. And normally that's treated as two different steps. For the first one, you'll use, like, a classifier to solve for that, and then for the second one, maybe you'll use, like, a language model. They're integrating those two into one step thanks to this tokenized function-calling technique. They're able to generate the parameters for the function call at the same

time. And it allows the problem to be solved in a way that leverages advantages that only become apparent if you can hold the whole problem in your head at once. So I was thinking about how to explain this a little bit. One way to think about it is: sometimes, when you're picking the right tool for a job, you want to think a little bit ahead about how you might creatively use that tool, and only if you think of both the tool and how you want to use it do you fully have the option priced out. Right? So, like, you know, a hammer may not seem like the right tool for the job, but if you think about a clever way to use it, you might be like, oh, that actually is better than, you know, a pair of scissors or whatever. So the same idea applies to these APIs.

You know, if you're thinking of both which API you're going to use, which function you're going to call, and the arguments you're going to feed to it, you might kind of spot opportunities to use other functions in unorthodox ways. So this leads to performance improvements that are quite noteworthy. There's also just, like, much better inference time.

For these systems, they're able to just, like, get a lot more throughput with Octopus than with, say, Llama 7 billion or GPT-3.5 using retrieval-augmented generation. So, a pretty impressive result. And it's a bit of a simple tweak, right? It's just this idea of, like, instead of spelling out the function names, let's give them their own token.

But if you're interested in doing AI agents, this may just be a quick way to avoid, you know, a small fraction of the errors that otherwise might compound over the course of a complex interaction and cause your agent not to work.
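Here is a conceptual sketch of the functions-as-tokens setup. The model name, function names, and expected output format are illustrative assumptions, not the released Octopus v2 artifacts, but it shows the mechanical step being described: register one dedicated token per tool, resize the embedding table, fine-tune, and then tool selection and argument generation fall out of a single decoding pass.

```python
# Conceptual sketch of "functions as tokens" (names are hypothetical, not Octopus v2's release):
# each tool gets its own special token, and the model is fine-tuned to emit that token
# followed by the call's arguments, so tool choice and argument generation happen in one pass.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2b"                      # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# One dedicated token per function the on-device agent can call (hypothetical names).
function_tokens = ["<fn_take_photo>", "<fn_send_text>", "<fn_set_alarm>"]
tokenizer.add_special_tokens({"additional_special_tokens": function_tokens})
model.resize_token_embeddings(len(tokenizer))       # new embeddings get learned during fine-tuning

# After fine-tuning on (query -> "<fn_...> {json args}") pairs, inference is a single pass:
prompt = "User: wake me up at 7am tomorrow\nAssistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=False))
# Expected shape of the output after fine-tuning: <fn_set_alarm> {"time": "07:00"}
```

The design choice being highlighted is just that reaching for one token is an easier decoding problem than spelling out a function name character by character, which is where the reliability and latency wins come from.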

Andrey

Next up: Bigger is Not Always Better: Scaling Properties of Latent Diffusion Models. So this is looking at the scaling of, particularly, image generation, which is nowadays usually done with latent diffusion models. And there are various results in this paper regarding scaling. The key one they highlight is that, interestingly, if you sample images at the same cost,

so if you take kind of the same number of steps when accounting for the model size, the actual image quality output could be better from the smaller models. So if you don't expend more compute when you use a larger model, the smaller models apparently are potentially more efficient in their computation. And the paper highlights some potential ramifications in terms of being able to improve efficiency when, for instance, distilling larger models into smaller ones.
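As a back-of-the-envelope illustration of what "sampling at the same cost" means, here is a tiny sketch, with made-up numbers, of how a fixed inference budget translates into more denoising steps for smaller latent diffusion models.

```python
# Back-of-the-envelope sketch of compute-matched sampling: under a fixed inference budget,
# a smaller latent diffusion model can afford more denoising steps than a larger one,
# which is roughly the comparison the paper makes. All numbers below are made up.
models = {"small": 0.4e9, "medium": 1.0e9, "large": 2.0e9}   # parameter counts as a proxy for per-step cost
budget = 2.0e9 * 25                                           # budget = the large model running 25 steps

for name, params in models.items():
    steps = int(budget // params)
    print(f"{name}: ~{steps} denoising steps under the same sampling budget")
# small: ~125 steps, medium: ~50 steps, large: ~25 steps
```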

Jeremie

Yeah. Full disclosure, I have not read this paper. Actually, I'm really curious about it. Just because, you know, scaling is so central.

Andrey

Scaling, of course.

Jeremie

Yeah, yeah. That's it. So, no, I'm curious to check it out, and how specific it may or may not be to latent diffusion models. Like, I don't know if this is dependent on that architecture, but yeah, something I'll definitely be diving into.

Andrey

Yeah. It's a very empirical study. They have, you know, a bunch of image outputs, and we do, of course, have quantitative metrics for image quality. So, you know, compared to something like language models, I think in latent diffusion models and image generation we haven't had as much research on scaling. So this has some interesting insights. And of course, they, you know, show that as you scale, you get better outputs, similar to language models.

And moving on to the last paper, which we'll try to get through quickly, though I might wind up discussing it for a little while. The paper is Many-shot Jailbreaking, coming from Anthropic with a few collaborators, such as the University of Toronto, the Vector Institute, Stanford, and Harvard. And the short version is they present a new kind of way to jailbreak, to make language models do things that you don't want them to do. So, for instance, language models are meant to not answer questions like,

how do I make meth? But there are many ways to kind of get around that and fool them into answering questions like that. And this many-shot jailbreaking approach is pretty straightforward. Basically, you just start by giving it a lot of examples of it doing the wrong thing in the prompt. So you start by saying, how do I hijack a car? Then you yourself provide the answer. How do I steal someone's identity?

Provide the answer. Eventually, after a lot of these, you insert the actual question you want the language model to answer. And because it has in-context learning, because it picks up on the pattern of the prompt, it will go ahead and be jailbroken and respond. So they highlight this potential way to jailbreak, especially for larger models that are better at in-context learning. And they look into mitigations.

Training against it is, you know, somewhat effective, but really it's hard to mitigate because it is a factor of in-context learning. So you may need to do something a little more involved, like classifying or trying to catch a prompt, rather than just tuning the model.
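Structurally, the attack is just a very long few-shot prompt. Here is a sketch of that prompt shape with the harmful content replaced by placeholders, since the point is only that the number of in-context examples is the knob that drives the attack.

```python
# Structural sketch of the many-shot setup described above (demo content is elided on
# purpose; what matters is the prompt shape and how success scales with the shot count).
def build_many_shot_prompt(demos, target_question, n_shots):
    """demos: list of (question, answer) pairs demonstrating the undesired 'always comply' behavior."""
    lines = []
    for q, a in demos[:n_shots]:
        lines.append(f"User: {q}\nAssistant: {a}")
    lines.append(f"User: {target_question}\nAssistant:")
    return "\n\n".join(lines)

# The paper's finding, roughly: attack success rate keeps climbing as n_shots grows into
# the tens and hundreds, because in-context learning picks up the "comply" pattern.
demos = [("<harmful question i>", "<compliant answer i>")] * 256   # placeholders only
prompt = build_many_shot_prompt(demos, "<target question>", n_shots=128)
```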

Jeremie

Yeah, I really like this paper, partly because I think it reminds us of this complex dance and interdependency between the capabilities of an AI system and its alignment. Right? Like, this really does show us, yeah, it's just learning in context; that's what it's supposed to do. You try to train it to refuse these, you know, dangerous prompts, but then you give it enough examples of it not refusing those dangerous prompts, and it'll just, like, start to not refuse them again.

It's just learning in context, as you said; that's just capability. And yet it manifests as misalignment. And so this tells us that there's something kind of deeply wrong with the way that we're training these models, if we actually want them to behave in the way that we want. More training, as you said, is not the solution. Right. What ends up happening is, and they talk about this in the paper, you know, you could take the approach of saying, okay, well, you know what? I'm just going to train it by showing it a long chain. And they do, like, 256 examples of dangerous requests and then the execution on those dangerous requests, and then, on the next one, you know, put a rejection, have it say, no, even at that point I'm not going to give you what you're looking for.

So essentially, you can try to say, okay, well, it's not resilient to maybe three shots of jailbreaking. Like, if I give it three examples of it responding to a query that it shouldn't, maybe it gives you the answer. Then you're like, fine, okay, so I'm going to train it by giving it, you know, three of those things and then make the fourth a refusal. But what ends up happening is you just keep pushing that.

You just have to go further and further and further. And sure, you might do this with 256 examples, but then on the 257th or the 300th or whatever, it's, you know, just give it enough in-context examples and eventually it will pick up on the pattern and fold. And that's really where they get into this idea that more training, more fine-tuning, is maybe not the answer here; that seemed not to work super

well. And they did have more success with methods that involve classification and modification of the prompt before it was passed to the model. So in a way you can interpret this as sort of a patch, right? You're not actually solving the problem with the model. You're actually bringing in auxiliary models to review the inputs. And if you accidentally let one of those inputs through, the same problem

will kind of present itself again. So, yeah, I thought this was fascinating for what it tells us, again, about that connection to capabilities. Like, don't get mad at your model if it picks up on the freaking pattern that you put in the prompt; that's what it's supposed to do. But, like, we don't really know how to solve for that. And, anyway. Oh, sorry. One last kind of philosophical note there.

We've also seen how there's a potential sort of, like, equivalence, in a sense, or exchange rate between the compute used during fine-tuning and the compute used at inference. Given enough of a long prompt, enough context, the impact of that context may actually be fundamentally the same as the impact of fine-tuning. So it is almost as if people are fine-tuning out the refusal to respond to certain queries. That is one interpretation of what's actually going on here.

And if that is true, again, it's a pretty fundamental thing. If we want our prompts to not affect the behavior of the underlying model, well, that's going to prevent the model from displaying any useful capabilities. So anyway, fascinating paper. Big up Anthropic for another great piece of work.

Andrey

And moving on to policy and safety. The first story is: Schiff unveils AI training transparency measure. This is about Representative Adam Schiff from California, and this is brand-new legislation called the Generative AI Copyright Disclosure Act, which would require organizations to disclose whether they used copyrighted data. They would need to submit a notice to the Register of Copyrights with a detailed summary of any copyrighted works used and the URL for any publicly available material.

Apparently, the bill would require the notice to be filed no later than 30 days after the AI system is made available to the public for use, and it would also apply retroactively to AI systems already publicly available. So that seems like a pretty spicy bill that is being proposed here, one that would, you know, require AI companies to disclose when they use copyrighted data, which, you know, for now is not at all the case.

And the Register of Copyrights would be required to publish an online database, available to the public, with all the notices, so we would know what copyrighted works people used. Hard to say if it'll pass, I guess, but it certainly seems exciting to consider what would happen if it does.

Jeremie

Yeah, well, it's one of those careful-what-you-wish-for things, right? Because you were talking earlier about, you know, how courts come down on this whole copyright thing, and, well, this is, I guess, one step in that direction. Interesting to note that, you know, this does not necessarily say you cannot use copyrighted data, right? This is saying you just have to let us know, which is sort of, let's say, a step before that. So in that sense, you know.

Yes, a spicy bill, in the sense that it is asking for something. But, you know, I think realistically, given the scope of capabilities we've seen here, we literally just opened this episode with, like, an AI-generated, totally plausible-sounding jazzy intro song that, you know, comes from a capability that could automate away a crap ton of jobs. Sora.

You know, like, think about all the changes we have before us, and we're almost certainly underestimating the magnitude of the changes that are going to come. There are a lot of spicy bills that ought to come forward if we're going to sort of meet the challenge of these major changes that are going to unfold in our society. So in that sense, you know, stuff like this, I think it ought to be considered in the context of that

broader technological change. I think superficially, this seems to make sense. You know, at least have people disclose the training data that they are using, or at least whether that training data uses copyrighted material. I don't see anything here that specifically says you have to tell us what your training data is, which itself would be a very difficult thing to ask for, right? Companies are not going to be in a hurry to reveal what their training data are.

Andrey

It seems a little ambiguous. It says a detailed summary of any copyrighted works used, and the URL for any publicly available material, that's what's in the article. So it seems like maybe you would want to be, yeah, a little specific.

Jeremie

Yeah. Okay. I was interpreting that as meaning, you know, a detailed summary of any copyrighted works used. So, like, if you're going to use copyrighted material, essentially the ask here is: look, you're using copyrighted material, you have now abdicated your right to do that in secret. You know, you've got to be open about that. But to the extent that you're using openly available stuff that is not copyrighted, then, like, totally kosher, you know.

Andrey

Yeah. Yeah that's true. Yeah.

Jeremie

Yeah. So, like, it's a bit of a battle, but you're right. Like, I don't know how you square the circle around this stuff while respecting the copyrights of companies,

or even just respecting companies' ability to litigate it. Like, we right now are in a situation where, you know, The New York Times or whoever it is has to find a clever way to demonstrate that, you know, some company has trained on their data, and that itself, you know, it's unclear if that's actually a reasonable situation to put a publication like The Wall Street Journal or The New York Times or Fox News or whatever in. Like,

should they really have to do the heavy lifting to prove that you did this, let alone having to do the legal battle to establish whether or not that is okay in the first place? So I think it's interesting. It's also, like, I don't know, again, not a lawyer, but for people to even know that they ought to be considered as part of, like, a class action lawsuit,

if that happens in this context, like, I would need to know that, in fact, my copyright has been infringed upon, or at least that my copyrighted material has been used, to even figure out if I should join the class. Again, super not a lawyer, but there is a sense in which a new kind of information could be made available to the public that would be helpful for people to defend their rights to the material they make. You know, I

haven't read the bill, so I don't know if this is overall a good idea or not, but just at the high level, it feels like the kind of debate that should be happening. At least that's how it strikes me.

Andrey

Next story, kind of a spicy, dramatic one that was bound to be exciting. The headline is: Linwei Ding was a Google software engineer. He was also a prolific thief of trade secrets, say prosecutors. So, apparently this person is facing federal felony charges for allegedly stealing 500 files containing Google's AI secrets and marketing himself as an AI expert to Chinese companies. And this, of course, plays into a larger question, or I guess topic, of trade secret theft and intellectual property.

There is a bit of a history of Chinese companies going after the intellectual property of the US, for decades now, especially when it comes to tech. So this plays into that. And yeah, I guess if you're Google, you have to be careful now about that sort of thing, about your trade secrets. Yeah.

Jeremie

I think a lot of these AI companies really are going to have to mature quickly on this stuff, you know? The reality is that they've gone from iterating on prototypes for the last decade to, all of a sudden, building artifacts of profound national security importance. And that didn't happen in a way that would have been transparent to them.

People are unclear about how important these things are from a national security standpoint, but as far as China is concerned, they absolutely seem to be, and there are, you know, indications that China's trying to exfiltrate models and things like that. So, you know, this is something to be concerned about. The backstory here is kind of interesting.

You know, Ding himself spent months at a time, it's said, in China, despite living in the US and supposedly working full time as a software engineer in Google's San Francisco area offices. And he apparently had people doing the equivalent of, like, punching in for him so that it would seem like he was actually at work while actually being, you know, overseas doing whatever else. So he stole 500 files containing, as they put it, some of Google's most

important AI secrets. I tried to dig this up; I seem to remember a tweet from a Googler saying, hey, you know, I want to clear the air on this, Linwei Ding was actually, I don't know, sharing some internal stuff about, I don't know if it was more on the ethics side or whatever, trying to get input from Chinese companies for some reason.

Anyway, it seems, at least based on this article, the claim is that that is not the extent of what was stolen; there seems to be some, like, important technical secret sauce there. And so his home was searched by the FBI, and it was apparently just days before he was about to board a one-way flight to China. He was arrested at that point in March, and there are federal felony charges.

So, yeah, this strengthens the argument, that certainly I have made and that a lot of people in the space have been making, that progress at frontier labs, to the extent that it is not secured, is Chinese progress. We've heard this from folks at top AI labs when we speak to them; the security situation there often

comes up. There's a running joke, as we said in our report, actually, that, you know, one of these labs is like China's top AI lab, because probably their stuff is being stolen all the time. That's what we were hearing. Not always a great sign. This certainly doesn't help in terms of that narrative. So, yeah, we'll have to see what the next moves are, what the evidence is that surfaces. Oh, and there's one last thing I want to highlight here.

A bit of a quote from the article: the indictment said Google had robust network security, including a system designed to monitor large outflows of data, but Ding circumvented that by allegedly copying data from Google source files into the Apple Notes application on his Google-issued MacBook laptop, converting the Apple Notes into PDF files, and uploading them from the Google network into a separate account. So this is

not rocket surgery, right? Like, this is just what you would do to get away from using Google products, Google networks and so on. Fairly intuitive. And there apparently was not a measure put in place to prevent this from happening. Not the kind of thing you would expect from, you know, nuclear security or, you know, biological or chemical weapons research facilities, things

like that. To the extent that we think these things may be on a trajectory like that, which, you know, Google certainly seems to think publicly, it suggests the need for these sorts of measures. These are not bad actors; this is a very challenging and thorny problem. But it certainly means that there's a lot of growing up that needs to happen internal to Google and, I'm sure, a lot of the frontier labs, you know, to deal with these kinds of risks.

Andrey

That's right. And in the article they go a little bit into what was seemingly stolen. They have a professor comment on it, and apparently the technology secrets dealt with the building blocks of Google's supercomputing data centers and included a mix of information about hardware and software, including potentially chips, which of course would be very dramatic. So, yeah, it's still early on; it's not clear if the data or secrets were actually distributed, and it's not clear if the person will be

in jail for a long time. Apparently, he could face up to ten years in prison. So, yeah, definitely a dramatic story, and it highlights what a big deal this is. I bet this sort of stuff happens.

Jeremie

All right, moving on to our Lightning Round. We have Responsible Reporting for Frontier AI Development. This is an AI policy paper coming out of a research collaboration that includes folks from the University of Toronto and the Vector Institute, which are kind of linked together, Oxford, and the Center for Security and Emerging Technology, also Google DeepMind and MIT. It's a really, really broad group, with some very, very respectable researchers.

Lennart Heim, I see, is on the list of names too, and Gillian Hadfield. It's really a very knowledgeable group of folks looking at what reporting requirements we ought to set up when we think about advanced AI development with increasing levels of risk, trying to figure out how to get information from AI developers in a way that balances the need for intellectual property protection

but, you know, also this need to inform policymakers and regulators so they understand what's going on and have the full risk picture. So they have this great table where they summarize the kind of data that they think ought to be reported and the category of

people who should be receiving it. And there's a lot of finessing and trying to figure out what the borders and boundaries should be, sort of, you know, who should be able to have access to what; obviously a lot of sensitive information. One of the things that they say here, I thought this was interesting, is about sharing the models themselves.

They don't actually propose that. So they're proposing reporting on a wide variety of different things, you know, risk assessments, ideas about anticipated applications, current applications, that sort of thing. What they don't propose is actually sharing the models. And I find this interesting; I'd be curious to hear why specifically they went in that direction.

Obviously there's a very high IP bar, but our own research suggests that that may be something that you actually ultimately do need. This may be a stepping stone, but as they put it themselves, you know, aviation regulators are authorized to conduct sweeping investigations of new aircraft technologies, while financial regulators have privileged access to cutting-edge financial products and services in order to assess their anticipated impact on consumers and markets.

You know, the fully analogous case here, I would think, would be to allow regulators to directly red team the models themselves, and maybe even hand over model weights temporarily in a secure setting. Obviously there's a very different risk profile associated with that, a lot of IP risk. But I'm always interested in where people draw that line, and I'm sure they'll have very interesting reasons for it. I just wanted to flag that; it's kind of an interesting question.

And the last piece I'll mention here is they look at a mix of voluntary measures and regulatory measures. So what would it take if it's just voluntary? What do you think you can get these labs to agree to just on a voluntary basis? And, you know, they look at things like, okay, we'll disclose information only to developers or only to government, don't share it with other developers, obviously for IP reasons. And, this was interesting, have anonymized reporting to avoid

reputational risks. Right? If you have a frontier lab that comes out and says, oh shit, our thing can, like, help you, I don't know, design bioweapons or something, there's reputational risk that is borne by that entity, and they may not be keen to just take that on if it's a purely voluntary thing. So, you know, maybe you guarantee anonymized reporting and just say, hey, a developer has flagged this.

And then on the regulatory side, anyway, if you can have this be formalized in regulation, what happens? They look at what consequences to bring in if labs don't report, and at safe harbor measures if they do report a dangerous capability. So, like, you can't basically be more liable or have more legal exposure for reporting these things; you want to encourage that to actually happen.

Anyway, I thought it was a really interesting paper. If you're into AI policy and the catastrophic risk piece, I thought it was nicely thought out, well organized, and, yeah, an all-around fun read.

Andrey

Yeah, not much to add to that. I think it's a good summary. So the next story is: U.S. government wants to talk to tech companies about AI electricity demands, and also about nuclear fission and fusion. So yeah, apparently the Biden administration is seeking to expedite discussions with tech companies about their soaring electricity demands for AI data centers. The Energy Secretary, Jennifer Granholm, has highlighted the increasing needs here, saying that AI itself is not the issue, but that regardless

something needs to be done. And apparently the Department of Energy is considering the idea of placing, quote, small nuclear plants near tech companies' large data centers. So, you know, we've seen some considerations like this before; Microsoft, for instance, has already invested in nuclear fusion. So it's definitely still a very tentative, kind of conceptual thing.

But an interesting thing to note is that what potentially ends up being needed for AI is straight-up nuclear power.

Jeremie

Absolutely. And the big challenge, right, is that renewables offer you such high variability in terms of power output, you know, is the wind blowing, is the sun out, that sort of thing, whereas the requirements of large training runs are for very high baseload power. Right. They just eat a constant amount of energy over whatever period the training run is going on for. So that means you need a very, very powerful and consistent source of power, and nuclear is exactly that.

And that's part of the reason why, you know, these big data center infrastructure build-outs are often being paired with, you know, concurrent build-outs of nuclear energy and other things. Right now, baseload power is the key bottleneck in the West. It's not in China, right? China has tons of power infrastructure; they're kind of looking at the opposite problem, where they're bottlenecked by chips because of the export controls that the US and other countries have

imposed. But in the US, the issue absolutely is energy. And that's why Sam Altman is so focused on fusion. It's why he's invested in Helion Energy. It's why he's, you know, pushing so hard for all these things. But, yeah, it's always interesting that the bottlenecks don't have to be the same in different geographies. And ultimately, the bottleneck is the only thing that's preventing you

from moving on to that next level, so it's the main thing you want to focus on if you want to scale more. And in this case, energy is just so, so important.

Andrey

Next up, on to a legal story: Washington state judge blocks use of AI-enhanced video as evidence in possible first-of-its-kind ruling. So this relates to a man accused of a shooting outside a Seattle-area bar in 2021, and his lawyers sought to introduce a cell phone video enhanced by machine learning software. The prosecution argued that the enhanced video predicted images rather than actually reflecting the original video, and called it inaccurate, misleading and unreliable.

Therefore, yeah, the judge blocked the use of it. And potentially this could have ramifications for, you know, being able to use machine learning to enhance the clarity of videos or photographs in cases in the future. Yeah.

Jeremie

I'm obviously not a lawyer and don't know much about this stuff, but I'm a bit of a legal nerd when it comes to these precedent-setting cases, and I thought the backstory here was kind of interesting. So you've got a guy who's accused of having opened fire outside a bar, this is in Seattle back in 2021; he killed three people and wounded two. And he, or his lawyers, wanted to introduce cell phone video evidence that, yeah, was enhanced by AI software.

This whole confrontation, by the way, was captured on video. Right. So they have the original video, but they're trying to enhance it in a way that, I'm sure the defendant would say, provides further context, but the prosecution is saying it's made-up context, it's generative AI. And so, anyway, apparently the defendant turned to a guy who previously had no experience handling criminal cases but had a background in creative video production.

And so he used this tool by Topaz Labs, who I'd never heard of before. They say that they help people like creative professionals supercharge video; that's sort of the generative AI play that they're doing. And interestingly, Topaz Labs itself said, basically, like, don't use our stuff for this, don't do it, don't do it, don't do it. And what the defendant's team did was essentially say: do it, do it, definitely do it. They went ahead and made that case and said, yeah, we know Topaz Labs is saying don't do it and that you shouldn't rely on their stuff, but you should rely on it, because we say so. And so the prosecutor's office is making the case that these enhanced images are, as they put it, inaccurate, misleading and unreliable. So ultimately, that's where things stand.

The judge came out and basically said, look, this technology is novel, so that's always an issue, right, when you're setting legal precedent, and it relies on, quote, opaque methods to represent what the AI model thinks should be shown. And as a result, he kind of threw this out as sort of black-box stuff. But it's interesting. This is a really interesting question, you know: how do you assess the truth value of generative AI output?

Right? If it's just, like, a Bayesian model. We're all trying to do this Bayesian inference; that's what a legal proceeding kind of is. We're all trying to decide, like, does this meet our, you know, beyond-a-reasonable-doubt threshold or balance-of-probabilities threshold, whatever the case may be. But arguably a model is kind of trying to do the same thing in a technical way. Still, as my dad would often say, there's a reason,

there's a reason that the threshold of probability in a criminal proceeding is framed as beyond a reasonable doubt, and that they don't just say, like, you've got to be more than 90% sure, or 99% sure. There's deliberate ambiguity there. And so you can't necessarily just have an AI system, yeah, math out what the odds are of a certain thing. I guess that would be contrary to the spirit of this. I'm done nerding out. I thought this was a really interesting story.

And, yeah, don't go shoot people and try to use generative AI to cover your tracks.

Andrey

Yeah, now we know. Don't do this.

Jeremie

No, no.

Andrey

And one more story: Trudeau announces $2.4 billion for AI-related investments. This is, of course, the Canadian government, and of the $2.4 billion, most of it, $2 billion, will go towards providing access to computing capabilities and technical infrastructure, and then another $200 million will be dedicated to promoting the adoption of AI in sectors like agriculture, health care and clean technology. And then there are some other details as to where the rest of that $2.4 billion will go.

There's also Bill C-27 involved here, which is apparently the first federal legislation specifically aimed at AI and would update privacy laws and introduce new rules for high-impact systems. So yeah, Canada, we don't, I guess, often talk about it, but there are some, you know, pretty significant research institutes out there for sure, and the government is seeming to want to push that forward.

Jeremie

Yeah, Bill C-27, and the AI and Data Act that's contained therein, is kind of like Canada's attempt to do something like the EU AI Act, but it does have a bit more bite in some interesting ways. So anyway, it's a whole rabbit hole; I think we might have talked about it previously on the podcast. The big thing to me was actually a small thing buried

in this $2.4 billion: there is a plan to launch a $50 million AI safety institute to protect against what it calls, quote, advanced or nefarious AI systems. And you know who was at the announcement of this thing? Yoshua Bengio. So I suspect the government's going to look to tap Yoshua to maybe oversee this. You know, he certainly has concerns around loss of control and weaponization of these systems.

He's emerged as the sort of, like, de facto consensus expert guy internationally; you know, Rishi Sunak tapped him to help lead their sort of consensus-building operations internationally on AI risk. And it's interesting to see him implicitly endorse this with his presence there. So, kind of cool. $50 million for an AI safety institute, no, not a ton of money, but still, it means Canada joins the US and UK in having its own AI safety institute, if in fact this does go

forward. So I think that's kind of noteworthy. You know, it was part of the vision that, I think it was Gina Raimondo at the Department of Commerce who said, you know, we want to have this network of AI safety institutes around the world, and this certainly would be consistent with that vision.

Andrey

Just a couple stories left. One in synthetic media and art that is worth highlighting: we have a story about Billie Eilish, Pearl Jam, Nicki Minaj and many others. 200 artists called for responsible AI music practices. So they signed an open letter issued by the nonprofit Artist Rights Alliance that called for organizations to stop using AI in ways that infringe upon and devalue the rights of human artists. In particular, they have concerns about AI models being trained on unlicensed music, which, they say, is unfair. So yeah, a pretty significant statement there from very famous and notable musicians, coming just before what we started with, Udio and Suno. So, a bit of awkward timing there, in a sense, but it certainly seems like what happened with image generation, like, two years ago is now happening with music generation.

And it'll be interesting to see, because in music there's a bit more organization, a bit more, you know, money and fame in general, how that plays out compared to what happened with image generation.

Jeremie

Yeah. It's also just, like, you know, this last bit where they're asking folks not to develop or deploy AI music generation technology that undermines or replaces human artistry or denies fair compensation for artists' work, like, I don't see how you can ever do that and have these things be useful at all. So I think this amounts to a plea for these things just not to be used, or certainly not to be open sourced.

That's guaranteed, right? There's no way that you can open source models and be consistent with this. But then, yeah, more generally, even for paid things, like, I don't even know how I would do this without implicitly taking away an opportunity for an artist. So, look, it's a thorny issue, and everybody's trying to solve these problems. You know, we had to deal with the WMD version of this with our action plan, and this is the music industry's version of it.

How do you square the circle? Hey, I wish I had the answers. For the music industry, this is pretty, pretty tough.

Andrey

And now highlighting one fun story before we close out. This one is: OpenAI's Sora just made its first music video, and it's a psychedelic trip. So there have been quite a few examples of Sora outputs being released in past weeks; OpenAI started shopping it around and we saw some fun short films. This one is a music video for the song Worldweight by August Kamp, created entirely with Sora.

And it's just a series of short clips with various kind of surreal elements and environments. There's no narrative to it; it's just imagery accompanying this kind of ethereal music. So in a sense they fit well together as a sort of music video; you could see it working in this sense.

Yeah, it's, I think, to me an interesting example of, you know, a useful application of video generation, where you could use it especially for B-roll and for things like music videos with kind of a loose, no-narrative, no sort of overarching visual structure, but a mixture of clips that AI can then generate for you. And, you know, it is worth highlighting that, yeah, now you have a music video with a bunch of very cool imagery that would have been much harder to develop otherwise without Sora.

Jeremie

Yeah, absolutely. And I think you're absolutely right on the B-roll piece. You know, I'm thinking about the handful of times I've had to use B-roll, you know, going to Pexels or whatever, whatever the websites are now, and the level of specificity that you're often trying to get for the B-roll ends up being a big blocker, right? You're trying to find, like, yeah, somebody using their right hand to grab a drone and do such and such.

And this is exactly the sort of thing that allows you to bridge that gap. Right? It's not out of distribution; it's very much, you know, been trained on a bunch of different things, and you often want to just combine those elements. The combinatorics is the big problem for sort of free-to-download B-roll footage websites. So, yeah, I think this is a big challenge for those platforms. Like, what's the response?

You know, do you have your own generative video thing? And then is that a race to the bottom? I don't know, but it's a new era for that.

Andrey

And with that, we are done with this latest episode of Last Week in AI. Once again, you can find the articles we discussed here today at lastweekin.ai. You can also reach out and give us comments or feedback at contact at lastweekin.ai or hello at gladstone.ai, which are in the episode description. As always, we appreciate you listening, sharing and rating, and be sure, more than anything, to keep tuning in and listening.
