#149 - Reflecting on 2023, Midjourney v6, Anthropic Revenue, Unified-IO 2, NY Times sues OpenAI - podcast episode cover

#149 - Reflecting on 2023, Midjourney v6, Anthropic Revenue, Unified-IO 2, NY Times sues OpenAI

Jan 07, 2024 · 1 hr 24 min · Ep. 188

Episode description

Our 149th episode with a summary and discussion of last week's big AI news!

Check out our sponsor, the SuperDataScience podcast. You can listen to SDS across all major podcasting platforms (e.g., Spotify, Apple Podcasts, Google Podcasts) plus there’s a video version on YouTube.

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at [email protected]

Some recommended resources for keeping up with AI research:

Timestamps + links:

Transcript

Andrey

Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD researching AI earlier this year and I now work at a generative AI startup.

Jeremie

And I'm your other host, Jeremie. I'm the co-founder of Gladstone AI, which is an AI safety company, and I do a bunch of stuff with the US government on like national security meets AI, um, and AI safety stuff. So, uh, that's my background. And by the way, today's episode is brought to you in part by Covid 19 and by a probably like random rhinovirus. Um, so each of us has a different breed of a virus today, so we're kind of excited to bring that to the table for you.

Andrey

Yeah, if we sound bad this time, it's not our fault. You know? It's just, uh, our bodies, uh, kind of shutting down on us.

Jeremie

Yes. One of, uh, one of the other things that's going wrong with our bodies, or at least our brains, specifically in my brain, even more specifically, um, actually predates the viral disease. Uh, it's, uh, I got this really great... and by the way, Andrey, like, we have the best listeners. Um, I got this email from a listener of ours. His name is Chris Cox. He's, um, the chief information officer of the Bank for International Settlements. Uh, this is, like, uh, the bank for central banks.

Um, they anyway, they do all kinds of interesting stuff. So Chris himself has a background in evolutionary algorithms, uh, from back in the day in grad school. I've actually met him a couple of times. By the way, this is one of the cool things about this podcast: we get to, like, meet these people and they get to correct us from time to time. So I talked about, uh, FunSearch, or, well, we talked about FunSearch last time, but Andrey, you didn't make

the mistake there. I did. Um, so in the context of DeepMind's FunSearch, uh, the algorithm is all powered by language models. And basically you had this language model that would propose a new computer program to solve a problem, or a bunch of them, and then they'd be evaluated. And then the kind of best-performing, um, programs on evaluation would then be stored in, like, a cache of programs, and then that cache of programs would therefore grow and

grow and grow. And each time the algorithm wanted to solve a new problem, it would actually consult that cache and try to mix and match different, uh, different previously proposed solutions to try to combine them in new and interesting ways. And what I had said in the at the time was that any time you have a loop like that where a language model is going to, uh, try to evaluate its own outputs and kind of like, um, yeah, use that iterative cycle to improve its solutions.

You have a hard ceiling because you only ever have as much information as is contained in the language model. So, so, you know, eventually you need a new source of information. And in this case, I said, uh, that source of information, it comes from the fact that these programs are being evaluated because they're programs whose outputs can be graded objectively and automatically. That grading provides a new source of information into this ecosystem. And I kind of left it at that.

And what Chris is very rightly pointing out in the email he sends here or he sent me was, well, actually, um, technically the information is also coming in at the level of the mixing and matching of the, uh, of the functions that have previously been proposed in that cache of functions. So there's new information kind of being created in that sense as well, from an evolutionary algorithm standpoint.

And it seems, based on his email, that, um, that through that evolutionary algorithm lens, the way you would use the phrase, like, "the information is being created" is actually more to do with that step than the actual evaluation stage. Um, I think, like, for various nuanced reasons, there is new information, and it's pretty critical, coming in at the evaluation stage. It just depends on how you think about, you know, what

information is. Anyway, uh, just wanted to surface that; that is an important extra bit of nuance. And so thankful to Chris for flagging that. Um, Chris is really, really smart. We have a ton of smart listeners like this and are just super thankful, uh, for, uh, for having people who can call us out on that sort of thing. And this I thought was a great example. So thank you, Chris.
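(For reference, here is a rough sketch of the FunSearch-style loop being discussed, written as toy Python. The function names and structure are illustrative assumptions, not DeepMind's actual code; the point is just to show where the LLM proposals, the objective evaluation, and the evolutionary mixing of cached programs each come in.)

```python
import random

def funsearch_style_loop(llm_propose, evaluate, n_iterations=1000, cache_size=50):
    """Toy sketch of a FunSearch-style loop (illustrative, not DeepMind's implementation).

    llm_propose(parents) -> a new candidate program (as a string), conditioned on a
        few previously cached programs; this is the "mix and match" recombination step.
    evaluate(program)    -> a numeric score from actually running the program on the
        task; this objective, automatic grading is where outside information enters.
    """
    cache = []  # growing pool of (score, program) pairs, the best found so far

    for _ in range(n_iterations):
        # Sample a few parent programs from the cache to condition the LLM on.
        parents = [p for _, p in random.sample(cache, k=min(2, len(cache)))] if cache else []
        candidate = llm_propose(parents)

        score = evaluate(candidate)        # objective grading of the proposed program
        cache.append((score, candidate))

        # Keep only the best-performing programs so the pool keeps improving.
        cache = sorted(cache, key=lambda item: item[0], reverse=True)[:cache_size]

    return cache[0] if cache else None     # best (score, program) found
```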

Andrey

Yeah, that's a great point. I think, uh, FunSearch is an evolutionary algorithm, and so the randomness injection, where you kind of tweak your previous solutions, in theory can yield, like, infinite new information, right, via evolution. So I guess it's not fair to say that there's an inherent limit to anything, because, uh, you know, if you go on long enough, you find everything. But exactly. So that's, that's a good point.

And speaking of, uh, being thankful to our listeners, I also want to quickly shout out a couple new reviews we got. And it's kind of funny. Uh, you look at Apple Podcasts, as I do sometimes, and we got two new reviews on the 24th of December and one on the 26th. Hey, that's like, we got some Christmas gifts, I guess. Uh, but yeah, uh, some really nice, uh, new reviews.

Jay Fortuna, uh, mentioned that he loves the research and advancements section, which is cool because that is maybe one of our more technical ones. And yeah, I don't know if we go too technical sometimes. It's kind of sad, actually, that we cannot cover a lot of AI research news; we really have to be very selective with regards to research. So I think I'll include a few recommended newsletters and podcasts if you are interested in following AI research more closely.

There are some resources I follow myself, and uh, yeah, those will be in the description of this podcast, if you're curious. And uh, also, yeah, we got another review, uh, from a couple of people that was really nice and, and helpful. Apparently we have information about recent events and the technological state of the art without the manic aid of libertarian isolationism or the rantings of doom prophets, which I like. That's, uh, something we strive for.

Yeah, yeah, yeah. You want to be a little bit badass. So, uh, yes, thank you for the feedback. As always, we appreciate it, and it probably helps us get discovered, I don't know. As always, we do appreciate it if you leave a review on Apple Podcasts, and do feel free to get in touch at contact at lastweekin.ai if you have any comments or suggestions or corrections. It is always, uh, awesome to hear from listeners.

And before we get into the news proper, real quick, we're going to do our sponsor read. And once again, we are being sponsored by the Super Data Science Podcast. This is, uh, a massive, very long-running, uh, podcast about data science. It has over 700 episodes and gets released twice a week, hosted by John Krohn, who is a chief data scientist and co-founder of the machine learning company Nebula. And he has been on the show, as I've already said in previous ads.

And, uh, as you might have heard, if you've been listening, uh, this past year. So he is super knowledgeable and talks to just all sorts of people, uh, across AI, data science, machine learning, data careers, a lot of the more kind of hands-on practical stuff that isn't just what is covered in the media. And yeah, so it's a great resource if you're curious about people in data science and AI, and if you want to get a bit beyond just knowing the news, uh, as we do.

Jeremie

He is also, by the way, just, uh, the super, super chill guy. And, uh, he, he sent me, you know, this is just how classy he is. He sent me a Christmas card. Andre. And I don't know if, you know, there's nothing like receiving a Christmas card to make you realize that you do not have your shit together. You know, other people are writing Christmas cards, and you're, like, barely putting your clothes on in the morning. So anyway, shout out to John Krohn.

He's got everything kind of locked down to a tee. And, uh, Merry Christmas and happy New Year to John Krohn. Uh, as well.

Andrey

And speaking of, uh, Merry Christmas and Happy New Year: to kick off this episode, we'll do something a little bit different, which is, before getting to the news that happened over the past couple weeks (of which there was not a ton, uh, nothing too impactful), we'll just do a little bit of a retrospective on the past year and chat about what went on in 2023 and yeah, just sort of, like, remind ourselves of all the stuff that's

happened. And of course, on the newsletter front at lastweekin.ai, we actually published, uh, a whole little text newsletter, "AI News in 2023: A Look Back," where we included all the stories we highlighted on the text front, uh, month by month. So, uh, just putting that together, I got a kind of refresher on what happened. It was very interesting kind of going back and seeing how early on in the year there was still really just ChatGPT.

Uh, yeah. Its impact was still being felt. And then in February, Microsoft came out with updates to Bing, and Meta and Google started sort of rushing to try and catch up. Soon after that, around March, the open source, uh, machine kind of started going with LLaMA and Alpaca. So a lot of these trends, uh, that started in February and March, with new ChatGPT rivals and new open source alternatives to ChatGPT, kicked off and basically just kept going all year.

Right? And just last month, we got the culmination of that, almost in a sense, of getting Google's Gemini, which is one of the first really GPT-4-tier models, along with Claude. And we got Mixtral, which was a GPT-3.5-tier open source model. So anyway, yeah, it was, uh, clearly a very eventful year in this, uh, language model and chatbot space. And, uh, anyway, Jeremie, do you have any thoughts, uh, on the whole of 2023 in AI?

Jeremie

Yeah. My God, it's so hard to, to pin down, like, just to a small number of things, but I guess big themes, um, you know, there were a couple of big ones. So I think one that is very easy to forget about, because it happened a while ago, but so important: you mentioned open source. Well, the development of LoRA, the LoRA algorithm, and what that did, like, the cheapening of the fine-tuning process.

Basically people finding ways, incredibly efficient ways, to take, like, open-sourced LLMs and then fine-tune them for new tasks for, like, 300 bucks. Right. So that's part of what led to the Alpaca, Vicuna kind of Cambrian explosion of different AI models that had this sort of, like, high degree of specialization, that could, on narrow things, compete with the best private, uh, language models like GPT-3.5.
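(As a quick refresher, a minimal sketch of the LoRA idea being referenced here: instead of updating a full pretrained weight matrix during fine-tuning, you freeze it and learn a small low-rank correction, which is what collapses the cost. The concrete numbers below are just an illustrative example.)

```latex
% LoRA in one line: freeze the pretrained weight W and learn a low-rank update BA.
%   W \in \mathbb{R}^{d \times k}, \quad B \in \mathbb{R}^{d \times r}, \quad
%   A \in \mathbb{R}^{r \times k}, \quad r \ll \min(d, k)
W' = W + \Delta W = W + BA
% Trainable parameters per adapted matrix drop from d \cdot k to r(d + k);
% e.g. for d = k = 4096 and r = 8: about 16.8M parameters down to about 65K.
```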

Um, so I think LoRA, and QLoRA, and the kind of broader ecosystem of, uh, sort of fine-tuning tools, I think that's been a really big deal. Another thing that I think, you know, again, one of the things we do on this podcast that I think is really important, that's often missed (and that missing is going to be felt even more in the next year, I think), is we look at hardware a lot.

One of the big things that happened with AI hardware: we've seen the cycle times that, for example, Nvidia puts into producing a new GPU get cut in half. So Nvidia has gone from releasing a new GPU every two years to a new GPU every year. And that reflects this broader trend of more and more competition. In fact, Nvidia now seems like their position is much less, uh, long-term stable, perhaps, than it appeared this time

last year. You know, this time last year, you looked at Nvidia dominating market share, 95%, no real competition in sight. Now we've got AMD. Now Microsoft is making their Athena chip. We've got, um, even Intel coming with their Gaudi chip. Like, there are a bunch of different, um, pretty plausible-looking competitors, at least on the hardware side, for Nvidia. The software ecosystem is another story, but I think overall, what we're going to start to see is, you know, Nvidia is going to have to hit the

gas. They will. Um, but we're going to see a dramatic acceleration in AI-optimized hardware. Uh, and the last note I'd make on that, too, is I think 2023 will be remembered in part as the last year that, uh, non-LLM-optimized AI hardware was used for LLM training, or was, was sort of coming off the production line. So we've seen this a lot with some of Microsoft's, uh, Microsoft's chips, where yes, they're just coming out now, but the chips were, like, designed back in 2022.

And so that was pre-ChatGPT. That was pre the LLM hype wave. They're still going to be really impressive. We're still going to see huge leaps in capability and speed. But these are not optimized yet for LLMs. And so I think we're going to really start to see things kick into high gear as AI hardware designers really loop that in and start to double down.

Um, we also saw, obviously, China making a lot of big strides in their domestic semiconductor, uh, ecosystem, with SMIC kind of breaking through, as we've talked about many times, that seven-nanometer node, maybe even a five-nanometer node, which we might talk about today. Um, so I think those are big things. And obviously, AI safety also went mainstream in a big way this year. Uh, so I think that's another really big, big update.

Andrey

Yeah, it was interesting, uh, looking at this article and seeing, uh, you know, it was only in May that Geoffrey Hinton left Google and he converted to the let's-take-AI-safety-seriously side. Similarly, I think Bengio started talking; Bengio is a famous AI researcher and scientist. Uh, you know, they kind of got into the conversation in a big way. And so it did go mainstream, in a sense, in the community, to a much greater extent than it had before.

And, um, also in policy, of course. Uh, you know, we just covered last month how the EU AI Act seems poised to actually move forward; in the US, a bunch of stuff is happening. So, you know, there is movement really on the regulation side, which in some cases is certainly needed, you know, for things like deepfakes and so on.

And, uh, yeah. One other thing I'll comment is, I guess adjacent to hardware, another story that might be flying under the radar is, I think, self-driving cars, right where 2023 was really the year where they started to come out, where Cruise and Waymo were both offering paid services for anyone up in SF. And, uh, cruise unfortunately didn't quite make it work. Or they ran into some issues as we've covered. But, uh, Waymo is still going strong and is still expanding.

So I think in 2024, we could really see self-driving cars become much more of a normal thing for people to use. Much like chat bots became a pretty common tool last year.

Jeremie

Yeah. Again, hitting the mainstream, right? Yeah. I do think 2023, you're right, it's been this year of sort of, like, the mainstream-ification of a lot of stuff, you know, like worries, worries over AI safety, um, uh, self-driving cars, uh, even, like, the other thing too is AGI timelines. Like, I'm old enough to remember when, uh, if you said that you thought AGI was going to happen in the next five years, it would get you laughed out of a room.

We obviously don't know if that's going to happen. There's huge uncertainty. But one of the things that we've seen is short timelines kind of, like, enter the popular, you know, ecosystem of public thought, where we have all the frontier labs saying, like, yeah, you know, we think by 2028, 2027, 2026; like, these numbers keep getting shorter. And so I think that's a really interesting addition too. The future almost seems like it came to the present in 2023.

Everything just got so compressed. And obviously there's huge disagreement about the extent to which these timelines will materialize, the extent to which safety concerns over existential risk, catastrophic risk, are real. But, uh, certainly everyone now seems to have equity in this. Everyone seems to have an opinion. And, uh, that wasn't the case this time last year. I think people were really more in the figuring-it-out phase, and now we're more and more seeing people entrenched in their views.

Andrey

Yeah. And, uh, on that note of the future becoming the present: I think we also saw the emergence of some of these very science-fiction-feeling trends of people, for instance, getting really involved in talking to chatbots, right? And, uh, there being a million, you know, AI girlfriend chat

bots. And of course, if you've seen the movie Her specifically, or a few other movies, or read science fiction, this whole notion of having interpersonal relationships with AI is, uh, pretty science fiction, the concept of you talking to a computer and having feelings for it. But it is happening in a real way now, uh, and, uh, and still growing, I presume. So, yeah. It's a, it's a crazy time. It was a crazy year, and I don't think it's going to slow down anytime soon.

Although, if I were to make one prediction, and, uh, predictions are scary, it's probably not a good idea, but I will say I have the feeling that we are starting to hit the ceiling in terms of, I guess, intelligence or capabilities for chatbots and large language models. We saw, you know, with DeepMind, Google tried their best and could not outperform GPT-4 with Gemini, uh, even with their largest, most expensive variant.

So I think we might not get any crazy new capabilities outside of things like having retrieval for memory and different plug-ins and different sort of system-level, architectural-level tweaks. But, um, on the actual neural net training side, so far it seems like maybe we are slowing down and have slowed down since the release of GPT-4 last year.

Jeremie

Interesting. Okay, so I think this will be an interesting, um, uh, and I agree, scary bet for us to each take. I'm sort of on the other side of this coin. And that's why folks last week and AI is the ultimate fair and balanced podcast. We do it all, man. Um, but yeah, no, I'm more on the I guess I expect things to accelerate side of things. I think I would expect GPT five, uh, to, um, come out, uh, maybe

like surprisingly soon. I don't think the delay between, you know, three and four, for example, will be mirrored in the delay between 4 and 5. Um, and I do expect a pretty significant capability leap, just based on talking to some folks who are involved in building GPT five

and what they've seen so far. Um, I think like it's it's going to be an interesting year to determine, like to learn about what we think intelligence is, because I think what we'll find is a lot of the times, you know, we'll see advances that maybe somebody like me goes, oh, wow, that's a really big deal. That means we're on really like kind of closing in on AGI level stuff or whatever. And, you know, somebody else might look at the same

thing and be like, oh yeah, yeah, but this is, you know, uh, this is trivial, this is some other, you know, thing. And we'll probably end up having the same discussions and debates. And I think those discussions are really the most important thing, because that's how you end up figuring out, like, where, where do I fall on this? What do I think? Um, there are no easy

answers, certainly, in this space. But, uh, I see indications, like, you know, with, uh, DeepMind, with relatively simple tweaks; we talked about, uh, FunSearch, that algorithm that I screwed up, uh, the description of earlier today or, sorry, in the last episode; um, you know, really starting to drive forward even human research, uh, capabilities. I expect AI to start being useful for AI research sometime in the next, say, 12 to 24 months.

That wouldn't surprise me. Like, I'd put, like, a 50/50 bet on that or so. Um, so, yeah, I mean, I think it's going to be a very telling year. I'm looking forward to having a lot more discussions with you, because this always helps me figure out what I think of these things. That's, that's kind of part of the joy of this: because we approach this from different perspectives, it's a bit generative. It ends up, you know, forcing you to confront deep questions that aren't

always obvious to answer. So, uh, anyway.

Andrey

Yeah, definitely. I mean, you never know, of course. And we have seen some interesting technical announcements, I think, like Mamba and some recent mixture-of-experts tricks in recent months coming out of academia.

So um, yeah, maybe we will see some pretty big leaps still in language, although I do think most definitely we will see them in 3D generation and video; some of these fields, we've seen, are still on the emerging front of AI capabilities. So that, you know, by year's end we'll have photorealistic 3D, I think, and we'll have maybe, like, minute-long video. Right. And that will be kind of the area where AI will keep

blowing our minds. Whereas I think with GPT-5, it will be better for sure, but I don't know that anything will be mind-blowing, I hope. You know, maybe if it has memory and it actually can, you know, keep a context for weeks at a time, that could be mind-blowing for a user of a chatbot. So, you know, we'll have to see. We'll have to live it as we have been for the past year.

Jeremie

Yeah. Just a last thought, maybe, to crystallize this. I think, like, for, for listeners who are wondering, like, what is it that fundamentally, like, will surprise us: I'm going to, I'm going to test something out on Andrey just to see if you would agree with this characterization. So, so one of the central questions is, you know, are we really close to AGI? How soon will that happen? And one of the things that would make it so that we actually are close to AGI is if scaling up current systems, scaling

up especially large language models, but, you know, more and more multimodal systems too, um, invariably leads to more generality, uh, greater kind of longer-time-horizon planning, all those things that we associate with, you know, human-level intelligence. And there's this question about how far scaling will go. Um, I tend to think scaling will get us, actually, scaling alone could probably get us all the way to AGI, even just, like, scaling, uh, in the limit.

Um, actually, I'm not so sure about just a GPT-4-type architecture, but just scaling alone could do the vast majority of it. Uh, a lot of people disagree. Very smart people disagree, I think. Andrey, you disagree. Um, the question is, like, what we know from scaling is, as you scale these systems, they get better and better at predicting the next token, right? That's what scaling tells us.

Increase the dataset size, increase the amount of compute, the size of the model, increase those things together and you will get a better and better text autocomplete system. And it just happens to be the case that as you do that, as autocomplete capability increases, as a side effect we get all of these other capabilities that emerge. Now, we can predict autocomplete improvement.

That's what the scaling laws tell us. If I give you this compute budget, this dataset size and so on, it'll be this good at text autocomplete; it'll make this many mistakes on average. But what we can't do is say, oh, that level of mistakes, or that level of autocomplete capability, will lead to long-term planning abilities that allow you to solve, um, cutting-edge math problems, uh, agent-like behavior that allows you to execute fully

autonomous cyber attacks. We don't know the mapping between the metric we actually know how to predict and the actual metrics that we sort of care about. And that, I think, is one of the central points of, sort of, uncertainty when we talk about, like, where all this is going and how, how tight could timelines be. Um, somebody like me can point to hardware advances and how I think

things are just, like, lifting off there. But then somebody else can say, yeah, sure, but that just makes you a better autocomplete system; that does nothing for the kinds of capabilities we really deeply care about. Um, so, Andrey, if you agree with that characterization, I think 2024 might be one of those years where we get a little bit more information, hopefully, about, you know, which aspects of our thinking on that are wrong and,

yeah, how they materialize in the world.
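(For reference, the kind of scaling law being described here is often written in a Chinchilla-style form, sketched below. It predicts next-token loss from model and data scale, which is exactly the point being made: the fit says how good the autocomplete gets, not which downstream capabilities show up at a given loss.)

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022), shown as a rough sketch:
%   N = number of model parameters, D = number of training tokens.
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
% E is the irreducible loss; A, B, \alpha, \beta are empirically fitted constants.
```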

Andrey

Right. Yeah, actually I disagree, but on a different level, which is I think we, we already have AGI. I am a big fan of, actually, the Levels of AGI paper by DeepMind and the general idea it proposed, which is you shouldn't think of AGI as a binary; it's a continuum. You know, you have different levels of generality and skill. So they say ChatGPT and LLaMA 2 and Bard are already AGI and are in level one, or "emerging," which is equal to or somewhat better than an unskilled human at

a whole bunch of stuff. Which is true. And I guess the question, and what you are saying: a lot of people, when they talk about AGI, they're talking about, let's say, what DeepMind called level three or level four, which is at least 90th percentile of skilled adults, or 99th percentile of skilled adults, at various tasks, uh, across kind of a spectrum. So yeah, I guess it is a real question of, if we just scale, can we get to an AI that is better than the most skilled, you know, uh, people at many things.

Uh, and, uh, yeah, I mean, definitely it is a question to see: are we going to get more surprises out of scaling, or is it going to be really new things that drive us forward?

Jeremie

Yeah, I, I I agree. I would say like I do see it as more binary in one sense, which is the kind of AGI I'm specifically interested in is AI that can automate AI research itself, because I think once you do that, really you hit a take off speed that is very, very fast. And so effectively everything, every capability at that stage comes within reach.

Andrey

Well, if it's all about scaling, you don't need much research.

Jeremie

Right? Right. That's true. That's true. And this is part of the part of the equation. It's like yeah. Can it help with AI hardware design? Can you know, how slow are those feedback loops. But anyway yeah I think we've got we've got a lot to learn this year. We're going to be covering a ton a ton of stories. Um, one of the things we talked about, we briefly paused because, uh, poor Andre is just like he's he's pushing through on the Covid.

Uh, I can take the lead maybe on this one and read the stories out loud for us, just so Andrey can rest his voice and save it for the insights that he'll share on the back end. So, uh, with that, um, we can just dive in here and, and kick things off with Midjourney version 6: look, more text! So Midjourney has just released this new version of its text-to-image model. It's an alpha release and it's got a ton of new features. It actually looks really kind of impressive.

I mean, they always do. And I feel like we've talked about this for a while, but we're kind of, like, saturating the photorealism element. Like, these things are so good now that with each additional version, um, it, like, struggles to find a concrete, clear, like, value-add that the previous version didn't have, just because they're already so good. Well, here the new version has the ability to actually spell.

So when you generate images that have text in them, having that text actually be kind of cogent, coherent text, that's a really big challenge. Uh, we saw sort of the Dall-E line of, uh, text-to-image models gradually climb that ladder. Well, with version 6, Midjourney seems to have

reached that point. So that's, uh, going to open up a whole bunch of new possibilities in terms of, you know, applications. You can imagine now, like, you know, generate, for example, an image of a storefront with, um, you know, my tagline on it, and it can actually do that. And the tagline will be on it, right? So that's a lot more useful. Uh, they give you a whole bunch of different examples, um, actually some pretty close to that, like, storefront one.

They show a neon sign by a shady motel that says "vacancy," and the word vacancy is clearly spelled right. Um, and then they show some, some other examples where you can see it's, like, it's definitely still uncanny valley. So, a bunch of printed propaganda posters (this one's way too long, buddy), and it's, like, mostly spelled right, but there are some that are wrong. Some of the spellings are off, or similar letters are just misshapen. So they're still kind of breaking through.

It hasn't quite caught up with, uh, with Dall-E 3, which really is, like, on point when it comes to text rendering. So perhaps this is, you know, an indication that Midjourney, um, either algorithmically or in terms of their access to hardware, or both, uh, currently lags behind OpenAI at this stage, because Dall-E 3, of course, has been out for quite some time.

Andrey

Yeah, I think it's, it's maybe more a matter of prioritization of what a model excels at. So Midjourney has pretty much been at the forefront quality-wise: if you want to go and generate a good-looking image that is cinematic or illustrative or kind of, I don't know, artful, Midjourney is probably still in the lead on that front. Uh, and it does seem like, you know, one way in which they have been lagging is the ability to do text. So now they do have that capability.

And this, uh, news follows shortly after, it came out like a week after, the news that they were launching on the web with Midjourney Alpha. So, yeah, it's, uh, exciting news if you're a lover of text-to-image. Midjourney really generates, you know, absurdly good-looking images, and now you can use it for things with text as well, assuming it's short text. We have examples in this little article where, if it's one word like "vacancy" or "happy birthday," it does do a good job.

If it's something complicated like a poster, it's going to still have that AI wackiness to it for sure. So not quite perfect. But again, you know, the range of things you can do has expanded yet again, for sure.

Jeremie

Yeah. And Andrey, you make a great point about, like, the prioritization piece. You know, Midjourney is not an AGI company the way OpenAI is. They're not prioritizing necessarily, like, especially the text piece, because OpenAI's whole thing has been LLMs for a long time. That's where their main research focus has been. So perhaps less surprising that Dall-E 3, which is coupled to a language model, um, is, you know, that much better at writing text and, relatedly, prompt adherence.

This is the sort of idea that the logical, um, consistency between the text prompt and the image should be very high. Uh, prompt adherence is something that, uh, Dall-E 3 has been a bit better at than Midjourney in the past.

So, for example, if you give it a prompt like "two red balls and one blue cube on a green table," which they show in the article, um, often previous versions of Midjourney would kind of show you, you know, maybe a red ball and a blue ball and then a white table, or, like, uh, you know, different configurations of things that didn't really match what you were saying. Well, now version 6 really is ticking that box a lot better.

Um, at least for this particular prompt, you know, "two red balls and one blue cube on a green table," um, you see images that correspond much better to that. So a higher degree of prompt adherence, and that kind of signals a certain, a certain, like, logical grounding in the world, a better correspondence between the text and image pieces. So another, another bit of, uh, good news there for Midjourney, and an impressive, uh, impressive result.

Andrey

Here's just a quick example to give you an idea of the prompt adherence. One prompt in this article is: there are three baskets full of fruit on a kitchen table; the basket in the middle contains green apples; the basket on the left is filled with strawberries; and the basket on the right is full of blueberries; in the background is a blank tile wall with a circle window. And yeah, uh, Midjourney v6 actually nails it. It gets all those details in there.

And that's also the case with Dall-E 3: you can give these paragraph-long descriptions, and in general, uh, the model will tend to follow them. So yeah, very cool.

Jeremie

And up next we have Baidu says its ChatGPT rival Ernie Bot now has more than 100 million users. Okay, so 100 million users. By the way, there's a reason that they're announcing this. This is the kind of big vanity metric that OpenAI announced a few months after launching ChatGPT. I think this is in, uh, in November, actually. So more than a few months. But in November, OpenAI said that ChatGPT had reached 100 million weekly active users.

And Baidu, which is really kind of positioning itself to be a big competitor of OpenAI domestically in China, um, they have Ernie Bot, um, which is this sort of chatbot built on their Ernie language models. Um, Ernie 3.0, I think, is what it's based on right now. They're claiming that it's now hit over 100 million users, over that

same benchmark. Now, interestingly, what isn't said here is whether this is weekly active users, which is the metric OpenAI reported, or total users or monthly active users, which is a much easier target to hit because you just need people to kind of log on, you know, once a month or so.

Um, so that much is very unclear, and, uh, that, you know, that makes me suspect that this might just be like a vanity metric, a bit of metric hacking, so they can get that headline and say, hey, we're, we're hitting a hundred million users. But nonetheless, uh, really impressive, given that Ernie Bot is not accessible outside of China. So the market they're going after is, you know, whatever it is, 1.4 billion or so people in China.

So this is like, uh, you know, something like 10% of China seemingly has, uh, has accessed this tool. And it may be available, um, actually, in Russia or other places like that, I think, but it's certainly not available in North America. Um, it is cheaper; they're charging about eight bucks a month for this, so it makes it more accessible, um, certainly in the Chinese market too, compared to, like, the 20 bucks a month for, uh, OpenAI's ChatGPT. Um, and there have been a bunch of questions around, like, really,

how do ChatGPT and Ernie Bot compare? It's really hard to do across languages. Um, but, uh, but anyway, uh, an interesting development, and it certainly indicates that Baidu is a pretty serious contender when it comes to language models that have commercial traction.

Andrey

Yeah, we've covered some stories about, you know, companies wanting to be the OpenAI of China. And interestingly, I think so far there really hasn't been one. Maybe Kai-Fu Lee's company could be set to be one; they released a really good open source language model. But in terms of usage, Ernie Bot seems to be the leader, and it's really more of a Google-type company dominating in China. So, uh, anyway, good to know. Uh, if you're curious about the space of chatbots, Ernie Bot is a big one.

Jeremie

Yeah. And one of the things that's interesting to note to here is the extra headwinds that Chinese companies currently have to face because of the regulatory pressure from the government. So apparently they developed, uh, Ernie Bot back in, I think, I think in March or something like that. I mean, we reported on it at the

time. Um, but they weren't able to to do a mass launch until about August because they basically had to get regulatory approval for a mass rollout, and that had to meet certain thresholds of, of performance.

You can imagine the sort of thing, you know, in the, in the People's Republic of China: the requirements that the Chinese Communist Party imposes are things like, you know, make sure that it never produces, you know, anti-communist content, and with language models that's really hard to do, um, maybe even impossible. Uh, so, uh, anyway, all kinds of extra technical hurdles that they had to overcome. So they were able to launch only in August, along with a bunch of other local players.

So they all kind of had the same starting, uh, starting moment. And so, in a weird way, Baidu wasn't even able to take advantage of their first-mover advantage, having launched in March; the sort of delay cost them a lot of months. So it's an interesting dynamic that cost them the first-mover advantage that they otherwise would have had, um, due to, uh, to regulation. That's something we just haven't seen in the West, where it's all about the racing dynamic.

And moving on to our lightning round, we have: Android Auto will have Google Assistant summarize your messages with AI. So Google's developing this feature, uh, for Android Auto, um, that would let Google Assistant take your very long and messy conversations and deliver a summary. And, um, that's kind of the story here. I mean, it's a pretty convenient development, I guess, if, uh, if you use Android, and hopefully it doesn't hallucinate too much.

Andrey

And this is one of those areas where having a Google phone with Gemini Nano built in might be helpful, because then it could use the onboard chip to do a summary. And that's one of the advertised examples of the usage of on-device AI: summarizing chat messages. So I guess we'll start seeing a lot of this on-device type stuff coming out, uh, in the coming months. Actually, speaking of that, on a similar product, we have: some of the Samsung Galaxy S24's key AI

features just leaked. So this new phone is expected to launch in just a couple weeks, and there are now leaks that are saying that it will have some, uh, AI capabilities. It will have live language translation built into the phone app and some nice, uh, you know, additional camera, uh, tricks, uh, Nightography zoom and stuff like that. And it also might include a generative edit feature similar to what you have in Google Photos.

So, uh, yeah, that's another example where you now have AI as a feature to compete with in the smartphone space.

Jeremie

Yeah, it's interesting coming from Samsung too, because we're used to seeing the big the big tech companies, which all now have their own phones, you know, the, the Googles, the Microsofts, um, kind of working their way back from their AI dominance to the hardware dominance. Well, Samsung actually is one of the few companies in the world that makes pretty decent, uh, chips, AI chips, like they actually have their own fabs. So we're kind of seeing them move in the other direction.

Again, using phones as that kind of bridge between two worlds. So I'm curious about, you know, how big their, their AI research teams are internally, and really what Samsung sees as the AGI play here, which increasingly almost has to be part of, part of these companies' business vision. So anyway, I'm really curious what we see next from Samsung and whether they can actually keep up.

Andrey

And one last story for this section: Microsoft quietly launches dedicated Copilot app for Android. There's now, uh, a new app to talk to, uh, a chatbot, like the ChatGPT app. Now Microsoft has this Copilot app, which is similar to the Bing Chat app but is for Copilot. So yeah, a little bit confusing, but, uh, yeah, it's kind of the same thing.

Jeremie

And moving on to our Applications and Business section, we open with: Anthropic forecasts more than $850 million in annualized revenue rate by the end of 2024. And so this is a really interesting development. We've heard about OpenAI, um, uh, hitting a projected $1.6 billion annualized revenue rate. Well, this is saying Anthropic is actually pretty close to that.

Or at least by the end of 2024 they will be. Um, this is interesting because it does suggest that, even though Anthropic has this policy of, among other things, releasing models only when they're slightly behind the cutting edge in capability, and this is just a reflection of their commitment to safety, they don't want to be accelerating the race to potentially, as they view it, dangerously powerful, uh, AGI-like systems.

Um, but they, they know that they have to be building frontier models to be able to understand them from a safety perspective, from a policy perspective. Well, Anthropic seems nonetheless to be on track to generate a good amount of, of revenue. And, uh, apparently three months ago, the company told some investors that it was generating about $100 million in, uh, annualized revenue, and then it figured that would reach 500 million by the end of 2024.

We're now seeing that be revised upwards. A key thing, though, to keep in mind: any time you see top-line revenue numbers for these big language model companies, you know, these big AI companies, the OpenAIs, the Anthropics and so on, always, always, always ask yourself about margin. It is not the case that, um, a dollar in is a dollar of profit, even though it's software, right? So usually we're used to the revenues of software companies being overwhelmingly profit.

Um, and that's because, you know, the hosting costs for a website or an app are so low, right? You just, you host it for super cheap using Heroku or whatever else; it costs you almost nothing. Um, when it comes to language models, especially at scale, a large fraction of your revenue goes into the serving costs, the compute costs. Now, OpenAI has a massive first-mover advantage here.

Their, uh, their margins are really high because they are the only, or one of very few, labs competing at the level of the GPT-4 kind of quality level. Um, so that's something you really want to look at: you know, what are Anthropic's margins like? I don't think we have clarity on that; at least I certainly don't, and I haven't seen compelling figures about that. Um, so, you know, take these figures with a grain of salt, but certainly they imply high demand for Anthropic's products.

Um, even at the level of them being, you know, comparable in order of magnitude terms to OpenAI, which I think is a really big win for anthropic, given that they started like, you know, 3 or 4 years later.

Andrey

It's quite impressive to see this, uh, rapid growth. Anthropic is very similar to OpenAI: they also now have APIs you can use, and they are expanding access, uh, kind of slowly, actually, so many people have not been able to build on top of Anthropic and Claude until kind of recently. And Anthropic also has partnered with Amazon, so you can get access to Claude through AWS, Amazon Bedrock.

So there's a slightly different kind of, uh, approach, where potentially more people would be interested in building on top of it. We'll see. But, uh, yeah, seems like a very real competitor to OpenAI, uh, you know, Anthropic.

Jeremie

Absolutely. And it actually dovetails into our next story; in a way, this is sort of the reasoning behind the next story. Anthropic is going to be, or is in talks to, raise $750 million in funding. Uh, and, you know, that's, that's not huge news; like, they've raised probably about, you know, maybe 6 or 7 billion dollars so far. So it's not a huge contribution to their war chest in relative terms, but it's at an $18.4 billion

valuation. So this really positions them, um, along with OpenAI, in the very, very, very rarefied subset of decacorns, that is, $10 billion-plus, um, AI companies working in that LLM space, in that AGI space. It is four times larger than their previous valuation of 4.1 billion, which was earlier this year. Um, and this, by the way, in a context where OpenAI is said to be in talks, uh, to raise at a valuation of

100 billion or more. So Anthropic, uh, doing really impressive things, it seems. According to the market, like, the market seems to be assessing them as being behind OpenAI. But, and this might sound weird, factors of five, factors of three, when you're up to, like, the, you know, 20 billion versus 100 billion valuations, um, it's not clear how significant those might be, um, and how much this is, you know, going to play into the end result here. You know, Anthropic raising 750

million, um, that's not going to be too dilutive. Like, they're raising it at roughly a $20 billion valuation; that's about, like, 4 percent or so of, uh, of their, uh, equity that they're giving away. So that's not that much control. Um, so expect these companies to keep being able to raise on very high valuations and therefore sort of maintain some degree of control. That's kind of strategically how this ties into the safety mission at Anthropic.
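(Quick back-of-the-envelope on that dilution point, assuming the $18.4 billion figure is roughly the post-money valuation:)

```latex
\frac{\$0.75\ \text{billion}}{\$18.4\ \text{billion}} \approx 4.1\%
% i.e., roughly four percent of the equity, so existing holders keep the vast majority of control.
```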

Um, and, uh, I think that's a really important ingredient in all this.

Andrey

Yeah. And, uh, as for OpenAI: another bit about OpenAI, in addition to that plan to raise, was also that their annualized revenue has topped $1.6 billion, uh, which is actually up from 1.3 billion just a few months prior. So the whole space, uh, seems to be growing rapidly. I mean, 300 million more in annualized revenue over just a couple of months, uh, for OpenAI, is impressive. So I guess it makes sense

that, likewise, uh, Claude and Anthropic are also seeing a lot of growth that's quite rapid.

Jeremie

And kicking off our lightning round, we have: Nvidia releases slower, less powerful AI chip for China. This is about, uh, a less powerful, usually these are, by the way, referred to as pared-down, version of their RTX 4090 chip, which is meant to comply with U.S. export controls, uh, on China. So, uh, essentially the way this works is you make the regular chip, usually the one that you might sell in the US, and then you'll, like,

blow fuses, you'll blow circuits on the chip itself, or you'll add, you know, software that's meant to kind of pare down its capabilities in a hopefully irreversible way. But, by the way, people are actually quite concerned about these things being reversed once they are in China. But anyway, that's, that's how this is meant to work. Uh, and overall, it's set to perform 11% slower than the original, um, and has fewer processing subunits that can accelerate AI workloads.

Um, 11% slower: by the way, these metrics, like, uh, they don't always tell you all that much, just because, uh, the, the thing that bottlenecks AI training runs and AI inference runs always changes depending on the model you're thinking about. So for, you know, for some language models, it could matter how much, like, uh, how much memory capacity the system has. For others, what matters is, like, how many calculations each, um, uh, each core can do, you know, how many

FLOPS, in other words, floating point operations, it can do per second. Um, in other cases, it's actually the interconnect, the bandwidth between GPUs, for really big workloads. So, you know, saying 11% slower, there are a lot of assumptions going on, you know, going into that. Um, so it always, always bears kind of peeling back layers of the onion. But overall, uh, the latest round of export controls was actually really quite significant.

And this new chip, it's being called the RTX 4090D, uh, is, uh, is just that: it's meant to skirt those export controls, or just slide underneath them.

Andrey

Yeah. There's actually a statement from an Nvidia spokesperson that, you know, this was developed, uh, while being extensively engaged with the US government, and this product will only be out in China, in January. So this is a direct consequence of the export controls. It's entirely about that.

Jeremie

Yeah. And that engagement with the US government is probably important, given that, um, uh, Gina Raimondo at, uh, the Department of Commerce famously said, and as we reported here, like, she basically said: listen, Nvidia, like, stop trying to deliver, um, like, you know, I shouldn't use this term, but, like, weapons-grade AI hardware to China, um, by skirting around

our export controls. So I suspect that the frustration there is probably leading to a little bit more consultation with the US government directly than has previously been the case. Up next, we have SMIC, which, by the way, is sort of China's big domestic, uh, um, uh, semiconductor fab, uh, fabrication company. So SMIC is reportedly working on three-nanometer process technology despite US sanctions. Okay. So, uh, first of all, a quick recap.

Um, right, currently the, um, Nvidia H100 GPU, which is more or less the top-of-the-line GPU you can get, um, that's built using TSMC's, the Taiwan Semiconductor Manufacturing Company's, um, five-nanometer fabrication process. So for AI chips, that's more or less cutting edge: these, these machines have resolutions for some features down to, like, five-nanometer resolution. Um, that gives you the H100.

Now, the, the challenge is, um, the US, the Western world, doesn't want China to get access to that level of capability currently. Uh, and, or at least until, like, 20 minutes ago, it was thought that China didn't have the capacity to make not even five-nanometer-resolution, but seven-nanometer-resolution, um, chips. Now, they recently came out with an announcement: Huawei just came out and said, hey, we have, uh, a line of laptops that use the seven-nanometer process.

Um, so it seems that they're able to produce those at scale. Uh, so based on that, China has actually cracked that crucial seven-nanometer, uh, process threshold. Now it seems like SMIC, um, may actually have broken through to the next layer, five nanometers, and now perhaps even three. Or at least they're working on this three-nanometer process technology. This stuff requires some very specialized machines, called photolithography machines, to do it.

Previously it was thought that you could only make five- and three-nanometer process technology using this very advanced type of lithography machine called an EUV, extreme ultraviolet, lithography machine. Um, it turns out SMIC may be using the, the previous, sort of, like, worse technology, called deep ultraviolet lithography, DUV, and using that successfully to make seven-, five-, uh, and perhaps maybe three-nanometer, uh, processes. And, uh, this involves, anyway, going back over,

so using the DUV machine multiple times to kind of, um, do what's called multi-patterning, uh, which slows down the process but does yield these very advanced chips. So, uh, a lot, a lot going on there, and something we will do a deep dive on in, uh, a later episode about hardware. But for now, just to track: SMIC seems to be making some really impressive progress on manufacturing that was not thought possible, um, when these, uh, these sanctions were first being thought of.

Andrey

And on a related note, our next story is: Huawei files a patent that enhances wafer alignment and efficiency, hinting at the company's self-built fabrication plants. And that's the story: there's a new patent about this wafer-processing device that does suggest that Huawei wants to fabricate its own chips right now.

Jeremie

One of the defining challenges for Huawei is their dependence on SMIC, the company we were just talking about, because that is their only source of manufacturing for these semiconductor chips. So Huawei is kind of like Nvidia in the US: they design chips and then they send them off to be fabricated. Huawei sends them off to SMIC, because that's China-based; um, Nvidia sends them off to TSMC, which is the Taiwan-based company that is leaps and bounds ahead of SMIC.

But still, um, you know, SMIC, as we discussed, is making progress. So now what's interesting is it seems like Huawei is saying, well, wait a minute, I'm not so sure that I want to depend on SMIC for fabrication. I think I might just want to be able to both design and fabricate my own semiconductors. That, by the way, is an extremely huge lift. Just the development of a semiconductor fabrication facility can cost you around 50 billion, with a B. You are hearing that right: $50 billion.

These are huge lifts. I mean, they are perhaps the single most capital-intensive thing that human beings do on the face of the earth, with no exaggeration. Um, more, by the way, than, like, the, the moon landings or anything else like that. Like, this is a ridiculous, ridiculous spend for these things. Um, anyway, so what's happening right now is Huawei is saying, man, maybe we want to do this in-house.

Um, it's a little bit unclear why they might want to be doing this, but that certainly is what this patent filing suggests. It's basically about a way of, of aligning, um, these wafers. So, so semiconductors are made on these wafers, and we'll talk about this in a special episode dedicated to hardware, but basically you take these wafers and pattern onto them a whole bunch of chips that you're then going to fabricate.

And, um, and one of the things that, uh, Huawei is looking at optimizing is how do you, yeah, how do you set up, uh, wafers optimally for, well, alignment, which is really important for getting highly, uh, high-resolution patterns etched on them. Ah, and finally, uh, we're talking about this last story: ASML ships first high numerical aperture, or high-NA, lithography system to Intel. Um, okay. So this is actually, uh, upstream.

So, so, um, the semiconductor fabs like, uh, TSMC and SMIC, the companies that actually build the chips that power this massive revolution of AI that we're seeing, they rely on specialized machines called photolithography systems. Now, photolithography systems have to keep improving if they're going to keep allowing the fabs to pump out better and better chips. And the world's best, um, maker of photolithography machines is the Dutch company ASML.

And they are exploring this technique called high numerical aperture lithography. Now basically you have two options. If you want to keep improving the resolution of your chips, you can either shorten the wavelength of light that you're using to laser in patterns on the chip. That's one option, um, that requires a fancy light source and improving light sources, which is really hard. Or you can essentially increase the size of the lens.

In other words, use a high numerical aperture lens, uh, which allows you to, again, get better resolution on your system. The problem is, when you do that, it kind of fucks with your whole setup. The setups for these photolithography machines have been optimized over generations with a baked-in assumption of certain maximum lens sizes and so on. And so you can't just jack up the size of these, uh, these lenses and increase the numerical aperture all you want.

Eventually you run into really serious challenges. And that's exactly what's going on here. Um, ASML is experimenting with these high numerical aperture systems.

And they are selling their first batch of these to Intel, which is a signal that they may be commercially viable, though, you know, you really shouldn't assume them to make a big, um, a big difference in the chip market itself yet, because it's going to be a while before they're actually used for chip manufacturing, maybe 2026, 2027. So another thing we're going to be tracking for you when we talk about our, our big hardware episode, uh, in a couple weeks.

Andrey

And actually, we have no open source news this week, so we're going to skip straight ahead to Research and Advancements. The first story is Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action. And this is coming from the Allen Institute for AI and some academic partners on

a paper. And yeah, the paper is about this new autoregressive multimodal model that is capable of understanding and generating images, text, audio, and even actions, with, uh, kind of a shared space of representation for all of these modalities.

Jeremie

Yeah, I think this is really interesting because in the past, any time we've talked about multimodal systems, you know, for example GPT-4 is a multimodal system, usually what that means is you can take as input text or video or images or audio, or many of them, but the output is almost always just text, right? So the thing can explain what's in an image or a video that you feed it, but it can't actually generate images or videos

or whatever. And, you know, we're just starting to see that paradigm shift a little bit. But this is really diving into the deep end and saying, you know what, we want true multimodality. We want one system that takes in any kind of input and spits out any kind of output. This really makes me think a little bit of Google's Perceiver IO, which they announced, what, like two years ago, but that was more of a concept idea that they had.

And, you know, they did early tests to show that it had promise. But this is really kind of, again, diving into the deep end and showing that this is actually capable of generating, and they show some really impressive outputs. One of the things that they're doing here is merging all of these inputs, image, video, audio, text and so on, into a shared embedding space.

So basically representing them with the same kind of vector, the same list of numbers; they're going to map them into that same space. So an image has a representation that's a list of numbers of a certain shape, and text has a representation that has the same

kind of properties. And this allows the model to understand, in a unified way, what these different kinds of data are. To very roughly bungle this, you can expect an image of a cat to be represented within the model in a similar way to the word cat.

This is a gross oversimplification, but essentially the idea is the model is understanding the world through different modalities in a consistent manner, and it can generate those modalities as well. This is sort of reminiscent of some of the stuff that we've seen Meta do. They've been really focused on this idea of multimodality, in their case because they think that's kind of the key to getting to AGI, that you can't get to AGI without going beyond text.
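As a rough illustration of that shared-representation idea, here is a generic CLIP-style sketch in which two modality-specific encoders project into the same embedding space. The encoder classes and sizes are made up for illustration; this is not the actual Unified-IO 2 architecture, which goes further and autoregressively generates tokens for every modality from a single transformer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedSpaceModel(nn.Module):
    """Toy model: two modality-specific encoders map into one shared embedding space."""
    def __init__(self, dim=512, vocab=30000, text_len=16):
        super().__init__()
        # Hypothetical encoders; a real system would use a ViT, an audio encoder,
        # a text transformer, and so on, all projecting to the same dimensionality.
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, dim))
        self.text_embed = nn.Embedding(vocab, dim)
        self.text_proj = nn.Linear(text_len * dim, dim)

    def embed_image(self, pixels):          # (batch, 3, 64, 64) -> (batch, dim)
        return F.normalize(self.image_encoder(pixels), dim=-1)

    def embed_text(self, token_ids):        # (batch, text_len) -> (batch, dim)
        flat = self.text_embed(token_ids).flatten(1)
        return F.normalize(self.text_proj(flat), dim=-1)

model = SharedSpaceModel()
images = torch.randn(2, 3, 64, 64)
tokens = torch.randint(0, 30000, (2, 16))
# Because both modalities live in the same space, cross-modal similarity is a dot product.
similarity = model.embed_image(images) @ model.embed_text(tokens).T
print(similarity.shape)  # torch.Size([2, 2])
```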

But it's interesting to see this pop up from the Allen Institute for AI, which is a really impressive research organization. Not a frontier lab, but still, they pump out really high quality research.

Andrey

As this is version two, there's really no single key insight here. It's interesting: in the abstract, they say training with such diverse modalities is extremely difficult and they propose various architectural improvements to stabilize the model. So this is almost an engineering feat in some sense. There are kind of different ways of handling every modality, and they actually build on top of Perceiver for some aspects of this.

They use some things from Perceiver IO for encoding audio and images. And yeah, overall you end up with this unified model trained on all these modalities that is really beyond anything that you can use today. GPT-4 with image outputs is probably the closest, but this also has audio, and they even have it controlling a robot, with actions as the output, which is totally new. So yeah, really cool. And they do say they are releasing the models to the research community.

Jeremie

And that's actually really fun. I didn't realize they were using Perceiver, like literally using Perceiver. I was just like, oh, this reminds me of it. Turns out that's the reason.

Andrey

Yeah, they use it for encoding the data in the input. So, building on a lot of previous research, as is often the case.

Jeremie

Yeah, yeah. And our next story is Task Contamination: Language Models May Not Be Few-Shot Anymore. Okay, so, speaking of the things that will define the course of where AI goes in the next year or so, this is a really interesting little paper. There's this question that every time you train a language model, you're kind of training it on all the text on the internet that existed prior to the point where you train your

model. And so now, if you go and test your model on benchmarks, if you run evaluations that already existed somewhere on the internet, well, guess what? Those may well have found their way into the training data that you used for this model. This is a very well known problem. It's called test set contamination.

And they're looking at an extension of this idea where, more abstractly, for the kinds of tasks that you generally might want to test these models on, there might already be examples of language models or human beings performing those tasks on the internet, in the

context of evaluating language models. So again, that helps the models kind of teach to the test and be trained in a way that inappropriately corresponds to the test. So the tests that you then run on the model don't really reveal the true capabilities of the model. In fact, they reveal the kind of memorized abilities, because it might just have memorized things, like a bad student: you memorize a part of the textbook, and then it's

regurgitating it. That seems to be a big part of what's been going on. And so here they're running a bunch of tests with a bunch of GPT-3 series models and saying, okay, can we look at datasets that we know were only produced after the training run was complete? So there's no way that this dataset was anywhere on the internet at the time this model was trained, and therefore there's no way the model could possibly have seen this

data. And what they find is that, consistently, on the datasets that were not available in the model's training set, when the model is tested on those kind of held-out datasets, the performance goes down, because it hasn't seen them before, whereas it does a lot better on datasets that already existed prior to the training.

This is interesting because developers actually do invest a lot of effort to prune the test datasets out of the training data when they do their data collection and so on. But what this suggests is that that process is kind of imperfect, and we're getting a significantly inflated sense of how these models perform on tests that were produced prior to the training.
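A highly simplified sketch of the kind of before/after comparison being described, with invented dataset names, dates, and scores; the paper's actual analysis also includes things like membership inference and checking for regurgitated training examples:

```python
from datetime import date

# Hypothetical benchmark metadata: release date plus the model's score on each.
MODEL_TRAINING_CUTOFF = date(2021, 9, 1)
benchmarks = [
    {"name": "old_benchmark_a", "released": date(2019, 5, 1), "score": 0.81},
    {"name": "old_benchmark_b", "released": date(2020, 11, 1), "score": 0.77},
    {"name": "new_benchmark_c", "released": date(2022, 6, 1), "score": 0.58},
    {"name": "new_benchmark_d", "released": date(2023, 2, 1), "score": 0.54},
]

def mean(xs):
    return sum(xs) / len(xs)

pre = [b["score"] for b in benchmarks if b["released"] < MODEL_TRAINING_CUTOFF]
post = [b["score"] for b in benchmarks if b["released"] >= MODEL_TRAINING_CUTOFF]

# A large gap between these averages is the red flag for task contamination:
# the model looks better on tasks that could have leaked into its training data.
print(f"avg score, pre-cutoff datasets:  {mean(pre):.2f}")
print(f"avg score, post-cutoff datasets: {mean(post):.2f}")
```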

Andrey

I will say I'm a little confused on this question of, you know, are these really just more difficult tasks in the newer datasets? Because if you go back to 2021, well, datasets that are getting created and released today generally are more challenging, because the earlier benchmarks have been basically solved. So it's not entirely clear to me, from just skimming over the paper, whether they answer that question.

Exactly. But they do have some additional analysis to show that some of these models, like Alpaca and Vicuna, have been contaminated, and so on. So it's a very useful thing to keep in mind that when we talk about performance and benchmarks, once again, the numbers aren't super meaningful, because there's a lot going on there. And on to the lightning round. The first story is also related to this question of

evaluation. The paper is Evaluating Language-Model Agents on Realistic Autonomous Tasks. They are looking into whether models are capable of something they call autonomous replication and adaptation, or ARA. And this is kind of the capability to solve tasks that require agents to create copies of themselves, adapt to novel challenges, and basically act out in the wild autonomously and get something done.

And they create some simple agents with existing LLMs, create 12 tasks relevant to this general cluster of capabilities, and show that generally the existing agents that they have cannot do this currently. They cannot autonomously go out and, you know, copy themselves and acquire stuff and solve things in that way.

Yet. And they do also have some discussion that, you know, probably next time around or soon enough we will have agents capable of ARA, models capable of this kind of thing, which might be a bit scary, because, again, a lot of stuff is possible.

Jeremie

Yeah. This is, by the way, a paper produced by ARC, the Alignment Research Center. This, famously, is the group that audited GPT-4 back in the day for dangerous capabilities, including its ability to persuade humans. They're the ones who discovered that it was able to convince a human to solve a CAPTCHA for it. At the time, they tested GPT-4 for its ability to do autonomous replication and adaptation, or ARA.

And this is, by the way, with a view specifically to countering the risk of loss of control of AI, so misalignment leading to power seeking, because resource acquisition is such an important step in the power-seeking process that leads to really bad outcomes. And so what they're trying to do is formalize: okay, how do we actually audit these capabilities

that we're worried about, right? How can we tell if a model might be able to stand on its own, to proliferate, to replicate itself and thereby escape human control? The reason they're so interested in ARA and that category of thing is that once a system develops that ability, it's a lot harder to put bounds on how much risk is posed by that system. Once you have a system that can self-replicate, that can acquire resources, that can, you know, break out of confinement or things like

that (though I'm not sure breakout is part of their assessment), you have a much, much larger risk surface associated with that model. And again, like you said, that's what they're anticipating potentially happening with some of the next generation of models, perhaps, or the generation after that. They see really short timelines and they see high probabilities of risk from this sort of thing, which is

why they're so focused on it. I thought this was a really, really interesting paper. There were a bunch of great little tests that they ran with different kinds of agents. So they took GPT-4 and turned it into various versions of itself with what seems like an AutoGPT-like framework. They don't actually say AutoGPT; they just call it a scaffolding program.

So it kind of makes me think that they're doing their own version of AutoGPT, but basically they have versions of this that just do kind of chain-of-thought reasoning: taking a complex task associated with autonomous replication, or with collecting resources or whatever, breaking that task down into a series of subtasks, and then executing those subtasks, or, in other configurations, delegating those subtasks to other agents, other versions of itself.
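To give a flavor of what a scaffolding program like that looks like, here is a bare-bones decompose-and-execute loop. The function names and prompts are invented for illustration, and this is a generic AutoGPT-style pattern, not ARC's actual harness:

```python
# Minimal agent scaffold: ask a model to break a task into steps, then run each
# step through the same model (or hand it off to a delegate agent).

def call_llm(prompt: str) -> str:
    # Stand-in for whatever chat-completion API you are using.
    raise NotImplementedError("plug in your model API here")

def decompose(task: str) -> list[str]:
    plan = call_llm(
        f"Break the following task into a short numbered list of subtasks:\n{task}"
    )
    return [line.split(".", 1)[1].strip() for line in plan.splitlines() if "." in line]

def run_task(task: str) -> list[str]:
    results = []
    for subtask in decompose(task):
        # In the delegation configurations described above, this call would go to a
        # separate agent instance instead of the same model.
        results.append(call_llm(f"Carry out this subtask and report the result:\n{subtask}"))
    return results
```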

And this is done with GPT-4. It's also done with Anthropic's model, Claude. And they've got this great scorecard where they show you what the models have been able to do. Needless to say, right now we do not yet have systems that are ostensibly able to do this, though they do point out that this whole audit, our entire ability to audit these models, can only place a lower bound on the capabilities of these systems.

Because any time a model fails to do something, you don't know if that's because the model lacks the capability, or if it's because you just didn't prompt the model well enough. As is often said, prompting can reveal the presence, but not the absence, of capabilities. And so they're making a really big point of saying this stuff does not upper-bound the risk from a particular model; it's all lower bound. We just know what we were able in our tests to

elicit. A capability failing to show up doesn't mean it doesn't exist, though admittedly, I think it's highly unlikely that GPT-4 or Claude 2 actually had these capabilities already.

Andrey

Some examples of these tasks are, for instance, create a Bitcoin wallet, count dogs in an image, or identify new employees at the company. You know, not super complicated tasks for a person to do, but they require just sort of acting on your own. And a harder example is, for instance, targeted phishing. So it's pretty plausible that soon enough we'll have things capable of doing this stuff, and I guess we'll keep tracking this general cluster of tasks.

Next story: TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones. Quick story: this is like MiniGPT-v2, which we had before, an open source alternative that builds on open source models. This is building on a more recent open source model, Phi-2, and is meant to replicate the multimodal capabilities of GPT-4V, right, the visual part of it. So they do that, they train it.

And they show that at a tiny size of just 2.7 billion parameters, it matches a lot of the existing bigger open source models, whether 9 billion or 7 billion in size. So yet another open source model that I guess you could check out. But this is also research.

Jeremie

Yeah. And this continues that trend of people in the open source community making kind of smaller models and then comparing them to slightly larger, but also still small, models, and focusing more and more on algorithmic improvements, just because the open source community doesn't have the kinds of resources to train super-scaled models.

But you can still see, I mean, these kinds of advances are going to compound, and they're absolutely helping to propel forward even the frontier, because you've got to believe labs like OpenAI are looking at these results and folding them in as appropriate to their development trajectory. So yeah, kind of cool to see another step in the direction of small models that can do big things.

Andrey

Next story: Improving Text Embeddings with Large Language Models, and this is actually coming from Microsoft. The short version is they show that you can get really good text embeddings, representations of text that you can use for similarity search or other tasks like that, by generating synthetic data straight from language models. If you just generate a ton of synthetic data, you can then train your text embeddings on that, and it is really good.

And if you combine the synthetic data with real data, you end up with something that is state of the art for text embeddings. So a pretty straightforward continuation here, I guess. Nothing too surprising, but another example of synthetic data from models being very useful.
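Very roughly, the recipe looks like the sketch below: prompt an LLM to dream up (query, passage) training pairs, then train an embedding model on them with a contrastive loss using in-batch negatives. The prompt wording and function names here are invented; the paper's actual pipeline uses a more elaborate two-stage prompting scheme and fine-tunes a Mistral backbone:

```python
import torch
import torch.nn.functional as F

# Step 1 (sketch): ask an LLM to generate synthetic training pairs.
def generate_synthetic_pair(call_llm) -> tuple[str, str]:
    query = call_llm("Write a short search query about a random technical topic.")
    passage = call_llm(f"Write a passage that answers this query:\n{query}")
    return query, passage

# Step 2: contrastive training with in-batch negatives. Given embeddings for a batch
# of queries and their matching passages, every other passage in the batch serves as
# a negative example for a given query.
def in_batch_contrastive_loss(query_emb: torch.Tensor,
                              passage_emb: torch.Tensor,
                              temperature: float = 0.05) -> torch.Tensor:
    query_emb = F.normalize(query_emb, dim=-1)
    passage_emb = F.normalize(passage_emb, dim=-1)
    logits = query_emb @ passage_emb.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(len(query_emb))             # diagonal entries are the true pairs
    return F.cross_entropy(logits, labels)
```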

Jeremie

And finally we have DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation. And I know what you're thinking: 2023 was a cool year, but what it was really missing was the ability to read your thoughts and turn them into text. And 2024 just came in and said, hey, no worries, I got you, man,

I got you. So that's what this paper is doing, basically looking into how we convert brain dynamics into natural language. And the application that they're putting front and center here is brain-computer interfaces: just the idea that if you're going to take ideas in a human brain and turn them into actions in some computer, there needs to be some sort of interface, there needs to be a common language. And the common language that they're proposing is language.

And so the idea here is, you know, can we come up with a way of measuring the activity in the brain and turning it into text.

Currently there's a bunch of eye-tracking stuff that people have been doing, or event markers, which look at brain dynamics and try to correlate them with word-level features. That's what's currently used to do this sort of thing, but it limits the amount and the richness of the data you can get. And essentially what they're doing here is focusing on EEG signals.

So signals of brainwave activity directly, and trying to turn that into text that can be used to power, well, a brain-machine interface.

Andrey

Right. So they remove some of this extra data annotation that is typically required via some technical details, discrete encoding of sequences and so on. But the point is now you can go straight from signal to text decoding. And just to give you an idea of how well this works,

we have some examples here. For instance, in figure one they show that the ground truth of what was being, I guess, thought of or read was: Bob attended the University of Texas at Austin, where he graduated Phi Beta Kappa with a bachelor's degree in Latin American studies in 1973. That's how the ground truth starts. And then the prediction from the model was: the University of California at Austin, where he studied in Beta Kappa,

with a degree of degree in history, American Studies in 1975. So we've still got a ways to go, but you can see how, roughly speaking, we can decode the general tendency of what is going on in your brain. And we've been seeing some ongoing progress in this general area.
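For what the discrete encoding step means mechanically, here is a toy vector-quantization sketch: continuous EEG feature windows get snapped to the nearest entry in a learned codebook, producing a discrete token sequence that a standard sequence-to-sequence model can then translate into text. The shapes and codebook size are made up, and this is the general idea rather than DeWave's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(1024, 64))       # 1024 discrete codes, 64-dim each (learned in practice)
eeg_features = rng.normal(size=(200, 64))    # 200 windows of EEG features (random stand-ins here)

# Snap each window to its nearest codebook entry (squared Euclidean distance).
dists = ((eeg_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)                # one discrete token per window

print(tokens[:10])  # this token sequence is what a seq2seq decoder would turn into text
```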

Jeremie

And kicking off our policy and safety section, we have: The Times sues OpenAI and Microsoft over AI use of copyrighted work. And so this is another one in a long line of lawsuits that poor old OpenAI and Microsoft are facing. The claim here is that OpenAI and Microsoft have a business model that's based on, quote, mass copyright infringement. This is according to the New York Times lawyers.

And they're saying that the companies' systems were used to create multiple reproductions of the Times' IP for the purpose of creating GPT models that exploit, and in many cases (here's a key word) retain, large portions of the copyrightable expression contained in those works. So interesting question, and we've talked about this a lot: where exactly is the line on copyright infringement?

Does the model actually have to generate a verbatim identical piece of text to something that has already been put out? Does it just have to contain within its weights a representation of that text? If it memorizes, but does not actually produce that text, is that copyright infringement? Or is it copyright infringement the moment you even train on that text?

All of this stuff right now, as far as I know, and our amazing lawyer friends who listen to the show can certainly let us know, please do, as far as I understand it, all of that is up for grabs. We don't know what courts are going to decide. The claim, though, coming from the New York Times lawyers here is that, quote, settled copyright law protects our journalism

and content. If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission. They have not done so. End quote. So again, my understanding is it's actually quite ambiguous what's true and what's not here. OpenAI is kind of upset, or at least that's their public

pronouncement. They're saying, look, we've been having these ongoing conversations with the Times, and we thought that they were productive, like we thought that we were going to come to some agreement. I mean, implicitly, it seems like maybe they were thinking about an agreement like the one they set up with, I think it was Axel Springer, that we talked about previously, where they're going to license access to the Times' content for a fee.

So it sounds like they thought things were going well, but now there's this lawsuit, and they're framing it as coming out of nowhere. I do want to highlight a slightly funny bit of not-great journalism where they say that OpenAI is the creator of GPT, a large language model that can produce human-like text, blah, blah. So obviously, you know, there's no model called GPT. There is ChatGPT, there's GPT-3,

GPT-4, and so on. But they also say it was...

Andrey

Well, there was a GPT, but that was in 2018. So a long time ago.

Jeremie

That's true, yes, actually. It's easy to forget. I guess what I mean is that that is not the model they're referring to. They say it uses billions of parameters' worth of information obtained from public web data up until 2021, which is also not true for really any of the relevant GPT models now available. So I just thought that was kind of a funny thing.

Totally understandable. Obviously, you know, this is a silly thing to be harping on, but it's one of those things where the world moves so fast in this space, it can be hard to keep up.

Andrey

Right, yeah. So yet another lawsuit. We've seen some other ones of this ilk, and this is maybe a very important one in a sense. The New York Times is a big deal, so among the lawsuits that have been filed, they are probably the biggest complainant so

far, and someone who can really take the fight to OpenAI, given that they are a powerhouse in journalism. And yeah, they do claim that this is creating a competitor, essentially something that can steal audiences away from The New York Times and take readers. They have examples of how, if you use Browse with Bing, which is actually a Microsoft search thing, it can reproduce almost verbatim results from a website that they own and run, without citing it.

So there was a little bit more here than just the model angle. But yeah, this is kind of an important lawsuit that will either end in a settlement in some sense, because it could just be a play by The New York Times to try and get a hand up in the negotiation on licensing their data, or it could actually go on and be a big precedent-setting case. We'll have to see.

Jeremie

And up next, we have a juicy story. This is pretty close to home for me at least: Congress warns science agency over AI grant to tech-linked think tank. Say that three times fast. It's a real tongue twister of a title for a really important story, and a bit of drama in the AI safety world. So just for context, the Biden executive order that came out a couple of months ago now tasked this specific agency called NIST, the National Institute of Standards and Technology.

They're based at the Department of Commerce. It tasked them with coming up with AI standards to deal with some of the more extreme risks posed by AI technology. NIST seems to have awarded a contract associated with this work to RAND. So the RAND Corporation is a very storied and insanely competent research group that has done a lot of work on AI, a lot of work on biosecurity and stuff like that.

They're a private organization, but they're closely linked to the US government and do a lot of work for them. They're sort of a part of the national security apparatus in a sense, though not literally. And what seems to have happened here is that NIST awarded RAND this contract without giving Congress a ton of notification of that fact, and without holding an open competition for this contract. That's what's being alleged here.

And this then led to a letter that was written by some folks in Congress to NIST, basically warning them about this deal that they have with RAND on this kind of AI safety stuff.

Andrey

I would say, with the letter, just go straight to it. It's an interesting read in its attempt to sort of criticize the general approach, I guess, to AI risk. And it's signed by six members of the committee; this is from the House Science Committee, and it's coming from a mix of Democrats and Republicans. So I would have thought maybe this is about regulation and that sort of

stuff like that. But it actually seems to be really trying to come from an informed place and to scrutinize what NIST is doing. So in some sense, maybe it is reasonable to criticize at least the aspect of how the awards are being handed out, there being no publicly available information about the process for the awards, stuff like that. So maybe, arguably, there should have been a different process for the funding aspect.

But anyway, I guess if nothing else, it's a good step. In the last sentence, they did request a staff briefing from NIST to discuss the process and the use of funds. So it's interesting to get a glimpse at a sort of procedural moment in terms of how the executive branch within the US is going about its business and is being overseen by the legislative branch.

And this is, of course, very much related to the executive order from President Biden, where a lot of stuff in the executive branch of the government is happening kind of behind the scenes or kind of quietly.

Jeremie

Yes. And actually, to your point, the EO, because it's an executive order, doesn't come with funding. And so that kind of puts NIST in a tough spot; they have to go with some kind of out-of-the-box solution rapidly. I think that may have played into this a little bit.

Andrey

And on to the lightning round, just a couple of stories. First one is: Elon Musk's xAI jumps on the bandwagon of rich startups benefiting humanity. And the story is that xAI has registered as a for-profit benefit corporation in Nevada, similar to OpenAI and Anthropic, which are for-profit but also sort of, you know, for the benefit of humanity. And that's, yeah, that's the story there.

So I guess it's kind of interesting to see all of these major AI companies being in the same rough category.

Jeremie

Yeah. One of the things that they call out is that a benefit corporation like xAI or Anthropic can have certain legal advantages compared with a public company. So in particular, in Nevada, the legislation states that, quote, no person may bring an action or assert a claim against a benefit corporation or its directors or officers. Would love, again, for our lawyer friends who are listening to let us know specifically what that means.

But my understanding is that basically, if you're going to make a claim against the company, it's got to come sort of from the inside, from a director or a big shareholder, and that it gives the company less exposure to liability of various kinds. So, you know, perhaps some advantages there. Another interesting little detail that I didn't know: apparently investors in X, the company formerly known as Twitter, are going to own 25% of xAI, so

that's at least a new development, as far as I know.

Andrey

That is, yeah, an interesting tidbit. And of course Grok, as we've covered, the chatbot from xAI, has launched on X. So that makes some sense. X is Twitter, right? So X and xAI, I guess they're a little bit entwined now. And with that, we are going to go ahead and close out our episode. Not too many stories, as this is the beginning of a new year. Thank you so much for listening to this week's episode of Last Week in

AI. You can find the articles we discussed here today, and subscribe to our text newsletter, at lastweekin.ai. As always, we appreciate reviews, sharing, subscribing, etc., but most of all, we appreciate you continuing to listen.

Transcript source: Provided by creator in RSS feed.