Hello and welcome to Last Week in AI, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD focused on AI last year and I now work at a generative AI startup.
And I'm your other host, Jeremie Harris. I'm the co-founder of Gladstone AI, the AI national security company. And, yeah, consistent with the weird thing I've been doing on the last two episodes, where I've randomly been saying, hey guys, we're hiring, anybody who wants to reach out, just reach out to us. Well, here's another random announcement. I've got a buddy called Ben who is looking to make a career move.
He's got a ton of experience leading AI teams. I don't want to mention the company, but it's a Fortune 100 that he's been working at, doing a bunch of really cool stuff on AI specifically, and physics, blah, blah, blah. He spent a bunch of years in DC doing stuff in the federal government on AI policy. That's kind of how I ran into him. So if you want to connect with him, reach out to us at hello@gladstone.ai and I'd be happy to hook you up. And that is my weird pitch.
It feels like a kind of flea market at the beginning of every episode now. I'm sorry about that.
Yeah. Now I wonder why I didn't do this last year when I was graduating. Just like, hey, I graduated, who wants to get me a job? Maybe I should have.
Life's full of regrets.
Quick programming announcement for any regular listeners or even new listeners: we're going to try something new this episode. Just a small little tweak where we will add a new section of news that isn't really news per se, but just sort of a place to put fun things we came across that don't necessarily fit in any of the sections we have. So we're going to call it a fun section, even though some of it might be very nerdy and not necessarily funny in the traditional sense.
And yeah, so we're going to cap things off with some kind of less serious stories, hopefully, and have some fun and then close it out.
We know, of course, nerdy is going to be a huge turn off for our audience because we talk about AI papers all day. So, so sorry about that.
Well, let's get going and dive in, starting with the first section, tools and apps. And our first story is, of course, Sora. So this is not quite a tool yet, but I think close enough. We are starting with OpenAI's Sora, their new text-to-video model that came out right after we recorded our last episode. Helpful. And yeah, it was definitely the biggest news story of the past week, I would say. And the gist is it's a really, really good text-to-video model.
So as soon as they announced it in the blog post and on Twitter, there were a bunch of example videos that they put out, some of them quite long at 20 seconds, and it is just beyond anything we've seen with text-to-video AI: super high resolution, clear. It still has some artifacts, but now you do have to look pretty closely to see them, like weird, kind of trippy objects changing into other objects, things like that, but in general very, very impressive.
And of course, there was a ton of discussion, a lot of, oh no, AI is going to take over everything, Hollywood is doomed, all that kind of response. Probably a little overkill, I would say, but definitely pretty dramatic how much of a leap this is.
Yeah, it's actually funny. You said Sora is kind of the big story of the week. Obviously we've got Gemini 1.5 as well, which we'll talk about in a minute. I think the world is divided into two camps: the people who think Sora is the big story, and the people who think Gemini 1.5 is the big story. It kind of feels like that. What was it, you know, when Twilight came out and there were the Robert Pattinson people and the... I'm really showing my age here. Also, I didn't see Twilight, guys, I swear to God. Anyway, this is actually, I think, a really impressive breakthrough. If you are hoping to find out how specifically the model is set up, by the way, be prepared to be disappointed, because the technical report that OpenAI published here says model and implementation details are not included in this report, which is a very, very slight undersell. They do talk a little bit about it. Essentially, it says, first of all, this is a transformer.
This is an interesting thing in and of itself. Right, transformers are usually what we use for text generation, GPT, that sort of thing. What they're doing here is they're basically taking videos. Obviously a video is a sequence of images, and for each image they kind of embed it in a latent space. They use an encoder to essentially extract from that image its meaning. Basically, they create a vector, a list of numbers, that encodes the meaning that was captured in that image.
That's something that's done very often in computer vision applications, that sort of thing. So now, essentially, for every image in the video, they have a bunch of numbers that describe the meaning in that image. But more than that, they do this for patches of the image. So for every patch of an image, they have a description, a sort of list of numbers that captures the meaning of that patch. And then they have that for each image over time.
And so now you can start to think about not just having a patch of an image, but a patch of an image that you track over time over several frames of video. This you might be tempted to call a spacetime patch. Ooh, spacetime patch. Yes, we're talking about spacetime. So this is a spacetime patch. That's what OpenAI is calling them. And these are like the atomic units of meaning in the context of this video generation tool that they've built. They're just like the tokens that GPT models get trained on during text prediction. Usually those are like the syllables that make up words, so chunks of meaning, while these are chunks of meaning in video. And one of the big breakthroughs that seems to have happened here is OpenAI has figured out the right way to chunk up videos such that these models can actually learn from them. So Sora is indeed a transformer.
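To make the spacetime patch idea concrete, here's a minimal toy sketch of chunking a short video into spacetime patches and projecting each one into a latent vector. This is just an illustration of the general idea, not OpenAI's actual pipeline; the patch sizes, embedding dimension, and the random stand-in "encoder" are made up for the example.

```python
import numpy as np

# A toy video: 16 frames of 64x64 RGB, values in [0, 1].
video = np.random.rand(16, 64, 64, 3)

# Carve the video into "spacetime patches": small 3D blocks spanning a few
# frames in time and a small square of pixels in space (sizes are arbitrary here).
T_PATCH, S_PATCH = 4, 16   # 4 frames deep, 16x16 pixels wide

patches = []
T, H, W, C = video.shape
for t in range(0, T, T_PATCH):
    for y in range(0, H, S_PATCH):
        for x in range(0, W, S_PATCH):
            block = video[t:t + T_PATCH, y:y + S_PATCH, x:x + S_PATCH, :]
            patches.append(block.reshape(-1))   # flatten each block into one vector

patches = np.stack(patches)                     # (num_patches, patch_dim)

# Stand-in "encoder": a random linear projection into a latent space.
# A real system would use a trained visual encoder instead.
EMBED_DIM = 256
projection = np.random.randn(patches.shape[1], EMBED_DIM) / np.sqrt(patches.shape[1])
tokens = patches @ projection                   # (num_patches, EMBED_DIM)

print(tokens.shape)  # each row is one spacetime-patch token a transformer could consume
```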
It reflects OpenAI's ongoing belief that transformers are pretty robust, that they can learn world models, in other words that they can create internal representations that capture physical facts of the matter about the universe, even, ultimately, some people think, laws of physics or things that are that deep. Certainly people have argued that that's happening with GPT-style models.
But here we have the first time that's represented in video, which is a great way to test whether there is an actual physical world model here, because you can see glasses shatter or not shatter in these videos. You can see balls fall or not fall, and you can assess: does this thing seem to have an intuitive grasp of physics? And one of the really interesting things that they seem to find is that that is in fact the case, at least to an extent, and that the grasp of this model, the physical kind of world model that this thing has, gets better as it scales. Again, consistent with OpenAI's thesis about scaling.
One of the really interesting things here was that apparently the model emergently develops the ability to portray 3D scenes without having any special architecture that biases it in that direction, without what are called inductive biases that push it that way. Just with scale: you scale it up, train it with more compute during training time, and it develops this ability.
And so I just thought there was so much here, such an interesting breakthrough, and also notable because it's not a language model. Right, or at least it's not just a language model. It's trained in tandem with video that's chunked up as we described, and text inputs as well that correlate with the video.
And to add just a bit more technical detail beyond that: it is also not just a transformer, it's a diffusion model. In a way it reminds me of what we covered just a few weeks ago from Google. There was the paper Lumiere, a space-time diffusion model for video generation, where a big deal was that they didn't separately generate frames in a video and kind of stitch them together. They had an end-to-end model, so to speak, a diffusion model that generated the whole video end to end.
And my impression, although it's hard to tell from the pretty skimpy technical report, is that this is similar to that in nature. So this is a diffusion transformer in a similar way. And I think, again, my feeling is that the real difference is, as usual with OpenAI, scale. My impression is they just threw compute at it and trained with full resolution videos.
They go into some detail here about how, in training, often in the past people kind of cropped videos or downresed videos and then tried to upres them again afterward, and they say that they just trained on full-res HD videos. And as a diffusion model, this thing can do a lot of stuff. So it doesn't just do text-to-video. You can give it an image and it can animate it, so we have some examples of images from DALL-E being animated. It can extend generated videos, so it can continue something that you give it as input, it can do video-to-video editing, lots of these sorts of things. So you've just got to go and see it for yourself. Sadly, this is not a video podcast, at least not yet. Maybe one day I'll find the time to make it so. But just go check out the link in the description, as always, or just Google OpenAI Sora, and you'll see that these videos are pretty impressive.
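For a sense of how a diffusion transformer generates in that latent space, here's a cartoon of the sampling loop: start from pure noise over the latent patch tokens and repeatedly denoise them, conditioned on the text prompt. The denoiser and update rule below are placeholders; real samplers and schedules are far more involved, and this is not how Sora is actually implemented.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TOKENS, EMBED_DIM, STEPS = 64, 256, 50

def denoiser(noisy_tokens, step, text_embedding):
    """Placeholder for the trained diffusion transformer: given noisy latent
    patches, the diffusion step, and the text conditioning, predict the noise."""
    return rng.standard_normal(noisy_tokens.shape) * 0.01   # dummy prediction

text_embedding = rng.standard_normal(EMBED_DIM)        # stand-in for the prompt encoding
x = rng.standard_normal((NUM_TOKENS, EMBED_DIM))       # start from pure noise in latent space

for step in reversed(range(STEPS)):
    predicted_noise = denoiser(x, step, text_embedding)
    x = x - predicted_noise                             # toy update; real samplers are fancier

# x now stands in for "clean" latent spacetime patches; a decoder would map
# them back to pixels to produce the actual video frames.
```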
And it is still not the case that anyone can use this. It was announced, and they say that currently Sora is only available to red teamers assessing the model for potential harms and risks, and some artists, designers, and filmmakers for feedback. There's a bunch of stuff on safety in the blog post saying that they'll be working on watermarking and detection and whatnot, so it might be months until this is an actual tool that is out there, but certainly it's going to happen sooner rather than later.
Yeah. And I think, just as a last quick note, it's worth putting this in the context of OpenAI's AGI mission, right? Because that, of course, is OpenAI: everything they do is to try to build AGI. So the question is, how does Sora fit into this? One piece: we touched on this idea of building a physical world model. There was a time, like literally 20 minutes ago, when people were saying, well, there are certain things that scale simply will not do. There are, for example, conservation laws in physics that cannot be learned by these systems consistently. And this is where people made the argument that you need explicit symbols, symbolic AI, you need neuro-symbolic approaches at a minimum, to capture these things. And one of the really interesting things that we see in this result is that Sora actually displays what's known as object permanence.
And so there are many cases, for example, where a painter in a Sora video will paint something, and then the thing that they paint, the streak of paint, will remain there throughout the video. So you have that coherence happening over long stretches of time. That's the notion of object permanence: things remain there after the cause that brought them into being, or at least the objects do. I mean, classically, object permanence is the thing that babies learn, right? You hide your face behind a thing and then they go, oh my God, the face disappeared. So the ability to track objects over long time horizons seems to have emergently appeared here. The second piece is, I think, one of the big things that distinguishes OpenAI's approach: they are really, really good at figuring out what are the atomic components of a dataset that will allow a model to have highly extensible behavior.
Right. We saw that with their giant bet on language modeling, we've seen that as they've built image generation tools, and figuring out in this case, okay, it's the spacetime chunk, this blob of spacetime, that is the atomic component. If we get an AI system to chew on that kind of data, to look at the problem that way, all of a sudden we unlock all this generality and all this behavior. So I think this is starting to emerge as a consistent theme with OpenAI's biggest breakthroughs, where we're seeing the kind of chunking up of data, looking at it from the right perspective, in the right frame, and only then applying massive scale to it. So I think a really interesting breakthrough. And there's tons of detail in the technical report here, but maybe not as much as nerds like us would like.
Not nearly as much, to be honest, but there are still some fine details. And yeah, as you said, there was some conversation online about this being essentially a world simulator, and we'll get back to this a little later, actually, with some announcements from Meta. But yeah, you could argue that this is learning physics and learning kind of common sense about what's happening with things, and there's a whole philosophical thing there that we should probably just skip rather than getting into.
So moving on. The next story is, as was foreshadowed, Gemini 1.5. This is coming from Google, this is Gemini 1.5 Pro, and it was a pretty big deal, as Jeremie said, because it is really good, supposedly, or at least according to the announcement: it is as good as Gemini Ultra and is seemingly going to be more efficient.
It's trained using a mixture-of-experts architecture, from what we know, and somehow is able to deal with an absolutely gigantic context window, so it can take a ton of input. It was like 1 million tokens, something ridiculous.
So that did seem to a lot of people like a huge deal: having a faster-to-run model that is as good as Gemini Ultra, and therefore kind of on par with GPT-4, that is being rolled out for now to developers and enterprise users and will presumably soon be coming to consumer users. Which will bring even more pressure on OpenAI if it is priced the same as Gemini Pro, which is not their GPT-4-tier model; it's their cheaper, less expensive tier.
That would be, I guess, a real source of competition.
Yeah. It's interesting, too, that they describe the model as a midsize multimodal model. I don't think they actually tell us the number of parameters. But, you know, if we use the Andrey Kurenkov scale model here, maybe a little midsize, what would that be, 30 billion parameters, something like that? Anyway, it seems reasonable. You heard it here first. It's probably 30 billion. You know what, it's definitely 30 billion parameters.
We're reporting that right now. So yeah, it's a midsize model. It's got a lot of, as you said, interesting characteristics, one of which is, of course, the widely advertised context window. So you're right that they report on the 1 million token context window. That seems to be the thing they're anchoring on, though they do advertise that they've successfully tested up to 10 million tokens. This is insane.
To give you an idea, they tested it out on one hour of video, 11 hours of audio, code bases with 30,000 lines of code. I mean, you can fit an entire code base in this thing. You know, 700,000 words, which is more or less that 1 million token context window. They give examples like: they gave it the 402-page transcript from the Apollo 11 mission to the moon, and it could reason about conversations, it could recall events and details. And this is really important.
One of the things that we find when we scale to big context windows, and we talked about this in the context of Claude and Claude 2 when these really big context window models started coming out, is that these models tend to forget things that are in the prompt. So if you say something early on in your prompt, and then 100,000 tokens later you say something else, and you want to kind of take advantage of a linkage between those two ideas, the thing will not be able to do it.
One of the diagnostics that people use to test for this is known as a needle-in-a-haystack test. So you'll bury a little detail somewhere in your gigantic prompt, and you'll see if the model can recall it after. Turns out this model blows everything, and I mean everything, that has ever come before out of the water. It blows GPT-4 out of the water. It blows Gemini Ultra out of the water. It is incredibly good.
We're talking about over 99.7% recall for this needle-in-a-haystack problem for up to 1 million token context windows, across all modalities: text, video, and audio. This is insane, right? Apparently it even maintains that recall performance when it's extended to the 10 million token mode. And at this point, when I look at that, we're seeing that something has shifted here. There is something fundamental and algorithmic that's going on in the back end. This is not just scaling.
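For reference, a needle-in-a-haystack evaluation is simple to sketch: bury a known fact at some depth in a long pile of filler text, ask about it, and check whether the answer contains the fact. The harness below is a generic illustration with a hypothetical `generate` function, not the specific setup Google used.

```python
import random

# A minimal needle-in-a-haystack harness. `generate` stands in for any
# model call that takes a prompt string and returns a completion.
FILLER = [
    "The sky was a pleasant shade of blue that afternoon.",
    "The committee adjourned without reaching a decision.",
]

def build_haystack(needle: str, num_sentences: int, depth: float) -> str:
    """Bury `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    haystack = [random.choice(FILLER) for _ in range(num_sentences)]
    haystack.insert(int(depth * num_sentences), needle)
    return " ".join(haystack)

def run_trial(generate, needle, question, expected, num_sentences, depth) -> bool:
    context = build_haystack(needle, num_sentences, depth)
    answer = generate(f"{context}\n\nQuestion: {question}\nAnswer:")
    return expected.lower() in answer.lower()

# Sweep context lengths and needle depths; recall is the fraction of trials
# where the model's answer contains the expected string, e.g.:
# run_trial(generate, "The secret passphrase is 'purple alpaca'.",
#           "What is the secret passphrase?", "purple alpaca", 5000, 0.37)
```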
At least I, I almost hope it's not scaling, because that would mean, holy shit, scaling has just like, done something absolutely insane. I don't think that's what's happened here.
You know, total conjecture. But one of the few mechanisms that I can think of that allows you to achieve something like this is to have some I mean, it's not a state space model, but to have some kind of state spacy thing here where you can retain a memory as you're going through, as you're chunking through these giant, giant prompts. No idea what that might look like.
Again, total conjecture. It's probably not the case, but this is just so, so weird and out of band with what we've seen previously with transformers. It's allowed the system to have insane learning speeds as well, because it really can soak up all the content in that context window. Something fundamental has happened with this model's ability to understand and absorb context. It was able to pick up this obscure language with fewer than 200 speakers worldwide. It was given a grammar manual for it. The language is called Kalamang, and it apparently learned to speak it, or to write it, as quickly as a human would. In other words, using the same amount of data as a human would. This, again, is another one of those kind of goalposts that we used to have that said, hey, we're not on our way to AGI because we can't do this sort of thing, we can't have systems that learn as fast as humans. Well, here we have that. So I think it's a very, very interesting breakthrough. Not a lot of detail specifically about how the recall is achieved, and to me that's the fascinating thing. You know, the expected performance boost that comes from all kinds of optimizations and jiggery-pokery is absolutely there. And, as you said, Andrey, it compares favorably to Gemini 1.0 and other models.
But we don't know how this thing was built or how it was aligned. Not a lot of detail was I able to find in the paper about, like, the RLHF process, is it DPO, what's going on there. But all we know is, holy crap, this is a really powerful model.
Yeah, it sure looks like it. And it's a shame we don't know how it can achieve such a long effective context window. It's been kind of an emerging topic in recent months; we are really starting, in academia and published papers, to see more work on this problem. For instance, back two and a half months ago, early December, Anthropic published a piece, long context prompting for Claude 2.1, where they showed how to get an effective 200,000 token window with this little hack of just adding a little sentence to the prompt, "Here is the most relevant sentence," and that made it effective. So I would not be surprised if this is achieved by tweaking the decoding process and the prompting process primarily, although, as you said, there could also be algorithmic modifications to the model itself, or all sorts of things going on here.
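The Anthropic trick referenced there, as we understand it, amounts to pre-seeding the assistant's reply so the model first retrieves the relevant sentence before answering. Roughly like this; the client call is left as commented pseudocode rather than any specific SDK signature:

```python
long_document = "..."   # a few hundred thousand tokens of context pasted here
question = "What was the growth rate mentioned for Q3?"

messages = [
    {"role": "user", "content": f"{long_document}\n\n{question}"},
    # The hack: start the assistant's answer for it, nudging the model to
    # locate the relevant sentence in the context before it answers.
    {"role": "assistant", "content": "Here is the most relevant sentence in the context:"},
]

# response = client.create(model="claude-2.1", messages=messages, max_tokens=300)
```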
And 1 million tokens is huge. It's kind of hard to convert into intuitive terms, but that's about 750,000 words, give or take. So some really big books, or like all of Harry Potter, somewhere in that kind of ballpark. So yeah, pretty impressive announcement, and pretty impressive to see this coming so soon after the initial announcement of Gemini, which was just a couple of months ago. We saw the initial rollout of Gemini Pro, its release in Bard, and Gemini Ultra even more recently came out. Now we have Gemini 1.5.
Which is, like, a week later. It's about time that we get another generation of language models.
I know, it's so weird. Now Gemini Ultra isn't even ultra anymore; Gemini 1.5 Pro is as good as Gemini Ultra. Very weird, but pretty exciting.
Yeah. You know, one last comment that did come to mind, and this is especially on the scaling side, and forgive me, I'm obsessed with scaling, but there's this interesting figure where what they do is they get Gemini 1.0 Pro, sorry, 1.5 Pro I should say, to basically run predictions on code.
So they give it a massive code base; they feed this part of the code base, let's say, to the model, and they try to get it to predict the next token. And they see how surprised it is, if you will, at the next token. Right, so if it's really surprised, it's a bad model; it hasn't been able to build on the previous context in order to inform its prediction. And what they find is, unsurprisingly, as you'd expect, as you feed the thing longer and longer stretches, more and more of this code base, it gets progressively less and less surprised at the next token, its predictions get better and better, and what you generally expect to see here is a power law fit.
So as you increase the number of tokens that you feed to this thing, the errors it makes should drop off smoothly along that power law curve, its predictions becoming predictably better. Now, what ends up happening in practice, though, is that Gemini 1.5 Pro improves even faster than that.
So its ability to predict the next token based on the context is actually accelerating faster than it should, according to at least all prior convention. That's another thing that leads me to think there's something fundamentally algorithmic going on here. Unless, hey, maybe scaling is doing this too, in which case, holy shit. This implies that there's something qualitative going on here that's allowing the model to chew on its context to make predictions that are qualitatively better than what we had expected before and what any other models have done. So that's a really interesting and weird thing. They don't really do more than just speculate about why that might be. They seem kind of confused about it.
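For reference, the shape of the curve you would normally expect here, in our notation rather than the report's, is a power law in the number of context tokens already consumed:

```latex
% Expected behavior: next-token loss falls off as a power law in the number
% of context tokens n the model has already seen.
\mathrm{NLL}(n) \;\approx\; a\,n^{-b} + c, \qquad a, b, c > 0
% On a log-log plot this is (up to the constant c) a straight line of slope -b.
% The observation is that Gemini 1.5 Pro's measured curve bends below this
% fitted line as n grows, i.e. it keeps improving faster than the power law predicts.
```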
So yeah, just wanted to call that out. That's something, by the way, that was also mentioned in this great video by AI Explained, which I recommend checking out too, about this whole paper. But you know, when you're looking at scaling as a way of getting to AGI, when you think about transformers, these are the sorts of indications you might be looking for that something weird is afoot.
Sure, they just found the best hyperparameters. That's the secret.
Yeah.
That's right, just the best parameters for training. All right, on to a lightning round, starting with: Groq AI model goes viral and rivals ChatGPT and other chatbots. So this is Groq, spelled with a Q, not to be confused with Grok, the chatbot from Elon Musk's xAI. Groq is coming from Groq Inc, which is a company that's been around for a while, since 2016, and they are focused primarily on hardware. They claim to have created the first language processing unit to run these models.
And they posted a demo of it on Twitter that, per the headline, kind of went viral because of how fast it is. This is an ASIC, so an application-specific integrated circuit, not a general purpose chip like GPUs are, and it allows it to generate about 500 tokens per second, compared to, for instance, GPT-3.5's 40 tokens per second. 500 tokens per second is roughly 400 words per second. So it's really blazingly fast.
And the reactions have been pretty dramatic because of that, because it kind of really changes the experience of talking to a chatbot if it is that quick.
Yeah, I want to do a combination of hyping this up and throwing some cold water on it, because this story really captured at least my attention. I think it's so important to track these kinds of breakthroughs. So first things first: yes, it's blazingly fast throughput, no question about that. It turns out it's about four times the throughput of other inference services.
And yeah, it's very, very quick. And no doubt, by the way, their chips are entirely fabricated and packaged in the US. That's a big advantage they have over other companies that have a complex supply chain that involves Taiwan and South Korea. But one of the things that I think we have to keep in mind here is that performance is about more than just how much throughput you can get, like how many tokens you can get out the other end.
You also have to think about how many customers you can offer this to at a given time. And here I'll just share some insights from SemiAnalysis, which is a great firm that looked into this. So when you look at this Groq system, and I think it's pronounced the same way as Grok, by the way, they got into a tiff with Elon about how he took their name, it's just a different spelling, I believe that's what's going on. Each of these chips, they're super fast, but they have crazy small amounts of basically onboard RAM, 230 megabytes, right? In a context where language models are, like, 7 billion parameters for a small one on the Andrey Kurenkov scale. So 230 megabytes doesn't get you a whole hell of a lot. It turns out you need some 600 of these chips wired together to have the inference capacity you need to serve even a Mixtral model, whereas you can do that on a single Nvidia H100 chip.
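A quick back-of-envelope version of that point, using our own rough numbers rather than SemiAnalysis's exact figures:

```python
# Rough memory math for serving Mixtral on Groq-style chips.
mixtral_params = 46.7e9        # Mixtral 8x7B total parameters, approximately
bytes_per_param = 2            # FP16/BF16 weights

weight_bytes = mixtral_params * bytes_per_param      # ~93 GB of weights
sram_per_groq_chip = 230e6                           # 230 MB of on-chip memory

chips_for_weights = weight_bytes / sram_per_groq_chip
print(f"{weight_bytes / 1e9:.0f} GB of weights -> ~{chips_for_weights:.0f} chips "
      f"just to hold them")
# ~400+ chips before you even count KV cache and activations, which is how
# you end up at figures in the ~600 chip range.
```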
So now you're having to buy a crap ton of them; you kind of have to dedicate way more data center infrastructure to serve these chips. And so it is blazingly fast, but it's important to note this is a big, big limitation, and the cost equation is far from clear right now. Groq is currently losing money on their API. They're going to need about a 7x increase in throughput, in traffic and utilization, to break even. And that's a reflection of the weird unit economics.
They're just going in a different direction with this. And I think the last thing that I really need to mention is this is a chip that just does inference. It does not do training. And that's really, really important. That's a key limitation. But it's also a hint, right?
We've talked on the podcast a lot about how hardware breakthroughs and model and algorithmic breakthroughs are starting to bias us towards a direction where, increasingly, our models are doing more and more of the thinking at inference time.
Rather than spending your compute during training, you're going to start to spend more and more of your compute at inference time, where you get these models to prompt themselves a bunch of times using crazy prompting techniques, like self-consistency and chain-of-thought prompting and all that jazz, just to get a single output. Right? So you're putting in a lot of the elbow grease after the model's been trained rather than before.
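Self-consistency is one concrete example of spending compute at inference time: sample the same question several times at some temperature, then majority-vote the final answers. A rough sketch, with `generate` standing in for any sampling-enabled model call (the prompt format and parser here are made up):

```python
from collections import Counter

def extract_final_answer(completion: str) -> str:
    # Toy parser: assume the model ends its reasoning with "Answer: <x>".
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistent_answer(generate, question: str, num_samples: int = 5) -> str:
    prompt = f"{question}\nLet's think step by step, then finish with 'Answer: <x>'."
    answers = []
    for _ in range(num_samples):
        completion = generate(prompt, temperature=0.8)    # diverse reasoning paths
        answers.append(extract_final_answer(completion))
    return Counter(answers).most_common(1)[0][0]          # majority vote
```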
And I think this kind of breakthrough is another push in that direction, perhaps. We may see these chips optimized ultimately for training too, that I don't know. But I think it's really important to note this is a key constraint, and it does reflect something that we'll be seeing more and more of: more custom chips for specifically LLM use cases that are specifically good at inference.
And I think if nothing else, it's a great sort of warning shot that we can expect a lot of room for growth on chip design, even using existing fabrication nodes, even at the five nanometer process node. I think this one actually might even have been a bigger node. So not even the cutting edge TSMC one. So, anyway, really, really interesting breakthrough with a lot of, I think, depth and detail too.
Just to be super clear, this is not a new chatbot. This was kind of a demo they posted, mainly to showcase the technology, where they're serving open source language models like Mixtral and Llama 2 at very fast speeds. So the big deal is the LPU, the language processing unit, and we'll see, as you said, if we'll have more of this specific, not general purpose, hardware for inference. Next story, something I found pretty cool: introducing IP adapters, create consistent game assets in seconds.
in seconds. So this is actually not a new story. This was a release from the company scenario, where scenario is a tool that is used to create assets for primarily video game developers. And the announcement is essentially that you can now have, reference image of, let's say, a single character. And from that one reference image you can create all sorts of assets. Of that same character that are consistent with the initial image.
So you can say, you know, give me this character, but in these clothes, give me this character. But we these glasses give me this character. But running stuff like that, and I think worth highlighting because this has been one of limitations when it comes to text to image and image generation is like how if I'm creating, you know what I mean? Movie, webcomic, video game. How can I make the same character show up throughout? It's actually been a pretty tricky problem.
We've been seeing more and more research come out in recent months, showing how you can get consistent character generation in various ways. And here's an example of that coming out, in a pretty established product where you can say, okay, now I have this specific character and I can generate a bunch of assets of a character in various contexts, doing various things pretty, you know, necessary for us to actually have an impact in the industry, I think.
Yeah, it's interesting too, because they are explicitly generating IP here. So it makes me think about the whole copyright thing and what their copyright protections are, because this is meant for explicit use in commercial applications. So, you know, you never know, depending on how the thing's trained and what the training dataset is, whether other people's ideas or IP are going to creep into the things you get generated. I think it's a really cool tool, man.
The demo on their website, by the way, holy crap. Have you, like, used it in your workflow?
Played around with it. You know, I haven't I haven't used it directly. We have our own models. But yeah.
You have your own models. Oh, yeah. And up next we have: report, OpenAI working on web search product. This is a pretty short report, but maybe it shouldn't be too surprising. We have OpenAI, which has got a web crawler already, GPTBot. We have ChatGPT Plus users who can browse with Bing. And then of course, Microsoft, sorry, Bing rather, used GPT-4 for its customized search product, or still does. And so we're kind of circling around this already.
And so perhaps not so shocking that we now have OpenAI potentially, according to this report, looking into a search product. I think one of the big questions is, are they going to be able to make a dent that's bigger than what Bing was able to make thanks to GPT-4? You know, Bing's market share famously just barely budged after the initial hype wave spurred by the GPT-4 launch. So, hard to know.
Right now there's a big imbalance between Google, which saw almost 85 billion visits in December, versus ChatGPT, which got 1.6 billion. But of course, a chatbot is a different product from a search engine, no question. So we'll see if they can compete, whether on quality or just distribution.
It's also the case that we have other competitors in this space. There's Perplexity, there is You.com, all doing this kind of AI-enabled search idea. And of course Google has that already built into, I guess, not Bard anymore, is it Gemini now? So it's already going into a crowded space, unlike the initial ChatGPT release, where it was the first one of these. But it does make sense that they would go ahead and put it out there, I guess.
And one last story for this somewhat long section, where we wound up talking for a while. The story is: Adobe Acrobat adds generative AI to easily chat with documents. So that's the idea. There is now this new tool, AI Assistant in Acrobat, which is a conversational agent that can summarize files, answer questions, and recommend more based on the content, allowing users to interact with documents in a chat-like manner.
Very intuitive, I guess, use of integrating a chatbot. Acrobat is a way to read PDF files, pretty popular, as far as I understand, for looking at this sort of stuff. So yeah, pretty impressive, or maybe just notable, that Adobe is continuing to push out AI features throughout its product suite, not just in Photoshop, but now also in Adobe Acrobat.
And up next we have applications in business, and we start with: Sam Altman owns OpenAI's venture capital fund. This is a weird one. It's also one that has been in the news without being in the news. So, you know, Sam Altman famously testified before Congress about concerns over catastrophic risk from AI and that sort of thing. And during that testimony, he was famously asked, like, how much of OpenAI do you own? And he was like, none of it.
I don't own any equity. And that's a really weird thing to have happened. This was sort of framed as: Sam already has investments in tons of companies, he doesn't need any more money. Maybe there is sort of a vaguely ethical dimension to this, too, or it's somewhat ambiguous, but that's the lay of the land. Now we're finding out about OpenAI's venture capital fund, the OpenAI VC fund, which, by the way, has about 175 million in total committed investments.
They've invested in companies like Descript and Harvey, which is a really popular legal tool. And it's actually owned in Sam Altman's name, by the way. The fund also has LPs, limited partners, that kind of co-invest, I guess, with OpenAI; they include Microsoft. So it's certainly a fund with a lot of access because of where OpenAI is in this nexus of AI startups.
But yeah, the really weird thing here is that even OpenAI and its nonprofit foundation do not actually have ownership over the startup fund. It's literally in Sam's name. And there was a quote from an OpenAI spokesperson who says, well, look, we wanted to get started quickly, and the easiest way to do that, given our structure, was to put this in Sam's name.
We have always intended for this to be temporary. Which, I mean, I'm no legal expert, certainly not in corporate law, but this is a really weird way to do a temporary structure, like you're just going to put it in Sam's name. One of the questions this article raises is, what would have happened if the board debacle had just gone a different way? What would have happened if they kicked out Sam and he actually stayed kicked out?
And now this dude owns the entire VC portfolio for this OpenAI VC fund. So it seems like it introduces all kinds of risks, risks that almost materialized, even. And so just a really, really weird arrangement, with no clear answer as to why this has happened. All they say is, look, we know that we need to reexamine our governance structure, and that should come before changes to the fund, and then they're focused on creating a new board. But all of this is just really weird.
I mean, it's the sort of thing where, again, not being a corporate law expert, and OpenAI has a famously weird corporate structure, there could easily be an explanation buried in there, and I shouldn't play that down. But this seems almost like the amateurish thing that I might try to do if I was like, yeah, you know, whatever, I only have so many hours in the day, let's just put it in my name for now and we'll figure it out in post.
So really, really weird arrangement.
Not like a huge implication, but a very weird reveal, I think. As you said, I was also very surprised reading this. The fund was launched in late 2021, and the 175 million in committed investments was as of last May, so it may actually be even bigger now. And yeah, it's kind of a real surprise, a weird arrangement going on here. For several years this temporary situation seems to have been kept as is, and I suppose it does seem likely that when they look into the board and governance structure, this will be reexamined at some point. But yeah, yet another aspect of the legal structures around OpenAI that is unusual and very idiosyncratic.
Yeah. There's secretly a, a fund that's keeping lawyers afloat.
And next story: Reddit signs AI content licensing deal ahead of its IPO. So this is relatively detailed; this is kind of a report. They say that Reddit has been telling prospective investors in its IPO that it had signed the deal, and that it is worth about 60 million annually as far as income. This is apparently what some people had told the reporters here at Bloomberg, and this would be a pretty significant chunk of their revenue.
Reddit did take in about 800 million in revenue last year, so that would be a pretty significant amount of income just from licensing the stuff that's on Reddit for use in training AI. There was a huge controversy at Reddit last year when they limited access to their API so that people couldn't easily get access to the data. There was a whole community revolt thing going on, because a lot of apps and things built using the API no longer worked.
And that shutdown and that kind of conflict with the community happened precisely because the idea was to keep the data inaccessible unless companies paid to get access to data for AI training. So makes sense to see this happening. And I guess it'll be interesting to see if people do wind up paying for all this data.
Yeah, what's really weird about this, well, not weird, I guess, it's confidential, is they don't name the AI company the agreement is actually with. And so we don't know. But, you know, I think there's a good chance it could start with an "O" and end with a "penAI." And the reason I'm saying that, and I'm ready to stand completely corrected if this blows up in my face, the beauty of podcasting, is that Reddit has a kind of long history in the specific social circle that would cause it to be well connected to OpenAI. They were a Y Combinator-backed company back in the day. Sam Altman himself was the president of Y Combinator, actually still at the time that I went through it. And Sam actually has been on their board; I believe he still may be. And so has Michael Seibel, who's another partner at Y Combinator. So a lot of social and other entanglements there.
I would not remotely be surprised if this was actually an OpenAI deal or partnership. But either way, this is a deal with just one company, so we also don't know if it's an exclusive deal. Maybe there are terms that prevent Reddit from double dipping, selling the data to other AI companies. And no matter what, it could be precedent setting.
So we now have a price point, $60 million per year, it seems, for data of the quality and scale that Reddit can provide. And that's a really interesting kind of waypoint, a marker, for potentially future deals to come.
On to the lightning round. The story: Nvidia reveals its Eos supercomputer for AI processing, sporting 4,608 H100 GPUs. So there you go, we have this crazy data-center-scale supercomputer designed specifically for AI applications. And yeah, having thousands of these H100 GPUs... H100 GPUs cost, I don't know the exact number, but they cost a lot; I think it's tens of thousands of dollars to get just one, and it's pretty hard to get your hands on one.
And beyond just the hardware, of course, there's also some pretty impressive networking tying all of this together. This article has some details talking about Quantum-2 InfiniBand networking and software providing 18.4 exaflops of FP8 AI performance, etc., etc. Lots of big numbers, but the point is there's now an Nvidia-built supercomputer for AI, and it is ranked number nine in the Top500 list of the world's fastest supercomputers.
Yeah, on that list, it bears mentioning, it's not going to include all of the most powerful supercomputers. When you talk about the clusters that Microsoft is running, that Meta is running, those are not necessarily going to be there, even if they are coupled together in one data center. But it's also worth mentioning that the world's most powerful, man, I don't even know what to call them, clusters of computing, are not necessarily even all under one roof anymore. Google, famously, is now working on ways to train across multiple data centers at a time, so it's increasingly less meaningful to talk about individual supercomputers that are really, really fast; increasingly, the ability to effectively wire together supercomputers and clusters is really important. But still a very notable achievement, especially because Nvidia actually stopped focusing on double precision gains a while ago, so the very, you know, double precision floating point number calculations, basically so they could focus more on AI-related stuff. And that's part of what was being measured here. Just to give one number that maybe folks listening will be able to relate to: a famous benchmark is the MLPerf training benchmark.
And this is basically where you train GPT-3, which, again, is a 175 billion parameter model, on 1 billion tokens, and you see how long it takes for the hardware to handle that training run. And it turned out that Eos did this training run in about four minutes, which was three times faster than what it had been able to do just six months ago.
So pretty insane, thinking about training what used to be, just three and a half years ago, a cutting edge model, and you're training it, well, this benchmark slice of it, in like four minutes on one supercomputer. Pretty wild.
Fun fact: Eos is apparently the Greek goddess said to open the gates of dawn each day, which is pointed out in the blog post.
Oh man, I was going to say, Andrey has some Greek mythology skills, and he was just waiting to drop that.
Not that knowledgeable. Next up: Google quietly launches internal AI model named Goose to help employees write code faster, leaked documents show. So yeah, not a product that's aimed at consumers, but nevertheless a new AI release from Google, at least internally.
This is supposedly a descendant of Gemini and is meant to pretty much help write code using the internal tech stacks of Google, so meant to pretty much speed up the tens of thousands of software engineers Google employs across all its various products. Makes sense for them to try and invest in it a little bit.
Yeah. It's apparently trained on, as they put it, the sum total of 25 years of engineering expertise at Google, so I guess presumably just on all their code. It looks like this is potentially aligned as well with some of the Google efforts to do a bunch of efficiency increases, read: layoffs, through AI, potentially. That's at least what it seems like from the outside.
But then again, their chief business officer also explicitly said, quote, we are not restructuring because AI is taking away any jobs. I guess that's different from intending at some point to do that. But still kind of interesting that they're doing this. It has a 28,000 token context window, in case that's of interest. And one last quick note is that it was a collaboration between all of the different parts of Google that do AI things.
So Google Brain, DeepMind, and then the Google internal infrastructure team. So very wide ranging effort here.
And next up we are going to have a couple of funding stories. First up: Chinese startup Moonshot AI raises $1 billion in funding round led by Alibaba and VC HongShan amid interest in OpenAI-type firms. So there you go, this is a huge funding round. Moonshot AI launched a smart chatbot, Kimi Chat, in October, and that is built on its self-developed Moonshot LLM, a large language model apparently capable of processing up to 200,000 Chinese characters in its context window.
The company was founded just in April of 2023, so a pretty new one. And yeah, this is pretty much an investment in an OpenAI-type play. I'm a bit surprised because I haven't been seeing anything about this personally, but yeah.
I mean, this is definitely one of the weirder fundraises that I've personally seen, especially in the Chinese ecosystem. For context: usually a fundraise on the order of $1 billion is something that you do right before you're going to IPO, after years and years and years of development. Totally get that AI is different, things move faster.
But the other weird thing about this is that when you raise at those valuations, you usually don't give away, in this case, more than a third of your company in a single fundraise, though in total you might easily have given away a third of your company by the time you're raising that amount. So I don't even know how to parse this. This company is headed by this guy, Yang Zhilin, who is a computer science grad from Tsinghua University, a very prestigious institution in Beijing. Worth noting it has an open affiliation with the People's Liberation Army, so this is, for all intents and purposes, a military-affiliated institution, and now it has this spinoff. Really kind of noteworthy as well because the investors, yes, include Alibaba. They also include HongShan, which is the kind of Chinese spinoff of Sequoia Capital.
We talked about them, I think, in a previous episode, and how China basically wrestled this spinoff away from Sequoia, and it just turned out to be, not a dead loss maybe for Sequoia, but certainly not nearly as good an outcome as they might have hoped. And of course, HongShan, we previously talked about how another company that they funded is another sort of OpenAI competitor: the company Light Years Beyond.
And they've anyway been doing at-scale language model training. Maybe a last thing worth mentioning: there are a lot of these companies flaunting long context windows, right? In fact, at one time there was a company called Baichuan in China that had said, hey, look, we've launched a 350,000 character context window model. And they even said that this is the longest context window in the world, and that it beat Claude 2, which at the time was its next closest competitor.
One of the things to look out for when you hear these things about big context windows, you know, anybody can make a model that can absorb a huge context window. The question is, how well does it handle that context window? You know, how does that context window translate into quality? Oftentimes with Chinese models in particular, I recommend a, you know, trust but verify approach. Wait until you see the performance on the open, open benchmarks.
At least if it's an open source model, that can be especially helpful. Yeah. Just because you never quite know, there's so much impetus to to kind of hit those vanity metrics, just because they make headlines. But still very interesting development. Huge, huge fundraise and, and very I mean, this is a lot of capital sloshing around in the Chinese market right now.
Next story: AI computing firm Lambda raises $320 million in fresh funding. There you go. Lambda focuses on compute. As it said, they already serve 5,000 customers across various industries and will now be trying to compete with Nvidia and other hardware providers.
They're known for doing a bunch of LLM and generative AI training. So that's kind of where they specialize, not in inference and not in stuff that's not generative AI, but specifically those things. And apparently, I thought this was interesting, on their website they say they have a shipment of H200s either already on site or about to be deployed. So they're one of the first cloud providers with their hands on H200s from Nvidia.
So clearly they've got a pretty good way of getting allocation from Nvidia, which is probably behind the raise and the valuation.
And one last quick story about funding: ex-Salesforce co-CEO Bret Taylor and longtime Googler Clay Bavor raise $110 million to bring AI agents to business. This is 110 million from investors, including Sequoia Capital and Benchmark. And these are meant to be AI agents that do various things. So, for example, they're handling hundreds of thousands of customer conversations every month for clients including WeightWatchers, Sirius XM, Sonos, and OluKai.
So they are kind of enabling AI interactions across various businesses and customers.
And up next we have our projects and open source section. We're kicking it off with BioMistral, or Bio-Mistral if you're French, which, you know, not everyone is: a collection of open source pretrained large language models for medical domains. So essentially what we've got here is a case where this lab has taken the Mistral 7B Instruct v0.1 model,
so the instruction fine-tuned version of Mistral's 7 billion parameter model, and they gave it some additional training on PubMed Central, a database of basically medical data. It's a pretty significant corpus, about 1.47 million documents, 3 billion tokens, that were added on top of the already pretrained Mistral 7B model. And then they basically looked at, hey, how does this do on a whole bunch of different medical question answering tasks in English.
And then they automatically had translations of those tasks into seven other languages as well, just to see how well it generalized. And it does pretty well. It outperforms models like MedAlpaca, which is kind of what it sounds like, a medically fine-tuned version of the 7 billion parameter Alpaca model, and BioMedGPT, which we covered, I think, in a previous episode a little while ago. So it definitely is ahead of the pack on just about all the benchmarks that they tried, with a couple of limited exceptions, things like medical genetics, anatomy, college medicine. On average, it's strongly outperforming the vast majority of other models, in a lot of cases by like 10% on these benchmarks. So really quite impressive.
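The general recipe here, continued pretraining of an instruct model on domain text, looks roughly like the sketch below using Hugging Face Transformers. This is not BioMistral's actual training code; the dataset file and hyperparameters are placeholders, and a real run would need far more compute and care.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

base = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Placeholder: a plain-text dump of PubMed Central articles.
corpus = load_dataset("text", data_files={"train": "pmc_subset.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

corpus = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="biomistral-sketch",
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           bf16=True),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # standard causal-LM objective, now on the medical corpus
```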
And it's another kind of open source model. Increasingly, we're seeing the medical versions of these models come out: first the base model, then you get the instruction fine-tuned, then the dialogue fine-tuned, maybe the RLHF'd, and then you get the kind of medical specialist models, and so on and so forth. So we're seeing the Mistral line very much mature.
I found it interesting that this was trained using the CNRS, the French National Center for Scientific Research, high performance computer. There's a similar initiative in the US to make a national AI cloud, where you can have these sorts of supercomputers for AI research. So I guess, yeah, a nice demonstration of what happens when you give academics or open source developers access to powerful hardware.
They can develop these sorts of models, and it's fully open source, so models, datasets, benchmarks, scripts, everything is out, similar to what we saw last week with AI2. And next story: Nomic AI releases the first fully open source long-context text embedding model that surpasses OpenAI's Ada-002 performance on various benchmarks. Long title of a story, but that's what it is. The release here is of nomic-embed-text-v1, which generates these embeddings.
A quick recap: embeddings are just a bunch of numbers that, roughly speaking, tell you what text means, and you can use that as an input to a language model, a chatbot, or you can use it for various other things. You can use it to find similar text for retrieval, classification, a bunch of stuff. This model can handle a sequence of 8,000 tokens, so that's quite a bit higher than a lot of typical open source models that are capped at, let's say, 512. And yeah, the model weights are coming out under an Apache 2.0 license.
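As a reminder of what you actually do with an embedding model: embed some text, then compare vectors, for example with cosine similarity for retrieval. Loading the model through sentence-transformers as below is our assumption of a convenient path; check the model card for the exact recommended usage (it expects task prefixes like "search_query:" and "search_document:").

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

docs = [
    "search_document: The mitochondria is the powerhouse of the cell.",
    "search_document: Transformers use attention to mix information across tokens.",
]
query = "search_query: How do transformers process sequences?"

doc_vecs = model.encode(docs)                  # one vector per document
query_vec = model.encode(query)

scores = util.cos_sim(query_vec, doc_vecs)     # cosine similarity; higher = more related
print(scores)                                  # the second document should score higher
```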
Yeah. So it is a very small model, at 137 million parameters. It used to be that to hit anything like an 8,000 token context window, you would just need a much, much bigger model. My guess is that they probably haven't done compute-optimal training; in other words, they probably poured in more compute than is ideal for this number of parameters on the standard scaling basis, just to make sure that they squeeze as much value as they possibly can into those parameters. The idea here really being to make sure that you have a small model that can do really well. It's another example of the kinds of little nooks and crannies that we're still trying to fill in: models that have, in this case, a long context window but are really small. That combination is something there just wasn't a good model for. This actually beats, yeah, OpenAI's text embedding models, like text-embedding-ada-002 and text-embedding-3-small, on short and long context benchmarks. So it is useful for a wide range of things that those models perhaps aren't useful for. And it is released under an Apache 2 license, so, you know, very permissive.
But yeah, I definitely got more of those vibes from this paper of, like, another example where people are falling over themselves to show how open they are: we'll give you the code, we'll give you the data, we'll give you everything. This was very much one of those cases. So pretty well everything you can imagine wanting out of this model, you can certainly get. They have a bunch of interesting details, like they're using flash attention, maybe not too surprising to see that used now, and a bunch of tricks like setting their vocabulary size strategically, just to make sure that they're improving, let's say, on previous systems. They do use a BERT base, that being kind of the version of language models before the GPT series that actually used to be the cutting edge. Well, now they're back to BERT with these augmentations and seeing some cool results.
On to research and advancements. The first story is: Meta unveils V-JEPA, an AI model that improves training by learning from video. And the blog post from Meta was actually titled: V-JEPA, the next step toward Yann LeCun's vision of advanced machine intelligence, or AMI. Am I right?
Same thought.
So AMI apparently is now an acronym we are trying to make happen. But yes, this is V-JEPA, the video joint embedding predictive architecture, similar to their image joint embedding predictive architecture, I-JEPA. And the whole idea is to be able to try and predict patches of a video, so similar in a way to the Sora story we began with.
This is another story of trying to train a world model, in this case trying to train an AI to understand how the world works by predicting portions of video. They say they mask out large portions of the video and have the model be trained to predict them. Yeah.
And so I think it's really noteworthy just how similar this is in spirit to OpenAI's Sora. Right? Like in both cases, you take an image and you chunk it up into patches, and you create an embedding, this list of numbers that captures the meaning in that patch of the image, that captures whether or not certain concepts are represented there. And for each of those patches, you're going to create an embedding
like that. And what you're going to do is take advantage of the fact that video frames that are close to each other in time, or patches of an image that are close to each other in space, usually contain closely related information. Right? So if you see a bit of sky in one part or one frame of a video, then there's a good chance that there's going to be sky in that same patch one frame later. And there's a good chance that neighboring patches will also contain
some sky. And we've actually talked about this idea, I think, last week when we talked about how humans might learn visual skills and visual understanding of the world by recognizing that the things we see at one moment are usually going to be pretty similar to the things we see very soon after, and we don't need to label our data to know that. Right? This is just all unsupervised learning.
It's all being done without labels, without labeling our input data during the training process. And so essentially this model is just going to be trained to determine whether a chunk of an image or a time-bound piece of video follows, or is close to, another given chunk of the video. And in that sense, it's not a generative model, right? It's not going to be generating video. It's a discriminative model.
It's going to be analyzing pieces of video and images. And it makes its predictions not by operating on the raw pixels in the video, but instead by operating in this embedding space, the space where we're capturing the semantic meaning of the chunks of video that we have. So in that sense, it is kind of like Sora, right? Both of these things operate at the level of the
embedding space. They're both, it seems, taking advantage of the fact that meaning is similar in physically closely related parts of the video, whether that's close in time or close in space. And ultimately the thing you want to get out of this, in the case of V-JEPA, the most valuable artifact that you're after from the training process, is the encoder that takes in a patch of video or a patch of an image and generates
the embeddings, that extracts the meaning from those inputs. And so you might be tempted to look at this and kind of compare it to OpenAI's Sora more explicitly. And if you do that, like I did superficially at first, you might see this as a fairly weak showing compared to Sora. Like, this model has a lot of big limitations that Sora doesn't have. It's only discriminative, right?
So it can analyze meaning in images and video, but it can't generate video. It also only works on very short chunks of video; their paper says something like ten seconds or so is about what it can handle in terms of recognizing actions over a long time horizon. And another thing is that you still need to adapt the model by training a small, lightweight, specialized layer or a small network on top of it if you want it to learn a new skill.
So there's no equivalent here to in-context learning, as far as I can tell, in this architecture. But the flip side is that they're publishing it openly, and it does reflect a commitment to Yann LeCun's vision of AGI and how he thinks it will be achieved. So whereas OpenAI tends to like the idea of scaling individual models, and tends to take the view that as we progress,
architecture matters less and less and scale matters more and more because it starts to do more and more of the work for you, Meta, on the other hand, as is the case here, takes inspiration from the way the brain works a little bit more, and they see that as the path to human-level intelligence. So it's maybe not so surprising that in that context, they're more interested in these more specialized and purpose-built modular architectures, and in framing
the study as an investigation into how to replicate human learning patterns, rather than OpenAI's let's-just-scale-this-up-and-see approach.
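Purely to illustrate the joint-embedding-predictive idea being described here, below is a toy sketch of the training objective: predict the embeddings of masked patches from the visible ones, with no pixel reconstruction and no labels. The dimensions, the tiny MLP encoder and predictor, the zero-out masking, and the frozen target encoder are all illustrative stand-ins, not Meta's actual V-JEPA architecture.

```python
# Toy sketch of a JEPA-style objective: predict the *embeddings* of masked
# patches from the embeddings of visible patches. Shapes and modules are
# illustrative stand-ins, not V-JEPA itself.
import copy
import torch
import torch.nn as nn

D_PATCH, D_EMB, N_PATCHES = 768, 256, 64   # flattened patch size, embedding size, patches per clip

encoder = nn.Sequential(nn.Linear(D_PATCH, D_EMB), nn.GELU(), nn.Linear(D_EMB, D_EMB))
predictor = nn.Sequential(nn.Linear(D_EMB, D_EMB), nn.GELU(), nn.Linear(D_EMB, D_EMB))
target_encoder = copy.deepcopy(encoder)     # slow-moving copy; provides prediction targets
for p in target_encoder.parameters():
    p.requires_grad_(False)

def jepa_loss(patches: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """patches: (batch, N_PATCHES, D_PATCH); mask: (batch, N_PATCHES) bool, True = hidden."""
    with torch.no_grad():
        targets = target_encoder(patches)           # embeddings of *all* patches
    visible = patches * (~mask).unsqueeze(-1)       # crude masking: zero out hidden patches
    context = encoder(visible)
    predicted = predictor(context)                  # guess the hidden patches' embeddings
    # Only score the masked positions: that's where the model has to "imagine" content.
    return (predicted[mask] - targets[mask]).abs().mean()

patches = torch.randn(2, N_PATCHES, D_PATCH)
mask = torch.rand(2, N_PATCHES) < 0.5
loss = jepa_loss(patches, mask)
loss.backward()  # in real training you'd then EMA-update target_encoder from encoder
```

The useful artifact at the end is the encoder, which is exactly the point made above: the thing you keep is the module that turns patches into meaning vectors.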
This is definitely more of a research effort primarily. So they say this paper explores feature prediction as a standalone objective for unsupervised learning from video, and introduces a collection of vision models trained with that objective.
So it really is exploring a specific training objective and is primarily a research artifact, trying to release some new findings and release this stuff for other academics, which I don't think can quite be compared to Sora, which is, I guess, not a tool yet, not a product yet, but clearly more of a commercial investment from OpenAI's perspective.
I will also say it's a little different in the sense, and they highlighted this, that this is self-supervised or unsupervised training, so you can just take a ton of videos and train. Whereas for a generative model, text to video, you need text and video, right? So you could make an argument from a scaling perspective that that's the limit.
You can use this kind of method to train on a giant corpus of videos without requiring any labeling, versus if you want to train a generative model, somehow you will need to first label all of them. And maybe it turns out that you can first train a model on this self-supervised objective, and then train it some more with a more generative objective afterwards. It's worth noting a lot of advancements in image stuff have come from self-supervised learning.
I mean, chatbots fundamentally are self-supervised at first, and then they do fine-tuning on human labels. So another example of Meta really exploring that type of space and showing that it can be extended to video. Next paper: Chain-of-Thought Reasoning Without Prompting. So chain of thought, we have mentioned it a lot of times over
the years. Basically, the idea is that for LLMs to be able to solve, let's say, trickier problems which require a bit of thinking, it's been found that you can condition the chatbot to do better by prompting it to, you know, think step by step, or by first giving it an example of reasoning through a question before giving the answer, stuff like that. So all that is kind of chain of thought.
And what this paper is saying is that it is possible to get chatbots, LLMs, to do chain-of-thought reasoning without giving them examples or telling them to do so in the prompt. The way they do that is they investigate the decoding process, the process by which you generate the output after giving it the input. Instead of doing greedy decoding, where you just take the most likely token each time, they show that by using the top-k alternative tokens,
so other paths that the LLM could go down, it is possible to find chain-of-thought paths of output that are inherent in these sequences. So basically the claim is that language models are already inherently capable
of a
chain of reasoning, or reasoning in their output, if you decode the output in the right way. And the confidence in the final answer increases when a chain-of-thought type output is present in the decoding path, which you can leverage to create this chain-of-thought-specific decoding.
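For the curious, here's a rough sketch of what that kind of branched decoding could look like. This is an illustrative approximation of the idea, not the authors' code: the model (gpt2 as a small stand-in), the branching only at the first token, and the average top-1 vs top-2 margin as a confidence score are all simplifications. The paper scores confidence over the answer tokens specifically, not the whole continuation.

```python
# Rough sketch of "chain-of-thought decoding": branch on the top-k first
# tokens instead of decoding greedily, finish each branch greedily, and
# prefer branches where the model is most confident as it decodes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the paper uses much larger models
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

@torch.no_grad()
def branch_decode(prompt: str, k: int = 5, max_new_tokens: int = 40):
    ids = tok(prompt, return_tensors="pt").input_ids
    first_logits = model(ids).logits[0, -1]
    top_k = torch.topk(first_logits, k).indices           # k alternative first tokens
    branches = []
    for t in top_k:
        seq = torch.cat([ids, t.view(1, 1)], dim=-1)
        confidences = []
        for _ in range(max_new_tokens):
            probs = torch.softmax(model(seq).logits[0, -1], dim=-1)
            top2 = torch.topk(probs, 2).values
            confidences.append((top2[0] - top2[1]).item())  # top-1 vs top-2 margin
            seq = torch.cat([seq, probs.argmax().view(1, 1)], dim=-1)
        text = tok.decode(seq[0][ids.shape[-1]:])
        # Averaging the margin over the whole continuation is a simplification.
        branches.append((sum(confidences) / len(confidences), text))
    return sorted(branches, reverse=True)                   # most confident branch first

for score, text in branch_decode("Q: I have 3 apples and buy 2 more. How many apples?\nA:"):
    print(f"{score:.3f} {text!r}")
```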
Yeah. I'm curious if you can think of concrete applications of this. I think it's interesting in its own right and worth consideration for that reason. It's just that there are some caveats here.
So they'll say, for example, it's generally, but not always, the case that the model will be most confident in its final answer if it takes a reasoning trajectory, as you said, that is associated with chain-of-thought prompting, or sort of where it autonomously or spontaneously decides to do chain of thought. But that's not always the case. It's inconsistent, and chain-of-thought
prompting is also apparently not the most common of these reasoning trajectories that it ends up going through. And so as a result, you can't automatically sift through and pick out the chain-of-thought one using techniques like self-consistency, which sort of look at which of these approaches comes up the most and is most self-consistent. There's just too much diversity in the different reasoning strategies that come out at that level.
And this approach also would require getting the model to fully generate all of these outputs, and that's pretty expensive from an inference standpoint. You're running the model many, many times all the way through. And so at a certain point you're kind of reduced to just doing an ensembling approach, really. That's, to me, what this looks like: it's an intelligent ensembling approach.
They're coming up with heuristics to make it more likely that we can pick out the chain-of-thought one. But then the last challenge that they flagged here, too, was apparently this works best for simple problems that are more similar to the problems that were in the training set explicitly.
But you are still going to need standard chain of thought prompting if you're going to take on more challenging problems, because you kind of need to be in teaching mode a little bit more and help the model out.
So yeah, I think it's academically interesting, because we're learning that bubbling up to the top in a lot of these suggestions is the LLM autonomously kind of going, oh, I want to try this. But it's not always, in fact not often, the first thing that it'll try, and there are all these kinds of issues behind the scenes that, at least to my mind, might make it a little hard to use this in practice.
Yeah, I agree. I think this is more of an interesting result, and something that by itself isn't, let's say, a game changer. You can just prompt the model to do reasoning or fine-tune it. But you can use this insight and build on it, as you do in so many cases in AI research. And potentially this could impact how you create fine-tuning datasets, or when you have LLMs evaluate other LLMs.
I think having a better understanding of the space of things you can do while decoding is very useful. Yeah. And now onto a lightning round, where we're going to try to move fast through a few papers.
We're always doing great at that.
Yeah, we are really good at that. Well, first paper: OS-Copilot: Towards Generalist Computer Agents with Self-Improvement. So this is exactly what the title says, a framework to build generalist agents capable of interfacing with various elements of an operating system, including, I guess, everything on a computer, not just the operating system internals but also the web, code terminals, files, media and third-party applications.
They use that framework to create FRIDAY, a self-improving embodied agent for automating general computer tasks. And on this benchmark called GAIA, a general AI assistants benchmark, FRIDAY outperforms previous methods by a good deal. So yet another example of people working on agents and on automating workflows on a computer in a general way.
Yeah, I think the GAIA benchmark is one that we talked about a fair bit before. It was basically this attempt to come up with benchmarks that are hard enough that current language model agents struggle with them, and they ended up having these three different levels of tasks, level one, level two, level three, level three being the most challenging.
And basically, previously all language model agents would just flop at that level of task. FRIDAY, this particular framework, achieves a success rate of 6.12% on it. So that's kind of cool, and certainly an indication that maybe we're getting a little bit of lift-off now at that higher difficulty tier. But yeah, interesting result. Another new round of agent architecture.
I feel like we're seeing another paper like that every week or so these days, actually more than that. But definitely a big step forward, and certainly that push on level three tasks, at least to me, seems like one of the more impressive things we've seen so far.
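For readers wondering what "interfacing with the operating system, web, terminals and files" means mechanically, here is a generic sketch of the kind of tool-dispatch loop these agent frameworks wrap around an LLM. This is not OS-Copilot's actual API; the call_llm function is a hard-coded stand-in for a real model call, and the tool set is just an illustration of the pattern.

```python
# Generic agent loop: the model picks a tool (shell, files, ...), the framework
# executes it and feeds the result back until the model produces an answer.
import json
import subprocess
from pathlib import Path

def run_shell(cmd: str) -> str:
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def read_file(path: str) -> str:
    return Path(path).read_text()

TOOLS = {"shell": run_shell, "read_file": read_file}

def call_llm(history: list[dict]) -> dict:
    """Stand-in for a real LLM call: a real agent would send `history` to a model
    and parse its tool choice. Here we hard-code two steps just to show the flow."""
    if len(history) == 1:
        return {"tool": "shell", "arg": "echo hello from the agent"}
    return {"answer": history[-1]["content"]}

def agent(task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = call_llm(history)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["arg"])      # execute the chosen tool
        history.append({"role": "tool", "content": json.dumps({"result": result})})
    return "gave up"

print(agent("Say hello via the shell"))
```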
Next paper: World Model on Million-Length Video and Language with RingAttention. So the idea is to show that you can train a transformer to be effective on context lengths of 1 million tokens, similar to what we began with, Gemini 1.5 Pro. In this case, we actually have a paper that tells us how they did it. One of the tricks they had was ring attention, which is a technique for scaling up context size arbitrarily without approximations or overheads.
Secondly, they curated a large dataset of videos and language from books, and gradually trained this model with increasing context size, starting at 4,000 tokens and going all the way up to 1 million tokens. They open sourced a highly optimized implementation with ring attention and other features, to let other people build on this long-context transformer that they trained.
Yeah. To me, the ring attention piece is really the highlight here. This seems to be what they're using to achieve these absurdly long context windows. Pieter Abbeel, a pretty famous UC Berkeley researcher, was behind the original ring attention paper. This is kind of like a standard transformer, but it has a fancy way of passing off the keys and values.
So basically, keys and values are two intermediary things that you have to calculate in the process of generating the output of a transformer. It's this fancy way of passing those values along between multiple devices that are set up in, well, a ring-like structure. And by doing that, the details are somewhat technical unfortunately, but they're able to achieve these really, really long context windows.
In principle, theoretically, they can go up to infinite length; they're limited basically just by the number of those devices or cores that you can stack together in that way. So yeah, I think it's a really important new development. Ring attention is something that I'm going to be paying a lot more attention to going forward. And very interesting that they've been able to pull this off. It's also kind of weird that it's coming out at the same time as
Gemini 1.5. It sort of makes you wonder a little bit what's under the hood there and how that might relate. But yeah, a very, very interesting new breakthrough.
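To pin down the ring idea, here's a toy, single-process simulation: each "device" owns one block of queries and one block of keys/values, and the key/value blocks get rotated around the ring so every query block eventually attends to every KV block without any one device holding the whole sequence. Real ring attention overlaps the communication with blockwise compute and uses a numerically stable streaming softmax; this sketch just recomputes the softmax over concatenated scores for clarity.

```python
# Toy single-process sketch of ring attention's data movement (not the real,
# overlapped, streaming-softmax implementation).
import torch

def ring_attention(q_blocks, k_blocks, v_blocks):
    n = len(q_blocks)                          # number of devices in the ring
    outputs = []
    for i in range(n):                         # work done "on device i"
        scores, values = [], []
        k, v = k_blocks[i], v_blocks[i]
        for step in range(n):                  # KV blocks rotate n times around the ring
            scores.append(q_blocks[i] @ k.T / k.shape[-1] ** 0.5)
            values.append(v)
            j = (i + step + 1) % n             # "receive" the next block from a neighbor
            k, v = k_blocks[j], v_blocks[j]
        attn = torch.softmax(torch.cat(scores, dim=-1), dim=-1)
        outputs.append(attn @ torch.cat(values, dim=0))
    return torch.cat(outputs, dim=0)

# Sanity check against ordinary full attention on one device.
seq, dim, blocks = 16, 8, 4
q, k, v = torch.randn(seq, dim), torch.randn(seq, dim), torch.randn(seq, dim)
ring_out = ring_attention(q.chunk(blocks), k.chunk(blocks), v.chunk(blocks))
full_out = torch.softmax(q @ k.T / dim ** 0.5, dim=-1) @ v
print(torch.allclose(ring_out, full_out, atol=1e-5))  # True: same math, different layout
```

The "infinite in principle, limited by device count" point above falls out of this layout: adding more devices to the ring adds more blocks of context without increasing per-device memory.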
Right. I think this is highlighting that the larger-context trend is, I guess, still ongoing. It's been ongoing really for the last year and a half, where at one time Claude having a 32,000 token context, I think, was really impressive. And now we're going all the way up to a million, which is of course essential for having general purpose AI and so on.
So it makes some sense to me that they're coming out close to each other, and it's pretty cool to see them open sourcing a fine-tuned version of LLaMA 2 7B. So this model that they release with long context windows is a 7-billion-parameter model that others can use. Next story: Amazon AGI team says their AI is showing emergent abilities. That's the headline. So it turns out that Amazon has an AGI team that does research and works towards AGI. This is something I just didn't know.
Good for Amazon.
Yeah, I mean, I was not aware, or maybe I was, but they have created a new model called Big Adaptive Streamable TTS with Emergent abilities, so BASE TTS. And as per the headline, it says that there are some emergent abilities that it wasn't trained on. So this model was trained on 100,000 hours of public domain speech data, mostly English, and is presumably really good at text to speech as a result. And the specifics for this emergent stuff basically have to do with pretty complicated aspects
of text to speech. So they have some sentences that include foreign words and, you know, @ signs and hashtags and things like that. And apparently BASE TTS was not explicitly trained to deal with foreign words or punctuation or things like that, but still was able to do pretty well. So it kind of picked up the ability to create speech even for things that it hasn't seen.
One of the big take-homes of this thing is actually that Amazon has an AGI research team. Like I was joking about earlier, I vaguely remembered this, I think we might have actually touched on it in a past episode, but we haven't heard much from this team. This seems to be one of the sort of first results that we're seeing. They do have a bunch of audio samples that you can listen to. I just listened to a couple just now and they're,
yeah, they're good. Definitely a solid sort of text to speech model. And interesting that their AGI team is starting with that focus. I'm not sure if that's a commitment to a certain view about the value of audio as a path to AGI, but we'll just have to see what they come out with next.
That is it for research. Moving on to policy and safety, starting with the story: Hackers from China, Russia and others have used OpenAI's systems, according to a report. This is research by OpenAI and Microsoft, and they say this is some of the first documentation of hackers with ties to foreign governments using generative AI in their attacks. The attackers were using AI in relatively mundane ways: drafting emails, translating documents, debugging computer code.
And as you might expect, OpenAI and Microsoft have said that they are working to disallow and curtail the use of their systems by these foreign hackers.
Yeah. And it's an extension of a partnership with Microsoft Threat Intelligence, which is interesting because it reminds us of that close partnership between Microsoft and OpenAI, which, both Microsoft and OpenAI hasten to tell us, does extend to safety and security. And so they have a bunch of different sort of scenarios, or, what am I trying to say, a bunch of different vignettes or examples that they're sharing in this post.
And then Microsoft's post as well goes into a little bit more depth. They look at two different China-affiliated threat actors who apparently tried to use OpenAI's services. They give them code names that are really cool, like Charcoal Typhoon and Salmon Typhoon. There's an Iran-affiliated threat actor called Crimson Sandstorm, a North Korean one called Emerald Sleet, and a Russian
one called Forest Blizzard. So kind of cool if you're into the fancy code names. Essentially, as you said, they're trying a wide range of different things. It was useful, I guess, for Microsoft Threat Intelligence and OpenAI to watch them, to kind of let them use the service a little bit and see what sorts of things they're after.
It was interesting because Microsoft's post went into a little bit more detail about who these actors are and what they have tended to do. Charcoal Typhoon, that sort of Chinese state-affiliated one, they talk about it having a broad operational scope, targeting sectors like government, higher ed, comms, oil and gas. So very, very broad, whereas Salmon Typhoon, the other Chinese state-affiliated one, seems a lot more sophisticated.
They have a history of targeting U.S. defense contractors, government agencies, cryptographic tech companies, that sort of thing. And what they were doing is they were using these OpenAI models to translate technical papers, which is kind of interesting, get publicly available information on intelligence agencies and regional threat actors, get help with coding, and also research common ways processes
can be hidden on a system. So we're seeing definitely more of a veering into the malware, cyber offense dimension. Anyway, there's a bunch more really interesting information. Last one I'll mention is Forest Blizzard, the Russian one. Apparently this is actually linked to GRU Unit 26165. It has targeted victims of both tactical and strategic interest to the Russian government and has been active in the context of Ukraine.
So the GRU is Russia's military intelligence agency, a counterpart to what back in the day would have been the KGB. So the GRU here, also getting in on the action. And kind of interesting, apparently all of these accounts, by the way, have been shut down, so no surprise there. But yeah, sort of an interesting bit of news and some cool transparency, I guess, from Microsoft and OpenAI, sharing a little bit about what's been going on under the hood.
That's right. Yeah, you can go to these releases by both of them to get into the details and see how they're tracking these actors. But I guess the takeaway is, so far the hackers are mostly just doing mundane stuff with chatbots like we are, and are not somehow becoming super hackers just because they have access to ChatGPT. Next story: House leaders launch a bipartisan artificial intelligence task force. So the House has been doing some stuff related to AI for the past year.
We've seen forums on AI, we've seen some bills starting to come out related to deepfakes, and now House leaders, Speaker Mike Johnson and Minority Leader Hakeem Jeffries, are launching this bipartisan AI task force. The task force will be looking into how the US can support AI innovation and study potential threats, and will release guidelines, policy proposals, all those sorts of things.
And it will have 24 members, led by Chairman Jay Obernolte and Co-Chairman Ted Lieu, both of whom have computer science backgrounds and have previously talked about AI. I found that detail pretty interesting, you know, on the leadership front there.
Yeah, for sure. Obernolte is sort of famous for owning a video game development company as well, so a kind of technical guy; he's got a master's specifically. He also has historically been less concerned about the sort of alignment risk, the catastrophic risk potential from exotic AI accidents, that sort of thing.
So it's interesting and useful to get this balance of a more sort of free market, libertarian perspective and then Ted Lieu's perspective; he has been more of a hawk generally on AI overall. Though I'm
struggling to remember if he's actually looked at catastrophic risk from AI alignment. That's something that I think really ought to be in the conversation here, especially as we enter, or start to think about entering, the spring and then the summer, when, if there's going to be a bill that'll go through the House and Senate before the election, that's probably going to have to happen fairly soon.
So this is clearly part of Congress trying to wrap its arms around this very complex issue. And yeah, it's good that there are technically informed minds at the table. I think one of the big risks that you run into as well is that we all can index a little bit too much, potentially, towards our past experience. And I find this often with folks who did kind of CS
stuff, AI stuff from back in the day. You know, the sorts of strategies that worked back then had limitations that the strategies that work now don't. I'm sure that these folks are tracking that, but it can tend to bias you towards thinking that things are moving perhaps more slowly. Like, we see the limitations; we don't necessarily see the
capabilities. So hopefully one of the things that will happen here is they'll kind of canvass around for a wide range of opinions on where the field might be going, to account for the fact that a lot is unknown. Like, we do not know how fast stuff could move, and that probably means we ought to have some chips bet on the possibility that things could move fairly soon. That's the kind of possibility you don't want to be
blindsided by. You want to have some kind of legislative muscle in place to deal with that possibility. So a really interesting cast of characters, a very well informed group of people, and hopefully they make the right calls.
Right. And quite bipartisan; it's a real mix of Democrats and Republicans on this thing. And we've seen that, at least in AI, there are some things that can be bipartisan, like regulating deepfakes. So I wouldn't be surprised to see this task force actually come to some agreement regarding aspects of AI regulation, even if, as is typical in the US, Democrats and Republicans will have pretty wide divides on some of the issues.
All right, onto our lightning round. We have: Your fingerprints can be recreated from the sounds made when you swipe on a touchscreen. Chinese and U.S. researchers show a new side channel can reproduce fingerprints to enable attacks. Okay, I'm all out of breath, and that was basically the entire paper in the title.
It is now February 2024, and you're probably asking yourself, how come we haven't yet run into an AI breakthrough that allows your fingerprints to be reconstructed based on the sound they make when they slide on a screen? Well, Tsinghua University, which again, I hasten to remind you, has a PLA affiliation, and the University of Colorado teamed up to make this breakthrough happen. It's not a 100% effective thing. This attack allows you to,
it turns out, attack about 28% of partial fingerprints and about 9 or 10% of complete fingerprints within five attempts, just based on the sound of your finger as you move it across the freaking trackpad. This is pretty insane to me, but I think it's just a reminder of how crazy easy it is to gather information and recreate data about your environment with very little information.
And we're moving into a very interesting information environment where monitoring, intelligence gathering and all that stuff is going to be a hell of a lot easier.
That's right. This is not a huge kind of AI leap here, not some end-to-end model they created. It's rather a whole little system where there's a series of algorithms, different types of pre-processing, and steps for understanding the raw audio signal that they put together and got to work pretty well. So I guess a little warning, because if you do try to actually train a model end to end, from the raw audio to the fingerprint or whatever you want to attempt, maybe you could do even better.
So just FYI. Next story, this one is from Axios, and it is simply that states are introducing 50 AI-related bills per week, and this is in the US, of course. So they just cover some details as to the state of bills being introduced in US states and highlight that there's a lot going on. As of February 7th, there were 407 total AI-related bills across more than 40 states in the US, and that's up from just 67 bills a year ago. States introduced 211 AI bills last month alone.
33 states have election-related AI bills, and so on. So yeah, as per the headline, there are now 50 new AI-related bills per week throughout the states. Some states have most of them: New York has 65, California 29, Tennessee 28, Illinois 27, and some other ones are also introducing them.
And up next we have, finally, my home country actually making the news: Air Canada found liable for chatbot's bad advice on plane tickets. So Air Canada is our, you know, national airline, and they've been ordered to pay compensation to a grieving grandchild who claimed that they were misled into purchasing full-price plane tickets by an ill-informed chatbot. This is where things get weird. So the airline actually tried to, like, separate itself from the crazy shit that its chatbot said.
And I'm gonna say this all Canadian-like, because I'm guessing that this is what they sounded like when they said it, but they said the chatbot is "a separate legal entity that is responsible for its own actions." That's what this thing is, a separate legal entity, they claim, that is distinct from them. And so who knew, right? This thing is an independent agent. So this led to a decision, perhaps unsurprisingly, coming from a tribunal that said,
look, this whole argument basically does not fly: while a chatbot has an interactive component, it is still just a part of Air Canada's website. It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot. And then they added, "I find Air Canada did not take reasonable care to ensure its chatbot was accurate."
Interestingly, this implicitly introduces a legal requirement for AI alignment, right? Like, what is reasonable care for AI alignment? What is an amount of care that is reasonable to ensure that this chatbot will give true, correct outputs? So I think this is really interesting. We're going to see a lot more stuff like this, obviously.
But it was also kind of interesting because of the weird legal argument that Air Canada tried to put forward here, you know, a separate legal entity; that's interesting for a chatbot. So we're playing out a bunch of science fiction plotlines and figuring out what's human and what's not, I guess.
That's right. And specifically, the chatbot claimed you could get essentially a full refund, and in this case you couldn't, and the person was awarded $812 to cover that difference. And the last story for this section: the FTC warned about quiet ToS changes for AI training, ToS being terms of service. So the warning was that companies might be tempted to resolve the conflict of wanting to turn user data into AI training fuel.
And the FTC stated that, yeah, they might simply change the terms of their privacy policy so that they are no longer restricted in the ways they can use their customers' data. The FTC post actually says this: to avoid backlash from users who are concerned about their privacy, companies may try to make these changes surreptitiously, but market participants should be on notice that any firm that reneges on its user privacy commitments risks running afoul of the law. So here you go.
The FTC is saying don't secretly change your privacy policy to make money from the data. Don't do it. That's not okay. And apparently Zoom did this in August of 2023: it updated its terms of service to clarify that the company can train AI on user data with no way to opt out.
Yeah. And then they had a commentator, an analyst, who stepped in and said, you know, maybe it's not as bad as that, looking at the Zoom change and now this Google change. They're saying, at least in the case of Zoom, if it was done quietly, it was likely because the change wasn't material; it was just stating more explicitly something that the company had already retained the rights to do.
So maybe this is a more kind of innocuous play, but certainly the FTC is coming in and saying, hey folks, you're going to play nice now, right? This is a bit of a shot across the bow, so we'll see if people actually heed the warning.
All righty. And on to synthetic media and art. This time we have only one story in this section, and it is that Sarah Silverman's lawsuit against OpenAI was partially dismissed. This is a California court, and as per the headline, it has partially dismissed this lawsuit against OpenAI by Sarah Silverman and several other authors. The lawsuit made six claims, including direct copyright infringement, vicarious infringement,
and some other ones. OpenAI had requested the dismissal of all of those except for direct copyright infringement, and the judge has dismissed four of these different claims. So now it's down to just two of them, which are unfair competition and direct copyright infringement. So I guess a narrowing of the scope of the lawsuit, less for the authors to argue as to what OpenAI is liable for in this case.
Yeah. And as always, as we're keen to say, hashtag not lawyers. But here is the reasoning that the judge in this case, I'm going to butcher the pronunciation if I try, but I guess I'll try, Judge Martínez-Olguín, essentially gave. She said, look, I'm not convinced; there's one allegation that OpenAI was intentionally removing copyright management information,
which would be things like the title and the registration number for these documents, these books, and she wasn't persuaded by that. She also said it's not really clear that the authors had proven economic injury, because nowhere in their complaint were they alleging that the defendants reproduced or distributed copies of these books. And this is an interesting threshold to set,
right? Like, okay, apparently the way you show economic injury, the sort of justification for this part of the lawsuit at least, is that there's got to be a fully distributed copy of this material. That's a pretty high bar, and something that you can imagine being gamed pretty easily as well by LLM companies. Like, if you have a classifier run over the system and go, oh, I might be reproducing verbatim a chunk of text, okay, I'll just add a word here,
and now it's no longer verbatim. Now, that may be too facile, but also, apparently the court decided that the claim of risk of future damage to intellectual property was too speculative.
And that's also interesting, right? Because you can imagine comedians, let's say Sarah Silverman, Dave Chappelle, these folks: you train these models on the data that they've produced, their kind of collected works, and then the model can go out and do a Sarah Silverman monologue or a Dave Chappelle monologue. Arguably that is economic or intellectual property damage of some kind, but apparently that's too speculative at this stage for the court to
consider. I'm not sure what that does to the precedent-setting side, given that they're couching this in "well, it's too speculative." Maybe that gives them the opening to not have this affect precedent too much. But it definitely does start to, as you said, constrain the set of things that you actually can sue these companies for. At least it starts to set that precedent.
Definitely, yeah. So this is one of many lawsuits that are ongoing, as we've covered over the last few months; there are also different lawsuits against text-to-image companies, whereas this one was specific to authors of text. The authors can file changes to their original complaint by March 13th, and the main complaint, that ChatGPT directly violated their copyright, remains on the table. So we will still be seeing this go forward, and I guess it will still be interesting to see where it goes.
And now on to the last section, the new last section, which is just fun or miscellaneous, where we can include whatever we feel like. So from my end, the first one I picked was this Visual Guide to Mamba and State Space Models. This is a really nice write-up by Maarten Grootendorst on Substack, where he just goes through the details of the architecture and explains the various pieces. It does take a while to get through conceptually.
It is built on some of these control theory concepts, it has some hardware-aware optimization in there; it's just a mixture of various elements that are rather technical. So I, to be honest, still haven't tried to fully get all the details in there, but I now have a general grasp of what's happening thanks to explainers like this. It's a nice step-by-step read to go through, and it is, yeah, I would say kind of interesting, at least to get a general picture by reading something like this.
100%. I mean, the illustrations are so good, and these sorts of things are worth their weight in gold, right? So often we think we understand something, and then, you know, there were a couple of places where this really changed my way of thinking about it. It was like, oh wow, okay, this is a kind of nicer way than the equation-based approach that sometimes is the default, especially when these things have just come out,
right, and all we have is the paper. So yeah, really, really nice resource. And next, the one I picked is called Cellular functions of spermatogonial stem cells in relation to JAK/STAT signaling pathway. And if you're wondering to yourself, I thought this was an AI podcast, why are you talking to me about sperm stem cells? Well, I would too, in your shoes.
But if you click on the link and you go to the paper, what you will find is that this is a retracted paper and it's retracted for a very interesting reason, because if you click on the actual images that are in the paper and you zoom in real close, what you will find is that they contain very nice pictures, you know, of cells and stuff.
I'm not a biologist, blah, blah, blah. But when you look closely, you find that the text on those images has weird spelling shit going on, almost as if those images were generated by an AI, almost as if this is complete confabulation. In fact, that is exactly what seems to be going on here. This paper is riddled with AI-generated images, and it's not clear if the text is or is not, but it has been retracted. It's from two different institutions.
The researchers are from the Department of Spine Surgery at Hong Hui Hospital, Xi'an Jiaotong University in China, and another spine surgery department in China. So really kind of a big ding to this particular journal, which is one of the Frontiers journals, which I think is actually a decently well-known publisher, and I've heard of them before, not sure where or in what context. But anyway, this is a real ding for their reputation, and very surprising that, like, peer
review didn't catch this. It seems weirdly obvious.
Yeah. So this was covered, it kind of was in the news. I wasn't aware of the paper title, so at first I didn't know where you were going with this, but there have been media articles about this. For instance, one titled "Scientific journal publishes AI-generated rat with gigantic penis in worrying incident."
I didn't mention that part, by the way, because this is a family show.
But that was in the news, and I knew of that story; I didn't think it would fit in any section, but now I guess this is where it goes. So there you go. For reference, some journals are less reputable than others. Peer review can be kind of broken, especially if you go for a journal that is more sort of, you just pay and most papers get in. Maybe that was the case here.
So I wouldn't say this is necessarily a worrying sign for all of science, that we're going to start to get more of these kinds of ridiculous incidents. But yeah, kind of a fun story to be aware of. And just a couple more; I have one more from my end. The story is that Helen Mirren ripped up an AI-generated speech at the American Cinematheque Awards.
So that's it. Helen Mirren was accepting a lifetime achievement award, read out a very generic typed speech, and then said that it was AI generated and proceeded to tear it up and let the pieces fall to the floor, and that was met with applause and cheering. So yeah, a little bit of a sign that there is growing backlash towards AI in the creative industries. I mean, this has already definitely been the case with text to image, but I'm sure this will be the case for authorship as well.
And here is a very kind of clear sign of it.
Yeah. And as if on cue, the next story is Microsoft's game-changing Super Bowl ad, which basically goes like, hey guys, I know everybody's really freaked out about, you know, AI is going to destroy the world, take your jobs, or steal your children, kidnap them and sell them back to you for money that it can then use to train more of itself, because it got more... anyway, all that. But Microsoft goes, don't worry, we're here to make your dreams happen with AI.
That's the reframe that they're going for. This is their big Super Bowl ad. They start the ad with a bunch of people talking about all the different ways that their lives are not going the way they want them to, things like, they say, I'll never open my own business, or get my degree, or make my movie, or build something, all these things. And that's the first 30 seconds of the ad.
And then later, Microsoft basically taps in with the Copilot AI bot, which goes to all these users and responds to them and goes, yes, I can help you. Unless the request involves opening the pod bay doors, in which case, like, no, I won't do that. Anyway, so this is kind of an interesting ad. It's Microsoft really trying to push back on this kind of cultural fascination with AI being a source of significant risk
and anxiety. And they're trying to do, in some ways, what Apple did back in the, apparently, 1984 Super Bowl ad when they unveiled the Macintosh: they basically took this Orwellian dystopia with a bunch of zombies that gets liberated by a projectile from a revolutionary sprinter. Anyway, it was an attempt to kind of rouse people out of thinking of technology as this big bad thing.
Here Microsoft is trying to do the same thing, and quite clearly kind of stepping on Apple's turf a little bit with a bit of that "think different" vibe; there is that sort of subtext to it. Anyway, this article is actually really great and efficient at kind of walking you through it and providing a little bit of context. So I kind of liked that.
Right, yeah. I saw this ad when it aired, which was now a week and a half ago as of this recording, and the way I saw it,
it's a
little hammy, this whole thing of, oh, they say, I can't do this, I can't do that, and then the answer is, well, AI exists, so you can. And if you read the YouTube comments and the general response to this ad, I think it was seen as kind of lame and not particularly inspiring. But to your point, I think it does show a desire to portray AI, or reframe it, as something great, as an enabler of human achievement and not a replacement, or something like that.
And with that, we are done with this episode of Last Week in AI. Once again, you can find the articles that we discussed here at lastweekin.ai, our text newsletter. You can also feel free to email us with any suggestions or feedback at contact@lastweekin.ai, or comment on YouTube or Substack or elsewhere, and we will be sure to keep an eye on it and reply. As always, we would appreciate it if you share the podcast or review it on Apple Podcasts or somewhere else.
It's always nice to hear feedback, and I guess to know that recording these ridiculously long episodes is something that people actually like. But more than anything, we do like to see that people listen to and benefit from these episodes, so please do keep tuning in.