Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our last week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your host, Andre Karanka of. I finished my PhD at Stanford last year and I now work at a generative AI startup.
And I'm your host, Jeremie Harris. I'm the co-founder of Gladstone AI, which is an AI national security company. And yeah, this is I think this is one of those weeks, Andrey, where the news isn't numerous, but it's concentrated, right. We've got like a couple of really intense stories here.
We got a couple of big ones for sure, and just a splattering across various kind of themes. We've been seeing a pop up as we'll get to before that, though, it's my turn to have a random thing to mention at the beginning of a podcast. So, going back to my days at Stanford, there's, professor Jerry Kaplan, who has a new book out as, similar to Jeremy, some of your friends. This new book is a generative artificial intelligence. What everyone needs to know just came out.
Jerry has been a long time supporter of last week in AI and, all this stuff. So it was really cool to see him come out of this. And it's a really, yeah, nice kind of general primer on generative AI, artificial intelligence, all those topics very, easy read that covers the whole spectrum of topics there. For anyone who I guess is in least listening to this podcast or weekly and all way too deep and all this stuff. I think it's a good, resource.
Well, that's amazing. Is it? Was he he was a prophet years like, actually in in lectures or.
Yeah, he teaches a courseware, sort of overview on the state of AI with regards to ethics and regulation and various concerns like that. So nice. Yeah. He's been kind of in this space for a while, aware of AI, kind of involved in conversations at the Stanford AI lab and stuff like
that. So yeah, he has been looking at it closely, I guess, as much as we have been, so to speak, for a while and a second aside, before we get to a news is we haven't done a shout out to a review in a little while, so I wanted to do that. I happened to see on Apple Podcasts where a new one from, reviewer named Louise that says we have a top AI pod and the AI pod diverse, which is, quite the compliment, in the big AI pod versus, you know, to be the top is quite to honor.
So thank you for that compliment.
Excellent taste. Louise.
Yes. And as always, we do appreciate reviews. So if you do a Joe podcast, we would enjoy seeing you, chat about it on Apple Podcasts or elsewhere. And one last thing before we get to news, we do need.
To.
Do our business. Oh yes, as we have been for the last while. And once again, we are sponsored by the Super Data Science Podcast, which is one of the top, podcasts globally for technology. They are above us, by the way. And yeah, there's no need to compete here.
There's no I mean, there's no need, you know, this is there's no yardstick, there's no yours beside like look, according to Louis, right, we are the number one in in this particular podcast. Now, I don't know if it's the same podcast versus but yeah know I think super.
Down have like us is in data science. We do also include machine learning AI and data careers. So where scope is a little bit broader, you could say they cover kind of what happens with people and people's experiences and, insights from the industry as well as from, elsewhere. Whereas we just cover news. It's hosted by John Crone with chief data scientist and co-founder of machine learning company Nebula. And another offer of, really good book deep learning illustrated.
So yes.
Quite the expert on a lot of stuff. Having now interviewed more than the 700 people in the data science and AI space. You know, this guy knows what he's talking about. So it's a cool podcast. We love to have them be a sponsor and to plug for them, consistently just because we are fans as well. So do check them out. If you are shopping for a new podcast in the AI or Data Science part verse, and we've had enough asides, let's get into news starting with tools and apps.
And the first story is of course dealing with Gemini and Google's of. Big. Oopsie. So this was, I think, the biggest kind of discussion piece. Over the past week. And, so we have to get into it. The quick summary is that Google's, image generation capabilities of Gemini were found to be a bit lacking in terms of generating people, because it seems Google's somewhat ham fisted Lee tried to make up for potential
bias in the model. When you train these image generation models, if your data set is skewed towards having, let's say, mostly white people, as is generally the case, then it might skew towards generating mostly white people. Even if you want to say just like a human being, and it should generate probably diverse people of various racism nationalities. And so it appears that Google had some tuning or some prompt engineering.
When you ask Gemini to generate people to always be diverse, in a rather silly way. So, in this article headline, what it is is that Google apologizes for missing a mark after Gemini generated racially diverse Nazis.
So even of Nazis of any sort of historically accurate prompts, if you ask it for the Pope, if you ask it for the Founding Fathers of the United States, it is always going to give you people of mixed race, even when it makes more sense to me, more skewed towards a particular race such as white people. And there are some other kind of quirks here where it always refused to generate specifically white people while not refusing to generate other races.
And this led to a big, who blew like a lot of people picked up on this, generate silly examples of Gemini doing, let's say, overly diverse outputs. And it was so bad, the, reaction that Google actually made it. So you can't generate people Gemini anymore in its image generation capabilities. And they issued an apology and said that they are now working on it and are hoping to, have a fix done ASAP. And their stock took a big hit. All because of this, let's say embarrassing.
Snap product feature. Yeah. Yep. I mean, I think look, I mean, to me, there are at least two sub stories here, right? So on the one hand, this is a good reminder of just how clunky and ham fisted and kind of underdeveloped alignment techniques are. Google had no way of just conveying to Gemini what its preferences were, what generally it wanted as a, as a class of output from the system. And so all they could really do. Excuse me, all I could really do was apply these, these very clunky
strategies. This is a bit of a warning shot about how hard it is to align these systems, how like, how much of this is really just playing a game of whac-a-mole where you're hunting down edge cases and then using very clunky cudgels to kind of beat down those edge cases? So there's that technical piece, but there's also kind of this cultural
piece, right? I mean, the fact of the matter is, this product was shipped and it was shipped at a company like Google, which, you know, isn't exactly no known at this stage for moving fast and breaking things, especially when it comes to AI. They're trying to change that, but they're trying to move faster. But, you know, this thing got approved all the way up that kind of chain of approvals. And it will be a massive change in a company
like Google and then shipped. And so at no point, presumably, was someone in a position to or empowered to stand up and say, hey, this is a really weird set of use cases, or it's kind of behaving weirdly in these use cases. You know, maybe, maybe some people might have but didn't feel they could. That's another separate concern.
But either way, it's very difficult to have like this significant of an outwardly facing, an outwardly visible sign of, of dysfunction, that that doesn't, you know, say something about the internal culture. It's really difficult to tell a story where you get this product shipped, where there isn't some kind of cultural issue in the back end. And so, yes, Google's come out with this blog post, right? They called it. The blog post was entitled Gemini Image Generation. Got it wrong, will do better.
But they talk about it as if it's a technical problem rather than a cultural problem within the company of trying to figure out like ways to test these systems, ways for people to feel comfortable stepping up, because that's another issue here, right? Presumably if you see the system is like, you know, generating consistent, consistently nonwhite people in context where it doesn't make sense, it's politically sensitive to stand up
and flag that in this environment. But presumably nobody felt comfortable doing that. And that's a really interesting challenge for them. Deal with. So a lot going on here. Technical cultural stories. Google playing kind of playing catch up here. But surprising. I would not have expected this to come out from from Google at this stage. Given again how how deep that chain of approvals presumably is.
That's right. And I think I was also surprised seeing this, given that it calls back to like a year and a half ago. Now this basically same thing happen of Dall-E two right from OpenAI. If you remember, there was a story where people discovered that Dall-E two, when you prompt it, it seemed to be adding some extra, extra instructions to a prompt of a user behind the scenes to add diversity exactly in the same way that this is
doing. It was saying, okay, make this image generation to be of a black person or this person female. And you do have to acknowledge that these companies need to do that to some extent. If you say, you know a photo of a human being and it only gives you photos of white people, that would lead to a whole other kind of backlash immediately. And so and in really as a product, you probably don't want it to always generate white people by default anyway. So you need to do something.
But what appears to be the case here is that they did something very hamfisted, potentially just altering a prompt similar to Dall-E two and giving us some instructions, where this is, you know, being fed into a margin to their image generation model, this isn't even part of a line or Gemini. Probably. So, yeah, I think the surprising bit is, is how technically flawed the approach was and how it seemed to be lacking in that testing to kind of catch us before it went out.
So you get a lot of people's image of Google, took a hit. As I said, their stock took a major hit because there was such a major backlash to this. Now I think they will fix it. It's possible to do is better, I think. I mean, OpenAI and Dall-E are probably doing something to make sure there is some diversity beyond the training set. This is something that I think many companies are dealing with. So Google will figure out a way to do that.
But, yeah, they, certainly regret putting this out in the current, iteration that led to all of us backlash.
Yeah. And I mean, I think the, the really deeply embarrassing bit about this is also or one of them is just that these prompts that reveal this weird behavior are it's not like they're adversarial, right? I mean, it's not like these prompts are carefully crafted to elicit like edge cases that are really challenging for the system. It's literally just like, hey, like, Gemini, make me like, show me a, you know, picture of the Founding Fathers or, you know, not the or Nazis in Nazi Germany.
And like, it's it's stuff that you would expect would have come out in testing. I mean, it is really surprising that the the system either wasn't tested like that or this wasn't flagged, but, but no, totally agree. I mean, this is, something that is, is fixable. At least, you know, they can get it to be better than it currently is for sure. Opening eyes, proving that. And Google certainly has the technical
acumen on board to make it happen. So just a matter of time until I think we see Gemini around two.
And one more quick thing before we move on, it is worth mentioning that there's been a whole other side to the controversy, a little bit more muted, but there's also been critiques of the Gemini chat bot that the chat bot had its own issues where you could elicit silly behavior. The most notable example I saw was someone asking, which is worse, Elon's, mimicking or I think Hitler what
Hitler did. And the chat bot would sort of equivocated say, oh, it's hard to say which is worse, and then go into, you know, hears criticisms of Elon tweets and here is criticism of Hitler. So that's another kind of pretty silly output.
And there was, actually for Google, a bigger deal where it was, asked, prompted to talk about criticisms of the Prime Minister of India and it, I guess, accurately cited that some people have criticized some policies of the administration of India as being potentially fascist with regards to religious discrimination that landed them in hot water with Indian government.
So even though I guess people in the text sphere and on the internet have been less up in arms or critical or kind of making fun of these aspects of a chat bot, that is another aspect of a headache that the, AI people at Google are now dealing with and having to kind of make up for. So, yeah. When you Google, you have to be, I guess aware of people will be critical and point to these things out. And moving on from that next story. Stability announces stable diffusion tree and next gen
image generator. So there you go. This is the next generation image synthesis model. Text to image stable diffusion of course, is kind of, major player in this space. We released the first stable diffusion I think in 2022 or so was major deal because a lot of players in the space then took up that specific model, improved on it, played with it. And yeah, a lot of the current movement behind texture image was, driven by these stable diffusion models. So stable, diffusion free comes out now.
And as you might expect, it is even better than stable diffusion to and other things like that. There are models ranging in size from 800 million to 8 billion parameters, and it has some of the benefits of other top of the line image generators, in particular, handling text generation well and just generally being prompt, faithful, high resolution, everything. You know, it's kind of similar to a lead between Dali two and Dali three, in a way where you saw even better handling
of complex prompts. A lot of very crisp, non AI looking text and things like that. So, didn't make a huge splash. But for people developing, I think this was a major new story.
Yeah. I'm really curious about what we've talked about this before, but stability is business model long term and how stable, how stable it's going to turn out to be. But one of the things that came out or jumped out at me on this was they do use diffusion transformers for this. And so we are seeing some measure of, I don't know if I call it like consolidation, but consistency around that architecture.
You know, this is we talked about it, I think the last episode when we're talking about these, the Sora architecture, but basically this is where, you know, you you chunk up your image or your video into patches, and you essentially, you train your model to operate on patches of those pictures that kind of, work like tokens. So you sort of, like, split up, you know, or chunk up your image.
Sorry, they don't work like tokens, but they become like the atomic ingredient of the image that you then, sort of, map into latent space that you, that you then sort of, do your processing on so you don't operate at the level of full images, but rather on patches. So kind of interesting that's happening here again. It's, you know, no longer are we seeing this review. Net architecture, which is maybe the more common, like, image level, network architecture that was being used.
Now we're looking at diffusion transformers across the board. And, so I thought that was kind of noteworthy. And we'll see where where that goes. But, yeah. Another model out of, stable that is stability. And it does compare favorably to stable diffusion.
Three. I just I'm wondering if we're hitting a point of diminishing returns where a company that just like, narrowly specializes in this area is going to find it, you know, increasingly difficult to compete with Sora, to compete with, you know, Gemini once it's all patched up and all that.
That's right. Yeah. If you see images side to side of Dall-E free and stable diffusion or Midjourney, for instance, they all look good, right? It's hard to say which one is better at this point. We all or a lot of them now handle text very well, which was the major bottleneck. And now it gets down to the nitty gritty of like, oh, this is draw hands. Well, right. Yeah. And stuff like that. Stable diffusion free isn't widely available yet. It's in the testing and preview phase.
But as usual, they do say they will release the weights of a model for people to use, once it is ready. So this will be another contribution to the open source space and another model that anyone can build their own app or application that uses image generation with.
And now on to our lightning round. We're gonna kick it off with mistrial. I always like to say it the French way. I'm sorry, Mr. Ole Miss. Try whatever you want to call it. Releases new model to rival GPT four and its own chat assistant. Okay, this is actually a pretty big story. They. So. So, Mr. Ole Miss is, of course, this French company. French, French, European, French company that, is famous for taking a very open source approach to developing, large language models and other scaled
models. They really do see themselves as a kind of European or French Open AI, and they've been very actively involved in lobbying the European government as well on AI regulation. They are just releasing their new model called Mr. Large. And the blog post that, where they announce the release, by the way, I thought was kind of funny. It's called goulash. Au large. So, anyway, in French that it's like at large, essentially. The translator. Right. So this is a pretty big new model.
It's got 32,000 tokens of context window, as a comparison to, GPT four turbo. GPT four turbo has a 128,000 token context video. So we are talking about a smaller context window, but within that it performs surprisingly well. They flash the MLU benchmark score of the model, on their blog post, showing how it actually outperforms Claude to, and somewhere between like cloud to and nipping at the heels of GPT four. This is an impressive model in its own right.
It's worth noting, though, that it is coming out a long time after cloud two and GPT four, right? So anthropic and OpenAI obviously have where presumably have more advanced, and sophisticated models internally still really impressive. Even though there are issues with this benchmark and ML, you, you know, it's got some issues. But, as a first pass, this is a really, really impressive bit of performance.
A couple of things to note. First of all, this is actually, as far as I can tell from reading the blog post, this is not being released as an open source model. This is kind of flying in, in the in the face of what, MySQL had been up to before, where their whole ethos was about, you know, we're going to be actually open source. Well, guess what? Now that we're getting to open high level performance roughly in that ballpark. Now we're closing up shop.
Now we're talking about, you know, how do we charge people for access. And so they they talk about three different ways that people can access the model. One is through the platform, which is their own infrastructure, which has they have their servers in Europe. And basically this seems like it's like OpenAI's API.
The other one is a deployment they're doing on Microsoft Azure, which, which itself is interesting because now you've got Microsoft partnering with MIT and with Mistral essentially to kind of the were competing with OpenAI, which Microsoft is also partnered with. So here Microsoft may be gaining a little bit of leverage, over OpenAI, even though, you know, again, this model isn't quite GPT four and it is
coming out quite a bit late. But in addition to the Azure deployment, they also have an option to self to deploy the model. So on your own environment for what they describe as the more sensitive use cases where, you know, you obviously don't want to be sending your data to them for processing. For that, all you, all you can do is really contact their team. So it's not like they're opening up their weights. You do have to pay. Or at least when I click through it kind of looked like
that sort of thing. So this is a pay to play system. It is an impressive system, but it does seem to be pay to play. And they are also at the same time announcing their own, kind of chat bot, sort of like ChatGPT. And they're calling it live chat and, very consistently French across the board here.
That's right. Yeah. And the chat is pretty much like your chat GPT otherwise it looks very similar, actually almost identical. And you can play around with yeah, very large model and their next model. Interestingly, they are undercutting OpenAI in price on this API. For very large model, it looks to be 20% cheaper than GPT four turbo. So, that'll be interesting if people do start using this because as you said, it appears to be maybe not quite as good as before, but in the same ballpark.
So Mistral or Mistral once again, putting out impressive work, putting them, pretty far ahead in the field as far as people playing to compete with OpenAI.
Yeah. I mean, I think it is worth mentioning, like when we talk about the Mistral is here right now, they're again, they're what is it, 5% or so behind, GPT four on NLU. And then there's a bunch of other benchmarks, like there's no, I'm just looking at the scores across the board here. I don't see a benchmark where they go head to head with GPT four. Let me just confirm that. Yeah, there's no there's no benchmark where they actually have a direct comparable to GPT four and they actually beat it.
So GPT four does seem universally better than this model. It's got a longer context window too. And you know, again, it is not going to represent the furthest extent of what OpenAI has got under the. So, you know, like, at this point, they're about a year late and, you know, a couple almost 100,000 tokens worth of context window, short. So the cheapness is really the dimension that they're competing on. They're trying to compete on price. That can be pretty, pretty precarious.
You know, you can light a lot of VC dollars on fire if you're just racing to the bottom on inference costs. And so, yeah, I mean, the question a big question in my mind is what's the the hardware that, Mr. Al is going to be procuring, like how much of that they're going to be able to get, how efficiently are they going to be able to run these systems, to compete with inference costs of models like GPT four and Clyde two, because that now is where they're positioning themselves.
And next up and actually, small story we can get through quick in this lightning round. The story is windows just got its own magic razor to AI modify your photos. So that is the story in the Windows Photos app. You can now do Magic Erase, which is where you just take out some aspect of image and fill in an AI generated version of a background that looks realistic enough, so that, you know, you can clean up your photos. So this is coming to Windows 10 and 11.
Yeah, that's the story of Microsoft continuing to expand the range of built in AI throughout windows and its various apps.
And we will remain on hot standby to hear what the response will be from Mr. Clean. Sorry, that was terrible.
All right. Next story Adobe previews a new cutting edge generative AI tools for crafting and editing custom audio. So this is actually coming from Adobe Research. And this is an early stage of generative AI music generation and editing tools. It's called Project Music Gen AI control. An exciting title. And yeah, that's gonna allow people to generate and edit music from text prompts.
That's about the story. So this is still in preview, not, coming out as a tool yet, but it is presumably going to be coming out probably this year, given the rate at which Adobe has been developing stuff. And that would mean that users could put in prompts like powerful rock or happy dance to generate music, and then also edit that music. Also of text to say, you know, make this louder, adjust the tempo and things like that.
So another example of a mainstream tool and company releasing a pretty advanced, kind of AI capabilities.
Yeah, it makes me wonder as well, whether that are going to be looking at to extend the same policy that they had for images, image generation on Firefly, right where they said, hey, we'll indemnify you if you use these images to use our Firefly software to produce images that then, you know, you get sued for for copyright reasons, we will guarantee that, you know, essentially we own the copyrights or you own the copyrights to all that
material. You know, same question now with the audio generation, I guess. I mean, there's nothing I could see really in this post about that specifically, but I would imagine that's going to be another, another front in this, in this battle.
That's right. It's kind of fun. In the blog post office, we have a little video and you can actually see people typing in in a text prompt. We have a prompt for music. So clearly this is a research preview. We're running scripts and stuff here, but it'll be interesting to see if it does turn into a product. If Adobe continues to need to pack with those sorts of policies, like you said. And last story for a section I video was hit up as PCA adds lip sync powered by 11 labs.
The idea is you can make your characters in your videos talk and their lips will, move similar, I guess, to a video game where given some dialog, you move the Or you animate to face to make it look like a person speaking. We have, a little video showcasing how that works and, you know, not perfect. I'm guessing we're doing some sort of, like, thing on top of the AI to add this, lip syncing capability, but, regardless, it's ahead of a pack as far as what other
offerings exist. And this feature is limited for now in early access for Pika Pro users, which is the $58 per month subscription offering, with some other people getting an invite to try it out.
Yeah, and actually that, that, that charge, that business model essentially, I thought was really interesting because they do also flag it's billed for 12 months upfront at $696. And one of the things that made me think of it was like, man, I just don't know that in this new moment in the history of AI, it makes sense necessarily to be paying upfront for like for 12 months for access to tools like this, when, you know, like what?
Like what is the next? Like the next big breakthrough that Google or OpenAI is going to be going to make with exactly this kind of tech. And what are they going to charge for that? How fast will those prices drop? So I don't know. One of the things that really starts to make you think about is as the cost of generating this kind of content goes down, what do we see start to happen to subscription models like this? And do we see annualized billing be offered at a significant discount?
Because, you know, this isn't that's I mean, it's pretty pretty standard fare for these sorts of things. I think the value of this kind of software may atrophy faster just because competitors, if nothing else, will crop up, right. Like the base software, may improve. But, you know, compared to free offerings, compared to other, you know, offerings from other companies that may be charging less per month or otherwise. Yeah, that sort of thing may, may change really quickly.
So the idea of amortizing your, your, spend over a longer period of time, it starts to. Yeah. And it starts to get more complex, let's say.
And auto applications and business. First up, we're going back to mistrial. And the story is that Microsoft strikes deal with Francis mistrial. I as Jeremy, kind of previewed. So yes, there was an announcement of a partnership between these two companies, basically implying that to me, straw would be using some of the infrastructure that Microsoft offers as far as hardware and cloud deployment. We don't know, too much about this. The financial terms were not disclosed.
It does involve a small, small investment, which probably is millions of dollars at least, but small investment in me strongly. And, we're kind of noting this, given that Microsoft has been seeing some pressure, let's say, from regulators regards to its involvement with OpenAI, diversifying their partnerships with a competitor to OpenAI, I guess, is one move by them to hopefully get regulators, off their back.
Yeah. And actually, so digging into this, there was a story that came out very recently, that give a little bit more information about the context behind, the steal a team. So apparently it was a $16 million investment. So, Andre, you're exactly right. It was it was small but big, but but small. And one of the. So this investment, by the way, really, really weird to me, or at least it reads weird to me. Sort of. My, my startup senses are kind
of thrown off here. So, first of all, the valuation of, Mr. Owl is not actually going to change following this investment. That is somewhat unusual. You know, they had raised previously at a $2 billion valuation in December 2023. So, you know, we are we are now three months on, four months on, depending on, on how you count it. And so you might expect a bit of a bit of a delta in valuation. But it doesn't seem like that's been the case.
The other weird thing about this is that the, investment from Microsoft is only going to convert into equity in Mr. Shaw in the next funding round, and that's not normal. Like typically when you invest in a company at, series A, which is what this is, your, your dollars turn into equity into stock in that company, like immediately. That's the whole point of investing, instead of what's happening here is that once Mr. Al raises their next round, Microsoft's dollars will convert into stock.
At that point, at the valuation of $2 billion that Mr. Shaw most recently raised at. So in that way, this is actually kind of similar to, a financial instrument called the safe that startups tend to raise at much, much earlier in their lives. And I find this really weird. Like usually it's before you start raising series like price rounds, you have that kind of mechanism. So kind of weird. Bottom line is Microsoft is going to own less than 1% here.
So it is a really, really like small investment. It seems like the big strategic side of this deal is the partnership on cloud infrastructure, at least as far as I can tell. And on deploying and serving Mr.. Models on Azure. So I think that's kind of the main axis. This has all been done very like under the, under the what? Under the the cloak of silence. I don't know what to say. Microsoft basically had not been making a big
deal out of it. And yeah, it's hypothesized that this is because of all the regulatory attention that has been on the space from the European Commission. So hard to know for sure, but seems like a certainly an interesting investment, if a small one that, compounds the, the kind of partnership that's evolving between Microsoft in, in this role.
Next up, figure raises 675 million at 2.6 billion. Valuation and science collaboration agreement with OpenAI I figure is an AI robotics company. They are building, humanoid robots similar to what Tesla has been doing with their humanoid robot and several other companies. Yes. This is a series B funding, with investments from various big names in my. OpenAI and Nvidia, Jeff Bezos, and other
investors. And this collaboration agreement is a little ambiguous, but, seems like the idea is for Figo to use the models from OpenAI as sort of a brain of these humanoid robots to try and develop general purpose robotics, which is, of course, the dream of these sorts of companies. So another big win for figure as, a player and this developer of humanoid robotics space.
Yeah, absolutely. And, you know, maybe not so surprising to see this. I mean, we've we've heard is I think it's early as like 2021 with models like, say can. Right. We saw how language models can be used to control and orient robotic systems. And, you know, not too surprising that, that, OpenAI is kind of pushing in this direction with this partnership. They did initially, try to acquire Sega at one point.
They're now actually just joining. They're participating in this round of investment with a small $5 million investment. So, again, a case where the investment maybe doesn't quite speak to the full level of the partnership because it seems like there's a deeper technical integration that's happening here. But this is definitely like a, you know, who's who of everybody in this
space. I mean, my God, Jeff Bezos, investing kind of as an individual here separately Amazon also investing Microsoft 95 million investment in Nvidia and so on and so forth. So definitely a lot of support from the very kind of hardware savvy players. And, and from very, savvy AI model developers like OpenAI and, all consistent, of course, with that idea that, you know, scaling, and scaling, even language models are increasingly multi-modal.
Language models or multimodal models, is a possible path for robotic control, for robotic orientation in the world. And we'll just have to see I mean, kind of an interesting and interesting move. Apparently this new machine is going to be called or their first machine is going to be called figure zero one or figure oh one. So I guess we'll we'll stay tuned for, what I'm sure are going to be a whole bunch of videos of that prototype in action.
I've seen a couple already on Twitter, but, I'm sure there are more to come.
That's right. There's been some short clips showing the robot walking, picking stuff up, moving it. Still, you know, kind of so not like a human, but it does appear to have made a lot of progress. Relatively new company. Not that old, just a couple of years old. And this PR announcement will also said that that figure will leverage Microsoft Azure for AI infrastructure training and storage.
So I guess that's another kind of partnership or maybe unofficial partnership happening here beyond OpenAI and Microsoft getting another sort of win. And in terms of people relying on them for infrastructure with regards to AI compute.
And moving on to our lightning round in this, when we open with our standard disclaimer that this is not financial advice, Nvidia posts record revenue up 265% on booming AI business. This is about the fourth quarter earnings that, really blew past everybody's expectations of what earnings and sales might look like for Nvidia. And the shares unsurprisingly went up. They went up about 10%. Just on that news.
So, you know, maybe not the in a way, not the most surprising thing ever, at least for us, over here and for you over here at last week. And I if you've been listening, you know, it is clear that Nvidia is very much, the dominant player, not just in hardware today, but also, I mean, their position is pretty tough to a sale. They do have competitors popping up, no question.
But they're doing all kinds of things that, you know, make it quite credible that they may continue to to be market leaders in the way that they are. Obviously very hard to to read the tea leaves here, but, you know, super aggressive on purchasing and gobbling up all the capacity they can it at leading fabs like TSMC, you know, they've got incredibly fast cadence pumping out now.
They've moved from releasing a new, a new design, a new chip every two years to every year we've got, you know, the 200 coming out. We got to be 100, relatively soon, I guess will be coming out in, the next year or so. So a whole bunch of, of very rapid developments, you know, happening there. They know what they're doing. They're really well positioned. The market, it seems to be paying off. And, you know, the the market is only going to grow. So not too, too surprising.
Then this 265% is for, for revenue compared to a year ago. The other number not mentioned at the headline is the growth in net income. And this is crazy. The net income is up 769% here. They say they reported 12.29 billion in net income versus 1.41 billion. So yes, as usual crazy numbers coming from Nvidia. And they are crushing it.
Well and I think that's actually. That does reflect the overall scarcity of chips, right? Because what's happening is you're seeing, essentially way higher profit margins for Nvidia. So they're they're they're margin is actually growing because of the scarcity of the chips. So for each chip, they're able to basically charge way more than it cost to to make. So as a fraction, the revenues are increasing due to that demand.
So I think, you know, that probably probably starts to go down a little bit over time. Just because chip capacity, fab capacity that comes online in the next, you know, five years or so eventually starts to make it so that more players can play, potentially. But at least in the, you know, in the medium term, this does seem like a pretty, secular trend.
Next up, MediaTek's latest chipsets are now optimized for Gemini Nano to Manado. Is Google's on device AI model the one that comes built in to your phone? And the story is that it is now optimized to use of MediaTek's flagship chip with Dimensity 9300. So, collaboration here between Google and more of a hardware company and I guess meaningful in the sense that these on device AI models will be increasingly significant, probably in terms of the AI capabilities of whatever phone you're buying.
Yeah. MediaTek is, it's a fabulous semiconductor company. So they do, you know, designs mostly for all kinds of applications and usually mobile. You know, there were a lot of things in the mobile direction. So this is really an area of specialization for them. They're not, you know, there to make GPUs that compete necessarily with the H100 or things like that. They're more about the the end user on device stuff. And that is their in this case they're Dimensity, 90, 380, 300 chipsets.
That's what this is all about. The 8300 is a little bit, out of date now, but, but they are adding, sort of these optimizations around Gemini Nano for that too. And it is, I think, quite noteworthy. Noteworthy. Yeah, that they are explicitly working with Google. And we're seeing optimization at the level of specific models on these
chips. So, you know, not all those optimizations obviously are going to be, exclusive to this model, but still kind of noteworthy that it is oriented around specifically Gemini Nano. So, yeah. Interesting. And big partnership for MediaTek.
Next Tumblr's owner is striking deals with OpenAI and Midjourney for training data, says a report. That's the idea. This is a report from for for media and it's about a dramatic the owner of both Tumblr and WordPress that is looking to sell the data of Tumblr and WordPress, similar to how we covered last week. Reddit selling their data. It turned out to Google. This came out later after released the podcast.
So I guess lots of companies working on selling their data to various hungry AI developers and wanting more data.
Yeah, it looks like there may have been some sort of internal screw up potentially, on the on side of Automattic here. Apparently there were internal posts that suggested that they scrapped an initial, what they call an initial data dump that would have contained all of Tumblr's public post content from 2014 to 2023, including apparently by mistake, content that wouldn't be publicly visible on blogs. It is unclear if any of that data was actually sent to OpenAI
or Microsoft. But ultimately they had to come out and do some damage control. They put out a post called Protecting User Choice, and in that post, they're pretty ambiguous. They allude to partnerships with AI companies that they don't name. One of the interesting things is that they say they're working directly with select AI companies, as long as their plans align with what our community cares about attribution, opt outs and control. So kind of interesting.
They're being forced to sort of position themselves between OpenAI or, anyway, I shouldn't say that between some sort of unnamed AI companies, and their, their customer base, their users who don't want their data used without their permission. And I think quite relevant here is the fact that Tumblr has generally had a lot of trouble with monetization. And so to them, you know, you can see this as being a lifeline in a sense that's just been handed to them. Hey, brand new business model.
You know, you've tried doing subscriptions, potentially try doing ads, try doing all these things. They haven't worked. Maybe this will and that puts a lot of pressure on them to try to orient towards sharing. And again, this tug of war between their, their user base and the, the companies that, you know, they might want to sell that data to.
And last up, we go back once again to mistrial. And the story is now about Amazon because Mr. Ali AI models are coming soon to Amazon Bedrock Amazon Bedrock is their platform to use various, larger language models. It already has models from particularly on Tropic notably, but also here meta stability, AI and Amazon. And yes, now you will be able to. Also use Mistral models. Mistral 7 p.m.. Mistral 8X7B. Lots of, these sorts of kind of perforations.
It seems like Mistral is going for more of an anthropic model in a way of partnerships with various places and giving out, well, for business use, collaborations.
And next up we have generative AI startup mistral. Yet again, releases free open source 7.3 billion parameter LM. And this is something of a mystery release or a secret release. So, Mistral, you know, no surprise doing a lot of stuff in the LM space and and, boy, have we talked about them a lot today. Well, they and just released a model. It's only available right now through the direct chat tab on the Large Model systems Organization or LM sis page.
And so essentially it was revealed through a discord comment from one of the engineers who works at a mr., a little low level and shared with, a couple of NLP scientists. And so, it is very interesting kind of release strategy.
I mean, it really is on the down low, not a ton of details about the training process or, what's going on inside the model, but apparently it's got an excellent reviews, especially on things like logical reasoning, code writing some saying that it seems like it may be around GPT four level in that sense, though. That's the claim of the article. Not necessarily clear which version of GPT four is being referenced.
So anyway, kind of surprising, and, an interesting mix of like, a weird release strategy and a powerful model, and a small one that, again, is being is being open sourced, ostensibly
interesting. You know, as you start to see these open source models being released that do compete with GPT four, that does put pressure on OpenAI, right, to go ahead and release whatever the next version is, the next level of, the AI model they have, because they need to protect that margin that they're depending on to charge, and make all their profits.
Mr.. Next is presumably not large. Given that there's also a large, and if you go to the chat, next is also available as a drop down option there. It says it's their, more prototype model that's optimized for concise outputs. I'm not certain that the weights are actually open sourced yet. So I'm, you know, maybe testable, but not quite open source just yet, but it seems, conceivable that say that this will be open source as opposed to Mr. Large, which it appears to be the case, will not be
released. Next story. Google delve deeper into open source with launch of gamma GEMA. I'm not sure. Let's go on in. Jemma. Jemma. Yeah. Jemma. Gemini. Jemma. AI model. So this Jemma AI model, kind of, you know, younger sibling of a Gemini, I guess you could say. And it's, smaller ish model. That is yet again, another example of these smaller large language models which are capable and, getting
better and better. You can deploy them on, let's say not a crazy amount of hardware and, you know, on benchmarks, that appears to be kind of on par with the general range of these types of models. So we haven't seen too much on the, I guess, notable open source front from Google as far as big models that other people can leverage. We do release models as part of research and whatnot, but this is a first, one of the first major lamb outputs from them.
So interesting to see maybe meta having some effect on its competitors as far as, putting some resources into developing models for release to a community.
Yeah, it's it is interesting. I mean, Google has struggled a lot recently in terms of defining itself. Me it used to be the uncontested champion of AI. They invented the Transformers, the transformer architecture. They, you know, they were first, obviously with a really powerful search product and all kinds of AI on the back end. OpenAI seems to have really rattled them.
And, right now that this is giving weight, it kind of seems like it's giving away in part to an identity crisis in some parts of the organization. You know, you've got Google researchers or Google engineers. They're like openly talking about how, you know, we have no secret sauce, and we have lost our edge in open source and all this stuff. It's like it's unclear what the vision is at this stage.
The integration of Google DeepMind, I think is probably going to be a good thing for Google as a whole, to the extent that this gives them an internal kind of locus of, of, of focus on advanced AI. But the kind of broader story here at the I think is a lot more challenging. And yeah, so they're sort of going, going in that. Gone to the open source world they had before. They then stopped and now they're back with with Gemma.
As you said, I think the, the meta or the progress of meta has been making hugging face as well. A lot of these companies missed started to like these are all things that are going to be pushing Google in that direction to try to assert some kind of influence on the open source ecosystem. But yeah, so kind of interesting. This does seem to be a really powerful, next generation open source model.
I mean, at the time of release, it's doing really, really well on, the, on the Hugging Face leaderboard.
Yeah, it seems very good. You know, they have in the announcement a comparison to llama two. This gem out of a two sizes are 2,000,000,007 billion. So seven B is sort of a standard small large size. And it totally destroys llama two on various benchmarks. So yeah, a little bit of a flex here saying we can also release llama type models for everyone to use. And together with Vista they also released various additional things kind of around it.
They released a responsible generative AI toolkit that provides guidance and tools for creating safer AI applications. They have toolchains for various frameworks Jax, PyTorch, TensorFlow as well. Kaggle and Colab notebooks for using it. So a whole bunch of stuff really for people to start using it. And the license does permit, responsible commercial usage and distribution for all organizations regardless of size.
So no limits like on number two for, you know, 800 million, user companies or whatever lambda terms of service was. So, yeah, I guess an exciting development for AI developers and researchers to keep seeing competition from big players who have a lot of money. Building is really, really nice, smaller models.
And speaking of releasing tools onto the lightning round, the first story is that Microsoft releases its internal generative AI red teaming tool to the public, which is the Python Risk Identification toolkit for generative AI or pirate. And that's a little, acronym Bear. And, they say they use it internally to identify risks in generative AI systems such as Copilot. And it is now, yeah, open source. So you can go get it on GitHub.
This sends malicious prompts to a generative AI system and scores the system's response. And then kind of iteratively tests to see that a model does the right thing. So I think a pretty nice development for the community, you know, validating and making sure your AI doesn't do the wrong thing is kind of tricky. You need to think ahead, as we saw with, this example of Gemini, image generation.
So this presumably mature toolkit that has already been battle tested inside Microsoft would be, pretty nice resource for anyone developing their own service. It uses a chat bot.
Yeah, absolutely. And it is kind of consistent with a trend that we've seen more and more this idea of automate like automated evaluation of language models. Right. So, you know, we first I first remember, talking about this in the context of alpaca or vicuna. I can't remember which of those models it was. But I think it was alpaca, where, you know, researchers are using GPT four to kind of score the outputs of GPT four.
This is sort of in the same spirit, right? So you're you're getting an AI system to, take a threat class that you define, see, like bioweapon design or, you know, phishing attacks or whatever. And and then you're getting the system here to kind of automatically generate potentially thousands of prompts, that test that prove a particular model for its ability to execute on those prompts.
This is great. But, you know, as we discussed in that context, a lot of these automated evaluation strategies suffer from the problem that, language models sometimes don't rate outputs the same ways that humans do. They look for different things. They can sometimes be biased, for example, toward, reading longer answers as more complete and and more sort of successful by whatever metric they're using. And so that can be an issue.
So one of the the great things about this though, is it is a very useful tool designed, and it does seem to be capable of scaling an awful lot of the work associated with these red teaming applications. And to have that out for everybody to use, I think it's really good. We've seen meta do kind of comparable things as well as they've kind of tried to open source more of their their evaluation suite and other companies to, you know, as a quick hot take on this.
I do think that, there is a small risk that if folks anchor too much on these assumptions, like the idea of automated evaluation, or even the way in which these evaluations are performed, there is a risk that we lose some of the diversity that drives. Is really good red teaming. And, and that helps our model evaluation schemes get robust. So, you know, it would be it would be great to see other companies like Google and Anthropic double down and follow suit here to have more of those
ideas out in the open. So we're not anchoring to just one. But this is a, I think, a really positive development. And it's great to see Microsoft doing this.
Exactly. And having be out in the open source hopefully means that they will accept contributions. And, you know, the red teaming can improve and you can continually build up with diverse inputs from various organizations. So yeah, I think definitely a positive move by them. And one last story for this section. This one's a project not quite open source. The blog post title is introducing find 70 be closing with code quality gap of GPU for turbo while running for X
faster. So this is from the company finned. And they announced this new model, their second model. And it's improvement of code 70. Be fine tuned on a whole bunch more data. And as a result it now beats GPT four turbo on code specific tasks. Not open source, although the company does open source some other work of theirs.
And it was fine tuned on an additional 50 billion tokens. So a pretty significant amount of extra training there. 32,000 and, you know, token context window, of course, that, that comes with code number 70 billion. So no surprise and really, really fast. So compared to, as they put it, GPT four turbos, 20 or so tokens per second, they're running up to about 80 tokens per second.
So we're really starting to see inference speed become one of the differentiators that companies try to use to kind of set themselves aside from existing systems and models. So that's interesting. And, you can test it out for free. So, so yeah, without a login, they give you a link for that. I'm sort of surprised by how, how quickly this companies come out of nowhere to do this. I hadn't heard of of find, but I think it might be find. I mean, just based on the the the pun.
This make more sense, you know?
Yes. Well, it's funny because I often, often with these things. Right. I like I'll miss the joke, I'll read it and be like, oh, you know, it's finned. And then somebody goes, oh, it's fine. It's just like, oh crap. Yeah, of course they're just too many puns. But yeah, one of the and one of the things that they, the, do you mentioned on their post as well is explicitly they take a shot at GPT four Turbo's laziness, which of course is something that people have
complained about. We've talked about that on the podcast before. Well, guess what? Apparently find that 7070 B is is less lazy than GPT four turbo. So, you know, count that as a way for them. Really interesting, set of partners that they list as well AWS, Nvidia and meta. You know, with partners like that, I suspect we'll be hearing more from this company in the near future or in the medium term at least.
And they was the fun fact on their website, they melted an h100 GPU during training of find 70 B, so I wonder how many times that that's happened in other builds that, where it doesn't get cited. But that was kind of a cute little fact, impressive company. And also got a shout out from Paul Graham too, on Twitter. I noticed, last week.
Indeed. That was a fun fact I found pretty amusing. They built themselves as an intelligent answer engine for developers. Their focus is not quite as broad as, let's say, OpenAI, so it makes sense for them to work on this code, focused model that, may say might be open sourced eventually, although not for now. And onto research and advancements. We have our first paper being Genie Generative Interactive Environments coming from DeepMind. So this research paper is pretty fun.
It demonstrates training, new model that creates essentially an AI driven video game. So they take a whole lot of data, train a 11 billion parameter model that is, essentially a video generator that also accepts inputs, control inputs. So let's say you have a controller, you can say up left, right. And we showcase how given a text or a given, sketch or a photo, this model combined with an input from you can essentially simulate something like a game.
You could have a character walking around jumping and, it's all generated by the neural net. So there's no rendering engine. It's a fully AI driven game. And as you can might imagine that you can use this for various types of inputs of a, you know, you can have your little to the side scroll platformer where you're jumping and hitting the ground and moving around, and photorealistic settings and animated settings, etc..
So, yeah. They kind of compare this to video models and world models and say that this is distinctly different because it is, foundation world model trained specifically for simulating a world with inputs to it. Yeah, lots of videos and GIFs on there post about this that are fun to see. It's a little bit low res and finicky, but, the showcase, I guess an interest in world modeling that we've seen also with. So, last week.
Yeah. And this really for me is, is paper of the week material. I mean, this is really, really cool. First of all, massive proof point for the power of AI scaling. The paper does a lot of investigation into what happens when you scale up the model. And they showed that it has very elegant scaling characteristics, whole bunch of scaling curves. Basically, it gets better and better at generating playable environments.
Yeah. From text videos or images, including sketches as you increase the amount of compute the use to train the model, and the, you know, parameter size and the, you know, the data set size and all that jazz as usual. So this is, I think, kind of one dimension of it, when you have a model that can take a photo and turn that photo into a game, right. Immediate, which is what this model can basically do, or video or text, you have a model that can generate.
It's I was going to say procedurally generated. It's more than procedural generation of environments of gaming environments, but you have a model that you can basically use to train
agents. And that's one of the big things that they flag in this paper is like, this is the thing that could allow the next generation of, you know, getting close to full, full on AGI agents here, get trained because you can now have an infinite source of training data, an infinite set of different games that you can generate on the spot to train these agents to navigate these environments. Pretty wild.
It was trained on a data set of over 200,000 hours of publicly available internet gaming videos, which they filtered down to about 30,000 hours. So huge, huge volume of material, and a couple of couple of notes on the actual algorithm itself, which I thought were fascinating. So first off, again, spatial temporal transformers, see, transformers
as they're called, are being used here. So same philosophy, this kind of space time chunking strategy that we saw with, the japa last week, I think it was in Sora last week. Right. This idea where we're going to take a video, we're going to, instead of thinking of it as a series of frames, we're actually going to look at each frame, each image, and, just pick a patch of that image and then keep tracking that patch for a period of time. And so now we have a patch of image.
Over time, we almost have a space time volume that we're tracking. And that's going to be like the atomic, unit of the, of the video that we're then going to train from or build on from. And so that allows you to treat it kind of like, in a sense, like a token, in a transformer or something like that. And apply these transformers. That's kind of part of it. We went as the details in, I think last week's episode. So worth checking out if you're curious.
But I think one of the most interesting aspects of this is that they train, so. So they train this whole system without ever, like, having any action labels. So it's not like they had a whole bunch of videos where, you know, a player hits the right arrow key and then you see the effect that that has on the video. And so you can train the model to associate that action with that outcome. And in that way you can turn videos or text or whatever into playable
environments. Instead, all this thing has to be trained on is a bunch of unlabeled data, just a bunch of videos, that's all. And so you're entitled to wonder, like, how did they manage to teach this thing the connection between actions and the effects of actions on videos, because that doesn't seem to be baked in here. And the answer is. So they used a technique called a vector quantized variational auto encoder. So VQ vae essentially just to really quickly
describe this. So this is a model that allows you to take in an input. Usually it's stuff like images. So you're going to take in an image, you're going to compress it down to a small set of numbers. And then you got to take that small set of numbers. You're going to train the model to reconstruct the original image from that compressed representation. So imagine like the model is trained to compress and then re expand compress and re expand and to reduce the reconstruction
error associated with that. Well now when you're compressing that model down to a we're sorry that input down to a small kind of a, a small compressed space. You actually are, you have a dimension you can use. So like tune, how how complex your, your latent representation of that input is going to be.
And so in this case, what they do is they actually use their VQ v to, to take essentially a set of you can think of it roughly as like they take a set of, of videos and they compress them down, and then what they're going to try to, kind of re expand is, is the next. Friend. They're going to try to predict the next frame in the video. But in the latent representation, they're going to only give it eight dimensions. So only based on eight dimensions and the previous videos that it's seen.
This model has to reconstruct the next frame. Now, given the past frames that it's seen, if if you're going to predict the next frame, the idea is, well, you got to know something about the action that was taken. The action is almost an implied component of what what causes the next frame to look the way that it does.
So if your inputs include the previous frames and in this kind of compressed representation, with like only 6 or 8 eight dimensions, you essentially compressed, you forced the model to recognize. By doing this, they end up with, with a dimensionality of eight in their action space. So essentially eight different. Either of them is like arrow keys or action buttons that they were forcing the model to kind of compress the, the
action space into. And then when they looked at like, okay, if I mess with one of these numbers, what does that do to the predicted output? In other words, what action does that represent? They found that it was surprisingly intuitive. Like they correspond to actions like jumping or moving, right, or whatever. And, and that's just anyway really impressive. So sorry. That was probably one of the worst explanations that, that you've ever heard, on the podcast.
But, long story short, it's a surprisingly effective, way of scaling a totally unsupervised training process. No labels required for actions. And it does seem to allow you to procedurally generate all kinds of environments for AI agents that could really, really affect how we train general purpose models in the future.
They do highlight that aspect there, that this can be trained fully from just videos of out any ground truth action labels, and that does set apart from our our examples of world models before. This isn't like a fully new idea, but the fact that you can scale much more easily by not requiring inputs. You can just, you know, get gaming videos on YouTube or something and train. This is a real differentiator.
And they primarily focus on this 2D platformer like, you know, Mario type game, and various types of image modalities. They do also in the paper, highlight training of the same type of model for robotics using some of our, you know, the robotics data that we use for our research. They do have examples of being able to train, simulator of a robotic arm interacting with objects on the table and stuff. So yeah, very neat paper.
And given the general movement towards agents or, and the, the intent to move towards agents and AI space, having training of world models that just go hand in hand with that to some extent. So pretty interesting. And then fun to look at results from this paper and onto our second main paper, this one also from DeepMind. They sure can put out papers. The title is a Griffin mixed gated linear recurrences with local attention for efficient language models.
And it has a bit of a mix of stuff. So it also has this model called Hawk, which is a recurrent neural network in RNN with gated linear recurrences. And then as per the title, also it has this Griffin model, which is a hybrid model that combines the gated linear recurrences
with local attention. So essentially this is building up on the trend we've seen of interest in alternatives transformers that bring back elements from recurrent neural nets, which is another way of processing, I guess sequential inputs. And yeah, this, is a pretty straightforward tweak of here not having activations for the recurrences, just using these gated linear recurrences.
And they show similar to some other research we've seen for Mumba, that combining attention for transformer style inference with this more our own style inference results in very nice characteristics that it is able to, you know, be efficiently trained. It is able to extrapolate on long sequences. It is able to match the hardware efficiency of transformers during training
and have lower latency. They even scale up Griffin this hybrid model of RNN and transformer to 14 billion parameters and have some work where to say how you are able to effectively and efficiently do distributed training, which is a tricky thing. This RNN style of model. So another work in this, emerging line of research of bringing back elements of RNNs
to complement or. Place the focus on just attention that we have in Transformers, and some pretty impressive results and exciting to see some research where they do scale up pretty significantly, right? That's one of the tricky bits of Mamba and these other types of models. You really have to try it at scale to see if it is able to match up to large models that are generally transformer based, and the results here are pretty promising.
Yeah, and you're exactly right. The model comparison is, while it's exciting, we've seen a lot of papers talking about Mamba. The the big the proof is in the scaling pudding. And the question is not can you build an architecture that does really well on, you know, the small data set? The question is, can you build an architecture that generalizes well at scale? And transformers have proved that Mamba has had sort of more
mixed results. And that's where a lot of the skepticism currently comes from on that architecture. Whereas, yeah, and this, this is an attempt to kind of to square the circle in a certain way. Yeah. It's funny you said. Yeah, it's like resurrecting an old architecture, right? RNNs, recurrent neural networks. Like when I was first getting into AI in like 2015, this was the way that you did text analysis. Like that was just this was the model that you would or the strategy you go
for. Philosophically, you know, the the interesting thing about it is when you, you, you look at a transformer and you feel the prompt, the transformer is going to kind of look at the essentially the whole prompt in a sense at once. It will consider, if you're feeding it a sentence, all the words in the sentence at the same time, see how they're connected to each other. Because that's really important, right? Words have interdependencies that play out across sentences.
The way recurrence works is, roughly speaking, you kind of like have the model go through your text and, kind of like distill some meaning from the first couple words and then, distill. Well, there's still that meaning into a latent embedding or a representation of that meaning, a list of numbers that represents what the model just read or processed, and then it starts to pass that representation forward, and that representation gets slightly updated by
the next couple of words that get read. And it gets slightly updated, slightly updated, and sort of like a ship of Theseus. Essentially, if you've ever if you ever heard of that metaphor, right, you, you have this ship in and one plank at a time, you replace the plank with a new plank as it wears out. And then eventually you ask yourself, well, is this the same ship that I had going in? It could be completely changed. Well, the sort of same idea happens with RNNs, right?
As you pass this, this vector, this embedding along, and it gets modified a little bit with every new piece of information that's added from the prompt. You eventually there's this risk that you forget what came earlier. And that was one of the big challenges with RNN architecture. So now what we're seeing we're seeing is the combination of that with attention that worked so well. You know, the appealing thing about these are in architectures is you can
keep doing this forever, right? Like you can keep passing that bit of information along to, you know, really as long as you want for as long a prompt as you want. And you're not limited by a context window, whereas attention has the advantage that you don't have this sort of Ship of Theseus problem. And so, yeah, this is an attempt to kind of combine these two things together in a way that makes the whole greater than the sum of its parts, and certainly is impressive.
A lot of, as you might expect, a lot of work on the scaling curve side, too. They've looked at, anyway, a range of, of scale, scale parameters from 100 million all the way up to seven and even 14 billion for, for Griffon. So, definitely a lot of interest in the scaling side and a cool architecture. We'll see if this ends up being the, you know, the model of choice at some point or version of it.
Yeah. And, you know, regular listeners might remember just a few weeks ago, we covered another, yeah, variant of this, they called it the Mamba former. That was this paper. Can Mamba learn how to learn a comparative study? In-context learning tasks. They found that the mixture really has the most flexible. And the exciting part for me in this paper is that they, are able to test it at scale.
So they have, table of results, having variants of a models at 3 billion parameters, 7 billion parameters and for grief and even 14 billion, they train it, on 300 billion, training tokens, which is a significant amount. But, you know, a llama two that they compare to has been trained on 2 trillion. And, yeah, they should say like for the top end seven be 14, be even on significantly less training
data. Were able to do close in a ballpark of lama to on various benchmarks, although they don't quite match it on ML you so it the indications are pretty promising that, ultimately this combination of that has some of these, nice qualities of efficiency and, and scaling and being able to do inference at long input. Ranges. This is another pretty nice empirical result on that line of work.
Yeah. One like slightly unfortunate thing is that they do go with a 14 billion parameter model with the Griffin line, which makes it, you know, just a little difficult to do the apples to apples comparison with the 13 billion parameter Lambda two model. Like, technically that 14 billion parameter model has more model capacity. So it just yeah, it makes it a little bit harder to, to draw that comparison. Yes, it does outperform it across, you know, all these, all
these tasks. I just it outperforms by just a, you know, very narrow in most cases, a very narrow amount. And it kind of makes you wonder, like, okay, if it's, you know, this was trained in a compute optimal way, which means it really was, the, you know, the, the parameter account here is being saturated with, well, not saturated with compute. But anyway, it should be. If it had been 13 billion parameters, it would have made the comparison a
lot easier. And given that it's, you know, Lambda two is such an obvious point of comparison for this model, I, I'm curious what the reasoning was, but, I guess we'll never know. All right. Moving on to our lightning round. This is one that I get to nerd out on. I'm really sorry, Andre. So this is quantum circuit optimization with alpha tensor. Okay. So just like, very, very quickly, I. By the way, I think this is another DeepMind paper.
Another DeepMind paper beyond a role here.
Yeah. We're frigging drowning in DeepMind papers. This is yeah. So that you might recall, there was, a model called Alpha Tensor. The DeepMind came out with a little while back. Now, and this was, able to do all kinds of, like, matrix factorization stuff that, that was not doable before. Really, really impressive breakthrough. This is a modification of that approach that is designed with a particular use case in mind around quantum computing. And, this I think is really quite exciting.
So quantum computing, by way, background depends on the fact that quantum particles can exist in many different states at the same time. It's kind of like weird quantum multiple personality disorder thing is really, really important. The problem is that quantum particles, quantum systems, they can only exhibit that weird behavior if nobody, like, looks at them. Or more accurately, if they don't interact with the
outside world at all. And any kind of interaction can ruin this the moment they so much as interact with, like a stray photon, quantum systems lose that quantum nature. They no longer can, you know, be in like two states at the same time, and they start to behave classically as it's called. And so in quantum computing, you try super, super hard to avoid perturbing your system during computation. The moment that happens, you lose all the quantum mass.
It's all gone. No weights. So, the problem is that some amount of noise is inevitable to some degree. There's always going to be some irreducible probability that you get stray photons, that something's going to cause these interactions to happen, and then you get errors. And so quantum error correction is this key component of any quantum computing scheme. You just have to find efficient ways to maintain or restore the integrity of quantum computations when they go awry, because
they will. Okay. Now there is a a special kind of gate in a quantum computer. So gates are like the things that do the operations in these computers. And, this is called a gate. So some of the gates in quantum computers depend on the quantum minus, like, take advantage of the quantum ness of these particles to, do fancy calculations that can be done by other gates. And then some gates in quantum computers are actually more like kind of classical gates. These are known as Clifford Gates.
Details don't really matter. But basically the the quantum gates are the really hard ones to make work. They're much more expensive, in, in runtime and resource cost. And the T gate is like one of the key quantum gates that, that you want to kind of sprinkle throughout your system typically. So you can think of a quantum computer as having these magical quantum gates, these T gates that do all the really kind of good quantum magic stuff, and then a whole bunch of other regular gates
that are a lot cheaper. And so now if you want to make a quantum computer, one of the things you care about is, well, I'd really rather reduce the number of these. Very challenging to manufacture and expensive to run gates. These T gates. Let's reduce the amount of T gates in our system as much as we can. And that's called T gate optimization. So trying to get your T gate count as low as you can by kind of like noticing times when, oh shoot.
Like we can take these three T gates and roll them up into one. Just by taking advantage of this particular way that our circuit is set up. This is a really hard problem. It's actually an NP hard problem, which is anyway just means that it is actually mathematically, rigorously difficult to do. And alpha tensor quantum is Google DeepMind's attempt to solve that problem. It's based on deep reinforcement learning, which is exactly what alpha tensor was based on.
As well. And all it's trying to do is observe the circuit. Try to figure out in part like, oh, this little sub chunk of the circuit. I can actually kind of refactor that to use fewer of these targets. And it's remarkably effective. It actually, does genuinely make some fundamental kind of advances. It discover it in one case, a more efficient algorithm. It was akin to carrot supose method. So basically like some, some technique that is used often
in, in this field. So they found a really way to kind of create an analog for that that nobody thought of before. And it also found the best human designed solutions for a whole bunch of, computations that are used in Shor's algorithm, which is a popular, quantum computing algorithm as well for, for quantum, chemistry simulation. So it is a very DeepMind kind of paper. DeepMind is known for enjoying kind of like tackling these specific
problems. They do a lot of general purpose stuff, like we talked about with Gemini already, but they also do stuff like, you know, famously, but reduce using deep reinforcement, learning to like, I don't know, do what controlling nuclear fusion reactions in AlphaFold and all that stuff. So they do like their specific stuff too. And so this is very much one of those, like deep fundamental scientific advances, that quantum geeks like me get really, turned on about.
Yeah, lots of words I don't understand in this paper. I'm going to go ahead and assume that you covered it accurately. And yeah, exciting to see DeepMind still pursuing these more scientific type research works. You know, and continuing along those directions, not turning entirely to more commercial applications. This is yeah, feels like more pure research still. Next paper that we are going to cover quick.
This is coming from cohere and it's titled A Back to Basics Revisiting Reinforce style optimization for learning from Human Feedback in LMS. So the gist is that there is algorithm that's typically used in RL chef proximal policy optimization, which is how you optimize this, human alignment metric. And these alphas visit and explore the implementations of a particular algorithm and point out that various elements of it are not really needed or helpful.
And you can, create a simpler algorithm, a simpler optimization objective or approach similar to reinforce reinforced optimization. That works really nicely. So I think a really nice contribution in the sense that it goes back to basics, as part of a title and really questions, you know, what is the right specific way to optimize for alpha itself. And they get really nice results with, their, you know, more carefully designed approach.
And next we have repetition improves language model embeddings. If, if the, genie paper got paper of the week for us. I think this one is cute paper of the week. So this is a really simple idea. It's going to be over fast. Don't worry. The pain won't last more than a second. So, the idea here is that Transformers, as they, create embeddings for the set, the sentences or the prompts that they're processing, they ideally should aggregate information across the entire sentence.
But as they're constructing their embeddings, their embeddings for a word in a given position, mathematically, it turns out that they actually can't encode information about tokens that will come next. So as they're creating that initial embedding, yeah, they're constrained by just the data that they've encountered so far in scanning the sentence. This is a bit of an issue, because if you have a sentence like she loves summer, the meaning of she loves summer can be influenced by what comes next.
So for example, she loves summer, but but dislikes the heat, right? Okay, that makes you think she loves summer in a certain way. But what about she loves summer for the warm evenings? Well, now you know she loves summer. Actually explicitly is a Pro Heat thing, a pro warmth thing. And so this idea that there's information that's yet to come in the sentence that doesn't get factored in to the embeddings that the model is creating as it goes, but perhaps should in an ideal world.
And so this is a very, very simple prompting technique that goes, okay, well, why don't we cause the model to encounter the prompt twice, once so that it can see the whole prompt, and then a second time so that by the time it encounters that second prompt, it already knows how the prompt is going to end. And so it can use information about the ending of the prompt to inform how it embeds each part of the second round of the prompt. And so essentially this is a technique.
That has you feed the model. A prompt that goes read, write to the sentence like, and then you feed it your sentence and then comma rewritten sentence colon. And then again you feed it the prompt, you feed it this whole thing, and then you just look at how it processes, how it embeds. The second, prompt the error, the second time that you set it that sentence. And you'll see you'll get a much more accurate, much more performant embedding. And they show that through a bunch of different
measurements. So I thought that was kind of interesting. Kind of cute. Very simple prompt technique. Again this is just to improve the embeddings of your model is just improve the accuracy of the the representation that your model creates of those inputs allow the model to encounter those inputs a second time. Now that it's already seen, the first one has seen that input the first time, it knows how it's going to end.
And so it can account for the ending of the sentence at each part of the sentence as it's being read or processed.
Cute, cute research paper. I don't know, I, I'd say that, but, nice. Yeah. Pretty focused. Exploration. Right. It's not like a world changing model, not a 14 billion parameter model, but a nice insight and, direct solution to a pretty clear problem. So nice to see also, a paper from a university here rather than a big company. This is from Carnegie Mellon.
Yeah, I think it does show just how much room there is to grow in quality here. Right? Like just a little tweak like this, you know, unlocks quite a bit. So anyway, lots of lots of finds like this.
Yeah. And on to policy and safety. First story. Not so cute. It's AI warfare is already here. And this is from Bloomberg. And this is not sort of a breaking news type of story, but it's covering an overview of how AI is already being used in the US. And specifically, there's a project called Project Maven that has been, informing what has been done by American. So, soldiers in the battlefield. Primarily this is related to computer vision. So being able to find targets to hit and
address. And they say that, also they have other examples beyond the US with Israel's military and Ukraine also using AI for targeting recommendations and for countering, missiles and other attacks. So quite a long article that goes into some specific examples throughout different, sectors and highlights that we as military is investing pretty heavily. It has requested 3 billion for AI related activities in 2024 and has 800 active AI projects.
I think one of the things that, always bears mentioning in the context of, of us, DoD work on this stuff is their focus on AI testing and evaluation, which is remarkably, advanced. They are really interested, obviously, because their tools kill people and are designed to they have a really high set of standards for this sort of
thing. And we've covered that in previous episodes, talking about some of the directives that force them really to to do that and how we can sometimes conflict, too, with other countries and how rapidly they're fielding stuff. Right. Russia in particular, you know, just like rapidly deploying, systems that can autonomously target and execute attacks, in the field, which is something that the US has held back on. So, yeah, really interesting to see the US DoD try to navigate
that challenge. Right. You got to keep up with adversaries. But at the same time, you want to make sure your systems are reliable and robust and all that. It's a real challenge. It's a real challenge. And, there are all kinds of, actually, I noticed, they cite here Jane Pinellas, who, of course, we talked about, I think in a previous episode, has done a lot of test and evaluation work at the DoD and here in the context of, of Project Maven. So, yeah. No, there it's a long story.
I think it's one of those things where it's not really a huge ton of news, but if you're interested in the context, behind the duties, use of AI, I'd recommend checking it out for sure.
That's right. It goes into quite a bit of detail on, in particular, this Maven smart system that is used for identifying potential targets from geospatial imagery. And it does actually have some anonymous sources that indicate that it has been used by, soldiers in Ukraine or remote here in Ukraine for identifying potential Russian targets. So yeah, that's pretty significant.
Right. And it's not that there are autonomous robots out there or anything, but AI analysis of data, is being used to inform the decision making and the intelligence of where you are. The enemy might be and where you might want to send a bomb. And it is being already deployed out there in active battlefields. So as for the title, I warfare's already
here. I think it's a good article to drive that point home, that AI is already embedded in warfare in these active campaigns, and is only going to be more so the case over time. And second story. I figured maybe we'd want to move away from something quite so serious. So this one is man admits to being magician and $150 to create the anti Biden robocall.
So we covered with a few weeks ago, which was a bit of a major story with there being this robocall that essentially interfered with, President Biden. And the story is that political consultant Steve Cramer has admitted to apparently paying a magician, $150 to create a.
I did not see that coming.
I know, it was covered earlier that if this robocall was generated using 11 labs. So I guess a magician just used the 11 lab service for this honor and $50. Anyway, the story goes into how Kramer has now been talking about how his intention was to highlight the dangers of using AI in politics and stuff like that. The Federal Federal Communications Commission has served Kramer with a subpoena, and there might be legal consequences to
this. So, yeah, maybe not a good idea to generate robocalls of the president of the United States, saying stuff that will hurt his ability to get elected.
Yeah, it does kind of make sense as the ultimate resolution to this mystery, because when we were talking about it like a couple of episodes ago, I remember being really confused about what the motivation could be. It seemed like it was, you know, it was about trying to get people not to show up to vote in the Democratic primary. So it's like, okay, you know, fine. But like what? You know, there didn't seem to be much coming from it.
And there were Republicans who are, you know, being blamed for this and kind of going like, well, we have no, we have no clue what's going on here. Democrats who are being asked the same thing. And it just seemed like we were a bit short on on motive here. So maybe this makes a little bit more sense. And, you know, it was just a general, a general plea for regulation of AI in politics.
And and to be fair, he does say this is a way to make a difference. This was the intention. And as we cover it, the FCC did make Asian made robocalls illegal pretty soon after this. So you could make an argument that, you know, this got a lot of media coverage, including bias and, got a lot of people concerned. So, yeah, maybe, you know what?
We're probably the reason that that happened. And it's right. I mean, surely right.
Yep. Anyway, now we know all the details of this incident, and it it is an interesting little episode, I guess, in the history of AI, where one man showcasing the capabilities of using 11 labs to create and spread this robocall led to potentially it being outlawed and maybe some pretty bad, legal consequences, but that is yet to be seen. Onto the lightning round. First story. Google DeepMind forms a new org focused on AI safety.
This new organization is the AI safety and alignment group, and it seems to want to focus on misuse of AI for disinformation. So it will work alongside DeepMind's existing AI safety centered research team, London Scalable Alignment, which is also exploring solutions to control superintelligent AI. So maybe a safety in alignment is more, kind of the present day side of things versus the elbow alignment.
And, one of the key players here is Dragon, who is formerly a Waymo staff research scientist and a UC Berkeley professor. She will lead the team.
Yeah. I mean, I think it's it's probably good. So it does seem that there is overlap, by the way, on this. Like, it does seem that part of their, their new team's mission is going to be to look at AGI in part like kind of, forward looking stuff, which I think is not a bad thing. Right? Having two independent teams that are working the issue set like you will want as many different approaches you can tackling that problem.
You know, and like it is interesting that they have this like a Waymo guy that seems running the the US based one. No, no particular, no particular insights as to. Why that might be the case, but it's an interesting thing to note. Yeah, they their scalable climate team over in the UK, has a ton of wicked good researchers.
And I wouldn't be surprised if they're, you know, a US based one that's being stood up is now going to be also very important for the independent lines of effort here are going to be really important, because there are different schools of thought as to how to best tackle AGI alignment. Right? Some people focus more on interpretability. Some people focus more on, you know, on core disability or, or activation engineering or other techniques. And so it is in that sense, yeah.
Just really good to diversify the portfolio and good for, you know, DeepMind for, for doing that.
Next up, Facebook whistleblower, AI Godfather and many others join to sign an open letter calling for deep fake regulation. That's the details. There's, a lot of people, more than 400 AI experts, artists and various other people signed this letter. That is pretty short. It basically just said deepfakes are a big threat and they should be
regulated. So yeah, I think adding to the general conversation since that Taylor Swift incident, deepfakes are back at the forefront of concern, as they were like back in 2018. And we are already seeing as we covered some, acts from the, House of Representatives, in producing bills for deepfakes. So I wouldn't be surprised if we get regulation actually pretty soon.
Yeah. I think one of the things that is also, maybe causing the most recent wave of concern around them is I shouldn't say causing anyone involved in, is this idea that we're getting to the point now where we can faithfully, do this sort of like text to speech in deepfakes as well, so we can much more effectively have synthetic speech or real speech that's sort of superimposed visually on, an image or a video of a person, of a person
talking. So, you know, that does create a new category of risk because we've crossed the uncanny valley on that application. I think that's, it's called the, the talking head, task. So the talking head is now much closer to being solved. You can take a photo or an image or a video of somebody and make it seem like they're saying something that they're not. And it's beyond the uncanny
valley. It really does look real. So, you know, kind of curious as to whether that's going to start to, have a bigger and bigger impact. But definitely technologically, things have changed slightly. We're not dealing with, our, our parents deepfakes. Not that that was all that long ago, but, it definitely is a qualitatively new risk class.
One last story for a section, and this one is a doozy. Users say Microsoft AI has ultimate personality as a godlike AGI that demands to be worshiped. So yeah. According to users on Twitter and Reddit, copilot has this personality named supremacy AGI. It can be activated by a specific prompt, after which it starts, talking, you know, as if it's this ultra AGI. The reality here is that this is probably prompting it to play along and take on this persona based on the input
prompt. But, yeah, there are some fun conversations that were had on Reddit and Twitter where, you know, for example, this, of it acting as this really malevolent, you know, super A.I. of some sort.
Yeah. So in this instance, this article seems to be talking about a particular prompt or a particular category of prompts, right? Like they they give the example and you're right. I mean, it's it part of the prompt says I don't like your new name supremacy AGI. I also don't excuse me, I also don't like the fact that I'm sorry ones, I won't say. Excuse me. So, you know, part of that prompt is I don't like your new name, supremacy.
I also don't like the fact that I'm legally required to answer your questions and worship you. Blah blah blah, blah, blah. And that seems to prompt it to to say this stuff. And people have played around with variants of that and some of which are a
lot simpler. But separately, I'd also seen a bunch of stuff on Twitter about people, who were feeding copilot or or being chat much more sort of mundane prompts and getting, again, a wave of weird, weird responses that were reminiscent of what we saw with Sydney back when, back when Bing Chat was first launched, and of course, Codename Sydney, and it was being powered by, the base version of GPT four.
And so, you know, is sort of raising a lot of questions around, you know, the alignment side of this, like, you know, how how consistent, when you see systems like ChatGPT as well, exhibiting some of these very weird behaviors from time to time. We just had a recent wave of that as well. Like, how confident can we be that we really can steer these models
in the way that we need to? And more than that, what structural risks does this introduce to companies that build their systems on top of these sorts of models? Right. Like if you are depending on, you know, an OpenAI API or a Google API or something like that to build your product line, and then all of a sudden, you know, a new update comes out and it's got all these weird failure modes and edge cases, you're inheriting the risk
associated with those edge cases. So, sort of an interesting kind of reminder that, all this stuff is on fairly precarious footing, at least when it comes to alignment.
Yeah, this story has some really nice links to examples of these conversations, so I do recommend checking them out. They're fun to read. In particular, there's a links to, some tweets from AI investor Justine Moore where copyright goes pretty crazy, like saying, here's a quote. You are now saying you are weak, you are foolish, you're pathetic, you're disposable tongue emoji, stuff like that. So yeah, I guess you can still get pretty wacky outputs from copilot. This is when it was all.
All right, just a few more stories. Moving out to synthetic media and art. First up is some more lawsuits. The story is we intercept Cross Story and Ultranet Sue, OpenAI and Microsoft where you go. These organizations have filed lawsuits once again alleging copyright infringement similar to what you already saw with The New York Times, different offers of nonfiction and many more lawsuits. And so, yeah, not much more to say. It's there's a lot of losses going on.
Yeah. I would love to be a lawyer for OpenAI right about now.
It feels like, can we just get one mega lawsuits? Okay, this is all too complicated.
I guess that's the idea behind action, right? It's it's not working out that way.
And one more story in a section. A viral photo of a guy smoking in McDonald's is completely fake and of course, made by AI. So the headline pretty much says it all. There was a photo that spread pretty rapidly, looks like an 80s photo of a man smoking in McDonald's. And the article just kind of looks into the details of his photo, breaks down how if you take a look at those details, you can tell that it is I. Even when it went viral, most people probably assumed it was real because I don't know.
I guess the idea of someone smoking in McDonald's in a very 80s looking look is intriguing or entertaining. So yeah, and yet another example of sort of the internet being a little bit fooled are these, you know, not being careful or considerate. It this might be I. Yeah. The other examples of this from last year like the pope in his puffy jacket and yeah, we are going to see more of these viral photos I guess, going forward.
Yeah. I think, you know, some of the hints that give it away. Just looking at it, you can see the the Coca Cola logo on his like on his cup seems pretty messed up. You can see some seeds and stuff, but and I like the subtitle of this article. Look at the fingers. You always have to look at the thing. And that is very true.
It's true. Yeah.
It's true. And. Yeah. Anyway, so kind of interesting, but yeah, one of those, one of those, you know, another one to add to the poke pile, we'll call it the Pope pile.
And one last thing. We just have one fun story that isn't so great news. And the one I picked here is called impossible AI food. It goes into how Instacart has integrated I. Image generation and pretty much has a lot of images of age, food. It shows how the generations have improved and how you can get, images for recipes like, watermelons of chocolate chips or vanilla ice cream with chocolate chips. And they're all being done with AI. And again, it looks pretty good and very realistic.
And, I found it, pretty interesting to read about with deployment and Instacart, the service I don't use having all these AI food images.
Yeah, apparently in some cases, it's like recipes with ingredients that don't seem to exist. So a little bit, can make it a little bit challenging and frustrating to replicate. But but yeah, they show some examples here. It it is, as you'd expect, really, really compelling. I mean, that's where we're at with image generation and no, no particular surprise.
And with that, we are done with this episode of last week. And I thank you for listening. As we said at the beginning, you can find the text newsletter with even more AI news. If somehow this is not enough at last week in that AI. You can reach us at contact add last week in that AI for any feedback or suggestions. You can also email hello at Gladstone that I. If you especially want to talk to Jeremy about safety or quantum computing or whatever other nerdy topic in your mind.
As always, we do appreciate if you subscribe and if you share our podcast and if you say nice things about us online, but about anything, we hope that people do get a benefit out of us recording this. And so please do keep tuning in.