¶ Intro / Banter
Hello and welcome to the latest episode of the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will summarize and discuss some of last week's most interesting AI news, and you can also check out our Last Week in AI newsletter at lastweekin.ai for a bunch more articles that we will not cover. I am one of your hosts, Andrey Kurenkov.
I finished my PhD at Stanford last year and I now work at a generative AI startup, and this week we do not have our usual co-host, Jeremie. He is on vacation, but we do have a wonderful guest co-host.
Hey, this is Jon Krohn. I think this is my third time guest hosting on the show.
I think so, yeah
But the previous two times I was co-hosting with Jeremie, so it's the first time that you get Andrey and me, and I'm excited for this one, as regular listeners will know, or people who heard me in those previous episodes.
I am a diehard listener to this podcast, so it's such a treat to be able to come on here. In addition to spending as much time as I can every week listening to this podcast, I also host the Super Data Science podcast, which is, as far as we can tell, the world's most listened-to data science podcast. That's more of an interview show than a news show, if people are interested in that kind of stuff. I'm also co-founder and chief data scientist at a machine learning company called Nebula.
So, yeah, that's me and Andrey. Should we talk about the beers now? This reminds me of a podcast about data science that I used to love, one that inspired me to get into podcasting, called Partially Derivative. On that show they were always drinking beer on air, and they would open every episode talking about what beer they were drinking. So in today's episode, that was the sound of me opening my beer. But we
have a bit of a twist here. I don't think I even qualify as a millennial, but it's like a millennial twist on having a beer on air, because we're both drinking Athletic Brewing Company beers. And these are basically nonalcoholic; they have half a percent of alcohol per beer.
Yeah. It just so happened that I had one open at my desk, and Jon saw it and also got one out. It seems we're both fans. I wish we were sponsored and this were an ad, but it's not; it's just a fun little detail for our listeners. Before we get going with the AI news, we are enjoying some nice beer to keep us hydrated as we talk for an hour and a half about all the latest AI news.
If I can drag this out for, like, one more second, Andrey: what flavor? What's the particular kind of Athletic Brewing that you're enjoying today?
I got the Run Wild IPA, my favorite variant. And, yeah, you know, I'm not sure if you could call this healthy, but it's certainly better than soda or whatever else, and I really like it.
Yeah, I think it's delicious. I have this Emerald Cliffs, which is a limited edition one. If you can still get it online, I highly recommend it. It's supposed to be, I guess, like a Guinness, and it's absolutely delicious. I love it.
And just one more thing before we get into the content of the show and all the news. As usual, I do want to call out and thank a couple of reviewers. We got a couple of new ones over the last few weeks on Apple Podcasts. One said it's a great podcast and they enjoy it a lot. Another, Lone Wolf, also said great podcast and had a useful bit of feedback: they think Jeremie sometimes repeats details I say, just because we say so much.
So thank you for the feedback. We'll try to incorporate it as we go forward. And I guess, Jon, you'll try too.
I'll try not to repeat too much. It is an interesting one, though, because if you paraphrase the important points, it can end up making it easier for listeners to apprehend that particularly important information. So it's a tricky balance to strike, I think, as a podcast host. But good feedback for sure.
Yeah. All righty. Well, let's dive into the news, starting
¶ Tools & Apps
as usual with the tools and apps section. And the first story is about AI-generated music yet again, one of the highlight trends of the year, it seems. This time it is coming from ElevenLabs. ElevenLabs, if you have been following it for a while, you probably know, is a leading source of AI-generated voice: you can give it text and get really good, high-quality voice outputs.
And now they have previewed, they have given us a sneak peek of, their work on AI-generated music via some posts on Twitter, or X. So it's not quite out yet, and it's not clear when it will be out, but they did post a few examples, and the examples I listened to are really clean, really convincing results. So we now have multiple big players: we recently covered the new entrant Udio,
and there's also now ElevenLabs, seemingly entering, or soon to be entering, the space of music generation with yet another model where, when I listen to it, it's hard to tell it's AI-generated. So yeah, we are very much in the realm of generating full songs with AI that are almost indistinguishable from human music.
Udio was the vendor that you guys used a few weeks ago to change up your theme song, right?
That's true. Yeah.
I was surprised that you guys haven't continued to have different theme songs every week. I guess it would kind of take away from the branding of it.
Yeah, I don't know. I have a soft spot for our current theme; I kind of just like the sound of it. But it could be fun, maybe, maybe even as outro music every time.
I do like your intro music. It feels like I'm being lulled into calm by an AI system that is going to take over the world and my brain. It's like it's gently saying: just relax, don't worry, AI is coming, it's going to destroy everything, just relax. I love it.
Yes, it's a good way to kind of prepare for the onslaught of AI news we are facing every week.
Yeah. And so this is interesting, this ElevenLabs story. I've just been digressing, basically, so far since I've been on, but they're a big player in this voice generation space. They closed an $80 million Series B three months ago at a unicorn valuation, and at the Super Data Science podcast we've actually experimented with using ElevenLabs to clone my own voice, because we recently started doing these
"In Case You Missed It" episodes that happen once a month, and they recap the most interesting parts of conversations from the preceding month. And the production team was like, let's make your life easier so that for at least one episode of the month, you don't have to be involved. In the end, it's one of those interesting things where you're like, I could do it, and it absolutely sounded like me. But to make the transitions between those conversations really good, I've got to script them anyway, and then I might as well just record them myself; it takes like an extra minute. So yeah, it's one of these interesting things. You guys were talking last week about how, in generative AI, the chat application is kind of well-worn through, but there are lots of other areas.
It was GitHub, I think, specifically, that you were talking about last week, where there are new ways you can be using generative AI. But it's interesting how it can get you so close, yet just be a little bit off from really, at least today, saving you time in reality.
And I guess that's just going to get better and better as we have more and more tools and we become better at syncing them together, linking them together, and kind of having a web of AI systems that act as agents, act as generators, and really do make our lives easier and save us a lot of time.
Yeah. I think it's very much context dependent. We're not at a place where AI can replace everyone at everything, right? But as we'll be talking about later, for ElevenLabs and voice generation there's now a prominent example with Audible, where you do have many audiobooks, and we'll get to this near the end of the show, being generative AI now. And similarly, I would imagine with music it won't replace popular artists. But for TV shows, for movies, for things where you might want to quickly generate a song, especially for indie or less financially backed projects, like podcasts. Like podcasts, yeah, exactly. So anyway, exciting news from ElevenLabs, and you can go to the link in the description, as always, to actually listen to the song. It's pretty impressive.
Next story. This one actually happened last week, but we do want to make sure to touch on it since we didn't get to it last week. It is about the mysterious gpt2-chatbot, an AI model that appeared suddenly and confused experts. This was on the LMSYS Chatbot Arena, which we mention quite a lot; it's where people can basically vote on which chatbots are best and give them a score. And it appeared and got pretty good scores, and everyone was wondering: is this from OpenAI?
Is this some kind of new model they're working on? There were no details at all as to where it's from, but it did have pretty impressive reasoning abilities and did pretty well, although it also had some weaknesses compared to GPT-4. And all of this is happening as we are waiting: OpenAI has announced that next Monday, so next week, just a couple of days from when we are recording, they have some sort of announcement coming on Monday morning.
So people have been speculating that it's search, and OpenAI said no, it's not search, and it's not GPT-5. Is this gpt2-chatbot thing related? Maybe it's some sort of extra-small model that's really good? Nobody knows. But it's interesting that this happened, and yeah, it's a fun little story to be aware of.
You've had a lot of shows recently on, well, not overtraining exactly, but going beyond the Chinchilla scaling laws that you guys talk about on the show a fair bit. There has been a trend recently toward taking small models and training them a ton, with a huge amount of data, beyond what would be the economically optimal choice according to the Chinchilla scaling laws. But you do nevertheless get really great results, and so it would be interesting to see if this is in that space.
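(A quick aside for transcript readers: here is a minimal back-of-envelope sketch of the trade-off being described, assuming the common rule-of-thumb reading of the Chinchilla result of roughly 20 training tokens per parameter; the exact constant depends on the setup.)

```python
# Rough Chinchilla arithmetic: ~20 training tokens per parameter is the
# usual rule-of-thumb reading of Hoffmann et al. (2022). "Overtrained"
# small models deliberately go far beyond this.

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal token count for a given model size."""
    return n_params * tokens_per_param

seven_b = 7e9
print(f"{chinchilla_optimal_tokens(seven_b):.1e}")  # ~1.4e11, i.e. 140B tokens

# Training that same 7B model on 2T tokens is ~14x past "optimal":
# less compute-efficient at training time, but it yields a stronger
# small model that is much cheaper to serve at inference.
print(2e12 / chinchilla_optimal_tokens(seven_b))    # ~14.3
```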
This is kind of, I don't know if this is a shameless plug; you can pull this out, Andrey, if you want to when you're editing. But we had the head of the team that created this, Professor Joey Gonzalez, on my show Super Data Science, in episode number 707, and he talks about LMSYS a lot, which was new at that time. And yeah, it's a really cool way to compare models in a manner that is crowdsourced and leverages real human feedback.
It's definitely the biggest player for doing that, I think the most well-thought-through website for comparing chatbots in that manner.
Yeah. And these days, you know, as we see more and more announcements of new models, we always mention that with benchmark results, it's always a bit iffy. Right? You don't really know if a model was potentially trained on some data. You want to believe that people avoid all sorts of hacks to get the best numbers, but in some cases there have been some hacks or some at least weird ways of reporting numbers.
So LMSYS and things like it are pretty much the best way to know model quality, or at least one of the better ways. And again, later on in the show we'll have another interesting bit of news in open source related to this topic.
And it's more expensive, but definitely higher quality and more reliable.
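(An aside for transcript readers: a minimal sketch of how pairwise "which answer is better?" votes can become leaderboard scores. LMSYS actually fits a Bradley-Terry model over all votes; this online Elo update is a simplified stand-in for the idea.)

```python
# Elo-style rating from crowdsourced pairwise votes, as a toy version
# of what arena-style leaderboards compute.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Nudge both ratings after one human vote."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(
    ratings["model_a"], ratings["model_b"], a_won=True)
print(ratings)  # model_a moves up and model_b down by the same amount
```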
And on to the lightning round with some faster stories. First up, SoundHound AI and Perplexity partner to bring online LLMs to next-gen voice assistants across cars and IoT devices. So SoundHound AI provides these sorts of voice assistants and has announced this partnership with Perplexity, Perplexity being an AI-driven search engine that has exploded over the past year. So that's about all the details we need to provide.
Basically, you'll be able to query Perplexity and answer complex questions. It's interesting to see Perplexity starting to do these sorts of partnerships and expanding beyond just their own platform, providing AI services to other platforms. Have you used Perplexity at all, Jon?
I actually have not. I've seen demonstrations of it, and it seems like a cool tool. I am kind of embarrassed that I haven't. As I'm sitting here, I'm like, man, I don't have a lot to say on this story, and I really should have used Perplexity by now.
Okay, well, now you have some homework. For your next interviews, you might want to do some research with it. I'm the same, in that I haven't used it actively; some people say they use it every day, like Nvidia's CEO. I will say I don't use it actively, but I have played around with it, and it's a cool tool.
I think, as we've spoken about this for thirty seconds more, my mind has kind of shaken off some cobwebs and reminded me that when I first heard about it, maybe a year ago, I did do some searches, and I was like, well, it's a really nice UI, but it didn't scratch any actual informational need for me beyond what I could already retrieve very conveniently with either a Google search or a generative AI search with one of the tools that I was already using.
Next story: Stability AI sows gen AI discord with Stable Artisan, and kind of weird naming there, from VentureBeat. What it means is that you will now be able to use Stability's image generation from Discord via a Discord bot, similar to what Midjourney has been providing for a long time. And this will be a paid service: there's going to be a free trial, and then you do need to buy credits, very much the same as with Midjourney.
That's pretty much the story. And I guess for people who love Discord as a way of generating images, it's another tool you can use now.
Yeah. I don't know why anyone would deliberately want that. The weirdest thing for me about that experience is seeing other people and the images that they're generating, or the content that they're generating, alongside yours.
Yeah, it's good for communities, but as an individual making content, maybe less so. Still fun to see these new developments.
It's the exact opposite of Perplexity in terms of user experience. Perplexity has such a beautiful UI, and it's so thoughtful with the user experience; with doing everything in Discord, it's the exact inverse. But I guess that's the deliberate play there: let's get the models out to people as quickly as possible and not focus on UX.
The interesting thing, I think, about focusing on UI like Perplexity does is that it gives you the capacity to improve over time. If I went back to Perplexity today, I might say, wow, you guys have really done a lot more here, and it is so much more valuable to me. Whereas, you know, with a kind of Discord experience, that's not going to change.
Next up, Apple will revamp Siri to catch up on its chat bot competitors, and this is paired with a second story that Apple is nearing a deal with OpenAI to put ChatGPT on iPhones. So they are apparently finalizing terms for a deal to have ChatGPT features in Apple's iOS 18. And alongside that, there also has been a major reorganization within Apple, with a major focus on generative AI and seemingly this idea of upgrading Siri with generative AI capabilities.
You know, we've mentioned it before; it seems like a no-brainer, and I'm sure they'll get it going pretty seriously. And there are some announcements presumably coming in the next few months regarding this huge story.
And it highlights the interesting dependencies between some of the biggest players in Big Tech. The other one that comes to mind for me immediately is the way that Google pays crazy amounts of money to Apple to have Google Search be the default search
on the iPhone. And this similarly seems to me that kind of situation, where surely Apple would prefer not to be dependent upon Microsoft or OpenAI for this technology, but it's expensive, and there's very little talent on the planet that can get you to the frontier of generative AI; just a handful of labs around the world are there. And so it makes a lot of sense, pragmatically, I'm sure.
However, with things like scrapping the Apple Car, which I was super excited about, I'd love to even just see what that was going to look like, but with projects like that being scrapped so that resources can be diverted to generative AI, I wouldn't be surprised if we're not too far away, a year or two maybe, from Apple being able to do these kinds of generative AI projects on their own.
And I would bet that they will have a huge focus on security and, yeah, reliability, probably more so than being right at the cutting edge of capability. I imagine Apple will be a little more conservative, trying to make sure that they have, yeah, the most secure, the most reliable outputs.
And just one more story from the section: Alibaba rolls out the latest version of its large language model to meet AI demand. So this is from Alibaba Cloud, and they have released the latest version of their model, Qwen 2.5. Some benchmarks suggest it's comparable with GPT-4 in some capabilities, like language understanding and creation. And Alibaba does say that there have been over 90,000 deployments by various companies.
So as always, be mindful that there's a whole ecosystem over in China of people competing and trying to create ChatGPT-like models. And on to applications
¶ Applications & Business
and business. And the first story is OpenAI and Stack Overflow partner to bring more technical knowledge into ChatGPT. Just last week we talked about how OpenAI has been making a lot of deals with news publications like the Financial Times, and now there's been this announcement of OpenAI and Stack Overflow making a deal. Stack Overflow has been kind of the leading website for conversations regarding programming and technical details.
A lot of people go over to ask questions like "how do I do this?", "I have this bug", etc., etc. And as a CS student from like a decade ago, this used to be a site you went to all the time, right? They have a lot of data, and I'm pretty sure ChatGPT and all these LLMs have already been trained on that data anyway. So there's now a deal for OpenAI to have access to the Stack Overflow API and be able to use feedback from that community to enhance
their models. And Stack Overflow will be attributed within ChatGPT, interestingly, similar to how ChatGPT will be attributing things to the likes of the Financial Times. So yeah, another play by them, another potential way to overcome the limitation of the knowledge cutoff in these models. Right, that's one of the key limits of GPT: it will always be trained up to a point in history, like April 2024
or whatever. So I think these sorts of plays speak to that, where we have integrations of new sources, these kinds of forums covering different question-and-answer topics, so these tools can still be useful for things that come out after the training data was collected.
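(An aside for transcript readers: a minimal sketch of the retrieval pattern a deal like this enables, where fresh, post-cutoff content is fetched and injected into the model's context. The fetch_recent_answers function is a hypothetical placeholder; the announcement does not specify the actual Stack Overflow API endpoints or fields involved.)

```python
# Retrieval-augmented prompting: pull fresh content at query time and
# put it in the context window, so the model can answer about things
# newer than its training cutoff.

def fetch_recent_answers(query: str) -> list[str]:
    """Hypothetical stand-in for a Stack Overflow API call."""
    return [f"(accepted answer text retrieved for: {query})"]

def build_prompt(question: str) -> str:
    context = "\n\n".join(fetch_recent_answers(question))
    return (
        "Answer using the retrieved posts below, and attribute Stack Overflow.\n\n"
        f"Retrieved posts:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("Why does my Python asyncio task never run?"))
```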
Yeah, I think I'm probably a bit older than you, Andrey, and it's interesting that when I began programming, the only way I could get information on how to improve the code I was writing, or even just learn things about code, was from a book.
Yeah. That's, from a while ago. Yeah.
So this is in an era where we were just starting to get dial-up internet at home, and we definitely did not have internet at school. We would have PC labs, and you would go in and they would have the class textbook, and you'd learn C++ or Java or whatever, and you were just limited to your teacher's knowledge, your classmates' knowledge, and whatever was in the textbook. So definitely a different era.
But then the internet allowed us, of course, to do Google searches and find information, originally more so in the official documentation for a given programming language. But then Stack Overflow became invaluable. It became the default, where I would get some stack trace, grab the error at the end of that trace, paste it into Google, it takes me to Stack Overflow, and then I can just copy whatever the solution is without even having to think about it.
And to underline the importance of Stack Overflow, it's interesting how Google, for Gemini, also inked a deal with Stack Overflow in February. And you can see why: if you want to be getting great coding suggestions, Stack Overflow has got to be the best place on the internet by far. That is where, 90% of the time or more, that stack-trace copy-paste into Google takes me. And now it is really interesting.
That is an example of something we were talking about earlier in the episode, tasks like my voice being simulated for an episode of mine once a month, where generative AI didn't quite meet the mark for me. Now, Claude 3 Opus is where I go every single time I have an error in my code, because it is so brilliant.
In the same way, if you're familiar with doing that in GPT-4, or, I haven't used GitHub Copilot myself, but I'm sure it's the same kind of situation, you get great feedback right away. And the thing that is different about doing that versus a web search is that it is contextualized exactly to your code, so it can just rewrite your code for you. You don't have to change any variable names or anything.
It's true. And for the non-programmers listening, I'm sure it is very much the case for some of you too. There have been stats that Stack Overflow usage, or visits, has dropped by like 40%, just because ChatGPT and tools like it have pretty much surpassed its usefulness. So going into the future, Stack Overflow will be mainly useful for things that don't exist yet, for dealing with new tools and stuff like that. And just one more note on the story before moving on.
I think it is worth highlighting an additional story that came out after this one: Stack Overflow has actually been banning users who have, in a sense, rebelled over this partnership. Some users have started trying to delete their answers to prevent them from being used to train OpenAI's models, and, yeah, Stack Overflow has tried to prevent that by banning those users.
So, as we've seen often with these sorts of deals, to some extent with Reddit, not everyone is happy to have their work, their contributions, be used to train AI models.
It's an interesting problem, where I suspect the terms and conditions of using the site give them the right to do that, but nobody reads those, and then things change. On the other hand, it's interesting to think about what people talk about us running into, what's it called?
Model collapse, where if we start losing high-quality real sources of data like Stack Overflow, if people stop contributing to Stack Overflow because they can just do everything in generative AI tools, does that mean we're going to have a big negative impact on the quality of generative AI models in the future, such that they completely collapse? Andrey, I don't know what you think about this, but I'm actually relaxed about it.
It seems to me the AI systems have the ability to evaluate their own outputs. For example, I just said that my favorite generative AI tool of this time is Claude 3 Opus, and that uses reinforcement learning from AI feedback, as opposed to from human feedback, and that seems to be part of how it's been able to outperform GPT-4 and Gemini. And so, I don't know, it seems to me like we're figuring these things out, and the folks at the frontier labs seem to have really clever ways of distilling down to the highest quality training data one way or another, whether it's AI generated or human generated. So on model collapse, I'm relaxed. But what do you think, Andrey?
I agree. I think, you know, there's a reason to be concerned, because the internet is starting to get flooded with AI-generated spam, as we've covered many times. But to your point, these frontier labs have, you know, the most brilliant people, basically, right? And they've been doing this for a while now and getting things like Claude 3 off the ground. So I think it's a problem that exists, and it's also a problem that is solvable.
The bigger problem, I guess, is the end of data that is being approached, where there is kind of an open question of how much further we can push these systems. And on to the next story: new Microsoft AI model may challenge GPT-4 and Google Gemini. So Microsoft is building their own large-scale language model, apparently codenamed MAI-1, and this one will have approximately 500 billion parameters.
So this is much bigger than what Microsoft has been releasing with Phi, their kind of smaller large language models; this is more on par with things like GPT-4. And I guess it is interesting, because we know Microsoft has a partnership with OpenAI; they have been integrating OpenAI, ChatGPT rather, into Azure. They brought on Mustafa Suleyman from Inflection, where they trained a large language model, and it seems like he will be heading this development.
He also was, of course, at DeepMind. So it seems like a way to kind of de-risk things from the Microsoft side, where we don't have to rely on OpenAI to have their own large language model. They will have it and of course, will be able to compete with Google and Meta as one of the players that is able to develop these kinds of systems.
Yeah, for our audio-only enjoyers, people who aren't checking the story names: I wonder sometimes, as I'm listening to Last Week in AI, I have my iPhone open in front of me and I'm scrolling to keep an eye on the story names. If you're not doing that, phonetically this model is "my one", but it's spelled M-A-I hyphen one, for people who want to track that later on. It's not, like, a
Microsoft I am. Yeah, yeah.
Yeah, yeah, exactly. And it is interesting to have Mustafa Suleyman there, previously DeepMind co-founder and then CEO of Inflection, which seemed to be doing really cool things. I haven't been following that story very closely as to why, seemingly suddenly, everyone who was major there has gone on to Microsoft, and this seems to be what they're working on. Maybe because of the kinds of things that you guys often talk about on this podcast related
to: if you want to be at the frontier, you've got to have access to a ridiculous amount of cutting-edge compute, and Inflection just might not have had that. So they figured, you know what, let's go over to Microsoft and do it there; they've got tons of compute. And the interesting thing, you already highlighted
this. I'm doing the thing that that reviewer explicitly asked us not to do, but maybe with a little bit of a twist, a little bit of additional commentary, which is that it makes perfect sense to me that Microsoft would be trying to do this stuff on their own without requiring OpenAI, because they haven't acquired OpenAI. They've taken a huge stake, but this isn't their IP. OpenAI can make deals with Apple, like we just talked about earlier in this episode.
And so you could imagine a scenario where, if somehow Microsoft had created GPT-3.5 and released a ChatGPT-like tool, something of that quality, before OpenAI had done it, that would have been an even better outcome for them than needing to invest in an outside party and get the models from them. So yeah, you can totally understand why they would be doing this kind of thing.
And on to the lightning round, which we begin with a raise of $1 billion. That is from London-based AI startup Wayve, which makes AI systems for autonomous vehicles. This was led by SoftBank, who has thrown a lot of money around; Wayve previously raised around 300 million, so this seems like a pretty big round. And it's interesting, because we have fewer and fewer players in the self-driving space. It seems like Waymo, Cruise, and Tesla are really kind of leading the pack there. But Wayve now has a billion dollars, so they'll keep working on it.
Yeah, I don't have anything else to add to this story, other than it's interesting to me that you have two of the biggest players in autonomous driving named Waymo and Wayve, with the same first three letters. I don't know, you'd think you'd want to differentiate a bit.
Yeah, well, it's a fun detail. And the next story is also about self-driving cars, this one on the opposite end, about a setback. So Motional has delayed its commercial robotaxi plans amid restructuring. The startup Motional is a joint venture between Hyundai and Aptiv, and they are apparently pausing their commercial operations and delaying the launch of their driverless taxi service until 2026 due to restructuring. There will be layoffs. We don't know too many details right now.
They have some autonomous taxi rides in Las Vegas and deliveries in Santa Monica; those will be halted. So yeah, this seems like a pretty significant setback. They did get an investment of $1 billion as well, from Hyundai, so they will continue to be active and working on this, and perhaps this is kind of a strategic thing, hard to say. But regardless, another player in the space trying for the same thing, and I guess delaying their plan to launch until 2026.
Yeah. I don't know exactly what's behind this, another setback in autonomous vehicles, but you could imagine that with, was it GM that recently had the big issues, where they had to halt Cruise's operations? Yeah, exactly. And so you could imagine, at the board level or something at Hyundai, them saying, you know, with these kinds of things that we're seeing happen at our competitors like GM, there's too much risk.
We've got to make sure that this is really buttoned up, or let's see what happens with Waymo and the regulatory situation before, you know, we take more risk in this kind of situation. I don't know exactly what's going on, but you could imagine something like that.
And one last story in the section: the rise of Chinese AI unicorns doing battle with OpenAI. This is a bit more of a summary story, not anything particularly news related, but interesting to me. It kind of highlights four Chinese AI startups: Zhipu AI, Moonshot AI, MiniMax, and 01.AI, all of which I'm sure we've mentioned in previous episodes. All of them have surpassed $1 billion in valuation and are competing with the likes of OpenAI.
Some of them are not necessarily working on frontier AI models. Moonshot, for instance, focuses on digital assistants that summarize text for students and office workers; MiniMax is targeting the gaming market with anime-themed characters; and 01.AI, of course, has been developing frontier AI models. So yeah, they are big players in China. We may not hear about them a lot in US media, but it's definitely interesting to keep in mind that this is happening in China.
The only thing I have to add here is that the thing that got the biggest chuckle out of me on your episode last week was you pointing out, Andrey, that China, the country, did not release this model, even though that's how these releases often get described.
Yeah, that's on me. Excuse me.
¶ Projects & Open Source
And now on to the projects and open source section. The first one is the paper Prometheus 2, an open source language model specialized in evaluating other language models. This is a collaboration between six different institutions, including KAIST AI, MIT, various universities, and LG AI Research. And as per the title of the paper, this is related to evaluating language models: they claim to have figured out a way to closely mirror human and GPT-4 judgments.
Basically, you get high correlations. And unlike Prometheus 1, which was released last year and focused on direct assessment, basically giving a score from one to five, this one is greatly expanded: it can also do pairwise ranking, which is what the LMSYS arena does. It lets you compare two different answers and pick which one you like better. And as with Prometheus 1, there is a lot of infrastructure that goes into this.
I'll just highlight how you have user-defined evaluation criteria; that is how they get the scores of 1 to 5. In the paper they go a lot into how they expanded it, so it now also supports pairwise rankings. They also have an expanded dataset with a lot of examples of pairwise rankings and verbal feedback on outputs. And as per the title, once again they released the models, code, and data, all open source.
So a pretty promising effort, I think, to help with that evaluation challenge, where you can go beyond benchmarks and use these sorts of LLMs to evaluate, which I think is becoming a pretty standard approach.
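(An aside for transcript readers: a minimal sketch of the pairwise LLM-as-judge setup in the style of Prometheus 2, with a user-defined rubric, two candidate responses, and verbal feedback before a verdict. The generate method and DummyJudge are hypothetical stand-ins for however you run the released checkpoint.)

```python
# Pairwise LLM-as-judge: ask a judge model to compare two responses
# against a user-defined rubric and emit feedback plus a verdict.

JUDGE_TEMPLATE = """You are an impartial evaluator.
Rubric: {rubric}

Instruction: {instruction}
Response A: {response_a}
Response B: {response_b}

Write brief feedback, then end with exactly "[A]" or "[B]"."""

def pairwise_judge(judge, instruction, response_a, response_b, rubric):
    prompt = JUDGE_TEMPLATE.format(
        rubric=rubric, instruction=instruction,
        response_a=response_a, response_b=response_b)
    feedback = judge.generate(prompt)  # hypothetical model API
    return ("A" if "[A]" in feedback else "B"), feedback

class DummyJudge:
    """Placeholder so the sketch runs end to end without a real model."""
    def generate(self, prompt: str) -> str:
        return "Response A is concise and follows the rubric. [A]"

winner, _ = pairwise_judge(
    DummyJudge(),
    instruction="Summarize the article in one sentence.",
    response_a="A short, faithful summary.",
    response_b="A rambling, off-topic reply.",
    rubric="Prefer the response that is concise and faithful.")
print(winner)  # "A"
```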
I can't overstate the importance of innovations like this, as someone who manages a team that develops LLMs, fine-tunes LLMs for particular use cases, and then deploys them into production. You talk on the show all the time about the issues with benchmarks; those are so flawed that it seems like you've got to be looking to something beyond them, like the Chatbot Arena that we were talking about earlier. Interestingly, in a parallel,
the same faculty member at Berkeley, Professor Joey Gonzalez, who was one of the faculty members behind that Chatbot Arena, was also, in March of last year, March of 2023, one of the faculty on the release of Vicuna, the model that was a fine-tuned LLaMA model and outcompeted LLaMA. And what they did for their evaluation, which they said at that time was only partly scientific, was use GPT-4 to compare side by side:
let's say, how does our Vicuna do against LLaMA? How does it do against GPT-3.5? So you have these head-to-head tests, similar to what humans are evaluating in Chatbot Arena, but you allow GPT-4 to do it, as in the Vicuna evaluations. And for us at our company, Nebula, that is what we do most of the time. Because having people do evaluations, whether your own customers, who you might annoy if you bug them with that kind of thing all the time, or internal teams, saying, you know, let's ask our business development team or our product team to evaluate this model that we built and compare it against GPT-4, or compare two different versions of our model: it is so unscalable to ask humans to evaluate your models. There are so many situations, as a data science team, on a daily basis, even as we're checkpointing a model as we train it, where we want to know: are we overtraining, are we starting to overfit, or have we reached a point where this model that we've trained is mature? And so these kinds of computational pairwise evaluations, being able to compare two models head to head, are so important and are the future of this space, if they aren't already today. So I really appreciate folks like the researchers behind Prometheus developing these specialized models for evaluating other language models, and you can bet that our company will be using this right away.
Exciting! Good to hear that you will benefit from it. Now on to the next paper, and this is yet another big LLM being released, pushing the frontiers of what released models can do. The paper is DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. This is much bigger than the previous release from DeepSeek: DeepSeek's previous model was 67 billion parameters, and this one has 236 billion parameters. But as with other mixture-of-experts models,
the reason they say this is an economical and efficient model is that, of those many, many parameters of the neural net, only 21 billion parameters are activated for each token. And when you look at the graph they have in this paper of activated parameters versus performance, they now claim kind of a top spot: for a given number of activated parameters, you get really good performance on the MMLU benchmark, which is one of the leading ones that people still use. So it's roughly on par with LLaMA 3 70B while using way fewer activated parameters. And there are various details on how they do that: they use multi-head latent attention and a mixture-of-experts architecture, they train on 8 billion tokens with various tweaks, and they actually released like a 30-page paper summarizing a lot of
details. So to me, a pretty exciting paper, just because a lot of this is still a dark art, and this has a lot of details on how it works. And they do release the model checkpoints on GitHub.
Yeah, I'm not 100% sure what I heard you say there, but it's 8 trillion tokens; I think I might have heard billion.
A trillion. Yeah, yeah.
And yeah, that highlights, it goes back to what we were talking about earlier with this overtraining. So this rumored, whatever it is, I guess we might find out on Monday what that gpt2-chatbot variant is, and if that ends up being a small model that's trained on a huge number of tokens, this is another one of those scenarios where you get so much out of a model by going far beyond what the Chinchilla scaling laws would recommend.
Another interesting thing here is that chart you were describing, with the MMLU asterisk already there: in some world where MMLU is a perfect indicator of how great a model is, it is interesting to me that for DeepSeek-V2 they have done exactly the same chart, which you can see in the paper; it's Figure 1 in the paper that's linked to this episode. They created the exact same chart as Mistral did for the Mixtral 8x22B release.
And the point that Mistral was making was: hey, look at us, we occupy the top left corner of this chart, where you have activated parameters on one axis and performance on MMLU on the other axis, and these two Mixtral models, 8x7B and 8x22B, occupy the top left corner. And then DeepSeek comes along and says, F you guys, we're going to occupy an even further corner in the top left. So yeah, it's interesting to see how people position themselves relative to each other. It does make a lot of sense. It would be nice if we lived in a world where MMLU was 100% reliable, because then this kind of release would mean you have a no-brainer to go for if you want the best performance with an open source model, simultaneously without the cost of running a LLaMA 3 70B in production.
That's right. And one more thing I'll mention: they do have that "economical" detail in the title of the paper, and that's because, in addition to this efficiency, they also say that they save 40% of training costs, reduce memory usage, and increase generation throughput by almost six times. So lots of tweaks over their previous approach, lots of nice improvements, and overall it seems like a really strong model.
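(An aside for transcript readers: a toy sketch of the mixture-of-experts idea behind those "236B total, 21B activated" numbers. A router picks a small top-k subset of expert networks per token, so only a fraction of the parameters run on any given token. DeepSeek-V2's actual design, with shared experts, multi-head latent attention, and so on, is far more elaborate; see their paper.)

```python
# Toy mixture-of-experts layer: top-k routing means most expert
# parameters sit idle for any individual token.

import numpy as np

rng = np.random.default_rng(0)
n_experts, d_model, top_k = 8, 16, 2

experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router_w = rng.standard_normal((d_model, n_experts))

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                 # router score for each expert
    chosen = np.argsort(logits)[-top_k:]  # indices of the top-k experts
    gates = np.exp(logits[chosen])
    gates /= gates.sum()                  # softmax over the chosen experts
    # Only the chosen experts' weights are used for this token.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, chosen))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (16,): same width, 2 of 8 experts active
```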
And on to the lightning round. First up, OpenVoice V2: evolving multilingual voice cloning with enhanced style control and cross-lingual capabilities. I think we're going to want to go quick here, so not too much to add. This is coming from MIT, MyShell.ai, and Tsinghua University, and yeah, it's a really nice model for doing voice cloning in multiple languages.
Nice. Maybe you'll be using that soon alongside the AI-generated Last Week in AI theme song. At least, maybe in your outro, you could be spitting out "Last Week in AI" in Spanish, French, Chinese, Japanese, and Korean.
That is something worth mentioning, actually: translation is a big use case for these kinds of things. Next up, Granite Code Models, a family of open foundation models for code intelligence. This is coming from IBM; we don't mention them too much, but they do a lot of stuff in AI. With this release, they trained these models on 116 programming languages, and they compare against a whole bunch of other open code generation models like Code Llama.
And as always with these news stories, they claim it's the best. They do release it under a super open license, Apache 2.0, for both research and commercial use. So yeah, lots of progress still being made on LLMs specifically for code. And next we've got: Hugging Face launches LeRobot, an open source robotics code library. So Hugging Face made some news earlier this year, just a couple of months ago, by hiring a robotics lead away from Tesla.
And the first fruit of that effort is now coming out with this open source LeRobot package, which they say is basically meant to be like the Transformers package, but for robotics. Transformers is a package that is used very, very widely for doing work in natural language processing with the transformer architecture, and they want to position this as that type of package, but for robotics: a toolkit, a comprehensive platform, with support for getting datasets, for simulators, for different types of robots, for training models, for pre-trained models, just lots of stuff. And as someone who has done work in robotics during my PhD, I think it's pretty exciting. We really don't have this sort of unified, primary package that people build on; there is kind of a mishmash of different ones. People have sort of tried to do this over the years, and Hugging Face getting into that ring and releasing this, I think, has a pretty strong chance of supercharging robotics AI research for a lot of organizations.
Nothing makes me more excited today than this emergence, this transfer, of large language models, which have over the past few years made an unbelievable impact in software, now emerging into hardware, the more we see robotics being infused with AI. And these kinds of open source projects make that easier and easier. In the past several months, the single piece of news that has impressed me the most is out of Covariant,
Pieter Abbeel's company, where they released RFM-1, Robotics Foundation Model 1. And the specific idea behind that, similar to LeRobot, is to bring LLMs into the physical world, autonomous vehicles being another place where you could imagine LLMs and AI making a huge difference in the physical world. With robotics, there's eventually going to be an infinite amount of real-world capability infused by AI. It's more expensive and, as you guys say on the show time and again, hardware is hard, so it happens a bit more slowly than software. But when you're working with actual real-world physical things, as opposed to just bits, there's, you know, a lot of potential there, and I'm super excited.
And the last story for the section. Last week we had no open source stuff; this week we have a lot, which is kind of cool. The story is Vibe-Eval, a new open and hard evaluation suite for measuring progress on multimodal language models, coming from Reka AI. So they release 269 ultra-high-quality image-text prompts and ground-truth responses, designed to be difficult: on the hard set, which is about 50% of the prompts, all frontier models fail to arrive at a perfect answer, leaving a lot of headroom for progress.
And they say this is meant to complement MMMU, which is multiple choice. Here they have like a golden answer, the ideal answer, and they have an evaluator, Reka Core, that gives you a rating from one to five. So kind of similar to Prometheus in a way, but specifically for multimodal AI rather than just chatbots. And yeah, exciting; again, lots of work being done on evaluation, because it's just something that is needed.
¶ Research & Advancements
And now on to research and advancements. We start, of course, with what I think is the big news of the past week, and that is AlphaFold 3. As is so often the case, the major research news in the media, and I think fairly so also among the community, is coming from DeepMind, and this one is AlphaFold 3, the next generation of their AlphaFold models. So in the past they focused on protein structure prediction and analysis; now they have expanded it to other capabilities within biomolecular interaction analysis. This is going way beyond my knowledge base, but just reading out of the abstract, it is a model that is capable of structure prediction on complexes including proteins, nucleic acids, small molecules, ions, and modified residues. They say that it significantly improves accuracy over many previous specialized tools:
far greater accuracy on protein-ligand interactions and protein-nucleic acid interactions. Again, I don't really know exactly what that means, but probably pretty cool. And the sort of exciting bit from an AI researcher perspective is that they do this by kind of simplifying their approach. In the past, AlphaFold 2 was a pretty complicated architecture with some very custom-built parts for the task. It wasn't the sort of general-purpose model we see with foundation models, which are typically just kind of a transformer with nothing specifically built for the task; that's why those are able to scale and do lots of things. So in the paper they go into how they essentially simplified the architecture and trained it on many tasks, and despite removing these things that are, you know, meant to make the model do better at a specific task, they were still able to train it up to be about as good.
One more detail: they do say they will not be open sourcing this, unlike what they did with AlphaFold 2. This is done in collaboration with the more commercial arm of Alphabet, Isomorphic Labs, which is in partnership with DeepMind, so it will be interesting to see if they try to commercialize it or whatever. But yeah, exciting news once again from DeepMind doing research on AI for science.
Yeah. Google DeepMind, up until the ChatGPT hullabaloo, woohoo! That's maybe not really a word, but I think you know the word I'm trying to say.
Yeah, yeah, yeah.
Until that happened, I think it's safe to say that Google DeepMind was unequivocally the world's leading AI lab. And their approach to AI was different. Folks like Ilya Sutskever at OpenAI said: I think scaling up is, in and of itself, going to yield all these emergent capabilities. Let's just take this transformer architecture and 10x the amount of data that we're using and the number of artificial neurons in the architecture; let's 100x it, let's 1,000x it. And that leads from GPT-2 to GPT-3 to GPT-4, with more and more emergent capabilities. So they were kind of taking that engineering-focused approach of: let's take this attention mechanism and scale it up really, really large,
and see what happens. Google DeepMind might have been surprised by how effective that scaling approach, that engineering approach, was, because what they were doing with things like AlphaGo through to AlphaZero was systematically trying to make their way toward artificial general intelligence:
an algorithm that has all of the learning capabilities of a human, by starting narrow, with a single game like Go, and then saying, okay, let's see what we can do to have an algorithm that can not only beat the world's best Go players, but also the world's best chess players and the world's best shogi players. And then, a year later, let's add in Atari video games alongside that. So their approach was to chip away, generalizing, adding more and more capabilities until eventually you reach this AGI algorithm that can do anything. And that seems to be the same kind of approach here, where with AlphaFold and AlphaFold 2, the press headline at the time was that they had solved protein folding, at least for some types of proteins.
And so similarly here, DeepMind has taken that same kind of approach that we saw from them in going from a single game to multiple games to games in completely different kinds of modalities, you know, adding in board games and video games. Here, they started with proteins, absolutely crushed that, and then said: okay, well, let's see what we can do with other kinds of sequential molecules. So proteins are made up of strings of amino acids,
and here they've said, well, let's see what we can do with strings of nucleotides, which form DNA or RNA. That gives you so much more training data, a lot more diversity of data. And just as we saw with the shift from, say, AlphaGo to AlphaZero, adding in more games, instead of having the kind of catastrophic forgetting that plagued early attempts at generalization, we're seeing more and more that cutting-edge labs, DeepMind in particular, are able to take these different kinds of data, these different modalities, and have a single algorithm that outperforms the specialized ones. AlphaZero, which could play more games than just Go, was better at Go than the specialized AlphaGo model. And similarly, here with AlphaFold 3 we're seeing the same kind of thing, where AlphaFold 3 is outperforming AlphaFold 2, which was already the state of the art at protein folding, by taking into account other kinds of information, by being able to model other kinds of data like DNA and RNA.
And in this case, unlike with games, where there's no obvious way, at least immediately (maybe as you get closer to AGI there is some way you can think about how being good at chess would make you better at playing Pong on Atari or something), interestingly, here the interactions are built in, because in a biological system DNA is interacting with RNA, which is interacting with proteins. So right there you have this generalization providing quite a bit more utility, and yeah, it's really exciting to see.
And now on to the next paper. Very different, but also very exciting. It is xLSTM: Extended Long Short-Term Memory. This is coming from a variety of people and organizations in Austria, with the last author being Sepp Hochreiter, who is the co-developer of the LSTM. A short history for people who may not know LSTMs but know transformers: LSTMs are a form of recurrent neural net, and they were sort of the main thing being used in natural language processing, basically until transformers, and they achieved a lot of cool things. But architecturally they were sort of abandoned, because they are kind of tricky to train and tricky to scale up compared to transformers; they have some inherent limitations architecturally. And so this paper on xLSTM presents a variation and extension of LSTMs that aims to fix that. It introduces some pretty fancy new modules. To get into a little detail, they have sLSTM memory cells with memory mixing and exponential gating, and they also have mLSTM cells that add the ability to do more parallel training. There are a lot of nitty-gritty details; this extension would probably take half an hour to get through.
So unfortunately, we won't be getting into a lot of details. But the end result of these architectural improvements is that, at least in the experiments in this paper, they show pretty comparable characteristics to transformers in training: they train at up to 2.7 billion parameters on a bunch of tokens.
They show kind of the golden outcome, which is a scaling law showing that as you go from not that many parameters to a ton of parameters, you get a smooth decrease in perplexity on next-token prediction. And in fact, when they compare the characteristics of xLSTM to things like Mamba and LLaMA and other neural net architectures that do language modeling, in this paper it has the best performance at every scale of model.
This is very much related to what we've been seeing with Mamba, as far as exploring alternatives to transformers that incorporate some ideas from recurrent neural nets into, or in place of, transformer attention. And it's potentially a big deal: the results are pretty impressive, and this is coming from one of the pioneers of neural nets for natural language processing. So it has generated a lot of discussion, and it will be very interesting to see if people continue to build on it.
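(An aside for transcript readers: a stripped-down scalar sketch of the exponential-gating idea in xLSTM's sLSTM cell, where the input and forget gates use exp() rather than a sigmoid and a normalizer state keeps the output stable. The actual paper adds stabilizer states, memory mixing across cells, and the matrix-memory mLSTM variant on top of this.)

```python
# One scalar sLSTM-style step with exponential gates and a normalizer.
# Note: raw exp() can overflow for large preactivations; the paper uses
# an extra stabilizer state to handle that, omitted here for brevity.

import math

def slstm_step(c_prev, n_prev, z_t, i_pre, f_pre):
    i_t = math.exp(i_pre)             # exponential input gate
    f_t = math.exp(f_pre)             # exponential forget gate
    c_t = f_t * c_prev + i_t * z_t    # cell state update
    n_t = f_t * n_prev + i_t          # normalizer accumulates gate mass
    h_t = c_t / n_t                   # normalized hidden output
    return c_t, n_t, h_t

c, n = 0.0, 1.0
for z in [0.5, -1.0, 2.0]:            # a tiny input sequence
    c, n, h = slstm_step(c, n, z_t=z, i_pre=0.1, f_pre=0.0)
print(round(h, 3))
```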
Yeah, for sure. So, two German researchers, at least one of them, Hochreiter, was involved in this paper. Hochreiter and Schmidhuber first published on LSTMs in 1996, which is wild to think about in terms of where we were at that time with neural network scale. That's the same era as LeNet-5, which was the first ever commercial application of a neural network. That was Yann LeCun, Yoshua Bengio, and others at AT&T Bell Labs at that time, who created the LeNet-5 architecture that was commercialized for the US Postal Service, in order to recognize handwritten digits for ZIP codes, so postal codes in the US, and add some automation to the routing of mail. And that was kind of the peak of neural networks for a long time. We then went into an AI winter, where to get funding for neural network research
you pretty much had to be in Canada; no other federal government was supporting that kind of research. Until the AlexNet moment, out of, again, Ilya Sutskever and Geoff Hinton working on that AlexNet paper, and Alex Krizhevsky, of course, the lead author on that paper and the namesake of AlexNet. That, in 2012, brought deep learning into acceptance: not only is it super powerful once you scale it up compute-wise, but it's also highly generalizable.
So anyway, a little bit of a history lesson there, to say that this kind of architecture, the LSTM, dates back to before that last AI winter, that last deep learning winter. This kind of approach from Hochreiter and Schmidhuber has been around for decades, and it's interesting that they have been able to make changes with this paper. Because if you tried to use an LSTM alone, before we had transformer architectures from the Vaswani et al. paper, Attention Is All You Need, LSTMs were kind of the best-known way of trying to link many words together in a sentence or in a paragraph, to make linkages between them and have some kind of attention. The transformer completely blew the LSTM out of the water, such that I'm not aware of major commercial applications of it today. But Hochreiter, Schmidhuber, and other researchers have continued to push along on this LSTM idea, and it's interesting to see that we're now getting to a
point, at least according to this paper. And again, as you already stated earlier this episode, as usual it is the best on the benchmarks that they selected, so it's hard to know exactly where it stands. But yeah, even if it's performing comparably to transformers, or to Mamba and these other kinds of state space models that are coming out now, it is exciting, because it would be great, and it wouldn't be surprising, if in the coming months or years we're able to get architectures beyond the transformer that allow us to have the same level of attention capability that the transformer has, without the really heavy compute.
That's right. And while the results are really impressive, there are some limitations worth noting, and they do note them in the paper. Some components of this architecture, the sLSTM in particular, are not parallelizable computationally, due to some of the details of how they're implemented. They say they do have an efficient CUDA implementation with optimizations.
But still, when you're deploying these things at scale, when you're trying to optimize the cost of compute, something like this may not necessarily be the best. At the same time, contrary to transformers, this kind of architecture has constant memory complexity, as opposed to a dramatic increase in memory usage with respect to the size of the input.
And that is similar to things like Mamba, which just have these better scaling characteristics in terms of how much compute and memory you need as you try to input, you know, a crazy amount of text. So it remains to be seen whether this is like a new Mamba, where everyone starts working on it and goes crazy, but it's definitely exciting in terms of still seeing potential successors to transformers, at least in some contexts.
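(An aside for transcript readers: the rough memory arithmetic behind that point, under simplifying assumptions that ignore attention heads and numeric precision. A transformer's KV cache grows linearly with context length, while a recurrent architecture carries a fixed-size state no matter how much text it has consumed.)

```python
# Memory footprint vs. context length, counted in number of floats.

def attention_kv_floats(seq_len: int, n_layers: int, d_model: int) -> int:
    return 2 * seq_len * n_layers * d_model  # keys + values for every token

def recurrent_state_floats(n_layers: int, d_state: int) -> int:
    return n_layers * d_state                # constant in sequence length

for n in (1_000, 100_000):
    print(n, attention_kv_floats(n, 32, 4096), recurrent_state_floats(32, 4096))
# The KV cache grows 100x as the context grows 100x;
# the recurrent state stays the same size.
```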
Yeah, we're really trying to have our cake and eat it too, aren't we, Andre? With this kind of research, with this extended LSTM, the xLSTM, you're saying, hey, at a relatively small scale we're able to get way better performance than a transformer on memory. But one of the beautiful things about the transformer is that it does scale so well; that architecture was designed from the beginning to be parallelizable across many GPUs.
And so yeah, trying to have our cake and eat it too: not use too much memory, be highly parallelizable, and get the same kind of attention that we can get with transformers. There are so many labs trying to do this kind of thing, and it is so important in order to make AI broadly accessible and not super expensive, and not requiring more and more nuclear power plants to be set up to run AI data centers. So yeah, really cool.
And just one last thing I'll mention: they do say in the paper that, conceptually, the xLSTM is related to things like RetNet and RWKV, and various groups have been exploring the idea of making RNNs work at scale. This is an ongoing topic of research, so we shouldn't make it seem like this is the one and only attempt at this. It's been going on for a couple of years, and there have been some promising results already.
But this is exciting, in part because it's coming from the pedigree of the original inventors of the LSTM. On to the lightning round. The first paper is StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation. They present a semantic motion predictor that in particular helps with consistent video generation, and they also have examples of consistent multi-image generation. They present the idea of a comic book where you have a single character that stays consistent across the frames. And this is one of the challenges with image generation: if you say generate X, you will get a different outcome every time, and there have been many papers on how to make consistent outputs for, you know, creating comic books, creating illustrations, whatever.
And so this is presenting a new way to do that, and it has really nice-looking smooth videos and comic book panels. So again, you can go and check out the link for those example outputs. I wish I had the time to actually edit videos to add all these example images of what's happening here, but it's a pretty cool bit of research. Next up, Chain of Thought Empowers Transformers to Solve Inherently Serial
Problems. This is more of a theory paper that honestly would take me a while to understand, but the gist is that they show, kind of theoretically, that chain of thought, which is, as we've mentioned many times, prompting your language model to list out its reasoning steps before providing an answer to a given query, enables transformers to solve a certain category of problems that inherently require serial computation, so multiple steps of thinking. These are tasks the model just would not be able to get right if it were producing an immediate output of whatever answer you want, whereas with chain of thought it can get them right. So for people who enjoy reading dense theory papers, I'm sure this is an exciting one.
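For anyone who hasn't seen chain-of-thought prompting in practice, here is a minimal sketch of the contrast being analyzed; the question, the model choice, and the client usage are illustrative assumptions, not anything from the paper.

```python
from openai import OpenAI  # assumes the openai package and an API key are configured

question = "A train leaves at 2:15pm and arrives at 5:40pm. How long is the trip?"

# Direct prompting: ask for the answer immediately.
direct_prompt = f"Q: {question}\nA:"

# Chain-of-thought prompting: ask the model to write out its serial
# reasoning steps before committing to a final answer.
cot_prompt = (
    f"Q: {question}\n"
    "Let's think step by step, writing out each intermediate "
    "calculation before giving the final answer.\nA:"
)

client = OpenAI()
for prompt in (direct_prompt, cot_prompt):
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    print(resp.choices[0].message.content, "\n---")
```

The paper's claim, loosely, is that the extra generated tokens in the second prompt act as intermediate computation steps, which is what lets a fixed-depth transformer handle inherently serial problems.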
An analogy that I sometimes make, and there are probably theoreticians out there who would say there are all kinds of mistakes in what I'm about to say, is that I think of chain-of-thought prompting as an improvement on the kind of fast thinking that humans do. There's a famous book by the psychologist Daniel Kahneman, who won the Nobel Prize in economics and actually recently passed away, called Thinking, Fast and Slow.
And this is based on decades of experiments that he and Amos Tversky did. So there are these two thinking systems. There's System 1, which is fast thinking, which is what we're using all the time. I'm using it right now. Words are just coming out of my mouth, and I'm kind of just hoping that the next word that comes out is useful to
some listeners out there. Whereas the slow-thinking System 2, this is where you're sitting with pencil and paper, making sure that you've got all of your assumptions right as you work through a mathematical problem or write some computer code. To make this analogy easy to understand: when you, say, learn how to drive a car, you start off with that slow System 2 thinking, where you're like, okay, foot on the brake, turning the key in the ignition, now I'm going into reverse and pressing the accelerator. You're having to do all this very deliberate thinking. Whereas once you have learned how to drive a car, you're listening to the Last Week in AI podcast as you're driving, and you're having a conversation with a friend and drinking your coffee all at the same time. Your own brain becomes adept at shifting things from the slow processing to the fast processing. Chain-of-thought prompting, to me, is like making the most of that fast thinking. You're prompting the generative AI model, which is just predicting the next token, like you are when you're speaking in real time or thinking in language in your mind in real time. And so chain-of-thought prompting is like making the most of the fast kind of thinking that humans do.
What I'm excited about, in combination with this kind of chain-of-thought prompting and generative AI, is approaches like AlphaGeometry, as well as Q*, which are specifically designed to be much more like the deliberative thought that you have when you're doing math, when you're learning how to drive a car, or when you're programming a computer.
And so that kind of slow thinking, in combination with fast thinking, including chain-of-thought fast thinking, could be a really interesting combination on the way to realizing AGI.
And just one more paper, also a bit more of a theory paper but with a more applicable angle, and one that generated a lot of discussion. It is KAN: Kolmogorov-Arnold Networks. And again, we'll go a bit quick because it is sort of dense. The gist of it is they present a whole new paradigm for how you build your neural nets. Typically, the idea of neural nets is to have, at the most basic level, multilayer perceptrons, where you have weights on the edges between neural network units, and the units apply some fixed function to a combination of their inputs. This paper presents an alternative that basically turns that paradigm on its head: you have learnable functions on the edges, and the nodes just sum up the outputs of those edge functions. And they go into a whole lot of detail on, conceptually, why this has some advantages in terms of interpretability and the potential expressivity of this whole new paradigm for how you build your neural nets.
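Here is a toy sketch of that functions-on-edges idea. It is not the paper's implementation: the paper parameterizes each edge function as a B-spline, while this stand-in uses a small sum of Gaussian bumps, so every dimension and design choice below is an assumption for illustration.

```python
import torch
import torch.nn as nn

# Minimal KAN-flavored layer: a learnable univariate function phi_ij on
# every edge, with output nodes that simply sum the edge outputs.
class ToyKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_basis: int = 8):
        super().__init__()
        # Fixed basis-function centers spread over the expected input range
        # (Gaussian bumps stand in for the paper's B-spline basis).
        self.register_buffer("centers", torch.linspace(-2, 2, n_basis))
        # One learnable coefficient vector per edge (in_dim * out_dim edges).
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, n_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> basis activations: (batch, in_dim, n_basis)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # Evaluate phi_ij(x_i) on every edge, then sum over inputs and
        # basis functions at each output node.
        return torch.einsum("bif,oif->bo", basis, self.coef)

layer = ToyKANLayer(in_dim=3, out_dim=2)
print(layer(torch.randn(4, 3)).shape)  # torch.Size([4, 2])
```

Note the inversion relative to a standard linear layer: instead of a scalar weight per edge followed by one shared nonlinearity per node, each edge carries its own learned nonlinearity and the node is just a sum.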
Again, this is more theoretical, in the sense that people probably will not adopt it at scale, I suspect, but it's interesting if you like these sorts of very conceptual papers. And on to the policy and safety section,
¶ Policy & Safety
with the first story being: US lawmakers unveil a bill to make it easier to restrict exports of AI models. We've been talking a lot about export controls for GPUs and hardware to China in particular; that has been happening in the US, and now a bipartisan group of lawmakers has unveiled a bill that would actually make it easier for the Biden administration to impose export controls on AI models, essentially making it harder for the software to be shared.
If approved, the measure would remove some roadblocks to regulating the export of open source AI; apparently there's already some precedent for that ability from the White House, and there are some more details in the story. Interestingly, many in China have apparently been building on top of Llama in particular as the starting point for their language models, so evidently that would be why US lawmakers are seeking to do this.
So, yeah, very interesting conceptually to think about actually having export controls for open source models.
It seems like one of those stories that Jeremy would be really great at going into further analysis on; he really knows what's going on on Capitol Hill. The one thing that I would say here is that it's too bad that we live in a world where geopolitical conflict is real.
I mean, I'm not pretending that it doesn't exist, but it's a shame, because it seems like we're so close to realizing so many of these emerging technologies, like AI, nuclear fusion, quantum computing, and having them bring abundance to everyone on the planet.
But these kinds of things, which, I totally get it, make perfect sense in the real world that we live in with geopolitical conflict, it's just a shame that we can't have everyone getting along and collaborating on open source and on hardware, instead of having these competing groups, with one region feeling like it needs to catch up. And it's interesting because it seems like a mostly futile exercise.
Anyway. So, you know, the US restricts access to some hardware or to Llama architectures for, say, China, and China then invests a whole bunch of redundant capital in recapitulating the same kinds of advances separately, and eventually, it seems, they'll catch up. So it's like, I don't know, a lot of wasted potential, a lot of wasted capital, and a slowing down of the capacity for everyone across the planet to be living a higher quality of life.
But yeah, I mean, I'll take my rose-tinted goggles off.
Yeah. And I think it's worth noting, you know, there's been a lot of framing of this as an AI race between China and the US in particular over the years, and that's not an entirely fair way to pose things. There's a lot of academic collaboration between these countries. You know, there's a ton of research coming out of China that people in the US read and build upon, and vice versa.
So this notion that these are enemies isn't the full picture, even as there is very much a real situation where US law is seeking to basically combat China's ability to make advancements. And with this kind of law, I think it would be surprising if it could actually be effective at blocking open source models from being used in China. But it does speak to making big moves, basically continuing to add pressure, as we've also seen with hardware export
controls. Next up, OpenAI's Model Spec outlines some basic rules for AI. So OpenAI has released the first draft of a proposed framework, which it calls the Model Spec, to propose basically how AI tools should respond. It's a bit like an ethical framework, almost, where it proposes three general principles: AI models should assist the developer and end user, should benefit humanity, and, in the case of OpenAI, should reflect well on OpenAI.
There are also specific rules here, such as following the chain of command, complying with laws, respecting creators' rights, protecting privacy, and not generating not-safe-for-work content. And it has been positioned, at least in what I've seen online, as trying to make very clear what principles are guiding OpenAI's development, which was maybe a bit opaque before.
The way I've seen it positioned is that if ChatGPT doesn't reply the way you'd expect, with this Model Spec they can say, okay, this was a mistake and the model should have behaved differently, or, this is intended behavior according to our Model Spec. And yeah, given the impact of OpenAI and just how influential they are now, it's very interesting to see this released and to have more clarity on their internal development processes.
And probably to some extent, at least, things like reflecting well on OpenAI follow from the drama of last year, so it's not surprising to see that in there. Does this remind you in any way, Andre, at least in some kind of vague analogy, of Anthropic's constitution?
Very much so. I think the constitution makes very clear what you want your model to do, in a sense, while this is maybe broader in scope; it's kind of half technical, half PR, almost. I think it probably is at least partially motivated by the big PR disasters at Google in recent months, where the model started exhibiting widely criticized behavior around generating different races in historical contexts. So this Model Spec is kind of clarifying exactly how they want their models to behave, and why they might refuse to do certain things but do other things. On to the lightning round, with the first story: robot dogs armed with AI-targeting rifles undergo US Marine Special Ops evaluation. So the robot dogs are literally the sort of things you may have seen already from Boston Dynamics, though these ones were developed by Ghost Robotics specifically.
And they are literally equipped with a gun mounted on top, from defense tech company Onyx Industries. This is small-scale testing at this point; there are two of these things that they are experimenting with.
The organization did clarify that weaponizing the dogs is just one of the many use cases being evaluated, and that it apparently adheres to all Department of Defense policies concerning autonomous weapons, presumably meaning that you do want to have a human in the loop and not have it be fully autonomous. But regardless, I think it's pretty notable, in that we still don't have fully autonomous systems.
We have some examples of drones being semi-autonomous, but we are still not in an age where robots are being deployed on battlefields and doing a lot of the work of a traditional soldier. And if these sorts of systems, dogs with guns, undergo evaluation and development, within several years it is not out of the question that they will be a large factor on battlefields, in warfare. And that is something that maybe we are forgetting in this moment.
You know, there's a lot of concern about disinformation and misinformation, but it's just a matter of time until we have kind of scary robot soldiers.
Yeah. I think 2023 was the year when there were supposedly some cases in the Russia-Ukraine conflict of drones working behind enemy lines. There's this warfare and counter-warfare around drones: obviously the first step is you have drones, you're sending them over enemy lines, you're monitoring them, and there are loitering munitions that attack them.
And so your opponent then says, okay, let's create all kinds of jamming to prevent the remote operator from operating this drone effectively. But then that same drone operator says, okay, if I lose the ability to control my drone in enemy territory, what should it do? And 2023 was supposedly the first year, in the Russia-Ukraine conflict, the first place anywhere in the world that we're aware of, where drones started to be able to act on their own after losing connection to their controller. So the jamming is effective; it prevents you from being able to see what your drone is doing. But still, the drone cost something, it has munitions on it, and it's behind enemy lines. And so all these kinds of tools that we're developing, for the most part to try to create a better world, Python code, machine vision libraries, PyTorch, would be being used to have machine vision systems work behind enemy lines
once you've lost control of that drone. And hopefully that open-source-based machine vision system is accurately detecting the enemy, as opposed to just, you know, someone tending to their turnips or something. And yeah, it's obviously a minefield, and it seems inevitable that we're going to see more of this, which is a scary thing.
And it's not inconceivable that some years from now, most of the fighting could be done by autonomous vehicles, as opposed to by humans making decisions. And yeah, obviously lots of concerning things there.
Yes. And as we've covered in the past, there have been efforts at the UN to establish some sort of policies with regard to autonomous weapons that have gone nowhere; they have been blocked by countries like the US. Now, these systems are definitely still in early development phases, so they're not likely to be out there within the next year or two. But we haven't seen too much news on this front, and this is an example that it is actually happening. And another story on OpenAI:
they are releasing a deepfake detector to disinformation researchers. So this is a tool being released to a small group of disinformation researchers, meant to detect DALL-E 3 outputs, to distinguish between real images and synthetic images generated by their tool. OpenAI says that it will have 98.8% accuracy, although it will not be able to detect images produced by other generators like Midjourney or Stability. So good news: you'll be able to see which images are from DALL-E or not.
But this whole area still seems kind of unsolved.
Yeah, it's a relief to hear that, and I guess it's unsurprising. You have a lot of experience in machine vision, Andre, so you may be able to speak to how you end up having signatures in generated images or generated video that allow them to be identified as coming from a particular generator. And I'm grateful that we're able to do that with images, video, maybe audio, because since the release of GPT-4 it has proved extremely difficult to identify generated text.
But people have been able to generate fake text for all of history, and so we're kind of prepared to take with a grain of salt things that people say or write. Whereas with images or video, up until the last year or so, we had the saying "seeing is believing," and in the last year that statement is basically no longer true.
And so it's great that we are able to detect this, and it's interesting that it's kind of fundamentally different because of these signatures we can detect. I don't know if you have more technical knowledge on that, Andre.
Yeah, there's been research coming out for a while now on the ability to embed invisible watermarks, basically signals that humans would not be able to perceive, even in the pixel space of the image, but that can be used to verify whether an image came from a specific model or not. And I'm sure that this tool relies on that.
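As a rough intuition for how a pixel-space watermark can be invisible yet detectable, here is a toy sketch. Real schemes are far more robust, hiding spread-spectrum signals that survive compression and edits, so everything below is a simplified assumption and not how any production detector actually works.

```python
import numpy as np

# Toy least-significant-bit watermark: write a secret bit pattern into
# the low bits of the pixels (imperceptible), then test for it later.
rng = np.random.default_rng(42)
key_bits = rng.integers(0, 2, size=1024, dtype=np.uint8)  # the secret pattern

def embed(img: np.ndarray) -> np.ndarray:
    flat = img.reshape(-1).copy()
    # Clear each pixel's lowest bit, then write in one key bit.
    flat[: key_bits.size] = (flat[: key_bits.size] & 0xFE) | key_bits
    return flat.reshape(img.shape)

def detect(img: np.ndarray) -> float:
    flat = img.reshape(-1)
    # Fraction of low bits matching the key: ~1.0 if watermarked,
    # ~0.5 (chance level) on unrelated images.
    return ((flat[: key_bits.size] & 1) == key_bits).mean()

img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
print(detect(embed(img)), detect(img))  # ~1.0 vs ~0.5
```

This also illustrates the weakness discussed next: a trivial edit like re-encoding or adding noise destroys low-bit patterns, which is why real watermarking research focuses on signals that survive such transformations.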
And as we've been covering, there have been more and more announcements of things like Meta saying it will add watermarks to its generations, partially based on some pressure from the executive branch of the US government and some of the policies instituted there. So there's definitely some progress being made. But it's still very hard: when you release an open source model like Stable Diffusion, it is easy for bad actors to mess with it and make it so you can't detect the images.
So it's not really a fully solvable problem, but at least for things like DALL-E 3 you can do these sorts of techniques. On to our section on synthetic media
¶ Synthetic Media & Art
and art. And the first story is: Audible's AI-voiced audiobooks top 40,000 titles. So last year, Amazon announced that self-published authors in the US who make their books available in the Kindle store would be able to use a new tool, in beta, to release audiobooks on Audible with AI-generated narration.
And as per this news article, that has now resulted in 40,000 titles being released on Audible, which, as you would expect, has led to mixed responses: some Audible users are unhappy about this onslaught of audiobooks, while independent and self-published authors, who basically would not be able to afford to hire a voice actor for a professional narration, benefit from this and are happy that they can use the tool. The article is pretty detailed.
It goes into some conversations with actual voice actors, and it doesn't seem like jobs are being impacted quite yet, but of course you can see the concern that it might impact that industry soon enough. I wonder, John, I know you wrote a book, Deep Learning Illustrated, and I don't know if it makes sense to do an audiobook for it.
Probably not, since you do have illustrations in that book, but I could also imagine, as you write things, perhaps a new book, whether you would consider doing this sort of thing, or, with your podcasting experience, just doing your own narration.
Yeah, it's interesting. I have thought about an audio format for a book like Deep Learning Illustrated, and my guess is that it doesn't really work, because not only does Deep Learning Illustrated have lots of illustrations, which, okay, you could maybe narrate and try to make make sense, though you're definitely stretching the visual emphasis of the pedagogical approach
that I took there. But even worse than the images is that, while I tried to minimize equations, there still are quite a few; there's got to be 100 or more equations in the book. Those would be very difficult to visualize if you're just listening to somebody reading them out. But even worse than the equations is that the book is full of Python code.
So that wouldn't make sense.
Yeah, I mean, that would be some really, really dry listening, where you're like, okay, "import DataFrame," you know. It would be pretty difficult to listen to. And at least the way that I did it, the code is really integrated right in there with the text and with the visualizations. So it's hard to imagine a book like that working well in an audio-only format.
Somewhat related to this idea in general: one of my best friends, Zach Weinberg, was a guest on Super Data Science in episode number 646. I brought him on that episode to be like a layperson giving their experience of how ChatGPT has impacted them professionally. And he said to me,
you're so lucky, in a way, to have written a book before the generative AI era, because now, anytime anyone writes a book, there's going to be some skepticism as to how much you actually did yourself. And that'll happen more and more. So I guess, yeah, I'm lucky to have squeezed one in right at the end of the human-writing era.
Next story: TikTok will automatically label AI-generated content created on platforms like DALL-E 3. So yes, this is related to that detection story, although in this case they will be using a technology called Content Credentials, which is being developed by the Coalition for Content Provenance and Authenticity, co-founded by Microsoft and Adobe. In this case, the images will carry some metadata, so you don't need to analyze the image to guess whether it is AI-generated or not.
Rather, these platforms will just include information saying the content was generated on the given platform, and now TikTok will be building on top of that standard to make it clear whether images are generated or not.
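As a loose illustration of provenance-via-metadata, here is a minimal sketch. Note that real Content Credentials are a cryptographically signed C2PA manifest read with a C2PA-aware library, not the plain PNG text field used below, and the "provenance" key here is entirely hypothetical.

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

# Attach a provenance note as an ordinary PNG text chunk (simplified
# stand-in for a signed C2PA manifest).
meta = PngInfo()
meta.add_text("provenance", "generated-by: example-image-model")  # hypothetical field

img = Image.new("RGB", (8, 8))
img.save("labeled.png", pnginfo=meta)

# A platform could read this back and key its "AI-generated" label off it.
reloaded = Image.open("labeled.png")
print(reloaded.text.get("provenance"))
```

The important design difference from watermarking is that this is cooperative labeling: unsigned metadata like this can be trivially stripped, which is exactly why C2PA adds cryptographic signing on top.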
And speaking of that, a couple of maybe slightly more fun stories related to AI-generated images. The first one is that a Katy Perry fan-made AI image was so real it fooled the world into thinking she was at the Met Gala. So yet another example, and we've covered some of these before, of things going viral that are in fact AI-generated but that many people thought were real. In this case it was images of Katy Perry dressed in a very fancy gown at the Met Gala. There were some fun outcomes of this particular one: apparently Katy Perry's mom even texted her and said, wow, you were at the Met Gala, this is a gorgeous dress.
And Katy Perry then responded saying, that's AI-generated, beware. This article goes into some details on how a fan of Katy Perry, who runs a fan Instagram account, made the image, and how it actually took some work: they used a Microsoft tool, they used a second tool to then post-process it, things like that. But yeah, basically, don't believe what you see is the lesson we're getting.
That seeing is not believing. I've got to say, obviously we can't show you these dresses in this audio-only podcast, but it does look spectacular, and it does look pretty compelling. I don't think I would be skeptical if I saw those images.
And I would have to do some very close analysis to notice some of those details, as is the case in general. Maybe if you look at the background carefully, you could detect some hints of AI, right?
Yeah. Actually, at least in one of the ones that we can see here, there's a whole slew of photographers who are appropriately dressed, and they all seem to have five fingers on their hands.
That's right. As far as we can tell.
Yeah, yeah.
And speaking of deepfakes, the next story is: South Korean woman falls for deepfake Elon Musk and loses $50,000 in a romance scam. It goes into some details on how this person was contacted on Instagram by someone posing as Elon Musk, and of course this person didn't immediately believe the story.
But there were a lot of interactions, where first this supposed Musk told some stories, and eventually they even had a video call with a deepfake of Musk, convincing this person to then give up some money in the scam. We've covered some stories like this in the past, and I keep saying this, but this is something you need to be aware of: scamming is going to be a big thing with AI. Be careful, people.
Yeah, it's something that not only you as an individual need to be aware of. I, for one, have several times, and I hope this message has really gotten through to them, told all of my loved ones that if they ever get a phone call from me asking for money or anything, they should call me back, or make sure it really is my number. Because, you know, you and me are in the same boat, Andre: there are many orders of magnitude more samples of our audio available publicly on the internet than would be needed to simulate our voices effectively, and it would be very easy to scam with our voices. And yeah, definitely scary things can happen.
Yeah, all of us have got to watch out now. And related to this particular story, one last one that John was aware of; I hadn't actually seen this. The story is: why young Russian women appear so eager to marry Chinese men. It's a similar kind of story, where there are deepfakes of Russian women who apparently speak Chinese fluently and then, similarly, essentially run a romance scam to get Chinese men interested.
Well, I think this is even more than a romance scam; it's supposedly a government-backed propaganda thing. It's not so much about individually targeting someone and getting money from them, but about using real Russian-looking people. So, for example, there's a real woman here, Olga Loiek, I'm probably mispronouncing her surname, a Ukrainian woman studying in America.
And her likeness was used in generative AI tools, where she's speaking perfect Mandarin with, comically, the Kremlin in the background. And she found dozens of accounts using her face, where in these videos she's glorifying China and complaining that Russian men are drunk and lazy while praising Chinese society and technology.
These women, with names like Natasha and Sophia, in their fluent Mandarin, are saying things like: for a Chinese husband, we'd be delighted to cook, wash your clothes, and bear your children. So it seems like a propaganda thing more so than a targeted scam. But nevertheless, there's something about this story; it's like the confluence of so many different things: geopolitical issues, romance, generative AI.
Anyway, it makes maybe kind of a fun story to end today with.
I think so, yeah. And with that we are done. And thank you, John, for co-hosting while Jeremy takes a well-deserved break.
Yeah, my great pleasure, Andre. As I said at the outset, as a rabid listener of this podcast, it's surreal to be able to come on and yeah, always have a lot of fun, and I look forward to listening to more episodes in the future.
And as always, we are very appreciative of our listeners and fans, so thank you for listening. We do have that newsletter at lastweekin.ai, and as always, if you are a big fan, it would help if you gave us a review, shared it with your friends, that sort of thing. But more than anything, we hope you enjoy the podcast and keep listening.
¶ AI Outro Song
[AI-generated outro song plays]