In a world of ones and zeros painted bright, Adobe dreams take flight. With Tesla's future rolling down the streets, Cybercab visions what tomorrow means. All the AI minds that shape in our skies, Nobels for the visionaries, no surprise. Stories of a week where tech inspires, Last week in AI.
Hello and welcome to the Last Week in AI podcast where you can hear us chat about what's going on with AI. As usual in this episode, we will be summarizing and discussing some of last week's most interesting AI news. And as always, you can also check out our Last Week in AI newsletter at lastweekin.ai where we have weekly emails with even more articles and even more news and also everything related to this podcast with links and so on. I am one of your regular hosts, Andrey Kurenkov.
For some background, I studied AI at Stanford and now I work at a startup. And once again, Jeremy is still out on paternity leave for a little while longer. So we do have a guest host once again, and once again, Jon Krohn is filling in that role. I'm your regular,
irregular co-host. Yes. Yeah. It's great to be back on. Thank you so much for inviting me again, Andrey. And we hear Jeremy is doing well. Everyone's healthy. Baby's healthy. Mother's healthy. That's great.
That's right. It's, yeah, going well. And, uh, from what he's told me, uh, he will be back relatively soon. Hopefully either in the next episode or the episode after that, we will be getting Jeremy back, which I'm sure all our regular listeners, uh, will be excited about, as
am I. Yeah, I mean, it's an invaluable income stream to be co-hosting this show. Every episode that I'm in here, I'm like a million dollars richer. And Jeremy's missing out on that. So he's going to get back in. Yeah, you get paid in fun and news awareness. That's what you get. You do get lots of news awareness. I do like that about it. I guess in case people haven't been listening, uh, to episodes that I've been in before, I will try to keep this short this time.
I am co-founder and chief data scientist at an AI company called Nebula. I wrote a bestselling book called Deep Learning Illustrated. I host what I'm pretty sure is the world's most listened to data science podcast, called Super Data Science. Andrey and Jeremy have both been guests on that show. And yeah, so I mentioned a little bit last time that I was on that there's TV stuff developing, and that has continued.
There's like two separate TV projects that are developing that I'm really excited about and I can't wait till I can share some things publicly on those.
Wow, that's really exciting. I'll be keeping an eye out. And like, we did promote the podcast for quite a while on this show, so I would hope that a lot of listeners of Last Week in AI are also fans of Super Data Science. There's at
least one, because I got an Apple Podcasts review maybe two months ago, although it wasn't necessarily related to us having sponsored your show, because they said that they heard me on this show, came over, and now they love Super Data Science as well. So while you're doing your, uh, Apple Podcasts updates, I'll have a look at mine to see what this person's name was and what they said. So I know that there's one.
Before we get into the whole set of news, I do want to try something a bit new, uh, just a quick news preview to let people know everything that's coming up since we do have like two hour long
episodes. That is so smart. Something that I've got to say as a listener is that Last Week in AI is the only podcast that I listen to. And so, man, this is on my wishlist. This is exciting.
Gotta listen to your users. So, uh, to give that preview, uh, we will be a bit lighter on news this week. It wasn't quite as much heavy-hitting, uh, topics. We're going to have a lot of news regarding Adobe. They just had an event where they covered a lot of new tools, uh, coming to their suite of creative tooling. We'll be probably talking about Tesla for a decent while. As far as business goes, they had their big We, Robot event with a lot of cool stuff. Maybe not.
Uh, well, we'll talk about it. Uh, then there are some exciting open source projects that will be, uh, maybe a pretty big topic. We'll be talking about the Nobel Prize awards, of course, which happened last week and had quite a lot in AI. And we'll be talking a little bit about policy and safety. There's not quite as much going on, probably talking, uh, quite a bit about Anthropic and what they posted. So there you go.
It's going to be kind of a mixed week, nothing too huge, but, uh, definitely some interesting things going on. You should pause on all safety
and policy stories till Jeremy comes back.
Then just catch up. Yeah, yeah, yeah. Uh, and then before diving in, as usual, a quick shout out to any listener comments and corrections. I did not, uh, happen to catch much on that front this past week. So, uh, in lieu of having those kinds of comments, I did just want to say, on Apple Podcasts, I took a look, we do have 226 ratings now, which is, I don't know, uh, it used to be like 200 a couple weeks ago, a couple months ago. That's right. Yeah. So I imagine, uh, wow.
Yeah. Hopefully, it seems like some listeners are leaving ratings, uh, just star ratings without reviewing in text. And thank you as well. Uh, that does presumably help in an algorithm. I don't know. Feels nice. Our average went up by 0.1, you know, 4.7 out of 5. So that's
really annoying. I'm going to have to go in and give a one star 'cause we're still at 4.6 at Super Data Science. So you guys going up to 4.7, it's completely unacceptable to me. Well,
I'm sure if you got more reviews, you know, you'd revert to the mean and so on. Right.
So, um, and I did find it here. So on August 23rd, someone, I don't know if, do you know this person, D321P? Is that a friend of yours? You know, uh, I probably would forget their name. So D321P wrote, uh, in an Apple Podcasts review for Super Data Science: I'm a loyal Last Week in AI listener and heard you as a guest on their show and figured I would hear your episodes, and I'm glad I did. Great. That's nice to hear. Thanks, D321P. It really rolls off the tongue.
It really does. Okay, let's get into the news. We start with tools and apps, and as we promised, the big topic here will be Adobe. They have had their Adobe MAX 2024 creativity conference, and as usual at these sorts of events, we got a whole bunch of news regarding new stuff, and a lot of it was AI. And the big news was their AI video model. So they've had their image generation models for quite a while under the Firefly umbrella, and now they have the AI video model.
They've had it in beta for a decent amount of time, uh, just for people to use, uh, I believe in a website form, they kind of previewed it. Now it is available in beta in Premiere Pro, which is their program for editing videos, one of the kind of leading products on that front. And you can extend footage by up to two seconds or make mid-shot adjustments. And that's just one of the tools here. So they have that, uh, Generative Extend.
They also have a text to video, which is only available as a limited public beta on the web app. Uh, this would allow you to use, uh, you know, the usual kind of thing, like Sora, like we covered last week with MovieGen. It is able to produce five second clips with, uh, not quite HD resolution, but pretty good looking quality. Uh, just looking at the examples they put out.
It is up there in terms of the footage not being obviously AI for the most part, and they do have also image to video where you have a reference image along with a text prompt for more control. And they say, uh, one of the things you could do is camera control, like saying this should pan left or right. Last note. As with all of the Firefly tools, they do say that this is, uh, commercially safe, meaning that they didn't use copyrighted data to train.
And so you should be fine to actually use it to produce commercial video, I suppose. So I know that you're an Adobe fan. Have you used this, Andrey? I have not, uh, because I don't do much aside from editing podcasts and, uh, generative expand, uh, so far it's not very useful. Maybe I'll add some special effects, uh, which would be interesting. They do showcase the ability to generate, uh, kind of nice overlays of special effects or whatever you want to call it.
Uh, I did play around in Photoshop with their tools that are AI-based. They really have some very nice things, particularly when it comes to cropping, like the ability to crop an individual object is just totally different. It used to be a time-consuming process, you had to like go all around the borders of an object. Now AI will just do that for you. And it's great. Similarly, also the, uh, infill of images to reduce noise, just a lot of nice stuff that is directly integrated.
And I would imagine that for video editors, people creating more kind of creative videos, this could be useful for B-roll and other things like that.
Yeah, it's super cool. I watched the video of this AI video model. Of course, they're going to be cherry picking the ones that don't look AI generated. You'd have to. How could you not do that? But simultaneously, like you say, some of these features are really cool. Like being able to have an existing video clip.
I mean, you could take this footage that we have right now, the YouTube version of this podcast, and you could put like, you know, flames behind me or, you know, as we talk about Firefly, fireflies, I guess, around us or whatever. Mm-hmm. And so that's some pretty cool functionality, being able to change the camera angle. And I haven't seen that. I don't think I've seen that in the other generative video tools yet.
Yeah, it could be the case maybe that you could try it with MovieGen, since they do support some fancier, like, video editing, although I think that was primarily for changing the actual content of the clip, not the camera. So that could be something unique to this, and it would make sense, because this would be pretty much, uh, you know, meant for actual usage by people creating this sort of stuff. So it might be different from Sora and MovieGen, for instance.
Moving right along, we get a couple more stories related to Adobe, and this is about upcoming things that are more in preview stage. So they previewed a couple experimental AI tools. One of them is Project Scenic, which will generate 3D scenes based on user input. And then use that as a reference to generate a 2D image, something that I find pretty interesting. So the idea being that you do have essentially sort of a set of objects, you know, house, tent, something like that.
And you lay out a scene, and then you can use that as a reference to generate a 2D image with your typical AI model that will, you know, kickstart that process with that pretty structured 3D scene, so not something I've seen, and something that is interesting. They also did preview Project Motion, which is for creating animated graphics in various styles.
And Project Clean Machine, which is an editing tool that automatically removes distractions in images and videos, like camera flashes or people walking into frame. So there you go. They've been announced at the same conference, as they call them, sneaks, in-development projects. So we'll probably not be seeing these integrated for a while.
Maybe they won't even come out, but, uh, you know, as people who track AI research, uh, some of these things are pretty interesting and not necessarily something we do see in other tools or in research.
And I was saying before we started recording that thanks to this podcast, I do get a good sense of what's going on with Adobe innovations, and I'm really glad that I get that, because, uh, for whatever reason, whenever OpenAI releases something, everyone hears about it. Anthropic, same kind of situation. It doesn't seem to get as much reach when Adobe makes these announcements, but these are big deals. Each one of these kinds of capabilities sounds super useful.
And the key thing that Adobe is getting right here is that, assuming that these capabilities work well, having them in the flow of somebody using an Adobe product without having to go off and use some other tool, having everything integrated like that, that in my view is the key to success with generative AI.
Exactly. Yeah. We were talking, I believe just in the last episode, about MovieGen, that, uh, you know, MovieGen is cool, just like Sora, but on the other hand, it's not released, and it's maybe not even practical to use because of the amount of compute, the time it would take, which was also the case with Sora, taking way too long to actually use in practice.
And also, it likely isn't actually targeting a major pain point for many people. You know, generating a video from text is cool, but, you know, aside from B-roll, it's not necessarily the case that it would be useful. They do have video editing as a major feature, where you can revise aspects of a video. That's definitely one of the more practical things that would be useful.
And these sorts of things, like, uh, you know, this Project Motion, which is for creating animated graphics, Project Clean Machine, they're definitely less sexy, right, but they're presumably much more, uh, useful in those kinds of workflows. And one last story about Adobe: they have another experimental prototype called Project Super Sonic. And this one is meant to generate audio effects for video projects.
So, this would use text-to-audio, object recognition, and voice imitation to create background audio and audio effects. Side effects, uh, sound effects, sorry. Uh, so this is something we've seen from other companies, like Eleven Labs with, uh, audio effects being these short snippets, right, of like, I don't know, a ball hitting the ground, and things like that that you obviously need in videos to make them really be realistic, and which is
you know, decently less hard than generating full on music. So, uh, yeah, this is, I would say, definitely something that seems like it will be coming to, uh, the Adobe Creative Suite, although currently it is just a demo.
As you described, having this as part of the Creative Suite, again, in the flow, generative AI, and addressing a pain point, like you said, that's another really important thing to hammer home here: if you're not addressing a pain point, other than being cool, how is that going to be a big commercial success? So yeah, uh, lots of applause for Adobe. Keep it up.
Onto the lightning round, we are moving away from Adobe, but we are keeping on these kinds of tools. The first story is about YouTube expanding an AI audio generation tool to all U.S. creators. So they've had this Dream Track tool that was first introduced in 2023. It would allow users to generate short instrumental audio clips based on text prompts. Uh, so it would basically create sort of royalty-free soundtracks about 30 seconds long.
And as per the title, now it is being released, uh, to all U.S. creators. So that would presumably help a lot of people making short clips on YouTube. YouTube does have their Shorts thing, which is similar to TikTok and Instagram Reels, and you can see how having something that's royalty-free could often be something useful. So, uh, very much in keeping with YouTube also releasing tools like text-to-video, also on YouTube.
It seems like we're almost just saying the same story again. It's exactly the same story as with Adobe. These are in-the-flow tools that are useful for people. Uh, exactly. Like being able to generate shorts easily, clips easily, uh, instrumental audio, and now, yeah, this Dream Track. Yeah, very cool. I'm sure it will do well. And it's great that they're offering it to all the creators.
Exactly. And, uh, I think it's interesting to have these stories now, because, you know, it was just at the beginning of this year that we got Sora, right? Last year, uh, text-to-video was very much nascent. Sora was very mind-blowing at the time for the quality of video it could generate. And, uh, now, you know, almost nearing the end of the year, we are getting to a point where these things are actually being deployed and commercialized as AI video tools.
So that tells you something about the, you know, the speed at which AI is being developed and rolled out by different companies. I guess
a key distinction here. My assumption, I don't see it explicitly, but I believe all of these tools being offered to U.S. creators by YouTube are free to use. Yes. So that is a distinction relative to Adobe. And so the assumption here is that Alphabet downstream will get more revenue from having slicker
videos and creating an easy ecosystem that creators want to be in, using and publishing videos in, uh, to continue to maintain their position as, if I'm still correct on the stat, the number two most visited site in the world after Google.
Yeah, YouTube is massive for people who don't go there very much. In terms of time usage, you know, it is above Instagram, above TikTok, above many of these things that you already think of as places where people spend time. And my impression is YouTube Shorts hasn't been quite as successful as those other things. So that could be also a reason for Google to be pushing this, to try and push it along. And speaking of Google and rolling out tools to more users, we've got a similar story.
Now all Gemini users can generate images with Imagen 3. So we have had Imagen 3 for quite a while. It was pretty limited, and now you can use it in Gemini. Similar to, uh, ChatGPT, you can generate images in Gemini by just saying prompts like, draw, generate, create X image. Uh, interestingly, these generated images do come with a SynthID watermark. So, if you try to detect whether an image came from Gemini, you can actually authenticate it as AI-generated or not.
This is not currently available for free users. So this is being made available to people who are paying for Gemini Advanced, Business, and Enterprise.
It's interesting, this word, which you very confidently pronounced as "imagine." I'm always like, I'm never sure. And I guess I could watch Google I/O 2024 to get how they pronounce it. But it's interesting, because for people who are listening to this podcast, if you want to look it up, you don't type "imagine" into your Google search, you type Imagen, uh, I-M-A-G-E-N, which is a clever name.
'Cause it's like "imagine," 'cause you're creating something and you're imagining something, while it is also image generation. So, "image-gen"?
I think "image-gen," yeah, that probably makes more sense. But I guess it doesn't roll off the tongue quite as easily as "imagine." So, uh, and just a quick correction, I may have misspoken. The feature to generate images with people is not currently available for free users. Of course, Gemini had quite an embarrassment with regard to that when it first rolled out image generation. So they have reintroduced it, but they're still kind of limiting it to people who are paying.
Moving right along, and we are keeping with the theme of AI being rolled out more widely. Next up is Meta AI, and they will be launching in six more countries today, countries including Brazil, Bolivia, Guatemala, the Philippines, and the UK. And they have said they also plan to launch in many more countries in the coming weeks, countries like Algeria, Egypt, Indonesia, Morocco, and so on.
So they'll be, of course, adding support for new languages, and these Meta AI tools are, you know, all these things they've introduced throughout Facebook, Instagram, WhatsApp, Messenger: chatbots and so on. It is not going to include the European Union, due to regulatory concerns. And we've seen that with a couple of things, not just Meta AI. So, uh, people in Europe, uh, you know, not getting Meta AI. Not sure they're going to be upset about it. Personally,
I've not used these things and, uh, you know, haven't benefited too much. Either way, Meta AI is being rolled out more widely. I wonder if you could just like
change your location and get access. It seems like a pretty easy thing to do. 'Cause I don't think for most of these things, I guess, like, with WhatsApp you provide your phone number, so that has a country code, but for Facebook, Instagram, presumably you could just change where your country is, I guess, and get access to these tools. If not, then great work, UK. You finally got something out of Brexit. Your economy has struggled for years since leaving the EU, but you get Meta AI. Yes.
Yeah. The ability to regulate less harshly, the UK doesn't have the new AI Act, of course, because of Brexit, uh, does lead to more business-friendly results like these, for sure. And the last story here, finally, we are doing something a bit different. No more AI being rolled out to more people. And this one is, uh, a bit intriguing if you are more of a, let's say, nerd when it comes to AI. So the headline is, OpenAI unveils secret meta prompt, and it's very different from Anthropic's approach.
So it might first sound like this is the system prompt. That's what Anthropic, uh, revealed pretty recently; they showcased, this is the actual thing that we give Claude when you give it an input. You know, there's a whole bunch of text instructing Claude how to respond. We actually know what that is. With OpenAI, we don't know the system prompt, and this is different from the system prompt. This is a meta prompt that is used in a tool of theirs that can optimize a prompt. So you instruct
the LLM here on how to improve the prompt. But either way, even if it's not a system prompt, the article goes into quite a bit of detail comparing the style of the prompt. So this one is pretty structured. They have, you know, various sections with guidelines, steps, using sort of markdown formatting, versus Anthropic, which, if you looked at the system prompt, was a bit more narrative in style, lots of paragraphs, stuff like that.
So, uh, it's interesting to look at again, if you think about things like system prompts and do
prompt engineering. Yeah, this is really cool. And, if I'm reading this correctly, this meta prompt that they've released, is this specific to the o1 model family that's now being released? It, uh, I would have to check, but it wouldn't be surprising to me if that's the case. It seems like that's what this is, um, which is interesting.
And that also explains things like, actually, I'm pretty positive that that is the case, because there's things in the meta prompt around steps and how those, um, should be broken down and revealed, uh, to users. So I think we talked about o1 last time I was on the show. It was brand new, um, a couple of episodes ago on Last Week in AI. And I mean, I've been using o1 to tremendous effect, actually, just in the last few days. It constantly blows my mind.
And, uh, I don't know, it is cool to be able to see under the hood a bit. It is also nice to see OpenAI being a bit open, um, with what they're doing here. I guess, um, you know, if there's kind of an openness arms race that Anthropic or other players encourage in this space, that's great. Keep it up.
That's right. And these prompts are from their documentation. So, Anthropic had more of a blog post and a whole bunch of kind of discussion revealing the system prompt. This is in the internals of the OpenAI platform, where they also discuss prompt engineering guidelines, stuff like that. So this is just an example of a meta prompt you might give. This is not even something that OpenAI necessarily uses, although it is likely that this kind of prompt engineering is what they do.
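To make that concrete, here is a minimal sketch of how a structured meta prompt like the one described might be used to optimize a draft prompt through the OpenAI API. The META_PROMPT text below is a paraphrase for illustration, not OpenAI's verbatim meta prompt:

```python
# A toy illustration of the meta-prompt idea: one structured, markdown-style
# prompt whose job is to rewrite and improve another prompt. META_PROMPT is
# a paraphrase for illustration, not OpenAI's verbatim text.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

META_PROMPT = """\
Given a task description or an existing prompt, produce a detailed system
prompt that guides a language model to complete the task effectively.

# Guidelines
- Understand the task: main objective, goals, requirements, constraints.
- Reasoning before conclusions: encourage step-by-step reasoning first.
- Output format: explicitly specify the structure of the final answer.

# Steps
1. Restate the task.
2. List the reasoning steps the model should follow.
3. Output the final, improved prompt.
"""

draft_prompt = "Summarize this legal contract."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": META_PROMPT},
        {"role": "user", "content": draft_prompt},
    ],
)
print(response.choices[0].message.content)  # the optimized prompt
```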
Moving right along to applications and business, and we begin with Tesla. So if you're a big fan of autonomous driving and robots, as I am, uh, we had pretty interesting developments from Tesla this past week. They had a big We, Robot event that has been kind of, uh, previewed and scheduled for quite a while, where Elon Musk has been saying for a long time, you know, we'll be revealing a lot of stuff regarding our autonomous driving taxi service, which is said to be a major focus of Tesla.
They held that event last week, and there were some very cool things. So they did show a new Cybercab truck, uh, sorry, a new Cybercab car. This is a futuristic-looking car; kind of, if you take the Cybertruck and condense it into a regular-ish sedan with two doors, no steering wheel, two seats. The doors, you know, go vertical when they open, not regular doors. So lots of fancy sci-fi looking stuff, basically, and the idea being
that these kinds of cars will enter full production in, let's say, two, three years. 2026 was the promise, said to be before 2027, and this would be how they roll out that taxi service. In addition to that, they also had a Robovan, unclear why it's not a Cybervan, but either way, Robovan. Kind of a similar idea, very futuristic, but capable of seating 20 passengers instead of just a couple.
On top of all that, also at the event, they had a whole bunch of their Optimus humanoid robots just throughout the event. They had them serving drinks, they had them chatting with people, and, uh, you know, there were a lot of promises being made, or projections, of these kinds of robots being available for $20,000 to $30,000 and being able to do all sorts of things, like being personal assistants.
So there you go, lots of cool things at this event, but not very many specific details, and even the demo of this new Cybercab was on rails. It was held at a movie studio, so it was a very safe kind of demo, and the Optimus robots seemingly were teleoperated. They were not autonomous; people were in control and talking to people. So the stock price actually went down by a decent amount, by 8 percent, because investors did react, thinking that this might be more style than substance.
Yeah, exactly. And you've just got an inbuilt meme really nicely there: when you do it on a Hollywood set, and it seems like what you're pitching is a fantasy more than reality, it really worked all too nicely for that, uh, for that hit in the stock price the next day. Much higher expectations than were realized with this, evidently, from an investor's perspective. And part of that is that if you think about a big competitor in the autonomous driving space, Waymo is the leader here.
I think it's fair to say without question. And Waymo has, uh, Alphabet rather, has invested $30 billion in Waymo, in developing a complex system involving radar, LiDAR, as well as very detailed 3D mapping of environments, that works in conjunction with just video cameras. Whereas what Tesla is proposing here is just cameras for the Cybercab, and I guess later the Robovan. And you're right, why isn't it the Cybervan? Cybertruck, Cybercab, Robovan?
Anyway, maybe the Cybercab is going to be something else, the Cybervan is going to be something else. And so, what Tesla has been trying to do for years is to have their quote-unquote Full Self-Driving, which is a marketing term as opposed to a reality. Um, and so I did an episode actually on the five levels of self-driving cars, if you want to check out Super Data Science, episode number 810. I specifically talk about the five levels of autonomy that you can have in cars.
And while Full Self-Driving is described as full self-driving, this is solely a marketing term. It is not full self-driving, because you still need to have somebody standing behind, uh, sitting, probably, unless they're very short, sitting behind the wheel of a Tesla vehicle. And part of the
big hurdle that Tesla is facing is that their AI team is trying to develop some kind of way to have true full self-driving, not just marketing full self-driving, where you don't need a steering wheel, where you don't need pedals, with only video cameras. And that may never work. So, you know, they talk here about the Cybercab being rolled out in probably 2026, said Elon Musk, but certainly before 2027. And I don't know how both of those things can be true.
Probably in 2026, definitely before 2027. Um, so I guess, yeah, around 2026, 2027. But that depends on regulatory approval, and having just video cameras could be a really tricky way to do that. The gambit here is that it would be a lot cheaper. So Waymo's approach, while it has created these beautiful Jaguars that are really nice to sit in,
um, you know, you have a really luxurious experience, and the drive is amazing, because thanks to the LiDAR, the radar, that detailed mapping, as well as video cameras, you feel really safe. The experience is amazing, but those cars are very expensive to build. And, uh, so it actually means today that when you're in a Waymo, it costs more to create that experience for you than it does to have a human driving the car. Now, of course, over time those costs will go down.
You get economies of scale. Um, but yeah, I guess the idea here, what Tesla is hoping to be able to do, is to leapfrog all of that by trying to have video cameras only work, and that then allows them to sell these, um, these Cybercabs for just $30,000 or less. Whereas, uh, from memory, my memory of a Waymo Jaguar is about four times that. Um, so yeah, so I guess we'll see how it works out. Definitely a big risk, could be big rewards.
Um, I guess shareholders at this time are betting that the risk, uh, reward ratio is out of balance.
Exactly. And, uh, to be fair, I mean, it's worth noting that Tesla is, I believe, still valued more than all other automakers combined. Like, this is not a normal car company. Their main business is selling Tesla cars, but their stock price reflects a very optimistic future where they go beyond that. They, at least, you know, get a big share of the self-driving taxi business, or they get a big share of the humanoid robot market, things like that.
So that's already priced in essentially, uh, in the stock price. So if they can't deliver on these things in the next couple of years, the stock price would presumably take a big hit. And as you said, currently, you know, if you are a Tesla owner, you do have this FSD, full self driving, uh, I think it's called beta still. And, uh, as we covered earlier this year, for a long time, FSD was really bad. It was like scary to use. With FSD 12 and the latest iterations of it, it's gotten a lot better.
So they have moved to this fully AI approach that uses video and training on their giant amount of data from people driving around. Now, if you use FSD, it is definitely much better, much more human-like, much more reasonable, but Tesla also kind of tweaked the terminology. So they now call it supervised FSD. That's what you have in a Tesla. It's not just FSD, it's supervised FSD.
And so they also promised in this presentation fully unsupervised FSD being launched in Texas and California by 2025 in their cars. Right? So that's, uh, kind of a less sexy detail, but a very important detail, is, as you said, they need to get unsupervised FSD working, right? And what that means is you can let the car do its thing. You can just give up control and not be scared of what it's going to do.
And that's the case with Waymo now, and they've had it working in San Francisco for quite a while. They're testing it in LA. They're trying to expand to more cities. Tesla doesn't have that yet, and it's going to be very interesting to see if they're able to actually deliver on it by 2025, as they say. Next story, not quite as exciting, but still pretty notable. As we've covered, OpenAI has been making a lot of deals this year with various media organizations.
So they have now announced a new one. They have made a deal with media conglomerate Hearst, which owns outlets such as the Houston Chronicle, the San Francisco Chronicle, Esquire, Cosmopolitan, and Elle. So this is adding to a bunch more deals that they've done. And now their products like ChatGPT and SearchGPT will display content from over 20 magazine brands and more than 40 newspapers as part of this partnership. And they have many more deals that they've secured over the past year.
And, uh, as we've said, you know, it might not seem necessarily like a huge deal, but this could be a real differentiator from something like Perplexity or Claude, even, right? If you have to pay for access to the latest news via these, uh, media organizations, and, uh, you have to make these kinds of deals, OpenAI is the main player that we know of making those kinds of deals. And they seem to, you know, continue doing this with this latest development.
So it would be interesting to see if all these stories, uh, if all these deals
pay off. And just two nights ago, at the time of recording, I had dinner with someone named Peter Goldstein, who speaks on generative AI in public generally, uh, really well-spoken, highly educated guy, and he happens to be the chief AI strategist for Hearst. And so this deal had just been announced. And so now I can tell you all the secret things. No, I mean, he was extremely professional. I got absolutely nothing.
Uh, you know, no behind-the-scenes, uh, information, details, no drama, absolutely nothing at all. He was a consummate professional. Um, but, uh, yeah, it is interesting. This is just that trend that we're seeing across generative AI, where initially these LLMs were able to train by just scraping everything. You know, nobody had their robots.txt
set up in a way to prevent that kind of scraping, because it was a completely novel approach to scraping and generating intellectual property. And so now everyone's woken up, including Hearst, uh, the New York Times, tons of these big publishing organizations, Springer Verlag. Um, yeah, internationally, lots of publishers waking up to how much value they can provide to, uh, the hyperscalers in particular.
And at a time when, you know, those same hyperscalers are threatening their traditional business models, because generative AI tools prevent you from getting, say, a Google search result that causes somebody to click through to an asset like Cosmopolitan or Esquire or the San Francisco Chronicle. Instead of needing to click through to that article, where the San Francisco Chronicle has display ads that allow them to generate business, instead the Gen AI tool just
provides the answer. And we don't today have, I mean, Gen AI tools can do retrieval-augmented generation and act in a kind of agentic way, pulling information in real time over the web, and often when it does that, it does actually provide you with the source.
Maybe you'd click, but I suspect that, as time goes on, we're going to have LLMs be increasingly updated in real time, their model weights updated based on real-time information, so that when you're doing, say, a Google search, Gemini will bring back for you an exact LLM answer to your question in real time, as opposed to you needing to, you know, click on some Google suggestion and see the result.
And so that, yeah, again, it eats into the business model of, um, you know, historically newspapers, magazines. Uh, you know, obviously they always had ads, but you were also paying to have it show up at your door. Not a lot of people do that anymore. That's already really eaten into their business model. So now they're more dependent than ever on ads, specifically digital ads. And now generative AI poses a threat to that model.
And so I'm glad that they're able to get some revenue back, from being able to at least monetize the high-value content creation that they do. Um, because, yeah, if we end up in a world where you just have Gen AI models learning from generated content online, that might not be ideal, though maybe we can figure out ways of making that work. So yeah, it makes sense from the perspective of someone like OpenAI to be paying for high-quality content.
Hearst gets to make back some of the money that they could lose, now or in the future, I guess, by not being able to have people click through as often on, say, Google search, on their ads. And in the long run, this does pose interesting questions, not only for publishers' business models in terms of ads, but also for someone like Google's.
Because, you know, their own tools, like a Gemini search... There's, you know, there's rumors that Google DeepMind had similar kinds of capabilities to ChatGPT, but it wasn't released, because that eats into Google's core business model of display advertising, click-through advertising, you know, uh, digital attribution.
When you have Gen AI models that are just providing exactly the information people want right now, mostly in a text display, maybe you can find ways of sneaking some display ads in on the side. But in the future, you're going to have more and more interaction with these tools through just voice and audio. And how do you then insert sponsorship in there? Like, maybe you can have, okay, like, kind of boosted results. And I guess that's some way that people will go.
But it seems like, one way or another, Google's business model will get eaten into. You know, there are decades now of absolute, uh, effectively monopolistic dominance in search, um, all paid for by digital advertising, and that could be ending here.
Yeah. So, uh, you know, lots to say there. I think these are the kinds of stories and trends that are not very sexy. They're not getting the main headlines, and they're not, you know, the topic that's getting all the hype on Twitter, et cetera, but it's actually, like, uh, subtly a very important thing to be aware of and think about, because it speaks to really the future of the internet and the future of search, right?
And so, uh, you know, in comparison to all these deals from OpenAI, Perplexity AI has also done something kind of similar. They have a revenue-sharing model. They have this publishers' program, where they've also had companies like Fortune, Time, Entrepreneur, et cetera, join on. And, uh, it's very much on point, as you said. Perplexity, they'll be, uh, showing their sources, they'll be providing links, but for the most part, presumably, you won't be clicking to go to those sources.
You'll just be reading what the AI says. So we do need some kind of new business model for the people actually writing up the news. Uh, and this increasingly seems like that business model. And not just that, but you've seen also Reddit and Twitter, at least Reddit for sure, sign deals to license their data to, uh, I believe, Google. And that's another aspect of the internet now, where, you know, any content that we provide on these platforms can be monetized for data.
So it's, it's one of these realities that is kind of interesting and speaks to like the development of the internet at large and our information ecosystem at large. Uh, even though it's a little bit dry on the face of it.
One thing that I was reminded of as you were speaking there, you touched on it a little bit, or it led me to this thought, which I think is really important here, which is the ethical quandary as well with this kind of business model. The way that you phrased it there, where you were talking about, you know, who's going to actually pay the journalists who are creating the high-quality content. And so once you are a newspaper, a magazine, where maybe,
in the not-so-distant future, most of your revenue comes from Gen AI companies, how does that affect your reporting about Gen AI companies? So, if a player like OpenAI or Anthropic or Cohere becomes, like, you know, this unavoidable monolith in kind of culture in general, really influential, and simultaneously they're paying all the bills at all the information sources that we get our information from, there's an ethical quandary there. It's interesting. Are you going to get unbiased reporting?
Exactly. And, uh, you know, there's a lot of interesting implications. Like, you know, with the ad model, you've seen, uh, quite a bit of a shift to clickbait, essentially, and getting, uh, headlines and things that drive people to click through and read the article. With this shift, right, that's not as important. So maybe people will start covering things that are just trendy, that people will be asking SearchGPT about, right?
Uh, and there's a whole lot more to say on this. Like, the New York Times, for instance, they've moved to a subscription-based monetization model in large part. So, uh, more and more media publishers need subscription-type revenue streams, which is a little bit distinct. Anyway, I think we've said enough on this, uh, but there you go. OpenAI paying even more media publishers. Onto projects and open source.
We've got a few neat stories here, starting with OpenR, an open-source AI framework enhancing reasoning in large language models. And this is actually paired with a paper that describes the whole system, not just an open-source project. The broad idea is essentially to provide a framework that allows you to do o1-type reasoning with open models.
And this is a collaboration from a bunch of organizations: University College London, the University of Liverpool, the Hong Kong University of Science and Technology, and some other ones. So they go into a lot of detail about how the model works. They treat it as an MDP, a Markov decision process, uh, right, where you have sequential steps, essentially, of the LLM. And then for each step, you can provide a reward.
They have a whole paradigm for training the reasoning, to be able to do, um, good reasoning step by step. And they do show some interesting empirical results, where you can see that using this reasoning paradigm, as you increase the generation budget, something we've talked about a lot, the inference scaling, what you get is, you know, post-training, you can now just scale the amount of time and resources you give your model to be able to generate a better answer.
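As a rough sketch of that step-level reward idea, here is some illustrative Python. `generate_step` and `process_reward` are hypothetical stand-ins for an LLM step generator and a trained process reward model, not OpenR's actual API:

```python
# Minimal sketch of process-reward-guided, step-by-step inference in the
# spirit of OpenR. `generate_step` and `process_reward` are hypothetical
# stand-ins for an LLM step generator and a trained process reward model.
import random
from collections import Counter

def generate_step(prefix: str) -> list[str]:
    """Sample a few candidate next reasoning steps (stub)."""
    return [f"{prefix} -> step{random.randint(0, 9)}" for _ in range(4)]

def process_reward(partial_solution: str) -> float:
    """Score a partial reasoning chain (stub for a PRM)."""
    return random.random()

def guided_solve(question: str, n_steps: int = 5) -> str:
    """Greedy PRM-guided search: keep the best-scored candidate each step.
    More candidates and more steps = more inference-time compute,
    which is the scaling axis the paper measures."""
    state = question
    for _ in range(n_steps):
        state = max(generate_step(state), key=process_reward)
    return state

def majority_vote(question: str, n_samples: int = 16, n_steps: int = 5) -> str:
    """Baseline: sample full solutions unguided, return the most common."""
    def sample_solution() -> str:
        state = question
        for _ in range(n_steps):
            state = random.choice(generate_step(state))
        return state
    answers = [sample_solution() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```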
They also do that experiment and show that, in fact, this approach results in better, uh, outputs, and it's even better than just simple things like majority vote. Unfortunately, this is coming from some universities, so it's not really evaluated at scale. It's pretty small models. But either way, I imagine we'll be seeing a lot of this sort of stuff with open-source efforts to replicate o1-type reasoning. Moving right along, we've got another open-source release. This time, it's a benchmark.
It's MLE-bench, evaluating machine learning agents on machine learning engineering, and it's actually coming out of OpenAI. So, this benchmark is meant to look at machine learning engineering tasks, and if you're a software engineer, you might find this kind of fun and interesting. This benchmark includes 75 machine learning engineering-related competitions from Kaggle. So Kaggle is a platform. It's been around for quite a while.
They do have competitions where you can submit, uh, an answer to a machine learning problem, where if you're a top performer, you can actually earn money. And there have been many participants. It's a pretty big platform. So in this benchmark, they, uh, allow agents to try and win in these competitions. And they have some interesting comparisons looking at, uh, different scaffolds that take actions by calling tools. They also have this AIDE thing, which is purpose-built
to iterate on solutions to Kaggle competitions, and they actually say agents run autonomously for up to 24 hours in their experiments. So if you look at the figure, for GPT-4o with MLAB they have total steps of 216 and a runtime of two hours; for GPT-4o with AIDE, they say they have 30 nodes in a tree search, with a runtime of 24 hours. So, you know, it's getting very autonomous at this stage. And the benchmark is pretty hard. So even for the best-performing model, o1-
preview, uh, to get a medal, which is to say to place well, to place better than most people, o1 is only able to do that 17 percent of the time, roughly. And, uh, these models often make invalid submissions. So even the best one only gets to 82 percent valid submissions, which hopefully most humans are able to do. And if you don't use this AIDE thing, which is purpose-built for Kaggle, you do much worse,
you know, going to single-digit, uh, percentages on medal placement and, you know, 40, 50 percent valid submission rates. So a very intriguing new benchmark in the realm of, uh, agentic AI, software engineering AI. And we will be seeing if we can, you know, saturate it, as has been the case with many AI benchmarks.
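For a concrete sense of what "getting a medal" means here, a simplified grading sketch. Kaggle's real medal rules vary with competition size, so these percentile cutoffs are placeholders:

```python
# Illustrative sketch of medal-style grading against a Kaggle-type
# leaderboard. Kaggle's real medal rules depend on competition size;
# the percentile cutoffs below are simplified placeholders.
def medal(agent_score: float, leaderboard: list[float]) -> str:
    """Grade an agent's score against human entries (higher = better)."""
    beaten_by = sum(1 for s in leaderboard if s > agent_score)
    pct = beaten_by / len(leaderboard)  # fraction of humans above the agent
    if pct < 0.10:
        return "gold"
    if pct < 0.20:
        return "silver"
    if pct < 0.40:
        return "bronze"
    return "no medal"

# Example: 84 of 1,000 human entries beat the agent -> top 10% -> gold.
humans = [i / 1000 for i in range(1000)]
print(medal(0.915, humans))  # prints "gold"
```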
Yeah, this is great. Exactly. Yeah. We're just constantly having to come up with more and more complex benchmarks. A couple of years ago, this would have seemed like an insane benchmark. Like, what are you creating this for? Obviously a machine can't do this. I don't know if a machine will be able to do this in my lifetime. I mean, this is something.
You and I, in your Super Data Science podcast episode, Andrey, we talked about how mind-blowing for both of us the release of GPT-4 was, and for both of us that was a moment where we were like, holy crap, AGI in our lifetimes, ASI in our lifetimes, this is probably happening. Um, and so, you know, pre-GPT-4,
if somebody had said they were making a benchmark where you're evaluating models on these extremely difficult machine learning engineering tasks that require a lot of thinking, a lot of outside information, a lot of steps, I would have said, why? That's a waste of time. I don't know if we'll have machines that can do that in our lifetime. And now, you know, this AIDE scaffolding sounds like a key part of the success, um, that OpenAI is getting here on the benchmark.
And so to be getting, you know, as you said, o1-preview, which of course isn't the full o1, that is vastly outperforming all the other models that they tested with the AIDE framework. So they also used GPT-4o, they used the biggest Llama, which is 405B, uh, huge, and Claude 3.5
Sonnet. You know, the AIDE framework with those models, as you said, was the only way that you could get kind of any decent performance. Like, other than that, you know, other than using the AIDE approach, the best score was 4 percent, uh, you know, a 4 percent chance of getting any medal in these competitions. Um, or maybe, you know, maybe a good thing to be looking at here would be the above-median performance.
So, um, you know, you could get 7 percent of your submissions above median, um, above median human submission performance, without AIDE. With AIDE, you're able to get 7 percent or better with any of the LLMs that I just said, GPT-4o, Llama 3.1 405B, Claude 3.5 Sonnet. Um, and o1-preview, it won't be a shock at all to people who have used o1, crushes, uh, any of the other models, uh, where AIDE was involved.
And so, you know, the next best model was GPT-4o, which achieved 14 percent above-median scores with AIDE. o1-preview got double that, 29 percent. And, you know, we know that the full o1 model will be out soon. It's interesting that even the internal researchers at OpenAI weren't able to use that. I guess it's just too early in development. Um, and so yeah, when that comes out, you can expect that to continue to go up. So you start to have, okay, you know, these are relatively low percentages.
You know, the best approach here only gets a third, uh, of its answers above the median human performance, but these are extremely difficult tasks with so many steps, and I bet you in a year, this is the kind of thing we'll be looking at not 30 percent, but we'll be looking at 80 percent.
Exactly. I think that's, you know, almost certainly the case. And, uh, as you said, it's pretty mind-blowing to consider what we're getting here, right? We're getting fully agentic AI that basically just decides how to go about solving a problem and then does it. In the case of AIDE, they do run it for 24 hours, and they, you know, do all sorts of search processes and so on. So AI will be only getting better and better at this benchmark over time.
And it's kind of interesting, like, you know, the best system is able to get 10 percent gold on these competitions. So maybe you could earn some money, you know, by getting the AI model to work well, although that might be more effort than actually winning the competitions. And now another release from OpenAI. This time, it's more of a package, and the package is Swarm, an experimental AI framework for building, orchestrating, and deploying multi-agent systems.
So this is paired with this, uh, kind of cookbook, uh, post that is basically an example of something you might do, called orchestrating agents, routines, and handoffs. So, under the OpenAI organization on GitHub, where they share code, they did release this Swarm package, which in very bold font says experimental, educational, right? So I found it kind of amusing how much they emphasize that this is just a simple framework.
It's not meant to be a standalone library, and it is mainly an educational resource. Either way, the idea with Swarm is that you can have agents that, you know, for instance, hold an ongoing chat with you based on some instructions, and then they can do handoffs. So you can say, okay, now you go interact with another agent. Kind of like if you call, you know, uh, your health insurance or something, and you talk to one person, and they transfer you to another person. That's the basic idea here.
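A minimal sketch of that handoff pattern, closely following the examples in the Swarm README (and since the package is explicitly experimental and educational, treat the API as illustrative rather than stable):

```python
# Sketch of the Swarm handoff pattern: a function that returns another
# Agent triggers the handoff. Closely modeled on the README examples.
from swarm import Swarm, Agent

client = Swarm()

spanish_agent = Agent(
    name="Spanish Agent",
    instructions="You only speak Spanish.",
)

def transfer_to_spanish_agent():
    """Transfer Spanish-speaking users immediately."""
    return spanish_agent  # returning an Agent triggers the handoff

english_agent = Agent(
    name="English Agent",
    instructions="You only speak English.",
    functions=[transfer_to_spanish_agent],
)

messages = [{"role": "user", "content": "Hola. ¿Cómo estás?"}]
response = client.run(agent=english_agent, messages=messages)
print(response.messages[-1]["content"])  # answered by the Spanish agent
```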
And so, yeah, very much, you know, not a huge deal in the sense that this is experimental and educational, but it does indicate OpenAI continuing to invest in agentic AI and in having more autonomous AI as the future. Yeah,
this is definitely a big release. I'm glad that we're covering it in this episode. This is certainly, from what I've seen, probably the biggest splash in the past week on my social media channels. And, uh, yeah, I think agentic AI is a big, it's the most exciting topic right now that we have in AI. These systems becoming increasingly autonomous, multi-agent systems being able to work together. And actually, on that note, I'm going to plug something that I'm doing, which is this.
On December 4th, Wednesday, December 4th, at 9 a.m. Pacific, noon Eastern, I'm running an online conference on the O'Reilly platform that is all about agentic AI. So we have a number of experts coming in, all great speakers in this space, uh, talking about multi-agent systems, and there's hands-on sessions teaching you how you can use, uh, open-source tools in Python to be able to develop your own multi-agent systems for particular tasks.
So, you know, when I was thinking about what topics I should be covering in this upcoming online conference, I was like, agentic AI, no-brainer. And, uh, yeah, so I think it's super exciting, and no doubt we'll be talking about Swarm, uh, there on December 4th.
And on to research and advancements, and we have some kind of unusual stories. Usually we go into papers and so on, but, uh, the only place I could think to put this news is in this section, and the news is the Nobel Prizes, uh, being awarded to some AI people.
So first up, we got the Nobel Prize in Physics being awarded to two scientists, John J. Hopfield and Geoffrey E. Hinton, who basically were very instrumental in the development of neural networks and all the lead-up to deep learning, large neural nets. Geoffrey Hinton is one of these big names that has had a massive impact on the history of AI, and not just in the way you might think. So he has been a big player over the last two decades.
He was part of what kind of resurfaced and repopularized neural nets with some work in 2006, where he used some ideas from previous work to have an initialization scheme and really demonstrate for maybe the first time that you could get a very large neural network to be very performant. But also going back decades to the 80s, Geoffrey Hinton arguably was a big player also in repopularizing neural networks.
Then, with the, uh, release and, uh, the, um, kind of documentation, almost, of this backpropagation algorithm that is the backbone of neural network training. So this was not exactly new at the time; there had been, you know, previous developments of the same algorithm, but the paper that Geoffrey Hinton published with some other authors definitely popularized it, made it very accessible and well understood, and made neural nets kind of hyped up in the 80s.
And later, uh, Hinton's work also led to it being hyped up in the 2000s and 2010s. And that's how he got here. So, you know, many people, including, I think, Hinton, commented that it's kind of funny that they got the physics prize. There's no computer science Nobel Prize. So, you know, uh, that's kind of funny, but either way, it does speak to the impact of this. And John J. Hopfield, you know, not to ignore him, was a big collaborator of Geoffrey Hinton.
They worked together on the notion of Boltzmann machines, which are a little bit more physics-like and, uh, turned out not to be the big player, but did contribute some ideas in the development of neural networks.
Yeah. And so I dug into this, uh, initially for a social media post, and then now for, um, an episode of my podcast, Super Data Science, um, that'll be coming out soon. I don't know if it'll be out before this Last Week in AI episode or not. So I won't even get into that. I'll just basically give you all the great content, uh, from that here, which is that I dug into, you know, why was this a physics Nobel Prize at all?
And so, as you already mentioned, there's no kind of computer science or computing Nobel Prize. The closest kind of thing is the Turing Award, which Geoff Hinton, Yoshua Bengio, and Yann LeCun already won in 2019 or 2018, one or the other sounds right, um, for their contributions to developing deep learning. And for the Nobel Prizes, I'm kind of imagining this kind of discussion where the Nobel Foundation, this is completely made up, like I'm imagining this, right?
It doesn't seem like a huge stretch of the imagination for the Nobel Foundation to be sitting around thinking, wow, AI is doing some really crazy things lately. Um, yes, there are certainly some risks, as Geoff Hinton himself has been talking about a lot lately since he left Google, but there's also huge, tremendous benefit to humankind. And they're like, we would love to be conferring some Nobel Prizes on people for their AI work, but we don't have a category that we can do it with.
And so they're like, how can we do this? Okay. For physics, Geoff Hinton, who is arguably the most important single player in the development of deep learning, as you already went over in a lot of detail, Andrey, he collaborated, conveniently, with this guy, John Hopfield, whom I had never heard of before, personally, having done a lot of research and writing on deep learning. But this Hopfield guy is a physicist, and they collaborated a lot.
And, you know, he did make significant contributions to relatively early artificial neural network ideas, um, together with Hinton. And so, you know, some of Hopfield's research, um, is what they call biophysics, where you are trying to emulate the physics of biological systems, and this neural network research in some way fell into that.
And so it kind of ties together physics with Hinton in some way, um, and allows them to justify giving the physics prize, um, to Hinton. And it's great to see him, uh, get a Nobel Prize. The next story that we're going to talk about, the Nobel Prize in Chemistry, that one is more straightforward to understand. You have to make less of a leap.
Oh, it's fun to hear from Hinton. He's very direct in interviews. So this New York Times piece has a quote from him. So he said in this piece, if there was a Nobel Prize for computer science, our work would have clearly been more appropriate for that. But there isn't one.
Uh, so, and just to be clear, the work, uh, from the 80s on Boltzmann machines and something known as Hopfield networks, which is actually by Dr. Hopfield, those do have more relation to physics. As you said, um, if you dig into the math, it, uh, relates to physics, and it did kind of connect to physics in some way. So not totally ridiculous, but, you know, somewhat strange. A bit less strange is the Nobel Prize in Chemistry, which is also deeply related to AI.
So, this went to Demis Hassabis and John Jumper from Google DeepMind and David Baker from the University of Washington, and guess what, it's for their work on AlphaFold and things like that. As we've covered over the past couple of years, DeepMind has done a lot of work on making AI models for scientific simulation and understanding. The big one from them has been AlphaFold, which models how proteins fold. So this Nobel Prize makes a decent amount of sense.
AlphaFold has had a very significant impact on the field and did represent a pretty massive amount of progress on this particular problem. So yeah, lots of Nobels going to people from AI this week.
And you can expect more in the future, no question. I think this sets a precedent, in the same way that the chemistry Nobel has, in recent years, gone a fair bit to biological advancements, big gene-editing kinds of techniques that are kind of vaguely biochemistry, just as you could say Geoff Hinton's stuff was kind of vaguely biophysics because he was building on a biophysicist's work.
And, like you say, the Hopfield networks. So there was precedent for the chemistry prize being used to award biological advancements, and this falls squarely into that category. We now have superhuman ability to take the sequence of a protein and predict its three-dimensional structure, which is so mind-bogglingly complex.
Humans cannot do this, at least not on any of the kinds of complex structures that AlphaFold can succeed on. And so this is an example of an artificial superintelligence. It isn't the kind of ASI associated with the singularity, because it's not general; it is very narrow in its application. But this is one of my favorite examples. AlphaFold has for some time now been my favorite example of an artificial superintelligence, where we now have,
not just in terms of processing speed but in terms of some intellectual capacity, a machine able to do something that humans cannot do. And that's pretty damn cool.
And not to give all the attention to DeepMind here, David Baker was also awarded the prize. He is a professor, not affiliated with DeepMind as far as I know, and he also has a long history of research on proteins. His work led to the creation of the first synthetic protein, and he was a big part of the creation of Rosetta, which is similarly a computational tool for things like protein design and small-molecule docking.
So, another related effort to create better computational tools for scientists. And, you know, it might seem like Demis Hassabis and John Jumper aren't chemists per se, aren't scientists in the field, but as pointed out in statements, in chemistry and in these fields in general, progress in tools and methods is still progress; it doesn't have to be conceptual. And that's what these computational efforts represent.
And on to the Lightning Round, going back to talking about agentic AI and throwing some cold water on it. We have the story "LLMs can't perform genuine logical reasoning, Apple researchers suggest." This is a new paper from people at Apple, titled "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models." What they did was modify GSM8K, which is a standard benchmark with over 8,000 grade-school-level mathematical word problems.
They modified it with a set of symbolic templates that allows for the generation of a diverse set of problem variants. We covered a similar effort not too long ago from Scale AI, where they also generated variations on a popular benchmark. And when they tested the models on this new benchmark, once again, there was a big drop in performance.
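To make the templating idea concrete, here is a minimal sketch of what a GSM-Symbolic-style template could look like. This is my own illustration under assumed details (the template text, names, and number ranges are hypothetical), not the paper's actual code:

```python
import random

# Hypothetical GSM-Symbolic-style template: the problem text is parameterized
# so names and numbers vary while the underlying reasoning stays identical.
TEMPLATE = ("{name} picks {x} apples on Monday and {y} apples on Tuesday, "
            "then gives away {z}. How many apples does {name} have left?")

NAMES = ["Sophie", "Liam", "Ava", "Noah"]

def sample_variant(rng: random.Random) -> tuple[str, int]:
    """Sample one surface-level variant of the problem and its ground-truth answer."""
    x, y = rng.randint(10, 50), rng.randint(10, 50)
    z = rng.randint(1, x + y)  # keep the answer non-negative
    question = TEMPLATE.format(name=rng.choice(NAMES), x=x, y=y, z=z)
    return question, x + y - z

rng = random.Random(0)
for _ in range(3):
    question, answer = sample_variant(rng)
    print(question, "->", answer)
```

The point is that every sampled variant requires exactly the same reasoning steps, so a model that truly reasons should score the same across all of them.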
The drop in performance was between 0.3% and 9.2% depending on the model, which suggests that these models were, to some degree, trained on the benchmark, or optimized for it. In theory, since this is the same level of difficulty, there shouldn't be any change in performance on the new variation, but in fact there was.
And there are some interesting details here. For instance, adding a single clause that seems relevant to a question causes significant performance drops, even though the clause doesn't contribute to the reasoning chain. There was also high variance across different runs of GSM-Symbolic with different names and values.
So various details like that once again demonstrate that benchmarking is hard, and we shouldn't necessarily trust benchmark numbers, because there are a lot of complications there.
As you talk about all the time on your show, benchmarks cannot be relied upon, and this is exactly the kind of concern you have when benchmarks come up. Everybody, when they release their LLM, wants it to be the state of the art across all the big benchmarks. And so it would be very hard to resist the temptation to fine-tune on, or in some way game, those benchmarks, which are publicly available.
It's not like you don't have access to the benchmark data when assembling your training data. So it's unsurprising that the models perform worse on variants. And yes, it should also come as no shock that LLMs can't perform genuine logical reasoning. They are not designed to; they are just predicting the next token, and it's amazing what they can do given that.
Yeah. And as we've seen in previous comparisons of this kind, the drop varies quite a bit, and for larger, more sophisticated models the drop is less bad. So for GPT-4o, the drop is less than 1%, about 0.3%; o1-mini, 0.6%; o1-preview, about 2%; versus something like Gemma and Mistral at seven to nine percent, pretty big drops. Which is to say that benchmarks can still be relied upon broadly for a general evaluation.
It's not like the numbers are meaningless, but for the exact numbers, the exact ranking of models, that's where these kinds of benchmarks cannot necessarily be relied upon. And the next paper actually adds to that, so I found it fun to include both. The next paper is "Not All LLM Reasoners Are Created Equal," and they do something a little different with the same benchmark.
So they look at the grade-school math benchmark, GSM8K, and instead of creating a new variation, they have this interesting test where essentially you need to get two questions right in a row, as opposed to just one. And so what you would expect is that your performance on this variation is your performance on the standard benchmark squared, right? You multiply your success rate by itself because there are two problems in a row.
And that's what you'd want to see, ideally. But as with the Apple results, that's not actually what you see; a reasoning gap emerges. And similarly, the pattern is very comparable: GPT-4o and the bigger models have relatively small gaps, and there are huge, massive gaps for things like Phi-3, Gemma, and Llama-3 8B, the smaller models, broadly speaking. So there you go.
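For intuition on that "squared" expectation, here's the back-of-envelope arithmetic with hypothetical numbers (my own illustrative figures, not results from the paper):

```python
# Back-of-envelope for the compositional "two in a row" expectation.
# These numbers are illustrative, not figures from the paper.

p_single = 0.90                    # hypothetical single-question accuracy
expected_pair = p_single ** 2      # 0.81, assuming the two answers are independent

observed_pair = 0.72               # hypothetical measured two-in-a-row accuracy
reasoning_gap = expected_pair - observed_pair
print(f"expected {expected_pair:.2f}, observed {observed_pair:.2f}, "
      f"gap {reasoning_gap:.2f}")
```

Any observed pair accuracy below the squared single-question accuracy is evidence the model isn't solving the two problems independently and reliably.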
There's another demonstration on this particular benchmark that benchmarks aren't quite as precise as would be ideal. Alrighty, moving on to Policy and Safety. We have a pretty interesting development related to Anthropic. "Anthropic CEO goes full techno-optimist in 15,000-word paean to AI" is the pretty fun title of a TechCrunch article. And there you go, it's about this very long blog post that Dario Amodei, the CEO of Anthropic, released.
It's pretty unusual. Unlike Sam Altman, who goes on all the podcasts and has already released blog posts with his thoughts, this isn't something Dario Amodei does much, but now he has. And it's a very, very detailed examination of the implications of AI, in particular the positive implications. So he begins with this whole thing of, why don't I go and be more positive?
It kind of feels a little like maybe he was talking to investors and making Anthropic seem a bit less safety-oriented, or more optimistic about AI. Either way, there are a lot of interesting notes in this. It's very long, but I would say it's very long because it's nuanced. He goes into a lot of detail, and it's not rambling; it's very well thought out, from my read of it. The gist of it is that Amodei replaces the notion of AGI.
So he says AGI isn't a very useful term; instead, he uses the term "powerful AI." The definition is AI that is smarter than a Nobel Prize winner in fields like biology and engineering, and that is capable of performing tasks like proving unsolved mathematical theorems and writing high-quality novels. And he has a nice phrase summing up the implications: if we get these powerful AIs and can run multiple instances of them, which is not computationally crazy,
we would get a "country of geniuses in a data center." So the beginning of the blog post sets up this idea, sets up the belief that we'll probably be getting this within five to ten years. And then the rest of it, maybe 10,000 words or so, is about the implications for different domains. So he talks about the implications for biology, right?
For being able to make a lot of progress on very significant issues we've been unable to tackle, like, obviously, cancer, curing genetic diseases, halting Alzheimer's at earlier stages, all coming in the next seven to twelve years, he says. There's this nice term again: you can think of it as the "compressed 21st century."
The idea is that after powerful AI is developed, we will in a few years make all the progress in biology and medicine that we would have made in the whole 21st century without it. And then there are multiple other topics he gets into. He gets into inequality and the implications for economies, particularly of the developing world, where again there's some nuance.
He says he's not as confident that AI can address inequality and economic growth as he is that it can invent fundamental technologies, because technology has such obvious high returns to intelligence, whereas the economy involves a lot of human constraints. That's the kind of writing you're dealing with; it's pretty sophisticated stuff.
But yes, he gets into things like the global economy, climate change, even the more philosophical things regarding meaning and work. And of course, he does have a lot of caveats about potential side effects, dangers, et cetera. It's a very long essay, but I think a very interesting and well-thought-out essay on the implications of powerful AI coming, most likely, within the next decade.
No question. This is super aligned, like hopefully our future AI systems, with my own take on what's going to happen in the coming decades. I am highly techno-optimistic, like Dario Amodei is, and he hits the nail on the head with so many points in here. The "country of geniuses in a data center" is perfect.
Like, that's something I talked about a lot around the release of o1 a couple of weeks ago, where you extrapolate this kind of capability: you scale more in terms of something like increasing the number of parameters, something that increases the nuance of these models, plus you scale
inference-time compute, and it's not hard to see that in a couple of years we could have AI systems like what Dario Amodei here calls powerful AI, where it's like a Nobel Prize winner; you have Geoff Hinton and Demis Hassabis levels of intellect in machines. And then, like you said, Andrey, you can scale that up. You can have a country of geniuses in a data center, all thinking away at complex problems.
There are some things, like he talked about Alzheimer's specifically, that do seem a bit trickier to me, because those still require real-world experimentation. The algorithm can have a hypothesis, but then you need to test it on lab rats and then, eventually, once you've confirmed it's safe, on humans, over a number of years. So there are some constraints. It's like that classic line: nine women can't make a baby in a month.
There are some kinds of things in scientific inquiry that go beyond what you can infer from existing information. Sometimes new information needs to be acquired from the world, with experiments that take time. So that will slow down some types of progress.
The bigger kind of issue that Dario also hits on the head here is social progress. Unlike fundamental discoveries, which can happen relatively unconstrained by social or governmental pressures, the equitable distribution of those discoveries can still be a really tricky problem.
Like, you might be able to create crops that would provide high-quality nutrition to everyone on the planet. You might technically have that, but that doesn't mean that North Korea is going to allow you to have those crops in their country.

Exactly. So I also share your take; in general, I'm on the more optimistic front.
And this is a very grounded, very analytical exploration of why we should be optimistic, and of the details. To be a bit more precise, he defines powerful AI as these things and says it could come as early as 2026, though there are also ways it could take much longer. And what the essay actually focuses on is what happens in the five to ten years after we get powerful AI. So the idea is that we get all of these very nice
outcomes, ideally, from AI, if we don't mess it up. And of course, he does also call out that it's not all roses; there are dangers associated with AI, and we do need to think about them, and Anthropic is heavily focused on AI safety. So, a very nice read. I will say, a lot of discussions of AGI and the future of AI tend to be a little more sci-fi, not grounded in definitions or analysis; this very much is.
And just a couple more stories. The first one is once again about nuclear power. Google has partnered with Kairos Power to construct seven small nuclear reactors in the U.S. to power these technologies. The first reactor is expected to be operational by 2030, with the rest deployed by 2035. The statement from Google is that these would offer a clean, constant power source that could meet electricity demands with carbon-free energy.
Kairos Power is a nuclear energy startup with some newer technology related to nuclear power. This is, of course, related to a recent move by Microsoft to use the undamaged reactor at Three Mile Island. So overall, it seems like AI is making us embrace nuclear power at a rapid pace, which really hasn't been the case in recent decades. And the other story we are going to cover is more on the research front.
As we often do, we have some research related to safety. The paper is "LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations." This goes into how recent studies have shown that LLMs' internal states encode information about the truthfulness of their outputs, and this can be used to detect errors. What they do is analyze this pattern a little more.
So they show that this information about truthfulness has more structure than previously realized, and that the information is concentrated in specific tokens. That means looking at those specific tokens leads to much better error detection, although these error detectors don't necessarily generalize, meaning that truthfulness encoding is not universal; the encodings are somewhat different across different topics, for instance.
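For a feel for how this kind of error detector tends to be built, here is a hedged sketch of the general probing recipe, a linear classifier on cached hidden states. This is my own illustration of the technique with stand-in random data, not the paper's code or results:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hedged sketch of truthfulness probing on LLM internal states (the general
# technique, not the paper's exact method). Assume we've already run the model
# on many prompts and cached, for each one, the hidden state at a chosen token
# (the paper points at specific answer tokens) plus a correctness label.

rng = np.random.default_rng(0)
hidden_dim = 4096
n_examples = 2000

# Stand-in random data: in practice these come from the LLM's forward passes.
hidden_states = rng.normal(size=(n_examples, hidden_dim))
is_truthful = rng.integers(0, 2, size=n_examples)

# A linear probe: logistic regression from hidden state -> truthful vs. hallucinated.
probe = LogisticRegression(max_iter=1000)
probe.fit(hidden_states[:1500], is_truthful[:1500])

accuracy = probe.score(hidden_states[1500:], is_truthful[1500:])
print(f"held-out probe accuracy: {accuracy:.2f}")  # ~0.5 on this random stand-in data
```

In these terms, the paper's finding is largely about which token's hidden state you probe, and how well such probes transfer across topics.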
So there are very practical implications here. Of course, hallucination, LLMs just making stuff up, is a big issue with using these tools in practice, and these kinds of approaches that look at internal representations to detect whether an LLM is hallucinating could very much be deployed by LLM providers.

Yeah, this is important research.
Being able to stamp out hallucinations almost entirely is key to so much success in AI, particularly when we talk about agentic AI systems, multi-agent systems. If you have even a 1% error rate in a multi-agent system with 10 agents handing off between each other, that 1% error rate compounds and becomes a really big deal. So you need to be talking about fractions of a percent.
And the more zeros we can put after the decimal place, the better our future AI systems are going to be.
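To put numbers on that compounding, here's the quick arithmetic (my own illustrative figures, assuming independent per-step errors):

```python
# Back-of-envelope for compounding error in a multi-agent pipeline.
# If each of n handoffs independently succeeds with probability (1 - e),
# the whole chain succeeds with probability (1 - e) ** n.

n_agents = 10
for error_rate in (0.01, 0.001, 0.0001):
    chain_success = (1 - error_rate) ** n_agents
    print(f"per-step error {error_rate:.2%} -> "
          f"chain failure {1 - chain_success:.2%}")
# A 1% per-step error rate makes the 10-agent chain fail ~9.6% of the time;
# each 10x reduction in per-step error shrinks chain failure roughly 10x.
```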
And on to the last section, Synthetic Media and Art, and this is also the last story; we only have one here. We started with Adobe and we are ending with Adobe, and this one is not an AI tool. It is a free web app called Adobe Content Authenticity that will allow creators to attach content credentials to their digital work.
You can think of it as a nutrition label for digital creations: it provides information that makes it easier for original content to be traced back to its creators. The web app enables photographers and digital artists to apply these content credentials to all their content, which can include a verified name or identity and links to their website and social media profiles.
And this is the kind of thing we've been talking a lot about with regard to AI: that we would need some metadata attached to files, to media, to let us know whether it's AI-generated or made by a real human. So in addition to watermarks like SynthID that are being built into Gemini and also Meta's tools, this could be used by human creators when they publish their photographs or images, to essentially claim credit.
And on top of this being a tool, they're also releasing a Google Chrome extension to let users interact with and view content credentials online. And of course, as a user, you can also inspect the content credentials of a given file; it can act almost like a digital fingerprint. This will be available in free public beta starting early next year.
And they do say it will be further integrated into their actual apps, like Photoshop, et cetera.
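For intuition, the general idea behind content credentials, signed provenance metadata bound to a file's content hash, can be sketched in a few lines. Note this is a toy illustration of the concept, not Adobe's or the C2PA standard's actual format; real systems use asymmetric signatures rather than a shared HMAC key:

```python
import hashlib
import hmac
import json

# Toy illustration of signed provenance metadata. Placeholder key; real
# content-credential systems use asymmetric (public-key) signatures.
SIGNING_KEY = b"creator-private-key"

def attach_credentials(image_bytes: bytes, creator: str, website: str) -> dict:
    """Build a provenance manifest bound to the image's content hash."""
    manifest = {
        "creator": creator,
        "website": website,
        "content_hash": hashlib.sha256(image_bytes).hexdigest(),
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    # Sign the manifest so later tampering with it (or the image) is detectable.
    manifest["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

def verify_credentials(image_bytes: bytes, manifest: dict) -> bool:
    """Check that the image matches the manifest and the signature is valid."""
    manifest = dict(manifest)  # don't mutate the caller's copy
    claimed_sig = manifest.pop("signature")
    payload = json.dumps(manifest, sort_keys=True).encode()
    expected_sig = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(claimed_sig, expected_sig)
            and manifest["content_hash"] == hashlib.sha256(image_bytes).hexdigest())

image = b"...raw image bytes..."
creds = attach_credentials(image, "Jane Photographer", "https://example.com")
print(verify_credentials(image, creds))         # True: intact image and manifest
print(verify_credentials(image + b"x", creds))  # False: image was altered
```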
Yeah, nice to come full circle with Adobe here. The Adobe super-episode, not sponsored by Adobe in any way.
Take on me. But yeah, I think this is one of those things, again, probably not going to get many headlines, right? It hasn't been the talk of the town in AI, but it might be the kind of thing with big implications for many people who work in this field, who have concerns about AI generations of things like photographs, or just misinformation. This is the sort of thing we need to embrace to tackle those kinds of issues.

Thanks. And that's it for the episode.
Once again, we are hitting roughly the 90-minute mark. Maybe we'll keep doing that; we'll see. So thank you, as always, for listening; we appreciate it. As always, you can go to lastweekin.ai to subscribe to the newsletter, which will also have links for all of the stories here, and you can also look at the episode description for that. We'll also have links to the Super Data Science podcast and to the upcoming webinar hosted by John in December.
And thank you, John, once again, for being a fantastic guest cohost.
My great pleasure. It's truly an honor to always be on my favorite podcast to listen to, the only podcast I always listen to, Last Week in AI. Yeah, such a treat to be on here. Thank you, Andrey, for the invitation. And thanks, all you listeners, for your time and attention again today.
Yes, exactly. Thank you, listeners, for listening. Presumably, if you're here, you're listening to the very end, which I always find pretty impressive. And as always, we do appreciate your comments and reviews; I try to keep a lookout to make sure I don't miss any. And of course, since you got to the end of the episode, hopefully you will enjoy the full version of the AI-generated song I made for this episode.
Stories of a week, where tech inspires, last week in AI. Dive into the code, where the future resides, future unfolds, last week in AI. Tales yet untold, we sing it out loud. Stories of a week, with stone on lines, Memories in our sight. Towers that start by spring, Float in a tide, gather promise. Tales of the week, from fires of creation, Excellent visions, striving, changing the nation. Stories unfold like electric lines, In this episode, where day has embraced.
She wears an inverted tube, to everybody who wanna picture the planet on display.