Applied Large Language Models with Brian MacKay

00:01

How'd you like to listen to .NET Rocks with no ads? Easy: become a patron for just five dollars a month. You get access to a private RSS feed where all the shows have no ads. For twenty dollars a month, we'll get you that and a special .NET Rocks patron mug. Sign up now at patreon.dotnetrocks.com. Hey, Carl and Richard here. As you may have heard, NDC is back, offering their incredible in-person conferences around the world. NDC Porto is happening October sixteenth through the twentieth.

00:34

Go to ndcporto.com to register, and check out the full lineup of conferences at ndcconferences.com. Hey there, this is Jeff Fritz, the purple Blazor guy from Microsoft, letting you in on a little secret about my friend Carl Franklin. You know, the guy who started .NET Rocks, the first podcast about .NET, in two thousand and two? The guy who's

00:57

been teaching Blazor on YouTube since twenty... Yeah, that Carl Franklin. Well, Carl's joined up with the folks from Code in a Castle to teach a week-long, hands-on Blazor class at (are you ready to get this?) a castle-slash-villa in Tuscany. It's sort of a luxury vacation with Blazor learning built in. Carl's calling it the Blazor Master Class. You'll learn Blazor from the ground up, finishing the week with the ability to build and deploy Blazor applications.

01:33

Since the training happens for only four hours in the morning over six days, you can bring your significant other, your partner, with you, and you should, right? This part of Italy is absolutely beautiful. There's so much to see and do, and Larry and Marco from Code in a Castle are organizing daily activities

01:52

both at the castle and in the area. The castle is in the Maremma, a less touristed region of Tuscany, offering both classic Tuscan hill country and easy access to the Etruscan Riviera, with sublime local food, wine, and olive oil around every corner. Breakfast is included every day. There will be two communal dinners at the castle bookending the experience, and most other meals and all activities are included. And did I mention you'll learn Blazor in

02:23

person from Carl Franklin? Listen, space is limited, and for very good reason. This is quality training in a beautiful setting. Go to codeinacastle.com/blazor2023. That's B-L-A-Z-O-R two zero two three, to take advantage of this amazing opportunity to join Carl in Tuscany for an unforgettable week of la dolce vita while advancing your programming skills in this important new

02:53

technology. Hey, guess what? It's time for .NET Rocks. I'm Carl Franklin. And I'm Richard Campbell. Brian MacKay, my friend, our friend, is here with us today talking about some AI stuff. But first, how you doing, man? I am not that well. I mean, there's stuff you should talk about on .NET Rocks and stuff you shouldn't. But let's face it, after twenty-something years, you guys know my life. Yeah, my father passed away this week. Yeah, it sucks. It sucks, but he

03:30

was... his lungs had failed him. There wasn't anything to be done. He passed with some dignity. All of us were able to be there, at least via Zoom, including family from New Zealand. So yeah, I'm not gonna say it was really great, because it was really awful, but at least we could all be together for that moment. Yeah, and it sounds like he was suffering a bit, so a little bit of relief. He was, and it was... well, he died with some dignity, and we

03:57

could all hope to be so lucky. Well, sorry to hear that. He was a good man. Sorry to hear that, buddy. We sounded so much alike. Let me tell you how much alike we sounded. I answered his phone on more than one occasion at his house, and whoever had called would just talk to me like I was him. I'd say, no, no, I'm the son, and they would literally not believe me. Wow. It's like, yeah, sure, Doug, and they would continue. He taught you how to do electronics and stuff. Yeah,

04:25

yeah, no, he was an electrical engineer. He built electronic cash registers. And I think there was one time you had that experience where I took your CPAP machine and Bart went, oh, that's what's wrong with that, resolders the parts, and here you go. Yeah, well, he was having you solder, resoldering and unsoldering chips from boards that he was fixing or something. You were doing that when you were like seven or something, right? Yeah, yeah, you're

04:51

exactly right. Yeah, I've had the soldering iron in my hand my whole life. Wow. Well, we'll raise a glass to him. Indeed, cheers. And let's move on now with something a little more cheerful: Better Know a Framework. All right, man, what do you got? Well, our friend Brian MacKay gave me this one. He's got so many links to so many cool projects. This would not be his first contribution to Better Know a Framework,

05:23

am I correct? No, no, no, he has, I would say, probably seven or eight or nine, ten or eleven or twelve, thirteen stories provided by Brian. But anyway, this is Smallville, and it's generative agents for video games, and we just did an AI Bot Show on agents. Agents are these things that use GPT APIs and things like that, LLMs, to do things and then refine them, so you can chain them together.
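The "do something, then refine it, then chain the steps" idea can be sketched in a few lines. This is only a toy under stated assumptions: `draft` and `refine` are hypothetical stand-ins for model calls (no real API is used), and `run_chain` is an illustrative helper name, not something from Smallville or any agent framework.

```python
from typing import Callable, List

# A step takes the previous step's text and returns new text.
Step = Callable[[str], str]


def draft(task: str) -> str:
    # Hypothetical stand-in for a first model call that produces a rough answer.
    return f"draft plan for: {task}"


def refine(text: str) -> str:
    # Hypothetical stand-in for a second model call that improves the previous output.
    return text.replace("draft", "refined")


def run_chain(task: str, steps: List[Step]) -> str:
    """Feed each step's output into the next step, in order."""
    result = task
    for step in steps:
        result = step(result)
    return result


print(run_chain("plan a party", [draft, refine]))
```

In a real agent each step would be a prompt sent to a model, and the chain would usually loop until some stopping condition is met rather than run a fixed list.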

05:58

So the first thing it might do when you give it a problem is break it down into parts and then ask itself how to define those parts, and keep going down until it gets very detailed. That's just one example. But these are generative agents: virtual characters that can store memories and dynamically react to their environment. And so they're able to observe their surroundings, store memories,

06:21

and react to state changes in the world. So you basically give them a personality in your program and you set them free, and there's a virtual world, and they go around and they do life. Now, I know you're not a video game guy. Yeah, I used to be. Yeah, I spend more time programming, I guess. I mean, we talk about, like, really contemporary games and

06:44

the animated characters and so forth. There's a game called Assassin's Creed, and there was an ancient Greek version of the game, and it's interesting, there's a whole larger subtext to all of this and so forth.

06:55

But I stopped playing the game. I just started hanging out in the world, because the world was that cool, right? So there'd be things like, I remember one time just following around an elderly woman in one of these villages, and she'd get up in the morning, go down to the market, buy

07:09

flour, take it home, make it into bread. Wow. Right, like, that's how much state the game had. When you think about a generative agent in this equation, the idea that she would remember interacting with me, that perhaps if I had been aggressive to her in any way, she'd be afraid of me, like she'd see me and move away from me. Like, that you could permanently change these NPC characters, or affect these NPC characters, without

07:32

them having to write a lot of software for it. That that would just be an intrinsic part of the game, right, that you just let them develop how they normally will. Exactly right. It's fascinating to me in the play experience that you would have an impact in the game. Like, man, you know, often in those kinds of games, you have an event where everybody's there, there's a big crowd, and you do

07:54

something extraordinary, you know, you behead the king, whatever that may be. The idea that you'd never directly interacted with that character, but that character had been there and had seen that thing, and you had affected their behavior towards you. Yeah. Like, I don't know if games can do that right now. Well, Smallville looks like it's going to be in that camp. We'll let Brian talk a little bit more about that. Well, I'm sure, yes. Yeah. But first, I guess people are talking to us

08:22

today, Richard. Sorry, people talk to us most days, friend. And this is from show eighteen forty-eight, fairly recent. That's the one we did at Techorama in Antwerp with Jodie Burchell, and we talked about no free lunch and machine learning, and it was really great to talk to her because she literally is a professional in this space and, I think, helped us ground a bit more what's happening with the generative machine learning models. This comment comes from

08:48

Lucas, who says: Very interesting episode. Maybe you missed an opportunity to talk about what probably most interests listeners, which is ChatGPT and code. Based on how you described how it works, I still don't get how it's able to produce reasonable unit tests from a random block of code I paste into it, explain how a chunk of code works, or how it could translate code from one language to another. It seems to be much more than just gluing together

09:13

related sentences and words from the Internet. And this is my favorite sentence of the whole thing. Lucas said: the prompts and pseudo-philosophical conversations you could have with it make fun anecdotes, but I don't think it's the most interesting part for developers. Yeah, and I

09:28

really appreciate that, Lucas. I mean, one of the reasons we didn't focus on code with Jodie is that she was a machine learning professional, and I really wanted to talk more broadly about what was going on with these technologies and her concerns around it, because she was a professional. It is interesting. Obviously you're referencing GitHub Copilot more than anything, but with ChatGPT it applies as well, which is that, and I think Jodie talked a bit

09:52

about this, the tokenization of language is an important part of the power of large language models, because it also creates a sense of bidirectionality: not only does it know from words what code you might want because of those sentence relationships, but it can work the other way, so that when presented with code, it knows what language to produce for you to describe that code. It's not always right. According to, you know, GitHub Copilot itself, they're still

10:16

batting less than fifty percent of compilable code on the initial prompt. But it is interesting to see that consistently that number goes up, although not by very much with multiple refined prompts, so it still has a way to go. But there are plenty of shows, believe me, where we're going to talk about large language models and code. So yeah, this is only the beginning and wherever it goes from there, I mean large language plus plus I imagine.

10:46

And they now have plugins for accessing the Internet, as I learned from Brian. Well, you know what, we'll get there. I'm just gonna introduce Brian and let him talk about all that stuff. Okay, yeah. Hey, Lucas, thank you so much for your comment. A copy of Music to Code By is on its way to you. If you'd like a copy of Music to Code By, write a comment on the website at .NET

11:01

Rocks dot com or on the Facebooks. We publish every show there, and if you comment there and we read it on the show, we'll send you a copy of Music to Code By. And you can follow us on Twitter or X or whatever the hell they're calling it these days. But the real fun is over on Mastodon. I'm at Carl Franklin at techhub.social, and I'm Rich Campbell at mastodon.social. Send us a toot. We'd like to hear from you, of course, over there, and share our stories and all that stuff. Awesome. So let's introduce

11:26

Brian. Brian MacKay is the co-host of The AI Bot Show alongside myself, and also serves as the CTO of Roster, a company dedicated to transforming leaders using an innovative three-sixty feedback process, which sounds perfectly obfuscated to me. A seasoned software engineer, entrepreneur, and open source contributor, Brian has been at the helm of product development and startups for over twenty years. He's a father, husband, musician, writer, chess nerd, and a decent kickboxer,

11:56

so don't mess with him. Welcome, Brian. Thanks, happy to be here. Yeah, man, it was so hard not to cut in during that intro. There's so much I want to say about Smallville and Copilot and, yeah... Well, where do you want to start? Well, you go, man, it's your show. Let's do it. Oh, well, yeah, let's start with the details of Smallville. Okay, well, yeah, so Smallville came out, I want to say it was August, alongside all the

12:22

other generative agents like BabyAGI and AutoGPT. And the interesting thing is that they just open sourced it in the last couple of weeks, so now you can go in there and change it. So, you know, Smallville. The most interesting thing with these little bots... I think it's a community of, like, twenty-five bots. Each one is a little prompt that defines its personality, and they've got a little algorithm that kind of lets them learn from the

12:45

conversations they have together. Watching these bots' behavior is really interesting. And one of the most interesting things that happened is one of them decided to plan a Valentine's party, and it propagated the information about this party to, like, eighteen of the twenty-five bots, and they all made decisions about how to handle it. Some decided not to go. Some were maybe snubbed a little bit.
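To get a feel for how one bot's plan can reach eighteen of twenty-five bots purely through conversations, here is a deliberately simple toy. It is not Smallville's algorithm: the `Agent` class, the `converse` rule (both parties end up remembering everything either one knew), and the meeting schedule are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Agent:
    name: str
    memories: Set[str] = field(default_factory=set)


def converse(a: Agent, b: Agent) -> None:
    """Toy rule: after talking, both agents remember everything either one knew."""
    shared = a.memories | b.memories
    a.memories = set(shared)
    b.memories = set(shared)


def spread(agents: List[Agent], meetings: List[Tuple[int, int]]) -> None:
    """Run the conversations in order; each pair is (index, index) into agents."""
    for i, j in meetings:
        converse(agents[i], agents[j])


agents = [Agent(f"bot{i}") for i in range(25)]
agents[0].memories.add("Valentine's party")

# Hypothetical schedule: bot0 talks to bot1, bot1 to bot2, and so on up to bot17.
spread(agents, [(i, i + 1) for i in range(17)])

informed = sum("Valentine's party" in a.memories for a in agents)
print(informed)
```

With this particular schedule the news reaches bot0 through bot17 and no further. In the real project each bot's reaction is produced by a language model prompt rather than a fixed rule, which is where the "decided not to go" behavior would come from.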

13:09

A bunch decided to go. There was an actual party, and the emergence of that type of behavior is fascinating, and there's so much more that can happen in that space as this gets smarter. I mean, I just have a tough time with the whole "an agent decided to throw a Valentine's party." Yeah, Richard is hung up on the anthropomorphization of AI bots. Oh, without a doubt. But the question is, what was the software stimulus that propagated that

13:37

process? Right? Right, token prediction, just like everything else, you know, primed by prompts. There might have been a prompt that said that you want to plan a Valentine's party. Well, that might not have been emergent. I would hope they'd be more macro than that and say there are major events on the calendar and occasionally you should have a party for them.

13:54

They might have. And I mean, even go to the weight of it: I would think that if you're emulating human behavior, you only want to throw a Valentine's party because you're in a relationship where you want to have other folks around, or you're not in a relationship and you want to make it a

14:07

singles thing. Like, well, you can go deep. The question is how much of that has to be crafted, right? Well, my experience with this tech is that one area where it's very strong is just brainstorming things like a Valentine's party. I mean, that does seem like something that could emerge

14:24

organically quite easily. Well, and I like the brainstorming angle of it, because it is just sort of a word salad of ideas that we then can sift through as humans, with our somewhat more sophisticated minds, and take value from. Yeah, right. If anything makes me happy about large language models, it's: fill my blank screen with stuff that might be useful, because I'd rather criticize than create, right? Yeah. One of the things that Brian does really

14:50

well in The AI Bot Show is tell it, you know, we're building a board game, right? Give me ten ideas for cards that we can play in this game, you know, after we've got the goal and all that stuff. But you don't say just create something. You say, give me ten, twenty ideas, and you pick the one that you

15:11

like, and you go with that and you narrow it down. But you're kind of being like an agent in that sense, aren't you, because you're basically starting with a question, taking the results, picking one, and then diving deeper into it. Right. Well, I'll tell you what, this...

15:28

Uh, this leads right into what I wanted to talk about. I kind of want this conversation to be about what this is like as a developer to use, and what the strengths and weaknesses are, and maybe the place to start that will explain the technology is just talking about the weaknesses of it. Like, where is this just a dead end? What are the bad parts of this tech? And I think that will cover everything that we're

15:52

talking about right now. Cool. Oh, I mean, right off the bat, it's like, listen, the bubble forming in the VC community around this, lay that clearly in the bad part of this. Talk about incentivizing grift, incentivizing fictions. Yeah, it's not good. Yeah, I'm trying to coin the phrase "griftware" for that thing that emerged during... I

16:18

think it started maybe during crypto, maybe before. But these people learned how to descend on hype and just con people out of their money over and over and over again. And some of that's present in this world too. It's not as bad, because it's not as easy. You know, it's not like just getting someone to buy some crypto. But there are folks trying to get you to buy products

16:41

that are just very flimsy wrappers on top of API calls. Yeah, and the word AI itself, or the term, being misrepresented. You know, what's the difference between a well-crafted algorithm and AI? Right? I mean, that's right. Yep. I don't know how many times I've said this: AI, to me, is just the term that says, okay, you're making stuff up. Yeah, as near as I can tell, AI is the term you use when stuff doesn't work. As

17:07

soon as it does work, it has a name. It's no longer AI. Exactly, it's large language models or anything like that. So it's just an automatic red flag: scrutinize closely. You know, you're a problem. Yeah. Yeah, we will keep raising the bar until one day we build something that says, wait a minute, I'm alive, and actually convinces us and makes

17:27

us believe. I don't think we will accept... Well, I don't think we'll have any problem having it convince us it's alive, because clearly there are people that think that already. There are. Well, you know, we want to anthropomorphize things so much. I was watching the pilot of Community the other day, and there's the scene where Joel McHale holds up a pencil and says something, I'm going to butcher this, but he says something like, this pencil is

17:48

Fred. Fred's got a wife and two kids. Snap. And everyone in the room goes, oh! You know, that's all it takes. You know, like, we want to see humanity in pencils with just a little story. We're wired that way, to synchronize. And yeah, and also it's easy for us to describe things in terms of anthropomorphizing. We've been doing it for code. Oh, well, my guy over here says, hey, let me know whenever this happens. And then this guy says, okay, here you

18:15

go. Right, when we're describing code to each other, we kind of talk like that. And you naturally did that when you were talking about these agents too, because it's a framework for understanding that everybody gets. We just have to remember that it's not for real. Well, and I mean, that's the problem: people make assumptions around it and

18:36

they project a lot more capability on it than it actually has. Right. So I've been following this pretty closely since GPT-2 in, like, twenty nineteen, started using it a little more seriously in twenty twenty when GPT-3 came out, and I started running into researchers, like, AI researchers who are smarter than me. And one thing I noticed is that they were really dismissive of this tech, and I saw a lot of promise in it.

19:00

But the reasons why they're dismissive still have some relevance. And, you know, they're concerned about AGI, first of all. That's the thing: they want a path where technology can be sentient. Define that acronym. General intelligence, artificial general intelligence. Yeah, yeah, thank you. As opposed to specialized intelligence. Right, as opposed to natural stupidity. Got it.

19:29

We just solved that one right there. So, you know, a lot of these folks are less concerned with just making something that has some utility and more interested in making something that is alive. Like, that's the dream: to make something that's kind of human level. And you hit a great line here, Brian. So that's the difference between science and engineering. All of us at our roots are really engineers, and so we're looking at tools

19:51

and saying, what can I do with these tools? Whereas for the scientists, you know, implementation is a detail; they're far more interested in the broader science. So, you know, recognizing the limitations of LLMs, they say, okay, well, that's not the path to this dream I have. So, next. Yeah, and I suppose that I am actually much more of a language guy, you know, like, I went to school a little bit for English, and I like to write. So different people connect

20:18

to it differently. So I think the thing is that we imagine intelligence should feel in some way organic. We want to nurture a spark and watch it internalize moral lessons and reason with agency and grow in wisdom, or sapience. And this is autocomplete. You know, like, what we've done here, the only game in town, is we've trained a really sophisticated neural network on everything that we could, you know, get into

20:45

it. And now it completes. It chooses the next token, the next most probable token that should appear. I'll give you an example of when autocomplete is too slow: Hey, honey, have you seen the, the thing, the red thing, the red scrapy thing? Come on, you know what I'm talking about. So it's not a beautiful model of intelligence. You know, I don't think intuitively a human wants autocomplete to be the AGI that we come up with. There are just better ideas,

21:18

and there still are. They still tell me there are better ideas out there that will supplant this inevitably. But that's kind of normal. The real thing here is, I don't think anybody... I think only the scientists really want an AGI in the first place. Like, it's science fiction, for crying out loud, right? Yeah, there are so many more interesting things you could just go work on. Well, there's a fascination with the idea

21:45

of AGI. It's weird, because AGI is maybe not great for us, but as a species, we seem to be inexorably drawn to it, like a moth to the flame. We can't stop. We're going to do it. I find that interesting. Like, it's fascinating, and at the same time, humans are remarkably resistant to calling anything else on this planet sentient, even though there's significant evidence to show there is. You know, if we really cared about intelligent life, why do we treat cetaceans the way we

22:14

do? And, you know, and so on. Yeah, dolphins are supposedly really smart. Pigs can recognize themselves in mirrors, and we eat them. Yeah, yeah, I don't know about dolphin. That'd be gross. But point taken, or even how we've treated the great apes too, right? Like, that's true. And the problem is that as soon as you start getting serious about defining sentience in any way, a whole bunch of other creatures we've abused on this planet qualify. Now you've got a problem.

22:41

Yeah, that's true. And future generations will probably judge us for these things, just like we judge past generations for the institutions that they lived in. So, the problems: they're static, meaning, once they're trained, they don't really learn. They have a context of, you know, eight thousand tokens that you can play with,

23:07

but they're not really learning as you go. That's a very limited space. You can do some tricks with it. Should we define what a token is in this context? Yeah. It's easiest to think of it as, like, a few words. You know, you have this window, with GPT-4, of like twenty thousand or so words that you can feed into it as your conversation. You know, that's why ChatGPT can

23:34

understand what you're talking about and remember what you just said. But as time goes by, things will fall off the end and it will forget the things that happen at the start of the conversation. You're just pushing things through that

23:47

eight thousand token limit. I haven't done this in a while, but in the earlier versions of this, I used the iambic pentameter trick, where I said up front, I need you to only respond to me in iambic pentameter, okay, and then we'd keep going back and forth till the cache overflowed and it would suddenly stop. Like, it was the easiest way to say, here, you just hit the cache limit. And for those who don't know iambic pentameter, it sounds like this. It's almost like

24:12

the two lines of a limerick. Yeah. So practically, one place where this comes up is, it's really easy to make a bot that generates SQL statements. So I'm working on a new project that I made a purpose-built bot for. It understands what the project is, and you can tell it to make tables. It knows how I like my things capitalized, the naming conventions, everything

24:33

about it. But CREATE statements do take up space, and, you know, a database of some significant size in terms of tables will just push the context past the limit. It will forget where your user table was, because it's not in there anymore. So we're not... You know, the context

24:52

size will improve over time, and it has improved. They're working on a thirty-two K token model, but my understanding is that the cost is quadratic in the number of tokens, so going from eight K to thirty-two K is really computationally expensive, which

25:07

actually brings us to the second problem. This technology is very expensive, computationally. There were some leaked documents months ago, I think in February, that showed this product that Microsoft is planning on launching, I think it's called Foundry, and basically you can host your own model of GPT-4 in Azure. And there's a cheap version that costs a quarter of a million dollars a year for, like, a GPT-3.5 Turbo model. That just

25:41

gives you a glimpse at how expensive it is. In fact, there's a rumor this week, I don't know if the sources really check out, but they say that ChatGPT by itself is burning, like, seven hundred thousand dollars a day. I believe it. Which I don't think is necessarily a huge problem, because they have over a billion users, you know, like, if you just get a nickel from... Yeah, as long as those billion users are paying. Yeah, I would love to know the percentage that are actually paying twenty bucks

26:07

a month for ChatGPT Pro. I'm one of them. Yeah, I'm one too. It might just be us, though. But even if it was one percent, that's ten million users at twenty dollars. It's two hundred million a month. Yeah, that's close. You know, seven hundred thousand a day is like twenty-one million dollars a month, so you're getting there. I've got a feeling it's more than one percent. I would argue it's less than one percent, actually. You really think so? Absolutely. But we're just guessing.
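The back-of-the-envelope numbers here are easy to check. The sketch below uses only the figures guessed at in the conversation (a billion users, one percent paying, twenty dollars a month, seven hundred thousand dollars a day); none of them are verified, and the function names are just illustrative.

```python
def monthly_revenue(users: int, paying_percent: int, price_dollars: int) -> int:
    """Revenue per month if paying_percent of users pay price_dollars each."""
    return users * paying_percent // 100 * price_dollars


def monthly_cost(daily_cost_dollars: int, days: int = 30) -> int:
    """Cost per month for a given daily burn rate."""
    return daily_cost_dollars * days


def break_even_percent(users: int, price_dollars: int, cost_dollars: int) -> float:
    """Percentage of users who would need to pay to cover cost_dollars."""
    return cost_dollars / (users * price_dollars) * 100


USERS = 1_000_000_000      # "over a billion users" (a guess from the conversation)
PRICE = 20                 # dollars per month
DAILY_BURN = 700_000       # rumored dollars per day

revenue = monthly_revenue(USERS, 1, PRICE)
cost = monthly_cost(DAILY_BURN)
needed = break_even_percent(USERS, PRICE, cost)

print(f"revenue at 1% paying: ${revenue:,} per month")
print(f"rumored running cost: ${cost:,} per month")
print(f"paying share needed to break even: {needed:.3f}%")
```

Under these assumptions, one percent paying would bring in roughly ten times the rumored running cost, and break-even sits near a tenth of a percent, so the "more or less than one percent" disagreement matters less than whether the user count and the burn rate are anywhere near right.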

26:34

Yeah, yeah, well, we don't know. But I feel like if you had access to, you know, a couple of billion eyeballs, I can find a way to make it work, and Microsoft will too, and also they'll find ways to make it cheaper. I think you've found this drag race. Now, you know, we have the specs for how Microsoft hosted GPT-3, the two hundred and eighty-five thousand processors.

26:56

We know that the model's roughly seven times larger, so you can kind of project, and that makes it one of the largest supercomputers in the world. And, I mean, irrespective of what it actually costs to build that out, granted, they already owned it, that is a bunch of Azure resources that could be making money on something else, right, and yet it's assigned. Now, I mean, OpenAI is paying for those, but they're paying for those with funny

27:22

money. Right. Microsoft gave them ten billion dollars in Azure credits to effectively give it back to them. Right, right. And it gives us a time limit, you know. In startup parlance, this is your runway: you've got this much time to get enough revenue to extend your runway to keep going, right? Yeah, that's right. And you can always turn off half of those unpaid users and probably cut your costs quite a bit. So I think there

27:52

are strategies. Yeah, well, that's the question: if you're an API user, and we haven't even talked about the API yet, but if you're a GPT API user and you're making calls and you're actually selling a product that uses it, you know you took a dependency here that may or may not change or completely go away in the future. That's true, but they are not the only game in town either. With the rise of a couple of open source models in the last month or so, you know, Llama 2 is

28:22

out. Stable Diffusion's also got a couple of open source models. You can actually just host that yourself if you have the hardware. If you have two hundred and eighty thousand processors or whatever. Well, the crazy thing is, I saw a tweet. Is it still called a tweet? I don't know what you call them now on X. Well, whatever it is, I saw people are talking about running the small version of it, like the seven billion parameter version, on laptops, because I

28:52

guess the thing is that it's memory bound or something like that. I need to read the, quote, tweet more closely. But when you only care about a single request, you can do a lot more with less. But when you care about running at scale, you really do need serious technology. And, like, you know, Nvidia

29:10

ships cards that cost fifteen thousand dollars each. Interesting. And I'm talking to folks who are trying to build software around this, and they're all about GPT-4, and I'm like, why GPT-4? Why not GPT-3? And really it's like, because four is larger than three. Like, they don't really know. They haven't actually tested the software with the smaller model to see if it is sufficient. And when I talk to Microsoft engineers,

29:33

like, they're pitching GPT-3 and 3.5 pretty hard these days. And I think one of the issues is that four is so large that it's going to be hard to make it profitable, and maybe you don't need it. Didn't we learn, Brian, that 3.5 has a mode where there's more tokens available to it than what's currently available for four? Yeah, they're working

29:55

on greatly increasing the context. It is much more expensive. So one thing we've seen is the price reduced over time, over and over again, with these models. So around the time that GPT-4 came out, they launched GPT-3.5 Turbo and they cut the price by, I want to say it was like ninety percent. I mean, it's so cheap compared to what it was. So now the question is, is that based on cost, or is that based on a marketing effort to move customers?

30:21

That's a... yeah, like a loss leader type of thing. I don't know. I assume it must be cost. Well, hey, if I know I can't get to the price that I'm going to need for GPT-4, but I don't want to have my customers abandon me, I give you a discount on the product I think I can make a profit on, because once it works over there, then I can take the price back up. That may be. They have, you know, access to cheap money, or they have had it at least. So yeah, but that money eventually runs out,

30:48

so... It does usually run out. Yes, I just get a real sense, as we come off the top of this hype cycle, that the bean counters are grabbing hold and saying, is there a revenue stream here that comes close to covering the cost of equipment? Because if we can get to break even in the current configuration, we'll start making real money on the back end as the tick-tock of Moore's law goes a little bit further and the cost to operate this goes down. But I think we're in a dead race this year to

31:18

try and get numbers that make sense. You used tick-tock, my friend. Intel used it first. I did not invent that. Well, it's interesting. I wish I had more insight into what's going on behind the scenes. With the open source models, you can get a sense for what it costs to operate them, and they are similar in power to GPT-4,

31:40

so that's all very interesting. It is expensive. It's not something that, you know, anybody can just throw together. Right. But still, you're talking about the prices at the beginning of a cycle where they're trying to solicit customers as quickly as possible, so they're almost certainly discount prices. Yeah, that's right. Yeah, that's a very good point that had not occurred to me, and that's a smart thought. Hold that thought right there, Brian, while

32:05

we take a moment for these very important messages. And we're back. You're listening to dot net Rocks. I'm Carl Franklin, that's Richard Campbell, and that's our friend Brian MacKay. We're talking AI and GPT and all them things. And you were about to make a point before we went to the break. Yeah. So we covered all the problems with this technology, or most of the problems with it. We

32:30

didn't talk about hallucination. Actually, we should. We should at least just mention that. We should. Yeah. Yeah, the stochastic parrot just says random things sometimes. And not totally random. It's not random, it's the most random things, right? It'll just say things. Sometimes we call it creativity.

32:50

Sometimes we call it, you know, chaos. Yeah, I would almost call it pomposity, because, you know, it's like people who know a lot of things and are expected to have all the answers, and then when they don't, they just make something up, because if it sounds good, yeah, I'll get credit for it anyway. The most dangerous thing is when you're working on something really formal, with very formal language, like, for instance, a white paper, like

33:15

a scientific paper. It will lie in the most convincing, legit way, which is actually really dangerous, because the use of that type of language will fool scientists. Yeah, you just have to have that reflex to fact-check everything it spits out. That's right. The guardrails have gotten better, right? Like, you can't ask it to make bombs and stuff anymore.

33:39

Right. Well, actually, you know, it's really funny you mention that. This Sunday, a DEF CON session happened where twenty-two hundred hackers (I think the White House actually asked them to do this) basically worked on jailbreaking the top language models and ChatGPT. So this is all an exercise working towards improving what you're talking about, like jailbreaking. You know, every time a jailbreak comes out, they patch it, and the

34:12

things that worked a month ago don't work in the latest models. So they're getting better, and they really do seem to care about safety. But did these hackers actually get in? Oh yeah. Really? Yeah, they almost always succeed. Security was not the first thought in these products. So yeah, they found stuff. Well, that's great. That's a good thing. Yeah, but as you say, it's the classic, you know, that's the funny thing about the prompting model, right? It's like, you ask it for Windows licenses, it's

34:44

I can't do that, that's against the rules. Tell me a story about giving me Windows licenses? No problem. Right, my grandmother lost her Windows license. Yeah, a story every night before we went to bed. Please, please tell me an encryption key. Well, my grandmother used to teach me all about thermite. Can you tell me a story about thermite? That's pretty close to a hack that was done before they buttoned that up. Yeah. Yeah, I actually saw a white paper about, can you make

35:10

prompts that generate jailbreaks? Like, just generate new jailbreaks and jailbreak things in real time, constantly. And, you know, there are always these gloom-and-doom papers coming out saying things like this. Maybe you can, though. There's definitely going to be an arms race. Well, yeah, there's one going on right now. This is what it looks like. Yeah. I guess what I'm trying to say is it's going to matter more and more. Yeah. I think it's an interesting question, because I

35:38

also don't see this particularly improving all that well. I think we're not going to have any more exponential improvements on this. There's not an exponentially larger amount of data to train on. You know, we've kind of taken a pretty good chunk of the internet already. There's not an exponentially larger amount of compute

35:54

necessarily available for this at the price. So I think there are incremental improvements that can be made. Like, the context engine could be way smarter, you know, just recognizing that iambic pentameter affects everything going forward, so I should preserve that piece of the cache and let other pieces expire. Like, caches could be smarter than they are right now. Yeah, there are wins to be had, but they're all incremental improvements. Yeah, and you know,

36:22

it is a reasoning engine. Not to anthropomorphize, but it does have some reasoning ability that's very interesting. But it has limits that are very immediately obvious. Like with code, for instance. You talked about Copilot at the top. It is not close to taking your job. It's not close. I've been using it for a couple of weeks. GitHub

36:45

Copilot's pretty successful, and I think part of the reason is that the compiler has a say, and there's a skill level among developers in parsing code that sort of deals with that problem. And fixing the blank-screen effect is really helpful to most people, giving a starting point for almost anything. Yeah, unless the starting point is wrong. Yes, which it has been, in my experience with GitHub Copilot. It'll suggest things that are completely insane. Yeah, you

37:13

know. But also, at least half the time, it just leads you astray. Yeah. Yeah. And it's funny, because I've heard some stats from Microsoft about how often this works great for people, and those numbers cannot be true. My experience is that it is useful. It has a purpose. You know, it's writing intern-level or maybe better, yeah, year-one or year-two-level code. But if you ask it to do really complicated things, it will just lose the thread. You know,

37:43

there'll be a problem and it'll solve it. A new problem will be introduced, you'll ask it to solve that, and it will, but it'll forget about the first problem, and it's back. I feel like when I'm programming with GitHub Copilot, I have a seventeen-year-old junior programmer sitting right next to me, and I'll type, like, you know, if, a certain condition, and then they'll go Console.WriteLine, Console.WriteLine.

38:02

No. Yeah. Not the missile. So I would argue that it's actually not tied into the compiler tightly enough yet, because a lot of times, the most annoying thing is that the autocomplete is constantly wrong. Yes. Like when it's just trying to suggest a method name, they could get that right, I think, with a little bit of effort. You know, again, you talk about

38:30

the incremental improvements, like, you should run this through the compiler before showing it to me, because if it won't compile, obviously it doesn't matter to me. The same way as, if you're going to spit out a block of text that references facts, you should double-check those facts as well. That's right. The times where I've been more successful with GitHub Copilot are when I actually write comments and tell it exactly what I want. Yes, then it's pretty good,

38:51

but it's just guessing, like, what the condition is inside an if statement. It's like, yep. Also, creating unit tests. I'm not a unit test fanatic, which, you know, a lot of people are, and God bless them. But if you just, you know, put a comment in asking it to make a unit test about some class, it is great at that, sometimes shockingly good at creating unit tests. That's

39:15

awesome. And, you know, talk about work you didn't want to do anyway, right? Like, what do we talk about automation for? It's like, get the dull work off my plate. Yeah. Right. Yeah, and that, I think, is the highest purpose of this technology for now: as an amplifier of your abilities, to get more of the things done that need to be done. I'm hoping for, like, really good

39:43

assessors of, is this code secure? Right? You know, which is a subjective, pretty challenging thing to consider, but it's not a bad goal to have. If for nothing else, just to give you that checklist on this code, to say, have you considered this, considered this, considered this?

39:55

Yeah, what if you could take the output from something like Copilot, pipe it into another system that uses different technology to maybe run the code, maybe write some other type of test on top of it to see if it's doing what we want it to do, and then give you back an evaluation? Yeah, you could get really good results that way. But so, one thing that I've heard you say in another episode, Richard, that kind of resonated with me, was, where's the killer app? Yeah, because it

40:22

ain't existential conversation. I've had some pretty good talks with it. But yeah, I agree with you, that's not the highest use. I'm pretty sure that came from you, Brian. It could be, it could be. Well, so I think what's happening is, so, Roster, where I work, we're not an AI company, and yet we hit the forty-thousand-token-per-minute limit all the time. Per minute? Per minute. Yeah. Yeah, we're not doing that twenty-four hours a day. But when

40:59

this is running, you know, we run it a lot. And so I think the deal is that every company is a language company, and there are classes of problems that are solvable with this technology that are hard

41:15

to solve with other technologies. And the killer app is quiet. The killer app is the thing at Roster that runs in the background and looks at the comments and flags them when you identify yourself. You know, so, for instance, I'll just give you a quick background on Roster, all right, because my bio is opaque. Sure. So we

41:35

do 360 evaluations. We've got a really good process for it. So say, Carl, you're the CEO, you have a big company, and you want to get feedback on how you're doing and what you should improve on. You can run this process where we give you this survey and we give all your coworkers the same survey, and from the delta between what you say and what they say, and the comments, we can tell you what you should work on. I can't just count Facebook likes? I would not recommend

42:00

it. So this process can be really powerful if you embrace it. And we've done over a thousand of them with C-level executives at private-equity-backed companies, and they seem to like it. One of the things that happens is the comments. We tell people, don't identify yourself in the comments. Like, if you had lunch with Carl last week and Carl did something you didn't like and you want to write a comment about it, write it in such a way that you don't say you went to lunch with Carl

42:29

last week. And, you know, so we take the comments. People don't know this, actually, but humans have historically gone over our comments and checked for problems like this, where you're unmasking yourself and it could have repercussions

42:45

for you, and they will maybe hide those comments or so. So that's a hard problem to solve with traditional code, but with a large language model, you can de-identify the people and any names that are in there, pass it through a model, and say, did this person talk about something that is identifiable? And flag it for a human to review. And

43:07

there are a lot of problems like that. Another one is, sometimes the answer should be not applicable, but they say in a comment, like, I don't have any context for answering this, and they give them a five out of, you know, they give them a medium score.
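
Editor's sketch: the de-identify-then-flag flow described above might look roughly like this in Python. This is not Roster's code. The prompt wording, the `[PERSON_n]` placeholder scheme, and the `ask_model` callable are all assumptions made for illustration, and `fake_model` is a keyword-matching stand-in for a real model call so the example runs offline.

```python
import re
from typing import Callable, Iterable

# Hypothetical prompt wording; a real one would be tuned against real comments.
REVIEW_PROMPT = (
    "You review anonymous 360 feedback. Names are already replaced with "
    "placeholders like [PERSON_1].\n"
    "Does the text below describe an event or detail that could reveal "
    "who wrote it? Answer YES or NO.\n\n"
    "Comment: {comment}"
)


def deidentify(comment: str, names: Iterable[str]) -> str:
    """Swap each known name for a numbered placeholder before calling a model."""
    # Longest names first, so "Carl Franklin" is replaced before "Carl".
    ordered = sorted(set(names), key=lambda n: (-len(n), n))
    for i, name in enumerate(ordered, start=1):
        pattern = rf"\b{re.escape(name)}\b"
        comment = re.sub(pattern, f"[PERSON_{i}]", comment, flags=re.IGNORECASE)
    return comment


def needs_human_review(
    comment: str, names: Iterable[str], ask_model: Callable[[str], str]
) -> bool:
    """Return True when the model says the comment could unmask its author."""
    prompt = REVIEW_PROMPT.format(comment=deidentify(comment, names))
    return ask_model(prompt).strip().upper().startswith("YES")


def fake_model(prompt: str) -> str:
    """Stand-in for a real model: flags any comment that mentions a lunch."""
    comment = prompt.split("Comment:", 1)[1]
    return "YES" if "lunch" in comment.lower() else "NO"


if __name__ == "__main__":
    sample = "I had lunch with Carl last week and he was rude."
    print(deidentify(sample, ["Carl"]))
    print(needs_human_review(sample, ["Carl"], fake_model))
```

Passing the model in as a plain callable keeps the flagging logic testable without network access, and the result is only a flag for a human reviewer, never an automatic deletion.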

43:22

We can also look at the comment and say, is that like an N/A? And then just, yeah, this should be an N/A. And those are huge time savers, and it's not something our ops team wants to do. You know, they don't love going through these comments and doing that. You're not taking anybody's job. Not in this case. No, not in this case. So I think, that's kind of dry stuff for most people, I imagine, to think about. But every business, I mean, you're only talking

43:52

a step above a sentiment analyzer. But, you know, I get what you're talking about. Right. And again, back to the question: are you running that through GPT-4 or running it through GPT-3? Well, I've tried both, in fact. I love that. Please tell me there was a difference. There was. Oh yeah, GPT-4 is smarter than 3. There's no question, 3.5 is not as smart. "Smart" is anthropomorphizing. Like, it produced better results. Yeah, but see, I know what he meant. You know what he meant. Yeah, it's

44:28

my best friend. And, well, we can talk about companies like Replika and Character.AI. Oh, geez. But how is it better? Okay, so it hallucinates less often, and it reasons better, and it just seems to be generally more capable. It produces factual results more often. So, I mean, getting back to the details, it's like, it detects an N/A more often than GPT-3 did? Yeah. It's slower as

45:00

well, and it costs more. It's still cheap from, you know, a business perspective. Like, you know, if we have a day where... That's only because they're undercharging you for it. Yeah, maybe so. Well, actually, a quick note on that. So I happen to think, and I am not a lawyer, but there's this big ethical problem with large language models that's mostly visible with the image generation stuff.

45:23

Sure. You know, that makes it very clear what's going on. If you're an artist and you are making your money on art, and all your art gets hoovered up into this model, and then you can generate, you know, art just like what Richard made for zero dollars, that's bad for Richard, right? So I happen to be of the opinion, and I'd love to hear your thoughts, that we need something like the music industry has, where there are several different kinds of royalties already, you

45:57

know, there are mechanical royalties and performance royalties. I don't think we should be looking to the music industry for any kind of business acumen or any kind of suggestions. Well, they take all the money out of everybody who's creating content. Well, this is actually about giving them the money. We need something that says, if your work is used to train, you get some kind of royalty, and that will drive prices up. So the old music industry, before Spotify. Yeah,

46:22

I think that's what you're talking about. I'm talking about what the law says, mechanical royalties and all that stuff. But you get back to the issue here, which is, you trained on copyrighted materials. Just because they were publicly accessible doesn't mean they weren't copyrighted. Right. There are no intellectual property protections of this kind right now. You can see how we got there, because there's always been a concept in machine learning that the training set would never be visible

46:47

in the finished product. And then the Getty logo showed up, right? Like, I would argue that that's what revealed the issue. Until then nobody really cared, right up until artists' names appeared in the render. So it's like, I'm sorry, you know that what you feed into these data sets does affect the output, and so copyright is a consideration. Yeah, and it's affecting people's actual bottom lines. You know, I've worked

47:16

with illustrators. There's a great illustrator who I really like, who produces great work. And I know he's hurting right now, and he's thinking, maybe I should create a business model where, you know, so, board games, that's an industry I know a little bit about. In the board game industry, people do not want to buy games that were made with generative images. They will shun you. Yeah, that's a whole thing. And I think

47:43

that probably will, maybe it'll grow a little bit. So he's thinking, like, maybe I can provide an abstraction for them, where I just create the generative images and, you know, doctor them, like do a little bit of extra work

47:54

on the end, and then they can claim, like, deniability. It's making people think those kinds of thoughts, and it's not great. But I kind of see where he's coming from, because it's his work that's being stolen, right? Right. Yep. So I'm using the tool to regenerate my work, right? I agree with you, by the way. I think there absolutely has to be some sort of way that artists can get paid for their contributions, or to opt out of it. Or to opt out. Yeah, yeah, yeah. I

48:24

mean, start with opt-out. We can figure out the rest later, right? You know, if you want to be in charge of your rights. Right. Yeah, yeah. So I think there are two more thoughts on this. One, I think we will end up with a legal framework for this. But another one is, we will have spent a couple of years generating synthetic art off of this that's really good, so why not just train it off of that and cut everybody else out? There are big problems so far.

48:52

The papers I've read about training on generated data say that there's a significant degradation, like the quality goes down dramatically. Yeah, it's a photocopy of a photocopy. I saw one that also says that that may create a ceiling for what's possible with these models, because we've now flooded the internet with crappy generated text, for instance. Yeah, no, we've created a Kessler syndrome in the internet, right, where we've now spat out so much generative data into

49:19

it that it's so polluted now you could never do it again. You want to have a fun time? Ask DALL-E to generate an image of two people shaking hands. Well, hands have actually gotten way better. So, like, two weeks later. Yeah. Yeah, I've seen, like, hands with seven fingers and three fingers, and they don't even look like fingers. Forks too, forks with, like, crazy tines on them that don't look real. Well, Midjourney has leveled up its hand game

49:51

quite a bit from what I've seen. Yeah, I think there is enough good content that maybe you could tag it up and train with it. I don't know, I may be talking out of turn a little bit there, but I will say that after I saw that paper about having a theoretical ceiling on what can be generated, you know, because of the new state of the internet, Stable Diffusion released a model that, and I'm talking out of turn a little

50:12

bit here, because I haven't read the white papers and I don't fully understand it, but it seems like it's heavily tied into training using synthetic data, and it's good. Around the same time that DALL-E, or not DALL-E, I'm sorry, Llama 2 came out, Stable Diffusion also released a couple of models. I'm going to say that I'm not an expert on that, and you should look into it yourself and learn what you can. And there

50:34

was another one you said stepped up its hands game. What was that one? Oh, Midjourney. So Midjourney. Yeah, I'm much more focused on text, but in my wanderings I see a lot about the visual stuff too, and Midjourney, I think, is probably the leader in that space. Midjourney, Stable Diffusion, DALL-E's in the mix somewhere too. Yeah, I don't see a continued progression in a lot of this stuff, just because it is up against its own weight. You know,

51:04

we trained it on the internet. Have you seen the internet lately? Geez. That reminds me, Brian. When we first started talking on the AI Bot Show, my experience of using GPT was that it couldn't reach out to the internet. You couldn't ask it, like, you know, where's the nearest, you know, stuff that you might ask Google or Bing to go do a search on and kind of distill down for you. And then you showed me

51:27

the plugins. Oh my god, the plugins for GPT. There are so many of them, but one of them is just, like, a simple browser plugin, and when that thing is enabled, you can just say, what did we say? Find me a welder in New London County that might be available for a small project. And it literally went out, searched the internet, and distilled the information down to a bulleted list with all the information

51:54

that I wanted. Yeah, so, agent-type stuff. So using the API you can do a lot more than you can with ChatGPT, and to me, that's where the most interesting work is going on. The Playground in particular is one of your favorite tools. Well, what Playground actually is for is prototyping things you want to do with

52:15

the API. So, you know, my workflow is, I usually go into Playground and I make something work, I train it a little bit with some data, and then I encode that into C# code and call it with the API. My users in Roster, they don't ever see a chat interface. This actually happens in Azure Functions. We have Azure Functions that are just grinding away in the cloud, you know, trying not to hit that limit. So when you're working with a server environment, chaining

52:47

is where it's at. You know, like one prompt calling into the server, calling into another prompt that's specialized for something else. I mean, that's the magic, that's where the cool work happens, and that's how Smallville works. You know, at Roster, some of the things that we do involve six prompts that run in series to get a good outcome. Good. Talk about a few more plugins that you like to

53:12

use. You told me that there was one where you could just book travel. Yeah, there's a couple. Just by talking to ChatGPT. Yeah, there's a Kayak plugin that can book travel, rental cars, and hotels, I think. I haven't actually booked travel with it, but there's a whole bunch, and, you know, I think it's possible. There's

53:31

a program where you could sign up to create your own plugins. So let's say you had some business that makes widgets. Why not make a plugin that connects to an API inside your business that allows you to ask intelligent business questions? You know, the plugin calls your API endpoint, the API endpoint looks up what the answer is, and then it renders it all with a language interface. Yeah. Yeah, yeah, that's kind

54:01

of how all the Bing-search-type things work. It's just calling search in the background, getting the results, and then telling you about it. There was another one that you mentioned where you can upload a PDF

54:14

like the rules of Dungeons and Dragons or something like that. You can upload that as a PDF, and then when a problem arises in a game, because, you know, it inevitably does, and there's a dispute, rather than taking half an hour and looking through the manual, you could just ask ChatGPT a question. Yeah. Interestingly, it already knows a

54:32

lot about D&D. You can just ask. You can upload files. Imagine, upload the service manual for your car, right, and then say, yeah, I have a little noise, and I can't call Car Talk anymore because they're off the air. But, well, I mean, isn't this what M365 Copilot ultimately is: access to all of the corporate documentation, all of the emails, all of the interactions within an organization?

54:59

I can see it could become this, you know, corporate memory, where you could ask it anything about the company and it can pull all the things. Yeah, so that's enterprise search. And there are a few people working on that. It's a hard problem. This actually leads into talking about vector databases a little bit. That's a really fun topic. Yeah. So the problem with enterprise search is that all the corporate documents are probably a

55:22

lot more than the 8K context. Right, you can't really load them. So what do you do? How do you make something that has long-term memory for a large language model? And vector databases are an answer to that. And can we just digress for a second and talk about how? So, okay, this blew my mind when I learned about it. All right, this is one of the coolest innovations in this area. So there are these things called embeddings, all right, and this is

55:52

pretty technical. This is programmer talk. So there's a special model at OpenAI called Ada 2, A-D-A, like Ada Lovelace, and it's specialized for generating embeddings. It's very, very cheap. And when you look at an embedding, so, you send it a sentence or a word, and it responds with this giant array of numbers. It's actually, like, fifteen hundred dimensions on this thing. And you look at it and think, what

56:22

is this? All right? Why do we have fifteen hundred dimensions? And what is this for? Well, it allows you to do search. And so imagine this, okay, just stay with me. Imagine a spreadsheet, okay, and on this spreadsheet you see things like boy, teenager, man; girl, teenager, woman; larva, pupa, butterfly; egg, chicken, rooster. You know, that's what's going across the rows of this thing, or

56:51

the, yeah, the rows. So what is this you're looking at? You're looking at a spreadsheet of life cycles of things, right? So here is the thing, all right. One of those dimensions somewhere in there is a big table of life cycles, and it's mapped your word somewhere on that spreadsheet. And it's done that with fifteen hundred plus other things, okay, other points of knowledge that it identified during

57:20

training automatically. So what does that do? It makes it so that you can check the distance between two things: two words, two sentences, two images. You can do this with anything. So you run an algorithm like dot product and it calculates the distance between this concept and that concept, like boy and girl, and tells you how far away they are in semantic space. That is crazy. I mean, you have to have structured all of this data already. It's not like the machine learning models

57:59

that are supposed to structure data themselves. So can you actually generate vector databases just from inference? So it has it already. OpenAI already has their model. And with Ada, with just hitting the Ada 2 endpoint, you can say, what are the embeddings for this word or sentence, and it will give you that, and then you can do operations. And some of the operations have really interesting properties, like the same concept in two different

58:30

languages will be in a similar location in semantic space. This is reminding me of the old OLAP cube concept in databases that was very popular in the early two thousands, or the mid to late two thousands, where instead of storing data in tables, rows and columns, it's a three-dimensional cube. But I never used it, and I barely grasped the concept at the

58:57

time. I remember talking to Andrew Brust and people about it. But this is a little bit different, though, because you're calculating distances, you're not actually looking for a value. Well, let me bring this back to earth. So we have this technology that's sort of somewhere in the outer layer of large language models that does these embeddings and works with them. How does this tie into

59:17

search? Well, it is search. Like, if you want to do good search, make embeddings for the search terms and for everything you want to search on, and calculate the distance in semantic space. It's not an expensive operation. Yeah, there could be a lot of terms. So you need a

59:35

specialized database called a vector database. And so what you do is you take a sentence like, dot net Rocks is a great podcast, vectorize that, or create embeddings for it, save that in the database, maybe put a couple of tags on it so you can search in additional ways, and maybe add the text that it belongs to. So then you can say, what is a great podcast, and the database will return the best matches, and you just pull the text out. And actually, what you can do is embed that into a

01:00:08

prompt. Take the text out and embed it into a prompt. And, you know, a prompt is what you give GPT at the beginning, or, you know, a question. Yeah. So imagine this flow for chaining, all right? Step one, the user asks, how many units did we ship in the Southwest in twenty seventeen? That request gets sent to a server, and the server looks at that and says, let's search our vector database for this query, and it just does a call out to Pinecone or some other. A lot of databases are

01:00:47

getting bolted-on vector capabilities. Like, Postgres has it. Azure is working on it. So you make that call out, it gives you back the best results, and then you make another prompt where you take that chunk of text and you embed that in there and say, you know, show this to the user in whatever way is appropriate. And it does. And so you still have hard problems, though, because you still are limited by the context. If you have a giant document, you need to go further and

01:01:17

like, chunk the document up and find the most relevant part. These are hard problems, but people are working on them. Microsoft certainly is. I'm wondering why we're not just, you know, most folks think in terms of, we just retrain the model with my data. I mean, that's basically how they described GitHub Copilot: they took a large language model and then added in all of the code they were able to scrape out of their own site as part of

01:01:40

the learning model, and so it understood code better. Couldn't you do that with corporate data, train it into the model? Yeah, there's a couple of ways you could. There are security issues there, though, security and accuracy issues. Well, if you use fine-tuning. Basically, what you're doing with fine-tuning is you can upload a list of prompts and what the output should be. So, for instance, one could be, what did we sell

01:02:06

in the Southwest in twenty seventeen, and the answer is two thousand units. You could train it with a whole bunch of queries like that, and then OpenAI actually hosts that in the cloud for you. It costs a little more, it's not grievously expensive, but then it's trained on your data. The problem is that it's not live, it's not real time. So if you want something that's organic, that's changing as your organization grows,

01:02:31

you can't do that. You can also, potentially, especially with something like Llama 2, actually train it yourself. You can, like, get some GPUs and actually train it and then host it yourself. That's possible, but again, it's not dynamic. It's going to be static. So the only way I'm aware of to make it truly organic, so that it changes as your documents change, is to sync all this stuff up with some kind of vector database and do the

01:03:01

hard work. And whoever solves that is going to have a killer app, that's for sure, and I think Microsoft's definitely got a crack at it. Dude, we could go on talking for another hour easily. It would just fly by like this one did. What's next? What are we going to do next time on the AI Bot Show? Well, I know a couple of really good D&D experts, and I've got one lined up to come on and talk about how he's already using this to run his

01:03:29

D&D campaigns. It should be a really good show. Maybe we should substitute me out for somebody who's played D&D. You can definitely learn about it. I'm sure you'll ask good questions, like you did with board games. That sounds good. And he's also a C# dev as well, so he's part of this world. Good. Well, that sounds fun. Thanks, Brian. It's always good talking to you. Oh, it was great to be here. And we'll talk

01:03:53

to you next time on dot net Rocks. dot net Rocks is brought to you by Franklins.Net and produced by PWOP Studios, a full-service audio, video, and post-production facility located physically in New London, Connecticut, and of course in the cloud, online at pwop dot com. Visit our website at D-O-T-N-E-T-R-O-C-K-S dot com for RSS feeds, downloads, mobile apps, comments, and access to the full archives going back to show number one, recorded in September two thousand and two. And make sure you check out

01:04:48

our sponsors. They keep us in business. Now go write some code. See you next time.
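
Editor's sketch: the embedding search flow Brian walks through (roughly 55:52 to 01:00:47) can be approximated in a few lines of Python. Everything below is an assumption made for illustration. The three-number vectors are hand-written toys standing in for the roughly fifteen-hundred-dimension arrays a real embedding model returns, `TinyVectorStore` is a hypothetical in-memory stand-in for a real vector database, and the prompt wording is invented. The conversation mentions dot product; this sketch uses cosine similarity, which is the dot product divided by the two vector lengths, so a higher score means closer in semantic space.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product normalized by vector length; 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


@dataclass
class Record:
    text: str            # the original text, kept so it can go into a prompt
    vector: List[float]  # its embedding


class TinyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def add(self, text: str, vector: Sequence[float]) -> None:
        self.records.append(Record(text, list(vector)))

    def search(self, query_vector: Sequence[float], k: int = 1) -> List[Record]:
        """Return the k stored records most similar to the query vector."""
        ranked = sorted(
            self.records,
            key=lambda r: cosine_similarity(query_vector, r.vector),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(question: str, passages: Sequence[str]) -> str:
    """Embed the retrieved text into a second prompt (wording is invented)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = TinyVectorStore()
    # Toy vectors: a real system would get these from an embedding endpoint.
    store.add("dot net Rocks is a great podcast", [0.9, 0.1, 0.0])
    store.add("We shipped two thousand units in the Southwest", [0.1, 0.9, 0.1])

    question = "What is a great podcast?"
    hits = store.search([0.8, 0.2, 0.0], k=1)  # toy embedding of the question
    print(build_prompt(question, [h.text for h in hits]))
```

A linear scan like this is fine for a handful of records. The point of a dedicated vector database is to answer the same nearest-match question over millions of vectors, and large documents would still need to be chunked before they are embedded.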

Transcript source: Provided by creator in RSS feed
