Local AI Models with Joe Finney

Speaker 1

00:00

Hey Richard, Hey Carl, what do you know?

Speaker 2

00:03

Well, I know that our friend Michele Leroux Bustamante is with us to tell us about something that's going on adjacent to DEV Intersection.

Speaker 1

00:11

What is it? It's Cybersecurity Intersection. Let's let Michele tell that story.

Speaker 3

00:16

Hey Michele. Hey Carl, hey Richard, how are you?

Speaker 2

00:21

Tell us about Cybersecurity Intersection.

Speaker 3

00:23

Well, so, Richard and I are partnering with the group that does DEV Intersection and Next Gen AI, and we are putting on a new conference dedicated to one hundred percent security focused topics. And I mean, honestly, the lineup of speakers is incredible. We have Paula Januszkiewicz, who's here from Poland and does keynotes all over the world and is one of the top rated RSA speakers and

00:50

Black Hat speaker. We're so lucky to have her. But she's not only keynoting, she's got a workshop that teaches you about protecting your environments against hackers and shows you how to, you know, do attacks so that you can prevent them. It's pretty cool, and there are sessions like that as well.

01:07

But we also have speakers from Microsoft. We have speakers that specialize in, you know, secure coding practices, Azure security, Zero Trust architectures on Azure, and people who do decision maker tracks, so things around governance, policy, and, you know, how to manage your production operations and keep them secure. So it's an amazing group of speakers, really excited about it.

Speaker 2

01:31

And I think I can count myself among the group of speakers there.

Speaker 3

01:35

Well, yes you can. That is great.

Speaker 2

01:37

Yeah, I'm doing a securing Blazor Server applications talk, and also I think we're doing a Security This Week live show there somewhere. That is correct.

Speaker 3

01:48

Yeah, we'll be recording Security This Week live. We're going to have a great panel with some folks. The interesting thing here is we don't really have a Microsoft and dot net and Azure focused security conference yet, so that's the reason we're putting this on as well. You know, there are other security conferences, but they have a spread of topics that maybe don't focus on the things you

02:10

do day to day. And, you know, this overlaps with our community of folks that specialize in dot net and Azure, and yeah, they need to keep it secure too. So, tons of talks.

Speaker 1

02:23

Cybersecurity Intersection is part of a trio of conferences we're doing, with DEV Intersection alongside the Next Gen AI conference, all in Orlando the week of October fifth through tenth. That's workshops and the main conference. And you can get a special registration code if you sign up through Cybersecurity Intersection dot com.

Speaker 3

02:42

Yeah, so if you sign up at Cybersecurity Intersection dot com, then you put in this code so Alliance cyber three hundred and you'll get three hundred off the entry price. So that's a special code that only works at cybersecurity dot com. And then you have access to all the conferences.

Speaker 2

03:04

Like Richard said. Wow, that's cool. Thanks, Michele. I'm looking forward to it and I'll see you there. Hey, get down, rock and roll. It's Carl Franklin and Richard Campbell for dot net Rocks.

Speaker 1

03:28

Hey, Richard, how you doing, bud?

Speaker 2

03:29

I'm good, getting psyched up to go down to Orlando.

Speaker 1

03:33

Yeah, it's almost time. Back to a new DEV Intersection and Next Gen AI and the new cybersecurity conference, side by side. Yep, yep.

Speaker 2

03:42

Looking forward to doing a live Security This Week show down there.

Speaker 1

03:46

That should be fun. And your crazy thing with Maddy. Oh god, you're going to Aspire-ify dot net Rocks here.

Speaker 2

03:54

I have no idea what to expect. That could be a horror show.

Speaker 1

03:57

This is, you know, you love a good, you know, trapeze act, just going without a net.

Speaker 2

04:03

Absolutely, as long as I don't you know, screw up too badly, it should it should work out fun.

Speaker 1

04:10

You know, a good crash and burn is fun too, but it.

Speaker 2

04:13

Could be fun. Yeah, yeah, yeah, Okay, let's start with nineteen seventy. That's the episode number. Oh yeah, and a bunch of things happened in nineteen seventy.

Speaker 1

04:23

Where do you want to start?

Speaker 2

04:24

Well, the unhappy things, the Kent State shootings.

Speaker 1

04:27

Yeah, it's terrifying.

Speaker 2

04:28

On May fourth, National Guard troops killed four students during a protest against the Vietnam War at Kent State University in Ohio, leading to nationwide outrage and the song, what is it, Four Dead in Ohio?

Speaker 1

04:43

Who's that?

Speaker 2

04:44

Neil Young, or Crosby, Stills, Nash and Young? I'm not sure. Nigerian Civil War: the conflict ended in January when Biafran forces surrendered after a thirty two month struggle for independence. The first Earth Day was observed on April twenty second. The Beatles broke up, and Let It Be. McCartney said he was leaving the band on April tenth. That was the end of that. But John Lennon, Instant Karma. He wrote and recorded this hit song

05:17

in a single day, showcasing his prolific creativity. Diana Ross and the Supremes gave their final concert in Las Vegas on January fourteenth. Back to the bad stuff: the Tonghai earthquake. A devastating earthquake struck Tonghai County, China on January fifth, resulting in significant casualties, with estimates of up to fourteen thousand,

05:43

six hundred and twenty one deaths. Yeah, and an avalanche in someplace in France that I can't pronounce, sorry about that, killed forty two people, making it one of the worst disasters in French skiing history. You can talk about the science. Yay, science. Some things happened. Well, I was.

Speaker 1

06:06

I mean, the space one's the obvious one. After having Apollo nine, Apollo ten, Apollo eleven and Apollo twelve all in nineteen sixty nine, there was only one Apollo mission in nineteen seventy. That was Apollo thirteen. It launched on April eleventh, and on April thirteenth they said, we've had a problem.

Speaker 2

06:24

Here, Houston. We've got a problem. And a great movie too.

Speaker 1

06:29

Yeah, and you've seen the movie, a beautiful rendering of more or less what happened. The HBO From the Earth to the Moon series, if you ever get a chance to watch, does a version of Apollo thirteen, but from the view of the people on the ground, so you only ever hear the astronauts over the radio, which is how it was. Right. Sure. Here's the crazy thing to realize. So the explosion in the tank happens on April thirteenth, the splashdown is April seventeenth. It was four days. Wow,

07:00

the whole thing's four days. I know, it feels like forever. It's four days.

Speaker 2

07:03

Wow.

Speaker 1

07:04

But it was four days of, are these guys going to make it? You know, like four days of sheer terror. Yeah, it was. And of course the lunar module Aquarius was turned into a lifeboat, because the power systems, the little bit of battery that was left in the command module, they were going to need for re-entry. So they basically powered down the command module and then used the life support system for two for three, and for four days, and they were able to get home, amazing,

07:28

and survive. It's a great story. And of course the next Apollo mission would be delayed while they dealt with some of those issues, and in nineteen seventy one, you'll get Apollo fourteen. Talk about that next week, apparently. Yeah. On the computer side of things, Niklaus Wirth releases Pascal. Well, he worked previously on the language ALGOL sixty, and there's some derivations therein. He was trying to do a combination of sort of procedural and algorithmic programming. So, popular

07:56

language, did some great things. But on the hardware side, for me, the show stealer is, you know, I believe, Intel's most important product, the eleven O three, the DRAM. Okay, this is what Moore's law actually was about, was making RAM, right? Based on a bunch of other developments to make a transistor based memory, they were able to make a silicon substrate for an eighteen pin DIP can with one K of RAM in it for sixty bucks.

Speaker 2

08:31

Wow, that seems cheap back then, and.

Speaker 1

08:34

One cent per bit, and it was small, because then they were largely using magnetic ferrite cores for memory. So this was very compact and it was adopted immediately everywhere. It's, its uptake. That's also the same year that the first version of the IBM System three seventy comes out with all semiconductor RAM, but that was not Intel's RAM. But shortly after that, Intel's RAM just dominates the market

08:58

and sends Intel on its trajectory. Although in nineteen seventy one they'll make arguably an even more important product. Tune in next week for nineteen seventy one. Nineteen seventy one. But yes, the eleven O three was their, you know, definitive product. They were RAM, you know, semiconductor RAM. And that's what I got.

Speaker 2

09:18

All right, well, I guess we should carry on with Better Know a Framework.

Speaker 1

09:23

Roll the crazy music possible.

Speaker 2

09:32

All right, man, what do you got? Again, I looked for a trending repo on GitHub and I found MCP for Unity. Oh my. Yeah, you know Unity. Create games with Unity. It's a graphical tool that uses C sharp and JavaScript for scripting, but it also does all of the three D stuff. So here's what it is: proudly sponsored and maintained by Coplay, the best AI assistant for Unity. There you go. Create your Unity apps with

10:03

LLMs. MCP for Unity acts as a bridge, allowing AI assistants like Claude and Cursor to interact directly with your Unity editor via a local MCP, Model Context Protocol, we've been talking about those, a local MCP client. Use your LLM tools to manage assets, control scenes, edit scripts, and automate tasks within Unity.
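To make the bridge idea concrete: MCP messages are JSON-RPC 2.0, and an assistant asks a server to run a tool with a `tools/call` request. The sketch below only builds such a request in Python. The tool name `manage_scene` and its arguments are hypothetical, chosen for illustration, and are not taken from the MCP for Unity project.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP 'tools/call' request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "manage_scene" and its arguments are hypothetical, for illustration only.
request = make_tool_call(1, "manage_scene", {"action": "load", "name": "MainMenu"})

# This is the text an MCP client would send over its transport.
wire_text = json.dumps(request)
```

The server replies with a JSON-RPC response carrying the same `id`, which is how the assistant matches results to the calls it made.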

Speaker 1

10:27

Pretty cool. Interesting. Yeah, a good show to actually walk through the process of, you know, making a game in Unity with the MCP, with LLMs in the role. Yep.

Speaker 2

10:40

Also, Code It With AI dot com is up and the first episode is there, and we're basically using Playwright with the coding agent in Visual Studio Code, nice, and using Claude Sonnet. And basically, in one prompt, we told it to create user documentation of Jeff Fritz's Copilot That Jawn dot com website, and it did a pretty good job. What we didn't show was what's involved in setting up the Playwright MCP so that the agent can use it. Oh yeah, and it turns

11:25

out that's pretty complex. You need Node JS and NPM and all that stuff, and we're looking for a video on how to do that, so look in the show notes for that. Cool, but that's it for Better Know a Framework. Who's talking to us? I have a comment off of show nineteen sixty nine. Yes, that's last week's show with our friend James Montemagno.

Speaker 1

11:48

And we talked a little bit about the AI tooling inside of Visual Studio Code and its relationship with Visual Studio and so on. And our friend Richard Rukima, also known as Coputer, has this comment, where he says: I think Richard nailed it. Do you like the code or do you like a solution? I consider my expertise working with AI as a beginner, especially after listening to James, but I felt that vibe of joy in getting things done so fast. So do I like the if then else?

12:16

Or do I like asking for a feature and reviewing the result? I'm long past the joy of knowing how to write procedural code. Yeah. An interesting aspect of this is, like, is it the more experienced folks that are going to embrace these tools faster? Because it's typically the more junior people that tend to jump on the bandwagon of new things, but I hear the same tone over and over again. Yep.

12:37

Certainly in terms of respectful interaction with AI, I don't prescribe to the harsh language, as I feel it reveals character. It's an interesting statement, right? It's in my character not to be harsh, and to focus on respectful communication. I don't think AI should be treated any different, not for the benefit of AI but for the benefit of myself.

Speaker 2

12:57

Yeah, exactly. You're not going to feel good, you know, using harsh language.

Speaker 1

13:01

Putting those mean words out there has as much impact on you as it does on anything else. And believe me, the software is not affected, that's the thing, right?

Speaker 2

13:11

The only thing left to be affected is you.

Speaker 1

13:13

Yeah, so be kind to yourself. It's not necessary, right? Hey, Richard, I'm pretty sure you've got a copy of Music to Code By already, but thanks so much for your comment. And if you'd like a copy of Music to Code By, write a comment on the website at dot net rocks dot com or on the Facebooks. We publish every show there, and if you comment there and we read it on the show, we'll send you a copy of Music to Code By.

Speaker 2

13:31

Music to Code By is still going strong after all these years. Twenty two tracks. You can get them in WAV, FLAC or MP three, and that's at Music to Code By dot net. Okay, let's bring back our friend Joseph Finney. Joseph is a mobile product owner and MVP by day, and he builds productivity apps for Windows by night. When he's not programming, he's birding, running, and enjoying tasty coffee and beer in Milwaukee.

Speaker 1

14:00

Hey Joe. Hello, welcome back to dot net Rocks.

Speaker 4

14:03

Good to be back talking more about the hot topic of the day, AI.

Speaker 1

14:08

With a Century. Yeah, but you've got a cool angle on this. That's why I asked you to come on. So what are you working on?

Speaker 4

14:15

Well, one of my most popular apps that I make is Text Grab, which is pretty basic. It's also the basis for the PowerToys Text Extractor, which is basically: select a region on your screen where somebody sent you text that you can't actually select, and put it somewhere where you want it. And it does some on device local OCR. Pretty simple, and now with these new models,

14:42

the OCR is getting better. But it does change compatibility and devices. But it's pretty interesting what we can do here now with these local models. Microsoft's making it easier with some of their Windows AI APIs, and then it just gets more and more complicated from there.

Speaker 2

15:01

Mm hmm. So I have an app that I'm running right here that does a little OCR, and I'm using Tesseract to read the text in a bitmap at a certain coordinate. Is that sort of representing the state of the art before AI got into the mix?

Speaker 4

15:17

Yeah, I would say it's similar. Tesseract was the open source project that Google took over, I think. I think actually HP started it way back then, and then Google kind of took it over. Yeah, it's on GitHub. There's a lot of models. It's very widely used and loved. Text Grab does enable you to download Tesseract, and then you can interact with it through the CLI. Well, Text Grab will just interact with it directly, but there's a little bit of setting up. You do have to

15:46

download it. It's another installer. It's through UB Mannheim, I think, who does the installer. So there's definitely some hoops you have to jump through to get it working.
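For a sense of what "interact with it through the CLI" looks like, here is a minimal Python sketch that builds a Tesseract command line and runs it only if the executable is installed. It is an illustration, not how Text Grab itself calls Tesseract. The helper names, the default `eng` language, and the page segmentation mode of 6 (treat the image as one block of text) are assumptions chosen for the example.

```python
import shutil
import subprocess


def tesseract_command(image_path, lang="eng", psm=6):
    """Return the argument list for a Tesseract call that prints text to stdout."""
    # "stdout" as the output base tells Tesseract to print instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_image(image_path, lang="eng"):
    """Run Tesseract if it is installed; return the text, or None if it is missing."""
    if shutil.which("tesseract") is None:
        return None  # Tesseract is a separate install and may not be on PATH
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Reading text "at a certain coordinate" would mean cropping the bitmap to that region with an image library first and passing the cropped file to `ocr_image`.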

Speaker 2

15:54

And there's a data set that goes along with it, right.

Speaker 4

15:57

Yeah, yeah, so you have to download the languages. There's a lot. One of the benefits there of Tesseract is that there's a lot of languages, and they have packages for scripts, and they have packages for, like, handwritten, and so it's really high quality. Originally, Text Grab was built using the Windows ten OCR APIs, which are definitely older, not as good, but they're very fast. So that was kind of the nice thing there. They're built in, they're fast,

16:24

they're quick. For most stuff, it worked pretty well. Cool. Tesseract was a bump up, but again you have that complexity where you have to download the models locally. But it's open source, it's available, it's free. And now there's these Windows AI APIs that Microsoft has released. I don't think we know exactly what those models are. I don't think they've shared. I haven't learned what they are exactly.

Speaker 2

16:46

But what was the acronym that you used before we started recording for this new.

Speaker 4

16:52

WinML, Windows Machine Learning.

Speaker 2

16:56

Okay, and this is new, yeah, literally days old, and we don't know anything about it.

Speaker 4

17:00

Well, the WinML stuff is kind of a middle layer here. Okay. So I would say there's like three general levels of intensity. If you are a local Windows app developer and you want to get OCR, image, language models, like all of that stuff, if you want to do that in your app, I would say there's like three different tiers of complexity that you can engage in, and the first one is the new Windows AI APIs. And these were released kind of around the time the Copilot

17:32

plus PCs were released. Okay. And they've been rolling out. Yeah, they've been rolling out slowly. They were in experimental. You had to be on the Insider Preview to build with them. To use them, you have to have a Copilot plus PC. So, you know, there's a higher bar kind of on the consumer side, but that means it's easier on the developer side. So basically, in the code, when you're building, you just have to check: does this device support these APIs? If so, do it. Very simple,

18:02

and like that's it. You don't have to manage models, you don't have to manage memory or downloading, and you don't have to worry about shipping. You know, a five gig model with your app. They're already on the device. If the device supports it, then you can kind of light up those features, turn on those buttons, show that capability, and boom, it's there.
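The check-then-light-up pattern described here can be sketched in a few lines. This is a language-neutral illustration in Python, not the Windows AI API itself: `DeviceCapabilities`, `has_ai_text_recognition`, and the engine names are all hypothetical stand-ins for whatever probe the platform actually exposes.

```python
from dataclasses import dataclass


@dataclass
class DeviceCapabilities:
    # Hypothetical probe result; a real app would ask the platform API.
    has_ai_text_recognition: bool


def choose_ocr_engine(caps: DeviceCapabilities) -> str:
    """Pick the best OCR path the device supports, with a built-in fallback."""
    if caps.has_ai_text_recognition:
        return "ai-ocr"  # model already lives on the device, nothing to ship
    return "legacy-ocr"  # older built-in OCR: faster, less accurate


def visible_features(caps: DeviceCapabilities) -> list:
    """Only show the buttons whose backing capability is present."""
    features = ["grab-text"]
    if caps.has_ai_text_recognition:
        features.append("ai-grab-text")
    return features
```

The point of the design is that the app never manages a model: it asks one yes or no question and degrades gracefully when the answer is no.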

Speaker 1

18:21

Kelly.

Speaker 2

18:21

My wife bought a new Copilot plus PC. She didn't, of course, know it. We went to Best Buy together, you know, and she picked it out. But the first thing I did was immediately turn off all this stuff that's going to get in the way. The thing that takes screenshots all the time, I can't remember the name of it now. Recall. Recall, that's it. It was turned off by default. So that's good. That's good. I did not want that on.

Speaker 1

18:49

It's a really powerful tool. People love it, you know, because the bottom line is you can ask the machine, hey, where did I see such and such, and it'll find it for you. Yeah.

Speaker 2

18:58

I just don't have that kind of problem, like I know where I saw stuff, and I keep good notes and dot your machine. Yeah, she didn't want.

Speaker 4

19:05

It, so yeah, I also don't use it like I have AI features in well, AI, I should say, I know this show Richard has talked a lot about how you have these big amorphous buckets of AI, and then as soon as you start explaining it and giving a more clear, straightforward name to it, it stops really being AI. And that's kind of where the OCR and LLM and image segmentation and image detection. So those are all under this umbrella of AI, and it can be a little I don't know.

Speaker 1

19:38

You left the impolite part, Joe, which is like, so for me, the term artificial intelligence means something that doesn't work. Yeah, there you go, because as soon as it does work, it gets a new name.

Speaker 4

19:49

Software, right, right, it's a module. Yeah, well, I should say, the using namespace in dot net is AI, but then after that there's always dot Text, dot Imaging, dot image recognition. So there's a bunch of APIs after the namespace that actually point to the real APIs, the real functionality of what you're actually trying to do. And I don't think you

20:14

can easily turn all of that off. I would say, so there's a lot of experiences that are built on top of this technology that's already in these Copilot plus PCs, and you could turn those experiences off. You know, they're not going to run by default. But Microsoft does a pretty good job of managing bringing down the model, keeping it up to date, and making it really easy for developers to interact with, which is kind of what you want, right,

20:38

You want something really simple easy. It's a super complex problem, but you could just say, you know, send this block of text, summarize it, and then get it back.

Speaker 2

20:46

So in case anyone hasn't figured it out by now, the Copilot plus PC has a local LLM built into it.

Speaker 1

20:53

Yep.

Speaker 2

20:53

And you know, this is the kind of thing that you might think of if you were going to use Ollama, right, and download models and, you know, train it, run it on a laptop or something like that, a gaming PC or something.

Speaker 4

21:09

Yeah, that's just kind of where I said there's like these different layers of complexity, and the easiest, simplest, like lowest level, easiest for any developer out there to integrate into their Windows app. Any Windows app, by the way, so WPF or WinUI or WinForms, you can

21:26

or MAUI, you can do them all. It does have to have identity, some sort of identity, because Microsoft doesn't want to just open up these APIs to any random raw exe. But if you want to do some more, maybe more niche stuff, maybe a little bit more complicated stuff, or you want to use a specific model, you can kind of use what I would call like the next step of complexity here, and that's WinML, and there's a little bit of a middle layer

21:54

there where you can go download your own ONNX models and run those, and it makes it easy. There's basically a standardized interface and you say, run this model. You don't have to necessarily optimize it for the specific hardware, and it can run on CPU, GPU and NPU, and it's an easy way. But again, there you

22:15

have to manage the model. So if you want that, if you need that in your application, maybe you have it specifically fine tuned for your application, or you have a model that isn't in the box, or I don't know if there are other legal or.
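One way to picture the "run this model on whatever hardware is there" idea is an ordered preference list with a CPU fallback. The sketch below uses ONNX Runtime's execution provider names as a stand-in (QNN for a Qualcomm NPU, DirectML for the GPU); WinML's own API is not shown, and the preference order is an assumption for illustration.

```python
# Preferred order: NPU first, then GPU, then CPU. The order is an assumption.
PREFERRED = ["QNNExecutionProvider", "DmlExecutionProvider", "CPUExecutionProvider"]


def pick_providers(available, preferred=PREFERRED):
    """Keep the preferred providers this machine actually offers, in order."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


# With ONNX Runtime installed, the list would feed a session like this
# ("model.onnx" is a placeholder path):
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       "model.onnx",
#       providers=pick_providers(ort.get_available_providers()),
#   )
```

The trade-off is the one described above: the runtime handles hardware differences, but downloading, storing, and updating the model file is still the app's job.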

Speaker 1

22:31

Hey, I'm just appreciating you're talking about something other than the LLM, because it's just overwhelming right now. So, you know, clearly there's a bunch of other models out there and all of this infrastructure, and I'm including links to ONNX and things like that. If you haven't looked here, there's lots of good work being done for specific tasks.

Speaker 4

22:48

Yeah, and I think immediately people can kind of get annoyed by, oh, LLM, why do I need an LLM in my model? Why do I need AI? And it definitely has become synonymous, like AI and LLM. Yeah, but there are so many. If you go to Hugging Face and you look at all the different categories, I mean OCR, image segmentation, image detection, object detection. Huggy Face? Oh yeah, Hugging Face, Hugging Face. Yeah, this is a, I think

23:15

Facebook is kind of backing it. And it's a big repository for models, so you can access models, you can download models, and if you're thinking.

Speaker 1

23:27

Before the insanity of LLMs, we had good tooling around just building machine learning models for object detection and recognizers and OCR, all these good things, right? Like, there was so much going on before ChatGPT showed up and just overwhelmed the message.

Speaker 2

23:45

Wow, hugging face looks awesome.

Speaker 4

23:47

Yeah, it is a huge, kind of big repository of models online where you can go download them. But if you're a normal person who's just curious and says, I want to kind of try some of these out, it's not as easy. You can't just download them and then run them. They are not programs, they're models, so you need to interface with them somehow, and there is actually a way, if you are inclined. You can download an app from Microsoft called the AI Dev Gallery app.

24:20

And what this is it's kind of a playground for people who are curious about models and different models and how this all works. It's open source on GitHub, it's in the Microsoft Store and it is a really low barrier to entry if you are interested in trying some of these models out on your own device.

Speaker 2

24:36

Wow.

Speaker 4

24:36

So you can download models from hugging Face. You can run them. They're very limited, basic samples, so don't expect anything grandiose or chaining them together. But it's a great way to play with those Hugging Face models.

Speaker 2

24:48

Very cool.

Speaker 1

24:48

Did you ever play with Kaggle? Because we've talked about this on the show ages ago. Just like, there is another playground for practicing your ML skills.

Speaker 4

24:58

I've never tried. Is it a website or a technology?

Speaker 1

25:01

They actually run competitions for, you know, the sort of famous one for them was the predict how many people survive the Titanic sinking. There was a bunch of different models or different competitions, and some of them have a lot of money in them, because actually, you know, organizations encourage folks to mature a model for a particular problem space that they can then use elsewhere. There were things like

25:27

aneurysm detection and even sports predicting. So just again a reminder that there's things other than LLMs.

Speaker 4

25:39

Right, And I would say that is the like the farthest, the highest tier of integrating AI models into your app, your local Windows app is making your own models, training your own models from scratch, So you can do that. I mean, you can ship models and integrate them directly in. It's again way more integration work, but it's way more fine tuned. So if you have a specific application where you need a model that can do very niche things or very specific data sets, it's possible. It's doable, and

26:12

there's ways to do it. You should check it out. One of the nice things about this current age of programming is a lot of these big popular apps are open source, so you can just see how it's done, and you obviously read the license, but a lot of this stuff is available to see how other people are integrating these AI models.

Speaker 1

26:31

Guys.

Speaker 2

26:31

I know we've talked about DeepSeek a bit on this show, and Joe's nodding his head, so he knows about it, and this was the model that came out of China that uses a lot less resources and is therefore cheaper to run than, you know, ChatGPT was, and everybody was like, oh my god, OpenAI is going down, and it didn't. And then there were concerns about, you know, if I use DeepSeek, am I sharing data with, you know, the country of China and

27:07

is it safe and all of these things. But you can also, I think, correct me if I'm wrong, download it and run it locally, like with Ollama. Is that true?

Speaker 1

27:18

Yeah?

Speaker 4

27:19

So one of the nice things about DeepSeek is how small it is. But they also have NPU optimized models which you can go download, and there's also an extension for VS Code.

Speaker 2

27:33

Wait, wait, go back. Is it MPU or NPU, and what is that?

Speaker 4

27:38

That's the neural processing unit. So you kind of have your CPU, your GPU, and your NPU.

Speaker 1

27:44

And this was.

Speaker 4

27:45

The core the chip, the part of the CPU in these ARM devices that really made it easy to run these models locally and efficiently.

Speaker 1

27:56

Okay, part of the requirement for a Copilot plus PC is that it has an NPU of at least, what is it, forty TOPS, or trillion operations per second.

Speaker 2

28:04

So if you have a Copilot plus PC, you can download DeepSeek and use it, even if you don't, and you're probably going to get good results.

Speaker 4

28:13

Yeah, you don't have to have an NPU, but a lot of these models. So Microsoft makes an LLM called Phi Silica, and this model, they've been releasing three, three point five, they just released four. It's optimized for the CPU and the GPU and not the NPU right now, at least the models that they've released, and there are models out there that you can get that are optimized

28:39

for the NPU. So if you do have a device that is an ARM device or low power device and you want more of an optimized model, you can find them and run them. And you can also do that in VS Code. There's an extension called AI Toolkit for Visual Studio Code, and that's another kind of playground-esque place, but you can also do the model refinement and fine tuning in there. So there's a lot of ways that you can experiment with these models without really being a pro.

29:09

So if you're just curious and you have a lot of hard drive space, that is the one thing that I'll say. I recently upgraded my Surface hard drive from a five twelve to a two terabyte, because these models are big, and if you want accurate ones, they're very large.

Speaker 2

29:26

I just saw Richard probably knows about this, but there are now twenty two terabyte SSD drives. Yeah, for like around five hundred bucks. Can you wrap your mind around that.

Speaker 1

29:38

It's a lot of storage.

Speaker 2

29:39

Oh my goodness. Like, you know, Joe's shaking his head, like, what?

Speaker 4

29:44

One drive? Twenty two terabytes?

Speaker 2

29:46

Twenty two terabyte SSD five hundred bucks?

Speaker 4

29:49

You sure that's not a typo?

Speaker 2

29:51

No, there's a couple of different brands.

Speaker 4

29:53

That's amazing.

Speaker 1

29:54

Yeah, ridiculous. Yeah, that really is. I think, I should say, I don't think they are SSDs. I think they're spinning drives. Oh really? Two terabytes? Yeah, SSDs, the solid state ones, aren't that big yet.

Speaker 2

30:05

Okay?

Speaker 1

30:06

Still, twenty two terabytes is madness. Like, that's just a lot of storage.

Speaker 4

30:11

Yeah, it really is. And the AI Toolkit in VS Code does allow you to interact with these LLMs through the web, and so GitHub will host some of these models, other providers will host them, and so you can kind of do comparisons. So there's Foundry Local, and that's what Microsoft has branded there. You know, I've called it, I think, the second tier, kind of where you have WinML and you have your local models and you're

30:40

doing that work. So you have your local models and you can compare those to cloud hosted models and test them, because again, you know, software, you have to be able to test it. So it is hard too with these. How do you compare them? Like, which one's good, which one's bad? Is it good enough? Is it good enough in our use cases? And it can be tedious

30:58

to test manually. But there are a lot of tools out there to experiment, get started, and if anybody's curious, you should definitely check out the AI Dev Gallery for sure. That is a lot of fun to play around with those different models. And for a little bit more advanced scenarios, more language focused, the AI Toolkit in VS Code is another really fun one. I'm looking at DeepSeek here right now. You can download it on your device and run it.
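The "is it good enough in our use cases" question can be turned into a small repeatable check instead of manual testing. The sketch below scores two models against the same test cases. The two model functions are toy stand-ins, not real local or cloud models, and "the answer contains the expected text" is a deliberately crude scoring rule chosen for illustration.

```python
def accuracy(model, cases):
    """Fraction of (prompt, expected) cases whose answer contains the expected text."""
    hits = sum(
        1 for prompt, expected in cases
        if expected.lower() in model(prompt).lower()
    )
    return hits / len(cases)


# Stand-in models: plain functions from prompt to answer. A real harness
# would call a local runtime and a hosted endpoint here instead.
def local_model(prompt):
    return "Paris" if "France" in prompt else "I am not sure"


def cloud_model(prompt):
    return "Paris" if "France" in prompt else "Berlin"


cases = [("Capital of France?", "Paris"), ("Capital of Germany?", "Berlin")]
scores = {
    "local": accuracy(local_model, cases),
    "cloud": accuracy(cloud_model, cases),
}
```

Because the same cases run against every candidate, swapping in a new model only means adding one more entry to `scores`.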

Speaker 2

31:27

Wow, it seems like a pretty good place to take a break. So we'll be right back after these very important messages.

Speaker 1

31:34

Stay tuned.

Speaker 2

31:36

You know, dot net six has officially reached the end of support and now is the time to upgrade. Dot Net eight is well supported on AWS. Learn more at aws dot Amazon dot com, slash dot net.

Speaker 1

31:53

And we're back. It's dot net Rocks. I'm Richard Campbell. That's Carl Franklin. We're talking a bit to our friend Joe about working with local models, and also the non LLM stuff, just sort of a good reminder there's been all kinds of cool stuff going on in the ML space that didn't necessarily have to do with language per se. But you know, you've hinted at this a couple of

32:15

times in the first half. It's like, if you want to own the model, you know, there's a lot of models available to download from Hugging Face and all these other places. Why would you want to own a model? Because it sounds like a lot of work. It's like owning a framework.

Speaker 4

32:31

Yeah, yeah, it is. Like, don't trust somebody who says they can write their own language and write their own IDE. You're like, oh.

Speaker 1

32:38

Their own garbage collector, you know, their own crypto library. Like these are all scary things to me. So when someone says I'll just make our own model, I'm like, why do we need to do that?

Speaker 4

32:48

Well, if you're in the industry, if you have insane amounts of data and a niche in a specific industry, it might be worth it for you to look into doing this. And if you have a hard time processing large amounts of data to get insights and actions out of it, which is kind of the idea here, right, where you have an entire language that you have to train these models on, or you have an entire data set of images with boxes drawn around the dogs or

33:18

dog breeds or very specific things like that. If what you need to do is something where it's not available or it's not good enough, there's really no other way around it than to build your own model today. But it really is that data.

Speaker 1

33:33

I mean, that being said, this is all sort of a non-deterministic thing. Like, you're never going to get one hundred percent out of a machine learning model.

Speaker 4

33:41

It's probabilistic, right, absolutely, even, maybe especially so, some of the image detection ones, and a lot of times they'll give you back a number, a fraction of confidence, and I think maybe this is why they don't get as much play, as they're not as exciting for individuals to use. It's like, you could take a picture of your cat and then your phone will draw a box around it and say that's a cat. Yep, that's a cat. So

34:06

I think it's a lot less interesting. The language ones just kind of capture people's imagination and there's a lot more back and forth. But when you really think about building an application, like, what are you doing? Maybe you're playing around with your Raspberry Pi as a security system for your house, and you want to add a vision system and you want to do box detection and you have hours and hours and hours and

34:28

hours of security footage. Or maybe you have a specific niche application where you're trying to, you know, detect a particular squirrel who's given you trouble. It's a fun you know, it's a fun experiment and you.
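As a rough illustration of the "fraction of confidence" Joe mentions, here is a minimal Python sketch of filtering detections by a confidence threshold. The Detection type, the sample values, and the 0.5 threshold are all assumptions made up for illustration; they do not come from any particular vision library or from the episode.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One hypothetical result from an object detection model."""
    label: str
    confidence: float  # fraction between 0.0 and 1.0
    box: tuple  # (x, y, width, height) in pixels


def keep_confident(detections, threshold=0.5):
    """Return only the detections at or above the confidence threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up sample output, standing in for what a model might return.
detections = [
    Detection("cat", 0.92, (40, 60, 200, 180)),
    Detection("squirrel", 0.31, (300, 20, 60, 50)),
    Detection("dog", 0.55, (10, 10, 120, 140)),
]

for d in keep_confident(detections, threshold=0.5):
    print(f"{d.label}: {d.confidence:.0%} at {d.box}")
```

Where to set the threshold is an application decision: a squirrel alarm might tolerate false positives, while a security system probably would not.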

Speaker 1

34:38

Can do a bear or a bear.

Speaker 2

34:40

Joe, do you have a Twirl-a-Squirrel bird feeder?

Speaker 1

34:44

No?

Speaker 2

34:44

You've not seen this on YouTube? Check YouTube for Twirl-a-Squirrel. It's terrible, right. Basically, it goes between, you know, what you hang the bird feeder on and the bird feeder, so it's got a hook on either side. It detects weight, and so when there's a squirrel on it, it just starts spinning and the squirrels go flying. It's hilarious. The Twirl-a-Squirrel.

Speaker 4

35:06

Yeah, you could build an AI-powered Twirl-a-Squirrel.

Speaker 1

35:10

There you go, there you go. I don't think that's necessary. I am thinking about animal recognition in this particular part of the world, where, you know, the one that would be tricky, that I would really challenge myself with, would be whale detection, because, you know, you don't have a lot of time to pick up on the fact that there's whale blow. Like, they're going by, and it could be orcas, and it could be humpbacks, and it could be grays, and it could be porpoises, and it could

35:32

be dolphins. Like, there's a lot of stuff going on. You have to be on the surface. No, no, we hear them. Like, we hear whale blow before we see the whale, because it travels. Like, when they exhale, it's loud.

Speaker 2

35:46

Well, you could identify a whale by the sounds it's making too.

Speaker 1

35:49

Yeah, I wonder. Yeah, speaking of it still seems nuts to build your own model like that just seems like a thing I don't want to own.

Speaker 4

35:56

Yeah, it's it's definitely the research side of things. And I know people have been saying for a long time that data is the new oil, right, this is the new black gold of do you have the data? Do you have the databases? Is it structured, is it consistent,

36:13

is it clean? Is it real? Is it good? And if you have all that, I think we have a very small number of people who can say yes, we have that right and you don't have to spend all that time cleaning the data, which is such a challenge where you have so much noise in the data today that if you're trying to train a model, Yeah.

Speaker 2

36:31

If I was going to use a local LLM, I would want it to understand C sharp, JavaScript, Blazor, you know, and CSS. And I don't know how realistic that is. Like, I know that the current models like Claude Sonnet and, you know, even ChatGPT understand it. But, for lack of a better word, sorry, Richard, didn't mean to offend you there, they're programmed, you know,

36:58

they're trained against it. But what does it take to do that locally, to train the models, or, well, to get a model that understands, you know, programmer-speak, languages and stuff they do?

Speaker 4

37:10

Yeah, local models will and they can write code. I think part of the challenge that you'll see if you start using them is speed. So the response speed of a local model is going to be much slower actually than a cloud hosted one because your computer cannot compete with a server with a rack of GPUs. Yeah, well maybe yours, Carl, not mine.

Speaker 2

37:31

Oh, I don't know. I don't think so. But you know, I think if I had a great Copilot plus PC, you know, with a lot of RAM and a lot of storage, and I just set it over in a closet somewhere, I could probably use that.

Speaker 4

37:47

Yeah, you should try it.

Speaker 1

37:48

Yeah.

Speaker 4

37:48

Another challenge is going to be context, which is how big of a context window can the model actually hold. In the provider there's all of that; there's a lot of infrastructure in between the model and actually getting stuff out. So speed and context, I would say, are going to be your biggest risks, where you don't necessarily just want it to give you new greenfield CSS. You want it to give you new CSS in the right spot for your codebase, which is...

Speaker 1

38:14

A much harder question.

Speaker 2

38:15

I want it to remember everything we've said. Like, I want as big a context as I can possibly get. So is that just a measure of more RAM, or is it that the more context you have, the slower it's going to be to come up with a new answer?

Speaker 4

38:31

Yeah, that's a good question. I would love to hear an expert who actually knows more about context and how that differs from the training data and how it differs from fine tuning, because in my experiences with local AI, I have a pretty narrow context window that you could basically feed it, Hey, here's everything I know, and you feed it with the prompt yeah, and you say okay, now do this and then give it back to me. But you're not feeding it documents.

Speaker 1

38:57

The thing that's made a difference for me has been the video card and the amount of memory in the video card, like playing with FramePack and a couple of other models, and so I'm running a 5080 with sixteen gigs of VRAM, and that has made a huge difference for running bigger models. Now, I'm not talking about building models, but actually executing a more complex workload. And if you have got the money to spend, because they're thousands of dollars, like those top-end RTX cards.

39:24

Now you can get ninety-six gigs in them. Jeez, it's a ten-thousand-dollar card. But, you know, that seems to be the thing that makes the most difference for a lot of these kinds of tools when you want to handle a lot of context.
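To make the point about video card memory concrete, here is a back-of-the-envelope Python sketch for estimating how much memory a model's weights need. It relies only on the arithmetic that weight memory is roughly parameter count times bytes per parameter. It deliberately ignores the context (KV cache), activations, and runtime overhead, so real requirements are higher. The model sizes and bit widths in the usage lines are hypothetical examples, not measurements.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


def weights_fit(params_billion, bits_per_param, vram_gb):
    """True if the weights alone fit in the given amount of VRAM.

    This is a lower bound: context and runtime overhead need extra room.
    """
    return weight_memory_gb(params_billion, bits_per_param) <= vram_gb


# Hypothetical examples: a 7-billion-parameter model at 16-bit and 4-bit,
# and a 120-billion-parameter model at 4-bit.
for params, bits in [(7, 16), (7, 4), (120, 4)]:
    need = weight_memory_gb(params, bits)
    print(f"{params}B at {bits}-bit: about {need:.1f} GB of weights; "
          f"fits in 16 GB: {weights_fit(params, bits, 16)}, "
          f"fits in 96 GB: {weights_fit(params, bits, 96)}")
```

The sixteen-gig and ninety-six-gig figures are the two card sizes mentioned in the conversation; everything else is an assumption for the sake of the arithmetic.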

Speaker 2

39:35

What about an NPU? Is that gonna do less or more than a ten-thousand-dollar video card?

Speaker 1

39:40

No, because there's just no... You know, they talk about how a Copilot+ PC has forty TOPS. I don't know what that means. Yeah, that's trillion operations per second. It's the measure of its compute power for neural nets. Okay. My 5080 has thirteen hundred TOPS. I see. So, when you look at what Nvidia's selling the data centers and things, it's their giant GPUs like that, with huge amounts of memory, this super-fast memory, and built for scale processing.

Speaker 4

40:05

Yeah, the NPU, I think was more of a play for a continuous operation or in the background and on mobile devices where battery and power consumption is a much bigger concern for individuals, where they're thinking, well, I don't want this GPU chugging away in the background. Can I get something? Can I get something good enough, and that's kind of where that minimum bar is that doesn't absolutely

40:28

consume my battery life. You know, you open your computer up and it's like, hey, I was working in the background seeing if anything was happening.

Speaker 1

40:35

No, thank you. Yeah. Yeah. And there's been an argument now that you can jack up a PC enough with a couple of those big GPUs and run a mid-size LLM on it. So, you know, certainly, I've had conversations with folks where it's like, I am not prepared to send any of this data to the cloud. What can I do one hundred percent local? Yeah.

Speaker 4

40:56

Another thing that you do have to consider if you're going to get into building those apps, especially local apps, is the idea of multimodal. Yeah, these models, these local models, at least the Windows AI APIs, are not multimodal, so you will have to.

Speaker 2

41:11

In other words, you can't talk to them and write to them exactly. Is that what you mean?

Speaker 1

41:16

Right?

Speaker 4

41:16

So you're going to have to build that. I mean, you could, but you're going to have to put a speech recognition model in front of the LLM, or an object detection model plus an OCR model plus, you know, you have to maybe chain these models together, and then you can get that multimodal experience where you can drop images, you can put PDFs in, but you have to be able to read the PDF. So these LLMs don't read PDFs by default locally. You do have to get them

41:43

into a text format. So if you're thinking about how you can apply this into your work, and I know a lot of enterprises, a lot of companies, a lot of their data is not in raw text format, so you do have to get it there.
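The chaining Joe describes can be sketched in Python roughly as follows. Every function here is a hypothetical stand-in: transcribe_audio, extract_pdf_text, and run_llm are stubs invented for illustration, not calls from the Windows AI APIs or any real library. The sketch only shows the shape of the glue: convert each input to text first, then hand the combined text to a text-only model.

```python
def transcribe_audio(path):
    """Hypothetical stub for a local speech recognition model."""
    return f"[transcript of {path}]"


def extract_pdf_text(path):
    """Hypothetical stub for a PDF text extractor or OCR model."""
    return f"[text extracted from {path}]"


def run_llm(prompt):
    """Hypothetical stub for a local, text-only language model."""
    return f"LLM response to:\n{prompt}"


# Map file extensions to the model or tool that turns them into text.
CONVERTERS = {
    ".wav": transcribe_audio,
    ".pdf": extract_pdf_text,
}


def to_text(path):
    """Convert one attachment to plain text using the matching converter."""
    for suffix, convert in CONVERTERS.items():
        if path.lower().endswith(suffix):
            return convert(path)
    raise ValueError(f"No converter registered for {path}")


def ask(question, attachments):
    """Chain the converters in front of the LLM to mimic a multimodal app."""
    context = "\n".join(to_text(p) for p in attachments)
    return run_llm(f"{context}\n\nQuestion: {question}")


print(ask("Summarize these.", ["memo.wav", "report.pdf"]))
```

Swapping a stub for a real model changes one function, which is the appeal of keeping each stage behind a plain text-in, text-out boundary.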

Speaker 1

41:56

Yeah, but there's an MCP for PDFs. So you know, glue these bits together.

Speaker 4

42:02

Right, yep, but you will have to do the gluing. Some assembly required.

Speaker 1

42:05

This is the job, right, Like, this is not just an app you run, but we are assembling parts to try and get to a place where a model could be built.

Speaker 2

42:15

So if you were going to build a local LLM yourself, Joe, using some existing technology, would you first reach for DeepSeek, or would you go for just the stuff that Microsoft is exposing in Windows?

Speaker 4

42:31

Yeah, I'd just reach for the stuff that Microsoft is exposing in Windows and their Phi model. It's pretty good, it's pretty robust, and I would say it's a nice middle ground there for building on top of and fine-tuning. I don't have enough time to be building all these applications and learning the APIs and learning the political history of where all these models come from. So the benefit of Microsoft as a software provider is it's the one throat to choke, right, this

43:06

is the one person you go to. They provide a lot of the tooling, they provide a lot of the models. Is it the best in the world, the absolute best? No. But when you're doing a lot of different stuff, sometimes you just have to have some heuristics here and just make the decision. There's an infinite number of decisions that you have to make when you're picking all of these. So starting just with the built-in tools, the built-in APIs, it's a great, easy

43:32

way to get started. And if they don't work for you, then you can start making other questions and decisions. And yeah, but I would say start with the built in stuff definitely at first.

Speaker 1

43:44

Okay, yeah, here, I knew I'd read this. I just looked it up again. gpt-oss is a version of GPT-3 that can be run locally on a machine with sixty-four gigs of RAM and a 5090 with twenty-five gigs of VRAM. So that's roughly a six- or seven-thousand-dollar PC, somewhere in that neighborhood, depending on how much you pay for the video card. The video cards can be driving around it. But that's running, you know, GPT-3, which is what the original

44:14

GitHub Copilot was built on. Again, like, that's a pretty torquey, pretty good little LLM, one hundred and twenty billion parameters. Like, it's not GPT.

Speaker 2

44:23

Four, but.

Speaker 1

44:25

Especially in a narrow-scope application, like a known set of code, that's pretty robust. Man, you could do a lot with that.

Speaker 4

44:33

Yeah, you could do a lot with that. And also you have to consider the big question of why would you build local ever, you know, why do it at all? Obviously privacy is a concern for a lot of people of why would you do this stuff locally on your own computer? If you have network concerns, if you don't have reliable or high quality or high speed internet, then

44:52

obviously this is the only solution for you. But then also there's the cost concern and the cost question of, yeah, you don't necessarily want to make some code that runs out and is running all these LLMs, and then you come back with a bill for, you know, thousands or tens of thousands of dollars because your credits went crazy. Right. But when you have it local, again, try it. There's so many cool tools: the AI Dev Gallery, the AI Toolkit, and

45:20

then there's the APIs available already today. There's so many ways to get started and try and see, you know, what is your application, what could it be? Try it out, because you might not have to sign up and get an API key at all. You could do all this stuff locally. And then if you want to do batch processing of, again, your own data, maybe you want to kind of use these models to put the data into a particular shape, or clean it, or work through it. But you don't want to pay tokens to do all

45:49

that work. Well, do it locally, do it overnight. Build an app, your own app, not something you ship necessarily, but do it locally, you know, process that data locally, and then go from there. Maybe you're going to build your model, but first you have to get all the data in the right shape.

Speaker 1

46:02

Right, and you're trading time for money, right? That's essentially the game you're playing here. It's like, okay, if I run it on the cloud, it's going to cost me more, but I get it done in less time, or I'm restricted to my own hardware, so it may take longer. And then you start, you know, doing the economics. So, just looking up the high end. Yeah, the ninety-six-gig Nvidia RTX Pro 6000 Blackwell, that's the big box: twelve thousand.

Speaker 2

46:28

Well, you know, it's not only the money, but as Joe said, the security and the privacy that may trump any kind of money, and you know, and that may be the requirement you know.

Speaker 1

46:39

Sorry, that was Canadian dollars. Just nine thousand American.

Speaker 4

46:42

Ah, well that totally changes.

Speaker 1

46:45

Yeah, everything's different. Now he just saved me two, three grand. But again, if I'm playing that game of the cost benefit, like, what am I spending on tokens at that scale? True. And I really get the sense that as this sort of bubble starts to burst and people need to make money, like, tokens ain't getting cheaper. Nope.

Speaker 4

47:09

Yeah, I have been using Claude and Codex and Copilot. There's definitely times where I have three computers running and I'm just kind of telling them to keep going. They're checking and building, but it's never going to be cheaper than it is now. Like, this is the cheapest it's going to be. They're trying to get as many users as possible, but that floor has to rise.

47:35

I mean, I know Anthropic was having some issues a couple of weeks ago with limits and quality, and Codex, I think, had something a month or so ago with the limits. And again, if you're relying on these cloud services, not only are you relying on them to stay up and your connection to them to stay live, but you're also relying on the model and the pricing and the availability, from a business standpoint, for them to stay up. Because it might make sense today. I was.

Speaker 1

48:03

Talking to some folks abroad that are big, like, running five, six, seven simultaneous instances because they're working that fast, right, tuned models, reaching these things, and they said that over July fourth everything got dramatically faster. Like, they got a ton of work going July fourth because Americans weren't working. Like, these cloud infrastructures are stressed to the limit and slowing performance as it is, right, and the proof we've had is

48:30

like, when the stress isn't as high, things are better. So there is this interesting argument about at what point does this make more sense to be local versus remote? And this is going to be a shared resource too. Like, these big boxes don't have to be per dev. They could be shared out, again with potential performance issues. Like, well, of course, I'm such a hardware geek, like, I'd love to build out a rack of this stuff.

Speaker 2

48:52

It would be fun, wouldn't it.

Speaker 1

48:53

It would be and you know, and then now I've got the heat and power problems right.

Speaker 4

49:00

To live it firsthand. Well, to your point about shared resources, that is one of the nice things about WinML that just released, the execution provider that Microsoft announced, making it easier for local devs to integrate models. If you have an application and you need a model, do you download it? And then every single one of your applications is downloading a five-gig LLM. Yeah, obviously that becomes untenable very quickly unless you have that twenty-two terabyte

49:31

drive in your computer. So, yeah, yeah, more than one. It does allow you to share models across applications, so you can have one machine install.

Speaker 2

49:41

Richard, you were right. I thought they were SSDs.

Speaker 1

49:44

They're not. They're HDDs. There are a few SSDs over eight terabytes, but for most of them the line seems to be eight. By the way, the RTX 6000: six hundred watts each.

Speaker 2

49:56

That's why I have solar panels.

Speaker 1

49:58

Yeah, that's it, you know, like, oh boy. I'm just thinking about how much, you remember, in the end, this is moving electrons around and generating heat. Like, you just made rocks make heat. Like that's saying time watts. You're gonna feel it. You don't want to sit in the room with that thing only man, No, it's going to be crazy. But it is an interesting point of view as we're still going through this to say, what are we going to shift local? What are we going to

50:21

run remote? Like, what's feasible and what makes sense for folks here? And I think, you know, not everything has to be cloud and not everybody wants it there.

Speaker 4

50:30

Right, and I think you just have to be, you know, wide. I'm not saying to get super deep on all of this stuff, but the tools for you to get your feet wet are available, and when your CTO or, probably more likely, your CFO comes to you and says, hey, we can't afford this bill anymore. Your critical application can't use this LLM. You have to stop, or you have to change something because either somebody's prices went up or the business model changed.

Speaker 1

50:58

Yeah, what are you going to do?

Speaker 4

50:58

What are you going to reach for? And getting your feet wet in some of these local models, it's a great way to have an answer or have some sort of solution or see if that solution will work.

Speaker 1

51:08

Now you're swapping OpEx for CapEx and then, you know, using CFO speak. Like, we have two ways to solve this problem. We spend month over month on it, or we make a capital investment and spend less. You know, let's do the math. You know, if you want to talk to a CFO, bring a spreadsheet.
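The "let's do the math" step can be sketched as a simple break-even calculation in Python. The nine-thousand-dollar hardware figure echoes the card price quoted earlier in the conversation; the monthly cloud spend and the monthly local running cost are hypothetical numbers chosen only to show the arithmetic, and the sketch ignores depreciation, staff time, and performance differences.

```python
def breakeven_months(hardware_cost, monthly_cloud_cost, monthly_local_cost=0.0):
    """Months until a one-time hardware purchase pays for itself.

    Returns None when running locally does not save money each month,
    in which case the purchase never breaks even.
    """
    monthly_saving = monthly_cloud_cost - monthly_local_cost
    if monthly_saving <= 0:
        return None
    return hardware_cost / monthly_saving


# Hypothetical scenario: a 9,000 dollar card, 1,500 dollars a month in
# cloud tokens, and 100 dollars a month in power to run it locally.
months = breakeven_months(9000, 1500, 100)
print(f"Break-even after about {months:.1f} months")

# If the cloud bill is smaller than the local running cost, there is no payoff.
print(breakeven_months(9000, 100, 200))
```

Plugging in real token spend and power costs is the spreadsheet part; the function only frames the OpEx versus CapEx comparison.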

Speaker 4

51:24

Yeah, exactly. And it's as as we've said, as you've said, stuff is changing so fast. So if you get super deep, if you start training your own model, and then tomorrow somebody comes out with a model that just makes all that effort useless. This is again, this is like the sweet spot, right, Isn't this where the Windows developer has kind of always loved to live where they're like, yeah, yeah. Yeah, we're not like hardware level, we're not doing machine code.

51:51

But then we're also not just like bleeding like the best of the best. It's like, okay, we're in the middle here where we got models, we got a local It's it's efficient, it's it's a good balance.

Speaker 1

52:01

Yeah. Well, and I'm going to call back to Kaggle again, because one of the other ways you can get a model built is to put out a bounty on Kaggle in a competition to have someone build it for you, effectively. There you go. So you've got the data set, but you don't want to actually do the construction. You can host a competition and define your problem space and provide the sample data, and a bunch of people compete

52:25

to deliver you the best model. It's a weird world, man. It's like, if you want to go deep into ML, there's so many interesting things to be done here. Mm-hmm.

Speaker 2

52:33

I had the weird meta thought that you could get a model to build your model instead of you know, farming it out for a bounty.

Speaker 1

52:42

Well, you're not wrong to interact with an LLM to start constructing a plan around how a model would get built, because, you know, in the end, they are a pretty clever search tool for best practices.

Speaker 4

52:53

Yeah, search and tokenization is a really nice thing that you can do with your local LLM: crunching some of your data, your text, tokenizing it, making it easier to search, having that more natural language available for your users. It's a really hard thing to code, but if you have local LLMs, they can help you build that.

Speaker 1

53:13

Why not. Yeah, that's cool.

Speaker 2

53:15

Anything else on your mind that you want to touch on before we call it a show?

Speaker 4

53:20

Not really, I mean we touched on a lot here. Yeah, we just try it.

Speaker 1

53:23

We went we went on a ride today friend again. Yeah, but this is the kind of deep.

Speaker 2

53:28

Dive into local LLMs and local AI that I really wanted to get to. So I'm very, very happy we talked. Thank you, Joe.

Speaker 4

53:36

Yeah, happy to be here.

Speaker 2

53:37

All right, and we'll.

Speaker 5

53:38

Talk to you next time on .NET Rocks.

Speaker 2

54:03

.NET Rocks is brought to you by Franklins.Net and produced by PWOP Studios, a full-service audio, video and post-production facility located physically in New London, Connecticut, and of course in the cloud online at pwop.com. Visit our website at dotnetrocks.com for RSS feeds, downloads, mobile apps, comments, and access to the full archives going back to show number one, recorded in September two thousand and two. And make sure you check out our sponsors.

54:35

They keep us in business. Now go write some code, See you next time.

Speaker 4

54:40

You got jas.

Transcript source: Provided by creator in RSS feed