Machine Learning on Geospatial Data with Malte Loller-Anderson & Mathilde Ørstavik

Speaker 1

00:01

How'd you like to listen to .NET Rocks with no ads? Easy: become a patron. For just five dollars a month, you get access to a private RSS feed where all the shows have no ads. Twenty dollars a month will get you that and a special .NET Rocks patron mug. Sign up now at patreon.dotnetrocks.com. Hey, Carl and Richard here with your twenty twenty-four NDC schedule.

Speaker 2

00:26

We'll be at as many NDC conferences as possible this year, and you should consider attending no matter what. The Copenhagen Developers Festival happens August twenty-sixth through the thirtieth. Tickets at cphdevfest.com.

Speaker 1

00:39

NDC Porto is happening October fourteenth through the eighteenth. The early bird discount ends June fourteenth. Tickets at ndcporto.com.

Speaker 3

00:49

And we'll see you there, we hope.

Speaker 1

01:03

Hey, guess what? It's .NET Rocks. I'm Carl Franklin. And I'm Richard Campbell. And we're here again. Yes, we keep doing this. You know, people are gonna talk. Yeah, hopefully, that's the whole point. We talk about everything, not just .NET, but we're a podcast for .NET developers. That's the way we've always been.

Speaker 2

01:22

Developers do a lot of stuff these days, it seems I think they always have.

Speaker 1

01:25

Yeah, I have a little bit of a story and a product to talk about for Better Know a Framework. So roll the music, man.

Speaker 2

01:41

What do you got?

Speaker 4

01:41

Well?

Speaker 1

01:41

I gave myself a birthday present this year.

Speaker 2

01:44

Happy birthday, Thank you.

Speaker 1

01:45

Happy birthday to you too, sir.

Speaker 2

01:47

Yeah right, it's about the same time, isn't it.

Speaker 1

01:48

Yeah, I replaced the stock in-dash stereo system in my twenty twelve Honda CR-V. Had it a long time? Yeah, but it's a great car. But I replaced it with something new that has Apple CarPlay and all that stuff. And I did a lot of research and I found the best bang for the buck was about five hundred and forty-nine bucks. I got it at Best Buy. It's a Pioneer AVH-2550NEX. All right, so what is it?

Speaker 5

02:24

So?

Speaker 1

02:24

It does everything, right? But mostly it connects to your phone, and it will use either Android Auto, which is the Android version, or Apple CarPlay. Both are registered trademarks or whatever. It also does Bluetooth. But here's the thing: unlike most in-car dashes that have their own GPS, right, this one has radio and all that stuff that you would imagine. But Apple CarPlay, in my case (because you're an iPhone user?)

02:56

does every... Yeah, because I'm an iPhone user. My brother has another one; it does Android Auto. But they both do the same thing: basically use your phone. So I use Waze to navigate and it shows right on the console.

Speaker 2

03:09

And it's better in every way.

Speaker 1

03:10

It's better in every way than the standard stuff that comes in a car, certainly, and even better than Google Maps.

Speaker 2

03:15

I think, well, it's Google Maps actually under the hood. Yeah, they're owned by Google, but you get the

Speaker 1

03:21

Sort of reports of you know, traffic and police stuff.

Speaker 2

03:26

Which apparently police watch closely and then just remove themselves whenever somebody identifies the speed trap to go Nope, that's speed trap's gone.

Speaker 1

03:32

Well, let me tell you, I drove to Pennsylvania over the weekend and back, a six-hour trip, and it worked like a dream.

Speaker 2

03:39

Yeah. Kept out of the bad traffic.

Speaker 1

03:41

Yeah yeah, and yeah, except for the bad traffic, which you know it told me about. It was too late, but it could have re routed me.

Speaker 2

03:48

But often it'll tell you and say it's still the best route.

Speaker 1

03:50

Yes, it's still the best route exactly, and most of it was just caused by bad weather.

Speaker 2

03:54

So really, this device, this Pioneer device, is a big screen, relatively speaking, in your car that ties to your phone so that you don't have to look at your phone to do nav, right? Right.

Speaker 1

04:04

So you can use you can listen to anything on your phone that you.

Speaker 2

04:08

Have access to, Spotify and all.

Speaker 1

04:10

Yep. And before, I was connecting audio with Bluetooth, right, and there's a delay with Bluetooth. But this one has a USB jack, a female jack that comes right out of the dashboard, and you plug your phone right into it, so there's no delay whenever you pause something.

Speaker 2

04:27

Or you want to say you're charging, so you know.

Speaker 1

04:30

Charging, skipping forward, skipping ahead.

Speaker 2

04:32

Well, because the other reality is you're running the GPS sensors the whole time because it's actually doing the NAV work. So it's going to mow your battery unless it's plugged in.

Speaker 1

04:39

Yep, and it's plugged in and I had no problem with that. And also my car came with a little voice button on the on the steering wheel, but it really didn't do anything.

Speaker 2

04:51

It's the old voice, old old boy.

Speaker 1

04:53

But this connects directly to it, so I can essentially any time I want, I can say, call this person, text message this person.

Speaker 2

05:02

And it's now sending that to the phone.

Speaker 1

05:03

Sending it to the phone. Right, I can send text messages or whatever. I can skip ahead if I don't like the ads of my podcast, Hey, should I have said that?

Speaker 2

05:12

You shouldn't say that. No, have you noticed what your business is like?

Speaker 1

05:15

Come on, but you know, I've heard these same ads a million times, and I skip through. I just had a great experience. Now, I will tell you a gotcha. And I'm sorry, guys, I'm taking up too much time with this, but there was a gotcha. When I first set out, I was using an existing Lightning cable and I was getting audio dropouts, and it was the cable. It was the cable. I stopped at a mall, I went to the Apple Store. I got an official Apple iOS

Speaker 2

05:43

You spent sixty bucks on a cable and it worked. It fixed it. You know, if you actually watch YouTube on disassembling some of those sixty-dollar cables, there's a lot of stuff in those cables. They're not just wires, right? So it didn't work with a third-party cable, but

Speaker 1

06:00

It did with the iOS cable. Okay, all right, who's talking to us?

Speaker 2

06:04

Richard? Grabbed a comment off of show eighteen ninety-nine, back in May of twenty twenty-four, when we talked to our friend Aaron Erickson, who hadn't been on the show in a very long time, the nomadic developer, who we now call the nomadic AI developer because why not, all right? And it was part of a sort of two-show set, because we also did that show with Shawn Wildermuth talking about being in a computing career later in your life,

06:25

which elicited a huge response. Oh yeah, that was great. And you know, Shawn, I think, expressed a lot of worries that all of us had. And then in comes Aaron, who's like, hey, I've totally retooled my career again in my late fifties and I'm now working for Nvidia, all in on the new technology, which I thought was just an interesting balance, too. And this is a comment from Trev, who says: Hey, I loved the conversation today. Aaron has had an amazing career and

06:52

I would suggest that Nvidia are super lucky to have him on board. The conversation on generative AI touched on the travel industry, and I thought that I would share something from a presentation I attended last week. A New Zealand travel agency, who during the pandemic lost something like eighty percent of its value and sixty percent of its staff, has a terrible problem. Now they can't hire travel agents, because during the pandemic they all retrained in other careers

07:16

and they're not coming back. So they've now turned to large language models and new automation to solve this problem. And obviously Trev has used the service, because he said: normally, if you want an itinerary for a trip, it would take the travel agent a couple of hours to put it together. Now I get it in two minutes, and it's based on all my wants and wishes, all the little details

07:36

and so forth. And that includes confirming the hotels, setting up all the flights, optimizing to my flight needs, your time, your budget, what stopovers. It's a perfect illustration of where these technologies can transform the industry. And thanks so much for the show. Awesome. Yeah, you know, what we're dealing with is a double whammy, isn't it? The pandemic definitely tore up a bunch of industries, and then we have

08:01

this new technology that gives us a better interface. I mean, you've always been able to build your own travel itinerary. It's just, you know, I still do it this way, where I've got TripIt open over here, and I've got various websites for airlines there and different hotels. But the fact that you could pull that all together with software, using a modern travel agency, and it costs the agency less to deliver you faster service, like, that's what automation is all about.

Speaker 1

08:27

I know, a whole bunch of office space that is currently available.

Speaker 2

08:31

Mm. Well, a lot of people are, you know, still working from home, and it's not going to change. Or at least it seems not to. So, Trev, thank you so much for your comment, and a copy of Music to Code By is on its way to you. And if you'd like a copy of Music to Code By, write a comment on the website at dotnetrocks.com or on the Facebooks. We publish every show there, and if you comment there and I read it on the show, we'll send you a copy of Music to Code By. And

Speaker 1

08:50

Music to Code By is still going strong after all these years. A great way to stay focused when you're writing code.

Speaker 2

08:57

I got a nice note from a listener who I had sent the code to so that they could get a copy, who's like, this changes the way I work.

Speaker 1

09:03

Yeah, it is good, you know.

Speaker 2

09:05

Yeah?

Speaker 1

09:06

Okay. Let us introduce our guests today, who've never been on the show before, so I'm really looking forward to talking to them. Malte Loller-Anderson is a passionate machine learning engineer working for Norkart, operating in the geospatial domain. And Mathilde Ørstavik has a master's degree in geographical IT with a specialization in geographical AI. You can read their full bios at dotnetrocks.com, of course, but that'll get us started anyway. Welcome, guys. Thank you, thank you,

09:39

thank you. All right, so that first voice you heard was Mathilde, the second one was Malte, just to identify for our listeners. So who wants to start with the elevator pitch? What are you guys doing out there?

Speaker 5

09:50

Do you want me to take the first one? I can do that. Yes, thank you. Well, Norkart is a Norwegian IT company at this point. I guess historically we did a lot of maps and stuff like that, but we have more and more pivoted into doing IT, and we deliver a lot of different products within the geospatial domain. We have exactly three point fourteen, or pi, products per developer.

Speaker 2

10:26

Great, so if you add new products, you have to hire more developers to keep the ratio correct. Is that the plan? Exactly.

Speaker 4

10:33

Yeah, yeah, every developer that gets hired has to bring in three point fourteen products.

Speaker 2

10:38

So great. Oh my goodness. So you're using primarily aerial data. I guess there's some satellite, some aircraft. Of course, drones have changed this landscape as well. But then turning it into usable imagery: it's one thing to have a bunch of photos, it's another thing to turn it into something that's really usable. That's got to be pretty software intensive.

Speaker 4

11:01

Yeah, definitely. We're mostly using different aerial images and creating servers that can handle 3D tiles as well as the aerial images and vector data, which we can then build into different software. So we're mostly using the different maps more than creating them.

Speaker 2

11:27

Okay, So yeah, somebody else is involved in collecting the data. Now you're trying to get value.

Speaker 4

11:31

From it exactly.

Speaker 2

11:33

Yeah, well, how long has that been a problem in our industry? We collect lots of data, we just don't do anything with it. So where how does machine learning come into the equation.

Speaker 4

11:41

So the first thing we started to test was how we could improve the vector data based on what is actually visible in the aerial images. So traditionally what you do is that you look at the aerial images manually and then you draw the maps from those images, which is a manual process, very time consuming, and of course it's prone to errors because it's easy to not detect everything.

Speaker 2

12:09

So using the human to do the pattern recognition to say that's an image of this area, here's how it lays together.

Speaker 4

12:15

Yeah, this is a building. This is its walls, right, this is the details of the buildings and so on, and roads and everything. So we started testing if we could use machine learning to detect the buildings for us and also different types of objects.

Speaker 1

12:32

That must be a lot of work job security for a lot of people.

Speaker 4

12:36

Yeah, definitely. So we started testing this back in, I think it was twenty sixteen, which is when image recognition got really good, and it worked pretty well, and then we've continued developing these methods until now, more or less.

Speaker 1

12:54

Okay, when you were describing what you use, this image data, and that you go and get it: don't these databases exist? We were talking about Google Maps, and that certainly is a rich database of topographical imagery. Is it something that you can't use for copyright reasons, or do you need to do your own imaging in order to get the level of detail? Like, why is it that you can't use something off the shelf?

Speaker 4

13:22

Yeah, level of detail is definitely a big part of it, and ownership and costs also. So in Norway, all the municipalities have high-resolution aerial images for their areas that they update maybe every year for the big municipalities, maybe every fourth year for the smaller ones. But it's very high resolution, and the level of detail in the

13:49

Norwegian maps is incredibly high. They're very detailed and they're in a Norwegian-specific format, so the level of detail in Norwegian maps is way higher than in Google Maps.

Speaker 1

14:04

Yes, so that's what you're using using the Norwegian data maps that already exist, so you're not actually going out and flying drones over buildings and taking pictures yourself or whatever.

Speaker 4

14:16

Yeah, exactly. So originally, back in nineteen sixty-one and so on, Norkart did produce aerial images, right, and then they became a software company. I see.

Speaker 2

14:28

Like everybody else. Yeah, but I guess you're I mean, your goal is not to do navigation. I imagine you're doing other things with the data.

Speaker 4

14:36

Yeah, mostly other things. We also do some navigation, but mostly other things.

Speaker 2

14:42

Yes, So what kind of what do people want this data for?

Speaker 5

14:45

Like?

Speaker 2

14:45

What are they asking for?

Speaker 5

14:46

Oh?

Speaker 4

14:46

So many things. It can be visual, to see other types of data together with aerial images. But when you're talking about the vector data in the vector maps, it's for analysis.

Speaker 5

15:03

Planning of infrastructure. You know, taxation: in Norway, at least, you get taxed based on what buildings you have on your property, so it's really important to know. Or if a fire starts, it's really important for the firefighters to know what buildings exist where, you know, all that kind of stuff.

Speaker 2

15:22

So yeah, and it needs to be regularly updated. So you've got the municipalities, the big ones at least, scanning every year. That also introduces an angle of how the land is evolving year over year. Certainly, thinking of British Columbia, which I often equate to Norway in some respects, the really big thing is tree coverage, right? Between logging, forest fires, and regrowth, you have to go and image those areas to really understand what's the state of the land.

Speaker 5

15:53

Yeah, yeah, for sure. And we've seen others also that have done exactly that in Norway as well, trying to map out what trees are where.

Speaker 2

16:03

Yeah yeah.

Speaker 1

16:05

Where does development come in? Are you doing mostly Python programming against this data or are you using any cloud computing? What are you using?

Speaker 5

16:15

Yeah, we're mostly using Python, like everybody else in machine learning, at least for this project, and this is mostly trained in our own data center. So we have two L40 GPUs with sixty-four gigabytes of space each, and these are actually big enough that, you know, you get some really good results trying to identify objects from aerial images.

Speaker 2

16:51

Oh wow, okay, but so not even using the cloud. You're running your own You guys really are old school AI.

Speaker 1

16:59

Yeah, yeah, we're running our own.

Speaker 5

17:01

But we've also tried the cloud experimented a bit with that, but we have we figured out that for this project. We have other projects as well, of course, but for this project we're using our own data center yet and.

Speaker 2

17:14

So an L40, and I'll include a link in the show notes if folks want to take a look at these things. I mean, it looks like a little computer, essentially. Looks like it should be in a data center. Like, is it anything more than just an RTX 4090 in a chassis?

Speaker 5

17:34

I don't remember the exact specs of the 4090. Does it have sixty-four gigabytes

Speaker 2

17:41

Of I don't think so.

Speaker 5

17:44

I think that's the biggest part of it, at least that we have that we have that.

Speaker 2

17:49

It's just a sheer amount of memory.

Speaker 5

17:51

Yeah, yeah, so you can have more images in memory at the same time.

Speaker 2

17:55

And that's, yeah, that's got to be what it's all about, really. It's just, how much RAM do you have available to you to be able to do that image analysis? Yeah, for sure. I'm just looking it up. Yeah, twenty-four gigs on a 4090 and forty-eight on an L40. But you know, pipeline-wise, they're not that different, but they're still different. And yeah, PCIe interface, and three hundred watts, so it'll keep a room warm.

Speaker 5

18:22

Oh yeah, you need some cooling for that.

Speaker 2

18:24

Oh yeah, no kidding, but absolutely So you use a pair of these.

Speaker 5

18:28

Yeah, we have two.

Speaker 2

18:29

That's nice. We have two.

Speaker 5

18:31

But they're sixty-four gigabytes of VRAM, not forty-eight.

Speaker 2

18:34

Yeah, so maybe a newer model.

Speaker 5

18:37

Maybe, I'm not sure.

Speaker 2

18:38

Yeah, that's cool. I gotta find out how much these things cost. Not that I need rack gear anymore. I'm doing everything I can not to buy any more rack-related equipment. Please, please, please stop, like, don't do it. Yeah, five thousand bucks a crack, man.

Speaker 5

18:54

Five thousand.

Speaker 3

18:55

Yeah.

Speaker 5

18:56

Do you remember how much we paid, Mathilde? I don't remember. Oh, I don't remember either.

Speaker 2

19:03

It would have been in kroner anyway, so nobody would.

Speaker 4

19:08

All I know is that it was cheaper than doing it in the cloud.

Speaker 2

19:11

Oh wow.

Speaker 5

19:13

Yeah.

Speaker 2

19:13

There's a lot of testing, right because the cloud is only dinging you by the minute, So at some point you have to do the projection and go, hey, if we just buy this, we're good for a certain amount of time. I mean. The upside of using the cloud is they're going to upgrade their hardware for you. Now, you guys are on the hook for you know, X many years, probably four or five years of amortization over this stuff to make it make make sense. But I

19:36

get it, totally makes sense. It's very it's reasonable.

Speaker 1

19:40

I'm interested in what other uh, you know, applications that we have, and one that came to mind is law enforcement. Does law enforcement use your data in any way that you can talk about?

Speaker 4

19:55

Good question. I don't think so, but I'm not completely sure, because we have a lot of customers that use our data that we don't necessarily know about.

Speaker 2

20:04

Right, So, how do you train a model to figure out that that's a building and that's a wall.

Speaker 4

20:09

Yeah. So the really great thing about this is that since we already have the vector maps, people have already done the manual work looking at these images. Yeah, we can just use the existing data that we have and we can produce enormous amounts of training data automatically. So

20:28

that's the really great thing about these models. So what we do is that we take aerial images of a small area, around 512 by 512 pixels, and we get the existing map data for that same area, and we can combine those to produce the input image and the label image. And then we can basically produce as much data as we want, because we have data for all of Norway, and we also have historical data, so we can combine this and then train our data

21:01

or train our models with that automatically produced data.
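
The pairing described here, where existing vector footprints become the label image for an aerial tile, can be sketched roughly as follows. This is a minimal illustration, not Norkart's pipeline: the function names (`rasterize_polygon`, `make_training_pair`), the pixel-coordinate polygon format, and the even-odd rasterization rule are all assumptions made for the example.

```python
import numpy as np

TILE = 512  # tile edge in pixels, as mentioned in the conversation


def rasterize_polygon(polygon, size=TILE):
    """Return a size x size uint8 mask, 1 where a pixel centre lies inside the polygon.

    polygon: list of (x, y) vertices in pixel coordinates (hypothetical input
    format; real vector data would first be projected onto the tile's pixel grid).
    """
    ys, xs = np.mgrid[0:size, 0:size]
    px = xs + 0.5  # x coordinate of each pixel centre
    py = ys + 0.5  # y coordinate of each pixel centre
    inside = np.zeros((size, size), dtype=bool)
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        if y0 == y1:
            continue  # a horizontal edge never crosses a horizontal ray
        # Even-odd rule: toggle pixels whose rightward ray crosses this edge.
        crosses = (y0 > py) != (y1 > py)
        x_at_y = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
        inside ^= crosses & (px < x_at_y)
    return inside.astype(np.uint8)


def make_training_pair(image, building_polygons):
    """Pair a square aerial tile with a label mask built from existing footprints."""
    label = np.zeros(image.shape[:2], dtype=np.uint8)
    for poly in building_polygons:
        label |= rasterize_polygon(poly, size=image.shape[0])
    return image, label
```

Because the label comes entirely from map data that already exists, every tile with map coverage can become a training example without anyone drawing anything new.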

Speaker 2

21:05

Okay, and so, I mean, I think two things. One is you're just saving time on a new set of data, to say yes, that's the same building from last time. If it doesn't identify it, maybe that's because the building has changed

Speaker 5

21:15

In some way exactly.

Speaker 2

21:17

I wonder how sensitive it would be, Like if a building puts solar panels on its roof, so now it looks a little different. Is it still able to map it or does it pull it up as I don't know what this is.

Speaker 4

21:28

Yeah, it depends on what is in the training data. So if there were some new kind of solar panels popping up in Norway, our model would most likely not detect it. Right. If it's solar panels similar to ones we already have, we would most likely detect it.

Speaker 2

21:43

So is your vector data mapped down to that level, that says that's not only a building, that's a building with solar panels on the roof?

Speaker 4

21:47

If the solar panels are in the training data, we can train a model with that. Yeah, so we have tested that as well, but then we need to specifically have the training data with solar panels.

Speaker 2

21:57

Right.

Speaker 1

21:58

And when you're drawing these things by hand, you're doing these sort of annotations, I guess you would say. Really, are you just drawing lines and saying this is how long this is, you know, how wide and how high, widths and heights, three-dimensional data, I guess you would say? Or how detailed do those hand drawings get?

Speaker 4

22:20

Well, we're not actually doing the drawing since the this this has already been done by people specialized in this field in different companies.

Speaker 1

22:30

Oh okay. I was under the impression when you first spoke about it that you had a team of people that were drawing on top of manually drawing on top of images.

Speaker 4

22:39

But yeah, I know different companies in Norway do this.

Speaker 2

22:42

So you're just pure data and it's done now, so you've trained the model.

Speaker 5

22:46

Yeah, that's what we're trying to automate, this manual drawing process.

Speaker 2

22:50

Right, So each time new images come in you're able to then use the machine learning model to establish that this is the same image of the same area with these same buildings.

Speaker 1

23:00

Exactly. How often does the data change?

Speaker 4

23:03

It depends on the municipality, how often they buy new images, but it can be up to every year. So when there are new images, we can run our model, detect all the buildings in the municipality, and then compare those results to the existing buildings, which means that we can figure out where there have been changes. Are there new buildings popping up, or has

23:30

any of the buildings changed? And that data set is very interesting for the municipalities, because then they can figure out where their maps are not consistent with the reality visible in the images.
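
A rough sketch of that comparison step, assuming both the model output and the existing map have been rendered as binary masks over the same tile. The function name `change_summary`, the `min_pixels` threshold, and the pixel-level comparison itself are assumptions for illustration; the conversation does not say how the comparison is actually implemented.

```python
import numpy as np


def change_summary(predicted, existing, min_pixels=50):
    """Compare a predicted building mask with the mask rendered from the existing map.

    min_pixels is a hypothetical noise threshold: smaller differences are ignored.
    """
    predicted = predicted.astype(bool)
    existing = existing.astype(bool)
    candidate_new = predicted & ~existing    # detected in the image, absent from the map
    candidate_gone = existing & ~predicted   # present in the map, not detected
    union = int((predicted | existing).sum())
    overlap = int((predicted & existing).sum())
    iou = overlap / union if union else 1.0  # two empty masks agree perfectly
    new_pixels = int(candidate_new.sum())
    missing_pixels = int(candidate_gone.sum())
    return {
        "iou": iou,
        "new_pixels": new_pixels,
        "missing_pixels": missing_pixels,
        "needs_review": new_pixels >= min_pixels or missing_pixels >= min_pixels,
    }
```

Tiles flagged with `needs_review` would be the ones worth sending to a person, which is the "where are the maps out of date" data set mentioned above.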

Speaker 2

23:43

What about things like seasonality, like how good is this going to map a picture from the summer versus a picture in the winter.

Speaker 4

23:49

Yeah, good question. So the companies that take the aerial images, they usually take the images at the same

Speaker 2

24:01

Time of year, right, So just avoid the problem by we only photograph these this area in July should be pretty consistent. And of course you need clear days because taking pictures in the rain questionable value.

Speaker 4

24:13

Yeah, so they have to wait for good weather.

Speaker 5

24:16

Yeah, and in Norway there's only kind of good weather, you know, in the spring and summer, so that's when all these pictures are taken.

Speaker 2

24:23

Been there. I've been there in June, I've been there in November, and there's a difference. Okay, so, I mean, we've definitely picked an area here of using the machine learning models to do the image recognition, so you reduce the cost of introducing new images to the system and then help the customer, in this case typically municipalities, identify new buildings. That's a great class of work.

24:50

Are there other classes of work here? Like, I would think things like, you know, are trees dying off, or, you know, agricultural changes, those kinds of impacts might be turned up this way too.

Speaker 4

25:01

Yeah. Absolutely, Yeah.

Speaker 5

25:02

But the main one you said was, you know, detecting buildings, and actually, you know, one of the municipalities in Norway, they actually detected a lot of new buildings so that their property tax could be lowered for everyone in that municipality, right, they could still get the same income.

Speaker 2

25:23

Because they recognized there were more buildings out there that they didn't know about that were responsible for tax. Exactly.

Speaker 4

25:29

So some had to pay more, but in general everyone could pay less.

Speaker 2

25:32

Right, yeah, the costs were already being borne, but not everybody was paying their fair share, effectively, right? Yeah, exactly. Like, from a municipal data perspective, this is really powerful, because otherwise you do this with surveyors. If you think mapping out aerial imagery is costly, send a group of people out to go measure everything by hand. Like, that takes a lot of time.

Speaker 5

25:52

Yeah, exactly. And also, some of the buildings, we can see that they haven't been updated in the vector data in a long time, but the machine learning model can actually detect those. So in some ways the models are also doing a better job.

Speaker 2

26:09

Right. That's interesting to see because they've got a better vantage point for all of this. So primarily a Python problem is just munging the data and doing the mapping together.

Speaker 5

26:19

Yeah, it's mostly or it's all Python this project. And what we do is that we use our GPUs of course, and then during training we will what can you call it like live sample all of the images from databases and thereby, you know, really take care of a lot of pictures at the same time. However, when you do this thing live. You also have to make sure that

26:49

you get the data that you want. For instance, in Norway, you know, it's Norway, it's, what is it, ninety-eight point three percent forest. So if you just pick random tiles from Norway, you're just going to get trees and

Speaker 2

27:07

Trees exactly.

Speaker 5

27:09

Yeah. So we have to be smart about how to pick out this data when we're doing it live and stuff. So, yeah, we have some rules that have to be set, so that some of the images have to contain buildings and stuff.

Speaker 4

27:20

Right. So this is the main thing that we've been working on: what is the optimal way to select data to get the best models?

Speaker 2

27:30

Right, really shaved down the part you don't have to you're not getting maps of the whole country. You're just getting maps of the interesting areas, the places where people are.

Speaker 4

27:39

Yeah, and training data for the interesting parts. Right, because if all the model sees are images of forest, of course it's just going to learn that nothing is a building, and it's going to be correct most of the time, yes, but it's not going to be useful.
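
The point about being "correct most of the time" but not useful can be shown with a tiny worked example. The three percent building share below is an illustrative number chosen for the sketch, not a figure from the conversation, and `accuracy_and_recall` is a hypothetical helper.

```python
import numpy as np


def accuracy_and_recall(pred, truth):
    """Pixel accuracy and building recall for two binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    accuracy = float((pred == truth).mean())
    positives = int(truth.sum())
    recall = float((pred & truth).sum() / positives) if positives else float("nan")
    return accuracy, recall


# Illustrative ground truth: 3 rows out of 100 are building pixels (3%).
truth = np.zeros((100, 100), dtype=bool)
truth[:3, :] = True

# A "model" that has learned that nothing is ever a building.
always_background = np.zeros_like(truth)

acc, rec = accuracy_and_recall(always_background, truth)
print(f"accuracy={acc:.2f}, building recall={rec:.2f}")
```

Accuracy looks excellent while recall on the class that matters is zero, which is why the choice of training tiles gets so much attention here.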

Speaker 2

27:54

How many buildings? None. Ninety-eight point three percent is correct right there. Yeah, it's pretty darn accurate, actually. Three point one four are buildings in Norway.

Speaker 5

28:07

Three point one four.

Speaker 2

28:12

All right, well we should break for a moment for these very important messages.

Speaker 1

28:17

You know, it's common for business applications to contain fifteen percent repetitive code just because of metaprogramming limitations in the C# language. Why write boilerplate manually when a machine could generate it for you? Enter Metalama, the code generation and verification toolkit for C#. Their C#-to-C# template language is simply amazing: logging, caching, memento, observable.

28:43

If it's repetitive, Metalama can automate it. Visit metalama.net today and learn to automate your code patterns with their free edition. Remember, it's Metalama with one L: M-E-T-A-L-A-M-A dot net. Hey, Carl here. If you're wondering how to step up your debugging game, Raygun's crash reporting now supports portable PDB and offline error storage, perfect for apps built with .NET MAUI or Windows Universal Platform. Just upload your PDBs and Raygun will enrich

29:18

your stack traces with crucial details. Never miss a beat in your development cycle again. Let Raygun take the hassle out of error tracking. Visit raygun.com/dotnetrocks, that's Raygun, R-A-Y-G-U-N dot com slash D-O-T-N-E-T-R-O-C-K-S, for your free fourteen-day trial.

Speaker 2

29:40

And we're back. It's .NET Rocks. I'm Richard Campbell. That's Carl Franklin. And we're talking to Mathilde and Malte about the challenges of geospatial data, as well as how machine learning is helping out there. Because, you know, most of the time when I'm dealing with machine learning models, we're trying to get as much diverse data as possible

29:59

for the training set, and you've just painted a really great case of where you don't want to do that, because there's a lot of data that's irrelevant here. Like, you really have to cut it down to just the populated areas, which is mostly coastal for Norway, if I remember my geography of Norway. But even that wouldn't necessarily be a good criterion. You just have to... You said it just before the break there, Mathilde: does it have

30:23

a building in it? Yeah, right, that's probably And I'm sure there's like wilderness huts out there somewhere that's like mostly forest, but there's a hut.

Speaker 4

30:31

Yeah, we did tests in the beginning. We just had a rule that said we only want training data if there is at least five percent building within that tile, right, which works really well if you want to focus on detecting buildings. But what happened is that whenever the model saw areas that weren't connected to a building, nature that is not typically surrounding buildings, it falsely detected buildings. White spots of snow, right, were detected as a building.
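
One way to picture the trade-off described here is a tile filter that keeps the five percent rule but also lets a share of building-free tiles through, so the model still sees snow and wilderness as negatives. This is only a sketch under assumptions: `BACKGROUND_SHARE`, the function names, and the idea of mixing in background tiles at a fixed rate are illustrative and not confirmed as the approach used.

```python
import random

import numpy as np

MIN_BUILDING_FRACTION = 0.05  # the "at least five percent building" rule from the conversation
BACKGROUND_SHARE = 0.2        # hypothetical: share of building-free tiles kept as negatives


def building_fraction(label):
    """Fraction of pixels in a label mask that are marked as building."""
    return float(np.count_nonzero(label)) / label.size


def keep_tile(label, rng):
    """Decide whether a candidate tile should be used as training data."""
    if building_fraction(label) >= MIN_BUILDING_FRACTION:
        return True  # building-rich tile: always keep
    # Otherwise keep only occasionally, so nature-only tiles are
    # represented without swamping the training set.
    return rng.random() < BACKGROUND_SHARE
```

A call such as `keep_tile(label, random.Random(0))` would sit inside the loop that samples tiles during training; tuning the background share is then one of the "rules" being experimented with.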

Speaker 2

31:03

Oops, like the face on Mars. Right, there's rock formations that could be building-ish. Like, yeah, they could be igloos, right, that's a thing.

Speaker 1

31:14

But okay, well, you know, it's cold, there's snow, there's ice.

Speaker 2

31:19

Yeah, somebody could make an igloo.

Speaker 1

31:20

I don't think it would be having a street address, though. No mailbox.

Speaker 2

31:26

I would also think that you've got lots of highways and things where there's not a lot around it for some stretch, like I've driven on those where it was just forest on both sides. I don't know that you need to map all the highways, but you know that would be something I'd want to discriminate for it. You're not going to get buildings from that or at.

Speaker 1

31:42

Least land and water, you know, unused land or unoccupied land and water bodies of water.

Speaker 2

31:49

Yeah, interesting, interesting set of problems. So then, the whole point here is to keep adding new data, because they do want to look at things over time. Do you end up building kind of time lapses to be able to look at a particular location and say, here are several years and how it's evolving?

Speaker 4

32:04

So what we've actually done is that we've created a system that creates training data while we're training the model. And what it does is that it just selects randomly within a set of rules. So we have some rules that create data for tiles with buildings. A tile is five twelve times five twelve pixels.

Speaker 2

32:28

Okay, but in space that, scale wise, could be anything.

Speaker 4

32:32

Then, yeah, so the images have a resolution of between ten and thirty centimeters per pixel, spatial resolution, and.

Speaker 2

32:44

Then you're looking at a five twelve by five twelve tile. I mean, that's just a big old jigsaw puzzle. Holy man. Like, do you have geospatial data associated with that shot? Right? You know roughly where it was taken?

Speaker 4

32:56

Yes, exactly, So we have the coordinate for the lower left corner. You have the spatial resolution, which means that we can calculate where in the real world we are. So when we can detect the building within that image, we can figure out where that building is.
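
The calculation described here, from a lower left corner and a spatial resolution to a real-world position, could look roughly like this. It is a sketch under stated assumptions: coordinates are taken to be in a projected system measured in meters, image rows are assumed to count downward from the top edge, and the function names and example coordinates are hypothetical.

```python
def tile_extent_m(resolution_m, tile_size=512):
    """Ground distance covered by one side of a square tile, in meters."""
    return tile_size * resolution_m


def pixel_to_world(col, row, origin_x, origin_y, resolution_m, tile_size=512):
    """Map a pixel (col, row) to the world coordinates of its center.

    (origin_x, origin_y) is the lower left corner of the tile. Rows are
    assumed to count downward from the top edge of the image, so the y
    axis has to be flipped.
    """
    x = origin_x + (col + 0.5) * resolution_m
    y = origin_y + (tile_size - row - 0.5) * resolution_m
    return x, y


# A 512-pixel tile covers about 51 m at 10 cm per pixel and about 154 m at 30 cm.
print(tile_extent_m(0.10), tile_extent_m(0.30))

# Hypothetical tile origin; the bottom-left pixel sits half a pixel inside the corner.
print(pixel_to_world(0, 511, 600000.0, 7000000.0, 0.10))
```

Once a building is detected at some pixel, the same mapping gives its position in the real world.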

Speaker 2

33:11

Now, wouldn't that geospatial tagging be the best thing for defining a training set for a given area, because you already know roughly where it is?

Speaker 5

33:20

Yes, it is, exactly. And what we have done to speed up the process of picking the best data for us is that, within five meters of every building in Norway, we have made these boundary boxes. Then we made an SQLite database that we have mapped all of these data points into, so that we know exactly where all the buildings in Norway are and we can just pick from those areas. Yeah, so it goes really fast.
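
One way to picture this lookup is a small SQLite table of boxes, each grown by five meters, from which random training locations are drawn. The sketch below uses Python's standard sqlite3 module. The table name, column names, and coordinates are assumptions made for illustration, since the actual schema is not described in the conversation.

```python
import random
import sqlite3


def create_box_db():
    """Create an in-memory SQLite database holding one box per building."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE building_boxes ("
        "id INTEGER PRIMARY KEY, min_x REAL, min_y REAL, max_x REAL, max_y REAL)"
    )
    return conn


def add_building(conn, min_x, min_y, max_x, max_y, buffer_m=5.0):
    """Store a building's extent, grown by buffer_m meters on every side."""
    conn.execute(
        "INSERT INTO building_boxes (min_x, min_y, max_x, max_y) VALUES (?, ?, ?, ?)",
        (min_x - buffer_m, min_y - buffer_m, max_x + buffer_m, max_y + buffer_m),
    )


def sample_point_near_building(conn, rng):
    """Pick a random stored box, then a random point inside that box."""
    count = conn.execute("SELECT COUNT(*) FROM building_boxes").fetchone()[0]
    if count == 0:
        raise ValueError("no buildings stored")
    offset = rng.randrange(count)
    min_x, min_y, max_x, max_y = conn.execute(
        "SELECT min_x, min_y, max_x, max_y FROM building_boxes "
        "ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()
    return rng.uniform(min_x, max_x), rng.uniform(min_y, max_y)


conn = create_box_db()
add_building(conn, 100.0, 200.0, 110.0, 212.0)  # hypothetical coordinates in meters
add_building(conn, 500.0, 900.0, 520.0, 915.0)
x, y = sample_point_near_building(conn, random.Random(0))
print(x, y)  # a location guaranteed to be within five meters of a stored building extent
```

For spatial queries over a very large table, SQLite's R*Tree module is one option worth looking at, though nothing in the conversation says whether it is used here.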

Speaker 1

33:51

Well that's great.

Speaker 2

33:51

So you've derived a data set essentially based on this image that says, Okay, these are the buildings we know about.

Speaker 1

33:57

Yeah, and then yes, about map reduce, that's exactly what that is.

Speaker 5

34:05

Exactly.

Speaker 2

34:07

Yeah, it's a crazy set of boundary boxes. Now, I think the challenge with using the geospatial filter is, if they've built out of the known range, you're going to miss it. Like, you kind of want to take a perimeter around that or look larger to say, where's a new set of buildings that we've just never mapped before? That's got to be kind of the hardest problem.

Speaker 4

34:28

Yeah, so when we're looking for new buildings and actually analyzing the images, we analyze every image. Okay, so the selection within the building areas is only for creating the training data.

Speaker 2

34:40

Hmm yeah, Oh I see. Now do you actually get images of all of Norway? I got to think the municipalities only map the area they're responsible for.

Speaker 4

34:47

No, we have images for all of Norway, because the municipalities are responsible for their areas, but then there's also the Norwegian Mapping Authority. They also have images for all of Norway, with a lower resolution but still very good quality.

Speaker 2

35:03

Right, So I presume you have provinces or states something like that, sub regions within the country that have an authority that's responsible for that.

Speaker 5

35:13

Yes, we have what's called which will be a state.

Speaker 2

35:17

I guess that's what you'd call it, but I guess this is part of the problem. I mean, the municipality is going to give you very high resolution data, relatively speaking, where the state level data will be lower resolution because it's mostly trees, so it doesn't need to be that precise. Exactly. Although we had a situation in British Columbia with the pine beetle, where they were doing mapping showing that pine trees were being killed in the

35:44

forest, like in the wilderness. The other trees weren't affected, but they wanted it because it increased the fire risks. They're looking for where there are concentrations of pine trees that were dying, because they burn really, really well and create worse fires. And so they were doing all of this mapping, and occasionally you got to see some of it, where it's just like this whole slope is just pine trees and they're all red now, where all the other

36:06

kinds of trees around it are still green. Like, very interesting visualization data. I'm such a geek on this stuff. Like, let's get into multispectral data, where we could actually look at the stress levels of plants and, you know, those different effects, or even CO two emission and IR maps of all of this data, to show what are the buildings that are environmentally inefficient or that have high emissions. Like, there's a ton you can do with this data if you're scanning enough frequencies.

Speaker 4

36:34

Yeah, definitely. So we've also tested using the infrared images, because we also have infrared images for a lot of areas in Norway, and the infrared images will give us different information than the regular RGB images. And we've also included height data. So this is one of the exciting things here. We have all of these different data and we can combine it into our models in different ways to detect different types of objects, and different data will be good for different types of objects.
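
One simple way to combine such sources, and only one of the "different ways" mentioned, is to stack them as extra channels per pixel before they reach the model. The sketch below assumes the RGB, infrared, and height rasters are already aligned to the same grid; the function name and the 1x2 example tile are hypothetical.

```python
def stack_channels(rgb, infrared, height):
    """Combine co-registered rasters into one multi-channel tile.

    rgb holds rows of (r, g, b) tuples; infrared and height hold rows of
    single values. Returns rows of (r, g, b, ir, h) tuples, one per pixel.
    """
    if not (len(rgb) == len(infrared) == len(height)):
        raise ValueError("rasters must have the same number of rows")
    stacked = []
    for rgb_row, ir_row, h_row in zip(rgb, infrared, height):
        if not (len(rgb_row) == len(ir_row) == len(h_row)):
            raise ValueError("rasters must have the same number of columns")
        stacked.append(
            [(r, g, b, ir, h) for (r, g, b), ir, h in zip(rgb_row, ir_row, h_row)]
        )
    return stacked


# A 1x2 tile: a bright roof pixel 6.5 m above ground next to a patch of snow at 0 m.
rgb = [[(250, 250, 250), (255, 255, 255)]]
infrared = [[40, 200]]
height = [[6.5, 0.0]]

print(stack_channels(rgb, infrared, height))
# [[(250, 250, 250, 40, 6.5), (255, 255, 255, 200, 0.0)]]
```

In this made-up example the two pixels are nearly identical in RGB but differ clearly in the added channels, which is the kind of extra signal the speakers describe.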

Speaker 2

37:01

How do you get the height data?

Speaker 4

37:02

It's also collected for Norway by the, I think it's also the mapping authority, the.

Speaker 2

37:09

Vector stuff. So they must have something where they're pinging how far they are off the ground at any given point. Like, are they actually giving you the heights of different buildings? That to me would be astonishing.

Speaker 4

37:20

Yeah, it's height for different Yeah, the point clouds.

Speaker 5

37:24

Okay, it's very very accurate. Yeah, very cool to look at.

Speaker 2

37:29

Is the height data specific to a tile or is it or is there different height data within the tile?

Speaker 4

37:34

Different within the tile. It's also very high resolution.

Speaker 1

37:37

So per building, basically, or higher than that?

Speaker 4

37:40

Yeah, it's many points for one building.

Speaker 2

37:42

Yeah. Yeah, sure, if you have height data per ten centimeter pixel, you know you're going to be able to map out a rooftop. You'd be able to distinguish individual things, like an air conditioner sitting on a roof would have a different height from the rest of the roof.

Speaker 1

38:00

Yeah, or chimney or something else.

Speaker 4

38:02

Yeah, yeah, definitely if you look at the height data, you can definitely see what kind of areas you're looking at. Crazy, it's very accurate. Yeah, and it helps us to differentiate, for example, terrace from an actual building, right.

Speaker 2

38:15

I was just thinking, buildings have a different height profile than normal terrain does, so it might be a good way to say that's probably a building, because it's too uniform or it has a, you know, consistent slope, like things that only humans would do. So it's actually not a bad building detector. Even if the visual image doesn't necessarily show it, the height data could really trigger you on saying that looks like a structure.

Speaker 1

38:40

Yeah, So are there other things besides like patches of snow that have confounded the algorithm or whatever?

Speaker 5

38:46

In the detection we also found Atlantis. Oh, or it says that there's buildings out in the water.

Speaker 1

38:55

Yeah, you found Atlantis, yes, yeah, because it's in the training data.

Speaker 5

39:03

In the beginning, there wasn't really enough data with water, so it thought that patches in the water were also buildings. So that's where Atlantis is. It's in Norway, if anyone was wondering. Yeah, that's.

Speaker 2

39:15

Why we haven't been able to find it. Talking about the Mediterranean, he had no idea it was the North Sea the whole time.

Speaker 5

39:23

Wow, exactly.

Speaker 2

39:25

Yeah, yeah, the height data is not going to help you there. The ocean is a pretty consistent height. But yeah, you could easily have shapes in the water that could be mistaken for buildings.

Speaker 5

39:33

Yeah.

Speaker 2

39:34

Yeah, I mean, we're almost anthropomorphizing the software's ability to recognize images, right? Like, it's making mistakes because it's just not a good enough training set. Although, yeah, I don't know how you even filter for that. Do you give it a set of things in the water that are not buildings, to help it learn to avoid that? Yeah?

Speaker 5

39:55

Exactly?

Speaker 2

39:55

Can you do that negative option? Or you only train it towards what it should find as a building.

Speaker 5

40:00

Oh, we have to we have to also include because when we started out, we only did buildings, and in the area with high concentration of buildings, it worked really well, but then when we did it in a more rural area, it didn't work so well. So then we also now have to include some water and forests and snow and all that kind of stuff.

Speaker 2

40:18

And so you're training it to actually say that is objects in the water, so not buildings.

Speaker 5

40:23

Yes, exactly.

Speaker 4

40:25

Yeah, we're saying, yeah, it's only binary. So we say building or not building, and we make sure that there are images of water in the data set that are labeled as not building.

Speaker 2

40:34

It's not building. Okay, yeah.

Speaker 4

40:36

But one thing that we've done to make sure... Like, you can kind of imagine that you need some water, you need some forests, you need some mountain, you need some building, and this will be a representation of reality. But it's really difficult to know how much of the different types of data the model needs. So what we've done is that we start out with a set of rules for how much data the model should have for each category, but then we also let

41:02

the model find the data itself. So while we're training the model, we're testing different images to see how well the model is performing, and if the model performs badly, we add that data to the training data set.
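
What is described here resembles what is often called hard example mining: test the model on candidate tiles during training and add the ones it handles badly to the training set. The sketch below is a loose illustration only. The scoring function, the threshold, and the tile names are assumptions, not details from the conversation.

```python
def mine_hard_examples(model_score, candidates, training_set, threshold=0.5):
    """Add candidate tiles the model scores badly on to the training set.

    model_score is a hypothetical callable that returns a quality score in
    [0, 1] for a tile (for example, overlap between prediction and label);
    lower means the model performed worse. Returns the tiles that were added.
    """
    added = []
    for tile in candidates:
        if tile in training_set:
            continue  # already part of the training data
        if model_score(tile) < threshold:
            training_set.append(tile)
            added.append(tile)
    return added


# Made-up scores standing in for a real evaluation of the current model.
scores = {"city_block": 0.9, "snow_patch": 0.2, "fjord": 0.3, "forest": 0.8}
training_set = ["city_block"]

added = mine_hard_examples(scores.get, list(scores), training_set)
print(added)         # ['snow_patch', 'fjord']
print(training_set)  # ['city_block', 'snow_patch', 'fjord']
```

In a real training loop this step would run repeatedly between training rounds, so the mix of water, snow, forest, and building tiles shifts toward whatever the current model gets wrong.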

Speaker 2

41:16

Right, okay, so now I understand why you have L forties, because you're going to retrain a lot. You have several different sets of building, not building, and you're going to put different weights on them and retrain and retrain and see. I mean, I've got to imagine then you run it through a functional data set and see how much is wrong, and are there collections that are wrong that could then tell you we

41:38

should be harder on identifying water structures as not building. Yeah, at some point it's going to start ignoring real buildings.

Speaker 1

41:45

Yeah, that's a nice feedback loop exactly.

Speaker 4

41:49

And that actually happened with the snow example, because one of the reasons that snow can look similar to a building is that we have greenhouses, and greenhouses can have a lot of reflection, which means that they look white, right, similar to snow. So when we added more pictures of snow, we suddenly had greenhouses that were not correct. This is always a difficult part. Great problem.

Speaker 1

42:14

What is the sauna to domicile ratio?

Speaker 5

42:20

We wish we had a number for that one. We wish we had a.

Speaker 2

42:22

Number for that. One of the reasons I brought up the solar panel thing is I did the Future of Energy talk in Norway and I could not find good data on residential solar, because there's enough of it that it's just not well mapped, right? And, you know, it's interesting that Norway didn't have commercial solar, land being used for solar panels, because anywhere you have flat land you have better uses for it than solar panels. But tops of buildings are covered in solar

42:51

panels because it's a good use for that. But again, the government did not have good data on just how much, you know, pseudo commercial rooftop solar there was, so, you know, I'd like you to answer that question for me. Not that I need to answer it anymore, but it was one of those things where it's like, it's a great question to ask, and in theory, you know, you could build a training set around: can we detect

43:12

solar panels from this data? And then, you know, somebody's willing to pay for the compute time to figure that out and maybe get a map of the reality of what's out there.

Speaker 4

43:22

Yeah, and one of our colleagues is actually working on a system that can figure out which areas are good for having solar panels in Norway, based on all our geographical data. Yeah, so if we combine that with where there are existing solar panels, you can find really good areas.

Speaker 2

43:39

Yeah, lots of possibilities there for, you know, utilities. You know, it makes sense for you to go to a building that's there, that you know about, that doesn't have solar on the roof, and go: you would get good results from putting solar panels up here. We've got a data set that shows that.

Speaker 5

43:52

Yeah.

Speaker 2

43:52

Yeah, it's not always an obvious thing, but it speaks to the power of the data, just making it easier to manage, because it's a lot of information. Yeah, I like the decomposition model. You can take all these different sources of data and break it into tiles and then map it into a common system.

Speaker 1

44:08

Do you foresee having to upgrade your in house systems dramatically? Are you growing that much? And do you have a plan to grow as your requirements grow?

Speaker 5

44:18

I mean, we're definitely growing, but I'd say that this is maybe like one project that we're working on. You know, it's continuously training, continuously getting better. We're also in the meantime doing a lot of other projects. So yeah, it's mostly those that are growing. So we won't buy any more GPUs anytime soon, I think.

Speaker 2

44:43

I guess the question is like how long are you waiting to finish a training set these days? Like when is it long enough that buying another one would make sense?

Speaker 5

44:50

How long is the training now?

Speaker 2

44:52

Yeah, how long is a training run taking right now?

Speaker 5

44:55

Yeah?

Speaker 2

44:55

It takes, on those parallel L forties...

Speaker 4

44:58

Yeah, when training on new data from scratch, it would take between two and three weeks for it to be good.

Speaker 2

45:07

Wow, yeah that's a long time. Yeah right, Like so if you had four L forties, would it be a week?

Speaker 5

45:15

Yeah? In theory?

Speaker 1

45:16

Or is it just the fact that you can wait? Is it just the fact that you can wait for it and nobody's knocking on the door saying hey, you know, yeah, yeah.

Speaker 4

45:24

We can. It's okay for us to wait three, four weeks for this, because we have a model that is good, and then we're testing different things and we're not sitting waiting and doing nothing. We have different projects running in the background.

Speaker 2

45:38

But it sounds like those two L forties are probably running at their limit most of the time. Yeah, like they're saturated.

Speaker 1

45:45

Yeah. So are your other projects also dealing with geospatial data in these models? You said this is just one project. Are there any other projects that you're doing that are totally different?

Speaker 5

45:57

Yeah, definitely, we have a bunch of projects. So the biggest one we're doing right now is probably trying to make sense of Norwegian zoning plans. And these zoning plans, you know, I don't know how it is over in the States, but in Norway you have zoning plans, and all the municipalities have their own zoning plans, and they say something about how the land in that municipality should be used, so it can say...

Speaker 2

46:30

Right, so zoned for residential, zoned for commercial, zoned for industrial. Yeah, and that's all you're allowed to use. I mean, that's a common practice, right? You generally don't want a chemical plant beside the residential area. You know, those kinds of things.

Speaker 5

46:46

Yeah, and they're very detailed, and it could be anything, like, oh, here you cannot build a building that's higher than five meters, and over here you can't build a road. It's illegal. You know, all these kinds of things. So what we're doing is, just like everyone else, using large language models to try to make sense of these plans.

Speaker 1

47:09

Yeah, okay. And is that using geospatial data? Yeah, yes. So everything that you do sort of revolves around this data.

Speaker 4

47:19

Yeah, usually, it sounds like, yeah. So the zoning plans have two parts. You have the document, which can be up to hundreds of pages with really difficult text to understand. And then you have a part that is the map, which tells us what part of Norway the zoning plan is covering. And that map usually has a lot of information that is

47:45

not in the text document. So we combine information from the map and the text document together in an LLM to try to understand the content and to figure out which part of the document is specific for a property.
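
As a rough illustration of combining map and text, the sketch below looks up which zones cover a property and gathers only the text provisions for those zones into a prompt. Everything in it is an assumption made for the example: the zone ids, the use of simple bounding boxes instead of real polygon geometry, the section structure, and the prompt layout. The call to an actual language model is left out.

```python
def zones_for_point(point, zones):
    """Return the ids of zones whose bounding box contains the property point."""
    x, y = point
    return [
        zone["id"]
        for zone in zones
        if zone["min_x"] <= x <= zone["max_x"] and zone["min_y"] <= y <= zone["max_y"]
    ]


def build_prompt(question, point, zones, sections):
    """Assemble prompt text from the map lookup and the matching provisions."""
    zone_ids = zones_for_point(point, zones)
    provisions = [s["text"] for s in sections if s["zone_id"] in zone_ids]
    lines = ["Zones covering the property: " + (", ".join(zone_ids) or "none")]
    lines.append("Relevant provisions:")
    lines.extend(provisions or ["(none found)"])
    lines.append("Question: " + question)
    return "\n".join(lines)


# Hypothetical map data: two zones side by side.
zones = [
    {"id": "B1", "min_x": 0, "min_y": 0, "max_x": 100, "max_y": 100},
    {"id": "I2", "min_x": 100, "min_y": 0, "max_x": 200, "max_y": 100},
]
# Hypothetical text provisions, each tied to a zone id.
sections = [
    {"zone_id": "B1", "text": "B1: buildings may not exceed five meters in height."},
    {"zone_id": "I2", "text": "I2: industrial use only."},
]

prompt = build_prompt("How tall can I build?", (40, 60), zones, sections)
print(prompt)
```

Filtering by the map first keeps a document of hundreds of pages down to the few provisions that apply to one property before any model sees it.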

Speaker 1

48:01

That's good.

Speaker 2

48:02

So you're using a large language model in that so that people can use, like, a text expression to put the question to the data.

Speaker 5

48:09

Yeah, exactly. So, you know, I don't know, personally, when I see a chatbot, you know, I don't really like it that much, at least on sites. But yeah, so now I'm sitting here making a chatbot because I want.

Speaker 1

48:24

I like that.

Speaker 2

48:25

You guys are the old school AI. You were doing it before it was cool. But no, you've been infected by the LLM monster too, right? It's just sort of consumed all of us the past two years.

Speaker 1

48:36

I especially hate happy chat bots, you know, overly enthusiastically happy chat bots.

Speaker 5

48:44

We'll make ours a little bit meaner.

Speaker 1

48:46

Yeah, actually, you will probably get better usage if you throw in snarky comments every once in a while and even little insults, you know.

Speaker 2

48:57

Yeah, Don Rickles.

Speaker 5

49:01

Mode, you know, like, we'll make one now for sure. Yeah.

Speaker 1

49:09

When somebody says that data is wrong, you could say your face is wrong.

Speaker 5

49:17

Yeah, for sure, we'll look into that. I think it's important to.

Speaker 1

49:22

Make it a little bit a little more exciting to use anyway.

Speaker 2

49:24

Yeah right, right.

Speaker 1

49:26

So what's next? What's in your inbox, you guys? Mathilde, what's next for you, project wise or anything?

Speaker 4

49:35

Well, we've been testing a lot of different ideas with our data and our products. And one thing we do is that we invite different product teams to have hackathons with us, right, and we try to figure out how AI can fit into their products. And one of the hackathons that was really fun was one we did together with our monorepo team. So we have a platform for running our front end applications.

50:05

So we have all of our three point fourteen products for developer in one big monorepo, and we try to figure out how AI can help developers when they're coding in that big monorepo. And we figured out that you're now actually allowed to create a Visual Studio Code extension on top of the GitHub Copilot extension that allows you to build your own functionality into the GitHub Copilot chat. Wow. And we've had a lot of fun with that.

Speaker 1

50:39

That sounds fun, Yeah, we've added.

Speaker 4

50:44

Yeah, we've added our own functionality into that, where you can chat with specific Norkart documentation. You can send a support chat to our chat room directly afterwards: if you don't get your answer, you can summarize your chat and have it posted directly to our chat room. Stuff like that.

Speaker 1

51:08

Cool, how about you, Malte? What's in your inbox?

Speaker 5

51:11

Well, it's going to be making some new personalities for the chatbot. That's probably the first thing I'm.

Speaker 2

51:17

Going to do.

Speaker 5

51:22

Your face. No, but it is trying to figure out if this actually is going to work, you know, these zoning plans, and finishing, you know, that product, and trying to show it to our users and see if they're happy with it, because it's really difficult to, like, prompt these correctly and try to make the... I assume most people these days are familiar with RAG architecture, I don't know, but trying to optimize the searching part and the prompting part and stuff like that.

52:02

So there's a lot of tricks and stuff with that these days, for sure. And then we're also, you know, looking into, as I mentioned, you know, chatbots are not, you know, that fun. It's not the best thing in the world. So we're trying to look at how we can parse these zoning plans in other ways, such as, you know, making summary pages and trying to just extract useful information from these documents, depending on what property you are on.

Speaker 1

52:33

Yeah, very good. Well, Richard and I wish you good luck in the future. And, you know, if there's anything worth talking about, come on back at a later time and we'll do another show.

Speaker 2

52:44

Yeah, customizing GitHub Copilot sounds like a cool topic.

Speaker 1

52:47

Sounds like a great topic.

Speaker 2

52:49

Yeah for sure.

Speaker 1

52:51

All right, thanks again. All right, thank you guys. Thanks for having us. Thanks for having us. You bet, and we'll see you next time on dot net rocks. Dot net Rocks is brought to you by Franklins dot Net and produced by PWOP Studios, a full service audio, video and post production facility located physically in New London, Connecticut, and of course in the cloud online at pwop dot com.

53:36

Visit our website at d O T N E t R O c k S dot com for RSS feeds, downloads, mobile apps, comments, and access to the full archives going back to show number one, recorded in September two thousand and two. And make sure you check out our sponsors. They keep us in business. Now go write some code, see you next time.

Transcript source: Provided by creator in RSS feed
