KCAA: Inside Analysis with Eric Kavanagh (Sun, 2 Jun, 2024)

00:00

... every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis.com. And now, here's your host, Eric Kavanagh. Ladies and gentlemen, hello, and welcome back once again to the only coast-to-coast radio show in the USA that's all about the information economy.

00:28

It's time for Inside Analysis. Yours truly, Eric Kavanagh, here, and folks, I am very excited to have, frankly, a legend in the industry with us.

00:36

Today we're going to be talking with Michael Berthold. He is the founder and CEO of a company called KNIME, that's spelled K-N-I-M-E, and they are an open source analytics and data science platform, a visual platform for doing data science, which is good stuff because even though there aren't lots of people who can code very well, almost anyone can look at visuals to move boxes around

00:57

on a screen. So that's what they figured out. So with that, Michael, welcome to Inside Analysis. Tell us a bit about yourself and KNIME and what you folks are working on these days. Thanks for the invite, Eric, and for having us on the show. As you already said, KNIME

01:11

is very much about visual programming, low code for data science. And we started that many years ago as a platform, really as a workbench almost, in my group at the University of Konstanz, to be able to deploy our research results to the real world, to practitioners wanting to use them. And it's grown from there to become one of the only open source visual data

01:34

flow platforms for doing anything you want to do with data. And I'm always a little bit careful about calling it data science, because that often scares people. They say, I just want to do data wrangling, I don't care about the science-y bits, and that kind of scares them off. That's what KNIME does as well. So a lot of the applications that we see in real life are, in large parts, just getting data into

01:57

the right shape from many different sources. Yeah, and I will tell you, I'm a huge fan of open source. In fact, we built a website a number of years ago, and we feed it with a technology that we built called Media Lens. It's called Inside Open Source. And the reason we launched it was because I realized there is so much happening in the open source world.

02:16

There's the Apache Foundation, there's the Linux Foundation. There are lots of other projects outside of those organizations as well. But the origin of open source I found fascinating. So I first started researching this in 2005, when I was working for The Data Warehousing Institute as their web evangelist, and Katrina had just struck New Orleans. I had just moved out of New Orleans, so we watched it all happen on TV. It was just, of

02:43

course, terrifying. But I remember that the senators from Louisiana asked for a quarter of a trillion dollars to rebuild southern Louisiana. And I happened to know, through a past life and past clients in the government space down there, that the politicians are good at making money disappear. And I thought to myself, this is a very bad situation, because if all that money just floods in, it's going to flood right out, not where anyone intended, and a lot

03:07

of it is just going to disappear. So I got on this high horse, if you will, and started doing research, and it took me to open source, and I put forth this theory about open source government, about publishing all the government data such that citizens can see where the money goes and

03:23

understand. And I basically said, look, with the Sarbanes-Oxley Act in the States, which came out of the Enron debacle, corporations had to document their processes for how they come up with their numbers, and they had to be very transparent about that stuff. And I thought, well, if corporate America has to do it, why doesn't the government do it as well? And people thought I was crazy. But a couple of things happened that were amazing.

03:45

One guy paid attention. Out of forty thousand people I emailed, one guy paid attention, and he worked for the Heritage Foundation. He basically testified to Congress and said, we can have citizen auditors. This stuff really can happen. And he really leveraged the imprimatur

04:00

of TDWI, and sure enough, they did it. The House passed the bill, the Senate passed the bill, co-sponsored by a guy named Barack Obama, who was a senator for Illinois, and then President George W. Bush, believe it or not, signed the Federal Funding Accountability and Transparency Act on September 26, 2006, and I almost fell off my chair. I was like, wow, they actually did it. But the reason I

04:21

bring this up is to talk about the power of open source. And I realized at the time the Apache web server had just surpassed the Microsoft Web server as the number one web server. And I thought to myself, well,

04:34

that's very interesting. And I was also studying service-oriented architecture at the time, and I thought, well, if you have all this open source code and you have a service-oriented architecture, you should be able to plug and play and sort of take stuff out and put stuff in and have a very composable environment. And I thought, well, that's not going to be very good news for the SAPs and the Oracles of the world, because they like

04:56

the monolith. They like control, and they have control of all that stuff. And it took longer than I thought, but about ten years later open source just blew up the market with Hadoop and with Kafka, and of course you have KNIME. So tell me a bit about the open source foundation of KNIME and what drove your decision there and what it means for your customers.

05:19

That's a very interesting story. I didn't know about that open data movement coming from Katrina back in the old days. So KNIME is a bit atypical in its open source model. I mean, fundamentally, there are really three different ways to do that. You can have distributions like Linux, where you're essentially making money by packaging them up nicely and supporting them, but essentially there's not really

05:42

new code that you're adding around it. Not quite true, I mean, there are little installers and that type of stuff, but that's fundamentally the idea behind distributions. You can also have what people often refer to as open core, where you have something that you open source, but it's more of a teaser, it's more baitware, and if you really want to use this

05:58

in production, you have to buy some commercial bits and pieces around it. And then you have, of course, also the MongoDBs, that are essentially databases, that type of stuff, also maybe even a Databricks with Spark. They have really cool open source technology and they make it accessible to you in the cloud for a charge. KNIME is different in that we have one open source piece, that's the analytics platform, that allows anybody who wants to build these workflows and execute

06:23

these workflows to pretty much do anything they want to do with data. And then we have a commercial complement to that, which we call the KNIME Business Hub, which allows people to productionize that and collaborate. So when you have more than one person using the analytics platform in your organization and you want to deploy that as a web interface or a REST service, or you want to collaborate and have compliance and governance kinds of features, that's when you buy

06:47

the hub from us. And the reason the open source analytics platform is open source is, I'm not too religious around it, but to me, in the data science field in particular, you can't exist with proprietary software. There's so much cool new stuff going on on a daily basis in research groups and other types of environments that you're

07:11

essentially standing on the shoulders of many, many giants. A lot of functionality inside KNIME is actually based on open source libraries, so it seems kind of unfair, almost, to put that under a proprietary umbrella. And also, to be fair, it enables us to make inroads into academic and teaching environments much more easily, because they can just use the open source platform for teaching. But we also have a lot of open source contributors that are contributing additional functionality into

07:40

the KNIME platform. So I'd like to see it as a win-win situation, also for our customers, because they essentially get a lot of maintained functionality from us. In addition, they have access to all of those community functions out there. Yeah, it's very cool. I mean, there are a lot of good things about open source. One of the things that I've heard over the years is that bad code goes away because all these eyes can see.

08:03

Now, the one shortcoming I've come across is that the open source project gets to MVP status, if you will, minimum viable product, and then doesn't typically go past that, just because it works, and people just kind of move on. But what are your thoughts about that, in particular about how you make sure that you have truly finished products and that you're able to

08:26

deliver robust platform analytics on an ongoing basis for all of your clients? How much effort does that take internally from your developers to stay on top of the platform and make sure everything's working? That's a very good point. I mean, I tend to joke that ninety-nine percent of all PhD projects turn into open source projects and then they kind of die away and fade away and never really turn into

08:48

something useful in production. Now, probably about half of our development team, like eighty developers at KNIME, are focused exclusively on the analytics platform and just making sure that the core works, and works in a professional environment. We have our own extensions, which are of course maintained by ourselves, so we have

09:11

the same quality assurance there. We have what we call trusted community extensions, where we're in close collaboration with the community contributing those extensions, so we can also make sure there's quality assurance there as well. And then there is, I'd call it, the long tail of extensions that are experimental, right? The nice thing is that everybody can use those and play with those and explore new technologies, and then when we see increased usage of some of these experimental extensions,

09:37

we can move them into the trusted extensions as well. Interesting, that makes a lot of sense. So you're trying out things, you've got these extensions in trials, basically, and then once you see there's a lot of activity, then you grow some developer support behind it. To harden it is the term we typically use, right, to make sure that it's bulletproof, that it really does what you want it to do. That makes a lot of sense, and

10:01

you do end to end. KNIME does everything from data ingestion, data pipelines, number crunching, model building, all that kind of fun stuff is in the KNIME analytics platform. Is that right? Yes, that's true. So we have everything from the ETL part, loading the data. We can access

10:16

about four hundred different data sources. You can access databases, strange file formats, Excel, of course. We can also execute bits and pieces on different execution environments, like doing the ETL directly inside a database, or in Snowflake, or in Databricks, or in Hadoop in the old days. And then we go all the way to visualization and the analytics functionality, and a lot of that, as I said before, is of course based on ECharts for the visualization, a lot

10:41

of Python libraries, C libraries, R libraries, Java libraries. For some of the machine learning functionality, we have integrations with TensorFlow, if you want to do that. We have integrations with the other deep learning libraries, we have integrations with XGBoost. Pretty much everything is in there. But there are the other pieces. So often when people talk about data science, they mainly mean this part from

11:03

the data to the report, to the endpoint, or to the model. But the Business Hub then also covers the rest of this journey, right? Deploying it to others, managing it, retraining models when needed, and monitoring their performance as well. Right, because at the end of the day, you want these algorithms to connect into your business, whether it's for marketing

11:24

or for manufacturing or supply chain or whatever it is. You want it to affect some outcome in the business, and so that involves connecting to operational systems, right? It involves connecting to ERP systems or CRM systems or things of this nature. That's where the magic happens. And a lot of times that's the hard part, right? I mean, I've heard many stories about models that just don't get deployed because maybe the companies didn't have the wherewithal or they

11:52

didn't have the expertise to do it properly. But being able to plug the algorithms into operational systems and then monitor how those models perform and switch them out, right, because you've got your production model and you have your challenger models that are sitting by the wayside waiting to get pulled in, and being able to switch over to a new model when a model that's in production starts faltering, that's a critical piece. And that, I guess, is that done in Business Hub with

12:18

you folks? That's something on the Business Hub side, exactly. So with Business Hub, you can deploy models, which really are deployed KNIME workflows. You can deploy them as a REST service or as a web application, and then people consume it, and you can constantly monitor what's happening in production and then potentially replace, retrain, or just alert the data science team and say, hey, this is so out of whack with reality, we don't really know how

12:43

to fix that, do something about it. The nice thing is that you don't have to switch code in between, right? In the old days, somebody always coded the model in some strange language, and then it was reprogrammed by IT into some production language. In our case, on the hub side, the workflow that was trained is also the one that runs in production.
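
As a rough illustration of the monitor, then replace, retrain, or alert loop described above, here is a minimal Python sketch. It is not KNIME Business Hub's API: the function names, the `min_accuracy` threshold, and the idea of scoring a production ("champion") model against a waiting "challenger" on recently labeled data are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-in: a "model" is any callable that maps a feature row to a label.
Model = Callable[[Sequence[float]], int]


@dataclass
class Decision:
    action: str  # "keep", "swap", or "alert"
    champion_accuracy: float
    challenger_accuracy: float


def accuracy(model: Model, rows: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows for which the model predicts the recorded label."""
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and of equal length")
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def review_deployment(
    champion: Model,
    challenger: Model,
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    min_accuracy: float = 0.8,  # assumed threshold, chosen only for illustration
) -> Decision:
    """Score both models on recent labeled data and decide what to do."""
    champ = accuracy(champion, rows, labels)
    chall = accuracy(challenger, rows, labels)
    if champ >= min_accuracy:
        action = "keep"   # production model still matches reality
    elif chall >= min_accuracy and chall > champ:
        action = "swap"   # challenger is healthy and better: promote it
    else:
        action = "alert"  # neither is good enough: hand over to the data science team
    return Decision(action, champ, chall)
```

Called with two toy threshold models and a handful of recent labeled rows, `review_deployment` returns a `Decision` whose `action` says whether to keep the production model, promote the challenger, or alert the team. A real deployment would add retraining and would likely track more than accuracy.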

13:03

That's interesting. So one of the other hurdles that people are running into is when they use Jupyter notebooks to write their model, or to build their model to test with data, and then they want to go put that into production, and it's just this step-by-step, tedious process of copying over code and values and all these things, and that falters often. That, from what

13:22

I understand, is a really serious problem. But I guess you don't have that challenge, because you're not using the Jupyter notebooks typically, and people are just in the environment, in the analytics platform, building out their models after they pull in their data, et cetera. So you're already production ready when the process

13:39

begins. Is that about right? That's a very nice summary. Yes. So what you use on the creation side, when you actually train the model, exactly that piece of the workflow then gets moved into production and executed by exactly the same engine, so you don't have a translation issue there either. The other piece that people often lose in going from training to production is all of the feature engineering that you did, all the feature

14:05

transformations. You tend to lose them. So you can only take the model and move that into production, but you can't do the transformations. And in KNIME you can automatically grab the part of the workflow that has the transformations and the model, applying the model, wrap that automatically, and deploy it to KNIME Business Hub. Well, that's pretty cool. And you also cover

14:28

different industries, right? So you do insurance, healthcare, I would imagine financial services, all sorts of different industries, because it's more of a horizontal solution. Is that about right? Yes, that's absolutely true. We have customers and users in pretty much every industry. Yeah. Well, and we're going to talk about large language models here in a minute in our next segment, but before we get there, I'll just throw out one of my theories to you and see what you

14:52

think about this. To me, this explosion of AI through foundation models, including large language models, is really a major call to action for organizations to get their data house in order. And what I mean is data governance. If you don't have data governance, if you don't even know what data governance is from an organizational perspective, you're going to have a hard time responsibly

15:16

leveraging AI. Would you agree that companies really do need to take a very hard look at their end-to-end data management life cycles and processes, understand governance, who gets access to what data, even understanding a broad inventory of your data sources? Would you agree that it is paramount to do that before pulling the

15:35

trigger on some AI? I totally agree. And in a way, it's funny that we've been preaching this for data science processes for a long time too, this governance topic, and nobody really cared, and now people really care. They're waking up. Isn't that fascinating? I mean, I just

15:54

I've been in this business a long time. I've been talking about data governance, analytics, AI, all this stuff for twenty years, right? And we talked about data governance twenty years ago and fifteen years ago and ten years ago, and basically nobody was doing it. I mean, it wasn't even easy to do, because you could either control access at the database level, which is a hard place to do access controls, or at the application level.

16:15

But there was nothing in the middle. And really, it's in the middle. And now with the cloud, that's one of the nice things about the cloud, it is this de facto marshaling area for functionality and data. And now we have the capacity to apply very fine-grained controls on things, on data sets, on types of data. For example, we can scan and find PII and then, okay, flag this as sensitive. There are lots of things we can do these days that we just kind of couldn't do ten

16:44

years ago. Real quick, one minute, what do you think, is that about right? We finally can do this stuff, and now we are doing it. What do you think? I'd probably explain it slightly differently and say we could have done it probably before as well, at least some of those aspects, but people just didn't care enough because there was not enough harm

17:02

in it. But now that everybody who does anything with GenAI sees the danger of sending data anywhere, people are really, really waking up and seeing the pain there. That's right. Well, it is, like I say, it's a call to arms, it's a call to action, that organizations have got to do it, because you don't want to wind up in the crosshairs of an audit.

17:22

You don't want to wind up with a breach, you don't want to wind up getting sued by someone because their sensitive information has now been leaked to sources out there. Well, folks, don't touch that dial. We're talking all about AI and analytics platforms, and next up we're going to dive into these large language models that are just taking the business world by absolute storm. It's really quite fascinating to watch. But don't touch that dial. We

17:42

will be right back. You're listening to Inside Analysis. Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis, talking to Michael Berthold. He is the CEO and founder of a company called KNIME. That's K-N-I-M-E. Look these folks up online, an open source analytics platform. It's wonderful stuff. It's like a

18:11

giant candy store for analysts to go play and have fun. But I wanted to talk to you about these large language models, Michael, and in particular, first of all, the open source side of the equation. So Meta comes out with Llama and Llama 2, open source. OpenAI used to be open, now it's not. Now it's the ironically named OpenAI, because it's a black box. And with technology this powerful, I believe we need

18:37

open source. I don't know that I would get behind a mandate that they must be open source, but there needs to be some transparency into how these things are working, just so that we can have our peaceful sleep at night, to know that there are no bad actors involved somehow. I mean, certainly for regulated industries like financial services, if you bring it into some workflow for loan approval or something like that, then you have to be able to explain

19:07

how you came up with your answer. But what are your thoughts in general about open source versus closed source with these large language models? I think there's a lot of value in it. The problem is that, in my opinion, open sourcing large language models isn't just about open sourcing the code, but you also need to open source how it was actually trained. So in a sense you also need to at least give open access to the data

19:33

that was used for training. Because if I give you a model and it was trained on half copyrighted material that it's going to spit out again when you use it, you wouldn't know if you didn't have access to the training data, right? Part of that is, was it supposed to be used? And then I think the other piece is that what some companies are open sourcing is only the code to use the model for predictions later, to actually apply

20:00

it. You still don't know how it was trained. So that's the third element that needs open sourcing. And then I believe one of the key proprietary ingredients that a lot of these companies now have is safeguarding code around it, so that some types of answers don't get produced, some types of inputs aren't being accepted, and open sourcing that as well would really, really reveal their secret sauce. And I think that's why the OpenAIs of the world are shying

20:26

away from that one. Right. No, it does make sense. I mean, we have proprietary code. It's not new, but again, these are very, very powerful engines. And then there's another whole side of this equation, which is the RAG model, retrieval-augmented generation, which, upon great reflection, I believe will be the layer of functionality for governance, for privacy, to

20:52

a certain extent, for security, for management. You know, a lot of that's going to get baked into the RAG model, where you could, for example, before your prompt goes up to the large language model, have a layer in between that checks and sees. Okay, and this is already happening. Like, I asked Gemini a couple of weeks ago how many electoral votes are in Georgia and Arizona and some other state, and it thought for a second. It said, elections are complex

21:18

and fast moving, we recommend you use Google. It was a guardrail. That's a guardrail. They built that in exactly to say, no, no, no, no, we don't want to touch that. Right? And that's in the RAG model, right? That's not trained into the model. That's outside the model. But it's the workflow you have around the engine that's very, very important. Right. I totally agree with that.
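
The layer between the prompt and the model described above can be sketched as a plain pre-check in Python. This is only an illustration under stated assumptions: it is not how Gemini or any vendor implements guardrails, `call_model` is a hypothetical stand-in for whatever function actually sends the prompt to a language model, and the blocked patterns and refusal text are invented for the example.

```python
import re
from typing import Callable

# Assumed, illustrative list of topics the wrapper refuses to forward to the model.
BLOCKED_PATTERNS = [
    re.compile(r"\b(election|electoral|ballot)s?\b", re.IGNORECASE),
]

REFUSAL = "This topic is complex and fast moving. Please consult a search engine."


def guarded_ask(prompt: str, call_model: Callable[[str], str]) -> str:
    """Check the prompt before it reaches the model; refuse if it hits a blocked topic."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(prompt):
            # The model is never called, so the guardrail lives entirely
            # outside the model's training and weights.
            return REFUSAL
    return call_model(prompt)
```

Because the check runs before `call_model`, changing what is blocked means editing the wrapper, not retraining anything. A production guardrail would probably rely on a classifier rather than keyword patterns, and would also inspect the model's output on the way back.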

21:41

I mean, that's what I called the safeguards before. And I think sometimes it's probably not even part of the context that's part of the RAG model, but it's really part of some safeguarding code around it. I mean, we use

21:51

that at KNIME as well. So we have built in what we call K-AI inside the analytics platform, which allows you to have a Q&A mode where you ask questions: I do this and this in Excel, how does that look in a KNIME workflow? And then it shows you a couple of nodes in KNIME, and we're of course filtering that these nodes do actually exist, because every now and then OpenAI, which we use underneath the hood, hallucinates and invents

22:15

nodes that KNIME probably should have but doesn't. That doesn't help the poor user. But that's a very, very simple piece of code around K-AI that is just making sure that what it spits out is reasonably useful. Mm hmm. Yeah, that's interesting, and you're going to see more and more of these AI agents. That's what everyone is talking about now, AI agents, which are like little bots, semi-autonomous bots that can do various things, and they can check on each other and they can do all

22:42

kinds of stuff. I mean, it's very interesting to me. When we talk about data science, we talked about it before, it all seems to be getting subsumed now into AI, into conversations about AI, even though there are lots of different versions of AI, right? I mean, there are traditional models, regression models, all sorts of old-fashioned AI, if you will. It's still very powerful and still works, but the new stuff is sucking all the oxygen

23:06

out of the room. Isn't that about right? Yeah, we see that as well, and I mean, I'm an old guy by now, I've seen this in the past. Right? When backpropagation came along, everybody was suddenly using gradient descent for every problem. We just thought, hey, you can solve this directly, you don't need to do gradient descent. Then there were support vector machines, and then something else, and then deep learning, and now it's AI. So sometimes we see people building

23:27

workflows where, even for very simple things, they're reaching out to some AI, and we just say, hey, there's a node in KNIME that does that computationally a lot less expensively. Why don't you use that? So I think, to me, it's a bit of a hype right now. It's just the new kid on the block. Everybody wants to play with it and use it. But augmenting it, really mixing and matching it with traditional techniques, I think

23:49

that's where the true value lies. Yeah. Well, and so I'm just guessing here that one of the nice things about your platform is that it is an end-to-end platform for building models, designing models, training models, pulling the data in, all these things, and it's adjacent to this Business Hub. So you have a marshaling area for ideas and for testing algorithms and for testing models. Then you connect it through the Business Hub and see what happens

24:18

and see how it operates. And it's important to have this one environment where that takes place, because when you have multiple tools, it just takes longer and it's disjointed, and there are connections between the tools, and things change. So it's important to have that main marshaling area. It's like a giant analytics sandbox. Is that about right? That's a very nice description. Absolutely. I tend to say that a data scientist doesn't necessarily need to know how the method does something,

24:49

but they need to know what the method does. So whether it's reaching out to a Python library or an R library or a C library underneath, that's not that important, but you still need to understand what the method actually does underneath to be able to interpret the results. A simple example: if you don't know what a regression coefficient is, you won't be able to interpret it, but you don't necessarily need to understand how it was derived

25:11

from the data. Yeah, no, that's very interesting. Let me throw this concept at you and see what you think about it. I wrote up an article just last week, I guess, about this. I was flying to a conference in Denver, just thinking about these large language models and analytics and AI and all this stuff that I've been covering for a long, long time, and I

25:27

thought to myself about this concept I call the executive cockpit. And the idea is that I think very forward-looking organizations are going to deploy a small language model that is aligned with their business, like if it's manufacturing or healthcare or whatever, in their data center, so on-prem, possibly in the cloud as well, but I have my thoughts wrapped around this on-prem small language

25:51

model. Then you're going to train it on your ERP, on your Salesforce, on your CRM, on your customer support, for example, your tickets, any of your core enterprise systems. You're going to train this model on your data, on your business data. Then what you'll do is set up Kafka topics coming from those systems into a vector database adjacent to this interface for the small language model, and that is where the executives will spend their

26:18

day running their business. Because then you could ask any question at all. How is our marketing working in APAC? Who can we let go if we have to save some money? Where are we weak in our organization right now? Just all kinds of different questions, and you'll get all these answers. And I actually mentioned this to a CEO of this one company, because I was trying to get him to help me do sales enablement for them, because I have this

26:41

big audience I've been marketing to for years. And one person turns out to be the next deputy chief data officer for the IRS. And I sent this email saying, hey, this is a lady I've known for a long, long time. This is what I mean by sales enablement. Do you guys have the IRS account? And he fired back, he said, I don't know. I don't know what all our accounts are. I thought to myself, well, you would know if you had the executive cockpit, because you would just ask it,

27:06

do we have the IRS account? Who is the account rep? What's the latest on this account? Because you're getting information from all these systems in your private environment. But what do you think about this concept? Is that doable? Is that pie in the sky, or what do you think about all that? It's an interesting idea. I've thought about something similar. I mean, at the end of the day, you're personalizing a large language model

27:27

around your own infrastructure and in-house data. I think the challenge there is that in order to get a really, really good model like that, one that's really useful, you need to train it on a lot more data than just your own. So in a sense, you need to benefit from your competitors' data without actually seeing it, but kind of learning the general structure and the general insights, and then you customize it on your own, which in return kind of

27:52

means that you should also be providing your data to other organizations. It's almost like a kind of pre-competitive training of these models so that they're useful for everybody. I think with just training it on your setup, you need some bigger context than that. Or maybe you're a big enough company and you have enough context anyway. But for every small company, I don't think you have enough data to

28:15

really get meaningful insights. That's very interesting. That's a good point, because I'm wondering to myself, and I'm going to throw this one at you too. So one of my aha moments with these large language models was when I realized that when you train them on a corpus of data, they're

28:33

not actually persisting the data verbatim. It's not like they're taking strings of text and storing them in a record somewhere, but rather, in the training process, the data you use will adjust the weights and biases, the parameters of the model. So in other words, it's like, huh, well, that's very interesting that it can train in that fashion and then reflect back to

28:56

you such remarkably granular detail about things. And you know, what I've seen is that if there is a subject area that has been published about widely, like how computer processors work, or how an irrigation system works, anything that has a lot of content on the web that these engines were trained on, it does very well. It knows all that stuff. It's when you get to the fringe where there's not that much published. And I guess that's kind

29:22

of your point about having enough data to train the models. If you don't have enough, you're not going to get the contours right, and it's going to be skewed in one direction or another. Is that about right? I think that's a very good summary. The contours, right. I mean, a colleague of mine once summarized it: essentially, it's a consensus engine. It's getting the consensus around what computer programming is, learns that from the

29:44

data and can repeat that. But if it's just one isolated outcome, it's not going to be able to recall that one. Interesting. Yes. So Jürgen Schmidhuber, I think his name is. He's the guy who wrote the papers on the transformers, and he's based, I guess he's actually in Saudi Arabia these days, but I want to say he's of German origin. And I was amazed when I realized he wrote those papers in like the nineteen nineties or something. And it's just now that we have the compute to be able to

30:11

can you explain that? Is that what happened, that just now the timing was right to be able to understand this and put it into play? Because

30:18

that was one of the big changes. And now it's able to see, you know, ten or twelve tokens left or right as opposed to just two or three, and you also have this, like you say, consensus, right? I call it almost an AI Greek chorus, where one is saying, I think it should be an A; another, I think it should be a B; another, I think it should be a C. And then the model says, okay, I'm going to pick this one. That's very interesting.

30:41

It's a very interesting development. But why do you think it took so long? Is it just because we now have the compute to do that? I think it was a compute power issue as well. And then some science just needs a little bit of time

30:55

before it truly has an impact, but mostly it was waiting for compute power. One way of looking at what this consensus really does: I don't know if you watch these YouTube videos about ChatGPT playing chess. The interesting part is that at the beginning it does extremely well and does very sensible things, and part of that is that these opening libraries are all over the place,

31:17

so that's an extremely well-established consensus. And then somewhere in the middle it starts inventing bizarre moves, and suddenly new figures pop up on the board out of nowhere, right? And it always has meaningful-sounding explanations for that. The problem there is that the data is so sparse that there's no consensus to learn. So at the beginning it almost looks like it understands

31:38

chess rules. But the only reason it does follow the chess rules is that they're so deeply ingrained in all of the common material that you see that the likelihood of going outside of the rule book is too small. But somewhere in the middle of the game it goes completely off the books. That's interesting, that's wild. So one of my good friends in the business is a gentleman named Usama Fayyad. You may have come across him at some

32:02

point. He was the first chief data officer for Yahoo way back in the day, and now he runs the Institute for Experiential AI over at Northeastern University here in the States. And I had him on the show, and he's very funny, he's very candid. He said, these large models, they're too big. They're not supposed to work. We don't know why they work. What are you talking about? This is the guy who runs this whole operation, and he's joking that we don't even know how they work. I mean, how's

32:28

that for transparency, right? I mean, there's some truth to it, right? We don't really know how they come up with these answers. Right, it's a wild mix. It's a highly distributed model. We don't know why a particular answer comes out. We can come up with proxies for an explanation by wiggling the inputs and trying to figure out what happens, and we can say, ah, this probably had a lot of influence on the

32:50

decision, but we don't know for sure. That's so wild. I mean, that's just such a big deal, you know. So now we have all this observability in the data space, right? You've got New Relic and Datadog. All these different companies are doing observability, which I think spun out of Kubernetes primarily. But it's very interesting, and we need that kind of observability in these large language models.
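
The "wiggling the inputs" idea mentioned a moment ago can be sketched in a few lines of Python. This is only an illustrative sketch: `toy_model` is a made-up stand-in for a black-box model, and its word weights are invented. The technique shown is simple occlusion: drop one input token at a time and see how far the output moves. As the guest says, the result is a proxy for an explanation, not proof of why the model answered as it did.

```python
from typing import Callable, List, Tuple


def toy_model(tokens: List[str]) -> float:
    """Hypothetical black box: scores a message. Not a real language model."""
    weights = {"refund": 2.0, "angry": 1.5, "thanks": -1.0}
    return sum(weights.get(t, 0.0) for t in tokens)


def occlusion_influence(
    model: Callable[[List[str]], float], tokens: List[str]
) -> List[Tuple[str, float]]:
    """Remove one token at a time and record how much the output changes."""
    baseline = model(tokens)
    influence = []
    for i, tok in enumerate(tokens):
        perturbed = tokens[:i] + tokens[i + 1:]  # the input without token i
        influence.append((tok, baseline - model(perturbed)))
    # Largest absolute change first: these inputs "probably had a lot of influence".
    return sorted(influence, key=lambda pair: abs(pair[1]), reverse=True)


ranking = occlusion_influence(toy_model, "i am angry and want a refund".split())
print(ranking[:2])
```

With a real model the same loop applies, but each perturbation costs one model call, and interactions between inputs can make single-token scores misleading.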

33:12

I think that's going to be one of the keys to success. But folks, don't touch that dial. We'll be right back. We're talking to Michael Berthold from KNIME on Inside Analysis. Stand by. Welcome back to Inside

33:30

Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis with Michael Berthold, founder and CEO of KNIME, K-N-I-M-E. Look them up online. And Michael, I was mentioning to you in the break that I'm wondering to myself about this whole business intelligence industry. There are hundreds of players these days, hundreds of companies doing some form of analytics.

33:55

Of course, KNIME is a whole analytics platform, an open source platform, end to end, but there are lots of point tools, whether it's visualization or number crunching, OLAP, ROLAP, all this kind of stuff, and I wonder, is all of that in the crosshairs of these foundational models? What do you think? That's a very, very interesting question, and we of course asked

34:16

ourselves that as well. And I think for some of the tools that you mentioned, like generating visualizations, that type of stuff, I do think they are pretty replaceable by AI-type models, because at the end of the day, you're generating code that generates the visualization based on data, and you judge the output of that code by just looking at it and saying, yes, this is right. So I think that type

34:40

of stuff will go away. And in KNIME we have actually built in what used to be an ECharts scripting editor that now has an AI element, and you don't need to touch the code anymore. So those types of tools, I believe, will go away. The tools trying to really find surprising, interesting new insights in data, I think that type of stuff is a lot harder to replace

35:04

because fundamentally you're trying to find something new. And like we discussed before the break, these GenAI models are consensus engines, right? So they kind of try to gravitate towards something they've seen more and more often before. Interesting. That's right, that's an excellent point. Really, that's exactly right.

35:22

So it's good for understanding the well-trodden path, basically. That's what it's very good at doing, saying, okay, there's a highway, it goes in that direction. But if I want to go wandering around the forest, it's not as good on the fringe, basically. So you will use it. But I mean, I read an article by some guy on LinkedIn talking about how he connected his model, I don't know, by ODBC or JDBC or something, with data sources, and he asked it to query the data source and it

35:51

did. It reached into the database, pulled the information out, and delivered it, and you're like, okay, that's pretty interesting. And then I think to myself, what could be happening here? In the data warehousing space, for example, we move so much data around. It's all the data from your core systems that you've decided to put in, which is

36:10

a tremendous amount. Very little of that data ever gets used. A lot of times it's the summaries or the aggregates or the roll-ups that are used for various purposes, but a lot of it just doesn't even get used at all. And I think that what these large language models are going to do is kind of turn inside out the entire model of how we viewed moving data and analyzing data and doing things with data, because they don't really care.

36:37

They're just going to, once they're trained on a certain space... And again, if you train it on your data, or if, in your vector database, you have a lot of embeddings of your corporate data and you point your RAG model there, well, you can get answers to things very quickly that before would have required running reports and doing ETL and doing all this stuff.
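
The retrieval step of the RAG setup described above can be sketched roughly as follows. Everything here is an assumption made for illustration: the word-count `embed` function stands in for a learned embedding model, the in-memory `INDEX` list stands in for a vector database, and the three documents are invented. The sketch stops at building the prompt; sending it to a language model is left out.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical corporate snippets standing in for a vector database.
DOCS = [
    "q1 revenue in the emea region was 4 million euros",
    "the vacation policy grants 25 days per year",
    "server maintenance happens every sunday night",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]


def retrieve(question: str, k: int = 1) -> List[str]:
    """Return the k stored snippets most similar to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question: str) -> str:
    """Put the retrieved context in front of the question for the model."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("how many vacation days per year"))
```

The point of the design is that the question is answered from whatever the index returns at query time, with no report or ETL job run in advance.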

36:55

And I think that in many use cases, these models are going to short-circuit all that stuff, and you're just not going to have to do as much of that stuff anymore. But what do you think about that? I think there's some truth to it, because fundamentally, what these models won't necessarily do is actually look at all of the data, but they're going to apply a lot of common standard practices to it. And standard sounds a little

37:19

bit too limiting. I think there's a huge wealth of standard practices that people do apply to the data, and that's part of this consensus engine, and so the AI models will try out a lot of those things a lot faster than you ever would. So absolutely, there's a good chance that some of the insights that will be generated are interesting to you. But then continuing the exploration and saying, I mean, how do they always say, the

37:45

eureka moment is usually preceded by, oops, that's strange. I think you'll have these moments, right? And AI doesn't do that. AI doesn't say, this is weird, I should dig in a little bit deeper, because that's outside of the consensus. So, like you said, it will continue on the normal path, and that's why I believe the human intuition, curiosity, oops-detection capability is going to be relevant for a long time. I like this oops

38:17

detection. That's good stuff. Well, there was a gentleman I had on the show years ago who said something a lot like that. He basically said, AI doesn't have the ability to be like, hmm, that's kind of weird, what's going on with that? Right? Because it's just processing information and doing what it's been told to do, which is just reflect back based upon a prompt and its training. It's a

38:39

very simple thing. I mean, it's very complex in terms of how it got there, but nonetheless. You know, one thing that did annoy me, I will say, is in the early days when that New York Times reporter was getting deep with ChatGPT and trying to tease out of it whether it's sentient or something. I'm like, dude, that is a misuse of the technology. That is not what you should be using this thing for, to try to, what, trick it into revealing that it's really alive, and

39:07

you know, what are you even talking about? And I think that's part of the downside these days. And I'm a media person myself, but a lot of times the media will just sort of glom onto some narrative about something, and it's very hard for them to decouple from that and get down to brass tacks. And that's what we do on the show. In fact, I used to say at the beginning of every show, this show is all about getting down to the brass tacks of what actually happens in the

39:31

data world and what you do with this stuff. And I think it is important that people keep in their minds the purpose of this technology: why are you using it, where is it appropriate to use it, and where is it not appropriate to use it? And that's just basic common sense, right? Yes, I totally agree. I mean, I go pretty much in line

39:49

with the European AI Act that they just passed, too. I mean, if it's not mission critical, if it's not safety critical, you can trust a system that is wrong in, I don't know, point one percent of all cases. If it's controlling nuclear power plants, it had better not be wrong in point one percent of all cases, right? That's right. You've got to watch out. So where do you see a lot of use cases among your clients? I mean, obviously some of your clients are using large language models. Where

40:15

are you seeing success stories in that space right now? So there are a lot of success stories in other areas of the business, as you undoubtedly know: checking legal contracts, doing marketing material, that type of stuff. There's a lot of value in applying GenAI there. In the data analytics space, honestly, we don't. We see a lot of interest. There are a lot of people that say, oh cool, I can build a customized chatbot

40:42

using KNIME. That's not really our core business. And then the real applications tend to be around text processing, which is where GenAI is really strong. Instead of using outdated, antique libraries for sentiment analysis or text segmentation, you're just handing it over to an AI model and saying, hey, segment this, or extract the key components, or create a summary. For that type of stuff, it's amazing. I see that also with image mining extensions.

41:09

I think that's the next step, where we can use the image processing capabilities of GenAI. For a lot of the number crunching, I mean, we've all seen these cases where it can't add two numbers, doesn't really know what the prime numbers are. It's missing the understanding of the concept of a number, right? So there, I think it's more a tool to help you build workflows, build data applications, but only as a helper, right? So that's actually an

41:37

excellent point I wanted to get into. I believe that we're just scratching the surface of using these models as a component in a workflow. So you mentioned, for example, summarization. That is hugely powerful. I mean, you know, especially for policies, for complex policies, for law, for example, for legal protocols and when to file motions, what motions you can file, what you have to do according to... I mean, you used to have

42:06

to pay lawyers a lot of money to tell you that stuff. Now, if you just get access to the rules, you load them into a large language model and just start asking questions. That is an incredibly powerful use case, because it used to take a lot of time to sort through the process of how to do something. Now you just ask it, how do I fire someone? First step, send them a letter saying they're not performing properly. Second

42:27

step, you know, monitor their behavior. Third step, all this stuff. It's like, there it is. Wow, talk about saving time. I mean, it saves time. And here's another big soapbox issue: it improves morale, because nobody wants to spend their time scratching their head, reading through just dreadful documentation. Nobody likes doing that, nobody. So all that stuff is going to go away, right? What do you think? I totally

42:50

agree. Yeah. Your examples center a lot around firing people, by the way. But I tend to say, when people ask me if GenAI is going to make data scientists' lives easier, I say no, I don't think so. But it's going to make them nicer, because it's going to remove all of that boring stuff and we can now focus on the really interesting but more complex stuff. So it's going to make it

43:12

more interesting, more complex. That's interesting. Yeah, wow. And I think you do want to document things, and you can have it document things for you too, right? You can just throw a whole bunch of stuff at it and say, document this. Okay, exactly. I mean, we now have a component on the Hub that takes a KNIME workflow and explains what the KNIME workflow does, and we do that by just shipping it off to GenAI. It's perfect for that. Wow, that's amazing. All right, folks. Well,

43:35

podcast bonus segment coming up next. You're listening to Inside Analysis. All right, folks, back here on Inside Analysis talking to Michael Berthold. He is the founder and CEO of KNIME, K-N-I-M-E. And Michael, I know what it's like in a software company. There's always a roadmap. You're always working on something, and we talked about a couple of key things. Governance: there's model governance, there's data governance, there's IT governance. What are you working

44:01

on in the governance space? Thanks for asking that. Like you know, all the roadmaps are changing all the time. But what we're currently working on: we've actually had this model governance topic in the works, right, and we've been working on that for a couple of years now. So the idea of being able to monitor what models are doing, automatically retrain them.

44:22

We talked about that in, I think, the first episode, but what we added now is the ability to also govern the AI usage of people that are creating KNIME workflows. So first of all, when somebody is creating KNIME workflows using the KNIME Analytics Platform and uses the built-in AI, we call it K-AI, for KNIME AI, we need to make sure that gets channeled to an IT-approved AI. Right? Maybe that's just for expense purposes, you know, I'm going to have too much consumption in the cloud, I want to

44:51

make that in-house, or it's really a data privacy issue. But the more worrisome part for people is that, I mean, one of the strengths of the KNIME Analytics Platform, of the workflow concept, is that everybody can use any technology they want. Right? They can reach out to experimental libraries, they

45:06

can reach out to R stuff, to Python stuff, to whatever. But by now they can of course also connect to various different AI providers, and we need a way for central IT governance to be able to make sure that the KNIME workflow users inside the organization can only use approved AIs.

45:27

So maybe marketing can use an AI in the cloud, but maybe legal shouldn't, or HR shouldn't, right? And that's something we have built into the KNIME Hub now: we can limit the types of AIs the users can reach out to from the KNIME Analytics Platform, and they get to choose from one of the approved AIs that central IT set up and said, okay, here's an AI that's consumption-light, that's for the easy tasks. Here's the one for, whatever, the tech team. Here's the

45:55

one that's for compliant data. And we also allow the setup, on the Hub side, of safeguarding workflows, so that before the data gets sent out to, say, a cloud AI provider, it gets screened for private information, or maybe the data automatically gets anonymized before it gets sent out. Yeah, that's very important stuff. And are you also able to do some FinOps on that? In other words, see how much it is costing to leverage this

46:23

AI engine versus that AI engine and do some cost optimization? Is that something you can do? We can do that as part of a KNIME workflow, and you could build that in there as well. But we are currently offering our customers the ability to monitor their consumption so they have a bit of an eye on that. But it's not automatically rerouting to different AIs. That's just an added functionality under the hood. Mm-hmm.
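
The governance ideas from this segment (an allow-list of approved AIs per department, a safeguarding step that screens data before it leaves for a cloud provider, and basic consumption monitoring) can be sketched as a small gateway function. This is a hypothetical illustration only, not the KNIME Hub API: the department names, endpoint names, the e-mail-only redaction rule, and the word-count "token" meter are all assumptions chosen to keep the example short.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical policy set by central IT: which departments may use which AI endpoints.
APPROVED: Dict[str, Set[str]] = {
    "marketing": {"cloud-light", "in-house"},
    "legal": {"in-house"},
    "hr": {"in-house"},
}
CLOUD_ENDPOINTS = {"cloud-light"}  # endpoints that send data outside the organization

# Simplified screening rule: only e-mail addresses are treated as private data here.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Crude consumption meter: words sent per endpoint (a stand-in for billed tokens).
usage_tokens: Dict[str, int] = defaultdict(int)


def redact(text: str) -> str:
    """Replace e-mail addresses with a placeholder before data leaves the house."""
    return EMAIL.sub("[EMAIL]", text)


def route(department: str, endpoint: str, prompt: str) -> str:
    """Check the allow-list, screen cloud-bound data, and record consumption."""
    if endpoint not in APPROVED.get(department, set()):
        raise PermissionError(f"{department} may not use {endpoint}")
    outgoing = redact(prompt) if endpoint in CLOUD_ENDPOINTS else prompt
    usage_tokens[endpoint] += len(outgoing.split())
    return outgoing  # a real gateway would now call the provider with this text


print(route("marketing", "cloud-light", "mail jane.doe@example.com the offer"))
```

As in the conversation, the meter only monitors consumption; nothing here reroutes requests to a cheaper endpoint automatically.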

46:46

Yeah, it's all in the workflows, basically, and that's where we're going in this last segment, where I think that we're just at the beginning of leveraging these technologies, because what they're really very good at is pattern recognition, right? Even just the vectorization, the embedding, is basically how it stores something as a point in an array, and just understanding how it can map those things. It's not just words, not just text generation. I think we're going to get

47:13

some really interesting things in terms of pattern recognition and then recommendations. I mean, I think that these little AI agents, these assistants, are going to be extremely helpful in all facets of business. You know, to be able to very quickly give you a customer profile when you're on the phone with someone, or to be able to give you a summarization of text on demand. I mean, really, I think the hardest challenge is going to be changing mindsets and

47:37

changing day-to-day behaviors and workflows. What do you think? Final thoughts? I totally agree with that. We see that inside KNIME as well. I mean, the developers were the first ones that really said, we want to use this, but then getting the rest of the organization to seriously think about it, like, what can HR do, what can legal really do with that? It's a huge time saver, and it's largely untapped. I'm totally with you, Eric, and these are interesting times

48:01

ahead. Yes, to say the least. Well, what a fantastic conversation. Folks, look them up online. Michael Berthold, B-E-R-T-H-O-L-D, from KNIME, K-N-I-M-E. We'll talk to you next time. Folks, you've been listening to Inside Analysis.


Transcript source: Provided by creator in RSS feed: download file
