KCAA: Inside Analysis with Eric Kavanagh (Sun, 26 May, 2024)

00:00

...reinvent every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis.com. That's insideanalysis.com. And now, here's your host, Eric Kavanagh. Well, ladies and gentlemen, hello, and welcome back once again to the only coast-to-coast radio show in the USA that's all about the information economy. It's time for Inside Analysis. Yours truly, Eric Kavanagh, here, and folks,

00:32

I am very excited to have, frankly, a legend in the industry with us today. We're going to be talking with Michael Berthold. He is the founder and CEO of a company called KNIME, that's spelled K-N-I-M-E, and they are an open source analytics and data science platform, a visual platform for doing data science, which is good stuff because even though there are lots of people who can code very well, almost anyone can look at visuals and move boxes around on a screen. So that's what they figured out. So with

01:00

that, Michael, welcome to Inside Analysis. Tell us a bit about yourself and KNIME and what you folks are working on these days. Thanks for the invite, Eric, and for having us on the show. As you already said,

01:11

KNIME is very much about visual programming, low code for data science. We started that many years ago as a platform, really almost as a workbench, in my group at the University of Konstanz, to be able to deploy our research results to the real world, to practitioners wanting to use them. And it's grown from there to become one of the only open source visual

01:34

data flow platforms for doing anything you want to do with data. And I mean, I'm always a little bit careful about calling it data science, because that often scares people. They say, I just want to do data wrangling, I don't care about the sciencey bits, and that kind of scares them off. And that's what KNIME does as well. So a lot of the applications that we see in real life are, in large part, just

01:57

getting data into the right shape from many different sources. Yeah, and I will tell you, I'm a huge fan of open source. In fact, we built a website a number of years ago, and we feed it with a technology that we built called Media Lens, and it's called Inside Open Source. And the reason we launched it was because I realized there is so much happening in the

02:15

open source world. There's the Apache Foundation, there's the Linux Foundation. There are lots of other projects outside of those organizations as well. But the origin of open source I found fascinating. So I first started researching this in 2005, when I was working for the Data Warehousing Institute as their web evangelist, and Katrina had just struck New Orleans. I had just moved out of New Orleans, so we watched it all happen on TV. It was

02:43

just, of course, terrifying. But I remember that the senators from Louisiana asked for a quarter of a trillion dollars to rebuild southern Louisiana. And I happened to know, through a past life and past clients in the government space down there, that the politicians are good at making money disappear, and I thought to myself, this is a very bad situation, because if all that money just floods in, it's going to flood right out, not where anyone intended, and

03:07

a lot of it is going to disappear. So I got on this high horse, if you will, and started doing research, and it took me to open source, and I put forth this theory about open source government, about publishing all the government data such that citizens can see where the money goes and

03:23

understand. And I basically said, look, with the Sarbanes-Oxley Act in the States, which came out of the Enron debacle, corporations had to document their processes for how they come up with their numbers and had to be very transparent about that stuff. And I thought, well, if corporate America has to do it, why doesn't the government do it as well? And people thought I was crazy. But a couple of things happened that were amazing.

03:46

One guy paid attention. Out of forty thousand people I emailed, one guy paid attention, and he worked for the Heritage Foundation. He basically testified to Congress and said, we can have citizen auditors. This stuff really can happen. And he really leveraged the imprimatur of

04:00

TDWI, and sure enough, they did it. The House passed the bill, the Senate passed the bill, co-sponsored by a guy named Barack Obama, who was a senator from Illinois, and then President George W. Bush, believe it or not, signed the Federal Funding Accountability and Transparency Act on September 26, 2006, and I almost fell off my chair. I was like, wow, they actually did it. But the reason I bring this

04:23

up is to talk about the power of open source. And I realized at the time the Apache web server had just surpassed the Microsoft Web Server as the number one web server. And I thought to myself, well, that's very

04:35

interesting. And I was also studying service-oriented architecture at the time, and I thought, well, if you have all this open source code and you have a service-oriented architecture, you should be able to plug and play, and sort of take stuff out and put stuff in, and have a very composable environment. And I thought, well, that's not going to be very good news for the SAPs and the Oracles of the world, because they like the monolith.

04:57

They like control, and they have control of all that stuff. And it took longer than I thought, but about ten years later open source just blew up the market, with Hadoop and with Kafka. And of course you have KNIME. So tell me a bit about the open source foundation of KNIME, what drove your decision there, and what it means for your customers. That's a

05:20

very interesting story. I didn't know about that open data movement coming from Katrina [inaudible]. So KNIME is a bit atypical in its open source model. I mean, fundamentally, there are really three different ways to do that. You can have distributions like Linux, where you're essentially making money by packaging them up nicely and supporting them, but essentially there's not really new code that you're adding around it. Not quite true, I mean, there are little installers

05:46

and that type of stuff, but that's fundamentally the idea behind distributions. You can also have what people often refer to as open core, where you have something that you open source, but it's more of a teaser, it's more baitware, and if you really want to use this in production, you have to buy some commercial bits and pieces around it. And then you have, of course, also the MongoDBs, which are essentially databases, that type

06:06

of stuff. Also maybe even a Databricks with Spark. They have really cool open source technology, and they make it accessible to you in the cloud for a charge. KNIME is different in that we have one open source piece, that's the Analytics Platform, which allows anybody who wants to to build these workflows and execute these workflows and pretty much do anything they want to do with data. And then we have a commercial complement to that, which we call the KNIME Business Hub,

06:31

which allows people to productionize that and collaborate. So when you have more than one person using the Analytics Platform in your organization, and you want to deploy that as a web interface or a REST service, or you want to collaborate and have compliance and governance kinds of features, that's when you buy the Hub

06:47

from us. And the reason the open source analytics platform is open source is, I'm not too religious about it, but to me, in the data science field in particular, you can't exist with proprietary software. There's so much cool new stuff going on on a daily basis in research groups and other types of environments that you're essentially standing on

07:12

the shoulders of many, many giants. A lot of functionality inside KNIME is actually based on open source libraries, so it seems kind of unfair almost to put that under a proprietary umbrella. And also, to be fair, it enables us to make inroads into academic, into teaching environments much more easily, because they can just use the open source platform for teaching. But also, we have a lot of open source contributors who are contributing additional functionality to the KNIME

07:41

platform. So I'd like to see it as a win-win situation, also for our customers, because they essentially get a lot of maintained functionality from us. In addition, they have access to all of those community functions out there. Yeah, it's very cool. I mean, there are a lot of good things about open source. One of the things that I've heard over the years is that bad code goes away because all these eyes can see it now.

08:05

The one shortcoming I've come across is that the open source project gets to MVP status, if you will, minimum viable product, and then doesn't typically go past that, just because it works now and we just kind of move beyond that. But what are your thoughts about that, in particular about how you make sure that you have truly finished products and that you're able to deliver a robust analytics platform on an ongoing basis for all of your clients? How much effort does that take internally from your

08:35

developers to stay on top of the platform and make sure everything's working? That's a very good point. I mean, I tend to joke that ninety-nine percent of all PhD projects turn into open source projects, and then they kind of die away and fade away and never really turn into something useful in production. Now at KNIME, probably about half of our development team, like eighty developers, are focused exclusively on the Analytics Platform, just making sure that the core

09:01

works, and works in a professional environment. We have our own extensions, which are of course maintained by ourselves, so we have the same quality assurance there. We have what we call trusted community extensions, where we're in close collaboration with the community contributing those extensions, so we can also make sure there's quality assurance there as well. And then there is what I'd call the long tail

09:26

of extensions that are experimental, right. The nice thing is that everybody can use those and play with those and explore new technologies, and then when we see increased usage of some of these experimental extensions, we can move them into

09:39

the trusted extensions as well. Interesting, that makes a lot of sense. So you're trying out things, you've got these extensions in trials basically, and then once you see there's a lot of activity, then you throw some developer support behind it to harden it, which is the term we typically use, right, to make sure that it's bulletproof, that it really does what you want it to do.

10:00

That makes a lot of sense. And you do end to end, right? KNIME does everything from data ingestion, data pipelines, number crunching, model building; all that kind of fun stuff is in the KNIME Analytics Platform. Is that right? Yes, that's true. So we have everything, starting with the ETL part, loading the data. We can access about four hundred different data sources.

10:18

We can access databases, strange file formats. Of course, we can also execute bits and pieces on different execution environments, like doing the ETL directly inside a database, or in Snowflake, or in Databricks, or in Hadoop in the old days. And then we go all the way to visualization and the analytics functionality, and a lot of that, as I said before, is of course based on ECharts for the visualization, a lot of Python libraries, C libraries,

10:45

R libraries, Java libraries. For some of the machine learning functionality, we have integrations with TensorFlow, if you want to do that. We have integrations with the other deep learning libraries, we have integrations with XGBoost. Pretty much everything is in there. But the other piece is, often when people talk about data science, they mainly mean this kind of path from the data to the

11:05

report, to the endpoint, or to the model. But the Business Hub then also covers the rest of this journey, right, deploying it to others, managing and retraining models when needed, and monitoring their performance as well. Right, because at the end of the day, you want these algorithms to connect into your business, whether it's for marketing or for manufacturing or supply chain or whatever it is. You want it to affect some outcome in the business,

11:33

and so that involves connecting to operational systems, right? It involves connecting to ERP systems or CRM systems or things of this nature. That's where the magic happens. And a lot of times that's the hard part, right? I mean, I've heard many stories about models that just don't get deployed because maybe the companies didn't have the wherewithal or they didn't have the expertise to do

11:54

it properly. But being able to plug the algorithms into operational systems and then monitor how those models perform and switch them out, right, because you've got your production model and you have your challenger models that are sitting by the wayside waiting to get pulled in, and being able to switch over to a new model when a model that's in production starts faltering, that's a critical piece. And I guess that is done in Business Hub with you folks? That's something

12:20

on the Business Hub side, exactly. So with Business Hub, you can deploy models, which really are deployed KNIME workflows. You can deploy them as a REST service or as a web application, and then people consume them, and you can constantly monitor what's happening in production and then potentially replace, retrain, or just alert the data science team and say, hey, this is so out of whack with reality, we don't really know how to fix that, do something about

12:43

it. The nice thing is that you don't have to switch code in between. Right, in the old days, somebody always coded the model in some strange language, and then it was reprogrammed by IT into some production language. In our case, on the Hub side, the workflow that was trained is also

13:01

the one that runs in production. That's interesting. So one of the other hurdles that people are running into is when they use Jupyter notebooks to write their model or to build their model to test with data, and then they want to go put that into production, and it's just this step-by-step, tedious process of copying over code and values and all these things, and that falters often. That, from what I understand, is a really serious problem.

13:24

But I guess you don't have that challenge, because you're not typically using Jupyter notebooks, and people are just in the environment, in the Analytics Platform, building out their models after they pull in their data, et cetera. So you're already production ready when the process begins. Is that about right? That's a very

13:43

nice summary. Yes. So what you use on the creation side, when you actually train the model, is exactly the piece of the workflow that then gets moved into production and executed by exactly the same engine, so you

13:54

don't have a translation issue there either. The other piece that people often lose in going from training to production is all of the feature engineering that you did, all the feature transformations. You tend to lose them, so you can only take the model and move that into production, but you can't do the transformations. And in KNIME you can automatically grab the part of the workflow that has the transformations and applies the model, wrap that

14:22

automatically and deploy it to KNIME Business Hub. Well, that's pretty cool. And you also cover different industries, right? So you do insurance, healthcare, I would imagine financial services, all sorts of different industries, because it's more of a horizontal solution. Is that about right? Yes, that's absolutely true. We

14:41

have customers and users in pretty much every industry. Yeah. Well, and we're going to talk about large language models here in a minute in our next segment, but before we get there, I'll just throw out one of my theories to you and see what you think about this. To me, this explosion of AI through foundation models, including large language models, is really a major call to action for organizations to get their data house in order. And

15:07

what I mean by that is data governance. If you don't have data governance, if you don't even know what data governance is from an organizational perspective, you're going to have a hard time responsibly leveraging AI. Would you agree that companies really do need to take a very hard look at their end-to-end data management life cycles and processes, understand governance, who gets access to what data, even understanding a broad inventory of your data sources? Would you agree that it

15:33

is paramount to do that before pulling the trigger on some AI? I totally agree. And in a way, it's funny that we've been preaching this for data science processes for a long time, this governance topic, and nobody really cared, and now people are really waking up. Isn't that fascinating? I mean, I've been in this business a long time.

15:56

I've been talking about data governance, analytics, AI, all this stuff for twenty years, right? And we talked about data governance twenty years ago and fifteen years ago and ten years ago, and basically nobody was doing it. I mean, it wasn't even easy to do, because you could control access either at the database level, which is a hard place to do access controls, or at the application level. But there's nothing in the middle. And

16:18

really it's in the middle. And now with the cloud, that's one of the nice things about the cloud, is that it is this de facto marshaling area for functionality and data. And now we have the capacity to apply very fine-grained controls on things, on data sets, on types of data. For example, we can scan and find PII and then know, okay, flag this as sensitive. There are lots of things we can do these days that we just kind of couldn't do ten years ago. Real quick, one minute,

16:45

what do you think? Is that about right? We finally can do this stuff, and so we are doing it. What do you think? I'd probably explain it slightly differently and say we could probably have done it before as well, at least some of those aspects, but people just didn't care enough because there was not enough harm in it. But now that everybody who does anything with GenAI sees the danger of sending data anywhere, people are really

17:08

really waking up and seeing the pain there. That's right. Well, like I say, it's a call to arms, it's a call to action, and your organizations have got to do it, because you don't want to wind up in the crosshairs of an audit. You don't want to wind up with a breach. You don't want to wind up getting sued by someone because their sensitive information has now been leaked out there. Well,

17:30

folks, don't touch that dial. We're talking all about AI and analytics platforms, and next up we're going to dive into these large language models that are just taking the business world by absolute storm. It's really quite fascinating to watch. But don't touch that dial, we'll be right back. You're listening to Inside Analysis. Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. Folks, back here on Inside Analysis, talking to Michael Berthold. He is

18:03

the CEO and founder of a company called KNIME. That's K-N-I-M-E. Look these folks up online, an open source analytics platform. It's wonderful stuff. It's like a giant candy store for analysts to go play and have fun. But I want to talk to you about these large language models, Michael, and in particular, first of all, the open source side of the equation. So Meta comes out with Llama and Llama 2, open source. OpenAI used to be open; now it's not. Now it's the ironically named Open

18:30

AI, because it's a black box. And with a technology this powerful, I believe we need open source. I don't know that I would get behind a mandate that they must be open source, but there needs to be some transparency into how these things are working, just so that we can have our peaceful sleep at night, knowing that there aren't bad actors involved somehow.

18:56

I mean, certainly for regulated industries like financial services, if you bring it into some workflow for loan approval or something like that, then you have to be able to explain how you came up with your answer. But what are your thoughts in general about open source versus closed source with these large language models? I think there's a lot of value in it. The problem is

19:19

that, in my opinion, open sourcing large language models isn't just about open sourcing the code; you also need to open source how it was actually trained. So in a sense, you also need to at least give open access to the data that was used for training. Because even if I give you a model, and it was trained half on copyrighted material that it's going to spit out again when you use it, you wouldn't know if you didn't have access to the training data, right? Part of that is,

19:47

was it supposed to be used? And then I think the other piece is that what some companies are open sourcing is only the code to use the model for predictions later, to actually apply it. I still don't know how

20:02

it was trained. So that's the third element that needs open sourcing. And then I believe one of the key proprietary ingredients that a lot of these companies now have is safeguarding code around it, so that some types of answers don't get produced and some types of inputs aren't accepted, and open sourcing that as well would really reveal their secret sauce. And I think that's why the OpenAIs of the world are shying away from that one. Right.

20:29

No, it does make sense. I mean, we have proprietary code. It's not new, but again, these are very, very powerful engines. And then there's another whole side of this equation, which is the RAG model, retrieval-augmented generation, which, upon great reflection, I believe will be the layer of functionality for governance, for privacy, to a certain extent for security,

20:55

for management. You know, a lot of that's going to get baked into the RAG model, where you could, for example, before your prompt goes up to the large language model, have a layer in between that checks and sees, okay... And this is already happening. Like, I asked Gemini a couple of weeks ago how many electoral votes are in Georgia and Arizona and some other state, and it thought for a second and said, elections are complex and fast moving. We recommend you use Google.
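[Editor's note: the "layer in between" described here can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not how Gemini or any real product implements its guardrails. The names BLOCKED_TOPICS, REFUSAL, guarded_ask, and fake_model are all hypothetical, and a keyword match stands in for whatever policy check a real system would use.]

```python
from typing import Callable

# Hypothetical topic list and refusal text; a real guardrail would be far richer.
BLOCKED_TOPICS = {"election", "elections", "electoral", "ballot"}
REFUSAL = "This topic is complex and fast moving. Please use a search engine."


def guarded_ask(prompt: str, call_model: Callable[[str], str]) -> str:
    """Check the prompt before it reaches the model; refuse blocked topics."""
    words = {w.strip(".,?!").lower() for w in prompt.split()}
    if words & BLOCKED_TOPICS:
        # The model is never called for a blocked prompt.
        return REFUSAL
    return call_model(prompt)


def fake_model(prompt: str) -> str:
    """Stand-in for a large language model call (assumption for the demo)."""
    return f"model answer to: {prompt}"


if __name__ == "__main__":
    print(guarded_ask("How many electoral votes are in Georgia?", fake_model))
    print(guarded_ask("What is a regression coefficient?", fake_model))
```

[The point of the sketch is that the check lives outside the model: nothing about the model's training changes, only the workflow wrapped around it.]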

21:22

It is a guardrail. That's a guardrail, exactly. They built that in to say, no, no, no, no, we don't want to touch that. Right, and that's in the RAG model, right? That's not trained into the model. That's outside the model, but it's the workflow you have around the engine that's very, very important. Right. I totally agree

21:41

with that one. I mean, that's what I called the safeguards before, and I think sometimes it's probably not even part of the context that's part of the RAG model, but it's really part of some safeguarding code around it. I mean, we use that at KNIME as well. So we have built in what we call K-AI inside the Analytics Platform, which allows you to have a

21:57

Q&A mode. You ask questions, like, this and this in Excel, how does that look in KNIME? And then it shows you a couple of nodes in KNIME, and we're of course filtering so that these nodes do actually exist, because every now and then OpenAI, which we use under the hood, hallucinates and invents nodes that KNIME probably should have, but we don't have them, and that doesn't help the user, right? That's a very, very simple piece of code around K-AI that is just making sure that

22:26

what it spits out is reasonably useful. Mm hmm. Yeah, that's interesting, and you're going to see more and more of these AI agents. That's what everyone is talking about now, are AI agents, which are like little bots, semi autonomous bots that can do various things, and they can check on each other and they can do all kinds of stuff. I mean, it's very interesting to me when we talk about data science. We talked about

22:48

it before, it all seems to be getting subsumed now into AI, in conversations about AI, even though there are lots of different versions of AI, right? I mean, there are traditional models, regression models, all sorts of old-fashioned AI, if you will. It's still very powerful and still works, but the new stuff is sucking all the oxygen out of the room. Isn't that about right? Yeah, we see that as well. And sometimes, I mean, I'm an old guy by now, I've seen this in the

23:12

past, right? When backpropagation came along, everybody was suddenly using gradient descent for every problem. We just thought, hey, you can solve this directly. You don't need to do gradient descent. Then there were support vector machines, and there was something else, and then deep learning, and now it's AI. So sometimes we see people building workflows for even very simple things, and they're reaching out to some AI, and we just say, hey,

23:33

there's a node in KNIME that does that computationally a lot less expensively. Why don't you use that? So to me, it's a bit of hype right now. It's just the new kid on the block. Everybody wants to play with it and use it. But really mixing and matching it with traditional techniques, I think that's where the true

23:51

value lies. Yeah. Well, and so I'm just guessing here that one of the nice things about your platform is that it is an end to end platform for building models, designing models, training models, pulling the data in all these things, and it's adjacent to this business hub. So you have a marshaling area for ideas and for testing algorithms and for testing models. Then you connect it through the business hub and see what happens and see how it

24:18

operates. And it's important to have this one environment where that takes place, because when you have multiple tools, it just takes longer, and it's disjointed, and there are connections between the tools, and things change. So it's important to have that main marshaling area. It's like a giant analytics sandbox. Is that about right? That's a very nice description. Absolutely. I tend to say that a data scientist doesn't necessarily need to know how the method does something, but

24:49

needs to know what the method does. So whether it's reaching out to a Python library or an R library or a C library underneath, that's not that important, but you still need to understand what the method actually does underneath to be able to interpret the results. A simple example: if you don't know what a regression coefficient is, you won't be able to interpret it, but you don't necessarily need to understand how it was derived from the data.

25:11

Yeah, no, that that's pretty interesting. Let me throw this concept at you and see what you think about it. I wrote up an article just

25:18

last week, I guess, about this. I was flying to a conference in Denver, just thinking about these large language models and analytics and AI and all this stuff that I've been covering for a long, long time, and I thought to myself about this concept I call the executive cockpit, and the idea is that I think very forward-looking organizations are going to deploy a small language model that is aligned with their business, like if it's manufacturing or healthcare or whatever,

25:44

in their data center, so on prem, possibly in the cloud as well, but I have my thoughts wrapped around this on-prem small language model. Then you're going to train it on your ERP, on your Salesforce, on your CRM, on your customer support, for example, your tickets, like any of your core enterprise systems. You're going to train this model on your data,

26:03

on your business data. Then what you'll do is set up Kafka topics coming from those systems into a vector database adjacent to this interface for the small language model, and that is where the executives will spend their day running their business. Because then you could ask any question at all: how is our marketing working in APAC? Who can we let go if we have to save some money? Where are we weak in our organization right now? Just

26:32

all kinds of different questions and you'll get all these answers. And I actually mentioned to a CEO of this one company because I was trying to get him to help me do sales enablement for them, because I have this big audience I've been marketing to for years. And one person turns out to be the next deputy chief data officer for the IRS. And I sent this email saying, Hey, this is the lady I've known for a long long time. This is what I mean by sales enablement. Do you guys have the IRS

26:57

account? And he fired back, he said, I don't know. I don't know if we have that account. I thought to myself, well, you would know if you had the executive cockpit, because you would just ask it: do we have the IRS account? Who is the account rep? What's the latest on this account? Because you're getting information from all these systems in your private environment. But what do you think about this concept? Is that doable? Is that pie in the sky, or what do you think about all

27:19

that? It's an interesting idea. I've thought about something similar. I mean, at the end of the day, you're personalizing a large language model around your own infrastructure, your in-house data. I think the challenge there is that in order to get a really, really good model like that, one that's really useful, you

27:37

need to train it on a lot more data than just your own. So in a sense, you need to benefit from your competitors' data without actually seeing it, but kind of learning the general structure and the general insights, and then you customize it on your own, which in turn kind of means that you should also be providing your data to other organizations. It's almost like pre-competitive training of these models, so that they're useful for everybody.

28:03

I think just training it on your setup isn't enough; you need some bigger context than that. Or maybe you're a company and you have enough context anyway, but for every small company, I don't think you have enough data to really get meaningful insights. That's very interesting. That's a good point, because I'm wondering to myself, and I'm gonna throw this one at you too. So one of my aha moments with these large language models is when I

28:27

realized that when you train them on a corpus of data, they're not actually persisting the data verbatim. It's not like they're taking strings of text and storing them in a record somewhere, but rather, in the training process, the data you use will adjust the weights and biases and the parameters of the model. So in other words, it's like, huh, well, that's very interesting, that it can train in that fashion and then reflect back to you such

28:56

remarkably granular detail about things. And you know, what I've seen is that if there is a subject area that has been published about widely, like how computer processors work, or how an irrigation system works, anything that has a lot of content on the web that these engines were trained on, it does very well. It knows all that stuff. It's when you get to the fringe, where there's not that much published. And I guess that's kind of your

29:22

point about having enough data to train the model. So if you don't have enough, you're not going to get the contours right, and it's going to be skewed in one direction or the other. Is that about right? I think that's a very good summary. The contours, right. I mean, a colleague of mine once summarized it: essentially, it's a consensus engine. It gets the consensus around what computer programming is, learns that from the data,

29:45

and can repeat that. But if it's just one isolated outcome, it's not going to be able to recall that one. Interesting. Yes. So Jürgen Schmidhuber, I think his name is. He's the guy who wrote the papers on the transformers, and I guess he's actually based in Saudi Arabia these days, but I want to say he's of German origin. And I was amazed when I realized he wrote those papers in like the nineteen nineties or something. And it's just now we have the compute to be able to...

30:11

Can you explain that? Is what happened that just the timing was right now to be able to understand this and put it into play? Because that was one of the big changes. And now it's able to see like, you know, ten or twelve tokens left or right as opposed to just like

30:26

two or three. And you also have this, like you say, consensus, right? I call it almost like an AI Greek chorus, where one is saying, I think it should be an A, another, I think it should be a B, another, I think it should be a C. And then it's, okay, I'm going to pick this one. That's very interesting. It's a very interesting development. But why do you think it took so long? Is it just because we now have the compute to do that?

30:48

I think it was a compute power issue as well. And then some science just needs a little bit of time before it truly has an impact, but it was mostly waiting for compute power.

30:59

I don't know. One way of looking at what this consensus really does: I don't know if you've watched these YouTube videos about ChatGPT playing chess. The interesting part is that at the beginning it does extremely well and does very sensible things, and part of that is that these opening libraries are all over the

31:17

place, so that's an extremely well established consensus. And then somewhere in the middle it starts inventing bizarre moves, and suddenly new figures pop up on the board out of nowhere, right, and it always has meaningful explanations for that. And the problem there is that the data is so sparse that there's no consensus to learn. So at the beginning it almost looks like it understands

31:38

chess rules. But the only reason it does follow the chess rules is that they're so deeply ingrained in all of the common material it has seen that the likelihood of going outside of the rule book is too small. But somewhere in the middle of the game it goes completely off the books. That's interesting, that's wild. So one of my good friends in the business is a gentleman named Usama Fayyad; you may have come across him at

32:02

some point. He was the first chief data officer for Yahoo way back in the day, and now he runs the Institute for Experiential AI over at Northeastern University here in the States. And I had him on the show, and he's very funny, he's very candid. He said, these large models, they're too big, they're not supposed to work. We don't know why they work. What are you talking about? This is the guy who runs this whole operation.

32:25

He's joking: we don't even know how they work. I mean, how's that for transparency, right? I mean, there's some truth to it, right? We don't really know how they come up with these answers. Right, it's a wild mix. It's a highly distributed model. We don't know why a particular answer comes out. We can come up with kind of proxies for an explanation by wiggling the inputs and trying to figure out what happens, and we can say, ah, this probably had a lot of influence on

32:50

the decision, but we don't know for sure. That's so wild. I mean, that's just such a big deal. So now we have all this observability in the data space, right? You've got New Relic and Datadog. All these different companies are doing observability, which I think spun out of Kubernetes primarily. But it's very interesting, and we need that kind of observability in these large language models. I think that's going to be one of the keys

33:14

to success. But folks, don't touch that dial. We'll be right back. We're talking to Michael Berthold from KNIME on Inside Analysis. Stand by. Welcome

33:30

back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis with Michael Berthold, founder and CEO of KNIME, K N I M E. Look them up online. And Michael, I was mentioning to you in the break that I'm wondering about this whole business intelligence industry. There are hundreds of players these days, hundreds of companies doing some form of analytics.

33:55

Of course, KNIME is a whole analytics platform, an open source analytics platform, end to end, but there are lots of point tools, whether it's visualization or number crunching, OLAP, ROLAP, all this kind of stuff, and I wonder, is all of that in the crosshairs of these foundational models? What do you think? That's a very, very interesting question, and we

34:15

of course asked ourselves that as well. And I think for some of the tools that you mentioned, like generating visualizations, that type of stuff, I do think they are pretty replaceable by AI-type models, because at the end of the day, you're generating code that generates the visualization based on data, and you judge the output of that code by just looking at it and saying, this is quite right. So I

34:40

think that type of stuff will go away. And we have in KNIME actually built in what used to be an ECharts scripting editor that now has an AI element, and you don't need to touch the code anymore. So those types

34:52

of tools, I believe, will go away. The BI tools trying to really find surprising, interesting new insights in data, I think that type of stuff is a lot harder to replace, because fundamentally you're trying to find something new. And like we discussed before the break, these GenAI models are consensus engines, right, so they kind of try to gravitate towards something they've seen more and more often before. Interesting. That's right. That's an excellent point. Really,

35:21

that's exactly right. So it's good for understanding the well-trodden path, basically. That's what it's very good at doing, saying, okay, there's a highway, it goes that direction. But if I want to go wandering around the forest, it's not as good on the fringe, basically. So you

35:38

will use it. But I mean, I read an article from some guy on LinkedIn talking about how he connected his model, I don't know, by ODBC or JDBC or something, with data sources, and he asked it to query the data source and it did: it reached into the database, pulled the information out and delivered it, and you're like, okay, that's pretty interesting. And then I think to myself about what could be happening here. In the data warehousing space, for example, we move so much data around.

36:07

It's all the data from your core systems that you've decided to put in, which is a tremendous amount. Very little of that data ever gets used. A lot of times it's the summaries or the aggregates or the roll-ups that are used for various purposes, but a lot of it just doesn't even get used at all. And I think that what these large language models are going to do is kind of turn inside out the entire model of

36:30

how we view moving data and analyzing data and doing things with data. Because they don't really care. They're just going to... once they're trained on a certain space. And again, if you train it on your data, or if in your vector database you have a lot of embeddings of your corporate data and you point your RAG model there, well, you can get answers to things very quickly that before would have required running reports and doing

36:53

ETL and doing all this stuff, and I think that in many use cases, these models are going to short-circuit all that stuff and you're just not going to have to do as much of that stuff anymore at all. But what do you think about that? I think there is some truth to it, because fundamentally, what these models won't necessarily do is actually look at all of the data, but they're going to apply a lot of common standard practices

37:17

to that. And standard sounds a little bit too limiting. I think there's a huge wealth of standard practices that people do apply to the data, and that's part of this consensus engine, and so the AI models will try out a lot of those things a lot faster than you ever would. So absolutely, there's a good chance that some of these insights that will be

37:37

generated are interesting to you. But then continuing the exploration and saying... I mean, how do they always say? The eureka moment is usually preceded by, oops, that's strange. I think you'll have these moments, right, and AI doesn't do that. AI doesn't say, this is weird, I should dig in a little bit deeper because that's outside of the consensus. So, kind of like you said, it will continue the normal path. And that's what

38:07

I believe: the human intuition, curiosity, oops-detection capability is going to be relevant for a long time. I like this oops detection. That's good stuff. Well, there was a gentleman I had on the show years ago who said something a lot like that. He basically said, AI doesn't have the ability to be like, hmm, that's kind

38:30

of weird. What's going on with that? Right? Because it's just processing information and doing what it's been told to do, which is just reflect back based upon a prompt and its training. It's a very simple thing. I mean, it's very complex in terms of how it got there, but nonetheless. You know, one thing that did annoy me, I will say, is in the early days when that New York Times reporter was getting deep with ChatGPT and trying to, like, tease out of it whether it's sentient

38:58

or something. I'm like, dude, that is a misuse of the technology. That is not what you should be using this thing for, to try to, what, trick it into revealing that it's really alive? You know, what are you even talking about? And I think that's part of the downside these days. I'm a media person myself, but a lot of times the media will just sort of glom onto some narrative about something, and it's very hard for them to decouple from that and get down to

39:23

brass tacks. And that's what we do in the show. In fact, I used to say at the beginning of every show, it's all about getting down to the brass tacks of what actually happens in the data world and what you do with this stuff. And I think it is important that people keep in their minds the purpose of this technology: why are you using it, where is it appropriate to use it, and where is it not appropriate to use it? And that's just basic common sense, right?

39:46

Yes, I totally agree. I mean, I go pretty much in line with the European AI Act that they just passed. I mean, if it's not mission critical, if it's not safety critical, you can trust a system that is wrong in, I don't know, point one percent of all cases. If it's controlling nuclear power plants, it better not be wrong in point one percent of all cases, right? That's right. You've got to watch out where it's used. So where do you see a lot of use cases from your clients? I

40:13

mean, obviously some of your clients are using large language models. Where are you seeing success stories in that space right now? So there are a lot of success stories in other areas of the business, as you undoubtedly know: checking legal contracts, doing marketing material, that type of stuff. There's a lot of value in applying GenAI there. In the data analytics space, honestly,

40:35

we don't. We see a lot of interest. There are a lot of people who say, oh cool, I can build a customized chatbot using KNIME. That's not really our core business. And then the real applications tend to be around text processing, just where GenAI is really strong. Instead of using outdated, antique libraries for sentiment analysis or text segmentation, you just hand it over to an AI model and say, hey, segment this, or extract the

41:02

key components, or create a summary, that type of stuff. It's amazing. I see that also with image mining extensions. I think that's the next step, where we can use the image processing capabilities of GenAI. For a lot of the number crunching, I mean, we've all seen these cases where it can't add two numbers, doesn't really know what the prime numbers are. It's missing the

41:24

understanding of the concept of a number, right? So there, I think it's more a tool to help you build workflows, build visualizations, but only as a helper, right? So that's actually an excellent point I wanted to get into. I believe that we're just scratching the surface of using these models as a component in a workflow. So you mentioned, for example,

41:49

summarization. That is hugely powerful. I mean, especially for policies, for complex policies, for law, for example, for legal protocols and when to file motions, what motions you can file, what you have to do according to... I mean, you used to have to pay lawyers a lot of money to tell you that stuff. Now you just get access to the rules, load them into a large language model, and just

42:13

start asking questions. That is an incredibly powerful use case, because it used to take a lot of time to sort through the process of how to do something. Now you just ask it: how do I fire someone? First step, send them a letter saying they're not performing properly. Second step, you know, monitor their behavior. Third step, all this stuff. It's like, there it is. Like, wow, talk about saving time. I mean,

42:34

it saves time. And here's another big soapbox issue: it improves morale, because nobody wants to spend their time scratching their head reading through just dreadful documentation. Nobody likes doing that, nobody. So all that stuff is going to go away, right? What do you think? I totally agree. Your examples center a lot around firing people, by the way. But when people ask me if GenAI is going to make data scientists'

43:00

lives easier, I say no, I don't think so. But it's going to make them nicer, because it's going to remove all of that boring stuff and we can now focus on the really interesting but more complex stuff. So it's going to make it more interesting, more complex. That's interesting. Yeah, wow. And I think you do want to document things, and you can have it document things for you too, right? You can just throw a

43:22

whole bunch of stuff at it and say, document this. Okay, exactly. I mean, we now have a component on the Hub that takes a KNIME workflow and explains what the KNIME workflow does. And we do that by just shipping it off to GenAI. It's perfect for that. Wow, that's amazing. All right, folks. Well, podcast bonus segment coming up next. You're listening to Inside Analysis. All right, folks, back here on Inside Analysis talking to Michael Berthold, the founder and CEO of KNIME, K N I M E. And Michael, I

43:51

know what it's like in a software company. There's always a roadmap. You're always working on something, and we talked about a couple of key things. Governance: there's model governance, there's data governance, there's IT governance. What are you working on in the governance space? Thanks for asking that. Like you know, all the roadmaps are changing all the time.

44:10

But here's what we're currently working on. We've actually had this model governance topic on the roadmap, right, and have been working on that for a couple of years now. So the idea of being able to monitor what models are doing and automatically retrain them. We talked about that in, I think, the first episode. But what we added now is the ability to also govern the AI usage of

44:31

people that are creating KNIME workflows. So, first of all, when somebody is creating KNIME workflows using the KNIME Analytics Platform and uses the built-in AI, which we call K-AI, for KNIME AI, we need to make sure that gets channeled to an IT-approved AI. Right? Maybe that's just for expense purposes, you know, you don't want to have too much consumption in the cloud, maybe you run that in-house, or it's really a data privacy issue.

44:53

But the more worrisome part for people is that, I mean, one of the strengths of the KNIME Analytics Platform, of the workflow concept, is that everybody can use any technology they want. Right? They can reach out to experimental libraries, they can reach out to R stuff, to Python stuff, to whatever.

45:09

But by now they can of course also connect to various different AI providers, and we need a way for central IT governance to be able to make sure that the KNIME workflow users inside the organization can only use approved AIs. So maybe marketing can use an AI in the cloud, but maybe legal shouldn't, or HR shouldn't, right? And that's something we have

45:35

built into the KNIME Hub now: we can limit the types of AIs that users can reach out to from the KNIME Analytics Platform, and they get to choose from one of the approved AIs that central IT set up and said, okay, here's an AI that's consumption-light, that's for the easy tasks; here's the one for, whatever, the tech team; here's

45:55

the one that's for compliant data. And we also allow the setup, on the Hub side, of safeguarding workflows, so that before the data gets sent out to, say, a cloud AI provider, it gets screened for private information, or maybe the data automatically gets anonymized before it gets sent out. Yeah, that's very important stuff. And are you also able to do some FinOps on that, in other words, see how much it's costing to leverage this

46:24

AI engine versus that AI engine and do some cost optimization? Is that something you can do? We can do that as part of a KNIME workflow, and you could build that in there as well. But we are currently offering our customers the ability to monitor their consumption, so they have a bit of an eye on that. But it's not automatically rerouting to different AIs. That's just an added functionality under the hood. Mm-hmm.
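
The governance setup described above (an allowlist of approved AIs per department, plus a safeguarding step that screens data before it leaves the organization) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not KNIME's implementation: the department names, provider labels, function names, and regex patterns are all hypothetical, and real screening for private information needs far more than two regular expressions.

```python
import re

# Hypothetical policy table: which AI providers each department may use.
# The department and provider names are illustrative assumptions.
APPROVED_AIS = {
    "marketing": {"cloud-ai", "in-house-ai"},
    "legal": {"in-house-ai"},
    "hr": {"in-house-ai"},
}

# Deliberately simple patterns for two kinds of private information.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def is_approved(department: str, provider: str) -> bool:
    """Return True if central IT has approved this provider for the department."""
    return provider in APPROVED_AIS.get(department, set())


def redact(text: str) -> str:
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def prepare_request(department: str, provider: str, text: str) -> str:
    """Enforce the allowlist, then screen the payload before it is sent out."""
    if not is_approved(department, provider):
        raise PermissionError(f"{provider!r} is not approved for {department!r}")
    return redact(text)


if __name__ == "__main__":
    print(prepare_request("marketing", "cloud-ai",
                          "Mail jane.doe@example.com or call 818-610-8088"))
```

Keeping the allowlist check separate from the redaction step mirrors the two controls mentioned in the conversation: which AI a user may reach, and what data is allowed to leave.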

46:47

Yeah, it's all in the workflows, basically, and that's where we're going in the last segment, where I think that we're just at the beginning of leveraging these technologies, because what they're really very good at is pattern recognition, right? Even just the vectorization, the embedding, is basically how it stores it as a point in an array, and just understanding how it can map those things. It's not just text generation. I think we're going to get some

47:13

really interesting things in terms of pattern recognition and then recommendations. I mean, I think that these little AI agents, these assistants are going to be extremely helpful in all facets of business. You know, to be able to very quickly give you a customer profile when you're on the phone with someone, or to be able to give you summarization of text on demand. I mean, really, I think the hardest challenge is going to be changing mindsets and changing

47:38

day-to-day behaviors and workflows. What do you think? Final thoughts? I totally agree with that. We see that inside KNIME as well. I mean, the developers were the first ones that really said, we want to use this. But then getting the rest of the organization to also seriously think about it, like, what can HR do, what can legal really do with it? It's a huge time saver and it's largely untapped. I totally agree with you, Eric, and there are interesting times ahead.

48:02

Yes, to say the least. Well, what a fantastic conversation. Folks, look them up online. Michael Berthold, B E R T H O L D, from KNIME, K N I M E. We'll talk to you next time. Folks, you've been listening to Inside Analysis. For more local radio every day, listen to KCAA. Redlands Ranch Market is a unique, full-service international grocery store that specializes in authentic food items from Mexico, India, and from many Mediterranean and Asian countries, including popular items from the US.

48:37

They offer fresh baked items from their in-house bakery, housemade tortillas from their tortilleria, a delicious array of prepared Mexican foods, a terrific fresh food and juice bar, and a large selection of meats, seafoods and deli sandwiches, salads, and halal meats. Their produce department is stocked full with fresh, local and hard-to-find international fruits and vegetables that you cannot find anywhere else. Don't forget to step into the massive beer cave and experience the largest

49:04

selection of domestic, artisan and imported beers in the IE. They can also cater your next event with one of the delicious takeout catering trays of food. Visit them at Redlands Ranch Market dot com. That's Redlands Ranch Market dot com. Redlands Ranch Market, a unique and fun shopping destination. Taheebo Tea Club's original pure pau d'arco super tea comes from the only tree in the world that fungus

49:29

does not grow on. As a result, it naturally has anti fungal, anti infection, anti viral, antibacterial, anti inflammation, and anti parasite properties. So the tea is great for healthy people because it helps build the immune system, and it can be truly miraculous for someone fighting a potentially life threatening disease due to an infection, diabetes, or cancer. The tea is also organic and naturally caffeine free. A one pound package of tea is forty nine

49:54

ninety five, which includes shipping. To order, please visit Taheebo Tea Club dot com. Taheebo is spelled T like Tom, A H E E, B like boy, O. Then continue with the word tea and then the word club. The complete website is Taheebo Tea Club dot com, or call us at eight one eight six one zero eight zero eight eight, Monday through Saturday, nine am to five pm California time. That's eight one eight six one zero eight zero eight eight, Taheebo Tea Club dot com. With sixty years of fascinating facts, this is

50:28

the Man from Yesterday, and back in time we go. At this time in nineteen seventy, Stevie Wonder discloses he's gonna marry Miss Syreeta Wright in just a few months. Here in nineteen seventy, Syreeta Wright is a secretary who works for Motown. Stevie Wonder says they have been composing songs together. And from this time in twenty twelve, goodbye to Click and Clack, as Car Talk hosts Tom and Ray Magliozzi are retiring after doing their Car Talk radio

51:07

program for some thirty five years. Look, she said, I want something that goes from zero to two hundred in four seconds or less, and my birthday's coming up. You could surprise me. For her birthday, he bought her a brand new bathroom scale. And from this time in nineteen fifty five, NBC launches Monitor. It's a radio program that lasts all weekend, and it's supposed to be a cousin of TV's Today Show and Tonight Show in that the

51:42

listener doesn't know what's coming up next. Listen again on the hour for NBC Monitor News, with more at Man from Yesterday dot com. Palm Springs Dispensary reminds everyone that during these challenging times they understand the importance of mental health. Offering daily deals on everything from top quality flowers to edibles and discounts on CBD products, Palm Springs Dispensary's goal is to help you win the battle within. As

52:10

one of the area's most beautiful wellness shops, located on Garnet Avenue, visit the Palm Springs Dispensary dot com. That's the Palm Springs Dispensary dot com. Hi, I'm Lanny Swerdlow, and I'm back on KCAA ten fifty AM and Express one oh six point five FM every Tuesday at eight pm. My show is

52:31

Beyond Common Sense. It's Lanny Sense, featuring me, Lanny Swerdlow, KCAA's resident gay, Jewish, liberal, pot-smoking, race-mixing, left-handed atheist, an evangelical fundamentalist Christian nationalist's worst nightmare, with subjects that no one else will touch in quite the same way. Every Tuesday at eight pm on Express one oh six point five FM, the legacy ten fifty AM, and live streaming on KCAA Radio dot com.

53:05

Bob Vila here with my home improvement tip of the day. Gas-powered fireplaces have been showing up in more and more homes in the past few years. They're clean, easy to use, and add a nice ambiance to the home. However, one drawback is the noises they sometimes make. If you hear a popping noise when the burner's on, it may signal small leaks around joints in the burner assembly. To test for leaks, first turn off the burner and, once the ceramic logs have cooled off, remove them from the firebox.

53:32

Next, mix a bit of liquid detergent with water and pour it into a spray bottle. Turn the now exposed burner assembly back on and look for any small bursts of flame that may be popping around the joints. If you don't see any, try spraying a little detergent mix on the various joints and fittings in the burner assembly. If you see bubbles, you've found the leak. If you can't find a leak, or the leak you find appears to be a hole in the assembly itself, you'll want to call in a pro.

53:58

Get more info at Bob Vila dot com and right here at home to meet Bob Deever. Would you like to safely leverage bank money to earn double-digit returns, income tax free, with guarantees and no downside market risk? How can you do this? This is Farence, host of the Your Personal Bank Show. One: you fund a high cash value policy one time to earn dividends and interest. Two: establish a bank line of credit using the cash in your policy as collateral.

54:28

When you earn more in dividends from your policy than the interest the bank charges, you keep the difference, and the difference has averaged two to five percent annually in your favor for the past forty-plus years. Three: the bank funds contributions in years two to twenty plus. Each year the bank adds funds, your rate of return increases. Your average rate of return can grow to strong double digits annually within a few years. Contact me at Your Personal

54:52

Bank dot com. Your Personal Bank dot com, or eight sixty six two six eight four four two two, eight sixty six two six eight four two two, for more info, or tune in to the Your Personal Bank Show. The Your Personal Bank Show airs Tuesdays at four pm right here on KCAA ten fifty AM and one oh six point five FM, the station that leaves no listener behind. KCAA Radio has openings for one-hour talk shows. If you want to host a radio show, now is the time. Make KCAA your flagship station. Our

55:22

rates are affordable and our services are second to none. We broadcast to a population of five million people plus. We stream and podcast on all major online audio and video systems. If you've been thinking about broadcasting a weekly radio program on real radio plus the internet, contact our CEO at two eight one five nine nine ninety eight hundred two eight one five nine nine ninety eight hundred. You could skype your show from your home to our Redlands, California studio,

55:50

where our live producers and engineers are ready to work with you personally. A radio program on KCAA is the perfect work-from-home avocation in these ut full times. Just type KCAA Radio dot com into your browser to learn more about hosting a show on the best station in the nation, or call our CEO for details: two eight one five eight hundred. An auto dealer says he's saving all the good cars for himself. It's the Onion Radio News. This

56:19

is Doyle Redland reporting. Owner Jim Gannetti of Gannetti Chevrolet admitted to reporters today that his slogan, the best cars for the best prices, is a lie. Instead of selling the best cars, Gannetti has secretly been keeping all the best cars for himself. He now says the guilt and storage costs have finally forced him to come clean. It was too much. Every time I heard my own

56:44

jingle, I just felt like a fraud. Gannetti added that he was afraid of becoming like his father, legendary auto dealer Carl Gannetti, who died alone in a house filled with two hundred eighty three sedans. Doyle Redland for The Onion Radio News, online at the Onion dot com. Del Walmsley here. The first thing you're going to have to learn is that until you stop expecting our politicians or anyone else to change your life, your life isn't going to

57:08

change. The only person who can change your life is you. But you need to know how. Listen to my show, The Del Walmsley Radio Show, where the hype ends and the help begins, right here on KCAA, now broadcasting on ten fifty AM and one oh six point five FM, the station that leaves no listener behind. NBC News Radio, I'm Chris Gragio. At least fifteen

57:31

storm-related deaths are being reported after severe weather slammed the nation's midsection. Authorities in Texas say seven people, including two children, were killed when a tornado roared through Cooke County, just north of Dallas. Other deaths were reported in Oklahoma, Arkansas, and Kentucky as violent storms caused widespread damage and knocked out

57:50

power to tens of thousands. The International Criminal Court is facing sharp criticism from both sides of the aisle for pursuing arrest warrants for top Israeli officials over the war in Gaza. Florida Democrat Jared Moskowitz, on Fox News Sunday today, said the move by the ICC was politically motivated. The ICC's irrelevant, they have no jurisdiction. We might as well call them the Harry Potter Ministry

58:13

of Magic. Congressman Moskowitz said there's no equivalence between Israel and Hamas. The chief prosecutor at the ICC is seeking arrest warrants for Israeli Prime Minister Benjamin Netanyahu and Hamas officials, as the conflict in Gaza has killed tens of thousands of civilians. The hush money trial of former President Donald Trump heads into its final

58:31

phase this week. Larry Kovsky now with a preview. Closing arguments are expected Tuesday in a Manhattan courtroom, and Senator Tim Scott predicts the former president will be found not guilty. Appearing on CNN's State of the Union, the South Carolina Republican said Americans know when they see a two-tiered justice system. We don't want to see the weaponizing of our justice system against our political opponents. We want to see fairness, no thumb on the scale. If that's

59:00

the case, he will be found innocent. Trump faces thirty-four felony counts in the case, in which he is accused of falsifying business records related to a payment to adult film actor Stormy Daniels. Larry Kovsky reporting. And the NBA's Western Conference finals continue tonight in Dallas, with the Mavericks currently up two games to none on the Minnesota Timberwolves. I'm Chris Karragio, NBC News Radio. NBC News on KCAA, Loma Linda, sponsored by Teamsters Local nineteen thirty two, Protecting the

59:32

Future of Working Families Teamsters nineteen thirty two dot org. You're listening to an encore presentation of this program KCAA, The Inland Talk Express and Return to Zulu out the Justice Watch Crew, Rose of nu Yez, Michael Bloud, Parsh, Doctor Kilbasher. Today, like each week, we'll be discussed

Transcript source: Provided by creator in RSS feed