KCAA: Inside Analysis with Eric Kavanagh (Sun, 19 May, 2024)

00:00

1932. Protecting the future of working families. Teamsters1932.org. The information economy has arrived. The world is teeming with innovation as new business models reinvent every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era.

00:25

Learn more at insideanalysis.com, insideanalysis.com. And now here's your host, Eric Kavanagh. Ladies and gentlemen, hello, and welcome back once again to the only coast-to-coast radio show in the USA that's all about the information economy. It's time for Inside Analysis. Yours truly, Eric Kavanagh, here, and folks, I am very excited to have, frankly, a legend in the industry with us today. We're going to be talking with Michael

00:54

Berthold. He is the founder and CEO of a company called KNIME, that's spelled K-N-I-M-E. And they are an open source analytics and data science platform, a visual platform for doing data science, which is good stuff because even though there are lots of people who can code very well, almost anyone can look at visuals and move boxes around on the screen. So that's what

01:15

they figured out. So with that, Michael, welcome to Inside Analysis. Tell us a bit about yourself and KNIME and what you folks are working on these days. Thanks, thanks for the invite, Eric, and for having us on the show. As you already said, KNIME is very much about visual programming, low code

01:30

for data science. And we started that many years ago as a platform, really as a workbench almost, at my group at the University of Konstanz, to be able to deploy our research results to the real world, to practitioners wanting to use that. And it's grown from there to become one of the only open source visual data flow platforms for doing anything you want to

01:55

do with data. And I'm always a little bit careful about calling it data science, because that often scares people. They say, I just want to do data wrangling, I don't care about the sciencey bits, and that kind of scares them off. That's what KNIME does as well. So a lot of the applications that we see in real life are just, in large part, getting data into the right shape from many different sources.

02:17

Yeah, and I will tell you I'm a huge fan of open source. In fact, we built a website a number of years ago, and we feed it with the technology that we built called Media Lens, and it's called Inside Open Source. And the reason we launched it was because I realized there is so much happening in the open source world. There's the Apache Foundation, there's the Linux Foundation. There are lots of other projects outside of those organizations

02:39

as well. But the origin of open source I found fascinating. So I first started researching this in 2005, when I was working for the Data Warehousing Institute as their web evangelist, and Katrina had just struck New Orleans. I had just moved out of New Orleans, so we watched it all happen on TV. Of course, terrifying. But I remember that the senators from

03:04

Louisiana asked for a quarter of a trillion dollars to rebuild southern Louisiana. And I happened to know, through a past life and past clients in the government space down there, that the politicians are good at making money disappear. And I thought to myself, this is a very bad situation, because if all that money just floods in, it's going to flood right out. Not where anyone

03:23

intended, and a lot of it is just going to disappear. So I got on this high horse, if you will, and started doing research, and it took me to open source, and I put forth this theory about open source government, about publishing all the government data such that citizens can see

03:38

where the money goes and understand. And I basically said, look, with the Sarbanes-Oxley Act in the States, which came out of the Enron debacle, corporations had to document their processes for how they come up with their numbers and had to be very transparent about that stuff. And I thought, well, if corporate America has to do it, why doesn't the government do it as well? And people thought I was crazy. But a couple of

04:00

things happened that were amazing. One guy paid attention. Out of forty thousand people I emailed, one guy paid attention, and he worked for the Heritage Foundation. He went and basically testified to Congress and said, we can have citizen auditors. This stuff really can happen. And he really

04:15

leveraged the imprimatur of TDWI, and sure enough they did it. The House passed the bill, the Senate passed the bill, co-sponsored by a guy named Barack Obama, who was a senator from Illinois, and then President George W. Bush, believe it or not, signed the Federal Funding Accountability and Transparency Act on September 26, 2006, and I almost fell off my chair. I was like, wow, they actually did it. But the reason I bring this up is to talk about the power of open source.

04:42

And I realized at the time the Apache web server had just surpassed the Microsoft Web Server as the number one web server, and I thought to myself, well, that's very interesting. And I was also studying service oriented architecture at the time, and I thought, well, if you have all this open source code and you have a service oriented architecture, you should be able to plug and play and sort of take stuff out and put stuff in and have

05:04

a very composable environment. And I thought, well, that's not going to be very good news for the SAPs and the Oracles of the world, because they like the monolith, they like control, and they have control of all that stuff. And it took longer than I thought, but about ten years later open source just blew up the market with Hadoop and with Kafka, and

05:26

of course you have KNIME. So tell me a bit about the open source foundation of KNIME and what drove your decision there and what it means for your customers. That's a very interesting story. I didn't know about that open data movement coming from Katrina back in the old days. So KNIME is a bit atypical in its open source model. I mean, fundamentally, there are really three different

05:49

ways to do that. You can have distributions like Linux, where you're essentially making money by packaging them up nicely and supporting them, but essentially there's not really new code that you're adding around it. Not quite true, I mean, they have little installers and that type of stuff, but that's fundamentally the

06:04

idea behind distributions. You can also have what people often refer to as open core, where you have something that you open source, but it's more of a teaser, it's more baitware, and if you really want to use this in production, you have to buy some commercial bits and pieces around it. And then you have, of course, also the MongoDBs, that are essentially databases, that type of stuff, also maybe even a Databricks with Spark.

06:26

They have really cool open source technology and they make it accessible to you in the cloud for a charge. KNIME is different in that we have one open source piece, that's the analytics platform, that allows anybody who wants to build these workflows and execute these workflows to pretty much do anything they want to do with data. And then we have a commercial complementary software to that, which we call

06:46

the KNIME Business Hub, which allows people to productionize that and collaborate. So when you have more than one person using the analytics platform in your organization and you want to deploy that as a web interface or REST service, or you want to collaborate and have compliance and governance kinds of features, that's when you

07:03

buy the hub from us. And the reason the open source analytics platform is open source is, I'm not too religious about it, but to me, in the data science field in particular, you can't exist with proprietary software. There's so much cool new stuff going on on a daily basis in research groups and other types of environments that you're

07:28

essentially standing on the shoulders of many, many giants. A lot of functionality inside KNIME is actually based on open source libraries, so it seems kind of unfair, almost, to put that under a proprietary umbrella. And also, to be fair, it enables us to make inroads into academic and teaching environments much more easily, because they can just use the open source platform for teaching. But we also have a lot of open source contributors that are contributing additional

07:56

functionality into the KNIME platform. So I'd like to see it as a win-win situation, also for our customers, because they essentially get a lot of maintained functionality from us. In addition, they have access to all of those community functions out there. Yeah, it's very cool. I mean, there are a lot of good things about open source. One of the things that I've heard over the years is that bad code goes away because all these

08:20

eyes can see. Now, the one shortcoming I've come across is that the open source project gets to MVP status, if you will, minimum viable product, and then doesn't typically go past that, just because it works. Now we just kind of move beyond that. But what are your thoughts about that, in particular about how you make sure that you have truly finished products and that you're able

08:43

to deliver robust platform analytics on an ongoing basis for all of your clients? How much effort does that take internally from your developers to stay on top of the platform and make sure everything's working? That's a very good point. I mean, I tend to joke that ninety nine percent of all PhD projects turn into open source projects and then they kind of die away and fade away and never really turn

09:05

into something useful in production. At KNIME, probably about half of our development team, we have like 80 developers at KNIME, are focused exclusively on the analytics platform and just making sure that core works and works in a professional environment. We have our own extensions, which are of course maintained by ourselves,

09:28

so we have the same quality assurance there. We have what we call trusted community extensions, where we're in close collaboration with the community contributing those extensions, so we can also make sure there's quality assurance there as well. And then there is, I call it, the long tail of extensions that are experimental, right? The nice thing is that everybody can use those and play with those and explore new technologies, and when we see increased usage of some of these experimental

09:54

extensions, we can move them into the trusted extensions as well. Interesting, that makes a lot of sense. So you're trying out things. You've got these extensions in trials, basically, and then once you see there's a lot of activity, then you grow some developer support behind it. To harden it is the term we typically use, right, to make sure that it's bulletproof, that it really does what you want it to do, etc. No, that

10:16

makes a lot of sense. And you do end to end. KNIME does everything from data ingestion, data pipelines, number crunching, model building, all that kind of fun stuff is in the KNIME analytics platform, is that right? Yes, that's true. So we have everything, starting from the ETL part, loading the data. We can access about four hundred different data sources. You

10:35

can access databases, strange file formats, etc. Of course we can also execute bits and pieces on different execution environments, like doing the ETL directly inside a database or in Snowflake or in Databricks, or in Hadoop in the old days. And then we go all the way to visualization and the analytics functionality, and a lot of that, as I said before, is of course based on ECharts for the visualization, a lot of Python libraries, C libraries, R libraries,

11:01

Java libraries. For some of the machine learning functionality, we have integrations with TensorFlow if you want to do that. We have integrations with the other deep learning libraries. We have integrations with XGBoost. Pretty much everything is in there. But the other piece is, so often when people talk about data science, they mainly mean this kind of from the data to the report, to the

11:22

endpoint or to the model. But the Business Hub then also covers the rest of this journey, right, deploying it to others, managing it, retraining models when needed, and monitoring their performance as well. Right, because at the end of the day, you want these algorithms to connect into your business, whether it's MOO or for manufacturing or supply chain or whatever it is. You want it to affect some outcome in the business, and so that

11:50

involves connecting to operational systems, right? It involves connecting to ERP systems or CRM systems or things of this nature. That's where the magic happens. And a lot of times that's the hard part, right? I mean, I've heard many stories about models that just don't get deployed because maybe the companies didn't

12:07

have the wherewithal or they didn't have the expertise to do it properly. But being able to plug the algorithms into operational systems and then monitor how those models perform and switch them out, right, because you've got your production model and you have your challenger models that are sitting by the wayside waiting to get pulled in, and being able to switch over to a new model when a model that's in production starts faltering, that's a critical piece. And that, I guess, is that done

12:35

in Business Hub with you folks? That's on the Business Hub side, exactly. So with Business Hub you can deploy models, which really are deployed KNIME workflows. You can deploy them as a REST service or as a web application, and then people consume it, and you can constantly monitor what's happening in production and then potentially replace, retrain, or just alert the data science team and say, hey, this is so out of whack with reality, we don't really know how to

13:00

fix that, do something about it. The nice thing is that you don't have to switch code in between, right? In the old days, it was always that somebody coded the model and trained it in some strange language, and then it was reprogrammed by IT into some production language. In our case, on the hub side, the workflow that was trained is also the one that runs in production. That's

13:20

interesting. So one of the other hurdles that people are running into is when they use Jupyter notebooks to write their model or to build their model to test with data, and then they want to go put that into production, and it's just this step-by-step, tedious process of copying over code and values and all these things, and that falters often. That, from what I understand,

13:41

is a really serious problem. But I guess you do not have that challenge, because you're not using the Jupyter notebooks typically, and people are just in the environment, in the analytics platform, building out their models after they pull in their data, et cetera. So you're already production ready when the process begins.

13:58

Is that about right? That's a very nice summary. Yes. So what you use on the creation side, when you actually train the model, is exactly that piece of the workflow that then gets moved into production and executed by exactly the same engine. So you also don't have a translation issue there.
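
The "no translation issue" point can be illustrated with a minimal sketch. This is plain Python, not KNIME code; the names `train`, `predict`, and `artifact` are hypothetical and chosen only for illustration. The idea shown is that the artifact produced at training time is the very thing loaded in production, and the same `predict` function (standing in for "the same engine") runs on both sides, so nothing is re-coded in another language.

```python
import json


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """The single 'engine' used in both training and production."""
    return model["slope"] * x + model["intercept"]


# Training side: fit the model and serialize it as the deployable artifact.
model = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
artifact = json.dumps(model)

# Production side: load the same artifact and run it with the same function.
deployed = json.loads(artifact)
print(predict(deployed, 10.0) == predict(model, 10.0))
```

Because the deployed model is a byte-for-byte copy of what was trained, any difference between training-time and production-time predictions would have to come from the data, not from a rewrite of the model.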

14:13

The other piece that people often lose in this going from training to production is all of the feature engineering that you did, all the feature transformations. You tend to lose them, so you can only take the model and move that into production, but you can't do the transformations. And in KNIME you can automatically grab the part of the workflow that has the transformations and the model, applying the model, take that, grab that automatically and deploy it to KNIME

14:39

Business Hub. Wow, that's pretty cool. And you also cover different industries, right? So you do insurance, healthcare, I would imagine financial services, all sorts of different industries, because it's more of a horizontal solution. Is that about right? Yes, that's absolutely true. We have customers and users in pretty much every

15:00

industry. Yeah. Well, and we're going to talk about large language models here in a minute in our next segment, but before we get there I'll just throw out one of my theories to you and see what you think about this. To me, this explosion of AI through foundational models, including large language models, is really a major call to action for organizations to get their

15:22

data house in order. And what I mean is that data governance. If you don't have data governance, if you don't even know what data governance is from an organizational perspective, you're going to have a hard time responsibly leveraging AI. Would you agree that companies really do need to take a very hard look at their end to end data management life cycles processes, understand governance, who gets access to what data? Even understanding a broad inventory of your data sources?

15:50

Would you agree that it is paramount to do that before pulling the trigger on some AI? I totally agree. And in a way, it's funny that we've been preaching this, also for data science processes, for a long time, this governance topic, and nobody really cared, and now people are really waking up. Isn't that fascinating? I mean, I've

16:11

been in this business a long time. I've been talking about data governance, analytics, AI, all this stuff for twenty years, right? And we talked about data governance twenty years ago and fifteen years ago and ten years ago, and basically nobody was doing it. I mean, it wasn't even easy to do, because you could either control access at the database level, which is hard for access controls, or at the application level, but there's

16:33

nothing in the middle, and really it's in the middle. And now with the cloud, that's one of the nice things about the cloud, is that it is this de facto marshaling area for functionality and data. And now we have the capacity to apply very fine-grained controls on things, on data sets, on types of data. For example, we can scan and find PII and then know, okay, flag this as sensitive. There are lots of things we can do these days that we just kind of couldn't do ten years ago.
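
As a rough illustration of the "scan and find PII, then flag it as sensitive" idea, here is a minimal sketch in plain Python. It is an assumption-laden toy, not any vendor's scanner: the pattern set `PII_PATTERNS`, the function `flag_sensitive_columns`, and the sample rows are all hypothetical, and real PII detection needs far more than two regular expressions.

```python
import re

# Hypothetical, deliberately small pattern set; real scanners cover many more kinds.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def flag_sensitive_columns(rows):
    """Return {column: sorted list of PII kinds found in that column}."""
    found = {}
    for row in rows:
        for column, value in row.items():
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    found.setdefault(column, set()).add(kind)
    return {column: sorted(kinds) for column, kinds in found.items()}


# Made-up sample records for illustration only.
rows = [
    {"name": "A. Example", "contact": "a.example@example.com", "note": "SSN 123-45-6789 on file"},
    {"name": "B. Sample", "contact": "n/a", "note": "no issues"},
]

flags = flag_sensitive_columns(rows)
print(flags)
```

Once columns are flagged like this, the flags could drive the fine-grained access controls mentioned above, for example by masking flagged columns for users without clearance.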

17:00

Real quick, one minute, what do you think, is that about right? We finally can do this stuff and so we are doing it. What do you think? I'd probably explain it slightly differently and say we could have done it probably before as well, at least some of those aspects, but people just didn't care enough because there was not enough harm in it. But now that everybody who does anything with GenAI is in danger of sending data

17:23

anywhere, people are really, really waking up and seeing the pain there. That's right. Well, it is, like I say, a call to arms, it's a call to action. And good organizations have got to do it, because you don't want to wind up in the crosshairs of an audit. You don't want to wind up with a breach. You don't want to wind up getting sued by someone because their information has now been leaked to the sensitive

17:45

resources out there. Well, folks, don't touch that dial. We're talking all about AI and analytics platforms, and next up we're going to dive into these large language models that are just taking the business world by absolute storm. It's really quite fascinating to watch. But don't touch that dial. We'll be right back. You're listening to Inside Analysis. Welcome back to Inside Analysis.

18:11

Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis talking to Michael Berthold. He is the CEO and founder of a company called KNIME. That's K-N-I-M-E. Look these folks up online, an open source analytics platform. It's wonderful stuff. It's like a giant candy store for analysts to go play and have fun. But I want to talk to you about these large language models, Michael, and in particular, first of all, the open source side of the equation. So Meta comes out

18:41

with Llama and Llama 2, open source. OpenAI used to be open. Now it's not. Now it's the ironically named OpenAI, because it's a black box. And with technology this powerful, I believe we need open source. I don't know that I would get behind a mandate that they must be open source, but there needs to be some transparency into how these things are working, just so that we can have our peaceful sleep at night

19:10

to know that there aren't bad actors involved somehow. I mean, certainly for regulated industries like financial services, if you bring it into some workflow for loan approval or something like that, then you have to be able to explain how you came up with your answer. But what are your thoughts in general about open source versus closed source with these large language models? I think there's a lot of value in it. The problem is that, in my opinion,

19:37

open sourcing large language models isn't just about open sourcing the code, but you also need to open source how it was actually trained. So in a sense you also need to at least give open access to the data that was used for training. Because even if I give you a model and half of it was trained on copyrighted material that it's going to spit out again when you use it, you wouldn't know if you didn't have access to the training data. So that's part of it: was it supposed to be

20:04

used? And then I think the other piece is that what some companies are open sourcing is only the code to use the model for predictions, to actually apply it later. You still don't know how it was trained. So that's the third

20:19

element that needs open sourcing. And then I believe one of the key proprietary ingredients that a lot of these companies now have is safeguarding code around it, so that some types of answers don't get produced, some types of inputs aren't being accepted, and open sourcing that as well would really, really reveal their secret sauce. And I think that's why the OpenAIs of the world are shying away from that one. Right. No, it does make sense.
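
The kind of "safeguarding code around it" described here can be sketched very roughly in plain Python. Everything below is hypothetical and for illustration only: `fake_model` stands in for a real LLM call, and the two term lists are made-up policies. It is not how any particular vendor implements its safeguards, which, as noted above, are generally not published.

```python
# Hypothetical policy lists; a real system would use far richer checks.
BLOCKED_INPUT_TERMS = ("password",)
BLOCKED_OUTPUT_TERMS = ("internal-only",)

REFUSAL = "This request is out of scope for this assistant."
WITHHELD = "The generated answer was withheld by an output filter."


def fake_model(prompt):
    """Stand-in for a real LLM call (assumption: returns a string)."""
    return f"Echo: {prompt}"


def guarded_answer(prompt, model=fake_model):
    """Wrap a model call with an input check and an output check."""
    # Input guard: some types of inputs are not accepted at all.
    if any(term in prompt.lower() for term in BLOCKED_INPUT_TERMS):
        return REFUSAL
    answer = model(prompt)
    # Output guard: some types of answers are never passed on.
    if any(term in answer.lower() for term in BLOCKED_OUTPUT_TERMS):
        return WITHHELD
    return answer


print(guarded_answer("Summarize Q1 sales"))
print(guarded_answer("What is the admin password?"))
```

The design point is that both checks live outside the model: the model itself is unchanged, and the policy can be tightened or loosened without retraining anything.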

20:47

I mean, we have proprietary code. It's not new, but again, these are very very powerful engines. And then there's another whole side of this equation, which is the RAG model retrieval augmented generation, which upon reflection, I believe will be the layer of functionality for governance or privacy to a certain

21:10

extent for security, for management. You know, a lot of that's going to get baked into the RAG model, where you could, for example, before your prompt goes up to the large language model, have a layer in between that checks and sees. Okay, and this is already happening. Like, I asked Gemini a couple of weeks ago how many electoral votes are in Georgia and Arizona and some other state. And it thought for a second, and it said, elections are complex and fast moving. We

21:37

recommend you use Google. It was a guard rail. That's a guardrail. They exactly built that in to say no, no, no, no, we don't want to touch that. Right, And that's in the RAG model. Right. That's not like trained in the model, that's outside the model. But it's the workflow you have around the engine that's very very important. Right. I totally agree with that. Then. I mean that's what I

21:59

called the safeguards before. And I think sometimes it's probably not even part of the context that's part of the RAG models, but it's really part of some safeguarding code around it. I mean, we use that at KNIME as well, so we have built in what we call K-AI inside the analytics platform

22:12

that allows you to have a Q&A mode. You ask questions: I do this and this in Excel, how does that look in a KNIME workflow? And then it shows you a couple of nodes in KNIME. And we're of course filtering that these nodes do actually exist, because every now and then OpenAI, which we use underneath the hood, hallucinates and invents nodes that KNIME probably should have but doesn't, and that doesn't help the poor user at all. But that's a very, very simple way of code around the AI that

22:41

is just making sure that what it spits out is reasonably useful. Mm-hm. Yeah, that's interesting, and you're going to see more and more of these AI agents. That's what everyone is talking about now, AI agents, which are like little bots, semi-autonomous bots that can do various things, and they can check each other and they can do all kinds of stuff. I mean, it's very interesting to me when we talk about data science. We talked about it before, it all seems to be getting subsumed now

23:07

into AI, in conversations about AI, even though there are lots of different versions of AI, right? I mean, there are traditional models, regression models, all sorts of old-fashioned AI, if you will, that's still very powerful and still works, but the new stuff is sucking all the oxygen out of the room. Isn't that about right? Yeah, we see that as well. And I mean, I'm an old guy, but I've seen this in the past, right? When back propagation came along, everybody was suddenly

23:33

using gradient descent for every problem, and we just thought, hey, you can solve this directly, you don't need to do gradient descent. Then there were support vector machines, and then it was something else, and then it was deep learning, and now it's AI. So sometimes we see people building workflows for even very simple things where they're reaching out to some AI, and we just say, hey, there's a node in KNIME that does that computationally a lot less expensively. Why don't you use that? So I think, to me, it's

23:56

a bit of a hype right now. It's just the new kid on the block. Everybody wants to play with it and use it. But really mixing and matching it with traditional techniques, I think that's where the true value lies. Yeah. Well, and so I'm just guessing here that one of the nice things about your platform is that it is an end-to-end platform for building models, designing models, training models, pulling the

24:21

data in all these things, and it's adjacent to this business hub. So you have a marshaling area for ideas and for testing algorithms and for testing models. Then you connect it through the business hub and see what happens and see how it operates. And it's important to have this one environment where that takes place because when you have multiple tools, it just takes longer and it's disjointed and there are connections between the tools and things change, so it's important to

24:49

have that main marshaling area. It's like a giant analytics sandbox. Is that about right? That's a very nice description. Absolutely. I tend to say that a data scientist doesn't necessarily need to know how the method does something, but needs to know what the method does. So whether it's reaching out to a Python library or an R library or a C library underneath, that is not that important, but you still need to understand what the method actually does

25:17

underneath to be able to interpret the results. A simple example: if you don't know what a regression coefficient is, you won't be able to interpret it, but you don't necessarily need to understand how it was derived from the data. Yeah, no, that's pretty interesting. Let me throw this concept at you and see what you think about it. I wrote up an article

25:33

just last week, I guess, about this. I was flying to a conference in Denver, just thinking about these large language models and analytics and AI and all this stuff that I've been covering for a long, long time, and I thought to

25:45

myself about this concept I call the executive cockpit. And the idea is that I think very forward looking organizations are going to deploy a small language model that is aligned with their business, like if it's manufacturing or healthcare or whatever, in their data center, so on prem, possibly in the cloud as well, but I have my thoughts wrapped around this on prem small language model.

26:08

Then you're going to train it on your ERP, on your Salesforce, on your CRM, on your customer support, for example your tickets, any of your core enterprise systems. You're going to train this model on your data, on your business data. Then what you'll do is set up Kafka topics coming from those systems into a vector database adjacent to this interface for the small language model, and that is where the executives will spend their day running

26:37

their business, because then you could ask any question at all. How is our marketing working in APAC? Who can we let go if we have to save some money? Where are we weak in our organization right now? Just all kinds of different questions, and you'll get all these answers. And I actually mentioned this to a CEO of this one company, because I was trying to get him to help me do sales enablement for them, because I have this big

26:59

audience that I've been marketing to for years. And one person turns out to be the next deputy chief data officer for the IRS. And I sent this email saying, hey, this is a lady I've known for a long, long time. This is what I mean by sales enablement. Do you guys have the IRS account? And he fired back, he said, I don't know, I don't know if we have those accounts. I thought to myself, well, you would know if you had the executive cockpit, because you would just

27:22

ask it, do we have the IRS account? Who is the account rep? What's the latest on this account? Because you're getting information from all these systems in your private environment. But what do you think about this concept? Is that doable? Is that pie in the sky, or what do you think about all that? It's an interesting idea. I've thought about something similar. I mean, at the end of the day, you're personalizing a large

27:44

language model around your own infrastructure in house data. I think the challenge there is that in order to get a really really good model like that one that's really useful, you need to train it on a lot more data than just

27:56

your own. So in a sense, you need to benefit from your competitor's data without actually seeing that, but kind of learning the general structure and the general insights, and then you customize it on your own, which in return kind of means that you should also be providing your data to other organizations. It's almost like that's kind of pre competitive training of these models so that they're

28:19

useful for everybody. I think when just training it on your setup, you need some bigger context than that. Or maybe you're a large company and you have enough context anyway. But for every small company, I don't think you have enough data to really get meaningful insights. That's very interesting. That's a good point, because I'm wondering to myself, and I'm gonna

28:41

throw this one at you too. So one of my AHA moments with these large language models is when I realized that when you train them on a corpus of data, they're not actually persisting the data verbatim. It's not like they're taking strings of text and storing it in a record somewhere, but rather, in the training process that data you use will adjust the weights and biases and

29:02

the parameters of the model. So in other words, it's like, huh, well, that's very interesting that it can train in that fashion and then reflect back to you such remarkably granular detail about things. And you know, what I've seen is that if there is a subject area that has been published about widely, like how computer processors work, or how an irrigation system works, anything that has a lot of content on the web that these engines were

29:32

trained on, it does very well. It knows all that stuff. It's when you get to the fringe, where there's not that much published. And I guess that's kind of your point about having enough data to train the models. If you don't have enough, you're not going to get the contours right, and it's going to be skewed in one direction or other. Is that about right? I think that's a very good summary. The contours are right. I mean, a colleague of mine once summarized it: essentially, it's a consensus

29:55

engine. It's getting the consensus around what computer programming is and stuff from the data, and it can repeat that. But if it's just one isolated outcome, it's probably not going to be able to recall that one. Interesting. Yeah. So Jürgen Schmidhuber, I think his name is, he's the guy who wrote the papers on the transformers, and he's based, I guess he's actually in Saudi Arabia these days, but I want to say he's German, of German

30:19

origin. And I was amazed when I realized he wrote those papers in, like, the nineteen nineties or something, and it's just now that we have the compute to be able to... Can you explain that? Is that what happened, that just the timing was right now to be able to understand this and put it into play? Because that was one of the big changes. And now it's able to see, like, you know, ten or twelve tokens left or right, as

30:41

opposed to just, like, two or three. And you also have this, like you say, consensus, right? I call it almost like an AI Greek chorus, where one is saying, I think it should be an A, another, I think it should be a B, another, I think it should be a C. And then the thing says, okay, I'm gonna pick this one. That's very interesting. It's a very interesting development. But why do you think it took so long? Is it just because we now have

31:03

the compute to do that? I think it was a compute power issue as well. And then some science tends to need a little bit of time before it truly has an impact, but it was mostly waiting for compute power, I don't know. One way of looking at what this consensus really does: I don't know if you've watched these YouTube videos about ChatGPT playing chess, and the interesting part is that at the beginning

31:26

it does extremely well and does very sensible things. And part of that is that these opening libraries are all over the place, so that's an extremely well-established consensus. And then somewhere in the middle it starts inventing bizarre moves, and suddenly new figures pop up on the board out of nowhere, right? And it always has meaningful explanations for that. And the problem there is that the data is so sparse that there's no consensus to learn. So at the beginning

31:52

it almost looks like it understands chess rules. But the only reason it does follow the chess rules is that they're so deeply ingrained in all of the common material that it sees that the likelihood of going outside of the rule book is too small. But somewhere in the middle of the game it goes completely off the books. That's wild. So one of my good friends in the business is a gentleman named Usama Fayyad. You may

32:19

have come across him at some point. He was the first chief data officer for Yahoo way back in the day, and now he runs the Institute for Experiential AI over at Northeastern University here in the States. And I had him on the show, and he's very funny, he's very candid. He said, these are large models. They're too big. They're not supposed to work. We don't know why they work. What are you talking about? This is the guy who runs this whole operation, and he's joking, we don't even know how they

32:44

work. I mean, how's that for transparency, right? I mean, there's some truth to it, right? We don't really know how they come up with these answers. Right, it's a wild mix. It's a highly distributed model. We don't know why a particular answer comes out. We can come up with proxies for an explanation by beginning with the inputs and trying to figure out what happens, and we can say, ah, this probably had a lot of influence on the decision, but we don't know for sure. That's

33:08

so wild. I mean, that's just such a big deal, you know. So now we have all this observability in the data space, right? You've got New Relic and Datadog; all these different companies are doing observability, which I think spun out of Kubernetes primarily. But it's very interesting, and we need that kind of observability in these large language models, I think. I think that's going to be one of the keys to success. But folks, don't touch that

33:34

dial. We'll be right back. We're talking to Michael Berthold from KNIME on Inside Analysis. Stand by. Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis with Michael Berthold, founder and CEO of KNIME, K-N-I-M-E. Look them up online. And Michael, I was mentioning to you in the break that I'm wondering to myself about this whole business intelligence industry, and there are hundreds of players these days, hundreds

34:10

of companies doing some form of analytics. Of course, KNIME is a whole analytics platform, an open source analytics platform, end to end, but there are lots of point tools, whether it's visualization or number crunching, OLAP, ROLAP, all this kind of stuff, and I wonder, is all of that in the crosshairs of these foundational models? What do you think? That's a very

34:31

very interesting question, and we of course asked ourselves that as well. And I think for some of the tools that you mentioned, like generating visualizations, that type of stuff, I do think they are pretty replaceable by AI-type models, because at the end of the day, you're generating code that generates the visualization based on data, and you judge the output of that code by just looking at it and saying this is quite

34:57

right. So I think that type of stuff will go away. And we have actually built that into KNIME: what used to be an ECharts scripting editor now has an AI element, and you don't need to touch the code anymore. So those types of tools, I believe, will go away. The BI tools trying to really find surprising, interesting new insights in data, I think that type of stuff is a lot harder to replace, because fundamentally you're trying to find

35:22

something new. And like we discussed before the break, these GenAI models are consensus engines, right? So they kind of try to gravitate towards something they've seen more and more often before. Interesting. That's right. That's an excellent

35:37

point, really, that's exactly right. So it's good for understanding the well-trodden path, basically. That's what it's very good at doing, saying, okay, there's a highway, it goes that direction. But if I want to go wandering around the forest, it's not as good on the fringe, basically. So you will use it. But I mean, I read an article by some guy on LinkedIn talking about how he connected, I don't know, by ODBC or JDBC or something, his model with data sources, and he asked

36:07

it to query the data source and it did. It reached into the database, pulled the information out, and delivered it. And you're like, okay, that's pretty interesting. And then I think to myself about what could be happening here. In the data warehousing space, for example, we move so much data around. It's all the data that's from your core

36:25

systems that you've decided to put in, which is a tremendous amount. Very little of that data ever gets used. A lot of times it's the summaries or the aggregates or the roll-ups that are used for various purposes, but a lot of it just doesn't even get used at all. And I think that what these large language models are going to do is kind of turn inside out the entire model of how we viewed moving data and analyzing data and doing things

36:51

with data, because they don't really care once they're trained on a certain space. And again, if you train it on your data, or if in your vector database you have a lot of embeddings of your corporate data and you point your RAG model there, well, you can get answers to things very quickly that before would have required running reports and doing ETL

37:12

and doing all this stuff. And I think that in many use cases these models are going to short-circuit all that, and you're just not going to have to do as much of that stuff anymore. But what do you think about that? I think there's some truth to it, because fundamentally, what these models won't necessarily do is actually look at all of the data, but they're going to apply a lot of common standard practices to that.
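
The retrieval step behind the "embeddings in a vector database" idea discussed here can be sketched in a few lines. This is a minimal illustration only, assuming toy hand-written vectors in place of a real embedding model and a plain dictionary in place of a vector database; the document IDs, texts, and function names are hypothetical and do not come from KNIME or any specific product.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical "vector database": id -> (text, embedding).
# A real system would compute the embeddings with an embedding model.
DOCS = {
    "account_rep": ("Notes on who the account rep is for a customer.", [0.9, 0.1, 0.0]),
    "etl_schedule": ("Notes on when the nightly ETL job runs.", [0.1, 0.9, 0.1]),
    "hr_policy": ("Notes on an internal HR policy.", [0.0, 0.1, 0.9]),
}


def retrieve(query_vec, k=1):
    """Return the ids of the k stored documents most similar to the query vector."""
    ranked = sorted(
        DOCS.items(),
        key=lambda item: cosine(query_vec, item[1][1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


if __name__ == "__main__":
    # A query vector close to the "account_rep" embedding.
    print(retrieve([1.0, 0.0, 0.0]))
```

In a retrieval-augmented setup, the text of the top-ranked documents would then be passed to the language model along with the question, which is why no report or ETL run is needed at question time.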

37:35

And standard sounds a little bit too limiting. I think there's a huge wealth of standard practices that people do apply to the data, and that's part of this consensus engine, and so the AI models will try out a lot of those things a lot faster than you ever would. So, absolutely, and there's a good chance that some of these insights that will be generated are interesting to you. But then continuing the exploration and saying, I mean,

38:00

how do they always say? The eureka moment is usually preceded by, oops, that's strange. I think you'll have these moments, right? And AI doesn't do that. AI doesn't say, this is weird, I should dig in a little bit deeper, because that's outside of the consensus. So it will, kind of like you said, continue on the normal path. And that's where I believe the human intuition, curiosity, oops-detection capability is going to be

38:32

relevant for a long time. I like this oops detection. That's good stuff. Well, there was a gentleman I had on the show years ago who said something a lot like that. He basically said, AI doesn't have the ability to be like, hmm, that's kind of weird, what's going on with that? Right? Because it's just processing information and doing what it's been told to do, which is just reflect backwards

38:54

based upon a prompt and its training. It's a very simple thing. I mean, it's very complex in terms of how it got there, but nonetheless. You know, one thing that did annoy me, I will say, is in the early days, when that New York Times reporter was getting deep with ChatGPT and trying to, like, tease out of it whether it's

39:14

sentient or something. I'm like, dude, that is a misuse of the technology. That is not what you should be using this thing for, to try to, like, what, trick it into revealing that it's really alive? You know, what are you even talking about? And I think that's part of the

39:29

downside these days. And I'm a media person myself, but a lot of times the media will just sort of glom onto some narrative about something, and it's very hard for them to decouple from that and get down to brass tacks. And that's what we do on the show. In fact, I used to say at the beginning of every show, it's all about getting down to the brass tacks of what actually happens in the data world

39:50

and what you do with this stuff. And I think it is important that people keep in their minds the purpose of this technology. Why are you using it? Where is it appropriate to use it? Where is it not appropriate to use it? And that's just basic common sense, right? Yes, I totally agree. I mean, it goes pretty much in line with the European AI Act that they just passed. But I mean, if it's not mission critical, if it's not safety critical, you can trust a system that

40:14

is wrong in, I don't know, point one percent of all cases. If it's controlling nuclear power plants, it had better not be wrong in point one percent of all cases, right? That's right. You've got to watch out where it's... So where do you see a lot of use cases among your clients? I mean, obviously some of your clients are using large language models. Where are you seeing success stories in that space right now? So there's a lot of success

40:38

stories in other areas of the business, as you probably, undoubtedly know: checking legal contracts, doing marketing material, that type of stuff. There's a lot of value in applying GenAI there. In the data analytics space, honestly, we see a lot of interest. There's a lot of people that say, oh cool, I can build a customized chatbot using KNIME. Not really our core business. And then the real applications tend to be around

41:04

text processing, which is where GenAI is really strong. And then instead of using outdated, antique libraries for sentiment analysis or text segmentation, you're just handing it over to an AI model and saying, hey, segment this, or extract the key components, or create a summary. For that type of stuff, it's amazing. I see it also in image mining extensions. I think that's the next step, where we can use the image processing capabilities of GenAI. For a lot

41:34

of the number crunching, I mean, we've all seen these cases where it can't add two numbers. It doesn't really know what the prime numbers are. It misses the understanding of the concept of a number, right? So there, I think it's more a tool to help you build workflows, build data applications, but only as a helper, right? So that's actually an excellent point I wanted to get into. I believe that we're just scratching the surface of using

42:00

these models as a component in a workflow. So you mentioned, for example, summarization. That is hugely powerful. I mean, you know, especially for policies, for complex policies, for law, for example, for legal protocols and when to file motions, what motions you can file, what you have to do according to that. I mean, you used to have to pay lawyers a lot of money to tell you that stuff. Now you just get access to the rules, load them into a large language model and

42:30

just start asking questions. That is an incredibly powerful use case, because it used to take a lot of time to sort through the process of how to do something. Now you just ask it, how do I fire someone? First step, send them a letter saying they're not performing properly. Second step, you know, monitor their behavior. Third step, and so on. With all this stuff, it's like, there it is. Like, wow, talk about saving time. I

42:51

mean, it saves time. And here's my other big soapbox issue: it improves morale, because nobody wants to spend their time scratching their head reading through just awful documentation. Nobody likes doing that, nobody. So all that stuff is going to go away, right? What do you think? I totally agree. Yeah. Your examples center a lot around firing people, by the way. But I tend to say, when people ask me if GenAI is going to make data scientists' lives easier, I say no, I

43:19

don't think so. But it's going to make it nicer, because it's going to remove all of that boring stuff and we can now focus on the really interesting but more complex stuff. So it's going to make it more interesting, more complex. That's interesting. Yeah, wow. And I think you do want to document things, and you can have it document things for you too, right? You can just throw in a whole bunch of stuff and say, document this. Okay, exactly. I mean, we now have a component on the Hub that

43:44

takes a KNIME workflow and explains what the KNIME workflow does. And we do that by just shipping it off to GenAI. It's perfect for that. Wow, that's amazing. All right, folks, we'll have the podcast bonus segment coming up next. You're listening to Inside Analysis. All right, back here on Inside Analysis, talking to Michael Berthold, the founder and CEO of KNIME, K-N-I-M-E. And Michael, I know what it's like in a software company. There's

44:08

always a roadmap. You're always working on something, and we talked about a couple of key things. Governance: there's model governance, there's data governance, there's IT governance. What are you working on in the governance space? Thanks for asking that. Like you know, all the roadmaps are changing all the time, but as for what we're currently working on: we actually had this model governance topic on the roadmap. We've been working on that for

44:32

a couple of years now. So the idea is being able to monitor what models are doing and automatically retrain them. We talked about that in, I think, the first episode. But what we added now is the ability to also govern the AI usage of people that are creating KNIME workflows. So first of all, when somebody is creating KNIME workflows using the KNIME Analytics Platform and uses this built-in AI, we call it K-AI, for KNIME AI, we need to make sure that gets channeled to an IT-approved AI, right?
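
The idea of channeling AI calls only to IT-approved providers can be sketched as a simple allowlist check. This is a hedged illustration under stated assumptions, not how KNIME implements it: the provider names, team names, and the `choose_provider` function are all hypothetical.

```python
# Hypothetical set of AI providers that central IT has approved.
APPROVED_PROVIDERS = {"in_house_llm", "cloud_llm"}

# Hypothetical per-team policy: which approved providers each team may use.
TEAM_POLICY = {
    "marketing": {"in_house_llm", "cloud_llm"},
    "legal": {"in_house_llm"},
}


def choose_provider(team, requested):
    """Return the requested provider if policy allows it, else raise PermissionError."""
    # Teams without an entry get no providers at all (deny by default).
    allowed = TEAM_POLICY.get(team, set()) & APPROVED_PROVIDERS
    if requested not in allowed:
        raise PermissionError(f"{requested!r} is not approved for team {team!r}")
    return requested


if __name__ == "__main__":
    print(choose_provider("marketing", "cloud_llm"))
    try:
        choose_provider("legal", "cloud_llm")
    except PermissionError as err:
        print("blocked:", err)
```

Denying by default for unknown teams is a design choice of this sketch; a central policy could just as well fall back to a single low-risk in-house provider.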

45:02

Maybe that's just for expense purposes: you know, you don't want to have too much consumption in the cloud and want to do that in house. Or it's really a data privacy issue. But the more worrisome part for people is that, I mean, one of the strengths of the KNIME Analytics Platform, the workflow concept, is that everybody can use any technology they want, right? They can reach out to experimental libraries, they can reach out to R stuff,

45:23

to Python stuff, to whatever. But by now they can of course also connect to various different AI providers, and we need a way for central IT governance to be able to make sure that the KNIME workflow users inside the organization can only use approved AIs. So maybe marketing can use an AI in the cloud, but maybe legal shouldn't, or HR shouldn't, right?

45:51

And that's something we have built into the KNIME Hub now: we can limit the types of AIs users can reach out to from KNIME Analytics Platform, and they get to choose from one of the approved AIs that central IT set up and said, okay, here's an AI that's consumption-light, that's for the easy tasks, here's the one for

46:12

whatever, the tech team, here's the one that's for compliant data. And we also allow the setup, on the Hub side, of safeguarding workflows, so that before the data gets sent out to, say, a cloud AI provider, it gets screened for private information, or maybe the data automatically gets anonymized before it gets

46:30

sent out. Yeah, that's very important stuff. And are you also able to do some FinOps on that? In other words, see how much it's costing to leverage this AI engine versus that AI engine, and do some cost optimization. Is that something you can do? We can do that as part of a KNIME workflow, and you could build that in there as well. But we are currently offering our customers the ability to monitor consumption, so they have a bit of an eye on that,

46:58

but it's not that we automatically route to different AIs. But that's just an added functionality under the hood. Mm-hmm. Yeah, it's all in the workflows, basically. And that's where we're going in the last segment, where I think that we're just at the beginning of leveraging these technologies, because what they're really very good at is pattern recognition, right? Even just the vectorization, the embeddings, basically how it stores it as a point in an array, basically, and just understanding how it

47:23

can map those two things. It's not just words, not just text generation. I think we're going to get some really interesting things in terms of pattern recognition and then recommendations. I mean, I think that these little AI agents, these assistants, are going to be extremely helpful in all facets of business. You know, to be able to very quickly give you a customer profile when you're on the phone with someone, or to be able to give you summarization of

47:50

text on demand. I mean, really, I think the hardest challenge is going to be changing mindsets and changing day-to-day behaviors and workflows. What do you think? Final thoughts? I totally agree with that one. We see that inside KNIME as well. I mean, the developers were the first ones that really said, we want to use this, we want to use this. But then getting the rest of the organization to seriously think about it, like, what can HR do? What can legal really do with that? It's

48:12

a huge time saver and it's largely untapped. Totally with you. There are interesting times ahead. Yes, to say the least. Well, what a fantastic conversation. Folks, look them up online: Michael Berthold, B-E-R-T-H-O-L-D, from KNIME, K-N-I-M-E. We'll talk to you next time, folks. You've been listening to Inside Analysis.

Transcript source: Provided by creator in RSS feed