KCAA: Inside Analysis with Eric Kavanagh (Sun, 31 Mar, 2024)

00:00

KCAA Loma Linda, 1050 AM, the station that leaves no listener behind. The information economy has arrived. The world is teeming with innovation as new business models reinvent every industry.

00:38

Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis.com. And now here's your host, Eric Kavanagh. All right, ladies and gentlemen, hello and welcome back once again to the only

01:00

coast-to-coast show all about the information economy. It's called Inside Analysis. Yours truly, Eric Kavanagh, here with one of my best friends in the business, a guy who's been around for a couple of years now and has made his way into all sorts of different data scenarios, data products, data concepts, data models, data designs: Mr. Kent Graziano. I asked him before

01:22

we started, what does that mean? Does that mean thank you very much in Italian? Because of course there's grazie mille. But it means grace, or beloved, or dear. So, Kent "Dear" Graziano, how's it going out there, buddy? Good. How are you doing, Eric? I'm doing good. So yeah, folks, I had this idea to do a series called Profiles in Data, because it's all about the people when you get right down to it. I mean, of course the data is important, and the tools and the technologies and the

01:51

methods and how to do all that stuff. But at the end of the day, even in this era of AI, it's still about the people. And it's always going to be about the people. I mean, the jobs are going to change. I think we're going to go through a very disruptive period. It's already happening, from what I've seen. It's already taking place. There are lots of reasons for it, but the one thing you can count on is the wisdom of sages, and so, Kent, I'm going to

02:14

throw you in that category as a data sage. You've been around a long time, you've seen the ups and downs, and we're in a very different world right now. You know, I'm going to throw kind of a curveball at you to start. I feel like these large language models represent a very serious inflection point in the history of business, quite frankly, but

02:35

primarily of the data world. And I'll explain why. So in the very earliest days of DM Radio, I remember wrapping my head around ETL, all this ETL moving around and around and around, all these batch windows being hit, and I thought to myself, this is crazy. Like, you're probably moving data that doesn't get used. You're probably moving the same

02:54

data multiple times. You're probably sometimes overwriting good data with bad. I mean, there just must be so many things happening here on the data front. But business just demands things. The business demands data, it demands execution, it demands performance. Business people say, we've got to get our data in there. Okay, fine, let's move it in there. Oh, we didn't get the schema right, and then things changed, right? Like with Snowflake, and you

03:16

know a lot about Snowflake. Maybe we'll start there. One thing that I thought they really solved and really figured out was that schemas change because the world changes. So you want to be able to change the schema of your warehouse. Well, in the pre-Snowflake days, that was a huge problem.

03:34

I mean, you know, you'd be asking for lots of trouble, a lot of money, a lot of very unhappy engineers. Like, lots of bad things would happen. And Snowflake's like, no, let's make it easy to pull this thing down, change the schema, and then spin it back up again. Was that your take as well, that that was a very big change

03:51

in the industry? Yeah, I mean, Snowflake was, if you had to use the standard phrase, a game changer, certainly in the data warehouse world, because of a number of the features that had been architected into the platform, right? And what you're talking about there, which made a huge difference and still obviously does to a lot of people, was their zero-copy

04:15

clone. The ability to take an existing schema and with one command clone the entire thing and basically make a complete replica of it with no additional storage costs. That was the thing, because that was something I couldn't do in the data warehousing world if I wanted to have a dev instance that had the same

04:34

amount of data as my prod instance. I never, ever in my career was able to get that, because the cost was too much, and the time. You know, once you started getting into terabytes of data, to replicate that on storage with the systems at the time could take days, weeks sometimes, depending on the system. And you really couldn't get that, which meant you couldn't actually innovate with confidence that you weren't going to break something. Right, right. You couldn't test

05:09

it. It's like, oh, the query works great. Yeah, well, you've got one tenth of the data in your QA instance as you do in production. And then it goes to production and the users are screaming, going, I can't live with this, this is too slow. And now you're doing, you know, again, additional iterations and refactoring to figure out how can I tune it to make it better, how can I make it

05:28

faster? And you know, the users are already mad at you because you gave them, you told them you had what they wanted, and they're like, well, this is unusable, it's too slow. And the zero copy

05:39

cloning thing, that was like, wow, this is awesome. This means I can actually not shoot myself in the foot before I move something into production and have to do these cycles again. We could do real user acceptance testing without this exorbitant cost or, you know, waiting weeks for the infrastructure team to put more disk online and for the DBAs to, you know, restore something from backup, assuming the backup tape was actually good, right? You know, that

06:05

they could do all of that. So that really did change a lot for a lot of companies out there. And then, of course, the whole separation of compute from storage, which allowed you to scale the storage if you needed without scaling the compute, and scale the compute without scaling the storage, just made it so flexible. It changed so many things. Then you throw in there the VARIANT data type that allowed you to load JSON.
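The zero-copy clone idea Kent describes can be pictured with a toy model. This is a sketch under a stated assumption, namely that table data lives in immutable partitions and a table is just a list of pointers to them; the `Storage` and `Table` classes below are hypothetical names for illustration, not Snowflake's implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class Storage:
    """Hypothetical shared storage: immutable partitions keyed by id."""
    partitions: dict = field(default_factory=dict)  # id -> tuple of rows
    next_id: int = 0

    def write(self, rows):
        pid = self.next_id
        self.partitions[pid] = tuple(rows)  # partitions are never modified
        self.next_id += 1
        return pid


@dataclass
class Table:
    """Hypothetical table: an ordered list of partition ids, no row data."""
    storage: Storage
    partition_ids: list = field(default_factory=list)

    def insert(self, rows):
        # New data always goes into a new partition.
        self.partition_ids.append(self.storage.write(rows))

    def clone(self):
        # Copy only the pointer list; no row data is duplicated.
        return Table(self.storage, list(self.partition_ids))

    def rows(self):
        return [r for pid in self.partition_ids
                for r in self.storage.partitions[pid]]


storage = Storage()
prod = Table(storage)
prod.insert([("west", 100), ("east", 250)])

dev = prod.clone()                           # full-size dev copy, pointers only
partitions_after_clone = len(storage.partitions)

dev.insert([("north", 75)])                  # only the new rows cost storage
```

In this toy model the clone adds no partitions, and a later write to `dev` leaves `prod` untouched, which is the property that makes a production-sized test environment affordable.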

06:31

So you talk about schema changes. You know, we actually got real schema-on-read: load a JSON document into a table in a relational database and access it with SQL. And, you know, as your data pipeline is running and you've got new data coming in, if there's a little change in the schema, it didn't break anything. I mean, I'm sure you remember the days that you were talking about with ETL. It's like the source system

06:57

changed. They dropped one column, right, and didn't tell anybody, because they're thinking about their operational system, they're not talking about the data warehouse. And then your Informatica job blows up in the middle of the night, and now everybody's screaming the next morning because the data hasn't been refreshed. You told us it'd be refreshed overnight and it's not been refreshed. What happened? Eh, you changed

07:16

the source. They changed the source system and didn't tell us. Now we've got to go back and re-engineer the ETL and change the tables in the database, and, you know, so much of that went away in the last... well, see, Snowflake was founded in 2012, so the last ten years. I started with them in 2015, so it's been less than a decade, and, you know, they solved those really big problems that we had in

07:43

data warehousing for multiple decades. Well, I mean, you kind of hinted at the big downside of the old way, which is basically an opportunity cost around innovation. Because it was fragile, because it was difficult to work with, because changes were very hard, people didn't try to make changes. I mean, it's like a kid who keeps getting reprimanded for trying new things. They just

08:05

stop. Like, fine, I won't do anything, right? So the psychology of data use really took a whole new turn, in large part thanks to Snowflake. Now, of course, we have Databricks, we have a whole bunch of other options and ways to do things. But the point is that now there is more confidence to move forward, to try changes, to do things, because you're

08:26

not so worried about bringing the whole thing to a crashing halt. Right, right. You're not creating instant technical debt by doing something, right? And so you can take a more agile approach. You can be more responsive to the business and the business requirements when they want to change, without having to worry that, oh yeah, we're going to break it. And then the tools have evolved so much too, between the data catalogs and the data

08:54

engineering tools. You know, the thing that we always needed was really good impact analysis reports. But to get the impact analysis reports, you needed something tracking the metadata around data lineage. You had to know where that data came from and where it went. We pulled a data element in from one source to a system: how many places did it go? And if we changed that, how many things is it going to impact? Well, you've

09:18

got to have the lineage to do the impact analysis. And now we've got a lot more tools in what we call our modern data stack that allow us to do that so much easier, so much quicker. You're no longer relying on a couple of data engineers to know the code inside and out and go, oh yeah, that's right, I remember over in that routine we did this, and we did this based on the user requirements. But then they had us do this in a different routine, and yeah, if

09:45

we touch that, we're going to break three different things. You don't have to do that anymore. You can find it through the metadata. Yeah. Well, and you hinted at something else too. I threw out the large language models to start us off here, and you think about the impact of these engines now. I think there are numerous other shoes left to drop in this equation, just as one way of putting it, basically because they are very powerful.
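The lineage-based impact analysis Kent describes a moment earlier amounts to a walk over a dependency graph. A minimal sketch, assuming lineage metadata is available as a mapping from each object to the objects it feeds; the object names and the `impacted` helper are hypothetical, not taken from any particular catalog tool.

```python
from collections import deque

# Hypothetical lineage metadata: each key feeds the listed downstream objects.
LINEAGE = {
    "crm.customer.region": ["staging.customer", "staging.sales"],
    "staging.customer": ["warehouse.dim_customer"],
    "staging.sales": ["warehouse.fact_sales"],
    "warehouse.dim_customer": ["report.sales_by_region"],
    "warehouse.fact_sales": ["report.sales_by_region", "report.forecast"],
}


def impacted(source, lineage):
    """Return every downstream object reachable from source, breadth-first."""
    seen, order, queue = set(), [], deque([source])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:  # shared targets are reported once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Everything that would need review if the source column changed.
result = impacted("crm.customer.region", LINEAGE)
```

Here `result` lists the staging tables, warehouse tables, and reports that depend on the source column, which is the answer that used to live only in an engineer's memory of the code.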

10:09

But you know, Sam Altman had that very curious comment he made a couple of months ago, saying, oh, the era of large language models is over. Everyone was like, what? What are you talking about? I think, now, I don't know, this is just my personal theory, but we've now seen that they've had some challenges. And this is research that I've done, and people I've talked to have said, there are things that it did very well three months ago, four months ago, that it now has a

10:33

hard time doing. A good buddy of mine, who actually almost worked with OpenAI, said that he looked at their models three years ago or so and said, you're gonna have all kinds of problems because you're not respecting entropy. He said, you're gonna have short-term memory problems, long-term memory problems. Basically, he blew it up at them, and they just said, all right, well, get out of here. We're going to do it our way. We'll deal with that later. Yeah, exactly,

10:56

we'll figure that out later. And of course they rolled it out and it was huge. I mean, you know, the friends I have at Google, I can tell they are stressed out. The company is very stressed out right now, which is kind of hard to believe because it's Google. Like, I mean, goodness, you're one of the biggest companies in the world. You changed the world with the stuff that you did. And of course, what happened?

11:16

Microsoft goes and throws a whole bunch of money at OpenAI, then the OpenAI board curiously kicks him out, and then Satya Nadella hires him, and like, wow, that was one of the biggest tennis matches I've ever seen. Like, what is going on, back and forth? Point being, and where I'm kind of going with all this is, even though there are some challenges, they're so powerful. You can load code and ask the machine to tell you what the code means. You can load a whole bunch of documents and

11:48

ask it to summarize. So I think those are the two better use cases. But kind of where I'm going is I see these LLMs, the engines of them at least, and maybe it's SLMs, small language models, that become more prominent, really serving as a very powerful component in a workflow. So they're good at ETL. For example, I scraped a whole bunch of data off the CES website because I wanted to get a clean list of companies from that

12:13

site. So I just did a Command-A, Command-C, Command-V into a Google Doc, and then I pasted that into the chat in Gemini (it was Bard back then), and I said, please give me a clean list of just the company names in all this text, and there's all this other text inside. It was like, okay, and it just banged it out.

12:31

I'm like, all right, that's pretty impressive. You know, that it can take an instruction like that, ascertain I only want company names, know what those are in the context, and then bang out a whole list. And then I asked it to give me a Twitter handle for each one. It did that too, but half of them were bad. So it does make

12:52

stuff up. But nonetheless, I mean, think about it. If you'd seen this blob of text, you know what I'm talking about, just all kinds of other words in between, spaces and tabs and commas, and all sorts of crap. It had no problem just stripping all that stuff out. That's a very powerful function, which would have taken a good ETL developer I don't even know how long to sort through and figure out. It would be hard to do. Not for the LLMs. So what kind of impact do you

13:18

think they're going to have on the data management business writ large? Well, I think that what we're seeing is the next step in data democratization, right? We can now, with even what you described, you basically took the data engineer out of the loop, right? You were able to do something very quickly on your own. And I think that's part of the promise and the power of these things, to allow business analysts now to do things with data

13:48

without having to be coders, right? They don't even necessarily need to know SQL. In one of the recent demos I saw, I think it's called Cortex, one of the new Snowflake features, right, it's like you can ask a question, say, I want, you know, the average sales over the last three years for my top five regions, and it writes the SQL for you, right? And, you know, that's starting to come out. And I've seen a couple of demos like that recently from the data

14:22

analytics perspective, that now again, you don't have to know SQL. You don't even necessarily have to understand the schema of the database in order to get some basic analytics done using an LLM, using, you know, its prompt. And my son is in school now, and he mentioned something about one of the careers that they've talked about in the career education stuff that he's going through at college was a thing called prompt engineering, right, that

14:52

you could take classes in prompt engineering. It's like, well, what does that mean? It's, okay, how to ask the question. So I think we used to call that critical thinking, right, learning how to ask the question. But now you're going to learn how to ask the question of an AI, basically, right? And how you ask the question will have an effect on the outcome, because again, the AI, the LLM, can't read your mind and

15:20

it can only interpret so much. So being able to ask very clear, concise questions using language that the AI recognizes is going to be the skill, right? That's going to be the skill, rather than knowing how to write, you know, SELECT SUM, COUNT(*), GROUP BY, right, in order to get the answer. Which is so huge, because, and this is leading up to maybe segment two, I'm going to give you one of my big ideas and see what you think about it. But because now you've democratized access

15:56

to analytics, is what you've done. Basically, you had to know some SQL or have a good idea at least of how to use the technologies to be able to build the queries. Now you can just ask it a question. Now again, it does make things up, so you have to be careful. You have to vet things. But we've always had to vet things. I mean, there are times when the data engineering was done wrong. So it's not like humans didn't make mistakes, and now the AI is creating some

16:19

new function called a mistake. No, like, it makes stuff up, and it has some problems, but the fact that you can sit there and just have such an interactive discourse with the data, to me is incredibly game changing. And I think it's going to put even more excitement in the workplace to use the data because a lot of times people don't use the technologies because they're hard to use, because they're slow, because they don't trust the data

16:47

for example, and we do have the trust issue. But this stuff is not hard to use. I mean, it can be hard to get really good at your prompt engineering, but even that, just to ask a different question, it doesn't get tired, it doesn't get annoyed at you. I will admit I tried Grok a little bit. I thought Grok was kind of annoying. That's the one that's in Twitter these days. It's supposed to be sarcastic and goofy, and I'm like, is there a big market for an

17:10

engine that gives me sarcastic answers? I just don't think so. I think, you know, maybe Elon smoked a couple of doobies before that one came out, which is fine. I mean, the guy's done amazing things. He's an engineer himself, and you look at the accomplishments this guy's had.

17:25

It's just absolutely shocking. But you do have competition out there. And I guess what I'm driving at is, by democratizing access to systems of record with these LLMs, we are fundamentally changing the game about how people interact with data, how they're able to consume data and learn from data. I'm writing an abstract right now, in fact, for the show I'll do with you on Thursday,

17:47

all about this concept of data literacy. And I think that just by playing around with these tools connected to your data sources, you are fostering better data literacy, because you're learning things about the data and you don't have to take a class necessarily. You just have to dedicate the time to hop online, play around, click a few buttons, type in some questions and see the different answers, and you have to work with it to see how it operates and to kind

18:11

of know sort of the contours of what it does. But folks, don't touch that dial. We'll be right back in a moment. We're talking to a hero of data, Kent Graziano. We'll be right back. Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis with Kent Graziano, who has been a data avenger for years. Quite frankly, you've been out there on the front line. I remember I met you at the Data Vault conference a number of years ago.

18:48

That was fun. That was before the big COVID. And you know, COVID gave us a lot of time to think about stuff, for sure, and to sort of reevaluate what was going on. I'm sure there's a bit of a COVID surge in terms of innovation, because people had time to think about stuff and do cool new things. And with that in mind, I'm

19:03

going to throw my big idea over to the sage Graziano. I want to get your thoughts on this. So I keep thinking about how executives use data and how we've used data historically, and there are these long processes of building reports and doing all these different things. I think the LLMs kind of turn all that on its head. And if you do it right, I think what's going to happen is companies are going to get the idea

19:26

to get their own private model. Maybe it's a small language model for a particular industry like legal or healthcare or retail, because they have their own lexicon, and so it's good to kind of weed out the semantic issues that you'll find with a large language model. That's kind of the point of the small language models. So you get that, and then you train, you get your vector database. You basically take your curated, trusted data and you

19:51

start feeding it into the vector database as your embeddings. These are your anchors of truth. And what I think is going to happen, and this is sort of the Valhalla, if you will, is that you get enough of your

20:03

corporate data in your embeddings. You've got your anchors of truth now, and then you set up, whether it's Kafka topics or some kind of stream of changes coming through, like CDC, basically into the vector database. And what's going to happen is the executives are just going to sit there and have this amazing intern with tremendous amounts of knowledge at their fingertips. They'll be able to ask any number of questions, you know, how are we doing this month? What

20:30

can I do to change? Who's really excelling in my sales team? Like, all these kinds of questions that you can ask and just get information pouring in to them. And it's going to be very useful, because it's connected to your data warehouse, to your CRM system, to your Salesforce, to your clickstream analysis, whatever it is that you've got. That senior executive is tapped into it and can now ask any question he or she wants. How viable is

20:52

that, do you think? And is that the future? Yeah, I think you've hit on something there, because really this is, you know, the extension, the evolution, maybe it's the fruition, of all the ideas that we had around data warehousing and decision support systems and advanced analytics and business intelligence in general. Right, to have that at the fingertips of an executive, that's where the value comes in, right? You've got to be able

21:23

to use the data to make effective business decisions. And I think that what you're describing could be the differentiator that allows an organization to get their competitive advantage. And specifically when they're looking at their internal data right and training a model on their data, if they do that right, that's the thing that's going

21:48

to give them the competitive advantage. Because so many things have been commoditized, right, and we've got lots of data out there that's being shared via data marketplaces, and sure, you're probably going to want to pull some of that in. But it's the combination of that external data with your internal data, your proprietary data, that's the thing that's got to give you the competitive advantage.

22:07

And if you've got, like you said, this, basically, online, on-demand executive intern, data intern, that's putting all this together automatically, right, through a small language model, then when the executive has got a question they just, you know, like Alexa (well, they'll probably have to give it their own name) say, hey, I heard from my sales guy that

22:34

we're having a problem in the Western region. Give me a summary of what happened in sales in the last three days in the West region, and boom. And eventually it should be able to give us, you know, well, what's the implication of that? Oh, the trend. The trend is, if what's happening in sales in the West region continues for X number of more weeks, you're going to lose this much profit, right? Right. These are the kinds of questions that we built

23:07

data warehouses to answer. Yes, it's kind of where I'm going with this. Absolutely. I don't think that the data warehouse is going to go away, but to me, it has to get simpler, I think. And, you know, so you look at the marketplace. What happened?

23:22

Databricks went out and bought MosaicML for like 1.4 billion or something like that when this was just taking off, and Teradata and some of the others were still kind of laughing at the gen AI stuff, which I thought was a bit foolish, frankly. But those guys were laughing at the cloud originally too, if you remember. Yeah, well that's wrong, that's

23:41

pretty funny. It didn't work either, right, Well, I don't know, man, I just I keep thinking about the disruptive power of these engines. And again, yes, they get some stuff wrong. So do the old systems. The old systems got stuff wrong because their data was wrong or the model was wrong, and it takes even longer to find it and fix it in the old systems. Well, that's a different other thing, right,

24:03

that's the other thing. So it's like, here's what fascinates me, is trying to figure out how these models figure out what to choose when they give you their answer. Now, Kyndi, that got bought by Qlik, I took a demo of their engine. It's pretty interesting, because it connects to an LLM and basically you use it for, like, frequently asked questions for your manual for some new product that you bought, like the new iPhone or something, and you

24:30

load the whole thing in there. It will automatically give you frequently asked questions that it suggests and the answers, and you curate that and then when someone uses it, you'll get the answer you asked. But they'll give you a drop down that'll show you where it got those bits from, So it'll say the first sentence came from page three, paragraph two. The other one came

24:47

from here. I'm like, now that is compelling, because now you're getting, you know, annotation, right? That's the annotation that we were taught in our language classes, right, in our English classes, learning how to do research papers. It's like, don't put anything in there that you can't prove where it came from, right? And whether there's endnotes or footnotes, all those annotations are there, and that's, yeah, that's part of what we

25:11

need. That's, I guess, part of the QA on these things: can we see where this data came from? Even though it might be a black box that generated the result, it's still traceable to the source, and it can say, these are the references, this is where these numbers came from, this is where this concept came from, this is how

25:32

we're coming up with this particular recommendation. And I think that's critical. That that is critical, you know, to differentiate between a hallucination and a good answer that you actually want to run your business off of. Right. Well, and that's the thing, right, is that to run your business off these things, you're going to have to have some certitude about what they're telling you. So that is a concern, But I just I feel like there's

25:56

going to be downward pressure on pricing in data warehousing. But at the same time, you've got so much automation now, it's so much easier to do things than it was in the old days. I mean, I guess that one of the biggest hurdles to overcome here is just old-fashioned mindsets. What do

26:11

you think? Absolutely. Oh yeah, I mean, even when I started off with Snowflake, the most frequent question, or actually comment, I got when I was out, you know, being the evangelist and talking to people about the separation of compute from storage and zero-copy cloning and all the things that we talked about, the first thing was like, oh, that'd be awesome if it was true. And then they'd say, okay, well, how do we index our queries? It's like,

26:38

you don't. Well, then this thing can't work. It can't work. There's no way it can work, because you don't have indexes. It's like, no, this is a completely different architecture. And it was getting that mindset change.

26:52

Same thing when we went from Waterfall to agile, right, trying to think about how do we do a project in iterations, versus spending six months writing up the requirements document, getting everybody signed off, and then spending two years building it, or spending a year and a half doing an enterprise data model and then having to hand it to a DBA to convert it to a schema in a database, and three years later you've got a data warehouse that

27:15

has nothing in it because nobody cares anymore, right? We had to change the way people were thinking. I think what we're talking about here, it's the same thing. You know, we've talked over the last couple of years with companies about data literacy and data culture, and it really is the organizational culture and expectations that you've got to find a way to shift, right, and get out of the, well, the classic, and this is every

27:47

industry. It's not just IT, it is every industry: well, we've never done it that way before. Or the reverse: well, we've always done it this way, and I'm comfortable doing it this way, right, so I want to just keep going this way. Actually, early on at Snowflake, we worked with a large retailer in England, and they, after their evaluation, decided to switch from one of the existing big data warehouse companies to Snowflake,

28:18

and several DBAs literally resigned and went and found other jobs. And this is in Europe, where you know, people don't change jobs hardly at all. Because they didn't want to spend the last five years of their career learning Snowflake. They wanted to stay with what they were comfortable with, which was

28:34

the older system. And they found another company in London that needed a DBA that had their expertise, and they figured they were just they were going to go over there and sail off into the sunset, that that was going to be less stressful for them than to stay where they were and learn this really exciting new technology that they literally left the organization rather than change the way they were doing things and thinking about things differently. Yeah, that's not uncommon.

29:06

I was floored. I was like, they did what? Okay, I'm done, I'm out, no, thank you. That's wild. I mean, I feel like that's where we are right now because of these LLMs. I mean, so much is being shaken at the moment, and people still have to do their jobs. They still have to get stuff done. I mean, I've played around with these things and I'm really amazed at what they're able to capture. Even though they do hallucinate, even though they do make things up, it's

29:37

really impressive what they can find. And you just have to sit there and wonder, wow, all of this stuff is embedded in these models, and it's just deep in there. It's gonna take us years to find out what's in there, because there's so much stuff in there already, right? I mean, like, how do you even begin? Yeah, and like everything, you've got to start by putting one foot in front of the other.

30:00

But you've got to be willing to try. And I think that's the big lesson in all of this, and certainly, you know, even the last ten years, going from Hadoop to cloud data warehouses like Snowflake and then Databricks: you've got to be willing to take a look at these new technologies and think about it critically. It's like, how can I take advantage of this technology, rather than go, oh my god, I'm going to lose my job, right? They're

30:30

not going to need me anymore. It's like, well, yeah, they're not going to need you anymore if you don't learn how to do new things, if you don't learn how to use these new technologies. You know, figure out how to apply yourself to the new technology with the knowledge that you already have. You know, in the data world, we're talking about data management. You understand things like schemas and SQL and all that. Well, how can you apply that in the LLM

30:55

world and the SLM world? You know, how can you be of value to your organization helping them make these transitions to the newer technologies for the advantage

31:08

of the organization. Yeah, that's very interesting. I mean, you mentioned schema on read earlier, and now with these LLMs, it's almost like you can do that to a certain extent, right? Schema on read, but not in the way that any of those SQL people ever dreamed of. Right, right. So we now have an AI that can read the schema, right? So that somebody doesn't have to look at the ER

31:34

diagram and know how to write the joins. Now, the downside, or I guess the caveat in all of this, and you mentioned it in one of your earlier statements about having your data warehouses and all that, is that this presumes you've got a good, well-structured data architecture, that there's metadata at least that tells you how the data is related and what the data means, the semantics of the data. You talked about, you know, having a lexicon for, say, a law firm. You know, you

32:13

got that terminology there. Well, you know, with the underlying data, there's got to be a relationship between that legalese and the data in order to be able to ask a question and create a prompt that will get you

32:28

the answer you want. And so it presumes that your data structure is documented and that the data is high quality, right? Because the LLM is not going to discriminate between a bad row of data and a good row of data unless you've somehow programmed it to say, you know, ignore all the rows where the data is null. Right. So you've put a data quality check into the prompt itself. Right. I know I don't want to do analysis on things where the sales date is blank, right, because obviously I've

33:00

got a problem. But you would have to know enough about data quality and know enough about the data to ask a question that way. So otherwise

33:09

you got it somewhere. Somebody still has to do, you know, the hard part that we've always done, with building that data platform and ingesting the data and curating the data (you used that word, curating) to get the curated data that we're going to use to feed all these things. Right. And I think that's going to be the next big push: figuring out these pipelines to feed

33:36

the LLMs. You have to get your sort of foundation in place, but then it's going to be updating these things and then monitoring over time. But folks, don't touch that dial. We'll be right back. You're listening to Inside Analysis. Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis, talking to the one, the only Kent Graziano. He's going to be at Data Universe, April tenth and eleventh in New York City at the Javits Center. Don't miss

34:08

it, folks. It's going to be fun. Yours truly will be there, and I'll throw one of my other curveballs at you, Kent, just because you're such a good guy and I know you can hit good curveballs. What I'm going to talk about in one of my talks is the death of journalism, as I call it. And what I'm really referring to is the fact that there's no media company that can stand up to these large language models and the power of this AI in terms of personalization, in terms of covering everything

34:38

that could be interesting to someone. And I think what's going to happen here is that the smart media companies are going to figure out how to do what we mentioned. They're going to get their own model. They're going to train their model on their voice by loading all the past articles as embeddings. And then what's going to happen is journalists are going to be more like curators and editors, and you're going to be spinning out stories from facts. You have

35:05

to have your fact sets. This is really, really important: fact sets. And that's systems of record. So I think of sales tax systems, for example. A municipality has to collect sales tax. They're collecting all this data from all the different stores, and you can determine what products are sold. Most have barcodes, so you'd be able to do some analysis. Wouldn't that be

35:24

a fantastic service to subscribe to as a business person? Let's say I run a small, like, an Easy Mart or something like that, to be able to see who's buying what. And the example I give is Squishmallows. Like, one day you're just looking at your report: what are these Squishmallow things that everyone's making tons of money on? Look into this. Oh, there's some new toy that kids freaking love. So now kids have to have every Squishmallow under the sun. Beanie Babies, it is. It's

35:51

a new... It's exactly what it is. It's the new Beanie Baby, with lots of different sizes. My point is that it's not going to be so much individual reporters going out and spending all day writing something as it's going to be dynamically generated bits of information from systems of record, spoken in text that comes from a large language model and that can be altered. Because now it's like having a reporter that you can ask questions of all day long. No, wait a minute, tell me more about this. Tell me

36:19

more about that. No reporter can sit there and answer questions for everyone all day long. You just can't do it. So I think there's going to be a whole movement of data engineering around media, of connecting to these large language models. And you'll still have reporters and journalists who will sort of review this stuff and write some of their own original content in their voice. But to me, there's no stopping this train. It's like a high-speed train coming

36:43

at all of us. What do you think? Well, my one question there is, how do we collect the facts to go into generating these stories? I like the idea of being able to ask the questions. And you know, right now we've got all kinds of clickbait headlines, so you want to ask the question. You read the article and it doesn't quite tell you. It's like, well, what was the vaccination rate in New York State between March twenty twenty one and

37:14

April twenty twenty two, broken down by county? And you might want to get that detail because you live in a particular county in New York. You want to say, well, what really happened? The headline says vaccination rates dropped fifty percent over what it was pre COVID or something like that. And you want to get the details, but the article doesn't quite cover it, but you want to ask those questions. But again back to what's the source

37:37

of the data, though? Where's that information going to come from? You still have to have, I think, the reporters in the field.

37:46

Somebody has to go out and interview people somehow, and it might be via Zoom like this, right? You find, hey, here's a person who has information about something that just happened, and so we get them on a Zoom call and we talk to them, and then the Zoom AI transcribes that conversation, so nobody's actually having to type it all up, right? And then that could be a source, I guess, for the LLM.

38:15

But somewhere there's still got to be that interaction with the external world somehow, right? Yeah, and I think that's going to be these systems of record. So taxes is one, because every county collects taxes. There are other sources for government, like the Federal Register, for example, where actions are taken. So I think of your ERP system, basically, that's tracking the movement of goods and the sales and all this kind of stuff, Salesforce. That's all

38:42

I was mentioning. You want to tap into all these different systems and get a feed from them to be able to ask questions of your business. And so with the government, you could do the same thing. And I just think about how much there is to be gleaned from these environments of ours. And you think about even policy documents, right? A buddy of mine, Jim Harris, was saying he was talking to a friend who works for, I think, a Canadian government organization, and he was like, oh, I

39:06

need to fire this person. How can I do that? I don't know what the paperwork is going to look like. And he said, just ask an LLM. And he's like, okay. He asked, and it's like, okay, step one, you have to file a letter of grievance and say, hey, you're not performing well at your job, et cetera. Step two is to wait two weeks. Step three is do this, and all these things. Because if you've fed all this documentation into a large language model, it can give

39:28

you a good summary. And so, I mean, wow, think about the court system, how you have all these different motions you can file, and you have to found the motion on something, based on this, based on that case law, for example. I mean, there are these stories of an attorney who just used whatever the LLM gave him and got in trouble. I almost think that's law. Yeah, what it actually referenced didn't exist,

39:53

right, and that's a problem. Right. But with a small language model that's trained on all the case law, it's going to be a very different story. And now you can check things. I mean, you can even work that into the workflow to say, as your RAG model, check and make sure that these are actual cases at the... So again, that gets back to the workflow stuff. Well, I mean, we've paid lawyers hundreds, even thousands, of dollars an hour sometimes to know the nuances of these codes and these policies.
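The "check and make sure that these are actual cases" step mentioned above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not a description of any real legal tool: the function names are hypothetical, the trusted index is a tiny in-memory list standing in for an authoritative case-law source, and it assumes the model returns the case names it cited as a separate list.

```python
def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so lookups tolerate formatting noise."""
    return " ".join(name.lower().split())


def check_citations(cited_cases, known_cases):
    """Split cited case names into those found in the trusted index and those not.

    Both arguments are plain lists of case-name strings (an assumption made
    for this sketch; a real workflow would query an authoritative database).
    """
    index = {normalize(case) for case in known_cases}
    verified, unverified = [], []
    for case in cited_cases:
        if normalize(case) in index:
            verified.append(case)
        else:
            unverified.append(case)
    return verified, unverified


# Hypothetical trusted index and a hypothetical model draft.
known = ["Marbury v. Madison", "Brown v. Board of Education"]
draft = ["Brown v.  Board of Education", "Smith v. Imaginary Airlines"]  # second name is made up

verified, unverified = check_citations(draft, known)
print("verified:", verified)
print("needs human review:", unverified)
```

Anything that lands in the second list would go back to a person before a filing is made, which is the workflow point being made in the conversation.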

40:22

Well, you know, you don't have to do that anymore. To me, that's just a huge change. I think about just your manual, like, you know, your code of conduct or something. These big companies have, like, a big thing. Or even the laws that they're passing these days, a thousand pages. Oh, we can't... There was a famous politician who said, oh, what's the point of reading it if you don't have two days and two lawyers? Well, now you can. Now you can just feed

40:45

that sucker into an LLM and ask it all kinds of questions. What are the strangest things in here? What's the most expensive? What's the least expensive? Point being, that is a massive game changer for day-to-day workflow. And it's not just text generation, it's not just image generation. I mean, those are two categories of use case. But to me, the analytical side is the most interesting, because it does have all these different ways of looking

41:08

at things because of all the parameters, right? Yeah, no, I think you're not wrong on that. I'm still thinking about the journalism example, though, as to how that's going to work. Like, you can see the other end of it, but I still say you have a sourcing issue. You've still got to have the humans involved somehow to get

41:31

the basic information for it, though. I guess, you know, these days you could scrape everything off of Twitter, because, you know, we talked about citizen data scientists in the past.

41:45

Here, with democratizing data, we've got citizen journalists out there. Unfortunately, you don't necessarily know how accurate the reporting is, but I guess if you have something happening in the world and you've got, you know, twenty or thirty Twitter feeds that are reporting it, that are not just reposting what somebody else posted but are actually taking pictures themselves... And now we've got AIs that are able to start doing things like analyze pictures, right,

42:17

and say what's in that picture. Say these five posts all have pictures supposedly of this event, and it should be able to overlay them and go, yeah, those pictures, they're unique pictures, but they are of the same area, and, you know, you get kind of the landscape overlay of this line here, that, yep, that's that same building in the background, and be able to verify all of that, right? Well, that is a very, very interesting point, and so kind of to that

42:46

end, I thought of this a while ago. Cameras, all these digital cameras, they have their own metadata, right? Some of them have GPS, so they know where you are, they know what time of day you took the picture. And they have the metadata. I'm sure that's encoded in the file, like we talked about with Parquet files, right? It's got the metadata baked in there so you can vet and see, aha,

43:07

this one was doctored, that one wasn't. But to your point, you could aggregate on demand when there's a big event, like when there's a riot someplace, or when, you know, there's a smash and grab or something, to be able to dynamically pick up little bits and pieces of that. I could see an AI engine writing a story: there was a row at Bob's Bar last night, and, you know, twenty-seven people were involved. Like, you could get some juicy details, and Bob Jones was taken to jail, because you get his mugshot

43:39

or something. I mean, it is possible. But to your point, you have to have some source. So Twitter becomes a source, social media becomes a source. Official records like police reports and things of that nature become

43:52

sources, and those things have to be accessible. That's the other thing, the whole data sharing thing: whether it's a government source, like the economics data you were talking about, or the tax data you were talking about, it has to be accessible somewhere so that the LLM can get to the data, or you can build a data pipeline to pull it in. That's right. No, that's exactly correct. And I think there's gonna be a lot

44:17

of pressure on those systems when this stuff starts to take off. I mean, apparently that's why Elon instituted that rate limiting stuff for a period of time, because they realized that someone was sucking all their data out and using it to train other models, because here it was, a free source, and you could use it. So he was trying to stop that. It's just crazy, the things we're dealing with these days. But don't touch that dial. The podcast bonus

44:38

segment is up next. We'll be right back. All right, folks, time for the podcast bonus segment here on a fantastic Inside Analysis with our Profiles in Data sage, Kent Graziano. We've been talking all about the Data Universe conference coming up April tenth and eleventh, just a few weeks away now. He'll do a fireside chat with yours truly about privacy, which of course conjures up images of governance and security and data management and ethics and all

45:09

that fun stuff. And there's also the Data Vault conference coming up at the beginning of May. Is that right, Kent? Yeah, yeah, WWDVC. It's gonna be the tenth anniversary. Time flies when you're having fun. Oh yeah. In Stowe, Vermont. Beautiful, beautiful location. It is a little harder to get to than New York City, but it's worth it. It's worth it to get there.

45:36

Well, it's funny, speaking of getting there. When you and I met there, which was, I guess, the year before COVID, twenty nineteen maybe, we drove, and, you know, higher forces were watching out for us, because I'm trying to look at the map to understand what this is, like this straight thing, and we realize it's a ferry. Like, we'd better get that ferry. And we were there just in time to get the last one across the southern end of Lake

46:04

Champlain. Yeah, I guess so, because we would have had to drive all the way around. It would have been a really... You'll be glad to know there's a bridge there now. It's there? Okay. There is a bridge. It's a windy drive through the countryside there on the far eastern border of New York State, at the very southern end of Lake Champlain. But there's actually a bridge that goes over now, so you don't have to wait for a ferry. There's actually a real bridge that got

46:30

put in there. It must have been during COVID, because, you know, when I went after COVID and drove, I was like, oh, hey, this is great. I used Google Maps, and I was just following the map, going, it looks like there's a bridge on the map. I said, okay, I'll go that way. Sure enough, and it goes into southern Vermont. Well, and it's funny because, you know, in terms of following bots and just doing what the bot tells you.

46:54

I heard someone give a real good example and say you're already doing it when you use the maps feature on your phone, just trusting what it tells you to do, and like, oh, here we go. And I think that is going to be a very interesting other change in our industries, these

47:08

little AI agents doing little things, just scanning around looking for stuff. I mean, you and I are going to talk about privacy, and of course that again brings governance into the picture, and ethics, what you should do ethically. I think that there are all these conversations happening now because we need them

47:25

now, because guess what, this is a big deal. And you know, I'll tell you one of my other big visions here is these companies like Amazon and Google and Facebook and LinkedIn and others have done very well by using our data to train their algorithms, which of course they then use, they say, to give us better stuff. I've never yet bought the argument I'm going to get better advertising, like you know, I don't think I've seen

47:50

better advertising. Exactly. It's like, you know, really? That's the carrot at the end of the stick here, that I give up all my data and then you're gonna give me good things that people are trying to sell me? And I haven't seen that. But I will say, there is some cool stuff, you know, some cool gadgets these days. So I've come across some of those on Facebook and some

48:12

other places, so there is some benefit to it. But I want to be able to own our algorithm. Like, everyone talks about being able to own your data and get some money, like secondary income. I mean, I think that the numbers aren't quite there to really make that interesting. Like, how much will it be? Is it like how much Spotify pays small bands? I hear people joke, yeah, I got my check for a dollar

48:34

forty nine in the mail, and how exciting is that? There is this scale problem. But I do feel like all the power is still centralized with these big organizations, and I'd love to find some way to kind of pull that out and get some more power back in the individual's hands, or even in aggregate, you know? Like, I wrote about what I referred to as a consumer

48:53

facing data lake of information about transactions. So think: credit card companies sell their exhaust data to investment banks so they can optimize what they buy in the stock market, and for other reasons like that. I think when they sell our data to some third party like that, they should also provide an anonymized version for this consumer data lake, where you and I and anyone else could kind of log in and just see interesting data about what's happening in the world. How many

49:22

apples are being sold, how many cars are being sold. That's interesting information for business people, and this kind of gets back to the journalism thing, because, especially for business people, what I want more than anything is information. I don't really want attitude or spin or narrative or all that stuff. What I want is facts about what are people paying for this, what are people paying for that? How long does it take this company to do that

49:44

job, how long does it take that company to do this job. That's really useful information, and it's just facts. It's just basic facts that come from transactional systems. So that's my big vision for the future. But what do you think about that? Oh, yeah, I'm with you, because, you know, Dan Linstedt, who invented Data Vault, used to say, you know, the data vault is a historical repository of the facts

50:10

from your source system. It's not a single source of truth. It's a single source of facts, because, you know, truth becomes malleable to a certain extent with your interpretation, right? But to create the information, for it to be information and to be informative, it has to be based on facts. So you've got to set that foundation, right? And so, you know, we don't have to have a single source of facts like we used to. That was our goal with

50:39

data warehousing, the single source of truth. You know, the technologies... Well, not everything necessarily has to be in the data warehouse, provided you've got the data pipelines and you know the right business use cases for it. But you do have to have a source of fact in order to do this, because you don't want, you know, guesswork going on, unless,

51:02

of course, you're dealing with customer sentiment, which is different. But even then, getting customer sentiment off of, say, X or Facebook or LinkedIn from people posting about your product, those are facts. This customer said this, right? Right. And from that fact and a lot of other similar facts, you can start to evolve an understanding of customer sentiment about your product, right? Right. That's so cool. I love that, a single source

51:37

of facts. It makes a lot of sense. And I'll throw one last teaser out here. I came across a stat as I was preparing for my talk on the death of journalism, and one of the biggest challenges is bias. Quite frankly, there's bias. There's also something in my mind, like, there's misinformation, there's disinformation, and there's missed information, and to me, that's the biggest one. Go ahead. I'll give you another one: poisoned information.

52:02

Poisoned? Yeah, yeah, Dan alerted me to that one. There've been a couple of articles about that, about poisoned data. So think of it as reverse hacking. Instead of somebody hacking into your system and stealing your data, they hack into your system and inject bad data so that your algorithms come out with the wrong answer. Wow, that's crazy. Isn't that crazy? That is crazy. Well, it's been so much fun talking to you, Kent. You're such a rock star in the industry. I've always looked to you

52:32

for advice on what's happening. I look forward to seeing you in New York in a couple of weeks. Folks, look up Data Universe online, and we'll see you in a couple of weeks. You've been listening to Inside Analysis.

Transcript source: Provided by creator in RSS feed