KCAA: Inside Analysis with Eric Kavanagh (Sun, 30 Jul, 2023)

00:00

The information economy has arrived. The world is teeming with innovation as new business models reinvent every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis.com. And now here's your host, Eric Kavanagh.

Oh yes, ladies and gentlemen, welcome.

00:35

The future is here already; it's just not evenly distributed yet. That's our line from Future Proof. Your host here, Eric Kavanagh, on the only coast-to-coast radio show all about the information economy, and boy, am I excited for today, folks. Last week we had some great guests, including Naveen Rao, the CEO of MosaicML, which just got acquired by Databricks for one point three

00:56

billion dollars, in one of the fastest deals ever made in our industry. Quite frankly, really impressive stuff, and Databricks is just on a tear. They also rolled out their own vector database. We'll talk about those on the show today. They're doing amazing things with their data warehouse and with data orchestration, and even this whole thing about Iceberg and Delta Lake and Hudi and the different possibilities: they solved it by just looking at the Parquet files. So

01:21

look all that stuff up. It's a really impressive array of innovations coming out of Silicon Valley these days, and of course all around the world. But our guest today knows a lot about all those things and also knows a lot about LLMs. So I was going to joke: anyone remember data science? We used to talk about data science all the time. Then we started talking about data mesh and of course data products, and then it's large language models, and AI

01:42

has just taken the world by storm. Well, data science is still around, and what we're going to talk about today is really automating the hard part of data science to get to the value quickly, so cost to value, time to value, get those really into a nice happy place. That's going to make CEOs happy, it's going to make customers happy. We'll talk all about these large language models and ETL for LLMs. So with that, let

02:08

me bring in Brian Raymond from Unstructured.io. Brian, you folks are doing some really interesting things out there, and I'd like to help our audience kind of wrap their heads around the magnitude of what's happening. I think you had mentioned to me that there's been something like thirty billion dollars of investment in LLMs in the last six to eight months or so, which is a pretty staggering number. That shows you that the VC community sees where the rabbit is

02:32

running now and that's where they're going, right?

Yeah, it's an incredible shift, especially from where we were twelve months ago, with everyone expecting a recession, massive tech layoffs, contraction of innovation budgets in the commercial sector, to the run that we've seen since the beginning of the year on the heels of diffusion models first, last summer, and then large language models, which was really sparked by the introduction of ChatGPT last November, but now with dozens and dozens

02:59

of LLMs that are emerging every month.

Yeah. And so our audience, our core audience, who understand data warehousing and analytics and business intelligence, well, they all know ETL: extract, transform, load. That's been the mainstay of data warehousing for arguably almost forty years or so, since the earliest days when we figured out that you could not run analytics on enterprise systems like ERPs, for example. They're not designed to do that. They're designed to move stuff around and get

03:27

the job done. So we invented this whole data warehouse concept. With ETL, you move things, but you're doing a very special kind of ETL, of unstructured data. So we talk all the time about how eighty percent of the data in the enterprise is unstructured. It's in Word documents, emails, PDFs, PowerPoint presentations, etc. The structured data, well, that's the numbers of how much product

03:49

we've sold, etc., what the cost was. The context for the structured data comes from the unstructured data, but historically it's been the job of the analyst or the user to kind of piece that together themselves. But now with these large language models, and in particular with what Unstructured.io is doing, you can actually pull out this corpus of context, and you also kind of clean it and refine it and get it to the textual value that's going to be used.

04:18

Right? Tell us about that and how that works.

Yeah, I mean, we use the phrase ETL for LLMs. That's a metaphor; really it's probably something closer to connect, transform, stage. What that means in practice is that we're building enterprise-grade connectors, kind of like what Fivetran has done, and Airbyte, but that are built from the ground up for files containing natural language data and the major repositories where those live. So think Amazon S3, Azure Blob,

04:49

SharePoint: wherever these files containing natural language are generated and then saved down, we're able to grab those. On the transform stage, instead of mapping relational databases to one another, we're doing a couple of things. We're doing file transformation first, so for those PDFs, PPTXs, XLSs, whichever the file extension might be, extracting the natural language data out of it and rendering it into a common JSON format, typically, and then

05:20

on the staging side, getting it ready for the analytics downstream. And so that might be tokenization, vectorization, chunking, schema mapping to the particular JSON schema that's required given the task downstream. Either way, what it has in common with ETL is that this is data engineering, and this is data engineering in service of data science, of machine learning, and making this data available to models. And then we'll talk more about how it's being made available through training

05:51

or through vector databases. But that critical step of how do you connect to that knowledge where it's being stored, transform it into a common format, and then get it ready is a very slow and manual process still today in twenty twenty three.

Yeah, and you had mentioned something that I think is quite compelling: that when you're extracting, let's say, from a Word document or from a PowerPoint presentation, your technology has the capacity to identify headers versus subheads versus body

06:23

text, etc. And folks have probably seen this now. There are some great algorithms in the engines that we use, like on Google and YouTube and other places, where it'll automatically put tables of contents together, or you look at something like Otter.ai taking notes on your meetings, and it'll say who said what when, and even distill that for you and do some analysis dynamically for

06:45

you, which is incredibly powerful. I mean, you think about all the ways we used to take notes and just try to keep track of what we had heard and go back to your notes and explain. Well, now it's all recorded and it's all translated, and it's organized for you. And it sounds like that's kind of what you're doing with this unstructured data: you're able to pull out the value, the signal, which is what you want anyway, and then plug that signal into the model or stage it somewhere to be able

07:11

to leverage it for some analytical purpose, right?

That's exactly right, and in practice we think about it as both cleaning the data, so getting rid of unwanted artifacts from the OCR, when something uses optical character recognition or undergoes

07:27

that file transformation process. There's almost always just nasty artifacts that data engineers, and data scientists who are doing this data engineering, have to deal with: white space, Unicode characters, sentence fragments, these sorts of things. So you've got to clean it up. But then you need to curate your data. And this is something that folks have been thinking about for a long time when you have columns and rows, but with respect to natural language processing, suppose you

07:49

have an analyst research report. Everyone's kind of seen one of these, say on Ford Motor Company over the last quarter. That analyst research report might mention GM. It may have one section, and have another section talking about supply chain issues, but there's just, like, a meandering body paragraph that's actually about the security, and that's what you want to push through your machine learning model, your pipeline, or

08:11

make available to your large language model. And so in that example, how do you take thousands of these that all have different layouts and, you know, a dozen different file formats, and curate that natural language data so you can park it somewhere to make it available? Right now the answer is, and historically the answer has been, custom regular expressions, Python scripts for every single document layout. So it's a slow, painful process, and that's where we're...

Yeah,

08:37

that's really remarkable. And you know, when I think about topics like information life cycle management, right, like information comes in, it gets used, it gets stored. I mean, just even the basics of storage. Over the weekend, I'd been thinking about our conversation last week and thinking to myself, this is big in so many ways. I mean, just de-duping, for example, just the number of duplicate documents that people have

09:01

in large organizations. What's the average, like seven, they say, per document? I mean, depending upon which study you look at. So that's a whole lot of wasted space and wasted time and effort, etc. And it's also just chaos there. I mean, basically unstructured data is chaos right now in most organizations, and you are providing a mechanism to extract meaning from those environments and then feed it into these foundational models or large language models or whatever.
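As a rough illustration of the cleaning and de-duping ideas discussed above, here is a minimal Python sketch. It is not Unstructured's actual API: the function names (`clean_text`, `fingerprint`, `stage`), the file names, and the sample text are all hypothetical, and it assumes the text has already been extracted from the source files. It normalizes Unicode, strips invisible characters, collapses white space, and then uses a content hash to skip duplicates before emitting JSON records.

```python
import hashlib
import json
import re
import unicodedata
from typing import Dict, List


def clean_text(raw: str) -> str:
    """Normalize Unicode, drop invisible control/format characters, collapse white space."""
    text = unicodedata.normalize("NFKC", raw)
    # Keep white space (collapsed below); drop other characters whose Unicode
    # category starts with "C" (control, format, and similar), e.g. zero-width spaces.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", text).strip()


def fingerprint(text: str) -> str:
    """Hash of the cleaned, case-folded text, used to spot exact duplicates."""
    return hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()


def stage(documents: Dict[str, str]) -> List[dict]:
    """Clean each document and keep only the first copy of any duplicate."""
    seen = set()
    staged = []
    for source, raw in documents.items():
        text = clean_text(raw)
        digest = fingerprint(text)
        if digest in seen:
            continue  # same content already staged under another file name
        seen.add(digest)
        staged.append({"source": source, "sha256": digest, "text": text})
    return staged


# Hypothetical extracted text: the first two files differ only in artifacts.
docs = {
    "q2_report.pdf": "Ford  Motor\u00a0Company\n\nSupply chain\u200b issues persisted.",
    "q2_report_copy.docx": "Ford Motor Company Supply chain issues persisted. ",
    "memo.txt": "Plant retooling is on schedule.",
}
staged = stage(docs)
print(json.dumps(staged, indent=2))
```

A hash only catches copies that are identical after cleaning; near-duplicates with small edits would need a fuzzier comparison, which this sketch does not attempt.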

09:28

That is a huge deal. I mean, it's almost the kind of thing and I joked about this when I was writing an article about knowledge graphs that you know, so many organizations have had to choose do we actually make sense of all the stuff that we have, or do we just focus on the now and do what's needed at the moment. And of course they all do the latter and just sweep the rest under the SharePoint, as I joked about it, but now you'll be able to extract that value, and then you

09:52

could even do interesting things about the storage, right? Once you have it out, if you're not required by law to keep these things, you can drop all that, or at least put it in super cold storage somewhere where it's not costing a lot of money. Point being, this is a revolution of information management. What do you think?

I think we've seen kind of three stages unfold since last November, especially with respect to large language models.

10:18

The first was, hey, let's just use what's in memory, let's just ask ChatGPT questions and see how far we get. And that was incredible. And then around the December, January, February timeframe, folks started saying, hey, what if we connected these with external data, right, and made it available either by retraining models like BLOOM, which was one of the first open source

10:39

LLMs, or putting it in a vector database. And so they grabbed data, mostly off the Internet, that was already in a fairly clean format that you can get over an API, and made it available and built some really cool applications on top of that. And then as we progressed throughout the spring, folks were like,

10:56

you know what, if we're actually going to deploy these things, if we're going to, you know, the MIT study, the Goldman Sachs studies showing the productivity promise of these, you know, the promised productivity gains with these, we're going to have to take all of our private data and utilize it in conjunction with these models. Then they said, okay, it's not just what's in memory. It's not just the data

11:16

that's easy, that's already out there. It's the hard data, where we're continuing to produce tens of thousands of files every day. How do we do that? And that's the reality that the industry as a whole is confronting kind of head on right now as we take cool prototypes and start to transition them into production.

11:31

Right, that's such a big deal. And you had told me before the show too that, you know, really there are two paths that organizations can take, and one is to embed the data that you chose to use, or you put it into these vector databases, and then you've got a layer of abstraction. Basically, you've got the model and then it reaches out to the vector database as it needs to. That strikes me as the way to go. Frankly, it's sort of loosely coupled, if you will, as opposed to tightly coupled.
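To make the loosely coupled path a little more concrete, here is a minimal Python sketch of a model-side helper that reaches out to a vector store and folds the retrieved passages into a prompt. Everything in it is an assumption for illustration: `ToyVectorStore`, `embed`, and `build_prompt` are hypothetical names, the sample passages are made up, and the word-count "embedding" is only a stand-in for a learned embedding model and a real vector database.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a learned embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._rows: List[Tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self._rows.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> List[str]:
        query_vector = embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(query_vector, row[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    """Retrieve the k most similar passages and place them ahead of the question."""
    context = "\n".join(f"- {passage}" for passage in store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = ToyVectorStore()
store.add("Supply chain delays pushed truck deliveries into the third quarter.")
store.add("The cafeteria menu changes every Monday.")
store.add("Truck deliveries recovered once the supply chain stabilized.")

prompt = build_prompt("What happened to truck deliveries?", store, k=2)
print(prompt)
```

The model itself never changes here; swapping or refreshing the stored passages is enough to change what it can draw on, which is the loose coupling described above.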

11:58

Tell us about that and how your part sits to the left of either option.

Sure. So, as you mentioned, path number one is to either pre-train or fine-tune a model with your data. And so this is where MosaicML comes in. They're fantastic. We're customers of MosaicML; we've pre-trained our own model using them. And the objective there is, how do you take your data and then encode it into the memory of that model, so it knows it, it's seen it and is familiar with it, and can answer

12:28

questions about your data. There's two problems with that, or shortcomings, I should say, why that's not sufficient in and of itself. The first is recency, the latency of the data: it's frozen in time. And so what that requires is for you to periodically curate large batches of new data, label it, and then re-encode it in. So you're paying for labeling costs, you're paying for the compute costs and the time and energy to run those projects that

12:54

we've been in, call it, over the last ten years. And it's still required for a lot of the, you know, large-language-model-based projects that we're doing. However, a new kind of path has emerged, and it's on the back of a technique called retrieval augmented generation, RAG, and vector databases. And here what you do is you park a bunch of your data in a vector database with a vectorized embedding representation. So think of it like a

13:24

seventy digit string of numbers. It's a multidimensional representation of that text. And as you prompt a model, what it does is it looks at what it has in memory, but then also reaches over and grabs relevant data from the vector database, considers it all together, and produces a response. That solves the second problem, which is hallucinations. It's not a silver bullet, but what you can do is you can anchor it. You can make it

13:50

show its homework, and so you have chain of thought reasoning.

That's a great way to describe it. Yeah, I'm writing this down: you can anchor it. That's exactly right. So you're optimizing your chances of getting a real, governed, trusted answer out of this thing, as opposed to a hallucination. And you know, I'd mentioned to you that when I was covering the Databricks conference, the go-to-market guy, Adam something, from ChatGPT was

14:15

engaging with me on Twitter, which of course is now called X. Elon took a page out of Mark Zuckerberg's book to rename it. But he was pointing out that hallucinations are a feature. You know, with these large language models, they are predictive engines, and their job is to predict what text they think you want to hear based upon your prompts, right? So they're really fusing vectors of information together dynamically in prose format to create content, either marketing content or

14:41

business proposals or whatever. So it's not going to go away completely, but to your point, to anchor it like that is to give it tremendous weight and to basically gravitate it toward the truth, right?

Yeah, and for it to be able to show that somewhere. So say it produces, you know, a TPS memo, a page-long memo. Right, you should be able to drill down on this paragraph: you know, what documents underlie the inference that generated that? And that's achievable using

15:13

techniques like RAG and, you know, some of the iterations of that.

Yeah, I mean that's very, very powerful. And so you do wind up getting into a conversation with this engine, and you walk through it and you help it understand things, and it helps you understand things. It's a very bidirectional thing in that sense, right, because you're trying to get to the bottom of something and you're trying to generate. That's what they do. These are generative

15:37

models. Generative AI generates stuff. It creates new things, right?

It absolutely does, and so it is a feature, not a bug. But at the same time, if you're depending on this for business purposes, you don't want a five year old that's completely fabricating information. You want something more like Encyclopedia Britannica, where it's written, but you can drill down on sources inside. And there's a wide spectrum there in

16:03

between in what you can produce using these models.

Yeah, there is

16:07

an absolutely wide spectrum, and it's a learning process. But that's the other thing that gets me really excited about this: especially when you train it using your own data, you now have this very robust window pane into your organization's text and imagery and language and messaging, so you can look for these things, and then that's a whole discovery process, right? And to me, discovery has always been one of the most important parts of any sort of

16:37

analytical process because, you know, first of course, you gather data, you organize data. In the old world, you had to come up with your data model early, your schema, basically your star, your snowflake, or whatever you're going to do, Kimball versus Inmon, that kind of stuff. But you had to decide ahead of time what that plumbing was going to look like. Well, that's very fragile, and that's very difficult to use in a fast changing world.

17:07

And so what we have now is a much more robust way to understand what the data is telling us. Tell me what the model should look like, tell me what the processes should look like. Still vet it, still make your own decisions, but it's a much different and much more dynamic and flexible world. Folks, don't touch that dial. We'll be right back. You're listening to Inside Analysis.

20:48

Welcome back to Inside Analysis. Here's your host, Eric Kavanagh.

21:22

Oh yes, folks, take us to the future. Indeed, that's exactly what large language models are doing. They're taking us to the future right now. We're going there right now, folks. It's very exciting stuff. I've played around with these. They are very powerful and they're good at lots of different things. Like one of the things they're very good at is summarizing: you can give it a one hundred page document and tell it, give me a two page summary of this, and then dive into this, dive into that,

21:47

a very, very powerful research tool and content creation tool as well. And of course, underpinning all this is NLP, natural language processing, which has been around for a very long time, decades, quite frankly, and it really didn't get anywhere for a long time, and it was kind of stumbling. And then of course Siri came out, and Bixby and all these others, and we

22:07

got better at that stuff. But one of the things I've learned talking with Unstructured here is that the dirty secret of natural language processing these days is that many data science teams are still building one-off artisanal models, and it takes a tremendous amount of time. It takes a tremendous amount of effort, and by the time you're done, the world has probably changed a little bit. So that's what's really changing, and that affects the cost, that affects the viability,

22:33

that affects what people try. Because think about it: in large organizations, man, you better have a good feeling about the bet that you're going to put a million bucks on, that it's going to come to pass and generate value, and that's been a very difficult thing. So tell us a bit about that, Brian.

Sure thing. So I would say that with any NLP project industry wide, you're looking at about a twenty percent success rate,

22:59

historically, within the enterprise, about an eighty percent failure rate. And there's a number of challenges here. This isn't because the folks working on it didn't try hard enough or weren't smart enough. You're challenged in two different areas: one is on the NLP side itself, right at the point of inference, and one is on the preprocessing side. So starting with the NLP side itself, there was a huge leap forward with the introduction of transformer-based models over their predecessors

23:29

in terms of the power of the models. However, let me give you an example. Say you wanted to train a model to extract people, like a named entity recognition model. You can do that. Then if you wanted to have it identify, you know, where those folks work or where they live, or where they went to college, you need some relation extraction model for each one of those relations, person and college, or person and town, and maintain them.
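The shape of that kind of pipeline can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the regular expressions below are crude stand-ins for trained models, and the names (`find_people`, `works_at`, `studied_at`, `RELATION_MODELS`) and the sample sentence are hypothetical. The point it tries to show is structural: one entity step, then one separate component per relation, each of which has to be built and maintained.

```python
import re
from typing import Callable, Dict, List, Optional


def find_people(text: str) -> List[str]:
    """Stand-in for a named entity recognition model: matches 'First Last' patterns."""
    return re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text)


def works_at(text: str, person: str) -> Optional[str]:
    """Stand-in for a person-to-employer relation extraction model."""
    match = re.search(re.escape(person) + r" works at (\w+)", text)
    return match.group(1) if match else None


def studied_at(text: str, person: str) -> Optional[str]:
    """Stand-in for a person-to-college relation extraction model."""
    match = re.search(re.escape(person) + r"[^.]*? studied at (\w+)", text)
    return match.group(1) if match else None


# Every new relation means another entry here, i.e. another model to train and maintain.
RELATION_MODELS: Dict[str, Callable[[str, str], Optional[str]]] = {
    "works_at": works_at,
    "studied_at": studied_at,
}


def run_pipeline(text: str) -> List[dict]:
    """Run the entity step, then every relation step, to build structured rows."""
    rows = []
    for person in find_people(text):
        row = {"person": person}
        for relation, model in RELATION_MODELS.items():
            row[relation] = model(text, person)
        rows.append(row)
    return rows


sample = "Jane Doe works at Acme and studied at Rice. John Roe works at Initech."
rows = run_pipeline(sample)
print(rows)
```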

23:56

So you end up with these Byzantine model pipelines in order to produce the structured data, and they're labor intensive, and so in terms of economics, it had to drive a tremendous amount of value to make sense. However, if you introduced any complexity in terms of the data feeding into that pipeline, you were compounding the economic challenges, and this is the data engineering challenge that never received any VC money. It didn't receive any attention. This is where we're really focused.

24:26

Which was, all right, if you're grabbing Twitter's API and you're feeding that in, okay, you understand the schema, you can map that schema, you can clean it, curate it, and send it in. It's a

24:37

one-time investment and you're good to go. However, if you're wanting to do that on your data, taking your emails and your PowerPoints and your memos and your call transcripts or any other type of natural language data in whatever format, any time the document layout changed, any time the file extension changed,

24:56

you needed to go and rebuild the preprocessing pipeline, completely custom, every time. And so this meant that the cost to value and the time to value to get these up and running was extremely expensive, and it also meant if anything changed at all, you had this huge professional services bill that came your way in order to keep it running. Right, so LLMs changed that first problem, where you don't need these byzantine model pipelines, but you still face that

25:26

economic challenge on the preprocessing side, and that's why we're focused there.

That makes a lot of sense. And you know, these transformer models, one of the things they allow you to do is leverage this attention component, right, which is basically how much attention the computer spends examining this versus examining that. And it doesn't take too much thought to realize how important that is, right, because your brain does this: you're constantly modifying what gets more attention.
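For readers who want to see what "how much attention" means numerically, here is a minimal Python sketch of scaled dot-product attention for a single query, the standard formulation used in transformer models. The vectors are made-up toy values and the function names are hypothetical; real models learn these vectors and run this across many tokens and heads at once.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention(
    query: List[float], keys: List[List[float]], values: List[List[float]]
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention for one query over a list of key/value pairs."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is the weighted average of the value vectors.
    output = [sum(w * value[i] for w, value in zip(weights, values)) for i in range(dim)]
    return weights, output


# Toy example: the query resembles the first and third keys more than the second.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, output = attention(query, keys, values)
print("weights:", weights)
print("output:", output)
```

The weights are the "attention": items whose keys line up with the query get a larger share, and the share shifts whenever the query changes.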

25:53

Someone just barked outside, or a dog just barked. Oh, what's that? You can shift your focus very, very quickly. And these models, I mean, it used to be that you just said go and it had to go through and do the whole thing, no matter what. Now it's more like go, but I want you to find this, and it starts down one path and goes, I'm not finding this, and it switches. So it has the capacity to change its focus and attention. That's a really really

26:18

big deal. Can you talk about how that happens and what's actually happening under the covers?

Yeah. So where we were five years ago, up to about a year ago, is it would look to the left and look to the right about two hundred to two hundred and fifty tokens, typically like a five

26:33

twelve token window. A token is about two thirds of a word; you can think about it that way. And so it had blinders on. Think of a horse with blinders on: it's just looking at these kind of chunks of text, and it's looking left and looking right and hunting for

26:49

answers and looking and evaluating that multidimensional text space within there. With the introduction of large language models, what's been really exciting is that that five twelve token window went to one thousand, went to two thousand, and went to four

27:04

thousand, and recently has gone to one hundred thousand and beyond. And so what that means is that you can, as you mentioned earlier, give it one hundred pages of text and say, please summarize this, because it can look at one hundred pages from thirty thousand feet, read everything at once, and then

27:22

take it all in together. Wow. One of the challenges that's emerged over the last three months, and some really important research that's been done by folks at Stanford and elsewhere has been looking at okay, we have these large attention windows, but is the first quarter of a document or of that corpus considered

27:41

the same as the second quarter or the third quarter. And what it was finding was that there's this big donut hole right in the middle: the models were looking at the beginning, the first ten thousand or the last ten thousand tokens, and would ignore most of the middle. And so this is where that relationship with these vector databases comes in, which is like, okay, let's not

27:59

try and feed one hundred thousand tokens at a time. Let's give it a more reasonable amount, but then give it access to all of this data that's in a vector database. It's cheaper, so the inference cost is cheaper, and you're going to get better performance.

That's interesting too, and I'm guessing that, you know, the way the data is persisted in the vector databases, there is some element of metadata. In other words, this data is about the color

28:22

of cars, this data is about the speed of cars, this data is about engines, etc. I mean, I even think about Parquet files, right? On a Parquet file, that first bit of the Parquet file gives you like the min and the max and some basic information. There was a company, Infobright, that had a database which did that kind of thing, and I

28:41

think they unfortunately went away. But you know, the Parquet file format was designed to facilitate analytics, and you know Databricks solved that Iceberg, Hudi, Delta Lake thing by just going to the Parquet files. So I'm just guessing there that in the mechanism used to persist the data in the vector database, and then of course the LLMs accessing it, there is some component where it can go, oh, he wants to know more about colors of cars, I'll grab

29:07

it from here. Is that about right?

Well, it's right in a roundabout sort of way, Eric, and it's a back to the future story. Jerry over at LlamaIndex and Harrison Chase and others and

29:18

of Metal are doing some really cool work here. So, call it this past February and before, you were doing something called nearest neighbor search, so cosine similarity. So you have a vectorized representation of your prompt, hey, tell me how to make spaghetti, right, and it would vectorize that, and it would look for

29:38

similar vectorized embeddings in that vector database and bring it back together. Where the tech is going now, especially in the last quarter to two quarters, has been: hey, there's this whole world over here of data engineering, in these Parquet files, and all of this metadata that could be valuable. How do we do cosine similarity and all this super, you know, complicated math around nearest

30:03

neighbor search and then also leverage that metadata to produce even better inference? And so those worlds are just now coming together, you know, this spring and into the summer.

That's crazy. Yeah, it's really, it's really

30:19

fascinating too to see how quickly things are changing. So we had another really, really cool guest two, three months ago, who ran Snowflake for a while and is now doing something with RelationalAI, I think. And he was even telling me, he said, three months ago, I would have told you a very different thing than I'm going to tell you now, because that's how much advancement we've made in that short period of time. And that's what really is exciting,

30:45

and there are lots of factors to that, right? The open source movement is a big factor, a big contributing factor. Just the mindset of committing code to the public and then leveraging it and being able to share, so that we all collectively build these foundations and different vendors can focus on their part, speaks to tremendous success in the near future with these technologies. What do you think?
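
As a rough illustration of the retrieval approach Brian describes a few moments earlier (cosine-similarity nearest neighbor search, combined with metadata filtering), here is a minimal Python sketch. The toy three-dimensional vectors, the `source` metadata field, and the `search` helper are all hypothetical and made up for illustration; a real system would get its embeddings from a model and would usually keep them in a vector database rather than a Python list.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def search(query_vec, records, k=2, where=None):
    """Return the texts of the k records most similar to query_vec.

    `where` is an optional dict of metadata key/value pairs; only records
    whose metadata matches every pair are considered (hypothetical filter).
    """
    candidates = [
        r for r in records
        if where is None
        or all(r["meta"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(query_vec, r["vector"]),
        reverse=True,
    )
    return [r["text"] for r in ranked[:k]]


# Toy "vector database": hand-written vectors standing in for embeddings.
records = [
    {"text": "Boil pasta, then add tomato sauce.",
     "vector": [0.9, 0.1, 0.0], "meta": {"source": "recipes.parquet"}},
    {"text": "Spaghetti carbonara uses eggs and cheese.",
     "vector": [0.8, 0.2, 0.1], "meta": {"source": "recipes.parquet"}},
    {"text": "Red is a popular car color.",
     "vector": [0.0, 0.1, 0.9], "meta": {"source": "cars.parquet"}},
    {"text": "Spaghetti-themed car decal.",
     "vector": [0.7, 0.0, 0.7], "meta": {"source": "cars.parquet"}},
]

# Pretend this is the embedding of "tell me how to make spaghetti".
query = [1.0, 0.0, 0.0]

# Plain nearest neighbor search over everything.
print(search(query, records, k=2))

# Same similarity math, but restricted by metadata first.
print(search(query, records, k=1, where={"source": "cars.parquet"}))
```

The first call ranks purely by similarity; the second narrows the candidates by metadata before ranking, which is the combination of the two worlds discussed above.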

31:07

Well, it's funny you mentioned him. His most recent book, The Datapreneurs, I just bought for everyone in the company and gave a copy to everyone at our offsite last week. Nice. Folks, this is kind of a forty-year view on the world that we're at today, the importance of data for driving business outcomes, and that it's the architecture that's evolving. Yeah, and the architecture is evolving. I mean, that's kind of where I was going earlier

31:36

in the show about just having an information strategy. So my European partner Yves Mulkens and I are going to be working on this, and I've had this theory for a long time now, and we're going to try to crystallize this a bit. But every organization must have an information strategy group, and that can be focused on any number of different things. This would be one area of focus: understanding what are the use cases? What kind of a team do we need to put together to be able to leverage

32:00

these things? And that's the job of the Information Strategy Group, and you hash out the use cases. You know, when we had Naveen Rao on the show last week, he said, yes, I mean, marketing, content creation, business proposals, sales and marketing in general is a big area of focus for us. It's going to be for the next couple of years, he thinks. And it's great because it turns every writer into a ten X writer, just in the same way that these tools will turn a good engineer into a

32:27

ten X engineer. And who doesn't want a bunch of ten X writers and creators out there? It helps you by giving you eighty percent of what you need, and then if you get good at the prompt you can get to eighty five to ninety percent, and you've got to fine tune it on top. But still, if every marketer walked in every morning and had eighty percent of their traditional job done for them, that's a pretty good change in workflow

32:50

and in productivity increase, right? It's not only that; it's triggered a foot race, and it's a foot race in terms of product within this economy that we're all operating in. And so what you're seeing now is shareholder pressure and competitive pressure to adopt and integrate this technology as quickly as possible, because it's producing true quality improvements in the work that's done, but also in the efficiency with

33:19

which it's accomplished. And so I think one of the most surprising things over the last year has been that a lot of the, you know, thinking in the past decade was that low-skilled jobs were going to be the first impacted by AI, right? But what we're seeing in practice is that it's actually high-skilled jobs that are

33:36

impacted, and that's turned everything sideways. And so Goldman Sachs and others are extremely bullish on, you know, the effect that these models are going to have on growing the economy, but it's going to reshuffle it at the same time. Yeah, well, and we're also going to see some

33:53

organizational changes too. I was talking about this last week on the show: hierarchies in teams, and how you build teams, and who's responsible for what in certain teams. You know, I mean, I think back to my old days in the print press world, where you had this whole linear process of laying out the paper and getting all the articles in, and you're not going to be updating that document unless you're going to print another eighty thousand. And so it was a very different

34:22

world. And that's kind of like where we are in this world today. The Web comes along, and now I can publish something and I can change it instantly. There's a mistake, boom, changed, we got it out of there, right? Even if you think about databases, how it used to be overwriting records in databases. Now we're appending records. And I think

34:38

we're going to see some real changes in organizational structures that reflect that. Meaning, you know, people who were just copywriting, now they're doing more than copywriting. They're actually engaging more with the salespeople to learn from them, or

34:51

whatever the case may be. You're going to see the traditional flow of work changing. I'm seeing companies like Drift, for example, doing what used to take three or four different people. They're doing it all in real time by scoring someone who comes to your website, creating some copy to send them, and all this stuff. It's like, wow, that's how day-to-day life changes, right?

35:17

One of the most forward-leaning organizations I know, and folks probably won't believe me on this, is actually the Pentagon and the intelligence organizations. And you know, I have a background in intelligence, and we've been doing a lot of work with them from the very beginning. Analytic engines in the form of LLMs don't require as much retraining or fine-tuning. If you have your data available and provisioned, like what we're helping to do, what you can do is you can quickly

35:45

build apps on top of that scaffolding. They're actually calling that AI scaffolding, Craig Martell at the Chief Digital and AI Office, and they envision a future where you have these cheap, fast apps that are quickly built on top of the data and the analytic engines, so that you're not having to spend years building this. You can have internal teams that are doing this.

39:52

Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis. Automating the hard part of data science, and folks, that is what AI does. It automates a lot of the hard stuff, the difficult things, but also the tedious tasks, for example, gathering information, optimizing pipelines. I've just been writing about observability. We have our top twenty list

40:30

of observability vendors out there. That's all possible because Google had a great vision about container orchestration, about really the next generation of the foundation of enterprise software, is what we're talking about. But right there at the end of the last segment,

40:42

Brian, you mentioned this next generation of really fast and fast-developed apps, which is very exciting stuff, because now, and it's interesting, I'm seeing this as a trend in the marketplace, of AI-infused apps or analytics-infused apps, where you're not just hitting a MySQL database or, you know, a MariaDB or whatever. Instead you're hitting some of these models and thus can leverage

41:10

the power of analysis, the power of AI, very, very quickly. That's like a Cambrian explosion right now, right? One of the biggest stories of the last six months has been the fact that this is not just a data science

41:23

and machine learning engineer world anymore. You can build directly on top of these LLM architectures very easily, and we first started seeing it in March, with a huge demand for Markdown and other file types in the LangChain ecosystem, one of the key kind of orchestration tools in the LLM space, and since then it's unlocked this space for thousands and thousands of new users, entities that hadn't talked to each other. You know, you had the data scientists that

41:58

were working on everything under the hood, and they'd hand it off, and then you'd have some React developers build on top of it. And now you're having them build directly on these LLM engines, and it's really exciting to see what they're coming up with. That's very cool too. And we were musing in the break about my observation over the last five to seven years, as data science has really

42:15

taken root and gotten a lot of investment, a lot of activity. I told you that in my experience those teams never talk to the data management teams, which is like, what? Like, why would you have your data warehousing

42:28

people not talking to your data science teams? And the answer, the short answer is, well, it's different projects that they're working on. But from an organizational cohesion perspective, you want these people talking to each other and working on the same things, because you could be... That's the beauty of open source, right, is you don't have to always rebuild and reinvent wheels, because that's what companies

42:49

were doing, is reinventing wheels every day. And I think the excitement and the power of these models is really going to force new conversations, to get people to talk to each other and get a much more cohesive approach to what we're doing, which is going to save money, save time, and do my favorite thing, which is improve morale of your workforce. What do you think? Well, a lot of the, you know, AI/ML initiatives within large

43:15

organizations have been computer vision, looking at assembly lines. You're looking at crops, IoT data, right, and doing large temporal analysis on that, time series analysis on that, or kind of, you know, really in-depth NLP projects. And if you introduce new data to any of those, what you have to do is you have to go get new labeled data to retrain and rebuild a stack. You have

43:34

to monitor it. So what you had is, I don't want to call them stovepipes, but you almost had these kind of experiments in an incubator, right, or these capabilities that could just look at this little window of data. And now, with the models becoming as powerful as they are, you're able to bring to bear a lot more data and different types of data to this analytic foundation, meaning that us as people, right, within organizations,

44:00

we're talking to folks that we haven't talked to before. And so, to your point, exactly. And this gets very interesting too, because I think about the amount of heavy lifting that has to be done to get certain projects afloat and get the analysis that you want, get the insights that you need to make your decisions. And I think what we're going to see here is

44:20

a sparking out of all these different ideas from different sources. We're gonna be able to get signal from places we never thought we could get signal because people are going to be able to see it in the models and the results and

44:32

what they get. You know, seeing is believing. One of my favorite philosophers, Wittgenstein, he wrote this book, Tractatus Logico-Philosophicus, which was really a big joke almost on himself and on the industry, because you get to the end of the book and he said, yeah, everything I said, forget about it. In fact, he said, use it as a ladder to climb up into the clouds, and then kick the ladder away, and then you have reached enlightenment. Now you understand things.

44:58

And one of his big points was, some things cannot be said, they must be shown, right? And it's like, wow, that's pretty interesting. And that's what is happening when people use these models and start seeing what they can see into the details of things. My buddy Steve Lucas from Boomi was joking, they connected an LLM to some of their data and he said, show me all the different versions of order to cash that we have right now, and it

45:22

did, and he was just like, how did you do that? So this learning experience is going to be, I think, very inspiring. Yes, daunting; yes, a bit challenging or, you know, causing a little bit of fear. But I think on balance, people are going to be like, oh my goodness, this is really powerful. Let's get the ball rolling. What do you think? Oh, ten years ago we were talking a bit about big data. What we didn't have to go with big data was big analytics.

45:46

And you had the data, but you didn't have a means for actually harnessing, leveraging that data to answer the questions that you really wanted answers to, or that you're just curious on, that you wanted to noodle on and pursue your curiosity, do the things that make you human, right, that are uniquely human. And now that we have big analytics, like with LLMs, you're

46:07

able to do that. And to see what folks are able to uncover is incredibly exciting, and it's unlocking entirely new job descriptions, like prompt engineers who don't need to have any coding ability and are getting paid three hundred thousand dollars at some places right out of the gate, right? Well, and you know, just think about it, because I tell you, when I first started playing with these things, I'm like, hmm, if it can write in French and

46:28

German and English and different languages, I bet it can write in COBOL and JavaScript, and the short answer is yes, yes it can. And now, I think, isn't Databricks coming out with their encoder or something, or their decoder kind of thing? They're calling it something, where basically you can pump code

46:44

into it and say, what does this code do? Right? Which was the big miss, the big issue with COBOL developers, because a lot of these old systems were written in COBOL, and if you don't have someone who knows COBOL, hitherto you were just at the end of the line, folks, and you'd have

47:00

to just work around it. But now you can actually go in. And this is what really excites me: somewhere down the road, being able to examine entire information architectures and understand the flow of data and systems, and what's doing what in which case, and optimize that, because that's great for sustainability, for efficiency, for data quality, for process quality, all these things. We're going to get

47:25

really good at not wasting time. Final thoughts from you? I think that what we're finding is that language really is the native... What you just described is

47:37

code to text. We have image to text, we have speech to text, and we have text to text, right? And I think the trick is going to be, how do we actually make this... like, the last ten percent is going to be the hardest, and so how do you make this production grade, where you can trust it enough to hand over the work that you don't want to do, so that you can work on the things that really are most rewarding for

47:59

you and that are the most impactful to the organization. We're at that moment right now where enterprises are just beginning to engage with this, and that's on us as an industry to help solve that problem. That's interesting. Yeah, and I think that, you know, companies do need to be practical and understand what these tools are supposed to be used for. Hammers are used to

48:20

hammer nails, screwdrivers are used to screw things in. You have different tools that have different purposes, and it's going to be important for people to really understand what LLMs should do right now and what they should not do. Like, there's the fun case of the guy who asked it to give him case law, and it just made it up, and it was all hallucinations. Well, don't do that. I think that is not a proper use of the technology.

48:45

It will get you in trouble. But to your point, you've got to get someone to look at this stuff, folks, I promise. You know, marketing people, your curious people, right, the people who are naturally curious and want to know things, that's who you want playing around with this stuff. That's who you want playing around with these engines, to figure out the use cases. And then get your use cases and just, you know, be

49:07

methodical about it, but don't not go down the road, right? This is the kind of thing where everyone is going to be leveraging these technologies in some way, shape or form, and they're advancing so quickly. I'll give you one minute left. That's one of the other really interesting things, is how fast they're evolving and improving. So the problem that was a big issue last month may not be an issue anymore, right? Yeah, that's exactly

49:30

right. And so what we've seen is, you know, folks were really worried over the winter months about this becoming an oligopoly, with just a couple of companies owning everything, right? Well, you've seen, you know, the teams and the capabilities that Naveen and others have built at Mosaic, and the good work on the architecture side, have pulled the cost of building these down dramatically, to several hundred thousand dollars or several million dollars, and so from a capital investment standpoint,

49:52

it's more tractable. From an inference side, that's an area where it's still really expensive and that we need to work on. But it's far more accessible than it was twelve months ago, and infinitely more accessible than it was when GPT-3 was first introduced in June of twenty twenty. Yeah, that's a pretty fast innovation curve. I would call that logarithmic. It's just going like that. Well, folks, look these guys up online:

50:16

Brian Raymond from unstructured.io. Just like it sounds, unstructured.io, ETL for LLMs. You're gonna be hearing a lot more from these folks. You're gonna be hearing a lot more about LLMs, folks. Send an email to info@insideanalysis.com. I want to know what you want to know. We'll talk

50:31

to you next time. You've been listening to Inside Analysis.

Transcript source: Provided by creator in RSS feed