KCAA: Inside Analysis with Eric Kavanagh (Sun, 12 May, 2024)

May 13, 2024, 1 hr

Episode description

KCAA: Inside Analysis with Eric Kavanagh on Sun, 12 May, 2024

Transcript

Comedy The Fall Guy finished second with thirteen point seven million. Tennis drama Challengers finished third again this week with four point six million. Low-budget horror film Tarot finished fourth with three point four million, and Godzilla x Kong: The New Empire refuses to leave the top five with two point five million. I'm Chris Karagio, NBC News Radio. NBC News on KCAA Loma Linda, sponsored by Teamsters Local nineteen

thirty two, protecting the future of working families. Teamsters nineteen thirty two dot org. The information economy has arrived. The world is teeming with innovation as new business models reinvent every industry. Inside Analysis is your source of information and insight about how to make the most of this exciting new era. Learn more at insideanalysis dot com. And now here's your host,

Eric Kavanagh. Oh, ladies and gentlemen, hello, and welcome back once again to the only coast-to-coast radio show that's all about the information economy. It's time for Inside Analysis. Your host Eric Kavanagh here, and folks, I'm very pleased to have an industry visionary on the call today. We're going to be talking to Justin Borgman. He is the co-founder and CEO of a company called Starburst, and they are shaking up the industry of analytics.

They've come up with a way to allow you to query data pretty much wherever it is. Maybe it's in your relational database, maybe it's in open formats like Iceberg. We're going to be talking a lot about Iceberg. I'm going to be attending their virtual event shortly too. I think it's in about a week or two. And they've shaken up the market in significant ways. And Justin, welcome to the show. Thank you for having me, Eric. Super excited

to be here. Yeah, absolutely. And as we were just discussing before the show, I've been in the data warehousing space for a long time, almost twenty five years. It's hard to believe. And boy, have things changed. I mean, if you go back to, you know, the two thousand and one time frame, storage was expensive, processors were relatively slow, the pipes were fairly thin, so it was a different architecture. And these days, I mean, goodness gracious, we've had so many innovations in distributed systems, for

example, and then being able to leverage data where it lives. And I think that's one of the keys to success in the future, being able to analyze data wherever it's sitting. And that was one of your early concepts, right, when you rolled this out? Please talk about that. Yeah, exactly. So you've got me beat in the number-of-years department, but I've been in the space for fifteen years, and even fifteen years ago, you

know, it was very interesting. That was really the early days of Hadoop, kind of the first data lake, and a lot of the concepts that are so important today were actually kind of pioneered back then, you know, the idea of open formats like Parquet files and Avro and RCFile, and storing that in cheap storage and a distributed file system and so forth. And then my

first company was acquired by Teradata. I spent a few years there, and to me, one of the things that I realized was that despite Teradata being the industry leader, certainly at the time, not one of their customers

actually had all of their data sitting in a classical data warehouse. That single source of truth in one proprietary EDW just wasn't actually achievable, and that you would always have data that lived outside of that, and even where you could centralize, it actually probably made more sense for you to centralize in more of

a lake model. And around the same time as I was having these realizations, I discovered an open source project coming out of Facebook that was originally called Presto. It later became Trino, as it's known today. And I decided to partner with

the creators of this project while they were at Facebook. We started collaborating together between Teradata and Facebook to improve this project, make it really enterprise grade, and in twenty seventeen I left Teradata to form Starburst as the company behind this project, and the creators of Presto and Trino joined me from Facebook, and sort of, you know, from there we went. Yeah, that's fantastic.

You know, I'm the biggest fan of open source. And it was in fact back in two thousand and five, when I was working for the Data Warehousing Institute, that I did a bunch of research into it. And way back then, the Apache web server had just eclipsed the Microsoft web server as the number one web server, and I was like, oh, this is

very interesting stuff. And I remember doing some research and doing some writing around the nexus of open source, and back then, of course, it was service-oriented architecture, and I was thinking to myself, hmm, this is very interesting, because ideally, if you have an SOA and you are using open source technologies, you should be able to rip and replace different component parts and build a composable architecture. And that doesn't sound like good news for the big Oracles and

IBMs of the world. Of course, IBM, though, went all in on open source and really supported it, so they saw that vision. But wow, you look back in time now, and that was a very pivotal moment in this industry, right, when open source really took over. Now Apache is huge, with all these different projects. That's pretty cool stuff, right? Absolutely. I mean,

that was one of my key lessons from that period as well. I think when you have an open option and a proprietary option, the open option is probably going to win, and it's better for customers. I mean, at the end of the day, I think customers, you know, get smart about these things, and they want to build, you know, architecture that can stand the test of time, you know,

something that's truly future proof. And that's why it's so important to use open components, because they are, you know, generally modular and swappable, and you can adapt as things change. And at least from where we sit, that means, you know, how you store your data can change. You know, maybe it's Oracle and Teradata on prem today, and maybe it's, you know, storing data in Iceberg and S3 and, you know, MongoDB for

your operational database in the future, right? And being able to have analytics across all of that, regardless of where it lives, is really core to our mission and providing customers that optionality. Yeah, and that's so cool, because it's always in the context that you understand a particular business problem. And as I'm sure you know, in the old days, if you will, of data warehousing, the whole idea was to pull data from source systems,

put it into a relational model so that you can then analyze it. Because we learned that these ERP systems and other production systems were not designed for analytics. They were designed for doing stuff, for transactions. So it's a different architecture, it's a different workflow, and the idea was, well, gosh, how can we query these things? Well, you do that by bringing it into

a data warehouse. But by enabling analysis of data in many different systems and formats, you've obviated the big, huge need for all of this ETL: extract, transform, load; extract, transform, load. It's never going to go away. But I remember seventeen years ago, when we launched DM Radio, thinking to myself, this

has got to stop. I mean, there's so much time, effort, and money spent moving data around, and we now have evidence to suggest that, you know what, eighty percent of that data isn't even really needed for your queries. So you're moving data that you don't even need to move. Well, that's silly in and of itself, but then when you can analyze it where it lives, now you've enabled the kind of rich context that's going to help answer business questions, right? That's right. Yeah, that's exactly

right. Yeah. So, I mean, I think it's fascinating that you're able to do that. Can you kind of get into the weeds a bit about how? So obviously you have connectors, ODBC, JDBC, traditional ways of connecting to data, but then you can sort of infer the format of the data and then transform that, I guess in real time, to be able to do analysis. Is that right? Walk us through how that whole process works. Yeah. So, you know, the way I like to explain it to folks is that Trino

and Starburst are like the top half of the database. And what I mean by that is it's a database without storage, so we access the storage wherever that may be. It could be a traditional relational database, as you said, maybe it's Oracle or Teradata or SQL Server or Postgres or MySQL or what have you. It could be a NoSQL database like Mongo. It could even be Kafka topics. It could be these open formats, which I think are a big deal, and I know we're going to spend some time talking

about those in a bit, like Iceberg. But regardless of where you store it, what we're doing is we're essentially connecting to it, connecting to the catalog that has the metadata about how that data is laid out, and executing that query in memory in a distributed MPP architecture, a massively parallel processing architecture,

so it's scalable and super fast. And that's really, I think, what gave it its momentum as a project when it was first created at Facebook years ago: the fact that this thing could really work at petabyte scale with, you know, tons of concurrency and complexity around the types of analytics that

you're trying to do, and do that in a performant and scalable way. And so today it's leveraged by some of the largest organizations in the world: LinkedIn, Netflix, Airbnb, you know, large enterprises like Comcast, a ton of the big banks, really using this at scale to do

exactly as you described. Yeah, and I mean, this whole issue of being able to analyze data where it lives, no matter what format it's in. Like you said, a NoSQL database, Cassandra, MongoDB, or a relational database, of course, MySQL, PostgreSQL. You know as well as I do, there are millions and millions of instances of these databases all around the globe.
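
As a rough illustration of the "top half of the database" idea described above, here is a minimal Python sketch of an engine that owns no storage. It only holds connectors, checks each connector's catalog metadata, and scans splits in parallel in memory. The `Connector` and `Engine` classes, the catalog names, and the sample rows are all hypothetical; this is a toy model of the concept, not Trino's or Starburst's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


class Connector:
    """Hypothetical connector: knows a table's columns and how to read it in splits."""

    def __init__(self, columns, splits):
        self.columns = columns  # catalog metadata: how the data is laid out
        self.splits = splits    # chunks of rows that can be scanned independently

    def scan(self, split_index):
        return self.splits[split_index]


class Engine:
    """Toy query engine with no storage of its own; it only holds connectors."""

    def __init__(self):
        self.catalogs = {}

    def register(self, name, connector):
        self.catalogs[name] = connector

    def count_where(self, catalog, column, predicate, workers=4):
        conn = self.catalogs[catalog]
        if column not in conn.columns:
            raise KeyError(f"{column} not found in catalog {catalog}")

        def scan_split(i):
            # Each worker filters one split in memory and returns a partial count.
            return sum(1 for row in conn.scan(i) if predicate(row[column]))

        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = pool.map(scan_split, range(len(conn.splits)))
        return sum(partials)  # combine the partial results


engine = Engine()
engine.register("orders_db", Connector(
    ["order_id", "amount"],
    [[{"order_id": 1, "amount": 20}, {"order_id": 2, "amount": 75}],
     [{"order_id": 3, "amount": 120}]],
))
engine.register("clickstream", Connector(
    ["user", "page"],
    [[{"user": "a", "page": "/home"}],
     [{"user": "b", "page": "/cart"}, {"user": "a", "page": "/cart"}]],
))

big_orders = engine.count_where("orders_db", "amount", lambda v: v >= 50)
cart_views = engine.count_where("clickstream", "page", lambda v: v == "/cart")
print(big_orders, cart_views)
```

The same engine answers questions against two unrelated sources without copying either one into its own storage, which is the point being made in the conversation.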

But the bigger the company gets, the more you're likely to have multiple different systems, maybe, you know, a call center system, a CRM, customer databases, all these different kinds of systems, which hitherto, in the old days, you would have had to run extraction routines to pull the data out and load it into a particular schema. Well, instead, what you're doing is, like you say, you've got that top half of the database, and you don't care where

it's stored. So now your business analyst can go, you know, I want to be able to triangulate how much we're selling with our current supplies, with our customer needs, with what we're seeing on our website, all these different things. You can pull that together in a query and hit

boom, and get answers, right? That's exactly right. And I think to your point, you know, if you're a large enterprise, a Fortune five hundred or Fortune one thousand enterprise, that sort of Cambrian explosion of data sources is inherent. You know, it's unavoidable. And in fact, there's a venture capitalist named Matt Turck at FirstMark Capital who, I mean, you've

probably you know where I'm going with this here. He maintains this data landscape of all the different types of database systems and data sources basically that exist in our space. And he's updated it every year since back when I was doing my first startup fifteen years ago. And so you can watch this progression. It never simplifies, it never seems to get smaller. It's just more and more and more data sources, to the point where if you're a large organization,

you have probably most of what's on that market map. Yeah, and to your point, the world is getting more diverse. There are more tools, there are more data types, and there's value in those data types. There's value in them hills, as they say, there's gold in

them hills, right? So to be able to have this abstraction layer above the panoply of data sources underneath, that's very powerful, because again, otherwise you would have to manually go in and connect to this source and pull all that stuff out. Those pipelines break all the time. They're expensive, they're cumbersome. Typically you lose some data when you're doing that. You can have all sorts

of data quality issues occur. So instead, just leaving it where it is and being able to query it, and you still want observability, you still want to know when it's not coming through the way it's supposed to come through. Things of this nature, that whole space has exploded as well. The key is to be able to enable your professionals, your analysts, to ask whatever questions they want from data that's in any number of different sources, right?

Yep, exactly, that's exactly right. That's pretty cool stuff. And give us some examples, maybe, of some of your clients, how they're able to pull together data from multiple different sources to see something and to understand something that's key to their business. So, one example: we've been working with Comcast for a number

of years. Actually, they've been a customer, geez, I don't know, probably maybe six years now, since the earliest days of the company's existence, and I always love chatting about some of their use cases because they're very relatable

to a wide audience. We all watch TV in some form or fashion. And when they first started working with us, they had all of the classical billing data in a traditional enterprise data warehouse on prem, and they had all the viewing behavior, basically every time you change that channel, every time you interact with that

cable box, that was stored in a Hadoop data lake. And what they really wanted to do was be able to correlate the shows that you watch with how much you spend, and do cross-sell and upsell based on your viewing behavior, which makes a ton of sense, but is a perfect example of one data set living in one data source which is huge volume, right, like all that event data for every time you use the remote. That needs to be in a data lake. I mean, you really wouldn't want to put that in

a data warehouse. It'd cost you a fortune. But then they had all that classical billing data from, you know, decades of customer history in their data warehouse, and so they wanted to be able to combine that, join tables across these two systems. And we were able to deliver that value very quickly for them simply by being this abstraction layer and being able to

join across these two data sources. Over time, what that let them do also is start a migration effort, a digital transformation effort, if you will, or cloud transformation effort, maybe more precisely, and start to move some of that data into the cloud, into, you know, object storage like S3 or across multiple clouds, and leverage more of a data lake or lakehouse architecture for large portions of that data, where we become essentially the engine.
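
The cross-system join in the Comcast anecdote above can be sketched with nothing more than Python's standard `sqlite3` module, where two separately attached in-memory databases stand in for the on-prem warehouse and the data lake. This is only an analogy for one engine joining across two sources: the database aliases, table names, columns, and rows are invented for illustration, and SQLite's `ATTACH DATABASE` is not how Trino connectors work.

```python
import sqlite3

# One connection plays the role of the query engine; the two attached
# in-memory databases are stand-ins for the two independent systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")  # billing system (hypothetical)
conn.execute("ATTACH DATABASE ':memory:' AS lake")       # viewing events (hypothetical)

conn.execute("CREATE TABLE warehouse.billing (customer_id INTEGER, monthly_spend REAL)")
conn.execute("CREATE TABLE lake.viewing_events (customer_id INTEGER, show_name TEXT)")

conn.executemany(
    "INSERT INTO warehouse.billing VALUES (?, ?)",
    [(1, 80.0), (2, 150.0)],
)
conn.executemany(
    "INSERT INTO lake.viewing_events VALUES (?, ?)",
    [(1, "cooking"), (1, "cooking"), (2, "sports"), (2, "sports"), (2, "news")],
)

# A single query joins tables that live in two different databases.
rows = conn.execute(
    """
    SELECT b.customer_id, b.monthly_spend, v.show_name, COUNT(*) AS views
    FROM warehouse.billing AS b
    JOIN lake.viewing_events AS v ON v.customer_id = b.customer_id
    GROUP BY b.customer_id, b.monthly_spend, v.show_name
    ORDER BY b.customer_id, views DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Neither table had to be copied into the other system first; the join happens at query time, which is the property the anecdote is about.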

And again, I know we'll get to that, but basically, when you combine, you know, Starburst or Trino and an open format like Parquet or Iceberg or Hudi or what have you, you essentially have a warehouse. It's just an open source warehouse, with an open engine combined with an open format, which can now, in today's world, give you the same performance and functionality that you might traditionally have gotten from a proprietary data warehouse. Yeah, you know,

that's very interesting for lots of reasons. One, you don't want to hamstring your analysts. You want your analysts to be able to ask whatever questions are going to come to their mind, and you want them to be able to leverage new data sources quickly. Yeah. Right. Because there's an opportunity cost of doing the old model, in that people just don't think outside the box. They think very much inside the box, and that greatly limits the kinds

of questions you can ask, the kinds of analysis you can do. So what you've really done is open the doors to all kinds of exploration, right? That's exactly right. You literally have the freedom to be curious and be able to get a response and iterate on that curiosity. So for exploratory analytics, it's truly a game changer to have everything at your fingertips. And you know, in some of the large organizations that we work

at, big banks and so forth. I'll tell you another anecdote. This one, I can't say the name, but I'll just say one of the largest banks. You know, the CEO used to call their data analytics team and say, hey, can you tell me, you know, X, Y, Z: you know, how many times a day are our customers checking the banking app today? Or, you know, what's our mortgage default risk, or what have you? And it used to take that team, you know, a week or two weeks, and they'd say,

hey, yeah, we'll get that answer for you. But now, by having connectivity to all those data sources, the analytics team can say, hey, just wait one second, and get the answer while he's on the phone, you know. And that's a powerful, powerful thing. Well, it is amazing, because again, once you acclimate to being able to discover, I love that line you said, you have the freedom

to be curious. It's very interesting because the mind is a very powerful thing, and I know that these analytical engines are very powerful, but human creativity is the key ingredient in all this stuff. Right, you can have all this data, all this analysis, reports, dashboards, everything you want. It's a human being that's going to have to really look at that and map it all through the business and the plans and where you want to go.

And these days you have to be able to change plans pretty quickly. I mean, I've been in business myself for on and off, self employed for twenty five years and two long instances, and I've never seen a more dynamic marketplace than what we have today. I mean, especially for mid sized businesses, but even for big businesses. You have to be able to know what's happening. Where's the money coming from, where are the expenses coming from,

where's the margin coming from, what are the people doing. All these things are crucial to understand and be able to pivot, because otherwise you have to lay people off, and a lot of times, we've seen a lot of layoffs these days, it's very difficult to recover from that stuff. Just knowing what people are working on is half the battle. Then being able to reallocate that work to someone else is another whole half of the battle. And then by

the time you've done that, well, guess what, the market's changed again. So the long-winded point I'm making here is that it's really important to be able to analyze your data to understand what it's telling you, no matter which system it's coming from. And folks, don't touch that dial. I'll be right back. You're listening to Inside Analysis. Welcome back to Inside Analysis. Here's your

host, Eric Kavanagh. All right, folks, back here on Inside Analysis, talking to Justin Borgman, co-founder and CEO of Starburst, a very cool company that is doing amazing things with data analytics. And we were just talking on the break, Justin, about COVID, how COVID came along and really forced everyone to pay attention to their processes, to their workflows, to their data. You better understand what's happening. And of course all these behaviors changed.

And that's one fascinating aspect of the industry of business in general that I found so curious because everyone tells me this is the case. And what happened is you have all these algorithms that were designed to optimize sales and marketing and operations and different aspects of a business, but they're predicated on old behavior of how people acted before COVID. Then COVID hits and all that stuff went out the window. So you had to come up with new ideas, new ways of

doing things. And that's where analytics plays a crucial role, right? That's exactly right. I mean, we live in volatile, dynamic times, and being able to react and understand how changes in the world around you are impacting your business, and drive strategy in accordance with that, is so important, and that time to insight is critical in driving that. And I think COVID is a fantastic

example. Right, all of a sudden, overnight we started ordering our own groceries and doing our banking online, and those mobile apps you know, became so important, and delivery services became so important, and you know, calculating that incorrectly, you know, could have huge, huge impact. I'll tell you a quick story here in the Boston area. You know, one of our great tech success stories is a company called Toast, which has basically,

you know, reinvented the point of sale system in restaurants. And there was this moment in the very beginning of COVID where they thought, oh my god, our business is screwed, basically. And yeah, toast. Yeah, no pun intended. And so, you know, tragically, they let go of half their company. You know, it's like a thousand people were suddenly laid

off. It ended up being the case that their business actually ended up growing dramatically during that period, you know, but they couldn't see it yet, that COVID was actually putting a lot of restaurants out of business but creating the opportunity to create new ones. And as new ones were created, the first thing that somebody decides when they open up a restaurant is which point of

sale system do I want to buy? And so, you know, that prospect of selling to legacy restaurants, existing restaurants, and getting them to change is much, much harder for them than selling to a brand new restaurant. And so it ended up being this huge boon to their business. But, you know, they just couldn't see that at the time, and, you know, weren't able to react early to that, and unfortunately, you know, a lot of people lost their jobs early. And now, you know, Toast is a huge success

story, a public company, you know, great, great business. Well, yeah, and being able to pull data from systems like that, for example, is very useful. So, especially if you've got ten, twenty, thirty stores around the country, being able to analyze the data coming in through a system like that, like Toast point of sale, is great for supply chain, it's great for planning, it's great for, you know, understanding what mixture of products to offer in your

stores. I mean, a lot of people don't realize how much work goes into rolling out a new product at a company like that, like at a Chipotle with their brisket or something like that. Yeah, a lot of work goes into understanding what is the sourcing, what is the pricing, how are you going to be able to pull this off? That's what analytics is here

for. And because you have this abstraction layer that allows you to dig into any number of different data sources, you've greased the tracks to that kind of analysis, right? That's right, that's exactly right. Yeah, because speed matters these days. You have to know, and you have to be able to fail fast forward, I think, is the real key. Try something; if it doesn't work, know that it doesn't work, move on. No sacred cows in business, right? That's right, that's exactly right. Yeah,

yeah, that's interesting. Well, let's dive into this open table format stuff. We talked about open source earlier in the show, and how amazed I am still at the number of open source projects. We mentioned Hadoop briefly, and I watched that whole evolution a little bit skeptical, to be honest. I'm like, hmm, I'm not sure if this is the one knife that's going to solve all these problems. And MapReduce, you know,

it's great for indexing the web. It had a very good use case, that's why it was designed, right? But there were security issues, there was overhead, there were network issues. All kinds of different things came into play. But the point is we learned a lot. We learned a lot about distributed systems. We learned a lot about file storage, for example, and

now we have these interesting developments in file formats, in table formats. There's Parquet, right, which has been adopted as a standard, I think, and one of the reasons, real quick, is because it has that little analytical box at top, right in the beginning of the file. It has a little area that's designed to give metrics on what's in that file, which is great for analytics.

Right, do you know much about Parquet? Yeah, yeah, no, you're absolutely right, and I think that's one of the most important, you know, creations of that Hadoop era that you were describing, you know, maybe fifteen years ago, you know, when I first got involved. Back then Parquet didn't exist, and, you know, the query performance was also exceptionally slow. MapReduce itself was exceptionally slow, and the file formats that you were reading from were also very slow to read from.
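
To make the point about per-file metrics concrete, here is a small Python sketch of how min/max statistics let a reader skip data it cannot need. It is a toy model under stated assumptions: the `write_row_group` and `scan_greater_than` helpers and the sample values are hypothetical, and real Parquet keeps this kind of metadata (in the file footer, per row group and column) in a binary layout that this sketch does not attempt to reproduce.

```python
def write_row_group(columns):
    """Store values column by column and record min/max stats at write time."""
    stats = {name: (min(values), max(values)) for name, values in columns.items()}
    return {"columns": columns, "stats": stats}


def scan_greater_than(row_groups, column, threshold):
    """Return matching values plus how many row groups actually had to be read."""
    matches = []
    groups_read = 0
    for group in row_groups:
        _, high = group["stats"][column]
        if high <= threshold:
            continue  # the stats prove nothing here can match, so skip the read
        groups_read += 1
        matches.extend(v for v in group["columns"][column] if v > threshold)
    return matches, groups_read


# Three row groups of a hypothetical "amount" column.
groups = [
    write_row_group({"amount": [5, 12, 9]}),
    write_row_group({"amount": [40, 55, 61]}),
    write_row_group({"amount": [3, 70, 8]}),
]

matches, groups_read = scan_greater_than(groups, "amount", 50)
print(matches, groups_read)
```

The first row group is never read, because its recorded maximum already rules it out. At data lake scale, skipping whole chunks this way is a large part of why metadata-rich formats query faster than the older ones mentioned here.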

And so Parquet was a columnar format first and foremost, in terms of how the data was laid out, which allowed for much better performance and mimics, as you say, a lot of the capabilities of a traditional database. And I think to your earlier point, you know, Hadoop was a test ground of a lot of new concepts that I think have been perfected now, ten to fifteen years later. And this is really, like, you know, the Lake two point oh, if you will, where every little component has been

upgraded. You know, in the early days it was Hive, or maybe Impala if you were a Cloudera customer. Today it might be, you know, Trino and Starburst on the query engine layer. And similarly, while it was, you know, Parquet for a long time, you can now leverage Iceberg or other formats like Iceberg: there's Hudi, there's Delta. We can talk about some of the pros and cons of each if you'd like.

But you know, these really allow you to do updates and deletes, which is sort of like finishing the last mile on the file format piece. So you can now modify the data. You know, if you have a GDPR use case where you need to take somebody out of a mailing database or mailing list if you will, because they've opted out, you can now do that. And historically you couldn't do that with data lakes. They were append only, and so it was a real pain in the butt to try to

actually modify your lake. But with Iceberg you can do that, you can do time travel, you get all this new functionality that is much more historically associated with a proprietary data warehouse like Teradata or Snowflake. That's great, it's another abstraction layer, right? On one of these shows years ago, someone joked that in IT we always think that one more abstraction layer will solve all our problems, and it kind of does. At least it solves

many problems, right? Then you have to do all the mappings and make sure that you've dotted your i's and crossed your t's. But nonetheless, that is incredibly important, and it's a great example you just gave too, being able to delete something, say, okay, get this person out who's unsubscribed. We don't want to hit them anymore. Yep. Most people think, oh, can't you do that? Not in that old format, no, but now in this format you can. So we're always making sacrifices

to achieve new ends in this business. The key is to kind of understand what do we need to keep, what can we let go of, how can we solve these problems? And these abstraction layers are very powerful, being able to solve many, many problems in one layer, right? Exactly.
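
The update, delete, and time travel capabilities discussed above can be sketched with a toy snapshot model in Python. The assumption here is a simple copy-on-write scheme: data files are immutable, every commit records a new list of files, and older snapshots stay readable. The `SnapshotTable` class and the email rows are hypothetical, and this is not Iceberg's actual metadata layout or API.

```python
class SnapshotTable:
    """Toy table: immutable data files plus a list of snapshots over them."""

    def __init__(self):
        self.snapshots = [()]  # snapshot 0 is the empty table

    def _commit(self, files):
        self.snapshots.append(tuple(files))
        return len(self.snapshots) - 1  # id of the new snapshot

    def append(self, rows):
        new_file = tuple(rows)
        return self._commit(self.snapshots[-1] + (new_file,))

    def delete_where(self, predicate):
        # Rewrite only the files that contain matching rows; reuse the rest.
        files = []
        for data_file in self.snapshots[-1]:
            if any(predicate(row) for row in data_file):
                kept = tuple(row for row in data_file if not predicate(row))
                if kept:
                    files.append(kept)
            else:
                files.append(data_file)
        return self._commit(files)

    def scan(self, snapshot_id=None):
        # Reading an older snapshot id is the "time travel" part.
        files = self.snapshots[-1 if snapshot_id is None else snapshot_id]
        return [row for data_file in files for row in data_file]


table = SnapshotTable()
s1 = table.append([{"email": "a@example.com"}, {"email": "b@example.com"}])
s2 = table.append([{"email": "c@example.com"}])
s3 = table.delete_where(lambda row: row["email"] == "b@example.com")

current = [row["email"] for row in table.scan()]
before_delete = [row["email"] for row in table.scan(s2)]
print(current)
print(before_delete)
```

In this model the opted-out address disappears from the current view while the earlier snapshot still contains it, so a real compliance workflow would presumably also need to expire old snapshots before the data is truly gone.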

Yeah, I think you're exactly right. And, you know, the history of computing is a lot of abstraction on top of abstraction, right? Like, thank goodness we're not deciding exactly where to write every bit and byte to a hard drive anymore, right? So, yeah. Well, there are lots of things happening. So how would Starburst work with Iceberg, for example, in

a client situation? How does that all come together? Yeah. So this is what's got me really excited right now: we've seen, first and foremost, widespread adoption of the Iceberg format and tremendous momentum there. We see so many customers looking at it because it is open, it is independent. You know, a number of vendors have embraced it. Even Snowflake is

embracing it, which is interesting in its own right. But the industry has really rallied behind this format, and I think it becomes the de facto standard. It becomes the new Parquet, you know, in the sense of its ubiquity, is our view. And one of the reasons that's exciting to us is because when Iceberg was first created at Netflix, it was created to pair with Trino, or Presto as it was first known. Now, today, Netflix, a huge Trino user, uses that with Iceberg, so that pairing of

Trino and Iceberg goes back to its earliest beginnings. And one of the reasons was that Netflix was a big data warehouse customer of my former employer, and they were trying to figure out, how can we really do the full functionality of data warehousing in an open warehouse, essentially an open data warehousing model? And that was one of the motivations for the creation of this format, and it's

been very successful. So the combination of Trino and Iceberg is something that we see a large portion of the Internet companies already embracing: again, LinkedIn, Netflix, many others, Apple and so forth. And we call this the Icehouse, you know, the Iceberg warehouse, and I think that's very exciting. For those that are going to centralize data, we would say centralize that in an Icehouse, and then you can also leverage us, of course, to query the other

data sources you have as well. That's very interesting. You know, what also excites me, and I've seen this now several times, is people in these very large organizations, in the data teams, dealing with massive amounts of data. I mean, goodness gracious, Netflix, think about how much data they're dealing with. It's a tremendous amount, and relatively unwieldy data too. A lot of machine data, but a lot of user data, so not

your traditional types of just transactions buying this and that. I mean, they have that too, but they have all these other things they have to manage. Well, what's interesting is how you see these diasporas every so often where people will come up with an idea in this organization that spring out and start their own thing. It started with Yahoo I think was the first big one

back in, like, the early aughts. Basically a whole bunch of people sprang out of there and they founded, I mean, forty different companies, and that became the Hadoop ecosystem, right? Because first there was Hadoop, HDFS. Then you had this whole ecosystem around it to do all the stuff like governance and analytics and different things like that. And now we're seeing it come out. People come out of Google, people come out of LinkedIn, that's where Kafka

came from. People come out of Facebook and Meta. It's really interesting, right because they were at the coal face, as they would say, hacking through these incredibly difficult challenges, and then they went and they rolled their own like Cassandra, I mean that came out of Facebook too, right, yep.

I'm so with you on that, Eric. And these are sort of some of my own personal truisms that I've, you know, learned over the years: (a) open is generally better than proprietary, and I think

the markets move in that direction. And (b) if you want to see what the future looks like, look at the Internet companies, because generally they're on the frontier, and I think architectures start to evolve to mimic, you know, how those folks deal with things, because they're always ahead of the game in terms of

scale and performance and really pushing the limits of those systems. Yeah, and then to open source stuff, I mean, how cool is that, right, for the folks at LinkedIn to roll out Kafka, open source it, and make it an Apache project so that do-it-yourselfers can go grab the code and stand things up? Now, you're going to have to maintain it, and that takes a lot of time and effort, right? So there are pluses and minuses to using open source stacks, and there are security issues

we've learned about recently. There was that one case of a developer who caught some little potential hack, and it saved us all from a lot of trouble, I think. But nonetheless, I think you're right that openness fosters innovation. It also builds trust, right? Totally, absolutely. Yeah, you know, and that's really important because, especially in industries that have regulations, you want to have an audit trail. You want to be able to explain what

this thing is doing. Black boxes are not so good. ChatGPT was open, OpenAI, now it's not open. Now it's a black box. So these things are issues we have to worry about. And I think that, by and large, the black boxes are kind of over for big business. But what do you think about that? Yeah, I think again, you know, over time, where there's an open alternative, that's where people are going to want to gravitate, for all the

reasons that you described. You know, even at, let's say, JPMC, Jamie Dimon talks about how, you know, he's never going to use one cloud. He's going to have four clouds, the fourth, I guess, being his own on-prem. You know, he doesn't want to choose one vendor. He doesn't want to be locked into anything. I mean, and this is like a banking CEO who has the, you know, I guess, foresight to think about his

technology strategy in a similar risk management way. And I think, you know, Jamie has just been a fantastic risk manager in all aspects of his business. But, you know, just another great example of someone, even at his stature, thinking about, I think, a lot of these principles you're speaking of. Yeah, well, I mean, they are at the front end of innovation as well. I mean, they kind of have to be, because you have to protect that money first of all, so it's all going to be

focused on governance and security and compliance and transactional integrity. You can't lose your money. But they're also very forward-thinking in terms of how to use analytics to understand customer behavior. And you talked about risk too. I mean, what a complex field that is. I had a whole past life moderating webinars

for a group called GARP. That's the Global Association of Risk Professionals, or as I call them, the de facto Illuminati: like, chief risk officers for every central bank in the world, right? And these folks have, like, a heavy, hard eye on the market, on what's happening. When COVID went down, they were the first to figure out, we've got to do something about this. And you know, just real quick, I had a guy from my healthcare company

on the show. Yes, I guess about maybe nine months after COVID struck, and he said that in the first six months after COVID hit, they knocked out five years of digital transformation, because they had to, and they knew they had to. And so when the pressure mounts and you know you have to act, amazing things can happen. And it was all the stuff we talked about with COVID: the dispersed workforce, you've got to understand your workflows, all that fun stuff. But folks, don't touch that dial. I'll be right

back. You're listening to Inside Analysis. Welcome back to Inside Analysis. Here's your host, Eric Kavanagh. All right, folks, back here on Inside Analysis, talking all things data and analytics with Justin Borgman, co-founder and CEO of Starburst. I love your company name, by the way. You've got a Starburst. That's good stuff. We have a lot of fun with that. Yeah, those of you watching the video, at least, can see this in the background. That's one of my favorite words in Spanish: estrella,

the star. Estrella fugaz, the shooting star, right? Cool stuff. Yeah, but let's talk about the stuff you're doing with Iceberg. And I understand you have a great team member who has just joined you. But what you're doing in ingest, right, because ingest has always fascinated me. You point your connectors to data sources to pull all that stuff in. How

can you do that efficiently, fast? It's just really important to be able to do that, because the longer you wait, the longer you're waiting to do something else. Tell us about that. Yeah, exactly. So, building on that idea of an Icehouse architecture, really what we're going for is to create a full end-to-end experience that is, you know, just like a traditional data warehouse, but leveraging an open format at its core. And again, that's

what an Icehouse is. It's an open lakehouse leveraging Iceberg at its core. And that means, you know, addressing a few areas, one being data ingestion, both streaming and batch, turning that data into Iceberg tables. A lot of people just, you know, don't know how to do that. They've never had to do that before. So how do you create those tables? There's the governance aspects of access control, lineage, auditing. There's also just

the management of those tables. There's something called compaction, where you want to compact a lot of small files into larger files to get better performance. And so that's just kind of a utility that needs to take place for performance optimization. There's, you know, retention and snapshot expiration. There's, you know, capacity management, increasing or decreasing the size of your cluster for performance to match the demand, and

being able to do that elastically as well. So these are all the things that we've been working on from the Starburst side. You're absolutely right. We just hired Carl Steinbach. He was a very early member of the Iceberg community, has been working on Iceberg since twenty eighteen, and he's a PMC member and actually was one of the co-founders of Tabular, which is an Iceberg company, and we're super excited to have him here working with us on continuing

to drive this roadmap. And one of the reasons he joined is, I think, exactly what we believe, which is that, you know, open formats belong with open engines, and this is really the stack for giving you an end-to-end, you know, open warehouse experience. I've got to tell you, I love this Icehouse concept, right? I mean, I've been around a long time. As we were talking about, data warehousing, then data lakes. I will tell you, I was a little bit concerned in the early days

of the data lake. I'm like, hmm, are we making the same mistake again of thinking we can store all this data in one discrete location? Because data is going to live wherever it lives, and moving it is always going to take time and effort. This is the concept of data gravity, right? Of course, data doesn't have actual gravity, but the point is, in a conceptual way it does, because you have to pull it out of somewhere, push it to some other location, land it there, and make sure it all

lands properly. And again, this whole federated worldview. I remember the first time I learned about Apache Arrow, I was like, oh wow, that sounds very interesting. So enabling the federated queries in memory, I think, was their whole vision with Apache Arrow, and again, another Apache project, right? So here you have the best minds who come together and, in an open way, develop technology that is open source, right? Because open leads to future connectivity, it

leads to future collaboration. It's not a closed door. Proprietary is a closed door. And you know, this is what I saw way back when, in like the two thousand and five time frame or something, when I'm like, wait a minute, this open source stuff is pretty cool. What's gonna stop it? And you know, things have happened. I won't name names, but there have been some movements to truncate open source. But I think the cat's out of the bag. What do you think? I totally agree. I totally agree.

And you know, there's another point that you referenced there on sort of centralization versus decentralization. Our answer is both. Like, you're going to have, you know, some centralization, and in those cases leverage open formats like Iceberg and the Icehouse model. And you're going to have other cases where you're going to want to keep the data where it is and query it there directly. And

that's where, you know, federated queries add value. And so, you know, we think it's a both, not an either/or. And absolutely, you know, you want to be leveraging open components wherever you possibly can in that stack to give you that flexibility. Right, well, and who likes lock-in, right? Nobody wants lock-in. Yeah. I saw one of the big cloud providers, was it Google maybe, that lowered their egress fees. Did you see this recently? It was like, yeah,

what's happening there? It's, you know, again, no one likes lock-in, right? There's no CTO that wants lock-in. There's no CFO that wants lock-in. Nobody wants that. And open formats allow you to stay open and to mix and match, to bring in some new technology, because, I mean, you mentioned the term future-proof. Of course, that's the name of our TV show, Future Proof. It's really important these days to leave doors open, because amazing new things come down the pike. I

mean, Iceberg. I don't think too many people saw Iceberg coming five, six years ago. That's right. And here it is, and I know for sure that all the other big guns look and say, oh, we have to talk to Iceberg. Yeah, you probably should, because it's just taken off, right? There was Hudi, Delta Lake, and Iceberg: those are the three different versions you could use. And of course Delta Lake is proprietary for Databricks. Hudi was for someone else. I can't remember who launched Hudi. It came out of Uber,

and there's a company behind that one called Onehouse. Hudi's interesting. It just hasn't gained as much momentum, you know, and sometimes it's just the most popular wins, right? And I think Iceberg is that. But Hudi's a great format too, and we do support all three. You know, back to that point, optionality is what creates that future-proof ability, right, and then allows you to kind of double down where the market moves. And I think in this case, you know,

that's Iceberg. Yeah, no, that's a good point. And getting that momentum, getting enough developers focused on something, that's one of the keys too. Maybe we should dive into that for the developer world, right? Because I heard a great stat that every year, like, the number of developers is growing by some astronomical number, which means there are new developers all over the place every year. And so how do you attract those

folks to work on your stuff, to understand your architecture, your languages, your vision? That's all important, right? And I've seen this develop over the last, really, I'd say seven to ten years: software companies not just trying to get clients, but working to get developers to pay attention to them, to work in their communities, right? Talk about that for

a minute. Yeah, that's an excellent point. And I think developers like to work on open source. Frankly, it actually allows them to not only, you know, build their resume on technologies that they know are going to be useful wherever they go in their career, but also, if they choose to participate in those communities and contribute, they're building a different kind of resume on GitHub, you know, and that's really valuable to them. And so I think you're

absolutely right. I think this is one of the secrets, by the way, of Facebook's success from an engineering organization perspective. They've always attracted tremendous talent, and I think one of the reasons is that they have a very open-source-centric culture that allows their developers to participate in a variety of projects and really

become almost, like, famous, if you will, within those projects. And that's certainly the history of my co-founders creating Presto and Trino when they were there, and so many other great, you know, open technologies that have come out of companies like that, you know, like Kafka at LinkedIn and others. Yeah, I think it's really amazing, and developers are the front lines

these days. They are building the products that you use. When your products are web-based software as a service, or you're selling all sorts of things using the web, you need developers, and you want the developers to be motivated. You want them to be interested in things. And you made an excellent point about how GitHub is like your resume. It's a meritocracy. So you're not just saying you can do

stuff. You're saying, look at the stuff I've done, this is how it works. And this is just such a white-hot space. It's a great place to be for a developer, working with data, working with the unique nature of any business. I think that's probably the exciting thing, right, is that every company is different. Every business has its own way of doing things, its own people. Of course, a business is a bunch

of people. It's people, data, systems, technology, vision, where things are going. And with what Starburst is doing, and the other people in this industry, to be fair, you are providing the mechanisms of action for inventing new things, for trying new things, for deploying new things. And it's going to be all about efficiency in this economy. I mean, people say it's the information economy. I think it's more like the execution economy these days.

Because the information is everywhere, the data is all over the place. The question is, what do you do with it? How quickly can you make use of it? And that's really what we're talking about with these federated queries and these distributed systems: make use of it quickly, such that you can ideate, create, test, market, and then redo, reinvent, try something different.
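
For readers following along in text, the federated query idea discussed in this segment can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how Starburst or Trino is implemented: SQLite's ATTACH DATABASE stands in for a federated engine, and the source aliases (crm, lake), the table names, and the sample rows are all hypothetical.

```python
import sqlite3


def run_federated_example():
    """Join data that lives in two separate databases with a single SQL query."""
    # One connection plays the role of the query engine.
    engine = sqlite3.connect(":memory:")

    # Each attached in-memory database stands in for an independent source
    # system (hypothetical aliases: "crm" and "lake").
    engine.execute("ATTACH DATABASE ':memory:' AS crm")
    engine.execute("ATTACH DATABASE ':memory:' AS lake")

    # Hypothetical sample data in each source.
    engine.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    engine.execute("CREATE TABLE lake.events (customer_id INTEGER, event TEXT)")
    engine.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    engine.executemany(
        "INSERT INTO lake.events VALUES (?, ?)",
        [(1, "login"), (1, "purchase"), (2, "login")],
    )

    # A single query spans both sources; no data is copied into a
    # central store before the join runs.
    rows = engine.execute(
        """
        SELECT c.name, COUNT(*) AS event_count
        FROM crm.customers AS c
        JOIN lake.events AS e ON e.customer_id = c.id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()
    engine.close()
    return rows


if __name__ == "__main__":
    print(run_federated_example())
```

The source-qualified table names (crm.customers, lake.events) are the point of the sketch: the query addresses each source where it lives instead of first moving everything into one location.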

It's all going to be very fluid in the next few years, and the companies that manage their data best are going to have the best chances of victory. But the podcast bonus segment is coming up next, folks. Don't touch that dial. You're listening to Inside Analysis. All right, folks, back here on Inside Analysis. Time for the podcast bonus segment with Justin Borgman of Starburst, CEO and co-founder.

And Starburst Galaxy is your cloud-based system. Like many vendors, you started off with an on-prem version and you realized that SaaS is probably the future. So now you've got Starburst Galaxy, which I think you said is the easy button for the Icehouse. That's right. That's pretty cool. So up in the cloud, this is metadata management largely, right, up

in the cloud. You would then point to your different data sources and it pulls the metadata up in there, and then you can kind of rationalize and reconcile it and work your magic. Is that about right? Yeah. You absolutely can connect to all of your different data sources. You can query them directly. You have the ability to hook up your favorite BI tool or query editor or Python notebooks or whatever the case may be, and interact with

that data directly. And as you said, it's the easy button. We try to make this, you know, super sophisticated distributed MPP engine as easy to use, with as few knobs to turn as possible, to make it, you know, really low maintenance for you from an operational perspective. We also have built in a lot of these features we were talking about earlier on the Icehouse topic, of, you know, streaming ingest and creating Iceberg tables, and, you know, doing compaction and data profiling and

data lineage and creating curated data products off the data that you have. You can do all of that within this Galaxy experience. And, you know, you touched on in the earlier segment the importance of efficiency. One element of efficiency is cost-performance efficiency, and compared to cloud data warehouses, we are roughly one third the cost of a traditional cloud data warehouse. So you can save a bunch of money leveraging this open lakehouse model, or Icehouse, as

we call it, and Galaxy makes it easy to do that. Now, I kind of hear some hints of governance in there. I mean, if you can access the systems and manage metadata and things of that nature, is governance on the roadmap? Can you do data governance to some extent? Yeah. So access control has always been actually part of our strategy, at least as a company. So we built in fine-grained role-based access controls,

you know, many years ago, maybe five years ago. More recently, in the last couple, we've actually upgraded that to attribute-based access control as well, and so you can tag your data and leverage that for your access controls. But also we've even added the ability to sort of automate tagging of data. And this is where we actually incorporate some AI in the product itself, where we can automatically tag PII data to help you out in your

governance strategy. By no means do you have to leverage that, and you should also be checking all of your data, but it helps

with that process. And again, very fine-grained: so row level, column level, data masking, query auditing, all of that's built into the platform, and it's enforced across all the data that you connect to. So not just your Iceberg tables, although of course that's a common use case, but also across, you know, MySQL, Postgres, Oracle, you know,

whatever data sources you may be connecting to. That's very interesting. I mean, I think it sounds like a wonderful way to have this management layer because then it doesn't matter where you are. You could be on the road, you could be at your home office, you could be anywhere. You just log into your cloud based system. You can see this whole environment and

you can run your queries from that environment, right? I mean, you have to do your connectors to your various systems, on-prem or wherever they are, and see that. This is what I think is so exciting: being able to connect to SaaS sources, on-prem sources, whatever the case may be, from one marshaling area, then to be able to build queries, ask questions across this whole environment, and under the covers Starburst Galaxy does that, right? That's

exactly right. Yep, that's exactly right. And for the power user, there's even a built-in query editor, so you can literally just open it up and start using it. Of course, if customers prefer their favorite tool, maybe Tableau, Power BI, Looker, or what have you, you can use that as well. But that's exactly right. All that power's at your fingertips. This is wild, wild stuff, folks. Well, we've been talking to Justin Borgman, CEO and co-founder of Starburst. Hop online to starburst.io to

learn more about those folks, and it's moving fast, folks. I like the Icehouse, the open lakehouse. I think the Icehouse sounds good. That's good stuff. Talk to you next time. Send me an email if you want to be on the show: info@insideanalysis.com. We'll talk to you next time. You've been listening to Inside Analysis. KCAA, the station that leaves no listener behind.


Transcript source: Provided by creator in RSS feed