Saket Saurabh on Automating Data Engineering

00:00

In this episode of Data Driven, Frank and Andy get back to the data engineering side of the equation by speaking with Saket Saurabh, cofounder of Nexla. Nexla specializes in tools for automating data engineering processes. Now onto the show. Hello and welcome to Data Driven, the podcast where we explore the emerging fields of data science, machine learning, and of course, the ever-present data engineering. This is season seven that we're now in and we are welcoming... Andy is shaking

00:38

his head. If you're not watching the video, it is hard to believe that we hit seven seasons. But by the time this is launched you probably have heard our one or two shows where we did kind of delve in deep. So you're probably tired of hearing us bang on about that. I really like kind of kicking off this first guest interview for season seven with Saket Saurabh, who is cofounder and CEO of a company called Nexla, whose tagline is automation for data

01:13

engineering. And if there's anything we've heard about in the last, say, six to ten months, it's all about automation this, automation that. Whether it's ChatGPT or any other kind of low-code, no-code, automation is all the rage. And he also, very much like a previous guest, has a cool vendor tag from Gartner. So we're going to talk to that. Welcome to the show, Saket. Thank you, Frank, and thank you, Andy. Good to be here. So very fascinated about kind of your story. In the virtual green

01:49

room we were talking about how you used to write Linux drivers for video card manufacturers, and we spent a few minutes waxing poetic about how easy Linux has become. So what exactly does automation for data engineering mean to you? I think let's start there. Yeah, I think when we look at enterprises and companies out there with a lot more data, with a lot more people who need to use data, there are two ways you

02:20

can achieve scale. One is through automation and the other is through collaboration. Achieving scale through automation means asking of the tasks that we do today: can they be automated, can they

02:32

become more intelligent? So for example, if I had to create a data pipeline and I have to connect to a data system and read that data, process it, maybe transform that, push the data somewhere, let's say it takes me four weeks or six weeks to write that code, test it, QA it, take it to production. Automation would basically mean: can a lot of these things be done automatically and faster? So can I, for example, not have to write a connector? It can get auto-generated there,

02:59

right? Can I not have to write tests or error conditions and check for them, because the system can look at the data, understand its properties and say, oh, this would be a good validation for this type of data? For example, if I had to run the same pipeline, but now the data volume has grown 10x, I don't have to go and do a whole bunch of engineering to manage that scale. The system can understand, oh, the scale is increasing. My bottleneck is in

03:26

this part of the processing. Let me allocate more containers to that and just let it run smoothly. So automation is a lot about doing the same tasks that we do, but doing that faster. Why? Because something can figure out certain tasks, do it for us, create more reliability, create more repeatability, create better performance without us having to do that

03:49

manual work. So when we go back into automation for data engineering and you understand that there is so much data engineering work to do, I think it's almost impossible for the data engineers out there to just support all that demand that they have. Automation for them is like something that helps them and supports them. And it's like a lot of easy use cases can be done automatically and quickly. And a lot of difficult use cases can have big chunks taken care of in various

04:16

aspects. So that's kind of where that direction is and automation is one of the key parts to that scale. Interesting. Yeah. I really like your description of this. I'm wondering if it's okay if you share a little more detail. I've worked some with automating data engineering in the past and I find that it's very applicable when you're doing pretty much straight one to one type stuff and that's not throwing off on

04:47

your product by any stretch. I don't know if you agree or disagree, but I think about staging data so I can pull data from extracts, text files, flat files and load that into some data store, usually a database. And once I get it there, I find a couple of things are true. And I may not pull it from an extract, I may pull it from the system of record. I want to get in, get the data and get out with doing as little harm as possible, stealing as little

05:15

cycles. But once I get my copy of it, then I can start applying rules, looking for opportunities to apply strong data types and the like. And automation really works well there. Your product, I'm assuming, does that and does that part well? Yeah, absolutely. But there are also parts where you're getting that extract and there is a slight change in schema now and it's more or less the same. But can automation catch that for you and take care of a few of

05:45

those things? Or do you have to go back and write that piece of code there? So there are many places where you can benefit from that. So there is what we call when we talk about automation, what is the driver of automation, what is the source of that? And we put that on an aspect of applying

06:05

intelligence to the metadata. So when we look at the data, we understand that the metadata is actually things like the schema, or: I have a price attribute in my extract, this is the behavior of that attribute, this is what it looks like, these are the characteristics. And based on that metadata, can I apply a validation rule to it, for example, automatically without having to define that? And something does it for me that is

06:28

bringing automation. So actually the roots of that come from the fact that there is so much information. For example, your data extract happens every day at 4:00 p.m. and you expect the finished data to be ready by five, and on some day it doesn't happen. It's 6:00 and it's not there. So is there automation going and saying, alert, it didn't come

06:48

through. Oh, by the way, the reason it didn't come through was that your stuff was all great except these 20 records in between completely threw it off because it was wrongly formatted or whatever. So stuff like that is where I think that automation really becomes an assist in this. So you know the business problem, you know what you're trying to do, this is how you get there faster. So I've heard that unexpected changes in formats, which I see it all the time because it's

07:18

what I do. But I hear that addressed under the topic of schema drift. And it can happen a couple of different ways. You can miss a tab or comma, whatever delimiter you're using. You can either miss one or have an extra one inserted in the extract process. You can have missing files, completely missing files for a variety of reasons, some of which are legit, like you're doing an incremental extract and nothing changed and stuff like

07:51

that. And I guess your product addresses that, has rules for saying if this is missing, just keep going. It's a slowly changing dimension and if we miss ten in a row, it's no big deal. So what we do at a very high level, right, when we think about data engineering, one of the key problems that it solves is integrating data, getting data from point A to point B and making sure it's valid, it is trusted, it can be used by the

08:15

downstream application. There is often an implicit contract that a dashboard is relying on this sort of information. So what we do basically at a high level is, one, we are like, hey, we can figure out how to connect to new systems. This is a part where we bring automation to the connector creation. So instead of writing code for connectors, we are able to generate most of the connectors out there. So

08:37

that's one part. But when we scan the data, we do understand what the schema is and all of that, and we present that and automatically sort of package that into what we call a logical data product that becomes much more easily understandable by an average data user, a person who

08:54

understands it. So in that process, in between that, yes, schema drift is an important part, but it's not as straightforward, because what happens is you're getting data with first name and last name and email address and suddenly you get data, maybe two records, which don't match that. Is that an error? Is that a change? Is that an evolution? You got first name, last name, email address and now you're also getting phone number. Well, it's a sparse schema potentially and

09:19

it's a drifting schema. Well, does it break the downstream contract because something got renamed, or does it just simply add to that? So there are a few of those aspects. We do cover all of those by sort of saying that when I connect to a data system, I'm going to present that data in a certain sort of a data product view; we call it

09:39

a logical data product. So here's a logical data product, this is all there is, these are the characteristics and stuff and you decide what you want to do with it and how you want to use it. But once you have a consumer for a data product, then it sort of implicitly creates a contract and we keep track of that. And there's some interesting concepts that we do there, which is you can take a data product and create some

10:02

derivatives out of that. So you can say I have an orders or transactions product and it has a credit card number, I'll mask it, and I have an order ID and I'll look up the items from a different entity, and now I have a new data product which is enriched and which is maybe more PII-safe. So some interesting... You know what I like about that is, we think about tools that visualize lineage, Atlas and Purview and tools like that, but those are very reactive, even though both Atlas and Microsoft's

10:32

implementation of that, Purview, introduce automatic scans, automated scans, and they manage schema drift to a certain extent. You know this: if you've ever tried to code a tool to react to schema drift, then you know how complex that is. And I can't wait to see large language models integrated into that process because I suspect it'll do a much better job of managing that than trying to guess what this data type should be and

11:02

such. But what I hear you describing in your contract sounds like a proactive piece to that where you meet with your client and you say, yeah, here's whatever you want to call it, our data dictionary, what have you of all of our source data and our fields and columns, fields, columns, metadata for all of that. And then you're saying these particular fields are important because they're used downstream in these dozen reports at the very end of the

11:32

process. But what you're saying is if another related field was to show up in that list, then you're going to be able to make an educated guess at whether that's schema drift or whether it's additional attributes. Is that what I'm understanding? We make an educated guess about: is this a one-off thing and we should treat it as an error? Hey, this record is an error, it didn't actually meet certain criteria, or this is a change that is

11:59

showing up. And the way you do that is you observe that data over a certain period of time to make the determination, and we do a certain level of drift analysis. And if the drift is very significant, then today we actually do not go and make assumptions on behalf of the user. We actually create a notification saying we saw a significant change and we think this might be a new data product that we have detected. So we are connected to

12:26

a store. So looking at the schemas and stuff and we're saying here's a data product, that data product is transactions and this is what it looks like. Oh, some came here and whatnot. But then at some point we may say, hey, this looks significantly different, would you like to consider this as a new entity? And then it sort of notifies the user and

12:42

lets them do that. I think when building automation, at least my understanding is that's very important to understand, when to make a reasonable assumption and when to actually let the user decide. But even creating that workflow is a big value add because this is stuff that actually would have gotten missed maybe for a few days, but you are getting notified about

13:02

that upfront. So we try to be maybe a little bit more conservative, if anything, about making assumptions on behalf of the customer or the user, because you go wrong and it's not fun at all. Yeah, I really like it. Now that's interesting. And I see your company has been around about seven years. Yeah. What has changed in those seven years? Because I think seven years ago AI was not on the tip of everyone's lips, obviously. I think certainly since ChatGPT came out, right, it's a big part

13:42

of the conversation. Have you also seen kind of the data engineering world be kind of touched by AI and if so, how? Not as much, but I think it is getting there. So when we started very early, as I said, seven years ago, the question we had actually come in with when starting the company was: do we want to build an AI or a machine learning company? Because actually in 2016 it was hot, it was hype. There was a lot of like, oh, this is going to

14:11

change the world. Always the hype is ahead of the reality. And it took a while before things like LLMs came around and generative AI started to succeed. But even otherwise there are a lot of other AI initiatives that are still figuring their way out. But we were very clear that we want to build the technology for the

14:27

users of data. We want to focus on users of data and allow the user of data to get the data wherever they need and do whatever they want to do with it in whatever tool they want to do with. So we came with that approach because we said that the user of data is not going to be necessarily very expert in

14:49

the data system. So there's expertise in data, which is: I understand what the data is. And there's expertise in data systems, which is more the engineering side. We said the data user will not be an expert in data systems. So if you have a lot more variety of data that's coming, how do they use it? We'll create an abstraction. We'll create an abstraction that will give them a clean, consistent view of data no

15:07

matter where it comes from. So now you're like, I don't care if the data was a stream or API or JSON or document or whatever, I have something consistent to work with. So we went in and looked at metadata and started to apply metadata intelligence to create that. I would say that when we were doing it, a lot of people early on were like, why are you doing that? Why not just create like this straightforward

15:26

thing that everybody does? What has changed for us is in the last two years or so, our approach of creating this logical entity around data and using that started to catch on with the concept of data products. So where we were initially struggling to say what does this mean? And why is it valuable? Has suddenly become like, oh, it makes so much sense that you guys have done it this way. So that has certainly

15:48

changed. I think that application of intelligence to the metadata itself to make data tasks themselves more automated is a very valid use case. The generative models are doing quite well in things like can you generate a description for this data if this data looks like this? There's also been some really interesting stuff in terms of generating code as far as data engineering is concerned. Hey, can you generate code that reads it up from here

16:19

and pushes it out there? So I think there is that happening as well. So there's a lot of, I think, places transforming data. For example, if the data looks like this and it has to become like that, can we figure out what is the logic in between? So I think I would say that it's good that we're moving in that direction because I believe that the number of things to do is so massive that some degree of automation is essential.
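To make the metadata-driven automation in this conversation a little more concrete, here is a minimal Python sketch of the schema drift question raised earlier: is an unexpected field a one-off bad record, an additive change, or a change that breaks the downstream contract? This is an illustration only, not Nexla's implementation. The function name `classify_drift`, the `min_share` threshold, and the verdict labels are all hypothetical.

```python
from collections import Counter
from typing import Iterable


def classify_drift(expected: set, records: Iterable[dict], min_share: float = 0.5) -> dict:
    """Compare one batch of records against the field set consumers rely on.

    Hypothetical policy (an assumption for this sketch):
      - an expected field that is absent from every record -> "breaking"
      - a new field present in at least min_share of records -> "additive"
      - a new field seen only in a few records -> "quarantine" those rows
    """
    batch = list(records)
    if not batch:
        return {"verdict": "unchanged", "new_fields": [],
                "missing_fields": [], "suspect_rows": []}

    # Count in how many records each field name appears.
    seen = Counter()
    for rec in batch:
        seen.update(rec.keys())

    new_fields = sorted(f for f in seen if f not in expected)
    missing_fields = sorted(f for f in expected if seen[f] == 0)
    persistent = [f for f in new_fields if seen[f] / len(batch) >= min_share]
    rare = set(new_fields) - set(persistent)
    # Rows carrying only-rarely-seen fields are candidates for quarantine.
    suspect_rows = [i for i, rec in enumerate(batch) if rare.intersection(rec)]

    if missing_fields:
        verdict = "breaking"      # e.g. a rename; notify the user, do not guess
    elif persistent:
        verdict = "additive"      # e.g. phone number now arrives with every record
    elif suspect_rows:
        verdict = "quarantine"    # a handful of odd records, likely errors
    else:
        verdict = "unchanged"
    return {"verdict": verdict, "new_fields": new_fields,
            "missing_fields": missing_fields, "suspect_rows": suspect_rows}
```

With an expected set of `first_name`, `last_name`, and `email`, a batch where every record also carries `phone` would come back as additive, while a batch where `email` has vanished from every record would come back as breaking. In line with the conservative approach described in the conversation, a real system would more likely turn the breaking case into a notification for the user than act on it automatically.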

16:44

I totally agree. And I was just scrolling, so apologies, Frank, scrolling around on Nexla.com and looking at your data operations. And first I want to commend you for Nexsets. That's a cool play. And some of the fields that you cover here, in my career, I've been doing this for a long time. I'm old, Saket, but Continuous Metadata Intelligence caught my eye. That is a very cool concept. And what you described here sounds like some stuff that I've done, but that's a much cooler

17:23

name. Continuous Metadata Intelligence, CMI, that just rolls off the tongue. The idea, the automated error management and quarantine. I'm just kind of going from the bottom up here on the page for data operations. Those are just key pieces of functionality that time and time again data warehouse ETL includes something like that. But everybody's rolling it from scratch. This is just very cool. I love this idea of abstracting that out and I'm just going to throw this out there. I'm going to be signing

18:04

up for a demo. I want to see more. Oh absolutely. You're very welcome for that. And I think the other problem that it helped us solve, as I said, there are two ways to get scale in enterprise. One is through automation and the other is through collaboration. By creating this abstracted entity we were able to say hey, this is a lot more easy to understand as

18:23

access control. I can be really good at connecting to the data from a transaction system and cleaning it up and applying some compliance to it. But then the output of my work, which is the cool thing about these Nexsets, the data products, is you take one and you apply some operations to it, and the output is another Nexset, which is identical in behavior and consistency, but it is a slightly different view of the data and you can give somebody else access to that and you can

18:50

keep repeating that process. You can imagine in a large company people are finding these, they're creating their own variants and they're using it. But what they are doing becomes an input to somebody else and they can go take it out of there. And what we did a month back was introduce the concept that you can take all of these logical data products and Nexsets that we are creating and put them into a marketplace that is internal to

19:11

the company. You can allow people to go find it, request to get that. You're not really buying. You're saying, hey, can I get access to that? And there is a mechanism to approve and give people access to that. Now the interesting thing, very importantly, is that these Nexsets or data products are logical entities. They're not

19:30

making copies of data. So they're bringing the same sort of benefits that containerization, for example, has brought on the compute side: you have an abstracted entity, it's consistent, you don't have to worry about what was under the hood. Where did it come from, was it XML data, was it CSV? Now you have something consistent to work with and it opens up a whole bunch of interfaces. I can take that data product, that Nexset, and say, I would like this data in a warehouse or I would like this

19:55

data as an API. And you realize that the same entity can have benefits for different users and they can approach it in different ways. So we think that is what is bringing collaboration. So when you bring together automation on one hand, collaboration on other, and then you really get the benefits of scale from both technology and process. I absolutely love you. Now I get why you keep saying data product and it makes perfect sense now.
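The idea of a data product as a logical entity that does not copy data, and from which further products can be derived, can be sketched roughly as follows in Python. This is a toy illustration under stated assumptions, not how Nexsets are actually built: the `DataProduct` class, its `derive` and `read` methods, and the `mask_card` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Record = dict


@dataclass(frozen=True)
class DataProduct:
    """A logical view over a source: it stores how to read, not the data itself."""
    name: str
    source: Callable[[], Iterable[Record]]  # invoked on every read; nothing is copied up front

    def derive(self, name: str, transform: Callable[[Record], Record]) -> "DataProduct":
        """Return a new product with the same interface, layered on top of this one."""
        parent = self
        return DataProduct(name, lambda: (transform(r) for r in parent.source()))

    def read(self) -> Iterator[Record]:
        return iter(self.source())


def mask_card(rec: Record) -> Record:
    """Hypothetical compliance step: keep only the last four digits of the card number."""
    card = rec["card_number"]
    return {**rec, "card_number": "*" * (len(card) - 4) + card[-4:]}
```

For example, `orders = DataProduct("orders", lambda: order_rows)` followed by `masked = orders.derive("orders_masked", mask_card)` gives a second product that any consumer can read through the same `read()` call. Because `masked` is only a recipe over `orders`, rows added to the underlying source later show up on the next read, and access to the masked view can be granted without exposing the raw one.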

20:21

You're creating a very interesting, almost an integration layer in between. The idea is like containerization for code, and you're containerizing data. And that's what I believe your data product represents. And now I'm really interested in that demo. Well, especially given my recent forays into OpenShift and kind of what containerization has done for developers, I think it's only a matter of time before containerization kind of

20:53

hits the data world. And there's something I want to point out, which is that it was very smart, I think, of you to focus on the data engineering side, right? Because AI is the hype machine, right? I fully admit that. I say this as a data scientist, right? But one of the things that we've kind of discovered, both Andy and I, in both our professional careers, is that data scientists will tend to brush aside the simple basics of it. It's five words, right? First you get the data,

21:31

or first we got the data. Right. Behind that is months of work. It's orders of magnitude of work. In fact, one of the spiels I have now is kind of like the idea of rock stars and roadies, right? And for every rock star there's an army of roadies that set up the lights, move the chairs around, set up the equipment and

21:55

manage the sound. So I think that data engineering is one of those things that has not... it's like the Rodney Dangerfield of kind of the data world, where it didn't get any respect up until lately, because ChatGPT will get all the headlines, right? But think about the data that went into it. I've heard numbers, billions and trillions of parameters for GPT-4. I think it's smart that you picked that and I think that actually worked out pretty well for you.

22:25

Actually what worked, fortunately for us, was that I actually really looked at machine learning as a way to... I'd been an entrepreneur before. I'd built a company; I really enjoyed sort of building the data aspect of it. I had built a company in the advertising, ad tech space, built one of the earliest mobile ad servers. We became part of the largest ad exchange at the time outside of Google. So we were processing over 300 billion records a day, and my cofounder here was running that

22:53

infrastructure. And we're like, at some level, to be candid, I didn't really enjoy being in advertising, but I did like the sort of data challenges that were there. So we were looking to figure out what is the next thing, because we had taken that company public and all that stuff, and we moved on. I was like, okay, what is the next thing you want to work on? And I really seriously looked at building something, because machine learning and AI was a hot

23:12

topic even in 2015, 2016. But when I looked at it, I understood that at some level, it is so specific to that business and that company and that problem. It almost becomes consultative. And I was like, how do you platform it? It's hard to platform because you can even take two retail companies that are solving maybe the same problem of recommending products to people and their models will be very different. And what you do there is not easily translatable. So I

23:38

hesitated for that reason. And when I look back, in hindsight, with almost all of the major companies that came out on machine learning, ultimately, when you look deeper into it, there are large professional services organizations under the hood. And that's what gave us a hesitation. I'm more of an engineer. I like to build a platform and make it once and let people use it. And the data engineering part of it is what

23:59

looked like that thing. But I don't know if you have read this paper called The Hidden Debt in Machine Learning Systems. It's a Google paper and it actually talks about there's a very cool diagram in there. I don't know if you can screen share here, but actually it's a diagram which shows that in all these different boxes, there's a tiny box called Machine Learning, and there's huge boxes around it which are all essentially data. I think it's a 2016 paper. If I'm not, I'm looking it up.

24:27

Yeah. Hidden technical debt in machine learning systems is the paper. And there's a diagram on the third page which shows that. But it is interesting that even at that time, people who were working on these things were seeing the same pattern. Yes, I have seen this diagram. This is something that comes up in my day job quite a bit, where we talk about how the machine learning is only one part of it, right? There's a whole lot that has to go into that. So yeah,

25:10

and that's a good point. And I think that everybody wants to be the rock star, right? Everybody wants to have their name up in lights, but the amount of people that goes to make that rock star look good, there's a lot of opportunity in there. And he's been a guest on the show, he's famous within a certain Internet circle: John Lee Dumas, right? He has a phrase where, I like boring, boring is good, because no one is competing to do the boring stuff. Not that data

25:40

engineering is boring. I want to head off the hate mail right there. But no, I mean it's one of those things where there's enormous opportunity. If you look at that box, it's one part of the whole operation, and there's a tiny little ML code box in the... it takes out all the oxygen in the room. But realistically, in order to have that little box, you need to stand on a lot of other operations. Right. Andy and I kind of had this back and forth and obviously data science is

26:20

important, blah, blah, blah. AI is important, machine learning is important, but it stands on the shoulders of giants. Another analogy I've seen where it shows like a rocket, right, in a little capsule that holds people. But it's sitting on top of a massive rocket, which of course has the launch pad and all the other accessories to it. That's another way to look at it. Right. It is crucial. And I think it's a shame that we kind of not we, but it doesn't get the attention that it deserves.

26:51

Yeah, I think data engineering is complex. It is also painful. It is also something that has to be done at a massive, massive scale. And it's challenging. But remember, it's a means to the end and people get fascinated about that end. That happens eventually, but the means to the end takes a lot of work. And sometimes, to be candid, it becomes thankless work. Because why are we such a big proponent of bringing automation into data

27:18

engineering work? You cannot automate all of data engineering, just to be clear. But when you bring in automation, you're saying that if I'm running, literally, I have customers running thousands of pipelines in our system, for example. And you don't want to be woken up on that Saturday night because one of those is not working. You want automation in there. Right? You want that system to work for you. Otherwise all you get in data engineering is a lot of work to do and complaints when

27:42

it doesn't work. But it is a very key need then. So I do celebrate the work that they do. And if you look at OpenAI, for example, because it's such a hot company right now, with that billion dollars plus in funding, very few companies could have done what they have done, because they had that kind of money, okay, right, to begin with. But I bet if you look at how that money got spent, I'm sure a big chunk is in pipelines because of data and processing that and moving that around and we don't talk about

28:09

it. But the reason that ChatGPT works so well is because it can look into all of that data and talk intelligently about it, right? No, absolutely. I don't know when this switch happened, but in terms of staffing, at a previous job, at the end of 2021, they wanted to do something. Can't say what it was, but they wanted to do something. And they said, oh, it'll be challenging to find the data scientists for this job. And my manager and I kind of looked at each

28:43

other. Actually, at this point in time, it's going to be a lot harder to find the data engineers that you're going to need for that. Right. Because they only really needed two data scientists. But just based on the aggressive thing that they were trying to build, they would need, I would just spitball and I would say, a dozen data engineers.

29:02

But from a technology provider and a tool provider perspective, I would say the interesting thing about data engineering is it is very complex, but the challenges are very consistent. I can look at our customers in retail, like a BedBath, or Forever 21, or in delivery like DoorDash, or in pharma like Janssen (J&J), or in financial services, or in cybersecurity, all of them. The challenges at a fundamental level are very similar, which is large amounts of diverse heterogeneous data.

29:32

Being able to take that, process that, do that reliably, detect the issues, all of that stuff. Have data quality monitoring, make that data usable by people, scaling. All of those are very similar. Which means that it fits very nicely into the traditional sort of model of problems that can be solved by software, problems that can be solved by automation. Right. So I think that is definitely a part where the challenges are not very unique to a certain problem. And of course, there

30:03

are unique flavors to it. Some have real time data, some have data from devices, some have data from legacy systems and so on. But yeah, there is structure. Excellent. Cool. Very cool. All right, I'm tweeting about Nexla. Cool. Thank you. Right now. And for all the stalkers, we are recording this on April 12, just FYI. So if you look at Andy's feed and you're like, where is the tweet? You have to go back. So at this point in the show, we want to switch to kind of the

30:41

pre-canned questions. And given your background, the first one, I really have to know the answer, right? We always ask, how did you find your way into data? Did the data life find you or did you find data? I think it was happening together, right? I guess so. My decision to start a company, and I mentioned to you guys, I'd been at Nvidia, on the compute side of the world, really, and at some point I decided to start my own company. And when I was looking to do that in 2009, I

31:10

was like, where do I go? Build a platform, if you will. And I felt that at the time, in 2009, apps were new and I'm going to build a monetization platform for app developers in the mobile space. That whole approach about building something around ad servers and one of the early ad servers in the mobile space, you realize that it's a very data driven world in advertising. And there is a reason why a lot of the data innovation, I would say, if you trace its roots,

31:38

comes from advertising. Whether it was Yahoo, whether it was Google, whether it was Facebook, what were these guys doing with data in the first place? They were dealing with a huge number of people visiting those pages and clicking on those ads, and they had to really figure out how to show the performance and say, which ad should you spend more money on, where should you not? And a lot of machine learning systems that we built early on, back in 2011, 2012, actually were for that purpose.

32:02

We ran one of the largest ad auction systems at the time, and if you're running an automated ad auction with 15 billion auctions happening and 300 billion bids on that, you have a lot of data. But you can also figure out that, hey, based on certain patterns, I can decide who to invite for an auction, I can decide what the floor price should be. And those were all

32:24

machine learning systems. So I ended up actually building this advertising technology and system in those days to solve that developer problem of like, I'm building apps, how do I monetize it? But I realized that a whole chunk of it was data, and a lot of the data stuff that we did was pre-Kafka, even, when big data and Hadoop were relatively early, so the technologies were limited. We did a lot of homegrown stuff at the time, but realized that this

32:54

is a massive problem. And I think data and I sort of met in that time, but I'm coming from this compute land. I'm building software for embedded systems where you are trained to squeeze every single kilobyte and every single ounce of performance in some of these systems. Like I mentioned, I was building software for the PlayStation Three when I was at Nvidia. And you come from that mindset of high performance squeezing the most of the system, and you see

33:20

the data challenges. And I thought it sort of intersected nicely for me in sort of a developer or a product approach. And then we said, well, more people need to be using data. That's where the world is headed. Everybody is going to become a data user. I could see my second grade kid at the time, and they do simple survey in the class, and they created a histogram like, okay, everybody's going to be data user. How do you really get there? Not everybody's going to

33:47

be technical and engineer. So that is kind of where my sort of direct experience in the data world started to happen, is like, we want to solve that problem. We want to make it possible for anybody to use data. And what is standing in their way is that data is complex. It's everywhere. It is hard to work with. Only developers are able to do that. So we're going to automate that. We're going to bring that and present that data to this user and they'll be able to use it

34:12

wherever they want. And that was the driver for me. Wow, that's fascinating. Our next question is what's your favorite part of your current job? I think the CEO job and the co founder job is a new challenge, I would say every day and all sorts of unexpected things. I'm still a very product person at heart, so it's like those things are always fun to look at. But I would say no day is similar to the last one is the best part of

34:47

it. I think about a month back, all of a sudden we were like, oh, the bank that we are banking with is going under and what do you do? Did I go in or did any of those CEOs go in on that Thursday or Friday morning to say, oh, this is what I'm going to do. I'm going to spend the whole weekend figuring out if we have any money as a company or we're out. All of a sudden the rug pulled underneath us. So I think that's the challenge. But that's also the fun of this

35:17

particular role. I do enjoy the thought that we are doing some very cool stuff in data. And the number one source of satisfaction for me is when our customers come to us and say, this user of ours in this pharma industry, they're like, you know what, this data that we're using with you guys, it was processing in multiple hours and now it happens in nine minutes. I was like, wow, my goodness. And when I hear those kind of stories, I'm like, okay, we are actually

35:44

delivering value. I don't know how to make a medicine or a medical device, but these guys who know how to do that, we are somehow enabling them to do their job better. And that is, I think, ultimately the satisfaction of the work. Right? Very cool. Interesting. So we have three complete the sentences. And the first one is when I'm not working, I enjoy blank. So many things. I love road biking. I do that a lot right now, but I love snowboarding and

36:19

a bunch of activities. I don't do flying anymore actively, but that's another one I would do. Nice. Very cool. Our next complete the sentence. I think the coolest thing in technology today is blank. I would have to say that Generative AI is certainly one of the coolest things out there. I think I'm still trying to understand from a technical engineering perspective, like the ins and outs of it, but it is fascinating. It is also scary, to be honest. I wouldn't deny that either.

36:53

Well, it's funny you mentioned that because I was thinking on that earlier today. Even. And the whole idea, the moving parts, when you start thinking about the image generation, just take a subset and you think about what goes into that. You describe something, so it has to understand what you describe, and that has that LLM component to it. Right. And then it interprets that in such a way and then probably tokenizes it, and then it generates this

37:29

image. And I was reading a blurb, a quote from someone at Nvidia today, and that's what kind of got me off doing billable work, mind you, and running down the rabbit hole. And I think it was a guy from Nvidia. And if it wasn't, then I apologize. But it was a person at Nvidia who made the statement that we're approaching that point where we're no longer rendering the pixels; we're generating at the pixel level. So now they're rendering

38:05

splotches of it. Yes, generated, but probably pulled in from someplace based on the description. A tokenized image from a tokenized description. But they're talking about generating the pixels, Saket. Wow. I would like to understand how lighting, because lighting has always been the biggest part in doing this, right? And how that applies to it. But I would say the reason I used to have unconditional love for technology innovation was ten

38:33

years ago. Everything that is better is always or faster is better. But I would say that post social media and YouTube and all of the stuff, I have become a little bit concerned that we really have to understand what is this technology going to do? And nothing scares me more than that about generative AI. Is that

38:51

okay? It is a cool piece of innovation, but unlike a faster chip, which was almost a no brainer, I think now the question is like, oh, okay, what is it going to do that we can't even think about today? Yeah, I'm with you. And I get the hesitancy and the thinking part of our population calling for a moratorium, kind of a six month pause. Knowing what I know about geeks and engineers, even knowing what I know about me, that's not going to happen. I

39:20

found a quote. It's from digitalnative.substack.com, and here's a quote from it. It's in a talk with Sequoia last week, Nvidia CEO Jensen Huang said, "Every single pixel will be generated soon. Not rendered, generated." And that was from, I think, at the time of this recording, it's either the latest from Digital Native or next to the latest. And he's a very smart guy, and he knows the stuff better than anybody out there. So if he's saying it, I believe it. Okay?

39:56

Yeah. But I don't know if everybody gets how big of a leap that is to go from what we were doing before to generating pixels. I don't know. Maybe I'm making too much out of it, but it boggles my mind. At least the game logic and stuff is going there. I was reading something about how they put these AI agents in the game.

40:20

In this paper that came out, I think, two or three days back, and they figured out to do a Valentine's Day party in there and invite people and all that sort of social mechanics were happening in that sort of generative way. So they're not pre-programming these. Goodness.

40:34

That was also sort of crazy fascinating, because when it comes to game development and the gaming experience, you can have a truly sort of a multi-plot storyline of any sort and you don't know what everybody's gaming experience is going to do different. I mean, there was a thing at a time when you would code those things and make people have a different experience. Right. One more thing and then

40:54

we'll shut up. Do you remember that Black Mirror episode where the lady's social score was just crashing as she went to some gathering and by the time she got there, she couldn't get in because she didn't have a high enough social score? It's like we're getting more and more to that point. In some places of the world, that already exists. Your next fill in the blank. I'm sorry. Sure. No, it's all good stuff. That's what makes this podcast I mean, dare I say, it makes this

41:29

podcast look cool. But that's what makes this field interesting. Right. It's not just about the bits anymore. Right. There's social connotations. Now, I used to also be unquestioned fan of technology. It makes everything better, it's going to solve the problems. And here we are some ten years later. It's like we actually created a whole bunch of new problems. And I don't know, is that maturity? I'm ten years older, or is that kind of the state of the technology that

42:01

we have created? No, I'll let the philosophers debate that one. I think it's evolution because we went from building the basic infrastructure, which is like chips and compute and stuff, and those things are but we didn't see that end application to the level that we are seeing now. So it's better building technology. Bricks, cement, all that is good, but suddenly cities are built that look crazy and whatever, and you're like, oh, is that what we're trying to do

42:27

here? So I think it's just applying technology to every kind of problem. Yeah, no, absolutely. All right. And our third and final complete the sentence: I look forward to the day when I can use technology to blank. I think that I would say, like, the self driving car. I think it would save me a good bunch of time if it really becomes reliable. I don't know if I'll trust it, but. Why is it that all the engineers are suspicious of self driving cars?

43:04

I'll be candid with you. At some level I feel like when I'm driving the car, it's not that much work. And I'm sitting there, I might as well just have my hand on, because most of the stuff has gotten automated, like those adaptive cruises and lane management and stuff. So, yeah, that much work. But yeah, adaptive cruise control is not self driving in the purest sense, but I will tell you, I can't live without it. Now, my car, this is first world problems, right? My car will brake all the way to 0.

43:38

Wife's car will stop at around 22 when I'm stuck in a traffic jam in my car. I wouldn't say it's no big deal, but I can just kind of sit back and let the car handle the braking in my wife's car when it goes below a certain speed. Now I'm on the hook. It actually is annoying, which is pretty funny. But next question. Andy? I'm in. Yeah, I was going to interrupt you there, Frank. Before you yell at you to put down the shovel and climb out, share something different about

44:14

yourself, Saket. But we remind our interviewees that it's a family podcast, so we want to keep our clean ratings. So something different about you. And you already mentioned a few. Yeah, from a family perspective, I'm a dad of three kids, two boys who are 13 and ten and a daughter who is six. And there is definitely a joy to seeing all of that happen. So, yeah, I'm a pretty regular individual in that way, but definitely bitten by the desire to build. And I'm like, I see a problem, I

44:56

have to solve it. And that's kind of what landed me in this entrepreneur boat. Okay. Yeah, that's awesome. Awesome. And the next question, although technically speaking, it's out of order, so I need to fix that. Audible sponsors Data Driven. Do you do audiobooks? And if so, can you recommend a good one? And if you don't do audiobooks, any book recommendation will do. I do actually love them. That's where the driving part comes in,

45:28

right? I mean, that's the way to make the most of your driving time, is to be listening to a book. I do a lot of technical stuff, actually. I think the most recent book that I was listening to was, I'm just looking back at my bookshelf because I always get a physical copy as well. I do the same thing. Yeah, I think my most recent book that I really enjoyed was, I think, the book called Sapiens. I don't know if you have read it. Yes. Oh, I've heard of that.

46:09

Yeah, it is an older book, but yeah, I got to it more recently. I've got it. Haven't read it yet. That author wrote something else that I read. Was it Guns, Germs, and Steel or something like that? That's an amazing book. It's a difficult read, I would say. That's why it is. We can sit back and listen to it. Yeah, crank that dude up to 1.25 and let it go.

46:38

Yeah, exactly. I actually just finished The Wolf of Wall Street, the abridged version, and I'd seen the movie, and I follow a lot of the other things that Jordan Belfort had kind of done. But after listening to the story, there's a lot that didn't make it into the movie. And the impression I'm left with was truth really is stranger than fiction. I think they didn't put some of that stuff in because no one would have believed it. And if you've seen the movie, it's a

47:12

pretty wild story anyway. So there's like even crazier stuff in there. Oh, my goodness. Yeah. So it's hard to imagine. There's a second edition of The Black Swan out. Oh, really? Yeah, it's got a few extra stuff in it. There's definitely an appendix, I think a brand new appendix at the end where he talks about implications and applications. But there's new stuff all the way. Even in the introduction, he added some. Nassim Nicholas Taleb, and I believe the first book of his

47:44

Incerto. So very interesting listening to that. I have to pick up the new Black Swan. That'll be interesting. Yeah. As a product person, I actually love also listening to a lot of these books on the business side of things. So I was listening to this book called The Ultimate Sales Machine maybe a couple of months

48:08

back. And what I really enjoyed was that while you're listening to these books in the car and you have this free time, which is so hard to get, by the way, these days, is that gives me the space to think about things and reconstruct my ideas and

48:23

thoughts. Most of the ideas that I get about maybe writing a blog post and stuff like that happen typically in the car, listening to something that triggers some idea in my head and do that because otherwise the whole day is like meetings and chasing stuff and whatever. Sure. Awesome. That's a good point. It's interesting how it's always good to stretch your brain, because our brain is pretty much, I'm not physically fit, but I like to think my brain is, at

48:54

least. It's just good to kind of I get a lot out of just walking around or driving and kind of thinking about things and listening to books that are kind of outside my norm. Right. Which is probably why I picked up The Wolf of Wall Street because I'd been doing so many technical books and kind of sales books, and I was like, Let me check that out.

49:14

Sometimes reading something or listening to something in one context, which is completely different, suddenly starts connecting to all the stuff that we are doing, like day to day. Absolutely. Yeah. I totally get that. Well, cool. So with that, is there anywhere people can find you? I know at nexla.com. Yeah, nexla.com. And on LinkedIn, actually, I do engage with a lot of folks in conversations over there as well. Excellent. Awesome. Well, thank you for

49:48

being on the show. And we'll let Bailey wrap us up now. That was some show. Is it me, or are the shows getting better? It could be my bias that leads me to say that. But I figured I would ask to get more input. After all, what's an AI without good input and a feedback loop? Speaking of feedback, have you checked out Data Driven Magazine yet?

Transcript source: Provided by creator in RSS feed
