
225: The Stone Cold Truth About Data: False Hopes and Hard Truths with The Cynical Data Guy

Jan 22, 2025 · 34 min

Episode description

Highlights from this week’s conversation include:

  • False Hope in Data Roles (1:17)
  • Naivety of Junior Data Analysts (4:27)
  • The Challenge of Defining Data (6:41)
  • Struggles with Enterprise BI Tools (9:43)
  • Career Advice for Data Professionals (12:36)
  • Generational Shifts in Data Roles (16:51)
  • Self-Service Data Requests (18:17)
  • The Importance of Analysis Skills (19:46)
  • The Broader Context of Analysis (21:44)
  • Boring Challenges in AI Deployment (23:29)
  • Technology Development vs. Human Absorption (26:14)
  • VC Resolutions for 2025 (27:00)
  • Value Addition in Leadership (32:08)
  • Final Thoughts and Wrap-Up (33:06)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript

Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.

Welcome back to the Data Stack Show. Today, we are welcoming back one of our favorite recurring guests, who is now referred to as the stone-cold Steve Austin of the data world. The cynical data guy. Matt, welcome back to the show, or Steve, I guess I should say. Thanks for having me back. Eckhart Ricker, thank you for that little tidbit right there.

Yes, we do have to thank Eckhart for actually showing us what we knew all along, which is that, you know, the world of data is actually a wrestling ring where people act out elaborate, you know, elaborate violent scenes. But no one actually gets hurt. It's all kayfabe. That's all it really is. Okay, well, actually, Stone Cold Steve Austin of Data, we have a really good one to start out with.

One of our good friends, Rogagen, he's been on the show multiple times, just a fun guy and a longtime friend of the show. Man, he had such a great post about false hope. So I'm going to read this, and then I have a question for each of you. So cynical data guy and agreeable data guy. Man, this is just so good. No one is more full of false hope than a data analyst building a dashboard they believe the entire company will look at.

I'm already laughing. A data engineer who just had someone tell them that the schema will never change. A data scientist analyzing a data source they thought was 100% accurate. A data leader planning out how to build the single source of truth for their company. Okay, so here's the question. I hope the listeners are laughing because all of those are just unbelievable. Which one of these is the highest level of false hope?

Or the highest level of delusion. I'll let you go first, Matt, but I definitely have one. Maybe two. You can't have two. It's which one's the most. So, I mean, this is probably also biased just because of my background. I would say the data scientist who thought the data was 100% accurate, because that's just, you sweet summer child, that does not exist.

Actually, can I ask one question on that? Because I'm not super familiar with it. I don't have a lot of direct data science experience. But from the little that I do have, a really good data scientist assumes that going in, right? It's always wrong. The first step is usually to figure out where it's wrong so that you can correct forward or remove things or whatever.

I have more questions, but jumping off from that, that's essentially the whole field of applied statistics. We know this was wrong. We're going to do things to make it more representative. I'm going to go dashboard. I think that, to me, is the largest amount of false hope for two reasons. One, because data analysts tend to be like...

that tends to be like a career starter. There's a lot of people that start their careers as a data analyst. Some as engineers or scientists, but data analysts, most of those jobs, a lot of companies have entry-level data analyst jobs and maybe don't have entry-level data science jobs. So there tends to be, at least for juniors, a higher level of naivete for that particular role if you're a junior data analyst.

And yeah, and it just makes me laugh because I've been that person. I've believed like, oh, this dashboard is going to be great. Even sometimes this executive wants me to build it. So you even have an executive that's a little delusional about it and like, yeah, we're going to have this wonderful thing. It'll be our North Star for the company and everybody will look at it. Well, I mean, so yes, I think there's naivety there because a lot of times they don't know any better if they're...

earlier in their career. But I kind of view that as not the most delusional because of that, and also because, if you're earlier in your career, it's more excusable. Yeah. And if you're like a 10-year data analyst, your soul has been getting ground down for 10 years, so the cynicalness of it is going to be pretty high. Sure. Right, there you have a pretty clear view from the mountain of unused dashboards that you stand upon.

I mean, I think it was my first dashboard I built. I had just moved into a role from another part of the company, and my boss was like, we're going to build this thing for operations. I was like, yeah, sure. And no one ever used it. And I was like, why? And then I looked at it and realized, from a meeting we had with operations, that no one had asked for it. And right there, I was like, well, I'm never doing this again.

I was so excited. One of my first dashboards, I was so excited because it went on the TVs at the big operation center. I was like, man, you know, I don't know how many hundred people are in the operation center at this company. And then I find out, and it was a remote site, so I'm physically located somewhere else and would occasionally visit this operation center. And I remember visiting, and the TVs were off, and then I inquired about it. And it was like, oh yeah, like,

the TV has been broken for like a month, and they may fix it at some point. And it just occurred to me, oh, they don't care that the TV doesn't work, and, oh, that's my dashboard that's on the TV. Well, and one other thing I'll just say is, I think on that kind of sliding scale, the data leader one would generally be the highest, I feel like, because you should know better. Except for the fact that

I think a lot of them don't believe it. That's just what they're telling management. Like, yeah, we're going to work towards that. But I don't think they really believe it. And if you do believe it at that point, you're the biggest sucker in the room. That's a great point, in that the single source of truth conversation actually a lot of times comes from management or the business, right?

There are all these problems from all these functional areas of the business where they say, oh, well, I need this data or I don't have this information or there's some sort of problem in me hitting my number because of this data, right? And so then someone from management says, okay, this is a technical problem, a data problem. We need to solve this at the root. And so the data leader, sort of their big project is, okay, you need to go figure out how to build a single source of truth, right? Yeah, I feel like that comes up partially because you get the situation where you have an executive who,

every month they're doing reviews and it's always, well, our numbers say this, but finance's numbers say that, and marketing's numbers say something else. And so they're like, you know what, if I could just have one place where all the numbers were the same and everyone had to use them. But the problem is they don't realize that, one, none of the business units want that, because they've all skewed it towards what's best for them. And they are going to fight you on that. So you're just adding another number to the fight. And also,

I think also there's this thing where they're like, you, data person, go define this. And a lot of this stuff is not definable by the data. It's like, okay, well, we need to know what a sale is. Well, finance has one definition. Marketing has another definition. And no one wants to be the person who says, everyone shut up. This is what the definition is. I think I'm least cynical about this one because I have had some success with that.

But at a large company, yeah, forget it. That's never going to happen. Smaller companies, you never get there 100%, but I think you can get closer. There's a lot of things that have to go right to get close to that. I think it's one of those, if you can fit everyone around a table with a pizza, you have a chance of that. Exactly. Once you get beyond that, no. Especially once budget and comp is tied to any of this stuff.

They're screwed. Okay, quick question, because we have many more juicy morsels to move on to. One question on the data analysis and the dashboard. So I was meeting with a customer in person, actually, which was great. It seems to be more and more rare. And we were talking about data, and they were discussing some of the things they wanted to do in terms of capturing product telemetry to understand onboarding better and just some basic things that they wanted to do.

And so I was asking about what they currently do for analytics. And this is a startup, okay? So it's a venture-backed startup company, not very big, but sort of early stage, right? So it's looking for product market fit. And they said, and this is a product manager, and they do some data stuff in their platform, actually. And this is the product manager for their data features and functionality. Super smart. And he kind of laughed and he said, well,

we have Looker and Tableau. And I'm thinking, wow, this is not a very big company. How does that happen? Because to me, that gets at this false hope of someone onboarding non-trivial enterprise-grade BI tools at a small company, and they're not dumb people, right? These are smart people. And you're talking headcount of like 50? Yeah.

Yeah. Like very small. Yeah. Yeah. Somewhere in there. So how does that dynamic happen, right? Because that's fascinating. And I guess there's this stark contrast of, we have two enterprise-grade BI tools, and we're talking about how to get basic product telemetry so we can optimize onboarding flow.

That's just fascinating. It's a speed optimization thing, usually, where some person comes in, they know Tableau, it would take them X amount of time to ramp up on said other tool. Or they just don't want to. Or they just don't want to, sure. Yeah, the person before him used Looker. And then they make the argument, hey, we're a startup, it would take me X amount of exaggerated time to ramp up on this other tool.

Let's just use this tool I already know. And then like tout some benefits of whatever tool they already know that may or may not be true compared to the other tool because they don't really know the other tool. I think also one of the ways you can have that is the person comes in, they want to use Tableau. They don't even have that discussion first. They just download it, build stuff. Sure. And then they're like, oh, it would take me so long to do this in Looker, but look, I've already got it in Tableau. So now you have to buy Tableau for me. Yep.

Probably also BigQuery, or the Google team and what they did with Data Studio and Looker. I believe this company is running on BigQuery. We didn't talk about this specifically. But man, there's an easy on-ramp to go from data in BigQuery. You can get it in Looker Studio and then Looker. That pathway is super easy if you buy BigQuery.

Yeah. And if you started with one or two people with a Tableau license, then someone else needs to do something in another department, right? It was like, look, we can just turn it on, and we've already got it up there. And then once it's up, nobody wants to change it, because nobody wants to actually consolidate it into one place. Okay, moving on to round number two. This will be a double header here, so two posts.

So one is from Tris J. Burns. And this is great. This is going to be a great topic. Short statement here. Before starting a career in data, I highly recommend gaining experience in another field. And then I'll follow that up from another longtime friend of the show, multi-time guest, Benn Stancil, who continually just produces unbelievable thoughts generally. But a quote from a recent post, a couple of quotes here.

We can't just be analysts or analytics engineers. We have to decide that we want to be true experts in understanding how to build consumer software first and product analysts second. Or define ourselves as working in finance, then become an analytics engineer at a fintech company. Because there's a corollary to Dan Luu's theory of expertise: while it implies that we can become pretty good at stuff pretty quickly, it also implies that other people can become pretty good analysts.

And in almost every field, that combination, a domain expert and 95th percentile analyst, is almost always better than the inverse. And then I'll close it out with another. This is a paragraph down. We probably can't get away with being good at asking questions. We need to know some specific things too. Cynical data guy. That one cuts a little bit, that last line right there, because I've used that before.

I think this is one of those that, I don't know, if you've been in the field, you can kind of go back and forth on it sometimes, where it's like, yeah, we really need people who know the business and stuff like that. But if you don't have enough experience working with the data, then you run into a lot of problems. And then you flip over to, no, I just need someone who's really good at data. I just need a data engineer who can just step in here and doesn't care and can just put stuff in place.

You know, I think, especially like 10 years ago, you could kind of be more of like, I'm a data person. I think that's shrinking over time. I don't think it'll go away completely in the near future. But the idea of, we're going to have a data team, it's all the data analysts and they just are data experts who help the other parts of the business. I don't think that's going to last. I think that's

already kind of gone in a lot of places. Agreeable data guy? I think, and this is in the same article, Benn does this really good comparison between data and science and says, hey, science is a subject you can take in school. And then he has this great line: science is what you take in fifth grade, not what you win a Nobel Prize for. So there's that.

Yeah, that's so good. And that's especially what data is becoming for businesses. It's like, cool, you do data. That's great. You have a business degree? That tells me almost nothing, because it's so broad. And I think part of what the domain knowledge brings to it is specificity and more clear application. Like, oh, you know about...

marketing data, or marketing data for law firms. You can continue to drill into specificity, which makes it more valuable. Well, I think if you look at the original kind of data people, if you go back 10-plus years ago, there was no institutional educational infrastructure for it, so you had to start at something else, and then you saw this data thing and you got into it. And then there was kind of the move towards

semi-professionalizing it, I would say. It wasn't completely because the barriers to entry still weren't super high. But there was this move towards like, oh no, this is going to be a professional thing that you do. But I mean, I think where that's going to go away partially, I think that was partially a generational thing. The idea of like, hey, we need a data person to do this is a little bit like going back to the 90s and having someone be like, I need an assistant to do my email because I don't do email.

Like that whole, there's a generational aspect of it too. If you didn't grow up with it, you don't really want to do it. But there's going to be, you know, you've got people who are now in their 40s and 30s that have grown up with this basically in their corporate life. They're going to be more likely to be a data person, you know, be a finance-and-data person or a marketing-and-data person. Yeah, sure.

I worked with someone. Actually, this person exemplifies someone who had domain expertise in multiple areas and wasn't technically an analyst and had never worked as an analyst, but was probably one of the best analytical people that I've ever worked with. They worked in supply chain. They worked in marketing disciplines and were unbelievable.

at like general math, right? And so you combine all those things and it's like, wow, they're unbelievable at like solving problems with data or like uncovering things. Anyways, coincidental that this person fits that exact profile. But returning to your point about like someone answering my email, they worked for someone, I want to say they were like, I don't remember the details. It was some sort of chief of staff type position where they were, you know, sort of assigned to an executive to solve problems, right? Go on and solve this problem, right? But one of the things they did was,

they had to print the executive's emails out for them, figure out what the important ones are and print them out and put them on their desk. And so anyways, Matt, I was like, okay, what's the data corollary to that? Like printing the email, you know? Well, I think, I think some of that is like self-service BI, right? Where you're like, hey, look, you can go do it. And they're like,

can you put that in a PDF and send it to me? Right, like, answer my question for me, copy it over into an email and send it to me. Yeah, that's what they want. I don't want to go look through it. Yeah, can you make my Google Sheet update instead? Like, I don't want to open the dashboard. Yeah, yeah. Can you

I don't want a Google Sheet. Can you just screenshot it and send it to me every week? That's what I want. Or put it back in Salesforce or my marketing tool. I don't want to go anywhere. Just let me use my tools and shove the data in there. I think it also depends on the job, because there are those infrastructure jobs that realistically you're not going to go from kind of...

necessarily, I was in the business too, and now I'm doing what's becoming more and more a very software-engineering-type role, right? But I think for those analyst ones, and I will say, I think also, there's two ways you can kind of go about this. You can either be a person who really cares about data for some reason, and then you have to learn the business aspects. But I found a lot of people who

come in through that data one, especially in the last, I don't know, five years or so, they don't really care about the business enough to want to learn those things. So it might be easier to go the other way around. I do think though, we also need to start defining that a data person is not just someone who knows SQL and a little bit of Python. That tends to be, if you look at like, hey, we're going to turn our business people into data people, it's we're going to teach you SQL. And it's like, okay, but

To do this well, you have to learn how to think well. Yeah. And that's the part that's missing in a lot of this. And if you don't do that, you're giving, it's going to sound bad, but you're giving a monkey, you know, a chainsaw, and you're like, look, he can do this. Well, no, he can't do this. It's going to cause damage. That sounds a lot like, you know, WWE, the monkey with the chainsaw in the wrestling ring. Well, maybe that could go wrong.

I'm sure there's some ring in Japan where they've done that. Yeah, yeah, yeah. No, I think as moderator, I don't even think it's a hot take on that. I agree, Matt, in that I believe that this has always been true, actually. And what I mean by that is when I think about people who are

The people that I've worked with who are incredibly good analysts, they fall on an extremely broad spectrum of technical skill with data, to your point, you know, SQL or Python. What they're really good at is understanding and breaking apart a problem into its component pieces so that they know what type of analysis even needs to be done, right? And even those people, like we kind of joked about,

you know, printing the emails or putting the thing in a PDF. But the thing is, some of those people may not have the technical skill to do it, but they know how, they know the best way to solve the problem. And they may need to bring someone with the technical skills in to help solve a legitimate, really tricky statistical analysis problem because of some, you know, outliers, underlying data issues or whatever. Right. But to your point, Matt, they understand

the best way to approach solving the problem because of the larger context, and that's actually the core of good analysis. Yeah, and I think that's the part that's missing when we talk, like, if you make the comparison to, you know, data is like science, well, science has a method to it that is then applied in all these other ways. We don't really have a codified method. Like, I think there's people who are good at it,

And if you talk to them, they all kind of fall within this narrow range of how they do it. But there is no, no one's teaching you to do that. You have to kind of figure it out yourself. Yep. Okay. Round three. And then if we have some time, we'll get to a bonus round. This is a really, this is a really great post. So this is Kurt Muehmel. I think I'm pronouncing that correctly. So sorry, Kurt, please come on the show and correct me. We'd love to have you as a guest. Is AI democratization a myth? No,

but it's a 20-year project. This is a deeply insightful discussion with Dataiku. Am I saying that right? Dataiku? Dataiku. Dataiku. I read that in my mind all the time, but I never say it out loud, I don't think. Right, yeah. It's like haiku. It's like the haiku. Yeah, Dataiku. Yeah, yeah. Okay. A discussion with the Dataiku co-founder and CEO, Florian Douetteau. Two main takeaways. One:

The best AI/ML data applications will be built by multidisciplinary teams that blend data and domain expertise. Two, the capabilities of LLMs are not what's holding back enterprise deployment of generative AI. It's all of the boring stuff: security, data quality, monitoring. All right. Cynical data guy. I mean, yeah, I think the truth in that is probably that we overestimate all of this stuff in the short term.

And then we run the risk of underestimating it in the long term. So, I mean, I think that part is true. I think the idea of you needing multidisciplinary teams, I think for a lot of this, for a lot of data and technology stuff, that's always been true. It's not a one team thing. It's not a one person thing. And I think like, yeah, the thing that's holding a lot back is the boring stuff. But I also feel like that feels slightly hand wavy to me. And it's not hand wavy.

it's hard. It's hard. There's a lot of details. You know, this is where, it's almost like saying, oh, that's like the last 10%. Well, that last 10% is going to take one to two X as much energy as the previous 90%. Yeah. Man, the best analogy I can think of would be when video first worked over the internet, like you could stream a video.

You're like, wow, this is really neat, this is great, this can revolutionize everything. To, like, Netflix, with this massive streaming platform with hundreds of thousands of videos that can be accessed instantly from anywhere in the world. That was not a short, trivial journey between those two things. And I feel like that's a good, maybe even should be, like

That might even be underrepresented in the effort, but that feels like, okay, just because one person can stream a video at a college on a T1 connection or whatever, there's all these other things that need to happen to make this a reality. But the interesting part is that the boring stuff is going to be lagging. And he says enterprise. I think that's almost any company.

Yeah. So I don't think it's just, if you're thinking, my mind went to like big enterprises. Like, no, I mean, I just replaced the word with any company deploying some kind of generative AI. But the interesting part is, the LLM curve, I think it's just going to keep going. And then what, are we going to just have a growing and growing gap between LLM capability and actual implementation capability, if that makes sense?

So that'll be interesting. Because essentially, I don't know what that gap is going to be. And if the LLMs keep developing at X rate, are there things we can do to speed up the Y, which is the boring stuff? Yes, but it'll be interesting. How does that gap progress over time? Does it decrease or increase? That's almost kind of like there's how quickly technology can develop and there's how fast people can actually absorb it and integrate it.

It sounds similar to that. I mean, I think also just learning, because I think we're slowly starting to come out of it, but for the first couple of years especially, there was this idea that LLMs were going to do everything and we were just going to shove everything into the LLM, right? Instead of realizing it has a part,

and it's an important part, but it may not even be the central part of whatever system or agent or whatever it is you're using. You still need the deterministic elements in there, and that probably has to be the driver, not the passenger. It's going to be a wild 20 years. Okay, lightning, do we have time for a lightning round? Yeah, I think so. Okay, bonus round. Sorry, bonus round. So Matt Turck has created so much great

content over the years, and he had an amazing post about his VC resolutions for 2025. Okay, here's what we're going to do with this one. Matt and John, can you pull this up and just pick your favorite resolution? Pick your favorite VC resolution. Yeah, and most of these, I think, are written kind of humorously. Oh, it's so tongue-in-cheek, but they are also, like,

You know, like the fun tongue-in-cheek, but have some nice nuggets in them as well. Oh, man. Go first. I'll go. Honestly, the first, man, there's a couple I can pick on here. The first one just really got me laughing. We made you pick one earlier in the show, and this is a bonus round. Yeah. We'll go back and forth for one or two of them. Yeah, back and forth for one or two. Oh, great, great. Yeah, the first one really got me laughing. And so it's VC resolutions for 2025.

Shed the past is number one on here: delete all my posts from last January about why the Apple Vision Pro will be the big story of 2024. Which is so great. Like, you know, I think one of the things in the age of social media, the way that media cycles work now, is no one looks back. Like, occasionally posts will recirculate, right, like

you know, five, 10 years later or whatever, but like pretty much no one looks back. So I think it's actually really cool to like pull something up and be like, hey, like this was a projection or whatever I had from January that like, you know, didn't work out as I expected. I will say though, just one quick comment on that. And this is one anecdotal data point, but one of the best engineers that I know, very successful entrepreneur, moved to an Apple Vision Pro for their like workstation.

And they said, I'm never going back. How long have they been? Are they still using it? Do you know? Yeah. They use it exclusively. Really? I mean, they have a laptop and stuff. But like a year into it then? Yeah. Wow. Yeah. And then actually, someone who works here at RudderStack, I need to just call them and say, hey, can I come try this out? But they went, they were at their house, and they tried it out.

And they said it is absolutely unbelievable. The cost is super high, and it's a very dramatic change. But anyway, just one anecdotal data point, but someone who I generally respect, and we've talked a lot about workflow and tooling and all that sort of stuff. And so it was really shocking for me to hear them say, this is it. I'm not going back.

That's super interesting. And I've heard people say that, and they do it for like a couple weeks or a month and then they do go back. You know what? I'm going to text him today. You should. And on the next Stone Cold Steve Austin. But if they're a year into it and they really have stuck with that, that's a huge win and that's a really interesting... We'll have him on the show. You should ask if he's going to the chiropractor, because the weight of it was one of the biggest complaints. Oh, man. Okay.

Cynical data guy. All right. We're going to go with six: Focus. Add a filter to my inbox to automatically discard startup pitches that do not use the word agentic, and say that I'm using AI in my deal flow process. Just leaning into that VC AI obsession. Man. Costas, the...

former co-host of the show. He has a startup now, so at some point we need to ping him and have him come on the show. I think they were getting close to being ready. Yeah. But God, this was maybe a year ago, maybe a year ago. We were messaging back and forth and just catching up on life. And I was asking him about, you know, are you talking with investors and all this sort of stuff? And I need to try to see if I can pull it out. But he had the funniest statement about,

you know, sort of just going up and down Sand Hill Road saying, you know, foundational model. And he's like, I mean, I don't know how much money you would walk away with. Sort of the digital version. Okay. One more in the bonus round. Okay, I've got it, I've got this one. Okay, I'm going to pick number nine on here, and it really got me laughing. So each of these thoughts are prefaced.

There's like a little like summary. Like one of them says inspire. One of them says like anticipate. So this is the one under add value, which makes me laugh. And it says, like I'll skip down, like tell my founders to be more like Sam Altman, which I know they really appreciate, even though they often don't say anything in response. Oh, man.

That's a good one. And that used to be like Elon Musk, I feel like four or five years ago, Steve Jobs. There's always been a guy that's been that guy, but that one made me laugh too. And that it's under add value. Yeah. That one's going to be related to, I chose number eight, Inspire. My CEOs should not forget about founder mode.

Text them helpful reminders from the pool during my upcoming midwinter break in Cabo. He really just goes for the jugular on the stereotypes. Matt, if you're listening, love to have you on the show. Yeah, we'd love to have you on the show. All right. Well, that concludes our bonus round. Stone Cold Steve Austin, thank you as always for joining us and sharing your war stories from deep in the bowels of corporate data America.

And thank you to the listeners. We'll catch you on the next one. Stay cynical. See you guys later.

This transcript was generated by Metacast using AI and may contain inaccuracies.