The Data Stack Show

RudderStack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

The Data Stack Show Live: Solving the Data Quality Problem

Don't miss our livestream event on April 27 as we talk about all things Data Quality with some of the best in the business. Hosted by Simplecast, an AdsWizz company. See https://pcm.adswizz.com for information about our collection and use of personal data for advertising.

Apr 21, 2022 · 3 min

84: Why Are Analytics Still So Hard? With Kaycee Lai of Promethium

Highlights from this week’s conversation include: Kaycee’s background and career journey (2:34) Why analytics are hard (7:28) Defining “data management” (11:47) Defining “data virtualization” (15:57) The relationship between data virtualization and ETL (18:34) Where a company should invest first (21:40) Building without a Frankenstein stack (25:19) How Promethium solves data stack issues (27:53) Giving context to data (35:14) Cataloging: background, at Promethium, future (39:29) Who uses data ca...

Apr 20, 2022 · 56 min

83: Closing the Gap Between Business Analytics and Operational Analytics With Max Beauchemin of Preset

Highlights from this week’s conversation include: Max’s career journey and role today (2:56) Hitting the limits of traditional BI (11:06) The most influential technology (14:34) Merging with BI and visualization (17:35) Two thoughts on real-time (21:02) Defining BI (24:53) How many have actually achieved self-serve BI (29:54) How preset.io fits in the BI architecture of today (32:36) How to use preset.io to expose analytics (35:23) The analytics process to power something like embedded (42:45) O...

Apr 13, 2022 · 57 min

The PRQL: BI, Real-Time, and Data Tooling

Eric and Kostas preview their upcoming conversation with Max Beauchemin of preset.io.

Apr 08, 2022 · 4 min

82: Databases: The Fun Never Stops with Robert Hodges of Altinity

Highlights from this week’s conversation include: Robert’s background and career journey (2:21) How studying languages influences database work (5:13) Why Robert has been working with databases for 40+ years (7:50) Explaining the ClickHouse database (10:43) How ClickHouse is able to focus on latency (13:39) The use cases behind ClickHouse (19:19) How ClickHouse is different than other databases (25:47) Why old problems are just now getting addressed (29:04) How ClickHouse works with others again...

Apr 06, 2022 · 1 hr 2 min

The PRQL: What Inspires Continued Innovation in Databases?

Eric and Kostas preview their upcoming conversation with Robert Hodges of Altinity.

Apr 01, 2022 · 4 min

81: Digging into Data Ops with Prukalpa Sankar of Atlan

Highlights from this week’s conversation include: Prukalpa’s background and career journey (3:16) Applying a data-driven mindset to poverty (7:21) What Atlan does (11:53) The makeup of a realistically functioning data team (15:25) How to create a company’s first data team (18:13) Defining “agile data” (22:01) The necessity of data ops (26:36) The minimum data stack needed (29:16) Data team size (31:58) Where to start when you need to make adjustments (34:51) Collaborate with different parts of t...

Mar 30, 2022 · 56 min

The PRQL: Data Team Diversity & Maturing Data Ops

Eric and Kostas preview their upcoming conversation about data ops and diversity with Prukalpa Sankar of Atlan.

Mar 25, 2022 · 3 min

80: Is Reverse-ETL Just Another Data Pipeline? With Census, Hightouch, & Workato

Highlights from this week’s conversation include: Panel introductions (2:23) What is driving the trend behind Reverse ETL? (5:24) The obstacles to building an internal Reverse ETL tool at scale (15:34) How to decide system management vs. user flexibility (20:14) Why previous products failed in creating this category (29:12) Increased demand and democratization of datastack skills via SaaS (42:03) Broader applications for Reverse ETL (47:29) Limitations of Reverse ETL (55:05) How user technical a...

Mar 23, 2022 · 1 hr 16 min

The PRQL: Is Reverse ETL New or Old?

Eric and Kostas preview their upcoming panel discussion on reverse ETL and the modern data stack.

Mar 18, 2022 · 4 min

79: All About Experimentation with Che Sharma of Eppo

Highlights from this week’s conversation include: Che’s background and career journey (4:23) Coherence between hemispheres in the human brain (6:58) Raising Airbnb above primitive AB testing technology (8:54) Economic thinking in Airbnb’s data science practice (14:24) Dealing with multiple pipelines (16:48) Eppo’s role in recognizing statistically significant data (20:01) Defining “experiment” (23:25) Types of experiments (25:57) The workflow journey (27:18) Dealing with metric silos (34:21) Why...

Mar 16, 2022 · 56 min

The PRQL: Is A/B Testing Only Relevant for B2C?

Eric and Kostas preview their upcoming conversation with Che Sharma of geteppo.com.

Mar 11, 2022 · 3 min

78: The Etymology of Reverse ETL & Why It’s a Key Piece Of The Modern Data Stack with Boris Jabes of Census

Highlights from this week’s conversation include: Boris’ background and career journey (2:32) The origins of “reverse ETL” (6:39) Reverse Fivetran (16:35) Product as an experience (22:41) Fivetran users vs Census users (24:14) How to add value to a data dump (26:56) Ways companies are creating IP (33:48) The cascade effect of the modern data stack (37:56) Defining “data federation” (43:51) Lessons from building a product (49:10) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP...

Mar 09, 2022 · 1 hr 6 min

77: Standardizing Unstructured Data with Verl Allen of Claravine

Highlights from this week’s conversation include: Verl’s career journey (2:46) M&A data evaluation criteria (7:12) What Claravine does (10:48) The breadth of data (15:03) Adding to content and advertising data (18:22) How Claravine standardizes data (23:53) Designing a data model (25:40) The underlying technologies of building a product (33:43) The main consumer (35:02) Maintaining quality (39:06) Helping solidify definitions (41:37) Implementing Claravine’s model across various companies (4...

Mar 02, 2022 · 1 hr 1 min

76: Why a Data Team Should Limit Its Own Superpowers with Sean Halliburton of CNN

Highlights from this week’s conversation include: Sean’s career journey (3:27) Optimization and localized testing results (7:49) Denying potential access to more data (13:46) Other dimensions data has (18:32) The other side of capturing events (20:55) Data equivalent of API contracts (25:03) SDK restrictiveness for developers (27:40) How to know if you’re still sending the right data (30:38) Debugging that starts in a client of a mobile app (36:08) Communicating about data (38:36) The next phase...

Feb 23, 2022 · 52 min

75: How To Become a Data Engineer with Parham Parvizi of the Data Stack Academy

Highlights from this week’s conversation include: Par’s background and current role (2:48) About Talend (6:46) Nonlinear pathways to data engineering roles (11:08) What a data engineer needs to be successful (17:37) Before “data engineer” was a title (27:59) Signs you should be a data engineer (32:39) Curiosity and data engineering (38:31) Defining the modern data stack (45:07) How to get a feel for data engineering (52:52) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP ...

Feb 16, 2022 · 59 min

The PRQL: Can We Define the Role of the Data Engineer (Yet)?

In this PRQL, Eric and Kostas preview their upcoming conversation with Parham Parvizi of tura.io.

Feb 11, 2022 · 4 min

74: Kostas Respawns at Starburst, is Interviewed by Eric, and Reminisces About Winamp

Highlights from this week’s conversation include: Big News: podcast hits, Kostas’ career change (2:19) Kostas’ career start in data pipelines (4:09) The Winamp and Napster era (11:46) Starting an API gateway (16:56) Observing new technology from afar (23:43) Starting Blendo (32:38) Problems faced in architecting the product (37:12) Kostas’ role at Starburst (40:25) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, anal...

Feb 09, 2022 · 45 min

73: What a High Performing Data Team (and Stack) Looks Like with Paige Berry of Netlify

Highlights from this week’s conversation include: Paige’s career path (2:44) Paige’s role and responsibilities at Netlify (6:38) Sharing data insights (8:55) Scope in the context of delivering an insight (12:39) Defining “insight” (15:10) Where the client journey begins (16:43) Miscommunication because of vague terminology (20:06) Netlify’s internal knowledge repository (23:01) Breaking down Netlify’s hub and spoke model (30:45) What data tools to use and when (35:21) The metric layer and BI (44...

Feb 02, 2022 · 57 min

The PRQL: How High Performing Data Teams Put Tooling in the Background

This week on the PRQL, Eric and Kostas discuss tooling as they preview the upcoming show with Paige Berry of Netlify.

Jan 28, 2022 · 4 min

72: Building Data Ops Into the Data Lifecycle with Douwe Maan of Meltano

Highlights from this week’s conversation include: Douwe’s career journey (3:04) The missing piece in GitLab’s data tooling (7:35) The open-source offering in the data space (12:38) Singer’s connection with Meltano (22:31) How Meltano manages connectors on a diverse codebase (35:21) The data house side of Meltano (39:47) Data house operating versus Airflow (44:06) Meltano’s vision present today (47:02) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each we...

Jan 26, 2022 · 55 min

The PRQL: Is It Viable to Manage Integrations Open Source?

Eric and Kostas preview the upcoming show featuring Douwe Maan of Meltano.

Jan 21, 2022 · 6 min

71: ETL at the Edges with Jimmy Chan of Dropbase

Highlights from this week’s conversation include: Jimmy’s career background (3:01) How to use data cubes (5:52) What Dropbase is and who it is built for (11:01) Getting sales and marketing data in usable formats (16:46) Ensuring data remains flexible and transferable (28:36) Defining what “offline data” is and how to use it (34:09) How Dropbase can work with the rest of the data stack (43:30) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll ...

Jan 19, 2022 · 57 min

The PRQL: Is Kostas an Excel Power User Yes/No?

Eric and Kostas preview the upcoming conversation with Jimmy Chan of Dropbase.

Jan 14, 2022 · 6 min

70: The Difference Between Data Lakes and Data Warehouses with Vinoth Chandar of Apache Hudi

Highlights from this week’s conversation include: Vinoth’s career background (3:19) Building a data lake at Uber (6:52) Defining what a data lake is (14:01) How data warehouses differ from data lakes (22:46) When you should utilize an open source solution in your datastack (37:36) Evolving from a data warehouse to a data lake (45:09) Early wins Hudi earned inside of Uber (52:30) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data e...

Jan 12, 2022 · 1 hr