The Data Stack Show

Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

79: All About Experimentation with Che Sharma of Eppo

Highlights from this week’s conversation include: Che’s background and career journey (4:23) Coherence between hemispheres in the human brain (6:58) Raising Airbnb above primitive AB testing technology (8:54) Economic thinking in Airbnb’s data science practice (14:24) Dealing with multiple pipelines (16:48) Eppo’s role in recognizing statistically significant data (20:01) Defining “experiment” (23:25) Types of experiments (25:57) The workflow journey (27:18) Dealing with metric silos (34:21) Why...

Mar 16, 2022•56 min

The PRQL: Is A/B Testing Only Relevant for B2C?

Eric and Kostas preview their upcoming conversation with Che Sharma of geteppo.com.

Mar 11, 2022•3 min

78: The Etymology of Reverse ETL & Why It’s a Key Piece Of The Modern Data Stack with Boris Jabes of Census

Highlights from this week’s conversation include: Boris’ background and career journey (2:32) The origins of “reverse ETL” (6:39) Reverse Fivetran (16:35) Product as an experience (22:41) Fivetran users vs Census users (24:14) How to add value to a data dump (26:56) Ways companies are creating IP (33:48) The cascade effect of the modern data stack (37:56) Defining “data federation” (43:51) Lessons from building a product (49:10) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP...

Mar 09, 2022•1 hr 6 min

The PRQL: Reverse ETL and the Distinction Between Operation vs Analysis on Data

Eric and Kostas preview their upcoming conversation with Boris Jabes of Census.

Mar 04, 2022•3 min

77: Standardizing Unstructured Data with Verl Allen of Claravine

Highlights from this week’s conversation include: Verl’s career journey (2:46) M&A data evaluation criteria (7:12) What Claravine does (10:48) The breadth of data (15:03) Adding to content and advertising data (18:22) How Claravine standardizes data (23:53) Designing a data model (25:40) The underlying technologies of building a product (33:43) The main consumer (35:02) Maintaining quality (39:06) Helping solidify definitions (41:37) Implementing Claravine’s model across various companies (4...

Mar 02, 2022•1 hr 1 min

The PRQL: If Everything Is Data, How Can We Make Sense of It All?

Eric and Kostas preview their upcoming conversation with Verl Allen of Claravine.

Feb 25, 2022•6 min

76: Why a Data Team Should Limit Its Own Superpowers with Sean Halliburton of CNN

Highlights from this week’s conversation include: Sean’s career journey (3:27) Optimization and localized testing results (7:49) Denying potential access to more data (13:46) Other dimensions data has (18:32) The other side of capturing events (20:55) Data equivalent of API contracts (25:03) SDK restrictiveness for developers (27:40) How to know if you’re still sending the right data (30:38) Debugging that starts in a client of a mobile app (36:08) Communicating about data (38:36) The next phase...

Feb 23, 2022•52 min

The PRQL: How Important Is the Human Factor When Working With Data?

Eric and Kostas preview their upcoming show with Sean Halliburton of WarnerMedia.

Feb 18, 2022•4 min

75: How To Become a Data Engineer with Parham Parvizi of the Data Stack Academy

Highlights from this week’s conversation include: Par’s background and current role (2:48) About Talend (6:46) Nonlinear pathways to data engineering roles (11:08) What a data engineer needs to be successful (17:37) Before “data engineer” was a title (27:59) Signs you should be a data engineer (32:39) Curiosity and data engineering (38:31) Defining the modern data stack (45:07) How to get a feel for data engineering (52:52)

Feb 16, 2022•59 min

The PRQL: Can We Define the Role of the Data Engineer (Yet)?

In this PRQL, Eric and Kostas preview their upcoming conversation with Parham Parvizi of tura.io.

Feb 11, 2022•4 min

74: Kostas Respawns at Starburst, is Interviewed by Eric, and Reminisces About Winamp

Highlights from this week’s conversation include: Big News: podcast hits, Kostas’ career change (2:19) Kostas’ career start in data pipelines (4:09) The Winamp and Napster era (11:46) Starting an API gateway (16:56) Observing new technology from afar (23:43) Starting Blendo (32:38) Problems faced in architecting the product (37:12) Kostas’ role at Starburst (40:25)

Feb 09, 2022•45 min

The PRQL: What Prompts a Conversation About Winamp & Quake Arena on The Data Stack Show?

Eric and Kostas preview some exciting news coming up on episode 74 of the Data Stack Show.

Feb 04, 2022•3 min

73: What a High Performing Data Team (and Stack) Looks Like with Paige Berry of Netlify

Highlights from this week’s conversation include: Paige’s career path (2:44) Paige’s role and responsibilities at Netlify (6:38) Sharing data insights (8:55) Scope in the context of delivering an insight (12:39) Defining “insight” (15:10) Where the client journey begins (16:43) Miscommunication because of vague terminology (20:06) Netlify’s internal knowledge repository (23:01) Breaking down Netlify’s hub and spoke model (30:45) What data tools to use and when (35:21) The metric layer and BI (44...

Feb 02, 2022•57 min

The PRQL: How High Performing Data Teams Put Tooling in the Background

This week on the PRQL, Eric and Kostas discuss tooling as they preview the upcoming show with Paige Berry of Netlify.

Jan 28, 2022•4 min

72: Building Data Ops Into the Data Lifecycle with Douwe Maan of Meltano

Highlights from this week’s conversation include: Douwe’s career journey (3:04) The missing piece in GitLab’s data tooling (7:35) The open-source offering in the data space (12:38) Singer’s connection with Meltano (22:31) How Meltano manages connectors on a diverse codebase (35:21) The data house side of Meltano (39:47) Data house operating versus Airflow (44:06) Meltano’s vision present today (47:02)

Jan 26, 2022•55 min

The PRQL: Is It Viable to Manage Integrations Open Source?

Eric and Kostas preview the upcoming show featuring Douwe Maan of Meltano.

Jan 21, 2022•6 min

71: ETL at the Edges with Jimmy Chan of Dropbase

Highlights from this week’s conversation include: Jimmy’s career background (3:01) How to use data cubes (5:52) What Dropbase is and who it is built for (11:01) Getting sales and marketing data in usable formats (16:46) Ensuring data remains flexible and transferable (28:36) Defining what “offline data” is and how to use it (34:09) How Dropbase can work with the rest of the data stack (43:30)

Jan 19, 2022•57 min

The PRQL: Is Kostas an Excel Power User Yes/No?

Eric and Kostas preview the upcoming conversation with Jimmy Chan of Dropbase.

Jan 14, 2022•6 min

70: The Difference Between Data Lakes and Data Warehouses with Vinoth Chandar of Apache Hudi

Highlights from this week’s conversation include: Vinoth’s career background (3:19) Building a data lake at Uber (6:52) Defining what a data lake is (14:01) How data warehouses differ from data lakes (22:46) When you should utilize an open source solution in your data stack (37:36) Evolving from a data warehouse to a data lake (45:09) Early wins Hudi earned inside of Uber (52:30)

Jan 12, 2022•1 hr

The PRQL: What Old Tech Concepts Were Borrowed to Build the Data Lake House?

Eric and Kostas preview the upcoming show as they talk about data lakes and data warehouses and why these are important.

Jan 07, 2022•5 min

69: What is the Modern Data Stack?

Highlights from this week’s conversation include: Panel introductions and backgrounds (2:55) What the modern data stack means to each of our panelists (5:04) Defining the fundamental components of a modern data stack (17:22) How the modern stack drives insights and actions for businesses (28:03) Getting to a uniform definition of the modern stack (33:45) Managing the modernization of a large scale data stack (39:09) How testing works in the dbt context (48:44) The relationship between the data w...

Jan 05, 2022•1 hr 4 min

The PRQL: Should Data Trust Drive the Evolution of Your Data Stack?

In this PRQL, Eric and Kostas preview their upcoming show where they discuss the modern data stack with some of the top experts in the industry.

Dec 31, 2021•5 min

68: Season Three Recap: Holiday Edition with Eric Dodds and Kostas Pardalis

In this episode, Eric and Kostas look back over the great topics and guests from season three of the Data Stack Show. The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of ...

Dec 29, 2021•25 min

67: Now is the Time to Think About Data Quality with Manu Bansal of Lightup Data

Highlights from this week’s conversation include: Manu’s career background and describing Lightup (2:31) Why traditional tools don’t work for modern data problems (6:04) How a data lake differs from a data warehouse (11:35) Defining data quality (14:07) The business impact of solving and applying data quality (31:36) Constructing a healthy financial view on the impact of data (41:09) How to work with unstructured data in a meaningful way (47:44)

Dec 22, 2021•56 min

The PRQL: Will Data Quality Always Require a Human in the Loop?

Eric and Kostas preview the upcoming show by talking about data quality.

Dec 21, 2021•4 min

66: How Data Infrastructure Has Evolved and Managing High Performing Data Teams with Srivatsan Sridharan

Highlights from this week’s conversation include: Starting his career on the first-ever data team at Yelp (2:00) How to approach the adoption of new technology (7:04) When to use stream processing vs. batching (11:35) What is a pipeline and why is it core to a data engineer? (14:07) Where a new data scientist should begin their career (19:14) The key factors impacting a new technology decision (27:09) Managing team emotions in decision making (34:25) The unique challenge of Fintech vs other cons...

Dec 15, 2021•51 min

The PRQL: How Would You Define a Data Pipeline? Featuring the RudderStack Eng. Team

On the PRQL this week, Eric and Kostas bring in some of the RudderStack engineering team to discuss data pipelines and preview episode 66 of the Data Stack Show.

Dec 10, 2021•5 min

65: Operationalizing Data from the Warehouse With Aayush Jain of Cliff.ai

Highlights from this week’s conversation include: Aayush’s career background (4:13) How his biological sciences academic training impacts his work (8:04) How do we allow dashboards to get messy? (9:35) Building cultural or technical solutions to effective dashboards (15:19) Using data dashboards to make material business improvements (23:19) What is business observability? (32:23) Building a platform for operations teams (43:15) How important community is to the cliff.ai business proposition (41...

Dec 08, 2021•56 min

The PRQL: Why is the Data Engineer's Role Expanding?

In this PRQL, Eric and Kostas talk about the evolution of the role of a data engineer and preview the conversation with Aayush Jain.

Dec 03, 2021•10 min

64: Data Stack Composability and Commoditization with Michel Tricot of Airbyte

Highlights from this week’s conversation include: Announcement: Data Stack Live! (1:00) Michel’s career background (4:13) Solving the technical and process challenges of moving data (7:04) Lessons learned from managing data at LiveRamp (9:35) How to build a modern data stack (16:19) Triggers to signal when more data infrastructure is needed (23:19) Why Airbyte is an open-source product (30:23) Airbyte’s role in providing support to open-source problems (38:15) How important DPT is for the Airby...

Dec 01, 2021•56 min
