The Data Stack Show

RudderStack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

79: All About Experimentation with Che Sharma of Eppo

Highlights from this week’s conversation include: Che’s background and career journey (4:23) Coherence between hemispheres in the human brain (6:58) Raising Airbnb above primitive A/B testing technology (8:54) Economic thinking in Airbnb’s data science practice (14:24) Dealing with multiple pipelines (16:48) Eppo’s role in recognizing statistically significant data (20:01) Defining “experiment” (23:25) Types of experiments (25:57) The workflow journey (27:18) Dealing with metric silos (34:21) Why...

Mar 16, 2022 · 56 min

78: The Etymology of Reverse ETL & Why It’s a Key Piece Of The Modern Data Stack with Boris Jabes of Census

Highlights from this week’s conversation include: Boris’ background and career journey (2:32) The origins of “reverse ETL” (6:39) Reverse Fivetran (16:35) Product as an experience (22:41) Fivetran users vs Census users (24:14) How to add value to a data dump (26:56) Ways companies are creating IP (33:48) The cascade effect of the modern data stack (37:56) Defining “data federation” (43:51) Lessons from building a product (49:10) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP...

Mar 09, 2022 · 1 hr 6 min

77: Standardizing Unstructured Data with Verl Allen of Claravine

Highlights from this week’s conversation include: Verl’s career journey (2:46) M&A data evaluation criteria (7:12) What Claravine does (10:48) The breadth of data (15:03) Adding to content and advertising data (18:22) How Claravine standardizes data (23:53) Designing a data model (25:40) The underlying technologies of building a product (33:43) The main consumer (35:02) Maintaining quality (39:06) Helping solidify definitions (41:37) Implementing Claravine’s model across various companies (4...

Mar 02, 2022 · 1 hr 1 min

76: Why a Data Team Should Limit Its Own Superpowers with Sean Halliburton of CNN

Highlights from this week’s conversation include: Sean’s career journey (3:27) Optimization and localized testing results (7:49) Denying potential access to more data (13:46) Other dimensions data has (18:32) The other side of capturing events (20:55) Data equivalent of API contracts (25:03) SDK restrictiveness for developers (27:40) How to know if you’re still sending the right data (30:38) Debugging that starts in a client of a mobile app (36:08) Communicating about data (38:36) The next phase...

Feb 23, 2022 · 52 min

75: How To Become a Data Engineer with Parham Parvizi of the Data Stack Academy

Highlights from this week’s conversation include: Par’s background and current role (2:48) About Talend (6:46) Nonlinear pathways to data engineering roles (11:08) What a data engineer needs to be successful (17:37) Before “data engineer” was a title (27:59) Signs you should be a data engineer (32:39) Curiosity and data engineering (38:31) Defining the modern data stack (45:07) How to get a feel for data engineering (52:52)

Feb 16, 2022 · 59 min

74: Kostas Respawns at Starburst, is Interviewed by Eric, and Reminisces About Winamp

Highlights from this week’s conversation include: Big News: podcast hits, Kostas’ career change (2:19) Kostas’ career start in data pipelines (4:09) The Winamp and Napster era (11:46) Starting an API gateway (16:56) Observing new technology from afar (23:43) Starting Blendo (32:38) Problems faced in architecting the product (37:12) Kostas’ role at Starburst (40:25)

Feb 09, 2022 · 45 min

73: What a High Performing Data Team (and Stack) Looks Like with Paige Berry of Netlify

Highlights from this week’s conversation include: Paige’s career path (2:44) Paige’s role and responsibilities at Netlify (6:38) Sharing data insights (8:55) Scope in the context of delivering an insight (12:39) Defining “insight” (15:10) Where the client journey begins (16:43) Miscommunication because of vague terminology (20:06) Netlify’s internal knowledge repository (23:01) Breaking down Netlify’s hub and spoke model (30:45) What data tools to use and when (35:21) The metric layer and BI (44...

Feb 02, 2022 · 57 min

72: Building Data Ops Into the Data Lifecycle with Douwe Maan of Meltano

Highlights from this week’s conversation include: Douwe’s career journey (3:04) The missing piece in GitLab’s data tooling (7:35) The open-source offering in the data space (12:38) Singer’s connection with Meltano (22:31) How Meltano manages connectors on a diverse codebase (35:21) The data house side of Meltano (39:47) Data house operating versus Airflow (44:06) Meltano’s vision present today (47:02)

Jan 26, 2022 · 55 min

71: ETL at the Edges with Jimmy Chan of Dropbase

Highlights from this week’s conversation include: Jimmy’s career background (3:01) How to use data cubes (5:52) What Dropbase is and who it is built for (11:01) Getting sales and marketing data in usable formats (16:46) Ensuring data remains flexible and transferable (28:36) Defining what “offline data” is and how to use it (34:09) How Dropbase can work with the rest of the data stack (43:30)

Jan 19, 2022 · 57 min

70: The Difference Between Data Lakes and Data Warehouses with Vinoth Chandar of Apache Hudi

Highlights from this week’s conversation include: Vinoth’s career background (3:19) Building a data lake at Uber (6:52) Defining what a data lake is (14:01) How data warehouses differ from data lakes (22:46) When you should utilize an open source solution in your data stack (37:36) Evolving from a data warehouse to a data lake (45:09) Early wins Hudi earned inside of Uber (52:30)

Jan 12, 2022 · 1 hr

69: What is the Modern Data Stack?

Highlights from this week’s conversation include: Panel introductions and backgrounds (2:55) What the modern data stack means to each of our panelists (5:04) Defining the fundamental components of a modern data stack (17:22) How the modern stack drives insights and actions for businesses (28:03) Getting to a uniform definition of the modern stack (33:45) Managing the modernization of a large scale data stack (39:09) How testing works in the dbt context (48:44) The relationship between the data w...

Jan 05, 2022 · 1 hr 4 min

68: Season Three Recap: Holiday Edition with Eric Dodds and Kostas Pardalis

In this episode, Eric and Kostas look back over the great topics and guests from season three of The Data Stack Show. The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of ...

Dec 29, 2021 · 25 min

67: Now is the Time to Think About Data Quality with Manu Bansal of Lightup Data

Highlights from this week’s conversation include: Manu’s career background and describing Lightup (2:31) Why traditional tools don’t work for modern data problems (6:04) How a data lake differs from a data warehouse (11:35) Defining data quality (14:07) The business impact of solving and applying data quality (31:36) Constructing a healthy financial view on the impact of data (41:09) How to work with unstructured data in a meaningful way (47:44)

Dec 22, 2021 · 56 min

66: How Data Infrastructure Has Evolved and Managing High Performing Data Teams with Srivatsan Sridharan

Highlights from this week’s conversation include: Starting his career on the first-ever data team at Yelp (2:00) How to approach the adoption of new technology (7:04) When to use stream processing vs. batching (11:35) What is a pipeline and why is it core to a data engineer? (14:07) Where a new data scientist should begin their career (19:14) The key factors impacting a new technology decision (27:09) Managing team emotions in decision making (34:25) The unique challenge of Fintech vs other cons...

Dec 15, 2021 · 51 min

65: Operationalizing Data from the Warehouse With Aayush Jain of Cliff.ai

Highlights from this week’s conversation include: Aayush’s career background (4:13) How his biological sciences academic training impacts his work (8:04) How do we allow dashboards to get messy? (9:35) Building cultural or technical solutions to effective dashboards (15:19) Using data dashboards to make material business improvements (23:19) What is business observability? (32:23) Building a platform for operations teams (43:15) How important community is to the Cliff.ai business proposition (41...

Dec 08, 2021 · 56 min

64: Data Stack Composability and Commoditization with Michel Tricot of Airbyte

Highlights from this week’s conversation include: Announcement: Data Stack Live! (1:00) Michel’s career background (4:13) Solving the technical and process challenges of moving data (7:04) Lessons learned from managing data at LiveRamp (9:35) How to build a modern data stack (16:19) Triggers to signal when more data infrastructure is needed (23:19) Why Airbyte is an open-source product (30:23) Airbyte’s role in providing support to open-source problems (38:15) How important DPT is for the Airby...

Dec 01, 2021 · 56 min