The Data Stack Show

Rudderstack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

148: Exploring the Intersection of DAGs, ML Code, and Complex Code Bases: An Elegant Solution Unveiled with Stefan Krawczyk of DAGWorks

Highlights from this week’s conversation include: Stefan’s background in data (2:39) What is DAGWorks? (3:55) How building point solutions influenced Stefan’s journey (5:03) Solving the tooling problems of self-service at an organization (11:44) Creating Hamilton (15:53) How Hamilton works with definitions and time-series data (19:34) What makes Hamilton an ML-oriented framework? (23:39) Navigating the differences between ML teams and other data teams (26:27) Understanding the fundamentals of Ha...

Jul 26, 2023 · 57 min

The PRQL: A Methodology for Better DAGs with Stefan Krawczyk of DAGWorks

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com . ...

Jul 24, 2023 · 4 min

Shop Talk: Snowflake Summit Recap

Jul 21, 2023 · 20 min

147: Where Data and Infrastructure Converge Featuring Lars Kamp of Resoto

Highlights from this week’s conversation include: Lars’s work on Resoto in helping to cut cloud costs for organizations (2:02) The trend of large resources to micro resources (5:59) What are some of the typical resource drains in data infrastructure (8:56) Managing cost on the backend with scale and experimentation (12:51) Solutions for resource management problems (17:38) How Resoto is solving pain points in resource management (26:17) Navigating the complexities of data infrastructure (29:01) Re...

Jul 19, 2023 · 58 min

146: What Is a Customer Data Platform? Featuring Soumyadeb Mitra of Rudderstack

Highlights from this week’s conversation include: Soumyadeb’s background and journey in data (5:49) Defining customer data (8:10) The complexity of customer data collection (10:04) What is a CDP and how it is properly deployed (17:12) Bridging the gap of data collection and useful analytics for marketing (21:46) How Rudderstack translates data and the new profile feature (25:30) The foundations of data in building a 360 degree customer profile (30:30) Solutions for the intersection between engin...

Jul 12, 2023 · 52 min

The PRQL: Synthetic Data and Self Driving Cars with Omar Maher of Parallel Domain

Jul 03, 2023 · 6 min

144: Explaining Features, Embeddings, and the Difference Between ML and AI with Simba Khadder of Featureform

Highlights from this week’s conversation include: Simba’s background in the data space (3:05) Subscription intelligence (6:41) ML and Distributed Systems (9:09) The Brutal Subscription Industry (12:31) Serendipity in Recommender Systems (16:31) Subscription as a Strategy (20:47) Customizing Content for Subscribers (22:19) Creating User Embeddings (25:53) Building Featureform (28:01) Embedding Projections (32:47) Spaces and similarity (35:53) User embeddings and transformer models (38:22) Vector ...

Jun 28, 2023 · 1 hr 12 min

The PRQL: Feature Stores and ML Ops with Simba Khadder of Featureform

In this bonus episode, Eric and Kostas preview their upcoming conversation with Simba Khadder of Featureform. Hosted by Simplecast, an AdsWizz company. See https://pcm.adswizz.com for information about our collection and use of personal data for advertising.

Jun 26, 2023 · 5 min

Shop Talk: Accountability and Opportunity for AI

Jun 23, 2023 · 20 min

143: Collaborative Data Analytics on the Data Warehouse, featuring Rob Woollen & Stipo Josipovic of Sigma

Highlights from this week’s conversation include: Stipo and Rob’s background in data (2:43) What is Sigma? (7:46) Takeaways from building analytics products in-house (9:16) Sigma’s approach to datastore interface (11:32) Why analytics and BI are still not a solved problem (15:50) Combining SQL and spreadsheets for useful interface (23:17) The evolution of BI to today (29:40) Overcoming the challenges of collaboration in working with data (33:17) Creating operational coding that humans can unders...

Jun 21, 2023 · 1 hr 15 min

Shop Talk: Why AI Is Not Another Crypto

Jun 16, 2023 · 24 min

142: Martech’s Separation and Return to Data Infrastructure with Scott Brinker of HubSpot

Highlights from this week’s conversation include: Scott’s background in martech (3:10) Where things have gone wrong between IT and marketing (5:46) The explosion of digital marketing data (12:04) Costs of having data siloed (16:14) The convergence of marketing and IT teams around data (19:27) Navigating the massive landscape of martech tools (26:10) Needed tools in the martech stack (31:11) The importance of an accurate attribution model (34:37) Building tooling for marketers and developers to u...

Jun 14, 2023 · 57 min

The PRQL: Marketing, Martech, and Data with Scott Brinker of HubSpot

In this bonus episode, Eric and Kostas preview their upcoming conversation with Scott Brinker of HubSpot. Hosted by Simplecast, an AdsWizz company. See https://pcm.adswizz.com for information about our collection and use of personal data for advertising.

Jun 12, 2023 · 4 min

141: A Journey From Backend Engineer to Data Engineer with Ioannis Foukarakis of Mattermost

Highlights from this week’s conversation include: Ioannis’ background and journey in data (2:42) Rudderstack’s transformations feature and examples of its application (4:20) Winning the transformations contest at Rudderstack (7:21) How Ioannis’ transformation project works for data governance (9:40) Memories from college for Ioannis and Kostas (12:30) Getting into the world of software development (17:27) The changes in data and engineering over the years (20:29) Bridging Java with Python (23:15...

Jun 07, 2023 · 58 min

140: Stream Processing for Machine Learning with Davor Bonaci of DataStax

Highlights from this week’s conversation include: Davor’s journey from Google and what he was building there (3:32) How work in stream processing changed Davor’s journey (5:10) Analytical predictive models and infrastructure (9:39) How Kaskada serves as a recommendation engine with data (14:05) Kaskada’s user experience as an event processing platform (20:06) Enhancing typical feature store architecture to achieve better results (23:34) What is needed to improve stream and batch processes (27:39...

May 31, 2023 · 1 hr 2 min

139: Decoupling the Execution Engine From Python’s Pandas with Aditya Parameswaran of Ponder

Highlights from this week’s conversation include: Aditya’s background and journey in the data space (2:47) What does Ponder do? (5:18) 101 on Pandas and why people utilize it (6:42) The challenge of translating Pandas to a big data platform (16:11) Data Warehouses and ML workflows (21:27) The differences in the “zoo” of data languages (26:56) Why do ML and data engineering have to be so different in languages? (34:39) Builders should be adapting to the users and not the other way around (39:32) ...

May 24, 2023 · 58 min

138: Paradigm Shift: Batch to Data Streaming with A.J. Hunyady of InfinyOn

Highlights from this week’s conversation include: A.J.’s background and journey in data (2:23) Challenges with Hadoop ecosystem (8:50) Starting InfinyOn and the need for innovation (10:02) Challenges with Kafka and Microservices (14:01) Real-time data streaming for IoT devices (19:28) Paradigm shift to real-time data processing (22:17) Benefits of Rust (29:45) Web Assembly and Platform Features (36:29) Analytics and Event Correlation (40:16) Real-time data processing (47:03) ETL vs ELP (52:20) F...

May 17, 2023 · 1 hr 2 min

137: Data Collection Secrets & The Search Data Problem with Josh Wills

Highlights from this week’s conversation include: Josh’s background in data working at Google, Slack, and other companies (1:21) The need and process for high quality data (4:33) Digging into auction code (14:03) Joining Slack and working in the early days of the company (18:00) Not fighting the last war in data (25:42) Building a product, while using the product (30:35) Transitioning to the search team at Slack (36:50) Usage patterns of search (41:21) Josh’s work in helping build DuckDB (46:20)...

May 10, 2023 · 59 min

The PRQL: Data Engineers in the Front End with Josh Wills

In this bonus episode, Eric previews his upcoming conversation with Josh Wills, an experienced data scientist who has worked with IBM, Google, Slack, DuckDB, and more. Hosted by Simplecast, an AdsWizz company. See https://pcm.adswizz.com for information about our collection and use of personal data for advertising.

May 08, 2023 · 2 min

136: System Evolution from Hadoop to RocksDB with Dhruba Borthakur of Rockset

Highlights from this week’s conversation include: Dhruba’s journey into the data space (2:02) The impact of Hadoop on the industry (3:37) Dhruba’s work in the early days of the Facebook team (7:54) Building and implementing RocksDB (14:33) Stories with Mark Zuckerberg at Facebook (24:25) The next evolution in storage hardware (26:14) How Rockset is different from other real-time platforms (33:13) Going from a key value store to an index (37:15) Where does Rockset go from here? (44:59) The succes...

May 03, 2023 · 1 hr