The Data Stack Show

RudderStack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

Data Council Week: AI Isn’t Just Hype - How To Successfully Apply LLMs Today with Tristan Zajonc of Continual

Highlights from this week’s conversation include: Tristan's Background and Journey into Data (1:14) Evolution of Machine Learning and AI (3:13) Impact of Generative AI (6:33) MLOps and Challenges in Early Data Science (8:48) Success and Applications of AI Today (11:34) Continual AI Copilot Platform (18:04) Challenges in building remarkable AI assistants (19:58) Reliability and accuracy in AI responses (25:31) Regulation and adoption of AI assistants (31:30) Future of AI assistants and Continual ...

Apr 17, 2024 · 36 min

Data Council Week: How To Do Self-Service Data Analytics and Business Intelligence Right with Ryan Dolley of GoodData

Highlights from this week’s conversation include: Ryan’s background in data (0:58) Transition from Performing Arts to Data (2:23) Understanding End Users in Data Projects (6:08) Learning from Failures in Data Projects (8:07) The self-service era (19:50) Struggles of self-service (21:23) The disillusion with dashboards (26:23) GoodData's approach (30:06) Merging wisdom with modern approach (31:50) User experience with GoodData (34:05) Defining metrics and AI (36:35) Connecting with Ryan and GoodD...

Apr 15, 2024 · 42 min

185: The Evolution of Data Processing, Data Formats, and Data Sharing with Ryan Blue of Tabular

Highlights from this week’s conversation include: The Evolution of Data Processing (2:36) Ryan’s Background and Journey in Data (4:52) Challenges in Transitioning to S3 (8:47) Impact of Latency on Query Performance (11:43) Challenges with Table Representation (15:26) Designing a New Metadata Format (21:36) Integration with Existing Tools and Open Source Project (24:07) Initial Features of Iceberg (26:11) Challenges of Manual Partitioning (31:49) Designing the Iceberg Table Format (37:31) Trade-o...

Apr 10, 2024 · 1 hr 30 min

The PRQL: The Two Parallel Tracks of Development In Data Processing with Ryan Blue of Tabular

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com ....

Apr 08, 2024 · 5 min

184: Kafka Streams and Operationalizing Event Driven Applications with Apurva Mehta of Responsive

Highlights from this week’s conversation include: Apurva’s background in streaming technology (0:48) Developer experience and Kafka Streams (2:47) Motivation to bootstrap a startup (4:09) Meeting the Confluent founders and early work at Confluent (6:59) Projects at Confluent and transition to engineering management (10:34) Overview of Responsive and event-driven applications (12:55) Defining event-driven applications (15:33) Importance of latency and state in event-driven applications (18:54) Lo...

Apr 03, 2024 · 58 min

The PRQL: Event-Driven Applications: Where Low Latency Meets High Impact with Apurva Mehta of Responsive

Apr 01, 2024 · 4 min

183: Why Modern Data Quality Must Move Beyond Traditional Data Management Practices with Chad Sanderson of Gable.ai

Highlights from this week’s conversation include: Chad’s background and journey in data (0:46) Importance of Data Supply Chain (2:19) Challenges with Modern Data Stack (3:28) Comparing Data Supply Chain to Real-world Supply Chains (4:49) Overview of Gable.ai (8:05) Rethinking Data Catalogs (11:42) New Ideas for Managing Data (15:16) Data Discovery and Governance Challenges (18:51) Static Code Analysis and AI Impact on Data (24:55) Creating Contracts and Defining Data Lineage (27:31) Data Quality...

Mar 27, 2024 · 1 hr 3 min

The PRQL: The Data Supply Chain with Chad Sanderson of Gable.ai

Mar 25, 2024 · 8 min

182: Building a Dynamic Data Infrastructure at Enterprise Scale Featuring Kevin Liu of Stripe

Highlights from this week’s conversation include: Kevin’s background and work at Stripe (0:31) Evolution of Data Infrastructure at Stripe (2:18) Kevin's Interest in Data (5:29) Software Engineer or Data Engineer? (8:27) Speech Recognition Work at Amazon (11:06) Efficiency and Cost Management (15:50) Metadata and Query Analysis (18:38) Surprising Discoveries in Metadata Analysis (21:43) Optimizing Cost and Value (23:55) Product Sizing Stripe Data (26:39) Popular Tool for Data Interaction (30:08) ...

Mar 20, 2024 · 1 hr 1 min

The PRQL: Exploring the Intersection of Software Engineering and Data Management with Kevin Liu of Stripe

Mar 18, 2024 · 6 min

181: OLAP Engines and the Next Generation of Business Intelligence with Mike Driscoll of Rill Data

Highlights from this week’s conversation include: Michael’s background and journey in data (0:33) The origin story of Druid (2:39) Experiences and growth in Data (8:08) Druid's evolution (21:46) Druid's architectural decisions (26:32) The user experience (30:06) The developer experience (35:14) The evolution of BI tools (40:55) Data architecture and integration (47:53) AI's impact on BI (52:26) What would Mike be doing if he didn’t work in data? (56:27) Final thoughts and takeaways (57:02) The D...

Mar 13, 2024 · 1 hr

The PRQL: Making the Data Stack Serverless in the Cloud with Mike Driscoll of Rill Data

Mar 11, 2024 · 6 min

180: Data Observability and AI for Data Operations Featuring Kunal Agarwal of Unravel Data

Highlights from this week’s conversation include: The evolution of data operations (1:13) Unravel's role in simplifying data operations (2:17) Kunal’s journey from fashion to enterprise data management (5:23) The Unravel platform and its components (10:08) Challenges in data operations at scale (16:34) Users of Unravel within an organization (22:32) Calculating ROI on data products (25:55) Understanding the cost of data operations (27:01) Measuring productivity and reliability (30:59) Diversity...

Mar 06, 2024 · 53 min

179: Time Series Data Management and Data Modeling with Tony Wang of Stanford University

Highlights from this week’s conversation include: Tony's background and research focus (3:35) Challenges in academia and industry (6:15) Ph.D. student's routine (10:47) Academic paper review process (15:26) Aha moments in research (20:05) Academic lab structure (23:09) The decision to move from hardware to data research (24:43) Research focus on time series data management (27:40) Data modeling in time series and OLAP systems (32:01) Issues and potential solutions for parquet format (37:32) Role...

Feb 28, 2024 · 51 min

178: How to Build a Data Stack to Win PLG, Featuring Peter Chapman

Highlights from this week’s conversation include: Peter's background and journey in data (0:26) Introduction to PLG (4:18) Starting in data at Heroku (6:05) Building the data stack at Heroku (8:13) Data stack requirements for early-stage companies (12:00) Differentiating PLG companies from open source companies (19:26) Venture capital and open source as a lever for growth (22:56) Initial data modeling and analysis (25:38) Operationalizing Data (29:16) Sales and Marketing Operationalization (31:5...

Feb 21, 2024 · 57 min

177: AI-Based Data Cleaning, Data Labelling, and Data Enrichment with LLMs Featuring Rishabh Bhargava of refuel

Highlights from this week’s conversation include: The overview of refuel (0:33) The evolution of AI and LLMs (3:51) Types of LLM models (12:31) Implementing LLM use cases and cost considerations (15:52) User experience and fine-tuning LLM models (21:49) Categorizing search queries (22:44) Creating internal benchmark framework (29:50) Benchmarking and evaluation (35:35) Using refuel for documentation (44:18) The challenges of analytics (46:45) Using customer support ticket data (48:17) The tag...

Feb 14, 2024 · 1 hr 7 min

176: The Fundamentals of Event-Driven Orchestration and How Generative AI Is Shaping Its Future with Viren Baraiya of orkes.io

Highlights from this week’s conversation include: Viren’s background in data (0:39) Evolution of Orchestration (1:52) AI Orchestration (3:00) Understanding Conductor and orkes (6:26) Event-Driven Orchestration (8:10) Viren’s Transition to Founder (12:27) Non-Technical Aspects of Being a Founder (15:50) Democratizing AI for Developers (18:16) The evolution of microservices orchestration (21:56) Challenges in appealing to the 99% developer group (24:32) Value of orchestration for developers (30:31...

Feb 07, 2024 · 53 min

175: The Parts, Pieces, and Future of Composable Data Systems, Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue

Highlights from this week’s conversation include: Introduction of the panel (0:05) Defining composable data stack (5:22) Components of a composable data stack (7:49) Challenges and incentives for composable components (10:37) Specialization and modularity in data workloads (13:05) Organic evolution of composable systems (17:50) Efficiency and common layers in data management systems (22:09) The IR and Data Computation (23:00) Components of the Storage Layer (26:16) Decoupling Language and Execut...

Jan 31, 2024 · 1 hr 19 min

174: Does Your Data Stack Need a Semantic Layer? Featuring Artyom Keydunov of Cube Dev

Highlights from this week’s conversation include: Artyom’s background in the data space (0:32) The growth and changes at Cube (5:58) Pain points of managing metrics definitions across different tools (9:39) Trade-offs between coupled and decoupled semantic layers (12:12) Making a case for implementing a semantic layer (14:17) The evolution of semantic layers (23:28) Challenges in designing a decoupled semantic layer (24:16) Different approaches to solving the interface problem (26:58) Implementi...

Jan 24, 2024 · 58 min

173: Data Analytics Is a Team Sport, Featuring Jay Henderson of Alteryx

Highlights from this week’s conversation include: No Code Analytics (1:22) Analytics as a Team Sport (2:31) The workflow of someone without Alteryx (11:27) Alteryx's ability to handle diverse data sources (14:32) The balance between ease of use and complexity (23:06) Enabling casual end users with a no code interface (24:19) Taking analytics to the data (31:47) The boundaries between data engineers and end users (33:44) The importance of collaboration in analytics (34:12) The potential of every ...

Jan 17, 2024 · 47 min

172: How WebAssembly is Enabling the Third Wave of Cloud Compute with Matt Butcher of Fermyon Technologies

Highlights from this week’s conversation include: Matt’s background and journey with Fermyon (2:32) WebAssembly and enhanced security models (3:43) The IOT Startup and Google Acquisition (10:49) Google's Early Containers (11:50) Scaling and anticipating requests (20:22) Introduction to WebAssembly and its importance (23:32) The Benefits of WebAssembly (30:57) Comparison of Virtual Machines, Containers, and Micro VMs (33:12) The Importance of Fast Startup Times in WebAssembly (37:39) Metaphysics ...

Jan 10, 2024 · 56 min