The Data Stack Show

Rudderstack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

176: The Fundamentals of Event-Driven Orchestration and How Generative AI Is Shaping Its Future with Viren Baraiya of orkes.io

Highlights from this week’s conversation include: Viren’s background in data (0:39) Evolution of Orchestration (1:52) AI Orchestration (3:00) Understanding Conductor and orkes (6:26) Event-Driven Orchestration (8:10) Viren’s Transition to Founder (12:27) Non-Technical Aspects of Being a Founder (15:50) Democratizing AI for Developers (18:16) The evolution of microservices orchestration (21:56) Challenges in appealing to the 99% developer group (24:32) Value of orchestration for developers (30:31...

Feb 07, 2024 · 53 min

175: The Parts, Pieces, and Future of Composable Data Systems, Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue

Highlights from this week’s conversation include: Introduction of the panel (0:05) Defining composable data stack (5:22) Components of a composable data stack (7:49) Challenges and incentives for composable components (10:37) Specialization and modularity in data workloads (13:05) Organic evolution of composable systems (17:50) Efficiency and common layers in data management systems (22:09) The IR and Data Computation (23:00) Components of the Storage Layer (26:16) Decoupling Language and Execut...

Jan 31, 2024 · 1 hr 19 min

174: Does Your Data Stack Need a Semantic Layer? Featuring Artyom Keydunov of Cube Dev

Highlights from this week’s conversation include: Artyom’s background in the data space (0:32) The growth and changes at Cube (5:58) Pain points of managing metrics definitions across different tools (9:39) Trade-offs between coupled and decoupled semantic layers (12:12) Making a case for implementing a semantic layer (14:17) The evolution of semantic layers (23:28) Challenges in designing a decoupled semantic layer (24:16) Different approaches to solving the interface problem (26:58) Implementi...

Jan 24, 2024 · 58 min

173: Data Analytics Is a Team Sport, Featuring Jay Henderson of Alteryx

Highlights from this week’s conversation include: No Code Analytics (1:22) Analytics as a Team Sport (2:31) The workflow of someone without Alteryx (11:27) Alteryx's ability to handle diverse data sources (14:32) The balance between ease of use and complexity (23:06) Enabling casual end users with a no code interface (24:19) Taking analytics to the data (31:47) The boundaries between data engineers and end users (33:44) The importance of collaboration in analytics (34:12) The potential of every ...

Jan 17, 2024 · 47 min

172: How WebAssembly is Enabling the Third Wave of Cloud Compute with Matt Butcher of Fermyon Technologies

Highlights from this week’s conversation include: Matt’s background and journey with Fermyon (2:32) WebAssembly and enhanced security models (3:43) The IOT Startup and Google Acquisition (10:49) Google's Early Containers (11:50) Scaling and anticipating requests (20:22) Introduction to WebAssembly and its importance (23:32) The Benefits of WebAssembly (30:57) Comparison of Virtual Machines, Containers, and Micro VMs (33:12) The Importance of Fast Startup Times in WebAssembly (37:39) Metaphysics ...

Jan 10, 2024 · 56 min

171: Machine Learning Pipelines Are Still Data Pipelines with Sandy Ryza of Dagster

Highlights from this week’s conversation include: The role of an orchestrator in the lifecycle of data (1:34) Relevance of orchestration in data pipelines (2:45) Changes around data ops and MLOps (3:37) Data Cleaning (11:42) Overview of Dagster (13:50) Assets vs Tasks in Data Pipeline (19:15) Building a Data Pipeline with Dagster (25:40) Difference between Data Asset and Materialized Dataset (28:28) Defining Lineage and Data Assets in Dagster (29:32) The boundaries of software and organizatio...

Jan 03, 2024 · 56 min

170: Discussing Data Roles and Solving Data Problems with Katie Bauer of GlossGenius

Highlights from this week’s conversation include: The evolution of the data scientist role (1:03) Common problems in different companies (2:05) Measuring and curating content on Reddit (4:29) The challenges of working with unstructured content at Reddit and Twitter (11:03) Lessons learned from Reddit and applying them at Twitter (13:17) Data challenges and customer behavior analysis at GlossGenius (20:16) How the data scientist's role has changed over time (25:10) The essence of the data scie...

Dec 27, 2023 · 54 min

169: Data Models: From Warehouse to Business Impact with Tasso Argyros of ActionIQ

Highlights from this week’s conversation include: The Evolution of Databases and Data Systems (2:33) Abstracting Data for Business Users (4:31) Building a Database for Google-like Search (7:58) The Big Data Explosion (11:10) Selling Myspace as First Customer (13:14) Starting ActionIQ (16:57) The customer-centric organization (22:46) Transitioning to customer data focus (23:53) Understanding business users' needs (28:30) Supporting Arbitrary Queries and Data Models (34:42) Unique Technical Perspe...

Dec 20, 2023 · 1 hr 6 min

168: Decoding Data Mesh: Principles, Practices, and Real-World Applications Featuring Paolo Platter, Zhamak Dehghani, and Melissa Logan

Highlights from this week’s conversation include: Defining data mesh (6:37) Addressing the scale of organizational complexity and usage (9:04) The shift from monolithic to microservices (12:24) The sociological structure in data mesh (13:59) Data product generation and sharing in data mesh (17:27) Data Mesh: Simplifying Data Work (24:09) Getting Started with Data Mesh (29:14) Building products for Data Mesh (36:42) Building a customizable and extensible platform to shape data practice (39:28) Th...

Dec 13, 2023 · 57 min

167: Data-Driven Investing and Company Building with Ben Miller of Fundrise

Highlights from this week’s conversation include: Ben’s background in real estate (3:27) Why Fundrise was Started (4:37) Democratizing Investment Opportunities (6:35) Investment Thesis for Venture (11:55) Challenges with Data and Technology (12:34) Importance of Data Model Abstraction (20:03) Data Infrastructure and Investments (23:22) Evolution of Data Engineering (25:12) Closing the Tooling Gap (34:23) The user base segmentation (36:28) The emotional reality of investment decisions (40:50) Dat...

Dec 06, 2023 · 57 min

166: Data Processing Fundamentals and Building a Unified Execution Engine Featuring Pedro Pedreira of Meta

Highlights from this week’s conversation include: The concept of composable at a lower level of data infrastructure (1:28) New architectures and components that allow developers to build databases (3:44) Pedro's background and experience in data infrastructure (6:18) The Spectrum of Latency and Analytics (12:59) Different Query Engines for Different Use Cases (16:32) Vectorized vs Code Gen Data Processing (19:33) Vectorization and Code Generation (21:21) Examples of Vectorized Engines (24:33) Re...

Nov 29, 2023 · 1 hr 12 min

165: SQL Queries, Data Modeling, and Data Visualization with Colin Zima of Omni

Highlights from this week’s conversation include: Colin's Background and Starting Omni (1:48) Defining “good” at Google search early in his career (4:42) Looker's Unique Approach to Analytics (9:48) The paradigm shift in analytics (10:52) The architecture of Looker and its influence (12:04) Combatting the challenge of unbundling in the data stack (14:26) The evolution of analytics engineering (21:50) Enhancing user flexibility in Omni (23:44) The evolution of BI tools (32:53) What does the futur...

Nov 22, 2023 · 54 min

164: How The GTM and Data Teams at Snowflake Work Together with Travis Henry and Hillary Carpio

Highlights from this week’s conversation include: The Unique Perspective of Practitioners (2:10) Account-based Marketing (6:30) Sales Development Representatives (SDR) (8:05) Descriptive, People, and Engagement Data (11:38) Data Overload and Actionable Data (14:20) Working with Data Teams and Internal Data (17:52) The relationship between business and data teams (22:27) The importance of collaboration between marketing and data teams (24:17) Travis and Hillary writing a book (25:33) The taxonomy...

Nov 15, 2023 · 57 min

163: Simplifying Real-Time Streaming with David Yaffe and Johnny Graettinger of Estuary

Highlights from this week’s conversation include: Johnny and David’s background in working together (1:56) The background story of Estuary (4:15) The challenges of ad tech and the need for low latency (5:44) Use cases for moving data at scale (10:35) Real-time data replication methods (11:54) Challenges with Kafka and the birth of Gazette (13:54) Comparing Kafka and Gazette (20:22) The importance of existing streaming tools (22:28) Challenges of managing Kafka and the need for a different approa...

Nov 08, 2023 · 1 hr 4 min

162: Accelerating Enterprise AI Transformation With Open Source LLMs Featuring Mark Huang of Gradient

Highlights from this week’s conversation include: The potential of AI-driven applications (1:34) The need for hardware infrastructure in AI experimentation (2:40) Oligopoly on the closed side (11:50) Advantages of private side vs. open source (13:18) Leveraging valuable data within enterprises (16:00) The urgency of adopting LLMs in the enterprise (24:02) Expansion of LLMs into new business verticals (25:06) The challenges of operationalizing LLMs (29:32) Seamless experience with OpenAI (37:29) ...

Nov 01, 2023 · 57 min