
The Data Stack Show

RudderStack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

52: Discussing Data Warehouses, Lakes, and Meshes with James Serra of EY

Highlights from this week’s conversation include: James’ background at Microsoft and current work with EY’s data fabric (2:22) The external and internal facing components of EY’s data fabric (6:39) The importance of data lineage (11:29) The most important requirements for data quality (15:32) Looking at the data capabilities of Microsoft (21:30) The data warehouse, explained (29:00) Using a data warehouse or a data lake (34:33) Defining the buzzword data mesh (51:13) The problem with data me...

Sep 08, 2021 · 1 hr 9 min

51: Democratizing AI and ML with Tristan Zajonc of Continual

Topics in this wide-ranging conversation include: Tristan’s background with Cloudera and the need for continual operational ML and AI (3:15) How the complexity of Continual is hidden behind a simplicity of use (14:48) Focusing on data that lives within a data warehouse (18:43) Understanding features in the ML conversation (22:47) The three layers of Continual (26:11) The importance of SQL to Continual (30:19) Caching layers and the data warehouse centric approach (38:28) Betting on the warehouse...

Sep 01, 2021 · 55 min

50: From Data Infrastructure to Data Management with Ananth Packkildurai

Highlights from this week’s episode: Ananth’s background (2:51) The evolution of Slack (4:54) Kafka and Presto, two of the most reliable and flexible tools for Ananth (9:43) How Snowflake gained an advantage over Presto (13:24) Opinions about data lakes (17:23) Core features of data infrastructure (23:22) The tools define the process, and not the other way around (31:30) Defining a data mesh (36:44) Data is inherently social in nature (40:31) Lessons learned from writing Data Engineering Weekly...

Aug 25, 2021 · 59 min

49: MLOps - The Finalization of the Data Stack with Ben Rogojan of Facebook

Topics in this conversation include: Ben's background and his shift to data engineering (2:19) Trends in the data space: finding the most efficient tools, the Snowflake phenomenon, and keeping up with new functionalities (5:33) Key differences in data practices in small companies and Facebook-sized companies (12:38) Having to build tools specifically designed for Facebook because of SaaS product limitations (16:00) Team structure at Facebook (18:17) Developing more robust systems that are resist...

Aug 18, 2021 · 55 min

48: Season Two Recap with Eric Dodds and Kostas Pardalis

Highlights from this week’s episode: Dissecting the different team structures from organizations in season two (1:16) The people behind the data are key to the data itself (9:17) Open source licensing and the core components needed for large scale commercial viability (15:13) Game-changing core technologies in the new data economy (22:09) Snowflake vs. Databricks battle. "The UFC of Geeks" (25:54) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week w...

Aug 11, 2021 · 33 min

47: Taming the Four Dragons of Data with Sven Balnojan of Mercateo Gruppe

Highlights from this week’s episode include: Sven's Ph.D. in Singularity Theory (2:59) The Databricks vs. Snowflake conversation (8:17) The difficulty of not just inventing something new, but making it accessible (18:01) Databricks and unstructured data (22:22) Organizational change responding to technological change (29:27) The three-dimensional evolution of a successful open source project (40:31) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week...

Aug 04, 2021 · 51 min

46: A New Paradigm in Stream Processing with Arjun Narayan of Materialize

Highlights from this week’s episode include: Introducing Arjun and how he fell in love with databases (2:51) Looking at what Materialize brings to the stack (5:28) Analytics starts with a human in the loop and comes into its own when analysts get themselves out and automate it (15:46) Using Materialize instead of the materialized view from another tool (18:44) Comparing Postgres and Materialize and looking at what's under the hood of Materialize (23:16) Making Materialize simple to use (32:33) W...

Jul 28, 2021 · 56 min

45: Open Source and Attribution with Ophir Prusak of Codesmith

Highlights from today's conversation include: Ophir's decision to switch from software engineering to marketing and riding the startup train (2:39) Open sourcing in the world of software (5:55) How open source has changed Ophir's life as a marketeer working at startups (10:28) Chartio's sunsetting drove Ophir to search for a data tooling replacement (27:27) Discussing trends in adoption of tools for small scale and large scale companies (35:01) Data challenges related to attribution--how wrong d...

Jul 21, 2021 · 56 min

44: Leveraging Data in a Post-Covid World with Ruben Ugarte of Practico Analytics

Highlights from this week's episode: Ruben's background (2:36) Massive shifts in data caused by COVID (4:47) Big Tech is no longer untouchable (9:54) Accelerations in the BI space (15:17) A focus on people and on trust (23:43) Numbers are filtered by the biases of the people viewing them (28:46) AI trends and adoption (38:06) Using qualitative data for insights, particularly at early stages (40:56) Recommendations for taking stock of who is using the data and assessing what their skills are (50:...

Jul 14, 2021 · 58 min

43: Modern Authentication and User Management with Sokratis Vidros of Clerk.dev

Highlights from this week's episode: Sokratis' realization that big corporations were not the best thing for him (2:56) Transitioning from Workable to Clerk.dev (3:40) Convincing developers to outsource components to a service (9:36) Clerk's layered solutions and how it affects the developer and the end-user (12:41) Starting with Clerk from scratch vs. using Clerk to replace an existing component (19:55) Synergies and SaaS starter kit (24:06) Building Clerk to avoid a single point of failure (29:...

Jul 07, 2021 · 46 min

42: Scaling Data Science with Ryan Boyer of Shipt

Highlights from this week’s episode include: Ryan's full circle path from stocking shelves at Target to using data science for a company owned by Target (2:00) Building great tools and wielding them effectively (5:04) Changes at Shipt since being acquired (9:29) How people’s bias impacts models built by data scientists (12:30) The different data sources Shipt incorporates (22:02) How Ryan's work as a data scientist has changed as Shipt has grown (25:29) How data science helps marketing (31:38) I...

Jun 30, 2021 · 53 min

41: Doing MLOps on Top of Apache Pulsar and Trino with Joshua Odmark of Pandio

Highlights from this week’s episode: Joshua started his first company at age 15 and then sold two more startups after that (2:15) Embracing the open source movement and not reinventing the wheel if you don't have to (12:15) Pulsar seemed built to address Kafka's weaknesses (17:23) Using Redis as a coordinator for federated learning and taking advantage of its portability (23:05) The pillars of Pandio and some practical use cases (31:24) Feature stores and model versioning (38:23) Seeing Pulsar a...

Jun 23, 2021 · 50 min

40: Graph Processing on Snowflake for Customer Behavioral Analytics

Highlights from this week’s episode include: Launching Affinio and the engineering backgrounds of the co-founders (2:36) The massive transformation in customer data privacy regulation in the past eight years (6:23) Creating the underpinning technology that can apply to any customer behavioral data set (10:05) Ranking and scoring surfing patterns and sorting nodes and edges (14:13) Placing the importance of attributes into a simple UI experience (19:28) Going from a columnar database to a graph p...

Jun 16, 2021 · 58 min

39: Diving deeper into CDC with Ali Hamidi and Taron Foxworth of Meroxa

Highlights from this week’s episode include: Meroxa is a real-time data engineering managed platform (4:53) Use cases for CDC (6:20) Meroxa leverages open source tools to provide initial snapshots and start the CDC stream (12:29) Making the platform publicly available (14:14) What the Meroxa user experience looks like (16:10) Raising Series A funding (17:49) Easiest and most difficult data sources for CDC (20:23) The current state of open CDC (23:16) Expected latency when using CDC (29:56) CDC, r...

Jun 11, 2021 · 50 min

38: Graph Databases & Data Governance with David Allen of Neo4j

Highlights from this week's episode include: David’s background in comparative databases (1:50) David’s experience and lessons he learned from writing his book (3:23) How writing a technical book compares to writing technical documentation (4:41) The process of writing a book (6:30) The best and worst part of David’s book writing experience (8:02) An introduction to what Neo4j is (9:08) What you need to graph (11:13) Typical problems a graph database is a good solution for (13:00) The difference...

Jun 02, 2021 · 51 min

37: The Components of Data Governance with Dave Melillo of FanDuel

Highlights from this week's episode include: Dave's "nerdy" interests in sports statistics and data (2:12) Trends in collecting, processing, and using data (4:45) Finding a better term for "reverse ETL" (5:48) The blurring of the distinction between sources and destinations (7:41) The role of BI is changing (13:24) Data governance and the physical execution behind it (19:00) Data governance is defining and managing data in a logical way that is actionable by the business (23:43) Consolidation of...

May 26, 2021 · 54 min

36: Crypto and Compliance with Nick Fogle, Co-Founder of Churnkey and Wavve

On this week's episode of The Data Stack Show, Eric and Kostas talk with Nick Fogle, co-founder of Churnkey and Wavve. Together they discuss how having a legal background can impact engineering decisions, dealing with privacy and compliance concerns, and selling Wavve and starting Churnkey as a result. Highlights from this week's episode include: Nick's background in economics and law and teaching himself to code (2:01) Thinking like a lawyer and trying to minimize risk to the greatest extent po...

May 19, 2021 · 43 min

35: The Future of Development is Distributed with Jim Walker of Cockroach Labs

This week on The Data Stack Show, Eric and Kostas talk with Jim Walker, the VP of product marketing at Cockroach Labs, about distributed systems, competing against the speed of light, and making data easy. Highlights from this week's episode include: Jim's background in translating deep technical concepts into understandable English and his work at Cockroach Labs (2:23) The origin of Cockroach Labs and distributed SQL (6:10) Living without atomic clocks (10:10) Having the speed of light as the ult...

May 12, 2021 · 54 min

34: The Intersection of Data Engineering and Marketing with John Marbach of Grafana Labs

On this week's episode of The Data Stack Show, Eric and Kostas talk with John Marbach, senior growth manager at Grafana Labs. In this conversation, John discusses marketing ops and the blending of data engineering and marketing roles. Highlights from this week's episode include: Introduction to John Marbach and working in the blurred lines between marketing and data engineering (2:14) How managing pipeline building and consuming data influences ...

Apr 28, 2021 · 49 min

33: ML is a Data Quality Problem with Peter Gao from Aquarium Learning

On this week's episode of The Data Stack Show, Eric and Kostas talk with Peter Gao, co-founder and CEO of Aquarium Learning. A former engineer at Cruise Automation, Peter and Aquarium Learning help ML teams improve their model performance by improving their data. Highlights from this week's episode include: How getting hit by a drunk driver made researching self-driving cars personal for Peter (2:12) Filtering out the hype in self-driving car news to get a clear picture of its state today (6:5...

Apr 14, 2021 · 57 min

32: Cooking with Data Ops with Chris Bergh from DataKitchen

On this week's episode of The Data Stack Show, Eric and Kostas talk with Chris Bergh, the CEO and head chef at DataKitchen. DataKitchen’s mission is to provide the software, service, and knowledge that makes it possible for every data and analytics team to realize their full potential with DataOps. Highlights from this week's episode include: Chris' background and how the lessons learned in the Peace Corps and at NASA apply to him today (2:03) Why AI left Chris feeling like a jilted lover (7:49...

Apr 07, 2021 · 59 min

31: How a 160-Year-Old Publisher is Using Data with Jenna Lemonias from The Atlantic

On this week's episode of The Data Stack Show, Eric and Kostas chat with Jenna Lemonias, director of data science at The Atlantic. The Atlantic, a publication that's been around since 1857, is adapting with the times and is implementing and emulating some of the data science practices seen at big tech companies. Highlights from this week's episode include: Jenna's background in astrophysics and how she pivoted to data science (2:14) Differences in dealing with data at a FinTech company and then ...

Mar 31, 2021 · 43 min

30: The DataStack Journey with Rachel Bradley-Haas and Alex Dovenmuehle of Big Time Data

On this week’s episode of The Data Stack Show, Eric and Kostas are joined by the co-founders of Big Time Data, Rachel Bradley-Haas and Alex Dovenmuehle, formerly of Mattermost and, prior to that, Heroku. At Big Time Data, they work together to provide companies with the ability to derive value and insights from decentralized datasets, improve business processes through data enrichment and automation, and build a scalable foundation to enable a data-driven culture. Highlights from this week’s epi...

Mar 24, 2021 · 1 hr 2 min

29: The Present and Future of Data Engineering with Joe Reis and Matthew Housley from Ternary Data

On this week’s episode of The Data Stack Show, Eric and Kostas are joined by Matthew Housley, CTO, and Joe Reis, CEO and co-founder of Ternary Data. These self-described “recovering data scientists” focus on teaching skills to build a solid foundation for organizations to work with their data. Highlights from this week’s episode include: Joe and Matt’s background and expertise (2:44) Common threads and trends in the data sphere (9:39) Differences and commonalities between startups and enterprise...

Mar 17, 2021 · 58 min

28: Next Gen Data Governance with Stefania from Avo

On this week’s episode of The Data Stack Show, Eric and Kostas are joined by Stefanía Bjarney Ólafsdóttir, the CEO and co-founder of Avo. Avo, which started in 2018, provides data analytics governance as a service, helping organizations make data-driven decisions to improve their customer experience. Highlights from this week’s episode include: Stefania's background with mathematics, philosophy, bioinformatics and consumer mobile (2:39) Making pioneering decisions as head of data science at Quiz...

Mar 10, 2021 · 59 min

27: Building B2B Marketplaces with Mike Luby from LeafLink

On this week’s episode of The Data Stack Show, Eric and Kostas are joined by Mike Luby, director of engineering at LeafLink. LeafLink is the cannabis industry's B2B wholesale marketplace where thousands of brands can manage and track their orders and relationships. Highlights from this week’s episode include: The infrastructure LeafLink provides for the cannabis supply chain and how it deals with compliance issues (2:03) Structuring product management organization to launch high-velocity teams (8...

Mar 03, 2021 · 42 min

26: Democratizing the Insurance Market with Daniel Gremmell from Policygenius Inc.

On this week’s episode of The Data Stack Show, Eric and Kostas are joined by Daniel Gremmell, head of data at Policygenius, Inc. Policygenius, an insurance marketplace, strives to make it easy for people to understand their options, compare quotes, and buy a policy all in one place with help from licensed experts. Highlights from this week’s episode include: What brought Daniel to Policygenius and how his background in industrial engineering and statistics impacts what he does (1:49) Policygeni...

Feb 24, 2021 · 39 min

25: MLOps and Feature Stores with Willem Pienaar from Tecton

On this week’s episode of The Data Stack Show, Kostas is joined by Willem Pienaar, tech lead at Tecton, to discuss machine learning, features and feature stores. Highlights from this week’s episode include: Willem Pienaar's background in South Africa and southeast Asia and his move from Gojek to Tecton (1:58) Tecton was founded by the builders of Uber's Michelangelo (6:37) Defining features and their life cycles (10:05) Comparing a feature store to a database (16:40) Data architecture in a feature store...

Feb 17, 2021 · 51 min

24: Demystifying AI with Duc Haba

On this week’s episode of The Data Stack Show, Eric is joined by Duc Haba, an AI researcher and enterprise mobility solution architect consultant who most recently did AI consulting work with Cognizant. Their discussion revolves around demystifying artificial intelligence and why so many people either fear AI or place too much trust in it. Duc talks about some of the AI projects he has worked on, some successes and some failures, and points to how the data biases that humans bring into the model...

Feb 10, 2021 · 51 min

23: Migrating from On-Premises to the Cloud with Alex Lancaster from Intuit

On this week’s episode of The Data Stack Show, Kostas and Eric are joined by the risk data engineering manager at Intuit, Alex Lancaster. Alex has been with Intuit, known for its products like QuickBooks, TurboTax, Mint and more, for 15 years and was part of a recent massive and successful re-architecture from on-premises to the cloud. Highlights from this week’s episode include: Alex and his role at Intuit (1:51) Data marts at Intuit (2:57) Revolutionary changes in the data engineering space in...

Feb 03, 2021 · 43 min