The Data Stack Show

RudderStack · datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

63: The ETL - ELT Flip With Ciaran Dynes of Matillion

On this week’s episode of The Data Stack Show, Eric and Kostas have a conversation with Ciaran Dynes, the Chief Product Officer at Matillion, a powerful and easy-to-use, completely cloud-capable ETL/ELT solution.

Nov 24, 2021 · 56 min

62: The Internet of Everything with Rob Rastovich of ThingLogix

Highlights from this week’s conversation include: Rob’s career began as an early adopter in internet marketing and then he got the bug for machine-to-machine IoT (2:47) Making assumptions about mass scale (8:44) Pervasiveness of IoT in the market (11:47) Initial reactions to technological advances that we take for granted today (17:28) What makes IoT unique (23:56) Killing the SQL server (29:11) What really separates a smart device from a dumb device that can send data to the cloud (33:13) 5G, L...

Nov 17, 2021 · 52 min

61: What is Data Design? With Kevin Gervais of Touchless

Highlights from this week’s conversation include: Kevin’s interaction with data at an early age (2:35) Working with telecom data (5:08) Analyzing emojis in customer sentiment (8:44) Infrastructure needed for diverse data (12:22) Building better interfaces and looking out for human error (24:17) Dealing with differences in identities in different layers of the stack (41:21) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data enginee...

Nov 10, 2021 · 55 min

60: Architecting a Boring Stream Processing Tool With Ashley Jeffs of Benthos

Highlights from this week’s conversation include: A brief overview of Ashley’s background (2:47) Benthos’ creation and the problems it was meant to address (4:01) Use cases for Benthos (18:25) Key features of Benthos that make it stand out (22:23) Adding windowing to Benthos for fun (29:23) The highs and lows of maintaining an open source project for five years (32:17) The architecture of Benthos (36:23) The importance of ordering in streaming processing (42:15) Gaining traction with an open sou...

Nov 03, 2021 · 1 hr 7 min

59: Making ETL Optional with Justin Borgman of Starburst Data

Highlights from this week’s conversation include: Starburst Data is Justin’s second startup (2:42) Starburst focuses on doing data warehousing analytics without the need for the data warehouse (4:14) Multi-cloud solutions among merger and acquisition use cases (8:32) Ways the stack is increasing in complexity (12:25) Comparing essential components of a data stack from 2010 to now (15:01) The future of ETL (27:36) The best maturity stage for an organization to implement Starburst (31:27) Starburs...

Oct 27, 2021 · 58 min

58: Data Federation is No Longer The "F" Word with Scott Gnau of InterSystems

Highlights from this week’s conversation include: Solving problems with data has been a long-time passion of Scott’s (2:52) Day-to-day use of data at InterSystems (6:25) The technical aspects involved in constructing a data fabric (17:52) Companies at a variety of maturity levels can adopt a data fabric (26:49) A paradigm shift in the marketplace (28:39) Comparing and contrasting data fabric and data mesh (30:49) Sharing data across the business and not having it siloed in different departments ...

Oct 20, 2021 · 50 min

57: Improving Data Quality Using Data Product SLAs with Egor Gryaznov of Bigeye

Highlights from this week’s conversation include: Egor’s software engineering background and history with Uber (2:19) Experimentation platforms and analytics definitions (7:49) Bigeye’s function and use cases (9:40) Managing the relationship between the data engineer maintaining the pipelines and the downstream teams providing the context (18:49) Pinpointing problems in data compared to problems in software (21:55) Defining data quality at Bigeye (24:13) Machine learning models as a data product...

Oct 13, 2021 · 56 min

56: Stream Processing and Observability with Jeff Chao of Stripe

Highlights from this week’s conversation include: Jeff’s history with stream processing (2:52) Working with Mantis to address the impact of Netflix downtime (4:20) Defining observability as operational insight (6:58) Time series data and the value of data today (18:52) Data integration’s shift from batch to streaming (29:34) The current state of change data capture (32:20) How an engineer thinks of the end-user (56:21)

Oct 06, 2021 · 1 hr 4 min

55: Tables vs. Streams and Defining Real-Time with Pete Goddard of Deephaven Data Labs

Highlights from this week’s conversation include: Pete’s background in data engineering and capital market trading (2:10) Comparison of the tooling from 2012 when Deephaven started with that of today (10:30) Taking a closer look at defining real-time data (19:47) Getting non-technical people, clients, and developers all on the same platform (36:11) Deephaven’s incremental update model (40:25) Kafka, timely data flow, and Deephaven (44:22) Use cases for Deephaven (51:52) Going to GitHub to try ou...

Sep 29, 2021 · 1 hr 7 min

54: The Center of the Modern Data Stack with Neil Rahilly of Mixpanel

Highlights from this week’s conversation include: Neil’s programming hobby turned into a career and how he cold-contacted Mixpanel for a job (2:28) Lessons learned from nine years at Mixpanel (5:05) Defining product analytics (8:06) How Mixpanel has evolved into the product it is today (10:56) The importance of Mixpanel’s real-time analysis (19:52) Looking at Arb, Mixpanel’s own arbitrary segmentation database (23:44) The business impact that the rise of the cloud data warehouse had on Mixpanel ...

Sep 22, 2021 · 1 hr 9 min

53: What Religion, a Cult, and a Tech Product Have in Common, with Bart Farrell of DoKC

Highlights from this week’s conversation include: Bart’s journey from southern California, to New York, to Egypt, to London, to Spain (3:31) Exposure to different communities and finding shared language and experience (10:21) Looking back at early online communities and how they furthered your learning journey (27:50) How the level of niche-ness impacts a community (44:06) The cautionary tale of WeWork (57:28) Surefire community killers (1:03:44) Open source communities in tech and the passion t...

Sep 15, 2021 · 1 hr 20 min

52: Discussing Data Warehouses, Lakes, and Meshes with James Serra of EY

Highlights from this week’s conversation include: James’ background at Microsoft and current work with EY’s data fabric (2:22) The external and internal facing components of EY’s data fabric (6:39) The importance of the data lineage (11:29) The most important requirements for data quality (15:32) Looking at the data capabilities of Microsoft (21:30) The data warehouse, explained (29:00) Using a data warehouse or a data lake (34:33) Defining the buzzword data mesh (51:13) The problem with data me...

Sep 08, 2021 · 1 hr 9 min

51: Democratizing AI and ML with Tristan Zajonc of Continual

Topics in this wide-ranging conversation include: Tristan’s background with Cloudera and the need for continual operational ML and AI (3:15) How the complexity of Continual is hidden behind a simplicity of use (14:48) Focusing on data that lives within a data warehouse (18:43) Understanding features in the ML conversation (22:47) The three layers of Continual (26:11) The importance of SQL to Continual (30:19) Caching layers and the data warehouse centric approach (38:28) Betting on the warehouse...

Sep 01, 2021 · 55 min

50: From Data Infrastructure to Data Management with Ananth Packkildurai

Highlights from this week’s episode: Ananth’s background (2:51) The evolution of Slack (4:54) Kafka and Presto are two of the most reliable and flexible tools for Ananth (9:43) How Snowflake gained an advantage over Presto (13:24) Opinions about data lakes (17:23) Core features of data infrastructure (23:22) The tools define the process, and not the other way around (31:30) Defining a data mesh (36:44) Data is inherently social in nature (40:31) Lessons learned from writing Data Engineering Weekly...

Aug 25, 2021 · 59 min

49: MLops - The Finalization of the Data Stack with Ben Rogojan of Facebook

Topics in this conversation include: Ben's background and his shift to data engineering (2:19) Trends in the data space: finding the most efficient tools, the Snowflake phenomenon, and keeping up with new functionalities (5:33) Key differences in data practices in small companies and Facebook-sized companies (12:38) Having to build tools specifically designed for Facebook because of SaaS product limitations (16:00) Team structure at Facebook (18:17) Developing more robust systems that are resist...

Aug 18, 2021 · 55 min

48: Season Two Recap with Eric Dodds and Kostas Pardalis

Highlights from this week’s episode: Dissecting the different team structures from organizations in season two (1:16) The people behind the data are key to the data itself (9:17) Open source licensing and the core components needed for large scale commercial viability (15:13) Game-changing core technologies in the new data economy (22:09) Snowflake vs. Databricks battle. "The UFC of Geeks" (25:54)

Aug 11, 2021 · 33 min

47: Taming the Four Dragons of Data with Sven Balnojan of Mercateo Gruppe

Highlights from this week’s episode include: Sven's Ph.D. in Singularity Theory (2:59) The Databricks vs. Snowflake conversation (8:17) The difficulty of not just inventing something new, but making it accessible (18:01) Databricks and unstructured data (22:22) Organizational change responding to technological change (29:27) The three-dimensional evolution of a successful open source project (40:31)

Aug 04, 2021 · 51 min

46: A New Paradigm in Stream Processing with Arjun Narayan of Materialize

Highlights from this week’s episode include: Introducing Arjun and how he fell in love with databases (2:51) Looking at what Materialize brings to the stack (5:28) Analytics starts with a human in the loop and comes into its own when analysts get themselves out and automate it (15:46) Using Materialize instead of the materialized view from another tool (18:44) Comparing Postgres and Materialize and looking at what's under the hood of Materialize (23:16) Making Materialize simple to use (32:33) W...

Jul 28, 2021 · 56 min

45: Open Source and Attribution with Ophir Prusak of Codesmith

Highlights from today's conversation include: Ophir's decision to switch from software engineering to marketing and riding the startup train (2:39) Open sourcing in the world of software (5:55) How open source has changed Ophir's life as a marketeer working at startups (10:28) Chartio's sunsetting drove Ophir to search for a data tooling replacement (27:27) Discussing trends in adoption of tools for small scale and large scale companies (35:01) Data challenges related to attribution--how wrong d...

Jul 21, 2021 · 56 min

44: Leveraging Data in a Post-Covid World with Ruben Ugarte of Practico Analytics

Highlights from this week's episode: Ruben's background (2:36) Massive shifts in data caused by COVID (4:47) Big Tech is no longer untouchable (9:54) Accelerations in the BI space (15:17) A focus on people and on trust (23:43) Numbers are filtered by the biases of the people viewing them (28:46) AI trends and adoption (38:06) Using qualitative data for insights, particularly at early stages (40:56) Recommendations for taking stock of who is using the data and assessing what their skills are (50:...

Jul 14, 2021 · 58 min

43: Modern Authentication and User Management with Sokratis Vidros of Clerk.dev

Highlights from this week's episode: Sokratis' realization that big corporations were not the best thing for him (2:56) Transitioning from Workable to Clerk.dev (3:40) Convincing developers to outsource components to a service (9:36) Clerk's layered solutions and how they affect the developer and the end-user (12:41) Starting with Clerk from scratch vs. using Clerk to replace an existing component (19:55) Synergies and SaaS starter kit (24:06) Building Clerk to avoid a single point of failure (29:...

Jul 07, 2021 · 46 min

42: Scaling Data Science with Ryan Boyer of Shipt

Highlights from this week’s episode include: Ryan's full circle path from stocking shelves at Target to using data science for a company owned by Target (2:00) Building great tools and wielding them effectively (5:04) Changes at Shipt since being acquired (9:29) How people’s bias impacts models built by data scientists (12:30) The different data sources Shipt incorporates (22:02) How Ryan's work as a data scientist has changed as Shipt has grown (25:29) How data science helps marketing (31:38) I...

Jun 30, 2021 · 53 min