The Data Stack Show

RudderStack | datastackshow.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

Episodes

104: A Decade of Change in the Data Space with Benn Stancil of Mode

Highlights from this week’s conversation include: Benn’s background and career journey (2:28) The problem Benn sought to solve (4:48) Data engineering a decade ago (9:58) Technology inside vs. outside Silicon Valley (18:11) What’s next for data (24:42) Mode’s evolution and journey (29:31) Challenges of getting enough context to create (39:21) Current trends that won’t see long-term benefits (48:44) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week ...

Sep 14, 2022 | 57 min

103: Everyone Is Invited to the Data Lakehouse with Kyle Weller of Onehouse.ai

Highlights from this week’s conversation include: Kyle’s background and career journey (2:38) Unique challenges in building data engineering products (9:33) The problem set Databricks resolves (13:46) About Onehouse (17:15) From Microsoft to Onehouse (20:59) Why there’s so much distance between data powers (24:45) Why the data lake is not enough (30:15) Who should have a lakehouse (39:03) Why we have all three data platforms (43:53) How to step into the data lakehouse world (49:48)

Sep 07, 2022 | 56 min

102: Building Pinot for Real-Time, Interactive User Analytics with Kishore Gopalakrishna of StarTree

Highlights from this week’s conversation include: Kishore’s background and career journey (2:30) Internal analytics versus user-facing analytics (3:49) New ways of thinking about analytics (8:06) What makes Pinot different (13:45) How Pinot transforms systems (21:53) Understanding the data landscape (32:40) The Pinot user experience (36:27) Something exciting about StarTree (40:05) When you should adopt this technology (43:15)

Aug 31, 2022 | 49 min

101: The Future of Machine Learning with Willem Pienaar of Tecton and Tristan Zajonc of Continual

Highlights from this week’s conversation include: When is it right to use ML? (5:22) ML business models (10:21) Significant changes in delivering ML (19:07) Why ML is different (25:19) SQL becoming more important (34:39) Graduating from SQL-based to real-time (37:22) Space for a new role (45:11) State-of-the-art models (49:03) The most exciting thing in the ML space (53:59) Open source in ML (56:39)

Aug 24, 2022 | 1 hr 4 min

100: Data Quality Is Relative to Purpose with James Campbell of Superconductive

Highlights from this week’s conversation include: James’ role at Great Expectations (2:33) What Great Expectations does (5:49) How Great Expectations approaches data quality (7:01) Why a data engineer should use Great Expectations (16:41) Defining “data quality” (19:16) Translating expectations from one domain to the other (27:00) Community around Great Expectations (30:59) The user experience (33:41) Something exciting on the horizon (40:27) Interacting with marketers in a non-technical way (43...

Aug 17, 2022 | 54 min

99: State of the Data Lakehouse with Vinoth Chandar of Apache Hudi

Highlights from this week’s conversation include: Vinoth’s background and career journey (3:08) Defining “data lakehouse” (5:10) Databricks versus lakehouses (13:37) The services a lakehouse needs (17:37) How to communicate technical details (26:55) Onehouse’s product vision (31:41) Lakehouse performance versus BigQuery solutions (36:44) How to deliver customer experience equally (40:17) How to start building a lakehouse (44:00) Big tech’s effect on smaller lakehouses (55:33) Skipping the data ...

Aug 10, 2022 | 1 hr 13 min

98: Category Theory and the Mathematical Foundation of the Technologies We Use with Eric Daimler of Conexus

Highlights from this week’s conversation include: Eric’s background and career journey (3:30) Presenting to people without knowledge of AI (11:04) Why math was chosen over AI (19:03) From compilers to databases (25:42) The contribution of category theory (30:09) The Conexus customer experience (37:45) The primary user of Conexus (46:33) Interacting with 300,000 databases (51:07) When Conexus begins to add value (54:02) The best way to learn this mathematical approach (55:46)

Aug 03, 2022 | 1 hr 2 min

97: How To Build an Organization-Empowering Data Team with Emilie Schario of Amplify Partners

Highlights from this week’s conversation include: Emilie’s background and career journey (3:00) Hypergrowth at GitLab (5:23) Being close to the money in data (9:50) Big things taken from GitLab to Netlify (13:00) Defining “data organization” (17:53) The first roles you should hire for (22:06) Defining “analytics engineer” (23:44) One role to bridge different needs (27:26) Why data analysts are needed (30:51) How to avoid a kitchen sink of data (40:20) Data engineer archetype (45:48) Data roles c...

Jul 27, 2022 | 54 min

96: How To Collect and Leverage Data From the Physical World with Prateek Joshi of Plutoshift

Highlights from this week’s conversation include: Prateek’s background and career journey (2:10) The lack of advanced data tools for the physical world (4:55) Dealing with data from the physical world (10:53) Stocks in the physical world (14:20) What it takes to execute this kind of project (19:05) Challenges around this infrastructure (25:56) ML tools that are useful in this environment (31:55) Physical instrumentation and environmental interaction (36:43) Current adoption of physical instrumen...

Jul 20, 2022 | 55 min

95: How the Metrics Layer Bridges the Gap Between Data & Business with Nick Handel of Transform

Highlights from this week’s conversation include: Nick’s background and career journey (2:40) What Transform does (5:53) Metrics layer vs. metrics store (8:04) Signals vs. metrics (13:24) The user of a metrics layer (14:34) Using Transform within an organization (17:05) How to fuse two sources into a metric (23:54) Currently supported databases (28:46) Community engagement (31:33) Optimizing for queries, metrics, and use cases (35:33) Technology and the human factor (40:49) Managing metrics amids...

Jul 13, 2022 | 58 min

94: Notebooks Aren’t Just for Data Scientists with Barry McCardel of Hex Technologies

Highlights from this week’s conversation include: Barry’s background and Hex (3:05) Reconciling two sides of data (9:16) Collaboration at Hex (15:10) What it takes to build something like Hex (20:02) Defining “commitment engineering” (26:01) How to begin working with Hex (30:56) Hex customers and uniqueness (40:31) The future in a world of data acquisition (45:30) Crossover between analytics and ML (51:33) Advice for data engineers (57:19)

Jul 06, 2022 | 1 hr 3 min

93: There Is No Data Observability Without Lineage with Kevin Hu of Metaplane

Highlights from this week’s conversation include: Kevin’s background and career journey (1:54) Metaplane and the problem that it solves (6:47) The silence of data problems (9:53) Data physics work that requires more (13:35) Trusting data when bugs are present (19:12) Building a navigable experience (22:36) Developing anomaly detection (30:06) What Metaplane provides today (35:05) Metaplane’s plans for the future (37:45) Comparing BigQuery, Snowflake, and Redshift (40:56) Why data goes bad (48:15...

Jun 29, 2022 | 1 hr 5 min

92: Building a Decentralized Storage System for Media File Collaboration with Tejas Chopra

Highlights from this week’s conversation include: Tejas’ background and career journey (2:49, 43:04) Digital collaboration with Netflix Drive (7:57) A formal version control component (23:44) Centralized store vs. local affairs (31:05) The different skill sets a data engineer needs (37:38) How to get into data engineering (40:57) New technologies coming into day-to-day work (44:39)

Jun 22, 2022 | 56 min

91: The Future of Streaming Data with Stripe, Deephaven, Materialize, and Benthos

Highlights from this week’s conversation include: How we should think about batch versus streaming (6:02) Defining “streaming ETL” (9:34) A brief history of streaming processing platforms (22:07) The birth and evolution of Benthos (28:41) What led Jeff to build a new tool (34:29) Why you shouldn’t share all the data (37:23) Making streaming technologies approachable to engineers (42:09) Breaking out of traditional terminology (52:58)

Jun 15, 2022 | 1 hr

The PRQL: Can Streaming Simplify Your Data Flows?

Eric and Kostas preview their upcoming livestream panel on all things streaming. Don't miss next week's episode with experts from Stripe, Deephaven, Materialize, and Benthos.

Jun 10, 2022 | 3 min

90: The Modern Data Stack Has a Join Problem with Ahmed Elsamadisi of Narrator AI

Highlights from this week’s conversation include: Ahmed’s background and career journey (2:27) Why the modern data stack “sucks” (4:53) The limitations of progress (9:13) Showing data with only 11 columns (11:55) Managing one table that rules them all (19:02) Viewing the world as timestamped activities (32:40) When this model becomes harder to use (35:15) The two parts you need in a company (44:41) Those who use Narrator (48:32)

Jun 08, 2022 | 57 min