The Data Engineering Show

The Firebolt Data Bros•podcasts.fame.so

The Data Engineering Show is a podcast for data engineering and BI practitioners who want to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS: Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it into a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS: In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries, contact tamar@firebolt.io
Website: https://www.firebolt.io

Last refreshed: June 17th, 2026 at 12:40 PM

Episodes

AI for Data and Data for AI: The Dual Frontier of Modern Data Engineering with Pranav Motarwar

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Pranav Motarwar, a data engineer who has worked across major tech companies at the intersection of AI and data infrastructure, to explore how artificial intelligence is fundamentally reshaping the data engineering landscape, not by eliminating roles but by bifurcating the field into two distinct, equally critical domains. What You'll Learn: - Why the "data engineering is dying" narrative is clickbait: Data engineer...

Jun 16, 2026•20 min•Ep. 57

AI Won't Replace Engineers, But This Framework Will Change How They Build with Rohit Girme

Scaling AI from proof-of-concept to production requires more than just deploying models; it demands robust evaluation frameworks, human oversight, and a fundamental shift in how engineering teams approach development. In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how Airbnb built a Gen AI evaluation platform to assess LLM outputs across product surfaces, from customer support bots to search and bookin...

May 07, 2026•18 min•Ep. 56

The Framework Canva Uses for 200M+ Designers with Paul Tune

AI agents are moving beyond simple automation into collaborative design workflows, which require fundamentally different approaches to user experience, model training, and infrastructure than traditional ML systems. In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Paul Tune, Staff Research Scientist at Canva, to explore how the design platform is building agentic workflows, managing multimodal data pipelines, and tackling the unique challenge of teaching machines to ...

Apr 28, 2026•22 min•Ep. 55

Llama 2 & 3 Safety: Soumya Batra on Agentic AI Training

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Soumya Batra, founder and CEO of WisePort AI and former tech lead at Meta, where she led safety efforts for Llama 2 and Llama 3, to explore the evolution of NLP, the complete lifecycle of foundation model training, and why the next AI frontier lies in natively agentic systems rather than simply scaling larger transformers. What You'll Learn: Why historical NLP work becomes obsolete with each paradigm shift: Underst...

Apr 08, 2026•23 min•Ep. 54

The Data Fusion Secret & Why Custom Query Engines Fail with Nikita Lapkov

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Nikita Lapkov, Senior Software Engineer at Cloudflare, to explore the architecture, design decisions, and future roadmap of R2 SQL, Cloudflare's new R2-based distributed query engine launched in September 2024. What You'll Learn: How to leverage existing query engines strategically: Why Cloudflare chose Apache DataFusion for single-node query processing rather than building an analytical engine from scratch, fre...

Mar 24, 2026•18 min•Ep. 53

How Zipline AI Turns Weeks of Engineering Into Minutes of SQL Queries ft. Nikhil Simha

In this episode of The Data Engineering Show, host Benjamin sits down with Nikhil Simha, CTO of Zipline AI and co-author of Chronon, to explore how a declarative feature platform solves the speed-vs-scale paradox in modern ML infrastructure, from fraud detection at Airbnb to powering OpenAI's recommendation systems. What You'll Learn: How to eliminate the data scientist-to-ML engineer bottleneck by generating Spark, Flink, and orchestration pipelines automatically from simple SQL queries, enabl...

Mar 10, 2026•24 min•Ep. 52

The Geo-Data Problem Nobody Talks About And How Voi Solved It ft. Magnus Dahlbäck

In this episode of The Data Engineering Show, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a rapidly scaling European e-scooter company transformed its data infrastructure, adopted a metrics-first approach to analytics, and is now leveraging AI to solve real-time operational challenges across 150 cities and 150,000 vehicles. What You'll Learn: How to escape the "dashboard chaos" trap by adopting a metrics-first architecture with a sem...

Feb 19, 2026•16 min•Ep. 51

Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That

In this episode of The Data Engineering Show, Benjamin sits down with Artie CTO and co-founder Robin Tang to explore the complexities of high-performance data movement. Robin shares his journey from building Maxwell at Zendesk to scaling data systems at Opendoor, highlighting the gap between business-oriented SaaS connectors and the rigorous demands of production database replication. Robin dives deep into Artie’s architecture, explaining how they leverage a split-plane model (Control Plane a...

Feb 03, 2026•29 min•Ep. 50

The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Ritesh Varyani, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers while simultaneously integrating AI across infrastructure and user-facing analytics. What You'll Learn: How to architect a polyglot data platform that serves fundamentally different workloads: Spark for ML training and massive parallel processing, Trino for das...

Dec 16, 2025•26 min•Ep. 49

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu

What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: T...

Nov 19, 2025•20 min•Ep. 48

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

In this episode of The Data Engineering Show, Benjamin and Eldad are joined by Ashok Singamaneni, a Principal Data Engineer at Nike. Ashok dives deep into his work on the open-source projects BrickFlow and Spark Expectations. He shares his journey from mechanical engineering to data engineering and the lessons learned over a decade of tackling production data quality issues that lead to costly recomputes. Ashok explains the philosophy behind Spark Expectations: treating the ingestion and transf...

Oct 07, 2025•20 min•Ep. 47

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal

This episode details Instacart's strategic shift from Elasticsearch to a self-hosted PostgreSQL cluster for its retailer search, driven by the unique demands of fast-changing grocery inventory. Ankit Mittal explains how consolidating search, ranking, and filtering into Postgres, with extensions like pgvector and robust data pipelines, drastically reduced network calls and improved efficiency. The discussion also covers the architectural decisions, trade-offs, and future directions for building high-performance search systems.

Sep 17, 2025•22 min•Ep. 46

Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So

Explore the future of AI-powered business intelligence with Lei Tang, CTO and Co-founder of Fabi.ai, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-generated insights to creating intelligent semantic layers, discover how modern BI platforms are bridging the gap between data teams and business stakeho...

Aug 28, 2025•21 min•Ep. 45

Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber

Journey inside Uber's innovative AI assistant "Genie" with Paarth Chothani, Staff Engineer at Uber, as he shares how they're revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants at scale. Get insights into the evolution from traditional chatbots to agent-based solutions, and learn practical lessons about staying current in ...

Jul 22, 2025•26 min•Ep. 44

From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta

AI's transformative impact on data engineering and analytics is reshaping how professionals create value, shifting focus from technical skills to strategic thinking and communication. In this episode of The Data Engineering Show, the bros talk with Sumit Gupta, Lead BI Engineer at Notion, about his journey through prominent tech companies, modern data stacks, and how AI is revolutionizing data workflows and professional development. What You'll Learn: How modern data stacks are evolving with to...

Jun 10, 2025•22 min•Ep. 43

How Rising Wave Is Redefining Real-Time Data with Postgres Power

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Yingjun Wu, founder and CEO of Rising Wave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space. What you’ll learn: Yingjun's journey from academic research in stream processing to founding Rising Wave, and the challenges of building trust in a new database system. How Rising Wave's architecture, using S3 as primary storage, delivers second-level sca...

May 07, 2025•32 min•Ep. 42

Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach

In this episode of The Data Engineering Show, the bros sit down with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. What You’ll Learn: How Apache Gravitino differs from others like Unity Catalog and Polaris by being able to support multiple catalog systems. What the “Push-Down Permission Management” security model is and how to implement it across different data sys...

Apr 08, 2025•24 min•Ep. 41

Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. Together, they: Talk about the journey of DuckDB, an open-source analytical database system designed as a universal wrangling tool. Explain how DuckDB differs from SQLite, highlighting the analytical and transactional use cases. Discuss DuckDB’s special feature and its approach to innovation, including creating their Parquet Reader. Explore the simple and...

Mar 19, 2025•31 min•Ep. 40

AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma

In this episode of The Data Engineering Show, the bros sit down with Daniel Pálma, Head of Marketing at Estuary. Join them as they: Talk about Daniel’s career transition from data engineering to marketing and how his data engineering background has strengthened his marketing work. Discuss the role of AI in the evolution of data movement, making it faster and easier to create data pipelines. Shed light on the challenges of vector databases and structured data in AI ap...

Feb 11, 2025•31 min•Ep. 39

AI and Data Change Management with Chad Sanderson, CEO Gable AI

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Chad Sanderson, CEO and co-founder of Gable AI, to explore the interesting world of data change management. Join them as they: Delve into the challenges of data quality, how it degrades over time, and the one-sided data quality checks on the “last mile” of the data supply chain. Talk about how Gable works through a three-layer technology flow: identify data production points, trace the data flow and commun...

Jan 07, 2025•37 min•Ep. 38

Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success

Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. Much more than just a software solution, for Wouter, BI is all about change management and aligning leadership with data projects. They discuss: From Excel to Expert: From basic Excel tasks to a full ma...

Nov 26, 2024•25 min•Ep. 37

Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan

In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist. Topics include: ...

Oct 31, 2024•28 min•Ep. 39

The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn

In this episode of The Data Engineering Show, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows. Together they cover: Automated Data Pipelines: Ryanne explains how Hoptimator allows users to create and manage data pipelines using just a simple SQL SELECT query, streamlining the process of setting...

Sep 24, 2024•33 min•Ep. 38

Vector Databases Won’t Replace SQL - Andy Pavlo

SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production...

Jun 04, 2024•43 min•Ep. 37

How ZoomInfo transitioned from data graveyards to ROI-driven data projects

Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado...

Apr 16, 2024•40 min•Ep. 36

Matthew Weingarten from Disney Streaming about Data Quality Best Practices

Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of The Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, ...

Mar 26, 2024•27 min•Ep. 35

Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices

Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado...

Feb 29, 2024•26 min•Ep. 34

Professors Joe Hellerstein and Joseph Gonzalez on LLMs

Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM which they co-founded. If you consider yourself a hardcore engineer, this episode is for you. The Data Engineering Show is brought to you by firebolt.io and handcrafted b...

Jan 24, 2024•46 min•Ep. 33

Megan Lieu on powerful notebooks that enable collaboration

There are two types of data influencers on LinkedIn: 1. Those who talk directly about the products and companies they work for 2. Those who provide more general guidance, tips and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with al...

Jan 01, 2024•32 min•Ep. 32

Transitioning from software engineering to data engineering

Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, the guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills you’ll be limited to the rigid capabilities of your stack. But wi...

Nov 22, 2023•30 min•Ep. 31
