
The Data Engineering Show

The Firebolt Data Bros (podcasts.fame.so)
The Data Engineering Show is a podcast for data engineering and BI practitioners who want to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.

SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it into a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.

SEASON 2 DATA BROS
In season 2, Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.

For inquiries, contact tamar@firebolt.io
Website: https://www.firebolt.io

Episodes

AI Won't Replace Engineers, But This Framework Will Change How They Build with Rohit Girme

Scaling AI from proof-of-concept to production requires more than just deploying models; it demands robust evaluation frameworks, human oversight, and a fundamental shift in how engineering teams approach development. In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how Airbnb built a Gen AI evaluation platform to assess LLM outputs across product surfaces, from customer support bots to search and bookin...

May 07, 2026 · 18 min · Ep. 56

The Framework Canva Uses for 200M+ Designers with Paul Tune

AI agents are moving beyond simple automation into collaborative design workflows that require fundamentally different approaches to user experience, model training, and infrastructure than traditional ML systems. In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Paul Tune, Staff Research Scientist at Canva, to explore how the design platform is building agentic workflows, managing multimodal data pipelines, and tackling the unique challenge of teaching machines to ...

Apr 28, 2026 · 22 min · Ep. 55

Llama 2 & 3 Safety: Soumya Batra on Agentic AI Training

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Soumya Batra, founder and CEO of WisePort AI and former tech lead at Meta where she led safety efforts for Llama 2 and Llama 3, to explore the evolution of NLP, the complete lifecycle of foundation model training, and why the next AI frontier lies in natively agentic systems rather than simply scaling larger transformers. What You'll Learn: Why historical NLP work becomes obsolete with each paradigm shift: Underst...

Apr 08, 2026 · 23 min · Ep. 54

The Data Fusion Secret & Why Custom Query Engines Fail with Nikita Lapkov

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Nikita Lapkov, Senior Software Engineer at Cloudflare, to explore the architecture, design decisions, and future roadmap of R2 SQL, Cloudflare's new R2-based distributed query engine launched in September 2025. What You'll Learn: How to leverage existing query engines strategically: Why Cloudflare chose Apache DataFusion for single-node query processing rather than building an analytical engine from scratch, fre...

Mar 24, 2026 · 18 min · Ep. 53

How Zipline AI Turns Weeks of Engineering Into Minutes of SQL Queries ft. Nikhil Simha

In this episode of The Data Engineering Show, host Benjamin sits down with Nikhil Simha, CTO of Zipline AI and co-author of Chronon, to explore how a declarative feature platform solves the speed-vs-scale paradox in modern ML infrastructure, from fraud detection at Airbnb to powering OpenAI's recommendation systems. What You'll Learn: How to eliminate the data scientist-to-ML engineer bottleneck by generating Spark, Flink, and orchestration pipelines automatically from simple SQL queries, enabl...

Mar 10, 2026 · 24 min · Ep. 52

The Geo-Data Problem Nobody Talks About And How Voi Solved It ft. Magnus Dahlbäck

In this episode of The Data Engineering Show, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a rapidly scaling European e-scooter company transformed its data infrastructure, adopted a metrics-first approach to analytics, and is now leveraging AI to solve real-time operational challenges across 150 cities and 150,000 vehicles. What You'll Learn: How to escape the "dashboard chaos" trap by adopting a metrics-first architecture with a sem...

Feb 19, 2026 · 16 min · Ep. 51

Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That

In this episode of The Data Engineering Show, Benjamin sits down with Artie CTO and co-founder Robin Tang to explore the complexities of high-performance data movement. Robin shares his journey from building Maxwell at Zendesk to scaling data systems at Opendoor, highlighting the gap between business-oriented SaaS connectors and the rigorous demands of production database replication. Robin dives deep into Artie’s architecture, explaining how they leverage a split-plane model (Control Plane a...

Feb 03, 2026 · 29 min · Ep. 50

The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft

In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Ritesh Varyani, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers, while simultaneously integrating AI across infrastructure and user-facing analytics. What You'll Learn: How to architect a polyglot data platform that serves fundamentally different workloads: Spark for ML training and massive parallel processing, Trino for das...

Dec 16, 2025 · 26 min · Ep. 49

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu

What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: T...

Nov 19, 2025 · 20 min · Ep. 48

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

In this episode of The Data Engineering Show, Benjamin and Eldad are joined by Ashok Singamaneni, a Principal Data Engineer at Nike. Ashok dives deep into his work on the open-source projects BrickFlow and Spark Expectations. He shares his journey from mechanical engineering to data engineering and the lessons learned over a decade of tackling production data quality issues that lead to costly recomputes. Ashok explains the philosophy behind Spark Expectations: treating the ingestion and transf...

Oct 07, 2025 · 20 min · Ep. 47

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal

This episode details Instacart's strategic shift from Elasticsearch to a self-hosted PostgreSQL cluster for its retailer search, driven by the unique demands of fast-changing grocery inventory. Ankit Mittal explains how consolidating search, ranking, and filtering into Postgres, leveraging extensions like pgvector and robust data pipelines, drastically reduced network calls and improved efficiency. The discussion also covers the architectural decisions, trade-offs, and future directions for building high-performance search systems.

Sep 17, 2025 · 22 min · Ep. 46

Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So

Explore the future of AI-powered business intelligence with Lei Tang, CTO and Co-founder of Fabi.ai, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-generated insights to creating intelligent semantic layers, discover how modern BI platforms are bridging the gap between data teams and business stakeho...

Aug 28, 2025 · 21 min · Ep. 45

Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber

Journey inside Uber's innovative AI assistant "Genie" with Paarth Chothani, Staff Engineer at Uber, as he shares how they're revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants at scale. Get insights into the evolution from traditional chatbots to agent-based solutions, and learn practical lessons about staying current in ...

Jul 22, 2025 · 26 min · Ep. 44

From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta

AI's transformative impact on data engineering and analytics is reshaping how professionals create value, shifting focus from technical skills to strategic thinking and communication. In this episode of The Data Engineering Show, the bros talk with Sumit Gupta, Lead BI Engineer at Notion, about his journey through prominent tech companies, modern data stacks, and how AI is revolutionizing data workflows and professional development. What You'll Learn: How modern data stacks are evolving with to...

Jun 10, 2025 · 22 min · Ep. 43

How RisingWave Is Redefining Real-Time Data with Postgres Power

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Yingjun Wu, founder and CEO of RisingWave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space. What you’ll learn: Yingjun's journey from academic research in stream processing to founding RisingWave, and the challenges of building trust in a new database system. How RisingWave's architecture, using S3 as primary storage, delivers second-level sca...

May 07, 2025 · 32 min · Ep. 42

Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach

In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. What You’ll Learn: How Apache Gravitino differs from other catalogs such as Unity Catalog and Polaris by supporting multiple catalog systems. What the “Push-Down Permission Management” security model is and how to implement it across different data sys...

Apr 08, 2025 · 24 min · Ep. 41

Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. Together, they: Talk about the journey of DuckDB, an open-source analytical database system designed as a universal data wrangling tool. Explain how DuckDB differs from SQLite, highlighting analytical versus transactional use cases. Discuss DuckDB’s special features and its approach to innovation, including building its own Parquet reader. Explore the simple and...

Mar 19, 2025 · 31 min · Ep. 40

AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma

In this episode of The Data Engineering Show, the bros sit with Daniel Pálma, Head of Marketing at Estuary. Join them as they: Talk about Daniel’s career transition from data engineering to marketing and how his data engineering background has greatly strengthened his marketing work. Discuss the role of AI in the evolution of data movement, making it faster and easier to create data pipelines. Shine light on the challenges of vector databases and structured data in AI ap...

Feb 11, 2025 · 31 min · Ep. 39

AI and Data Change Management with Chad Sanderson, CEO Gable AI

In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Chad Sanderson, CEO and co-founder of Gable AI, to explore the world of data change management. Join them as they: Delve into the challenges of data quality, how it degrades over time, and the one-sided data quality checks on the “last mile” of the data supply chain. Talk about how Gable works through a three-layer technology flow: identify data production points, trace the data flow and commun...

Jan 07, 2025 · 37 min · Ep. 38

Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success

Wouter Trappers, founder of Xudo, shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has shaped his approach to business intelligence. For Wouter, BI is much more than a software solution: it is all about change management and aligning leadership with data projects. They discuss: From Excel to Expert: From basic Excel tasks to a full ma...

Nov 26, 2024 · 25 min · Ep. 37

Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan

In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best moments from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist. Topics include: ...

Oct 31, 2024 · 28 min · Ep. 39

The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn

In this episode of The Data Engineering Show, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows. Together they cover: Automated Data Pipelines: Ryanne explains how Hoptimator allows users to create and manage data pipelines using just a simple SQL SELECT query, streamlining the process of setting...

Sep 24, 2024 · 33 min · Ep. 38

Vector Databases Won’t Replace SQL - Andy Pavlo

SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later, when the hype dies down, that SQL is actually a good idea. In this super techie episode of The Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production...

Jun 04, 2024 · 43 min · Ep. 37

How ZoomInfo transitioned from data graveyards to ROI-driven data projects

Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects deliver value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at fame.so. Previous guests include: Joseph Machado...

Apr 16, 2024 · 40 min · Ep. 36

Matthew Weingarten from Disney Streaming about Data Quality Best Practices

Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at fame.so. Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of EcZachly Inc, ...

Mar 26, 2024 · 27 min · Ep. 35

Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices

Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at fame.so. Previous guests include: Joseph Machado...

Feb 29, 2024 · 26 min · Ep. 34

Professors Joe Hellerstein and Joseph Gonzalez on LLMs

Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley, and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM, which they co-founded. If you consider yourself a hardcore engineer, this episode is for you. The Data Engineering Show is brought to you by firebolt.io and handcrafted b...

Jan 24, 2024 · 46 min · Ep. 33

Megan Lieu on powerful notebooks that enable collaboration

There are two types of data influencers on LinkedIn: 1. Those who talk directly about the products and companies they work for 2. Those who provide more general guidance, tips, and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers that combine the two approaches, and with al...

Jan 01, 2024 · 32 min · Ep. 32

Transitioning from software engineering to data engineering

Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, Xiaoxu Gao is an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills you’ll be limited to the rigid capabilities of your stack. But wi...

Nov 22, 2023 · 30 min · Ep. 31

Vin Vashishta explains why we should stop using dashboards

Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually introduce knowledge. It’s no longer just about data volume; it’s about quality and relevance. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at fame.so. Previous guests include: Joseph Machado of LinkedIn, Matthew Weingart...

Oct 04, 2023 · 36 min · Ep. 30