The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory. Learn from the biggest influencers in tech about their practical day-to-day data challenges and solutions in a casual and fun setting.
SEASON 1 DATA BROS
Eldad and Boaz Farkash shared the same stuffed toys growing up as well as a big passion for data. After founding Sisense and building it to become a high-growth analytics unicorn, they moved on to their next venture, Firebolt, a leading high-performance cloud data warehouse.
SEASON 2 DATA BROS
In season 2 Eldad adopted a brilliant new little brother, and with their shared love for query processing, the connection was immediate. After excelling in his MS in Computer Science, Benjamin Wagner joined Firebolt to lead its query processing team and is a rising star in the data space.
For inquiries contact tamar@firebolt.io
Website: https://www.firebolt.io
Scaling AI from proof-of-concept to production requires more than just deploying models; it demands robust evaluation frameworks, human oversight, and a fundamental shift in how engineering teams approach development. In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how Airbnb built a Gen AI evaluation platform to assess LLM outputs across product surfaces, from customer support bots to search and bookin...
AI agents are moving beyond simple automation into collaborative design workflows, which require fundamentally different approaches to user experience, model training, and infrastructure than traditional ML systems. In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Paul Tune, Staff Research Scientist at Canva, to explore how the design platform is building agentic workflows, managing multimodal data pipelines, and tackling the unique challenge of teaching machines to ...
In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Soumya Batra, founder and CEO of WisePort AI and former tech lead at Meta, where she led safety efforts for Llama 2 and Llama 3, to explore the evolution of NLP, the complete lifecycle of foundation model training, and why the next AI frontier lies in natively agentic systems rather than simply scaling larger transformers. What You'll Learn: Why historical NLP work becomes obsolete with each paradigm shift: Underst...
In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Nikita Lapkov, Senior Software Engineer at Cloudflare, to explore the architecture, design decisions, and future roadmap of R2 SQL, Cloudflare's new R2-based distributed query engine launched in September 2024. What You'll Learn: How to leverage existing query engines strategically: Why Cloudflare chose Apache DataFusion for single-node query processing rather than building an analytical engine from scratch, fre...
In this episode of The Data Engineering Show, host Benjamin sits down with Nikhil Simha, CTO of Zipline AI and co-author of Chronon, to explore how a declarative feature platform solves the speed-vs-scale paradox in modern ML infrastructure, from fraud detection at Airbnb to powering OpenAI's recommendation systems. What You'll Learn: How to eliminate the data scientist-to-ML engineer bottleneck by generating Spark, Flink, and orchestration pipelines automatically from simple SQL queries, enabl...
In this episode of The Data Engineering Show, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a rapidly scaling European e-scooter company transformed its data infrastructure, adopted a metrics-first approach to analytics, and is now leveraging AI to solve real-time operational challenges across 150 cities and 150,000 vehicles. What You'll Learn: How to escape the "dashboard chaos" trap by adopting a metrics-first architecture with a sem...
In this episode of The Data Engineering Show, Benjamin sits down with Artie CTO and co-founder Robin Tang to explore the complexities of high-performance data movement. Robin shares his journey from building Maxwell at Zendesk to scaling data systems at Opendoor, highlighting the gap between business-oriented SaaS connectors and the rigorous demands of production database replication. Robin dives deep into Artie’s architecture, explaining how they leverage a split-plane model (Control Plane a...
In this episode of The Data Engineering Show, host Benjamin Wagner sits down with Ritesh Varyani, Staff Software Engineer at Lyft, to explore how the company manages a sophisticated multi-engine data stack serving thousands of engineers, while simultaneously integrating AI across infrastructure and user-facing analytics. What You'll Learn: How to architect a polyglot data platform that serves fundamentally different workloads: Spark for ML training and massive parallel processing, Trino for das...
What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of daily data on Google Cloud and explains the massive strategic pivot they are undertaking right now: T...
In this episode of The Data Engineering Show, Benjamin and Eldad are joined by Ashok Singamaneni , a Principal Data Engineer at Nike. Ashok dives deep into his work on the open-source projects BrickFlow and Spark Expectations. He shares his journey from mechanical engineering to data engineering and the lessons learned over a decade of tackling production data quality issues that lead to costly recomputes. Ashok explains the philosophy behind Spark Expectations: treating the ingestion and transf...
This episode details Instacart's strategic shift from Elasticsearch to a self-hosted PostgreSQL cluster for its retailer search, driven by the unique demands of fast-changing grocery inventory. Ankit Mittal explains how consolidating search, ranking, and filtering into Postgres, leveraging extensions like pgvector and robust data pipelines, drastically reduced network calls and improved efficiency. The discussion also covers the architectural decisions, trade-offs, and future directions for building high-performance search systems.
Explore the future of AI-powered business intelligence with Lei Tang, CTO and Co-founder of Fabi.ai, as he discusses the evolution from traditional self-service BI to "Vibe-analytics." Learn how AI is transforming data accessibility, enabling anyone to perform sophisticated analytics without deep technical expertise. From building trust in AI-generated insights to creating intelligent semantic layers, discover how modern BI platforms are bridging the gap between data teams and business stakeho...
Journey inside Uber's innovative AI assistant "Genie" with Paarth Chotani, Staff Engineer at Uber, as he shares how they're revolutionizing on-call support using LLMs and vector search. From processing massive amounts of internal documentation to building scalable RAG pipelines, discover how Uber tackles the challenges of implementing AI assistants at scale. Get insights into the evolution from traditional chatbots to agent-based solutions, and learn practical lessons about staying current in ...
AI's transformative impact on data engineering and analytics is reshaping how professionals create value, shifting focus from technical skills to strategic thinking and communication. In this episode of The Data Engineering Show, the bros talk with Sumit Gupta, Lead BI Engineer at Notion, about his journey through prominent tech companies, modern data stacks, and how AI is revolutionizing data workflows and professional development. What You'll Learn: How modern data stacks are evolving with to...
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Yingjun Wu, founder and CEO of RisingWave, to explore the evolution of stream processing systems and the innovations his company is bringing to the space. What you’ll learn: Yingjun's journey from academic research in stream processing to founding RisingWave, and the challenges of building trust in a new database system. How RisingWave's architecture, using S3 as primary storage, delivers second-level sca...
In this episode of The Data Engineering Show, the bros sit down with Lisa Cao, Product Manager at Datastrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. What You’ll Learn: How Apache Gravitino differs from alternatives like Unity Catalog and Polaris by supporting multiple catalog systems. What the “Push-Down Permission Management” security model is and how to implement it across different data sys...
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit down with Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. Together, they: Talk about the journey of DuckDB, an open-source analytical database system designed as a universal wrangling tool. Explain how DuckDB differs from SQLite, highlighting the analytical and transactional use cases. Discuss DuckDB’s special feature and its approach to innovation, including creating their Parquet Reader. Explore the simple and...
In this episode of The Data Engineering Show, the bros sit down with Daniel Pálma, Head of Marketing at Estuary. Join them as they: Talk about Daniel’s career transition from data engineering to marketing and how his data engineering background has strengthened his marketing work. Discuss the role of AI in the evolution of data movement, making it faster and easier to create data pipelines. Shed light on the challenges of vector databases and structured data in AI ap...
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad sit with Chad Sanderson, CEO and co-founder of Gable AI to explore the interesting world of data change management. Join them as they: Delve into challenges of data quality, how it degrades over time and the one-sided data quality checks on the “last mile” of the data supply chain. Talk about how Gable works through a 3-layer flow of technology which is to identify data production points, trace the data flow and commun...
Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. Much more than just a software solution, for Wouter, BI is all about change management and aligning leadership with data projects. They discuss: From Excel to Expert: From basic Excel tasks to a full ma...
In this special roundup episode of The Data Engineering Show, the Bros revisit some of the best bits from episodes with data thought leaders Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan, spotlighting essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspectives on where data engineering is heading and why certain challenges persist. Topics include: ...
In this episode of The Data Engineering Show, the bros, Eldad and Benjamin, are joined by Ryanne Dolan from LinkedIn to discuss the innovative Hoptimator (H2) project. This conversation reveals how LinkedIn has improved its data pipelines by automating the setup and management of complex workflows. Together they cover: Automated Data Pipelines: Ryanne explains how Hoptimator allows users to create and manage data pipelines using just a simple SQL SELECT query, streamlining the process of setting...
SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging ML for autonomous database optimization, using Postgres for practical applications, tuning production...
Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado...
Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, ...
Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado...
Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM which they co-founded. If you consider yourself a hardcore engineer, this episode is for you. The Data Engineering Show is brought to you by firebolt.io and handcrafted b...
There are two types of data influencers on LinkedIn: 1. Those who talk directly about the products and companies they work for 2. Those who provide more general guidance, tips and opinions. Can influencers actually be passionate about the products they’re developing and talk about them straightforwardly without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engineering Show episode. Megan is one of those influencers who combine the two approaches, and with al...
Every data team should have at least one data engineer with a software engineering background. This time on The Data Engineering Show, the guest is Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Without software engineering skills you’ll be limited to the rigid capabilities of your stack. But wi...
Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually introduce knowledge. It’s no longer just about the data volume, it’s about quality and relevance. The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so Previous guests include: Joseph Machado of LinkedIn, Matthew Weingart...