Data Brew by Databricks

Databricks | databricks.com
Welcome to Data Brew by Databricks with Denny and Brooke! In this series, we explore various topics in the data and AI community and interview subject matter experts in data engineering/data science. So join us with your morning brew in hand and get ready to dive deep into data + AI! For this first season, we will be focusing on lakehouses – combining the key features of data warehouses, such as ACID transactions, with the scalability of data lakes, directly against low-cost object stores.

Episodes

Data Brew Season 2 Episode 1: ML in Production

For our second season, we will be focusing on machine learning, from research to production. We will interview folks in academia and industry to discuss topics such as data ethics, production-grade infrastructure for ML, hyperparameter tuning, AutoML, and many more. In the season opener, Matei Zaharia discusses how he entered the field of ML, best practices for productionizing ML pipelines, leveraging MLflow & the Lakehouse architecture for reproducible ML, and his current research in this f...

Apr 22, 2021 | 31 min | Season 2 | Ep. 1

Data Brew Season 1 Episode 6: Journey of Big Data

Jules Damji and Tathagata Das guide us through their journey in big data and the evolution of data architecture in the past 30 years. They discuss some of the biggest changes in industry they’ve seen, as well as trends to look forward to in the coming years. This is a fun episode connecting all four authors of the Learning Spark, 2nd Edition book. See more at databricks.com/data-brew...

Feb 18, 2021 | 40 min

Data Brew Season 1 Episode 4: BI on Data Lakes - Making it Real for Retail

In this session, we discuss with Lara Minor, Senior Enterprise Data Manager at Columbia Sportswear, the lessons learned as her team achieved a 70% reduction in pipeline creation time. This reduced ETL workload times from four hours with the previous data warehouses to minutes, enabling near real-time analytics. Her team migrated from multiple legacy data warehouses, run by individual lines of business, to a single scalable, reliable, performant data lake. See more at databricks.com/data-brew...

Dec 22, 2020 | 29 min | Season 1 | Ep. 4

Data Brew Season 1 Episode 3: Demystifying Delta Lake

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake offers ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. It runs on top of your existing data lake and is fully compatible with Apache Spark APIs. For our “Demystifying Delta Lake” session, we will interview Michael Armbrust - committer and PMC member of Apache Spark™ and the original creator of Spark SQL. He currently leads the team at Databricks that design...

Dec 06, 2020 | 26 min | Season 1 | Ep. 3

Data Brew Season 1 Episode 2: Welcome to Lakehouse

Legacy approaches have failed to deliver on the promise of a single data architecture that can support every downstream use case from BI to AI. Lakehouse aspires to address this by combining the best of data warehouses and data lakes. Ali Ghodsi, Co-Founder and CEO of Databricks, and David Meyer, SVP of Product at Databricks, explain how. See more at databricks.com/data-brew

Nov 12, 2020 | 26 min | Season 1 | Ep. 2