
The Analytics Engineering Podcast

dbt Labs, Inc. · roundup.getdbt.com
Tristan Handy has been curating the Analytics Engineering Roundup newsletter since 2015, pulling together the internet’s best data science & analytics articles. Tristan and co-host Julia Schottenstein now bring the Roundup to real life, hosting biweekly conversations with data practitioners inventing the future of analytics engineering. You can view full episode summaries and read back issues of the Roundup newsletter at https://roundup.getdbt.com. The podcast is sponsored by dbt Labs, makers of the data transformation framework dbt. To reach our team, drop a note to [email protected].

Episodes

Building a data team from the beginning (w/ Daniel Avancini)

Daniel Avancini is the chief data officer and co-founder of Indicium, a fast-growing data consultancy started in Brazil. There are a lot of data consultancies around the world, and many of them do great work. What has been so fascinating about Indicium’s journey is their HR model. Rather than primarily hiring experienced professionals, they decided to go hard on training. They built a talent pipeline with courses and an internal onboarding process that takes new employees from zero to 60 over a...

Jan 26, 2025 · 50 min · Season 1, Ep. 75

Data engineering at Snowflake (w/ Rahul Jain)

A look inside the data work happening at a company building some of the most advanced technology in the industry. Rahul Jain, data engineering manager at Snowflake, joins Tristan to discuss Iceberg, streaming, and all things Snowflake. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs....

Jan 12, 2025 · 44 min · Season 1, Ep. 74

The intersection of UI, exploratory data analysis, and SQL (w/ Hamilton Ulmer)

Hamilton Ulmer is working at the intersection of UI, Exploratory Data Analysis, and SQL at MotherDuck, and he's built a long career in EDA. Hamilton and Tristan dive deep into the history of exploratory data analysis. Even if you spend most of your time below the frontend layer of the stack, it is important to understand the trends in both the practice of data visualization and the technologies that underlie that practice. For full show notes and to read 6+ years of back issues of the podcast's ...

Dec 22, 2024 · 51 min

Making data movement as reliable as electricity (w/ Taylor Brown)

Fivetran recently passed $300 million ARR and has over 7,000 customers globally. Taylor Brown, the cofounder and COO of Fivetran, joins the show to talk about Fivetran’s moat, the impact of AI on the data ingestion space, and open table formats and catalogs. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs....

Dec 08, 2024 · 47 min · Season 1, Ep. 72

Data as an assembly line (w/ Cedric Chin)

Cedric Chin runs Commoncog—a publication about accelerating business expertise. He joins Tristan to talk about the analytics development lifecycle, how organizations value (or misvalue) data, and why “data teams are not some IT helpdesk to be ignored.” For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs....

Nov 17, 2024 · 51 min

The data jobs to be done (w/ Erik Bernhardsson)

Erik Bernhardsson, the CEO and co-founder of Modal Labs, joins Tristan to talk about Gen AI, the lack of GPUs, the future of cloud computing, and egress fees. They also discuss whether the job title of data engineer is something we should want more or less of in the future. Erik’s not afraid of a spicy take, so this is a fun one. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is...

Nov 03, 2024 · 43 min · Season 1, Ep. 70

Coalesce 2024 edition: What’s next for data teams? (w/ Scott Breitenother)

Scott Breitenother, founder of data consultancy Brooklyn Data Co., joins Tristan at Coalesce 2024 in Las Vegas to discuss the early days of dbt, the evolution of data teams, and what's next for the dbt community. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Oct 20, 2024 · 44 min

The current state of the AI ecosystem (w/ Julia Schottenstein)

Former co-host Julia Schottenstein returns to the show to go deep into the world of LLMs. Julia joined LangChain as an early employee, in Tristan’s words, to “basically solve all of the problems that aren't specifically in product and engineering.” LangChain has become one of the primary frameworks, if not the primary framework, for developing applications using large language models. There are over a million developers using LangChain today, building everything from prototypes to production AI applications.

Oct 06, 2024 · 46 min

Creating value from GenAI in the enterprise (w/ Nisha Paliwal)

Nisha Paliwal, who leads enterprise data tech at Capital One, joins Tristan to discuss building a strong data culture in the world of AI. She is the co-author of the book Secrets of AI Value Creation. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Sep 22, 2024 · 45 min · Season 1, Ep. 67

Developer productivity on GitHub Copilot (w/ Eirini Kalliamvakou)

Dr. Eirini Kalliamvakou is a senior researcher at GitHub Next. Eirini has built a career on studying software engineers, how to measure their productivity, how developer experience impacts productivity, and more. Recently, Eirini has been working on quantifying the impacts of GitHub Copilot. Does it actually help software engineers be more productive? Tristan and Eirini explore how to quantify developer productivity in the first place, and finally arrive at whether or not Copilot makes a dif...

Sep 08, 2024 · 54 min · Season 1, Ep. 66

The rapid experimentation of AI agents (w/ Yohei Nakajima)

Yohei Nakajima is an investor by day and coder by night. In particular, one of his projects, an AI agent framework called BabyAGI that creates a plan-execute loop, got a ton of attention in the past year. The truth is that AI agents are an extremely experimental space, and depending on how strict you want to be with your definition, there aren't a lot of production use cases today. Yohei discusses the current state of AI agents and where they might take us. For full show notes and to read 6+ yea...

Jun 09, 2024 · 46 min

Funnel analytics and AI models for event sequences (w/ Misha Panko)

Misha Panko has worked in data for a long time, including on high performance data teams at Uber and Google. Today, Misha is the co-founder and CEO of Motif Analytics, a product focused on helping growth and ops teams understand their event data. In this episode, Tristan and Misha nerd out about the state of the art in computational neuroscience, where Misha got his PhD. They then go deep into event stream data and how it differs from classical fact and dimension data, and why it needs differen...

May 26, 2024 · 44 min · Season 1, Ep. 63

From Moneyball to Gen AI

Eric Avidon is a journalist at TechTarget who's interviewed Tristan a few times, and now Tristan gets to flip the script and interview Eric. Eric is a veteran journalist who has covered everything from finance to the Boston Red Sox, but now he spends a lot of time with vendors in the data space and has a broad view of what's going on. Eric and Tristan discuss AI and analytics and how mature these features really are today, data quality and its importance, the AI strategies of Snowflake and Databricks,...

May 12, 2024 · 38 min · Season 1, Ep. 62

Being Pro-Human in the AI Era

Barry McCardel is the co-founder and CEO of Hex. Hex is an analytics tool that's structured around a notebook experience, but as you'll hear in the episode, goes well beyond the traditional notebook. We're big fans of Hex at dbt Labs, and use it for a bunch of our internal data work. In this episode, Barry and Tristan discuss notebooks and data analysis, before zooming out to discuss the hype cycle of data science, how AI is different, the experience of building AI products, and how AI will imp...

Apr 21, 2024 · 50 min · Season 1, Ep. 60

The 2024 Machine Learning, AI & Data Landscape (w/ Matt Turck)

Matt Turck has been publishing his ecosystem map since 2012. It was first called the Big Data Landscape. Now it’s the Machine Learning, AI & Data (MAD) Landscape. The 2024 MAD Landscape includes 2,011(!) logos, which Matt attributes first to a data infrastructure cycle and now to an ML/AI cycle. As Matt writes, “Those two waves are intimately related. A core idea of the MAD Landscape every year has been to show the symbiotic relationship between data infrastructure, analytics/BI, ML/AI, and applicati...

Apr 07, 2024 · 36 min · Season 1, Ep. 61

How the Media Covers Gen AI (w/ Matthew Lynley, Supervised)

Matthew Lynley is a bit of a hybrid. He's been a long-time journalist covering enterprise tech, currently in his fantastic AI and data newsletter Supervised, and he's also been a hands-on data practitioner. Matthew has covered the analytics tech stack, but this time Tristan turns the tables to get Matthew’s perspective on the rise of Gen AI as a topic in the popular press, what's going on in the space today, and where AI is headed. For full show notes and to read 6+ years of back issues of the ...

Mar 24, 2024 · 48 min · Season 1, Ep. 59

AI's Impact in the World of Structured Data Analytics (w/ Juan Sequeda, data.world)

Juan Sequeda is a principal data scientist and head of the AI Lab at data.world, and is also the co-host of the fantastic data podcast Catalog and Cocktails. This episode tackles semantics, the semantic web, Juan’s research on how raw text-to-SQL performs versus text-to-semantic layer, and where both Tristan and Juan believe AI will make an impact in the world of structured data analytics. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdb...

Mar 10, 2024 · 48 min · Season 1, Ep. 58

The End of the Modern Data Stack (w/ Benn Stancil, Mode)

Benn Stancil, cofounder and CTO at Mode, returns to The Analytics Engineering Podcast to discuss the evolution of the term "modern data stack" and its value today. Tristan wrote about this idea for The Analytics Engineering Roundup in "Is the Modern Data Stack Still a Useful Idea?" For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs....

Feb 25, 2024 · 46 min · Season 1, Ep. 57

Data Mesh Architecture at Large Enterprises (w/ Moritz Heimpel and Ben Flusberg)

Moritz Heimpel from Siemens and Ben Flusberg from Cox Automotive have very similar jobs. They both act as stewards of the data strategies at large, complex companies. In this episode, we get into what it’s like to collaborate with data at scale. Ben and Moritz share their experiences adopting a data mesh architecture and what that looks like at their organizations. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com....

Dec 08, 2023 · 46 min · Season 1, Ep. 55

Let's Talk About Data Vault (w/ Brandon Taylor and Michael Olschimke)

If Data Vault is a new term for you, it’s a data modeling design pattern. We’re joined by Brandon Taylor, a senior data architect at Guild, and Michael Olschimke, who is the CEO of Scalefree—the consulting firm whose co-founder Dan Lindstedt is credited as the designer of the data vault architecture. In this conversation with Tristan and Julia, Michael and Brandon explore the Data Vault approach among data warehouse design methodologies. They discuss Data Vault’s adoption in Europe, its alignmen...

Nov 17, 2023 · 44 min · Season 1, Ep. 54

Navigating AI Complexity (w/ Jonathan Frankle)

Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion. MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition. In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of ...

Nov 03, 2023 · 46 min · Season 1, Ep. 53

Career Growth in Data Roles (w/ HubSpot's Kasey Mazza at Coalesce 2023)

In this conversation with Tristan recorded at Coalesce 2023, Kasey Mazza, an analytics engineering manager on the RevOps team at HubSpot, discusses the roles of data analysts and analytics engineers, the importance of building internal data communities, and the evolving landscape of data teams. Watch Kasey’s Coalesce 2023 presentation, The career growth software development lifecycle. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://...

Oct 20, 2023 · 29 min · Season 1, Ep. 56

Operationalizing Your Warehouse, Streaming Analytics, and Cereal (w/ Arjun Narayan of Materialize and Nathan Bean of General Mills)

It turns out data plays a big role in getting cereal manufactured and delivered so you can enjoy your Cheerios reliably for breakfast. We talk with Arjun Narayan, CEO of Materialize, a company building an operational warehouse, and Nathan Bean, a data leader at General Mills responsible for all of the company's manufacturing analytics and insights. We discuss Materialize’s founding story, how streaming technology has matured, and how exactly companies are leveraging their warehouse to operationa...

Oct 06, 2023 · 42 min · Season 1, Ep. 52

Roche’s Data Transformation Journey (w/ Yannick Misteli)

Yannick Misteli is the head of engineering for the go-to-market domain at Roche, a $250 billion multinational pharmaceutical and diagnostics company. Roche was an early supporter of dbt Cloud, and Yannick helped move his team of 120+ engineers to a modern data stack. He always finds a way to push the boundaries to make a large company founded in 1896 incredibly modern and innovative. We wanted to know more about the "how" of the work—the people, process, and technology. Read more about Roche's d...

Sep 22, 2023 · 40 min · Season 1, Ep. 51

The State of Databases Today (w/ Andy Pavlo)

Andy Pavlo is a professor of databaseology (he says it's a made-up word) at Carnegie Mellon and currently on leave to build his own company—OtterTune, which uses AI to figure out the settings to get the best performance out of databases. He is one of the preeminent minds on databases and a die-hard relational database maximalist. We talk about the state of databases today, why there are so many specialized databases (and whether we need so many), why tuning databases is so hard but important, and how...

Sep 08, 2023 · 48 min · Season 1, Ep. 50

Bring Your Own Data to LLMs (w/ Jerry Liu of LlamaIndex)

Jerry Liu is the CEO and co-founder of LlamaIndex. LlamaIndex is an open-source framework that helps people prep their data for use with large language models in a process called retrieval augmented generation. LLMs are great decision engines, but in order for them to be useful for organizations, they need additional knowledge and context, and Jerry discusses how companies are bringing their data to tailor LLMs for their needs. For full show notes and to read 6+ years of back issues of the podca...

Aug 25, 2023 · 43 min · Season 1, Ep. 49

Ramp's $8 Billion Data Strategy (w/ Ian Macomber and Ryan Delgado)

Ian Macomber, head of analytics engineering and data science at Ramp and formerly the VP of analytics and data engineering at Drizly, and Ryan Delgado, a staff software engineer at Ramp, have played pivotal roles in establishing Ramp's data team from the ground up and are spearheading the development of their comprehensive roadmap. In this conversation with Tristan and Julia, Ian and Ryan share insights on how Ramp's data team transformed unstructured data from contracts into valuable insights t...

Aug 11, 2023 · 49 min · Season 1, Ep. 48

dbt Labs on dbt (w/ Daniel Le)

Daniel Le is the CFO at dbt Labs, where he has built multiple teams. He is also the former head of FP&A and operations at Zoom, and he helped scale FP&A as the former finance director at Okta. In this conversation with Julia, Daniel shares his view as CFO on the challenges SaaS companies face and the importance of finance teams creating a holistic view of their business. Daniel gives advice to data leaders about how they can automate business processes with dbt Cloud and use self-service analytic...

Jul 28, 2023 · 31 min · Season 1, Ep. 47

The Arc of Data Innovation (w/ Bob Muglia, former CEO of Snowflake)

Bob Muglia likely needs no introduction. The former CEO of Snowflake led the company during its early, transformational years after a long career at Microsoft and Juniper. Bob recently released the book The Datapreneurs about the arc of innovation in the data industry, starting with the first relational databases all the way to the present craze of LLMs and beyond. In this conversation with Tristan and Julia, Bob shares insights into the future of data engineering and its potential business impa...

Jul 12, 2023 · 48 min · Season 1, Ep. 45

It's 2023, and Privacy Is Now Fun! (w/ Ian Coe of Tonic.ai + Abhishek Bhowmick of Samooha)

Advances in ML have transformed data privacy from a regulatory necessity into an opportunity to improve the work of data people. Synthetic data for modeling and testing is one example of a hard thing that's now easy, and in this conversation with Tristan and Julia, Ian and Abhishek cover many other ways that privacy can actually be a skill that propels your work forward, rather than a mere legal best practice. For full show notes and to read 6+ years of back issues of the podcast's companion newsle...

Apr 21, 2023 · 48 min · Season 1, Ep. 45