Yohei Nakajima is an investor by day and coder by night. In particular, one of his projects, an AI agent framework called BabyAGI that creates a plan-execute loop, got a ton of attention in the past year. The truth is that AI agents are an extremely experimental space, and depending on how strict you want to be with your definition, there aren't a lot of production use cases today. Yohei discusses the current state of AI agents and where they might take us. For full show notes and to read 6+ yea...
Jun 09, 2024•46 min
Misha Panko has worked in data for a long time, including on high-performance data teams at Uber and Google. Today, Misha is the co-founder and CEO of Motif Analytics, a product focused on helping growth and ops teams understand their event data. In this episode, Tristan and Misha nerd out about the state of the art in computational neuroscience, where Misha got his PhD. They then go deep into event stream data and how it differs from classical fact and dimension data, and why it needs differen...
May 26, 2024•44 min•Ep 63
Eric Avidon is a journalist at TechTarget who's interviewed Tristan a few times, and now Tristan gets to flip the script and interview Eric. Eric is a veteran journalist, covering everything from finance to the Boston Red Sox, but now he spends a lot of time with vendors in the data space and has a broad view of what's going on. Eric and Tristan discuss AI and analytics and how mature these features really are today, data quality and its importance, the AI strategies of Snowflake and Databricks,...
May 12, 2024•38 min•Ep 62
Barry McCardel is the co-founder and CEO of Hex. Hex is an analytics tool that's structured around a notebook experience, but as you'll hear in the episode, goes well beyond the traditional notebook. We're big fans of Hex at dbt Labs, and use it for a bunch of our internal data work. In this episode, Barry and Tristan discuss notebooks and data analysis, before zooming out to discuss the hype cycle of data science, how AI is different, the experience of building AI products, and how AI will imp...
Apr 21, 2024•50 min•Ep 60
Matt Turck has been publishing his ecosystem map since 2012. It was first called the Big Data Landscape. Now it’s the Machine Learning, AI & Data (MAD) Landscape. The 2024 MAD Landscape includes 2,011(!) logos, which Matt attributes first to a data infrastructure cycle and now to an ML/AI cycle. As Matt writes, “Those two waves are intimately related. A core idea of the MAD Landscape every year has been to show the symbiotic relationship between data infrastructure, analytics/BI, ML/AI, and applicati...
Apr 07, 2024•36 min•Ep 61
Matthew Lynley is a bit of a hybrid. He's been a long-time journalist covering enterprise tech, currently in his fantastic AI and data newsletter Supervised, and he's also been a hands-on data practitioner. Matthew has covered the analytics tech stack, but this time Tristan turns the tables to get Matthew’s perspective on the rise of Gen AI as a topic in the popular press, what's going on in the space today, and where AI is headed. For full show notes and to read 6+ years of back issues of the ...
Mar 24, 2024•48 min•Ep 59
Juan Sequeda is a principal data scientist and head of the AI Lab at data.world, and is also the co-host of the fantastic data podcast Catalog and Cocktails. This episode tackles semantics, the semantic web, Juan’s research into how raw text-to-SQL performs versus text-to-semantic-layer, and where we both believe AI will make an impact in the world of structured data analytics. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdb...
Mar 10, 2024•48 min•Ep 58
Benn Stancil, cofounder and CTO at Mode, returns to The Analytics Engineering Podcast to discuss the evolution of the term "modern data stack" and its value today. Tristan wrote about this idea for The Analytics Engineering Roundup in "Is the Modern Data Stack Still a Useful Idea?" For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs....
Feb 25, 2024•46 min•Ep 57
Moritz Heimpel from Siemens and Ben Flusberg from Cox Automotive have very similar jobs. They both act as stewards of the data strategies at large, complex companies. In this episode, we get into what it’s like to collaborate with data at scale. Ben and Moritz share their experiences adopting a data mesh architecture and what that looks like at their organizations. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com ....
Dec 08, 2023•46 min•Ep 55
If Data Vault is a new term for you, it’s a data modeling design pattern. We’re joined by Brandon Taylor, a senior data architect at Guild, and Michael Olschimke, who is the CEO of Scalefree, the consulting firm whose co-founder Dan Linstedt is credited as the designer of the data vault architecture. In this conversation with Tristan and Julia, Michael and Brandon explore where the Data Vault approach fits among data warehouse design methodologies. They discuss Data Vault’s adoption in Europe, its alignmen...
Nov 17, 2023•44 min•Ep 54
Jonathan Frankle is the Chief Scientist at MosaicML, which was recently bought by Databricks for $1.3 billion. MosaicML helps customers train generative AI models on their data. Lots of companies are excited about gen AI, and the hope is that their company data and information will be what sets them apart from the competition. In this conversation with Tristan and Julia, Jonathan discusses a potential future where you can train specialized, purpose-built models, the future of MosaicML inside of ...
Nov 03, 2023•46 min•Ep 53
In this conversation with Tristan recorded at Coalesce 2023, Kasey Mazza, an analytics engineering manager on the RevOps team at HubSpot, discusses the roles of data analysts and analytics engineers, the importance of building internal data communities, and the evolving landscape of data teams. Watch Kasey’s Coalesce 2023 presentation, The career growth software development lifecycle. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://...
Oct 20, 2023•29 min•Ep 56
It turns out data plays a big role in getting cereal manufactured and delivered so you can enjoy your Cheerios reliably for breakfast. We talk with Arjun Narayan, CEO of Materialize, a company building an operational warehouse, and Nathan Bean, a data leader at General Mills responsible for all of the company's manufacturing analytics and insights. We discuss Materialize’s founding story, how streaming technology has matured, and how exactly companies are leveraging their warehouse to operationa...
Oct 06, 2023•42 min•Ep 52
Yannick Misteli is the head of engineering for the go-to-market domain at Roche, a $250 billion multinational pharmaceutical and diagnostics company. Roche was an early supporter of dbt Cloud, and Yannick helped move his team of 120+ engineers to a modern data stack. He always finds a way to push the boundaries to make a large company founded in 1896 incredibly modern and innovative. We wanted to know more about the "how" of the work: the people, process, and technology. Read more about Roche's d...
Sep 22, 2023•40 min•Ep 51
Andy Pavlo is a professor of databaseology (he says it's a made-up word) at Carnegie Mellon and is currently on leave to build his own company, OtterTune, which uses AI to figure out the settings that get the best performance out of databases. He is one of the preeminent minds on databases and a die-hard relational database maximalist. We talk about the state of databases today, why there are so many specialized databases (and whether we need so many), why tuning databases is so hard but important, and how...
Sep 08, 2023•48 min•Ep 50
Jerry Liu is the CEO and co-founder of LlamaIndex. LlamaIndex is an open-source framework that helps people prep their data for use with large language models in a process called retrieval-augmented generation. LLMs are great decision engines, but in order for them to be useful for organizations, they need additional knowledge and context, and Jerry discusses how companies are bringing their data to tailor LLMs for their needs. For full show notes and to read 6+ years of back issues of the podca...
Aug 25, 2023•43 min•Ep 49
Ian Macomber, head of analytics engineering and data science at Ramp and formerly the VP of analytics and data engineering at Drizly, and Ryan Delgado, a staff software engineer at Ramp, have played pivotal roles in establishing Ramp's data team from the ground up and are spearheading the development of their comprehensive roadmap. In this conversation with Tristan and Julia, Ian and Ryan share insights on how Ramp's data team transformed unstructured data from contracts into valuable insights t...
Aug 11, 2023•49 min•Ep 48
Daniel Le is the CFO at dbt Labs, where he has built multiple teams. He is also the former head of FP&A and operations at Zoom, and he helped scale FP&A as the former finance director at Okta. In this conversation with Julia, Daniel shares his view as CFO on the challenges SaaS companies face and the importance of finance teams creating a holistic view of their business. Daniel gives advice to data leaders about how they can automate business processes with dbt Cloud and use self-service analytic...
Jul 28, 2023•31 min•Ep 47
Bob Muglia likely needs no introduction. The former CEO of Snowflake led the company during its early, transformational years after a long career at Microsoft and Juniper. Bob recently released the book The Datapreneurs about the arc of innovation in the data industry, starting with the first relational databases all the way to the present craze of LLMs and beyond. In this conversation with Tristan and Julia, Bob shares insights into the future of data engineering and its potential business impa...
Jul 12, 2023•48 min•Ep 45
Advances in ML have transformed data privacy from a regulatory necessity into an opportunity to improve the work of data people. Synthetic data for modeling + testing is one example of a hard thing that's now easy, and in this conversation with Tristan and Julia, Ian + Abhishek cover many other ways that privacy can actually be a skill that propels your work forward, rather than a mere legal best practice. For full show notes and to read 6+ years of back issues of the podcast's companion newsle...
Apr 21, 2023•48 min•Ep 45
Julia just got back from Data Council in Austin, a conference organized by Pete Soderling, where lots of startups share what they're building, data practitioners go to learn in hands-on workshops, and of course investors go to spot the next big trend. In this episode, Taylor Murphy (Head of Product & Data at Meltano) + Pedram Navid (Founder, West Marin Data) join Julia to recap the conference and have a bit of fun. They talked streaming, how the MDS is growing up, new SQL variants, and, of cour...
Apr 07, 2023•42 min•Ep 44
Brad Culberson is a Principal Architect in the Field CTO’s office at Snowflake. Niall Woodward is a co-founder of SELECT, a startup providing optimization and spend management software for Snowflake customers. In this conversation with Tristan and Julia, Brad and Niall discuss all things cost optimization: cloud vs on-prem, measuring ROI, and tactical ways to get more out of your budget. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https:...
Mar 24, 2023•46 min•Ep 43
Nick Handel, as co-founder at Transform, helped develop the popular open-source metrics framework MetricFlow. Drew Banin, a co-founder at dbt Labs, helped build the initial version of the dbt Semantic Layer, which launched last year. Transform was acquired in February by dbt Labs, and in this conversation with Tristan, they talk through their collective plans for the future of the dbt Semantic Layer. For full show notes and to read 7+ years of back issues of the podcast's companion newsletter, h...
Mar 10, 2023•43 min•Ep 42
Sarah and Chris are both at the forefront of bringing the promise of gen AI to our actual work as data people, which is a unique challenge! Precise truth is critical for business questions in a way that it’s not for a consumer search query. Sarah Nagy is the CEO of Seek AI, a startup that aims to use natural language processing to change how professionals work with data. Chris Aberger currently leads Numbers Station AI, a startup focused on data-intensive workflow automation. In this conversation...
Feb 24, 2023•48 min•Ep 41
Auren Hoffman currently serves as the CEO and Chief Historian at SafeGraph, a data-as-a-service company he founded, which primarily provides location data. In this conversation with Tristan and Julia, Auren shares how truly few companies are making use of 3rd-party datasets today, how opening up more datasets to public research could help us solve big problems, and a fun fact about Abraham Lincoln's (!) work in the industry. For full show notes and to read 6+ years of back issues of the podcast'...
Feb 10, 2023•45 min•Ep 40
Mike Stonebraker is a veritable database pioneer and a Turing Award recipient. In addition to teaching at MIT, he is a serial entrepreneur and co-creator of Postgres. Andy Palmer is a veteran business leader who serves as the CEO of Tamr, a company he co-founded with Mike. Through his seed fund Koa Labs, Andy has helped found and/or fund numerous innovative companies in diverse sectors, including health care, technology, and the life sciences. In this conversation with Tristan and Julia, Mike an...
Jan 27, 2023•48 min•Ep 39
Wes McKinney is the creator of pandas, co-creator of Apache Arrow, and now Co-founder/CTO at Voltron Data. In this conversation with Tristan and Julia, Wes takes us on a tour of the underlying guts, from hardware to data formats, of the data ecosystem. What innovations, down to the hardware level, will stack to lead to significantly better performance for analytics workloads in the coming years? To dig deeper on the Apache Arrow ecosystem, check out replays from their recent conference at https:...
Jan 06, 2023•47 min•Ep 38
Product experimentation is full of potholes for companies of any size, given the number of pieces (tooling, culture, process, persistence) that need to come together to be successful. Vijaye Raji (currently Statsig, formerly Facebook + Microsoft) and Sean Taylor (currently Motif Analytics, formerly Facebook + Lyft) have navigated these failure modes, and are here to help you (hopefully) do the same. This convo with Tristan + Julia is light on tooling + heavy on process: how to watch out for spil...
Dec 16, 2022•46 min•Ep 37
The first LIVE IRL episode! Stephen Bailey, data engineer at Whatnot and writer of an incredibly entertaining data substack, joins Tristan for a follow-up conversation to Stephen’s Coalesce talk, “Excel at nothing: how to be an effective generalist.” You can read Stephen’s writing at https://stkbailey.substack.com/. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by ...
Dec 02, 2022•27 min•Ep 35
WARNING: This episode contains detailed discussion of data contracts. The modern data stack introduces challenges in terms of collaboration between data producers and consumers. How might we solve them to ultimately build trust in data quality? Chad Sanderson leads the data platform team at Convoy, a late-stage Series E freight technology startup. He manages everything from instrumentation and data ingestion to ETL, in addition to the metrics layer, experimentation software and ML. Prukalpa Sank...
Nov 18, 2022•49 min•Ep 34