Highlights from this week’s conversation include: Chang’s background and journey with Pandas (6:26) The persisting challenges in data collection and preparation (10:37) The resistance to change in using Python for data workflows (13:05) AI hype and its impact (14:09) The success and evolution of Pandas as a data framework (20:04) The vision for a next-generation data infrastructure (26:48) LanceDB’s file and table format (34:35) Trade-Offs in Lance Format (42:45) Introducing the Vector Database ...
Oct 25, 2023•1 hr 21 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Chang She of Eto Labs.
Oct 23, 2023•5 min
Highlights from this week’s conversation include: Santona’s journey from nuclear physics to data science (4:59) The appeal of startups and wearing multiple hats (8:12) The challenge of pseudoscience in the news (10:24) Approaching data with creativity and rigor (13:22) Challenges and differences in data workflows (14:39) Schema Evolution and Quality Problems (27:01) Real-time Data Monitoring and Anomaly Detection (30:34) The importance of data as a business differentiator (35:48) The SQL job cre...
Oct 18, 2023•1 hr 6 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Santona Tuli of Upsolver.
Oct 16, 2023•6 min
Highlights from this week’s conversation include: How music impacted Bob’s data journey (3:16) Music’s relationship with creativity and innovation (11:38) The genesis of Weaviate and the idea of vector databases (14:09) The joy of creation (19:02) OLAP Databases (22:21) The progression of complexity in databases (24:31) Vector database (29:23) Scaling suboptimal algorithms (34:34) The future of vector space representation (35:51) Databases role in different industries (39:14) The brute force app...
Oct 11, 2023•1 hr 9 min
In this bonus conversation, Eric and Kostas preview their upcoming conversation with Bob van Luijt of Weaviate.
Oct 09, 2023•5 min
Highlights from this week’s conversation include: Nick’s background and journey in data (2:28) Founding Dagster Labs (7:50) The evolution of data engineering (12:32) Fragmentation in data infrastructure (15:04) The role of orchestration in data platforms (19:53) The importance of operational tools for data pipelines (25:01) Lessons learned from working with GraphQL (26:19) The role of the orchestrator in data engineering (34:51) The boundaries between data infrastructure and product engineering ...
Oct 04, 2023•1 hr 2 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Nick Schrock of Dagster Labs.
Oct 02, 2023•3 min
Highlights from this week’s conversation include: Amr’s extensive background in data (3:23) The evolution of neural networks (9:21) The role of supervised learning in AI (11:17) Explaining Vectara (13:07) Papers that laid the foundation for AI (15:02) Contextualized translation and personalization (20:07) Ease of use and answer-based search (25:01) AI and potential liabilities (35:54) Minimizing difficulties in large language models (36:43) The process of extracting documents in multidimensional...
Sep 27, 2023•1 hr 4 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Amr Awadallah of Vectara.
Sep 25, 2023•5 min
Highlights from this week’s conversation include: Alex’s background in the data space and the creation of Redpanda (4:23) The cost and complexity of streaming (11:07) The evolution of storage with Kafka (12:04) The distinction between streaming technologies (15:10) Simplicity as a Core Design Principle (27:03) Cost Efficiency in a Cloud Native Era (30:44) Removing complexity with Redpanda (34:21) Migrations and compatibility with Redpanda (40:35) The Future of Redpanda (43:44) The Story Behind R...
Sep 20, 2023•55 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Alex Gallego of Redpanda.
Sep 18, 2023•4 min
Highlights from this week’s conversation include: Emilie’s background and journey in data (3:42) The problem of three-way match (8:56) Operational workflows and how data stacks solve them (13:16) Turbine’s solution as a lightweight ERP (14:05) Workflows and analytics (14:59) Consolidating information into helpful application (27:41) Challenges in operational workflows (32:19) Friction and hurdles in ERP usage (39:28) A solution for purchase order management (40:47) Turbine’s focus and limitation...
Sep 13, 2023•1 hr 1 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Emilie Schario of Turbine.
Sep 11, 2023•7 min
Highlights from this week’s conversation include: Pardis’ background and journey in data (3:24) AI before the hype (8:37) Founding General Folders (12:36) Data collaboration challenges (15:31) Examples of data sharing (17:40) Data transfer in various industries (22:16) Defining the transfer problem (28:30) The demand for scalable solutions (32:06) Data transfer and model exposition (41:02) Data governance and API (43:23) Final thoughts and takeaways (56:48) The Data Stack Show is a weekly podcas...
Sep 06, 2023•1 hr 3 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Pardis Noorzad of General Folders.
Sep 04, 2023•6 min
Highlights from this week’s conversation include: Jakub’s journey into data and working with notebooks (2:43) Overview of Deepnote and its features (7:22) Notebook 1.0 and 2.0 (14:04) Notebook 3.0 and its potential impact (15:46) The need for collaboration across organizations (17:16) Real-time, asynchronous, and organizational collaboration (28:02) Challenges to collaboration (32:03) Notebooks as a universal computational medium (36:14) The rise of exploratory programming (41:40) The power of n...
Aug 30, 2023•1 hr
In this bonus episode, Eric and Kostas preview their upcoming conversation with Jakub Jurových of Deepnote.
Aug 28, 2023•4 min
Highlights from this week’s conversation include: Ken’s background and journey to Heap (2:32) Heap’s problem-solving approach (8:19) Auto-capture and its significance in the marketplace (13:03) Providing qualitative context: sessions and surveys (16:23) Collection and storage of data (25:42) Challenges of real-time data collection (26:40) The true gap in the market today (37:39) Consolidation and aggregation of data solutions (41:58) Simplifying the data stack (47:32) A different approach in eng...
Aug 23, 2023•1 hr 7 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Ken Fine of Heap.
Aug 21, 2023•4 min
Highlights from this week’s conversation include: The need for reverse ETL in marketing (2:24) Closing the gap between engineering, data, and marketing teams (8:37) The analytics persona’s opportunity (11:53) Interface layer (13:06) Approach to messy warehouse data (15:57) The need for a complicated infrastructure (28:43) Challenges in data integration for marketers (29:26) The evolution of the analytics stack (31:53) Orchestration of the data warehouse (38:39) The role of marketing tools (40:35...
Aug 16, 2023•53 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Chris Sell of GrowthLoop.
Aug 14, 2023•4 min
Highlights from this week’s conversation include: Brendan’s background and journey to Groundswell (2:25) The impact of generative AI on sales reps and product building (5:38) Lead sourcing challenges (12:22) Salesforce as a data model (14:30) The need for guardrails in building applications around sales (24:37) The question of interfaces in the layers of Salesforce (26:11) A UI solution for sales and marketing (30:45) The future of logic and machine learning models (37:11) The battle for data ow...
Aug 09, 2023•1 hr 12 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Brendan Short of Groundswell.
Aug 07, 2023•5 min
Highlights from this week’s conversation include: Building Dozer: Simplifying Data Sources into APIs (1:13) Bridging Data Engineering with Application Engineering (4:19) Turning Data Sources into APIs (7:46) The cost of caching (12:59) Challenges with legacy systems (14:30) Real-time data integration (19:31) YAML and SQL experience (25:37) Behind the scenes of Dozer (29:18) Heavy Workloads and Low Latency (42:00) Use Cases of Dozer (45:51) Reliability and storing data from different connectors (...
Aug 02, 2023•1 hr 4 min
In this bonus episode, Eric and Kostas preview their upcoming conversation with Matteo Pelati and Vivek Gudapuri of Dozer.
Jul 31, 2023•7 min
Highlights from this week’s conversation include: Stefan’s background in data (2:39) What is DAGWorks? (3:55) How building point solutions influenced Stefan’s journey (5:03) Solving the tooling problems of self-service at an organization (11:44) Creating Hamilton (15:53) How Hamilton works with definitions and time-series data (19:34) What makes Hamilton an ML-oriented framework? (23:39) Navigating the differences between ML teams and other data teams (26:27) Understanding the fundamentals of Ha...
Jul 26, 2023•57 min
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com ....
Jul 24, 2023•4 min
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com ....
Jul 21, 2023•20 min
Highlights from this week’s conversation include: Lars’ work on Resoto in helping to cut cloud costs for organizations (2:02) The trend of large resources to micro resources (5:59) What are some of the typical resource drains in data infrastructure (8:56) Managing cost on the backend with scale and experimentation (12:51) Solutions for resource management problems (17:38) How Resoto is solving pain points in resource management (26:17) Navigating the complexities of data infrastructure (29:01) Re...
Jul 19, 2023•58 min