Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal

The Data Engineering Show

Sep 17, 2025•22 min•Ep. 46

Summary

This episode details Instacart's strategic shift from Elasticsearch to a self-hosted PostgreSQL cluster for its retailer search, driven by the unique demands of fast-changing grocery inventory. Ankit Mittal explains how consolidating search, ranking, and filtering into Postgres, leveraging extensions like pgvector and robust data pipelines, drastically reduced network calls and improved efficiency. The discussion also covers the architectural decisions, trade-offs, and future directions for building high-performance search systems.

Episode description

In this episode of The Data Engineering Show, Benjamin Wagner sits down with Ankit Mittal, former Senior Engineer at Instacart, to explore how they revolutionized their search infrastructure by transitioning from Elasticsearch to PostgreSQL. Learn how Instacart tackled the unique challenges of fast-moving grocery inventory, achieved high-performance search capabilities, and leveraged PostgreSQL extensions for complex retrieval operations. Whether you're scaling search functionality or optimizing database performance, this deep dive offers valuable insights into building robust, production-ready search systems using PostgreSQL.

Discover why Instacart moved from Elasticsearch to PostgreSQL for retailer search
Learn about handling real-time inventory updates and search optimization
Explore PostgreSQL extensions, sharding strategies, and data flow architecture
Understand the trade-offs between different search infrastructure approaches

What You'll Learn:

How Instacart managed fast-moving grocery inventory data by consolidating search, ranking, and filtering into a single PostgreSQL cluster
Why pushing compute closer to the data layer can significantly improve search performance and reduce network calls
The architecture decisions behind using PostgreSQL extensions like pgvector and custom solutions for search functionality
How to implement efficient data ingestion through S3-based pipelines and bulk writes instead of real-time updates
Why table maintenance operations like pg_repack are crucial for optimizing read throughput in production environments
The trade-offs between traditional search engines and relational databases for complex search implementations
The challenges of maintaining self-hosted PostgreSQL in a predominantly cloud-managed environment
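
To make the first two points above concrete, here is a minimal sketch of retrieval, availability filtering, and ranking done in a single query, so the application makes one round trip instead of retrieving candidates and then filtering them in a second call. It is not Instacart's implementation: the `products` and `availability` tables, their columns, the `popularity` score, and the `search` helper are all hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server. A simple `LIKE` match stands in for real full-text or vector retrieval.

```python
import sqlite3

# Assumption: sqlite3 is only a stand-in for PostgreSQL so this runs anywhere.
# The schema below is hypothetical and chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, popularity REAL);
    CREATE TABLE availability (
        product_id INTEGER, store_id INTEGER, in_stock INTEGER, price REAL
    );
""")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [
    (1, "whole milk", 0.9),
    (2, "oat milk", 0.7),
    (3, "milk chocolate", 0.8),
    (4, "sourdough bread", 0.6),
])
conn.executemany("INSERT INTO availability VALUES (?, ?, ?, ?)", [
    (1, 10, 1, 3.49),
    (2, 10, 0, 4.99),  # out of stock at store 10
    (3, 10, 1, 2.25),
    (4, 10, 1, 5.00),
    (2, 11, 1, 4.79),
])


def search(conn, store_id, term, limit=10):
    """Retrieve, filter by store availability, and rank in one query."""
    return conn.execute(
        """
        SELECT p.name, a.price
        FROM products AS p
        JOIN availability AS a ON a.product_id = p.id
        WHERE a.store_id = ?        -- filter: this store only
          AND a.in_stock = 1        -- filter: drop unavailable items
          AND p.name LIKE ?         -- retrieval: naive text match
        ORDER BY p.popularity DESC  -- ranking: hypothetical score
        LIMIT ?
        """,
        (store_id, f"%{term}%", limit),
    ).fetchall()


for name, price in search(conn, 10, "milk"):
    print(name, price)
```

Because the join drops out-of-stock items before ranking, nothing is retrieved only to be filtered out later, which is the problem with the previous setup that Ankit describes in the episode.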

If you enjoyed this episode, make sure to subscribe, rate, and review it on Apple Podcasts, Spotify, and YouTube Podcasts.
About the Guest(s)
Ankit is a Software Engineer at ParadeDB and former Senior Engineer at Instacart, where he specialized in PostgreSQL infrastructure and search systems. With extensive experience in database optimization and search architecture, he played a key role in modernizing Instacart's search infrastructure by transitioning from Elasticsearch to a custom PostgreSQL solution. In this episode, Ankit shares deep insights into building and scaling high-performance search systems for e-commerce, particularly focusing on the unique challenges of grocery retail's fast-moving inventory. His work at Instacart revolutionized their single-retailer search functionality, demonstrating how traditional relational databases can be adapted for complex search operations. His expertise in database systems and their practical applications in high-scale environments makes this conversation particularly valuable for engineers interested in modern search architecture and database optimization.
Quotes
"Think about it. If there's a lot of things that you can get the database to do, then the applications become simpler." - Ankit
"My non-Instacart experience has largely been in pre-PMF startups where the approach of abuse your database to its absolute limits works wonders." - Ankit
"Almost everything that we got retrieved had to be filtered out. So we go back to Elasticsearch again." - Ankit
"We traded off the quality of retrieval, hardcore core retrieval, with the whole system reducing the network calls." - Ankit
"It's a place to go to find what item is available, in what store, what item is available, at what price, including full product taxonomy graph and product and ontology." - Ankit
"The grand theme here is that we wanted more control over the cluster, how to spin it off, what kind of disks it would have." - Ankit
"We tell teams who want to have their data in this cluster, create an s3 home, create either a bucket or a home, whatever they want to do, and tell us that we would sync ourselves." - Ankit
"What we found is that the read throughput, we can throw more data if the tables are repacked nicely." - Ankit
"Most engineers who want to work on search, they are more used to the Elasticsearch shape of the query." - Ankit
"The relevance is better because they could join more things in the database. They also saw the cost of the normalized data reduced." - Ankit
Resources
Company Websites:
- Instacart - Grocery delivery platform
- ParadeDB - Database technology company
- Firebolt - Cloud data warehouse (firebolt.io)
Tools & Technologies:
- PostgreSQL - Database system
- Elasticsearch - Search engine
- PgCat/PgDog - PostgreSQL connection poolers and proxies
- pgvector - PostgreSQL extension for vector similarity search
- pg_repack - PostgreSQL extension for repacking tables online
- ClickHouse - Column-oriented DBMS
- Tantivy - Rust-based full-text search engine library
Articles:
- Instacart Search Modernization Blog Posts (Series on hybrid retrieval)
- Target's AlloyDB Migration Blog Post

The Data Engineering Show is brought to you by firebolt.io and handcrafted by our friends over at: fame.so

Previous guests include: Joseph Machado of LinkedIn, Matthew Weingarten of Disney, Joe Reis and Matt Housley, authors of Fundamentals of Data Engineering, Zach Wilson of Eczachly Inc, Megan Lieu of Deepnote, Erik Heintare of Bolt, Lior Solomon of Vimeo, Krishna Naidu of Canva, Mike Cohen of Substack, Jens Larsson of Ark, Gunnar Tangring of Klarna, Yoav Shmaria of Similarweb and Xiaoxu Gao of Adyen.
