
Production-Grade AI Systems with Fred Roma

Jan 27, 2026 · 52 min

Summary

This episode features Fred Roma from MongoDB, discussing the complexities of taking AI applications to production. He highlights how MongoDB's expanding platform, including the acquisition of Voyage AI, provides integrated capabilities for data management, search, real-time analytics, and AI-powered retrieval. The conversation covers schema evolution in the LLM era, the importance of vector search and reranking for accuracy and cost-effectiveness, and the need for data platforms to adapt to AI's rapid pace while maintaining security and organizational flexibility.

Episode description

Engineering teams around the world are building AI-focused applications or integrating AI features into existing products. The AI development ecosystem is maturing, which is accelerating how quickly these applications can be prototyped. However, taking AI applications to production remains a notoriously complex process. Modern AI stacks demand LLMs, embeddings, vector search, observability, new caching layers, and constant adaptation as the landscape shifts week to week. Increasingly, the data layer has become both the foundation and the bottleneck to AI app productionization.

MongoDB has been expanding beyond its core document database into a full AI-ready database platform with integrated capabilities for operational data, search, real-time analytics, and AI-powered data retrieval. The company also recently acquired Voyage AI to provide accurate and cost-effective embedding models and rerankers to its users.

Fred Roma is a veteran engineer and is currently the SVP of Product and Engineering at MongoDB. He joins the show with Kevin Ball to talk about the state of AI application development, the role of vector search and reranking, schema evolution in the LLM era, the Voyage AI acquisition, how data platforms must evolve to keep up with AI’s breakneck pace, and more.

Full Disclosure: This episode is sponsored by MongoDB.

Kevin Ball, or KBall, is the vice president of engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript meetup, and organizes the AI in Action discussion group through Latent Space.

 

 

Please click here to see the transcript of this episode.

Sponsorship inquiries: sponsor@softwareengineeringdaily.com

The post Production-Grade AI Systems with Fred Roma appeared first on Software Engineering Daily.

Transcript

Introduction to AI Production Challenges

Engineering teams around the world are building AI-focused applications or integrating AI features into existing products. The AI development ecosystem is maturing, which is accelerating how quickly these applications can be prototyped. However, taking AI applications to production remains a notoriously complex process. Modern AI stacks demand LLMs, embeddings, vector search, observability, new caching layers, and constant adaptation as the landscape shifts week to week.

MongoDB's AI-Ready Data Platform

Increasingly, the data layer has become both the foundation and the bottleneck to AI app productionization. MongoDB has been expanding beyond its core document database into a full AI-ready database platform with integrated capabilities for operational data, search, real-time analytics, and AI-powered data retrieval. The company also recently acquired Voyage AI to provide accurate and cost-effective embedding models and rerankers to its users.

Fred Roma is a veteran engineer and is currently the SVP of Product and Engineering at MongoDB. He joins the show with Kevin Ball to talk about the state of AI application development, the role of vector search and reranking, schema evolution in the LLM era, the Voyage AI acquisition, how data platforms must evolve to keep up with AI's breakneck pace, and more.

Kevin Ball, or KBall, is the Vice President of Engineering at Mento and an independent coach for engineers and engineering leaders. He co-founded and served as CTO for two companies, founded the San Diego JavaScript Meetup, and organizes the AI in Action discussion group through Latent Space. Check out the show notes to follow KBall on Twitter or LinkedIn or visit his website, kball.llc.

Fred Roma's Career Journey

Fred, welcome to the show.

Hey Kevin, great to be here. Thanks for having me.

Yes, I'm really excited to dig in with you on this. So let's maybe start with a quick background on you and how you got to where you are today at Mongo, and then maybe we can use that as a way in.

Yeah, absolutely. So I started as a software developer. It was a long time ago, in France, in Paris. I evolved, I became a manager and also worked on the product side, and I worked on different continents, at small startups and big companies. The biggest one was AWS. The small startup is probably one you haven't heard of, a French startup.

I would say most of my career has been in the cloud, even before we called it cloud, when it was more like application service providers: plugging in some servers and giving customers access to those servers. And mostly around security, things like payment, identity, and encryption, and more lately at MongoDB on data management and AI. So I'm having some fun here.

Building Production-Grade AI Applications

Yeah, so let's dive straight in there. We're talking about data management and AI. I think everyone's trying to figure out what it takes to build an effective AI application right now with these new tools. What's your take on the core pieces you need?

Yeah, I mean, it's never been easier to vibe code an application. That's for sure. And that's exciting. I don't know about you, but I just love it. It's going so fast and that's so thrilling.

What we see, though, is that when you want to build something that is really for production, a large consumer application or a large enterprise application, it's still very hard. And I think we could go through a couple of reasons. There are other blockers later on when you want to launch in production, but the first one, when you build, is just how complex the stack is now.

You need an LLM, you need a vector search, you need different kinds of AI models, you need an AI framework, you need new caching mechanisms. And when you leave the vibe coding piece and really want to build that in a professional manner, that can be a bit scary.

Yeah, absolutely.

Key Principles for AI Data Stack

Well, let's talk about the angle that I understand you all are addressing, which is the data side of it. Because to your point, right? I have my database. I might have to do some embedding. I need a vector search situation. I need all those different pieces. So how do you think about the data stack for AI? What needs to be in there?

When we think about the data stack, I think there are three things. You'll tell me which one you want to dive into first. The first one is that it should be simple, simplified as much as possible. The second one is that you need to make sure it's really accurate and cost-effective, because information retrieval can be pretty expensive if you don't take care of it.

The third part is that you want to make sure it can evolve quickly. Things are going so fast. You never know which is the best LLM. You just unplug for two days and there's a new one. You never know which tool you need to use, and you never know how fast your data will grow.

So I think those would be the three things. We want the stack to be simple, we want the accuracy of the information retrieval to be really, really good, and we want to make sure that if you change your mind, or if the ecosystem is changing, you can touch your application and make it evolve easily. Those would be the three keys.

Schema Evolution in LLM Era

Let's actually talk about that last piece, the evolvability piece, because this is one of the things I see a ton as we're doing AI agents internally, in the things that I'm working on. Schemas are not nearly as durable as they once were.

Yeah, no, absolutely. They're not durable either because you change your mind or because you pivot your application as a developer, and that's totally fine. They're also not durable because the ecosystem is changing so fast.

When you want to connect to a new partner or new integration points, or even use a new LLM framework, maybe you will have a new field to account for. Or maybe you want to adopt new observability for LLMs to do really good evaluation. So yeah, that is changing super fast, absolutely.

I think there's even an aspect that I'm starting to see, which is LLM-derived schemas, right? Instead of having one mega schema that the developers come up with, let the LLM choose.

No, absolutely, yeah, it's definitely a trend right now. Plus, when you say let the LLM choose, the reality is that you may have to play with several LLMs. So you may have to be able to handle several schemas, either because the LLM that was best this month is different from the one that was best last month, or because for some tasks you still need some specialization. You still have LMs. And I think that will be more and more of a trend.

You will have LLMs that are really good at some specific industry or use case, and others that are really good at other things. But no, absolutely. It's been MongoDB's value proposition forever: go with a document model and you don't have to stress about any change you will want to make. So I mean, we love it.

Just to be clear: we love that the whole AI world is speaking JSON, and we love that the whole AI world is coming with all of these changes. It was already important before to be able to evolve quickly, but that's even more the case now.

MongoDB's Enterprise Scale Capabilities

So let's talk a little bit then about the second step that you talked about: okay, we want to take this to production. We want to be able to scale. We want to be able to deal with all of these different things. Because I think Mongo has long been, at least in the front-end world where I used to live a lot, the database of choice for rapid prototyping. And then at some point people would sometimes say, oh, well, now we've got to switch over, we've arrived where we're going. But I think at this point y'all scale all the way up, yeah?

Oh yeah. Now we have, I think it's 75 percent of the Fortune 500.

I've been at MongoDB for a year and a half and it started long before me, but I can tell you we have always been big on security, durability, availability, and performance, on being able to speak to large enterprises, and we see that more and more. We see more and more of these very large workloads. You can totally manage ACID transactions, you can totally manage these very strong transactional financial workloads.

MongoDB is open source. I was a customer of MongoDB long before even considering joining the company. And so I also had that initial impression: oh yes, I remember, we were going so fast. But it's a very serious database for scale.

Integrating Data for AI Applications

So let's then talk about how to effectively use it in an AI application, because I think this is a space where, okay, the model layer is pretty well understood, if changing very, very rapidly, right? You have these LLMs, you can throw text at them, they love JSON, all these different pieces. And then there are these cool applications being developed, but all those middle pieces and, as you highlighted, the complex stack that's going into that are very much in flux.

Absolutely. Yeah, absolutely. So even past what we discussed before, this document model and JSON that is very well adapted and optimized for AI, you still need a search, a vector search. Because you don't have any serious AI application that will really give you a lot of value if you just plug in an LLM. The value of these AI applications is: how do I connect what the LLM knows with what my company knows?

And if you want to do that, you usually need a search, and a vector search. Maybe we can come back to that as well. But say you are building an optimized e-commerce website and a user is looking for red shoes: you probably want them to see these burgundy sneakers as well. So you need search, and you need vector search.

You need AI models, embedding models and reranking models, because this is how you get very good information retrieval. Whether you are building your RAG application, your semantic search, or your agentic system, this is how you make sure that the right information and grounded results are provided to your user.

Depending on what you are building, you may need some stream processing if you have events. You may need many of these things. And so as a customer, you can choose to stitch together different solutions: a database, a vector search, a search, a reranker, an embedding model, etcetera.

I don't think you should do that. I don't think you should tell your best friends to do that. But you could stitch these things together, connect your identity providers multiple times, and create a pipeline to transfer the data. What we are really betting on, and what we see bringing value to customers, is just making it super simple.

You have a database. We don't reinvent the world with the whole stack you need for AI, but on the data layer, you have a database that is becoming a data platform. And so you can do, yes, storing and querying of your information, but also information retrieval, data in motion, and all of these AI model optimizations to make sure that you get good results.

AI Products as Advanced Search

Let's go a little bit deeper into what that takes on search. I love that you brought up search. One of the things that I'm seeing is that everybody's thinking of these as chat products. They're not chat products. They're search products at their core. They're about surfacing the correct information, and LLMs help you interpret that and put it into context for someone. So it's not as simple as throw it at an open source embedding model or OpenAI's embedding model, run naive queries, and just go. There are a lot of pieces that go into effective search.

Absolutely. Absolutely. We can maybe take a concrete example, a simple one.

Let's say you are a bank and you are building an application for your customer support, and your bank's customers will be able to ask questions about their account, their credit card, etc. Obviously you need an LLM, because the full interaction, the conversation, is an LLM interaction, so you need that for sure. And if you don't have access to the internal documents of your bank, it's probably a bit weak as an experience.

So you need search. I would say you need both. You probably need vector search, which is really the semantic search, because if I'm asking you, okay, how much money do I have in my account? Maybe the documents don't say "how much money" but, say, what is the "solde" (balance) of my account? Those are very different words, in very different languages. So you will need search and vector search.

You will probably also need some kind of reaction to events, because if you see that something has been paid in the last two minutes and we have been talking for five, you want to take that into account. So you need all of these pieces to come together: the search, the vector search, and the stream.

Voyage AI Embedding Model Features

So let's talk about the pieces that go into vector search, because I think, once again, folks who are coming into this at the beginning think, okay, what do I need to know? Right? The first interactions I had with search were back in the days of Solr and Lucene, right? In those days...

Absolutely.

...there was no vector search. It was all keyword-based, but you had some synonyms and things like that. But to me as an end user, I was like, okay, throw it all at Solr, do a couple of configurations, and I'm good. It's golden. It serves things up.

I think there's more nuance than that. If you just throw all of your documents at OpenAI's embedding model and assume it's going to work, it's not gonna just work. So what are the different pieces that go into that?

Yes, there are a couple of different pieces. I will start with the basics. Different models will have different accuracy. So there is the quality of the results, and it's always a trade-off between how fast you want the results and how good you want these results to be.

And the best embedding models, and that's exactly why we acquired Voyage a bit more than six months ago now. They were doing, and they're still doing, the best embedding models out there. They are more accurate than the OpenAI one that you mentioned.

So you absolutely want the best results with a reasonable latency, because you probably have a user behind a chatbot, or maybe an agentic system, waiting for results. So accuracy is a big one. Multimodal is a big one as well. Most embedding models will be able to do a good job comparing text, or maybe pictures, but the real world is messy. You have images and text and videos combined, and you will have PDFs, and as a developer you probably don't want to break those things down and call different models. So that's also one: what are the formats that you can support? I would mention two more things. So yeah, accuracy, multimodal.

The ability to understand context is also very important. For instance, let's say you have this support application, a chatbot, and now you are a networking company and I say, okay, how do I configure this router?

If you find somewhere in your corpus of data an exact sentence or blob of text that explains how to configure that router, most embedding models will be super happy: oh, I found the information. The best embedding model will be able to say, well, wait a minute. I found a line, a sentence that looks exactly like what the user is asking for, but is it part of recent documentation, or is it part of a ticket that is five years old where the setting is totally outdated?

So we'd say context is very important. And last, and that's last on my list but probably sometimes first in the customer's mind, is the cost of it. People sometimes don't realize it, but these embeddings can be even bigger than the data that they represent. Say you have a great model that is able to do all of that (and you don't have many of them), with good accuracy, multimodal support, and good context, but the embeddings have to be so big to achieve these results: it would just be an awful ROI for your application. So you also want a model that is able to do that with really short embeddings, as that would be cheaper to store and cheaper to query.
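The point about embeddings being larger than the data they represent can be checked with simple arithmetic. The sketch below is illustrative only: the corpus size, chunk size, dimensions, and value widths are assumed numbers, not figures from the episode or from any specific model.

```python
def embedding_storage_bytes(num_chunks: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Raw size of the vectors alone, ignoring index and metadata overhead."""
    return num_chunks * dimensions * bytes_per_value


# Hypothetical corpus: one million chunks of roughly 1,000 bytes of text each.
num_chunks = 1_000_000
text_bytes = num_chunks * 1_000

# Assumed configurations: (dimensions, bytes per value, label).
configs = [
    (3072, 4, "3072-dim float32"),
    (1024, 4, "1024-dim float32"),
    (256, 1, "256-dim int8"),
]

for dims, width, label in configs:
    size = embedding_storage_bytes(num_chunks, dims, width)
    ratio = size / text_bytes
    print(f"{label}: {size / 1e9:.2f} GB of vectors, {ratio:.2f}x the size of the text")
```

Under these assumptions, a large float32 vector per chunk is several times bigger than the chunk it describes, while a short quantized vector is a fraction of it, which is the storage side of the ROI argument.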

Multimodal and Contextual Embeddings

Yeah. So that's interesting. I kind of want to explore a few of those aspects, and maybe we can explore them in the context of Voyage, because that is the recent acquisition and, as I understand it, that was kind of the secret sauce there. So first, starting with this multimodal piece, right? If I think about an application I'm building, I am probably doing a fair amount of preprocessing, right? I'm like, oh, this is an image. I've got to translate this image to text, and now I've got to send this text over to my embedding model. And now I've got to take that and do my vector search. It's this whole pipeline of things.

Absolutely.

But it sounds like what you're saying is they've got a multimodal model that you can just throw whatever at and it's gonna translate.

Yeah, you nailed it. That's exactly what customers were doing, and when we speak to customers they describe exactly that. Going back to the PDF example, they say, oh, I have this whole pipeline: I extract my pictures and my text from my PDF, then I run them through different kinds of embedding models, and I try to reconcile the results afterwards.

With a Voyage multimodal model you just throw your PDF at the embedding model and you will have an embedding. By the way, the result will be even better than when you are doing the whole pipeline, because when you break your document down, you lose some context, you lose some interaction. Where was this picture exactly? Was it above this text or below this text, that kind of thing.

So the result would be better, but the big benefit is also that as a developer you can go super fast. Just use your document and your embedding model and you're done.

Yeah, the simplicity definitely appeals to me. So let's explore the context piece a little bit more, because what I hear you describing is that your embedding is taking into account not just the text, or in the case of the PDF not just text and images, but it sounded like things like metadata, updated timestamps, all these different things. What does this API call look like? What do I pass to it?

Oh, it's even more than that. What you describe, by the way, the fact that you also want to take into account the metadata and other information in addition to purely semantic search, is super important and top of mind for customers, and that's why they're using vector search and search combined. And that's why it's so important for search and vector search to be where your operational data is. If you're using a separate vector search, you have to ask: what is all the data or metadata I also have to synchronize to it? No: when your database is there, you can do all of the things you are mentioning.

The context is a bit different. It's part of the embedding model; it's just the way the model is trained and what it does at inference time, instead of just isolating a chunk of text from your document. Maybe I should step back. When you run an embedding model on a text, you cut it up. You don't give it two pages of a document. You chunk the document into small sentences or blurbs; we call those chunks. Then you have an embedding for each chunk, but the model doesn't really know what the chunk before is and what the chunk after is. What the Voyage context model is doing, and that's a specific one that we released, is that it parses the full document and preserves some context in addition to the specific chunk.

So yes, you will know, for instance, that this sentence really explains how you can configure this router, this security configuration maybe. But that's also how you know that you are part of an old ticket, because you also see, maybe three or four chunks above, that it looks like a support ticket and that it is six years old, and that you probably shouldn't give it too much importance.

Got it. So conceptually, if I'm gonna just try to map this out: if I were building this with a much more naive model, it would look something like, okay, I have a summary of the whole document with maybe some additional things, and then I have each chunk, and then those two things are getting put together, kind of linked in each set. Interesting.

Yeah, that's a great way to look at it, yeah.
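The naive mental model described here (a document-level summary paired with each chunk) can be sketched in a few lines. This is only the approximation from the conversation, not how the Voyage context model works internally; `chunk_text`, `contextualize`, and the sample ticket are hypothetical names and data made up for illustration.

```python
def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into fixed-size word windows (a deliberately simple chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def contextualize(document: str, summary: str, max_words: int = 40) -> list[str]:
    """Pair every chunk with a document-level summary before it is embedded."""
    return [
        f"[document context: {summary}]\n{chunk}"
        for chunk in chunk_text(document, max_words)
    ]


# Hypothetical old support ticket: the date only appears in the first chunk.
ticket = (
    "Support ticket opened in 2019. "
    + "To configure the router, open the admin page and set the security mode. " * 5
)

inputs = contextualize(ticket, summary="Support ticket from 2019 about router security settings")
for text in inputs:
    print(text)
    print("---")
```

Each string in `inputs` would then be sent to whatever embedding model is in use, so that a chunk explaining the router setting still carries the signal that it comes from an old ticket.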

Search, Vector Search, and Reranking

Fascinating. Well, and you alluded to another piece of this, which is combining searches. And that gets us into this topic of reranking and all of that. So can you maybe lay out what that looks like, just broadly, for folks who haven't built these applications before, and then what the Voyage take is on it?

Yes. It really depends on what you are trying to achieve in your application, but here is what we see most of the time, and I'll give you one example. Your user is asking for something through a chatbot, through an agent, etcetera, and you want to give the best, most accurate results.

Combining a keyword search, looking for documents with the exact keywords that were part of the query, with a semantic search, meaning documents that may have very different keywords but are speaking about the same topic, is how you get the best results. As an example, I'll go back to my red shoes. Let's say I'm looking for Nike red shoes.

Maybe for red shoes it is totally okay to go with burgundy sneakers, because that's almost the same. Now, if as a user you made the effort to specify Nike, it may be very important to you, and you really want to make sure that you are matching the keyword Nike. So let's match on the keyword Nike, but let's only look at the semantic meaning of red shoes, and maybe burgundy sneakers are perfectly fine.

This is just a very simple example and it doesn't fit all use cases, but for most of them the best accuracy comes from combining both.

And that's why having search, vector search, and the database in the same place is a big deal. You remove a lot of round trips to do that.

Absolutely. Yeah, you're right, that's a very important point that you are touching on. I'm not saying that you couldn't do it with multiple pieces. You could. But then you have to run your keyword search, you have to run your semantic search, and you have to build your own algorithm to decide how you rank the combined results. So that's exactly a lot of hurdles that you are removing.

So implementation-wise, if I was using your API, am I able to specify parameters, you know, run this against these two searches, rerank in this way? What are the knobs that I have available as a developer?

Yeah, so we didn't reinvent the wheel, by the way, with either search or vector search. You use the MongoDB aggregation pipeline, the one you are already using to query data in your database, but we created new operators: score fusion and rank fusion. I won't go into the details, because these are just slightly different ways to merge the results.

So first you have one operator where you can combine keyword search and vector search, but you have full control over how you want to do that. Many customers are happy with the basic way to combine, but some say, okay, I want to have different weights, I want to overindex a little bit on this keyword and a bit less on this one. So it's up to you. But you just use the MongoDB aggregation pipeline.
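To make "merge the results, with weights" concrete, here is a generic weighted reciprocal rank fusion in plain Python. It is a common textbook way to combine ranked lists and is offered as an assumption about what a rank-based merge looks like, not as MongoDB's implementation. The document IDs, the `weighted_rrf` helper, and the constant `k = 60` are illustrative choices.

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists: each list contributes weight / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for name, ranked_ids in rankings.items():
        weight = weights.get(name, 1.0)
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results for the query "Nike red shoes".
rankings = {
    "keyword": ["nike-red-runner", "nike-blue-runner", "nike-burgundy-sneaker"],
    "vector": ["acme-burgundy-sneaker", "nike-burgundy-sneaker", "nike-red-runner"],
}

balanced = weighted_rrf(rankings, weights={"keyword": 1.0, "vector": 1.0})
semantic_heavy = weighted_rrf(rankings, weights={"keyword": 1.0, "vector": 3.0})

print([doc_id for doc_id, _ in balanced])
print([doc_id for doc_id, _ in semantic_heavy])
```

With equal weights, the document that ranks well in both lists wins; shifting weight toward the vector list lets the semantically close burgundy sneaker overtake it, which is the kind of knob described above.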

Maybe it's worth actually stepping back and talking about that aggregation pipeline a little bit, because that is a capability that doesn't exist in all databases.

Yes, absolutely. If you have used MongoDB, that's something that customers usually really like. It gives you the ability to run several operations on your database, one after the other, and the result of one operation can be used as the input for the next operation, so it can be really, really powerful. And that's the case for search as well. You can use the combined operators as I was describing, but you can also decide to do some search, then do some reranking somewhere else, and then do some other things. So it's really a pipeline that you can implement to play with your data.

Yeah, so just to echo back, right: if you were using another database, you might have an external pipeline tool where you're defining a series of stages with dependencies and data transfer, and moving that around. And with the aggregation pipeline, you can do that all inside of the database.

You can do that all inside the database because search and vector search are natively integrated with the database. You don't have to move the data around, you don't have to use a different aggregation pipeline, and you don't have to use a different CLI. You can really have all of that as a single experience. It's not like an extension where you have to be careful about how it will be supported or how you can plug things together. It's the same tool, and it gives you access to all of this.
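As a rough illustration of stages feeding one another, the sketch below builds an aggregation pipeline as plain Python data: a rank-fusion stage that combines a vector sub-pipeline with a keyword sub-pipeline, followed by ordinary `$limit` and `$project` stages. The stage and field names follow my understanding of MongoDB's documented `$rankFusion`, `$vectorSearch`, and `$search` stages and should be checked against the current documentation; the index names, field paths, and the `hybrid_search_pipeline` helper are hypothetical.

```python
def hybrid_search_pipeline(
    query_text: str,
    query_vector: list[float],
    keyword_weight: float = 0.5,
    vector_weight: float = 0.5,
    limit: int = 10,
) -> list[dict]:
    """Build (but do not run) a hybrid search pipeline as a list of stage documents."""
    return [
        {
            "$rankFusion": {
                "input": {
                    "pipelines": {
                        "vectorPipeline": [
                            {
                                "$vectorSearch": {
                                    "index": "products_vector_index",  # hypothetical index name
                                    "path": "embedding",  # hypothetical vector field
                                    "queryVector": query_vector,
                                    "numCandidates": 100,
                                    "limit": 20,
                                }
                            }
                        ],
                        "fullTextPipeline": [
                            {
                                "$search": {
                                    "index": "products_text_index",  # hypothetical index name
                                    "text": {"query": query_text, "path": "description"},
                                }
                            },
                            {"$limit": 20},
                        ],
                    }
                },
                # Per-pipeline weights: the "overindex on keyword" knob.
                "combination": {
                    "weights": {
                        "vectorPipeline": vector_weight,
                        "fullTextPipeline": keyword_weight,
                    }
                },
            }
        },
        # Later stages consume the fused results of the stage above.
        {"$limit": limit},
        {"$project": {"_id": 0, "name": 1, "description": 1}},
    ]


pipeline = hybrid_search_pipeline("nike red shoes", [0.1, 0.2, 0.3], keyword_weight=0.7, vector_weight=0.3)
print([next(iter(stage)) for stage in pipeline])
```

With PyMongo, a list like this would be passed to `collection.aggregate(pipeline)`; the query vector would come from the same embedding model used to index the documents.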

And just so that we're clear, those things can be defined on the fly. If someone, for example, was giving the LLM the keys to the kingdom, it could write its own aggregation pipeline and run it.

Oh, so yeah, we did release an MCP server. That's really trendy these days, and it actually is really effective as well. The adoption is pretty nice. I'm mentioning the MCP server because you mentioned LLMs, and more and more that's how developers are interacting with our database. You can absolutely create clusters and run operations, but also configure your aggregation pipeline.

Security for AI Applications

Let's talk a little bit about security. You mentioned security as an area that you had dealt with. And I think this question of what LLMs are allowed to see and what they are not allowed to see is definitely top of mind for those of us building applications here. What are the primitives that are, I'm guessing, baked into the database to allow you to build secure AI applications on top of Mongo?

So I I would say I want to step back on the overarching principle because you touched on LLM and what an LM can can see. That's exactly because you don't want probably to train an L L M on your private data. that the the pattern, the architecture that is winning out there is no. I will not I mean uh there are exceptions, but I will I will not fine-tune or or post-train my LLM on my data. I will use an LLM, really good at again, everything they're great at.

But for my use case, when a user wants something, then I will connect what this LLM knows with what my company or my application knows. That's not about giving anything to the LLM. It's about your application being able, with very good information retrieval, to say, Okay, my user is asking me again for my what is my uh credit limits on my credit card?

A lot of things can be handled by the LLM in terms of how to answer to a user asking questions, but if I really want this information I also have to go and find my private information of my bank about what is the real limit.

And I will provide this information to my user combined, but the L L M will never see this private information. So I uh maybe just to clarify the overall pattern before going into the security uh details, that that's super important. I think that is very important in terms of

What sets of data are, I think the term I sometimes use is, moderated through the LLM? So the LLM may be the UX delivery, but do I load this data, pass it to my model, and have it present it, or do I sidestep around the LLM because this has to be right and I can't count on it not hallucinating something about it?

Yeah, absolutely. And there are different patterns there. Most of the time, what customers will do is this: if a user or an agent wants to do something that requires information retrieval, you first look for the information that will be relevant. I'll stick with the same example: what is the internal document that explains the limits on credit cards? And then you insert that in the prompt of the LLM.

So yes, that will go through the LLM, in the prompt and in the LLM's answer. But it's not stored anywhere on the LLM side, and it has not been part of the training on the LLM side. And you decide, it's up to you as the customer, where you want this LLM to be hosted and where you want these queries and tokens to be served. So you can decide depending on your security requirements and your sensitivity.

A lot of customers say, okay, I'm totally fine with having my LLM on AWS or OpenAI or AGO, etcetera. Some customers will say, no, I want to do that, but I want a specific security agreement with these providers to make sure that my data is never shared. And some customers say, you know what, I want to host my own LLM. You can do that if you want. What is important is that nowhere is there a blending, if you want, of how the LLM is trained and your private information.
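The retrieve-then-prompt pattern described in this answer can be sketched in a few lines of Python. This is a minimal illustration, not MongoDB's implementation: the `DOCUMENTS` list, the word-overlap `retrieve` function, and the prompt wording are all assumptions made up for the example, and a real system would use vector search instead of word overlap.

```python
import re

# Hypothetical in-memory corpus standing in for private internal documents.
DOCUMENTS = [
    "Credit card limits: standard cards have a limit set per customer by the risk team.",
    "Branch opening hours are 9am to 5pm on weekdays.",
    "Lost cards must be reported through the mobile app or by phone.",
]

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query, documents, k=1):
    """Rank documents by word overlap with the query (toy stand-in for vector search)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=1):
    """Insert only the retrieved private context into the prompt sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

prompt = build_prompt("What is the limit on my credit card?", DOCUMENTS)
print(prompt)
```

The application stays in control of what reaches the model: only the retrieved snippet travels in the prompt, and nothing is used for training.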

So coming back, though, to building applications with these pieces, I'm curious to understand how you're seeing people define these lines or barriers. Is it changing at all in terms of how you're managing security at the data layer?

I mean, customers trust us with their data as a data platform, so security has always been top of mind anyhow. But I would say that with these AI-specific applications we see even more discussions about this LLM integration, the one you spoke about before.

And you know, one of the big values of MongoDB is that you can run it anywhere. And when you say run anywhere, sometimes people tell us, oh, you are cloud agnostic, you can run it on AWS and GCP and Azure, etcetera. Yes, that's true. We can also run it on premise.

You know, we have an enterprise offering, and you can run that in your own data center. And we see customers that are doing that. What is very interesting, I think, in this AI world is that we see customers saying, you know what, for this use case, I'm totally fine if it's in the cloud. But for this other use case, I really want to make sure that my data is never in any cloud provider and never touched by any LLM provider either, and so I will run it on prem.

And again, I think it's so early and things may evolve, etcetera, but the fact that they have a choice, and that they can decide whether to rely on a cloud provider or their own data center, I think is pretty interesting, and I do believe it's more and more a discussion topic.

That's fascinating. And when you're seeing that, are you seeing them doing this within the context of the same application? So you're having to kind of have security boundaries and federation and however that's working? Or like how does that work?

I'm not sure I can extract one pattern, one answer to that. You have customers that will say, okay, all of my data will be on prem, and then some applications will be in the cloud and some won't.

I see some customers saying, you know what, I want my data to be in my data center, so I want MongoDB Enterprise Advanced and that's how I manage it. But I'm still okay to make a call to an LLM outside of my boundaries, because they believe that they can control the prompt, and they are right in many cases.

So it's okay to send some information as long as it's something that your application controls, but at least nobody's seeing the raw data. So I really see different kinds of patterns out there. I think what is very important, though, is that overall there is really this intent to remain as flexible as possible. I'm back to the previous point: maybe this LLM is good for me right now, but in six months it will be another one.

Maybe this cloud hosting is good for me right now, but then I will want to go on prem because of a regulation or a specific concern later on. I think there is really a willingness to remain as flexible as possible and to have options on the table.

Rethinking Team Organization for AI

That gets to kind of a somewhat different topic, but you know, I think one of the things that these tools are doing is they're changing the speed at which people are operating. They're changing kind of how fast we're moving. They're changing how adaptable we need to be. How are you, both internally and with your customers, rethinking the way that we organize the teams doing this?

Oh yeah. Okay. I mean, there is a product angle. I thought we were initially going to the product angle, and we can go there super quickly, but I want to get to the team organization too. I think that's super important as well. From a product angle, I'll go back to this: if you want to go fast but you have to move your data around, well, any of us who have been software developers or architects at one point know that when you have to optimize for latency and performance and iterate quickly, network layers and data to transfer are just a nightmare.

So we say that's one of the key arguments, not just for MongoDB, though I think that's a big value of MongoDB. Overall, having the database, vector search, and search in the same place, so you don't have to do this ETL and this kind of difficult network configuration in between, I think that's a big one for your question about the speed of development as well, right? Now, if your question is going more to the team organization, I think that's a very good question as well.

Two things. I think what's happening right now is that it can also be very thrilling when you are a product manager or a business person to say, oh, I can build it myself, I can build fast. And personally, there is something I love about that. What I love about that is that instead of trying to debate about maybe some text and some PowerPoint, you can really show. I love that. I think the risk is to believe that it's easy.

Yes, it's easy to show something. That's the old thing, you know, that engineering and product managers sometimes have to align on. I think it's great if you use that well. If you use that as a way to say, oh, just put it in production, well, for some stuff, yes.

But how will you scale? Because that's all of these tools, right? They're making writing code super fast. Well, reviewing code is not faster, making your security assessment of this code is not faster, and defining the right architecture is not faster yet.

So I think it can be awesome if you know what it is. It can be a bit dangerous if you believe that that's it and I can just push it. Sorry, I don't know if I pivoted a little bit from your question, but that just made me think of this point.

No, I think it is key, and it kind of gets to a couple of different questions, some of them related to the product piece and some not. So like one of these things is that going from zero to a prototype I can show is now very fast.

Absolutely.

And if you put some restrictions in place, some things, if I was hearing you correctly, you can then actually take out to production. Some things are able to ship relatively easily, but others are not. So I guess the place I would start to ask you is, how do you draw those lines? And are there ways in which having, for example, a database that can handle all these different pieces together makes it easier to bridge that gap?

Yeah, and I will obviously be biased because I'm working at MongoDB right now. It's a database, and we are not vibe coding a database. I mean, there's too much at stake.

You're not?

No, we are not. We are not vibe coding. Now, that doesn't mean that we are not leveraging AI for many things. For prototypes, one hundred percent.

For this product and engineering alignment too. By the way, I didn't answer your question about team organization, but maybe later, we can put a pin in that. But also how you handle tickets, how you onboard. I think people, even very senior engineers, when you ask them, say, oh, I have to discover this new part of my code base that I haven't touched in a while, and I can go so much faster now to understand it. So I think there are many, many benefits as well for real products, but the code that is written for production, I think we are still doing it very manually, for the core database for sure. And then I would say there are more internal things, or stuff that is maybe a bit less sensitive, where you can for sure go faster with, I don't know if vibe coding is the right term, but AI-assisted coding.

I still believe that for the security audit and the observability, there's still a lot to do. It's not ready yet for these AI tools, in my opinion, but they can help you prototype and align on tickets and align on requirements. I think that's pretty impressive.

Do you have a line in your product, like, within this it must be handwritten, outside of this AI assist is okay?

Yeah, all of the core database and core product of MongoDB, we don't vibe it, we don't AI it.

The code is written manually, each line by a developer, and we are doing the code review as we did and the security review as we did, that is for sure. We can use, again, some AI tools for some security-finding kind of things. I mean, we can get some help, as many companies do, but I would say most of what we experience with AI is much more in the management tools around that. That's still a lot.

All the management tools around that, all the internal tools, all the before-coding and after-coding pieces to go faster, we use AI more and more there, but we don't vibe code the core MongoDB database, for sure.

So that's obviously probably pretty different from a very new, not so security-focused startup. So thinking about that, has it changed anything about how you're internally organizing your teams, or does it still take just as many people because the core has to be so kind of solid and locked down?

Yeah, no, that's a great one. I would say there's something I'll mention first, maybe as a cultural kind of thing that I came to really enjoy about MongoDB. And I saw that in my previous company at AWS as well. It's that even people who are not engineers are pretty deep technically. We have product managers trying to build their MCP experimentation. I think, culturally speaking, that's the case, and you can see that because you have people moving from engineering management to product management roles easily.

So I just want to say yes, in this context. Because I think that if you are in a different context, which can also have pros and cons, maybe that's different. But I think in that case it can really fast track the alignment on what you want to build.

Because, in the example you took before, instead of just describing in a long document, oh, that's the UI I want, you can just do it, and then you can align, and then you can discuss how to do that. So yes, it does fast track the alignment between product and engineering, no doubt.

We went pretty fast in terms of organization. We made the decision a few months ago now: we don't even have an engineering and a product organization anymore. We have a product and technology organization. It's not just because of AI, but that was part of it.

I think there were two big objectives. One, how to make sure we can fast track the decision making, the alignment, and the sharing of information between the product decision and the engineering decision, because they are the same eventually. So that's number one. And number two, how to make sure that everyone is customer obsessed.

And even if you're an engineer, you should be customer obsessed. And I do believe AI helps with that, by the way. You can really see what your product manager has in mind. We are using that to really show reports from customer discussions, advisory boards, these kinds of things.

So we really made this leap. You are touching on the organization, and I guess we probably would have done that anyway, but with these AI tools and ways to collaborate, I think it's helping product and engineering to be in the same kind of smaller team, where before it was a product organization and an engineering organization.

Yeah. No, I think there is definitely something there, and I have been in a lot of conversations kind of talking about this convergence between engineering and product.

Yeah.

Whether that looks like more technically minded product people or more product-minded technology people. As code becomes, maybe not in a database core, but in many contexts, more of a commodity, it means that the product mindset is more and more important.

I think that's spot on. I would have a bit of a nuanced take on this one. I love that product people can be a bit more engineering, or engineering people a bit more product, or the wording you used. I love that because, again, I think as an engineer that's a great way to be customer obsessed and to be focused on the outcome and what you are trying to achieve, more than just trying to articulate what you even think kind of thing. However, the expertise doesn't go away.

Like if you're a product person and you are meeting many, many customers a week, you will have an expertise about reading between the lines and reading the room in a meeting. If you're an engineering person, it's not just about this prototype that your product manager colleague can build. It's really about

How will that scale? How will that evolve? What do we expect my growth to be, and my new changes to be, and my next security audit to require? So I think yes, it's great to bring people a bit closer to the other side somehow, under bracket. But I do believe we should respect expertise. There's still a lot of expertise in what it is to build a system at scale for real production usage, and in what it is to really understand customer intimacy and what the need of a market is. And we should respect that. We shouldn't believe that because we have tools that are helping a little bit, this expertise doesn't matter.

Yes. You know, if I were to summarize, one of my big lessons about LLMs is they're incredible tools, but you cannot turn your brain off. Your brain as an expert is super necessary still.

AI Application Accuracy and Cost

Oh, I love that. I don't remember who said that, and I would love to have been smart enough to say it, but someone said something like, I don't understand why my LLM is super smart on all the topics I don't know, but when it's a topic I really know, it's not that smart. I love that because your expertise is still important. When you know a topic deeply, you do realize that the LLM is wrong sometimes.

And you do realize that grounding that, and we could take that back to the MongoDB case, grounding that with real data, real knowledge, is important. But even overall, I mean, expertise matters.

Even within using it, I find I'm better able to guide these tools in areas I know well than in areas I don't.

Yeah, I love that.

So coming back a little bit to this piece around the data layer under LLM applications.

What do you see as the big unsolved problems? What are the things that your team is working on looking forward for the next, I don't know how long we're allowed to project in the AI era? Two weeks, six months, something like that, right?

Yeah, maybe. No, I think, you know, everyone out there is developing an AI application, right? I mean, it's pretty rare to see a company that isn't. But I think we are at this turning point right now where they are really going to production. There was a famous MIT paper, like three or four months ago, saying that ninety-five percent of these AI applications don't make it to production, or when they make it to production they disappoint, they don't bring the ROI they were expected to bring.

But I do believe it's changing. I do believe more and more of these AI applications are getting production ready. And to me, the key to your question about what we see is that customers really understand more and more how much the quality of the grounded response matters, how much the quality of the information retrieval matters, to make sure that your LLM is not only "LLM smart" but smart for your application. So I think people are really realizing that even a little bit of impact on hallucination is a big deal for the user experience. And I do believe as well that people realize that, well,

actually, this stuff is expensive. These AI models can be pretty expensive. And even right now, when they are subsidised by a lot of VC money and all of that, that's still expensive. So if you are able to call the LLM only when you need to call the LLM, and you're able to optimize the length of your prompt because you did a good job of finding the relevant information in your corpus of data beforehand,

and if you are using embedding and reranking models that are pretty short and cost efficient, that can totally change the ROI of an AI application. So accuracy and cost of these AI applications, I think that will be a big topic for the years to come, in my opinion.

Data Preparation for AI Accuracy

In terms of accuracy then, I think we've talked some about what it takes in terms of searching, in terms of reranking, and what you surface.

Yeah.

Are there any other kind of best practices you've seen or you recommend to folks in terms of what you're actually putting in there? Are you doing pre-processing? How are you, you know, navigating those different levers?

Yeah, absolutely. I mean, that could be a very long discussion, but I would say, one, the quality of your data. If you don't clean your data regularly, handle metadata, and know what's important, well, the quality of your data is key. Then, how you are preparing your data for AI.

We spoke about chunking. How do you cut your long text into smaller texts that make sense? I think there is a science and an art to that. So your data has to be clean, and it has to be prepared for AI. And then the question is really, what is the right information retrieval strategy for you? Do you want results that will be super fast? Do you want results that will be super accurate because you are a legal company?

And when you are providing advice, or you are a financial company and it can impact a tax return or a legal document, then you will want to use the best embedding models with all the context and all the lengths, etc., and that's okay if they are more expensive. Or maybe if you're an e-commerce company, you're able to go super fast and make sure that you have 10 or 12 good results, and it doesn't have to be the only one. So I think it's about your strategy, about...

What is the right trade-off for you between the usual quality, cost, speed, and all that?

Yeah, the interactivity is an interesting one. Even within an application, you might have different workflows. I was talking to someone who was doing an agent and they were saying, yeah, when I know that the user is right there,

I bias towards interactivity and speed and getting it up in front of them. And if it's an async workflow, now I care more about accuracy. Now, you know, I can take my time.

Oh, absolutely, and we see that. An example, not a public reference, but we have an e-commerce customer. How you make your trade-off for the e-commerce piece, where the user goes to find the red or burgundy sneakers, and how you manage your stock

behind the scenes, these are different kinds of requirements, latencies, costs, and things like that. So absolutely, depending on the workload, you will have a different sensitivity.
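The chunking step mentioned in this section, cutting long text into smaller pieces that still make sense, can be illustrated with a small sentence-based splitter. This is only a sketch under stated assumptions: the `chunk_text` function, the 40-word budget, and the one-sentence overlap are hypothetical choices for the example, not a recommendation from the conversation.

```python
import re

def chunk_text(text, max_words=40, overlap_sentences=1):
    """Split text into chunks of whole sentences, each up to roughly max_words words.

    The last `overlap_sentences` sentences of a chunk are repeated at the start
    of the next one so that context is not cut off at chunk boundaries.
    """
    # Naive sentence splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current = [], []
    for sentence in sentences:
        candidate = current + [sentence]
        if current and sum(len(s.split()) for s in candidate) > max_words:
            chunks.append(" ".join(current))
            # Carry over the tail of the previous chunk as overlap.
            current = current[-overlap_sentences:] if overlap_sentences else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks

example = "A b c. D e f. G h i."
print(chunk_text(example, max_words=6, overlap_sentences=1))
```

Each resulting chunk would then be embedded and stored. The chunk size and overlap are the kind of knobs that trade accuracy against cost and speed.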

Optimizing Embedding and Reranking Models

So I think a lot of our listeners are now familiar with how, in some of the AI coding tools, you can dial up your budget: oh, I want you to think longer, I want you to reason more, you know, mini versus high versus whatever. What are the equivalent knobs that you have in the embedding models and the reranker and all these different parts that you have with Voyage?

Oh, that's a great one. I didn't think about this parallel before, by the way, so I love it, the thinking one, the one with the LLM. You can definitely go faster and cheaper with, I would say, a basic text embedding model, though even among the basic ones you can have really good ones versus not so good ones. And one of the values of the Voyage models is that they come in different sizes.

So you can decide how long your embeddings will be, and even the type, like do you want to go with a float or with a binary kind of representation. So there is definitely a first decision there about a cost versus accuracy trade-off. Then you can discuss, and we discussed multimodal and context, these are heavier models. They can take a bit more time, a bit more compute. But they will give you better results, so is your use case worth this investment?
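To make the length and type trade-off concrete, here is a small pure-Python sketch of two common ways to shrink an embedding: keeping fewer dimensions and keeping only one bit per dimension. The vectors and function names are made up for illustration. Truncation only preserves quality for models trained to support it, so whether it applies to a given model is an assumption to verify against that model's documentation.

```python
def truncate(vector, dims):
    """Keep only the first `dims` dimensions (only valid for models trained for it)."""
    return vector[:dims]

def binarize(vector):
    """Keep one bit per dimension: 1 if the value is positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]

def hamming_distance(a, b):
    """Number of positions where two binary vectors differ (lower means closer)."""
    return sum(x != y for x, y in zip(a, b))

# Toy 8-dimensional float embeddings (illustrative values only).
query = [0.12, -0.40, 0.33, -0.05, 0.27, -0.61, 0.08, 0.19]
doc_a = [0.10, -0.35, 0.30, -0.02, 0.31, -0.55, 0.02, 0.25]
doc_b = [-0.22, 0.41, -0.10, 0.36, -0.29, 0.50, -0.14, -0.07]

dist_a = hamming_distance(binarize(query), binarize(doc_a))
dist_b = hamming_distance(binarize(query), binarize(doc_b))
print(dist_a, dist_b)  # doc_a matches the query's sign pattern, doc_b does not
```

A 32-bit float costs 32 bits per dimension and a binary embedding costs 1, so the binary form is 32 times smaller, at the price of some accuracy.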

And last, we didn't speak much about reranking, but reranking is an additional layer that more and more customers are using. The embedding model will basically give you, oh, these are the ten, twenty, one hundred best documents in your corpus of data for this query. That's pretty fast, and you optimize for that.

If you want to know the best document for this query, then it has to be compute intensive. You have to go beyond the embedding, you have to go back to the document itself, and that's what rerankers are doing. So that's another layer, like in your thinking analogy with the LLM, which I like. You can also decide whether you need a reranker to have a very optimized ranking of your results.
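The two-stage flow described here, a fast first pass that shortlists candidates followed by a more expensive pass that reads the documents themselves, can be sketched as follows. Both scoring functions are toy stand-ins invented for this example: a real system would use embedding similarity for the first stage and a reranker model for the second.

```python
import re

def tokens(text):
    """Lowercase the text and return its alphabetic words in order."""
    return re.findall(r"[a-z]+", text.lower())

def cheap_score(query, doc):
    """Stage 1 stand-in for embedding similarity: number of shared words."""
    return len(set(tokens(query)) & set(tokens(doc)))

def expensive_score(query, doc):
    """Stage 2 stand-in for a reranker: reads the full document and counts
    query word pairs that appear adjacent and in the same order."""
    q, d = tokens(query), tokens(doc)
    return len(set(zip(q, q[1:])) & set(zip(d, d[1:])))

def search(query, corpus, candidates=3, final=1):
    """Shortlist with the cheap score, then rerank only the shortlist."""
    shortlist = sorted(corpus, key=lambda d: cheap_score(query, d), reverse=True)[:candidates]
    return sorted(shortlist, key=lambda d: expensive_score(query, d), reverse=True)[:final]

corpus = [
    "shoes red laces and red socks for running",
    "red running shoes with white soles",
    "blue running jacket",
    "kitchen table in oak",
]
best = search("red running shoes", corpus)
print(best)
```

The first stage alone cannot separate the two shoe documents, since both share all three query words. The second stage, which only runs on the shortlist, is what moves the exact match to the top.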

Now, you mentioned that for a lot of these in MongoDB you would put them in an aggregation pipeline. I'm thinking about use cases that I've had in building these things. Oftentimes I'll kind of do things in layers, where I'll show them something quick, fast, but then I might redo it behind the scenes. Okay, I'm gonna rerank, I'm gonna resurface this, I'm gonna bump this, things like that.

If I were to do that in your system, can I get those kind of intermediate results streamed out to me in some way? Or like how does that end up working?

Yeah, well, what you described is that in a single query you could first go with a quick search and then a bit of a longer one. I don't have many use cases in mind doing that. What I have, though, is developers who will start, maybe in the first iteration, with an embedding model and that's about it. And then, when they really want to go to production and they have real users and real data, etc., they will upgrade their model to a more powerful one to improve the accuracy of the results, or they will add a reranker. But thinking out loud, what you described is totally possible. You could totally do a first search and then do a reranking in parallel. Like for instance, as an example, I haven't seen it, but it's search a fresh face that doesn't... I kind of like the idea still. You could still show, oh, these are the ten pairs of shoes that look like your red Nike shoes, kind of thing, and you show them immediately. But after a few seconds you can rerank them to make sure that the one that is very accurate comes to the top.
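The show-first, rerank-later idea floated here can be sketched as a generator that yields a quick result list immediately and a reranked list afterwards. This is a hypothetical illustration only: `progressive_search` and both scoring functions are invented for the example and do not correspond to any MongoDB or Voyage AI API.

```python
def shared_words(query, doc):
    """Fast stand-in score: number of words the query and document share."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def exact_phrase(query, doc):
    """Slow stand-in score: 1 if the whole query appears as a phrase, else 0."""
    return 1 if query.lower() in doc.lower() else 0

def progressive_search(query, corpus, limit=3):
    """Yield fast results first, then the same results in reranked order."""
    fast = sorted(corpus, key=lambda d: shared_words(query, d), reverse=True)[:limit]
    yield "fast", fast
    yield "reranked", sorted(fast, key=lambda d: exact_phrase(query, d), reverse=True)

corpus = [
    "nike shoes red box blue sneakers",
    "red nike shoes",
    "green boots",
]
stages = list(progressive_search("red nike shoes", corpus))
for label, results in stages:
    print(label, results)
```

A UI could render the first yielded list right away and swap in the second one when it arrives. Whether the extra reranking cost is worth it depends on the application, as the conversation goes on to note.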

You could totally build something like that. The technology doesn't prevent you from building something that dynamic. I don't know if the financial aspect or balance is a very important thing to do.

If it'll make sense, yeah, it'll depend for sure on the application involved. But yeah, I think these types of latency trade-offs are all over the place in these types of applications. And so then there is this question of, okay, you know, how much value is there in showing something to the user versus getting the right answer in front of them? And maybe there's value in each.

Yeah, absolutely. But at least what's important, I think, is to have options. Whether you use that in the same flow, as you were describing, or you use that because you have different phases of your project.

And at one point you just want to improve accuracy, etc. Or you do that because for a specific application or workload accuracy is so important, and for another kind of workload maybe you have a free tier.

And it's okay that your customers get a good result there, but when they are paying and you have a premium tier, you want your customers to get excellent results. I think there is the freedom to do that easily, because you can change your schema, you can change your AI model, you can optimize things. Customers are looking for, right?

And having that flexibility within the same API, the same interaction, I don't have to go and get a different thing.

No, that's definitely... And the same document model, I'm coming back to that. We didn't invent anything new to store the embeddings, for instance. They are part of your document model. They are there. That works. And we didn't reinvent the wheel, because what people love about MongoDB usually is the horizontal scaling and the fact that you can have these shards and replication. Oh, we did the same for search and vector search. You can have your own search nodes, and you can decide that they will be a bit more memory intensive, and they will not impact your database. It's just using the same principles of the document model and of the distributed architecture, applied to these embeddings as well.
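As a rough illustration of embeddings living inside the document model, here is what a document and a vector search aggregation pipeline might look like when expressed as Python dictionaries. The `$vectorSearch` stage and its `index`, `path`, `queryVector`, `numCandidates`, and `limit` fields are part of MongoDB's aggregation framework for Atlas Vector Search, but the collection contents, the field name `embedding`, and the index name `product_vector_index` are hypothetical, and the toy vector is far shorter than a real embedding.

```python
# Hypothetical product document: the embedding is just another field.
product = {
    "_id": 1,
    "name": "Red running shoes",
    "price": 89.0,
    "embedding": [0.12, -0.40, 0.33, -0.05],  # toy 4-dimensional vector
}

def vector_search_pipeline(query_vector, limit=10, num_candidates=100):
    """Build an aggregation pipeline that runs a vector search on `embedding`."""
    return [
        {
            "$vectorSearch": {
                "index": "product_vector_index",  # hypothetical index name
                "path": "embedding",              # field holding the vector
                "queryVector": query_vector,
                "numCandidates": num_candidates,  # candidates considered before limiting
                "limit": limit,
            }
        },
        # Return regular fields alongside the similarity score.
        {"$project": {"name": 1, "price": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]

pipeline = vector_search_pipeline([0.10, -0.35, 0.30, -0.02], limit=5)
print(pipeline)
```

With a driver such as PyMongo, a pipeline like this would typically be passed to `collection.aggregate(pipeline)`, assuming a vector search index with that name has been created on the collection.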

The Imperative of AI Flexibility

Awesome. Well, we're coming close to the end of our time. Is there anything we haven't talked about that you think would be important to discuss before we wrap?

No, I think we touched on it, but I just want maybe to double down on it. Things are changing so fast. The quantity of data is changing so fast. Now it's not just humans generating data and consuming data, it's agents generating data. So the quantity of data is changing so fast.

The ecosystem is changing so fast. There are new players, and some of them are amazing, and some of them look amazing but won't be here in six months, so it will go super fast. The LLM race, I love it, by the way. I love when Google is coming with this great one and then OpenAI, but it's changing so fast. Your sensitivity to, oh, that should be in this cloud provider and that should be on prem, is changing so fast. I think it's super important

to go with a data platform that can handle this flexibility, so that if something changes, you can change. You don't have to say, oh, I have to rebuild this data pipeline, and I have to change my schema, and I have to integrate this new identity system with these new players. I think it's really important to at least over-index on flexibility.

I love it. Let's wrap there.

Awesome. Thank you, Kevin.

This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.