#243 Max: Gemini's New File Search API – Build RAG Agents 10x Cheaper & Easier | AI Fire Daily podcast

00:00

You know, if you've ever tried to build a RAG agent for, like, a real-world application, you know the pain. It's just this massive technical headache that can stop a project cold. Oh, it's brutal. The infrastructure alone. You're trying to figure out document chunking, getting the right vector embeddings, and then you have to stand up and manage a whole vector database. Pinecone, Milvus, whatever. It's a full-time engineering job. Exactly. And the costs just

00:25

start climbing right away. Yeah. But what if there was a way to just skip 90% of that? Well, that's what we're talking about. Here's the headline that should make you just stop. We're talking about indexing a 121-page PDF, a huge knowledge base, for less than two cents. Welcome back to the Deep Dive. Today, our whole mission is to unpack Google's new Gemini File Search API because it looks like it completely automates the hardest parts of retrieval-augmented

00:52

generation. And you can do it with a simple no-code tool. Yeah. And let's just make sure we're on the same page with RAG. Retrieval-augmented generation just means using your own documents to ground an LLM, to make sure its answers are based on some kind of truth, not just its training data. That grounding is absolutely everything. So here's what we're going to do. We'll walk through the insane cost savings. Yeah. Then the super simple four-step workflow. And then we

01:19

have to talk about the limitations. Because, you know, if it sounds this good, there's got to be some catches. Okay, let's get into it. Let's see just how simple this really is. So the traditional way of doing RAG, it really was a gauntlet. It wasn't just, you know, pointing an LLM at a file. You had, like, a dozen different things you had to build and then maintain. Absolutely. You had to worry about all the different file types, how to ingest them, adding metadata, and

01:44

then the chunking. Oh, the chunking. Recursive character splitting, running every single one of those little chunks through an embedding model. And only then, after all that, could you even put it in the database. Right. It was so intense. That's just the definition of high friction. So how does this Gemini File Search solution get around all that? What's Google actually doing under the hood? It just simplifies the whole thing from the developer side. You just upload

02:09

the file. That's it. Google takes care of the chunking. They generate their own embeddings and they handle the storage. That entire pipeline is managed for you. So the big idea is you don't have to build your own search system. You don't have to set up a vector database or worry about keeping things in sync. You're just using their pipeline. Exactly. And what's really interesting is that their chunking is probably way better than what most of us would build. It's not just

02:35

split every 500 characters. It understands the document's flow and structure. That's a great point. And just to be clear for everyone, when we say embeddings, we're just talking about turning words into numbers, right? Into a mathematical format so a computer can search them incredibly fast. That's it. It's just turning language into math for quick comparisons. So if you had to pick just one thing, what's the single biggest piece of complexity that this new Gemini method

03:01

just gets rid of? It's that whole chain of manual document splitting, generating the embeddings, and then managing all the separate search infrastructure to glue it all together. Okay, let's talk about the money, because this is where the story gets kind of wild. You said you could index a huge document for pennies. How does that pricing actually work? The main thing is that you really only get charged for that first step, the upload, the indexing, and the cost is just tiny. It's

03:26

15 cents per 1 million tokens. Let's put that in real terms again. That 121-page PDF was, what, about 95,000 tokens? Yeah. So the math on that really is less than 2 cents. It costs basically nothing to load your knowledge. It's incredibly cheap to get the data in. And here's maybe the biggest deal right now. Storage is free. Totally free. You are not paying by the gigabyte for all those vectors to just sit there on Google's servers. Okay, but what about querying?
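
The "less than 2 cents" figure can be checked in a few lines. This is a minimal sketch that uses only the numbers quoted in the episode (15 cents per 1 million tokens for indexing, roughly 95,000 tokens for the 121-page PDF). The helper name is made up for illustration, and actual pricing may differ from what the hosts quote.

```python
# Indexing price as quoted in the episode: $0.15 per 1,000,000 tokens.
INDEXING_USD_PER_MILLION_TOKENS = 0.15


def indexing_cost_usd(token_count: int) -> float:
    """One-time indexing cost in US dollars (hypothetical helper)."""
    return token_count / 1_000_000 * INDEXING_USD_PER_MILLION_TOKENS


# The 121-page PDF from the episode, at roughly 95,000 tokens.
pdf_cost = indexing_cost_usd(95_000)
print(f"{pdf_cost * 100:.1f} cents")  # prints "1.4 cents"
```

Storage and retrieval are described as free for now, so this one-time charge is the only cost specific to the knowledge base; answer generation is billed separately at the normal model rate.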

03:55

I have this agent running all day. Am I going to get slammed with retrieval fees? No, not for the retrieval itself. You pay the normal rate for using the LLM, you know, for Gemini 2.5 Flash to generate the answer. But the actual cost of pulling the data from your store is, for now, absorbed. I saw the cost comparison table. Yeah. And it's a little bit shocking. Right. Let's say you have 100 gigs of data and you run a million queries in a month. Right.

04:22

For a traditional setup with something like Pinecone plus all the compute you'd need, you're looking at hundreds, maybe even thousands of dollars a month easily. Whoa. Yeah. And with Gemini File Search, that whole first month, including the one-time indexing fee for all that data, is about $47. Wait, $47? Yeah. For that kind of volume? Yeah. That's the moment of wonder right

04:43

there. I mean, imagine. Imagine what you could build, what you could experiment with, if your entire knowledge base infrastructure costs less than a pizza. It just opens up powerful RAG to everyone. It's a huge democratization. It's moving away from spending big capital on infrastructure to just a simple operational cost. So beyond that initial tiny fee, what's the main takeaway on cost for someone just starting out? The fact

05:05

that storage is currently free. That removes the single biggest recurring cost that you always have with traditional vector databases. Okay, so the money part is a no-brainer. Let's get practical. Let's talk about building this thing in n8n, which is basically a tool for visualizing API calls, right? And you only need four of them. Four simple HTTP Request nodes. It's like stacking Lego blocks. It's really that linear. Okay, walk us through them. What are those four steps doing? Step one

05:33

is create store. Think of this as just making a folder. You're creating a permanent named index on Google's side where your documents are going to live. Got it. Then you have to get the actual file up there. Exactly. Step two is upload file. But this is just a temporary step. The file is in the Google Cloud environment, but it's not connected to your store yet. It's just sitting there. So you have to link the file to the folder. That is step three. Move file to store. This

06:00

is the magic step. This is what actually kicks off the indexing and makes the file a permanent part of your knowledge base. And then finally, you can ask it a question. Step four, query the store. This is the request you send to the Gemini model, and you tell it, hey, use this specific knowledge base to help you answer the user's question. Okay, a quick but important detour on setup. Getting the security right. The original notes mention some confusion in Google's documentation.
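
The four steps above (create store, upload file, move file to store, query the store) can be sketched as four HTTP requests. This is only an illustration under stated assumptions: the endpoint paths, JSON field names, and the example store and file names are not given in the episode and should be checked against Google's current documentation before use. The API key is attached as a `key` query parameter, which matches the credential setup the hosts recommend. The code only builds the requests; it does not send them.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# ASSUMPTION: base URL and paths are a best-effort reading of the Gemini
# REST API and are not confirmed by the episode.
BASE = "https://generativelanguage.googleapis.com"


@dataclass
class PlannedRequest:
    step: str
    method: str
    url: str
    body: Optional[dict] = None


def with_key(url: str, api_key: str) -> str:
    """Attach the API key as a `key` query parameter."""
    return f"{url}?{urlencode({'key': api_key})}"


def build_workflow(api_key: str, store_name: str, file_name: str,
                   question: str) -> list:
    """Return the four requests in order. In a real run, store_name comes
    from the step 1 response and file_name from the step 2 response."""
    return [
        PlannedRequest("1. create store", "POST",
                       with_key(f"{BASE}/v1beta/fileSearchStores", api_key),
                       {"displayName": "my-knowledge-base"}),
        # Body omitted here: a real upload sends the raw file bytes.
        PlannedRequest("2. upload file", "POST",
                       with_key(f"{BASE}/upload/v1beta/files", api_key)),
        # Linking the temporary file to the store is what triggers indexing.
        PlannedRequest("3. move file to store", "POST",
                       with_key(f"{BASE}/v1beta/{store_name}:importFile", api_key),
                       {"file_name": file_name}),
        # The store name tells the model which knowledge base to search.
        PlannedRequest("4. query the store", "POST",
                       with_key(f"{BASE}/v1beta/models/gemini-2.5-flash:generateContent", api_key),
                       {"contents": [{"parts": [{"text": question}]}],
                        "tools": [{"file_search": {
                            "file_search_store_names": [store_name]}}]}),
    ]


# Hypothetical identifiers, for illustration only.
plan = build_workflow("TEST_KEY", "fileSearchStores/example-store",
                      "files/example-file", "What happens if my club breaks?")
for request in plan:
    print(request.step, request.method, request.url)
```

Keeping the key in one place (here, the `with_key` helper) mirrors the idea of a single saved credential that every node references instead of pasting the key into each request.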

06:28

So what's the right way to do authentication in n8n? The best way is to not paste your API key in every single node. That's messy and insecure. Instead, you use n8n's generic credential type. Right. And you use the query auth option specifically. Correct. You just tell it the parameter is called key and you paste your Gemini API key in there once. Then all four of your nodes can just reference that saved credential. It keeps everything clean

06:53

and secure. And why is getting that authentication right so important, even if you're just building a quick prototype? Because we should always be building securely from the start. It just prevents you from accidentally exposing your key in a bunch of different places. All right, let's talk about actually running this. You said step three, moving the file into the store, is the critical

07:09

one. What happens if you forget to do that? The file just stays temporary, just floating out there in the cloud, and it'll get deleted after a little while. It never gets indexed, so your agent can't see it. You have to make that link

07:20

to the store. Okay, so once it's linked and we're ready to query it in step four, what does that API call look like? So we're using the Gemini 2.5 Flash model, and the most important part is in the JSON you send. You have to include a search config parameter and paste in the unique store name ID that you got from step one. That's how you tell the LLM exactly where to look. So if you get that ID wrong, the agent just defaults to its general knowledge. It doesn't use your

07:48

documents at all. Precisely. It's the whole key. And then, of course, you need good prompt engineering. We used a clear instruction: "You are a helpful RAG agent. Use your knowledge base tool for truth. Cite your sources." And there was that weirdly specific rule you found: no punctuation, quotation marks, or new lines. Why that? Huh, yeah, that's just a practical little hack. It's to stop the underlying API from throwing a JSON error when it tries to pass data back. It just

08:14

ensures the data transfer is clean. A strange quirk, but it's necessary for stability right now. So let's get to the results. You threw three really different documents at it. The official rules of golf, an NVIDIA press release, and an Apple 10-K filing. How did it do? It did extremely well. For instance, I asked the golf PDF, what happens if your club breaks during the middle of the round? It came back with a perfect cited answer about how you can continue using it or

08:41

have it repaired legally. And what about across multiple documents? Could it find a specific number from the NVIDIA file? Yep. Asked for the Q1 2025 fiscal summary. It correctly pulled out the $26 billion in total revenue and the $22 billion in data center revenue. And it cited the press release correctly. And the overall score, after 10 really tough questions across almost 200 pages of documents, was a 4.5 out of 5 for correctness. That's amazing for a setup

09:08

that took minutes. What's so impressive about that 4.5 out of 5 score is that it got there with basically zero complexity, zero fine-tuning, and zero maintenance from the person building it. So we know it's simple, we know it's cheap, we know it's accurate. But now we have to be realistic. This isn't magic. Where does it fall short? What are the limitations? Okay. Limitation number one is a big one for any real application,

09:33

data management. Right now, Google doesn't have any kind of version control for the files in your store. So if I have my Q1 report in there and then I upload the Q2 report, now I just have two of them. You have two of them. And the store gets cluttered with old conflicting data. Your agent might pull from the wrong one. So the only solution right now is manual. You have to track your own versions and remember to delete the old file before you upload the new one. I appreciate

09:58

you sharing that vulnerability. It feels like no matter how advanced the tools get, we're always stuck with data hygiene problems. Oh, yeah. I still wrestle with prompt drift and messy data pipelines in my own complex projects. It's just a constant frustrating challenge in this field. And what about the quality of the documents themselves? Garbage in, garbage out. That rule absolutely still applies. Gemini has OCR, which is great,

10:23

but it's not a miracle worker. If you upload a blurry, poorly formatted PDF, you're going to get bad answers. You still have to do the cleanup. Okay, and limitation number three. This one's about what it's actually good at. Right. This system is fantastic at finding a needle in a haystack. A specific fact, a number, a rule. It fails completely when you ask it for a holistic summary or to understand the whole document. So you couldn't ask it to, say, summarize a 500-page

10:51

book. Exactly. We saw this in our tests. I asked it, how many total rules are in the golf PDF? And it answered five. It couldn't see the whole document to count them all. It just found the five nearest chunks that mentioned the word rule. And the last one, which is maybe the most important for businesses. You have to remember your documents are on Google's servers, so you have to be really careful about what you upload. No sensitive PII, personally identifiable information,

11:16

and no top-secret company data. And you need to think about compliance. Absolutely. GDPR, HIPAA. If you have really strict data sovereignty or security needs, you might still need to build your own on-premises solution. This might not be for you. So just to be crystal clear, if your goal is to summarize a huge entire document, should you use this? No. Its architecture is built for finding facts inside chunks, which limits the kind of holistic understanding you

11:45

need for a good summary. OK, so let's pull all this together. The big takeaway here seems to be that Gemini File Search just massively lowers the barrier to entry for RAG. It's a huge leap in simplicity and cost-effectiveness. The verdict is pretty clear. You can set it up in 30 minutes. It's basically free for most normal use cases, and it delivers really high accuracy. That 4.5 out of 5 is no joke. Four simple API calls are automating what used to be weeks of painful

12:13

infrastructure work. It's pretty incredible. So who should be using this right now? I'd say developers who are prototyping RAG ideas, small businesses that need a simple internal Q&A bot, or maybe content creators trying to organize a huge library of their own work. And who should maybe hold off for now? Big companies with really strict compliance rules, anyone who needs total control over their data, or use cases that are all about deep, full-document summarization instead

12:38

of fact-finding. It really lets you change your focus. You can stop worrying about the RAG infrastructure and just start building a valuable agent. Which leads to a really interesting thought. If this is the trend, if RAG just becomes a cheap built-in feature of these models, what does that mean for the future of, say, a specialized vector database engineer? Is that entire job going to change? That is a fascinating question to think about. It really is. All right. Go out and build

13:05

value, not infrastructure. We hope this deep dive gave you the clarity you were looking for. Until next time.

Transcript source: Provided by creator in RSS feed
