So imagine you're talking to someone, right? And maybe five minutes after you tell them your name, they forget it. Oh, yeah. Or, you know, you poured your heart out last week about your big life goals, and today, blank stare, like you've never even met. It's kind of like that movie Memento, but for your AI assistant. Exactly. Honestly, that's probably the biggest thing holding AI agents back right now, this goldfish memory problem. Most just have this tiny temporary buffer.
It really stops them from becoming true partners, doesn't it? Or collaborators. Definitely. Welcome to the Deep Dive. Today we're taking a plunge into fixing that, giving AI real, actual, long-term memory. And it's not just about remembering the last two chat messages, not at all. No, it's much bigger than that. It's about building AIs that genuinely understand you. You know, your preferences, past talks, goals. That's how they give you that really personal, evolving help.
So our mission today is to unpack this core problem. We're going to look at how these relational knowledge graphs, something like this open-source tool called Zep, can offer a real solution. Zep's basically a database made for AI memory. Exactly. And we're going to walk you through how this memory gets built from scratch. And the big one: how you do it without racking up some crazy huge bill from your AI provider. Yeah, the cost thing is key. We'll also get into some advanced tricks,
the ethics side of things, privacy. Super important. And even give you a kind of four-week plan to try this yourself. The goal is you'll understand how to take your AI agents from, well, forgetful assistants to really powerful collaborators. All right, let's get to it. So when you first start with an AI agent, its memory is, well, pretty basic. Mm-hmm. Very basic. It usually just remembers the last few things said, maybe 5 to 15 messages back and forth. It's all
in this temporary digital space. If you say, hi, my name is AI Fire, sure, it can say your name back. But, and this is the crucial part, the fatal flaw, really, it's just reading a transcript. The AI isn't learning anything deep. It's just looking back a few lines. So if the chat goes on too long, or you start a new one later? Yeah. Poof. That old context is just gone. It doesn't understand AI Fire as a person. It just knows those words were typed recently. It remembers
the words, not the person. Okay, so contrast that with real long-term memory. What does that look like? Maybe I ask, hey, remind me about that Paris trip plan from last month. Yeah. And the agent, without you feeding in any details again, just comes back with, sure, here's that personalized plan for John Doe's Paris trip next month. Duration, budget, around $2,000. It pulls John Doe and the budget from its stored knowledge. Not from what I just typed. Exactly, from deep
storage. Learned stuff. That difference, remembering words versus actually understanding a person, that seems huge. What's the fundamental difference there, inside the AI? It's remembering recent chat versus deeply understanding you. Okay, so how does it know all that stuff then? The John Doe details, the budget, the trip. It's this relational knowledge graph you mentioned. Yep, that's the core of it. And it's not just a simple
list of facts. Think of it more like a mind palace, or maybe a really detailed visual map of how knowledge connects. For every single user, it's like their own personal Wikipedia. It stores facts, identifies the key things, the entities, like John Doe or Paris, and crucially, it maps the relationships between them, like "John Doe plans trip to Paris." And I guess as you talk
more, that map gets... Denser. Exactly. It grows. New things get added, like maybe the Musée d'Orsay, and new connections, like "John Doe is interested in art museums." Interesting. And it's not just the connections. Each thing in the graph, each entity, like John Doe or Hotel Paris Central, gets its own little summary generated by the AI, something like "John Doe: user planning Paris trip, budget $2,000." So this structured brain lets the AI quickly scan for the key ideas and links, makes
its answers way faster and smarter, because it doesn't have to reread tons of old chat logs. So how does this mind map make the AI smarter, essentially? It's a structured brain for quick, precise knowledge retrieval. Let's try to picture this starting from zero. Imagine a totally new user. Let's call him Max. He talks to the agent for the first time. He says, my name is Max. I enjoy hiking, and I currently live in Vancouver, Canada. OK, boom. Instantly, behind the scenes,
Zep kicks in. It creates a main thing, an entity, Max. Right. Then it starts drawing lines, the relationships. Yeah. Max likes hiking. Max lives in Canada. Maybe even AI assistant is chatting with Max. It's like stacking Lego blocks of data, building up that context piece by piece. And then Max shares more. The graph gets richer. Maybe he adds, I usually hike on weekends. My favorite trail is Grouse Grind. I also like taking photos of nature, but I'm not a pro photographer
yet. OK. Graph expands again almost instantly. New lines: Max's favorite trail, Grouse Grind. Max enjoys nature photos. Max's skill level, amateur photographer. And a new entity pops up, Grouse Grind, tagged as a hiking trail Max likes. Okay, now for the payoff, the intelligence test. Max asks, got any ideas for what I could do this Saturday? Right, and the agent doesn't just give some generic stuff, it queries that knowledge graph it just built about Max. It pulls the pieces
together. It synthesizes them? Yeah. The response might be something like, hi Max. Okay, since you like hiking and nature photos, here are some ideas for Saturday. One, hike a new trail, maybe try a different Vancouver trail, mix it up from Grouse Grind, like Lynn Canyon or Cypress Mountain. Ah, see, that's the magic, isn't it? It combined totally separate facts, the hiking, the photos, the location, his usual spot, to give a really personalized recommendation. It didn't just recall,
it connected things. Absolutely. What's the magic when the graph grows? The agent combines diverse facts for truly personalized results. So, this all sounds amazing, right? This AI that actually gets you. But, oh, here's the catch. The thing people don't always talk about up front. There's always a catch. As that memory gets smarter and deeper, every single chat can get way, way more
expensive. Ah, the cost. Why? It comes down to tokens. That's how most AI models charge you. Think of a token as, like, roughly a word, or sometimes part of a word. Okay. Every time you send a message, the agent doesn't just send your message to the big brain, the large language model, or LLM. It bundles up this whole context package. What's in the package? Usually a summary of you, the user, plus relevant facts pulled from that knowledge graph we talked about, and maybe the last few
messages from the chat. Gotcha. So in our simple Max example, asking about Saturday, how many tokens was that? Believe it or not, around 2,727 tokens, just for that simple question. Now, scale that up. Imagine a loyal customer you've interacted with for months. Their knowledge graph might have hundreds of facts. That context package could easily hit 3,000, 5,000, maybe even 10,000 tokens per message. Okay, let's do the math. If it's, say, $0.002 per thousand tokens.
Right. 3,000 tokens is $0.006. Six tenths of a cent. Doesn't sound like much. Quite. But if you have 1,000 users chatting daily, say one message each... That's $6 a day, $180 a month, just for the memory piece. And that can easily balloon into thousands if your graph gets really big or you have lots of users. So why does having a smart AI memory get so expensive? AI models charge by the token, and long context uses many. Right. So the million dollar question, or maybe the thousand dollar
a month question. How do we get this powerful memory without going broke? Method one: smart context filtering, the surgical approach, as you called it. Yeah, because a lot of the off-the-shelf memory tools, they kind of act like a blunt instrument. They just grab everything, all the facts, all the history. It's frankly often just
lazy engineering. Okay. So what's the fix? You've got to take control. Instead of pulling everything blindly, you use direct HTTP requests, basically specific web commands, to be like a surgeon. You precisely select only the relevant info. So you're telling the system, just give me the last 10 messages? Exactly. Or, show me only the top three facts from the long-term memory that are really relevant to this specific question. Maybe you even add a filter, like ignore anything less than 70%
relevant. Okay, contrast the flows. Standard flow: user message, grab all facts plus the entire history, and stuff it all directly into the LLM. Wasteful, super wasteful. Optimized flow: user message, a targeted request for, say, the last 10 messages, a smart search request for the top three relevant long-term facts, merge that small, relevant package, then send it to the LLM. Much leaner, faster, focused. Totally. And you can
get clever with the search query itself. Maybe use the LLM first to refine the user's raw message into a better search term before you even hit the knowledge graph. And I know you mentioned using tools like n8n's Code node to handle some of the data formatting. That can be tricky. Oh, yeah. I mean, I still wrestle with prompt drift myself sometimes when trying to get AI to structure complex JSON data perfectly. It's not always
easy. You might ask Claude or another AI to help write the JavaScript snippet to clean it up. That's a good tip. So the results of this surgical approach? Dramatic. Like we saw, you can go from maybe 2,700 tokens per interaction down to around 670. Wow, that's a huge drop. Yeah, roughly a 75% reduction. Cuts your API costs by more than half. Easy. So how exactly does this surgical approach cut costs so dramatically? By sending only highly relevant filtered data to the AI.
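
To make the surgical approach concrete, here is a minimal Python sketch of the filtering step. It is an illustration under stated assumptions, not Zep's API: the `Fact` class, `build_context`, `cost_usd`, and `reduction_pct` are hypothetical names, and the relevance scores are assumed to come back from whatever memory search you call. The thresholds mirror the numbers in the conversation (top three facts, at least 70% relevant, last 10 messages).

```python
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    relevance: float  # 0.0 to 1.0; assumed to be returned by the memory search


def build_context(facts, history, top_k=3, min_relevance=0.7, last_n=10):
    """Keep the top_k facts at or above min_relevance plus the last_n messages."""
    relevant = [f for f in facts if f.relevance >= min_relevance]
    relevant.sort(key=lambda f: f.relevance, reverse=True)
    return {
        "facts": [f.text for f in relevant[:top_k]],
        "messages": history[-last_n:] if last_n > 0 else [],
    }


def cost_usd(tokens, price_per_1k=0.002):
    """Cost of one request at the example price of $0.002 per 1,000 tokens."""
    return tokens / 1000 * price_per_1k


def reduction_pct(before, after):
    """Percentage drop in token count from before to after."""
    return (before - after) / before * 100


# Hypothetical facts about Max with made-up relevance scores.
facts = [
    Fact("Max likes hiking", 0.92),
    Fact("Max lives in Vancouver", 0.88),
    Fact("Max enjoys nature photos", 0.81),
    Fact("Max's favorite trail is Grouse Grind", 0.75),
    Fact("Max is an amateur photographer", 0.40),
]
history = [f"message {i}" for i in range(1, 26)]

context = build_context(facts, history)
print(context["facts"])          # the three highest-scoring facts
print(len(context["messages"]))  # 10
print(round(reduction_pct(2727, 670), 1))  # 75.4
```

The same package could be assembled inside an n8n Code node or any other workflow step. The point of the sketch is only that the selection happens before the LLM call, so the model never sees the facts and messages that were filtered out.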
OK, method one sounds great. But sometimes, even being surgical hits a wall, right? You mentioned some APIs make it hard to get history in the right order. Yeah, exactly. Some systems, maybe they give you the whole history, but oldest first. So to get the last 10 messages, you have to pull everything and sort it yourself. Kind of defeats the purpose of being efficient. So that leads to method two, hybrid memory architecture. Right. The best of both worlds approach. The core idea
is super simple, but really powerful. Different kinds of data belong in different kinds of databases. You know, you wouldn't use a hammer for a screw. Makes sense. So what's the two-brain setup here? OK, so for your long-term memory, that complex web of facts and relationships, you stick with Zep's knowledge graph. That's its superpower. It's built for that. Got it. And for short term? For the recent conversation history, the last 10, 20 messages, use a standard, simple database
like PostgreSQL. It's super fast and really efficient for just storing ordered lists with timestamps. Perfect for recent chat history. Ah, okay. So you get the deep understanding from Zep, but the lightning-fast recall of recent stuff from PostgreSQL. Exactly. The best of both worlds. How does the flow work then? Message comes in. Message arrives. Then at the same time, you fire off two requests. One, an API call to Zep for the
top, say, three relevant long-term facts, and two, a super quick query to PostgreSQL for the last 10 messages. Okay, parallel requests. Yep. Then you quickly merge those two small relevant context packages together, send that combined package to the LLM. And after the LLM replies? You update both memories. Add the new exchange to PostgreSQL and let Zep process it to update the knowledge graph if needed. Let's picture it. User asks something complex like, where should
I move? I need a place that fits my interests. Right. An agent with this hybrid setup can give a really nuanced answer. It uses Zep, the long-term brain, to pull facts like enjoys hiking, knows about Lynn Canyon Park, interested in photography, and it uses PostgreSQL, the short-term brain, for the immediate context, like the user just mentioned "my future" or "thinking about change." It combines both for a personalized insight without wasting tokens on stuff that's not relevant right
now. What's the main advantage of this hybrid two-brain approach, then? It combines deep relational understanding with lightning-fast recent recall. This is where it gets really exciting. Moving from a single demo to something that works for tons of users. The key is session IDs, right? Absolutely. It's fundamental. The whole system hinges on this. Every unique user gets their own unique identifier, the session ID. Think of it as the key to their own private knowledge
graph. So this could be their Telegram chat ID. Or their email address if it's an email bot, or a user account ID from your website. Anything unique to them. And the benefit? Massive scalability. Thousands, even millions of users can be talking to the agent at the same time. But each conversation is totally separate. Max's knowledge graph doesn't leak into Mike's. It's completely isolated. That's how you go from a chatbot to a real AI workforce. Think about the applications. Customer support.
Oh, yeah. An agent that remembers your entire purchase history, past support tickets, even your preferred way of communicating. You never have to repeat yourself. It feels like talking to a dedicated account manager who actually knows you. Or an educational tutor. Building a unique learning profile for every single student. Tracking their progress, spotting weaknesses, adapting teaching styles automatically. It's incredibly
powerful. The sales assistant. Imagine. Detailed history for every prospect: needs, objections raised before, personal interests they mentioned offhand. It's the perfect briefing for the human salesperson stepping in to close the deal. And even just onboarding new users. Remembering exactly where they left off in a complex setup process, that could massively reduce churn. People hate starting over. Whoa. Just imagine the impact. A sales agent that knows hundreds of prospects
intimately, instantly. Or a support agent recalling every single detail. That really scales human-like intelligence. It's a total game changer for the user experience. So how do these AI agents manage to remember so many different users distinctly? Each user has a unique session ID and a private knowledge graph. Okay, so we've got the core methods down. What about taking it to the next level? Pro strategies. Definitely things you can do. For cost optimization, you can play with
dynamic relevance scoring. Meaning? Adjusting that relevance threshold. Maybe lower it for creative brainstorming tasks where you want more tangential ideas, but crank it higher for technical support where accuracy is paramount. OK, that makes sense. You can also set up entity prioritization. Tell Zep, hey, things like past support ticket ID are way more important than favorite color. So it prioritizes retrieving the critical stuff. Nice. What about old information? Intelligent
memory decay. Basically, if a fact hasn't been relevant or accessed in a long time, its importance score gradually fades. It stops outdated info from cluttering things up. Smart. And for really big scale? You might look beyond just PostgreSQL for the short-term memory. Maybe Redis for absolutely blazing fast caching if you have users hitting the same topics repeatedly. Or Elasticsearch if your knowledge graphs become truly enormous. And managing the memory itself? Critical. You
need limits. Set a max number of facts per user graph. Have a strategy to archive graphs for inactive users on cheaper storage. Otherwise, costs and slowdowns are inevitable. It sounds like there are pitfalls, too. What are the common mistakes people make building these? The landmines to avoid. Oh yeah, plenty. Number one is probably over-storing. Saving every tiny detail makes the graph noisy and less useful. Use those relevance thresholds. Prioritize entities. Session ID collisions.
Using IDs that aren't truly unique. Disaster. You mix up user data. Always use secure, unique IDs like UUIDs or properly hashed identifiers. Unbounded memory growth. Just letting graphs grow forever without limits or archiving leads to slowdowns and ballooning costs. Implement those limits. Poor relationship quality. Sometimes the AI extracting facts, it gets it wrong, creates weird or incorrect links like Max is located in
Panama when he's in Vancouver. You need to validate, maybe fine-tune the prompts used for extraction. Good point. And the last one. Ignoring token optimization. Just assuming memory is worth any cost. You have to monitor usage and implement filtering like we discussed. Costs can sneak up on you fast. So what are the biggest mistakes people make when building AI memory systems? Over-storing data, wrong IDs, unchecked growth,
poor data quality, ignoring costs. Building these powerful memory systems isn't just tech though. There's a huge ethical dimension. Absolutely. With great memory comes great responsibility. A system that remembers so much about someone requires you to be a really careful guardian of their privacy. What are the key principles there? Transparency, number one. Tell users the agent remembers things to help them. Something simple like, to improve our chats, I'll remember
key details. Goes a long way. And letting users control their data. Crucial. The right to be forgotten, like under GDPR. Users must be able to easily see their data, export it, and most importantly, delete it if they want to. And security, obviously. Non-negotiable. Especially if you're storing anything remotely sensitive, robust security is a must. And you mentioned performance benchmarks earlier. These optimizations aren't just theory, they have real impact. Huge impact. We looked
at cost per 1,000 interactions. A basic Zep setup might be, say, $150 to $240. Okay. Our optimized HTTP filtering method cuts that way down, maybe $60 to $90. Big improvement. But the hybrid architecture, that's the winner. We saw a cost between $48 and $72. That's a massive saving compared to the basic setup. And user experience improves too. Dramatically. Response times drop from, like, three to eight seconds down to one to three seconds. Accuracy jumps from maybe six out of ten to eight
point five out of ten. User feedback goes from "yeah, it's okay" to "wow, this thing actually gets me." And it's fast. So what's the biggest tightrope walk for developers building these powerful AI memories? Balancing advanced memory capabilities with user privacy and data security. OK. So for listeners who are thinking, all right, I want to build this, you put together a kind of four-week plan. Yeah, a practical roadmap to get started. Week one, foundation setup. Meaning?
Get Zep installed. Get PostgreSQL running. Maybe grab a pre-built workflow template. Start having simple chats just to see the graph begin to form. Get the basics working. Week two, customization. Start tailoring it to your specific needs. Adjust those relevance scores. Maybe define some custom types of entities you care about. And critically, set up proper session ID handling for how your users will connect. Week three, optimization. Now, implement those cost-saving tricks. Fine-tune
the limits, the relevance filters. Double-check those PostgreSQL queries are running fast. Set up monitoring so you can actually see your token usage and costs. And week four. Production deployment. Test it hard with simulated users to make sure everyone's data stays separate. Set up backups and a recovery plan. Then you're ready to go live. So the bottom line here, this isn't just
about fancier chatbots. No. Not at all. This is really the foundation for a whole new class of AI agents, agents that can actually form genuine, useful long-term relationships with people. They learn from the past, they understand what makes each user unique, and they provide value that actually gets better over time. Right. The core message is simple. The smartest AI model in the world is basically useless if it can't remember what actually matters to the person
it's talking to. And building this kind of memory, this understanding, that's a real edge. Huge competitive advantage. While everyone else is building agents that forget everything tomorrow, you can build agents that learn and grow with your users. It's about remembering what matters, doing it securely and doing it respectfully. So here's a final thought to chew on. As AI gets more and more woven into our lives, how do you
think our own human memories might adapt when we can lean on these agents that, well, never forget? That's a deep question. And what does understanding even mean when an AI can perfectly recall every single thing you've ever said to it? Lots to think about there. We definitely encourage you to explore these possibilities, giving your AI agents memory that's intelligent, affordable, and ethical. That's our deep dive for today.
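
For anyone who wants to try the hybrid, two-brain flow discussed in the episode, here is a minimal Python sketch. It rests on loudly stated assumptions: SQLite (from the standard library) stands in for PostgreSQL so the example runs anywhere, a plain dictionary stands in for Zep's per-user knowledge graph, and the stored relevance numbers stand in for the scores a real search would compute per question. `HybridMemory` and its methods are hypothetical names, not part of any library.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


class HybridMemory:
    """Short-term history in SQL, long-term facts in a separate store, keyed by session ID."""

    def __init__(self):
        # Assumption: SQLite replaces PostgreSQL purely to keep the sketch self-contained.
        self.db = sqlite3.connect(":memory:", check_same_thread=False)
        self.db.execute(
            "CREATE TABLE messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )
        # Assumption: a dict replaces the knowledge graph. session_id -> [(fact, relevance)]
        self.facts = {}

    def add_fact(self, session_id, fact, relevance):
        self.facts.setdefault(session_id, []).append((fact, relevance))

    def add_message(self, session_id, role, content):
        self.db.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self.db.commit()

    def recent_messages(self, session_id, limit=10):
        # Newest rows first with LIMIT, then reversed so the caller gets oldest first.
        rows = self.db.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return rows[::-1]

    def top_facts(self, session_id, k=3):
        ranked = sorted(self.facts.get(session_id, []), key=lambda f: f[1], reverse=True)
        return [fact for fact, _ in ranked[:k]]

    def build_context(self, session_id):
        # Fire both lookups at the same time, then merge the two small packages.
        with ThreadPoolExecutor(max_workers=2) as pool:
            facts_future = pool.submit(self.top_facts, session_id)
            history_future = pool.submit(self.recent_messages, session_id)
            return {"facts": facts_future.result(), "messages": history_future.result()}


memory = HybridMemory()
memory.add_fact("max", "enjoys hiking", 0.9)
memory.add_fact("max", "interested in photography", 0.8)
memory.add_fact("mike", "prefers email support", 0.7)
memory.add_message("max", "user", "Where should I move?")

context = memory.build_context("max")
print(context["facts"])     # only Max's facts, never Mike's
print(context["messages"])  # [('user', 'Where should I move?')]

# After the LLM replies, the short-term store is updated with the new exchange.
memory.add_message("max", "assistant", "Somewhere with trails nearby.")
```

Every read and write takes a session ID, which is what keeps Max's data from leaking into Mike's. In a real deployment the fact lookup would be a network call to the memory service, which is where running the two requests in parallel actually saves time.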
