#35 Robin: The "Brain in a Box" - How Hermes Agent Turns a Cheap VPS into a Proactive, Memory-Rich AI Personal Assistant | AI Fire Daily podcast

00:00

We have a very passive relationship with AI right now. Think about your daily routine. You open a tab, type a prompt, and wait for an answer. Then you take that answer and do the actual work yourself. But imagine an AI that doesn't just wait around for your questions. Imagine one that actually acts. It runs tasks in the background. It checks your servers. It messages your phone with updates while you sleep. It is a massive

00:24

behavioral shift, honestly. We are moving from an AI that just answers to an AI that... takes independent action. It's the difference between having a smart encyclopedia and, well, having an actual employee. Okay, let's unpack this. Welcome to the Deep Dive. Today, we're looking at a huge stack of GitHub documentation, developer blogs, and Reddit threads, all about a project called Hermes Agent. Exactly. Our mission is to explore what it takes to build a self-hosted

00:52

AI personal assistant. We're going to cover what Hermes actually is. We'll see how it compares to heavyweights in the space, too. Right. And we will break down the five core pillars of its brain. And, crucially for anyone listening, we're going to look at how to safely set it up. Yeah, so it works strictly for you without, you know, breaking your computer. Exactly. So let's start with what Hermes actually is, because the repositories

01:15

are going viral right now. They really are. I mean, for a long time, we treated AI like a very smart search box. You asked a coding question, you copied the answer, you pasted it into your editor. You were the one moving the data. It was useful, but it was highly manual. You were still the bottleneck. Exactly. Now, looking at the developer community, user expectations are fundamentally shifting. People don't want a search

01:38

box anymore. They want a personal operator. They want an AI that remembers their preferences. They want it to understand their ongoing projects. And Hermes isn't just another chatbot you log into. It is an open-source framework for building your own personal assistant. And self-hosted is a key term here. Yeah. You don't just log into a website. You run it yourself. You can put it on a VPS. That is a rented remote computer

02:03

running constantly in the cloud. Right. Or you can run it on a Mac Mini sitting on your desk. You could even run it on an Android phone using an app called Termux. Right. And because it lives on your infrastructure, it is always there. It doesn't go to sleep when you close a browser tab. It is a persistent entity. It is also multi-channel. This is the part that fascinates me. You don't open a special AI app to talk to it. You talk to it via Telegram, Discord, Slack,

02:30

or WhatsApp. The documentation points to Telegram as the easiest starting point. And that choice changes the psychology of how you use it. Yeah, it makes Hermes feel like a real human assistant living inside your phone. You just text it. You are at the grocery store, you get an idea, and you just send a message to your bot. And it actually uses tools. Look, standard AI is like a smart dictionary. It knows a lot of things, but it just sits on the shelf. Right. Hermes is like

02:54

giving an assistant their own desk. You give them a set of tools, a memory system, a calendar, and a phone number. It can run actual terminal commands. It uses browser automation to look at websites. It even has vision to analyze images you send it. That is the crucial difference here. The workspace is where it runs. The tools are what it uses to manipulate that workspace. The calendar is for its scheduled tasks. It isn't just generating text. It is executing logic.

03:23

Whoa. I mean, imagine an assistant that remembers your exact workflows. And it can run them in the background without you typing a thing. It's wild. It's pulling reports and formatting data while you literally go for a walk. That is the real promise here. It isn't about writing a better poem. It creates reusable workflows for the tedious digital chores you repeat often. But this makes me wonder, does this mean we're replacing our standard ChatGPT habit entirely?

03:48

No, it just moves AI from conversation to... actual background task execution. Okay, that makes sense. It is a different kind of tool entirely. And that brings us to the broader landscape. Yeah, if we're going to invest time setting this up, we need to know where it fits. We don't want you using a hammer to drive a screw. Exactly. Let's look at how Hermes compares to other big names in the agent space, specifically Claude Code and OpenClaw. Right. If we connect this to the bigger

04:15

picture, the AI agent space is fragmenting really fast. Six months ago, an agent was just a buzzword. Now, different tools have very specialized jobs. Let's start with Claude Code. Going through the developer blogs, it is clear this is built for focused, deep work. Desk work. Exactly. Claude Code is what you use when you are sitting at your computer staring at a massive codebase. It excels at local environment management. It edits your files, manages your repositories,

04:44

and works deep inside your terminal. It is a dedicated coding partner. But it is not a mobile-first assistant. You aren't going to text Claude Code on WhatsApp while you're driving to remind you about a meeting. No. Or to summarize a news article. Right. It is heavy machinery. Then on the other end of the spectrum, you have something like OpenClaw. The repositories show this is a much heavier system. Okay. It is basically a multi-agent gateway. Wait, a gateway system?

05:09

So it connects messaging apps, multiple specialized agents, and external plugins into one massive architecture. Yeah. That sounds incredibly complicated. It is. It is powerful, but it can feel very brittle. It is really designed for power users, startup founders, or developers building complex commercial platforms. So when you have that many moving parts, if one plugin updates, the whole house of cards might fall. Exactly. It requires constant maintenance. Which leaves us with Hermes Agent.

05:38

It sits in a very specific, practical lane. It is lightweight. It is Telegram-first. And it is designed to be a self-improving personal assistant. Hermes is for when you are walking around with your phone, but you still want your digital work to move forward. You want something that grows with your specific habits. And as a bonus, the architecture is great for testing new open-source language models. Yeah, definitely.

06:00

But hold on. If I am using Claude Code for my programming and I am setting up Hermes for my daily schedule, aren't I just giving myself a management headache? I mean, it feels that way at first, but separation of concerns is actually

06:12

best practice here. Right. You wouldn't ask your accountant to also design your logo. Exactly. You want specialized tools for complex tasks. Hermes acts as the project manager while Claude is the factory worker. So if I'm just coding an app, is Hermes the wrong tool? Yes. Use Claude Code for deep coding and Hermes for daily assistant tasks. Got it. Now, we really need to look under the hood to understand how Hermes manages to be so fiercely independent. We have to examine its brain. Yeah,

06:42

the architectural docs break this down into five core pillars. These five pillars are the exact mechanisms that turn it from a reactive, forgetful chatbot into a proactive, reliable agent. Pillar number one is memory. And honestly, this is the biggest friction point with AI right now. Most AI models wake up with total amnesia. Oh, absolutely. Every single session starts totally fresh. That

07:04

is incredibly frustrating. You don't want to explain your business model, your project context, and your personal communication style every single time you open a chat. Hermes solves this at the root level using two specific Markdown files. Let's break those down. The first is the user file. This file is entirely about who you are. Right. It holds your professional role, your preferences, and your writing style. You can literally write in the file, keep my email simple,

07:31

warm, and direct. And the mechanism here is crucial. Every time Hermes wakes up to do a task, it secretly reads that file first. It gives itself a crash course on who you are before it even sees your prompt. The second file is the memory file. This holds your project context. Yeah. So business information, server environment details, recurring facts, stuff that doesn't change daily. Yeah. But the docs warn you have to be careful here. You really do. You only save durable facts, not

07:57

temporary task statuses. If you save every little to-do item, the file gets bloated and the AI gets confused. Exactly. Pillar two is skills. If memory dictates what the AI needs to remember, skills dictate how it repeats a task. They act as reusable recipes for your repeated digital chores. These live in individual skill files. They contain step-by-step instructions. They also use YAML front matter. Let's define that: a small block of metadata at the top of each file, hidden tags that tell the AI when to trigger

08:23

skills. So instead of just guessing how you want a report formatted, it reads the hidden tags, sees the trigger word, and pulls up the exact blueprint you created last week. And I have to admit, I still wrestle with prompt drift myself. Yeah. Yeah, it's so common. You know, having to constantly rewrite instructions because the AI slowly forgets what I want it to do by the 10th message in a chat. This skill system completely fixes that. It absolutely does. You stop asking

08:51

the agent to improvise every time. You give it a rigid track to run on. Which brings us to pillar three, the soul. This lives in the soul file. This one's fascinating. This file gives the agent its actual personality and tone. You can make it warm and empathetic. You can make it highly sarcastic. Or you can make it strictly technical, returning only code and no apologies. And this matters deeply if Hermes is, for example, drafting

09:15

customer support replies for your business. A generic robotic assistant sounds totally forgettable. You want it to sound like your brand. How does the soul pillar actually change the output practically? It dictates the tone, preventing your assistant from sounding like a generic corporate robot. Right. It effectively changes which words the model chooses. Pillar four is cron jobs. Let's define that jargon too: scheduled tasks that run automatically at a specific time.
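To make the idea of a scheduled task concrete, here is a minimal Python sketch of the one calculation any cron-style scheduler has to perform: given the current time, when does the job fire next? This is not Hermes's actual scheduler. The function name `next_run` and its parameters are illustrative assumptions. In standard cron syntax, the two sample schedules below would be written `0 7 * * *` (every day at 07:00) and `0 0 * * 5` (Fridays at 00:00).

```python
from datetime import datetime, timedelta
from typing import Optional


def next_run(now: datetime, hour: int, minute: int = 0,
             weekday: Optional[int] = None) -> datetime:
    """Return the first scheduled time strictly after `now`.

    `weekday` follows Python's convention (Monday=0 ... Sunday=6);
    None means the job runs every day. Hypothetical helper, for illustration.
    """
    # Start from today's occurrence of the requested clock time.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If that moment has already passed (or is exactly now), move to tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    # For weekly jobs, roll forward to the requested day of the week.
    if weekday is not None:
        candidate += timedelta(days=(weekday - candidate.weekday()) % 7)
    return candidate


if __name__ == "__main__":
    now = datetime(2024, 1, 3, 9, 30)          # a Wednesday morning
    print(next_run(now, hour=7))               # daily 7 a.m. job
    print(next_run(now, hour=0, weekday=4))    # Friday-at-midnight job
```

A real scheduler would loop: sleep until the returned time, run the task, then call the function again. Time zones are left out here and would need handling on a remote server.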

09:42

This is the feature that turns Hermes from a reactive tool into a proactive operator. You aren't just messaging it; it is messaging you. You schedule a market research summary for 7 a.m. or a database backup every Friday at midnight. It just happens without a prompt. And pillar five is the self-improving loop. This is the absolute magic of the system. It is the process of turning your manual corrections into permanent skills. You ask Hermes to do a piece of work.

10:10

It does it, but maybe the output is messy. You correct it. Then Hermes saves that newly improved workflow. It searches its past sessions to keep continuity. It literally rewrites its own instructions. So the next time it runs that task, the output is simply better. It learns from its mistakes. Okay, we are going to talk about how to safely set this architecture up next, right after this. Here's where it gets really interesting. We know how its brain works with the Markdown

10:39

files and the cron jobs. Now, how do we safely give it a body in the real world? Because we definitely do not want an autonomous AI accidentally destroying our personal files. Setup and security are absolutely critical here. Remember, Hermes can run terminal commands. That means it can delete files. It can install software. That is incredibly powerful, but very risky. The documentation strongly recommends running it on a separate

11:04

VPS or an isolated Docker container. Let's define Docker container: a digital box that keeps software isolated from your main computer. Think of Docker like putting the AI in a soundproof room inside a shared house. If the AI makes a massive mess, the house stays perfectly clean. You just delete the room and start over. It keeps everything safely organized. You log into your remote server, install Docker, create a clean folder, and then run the official installation

11:27

script. Right. You do not just download it to your desktop. During that setup, you connect it to Telegram using a tool called BotFather. This gives you a unique bot token. But the community forums highlight a massive security warning here. They do. You absolutely must lock the bot strictly to your specific Telegram user ID. Yes. If you skip that step, anyone who finds your bot on Telegram can start texting it. And since it has terminal access, random strangers could be running

11:56

malicious commands on your private server. It is a huge vulnerability. Another strict rule: never, ever paste your API keys, your passwords, or your tokens directly into the chat window. You store those secrets in .env files. Think of that as a hidden file used specifically for storing your private passwords, kept separate from your code. And you use a .gitignore file to protect them from being uploaded to public code repositories. Clean security

12:20

habits from day one are essential. The developers also recommend disabling any tools you don't actually need yet. If you aren't using browser automation, turn it off. Limit the attack surface. So why shouldn't I just run this locally on my MacBook right now? An autonomous agent with terminal access could accidentally break your personal computer files. That is a terrifying thought. So we put it in a safe space, we lock it down,

12:43

then we build our first workflows. Yeah, and the community guides say you should not build complex multi-step automations on day one. Start very small. The very first workflow you build should be connecting a private GitHub repository for daily cron job backups. This makes total sense. You are backing up the memory and skill files. You are literally saving the brain and habits of your assistant. If the server crashes, you don't lose the weeks of training you put

13:11

in. The second recommended workflow is a 7 a.m. morning briefing. You set a cron job for Hermes to pull your top priorities, scan relevant industry news, and flag urgent calendar reminders. It sends it to your phone right when you wake up. It makes Hermes feel genuinely helpful immediately. You get a clean, customized start to your day. The third recommended workflow is a weekly server health check: reviewing disk usage, checking

13:34

running services, and scanning error logs. Hermes lives on a server, so ask it to manage that server. Exactly. Checking disk usage isn't just asking, is the disk full? It's having the AI run the command, read the output, notice that a log file is eating up 80% of the drive, and text you to ask if it should clear it. It builds a great maintenance habit early on. We have the setup, we have the safety checks in place. What does this actually look like in the wild when it starts

14:01

working alone? This is where we look at the real-life applications. Sure, you can use it as a simple personal assistant for reminders, but the architecture scales up beautifully. You can use it as a full business operator: monitoring a Discord community, summarizing hundreds of YouTube comments to find common questions, or drafting technical support replies based on your actual documentation. What's fascinating here is that the value isn't day one perfection. You

14:28

have to adjust your expectations. The real value is day 30 automation. Exactly. Think about a content creator workflow. Hermes could draft scripts, pull research, and create data diagrams. You can give it a cron job and say, every Friday at 4 p.m., review my audience comments from the week and create 10 new content ideas. It just happens. And you can interact using voice,

14:49

which is huge for accessibility and speed. You can be out on a walk, not looking at a screen, and just brain dump a chaotic voice message into Telegram. Hermes will transcribe it, parse out the actionable items, and create a structured task list in your system. It also has vision capabilities. You can feed it UI screenshots from an app you are building or complex data diagrams and ask it to check for visual errors. Or you can use it as a dedicated, relentless

15:14

research assistant. You ask it to monitor specific AI news feeds during the week. Compare three different software tools, pull the pricing, format it into a table, and save the findings directly to its memory. And this all feeds right back into that magic self-improving loop we talked about earlier. You start with one real messy task. You let Hermes attempt to help. You review the output and you correct it. You say, no, don't use bullet points. Use numbered lists and keep

15:40

the summaries under two sentences. And then you save that exact pattern as a permanent skill. What happens if the first research report it generates is completely useless? You correct it, then tell it to save that improved format as a skill. It is so incredibly practical. You are not just sitting around waiting for the big tech companies to make the base AI model smarter. You are actively teaching your specific instance how to work perfectly for you. You build a fundamentally better

16:09

way of working together. By offloading these routines, it drastically reduces the number of small, exhausting decisions you have to make every single day. So let's bring this all together. Hermes Agent and tools like it represent a fundamental shift in computing. We are moving from renting intelligence one prompt at a time to owning an evolving operator. It builds its own memory. It builds daily habits. It builds complex workflows tailored exclusively to the

16:35

messy reality of your life. It is definitely not magic out of the box. I think that's important to stress. It requires genuine guidance and discipline. You have to actively prune its memory and take the time to create those foundational skills. But the compounding return on your time is immense. Once you get those skills dialed in, it operates silently in the background, constantly pushing your work forward while you focus on the big

16:59

picture. It essentially becomes a custom, lightweight operating system for your professional life. Before we wrap up today, I want you to think about this. Look closely at your daily routine right now. What is the one tedious digital chore you do every single morning, whether that is pulling data, checking a dashboard, or formatting an email, that you can hand off to an autonomous assistant right now?

17:22

Thank you for joining us on this deep dive. We covered a lot of technical ground today, but hopefully it gave you a blueprint for the future. So what does this all mean? It means your new assistant is waiting for instructions.

Transcript source: Provided by creator in RSS feed