¶ Welcome to Practical AI!
Welcome to the Practical AI Podcast, where we break down the real-world applications of artificial intelligence and how it's shaping the way we live, work, and create. Our goal is to help make AI technology practical, productive, and accessible to everyone. Whether you're a developer, business leader, or just curious about the tech behind the buzz, you're in the right place. Be sure to connect with us on LinkedIn, X, or Bluesky to stay up to date with episode drops, behind-the-scenes content, and AI insights. You can learn more at practicalai.fm.
Now on to the show.
Welcome to another episode of the Practical AI Podcast. This is Daniel Whitenack. I am CEO of Prediction Guard, and I'm joined as always by my cohost, Chris Benson, who is a principal AI and autonomy research engineer. How are you doing, Chris?
Doing great today, Daniel. How's it going?
¶ Introducing the Guest
It's going really well. I feel like I'm going to add a bunch of tools to my toolkit today via MCP, because today we have with us Craig McLuckie, who is CEO of Stacklok, which we'll learn a little bit more about, and of course we'll talk a good bit about MCP and other things. Welcome, Craig. It's great to have you.
Hey. Thanks for having me on the show.
Yeah. Well, people can look up the projects that you're working on now at Stacklok. A lot of that has to do with MCP and AI on top of Kubernetes. Do you wanna just give us a little bit of setup on why you are spending your time thinking about this intersection of AI, MCP, and Kubernetes?
¶ Craig's Background and the Intersection of AI, MCP, and Kubernetes
Yeah. I think for me, it's sort of interesting. Maybe I can just introduce my background a little bit. I'm an infrastructure guy. I've been building infrastructure technology for pretty much the totality of my career.
I built a lot of infrastructure tech when I was at Microsoft. And then at Google, I was happy to meet my friend Joe, and we built out Google Compute Engine, which is virtual machine infrastructure technology. We started the Kubernetes project and did a few other things inside there. Subsequently, we've built a lot of other technologies together, either in a startup context or as part of larger companies. For me, there are a lot of interesting parallels.
I think history doesn't repeat itself, but it often rhymes. One of the things that got me really captivated by this whole MCP thing was that it took me back to my early career. I remember the first time I saw Docker. Docker, if the audience isn't familiar with it, is a technology that enables people to package up an application and all of its dependencies in a single container that's highly portable and can be deployed everywhere. And when I saw Docker, I saw two things occupying the same space.
It was one of these wonderful technologies that solved an obvious problem developers have, which is how do you package up an application with all of its dependencies so it can run pretty much anywhere. That's fantastic. But you could also peer through it and see Kubernetes on the other side: an orchestration system that enables you to build more complex applications and run them the way cloud native organizations like Google ran their applications. It did double duty. And when I saw MCP, I had that same kind of moment, where you could see this technology occupying two spaces at the same time, which is very rare and very wonderful.
One part of it was that it hints at what the future of an n-tier AI native application might look like, where you start to think of the LLM as the presentation layer and view model for an application. It starts to describe how you would formalize interfaces and what the middle tier of the application looks like, if you think of existing databases and systems as the persistence tier for these modern applications. But it also pointed to a set of capabilities that I think are gonna be not just exciting but extremely necessary to large organizations. Because as you bring agentic systems into your environment, they are stochastic. There are new rules.
They can be quite difficult to integrate sustainably and securely. And MCP really represents this small protocol that you can use to start reconciling the behavior of systems that are accessing the real world, and also to set up guardrails and controls. So I saw this and I was really captivated by it, and it started causing me to ask questions like, hey, I think I can see what the future might look like, and I think we need to be able to operationalize this layer for organizations.
And that really motivated me to start on the work that we've been doing here at Stacklok.
And just so people... we've talked about MCP on the show before, and some audience members may remember that, but for some, maybe this is their first episode. Could you give us a high-level view of the why of MCP, and then the what of what it is?
¶ Understanding the Model Context Protocol (MCP)
Yeah. So the why of MCP: you have this new technology called the transformer. Right? A large language model, generative AI. And one of the most remarkable and wonderful things about it is that it interacts with you in natural language.
It's very good at interacting in natural language. It's very good at dragging semantic meaning out of large quantities of information. It's about taking data and turning it into knowledge, taking knowledge and turning it into a decision, and then turning a decision into an action. Right? It's this new capability.
But the trick here is that there are some things it's good at and some things it's bad at. It is really built to work in a natural language format, so asking it to interact with relatively traditional APIs can be very fiddly. It's not necessarily set up to deal with the authentication and authorization process. And it can't always be completely trusted.
And so by introducing the Model Context Protocol, effectively what Anthropic did was start to describe the outside world in relatively simple natural language terms, with a JSON schema backing it, that would enable large language models to start reasoning about what tools exist and to invoke those tools in a deterministic way. The way I think about it is as a selectively permeable membrane that an organization can wrap around its existing systems, one that allows value to flow through in both directions but enables you to assert the controls you need so that AI systems can actually do real work in the real world.
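To make "natural language terms with a JSON schema backing it" concrete, here is a minimal sketch of how an MCP server might describe one tool. The `name`/`description`/`inputSchema` fields and the `tools/list` and `tools/call` JSON-RPC methods follow the MCP specification as we understand it; the recruiting tool itself is hypothetical, and a real server would use an MCP SDK and a full JSON Schema validator rather than the required-field check shown here.

```python
import json

# Hypothetical tool: the name, description, and fields are illustrative,
# not taken from any real MCP server.
SEARCH_CANDIDATES = {
    "name": "search_candidates",
    "description": "Search the candidate database by skill and location.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "skill": {"type": "string", "description": "Skill to match, e.g. 'Kubernetes'"},
            "location": {"type": "string"},
        },
        "required": ["skill"],
    },
}

def tools_list_response(request_id: int) -> dict:
    """Build a JSON-RPC 2.0 response to an MCP `tools/list` request."""
    return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": [SEARCH_CANDIDATES]}}

def missing_required(tool: dict, arguments: dict) -> list[str]:
    """Return required argument names the model left out of a `tools/call`."""
    return [k for k in tool["inputSchema"].get("required", []) if k not in arguments]

# A `tools/call` request as a client might send it on the model's behalf.
call = {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
        "params": {"name": "search_candidates", "arguments": {"location": "Berlin"}}}

print(json.dumps(tools_list_response(1), indent=2))
print(missing_required(SEARCH_CANDIDATES, call["params"]["arguments"]))  # ['skill']
```

The model only ever sees the description and schema; the deterministic part is that a malformed call can be rejected before it touches the real system.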
Yeah. That's amazing. And could you give some concrete examples of the kinds of systems within an organization that you might wanna tie in this way? I know there are innumerable ones and people could imagine anything, but just so people have some concrete examples.
¶ Concrete Examples of MCP in Action
Yeah. So let's just imagine the workflow of a recruiter. Right? A recruiter is gonna be doing their work on a day-to-day basis, and they're gonna be interacting with email. They're gonna be interacting with LinkedIn.
They're gonna be interacting with their candidate management system. They're gonna be interacting with the calendaring system, etcetera. Right? Now the way they would do their work today is a lot of jumping between different SaaS systems. So I might go to Gmail and send a note to a candidate.
I might go to the Google Calendar app to schedule something, or I might have my candidate management system that has some native integration to support these things. But I'm constantly jumping around. What I'd ideally like is an AI assistant that is able to access my email, my calendar, and my candidate management system, but is able to do that in a way that I have some level of control over. Right? I don't want this thing just going rampant and sending things.
And so what MCP enables you to do is take all those systems and start describing nouns and verbs. Like, what is a candidate? What is a calendar invitation?
What are the actions I wanna perform? Schedule an interview, that type of thing. And you start to describe those as discrete resources that the AI model can go and acquire, and tools that the AI can actually invoke to do work in the real world. By bubbling it up to a level where these things are described in relatively simple terms and then presented to the LLM, it can now discover the tools that are available to it: oh, you want to do this?
Well, let me see if I can access your calendar. Oh, I can, because this calendar is now available through this MCP server. What it also enables you to do is start dealing with the authentication and authorization problem. Right? That recruiter has an identity.
That could be in Okta or Microsoft Entra or one of these IdPs. You can now set it up to say, hey, an agent that's working on behalf of this recruiter should be able to access these systems the way the recruiter is able to, but also with certain controls in place, because you don't necessarily wanna give these things unfettered access. And so MCP is really the gateway to value for enterprise systems, and this is why I think people are so excited.
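One simple form of "certain controls in place" is a human approval step for tools with side effects, so the assistant can read the calendar freely but can't go rampant sending things. This is a minimal sketch, assuming a hypothetical split of tools into read-only and side-effecting; it is not an MCP feature in itself, just a policy a client or proxy could apply before forwarding a `tools/call`.

```python
from dataclasses import dataclass, field

# Hypothetical classification: tools that change something in the outside world.
SIDE_EFFECTING = {"send_email", "create_calendar_event"}

@dataclass
class ApprovalGate:
    approved: set[str] = field(default_factory=set)  # call ids a human has approved

    def approve(self, call_id: str) -> None:
        self.approved.add(call_id)

    def may_run(self, call_id: str, tool_name: str) -> bool:
        """Read-only tools run freely; side-effecting ones need explicit approval."""
        return tool_name not in SIDE_EFFECTING or call_id in self.approved

gate = ApprovalGate()
print(gate.may_run("c1", "read_calendar"))  # True
print(gate.may_run("c2", "send_email"))     # False until a human approves
gate.approve("c2")
print(gate.may_run("c2", "send_email"))     # True
```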
And just to add to that, literally earlier today, like three hours ago, I was sitting at my desk in our open office, and I heard one of our go-to-market folks go over to one of our engineers and say, hey, I need to set up these, I forget how many it was, four or five MCP servers in the agent platform that we use. And the engineer said, like, why? I think his mind was in Claude Code world: I use my agent and all my MCP servers to do my technical stuff. But this goes beyond that. Certainly it applies to developer tools, because agentic things obviously power developers now, but it certainly goes beyond that.
I think the ones he was pointing to were HubSpot and Instantly and LinkedIn, or I forget which ones, but that level of thing.
And it's really interesting, because what we're seeing... look, let's be clear. LLMs are really great at writing code. Right? It's one of the things they do tremendously well, but they're getting increasingly good at a lot of other things as well.
And when I think about my own personal workflow, how I tend to do work: we've been partnering with Anthropic for a while, and they introduced a tunnel. So now all of the Stacklok internal knowledge management systems are available through that tunnel to Claude. So it's not necessarily tethered to my desktop. I can use this on my phone.
I can ask questions on my phone about things that are in my email. It becomes a very powerful way to open up the system in a way that's sustainable, and it fundamentally redefines how people work. But I think the key thing here is that, in what most people are experiencing today, they assume this is really tied to the developer's machine.
If you look at the way Claude Code or Cowork is structured, it's using the developer's desktop as the aggregation point where all of the state comes in and all of the outbound connections are terminated. What I think we're gonna start to see is a need to move that off the developer's desktop, so you actually have a controlled entry point into an environment where an organization can think more holistically about what services it wants to provide, and can also do it for Claude and OpenAI and Gemini and all of the other technologies. So it's this great democratizer. It's democratizing data access while preserving control, and it's doing it in a way that's not tied to a specific provider.
So you go through the exercise of exposing your data once and setting up the policies once, and now you have the ability to use a pretty broad cross section of models.
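The "expose once, use with many models" point works because an MCP tool's name, description, and JSON schema map almost one-to-one onto each vendor's tool-calling format. A minimal sketch, assuming the OpenAI Chat Completions `tools` shape and the Anthropic Messages API `tools` shape as publicly documented; the tool itself is hypothetical, and in practice an MCP client or gateway does this translation for you.

```python
def to_openai_tool(tool: dict) -> dict:
    """MCP tool descriptor -> OpenAI Chat Completions `tools` entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }

def to_anthropic_tool(tool: dict) -> dict:
    """MCP tool descriptor -> Anthropic Messages API `tools` entry."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }

# Hypothetical tool, as it might come back from an MCP server's tools/list.
mcp_tool = {
    "name": "schedule_interview",
    "description": "Schedule an interview between a candidate and an interviewer.",
    "inputSchema": {
        "type": "object",
        "properties": {"candidate_id": {"type": "string"},
                       "start_time": {"type": "string", "format": "date-time"}},
        "required": ["candidate_id", "start_time"],
    },
}

# The same schema object is reused verbatim for both providers.
print(to_openai_tool(mcp_tool)["function"]["parameters"] is mcp_tool["inputSchema"])  # True
print(to_anthropic_tool(mcp_tool)["input_schema"] is mcp_tool["inputSchema"])         # True
```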
It occurs to me, as we're talking about this, that for anyone listening or watching who has been kinda in it, this is very helpful, but I'm also a little bit worried about people who are just joining. Could you take a second and talk about what the stack looks like? Now you have MCP in there, and you've talked about some of its virtues and some of the problems, but could we actually take a second and look at what the whole stack looks like? Because we've mentioned a whole bunch of different technologies over the last few minutes.
¶ Exploring the Technology Stack and Integration
Because one of the challenges I keep hearing about this year is people trying to keep up with all the new things in infrastructure. We're all living in this world, but there are a lot of questions and people are hearing things. We've talked about other technologies that we haven't brought in here, things like Clawdbot and OpenClaw and all these other things, and people are so confused in conversations I'm having outside the podcast. Can you take a moment, before we move forward into some of the specific things you guys are addressing, to talk about the stack, how it all fits together, and how people should think about it? Just to level set before we dive deeper, because we've already gone into some pretty cool stuff, and I don't wanna leave people behind.
Yeah. No, I totally respect that. Everyone's at a different point in the journey. Right? A lot of us are geeking out about how we wire in this specific tool, optimize tool calling, and deal with tool pollution. And a lot of folks are like, what is a tool and why would I use one? Right?
So I totally respect that there's a pretty broad cross section. As for describing the stack: it's a very open system, and there are a lot of different interpretations of what the stack is. But let me describe a problem we might be solving, and then the set of technologies that might be used to solve it. Right?
Right now, what most people are experiencing, and what works really well for a lot of organizations, is to just go buy Anthropic. Right? It's almost like back in the days of the mainframe: no one got fired for buying Big Blue. It's the warm blue blanket that wraps around you.
And Anthropic is definitely starting to look and feel a lot like IBM did around the dawn of the PC. And for good reasons: they're doing amazing work. We are rabid fans of their technology. So for a lot of organizations, the starting point might be, hey, let me go get Claude Code, or let me go get X, Y, Z. And the starting point for them is, okay, I've got this system, and it has access to my local file system.
So if I wanna develop code, I can go grab some code, copy it locally, and have it futz around with that code. Then I'll get to a point where I need more than just the ability to work on code. I wanna integrate this into a variety of other systems, like GitHub, or Slack, or whatever. And the starting point usually is, okay.
Well, Claude offers its own native integration system. They've partnered with Slack, they've partnered with Google, they've partnered with a variety of people. And you get these basic integrations in place, and it works pretty damn well. You get it.
You authorize it. The native integration works. But at some point, you're gonna start asking, well, what about all the other systems that I use to do my work? How do I start to expose those? And so you start to raise questions like: what do I need to build a bridge between the AI system I use to do my work and this big wide world of other technologies out there?
For a lot of organizations, the answer is, I need an MCP platform. And there are what I think of as four pieces that go into that. The first piece is a runtime. I need somewhere I can run this thing so that it's hosted. In the case of most development technologies, people just run it with npx.
They basically just pull a package off the Internet and it runs locally. God help you if that package happens to have been exploited by a hack or something like that. So having a secure runtime environment for this makes sense. The second thing you need is a registry.
What servers do I wanna use? How do I know whether they're good or bad? Can I actually provide a list of servers to my organization that people might use? So the registry is the next important piece. The following piece is a gateway. So, hey, okay, here are these servers. I wanna run them in an environment, and I wanna have a single endpoint that exposes them to something like Claude or Codex or any other system.
You need that kind of gateway technology. And then the final piece is what I think of as a control plane. As you go from one to 10 to 100 to a thousand servers, and you start reasoning about mapping servers to specific user groups, you need to be able to manage that. So that starts building up what we think of as the MCP platform. But that's not necessarily all that people need.
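A toy model of how the registry, gateway, and control-plane pieces fit together (the runtime, where servers actually execute, is left out). All server names, tool names, and user groups are hypothetical, and the `server__tool` naming convention is just one way a gateway might keep tool names from different servers from colliding behind a single endpoint.

```python
# Hypothetical registry: vetted servers and the tools each one exposes.
REGISTRY = {
    "github": ["search_issues", "create_pull_request"],
    "calendar": ["list_events", "create_event"],
    "crm": ["lookup_contact"],
}

# Hypothetical control-plane policy: which user groups may reach which servers.
GROUP_ACCESS = {
    "engineering": {"github", "calendar"},
    "recruiting": {"calendar", "crm"},
}

def gateway_tools(groups: set[str]) -> list[str]:
    """Single gateway endpoint: the union of tools from every server the user's
    groups may reach, namespaced as 'server__tool' so names cannot collide."""
    allowed: set[str] = set().union(*(GROUP_ACCESS.get(g, set()) for g in groups))
    return sorted(
        f"{server}__{tool}"
        for server in allowed
        if server in REGISTRY  # only servers vetted in the registry
        for tool in REGISTRY[server]
    )

print(gateway_tools({"recruiting"}))
# ['calendar__create_event', 'calendar__list_events', 'crm__lookup_contact']
```

Going from one server to a thousand then mostly means growing the two tables, which is the job the control plane takes on.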
The other thing people start to look at is, well, what about the LLM gateway? Maybe I wanna start building my own agents. Maybe I want to start reasoning about using a variety of different models. Maybe I want to institute my own tracking and policy management around who can talk to what. So an LLM gateway is another complement.
I think of those as two bookends to any kind of agentic platform that a real-world organization wants to use. You need an LLM gateway so you can direct traffic to a variety of different models and assert controls. And then you need an MCP gateway so you can connect real-world systems. Between those two bookends, it gets really fun and interesting. We can talk a lot about harnesses.
We can talk a lot about memory management and session management. There are a lot of moving boxes. We can talk about agentic frameworks like n8n and how they fit in, or CrewAI, or LangGraph, and it gets more and more detailed. But the guidance I tend to give most enterprises is: look, start with a vertically integrated system and see how far you can get. Then start to assess your appetite and ask questions like, we've got our developers on Claude Code, so for our knowledge workers, what does it look like to get there?
And you will inevitably realize that you really do need these two integration bookends: an MCP gateway and an LLM gateway. Typically you wanna deploy those two things together. Then there are a lot of other constituent pieces you might pull in to start creating really great experiences for your knowledge workers, which decouple you from your vertically integrated AI platforms. I don't know if that's helpful, but that's just how I think about the space.
No, that's a great explanation. I appreciate that.
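The LLM-gateway bookend can be sketched the same way: one entry point that routes each request to a provider and applies "who can talk to what" policy plus usage tracking. This is a minimal sketch under assumptions: the prefix-based routing table, the team policy, and the model names are all hypothetical, and a real gateway would also forward the request, handle credentials, and record token counts.

```python
# Hypothetical routing table: model-name prefix -> upstream provider.
ROUTES = {"claude-": "anthropic", "gpt-": "openai", "gemini-": "google"}

# Hypothetical org policy: which providers each team may send traffic to.
TEAM_PROVIDERS = {
    "support": {"anthropic"},
    "research": {"anthropic", "openai", "google"},
}

USAGE_LOG: list[tuple[str, str, str]] = []  # (team, model, provider) for tracking

def route(team: str, model: str) -> str:
    """Pick the upstream provider for a request and enforce the team's policy."""
    provider = next((p for prefix, p in ROUTES.items() if model.startswith(prefix)), None)
    if provider is None:
        raise ValueError(f"no route for model {model!r}")
    if provider not in TEAM_PROVIDERS.get(team, set()):
        raise PermissionError(f"team {team!r} may not use provider {provider!r}")
    USAGE_LOG.append((team, model, provider))
    return provider

print(route("research", "gpt-example"))  # openai
```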
¶ Sponsor: Prediction Guard
If you've been listening to the show over the past few months, you realize just how transformative agentic AI is, whether that's Claude Code or Hermes Agent or custom-built software that you're deploying for operational efficiencies or as new products for your customers. Regardless of your maturity now, this agentic AI world is where we're headed. And a lot of security and governance teams aren't letting these agents go into production because of risks related to agency and autonomy: how do you take care of things like prompt injections or insecure tool usage? There's a lot to take care of, and that's why I'm personally spending my time outside of the show working with an amazing team of AI engineers to build Prediction Guard. Prediction Guard is an AI control plane that you run in your own infrastructure, behind your firewall.
Developers can build on top of this control plane using everything they want to use, including OpenAI- and Anthropic-compatible APIs, MCP servers, and frameworks like LangChain, but all of this is plugged into a built-in governance harness that enforces your organization's AI policies, and all of that telemetry goes to your monitoring and alerting systems. I would encourage you to check out what we're doing at predictionguard.com/practicalai. You can schedule a demo with me and the team, and I'd love to get your feedback on what we're doing. So visit us at predictionguard.com/practicalai. That's predictionguard.com/practicalai.
¶ Identity and Authentication in MCP Systems
Well, Craig, I have a bunch of, I don't know how many I'll fit in, but a bunch of selfish questions just as a practical developer of some of these things. Maybe you could help people understand this. You have, let's say, an MCP server for Salesforce or HubSpot or whatever that is running somewhere in a runtime, like you said; it's hosted somewhere. Then there's this identity and authentication piece that I think is often very confusing for people. A lot of times, if they're building their own MCP server, they say, oh, well, here's this API to X system and I have an API key for that API, so I'll set that as an environment variable and all of my traffic will go to that API. But then you lose the identity piece: who's using it? Do they have access to the data they should or shouldn't have access to? Could you help us understand how that piece fits in: the identity of the user and the authentication with the MCP server?
What are the best practices around that, and what are some of the things people should think about?
Yeah. I think there's a lot to unpack here. Right? There's the world that is, and there's the world that we hope to move into together. Mhmm.
It's funny. My buddy Joe, who I've worked with for years (we built Compute Engine together, and Kubernetes, and Heptio, and Tanzu, and now he's my CTO here at Stacklok), wrote the SPIFFE paper. I don't know if you've heard of SPIFFE. It's a zero trust identity framework.
He wrote that paper about ten years ago, and I think we're finally at a point where AI is the thing that's gonna kick us over the line to actually move past relatively traditional OIDC-based systems to something like that. But let me tear this apart into pieces. First and foremost, MCP as a specification was really grounded in OAuth 2 workflows. The way Anthropic certainly looked at the world is: you have a user, they're using Claude, they have an OIDC token, and that token can then get pushed onto an MCP server.
It's basically identifying the user to the server. And then what happens on the back end of that is broadly left as an exercise to the reader: whatever you wanna do. I think there are really two problems that have to be answered. The first is an authentication problem, and the second is an authorization problem.
Obviously, the set of resources that you're accessing is going to be varied. The only thing we really have right now that works in most organizations is the existing OIDC tokens, and I think we just have to accept that's where we are. But over time, as we start to build agents, agents are going to have to have their own identity.
And it cannot be as simplistic as the way we've structured identity today, because it's really gonna be a three-legged stool. There's what I think of as a service account identity that effectively identifies the agentic endpoint: you're speaking to this specific agent. There's a set of claims presented to that endpoint based on the role the owner of the agent has been granted. And then there's a set of on-behalf-of claims inherited from the user who's accessing that agent.
And that can then get changed through a variety of things. There's a lot of really interesting work being done in the upstream MCP specification. You can start looking at things like transaction tokens, and there's a lot of innovation happening in the IdP space around this, but that's only gonna help us tomorrow. It's not gonna help us today. We actually have to get through the definition and implementation of these systems.
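One way to picture the three-legged stool: the agent has its own workload identity, and what it can actually do is bounded by both the owner's grant and the calling user's on-behalf-of claims. This sketch assumes the legs combine by set intersection, which is an illustrative choice rather than something any spec prescribes; the SPIFFE-style ID follows the `spiffe://trust-domain/path` shape, but the trust domain, path, and scope names are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRequest:
    agent_id: str                  # leg 1: service-account identity of the agent endpoint
    owner_scopes: frozenset[str]   # leg 2: claims granted via the agent owner's role
    user_scopes: frozenset[str]    # leg 3: on-behalf-of claims inherited from the caller

    def effective_scopes(self) -> frozenset[str]:
        """Assumed rule: the agent may only do what BOTH its owner's grant and the user allow."""
        return self.owner_scopes & self.user_scopes

req = AgentRequest(
    agent_id="spiffe://example.org/agents/recruiting-assistant",  # hypothetical ID
    owner_scopes=frozenset({"calendar:read", "calendar:write", "crm:read"}),
    user_scopes=frozenset({"calendar:read", "crm:read", "crm:write"}),
)
print(sorted(req.effective_scopes()))  # ['calendar:read', 'crm:read']
```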
¶ Token Exchange and Security Patterns
So for most people today, what tends to work is that you first need to institute some kind of token exchange. Typically, you don't wanna be in a situation where you receive a user credential and then pass it on to another system. You want to make sure you're descoping the claims to the minimum set necessary to perform a task. So typically, what we do when we work with organizations is institute some kind of token exchange, and there are four or five different patterns here that might make sense.
You have straight pass-through, where the API receives an OIDC token. You could have federated trust, where you have to exchange the token into another federated trust domain. You have the situation you talked about, where you have to exchange it for an API key. And you need to make sure that action is pulled out of the agent's purview and handled separately. That's what we think of as one of the primary roles of a technology like ToolHive: it formalizes this so that the MCP tool developer doesn't have to deal with all the mechanics of token exchange.
That's all handled in the proxy layer for the user, and then you just have to start reasoning about how you want this to be handled. A very common pattern we work with people on is: I wanna use the AWS MCP server, and I wanna use it in read-only mode. Right? Because, god help me, I have an agent that's running on my desktop.
I don't wanna keep watching it and scrutinizing every time it interacts with the system. But I certainly don't want it deleting my RDS instance on a whim to clear up an issue. So how do you configure that to support read-only mode? One way is to implement a token exchange where you take your Okta token or whatever, map it to an AWS token, descope the claims, and then hand that to the MCP server or to the back-end API to pass through. That's the kind of pattern I think a lot of people wanna be able to institute.
But it's fiddly and requires a fair bit of work. So you really need a platform team that's willing to do this work on behalf of users and recognize these four or five common patterns. The other thing that I think is really important is the authorization side of the house. Most authorization schemes today are grounded in the idea that deterministic systems are accessing them. Having an unsupervised system start to access resources means you really wanna pull out a lot more policy and put a lot more scrutiny on tool calls.
So one of the patterns we see being very helpful is relying on the existing authorization systems to decide whether the agent should have access, because the agent's acting on behalf of the user, but then describing additional agent-only policy-as-code capabilities that you apply to all your MCP servers. You can describe those in a technology like Cedar or Rego or what have you. And if you've got a common proxy system, you can apply that to every tool call you're making. I know that's maybe a little bit too specific, but I think you do need to separate out those two things.
And unfortunately, there's no easy answer when you start having to deal with things like token exchange or credential mapping. The one piece of hope I can give teams is that if you have a platform team that's willing to take this work on, it's relatively easy to get to a point where you have relatively vanilla servers that rely on platform-delivered authentication and authorization capabilities, and you can just snap them in and use them.
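The descoping step can be sketched as an OAuth 2.0 Token Exchange (RFC 8693) request. The `grant_type`, `subject_token`, `subject_token_type`, `audience`, and `scope` parameters and the two URN constants come from RFC 8693; the scope names, the audience value, and the read-only allowlist are hypothetical. This only builds the request body: where it is sent, and how the result becomes AWS credentials, depends on your IdP and AWS setup and is not shown.

```python
from urllib.parse import parse_qs, urlencode

# Constants defined by RFC 8693 (OAuth 2.0 Token Exchange).
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"

# Hypothetical scope names: the read-only subset this MCP server may ever receive.
READ_ONLY_SCOPES = {"rds:describe", "s3:list"}

def token_exchange_body(user_token: str, requested: set[str], audience: str) -> str:
    """Build a form-encoded token-exchange request that descopes the user's
    token to read-only claims before anything reaches the MCP server."""
    scopes = sorted(requested & READ_ONLY_SCOPES)
    return urlencode({
        "grant_type": GRANT_TYPE,
        "subject_token": user_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,  # hypothetical identifier for the downstream server
        "scope": " ".join(scopes),
    })

body = token_exchange_body("user-oidc-token", {"rds:describe", "rds:delete", "s3:list"},
                           "aws-mcp-server")
print(parse_qs(body)["scope"])  # ['rds:describe s3:list']
```

The point of the design is that the delete permission never leaves the proxy, so the agent can't use it even on a whim.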
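The layering described here, existing authorization first and then agent-only policy on every tool call at the proxy, can be sketched in a few lines. The rules below are written as plain Python data purely for illustration; they are not Cedar or Rego syntax, and the deny patterns and tool names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical agent-only policy as data: (effect, tool-name glob).
# First match wins; the trailing catch-all allows, anything unmatched is denied.
AGENT_POLICY = [
    ("deny", "*delete*"),
    ("deny", "*drop*"),
    ("allow", "*"),
]

def authorize_tool_call(user_allowed: bool, tool_name: str) -> bool:
    """Layer 1: the existing authorization decision for the user the agent acts for.
    Layer 2: extra agent-only rules the proxy applies to every tool call."""
    if not user_allowed:
        return False
    for effect, pattern in AGENT_POLICY:
        if fnmatchcase(tool_name, pattern):
            return effect == "allow"
    return False

print(authorize_tool_call(True, "describe_db_instances"))  # True
print(authorize_tool_call(True, "delete_db_instance"))     # False
print(authorize_tool_call(False, "describe_db_instances")) # False
```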
¶ The Role of Proxy Layers in MCP Connections
And you you mentioned Toolhive, which I think is is super fascinating. And and I I wanna, I wanna make sure, our our listeners kind of understand also this this proxy layer. I maybe a way to frame this question is I I could perfectly well in some of the AI APIs, whether that's OpenAI, Anthropic, etcetera, sort of on the fly, insert information about what MCP server I want to call and just handle that at my application layer. Right? Why why is a a proxy layer something that is is helpful for people in terms of proxying those MCP connections rather than kind of integrating that at the application level?
There are several different reasons why you want to institute a proxy. First is basically visibility and governance. Right? So let's imagine you're building a system where you have a recruiter who wants to schedule an interview, and that interview is going to touch three different systems. And something's going wrong and you need to debug it.
Like, meetings are showing up on your calendar, but they're not showing up on the candidate's calendar, or something else. If you have a proxy, and you have a tool that describes a simple action like schedule interview and amalgamates those pieces, you can start to see a trace through the whole system. So when you have these workflows that are relatively complex and touch multiple systems, having that single proxy layer lets you generate observability. You can start to apply policy.
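To make the debugging story concrete, here is a hypothetical sketch (not ToolHive's implementation) of a proxy that records one span per downstream tool call under a shared trace ID, so a failed schedule interview shows exactly which system broke. The servers, tools, and fake backends are illustrative assumptions.

```python
import uuid
from typing import Any, Callable


class TracingProxy:
    """Hypothetical proxy that records one span per tool call, grouped by trace ID."""

    def __init__(self) -> None:
        self.spans: list[dict[str, Any]] = []

    def call(self, trace_id: str, server: str, tool: str,
             fn: Callable[..., Any], **args: Any) -> Any:
        span: dict[str, Any] = {"trace_id": trace_id, "server": server,
                                "tool": tool, "ok": False}
        self.spans.append(span)
        try:
            result = fn(**args)
        except Exception as exc:
            span["error"] = repr(exc)  # keep the failure visible in the trace
            raise
        span["ok"] = True
        return result

    def trace(self, trace_id: str) -> list[dict[str, Any]]:
        return [s for s in self.spans if s["trace_id"] == trace_id]


# Fake backends standing in for real calendar and applicant-tracking servers.
def create_event(owner: str, slot: str) -> str:
    if owner.endswith("@external.example"):
        raise PermissionError(f"cannot write to {owner}'s calendar")
    return f"event:{owner}:{slot}"


def log_interview(candidate: str, slot: str) -> str:
    return f"logged:{candidate}:{slot}"


def schedule_interview(proxy: TracingProxy, recruiter: str,
                       candidate: str, slot: str) -> tuple[str, bool]:
    """Composite tool: every downstream call shares one trace ID."""
    trace_id = uuid.uuid4().hex
    try:
        proxy.call(trace_id, "calendar", "create_event", create_event,
                   owner=recruiter, slot=slot)
        proxy.call(trace_id, "calendar", "create_event", create_event,
                   owner=candidate, slot=slot)
        proxy.call(trace_id, "ats", "log_interview", log_interview,
                   candidate=candidate, slot=slot)
    except PermissionError:
        return trace_id, False
    return trace_id, True


proxy = TracingProxy()
tid, ok = schedule_interview(proxy, "recruiter@corp.example",
                             "candidate@external.example", "10:00")
for span in proxy.trace(tid):
    print(span["server"], span["tool"], span["ok"], span.get("error", ""))
```

In a real deployment the spans would go to a tracing backend rather than a Python list, but the shape is the same: one trace per composite action, one span per system it touched.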
And so from a general hygiene perspective, it makes a ton of sense. A second reason why you may want that proxy or gateway technology is optimization. Right? I don't know if this is too deep for general folks, but one of the things that you hear a lot about is tool pollution. Right?
So an MCP server has descriptions associated with it. One MCP server will have multiple tools and resources, and each of those tools and resources has a description associated with it. When you want to make those tools available, those descriptions sit in the context window all the time. Over time, if you pull in three or four different MCP servers, you may have 150 tools.
You may be burning twenty or thirty thousand tokens on every interaction, just saying, hey, by the way, here are the tools. Input token caching helps somewhat, but only to a certain point. And so you can start amalgamating that and basically say, hey, here are two endpoints: find tool and call tool. Yes, it's going to be more chatty. Meaning, the LLM is going to go, okay, I need to use a tool.
What tools are available? It calls find tool with a description of what it's trying to accomplish, and gets back a list of tools that actually meet that description. That reduces input token consumption by 80 to 90% when you have these things. And so that's a very big deal versus just allowing the model to access everything.
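A rough sketch of the find tool / call tool pattern, assuming a proxy that holds the full catalog and exposes only two meta-tools to the model. The catalog entries are hypothetical, and ranking by keyword overlap is a stand-in chosen for simplicity; a real gateway might rank with embeddings or something else entirely.

```python
import re
from typing import Any, Callable

# Hypothetical catalog aggregated from several MCP servers behind the proxy.
# Only find_tool and call_tool are ever shown to the model.
CATALOG: dict[str, tuple[str, Callable[..., Any]]] = {
    "calendar_create_event": (
        "Create a calendar event for a list of attendees",
        lambda **a: f"event created for {a['attendees']}"),
    "github_list_issues": (
        "List open issues in a GitHub repository",
        lambda **a: [f"{a['repo']}#1"]),
    "fetch_url": (
        "Fetch the contents of a web page by URL",
        lambda **a: f"<html from {a['url']}>"),
}


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def find_tool(task: str, limit: int = 2) -> list[dict[str, str]]:
    """Meta-tool 1: return only the tool descriptions relevant to the task.

    Naive keyword overlap, used purely for illustration.
    """
    query = _words(task)
    scored = [
        (len(query & _words(f"{name} {desc}")), name, desc)
        for name, (desc, _) in CATALOG.items()
    ]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))  # best match first, then by name
    return [{"name": n, "description": d} for _, n, d in scored[:limit]]


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Meta-tool 2: invoke a tool that find_tool surfaced."""
    _, fn = CATALOG[name]
    return fn(**arguments)


hits = find_tool("list the open issues in my repository")
print(hits[0]["name"])
print(call_tool(hits[0]["name"], {"repo": "acme/app"}))
```

The trade-off Craig mentions shows up directly: the model pays for an extra round trip, but only the two meta-tool schemas plus a couple of matching descriptions ever enter the context window.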
Then there's tool selection. When you're working with Opus 4.7, it's just so damn good. It really doesn't matter; it's going to figure its stuff out. But the minute you start dropping down to Sonnet or Haiku or one of the smaller models, or if you're trying to build an autonomous agent, one of the hardest problems is making sure the thing calls the damn tool when it's supposed to call the tool. And smaller LLMs are notoriously bad at tool invocation.
And if you start putting twenty or thirty tools in there, forget about it. It's just not going to happen. But if you replace that with a single endpoint that can provide much more fine-grained guidance and distill it down to just the actions a system wants, you can get back up to the 95 to 97% threshold that actually makes the system useful. So it drives behavior there. And less clutter in the context window generates better results.
Context optimization is another big point of it. And then the final piece is project- or user-based views. Sometimes you want to construct a set of tools that are specific to a task. Let me give you an example. Say you're a developer working on a GIS system, which is a mapping system.
There, feature means something very specific. Right? It's a collection of vectors that describes something on the terrain. If you're interacting with a GitHub MCP server, feature means something completely different.
And if that developer's talking about features in Claude Code, it's going to get completely confused. Right? So by formalizing the nomenclature, describing this not just as a feature but as a GIS feature versus a product feature, and augmenting the tools with something that's semantically more relevant to the task at hand, you can improve the behavior. So the other reason to institute this type of abstraction is that you can create much more finely tuned views for specific agents, user groups, etcetera. That takes a tool and makes it far more intrinsically useful.
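One way to picture a per-cohort view, assuming a gateway that can filter upstream tools and override their descriptions before the model sees them. The tool names, the view format, and the rewritten descriptions are all hypothetical; the point is only that the GIS team's agent never sees the ambiguous word "feature" on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    description: str


# Hypothetical upstream tools from a GitHub server and a GIS server.
UPSTREAM = [
    Tool("github_create_issue", "Create an issue or feature request"),
    Tool("github_delete_repo", "Delete a repository"),
    Tool("gis_query_features", "Query features within a bounding box"),
]

# Hypothetical view for the GIS team: which tools are exposed, and
# descriptions rewritten so "feature" is never ambiguous.
GIS_TEAM_VIEW = {
    "allow": {"github_create_issue", "gis_query_features"},
    "describe": {
        "github_create_issue":
            "Create a GitHub issue or product feature request (not a GIS feature)",
        "gis_query_features":
            "Query GIS features (map geometries) within a bounding box",
    },
}


def build_view(tools: list[Tool], view: dict) -> list[Tool]:
    """Filter to the allowed tools and apply per-view description overrides."""
    return [
        Tool(t.name, view["describe"].get(t.name, t.description))
        for t in tools
        if t.name in view["allow"]
    ]


for tool in build_view(UPSTREAM, GIS_TEAM_VIEW):
    print(f"{tool.name}: {tool.description}")
```

The upstream servers stay unchanged; only the view differs per agent or user group, which is what lets the same servers serve cohorts with conflicting vocabularies.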
Such a great explanation there. I really appreciate that. One of the things that was mentioned by name a moment ago was ToolHive. As we take the concepts you're sharing with us and dive into how you're approaching the proxy issues: for those who haven't had any exposure to ToolHive, could you take us into what it is as a solution and how it fits with some of the context you just addressed? That'd be fantastic.
¶ Introduction to ToolHive: Building the Yellow Brick Road
And let's be clear: ToolHive is an open source project. It's Apache 2 licensed. My background is Kubernetes, which was a great open source project, and I bootstrapped the CNCF. I love open source. I love communities. This is an invitation for people to party on with us in the open on this technology. There's no greater compliment than discovering someone who's using it and reaching out later, or forking it. I don't really care.
It's open. That's what it's there for. Right? And so the philosophy of ToolHive was really this: look, Anthropic, OpenAI, and Google are describing the Emerald City. They're telling us about this beautiful place in the future.
Someone needs to build the yellow brick road. Someone needs to build the basic procedural things that enable you to actually get to that destination. And so we started looking at a technology like MCP, and we're like, oh gosh, this has to be done right. It has to be done to enterprise standards.
It's such an important thing. So we started asking questions like, well, look, we don't have to reinvent the wheel. There's a lot of great technology that came out of the cloud native ecosystem, which I was very intimately involved in shepherding into existence. Can we take a lot of the learnings and technologies out of the cloud native ecosystem and repurpose them so that they work really well in the AI native world?
And so the starting point for us was: hey, the Linux application container is the foundation for Kubernetes. Let's just put our MCP servers in a Linux application container. What does that mean? For an enterprise, it means it's an OCI image, and they know how to reason about, harden, scan, and validate that image. We can complement that.
And so with ToolHive, it's not just that it runs through your full SDLC the way any other piece of technology you're deploying does. We also do a lot of MCP-specific scanning and reasoning. So we basically provide a pipeline where you can generate a container and then deploy it in a runtime environment. The second thing we started looking at was, well, there are a lot of servers out there that are really useful.
The fetch server is probably one of the most commonly used servers. Hey, I have an agent. I want to be able to access something off the Internet. I want to use the fetch server.
But if it's running behind my firewall, I might have just given that agent access to the totality of my network. Right? How do I constrain its view so that it just fetches documentation? How do I turn the fetch server into my fetch-documentation server? The way to do that would be to restrict which network endpoints it can talk to.
It turns out containers are already great at doing that. So by wrapping it up in a container, you can say, hey, I'm running this thing and I don't want it to access my personal photos, so I can describe which portions of the file system it can access. I can describe what network endpoints it can access.
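The allowlist idea behind a fetch-documentation server can be expressed as a simple check. This is only a sketch of the policy itself, with hypothetical hosts; as Craig notes, in practice the container runtime enforces egress at the network level rather than trusting code inside the server.

```python
from urllib.parse import urlparse

# Hypothetical egress allowlist that turns a general "fetch" server into a
# "fetch documentation" server. Subdomains of an allowed host are permitted.
ALLOWED_HOSTS = {"docs.python.org", "kubernetes.io"}


def egress_allowed(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    host = (parsed.hostname or "").lower()
    return host in ALLOWED_HOSTS or any(
        host.endswith("." + allowed) for allowed in ALLOWED_HOSTS)


for url in ["https://kubernetes.io/docs/home/",
            "https://kubernetes.io.attacker.example/",
            "http://docs.python.org/3/",
            "https://intranet.corp.example/hr"]:
    print(url, egress_allowed(url))
```

Matching on the parsed hostname (with a leading dot for subdomains) rather than a string prefix is what keeps look-alike hosts such as `kubernetes.io.attacker.example` out.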
So it becomes a secure environment to run these MCP servers that you can then control. You can turn them up locally on a developer's desktop, you can turn them up in the cloud with Kubernetes, and you can run one, ten, or 100 of these things. The next thing that people tend to encounter is: well, I want my developers to be able to find and use MCP servers, but I want them to find servers that are vetted, trusted, etcetera. So the registry becomes a very natural part of that.
So basically, you can tell a client that speaks the registry protocol: hey, here are the MCP servers for your organization, and here's where you can find them. Whether they're being downloaded and run locally or just accessed via proxy at a hosted endpoint, the registry is a very important part. So we've built out an MCP registry.
We provide tools and capabilities that allow you to harden the images to your taste. We provide a prepopulated set of images that we've scrutinized and scanned. They're coming out of the community, but we stand behind them. We hold to a certain standard.
But we also enable people to start layering in their own attestation. Like, hey, I want additional scrutiny on these things. The registry becomes that critical control point: the place where we describe the policy that follows that server down into the destination where it's running, and the place that enables clients to discover those servers. And then the other piece I talked about is what we think of as the VMCP gateway, the virtual MCP server.
The ability to say: for this set of users, I want to expose this set of tools, and I want them described this way. And some of those tools might be composite. Instead of saying to the agent my recruiter is using, hey, here's Google Calendar, here's this, here's that, maybe I want to build an MCP server that has a single endpoint: schedule interview. And then there's a declarative workflow behind the scenes that actually goes from system to system and binds that whole thing in a transactional context, so that it either passes or fails atomically.
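Strict atomicity across independent systems like calendars is rarely available, so a common way to approximate "passes or fails atomically" is saga-style compensation: run each step, and if one fails, undo the completed ones in reverse order. The sketch below assumes every step comes with an undo action; it illustrates that general technique, not the declarative workflow engine described here, and the calendar store is hypothetical.

```python
from typing import Callable

# A step is (name, do, undo); undo must reverse do.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_atomically(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; if one fails, undo completed steps in reverse."""
    done: list[Step] = []
    log: list[str] = []
    for step in steps:
        name, do, _ = step
        try:
            do()
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            for done_name, _, undo in reversed(done):
                undo()
                log.append(f"undid {done_name}")
            return False, log
        done.append(step)
        log.append(f"did {name}")
    return True, log


# Hypothetical shared calendar store standing in for two calendar systems.
calendar: set[str] = set()


def add(entry: str) -> Callable[[], None]:
    return lambda: calendar.add(entry)


def remove(entry: str) -> Callable[[], None]:
    return lambda: calendar.discard(entry)


def candidate_calendar_down() -> None:
    raise RuntimeError("candidate calendar unavailable")


ok, log = run_atomically([
    ("recruiter invite", add("recruiter@10:00"), remove("recruiter@10:00")),
    ("candidate invite", candidate_calendar_down, lambda: None),
])
print(ok, calendar, log)
```

Because the recruiter's invite is rolled back when the candidate's calendar fails, you never end up with an invitation on one side and not the other.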
So you don't have a calendar invitation showing up here but not there. You can build out those capabilities where you take basic MCP servers as building blocks and create a virtual view on them that's really tailored to a specific user cohort, etcetera. And so that's another part of the platform we've built. And then the final piece: one of the things we've observed is that when we built the system, most people were running these servers locally, but now we're seeing 50% month-over-month growth in Kubernetes use of this technology. It's astonishing how quickly we're seeing people adopt the ability to run MCP servers in a Kubernetes destination.
And so we're getting millions and millions of tool invocations from the Kubernetes side of the house. And so that Kubernetes control plane is the final piece of it.
And you mentioned the Kubernetes side of things, the declarative nature of some of that. Working with Kubernetes at certain points, and I'm certainly no expert, one of the things that's always a great feeling is having that declarative workflow where I say, I want this to be the state, and that sort of happens on the back end. Right? I'm wondering how you see that infrastructure side developing, because now the interface that we have as developers, infrastructure, and DevOps people is a lot of times natural language, which is sort of declarative in its own sense. It seems like that Kubernetes control plane, and maybe the downstream things like ToolHive that would be declared that way, would be very natural to manage and configure via natural language.
¶ Future of Infrastructure and Agent Management
Is that something... yeah. How do you see that developing, and how do you see it fitting into this? Because with these systems, like you say, maybe all of a sudden I have 600 agents or 900 agents. Everyone's on a different maturity path here, and maybe some people listening to this have one agent right now.
But I think there is a future where they'll have many, many agents running in their environment, and that can be very scary infrastructure-wise as well.
Yeah. I think one of the things that was beautiful about Kubernetes, and this is a testament to Joe and Brendan and some of the earlier people who worked on it, and also to a lot of the hardcore Google engineers who had been sweating the details on these systems, was this idea of reconciliation-driven infrastructure. Right?
The idea is that you can basically take something, describe how you want it to be, and then have a system that is solely responsible for making that true. Right? And so I think there are a lot of different directions we can go with this. One is that those reconcilers today are, in principle, deterministic systems. I mean, no system's really deterministic.
Anytime you're dealing with the real world, entropy has a way of creeping into anything you're building, just by virtue of the fact that life is chaotic, the world is chaotic. But now there's certainly the possibility that we can start to have stochastic systems driving reconciliation loops. Right?
And that's the direction we can start leaning into: we describe what we want to have happen, and then when something goes out of conformance, we invoke a stochastic system to reason about why it's out of conformance and start driving it back into conformance. So I think one of the things we will certainly see over the next little while is self-annealing, self-healing, self-optimizing systems. You'll be able to describe what you want to have happen, it'll generate the YAML manifests and hand them off to Kubernetes, and then you'll just have very smart systems watching it.
And when something goes out of conformance, obviously, it'll try to reconcile it. And if it gets to a point where the reconciliation's not working, like, hey, this pod is in CrashLoopBackOff and I'm at the boundaries of what a reconciler can currently do, that's when you'll have the opportunity to start pulling in stochastic systems to drive it. I think that's going to be a very interesting direction for us as we get even further out of the infrastructure: we just give it to the infrastructure and let the infrastructure run it. Now, in terms of what's necessary to run agents, there's a lot to unpack there.
I think we will certainly see Kubernetes-style patterns. I think we do need to start reasoning about what the packaging definition for an agent looks like. Maybe it's something like an OCI entity? How do we make other systems available? Like, hey, I want this thing to be able to generate and run code as part of its behavior, but it needs to be isolated.
And so I think there are going to be a number of agent-specific platform systems that have to be added, which can then be fit into that control loop system and described to the running agents as either tools or other abstractions. And then I think the harder question, and this one I don't have an answer to, and I'd be interested if anyone listening has a really strong theory around it, is actually tracking agent behavior, and what a reconciliation loop looks like when you want to start bounding agent behavior. Certainly, evals are pointing us in one direction, along with human-evaluated, human-in-the-loop style systems that can sample and aggregate signal on certain patterns. There's also having agents watch agents, where a watching agent starts to reason about the state or behavior of another system. There's a lot that has to be done there.
I don't know exactly what that pattern looks like yet. We're certainly playing with ideas; we've penciled out a few things and built a few things ourselves. But I think we're still learning as a community what stochastic reconciliation looks like beyond performing remediative action.
And we haven't got there yet, and I think that's something we're going to have to think about as a community.
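A minimal sketch of the escalation pattern described above, assuming a deterministic reconciler that retries a bounded number of times and only then hands the problem to a model-backed reasoner. Here `propose_fix` is a hypothetical stand-in for that stochastic step, and the crash-looping image and deployment object are simulated rather than real Kubernetes resources.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Deployment:
    desired_replicas: int
    running_replicas: int = 0
    image: str = "app:broken"  # simulated image that crash-loops
    events: list[str] = field(default_factory=list)


def deterministic_reconcile(d: Deployment) -> None:
    """Classic reconciler step: try to make observed state match desired state."""
    if d.image.endswith(":broken"):
        d.events.append("pod crashed; backing off")
        return
    d.running_replicas = d.desired_replicas
    d.events.append("scaled to desired replicas")


def converged(d: Deployment) -> bool:
    return d.running_replicas == d.desired_replicas


def reconcile(d: Deployment,
              propose_fix: Callable[[Deployment], Optional[str]],
              max_attempts: int = 3) -> bool:
    # 1. Deterministic loop first, with a bounded number of retries.
    for _ in range(max_attempts):
        deterministic_reconcile(d)
        if converged(d):
            return True
    # 2. Stuck at the boundary of what the reconciler can do: escalate to a
    #    (hypothetical) model-backed reasoner that reads the events.
    new_image = propose_fix(d)
    if new_image is None:
        d.events.append("escalated to a human")
        return False
    d.image = new_image
    deterministic_reconcile(d)
    return converged(d)


def fake_reasoner(d: Deployment) -> Optional[str]:
    """Stand-in for an LLM call: roll back if the pod keeps crashing."""
    return "app:stable" if "pod crashed; backing off" in d.events else None


dep = Deployment(desired_replicas=3)
print(reconcile(dep, fake_reasoner), dep.events)
```

Keeping the stochastic step behind the deterministic loop, and falling back to a human when it has no proposal, is one way to keep its behavior bounded; how to bound it well is exactly the open question raised here.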
Well, you've already started going in the direction that we usually end the show on with a guest, which is looking forward to what's next. You mentioned some of the challenges that are yet to be addressed in the community. As we wrap up here, what are some of the things you're excited about that are coming within the ecosystem, things you see developing that you think could be transformative? Or maybe the problems you're thinking about when you're lying in bed at night. What's at the top of your mind going into this next season of MCP and agents?
I think the thing I'm most excited about is what this means for knowledge workers, given what Claude has done for developers. When I look at my team and our performance: we are very deliberate about instrumenting our code. We have DevLake deployed, and all of the developers have hooks. We know exactly which agents they're using and how they're using them, and we can correlate the behavior.
We treat it almost like a performance sport. My developers are now performance athletes: they're wired up, and we can see what's driving productivity. And the thing driving the most dramatic productivity from our developers is what I think of as agentic concurrency. They set up a system where they'll have 15 different agents, each with a slightly different configuration recipe and role, performing a set of tasks with access to tools that are highly controlled, so that they can basically run in YOLO mode. It's sort of on the path to that dark factory story, but there's still a human operator spinning plates, with somewhere between five and fifteen agents running concurrently.
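A toy sketch of agentic concurrency, assuming each agent is an async task with its own role and a fixed tool allowlist, and a semaphore caps how many run at once (the human operator's "plates"). `run_agent` is a hypothetical placeholder that just sleeps; a real one would drive a model session and enforce the allowlist at the tool layer.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    role: str
    allowed_tools: frozenset[str]  # the only tools this agent may call


async def run_agent(cfg: AgentConfig, task: str) -> str:
    """Hypothetical stand-in for one agent session; a real one would call a model."""
    await asyncio.sleep(0.01)  # simulate work
    return f"{cfg.role} finished {task!r} using {sorted(cfg.allowed_tools)}"


async def run_concurrently(jobs: list[tuple[AgentConfig, str]],
                           max_parallel: int) -> tuple[list[str], int]:
    """Run all jobs, never more than max_parallel at once; report the peak."""
    sem = asyncio.Semaphore(max_parallel)
    active = 0
    peak = 0

    async def guarded(cfg: AgentConfig, task: str) -> str:
        nonlocal active, peak
        async with sem:
            active += 1
            peak = max(peak, active)
            try:
                return await run_agent(cfg, task)
            finally:
                active -= 1

    results = await asyncio.gather(*(guarded(c, t) for c, t in jobs))
    return list(results), peak


fixer = AgentConfig("fixer", frozenset({"github_read", "github_open_pr"}))
reviewer = AgentConfig("reviewer", frozenset({"github_read"}))
jobs = [(fixer, f"issue #{n}") for n in range(1, 13)] + [(reviewer, "review PRs")]
results, peak = asyncio.run(run_concurrently(jobs, max_parallel=5))
print(len(results), "results, peak concurrency", peak)
print(results[0])
```

The cap is the design choice that matters: it bounds token spend and keeps the number of concurrent agents within what one person can supervise.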
And the productivity is dramatic. Right? Look, it's costing us a lot of money. We're burning a lot of tokens, but it's more than making up for that in terms of productivity.
I track our weekly productivity. This last week, our engineering team's throughput went up 60% in a week, just because the team is getting better at systematic concurrency. And in our ability to deal with community issues, we're finally over the threshold where we're burning down issues faster than they're coming in. Everything is changing. What does that look like for knowledge workers?
And that's the thing I'm most excited about. Because I'll tell you now, there are things that are the same: we will get knowledge workers to the point where they're able to spin plates and imagine themselves orchestrating a lot of things. But there are a lot of things that are different. For them, the desktop just cannot be the aggregation point. Their threshold for pain is a lot lower.
They cannot be trusted to build and run MCP servers. Right? That has to be provided to them. They really need to be served by a platform team. But I think we can give people superpowers.
I think the productivity gains we're seeing on the development side will translate to every other function, if we can just start to learn from what's really working well. I love what Anthropic's done. They really write letters from the future. And if you just sit down and bother to read them, and then think about what this looks like through the lens of other domains, there's a lot to be gained.
That's awesome. Well, I appreciate you taking time today, and also thank you, on behalf of the community, for the great work that you and the team are doing on ToolHive and other things. We look forward to having you back on the show to talk about it in the future. Thanks, Craig.
Hey. Thanks for having me on.
¶ Outro
Alright. That's our show for this week. If you haven't checked out our website, head to practicalai.fm, and be sure to connect with us on LinkedIn, X, or Blue Sky. You'll see us posting insights related to the latest AI developments, and we would love for you to join the conversation. Thanks to our partner, Prediction Guard, for providing operational support for the show.
Check them out at predictionguard.com. Also, thanks to BreakMaster Cylinder for the beats and to you for listening. That's all for now, but you'll hear from us again next week.
