Hello, and welcome to the AI Engineering Podcast, your guide to the fast moving world of building scalable and maintainable AI systems. Unlock the full potential of your AI workloads with a seamless and composable data infrastructure. Bruin is an open source framework that streamlines integration from the command line, allowing you to focus on what matters most, building intelligent systems.
Write Python code for your business logic and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. With native support for ML and AI workloads, Bruin empowers data teams to deliver faster, more reliable, and scalable AI solutions. Harness Bruin's connectors for hundreds of platforms, including popular machine learning frameworks like TensorFlow and PyTorch.
Build end to end AI workflows that integrate seamlessly with your existing tech stack. Join the ranks of forward thinking organizations that are revolutionizing their data engineering with Bruin. Get started today at aiengineeringpodcast.com/bruin. And for dbt cloud customers, enjoy a $1,000 credit to migrate to Bruin Cloud.
Your host is Tobias Maci, and today I'm interviewing Steven Watt about how to adapt your existing infrastructure investments to support your AI workloads and gain AI sovereignty. So Steve, could you start by introducing yourself? Sure. Hi. I'm Steve Watt. I lead the, office of the CTO at Red Hat, which is, our research and emerging technologies functions that are very focused right now on generative AI. And do you remember how you first got started working in this overall ML, AI, and data space?
Yeah, sure. It was actually literally twenty years ago. I was just reflecting on that. My first project was with IBM Classification Module, which was a Bayesian classifier. And I was building a prototype that took a newscast and did speech to text and then did classification
on the sentences in the text, and then would say whether it was what topic it was, and they would pull in content in another widget from the web related to that in real time. So it was a little prototype we were working on, but from there, I sort of went into Hadoop, and so we were sort of building big data analytics, but then the HBase project started to be used to produce,
what we call today predictive AI, you know, sort of, you know, basic classification modules and things like that, classifiers. And before we dig too deep into some of the specifics of the how, I'm wondering if you can give an overview of what you mean by this idea of AI sovereignty. Yeah, sure. This is such a great question because I think it just depends which lens you come at it from.
So I think if you're an existing enterprise, so a company, it's sort of your ability to exert control over your operations, your infrastructure, your data, and then also hit your compliance metrics in a specific way, right? And so typically, like layman's terms, it means being able to keep all your stuff in your particular geography, and that implies hiring people from your geography as well, making sure your data doesn't leave that, and then keeping up with the compliance
regimen from whichever compliance authority you have to comply with. So that's one lens that you could look at it, and Red Hat's got a suite of capabilities and consulting offerings to help our customers do that. But I think there's a broader angle of describing this as well, which is that, and it's the nation state angle. So nation states are there, hopefully, to empower their populace.
So nation states are quite interested in ensuring that their particular geography or whatever they're responsible for doesn't get left behind. And so they are ensuring that their people, so researchers, general citizen populace, startups, general companies, are well equipped to take advantage of this generative AI boom and capabilities.
And that typically, there's a pattern that's going on across the world right now on this, which is basically the government will create an entity, and that entity will be responsible for producing platform infrastructure with GPU capabilities to be consumed by that intended audience. And often, that will also be complemented with some sort of innovation hub that provides discounted compute access to this infrastructure and sometimes startup funding and mentorship to leverage that infrastructure.
And so I actually have been focusing way more on that ladder and trying to understand, you know, I think unless you've been living under a rock, you're aware that geopolitical tensions are rising, And this is also creating demand for self determinism by nation states.
And so having these own capabilities, not just to support what their citizen populace needs or their commercial entities within their country, but also ensuring that they have the ability to determine their own technological future and aren't dependent on areas where they don't have control or agency.
So you brought up some interesting questions in there about the idea of how AI sovereignty manifests, but also whose responsibility it is, where it can be a motivating factor of that nation state where it's something from their governmental entities that are driving that requirement, or it can be due to regulatory pressures on the organization that is trying to ensure that they are compliant with whatever government they are subject to, or it could be related to those same regulatory pressures.
But because you're trying to be able to access a particular market that is subject to those regulations. And I'm wondering how you are seeing the people who are doing the actual work think about their relationship to that overall motivating force and how much that changes the ways that they think about how to approach the overall solution.
It's, you know, it's it's on the nation state angle versus, you know, I'm a customer that has this regulatory compliance that I need to back into. In the same way that every customer is kind of different, I think every nation state is kind of different. And so your question is specifically Red Hat. We're sort of the clearinghouse for all where all the different open source projects will come through to sort of address these varying needs.
How are we reasoning about all of this and what to do where? It's kind of interesting. So if you look in The United States, The United States are wanting to have their own open models, right? And so this is interesting because if you look at a lot of where the open large language models are coming from, they're coming from China. So you'll see examples of like Reflection AI, which if you look at their tagline, it's sort of The USA's answer to DeepSeek, which is an equivalent,
highly relevant Chinese model. And so you see that in The US. You see in The United States, Massachusetts Innovation Hub, and Red Hat just announced a partnership with them where we're helping the startups in the state of Massachusetts leverage the compute infrastructure that's part of their compute cloud. So you're sort of seeing that in The United States, and then you're seeing a desire for states to leverage research IT in universities to create the data infrastructure that
their target audience can use. It's being, in some places, very focused on both classic AI, traditional predictive AI, and generative. In Europe, there is a sort of, I'd say what it seems is, less a focus on, hey, we need our own non Chinese foundation model. But this capability to do, it's much more of a self determinism focus than we see in The States. And there is a lot of conversation around cost, energy consumption.
There is a kind of an interesting angle here as well, and like from a different strokes with different folks. So like if you reason about The States, which wants to, a lot of the sort of high growth startups, they're wanting to leverage the latest and greatest GPUs, which are water cooled. Most of the data centers that we have in the world are air cooled. It's tricky to retrofit an air cooled data center to be a water cooled data center. So we're seeing this explosion
across The United States of these data centers being built. I live in Texas. We have two being built by the Stargate ones, one up in Abilene and one in what we call the Valley in Texas. It's not the Silicon Valley, it's down by the Mexican border by South Padre Island of Spring Break fame. And so we're building new data centers for that. But the state has a lot of land, right? And so if you sort of reason,
you can't retrofit, you have to build new. This is sort of a challenge for Europe, which is, okay, you want sovereignty, you want to build out these large data centers, can you? Do you have the land, and do you have the power to be able to power these new black walls that are very energy hungry? And so I think there could be this different angle around this conversation,
which comes back to your original question, Tobias, like what are people making? I personally think Europe's going to have a mixture of what can I do with what I have? And not just Europe, actually, I would say developing nations. So this is broadly true of the African continent from which I hail. And also,
I'm going to PyTorch India next week to talk about that, and I'll be talking about CPU inference there as well. And so there's going to be a demand for lower power inference, and I think that's something else that's being built in the VLLM project, like VLLM CPU, but you also see you know, IBM has SPIRE, which is a low power inference accelerator. There's a lot of new technologies that are being built for these different types of sovereign markets.
Another aspect that factors into some of that overall question of AI sovereignty and pointing to what you're discussing around the hardware requirements, the energy budgets, is the investment that is going into different model architectures to allow them to be faster and easier to train as well as more efficient on the inference side.
And I'm wondering how you're seeing that factor into some of this overall question of where and how to invest in developing capacity and capability to create new AI models or AI capabilities. One would certainly reason that's the case. And I know there's post transformer architectures that are coming out. My team, which is a mixture of research and emerging technologies, is looking at the space. So like diffusion models,
there's a VLLM Omni project that we're paying quite close attention to. That is a space. It's just not, right now it's very focused on image and video, and so it's like non text generative. And it's still emerging where the actual, if you're not in a company that typically leverages image and video, like how you actually leverage these new architectures for, a bit weird to say, but like your classic generative
application use case, it hasn't been around for that long. But I think we all know that there is at least some major patterns you know, around text editing and simplification and things like that. Know, could these neural architectures be much more energy efficient at this? And I think this is interesting, and I think you're touching on a broader, more interesting arc, which is, you know, simplistically
put, right? There's like kind of what we've done with generative is brute force. You know, let's throw more servers at it and they'll consume more power, but we'll have models with more and more parameters, you know, and like I once saw Dario, who used to head IBM Research, had this slide which basically just showed if the art continued, that it would eventually require a cluster that would consume more power than planet Earth had on it.
So obviously it's not sustainable. And I think the interesting piece is like, if you compare, we did this experiment a long time ago with Watson, IBM Watson, if you remember the competition against Ken Jennings with Jeopardy! Right? And it was a fascinating battle of the minds. But I was talking to a gentleman last night at a roundtable
who'd actually worked on that project, and he had this sort of line. He said, Well, in order for Watson, which was a huge Hadoop powered cluster to compete with Ken Jennings, you know, under the floor, there was a whole data center that it was using to answer this. And Ken was basically the amount of calories he was using to answer the, you know, Ken was just powered by nuts and berries, effectively, as opposed to just these
huge amounts of power in the cluster. So I think that's a next arc, which is get this brute force, very expensive, power hungry AI. Like how can we just, without even improving the reasoning performance, how can we get to that same yield at a fraction of the power? And there's lots of areas of science that's pretty interesting, like neuromorphic computing,
that is closer to how the human brain works and consumes energy. And so I also think that's another area that we haven't really seen much of occur. There are people working on it, but there haven't been many breakthroughs that are popping up onto society's radar on that space right now.
Another interesting corollary to this question of AI sovereignty is the related and, I guess, previously focused on term of data sovereignty, which was a big focus, particularly with the introduction and enforcement of GDPR.
And I'm wondering how you're seeing those two terms interrelate, and what are the areas where you're seeing them be or just kind of the the juxtaposition of those two ideas of data sovereignty and AI sovereignty and how they relate and maybe the cases where AI sovereignty is a more immediate pressure point?
I think it's a little bit to do with so GDPR is like a compliance regimen, a policy that if you have data infrastructure, you need to comply with, which is it's sort of more the stick than the carrot from an opportunity aspect. I think there's an opportunity angle around data sovereignty. So if you drive down Highway 101 in the Bay Area, you see all these billboards of these different AI companies. But effectively, they're GPT wrappers, basically.
A bunch of guardrails and prompt not guardrails, but a bunch of clever prompt engineering around, basically, GPT's capability allows you to either create interactive or autonomic agents that solve a particular set of business capabilities. But in some sense, if OpenAI ever wanted to get into that game, don't really have much of a moat, right? And they understand their business and the industry,
but they're really heavily relying on the intelligence within a GPT model. And so really to create some sustained differentiation for your country or your companies within your country, you want to really start adding existing value to a pre trained model through post training, retrieval augmentation, and that implies you've got your own data.
And I think this is the angle, which this is the carrot, rather than the stick of thou shalt comply and be able to serve up information if someone makes a request on data you have about. I think this is the opportunity. So I haven't thought too deeply about that topic, but that is the one angle I would say.
Yeah. The thing that came to mind is what you were just referring to of the value of having your own data that you can use for powering and hydrating these different capabilities of these generative models and these agentic use cases is that that can be the differentiating factor between just taking a frontier model off the shelf from whatever API provider versus one that you are actually using in house because it has that agentic harness with that contextual corpora available to it.
Yeah, and I think there's this small language model arc around this as well, right? But the question is, how do you get a small language model? And it's usually some large language model that's had a pretty significant haircut, you know, where they've used sparsification or quantization techniques to really shrink the focus of that model down to a particular set of domain, knowledge domains. And so just coming back to the sovereign AI piece, it's critical
that there are entities that are creating these foundation models that are open in some ways. So minimum definition of open source AI from Red Hat is open model weights and then open software stack that it runs on. But we'd certainly love to see open pipeline, open training data as well. And I think because a number of different reasons, sunlight's the best disinfectant, but meritocracy of ideas, the ability for communities to improve the pipeline,
augment the data set. So lots of different benefits. Eventually. But without that large open foundation model that then you can post train to get to whatever you need it to be for whatever it's a sovereign use case or any kind of other open use case, I think we're hampered. And so it is critical. And I think there's this weird,
hey, we want to be open and we want to be sovereign. And then there's this actual kind of dependency on these commercial labs, these proprietary commercial labs that are producing these things.
And digging now into some of the infrastructure required to power these sovereign AI capabilities, obviously, there is a lot of new hardware that is coming onto the market, but it's often difficult to obtain either because of the lack of availability because it's all being bought up by all of the people who are building all these massive data centers or because from a rental perspective, it's already being consumed by competing companies.
What are some of the ways that teams, particularly if they have a physical data center presence, but even people who are just investing in operating on the cloud, how can they capitalize on their existing operational capacity to be able to actually manage the deployment and execution and, you know, care and feeding of these AI systems that can be very nondeterministic
and complex to keep running efficiently, as well as the fact that you need to increase the potential surface area of ways things can go wrong because of the fact of their nondeterminism. Yeah. So I think I heard two parts to that question, like what can they use and then how can they operate it, if I understand correctly. And so the what can they use, right? And so so just a quick segue on this, right?
So why go self managed is, I think, the first question, which is, look, if Martin Casado from A16z, Andreessen Horowitz, has this great article about essentially software as a service or infrastructure as a service. But basically, he reasons that in today's day and age, you would be insane not to start
building on the public cloud, right? And then he's like, but as soon as you cross some sort of threshold of scale, you'd be insane to stay on the public cloud because basically it'd eat up all your margins. And he uses examples of Uber and Lyft and whatnot to sort of demonstrate that. And so there's this sort of to the cloud repatriation
arc, you know? And it's really about a software as a service mentality, whether it's sort of tokens as a service or software as a service, there's still the same arc. And then, and I think that's a strong argument around self managed, which is you want agency around managing your cost and your optionality and your operations and sovereign compliance, etcetera. And so we have when you look at that, that's sort of what we baked in. So assuming
self managed, you now have this agency to run things where most optimal to run them. And to your point, right, like today, the story's been GPUs. GPUs are multipurpose things that are great for crypto mining, that are great for video games, that are also great for pre training and inference, right? But because they're so general purpose, they're not actually power optimized for any one of those use cases.
And so they're very power hungry. You have to have the power to be able to do that from the most fundamental infrastructure part of your operations. Like, can you put enough power to that floor tile for that rack? And so we are investing in creating options for people. And that's kind of what Red Hat does, right? If you think about all of our platforms, whether it's enterprise Linux or enterprise Kubernetes, or we
basically try to give you choice. And so today, Insight, we offer that in the VLLM project through optionality of whether you want to run on an Intel GPU, an AMD GPU, or an NVIDIA GPU. We use a variety of different technologies to do that with the GPU kernel programming, whether it's Triton kernels or now moving to Helion kernels.
We're investing in PyTorch Core through Open Ridge to basically enable device drivers for whatever you're doing in PyTorch, whatever accelerator you're using to be out a tree. So basically, we're trying to reduce the amount of friction inside PyTorch itself for vendors to come in and add new accelerators
that provide a wide variety of functionality. So IBM Spire is an example of one, but you may be familiar with Cerebras and Grok. Grok, I think NVIDIA just entered a unique kind of partnership with Grok, but CPU inference is another big part of this, so ensuring that you've got all of that. And then the question as you're coming up, as you sort of move up the stack is, great, you're actually able to have an inference server capability that can run on your hardware, now you can serve these models.
This is where there's another sort of interesting arc around operations in the past and operations in the future. So Kubernetes does as well. However, I will say that the MLOps community historically is a scale up community, where Kubernetes at its heart is a scale out community. What I mean by this is if you're using a Jupyter Notebook, you're interacting with a predictive model, that predictive model usually runs just fine on one server, even if it's a beefy server, and
so it's scale up in that architecture. And if you look at the historical PyTorch tool chain, it is sort of scale up in mind. And it's also like that whole foundation and that community hasn't had a historic focus in operationalizing ginormous models. So now this is kind of this new thing. Kubernetes is really good at this. We have LLMD,
which is basically scale out, highly efficient LLM model serving, where you can disaggregate the inference across multiple servers. You can have different components live on different servers, some that are compute bound, some that are memory bound. You can optimize all of that. So that runs great in Kubernetes.
I have a Kubernetes as a decade old project, and I'm not saying that like it's old and irrelevant. It's like it's had ten years of features added to it, and it is now a fairly sizable thing to come and learn. If you come from this MLOps world of like, Hey, I can just deploy it on the server as long as I make sure the server runs well, and I have my operational tool chain on how to update the models on it and run my Jupyter Notebook on it,
You might look at Kubernetes as an aircraft carrier you don't want to learn how to operate. And so there is this sort of dissonance between what's going on in the PyTorch Foundation ecosystem and what's going on in the CNCF, and I think we have this work to do to create a clear path from the PyTorch tool chain that's intuitive into a very well established, mature operational runtime.
And beyond just the core requirements of do I have the infrastructure, can I build, deploy, and run these workloads, there's also the layering on top of that that is the control plane and observability layer of being able to understand
is the thing doing what I said it should do? And so five years ago, there was a significant growth in the investment in MLOps tooling that was driven by the previous decade's worth of investment in data science and data engineering and the laddering of those two ecosystems. And over the past two years, there has been a subsequent growth in the investment of LLM ops tooling,
some of which is just an extension of the MLOps tooling that was built for all of the deep learning work. And I'm just wondering if you can talk to some of the ways that you're seeing teams invest in those higher order abstractions and how much of that is actually necessary when you strip it down to the fundamental principles of what the workloads are and how you need to manage them.
Yeah, I think it's a great point and it is super different mainly just because of it's non deterministic, and so it's inherently unpredictable. And you can do a lot of things to make it more predictable, but it is inherently unpredictable. And so I would say I think the most interesting thing on the operations side that we're doing to control the environment in flight is we started this project called Semantic Router. I think we announced it around April or June.
But essentially, this is model inference. And so with model inference, there's a whole lot of just different operational concerns that you have to handle, like you mentioned around security and guardrails. So if the model tells the user to do something harmful, however you might define the word harmful,
how do you know it happened? So the first part is observability and instrumenting both the inference request and the inference response and being able to log that and track that and analyze what it's saying and be able to articulate policy to identify what it's saying and when that might be problematic. So we have tooling that we're building on that inside the semantic router.
And it's as pluggable, right, because it's kind of arbitrary. Policy itself is arbitrary. You have a policy engine, but what policy people want to apply is usually a pluggable component that they get to articulate. That's getting put in there, and the idea is that basically all your organization's inference requests can come through this router
that you can then basically put these you can sort of think over simplistically, one dimension of it could be thought of as sort of like a LLM equivalent of a firewall, like what's allowed in, what's not allowed in. But then the other thing is for what's allowed in, how can you ensure compliance
and leverage policy to be able to do that? And then there's other aspects of this, which is around cost management that we've historically done, which is, okay, Sally's team gets to use the new Blackwells. Dave's team has to go use the A100s. Both running exactly the same model, doing the same use case, and maybe Sally's prototype typing in Dave's teams in production, like being able to do inference routing for the right thing. There's a much more interesting arc around that.
That project's grown up now and has become the semantic router project inside VLLM, so it graduated from Red Hat into the VLLM project. But there's an aspect of composable intelligence,
so the ability to have a physics model, a history model, a science model, sorry, a sports model, and then it basically can analyze the request coming in and route the request to the right job. That's an interesting It's just using a Bayesian classifier. It's using BERT to do that. And so that allows you to actually compose the same capabilities that you might get from a frontier model monolith
to exactly what you need, right? And you can even fine tune each of these models or post train them in any way that you want to get this thing to behave. So you can sort of constitute a model that way using it. So it's a very capable component that touches a lot of different operational concerns. Another one is confidential inference.
So this is a security problem, which is, the way I describe it is to buy, say, your Coca Cola, and your trade secret is your Coca Cola recipe. But now you have this business opportunity, but you have to go make Coca Cola in someone else's factory. How do you make sure nobody looks at your recipe? And this is sort of the problem with model weights. So if you had a sovereign
opportunity, I think this is the carrot for a lot of these SaaS providers or not even SaaS providers, people whose model weights are their trade secret and their key investment. If they run them in a third party environment, how can you guarantee that
nobody will inspect those model weights and therefore steal their trade secret? And so that's another area that we're investing in related to this. But it's sort of still emerging, and so we're sort of looking at the different problems, how them, put them together. But we are putting these into our standard Kubernetes deployments. And so the traditional platform engineering or SRE or operations
approaches that you've had for Kubernetes would just work for all of these. So BLM semantic browser uses Envoy runs in Kubernetes. Business as usual if you're used to that paradigm. So continuing on that security thread, there's also the interesting new problem of agentic identity and how much it mirrors human identity, how much it needs to mirror machine identity, how to figure out that overlap, and the strain that it puts on existing
authorization and policy framework. So continuing on with the Kubernetes and cloud native conversation,
there is the OPA framework. I also just saw that the Cedar language from AWS just got added as an incubating project for the CNCF, and I'm wondering how the operations teams who are responsible for the infrastructure for all of the systems that these agents are interacting with are dealing with this increased complexity and the open questions that still exist around how to actually manage those workloads and the permissioning and access control and audit trails that need to go along with it.
So this is a project that we're we're working in the agency project. So that's A G T N C Y. So it was announced by the Linux Foundation, but this is a place where we're trying to track or come up with a consensus solution. I think it's worth mentioning that the place where there's the most churn right now in the technology ecosystems around the different projects is in the AgenTex space. There's what I would describe lower down the stack as more choice as opposed to churn. But in
the agentic space, there's so many new projects being launched all the different times. So we have a team working on agentic identity. You're spot on around what the issues are. It's not just only what you describe. It's also what privileges does the agent have versus the person that started the agent or created the agent, and then how long does it have them for? And so there's
a number of different open source projects. It's not super clear yet which one's going to emerge the winner. But this whole space, it's a bit of a tangent, but it's related, which is if you go back to when web services was first being created, and so you had web services description, which would be like the equivalent of an agent card, so a WSDL file. You had an agent registry, which was UDDI,
basically a registry of different web services. All of these protocols are just sort of being reimagined. There's a slight feeling of those that don't know history are doomed to repeat it, like happening inside the agentic space right now. And ultimately, these are all just, yeah, great. They're powered by an LLM that's
autonomous or semi autonomous that can just do work that's handed to it. That part is new, but the fact that it's got a service interface isn't new and how to deal with services and service identity. And that has been going on for twenty five years at least, and longer if you factor Qorba into that. And so I feel like we're not learning the lessons from that and that what I'm seeing is a bit of a standardization explosion.
And we saw this with that service stack with the SOAP protocol, Simple Object Access Protocol. And there was this gnarly
stack of different standards, like encryption standards and identity standards. And after a while, everyone was like, this is crazy. I'm throwing that out and I'm just going with the REST protocol, which was highly simplistic way to deal with services. I feel with the GenTeX, REST will come as well. I think all of these are important conversations to have. But what they're ultimately driving us towards hopefully is consistency around simplicity.
And continuing on with the thread of the cloud native stack with Kubernetes as that foundation, what is your view as far as how that has left us in terms of the ability to absorb these new requirements, new complexities, and what are the areas where it is starting to show the seams and, where we need to actually invest net new effort and ideas?
I think there's a couple of I think this is more psychological than technical in the sense of, like, I don't think that Kubernetes has any fundamental like, my background is in large scale distributed systems. A lot of those, a lot of time spent dealing with large scale distributed file systems and storage, so SAFe, cluster, HDFS.
And if Kubernetes can run those things just fine, like large scale persistent things where if you lose, if it breaks, like you lose people's data, I'm fairly confident it can handle vector databases. And so I don't think there's technological limitations, there's usability limitations, and I think some of this is psychological, like, can we make Kubernetes appealing to people that did grow up with Kubernetes and want to just use it for a couple of very simplistic things?
There's a great sort of meme that went around with Kubernetes around 2017, and it said, Just deployed my WordPress blog on Kubernetes, and it showed this tiny package, six inch by six inch, in the center of the bed of this monster giant truck. And it was sort of saying it's
this very complex thing just to do this very simple task of this person just wanting to deploy their WordPress blog. That's starting to resurface a little bit around And I can just say I work a lot in the PyTorch Foundation ecosystem,
and I think that's how they feel looking at Kubernetes. So I think there's a sort of usability piece that we need to go tackle with, and that's where the seams are showing, and that will become a sort of a psychological deterrent to the humans that are making the generative AI tool chain and the PyTorch foundation.
Digging more into that question of some of these specialized data stores that are gaining prominence as we invest in things like retrieval augmented generation, semantic search, and just the ability to do the
mapping of raw data into the semantic space that the models are operating in. Obviously, a lot of the core primitives are the same regardless of what the engine is. You need to have some way of persisting storage, some way of backing things up. But a lot of these database engines that are specialized on vectors are net new or maybe reimagined for the cloud native era. We also have a lot of existing engines that are bolting on vector capabilities, and I'm wondering how much you're seeing of the
noise around, oh, you need to deploy a specialized vector database because it's so much better at dealing with vectors versus I just want to add it to my Postgres or add it to my MySQL. It's just an array of numbers. There's really nothing that special about it. And just how much of the fundamentals of the problem are actually net new, and how much of it is just a new coat of paint on the same problems we've been dealing with for decades?
Great question. So first thing, I am seeing that, especially with agents, there's a new technique that's coming out, appears fairly new in the generative story, which is basically be able to do discovery across an organization's entire data estate, data warehouse, vector databases, like everything, inspect that, hoover that all up into a giant semantic layer, so big ontology, and then
use that as your foundation for it to be able to rapidly create agents using reinforcement learning. So what is kind of new in Kubernetes and perhaps where a seam is showing or where work needs to be done is if you look at how distributed systems have evolved since 2000s, which is the nature of the software platform or the orchestrator had certain advantages based on hardware limitations. And so when we first were sort of working with Hadoop, we had very little memory.
Compute was kind of limited. Network was really slow, but we had quite a lot of storage.
And so it was all about moving the work to the data. And then Spark responded because there'd been a hardware shift and now servers came out with more memory, so you could stick stuff in more memory, and so you went from plenty of hardware to now plenty of memory, but then you still kind of wanted to keep stuff on disk or in memory because you were network throughput and bandwidth constrained, and I think we're now in this era of highly performant networks where we can sort of attach,
store sort of the equivalent VRAM, etcetera, off server, and the interconnects are so fast that there isn't sort of a noticeable difference really going to an off server memory array. Kubernetes, I think, can still be further optimized to take advantage of this. And I think that needs to happen in conjunction with being able to sort of leverage all of these things in a way that can be consistently used.
And so what's emerging? Well, new data platforms that leverage this network abundance, that's kind of new from a computer science standpoint and the platforms, and I think Kubernetes has got to evolve to that.
The, I think, biggest strain that I am at least assuming, but seeing a little bit in terms of this overall space of the progression to AI as a more broadly deployed workload across a broader range of teams and business cases is the human investment necessary to make it a reality where the technology is competent, but there are still a lot of just foundational conceptual challenges that teams have to deal with as far as how to think about what it can do,
how to enable it, what they need to know about how to actually operate these things. And I'm wondering how you're seeing that investment being made in this context of AI sovereignty and just the workforce development and the workforce training that's necessary to be able to actually capitalize on this new capability that is being pushed across the board and everybody feels like they need to adapt to it or get left in the dust? I
think it's to do with levels of complexity at the stack. So the complexity of using a SaaS service is low, so I think people in a sovereign context or not, if there's a sovereign SaaS model, right, like offered by some service provider in their region for their use, I think they'll be able to parlay what they know building agents around ChatGPT or Claude very quickly around that because it's just an API.
As you go further down the stack, that's when there's a really sharp drop off from ability to go from SaaS to self managed. So typically, where I see the next jumping off point is LamaCPP or Lama, which is a CPU based inference engine that you can run on your MacBook, M series, or your AMD powered laptop, or CPU.
And they start then getting some familiarity with like, oh, there's this thing called Hugging Face that I have all these models that I can go pull from, and then I start to get some experience using this model versus that model. And then there's sort of this path from that to enterprise, okay? Like, maybe now I know how to do that. That's how I learned. I'm now at my company. I want to use my sovereign AI model registry.
So these are models that have been trained off of my sovereign data or my sovereign data hub, and I want to start leveraging it. So now you start using something like VLLM. All of these, the complexity starts increasing, and I really there's a sharp drop off between OLAMA. There's a sharp drop off between SAS usage to OLAMA, like a small amount, and then there's an even bigger drop off to like, hey, I now know how to do enterprise inference, like with VLLM.
And so I think there's just a sort of learning curve that's going to take time. There also has to be the market incentives for them to learn how to do this. Self managed, I think, sovereign AI is creating that need to have self determinism either as a company or a nation or a state. I think that's the carrot. So I think it will happen.
And then there's just a sort of learning arc around that. And I think if you just step out of sovereign and you look at the commercial world, it's moving fast, but I'd say maybe six months ago, we were still very much like in the case of most companies had multiple departments running their own Gen AI pilots that weren't really communicating in a standard way. And so
one might use this stack with that model and another one might use that stack with this model. And they've all first, step one, got to get to a successful pilot, like, did it deliver business value for me? Then the next thing is IT shows up and says, Okay, let's start getting our arms around this cost. It's expensive.
And so then there's this platform consolidation story, which means you're going to have to pick a stack or at least a subset of the stacks so that you can drive total cost of ownership by leveraging consolidation. And I still think a lot of the enterprises in, hey, we're running lots of different pilots using lots of different stacks, and our stacks are in those mixes, but there isn't this grand consolidation strategy yet.
So extending a little bit the conceptual framework of AI sovereignty, I think it's also interesting to frame that in the idea of platform risk and the ability for any organization to be able to actually own their own destiny around the capabilities and adoption of these AI models, particularly when you're talking about some of these frontier providers where you only have access to an API and maybe you have a stable model identifier,
but the reality is that that model is actually getting fine tuned and tweaked under the covers, or they might be changing some of the system prompt. And so you don't have as much control over the end to end realities.
And I'm wondering how you're seeing teams think about that aspect of the equation as well of how much control they're willing to cede to the underlying provider and how much they need to be able to actually own everything from the hardware all the way up to delivery, particularly around some of those questions.
And additionally, from the kind of hosted provider aspect, being able to access things like the log probabilities to get a better understanding of the confidence of the model in whatever output it's providing. Okay, yeah. At least what we're seeing across our customers is well, and not just our customers. So when I talk to our partners and their customers as well, the
number one focus and I mean, this is pretty logical when I think about it is their number one focus is they don't have much loyalty to a particular model provider. It's they have a loyalty to getting the functionality they want. So they don't have too much of a preference, whether it's this model from this provider, whether it's a model available on a Hugging Face or whether it's SaaS. They first want to get to a successful pilot, and then
their opinions get stronger after that. I think it's sort of like a little bit what I started with that Martin Casado sort of perspective, right? You, they want to get to success, drive costs down as fast as possible, and then start to understand the cost benefit around that solution. So great, they've got the benefit. Now what are the long term costs, both financial, operational, compliance, and the
agency that they have. So for example, something may work really well for them, it might be a small part of their business they're optimizing. The moment you go into a core part of your business, you create a business risk if you're dependent on a third party that you have no agency over. And so if they change the way the model works, because frankly, OpenAI's models serve the entire world as their customer base,
your small company might not be a big enough, I guess, segment to adjust how they make their business decisions. And so at that point, I think along the maturity curve of the company, they'll start to look at other things and self manage becomes quite appealing. There can be a bit of a rude awakening, though, where
what you're really hitting is a SAS interface, not a model. What you don't really know is what's happening behind that SAS interface. So are you really hitting one model, or are you hitting 27 with a semantic router in between them with incredibly detailed system prompts that are being added to your prompt and a whole host of guardrails that are being implemented? Presumably,
all of this is happening, right? And so going self managed, I think, is also a journey there where I think they're just trying to find what works for them, which is they're either choosing a larger monolith or a smaller
set of smaller language models that they're post training to get to what they need. But this is an interesting thing around the need for post training abilities, which is in Red Hat, one of the things we've done because we not only build platforms that allow people to use AI, but we use those same platforms ourselves, and we've had to create a post training team that basically it's a sort of a train the trainer model, but we have different teams that are incorporating
open models inside Red Hat products or services. And the issue with these is the same that our customers face, is these models also will kind of get them 95% of the way there, but there's this last mile problem of where they actually need to align the model to actually get the results or performance that they need to get it to behave more predictably. And so we have this team that works internally
throughout the company that works directly with the teams that are responsible for delivering the model and its performance, and they train them. So it's sort of like a teach a man to fish kind of thing. The idea is that once they've worked with this team, if that same issue pops up again, they'll know how to fix it themselves, and then typically they'll only reach back to our broader post training team if they have a different kind of a problem.
And in your experience of working with these customers who are onboarding these AI capabilities, building new systems and services to take advantage of these various models and agentic patterns? What are some of the most interesting or innovative or unexpected ways that you've seen them approach this overall question of AI sovereignty and owning the overall capability for their organization? I don't know if it's a The example I'm gonna give you isn't a customer one, it's a nation state example.
But there is this prevalence, which is unusual, right? We've had this in analytics, small history of data marketplaces and data catalogs, data set catalogs. But in the sovereign space, there is now a repeated pattern that we're seeing across multiple countries where it's not only the infrastructure that they're building, but they're building a data hub. And that data hub is being built, and the data hub is being protected, so it can't get crawled.
So there's some aspect of protecting their assets that are their own sovereign assets and it's seen as a sovereign differentiator. And then the idea is that you use the infrastructure to be able to create models from that data hub. And that being seen as a pattern is something that's very unique to Sovereign. I don't see that in the commercial sector very much. And it's kind of this interesting mix of researchers and commercial interests.
And so often, there's an example of this happening with the state of Massachusetts called ARPA H, which is a set of data sets that are focused on producing models and improvements in the healthcare industry. So I think if I had to think of one thing that's kind of different, new, and unique, and it's kind of innovative and will be useful and produce a whole set of new reasoning models from that, whether they're predictive or generative or post transformer or otherwise, I think that's it.
And as you have been working in this space and particularly trying to keep pace with the rapid changes over the past couple of years, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process of helping these organizations maintain their own control over the AI capabilities that they're investing in? I think AI, generative AI especially, I didn't find predictive natural language processing wasn't
that hard to learn. There's math involved in machine learning for sure. I tend to not jump into the math. I'm like, This is why it works great. Trust you. What I'm more focused on is what tool is the right tool for the right job and why, and what's the rationale between why
this is the right technique versus that's the right technique. With generative, though, it feels like we're on a treadmill where every week someone's increasing the speed, and we're all just running as fast as we can trying not to get shut off the back of the treadmill. And the amount of time both myself and my team have to spend on continually learning, hopefully
all good engineers spend time continually investing in themselves in learning. I'm just saying the sheer amount of the time in each day that we have to spend learning, and it's a different kind of learning. We would go to conferences and listen to talks and go to meetups and read blog posts and new open source projects come out, we would go and stand them up. That's the old way. The new way is what three papers came out on archive this week? Go and read.
They're not quite peer reviewed yet, but effectively peer reviewed academic papers to understand what is happening now and why that invalidates our last three months of work. And so there's this sort of frenetic pace around generative AI that I find everybody is struggling with. It's tough to keep up with it. And to even understand it,
the hardware is changing. Like I talked about how the interconnects are fundamentally changing. It's not just that, it's memory. Like we're now in this shared memory architecture where there's a shared memory base between the CPU and the accelerator, and that works differently. So at every level of the stack, there's these incredibly rapid advancements. I think that is what our customers, and I think if the vendors will be honest with themselves, is everybody's struggling with.
As you continue to try and keep pace with that treadmill and help your customers try to understand how to address this rapidly moving ecosystem. What are some of the predictions that you have for the ways that the operational substrate is going to evolve to accommodate these various AI workloads and the need for rapid evolution and rapid experimentation? Yeah. I think Red Hat's gonna do what we always do. Like, I think people tend to think of us as the open source company.
Great. That that is true. And we're we're an open source platform company. But the value is not just giving you hardened code. The value is you being able to build on our platform and that platform staying stable and us creating a pipeline of all the different innovation that's happening and steadily introducing it into the platform so you can leverage it without disrupting your business or creating instability in your platform. That might sound easy,
but it's not. Because you know, we've have to go and spend a ton of time understanding the open source ecosystem, understanding which project's going to be around for three months versus which one's going to be around for three years. Because the last thing we want to do is introduce new capabilities into the platform that our customers start building on, and then that project gets abandoned upstream.
And now there's no new ideas or improvements ever coming into this. This customer has got this critical business application that's based on this technology that's now been abandoned, basically. So there's this role that we play of like the funnel, right, of taking the bleeding edge and then funneling it into stability for our customers. That's what we're going to keep doing.
Are there any other aspects of this question of AI sovereignty and the operational realities of how organizations can adapt their existing investments to achieve that goal that we didn't discuss yet that you'd like to cover before we close out the show?
I think there's one thing, which is just a topic we haven't brought up that I've been spending some time thinking about, which is, you know, with AI sovereignty, it's this sort of interconnection between the startup ecosystem, the nation state, and researchers, universities,
and then the operator or service provider that's delivering the infrastructure. I think there's this I don't just to touch on my last comment, I don't just think commercial entities are struggling to keep up. I think universities are struggling to keep up. I think researchers are struggling to keep up. I think this is sort of like an interesting
challenge around the sovereignty pieces. Another way of saying that is like, hey, the nation state doesn't want to have an unreasonable dependence on an entity that's not aligned with their interests. Okay, well, for a lot of these, these are I'm not saying the entity doesn't, but for now, a lot of them have a dependency on a commercial
frontier model lab that is moving at warp speed. Now, I'm not saying academics aren't relevant because these commercial labs are full of researchers and PhDs that are driving these advancements. There is this sort of interesting piece of, can the universities keep up with the research coming out of the commercial labs because there's a strong tie in between the university, the infrastructure, and the startups. And that is something that's just kind of like a new societal question.
It's starting to ask some questions around our university based research model. Some of them are, like UC Berkeley, had multiple hits and have clearly developed a lab model that works in this space. But it will be interesting to see how that plays out, because I think the sovereign AI infrastructure investments are sort of predicated on, well, we'll have our own infrastructure, we'll have our own data hub, we'll have our own researchers, we'll be able to compete. And that is the question of,
you? I think that's going to be interesting to see. Absolutely. And especially the impact that these generative capabilities are going to have on research as well and just the ways that it's going to augment the researchers who are pushing those boundaries. Yeah, absolutely. The model doing the research, you know, that it's we're not too far away from that, you know.
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling technology or human training that's available for AI systems today. Access. It is hard to get access to GPUs.
I think that's the if Red Hat's got this wonderful slogan, is open unlocks the world's potential, but to actually act on open, you need access to the tools to make stuff. And generative AI and where the space is going, so post transformer models, etcetera, world models, all of this stuff, is where technology is headed for society. The more people that can participate in that, the better.
And I think that starts with just to tie it, bring it back to the sovereign piece. We need more nation states giving access to society so that they can participate in this innovation, and so I think that's it. I have talks I do at meetups about how hard GPUs are to access and how expensive they are. I think that's the one thing of the access.
All right. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing on helping organizations own more of the overall process of enabling their organization with these AI capabilities, as well as some of the considerations of the entities that exist beyond the bounds of the organization and just the relationships that are brought to bear on this overall question of who can do what with AI,
when and where. It's definitely a very interesting question and one that we're all going to continue struggling with. So I appreciate the time and energy you're putting into that, I hope you enjoy the rest of your day. Yeah. Thanks, Tobias. This is a lot of fun.
Thank you for listening. Don't forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management, and podcast dot init covers the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts at a I engineering podcast dot com with your story.
