The mechanics of data center flexibility

Aug 28, 2025 · 36 min

Summary

Shayle Kann and Varun Sivaram of Emerald AI delve into the complexities of integrating AI data centers with the electricity grid. They explain how the often-misunderstood load profiles of training and inference workloads create planning challenges, and how leveraging inherent flexibility—through temporal pausing, slowing, or spatial shifting of compute—can transform these energy-intensive facilities into valuable grid resources. The discussion covers the need for new service-level agreements and the potential for demand response to evolve into daily load shifting, ultimately enabling a more reliable and clean energy future.

Episode description

Adding flexibility to data center loads could ease strain on the grid and reduce the need for costly new generation. And, according to one study, shaving off just a few megawatts during peak hours could also unlock unused capacity, as many as 98 gigawatts in the U.S., if those facilities reduced load by just 0.5% each year.

The problem: data centers promise near-perfect reliability, often “five nines” (99.999% uptime) in service-level agreements with customers. That leaves little room to adjust something as critical to reliability as power. 

But times are changing. The data center market is reckoning with the constraints of the power grid and growing concern about pushing up electricity prices to pay for new generation. In July, the Electric Power Research Institute’s DCFlex demonstration at an Oracle data center in Phoenix, Arizona, reduced load 25% during peak demand. And this month Google expanded its demand response through two new agreements with Michigan Power and the Tennessee Valley Authority.

So what are the actual mechanics of data center flexibility?

In this episode, Shayle talks to Varun Sivaram, founder and CEO of Emerald AI. The startup’s data center flexibility platform powered EPRI’s DCFlex demonstration. Shayle and Varun cover topics like:

  • What people often misunderstand about how much of their nameplate capacity data centers actually use 

  • The distinct load profiles of training, inference, and other workloads

  • How data centers can pause, slow, or shift workloads in time or space to reduce demand

  • What it will take for flexibility solutions like Emerald AI to earn operator trust 

  • How much flexibility data centers can realistically achieve 

  • Varun’s long-term vision for evolving from occasional demand response to weekly or even daily load shifting

Resources:

  • Latitude Media: Nvidia and Oracle tapped this startup to flex a Phoenix data center  

  • Latitude Media: Google expands demand response to target machine learning workloads   

  • Catalyst: The potential for flexible data centers  

Credits: Hosted by Shayle Kann. Produced and edited by Daniel Woldorff. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor.

Catalyst is brought to you by Anza, a solar and energy storage development and procurement platform helping clients make optimal decisions, saving significant time, money, and reducing risk. Subscribers instantly access pricing, product, and supplier data. Learn more at go.anzarenewables.com/latitude.

Catalyst is supported by EnergyHub. EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform by visiting energyhub.com.


Catalyst is brought to you by Antenna Group, the public relations and strategic marketing agency of choice for climate and energy leaders. If you're a startup, investor, or global corporation that's looking to tell your climate story, demonstrate your impact, or accelerate your growth, Antenna Group's team of industry insiders is ready to help. Learn more at antennagroup.com.

Transcript

Intro / Opening

Latitude Media, covering the new frontiers of the energy transition. I'm Shayle Kann, and this is Catalyst. You might slow down a job. You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. You might also go all the way down to the underlying silicon and you might change what we call the clock frequency of the chip to change the rate at which computations happen. Coming up, what does it actually look like to make a data center flexible?

Catalyst is supported by Fishtank PR, an award-winning PR firm focused on climate and energy tech, renewables, and sustainability. Fishtank is known for generating prominent and effective media coverage for the brands they work with. If you want a PR partner that's thoughtful, shoots straight, and gets results, you'll like Fishtank PR. To learn more about Fishtank's approach, visit fishtankpr.com. That's f-i-s-c-h fishtankpr.com.

When utilities need flexible capacity they can count on, they turn to EnergyHub. EnergyHub works with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts of flexibility, built for the moments when utilities can't afford uncertainty.

EnergyHub builds and operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs, batteries, thermostats, and more through a single platform built for utility scale: predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com.

AI Data Centers and Grid Strain

I'm Shayle Kann. I invest in early stage companies at Energy Impact Partners. Welcome. So the conventional wisdom about data centers is that from an electricity perspective, they look like totally flat load, i.e. operating 24-7, 365, and without much willingness to change that. But as power increasingly becomes the choke point for more data center infrastructure development, the world is waking up to a bunch of ways in which that's not entirely or necessarily true.

First, you can put generation or batteries on site to shave peak load. That's the physical solution. But there are also digital solutions, it appears. First, because data centers aren't actually operating at nameplate peak most of the time anyway. But also second, because you might actually be able to make the workloads themselves a little bit flexible.

Google actually made a big announcement about doing this at their data centers just a few weeks ago. They've announced that they've partnered with two utilities, Michigan Power and TVA, to introduce demand response via workload flexibility in their data centers. But our guest today is my old friend Varun Sivaram, who's also working on this problem. His company, Emerald AI, is building a software platform that is intended to make data centers flexible.

As with many things in electricity, the devil is in the details, and in this case, the details involve: what do we mean by flexibility, and how do we actually get it? What are the SLAs between the data center operators and their customers? How are the grid operators going to think about it? There are a lot of nuances to this. So let's get into it. Here's Varun.

Varun, welcome back.

Shayle, thanks for having me back.

All right, new topic for us to talk about here, which is what you are spending your time on these days: data center flexibility. I wanna start by having you walk me through what you understand to be the way that compute translates to electricity load in AI data centers today. I think this is something that is actually commonly misunderstood. So what does the electricity load profile of an actual AI data center look like today?

Yeah, great question. First of all, from a planning perspective, the grid has absolutely no idea what your load profile is going to look like, and that's the way that they study you as a new AI data center load. But let's just back up here. AI data centers nowadays, or as Nvidia CEO Jensen Huang calls them, AI factories, are fundamentally in the business of transforming electricity into what we call tokens, which are the fundamental input or output unit of AI.

And they're doing it increasingly well. So a data center will try very efficiently to take electricity and turn it into compute output. And you'll have losses along the way, because of the cooling load, for example, and all the other non-computational loads in a data center. Historically, a data center might use 33% of its power for non-IT (information technology) uses, and the remaining 66 or 67% goes into actual computation.

Nowadays, with the increasingly customized design of these AI factories, and some of the amazing efforts of the hyperscalers such as Google, these overhead numbers are falling, and therefore you can get 80 or 90% of the power being turned directly into AI computation. What does that look like to the grid? Well, if you're running a large language model training run, you might see the power use of that AI data center spike as the training run commences and have brief dips as the training run undergoes what are called synchronized checkpoints.

So there's this kind of very difficult-to-predict transient behavior that's wildly swinging. And then after the training run concludes, hours or days later, you might have a large reduction in demand. If you have an AI data center that's fully committed to doing what's called inference, or using these AI models, you might see smoother, but still relatively unpredictable, usage patterns from the grid's perspective.

So that's one of the reasons that AI data centers appear so scary to grids today. You can't really plan for what you expect to see.
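The overhead shares mentioned above (roughly a third of power going to non-IT uses historically, versus 80 or 90% reaching computation in newer AI factories) can be turned into a quick calculation. The sketch below is only illustrative: the 400 MW facility size and the function names are assumptions, and power usage effectiveness (PUE) is computed with its usual definition of total facility power divided by IT power.

```python
def it_power_mw(facility_mw: float, it_share: float) -> float:
    """Power that reaches the computers, given the share of facility power used for IT."""
    if not 0 < it_share <= 1:
        raise ValueError("it_share must be in (0, 1]")
    return facility_mw * it_share


def pue(it_share: float) -> float:
    """Power usage effectiveness: total facility power divided by IT power."""
    if not 0 < it_share <= 1:
        raise ValueError("it_share must be in (0, 1]")
    return 1.0 / it_share


FACILITY_MW = 400.0  # hypothetical facility size, chosen only for illustration

for share in (0.67, 0.80, 0.90):
    compute_mw = it_power_mw(FACILITY_MW, share)
    print(f"IT share {share:.0%}: {compute_mw:.0f} MW to compute, PUE about {pue(share):.2f}")
```

The higher the IT share, the more of any flexibility has to come from the compute itself rather than from cooling and other overhead.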

Grid Planning Versus Actual Operations

And these loads look fundamentally different from anything they've ever seen. They are extraordinarily energy dense.

Yeah, and you know, it's not dissimilar from kind of everything else in electricity, in that the result is you have to plan for the peak, right? So the data center says, I need, let's invent a number, 400 megawatts of capacity. I think from a grid operator perspective, you basically have to plan for 8,760 hours of 400 megawatts. That is essentially what you are planning for, right?

You're actually planning for even worse than that, Shayle. You're right: over 8,760 hours, which is one single year, you want to plan for a worst-case scenario where the data center, let's say as you suggested it's a 400 megawatt data center, shows up with its full 400 megawatts at the absolute worst time of the year. But you're actually planning for even more years than that. When you're running this interconnection study to determine whether this data center can connect to my system, you're asking about the next seven or ten years in an absolute worst-case scenario, so not just 8,760 hours, but 8,760 times ten, or 87,600 hours.

When a transmission line goes down somewhere and it's a record hot day and air conditioning demand is super high, on that particular day, will my 400 megawatt data center request its full 400 megawatts and overload a circuit? And if so, you can't connect it today; you have to upgrade the system before you do. So that's how data centers are studied today.
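The worst-case logic described here can be written down as a short sketch. It is a deliberate simplification, not how any utility actually runs an interconnection study: the circuit limit and existing load are made-up numbers, and the function names are hypothetical. It only shows the arithmetic of the study horizon and the single worst-hour check.

```python
HOURS_PER_YEAR = 8760


def study_hours(years: int) -> int:
    """Number of hours covered by a multi-year planning horizon."""
    return HOURS_PER_YEAR * years


def passes_worst_case(circuit_limit_mw: float,
                      worst_hour_existing_mw: float,
                      requested_mw: float) -> bool:
    """True if the circuit can carry the new load on top of the worst existing hour."""
    return worst_hour_existing_mw + requested_mw <= circuit_limit_mw


print(study_hours(1))   # 8760
print(study_hours(10))  # 87600

# Hypothetical circuit: 1,000 MW limit, 700 MW of other load in the worst hour.
print(passes_worst_case(1000.0, 700.0, 400.0))  # False: full nameplate overloads the circuit
print(passes_worst_case(1000.0, 700.0, 300.0))  # True: a smaller worst-hour request fits
```

The second call illustrates why a firm commitment to stay below nameplate during the worst hours could change the outcome of the study.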

Okay, but that is a different question. You said it right: that is how data centers are studied today. There is a separate question of how they are operated, generally speaking, which does not align perfectly with how they are studied.

In other words, it is not always true that they are operating at the full 400 megawatt capacity if it's a 400 megawatt rated data center. So what do we know about the actual operational profile from an electricity perspective, assuming you're doing nothing clever like the things we're about to start talking about?

And let me say, Shayle, before you do anything clever, I actually don't think it's irresponsible or analytically incorrect for the grid to study these data centers in that extremely risk-averse way that I just described. Because you're right, Shayle, data centers do sometimes take years to ramp up their capacity. They'll proceed in phases as you build out the buildings, fill the data halls with the equipment, and begin to actually run the workloads that you'd like to run.

And there may also be quite a bit of buffer that you leave on top: even if you're running an intensive training run, you may only be utilizing this data hall at 75%, let's say. And so it may very well be the case that that 400 megawatt data center does not hit 400 megawatts in the foreseeable future.

And yet I don't think it's incorrect for system operators to plan for a hyperscaler who comes to town and says, I want a 400 megawatt data center, to actually use that full entitlement once it's granted.

And there are certainly examples of large data centers running absolutely full tilt, to the point where unless, Shayle, as you mentioned, you do one of these clever things to intelligently control the consumption when the grid needs you to, the grid is correct and justified in assuming that you may use your full consumption at the absolute worst time.

Yeah, I mean, my understanding of the basic state of affairs is this. The grid says, okay, I'm gonna plan for the worst-case scenarios, as I need to do to deliver reliable service. And so I'm gonna assume you need 400 megawatts all the time for 10 years.

Meanwhile, the data center actually operates differently from that. And AI load profiles, as I understand it, particularly for training, but for inference as well, at least in the current iteration of inference, are surprisingly spiky. So loads can go up and down quite a lot. Maybe you're pulling 400 megawatts some of the time. Maybe you're pulling 200 megawatts some of the time. It's kind of a weird load profile.

But to the grid operator, it's unpredictable, which is I guess the key point here: if you don't know when that load is gonna spike or not spike, then again, all you can do is operate as if it is 8,760 hours of 400 megawatts of load. And so that's what people are starting to wake up to: wait a second, there is this mismatch here. Clearly, there is headroom, because the data center does not need to operate all the time at full capacity.

But taking advantage of that requires doing some things differently, because otherwise the grid operators can't do anything different. Their hands are tied, basically.
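One way to picture the mismatch between planned and actual load is to compare a load trace against nameplate. The sketch below uses an invented eight-hour trace that swings between 200 and 400 MW, echoing the round numbers in the conversation; none of it is measured data, and the function name is hypothetical.

```python
from statistics import mean


def headroom_summary(load_mw: list[float], nameplate_mw: float) -> dict:
    """Summarize how far a load trace sits below the nameplate the grid plans for."""
    average = mean(load_mw)
    return {
        "peak_mw": max(load_mw),
        "average_mw": average,
        "average_share_of_nameplate": average / nameplate_mw,
        "samples_below_nameplate": sum(1 for x in load_mw if x < nameplate_mw),
    }


# Made-up hourly samples in MW for a 400 MW rated site.
trace = [400, 200, 350, 400, 250, 300, 200, 400]
summary = headroom_summary(trace, nameplate_mw=400)
print(summary)
```

In this invented trace the site averages about 78% of nameplate, yet it still touches the full 400 MW in three of the eight samples, which is why an operator with no visibility or control has to plan for the peak.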

AI Workload Dynamics and Flexibility

Yeah, precisely. I think that's really well said. And if I can just take one more moment to set the table here. Shayle, earlier you said, hey, look, this isn't dissimilar to what we see from other loads. And I probably don't disagree with you fundamentally, but I do think there are some very peculiar things about AI that are truly dissimilar. One is the extraordinary rate of growth. The power demand from data centers has more than doubled every year for the last several years, and that trend shows no sign of abating.

A lot of people talk about data center efficiency and the increasing efficiency of the new generations of GPUs, these graphics processing units. Nvidia's Blackwell is much more efficient than Hopper, which is much more efficient than the previous generation A100s, et cetera.

But that efficiency gain is currently being eaten up by the tremendous growth in computing demand. So even as power demand is more than doubling every year, the reason it's more than doubling is because compute demand is more than quadrupling every year, a 4x increase every year. And the second thing that's truly dissimilar is what I mentioned earlier, the power density.

AI's power density is increasing by orders of magnitude, which I don't think any other electricity application has seen in this short a span of time. A rack is a set of servers stacked in a single cabinet. That rack might have used five kilowatts just a few years ago. Today, I was just in a data center in Silicon Valley seeing a brand new deployment of Nvidia GB200s, the Blackwell generation.

The rack is 132 kilowatts. It's liquid cooled. And we're headed toward one megawatt racks. So think of that. That's a two orders of magnitude increase in density. These massive data centers occupy a tiny footprint and look like small cities. So both of these trends, the exponential increase in power demand and the shrinking footprint of massive power demand, are stressing grids out in ways we haven't seen before.
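The two trends above reduce to simple ratios. The sketch below takes the round figures from the conversation at face value (2x power growth and 4x compute growth per year, racks going from 5 kW to 132 kW and toward 1 MW); since the speaker says "more than" doubling and quadrupling, these are illustrative floors rather than precise values, and the function names are hypothetical.

```python
import math


def implied_efficiency_gain(compute_growth: float, power_growth: float) -> float:
    """Yearly growth in compute delivered per unit of power implied by the two rates."""
    return compute_growth / power_growth


def orders_of_magnitude(old_kw: float, new_kw: float) -> float:
    """Base-10 logarithm of the density ratio between two rack power levels."""
    return math.log10(new_kw / old_kw)


# If compute demand grows 4x while power demand grows 2x, efficiency doubles each year.
print(implied_efficiency_gain(4.0, 2.0))

# Rack density: 5 kW a few years ago, 132 kW today, 1 MW (1,000 kW) on the horizon.
print(round(orders_of_magnitude(5, 132), 2))
print(round(orders_of_magnitude(5, 1000), 2))
```

On these figures, today's racks are about 26 times denser than the 5 kW racks, and a 1 MW rack would be 200 times denser, a bit over two orders of magnitude.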

Okay, so last question on the current state of affairs before we talk about the clever stuff. I mentioned this, but I'm curious whether you have visibility into what it actually looks like: is there a meaningful distinction in the current operating profile of AI data centers for a training data center versus an inference data center? Do they look different from a load profile perspective?

Oh, absolutely. These loads do look different, right? Training loads have a very characteristic profile and inference workloads have a different characteristic profile. And we talked a little bit about this earlier. A training run looks like, you know, you ramp it up. It can ramp up by tens or hundreds of megawatts. Kind of randomly, you'll have dips in the power as checkpoints happen. It'll ramp all the way back down when the synchronized GPUs stop with the end of the training run.

Inference, depending on the set of use cases and the diversity of the applications, can look much more smoothed out. It might in some cases look more like what you've seen in traditional cloud computing. For example, a Meta data center might have a load profile that looks like people open their phones in the morning and go to Instagram, and so you see a spike.

Similarly, today people open their phones and go to ChatGPT. And so that's a more familiar load profile. But nevertheless, you can certainly infer the workload type from the power signature today. It's one of the things, by the way, that we at Emerald AI have been training an AI model to do.

However, an important distinction here is that a data center will not do a single thing for its lifetime, right? A massive data center, for example, may initially be configured and specified to train a large language model, and then you'll finish training the large language model, and then you'll do other things with those GPUs.

Those same NVIDIA GPUs can then be used for smaller research training workloads. They can be used for inference and fine-tuning large models for specific applications. A single data center may be used for one model and then be separated out into multiple different types of workloads. So I wouldn't count on any given data center having the same load profile for its lifetime, or even for more than a year.

Which presumably complicates things even a little bit further from the electricity perspective. All right, so let's talk about the clever stuff then, or at least start to. The key concept here is: can we make data centers look to the grid like flexible assets? Which means introducing some measure of predictability and planning into when the load from the data center is below peak, basically. And there are various ways you could do that, from basic demand response that says we will tone down demand a few hours a year just at peak, to daily flexibility where you're shifting intraday all the time. So there are lots of different versions of it. But from a simple mechanical perspective, just to start, say you want to introduce some measure of load flexibility into an AI data center. What are you actually doing?

So you can achieve flexibility through multiple routes.

You can of course achieve flexibility through what I'll call the physical infrastructure route. If you have a lot of backup generation, you might fire up the backup generation. Often you're not allowed to because your diesel generator will violate its air permit if you use it regularly.

And so what we do at Emerald AI, the company I founded to solve this problem of data center flexibility, is computational workload orchestration. We want to attack the beating heart of AI's energy demand, which as I mentioned is increasingly just the computers, as AI factories become much more efficient and honed at converting electricity into tokens.

And to achieve that on-demand flexibility, you take advantage of some of the inherent or latent flexibility that the different AI workloads have.

You might, for example, orchestrate a workload that is flexible in time, one that can be slowed down or paused for a certain amount of time. Take, for example, a fine-tuning operation that doesn't need to finish exactly on time. If what you're doing is taking a large language model and tuning it to a particular enterprise application, that enterprise might not mind if that job is paused for a minute or an hour.

And in other cases, you may be taking a model or an AI use case that has flexibility spatially. You might move it from one location to another to save power in one particular data center location, while keeping that application running at the new location. So there are a lot of different ways within this broad framework of achieving spatio-temporal flexibility. And what Emerald AI takes advantage of is that there is inherent workload flexibility in the use cases of AI today.
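The pause-or-move idea can be illustrated with a toy planner. This is not Emerald AI's method, which the episode does not describe in detail; the `Job` fields, the job names, the power figures, and the rule of preferring moves over pauses are all assumptions made for the sake of a small runnable example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    power_mw: float
    can_pause: bool  # temporal flexibility: the job tolerates a delay
    can_move: bool   # spatial flexibility: the job can run at another site


def plan_event(jobs: list[Job], target_reduction_mw: float):
    """Pick whole-job actions until the local reduction target is reached.

    Movable jobs are tried first (they keep running elsewhere), then the
    remaining jobs from largest to smallest. Returns (actions, achieved_mw).
    """
    actions = []
    achieved = 0.0
    for job in sorted(jobs, key=lambda j: (not j.can_move, -j.power_mw)):
        if achieved >= target_reduction_mw:
            break
        if job.can_move:
            actions.append((job.name, "move"))
        elif job.can_pause:
            actions.append((job.name, "pause"))
        else:
            continue  # inflexible job: leave it alone
        achieved += job.power_mw
    return actions, achieved


jobs = [
    Job("chat-inference", 40.0, can_pause=False, can_move=False),
    Job("enterprise-fine-tune", 30.0, can_pause=True, can_move=False),
    Job("batch-indexing", 20.0, can_pause=False, can_move=True),
    Job("research-training", 25.0, can_pause=True, can_move=False),
]
actions, achieved = plan_event(jobs, target_reduction_mw=60.0)
print(actions)
print(achieved)
```

Because this sketch only takes whole-job actions, it overshoots the 60 MW target and sheds 75 MW. Hitting a target closely would require partial measures such as slowing jobs rather than pausing them outright.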

Are you tired of overpaying for big name PR firms but not really knowing what they're delivering? Is your comms team wasting time reviewing lengthy messaging briefs and decks instead of engaging journalists or producing content? Are you wondering why your competitors are getting press and you aren't? Fishtank PR is an award-winning climate and energy tech, renewables, and sustainability focused PR firm dedicated to elevating the work of both early stage and established companies.

Whether you need to position yourself as a thought leader in between project announcements or translate complex ideas and technologies into tangible, compelling stories that resonate with the media, Fishtank can help. Check out fishtankpr.com. That's f-i-s-c-h fishtankpr.com.

Virtual power plants are becoming a reliable way for utilities to manage capacity, but enrolling devices is just the start. What really matters is confidence: knowing those resources will perform when dispatched, and being able to prove it from the control room to the living room. EnergyHub's platform handles the full picture, from near real-time forecasting and locational dispatch to the kind of rigorous verification that holds up when regulators, grid operators, or leadership ask, did it deliver?

Easy enrollment creates momentum; proven performance builds trust. That's why more than 170 utilities rely on EnergyHub to manage over 2.5 million devices, delivering 3.4 gigawatts of flexible capacity. See what that looks like at energyhub.com.

Maybe let's walk through that in a little bit more detail. So let's focus on the temporal component, right? The spatial component is: if you have multiple data centers, you shift load from one place to another. Google's actually been talking about doing that for years for the purpose of lower carbon, right? They've been saying one of the ways we're gonna reduce the carbon intensity of our computation is by shifting from location to location.

That feels to me like it is more readily available to the hyperscalers, who have lots and lots of data centers, probably within one region, than it is to others. The temporal one is, in theory, available to anybody. So what does it look like? You have some workload that the data center is supposed to undertake. Is it as simple as saying we delay this workload by a few hours, or presumably there's more to it than that?

It absolutely can be that simple. And let me first give credit where credit's due. You mentioned Google. Google also, by the way, has exploited temporal flexibility. There was a paper or a post they put out a couple of years ago, which a friend of mine, Varun Mira, wrote, about moving video indexing operations to nighttime in order to reduce load during periods, as you mentioned, Shayle, when that computation would not be renewables intensive but carbon intensive.

So exactly as you said, one simple thing to do would be to simply pause a workload. However, that's not going to work for all workloads. And the reason this is tricky and sophisticated is because there are many things you could do, many different requirements that users are going to have for you.

And you want to precisely meet a grid target. You want to make sure that your performance is not approximate, but that you can guarantee to the grid that if they need you to achieve a particular demand reduction, you can certainly do that while respecting the constraints that the users of the AI compute put on you. That dual optimization problem is what makes this complicated. So in addition to pausing a job that can tolerate a delay and resuming it later on, you might slow down a job.

You might change the resource allocation of how many chips, for example, are instantaneously being used for a job. Some instances of this are known as autoscaling, where you scale the resource allocation up and down for particular kinds of queries. You might also go all the way down to the underlying silicon, for example the NVIDIA chip, and you might change what we call the clock frequency of the chip to change the rate at which computations happen.

And so depending on the workload type, a customer may be comfortable with that workload being slowed a little bit or slowed a lot. And there are some other technical limitations as well, and I'll stop talking in a moment about the complexities, because they're fractally complex, but I'll mention, for example, that different workload types can tolerate different amounts of clock frequency changes or power caps.

And so you need to know something about these workloads in order to determine, hey, what's the best set of operations that I can do to preserve what the user wants, which is great performance for their AI workload, whether it's training a model, fine-tuning a model, et cetera, and deliver precisely what the grid needs, which is not a megawatt more than this limit that we promised to achieve for them.

And that is a non-trivial problem that's far harder than just, eh, I'll just pause a bunch of jobs.
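The two-sided constraint described here, a hard site limit for the grid and a per-workload tolerance for users, can be sketched as a small allocation problem. The greedy rule below (cut the most tolerant workloads first) is an assumption chosen for brevity, not the approach used by Emerald AI or any real scheduler, and the workload names, power figures, and `min_fraction` tolerances are invented.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_mw: float      # current draw
    min_fraction: float  # lowest share of current draw the user tolerates (1.0 = inflexible)


def allocate_caps(workloads: list[Workload], site_limit_mw: float) -> dict[str, float]:
    """Return per-workload power caps whose sum does not exceed the site limit.

    Cuts are taken from the most tolerant workloads first and never go below
    a workload's own floor. Raises ValueError if the limit cannot be met.
    """
    caps = {w.name: w.power_mw for w in workloads}
    excess = sum(caps.values()) - site_limit_mw
    for w in sorted(workloads, key=lambda w: w.min_fraction):
        if excess <= 0:
            break
        available = w.power_mw * (1.0 - w.min_fraction)
        cut = min(excess, available)
        caps[w.name] -= cut
        excess -= cut
    if excess > 1e-9:
        raise ValueError("site limit cannot be met without violating a workload constraint")
    return caps


workloads = [
    Workload("chat-inference", 50.0, min_fraction=1.0),  # must not be touched
    Workload("model-training", 30.0, min_fraction=0.6),  # can be slowed somewhat
    Workload("fine-tuning", 20.0, min_fraction=0.0),     # can be paused entirely
]
caps = allocate_caps(workloads, site_limit_mw=75.0)
print(caps)
print(sum(caps.values()))
```

With these invented numbers the fine-tuning job is paused, training is capped from 30 MW to 25 MW, and inference is untouched, landing exactly on the 75 MW limit. A real system would also have to translate each cap into concrete actions, such as a clock-frequency or power-cap setting per chip, and account for how much each workload actually slows down.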

New SLAs and The Future of Flexibility

That differentiation amongst types of workloads, I think, is sort of important here. Because if you think just historically, pre-AI wave, right, there was already the same problem: lots of data centers, way, way fewer, but lots of data centers that needed what looked to the grid like 24-7 load, et cetera. And the explanation you would always get as to why those loads couldn't or wouldn't be flexible was, well, these are mostly hyperscaler data centers.

And the hyperscalers are making a commitment to their customers, the ones on whose behalf they're doing this work, that they will deliver with low latency or whatever it is. And so it's just not worth it to them to try to shift this stuff around. They just want to deliver as quickly as they possibly can.

So I can imagine there being cases here where that's going to be true too. Certain inference workloads in particular, I can imagine there isn't really flexibility. But then others, maybe like training a model, are certainly not as time sensitive. So how do you think about the workloads and types of compute for which this is especially well suited?

Well, first of all, necessity is the mother of invention, or of changing your business model. And this is one of those cases, Shayle, where, look, we've got 50 to 100 gigawatts of latent AI demand in the pipeline, and it's just not gonna get built unless you have this capability of flexibility.

Tyler Norris's viral paper (he is an advisor to Emerald AI, I should note) said, hey, there's a hundred gigawatts of spare capacity lying around on grids if we can just make data centers modestly flexible: for up to 200 hours a year, they're able to reduce consumption by around 25% for around two hours on average per event.

And so if it weren't the case that there was this extraordinary demand for energy, severe limitations, and kind of this golden ticket to get it, I don't think we would be changing business as usual, which for the last two decades of SLAs, or service-level agreements, is, Shayle, as you said, that you simply get 24-7 uptime agreements on your power.
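The figures quoted here (up to 200 hours a year, around 25% reduction, around two hours per event) imply a small share of annual energy. The sketch below computes an upper bound under the assumption that every flexed hour is curtailed by the full 25%; the function names are hypothetical, and the 400 MW site size is borrowed from the example used earlier in the conversation.

```python
HOURS_PER_YEAR = 8760


def curtailed_energy_share(reduction_fraction: float, curtailed_hours: float) -> float:
    """Share of a flat load's annual energy given up, if every flexed hour is cut by the same fraction."""
    return reduction_fraction * curtailed_hours / HOURS_PER_YEAR


def events_per_year(curtailed_hours: float, avg_event_hours: float) -> float:
    """Number of events implied by a yearly hour budget and an average event length."""
    return curtailed_hours / avg_event_hours


share = curtailed_energy_share(0.25, 200)
print(f"{share:.2%} of annual energy")      # about 0.57%
print(events_per_year(200, 2), "events")    # 100 events

# For a hypothetical 400 MW flat load, in megawatt-hours:
site_mw = 400
print(site_mw * 0.25 * 200, "MWh curtailed of", site_mw * HOURS_PER_YEAR, "MWh")
```

That upper bound of roughly 0.57% is the same order as the 0.5% yearly load reduction cited in the episode description.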

Given the necessity now, I think there's a range of AI customers, and we've talked to hundreds, who are willing to tolerate small levels of changed power availability. You know, today there are different kinds of ways that you can reserve compute capacity. You can have a guaranteed instance where you get that 99.99999% uptime guarantee. You can also have a spot instance where you can basically just get kicked out anytime or preempted.

What Emerald AI's spatio-temporal flexibility technology offers is an almost firm guarantee. It's a guarantee that looks like this: 99% of the time, you're going to be left alone. But every so often, up to that 100 or 200 hours, there might be a mild power cap in which Emerald Conductor is going to gracefully orchestrate your workloads.
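To put the "almost firm" guarantee next to a conventional uptime SLA, the arithmetic can be sketched as follows. The function names are hypothetical, and the comparison is only about time budgets: a power cap is not an outage, so the two quantities are not equivalent.

```python
HOURS_PER_YEAR = 8760


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes per year of downtime permitted by an uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR * 60


def uncapped_share(capped_hours: float) -> float:
    """Share of the year during which no power cap applies."""
    return 1 - capped_hours / HOURS_PER_YEAR


print(f"five nines: {allowed_downtime_minutes(99.999):.2f} minutes of downtime per year")
print(f"100 capped hours: left alone {uncapped_share(100):.1%} of the year")
print(f"200 capped hours: left alone {uncapped_share(200):.1%} of the year")
```

On this arithmetic, 100 capped hours corresponds to being left alone about 98.9% of the year and 200 hours to about 97.7%, so the "99% of the time" figure is best read as a round number.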

And you might have to face a power cap. And based on what kind of workloads you're running, we're going to make sure to protect the performance and tolerate delays only where you're willing to tolerate them.

So that implies, and that sort of answers one of my implicit questions from earlier, that you're focused on the 100 to 200 hours a year. So this is a demand response type application. It's not like a daily load shifting thing. This is like, in periods of extreme grid stress, we will dial down your power consumption a little bit.

You know, to be clear, I think that's where we enter. It's the most pressing need of the hour today, no pun intended. I think that the same toolkit that harnesses spatio-temporal flexibility, that allows you to provide this demand response for those 100 or 200 hours, is also the same capability set that would allow data centers to flex on a weekly or even daily basis one day, again, if the prices are right, if the incentives are well calibrated.

And I think, Shayle, you and I both believe in a grid that is fundamentally abundant, cheap, and affordable, and that's gonna require a lot of both dispatchable and intermittent, non-dispatchable energy. And I personally view data centers as a potential holy grail, if not the silver bullet, to enable a generation mix like that, one that's far more clean and one that's far more intermittent.

So down the road, you can imagine that data centers, which today are about 4% of American energy consumption (AI data centers are about five gigawatts of load), grow to 12% by the end of the decade (AI data centers could be anywhere up to 50 or even more gigawatts), and to 25% of American load by 2035 and beyond. They suddenly become by far the biggest user of electricity in this country. And if they have this flexibility toolkit, they can be doing all of these operations: the up-to-a-hundred-hours demand response, potentially daily shifting. That's what a truly co-optimized system of AI infrastructure and electricity grid infrastructure would look like.

Proving Data Center Reliability to Grids

And I think step one is solving this hundred-to-two-hundred-hour problem and just getting data centers onto the grid and getting grids comfortable that they can perform when called upon.

So I think the big question then here is, like, how much flexibility can you actually offer? And it's gonna vary, I understand, but I don't think anybody's proposing that the four hundred megawatt nominal data center turns to zero megawatts two hundred hours out of the year, right? Because you still have HVAC load and all that kind of stuff. And my presumption is you also don't want to. I mean, you mentioned this, right? Some of the techniques that you want to employ are things like slowing down the clock speed of a GPU. That doesn't dial the load down to zero, it just dials it down some. So what do we know about how much flexibility, how much demand response capacity, is realistically latent within, say, a four hundred megawatt data center?

You know, we set out to demonstrate one example of this in Phoenix, Arizona, earlier this summer. And we published the results along with Nvidia, the Electric Power Research Institute, and our partner Salt River Project, at an Oracle data center. And we said, look, let's take a large cluster of GPUs and let's see what we can get. Can we achieve a twenty-five percent demand reduction, which the Tyler Norris Duke paper suggested would be a kind of minimum threshold to achieve this massive amount of headroom? So: a twenty-five percent reduction, sustain it for what the Arizona grid needed, which was a three-hour demand reduction, and do so with representative AI workloads.

And so we worked with our partner Jonathan Frankle, the chief AI scientist of Databricks, who specified for us, look, this is what a representative set of workloads could look like. It was surprising to me, by the way, to hear that he anticipated that just ten percent of the workloads on a representative Databricks cluster were non-preemptible. In other words, they absolutely could not be paused or delayed in any way.

That gives us a lot of flexibility to work with. And so we worked with him to develop four kind of representative ensembles of workloads of varying levels of flexibility, some which could be just delayed by a little bit or slowed down a little bit, and some which could be delayed a little more.

Using those representative workloads, we've published a preprint of our academic paper on arXiv showing that a 25% reduction is definitely feasible. We even have one of our runs which showed a 40% reduction and still met all of the performance requirements for this representative set of users and AI workloads.
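To make the idea of flexibility tiers concrete, here is a minimal sketch of how a power cap could be spread across workloads, shedding load from the most flexible jobs first. It is not Emerald Conductor's actual algorithm, which the conversation does not describe. The `Workload` class, the `plan_power_cap` function, the `max_cut` parameter, and every number below are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_kw: float  # draw at full speed (hypothetical figure)
    max_cut: float   # fraction of its power that may be shed; 0.0 = non-preemptible


def plan_power_cap(workloads, target_reduction):
    """Greedy sketch: shed power from the most flexible workloads first.

    Returns (cuts, achieved), where cuts maps workload name -> kW shed and
    achieved is the fraction of total baseline power actually shed.
    """
    baseline = sum(w.power_kw for w in workloads)
    needed = baseline * target_reduction
    cuts = {}
    for w in sorted(workloads, key=lambda w: w.max_cut, reverse=True):
        if needed <= 1e-9:
            break
        shed = min(w.power_kw * w.max_cut, needed)
        if shed > 0:
            cuts[w.name] = shed
            needed -= shed
    achieved = (baseline * target_reduction - needed) / baseline
    return cuts, achieved


# Hypothetical cluster: 10% of the power is non-preemptible, echoing the
# share mentioned in the conversation; the other figures are invented.
demo_workloads = [
    Workload("realtime-inference", 100.0, 0.0),
    Workload("fine-tuning", 400.0, 0.3),
    Workload("batch-training", 500.0, 0.6),
]

for target in (0.25, 0.40):
    cuts, achieved = plan_power_cap(demo_workloads, target)
    print(f"target {target:.0%}: achieved {achieved:.0%}, cuts {cuts}")
```

With these invented numbers, the 25% target is met by slowing only the batch training job, while the 40% target also requires cutting into fine-tuning. The non-preemptible job is never touched, which is why a request beyond the available flexibility returns an achieved fraction lower than the target.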

So there is, I think, a lot of inherent flexibility in the system. And then, Shayle, you can think about layering on other interventions. You can get computational load flexibility alongside, let's say, some limited deployment of batteries. And together, you can get much of the data center's consumption to go offline for a small amount of time.
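As a rough illustration of layering batteries on top of compute flexibility, the sketch below computes the grid draw left over after both are applied. The function names and the battery size are hypothetical, and the 400 megawatt facility is the round number used in the question above, not a real site.

```python
def net_grid_draw_mw(facility_mw, compute_cut, battery_mw):
    """Grid draw after shedding compute load and discharging a battery."""
    flexed_load = facility_mw * (1.0 - compute_cut)
    return max(flexed_load - battery_mw, 0.0)


def battery_hours(battery_mwh, battery_mw):
    """How long the battery can sustain that discharge rate."""
    return battery_mwh / battery_mw


# Hypothetical case: 400 MW facility, 25% compute reduction,
# plus an assumed 100 MW / 300 MWh battery.
print(net_grid_draw_mw(400.0, 0.25, 100.0))  # MW still drawn from the grid
print(battery_hours(300.0, 100.0))           # hours the battery can cover
```

Under these assumptions the facility would draw 200 megawatts from the grid, half its nominal load, for as long as the battery lasts.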

When you say still met the performance requirements, is that like there's something in the SLA, they're giving you a representative SLA and you're saying, okay, I still need to meet this? Or is it, yeah, who defines that? 'Cause isn't that the key thing? Obviously you can get kind of as much as you want, presuming that the performance requirements allow for it. And so a lot of this to me seems to come down to, like, what is the SLA between the data center operator and the customer?

You're nailing it. This is the key central question going forward: can we define a new kind of SLA that looks almost like the previous kind of SLA, but has, again, less than one percent of the time, the chance that your workloads might get power capped in the most graceful way possible?
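For a sense of scale, the sketch below converts the hour counts mentioned in this conversation into fractions of a year, so they can be compared with a percentage-based SLA term. The function names and the 1% default are illustrative assumptions, not terms from any actual agreement.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def capped_fraction(capped_hours, hours_per_year=HOURS_PER_YEAR):
    """Fraction of the year during which a power cap is in effect."""
    return capped_hours / hours_per_year


def within_budget(capped_hours, max_fraction=0.01):
    """True if the capped hours fit inside the SLA's allowed fraction."""
    return capped_fraction(capped_hours) <= max_fraction


for hours in (50, 100, 200):
    print(f"{hours} h/yr = {capped_fraction(hours):.2%} of the year, "
          f"within 1% budget: {within_budget(hours)}")
```

One percent of a year is about 88 hours, so a hundred to two hundred hours of capping corresponds to roughly one to two percent of the year. The round figures quoted in conversation are close to each other but not identical, which is the kind of detail an actual SLA would need to pin down.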

And again, in talking with hundreds of AI companies, our conclusion is this is definitely doable. It is definitely possible for us to find a large set of customers who are willing to tolerate this kind of disruption, especially because, first, AI customers today struggle to get access to compute. You hear OpenAI's Sam Altman talking often about how GPU capacity is a limiting constraint on the expansion of OpenAI's GPT-5 model, for example.

And others say, hey, the costs of compute, because of the scarcity of compute, are really the limiting factor for popularizing and democratizing AI. And even for applications that are extremely time- or latency-sensitive, you know, I recently talked to the CEO of a company that makes a very real-time interactive world model. You know, you can step into this world, and the data center needs to be quite close to you in order for you to have a good experience at 30 frames per second.

Even they can tolerate geo-shifting a workload less than 1% of the year, geo-shifting some of the workloads within a 500-mile radius, because it's only going to incur less than a 50 millisecond latency penalty. That's acceptable if what that leads to is a much larger set of GPU deployments.

And therefore better access to compute and maybe even cheaper access to compute. So I think, yes, Shayle, the central question is, can there be a new PowerFlex SLA that's slightly different from today's SLAs? And I think the answer is probably yes.

All right, so final question for you then. The holy grail here is if you and others can convince the grid operators (you mentioned this before, right?) that they can rely upon this type of flexibility, as you said, perhaps in combination with physical flexibility assets as well, such that they know there is a data center that has nominal 400 megawatt capacity, but actually we're gonna interconnect it at 300 megawatts or whatever it is.

What do you think it's gonna take to get that level of comfort from the grid operators? It's been a long road to get, like, traditional demand response there. And this is like a whole other level of complexity. Now, as you said, necessity is the mother of invention, but what's your sense of, like, what are you gonna have to prove to get grid operators to trust it?

That's a really great question.

To answer it, I recently was invited to speak at the Electric Power Research Institute's summer seminar. There are a hundred utility and grid operator CEOs in the audience. And I asked all of them for the same thing. I said, please participate alongside the AI companies in an escalating series of demonstrations approaching commercial scale. And we at Emerald AI plan to hit commercial scale early next year. We're very excited to have whole data centers be power flexible in partnership with our collaborators, such as Nvidia, which is our biggest investor.

That data, that ground-truth reliability information, is what's needed for grid operators and utilities to believe that this is actually a thing: that AI, far from being the scariest liability that's getting added to grids, could actually be the most promising asset that we can add to grids. They've got to see it to believe it. So we're working with a range of partners.

I mentioned the collaboration with EPRI and Oracle and NVIDIA and SRP in Phoenix, but now we have upcoming demonstrations all over the United States and increasingly around the world, which I'm very excited about, to showcase that data centers can be flexible and get grid operators very comfortable.

One last thing I'll mention is, in order for a grid operator or utility to bank on the fact that, hey, when I call this resource, it's actually going to perform the way I need it to, Emerald has developed something called the Emerald Simulator, which is a digital twin that imagines what would happen if we did certain orchestration operations: we moved some workloads around, we paused or slowed workloads. And as we've submitted in our academic paper, it's extremely accurate. And that accuracy, built out over many more demonstrations, is going to be critical to prove to utilities and grid operators that in fact the system is going to work the exact way you expect it to. And if it doesn't, in that absolute worst case, there will be some fail-safe mechanism to make sure that it does work.

So there's a lot of convincing work to do, but I sometimes feel we're pushing on an open door. You know, when I talk to the chairman of a regulatory commission (you pick your large East Coast state), that chairman said, "I've got the governor knocking on my door every month and saying, 'What have you done for me to bring data centers to my state? Because I want to economically compete with all the other states.'"

Regulators, utilities, system operators are all balancing this trade-off between providing reliable and affordable electricity, but also bringing economic development and this extraordinary new source of demand, the greatest economic opportunity humanity's ever seen, to their state.

Data center flexibility is a way to end the trade-off between those two halves. You can have it all at the same time. It's the reason I've left everything I've been doing in my career and founded this company to do just this for the next decade of my life. So really excited about it.

Varun, this was fun. Thank you again for coming back.

Really appreciate the time, Shayle. Thank you so much for having me.

Varun Sivaram is the founder and CEO of Emerald AI. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. Latitude is supported by Prelude Ventures. This episode was produced by Daniel Waldorf. Mixing and theme song by Sean Marquon. Stephen Lacey is our executive editor. I'm Shayle Kann, and this is Catalyst.

This transcript was generated by Metacast using AI and may contain inaccuracies.