Thermal challenges in Powering AI

Mar 11, 2025 | 21 min | Ep. 53
Episode description

In this episode of Podcast4Engineers, host Peter Balint sits down with Davide Chiola, VP of Systems Solutions at Infineon, to explore the critical role of thermal management in AI data centers. They discuss innovative cooling techniques, design challenges, and emerging trends like liquid cooling and superconductivity, shaping the future of high-performance computing.

Learn more about Infineon's data center solutions here. Have a question or topic suggestion for our next episode? Email us at wepowerai@infineon.com

Transcript

Temperature is the biggest enemy of power semiconductors, is the biggest design challenge Hello This is the podcast if you're interested in what's going on I'm your host, Peter Balint and today we continue our journey And with us today is Davide Chiola, he's VP of Systems Solutions here at Infineon. and... welcome, Thank you Peter.

We've talked a lot about the power And there's a lot of things that go hand in hand with the power So, and managing this thermal, So, to kick things off, can you give us if you look at the total cost of ownership how much of that is devoted Sure, sure.

So, Peter, point, let me start Let me start with this small information I hope I pronounce in Sweden, in the north of Sweden, And Luleå became famous, because it was the first data center Now, what is special about Luleå is that the average temperature, during Temperature can go down, And in summer never less than 10, So now you understand how important and cooling and cool weather Now going, specifically to your question So TCO typically consists of two elements.

So there are fixed so how much fixed investment So operational expenses. So how much do we have to spend every year So for a typical AI datacenter, these the, the part related to cooling, let's say air conditioning infrastructure, or, paying the bill, to cooling for And to give you an idea, on a large installation of, racks for AI, 1000 to 2000 racks, which cost somewhere around 100 and then opex in ten years Then we can think about, 50 million, 45 So 30% devoted to cooling aspects.

So really relevant. In fact, if we map a little bit the installations, across Europe or us, we see a lot of, of them So Ireland, for example, Netherlands, Sweden, So definitely cooling cool climate plays Yeah. Okay. So you could leave the windows open Pretty much actually in this, Facebook there is hardly any, air But these really big fans that are flowing cold there So that's the idea. Okay. And then let's zoom in a little bit and

Yes. In a, in a data center, And let's be even more specific and say the eight kilowatt, When you look at this designer's challenges, that they have to overcome Yes, Peter...so, again, before I talk about that, there is I want to step back one second. And this year, which is, So again, real estate is, can be a big deal. The cost depends on the location.

Of course, in highly populated more expensive in remote But these can be to 20% of the total installation cost That means that the two aspects are So, cold climate, meaning density is an aspect thinking about, data center construction Now, if we think about the, our systems or particularly the power supply unit, of the power conversion Main aspects, that we should take into account Density is very important Managing temperature, as well as we just So these three elements are somehow

off, meaning that typically, high density means a higher temperature. So if we, compress the system, of course, less surface to dissipate, If we, at the same time, increase efficiency, So typically, a more efficient system can have higher density and also lower So what I have here today are two systems The specifications are very similar. So both are three kilowatt They have an efficiency of around 97.5%. However, as you can see from the sides, is totally different.

So the one on top is, let's say, design approach, having around 30 watt And the one on the bottom here is using for a quite advanced power The construction is quite different. And I would like to highlight, just, that, are playing a big role, And also the distribution of the component that here was optimized advanced integration that enabled this high density approach. And if you don't see video now, you should know that there's quite Right? Exactly. They are significantly different.

And, I can tell you, Peter, three kilowatt design, we can squeeze Amazing. So, now on the from the design side. So if we have to design a typical system, of these, of these elements? There are many aspects to consider. For example, for the eight kilowatt power supply simulation. So the theoretical approach So, for what concerns thermal management, airflow simulation has to be carried out.

Consider that these systems are normally, and they are stacked on top of each other So there is no way to dissipate heat or laterally, So there is a fan actually on one side So the way the air is distributed, and how many obstacles are found during So, simulation of the airflow, placement making sure that there are no hotspots even, considering power, dissipation. So a system like the eight kilowatt system runs at 97.5% efficiency.

This means that around A fan, like this one So this can be up to 10% So if we blow air too fast, But we consume a lot of energy. So this is, you know, counterproductive So this has to be, carefully considered another aspect In improving this trade off is the reduction If you look at a traditional system the electrolytic capacitors If you look at the more modern system, we have much higher density. You can see that strongly reduced.
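As a rough illustration of the 97.5% efficiency figure mentioned for the eight kilowatt supply, the sketch below estimates how much heat such a converter has to get rid of. It assumes the power rating refers to output power, which the transcript does not state; the function name `dissipated_power_w` is made up for this example.

```python
def dissipated_power_w(output_power_w: float, efficiency: float) -> float:
    """Heat (in watts) lost by a converter delivering output_power_w.

    Assumption: the rating is OUTPUT power, so input = output / efficiency.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in the range (0, 1]")
    input_power_w = output_power_w / efficiency
    return input_power_w - output_power_w


# Figures quoted in the episode: 8 kW and 3 kW supplies at about 97.5%.
loss_8kw = dissipated_power_w(8000.0, 0.975)
loss_3kw = dissipated_power_w(3000.0, 0.975)

print(f"8 kW supply: about {loss_8kw:.0f} W of heat")
print(f"3 kW supply: about {loss_3kw:.0f} W of heat")
```

Under that assumption the loss comes out at roughly 2.5% of the rated power, and all of it has to be carried away by the airflow across a box with very little free surface.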

And this can be done by, basically special circuit special Particularly in this new system, we have, a, a power conversion, element extension circuit that basically highly efficient, we compensate to somehow So by having this extra small circuit, of the electrolytic capacitors. And another, very important As mentioned, density And here I want to bring also one example, which is the magnetic integration which is the LLC resonant converter.

So the system is made which is basically a transformer, magnetic inductor, and a series Typically in a conventional system, separately on the board We have an integrated design part of the transformer and the magnetizing inductor So this is a way to, integrate And in the latest integration, the so-called synchronous rectifier, components that we need to complete So high integration contributes, And the special design that you see here So these PCBs are exposed to the airflow

and increase the air exposed to airflow to reduce to improve the cooling Finally very important topology. So topologies are important to increase density. Advanced topology can, help to reduce the stress And also to reduce temperature. Basically better efficiency means better So summarizing again reducing the size of the electrolytic especially integration of magnetics and semiconductor, quality of course.

This standard solution uses mainly silicon and silicon carbide This advanced solution is basically, bandgap semiconductors. So GaN, silicon carbide switches. Okay. And in addition quite certain that there are standards that if you want to sell you have to be at this point, temperature Correct, Correct. There are standards. So let me say that in general, temperature is an enemy of power semiconductors, So temperature for most of the degradation If you think about gate oxide, so packaging.

to temperature So we want to keep these two, elements down absolute temperature and also temperature Now, there are standards, the server for data center is the OCP. So open compute project.

One variation of the standard for the power degrees, below the maximum temperature, and another standard, refers to the temperature So the airflow between the, the, the input and the output should not exceed So in order to this means that the you know, dissipate low enough to not to increase the temperature Okay. And when it comes to trends, what kind of trends can you identify of thermal management?

Yes, we can definitely observe trends, regarding the cooling So one very clear trend already So as I mentioned So you see an airflow However if you go to rack power on the 100 or 200 which is what the major OEMs, are having airflow becomes not, effective any longer So, liquid cooling is already a state For example, the so-called IT trays are And there are ideas about having or a test that is the attempt cooled down by, by liquid cooling.

So a liquid cooling can greatly improve the power usage Effectiveness Index, of how effectively the input power is used So an index of one would be you do calculation or computation, So by the elimination of fans which are mechanical elements this index can be brought below 1.1, And that means in there are some test a temperature reduction of up to 40% So means component is running you know, run at 60 degree C with a good, liquid cooling, system liquid cooling can there are different
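The Power Usage Effectiveness index mentioned here is commonly defined as total facility power divided by the power delivered to the IT equipment, so a value of 1 would mean every watt goes into computation. A minimal sketch of that ratio follows; the load figures are hypothetical and chosen only to land below the 1.1 mark mentioned in the episode.

```python
def pue(it_power_kw: float, overhead_power_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT power.

    overhead_power_kw covers everything that is not computation,
    such as fans, chillers, pumps and conversion losses.
    """
    if it_power_kw <= 0:
        raise ValueError("it_power_kw must be positive")
    if overhead_power_kw < 0:
        raise ValueError("overhead_power_kw cannot be negative")
    return (it_power_kw + overhead_power_kw) / it_power_kw


# Hypothetical facility: 1000 kW of IT load with 80 kW of overhead.
example_pue = pue(1000.0, 80.0)
print(f"PUE = {example_pue:.2f}")
```

Removing fan power shrinks the overhead term, which is how the index moves toward 1.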

There is this so-called again, known at the time. And today, already in use today, the converter is sitting on by liquid, and the heat is extracted by up to even more efficient system from, thermal point of view, So this is very interesting. Basically the entire converter which is a sort of oil, And this allows a heat extraction, even more efficient heat extraction.

However, of course, So this is a technique, still let's say in development at Infineon, by the way, we, So these I would say are the more the short term, innovation that we see if, I need to look, I can look a little bit beyond that. There are also, of course, research areas, where, where big benefit can be found So one is in superconductors. So, everybody knows that, when current flows in a conductor, it dissipates heat, there's the Joule effect. And these, heat dissipated, of the conductor.

And by the way, So system with high current, they tend to, dissipate more heat, of course. Now, that means that, if we could lower the resistance of the conductor, So the temperature increase And, there are interesting materials at very low temperature, So very few degrees Kelvin, So they would have zero dissipation. So current flowing through them Unfortunately, of course.
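The Joule effect referred to here is the standard relation P = I²R: dissipated power grows with the square of the current and linearly with the resistance. The short sketch below, with hypothetical current and resistance values, shows why high-current systems run hot and why a conductor with zero resistance would dissipate nothing.

```python
def joule_heating_w(current_a: float, resistance_ohm: float) -> float:
    """Power dissipated as heat in a conductor, P = I^2 * R."""
    if resistance_ohm < 0:
        raise ValueError("resistance_ohm cannot be negative")
    return current_a ** 2 * resistance_ohm


# Hypothetical values: the same 1 milliohm path at two current levels,
# then an idealized superconducting path with zero resistance.
heat_100a = joule_heating_w(100.0, 0.001)
heat_200a = joule_heating_w(200.0, 0.001)
heat_superconducting = joule_heating_w(200.0, 0.0)

print(f"100 A: {heat_100a:.1f} W, 200 A: {heat_200a:.1f} W")
print(f"Zero resistance at 200 A: {heat_superconducting:.1f} W")
```

Doubling the current quadruples the heat, while the zero-resistance case dissipates nothing regardless of current.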

This is very difficult to achieve But there are already a commercial system magnetic levitation So the magnetic resonance and system that are based on these principles already so the dream, in power electronics would be that, a superconducting electronic, And, there are now research studies showing that there are, material showing superconductivity effect even to higher temperature But in this case So you need to contain this system So very challenging.

So to answer your question shortly, the next, frontier, the next step But we need better cryostatic techniques to reach the superconducting state Yeah. All right. Very well put. So do you have any last words today? Well, I can just say, Peter, Because we are on the verge Technology transition...revolution. That doesn't happen very often.

So I think we as Infineon, we are proud to be part of this, And we participate, you know, in basically, I this is what I think motivates me as an, as an engineer, And, we try to bring these, innovation you know, value also to our customer Yeah. Thank you so much for coming in today Thank you Peter, thanks a lot. And to our audience, And if you would like to submit episodes, please feel free to send email Thank you and see you soon.

Transcript source: Provided by creator in RSS feed