And you know, a question I often hear is, why would anybody need to compute that fast? There's lots of problems where you have to compute that fast, if you ever want to get an answer. From Bloomberg News and iHeartRadio, it's The Big Take. I'm Wes Kosova. Today, a trip to see the world's fastest computer. We did a whole lot of COVID work, how the virus makes people really, really sick. Looking at solving multiphysics problems related to the modeling and simulation of nuclear reactors. Much more complicated mechanistic models of what's going on in opioid addiction. One problem that requires a lot of flops, a lot of floating point operations per second, is climate modeling or weather modeling. We're really trying to crack hard problems, and so we need this combination of clever algorithm design with exascale computing to start to solve the challenges in biology and ecosystems. Hey, Vicki. Hey, Kathryn. Hey, how's it going? Hello, Wes. So what are we talking about today? Well, I came across these incredible photos in BusinessWeek a bit ago, and they were reporting on the world's largest supercomputer. In fact, it's linked
to our write-up today. So the piece talked about the sheer computing power of this mega computer and it breaking something called the exascale barrier. We're going to get into that. But my eyes popped and I took it to you and Kathryn. Remember? Oh, yes, I remember this. You were very excited. And so we all decided that you and Kathryn needed to take a road trip, and we did. So this computer is called Frontier. It was built at the Department of Energy's Oak Ridge National Lab, which is just outside Knoxville, Tennessee. We reached out to them and they said, come check it out. So we packed up our audio gear and we went, and we got to speak to some of the engineers and scientists who are using the supercomputer right now for a myriad of projects. Yeah, and we really can't stress enough how incredibly powerful this thing is. They're looking at questions that they were unable to answer ever before, like the origins of the universe, like potential cures for disease. These kinds of questions actually really need a massive amount of computing power, and they haven't been able to do it up to now. Exascale is like a next level. So scientists are lining up out the door to use this thing. So we went down and we took a tour.
I'm Justin Whitt, and I am the program director for the Leadership Computing Facility here at Oak Ridge National Lab. We're housing some of the fastest and most powerful supercomputers in the world. We currently are deploying the world's fastest computer, the Frontier system, and we also have, just down the hall, the Summit supercomputer, which is at number five currently, I believe. All right, so we've established that this thing is fast. But what is fast? Like, how fast are
we talking about? Yeah, we had the same question, and so we put it to him. Can you describe it a little bit? Like, what has gone into it? How big is it? How fast is it? Yeah, sure. So we'll start with fast. You know, that's what Frontier is known for, being the fastest computer in the world at this point, and it's capable of about one point six quintillion calculations a second. That's one point six with eighteen zeros after it. When numbers get that big, they start to lose meaning, right? Your mind can't grasp that. And one of the things we like to compare it to is, if every person on the planet could do one calculation per second, it would take four years for them to do all the calculations that Frontier does every second of the day.
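As a quick back-of-envelope check of that comparison, here's a sketch in Python. It assumes a world population of roughly eight billion; the quoted four-year figure lines up with the one-exaflop threshold that defines exascale, while Frontier's quoted peak of about 1.6 quintillion operations per second pushes it past six years.

```python
# Back-of-envelope check of the people-versus-Frontier comparison.
# Assumption: ~8 billion people, each doing one calculation per second.
SECONDS_PER_YEAR = 365 * 24 * 3600   # ~3.15e7 seconds

people = 8e9                  # rough world population
exascale_threshold = 1e18     # one exaflop: the "exascale barrier"
frontier_peak = 1.6e18        # ~1.6 quintillion calculations per second

for rate in (exascale_threshold, frontier_peak):
    years = rate / people / SECONDS_PER_YEAR
    print(f"{rate:.1e} ops -> {years:.1f} years of all-hands human arithmetic")
# 1.0e18 -> ~4.0 years; 1.6e18 -> ~6.3 years
```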
I think the engineering that went into the room itself is pretty astounding. For instance, there is so much computing capacity inside each cabinet of Frontier. And when I say cabinet, there are about seventy-four of these refrigerator-sized objects, each of them makes up part of Frontier, they're packed full of computing hardware, and each of them is very heavy.
At this density, each cabinet weighs about as much as two F-150 pickups. And we have seventy-four of those, plus supporting infrastructure, plus the file system and all. We're supporting about two Boeing 747s on our raised data center floor at this point, so there's a lot of engineering that went into that to make that happen. That image of each of those seventy-four cabinets weighing as much as two F-150 pickup trucks is sticking in my head. How do you power something that big? This is a really complicated system, so I'm going to let the experts explain. So this part, I think, is a little like the behind-the-scenes tour at Disney. A lot of people think, oh, you know, like at home, I order a computer, it comes, we plug it into the wall. But with computers of this size, for instance, we installed about forty megawatts of power to run these computers. At peak operating, it would use about thirty megawatts of that, and that's enough to power about twenty-five thousand homes. And what you hear are actually these very large pumps, pumping six thousand gallons of water to the system every minute. I just got to cut in here and say, that's six thousand gallons a minute he's talking about. They're in a closed-loop system, so it's the same water being repurposed over and over again. But they are flowing through
thirty-six-inch pipes. We're talking like a yard across. Tell us about that, the six thousand gallons. That is just to keep it cool? That is to keep it cool. If you put that much power in, you've got to get the heat back out, and in those small spaces, really the only way to do that is by circulating some fluid directly over each of the components on the computer. Do you have your own sort of miniature energy grid here, or how do you power this? So we actually have power from separate substations here, so if one goes down, we still get power to the computer and the mechanical plants. You'd think that the world's fastest supercomputer would be, I don't know, immortal, or at least have a long lifespan, but we found out that it's actually not all that different from the computers we use. So what we like to do is run the computers in production, with scientific users on the systems doing their work, for at least five years, and then we like to overlap a year with the next computer. So, you know, we kind of say five to seven years is kind of
the general timeframe we'd like to keep these around. So while we were on the tour, there was one point where a technician had one of these huge cabinets open and he was sort of messing with the guts of it. So we got to see inside of it, and we saw all these microchips and copper and wiring, and it sort of got us thinking, how do you build this thing? We've all experienced the supply chain issues we've had over the last couple of years, especially due to COVID, and building a supercomputer, it doesn't matter if you're the Department of Energy, you're going to run into the same supply issues. So we asked them how it was for them. We jokingly say one of our biggest lessons learned is, don't build a supercomputer during a pandemic. So the supply chain issues were nontrivial and were hard to overcome. Frontier has sixty million individual parts in it, and that boils down to hundreds of individual part numbers, and the supply chain issues were across the board. I guess it was early 2020 we got word from Hewlett Packard Enterprise that their suppliers were saying, this could be a two-year delay, we cannot get chips. So we had some supply chain issues there, but it was even the fifty-cent parts, the little voltage regulators and things that are in here. And at one point we had eighteen people whose full-time jobs were, every day, going out and finding the parts to build Frontier. And so we shrunk that two-year delay down to less than a three-month delay, and we were able to mostly hold our schedule for this. So, they got it built, and my next question was, what's it like to use it? I asked Bronson Messer. He was another Oak Ridge scientist who was also along for the tour. Yeah.
For the most part, you sit at your own workstation and you log into Frontier remotely. Almost all of our users, with some notable exceptions, are not local. In fact, we have a lot of international users, users all over the country, everywhere. I have students that, for example, don't interact with the computer any other way except something called a Jupyter notebook, which is actually a web interface to the computer, and it's sort of everything in between. More from Oak Ridge National Lab when we come back. Vicki, you mentioned earlier that scientists are lining up to use this thing, and you can see why. How do they decide, though, which scientists get time on this computer? And then once they get the time, what do they actually use it for? Remember Bronson Messer from earlier? He is actually the director
of science for the Oak Ridge Leadership Computing Facility. That's where we were within this massive complex, and we had a chance to sit down and ask him. You're the go-to guy, right? You're the guy who says yes or no to the projects. So how do you vet these? Like all peer-reviewed science, we use peer review panels, right? So we have several different competitive allocation programs for the machine. You know, some of them are open to everybody, some are open mostly to DOE researchers. We also reserve a small amount of time for what we call our director's discretionary program. But for all these programs, we impanel other scientists to sort of decide, yeah, this is an important scientific problem and we need to use the resources to do this first, or, you know, maybe this is important too, but maybe we should wait just a minute. So those same proposals get checked out. And basically the question is, can they use the big computer, and do they need to use the big computer? And if you can't answer both of those with a resounding yes, frankly, you probably don't want to try to compute on our big computer, because it's hard for a variety of reasons. In vetting all of these, is there a certain amount of national security implications? We don't do any classified work here, so we're strictly an open science shop at the OLCF. So tell me what that means. It means that all the science that we do, basically all the inputs and all the outputs, can be published in the open literature and be promulgated to anybody in a journal publication.
We do have the ability to do things that are a little bit more sensitive than just strictly open science, things like protected health information or other things like that, but we take those security measures really seriously. What it boils down to is, we do what's called an export control review on every single project that gets on the machine. We make sure that the results can be promulgated to everybody. How concerned are you all about breaches of security or hackers, and what kinds of safeguards do you have against that? We have a huge cybersecurity crew. If you've fielded the world's largest computer for, at this point, more than a decade, we have script kiddies knocking at the door every five or six seconds. What's a script kiddie? Somebody who wants to break into the computer for nefarious purposes, either organized nefarious purposes or just, hey, I hacked into the world's largest supercomputer, kind of thing, right? Yeah. So we have a huge effort in intrusion detection, making sure that the edge of our computer is protected from that kind of thing. And just to back up a
little bit, why is it called Frontier? Yeah, I think Frontier is an optimistic name for the machine. It's the culmination of a more-than-decade-long project to get us to the exascale. More than a decade ago, the federal government basically made a commitment to getting us to the exascale, as a concrete sort of flag that we could get to. When you say the federal government, is that Congress? Was it legislated? Everybody. It was a whole-of-government effort, right? And of course it took getting Congress behind it, because they controlled the purse strings, and it costs money. And so there's really two major components to getting us here. There's this machine, and other machines like it, but this machine first. And then alongside of that, there was something called the Exascale Computing Project, which was really concerned with making sure that we had software and codes that could actually take advantage of the machines once they got here. That was a big worry, say seven years ago, that we would build it and nobody would come, and
nobody could come, right? Because they wouldn't be ready. One problem that requires a lot of flops, a lot of floating point operations per second, is climate modeling or weather modeling. Now, the thing about weather modeling is, you kind of have to get an answer before the weather actually happens, or it's not super useful, right? And so weather codes typically want to be able to get answers within just a few hours, or maybe even sub-hour in some cases. To be able to do that, and to have the physical fidelity that it's going to tell you it's going to rain here and it's not going to rain there, or you're going to have a windstorm here and you're not going to have a windstorm there, when you're talking about a few miles apart? Only at the exascale are we going to be able to do that with regularity. Climate models are going to have to be at that sort of level. AI and machine learning may help, because, you know, we all have a feel for what the weather is like, right? So guess what, you can train machines to sort of have a feel for what it's like as well. I have to stop you right there, because you said the machine will have a feel. And this is something that I think we're all trying to wrap our heads around. These are human words. They are. So how do we get a feel? When people say, I have a feel for what that is, how do you get a feel? Well, it's through experience. It's through seeing the same thing over and over with slight variations, and being able to predict, with some amount of fidelity, what's going to happen again. Is it going to happen again exactly like that? If the sky is gray, it might rain today. Exactly. Or if it's clear blue sky, it's not going to rain today. I have a feel for that. Even something as simple as shooting a basketball, right? If I sort of cock it this way, I'm going to miss it. If I feel it go out of my hand the right way, it's probably going to make it. A feeling like that, that's very much the kind of thing that happens with AI
and ML. You basically train ML and AI tools, ML, machine learning, on lots and lots of data, so lots and lots of experiences, and then you use that to anticipate, if you see a similar data set in the future, what might happen. Does Frontier understand emotion? No, Frontier does not understand emotion. It doesn't understand tone. All it knows is sort of what it's been fed. It can fake understanding emotion. Right. So, you know, a lot of the AI and ML tools feel kind of human, right? ChatGPT is a prime example of this, right? They feel rather human, but they're not, right? They're not formulating original thought. They're using a backlog. They're making new things, right, but out of a limited set of things. And if you steer them right, they'll just copy what they've learned before. I understand. And so does that all
get stored? That memory, that learning, gets stored, and then it can be basically resurfaced as needed. Right. And so basically there's two sides to the artificial intelligence and machine learning coin. One is training. It turns out that big, huge machines like Frontier are really good at training, because you need a lot of memory to store all that data, to be able to train an AI model on it, and you need to be able to do it fast. So a huge machine like Frontier is really good at that. Then you get a model, you get a reduced sort of set of things that you can use, and you want to do inference. So you want to feed in some new inputs and get the machine learning model to tell you what the output ought to be. Those are much smaller. They typically fit, like, on one of those nodes you saw earlier inside the data center, fairly effectively. But the key is, you probably want to use them everywhere, all over the computer. You probably want to use it in lots and lots of places.
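To make the two sides of that coin concrete, here's a minimal sketch in Python with NumPy. It's illustrative only, not Frontier's actual software stack: a toy linear model stands in for a real AI model, training chews through a big pile of data, and the resulting model is just a small set of weights that runs cheaply on new inputs.

```python
import numpy as np

# --- Training: the expensive side of the coin ---
# Needs lots of memory (all the "experiences") and lots of flops;
# this is the part a machine like Frontier is good at.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000_000, 8))           # a big pile of past observations
true_w = rng.normal(size=8)
y = X @ true_w + rng.normal(scale=0.1, size=1_000_000)

w, *_ = np.linalg.lstsq(X, y, rcond=None)     # "train": fit weights to the data

# --- Inference: the cheap side ---
# The trained model is just the small weight vector `w`; it fits on a
# single node (or a laptop) and can be applied everywhere.
x_new = rng.normal(size=8)                    # a new, similar-looking input
print("predicted outcome:", x_new @ w)        # anticipate what might happen
```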
This was another question from my eleven-year-old: how do you ask it a question? The question is usually of the form, if I do A, what will happen? So, like an equation? Almost, almost. But you want to know, basically, if I change something, what's going to happen, because that's the way I understand how something works. So, to go back to my basketball analogy, if I tuck my elbow out or keep it in, what is that actually going to do to the flight of the ball? That's typically very much how you ask the computer a question. You do several simulations where you've tweaked one or two little things as you go along, to try to see what the answer actually is. What this big computer is actually used for, after the break. Vicki, now we know how you sit down and use this computer. So I wanted to ask you one more question. Well, wait, before you do, I have to say, they really buried the
lede on us. At the end of the tour, we stopped by an old building in an area that seemed really decrepit, almost on the verge of demolition. We walked into that building, and we came to understand it was actually the site of the Manhattan Project. The reason they built this here was because, in around 1938 in Germany, fission was discovered. And so they convinced Albert Einstein to write a letter to President Roosevelt saying, we need a program, because we don't want the Nazis to get there before we do. And so they began a lot of work on what became known as the Manhattan Project. This part of it was to pilot the demonstration of plutonium made in a reactor. That
was Bill Cabage. He's a public information officer at Oak Ridge, and he was the tour guide for the Manhattan Project site. It was incredible. We had spent all day looking at what's next in the future, and all of a sudden we walked back into the past. It was filled with vintage instruments and photographs. Yeah, I really thought the wow factor of the tour had ended when we arrived at this sort of rundown site, but I have to say, it was about as cool as seeing the supercomputer up close. Why, though, did they choose Oak Ridge as one of the sites for the Manhattan Project? Well, we asked Bill that, and here's what he said. This area, they recognized, had a lot of water resources here. It was a rail hub, Knoxville was, so you had electricity. It was also essentially in the middle of nowhere. That might have been the sense back in the day, but nobody would think that this was in the middle of nowhere anymore. This was an incredible complex with a bunch of buildings we didn't even have a chance to go to, but we did have a chance to sit down and speak with a number of engineers and scientists who are actually using the Frontier system. One of them
was Steve Hamilton. Hamilton works in the Fission and Fusion Energy Sciences Directorate, in a group called High Performance Computing for Nuclear Applications. And I am the principal investigator for a project called ExaSMR, which is looking at solving multiphysics problems, so problems that involve multiple different domain sciences, related to the modeling and simulation of nuclear reactors. And when you say reactor, are we talking a nuclear reactor? So we're talking nuclear fission reactors. So these would be designs that would be targeting, in most cases, commercial electricity production. So Steve's work, on a very basic level, is modeling and simulating how new designs for smaller-scale nuclear reactors would function. He and his team are basically using Frontier to stress-test these designs by building a digital replica and testing it out before trying to actually make them at scale. There's multiple reasons that we want to do these simulations. The largest factor really is cost for us. So there are some safety considerations, but in large part, if somebody proposes a reactor design, they say, I think if we had this geometry and we put in this fuel and we use this coolant, I think it would be much more efficient than anything we have out there. And somebody puts this idea out there, and it could take billions of dollars for them to be able to take that idea and push it forward to an actual prototype that they could demonstrate and confirm that, yes, this reactor is going to operate exactly the way that I thought it would. If we use a computer like Frontier, if we build up a computational model, and we don't like to use the term too much, but one of the terms that
is used a lot is digital twins. And so you're building up this digital, computational representation of the system. And if you have enough confidence in your computational tools, that they're accurately predicting the physics, that they're predicting what would happen inside of that reactor, then you can perform computational experiments. You don't have to have all of the physical experiments, and so you can build up confidence in these different reactor designs purely using a computer like Frontier.
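In spirit, a computational experiment on a digital twin looks something like the toy sketch below: change one design input, rerun the model, and compare the predicted behavior against a baseline. Everything here is a hypothetical placeholder, not the ExaSMR codes; a real run would be a coupled multiphysics solve on Frontier rather than a one-line surrogate.

```python
# Toy "computational experiment" in the digital-twin spirit.
# predicted_peak_temp is a hypothetical surrogate for a real multiphysics
# solve, which would couple neutronics, coolant flow, and heat transfer.
def predicted_peak_temp(coolant_flow_kg_s: float, fuel_enrichment_pct: float) -> float:
    # Made-up relationship: hotter with more enrichment, cooler with more flow.
    return 600.0 + 80.0 * fuel_enrichment_pct - 12.0 * coolant_flow_kg_s

baseline = predicted_peak_temp(coolant_flow_kg_s=10.0, fuel_enrichment_pct=4.5)

# "If I change this one thing, what happens?" -- sweep a single design knob.
for flow in (8.0, 10.0, 12.0):
    temp = predicted_peak_temp(flow, fuel_enrichment_pct=4.5)
    print(f"flow={flow:4.1f} kg/s -> peak temp {temp:6.1f} K "
          f"({temp - baseline:+.1f} vs baseline)")
```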
So you're building a replica so that you can test it out before bringing it to scale, essentially. What are some real-life use cases? Like, would we eventually see them powering small cities? So that is definitely in the cards. There are people who are targeting reactors for, for instance, remote villages that right now are relying on diesel generators for their electricity production. If they could deploy a relatively small nuclear reactor, they could have long-term, stable energy production for that community, and it would ultimately be at a substantially lower cost than a diesel generator would cost to operate over long periods of time. But even within the smaller reactors, that doesn't necessarily imply that it's just these niche markets. The small modular reactors can also be deployed for large-scale electricity production. We
also spoke with Dan Jacobson. He's a chief scientist for something called computational systems biology. So I'm a computational systems biologist. We're trying to utilize data from all of these different types of molecules, together with environmental information, and then using whole ensembles of algorithms on the exascale computing systems here to solve what are really complex problems. This stuff was pretty mind-blowing. Dan talked to us about a bunch of ways he's been using exascale computing to layer information to tackle issues like climate, health, and future pandemics. But there's one thing he's using Frontier for that we did not expect. One of the big concerns in the US population, even perhaps more so in the veterans population, is suicide, suicide ideations, suicide attempts. And one of the initial questions in that is, are there genetic architectures that predispose people
to suicide attempts? We have this wonderful collaboration between the Veterans Administration and the DOE, where actually all the electronic health records for the entire VA system, we have a copy of here at Oak Ridge that gets updated nightly. And we have genotype information now for over eight hundred thousand patients, and the goal is to get that to two million. This is out of MVP, the Million Veteran Program, from the VA, and this is a tremendous resource for human systems biology and human health, trying to understand, again, the mechanisms of disease, both at the clinical level as well as at the molecular level. Having this larger population that we have genomics information and clinical information for allows us to start to tackle those sorts of questions. Questions about data and privacy with all of these Frontier projects kept coming up throughout our visit, so we wanted to know: how are they protecting the privacy of these veterans' health records? The VA data sits here in a very secure enclave. It's a PHI-certified enclave, and there are very strict access controls.
Everybody goes through lots of training to be able to use this data. But the data that we see is anonymized. We don't see names or addresses. That's all been scrubbed by the time we see it. And so we're interested in understanding the clinical variables, and we're looking at thousands or millions of them, so we're never going to drill down to an individual patient. Genomic tracking is already pretty impressive, but Dan explained that it's just the tip of the iceberg in understanding and treating disease. We're doing a lot of this in addiction work. We study a lot of neuropsychological conditions, cardiovascular disease, cancer, infectious diseases, and we're actually scaling up to do this on about two thousand to three thousand human diseases. So we'll have this complete network model of the interconnections between human diseases, and a mechanistic understanding of how you have a lot of comorbidities, people with multiple diseases that tend to co-occur. This will help us understand why. Why are these things tied together? Another thing he and his team built was a climate model of basically the entire planet. We've taken about the past sixty years of climate information for every point of land on the planet, around the world, and we built vectors, tensors, strings of numbers, representing all the different components in the
environment at every position. With those vectors, now we can use the CoMet codebase we developed on Summit and Frontier to compare all of those vectors to each other. So we're trying to say, what are the similar environments of all the possible environments on Earth? So now we have a really fascinating color representation, and this is the highest-resolution climate-type map ever done in human history.
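The core idea, an environment vector for every point of land compared against every other, can be sketched in a few lines. This is a toy all-pairs similarity, not the actual CoMet code, which uses its own comparative metrics and runs across the whole machine; the array sizes here are shrunk to laptop scale.

```python
import numpy as np

# Toy version of the all-pairs environment comparison:
# rows = locations on land, columns = climate variables over ~60 years.
rng = np.random.default_rng(1)
env = rng.normal(size=(1000, 50))      # 1,000 toy locations, 50 variables each

# Cosine similarity between every pair of location vectors.
unit = env / np.linalg.norm(env, axis=1, keepdims=True)
similarity = unit @ unit.T             # (1000, 1000) all-pairs matrix

# Most similar pair of distinct "environments":
np.fill_diagonal(similarity, -np.inf)
i, j = np.unravel_index(np.argmax(similarity), similarity.shape)
print(f"toy locations {i} and {j} have the most similar climate profiles")
# At real scale the comparison has on the order of 1e17 pairs ("network
# edges"), which is why it takes a machine like Frontier.
```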
It looks really cool, but the truly amazing thing was learning what you can actually do with this model. We can zoom into different areas of interest, in this case for pandemics, things like eastern Australia, Indonesia, Bangladesh, Central Africa, all hot spots for different types of viruses that we know are being affected by climate change. And this is allowing us to shine the flashlight: where do we need to be concerned? Where's the biggest relative climate change affecting bat populations and viruses and their food sources? Where do we need to be focusing? So to pull this off, we did a nine-point-three exaflop calculation. That's the fastest calculation done in human history. Overall, it was one hundred and sixty-eight zettaops, so that's on the order of ten to the twenty-third mathematical operations that had to be calculated to pull this off. It's one of the largest calculations ever done, with about ten to the seventeenth network edges calculated. This is one of the largest networks ever created. Again, it's how we're using these sorts of exascale resources to tackle problems we simply couldn't do before.
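Those figures hang together arithmetically: 168 zettaops at the quoted rate works out to roughly five hours of machine time, as this quick check shows (assuming the 9.3 exaflops was sustained across the whole run).

```python
# Sanity-checking the quoted numbers for the climate-comparison run.
total_ops = 168e21   # 168 zettaops ~= 1.68e23 operations
rate = 9.3e18        # 9.3 exaflops, assumed sustained for the whole run

seconds = total_ops / rate
print(f"{seconds:,.0f} s  ~=  {seconds / 3600:.1f} hours")   # ~18,000 s, ~5 hours
```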
So at this point, we're really starting to get a sense of how much is possible when you have computing power with, like, eighteen zeros after it, per second. But I think, for us both, this next example was absolutely the most fascinating. It's something called zoonosis, and it made us really glad we were there with some really large brains that are working on these problems. A lot of our focus now is on trying to understand zoonosis, how pathogens, viruses, and other microbes
often sit nascent in animal reservoirs. When those viruses get into other species that are not ready for them, like us, that leads to disease, and COVID-19 is one example of that. But the rules of zoonosis, how and why this happens and what's driving it, are actually fairly poorly understood. And then he hit us with flying foxes. So the systems we're studying are in flying foxes. These are these giant bats that have, like, a six-foot wingspan. They traditionally live in roosts of two hundred thousand bats in the jungle. They carry these viruses called Hendra and henipaviruses, which, in the Australian system, if they get into horses, they kill about seventy percent of the horses. When they get from horses into people, it's also a seventy percent mortality rate. So, I mean, COVID-19 has been horrific, but it's a sub-one-percent mortality rate. A seventy percent mortality rate is game over, right? So we want to really understand these systems and learn how to prevent that sort of spread before they become efficient at human-to-human spread. All right, Wes, let's take a moment here for you to digest this. It took us a minute. The idea that this virus could go from flying foxes to horses, and then horses to humans. I mean, this
could potentially wipe out populations. So we're actually looking at this. We know that when there are food shortages, that puts the bats under stress, messing with their ecosystem. So we want to understand what causes these food shortages. So we took all these environmental variables and we threw them into one of our explainable AI approaches, over time and space. And we thought this was a ten-year goal, that we could someday learn some of these rules about an environment, and, by golly, we're about to submit the paper showing this predictive model that is very good at picking up, these little black dots at the top are when food shortages are occurring. So we can start to get up to a year's warning of when one of these pandemic outbreaks could be occurring, that we need to start watching for, and then zoom in on where and when to look. So you saw all of that in just one day,
at Oak Ridge? Yeah, we did. And you know, it's crazy, because these are just some of the first projects getting started, because it just got built, and it's got, as we heard, five to seven years of lifespan left. So there's a lot of room to grow, and they're just getting started. And as we also learned, they're already building the next one. So, I mean, for us, one of the big challenges on a machine like Frontier, which has also been one of the fun aspects of it, so it's all a big challenge, is just the sheer number of people that you have to bring together to be able to run one of these simulations, the number of interactions, and managing that as you're all working towards getting to this final capability, to be able to use a machine like Frontier. It's been very, very rewarding. Vicki, Kathryn, next time you go on a field trip, I'm going with you. We'd love to have you with us. You got it. Thanks for listening
to us here at The Big Take. It's a daily podcast from Bloomberg and iHeartRadio. For more shows from iHeartRadio, visit the iHeartRadio app, Apple Podcasts, or wherever you listen. And we'd love to hear from you. Email us questions or comments to Big Take at Bloomberg dot net. The supervising producer of The Big Take is Vicki Vergolina. Our senior producer is Kathryn Fink, and they both produced this episode. Raphael M Seely is our engineer. Our original music is composed by Leo Sidran. I'm Wes Kosova. We'll be back on Monday with another Big Take. Have a great weekend. My kids are at an age where we've been watching The Terminator. Terminator 2, Cyberdyne. Will it become self-aware? Do you believe that it has the capability to become self-aware? I don't. I will say that colleagues of mine who are interested in this question I just talked about, about where do the heavy elements come from, they built a code module to sort of do that, to sort of trace the transmutation of elements as they get heavier and heavier. They called it SkyNet, which I thought was, like, dangerously arrogant on their part. Yeah, I thought, really, you're going to tempt fate? It's a little too real. We're getting a little late in the game, if you will, to call something that. But yeah, they just plowed ahead. In fact, they even use, you know, on their website, yeah, they use the symbol from the movie. Yeah.