AI Infrastructure Lessons from Our Visit to Huawei (269)

⁠¶ AI Infrastructure Background and Huawei Insights

00:07

Welcome, welcome, everybody. My name is Jeff Towson, and this is the Tech Strategy Podcast from TechMoat Consulting. And the topic for today: an AI infrastructure update from our visit to Huawei. Now, last week, me and about 67, 70 people, we were bouncing around Shenzhen, visiting some companies, doing a lot of lectures, guest speakers, really...

00:34

Had a great time. Well, I had a great time. I think most people had a great time. And we also went to the Huawei campus in Shenzhen, which is really interesting. Fantastic campus. I mean, it's really beautiful, actually. And they have tremendous information in their exhibition halls and other things about what they're doing, which is a lot. But there's one exhibition hall which is on Huawei Cloud. That's my favorite one.

00:57

I've been wanting to talk about this for, I don't know... I went there about six months ago when it just opened and I wasn't really allowed to talk about it back then. But people are talking about the Cloud Matrix and their new chip architecture and all that. Now it's all public. So I was going to talk about it. And I thought I'd go through sort of what

01:15

the update is from all of that, what I learned. And also, for those of you who were on the trip, you know, I kind of said, look, that was a lot of information very, very quickly. You know, tons and tons of info, slides, graphics, use cases, all of that. So I said, you know, it might be helpful if I sort of go through my own summary of what a lot of you saw.

01:36

So that'll be it. So it's a bit of a summary. It's a bit of an update on what's sort of the frontier of AI architecture coming out of China, more or less. And that's going to be the topic for today. Should be fun. Well, I think it's fun. Well, let's see. Standard disclaimer.

01:51

Nothing in this podcast or my writing or website is investment advice. The numbers and information from me and any guests may be incorrect. The views and opinions expressed may no longer be relevant or accurate. Overall, investing is risky. This is not investment, legal, or tax advice. Do your own research. And with that, let's get into the topic. Now the timing on this is pretty good because I just, you know, I finished up four articles over the last month or so about AI infrastructure.

02:20

which was kind of a lot. Part one was about these big AI data centers they're building, what those are about, which is really becoming the infrastructure of intelligence that businesses and others are tapping into. The second one, part two, was how AI compute is just different. So the foundation, the core of everything, you know, compute being done by NPUs and GPUs in parallel form, doing generative AI type, really...

02:49

linear algebra, matrix multiplication type mathematics. It's just different than CPU-based compute that we've seen on computers and laptops forever. So the core is different. So I talked about how that's different and how that therefore sort of requires different architecture. Part three, I started to get into the so what, which is the cost. That was really what I was trying to get at.

03:16

You can't really think up business strategy until you understand what these things can do and what they cost. So part three was sort of getting into the generative AI operating costs. And part four, the latest one, was that at the center of that cost question is what I called the cost of correctness. How much does it take to create and maintain correct answers

03:44

or generation of content in one of these models that's within an app or a service? Because it turns out it can be very significant. It can cost a lot to create this sort of level of correctness. It can take a lot to maintain it. You might need a lot of humans in the loop. So the economics can look very different. It can look a lot more like a service business

04:07

than a straight software business, which means lower margins, more people, things like that. So that was kind of what I was trying to get at. And I was pretty much done with what I wanted to say. I think I'll have maybe one or two more. But then this visit I thought would be a nice way to sort of wrap a lot of that together and review a little bit and sort of tie it together. So I thought, yeah, okay, let's do this.

04:28

And hopefully that's helpful for the people who were there as well. So I want to sort of talk about four things that I saw in the visit. And it was one of my favorite visits for that trip. I mean, I was taking notes like crazy because I'm always looking for what's new. Like, what are they talking about they weren't talking about a year ago? Where's the frontier? And for me, there was really sort of four buckets.

04:52

that I saw going through the hall and listening to all the presentations and all that, which are basically, they talked a lot about AI compute, which is Cloud Matrix 384, which I'll talk about. They talked a bit about sort of... I don't know what the word for this is, sort of AI native storage. The idea of, you know...

05:13

Storage is not really that exciting of a topic. We don't usually talk about sort of memory and how you store things and then the GPU or CPU draws on it. You know, it's a subject. There's flash drives and solid-state drives and all of that. Okay, it turns out AI native storage is really pretty interesting and Huawei's doing a lot in that area. And then related to that is what we call sort of the AI data convergence.

05:41

You know, what's sitting on all this storage? Well, you got databases, right? And typically, when we think about traditional databases, we're thinking like something that's kind of like an Excel spreadsheet sitting on a solid state drive. That's the data. That's the storage, you know, relational databases and so on. But once you move into

06:02

AI architecture, generative AI architecture, the data system just changes tremendously. So there's this idea that's floating around that to manage and make the databases work, you have to basically combine data with GPUs so that there's sort of AI within the database itself that manages and cleans it and gets it ready to use by the central GPUs, which do the generative AI.

06:31

So there's this sort of data AI convergence that people are talking about. That's number three. And then the last one is really just sort of data centers. When you put all this together, you get, you know, you get the server racks, you get the cooling. I think Huawei has a very compelling global picture for this, which they call cloud ocean, cloud lake.

06:54

Cloud Pond, you know, different levels of this sort of data being stored around the world, and the compute may be close by if you have fast applications, or maybe the compute's far away. So I'll talk about those. Those are kind of my four. And I was looking for updates on those things. And then they had lots of use cases, smart mining, smart transportation, smart banking. Everything's got the word smart in front of it now.

⁠¶ Cloud Matrix 384 AI Compute

07:18

But I'll mostly talk about those three to four things. Okay, so let's get to number one, which is the most exciting, the most interesting for me, which is basically Cloud Matrix 384. This is the thing I'd heard about a year ago or so, and I couldn't really talk about it. But a couple months ago, they made announcements about what the Cloud Matrix 384 was. And yeah, it's a big deal. This is, you know...

07:44

If you're looking for the answer to the question of how is China going to compete in frontier high-level computing for generative AI without having access to NVIDIA semiconductors? This is pretty much the answer right now. There's other groups doing stuff that's similar. They're not the only ones. But this looks like the approach, the architecture they're using.

08:11

is basically getting in the range of being NVIDIA-level performance without having NVIDIA chips. Now, they're also working on high-end semiconductors, but most people would agree that they are, you know... If the U.S. is at three nanometers, to four to five, you know, China is more like five to seven, five to eight, something like that. So, you know, they're a ways behind. Now, what's going on behind the scenes?

08:38

Do they have lithography machines? Where is SMIC in terms of foundry capabilities? Do they have a CUDA-like language that's being used? Hard to know what's true. A lot of it's kept pretty tight-lipped. I don't know. I suspect there's a lot more going on there that we don't know about. But we'll talk about what is public, which I think the Cloud Matrix 384 is a...

09:00

a good example of that. And that was in Article 2, Part 2, for understanding AI architecture. I talked about how they had sort of... actually, no, it was number one and number two. What this includes, how many chips are involved, how many chips per rack. And basically when you add up all the chips within this, what they call a supercluster. Well, it's actually not quite a supercluster, but anyways.

09:23

When you add this thing up, you get 384 chips that are all working closely together in a cluster. Hence, Cloud Matrix 384. So what's the architecture? Basically, it's going from sort of a master-slave architecture to an everything-to-everything architecture. Traditionally, the way the compute is done is you have really a CPU at the center.

09:50

And that's kind of the choke point. You can think about it like that's the center of a radius or the center of a wheel. Everything goes into the center. And then from there, the CPU will send some to other GPUs and other things. But it kind of looks like a wagon wheel with a CPU at the center. When you go to everything-to-everything, basically every chip can talk to every other chip directly, pretty much.

10:16

So you're wiping out these choke points, these bottlenecks that can hit your latency. It can hit your bandwidth, because the whole point of generative AI is, you know, you have to have tremendous data flowing all the time in tremendous volume to get it to work. And you have to have a tremendous amount of compute. The flops are very high. Well...

10:40

Rather than going into one chip, all the chips work together because they're all connected to each other. So you can sort of allocate compute from all over the cluster to whatever problem you want. You can allocate all of it, a little of it, but it all becomes very dynamic in terms of data flow, bandwidth, and especially allocation of compute when everything's connected to everything dynamically.
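To make the wagon-wheel versus everything-to-everything contrast concrete, here is a minimal Python sketch. It is a toy model, not Huawei's or NVIDIA's actual design: real clusters get any-to-any connectivity through switch fabrics rather than one cable per chip pair, and the function names are made up for illustration. It just counts hops between two chips and how many chip pairs have to squeeze through the central hub.

```python
def hops(topology, src, dst):
    """Hops for chip-to-chip traffic in a toy 'star' or 'mesh' topology."""
    if src == dst:
        return 0
    if topology == "star":   # chip -> central CPU -> chip
        return 2
    if topology == "mesh":   # chip -> chip directly
        return 1
    raise ValueError(f"unknown topology: {topology}")


def pairs_through_hub(topology, n_chips):
    """How many distinct chip pairs must route through the central hub."""
    all_pairs = n_chips * (n_chips - 1) // 2
    return all_pairs if topology == "star" else 0


n = 384  # chip count mentioned in the episode
for topology in ("star", "mesh"):
    print(topology, hops(topology, 0, 1), pairs_through_hub(topology, n))
```

In the star case every one of the 73,536 chip pairs shares the same choke point. In the mesh case none of them do, which is the bottleneck removal being described here.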

11:05

Now, everything I just said is also true for how NVIDIA is doing their pods and such. They have a similar architecture. Both are a departure from sort of traditional architecture, which is more sort of hierarchical command and control. The difference between, let's say, Huawei and NVIDIA is Huawei is using a lot more chips because their chips are less powerful. NVIDIA's a smaller number of chips, smaller clusters.

11:33

You know, it's basically a bigger version of that. And here's a little bit more on that. I was looking up sort of definitions of this. Here's a good definition I heard. The whole sort of concept of, look, you have very intensive AI workloads in large language models, both for the training and the inference. So it's a move away from traditional hierarchical computing architecture, more to a peer-to-peer, fully pooled

12:00

and disaggregated system. There's no central point commanding everything. So some of the characteristics, as mentioned, it's disaggregated, and you can basically pool the resources. So instead of a CPU sort of controlling things where you're going to have limitations on memory bandwidth and network bandwidth, no, you can basically disaggregate the cloud matrix into separate pools.

12:27

and you can sort of allocate everything as you see fit. The node boundaries are no longer fixed. So everything's pooled, and you can kind of do dynamic allocation. Again, peer-to-peer relationship. Pretty standard. For the 384, you're talking about the Ascend NPUs. They have a couple of compute architectures, one for their traditional compute and one for GPUs, generative AI. That's the Ascend series of chips. Okay, so they connect all their NPUs with basically...

13:04

high speed, everything connecting to everything communication, which they call their full mesh. So if you look up the Huawei stuff, you'll see that a lot. It's a full mesh and they use kind of a matrix link architecture. And basically any NPU can connect with any other NPU directly and do what's what. And that gets you large scale parallel computing, which is a big deal. So why is that better?

13:31

Well, when you get rid of the bottlenecks, everything's much more efficient. The whole system can scale up a lot easier. This is actually kind of a big problem with generative AI computing. Traditional computing, the workloads, they move, but they don't move erratically. They don't double or triple in seconds. But you send certain questions to an LLM, suddenly there's a big spike in the compute necessary. So you need a lot more sort of elasticity in your ability to compute.

14:05

Well, if you have this sort of peer-to-peer mesh type network, it's very easy to scale up the compute you're allocating second by second. So it scales very elastically. That's kind of a big deal. And then you basically just get more utilization from the chips you have.
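A minimal sketch of that elasticity, assuming a single shared pool of chips that jobs can grow into or shrink out of from one moment to the next. The `ComputePool` class and the job names are hypothetical and only illustrate the idea. This is not how any real scheduler is implemented.

```python
class ComputePool:
    """Toy model of a shared pool of chips that jobs can grow or shrink into."""

    def __init__(self, total_chips):
        self.total = total_chips
        self.allocated = {}  # job name -> chips currently held

    @property
    def free(self):
        return self.total - sum(self.allocated.values())

    def resize(self, job, wanted):
        """Set a job's share to `wanted` chips, capped by what the pool can give."""
        current = self.allocated.get(job, 0)
        granted = min(max(wanted, 0), current + self.free)
        if granted:
            self.allocated[job] = granted
        else:
            self.allocated.pop(job, None)
        return granted


pool = ComputePool(total_chips=384)
pool.resize("chat", 32)            # steady-state inference traffic
pool.resize("batch", 300)          # background job soaking up spare chips
print(pool.resize("chat", 128))    # spike: only part of the request fits
pool.resize("batch", 100)          # shrink the background job
print(pool.resize("chat", 128))    # now the spike is fully served
```

The point of the sketch is that nothing is pinned to a fixed node. A spike is served by whatever is free, and idle chips get handed to whoever can use them, which is where the better utilization comes from.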

14:23

You know, instead of waiting and waiting for workloads to be processed, you can queue them up and sort of allocate them very dynamically, which is more efficient. But it also gets you better utilization of your chips overall, which is effectively cost. So I actually looked up the difference between sort of the Huawei Cloud Matrix, which is their...

14:43

their Ascend 910C chips, 384 of them. Well, actually, there's other chips in there. They're not all Ascend 910Cs. And if you compare that to the NVIDIA NVL72, which is sort of a similar connected architecture, they're pretty similar. The switches are a bit different. Matrix Link is what connects for Huawei. NVLink and NVSwitch is what NVIDIA uses.

15:07

They're both all-to-all full mesh, but basically one is smaller than the other. I didn't see too many other differences, and they both bypass the traditional CPU bottleneck. Okay, the other benefits of this, basically your flops go way up, a lot, a thousand, ten thousand fold.

15:30

Flops are your floating point operations per second. It's basically compute speed. How fast can it do the math, especially complicated math? The other thing to keep in mind with the cloud matrix is, okay, your compute goes way up. So does your network bandwidth. Basically, the communication within the processor, all of that is faster, lower latency, fine. Actually, the part I thought was interesting was you also get sort of a big bump in memory bandwidth.
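As a rough worked example of what flops measure: multiplying an m-by-k matrix by a k-by-n matrix takes about 2·m·k·n floating-point operations, one multiply and one add per term. The sketch below uses that standard count. The throughput figure in it is a made-up round number for illustration, not a spec for any Huawei or NVIDIA chip.

```python
def matmul_flops(m, k, n):
    """Approximate floating-point operations for an (m x k) @ (k x n) product."""
    return 2 * m * k * n


def seconds_at(total_flops, flops_per_second):
    """Ideal runtime if the hardware sustained `flops_per_second` the whole time."""
    return total_flops / flops_per_second


work = matmul_flops(4096, 4096, 4096)
hypothetical_rate = 1e12  # 1 teraFLOPS, an assumed figure for illustration only

print(work)
print(seconds_at(work, hypothetical_rate))
```

Generative AI workloads are mostly long chains of products like this one, so the time they take depends both on the sustained flops and on how fast the network and memory can feed the matrices in.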

15:59

You know, there's communication between the GPUs, CPUs, all that. Okay, so that's network bandwidth. But you also get communication between the processing power and where the data is stored. So your memory. And typically you have sort of high bandwidth memory on each chip. So the chips have memory on them themselves, high bandwidth. And then you also have sort of an aggregate system-wide memory,

16:23

which tends to be a lot of flash memory, not as much solid state because, you know, just the nature of the computer. So you can sort of pool and disaggregate the memory as well. For those of you who were on the tour, you might have noticed that they had a couple benefits they'd listed on their slides, and it was basically those. Big increase in flops, compute power.

16:47

The network bandwidth jumps and then the memory bandwidth jumps. The words they used were things like you can scale out much faster, add more compute resources, you can scale up. You get higher performance inference. When you get an interruption to your training time, it can be quicker for recovery. You might have seen all those little notes that they had presented. They were basically talking about those three points:

17:14

memory bandwidth, network bandwidth, and compute at the center, all of that within the Cloud Matrix architecture. And they actually had a model of the Cloud Matrix 384 in the room. They didn't let us take pictures of it, but it was actually there. I'll put a picture in the show notes. I got a picture of one of them, not a model, but the real thing,

17:33

at a conference a couple months ago. I'll put the picture in there if you're curious what this big thing looks like. And it's pretty huge. Okay, so that was kind of number one. First takeaway, first lesson.

⁠¶ Rethinking AI Native Storage

17:46

Cloud Matrix 384 is a new AI native sort of architecture for compute. The next one, AI native storage, which I mentioned, like... It's funny, nobody ever talks about storage. It's not that exciting, but it's everywhere. It's in every computer. It's in every phone. It's in every smartwatch and whatever.

18:07

Yeah, it turns out when you switch from traditional compute to AI-based compute, generative AI, yeah, the data requirements change dramatically. One, you just need a tremendous amount of data, far more. The nature of the data is different. It's not structured data and numbers in a spreadsheet. It's photos and images and recordings and...

18:31

you know, it's what they call unstructured data. And then in between you have sort of quasi-structured data, semi-structured, but mostly it's unstructured. The vast majority of it is. It has to move around with tremendous volume and speed.

18:46

You've got this massive pool of data now in your storage. It's got to get to the CPUs and the GPUs very, very quickly, which, it turns out, means you need tremendous bandwidth. So the nature of the storage is much more about how fast is sort of the reading and writing process in and out of the storage, as opposed to traditional memory, where the thing you might care about was...

19:12

If I'm going to put something on a solid state drive in my laptop or server, I need to make sure that it's secure. I need to make sure it doesn't degrade so that two years from now, that memory is still accurate and preserved. So solid state drives in particular, they're very good at preserving data for the long term. They're not that good for speed of data in and out. But when you're doing generative AI, we don't really need to preserve all of this data.

19:41

In fact, a lot of it's going to be disposed of. The important thing is keeping this river of data flowing between the outside world, into our storage, into the compute architecture, and then back into the storage where it may need to be rewritten and such. So it's much more like, that's how I view it, like a river. And you got to keep the river going.

20:04

AI native storage is this idea that you're going to have to redesign storage architecture from the bottom up to meet these weird demands of AI workloads. And you really can't just adapt traditional enterprise storage. No. So AI native storage is really what we're doing is we're embedding intelligence into the storage itself. And then we're putting on high speed data access.

20:30

So what do you need within all this? Well, first thing you need is sort of very low latency and extremely high performance. So you need the high throughput, lots of reading and writing, you know, huge amounts of data into the storage, often in, you know, very sort of large parallel chunks. So you need sort of this sustained high bandwidth data delivery. And if you don't have that, then your GPUs, your accelerators, they're not being fully...

21:01

You want the utilization of your very expensive chips to be like maxed out all the time. Well, for that to happen, you've got to have these rivers of data moving in and out very quickly. So, again, low latency is a big deal. Then there's this idea of putting AI within the storage itself. You could call it AIOps for storage: you put AI and machine learning algorithms in, and you want them to actually manage

21:31

the storage itself. You want it to be moving files around. You want it to do predictive maintenance. You want sort of what they call intelligent data placement and caching. So yeah, you basically want your drive, your storage drive, to get smart and start to sort of dynamically optimize itself against, you know, the demands. And then the third bit is you have sort of, again, elastic scalability.
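The earlier point about keeping expensive accelerators fed can be put as a back-of-the-envelope calculation. This is a simplified sketch that assumes storage throughput is the only limit and uses made-up bandwidth numbers. The function name and parameters are hypothetical.

```python
def accelerator_utilization(storage_gb_per_s, n_accelerators, demand_gb_per_s_each):
    """Fraction of time accelerators stay busy when limited only by storage feed."""
    required = n_accelerators * demand_gb_per_s_each
    if required <= 0:
        return 1.0  # nothing is asking for data, so nothing is starved
    return min(1.0, storage_gb_per_s / required)


# Assumed figures for illustration: 8 accelerators each wanting 25 GB/s.
print(accelerator_utilization(100, 8, 25))  # storage delivers half of what is needed
print(accelerator_utilization(400, 8, 25))  # storage keeps up, chips can stay busy
```

If storage delivers only half of the data the chips could consume, then in this simple model half of the compute you paid for sits idle, which is why sustained storage bandwidth becomes a first-order concern.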

22:00

Parallelism. You want these storage devices to be able to grow quite easily, like the, you know, sort of the peer-to-peer architecture for compute. You want the same thing in the data. So horizontal scalability, moving things in parallel form, not all serial processing and so on. Anyways, I'll put a slide in the show notes. Actually, I'll put two. I'll put a slide about how AI native storage is different

22:28

than traditional enterprise storage and also put one for the compute I talked about. Anyway, so when we were walking around, that was really the second area for those of you who were there. Like, we walked in, and to the left there was all the compute, the Cloud Matrix stuff. Right next to that was all about storage. That wasn't there a couple months ago.

22:48

So that's interesting that they've started working on this. And I've heard their executives talk about this in comments over the last couple months. So yeah, that was new. It got my attention. Okay, bucket number three. These are going to be shorter. We got two more.

⁠¶ AI-Ready Data and Foundation Models

23:02

Bucket number three, I won't go through too much, which is just the foundation models. What are the big components? Well, it's the compute, it's the storage, it's the models, and it's the databases. Those are the four things. I've talked about this before, the sort of foundation models, which for Huawei is called Pangu, for Alibaba it's Qwen. I mean, everyone's got their names, PaddlePaddle. They've all got these names.

23:30

But I always liked how Huawei talks about this, where they talk about the L0 level, which is your large foundation models, reasoning, you know, GPT-like models, multimodal models, scientific models are emerging, and so on. Then L1 is sort of, you know, they make those industry specific, where they adapt the foundation models into industry-specific models. And then L2 is

23:56

even more specialized, more to scenarios, things like chatbots for customer service and whatever. So they have kind of a nice framework for how to think about models and they're building across the board. But their adoption, obviously, they're not as widely adopted by developers. You know, they're much more of a hardware maker today. But they could be a software and foundation model player in the future. They're in the running.

24:22

But, you know, they're not like DeepSeek or Alibaba yet. So the Pangu models are interesting. And then they have their platform for how to create these things, ModelArts Studio. Interesting. I won't go through that because we've talked about it before. But okay, models are a big, important deal to understand. And then the last bucket is DataArts, which is really this data AI convergence thing. That's really...

24:49

This is kind of the thing I've been thinking most about in the last couple of weeks. This idea of we're starting to combine data and databases, not storage, which is the hardware, data and databases, which is the software. We're starting to combine that with AI itself. Why would you do that? I listened to a podcast the other day, and they were talking about the same convergence within enterprise databases.

25:17

And the guy who was on the podcast, he had a really good summary. And he called it like the goal for every enterprise: before you can do anything with AI, apps, whatever you dream of doing, you have to basically have AI-ready data. That was his phrase, AI-ready data, which I thought was a great way to think about it. It has to be in a form...

25:41

that it can go directly into the GPUs and the compute to generate text or whatever it's supposed to do. And the path from getting data from the world, data from your business, data from everything... to the AI-ready state is pretty difficult. And it's a tremendous amount of work. I describe it as a river. That's not really true.

26:05

It's more like a river with 10 big water processing plants stuck right in the middle of the river. And you got to get the water through those so they can come out the other side and then the river becomes usable. So here's the list this guy was talking about. He said, understand for enterprise data, most data is unstructured. It's voicemails, it's meetings, it's written reports. Some of it's in Excel spreadsheets. Most of it's unstructured.

26:32

What do you do with it? He basically says, look, you got to get it ready for RAG. Most generative AI is retrieval-augmented generation. You got to get it in the right form for that type of generative AI activity. Here was his list. You have to gather the data, which is a process that never stops, especially if you're pulling it from the outside world, from customer behavior. You've got to extract the data that's useful within the sea of data.

27:00

You've then got to sort of extract the semantic knowledge within the text or whatever you have. Like, most data is of no value. You've got to sort of pull the knowledge that matters out of the random data, you know, the signal from the noise. You've got to chunk that knowledge-based data up into homogeneous sizes that can be managed and manipulated.

27:25

Then you want to enrich those chunks with metadata, tagging, labeling, so you can search for them, things like that. Then it gets interesting. Then you need to embed it. And this is the part where, like, I had to watch a lot of videos before I understood this. You know, ultimately it all has to be in numbers. It all has to be numbers because ultimately this is all math.

27:51

So you have to get it into a form that the GPUs and CPUs can do mathematical calculations on. So you've got to convert knowledge, text, images, sounds, all of it into math. Well, that's embedding. So you transform to a numerical representation of the information, the data, the knowledge in a way that can be stored and searched. Now we're kind of talking about a database.

28:18

And the last step is then we got to index it within a vector database. You know, a vector database, for those who aren't familiar, they basically capture knowledge by saying this word cat is similar to the word dog. It is more similar to the word dog than it is to the word table, because cats and dogs are somewhat similar in that they're both animals, they're both mammals, right? So vector databases put things in spatial orientation, and these are just strings of numbers.

28:47

And that's how you sort of capture knowledge. The distance between various vectors tells you how similar or different they are. That's how you can search for things and do things. Anyways. All those steps have to happen before the data is sort of AI-ready data. So those are the power plants, the processing plants that are stuck in the middle of the river that everything has to go through before it can be used.
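The steps just listed (chunk, enrich with metadata, embed, index, then search by distance) can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the embedding is a simple word count over a tiny made-up vocabulary rather than a learned model, the `VectorIndex` class is a brute-force stand-in for a real vector database, the cat/dog/table vectors are hand-picked, and `policy.txt` is a hypothetical source name.

```python
import math


def chunk(text, size=6):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def enrich(chunks, source):
    """Attach simple metadata to each chunk so it can be found later."""
    return [{"id": i, "source": source, "text": c} for i, c in enumerate(chunks)]


def embed(text, vocab):
    """Toy embedding: count how often each vocabulary word appears.

    Real systems use a learned embedding model; this is a stand-in.
    """
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [float(words.count(term)) for term in vocab]


def cosine(a, b):
    """Cosine similarity: closer to 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self.items = []

    def add(self, vector, record):
        self.items.append((vector, record))

    def search(self, query_vector, k=1):
        ranked = sorted(
            self.items,
            key=lambda item: cosine(query_vector, item[0]),
            reverse=True,
        )
        return [record for _, record in ranked[:k]]


# Hand-picked toy vectors, only to show the cat/dog/table idea.
TOY = {"cat": [0.9, 0.8, 0.1], "dog": [0.8, 0.9, 0.1], "table": [0.1, 0.1, 0.9]}
print(cosine(TOY["cat"], TOY["dog"]) > cosine(TOY["cat"], TOY["table"]))

# Chunk -> enrich -> embed -> index -> search.
VOCAB = ["invoices", "due", "refunds", "process"]
document = "Invoices are due in thirty days. Refunds take five days to process."
index = VectorIndex()
for record in enrich(chunk(document), source="policy.txt"):
    index.add(embed(record["text"], VOCAB), record)

best = index.search(embed("How long do refunds take?", VOCAB), k=1)[0]
print(best["text"])
```

Even in this tiny version, every chunk passes through several processing stages before a single query can be answered, which is the "processing plants in the river" picture at small scale.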

29:16

And that's a big deal. And when you try and do this at large scale, because we're talking about huge amounts of data, that's a problem. It also turns out that the data is growing and growing. Your database keeps growing. So you got to think about the data velocity. How fast is this data coming in? And then also the existing data you have in your databases, how fast is that changing?

29:44

You know, we have to rewrite and change things. So we have to take in new data all the time and we have to change the databases we already have all the time. So that's basically data velocity. Now that doesn't really fit the river analogy, but I think it's kind of close. Anyway, so that's where this idea of data AI convergence comes in. It's sort of to solve that problem. Because right now, it's a lot of data scientists doing all these steps. We want AI and AI agents

30:14

to take over this process of getting all this data into an AI-ready form. That's kind of the idea. I mean, that's my simple explanation of it. But those of you who are data scientists are probably gritting your teeth at that explanation. Ultimately, I'm a software and a business guy, so my knowledge taps out at a certain point.

30:35

Yeah, that's the idea. So in the Huawei exhibition, you would have seen an area called intelligent data management, also called AI native storage, where basically you want to have AI and AI agents do all this processing. So how do you do that? Well, you take a GPU out of the central core and you stick it in the storage. So the storage device where all the data sits has GPUs in it now. You want the AI and the AI agent in the storage device. You don't want to send the data from storage to the chips

31:13

and have it done there and then send it back. That's very inefficient. You want to have it done all within the storage device. So anytime the processor, the accelerator needs the data, it can come out of the storage all ready to use. You don't want it to go from storage to the CPUs and GPUs, have to get cleaned up and prepared, and only then be used. No, you want it to come out of the storage, done.
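A rough way to see why the round trip is wasteful is to count how much data crosses the link between storage and the central compute in each case. This sketch assumes only a fraction of the raw data survives cleaning. That fraction, the data volume, and the function names are illustrative assumptions, not figures from the Huawei presentation.

```python
def gb_moved_round_trip(raw_gb, keep_ratio):
    """Raw data goes out to the central chips, cleaned data comes back,
    and the cleaned data goes out again when it is actually used."""
    cleaned_gb = raw_gb * keep_ratio
    return raw_gb + cleaned_gb + cleaned_gb


def gb_moved_in_storage(raw_gb, keep_ratio):
    """Cleaning happens inside the storage device; only cleaned data leaves."""
    return raw_gb * keep_ratio


raw_gb, keep_ratio = 1000, 0.25  # assumed: 1 TB raw, a quarter survives cleaning
print(gb_moved_round_trip(raw_gb, keep_ratio))
print(gb_moved_in_storage(raw_gb, keep_ratio))
```

Under these assumptions the round trip moves six times as much data over the link, and it also ties up the central accelerators on cleanup work instead of generation.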

31:38

That's the idea. So what does that mean? Well, that's AI native storage. That's the convergence. It's got to govern the data. It's got to curate the data, all those steps. I'll put in a little slide about the benefits of convergence, which is accuracy, cost reduction, speed and agility and all that. And then you got things like security, which is a problem. So that's kind of the...

32:02

the fourth one I wanted to sort of bring up, and they actually have a database program there called GaussDB, which they've been talking about a lot. You know, instead of using a Salesforce database, you could use a Huawei GaussDB system. Okay, they're basically putting AI into that software. Anyways, those were sort of my four big buckets to think about. The compute architecture, Cloud Matrix 384, very interesting. The sort of AI meets storage,

32:36

the idea that the storage devices are going to get smart, the foundational models, the LLMs, and then this idea of data AI convergence and that databases are going to have agents operating within them, fixing all these steps and getting everything AI ready.

⁠¶ Innovative Data Center Cooling

32:51

The last bit, which I'll go through quickly, is they did have a little bit about their data centers there. If you saw those big racks. When they show data center racks, they always have glowing tubes on them to make them look cool. I'll put in a slide of one of these racks if you didn't see it. They're actually pretty cool.

33:12

That's where all the chips go, in the racks, the server racks, and then they put in all the cooling. The cooling is actually kind of interesting because, you know, when you go to... Any generative AI computing generates a lot of heat because you have a lot of chips. But it's even more if you're doing this China version. NVIDIA uses far fewer chips because they're more powerful.

33:35

Huawei and most of the China players, they're stitching together 384 chips in this case to match the performance. But that also means, one, you're going to use more energy, and two, you're going to generate more heat, which means you need more water cooling. You know, you can put a fan in a laptop and cool it down, but these servers need water.

33:58

So there's a couple of models. I don't know if anyone saw them at the center. There's one where you sort of put tubes that lie across the circuit boards, with water flowing through them all the time to keep it cool, to soak up the heat. And there was one version there where the chips and circuit boards, basically the whole server rack, go underwater. The whole thing's just designed to be underwater. Forget the tubes. Let's just dunk the whole system.

34:25

Or at least the circuit board and all that. So there was a model of that there, which is pretty cool. Anyways, and then they do data center operations and management, which is a pretty interesting business, actually. So anyways, that was the last bit. And then lots of use cases all throughout there, which are pretty fun. Anyways, that is it for this week. I hope that's helpful, interesting.

⁠¶ AI's Business Strategy and Cost Implications

34:49

I thought it was fascinating. I mean, I took a ton of notes trying to find out what the state of all this stuff is. And ultimately, as I kind of said, for me, the rubber hits the road in all of this in terms of business strategy. What can we do as a business? Well, that comes down to at least two things. One, the capabilities: what can this technology actually achieve reliably, such that we can put it into a service or an app? And then two, what is that going to cost me?

35:18

Well, the biggest component of the cost is the AI architecture, basically. A lot of people are going to be doing this with contracts with cloud providers to access everything I just mentioned. Some will do it locally. Bigger businesses will do that locally. You can also download a lot of this now. And then you have a human component, labor, which can be significant. That was part four

35:45

of these articles, which was about how the cost structure of these services can stop looking like software and start looking like a human service business. And it really depends on how difficult a question you're dealing with in terms of the AI. So that cost bit is important. That's where I'm trying to get to with all that. And then out of that, once you know the cost structure and the capabilities, you can come up with moats

36:10

and a competitive strategy and a business strategy. That's where I'm going with all this. I'm not quite there yet, but I'm getting there. Anyways, that is it for me for this week. I hope that's helpful. Those of you who went on the trip, I hope you had a good time. I had a great time. And we saw a lot of cool companies: Huawei, Xiaomi, Tencent, and Insta360, which I thought was really neat.

36:35

You know, the stores, the lectures, machine, like robots, someone from Pop Mart. That was pretty fun. WeRide. Yeah, I thought that was great. So anyways, I'm taking a bit of a rest this week. I was pretty burned out at the end of all that. But yeah, for those of you who are subscribers, I owe you a couple of articles. I sort of fell behind last week. And the four articles I just mentioned about understanding AI architecture, they're all there,

37:02

so you can look at those. But I owe you a couple of articles. I'll catch back up this week. Anyways, that is it for me. I hope you're all well, and I will talk to you next week. Bye-bye.

✨ This transcript was generated by Metacast using AI and may contain inaccuracies.
