S1 EP27 - Exploring Storage Fabric Performance | The Marvell Essential Technology Podcast

Christopher Banuelos

00:04

Welcome to the Marvell Essential Technology Podcast. I'm your host, Chris Banuelos. On today's episode, we jump into a conversation with Todd Owens, Field Marketing Director, and Nishant Lodha, Director of Product Marketing, Emerging Technologies. Today they discuss the key things you need to know about storage

00:22

performance numbers. Learn more about some of the metrics most commonly used when people refer to storage performance, how storage performance figures are derived, and how performance figures drive future decisions, as well as an in-depth discussion of IOPS. To stay up to date on future episodes, please be sure to subscribe to the Marvell Essential Technology Podcast.

Todd Owens

00:51

Hi, everybody. This is Todd Owens. I'm a Field Marketing Director here at Marvell, and I work with our customers and partners on a variety of different technologies. One of those I focus on is the area

01:05

of storage networking. Joining me today is Nishant Lodha, our Director of Emerging Technologies in our storage connectivity business unit. Nishant is in charge of our lab efforts, but he also works extensively with customers and our ecosystem partners in the real world to define the performance criteria that we're really going to see end to end in our storage

01:27

networking environment. In fact, he recently wrote a white paper called Real World Expectations from Fibre Channel HBAs, which you'll find in the show notes for this podcast. Hey, Nishant, welcome.

Nishant Lodha

01:44

Thank you, Todd. Good to be here. There is a lot of confusion out there about storage performance metrics, a lot of misleading stuff, and some just downright confusing stuff. I'm glad we are having this conversation. I've spent over two decades looking at storage performance from different angles and working with customers, so I'm looking forward to this conversation between you and me.

Todd Owens

02:05

Let's get right into it. What are some of the metrics that people look to most commonly when it comes to storage performance?

Nishant Lodha

02:12

If you look at tier-one metrics, the things that are most important to the most people, you can call them bandwidth (or throughput), IOPS, and latency. Just quickly defining them for our audience here: bandwidth is a measure of the data transfer rate, and the faster the better. IOPS, or input/output operations per second, is the number of reads and writes you can do per second, and the more

02:37

the merrier, in general. Latency I would call a measure of the time taken to actually complete these read or write operations, and definitely lower is better. Think of these in the context of an airplane: latency is how quickly your passengers can board the plane, IOPS is the passenger capacity of the plane, and bandwidth is the speed of the plane.

Todd Owens

03:01

That makes sense. Those numbers get thrown around a lot in a lot of different areas. I see them published in a variety of documents, in blogs, and in all kinds of other places. But how should a customer really approach storage performance? What are some of the things they should look at?

Nishant Lodha

03:23

There are three big things, in my opinion, that customers, partners, vendors, solution architects, storage array admins, and storage fabric managers all need to look at. First, remember that data sheet figures, the performance numbers that vendors cite in data sheets, are achieved under ideal lab conditions, which may or may not be achievable in the real world.

03:47

Number two, and I think this is the point that almost everybody misses: we all need to look beyond a single device. It is paramount to understand that there is a whole end-to-end storage infrastructure, and every component plays into what you get. A lot of people focus on just one device and that one device's performance, and that often leads them in the

04:12

wrong direction. Third, and finally, the most important: sometimes it helps to do a proof of concept right in your own lab. Set up a real-world environment, set up the applications, and actually look at how things function.

Todd Owens

04:24

Well, let's dig a little deeper into the reality of these performance numbers. You're part of the team that does a lot of the deep-dive analysis on our I/O technology, and manufacturers like ourselves publish some pretty amazing performance figures for our products. How are those typically derived? What happens to get those numbers, the data sheet numbers, as you said?

Nishant Lodha

04:50

So, Todd, you want to know how the sausage is made?

Todd Owens

04:54

Well, sort of.

Nishant Lodha

04:56

Let me tell you this. Most published performance numbers, like I said, are derived in a lab setting under ideal conditions. The objective is to showcase the highest performance that one

05:09

specific device can achieve, assuming there are no other bottlenecks or restrictions. These tests are often done using artificial load generators, not real-world applications. I like to call them hero numbers, because they turn heads and make products shine, but little else, and you'll find them all over data sheets and sponsored collateral.

Todd Owens

05:31

So that's kind of like the speedometer on my truck, right? It says 140 miles an hour, but there's not a chance in heck that my truck's going to go that fast, except maybe downhill with a hurricane blowing behind me. Who knows. So some of those published numbers don't always make sense, and even if my truck did go 140 miles an hour, what good is that going to do me on my local streets, with my speed limits and all the

05:57

other good things. So your point is that a vendor is trying to produce a number that makes its particular product stand out. But when I go to use that product, there's a whole bunch of other things in the environment that I need to consider, right?

Nishant Lodha

06:09

Absolutely, Todd, good point on your high-speed car or truck and whether you can actually drive it at that speed. I'll give you another case in point. I recently saw one Fibre Channel HBA vendor publish a paper saying their device clocked 10 million IOPS, a huge number, in a test which, if you look deeper, actually required 24 different RAM-based targets, 72 physical CPU cores, and half a terabyte of RAM. Nothing could be further

06:41

from reality. This is your 140- or 160-mile-an-hour speedometer. Stuff like this raises questions when I look at it.

Todd Owens

06:49

Yeah, I hear you. That's absolutely not a typical configuration that any customer would actually deploy. Who can imagine throwing that many cores at driving one I/O stream? It's just not reality. When it comes to storage, you've got to consider all the elements that are in the data path. It's not just about what the HBA can do, right? What can the switches do? What can the storage device do? And it even goes back to the

07:18

OS, right? What happens in the kernel, what happens with the application? All of those different elements add up. Again, it's like me driving my car: no matter how fast it goes, I've got to obey the laws, there are particular types of streets I can drive on, and those kinds of things. So how does that relate to the storage world? What is your perspective on all those different elements and the role that they play?

Nishant Lodha

07:43

First of all, Todd, I think you're hitting the right point here. It is key for customers to understand that every single device in your data path, so to speak, has an impact on overall performance. All of this starts with the realization that the real world is not a lab, and hence not ideal. In fact, I

08:04

would call it far from it. And like you said, it is the realization that an I/O, which is a read or write operation, actually travels through various components. You mentioned the switches and the storage array, what we call the data path, and each one of these contributes to, or I should rather say impedes, performance on the way to the destination. Take an often-talked-about example,

08:28

which is latency. There are a lot of people out there touting that their HBAs have, say, 20% lower latency across generations. Great, isn't it? Just hand over your credit card and get that 20% less latency, right? No, not until you actually do the math. The contribution that these HBAs, especially Fibre Channel HBAs from this vendor, make to the overall latency is

08:52

very, very small. A faster HBA might save you a few microseconds in an operation that typically takes a few milliseconds to complete end to end. That doesn't mean much. Most of the end-to-end latency actually comes from, like you said, the software stack and the storage media. Think about the nauseating acceleration of an electric car: will that get you from San Francisco to LA faster?

09:25

I don't think so. We and our I/Os live in the real world. At least, we are not in the metaverse, last time I checked.
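To make the "do the math" point concrete, here is a minimal Python sketch of an end-to-end latency budget. The component names and microsecond values are hypothetical, chosen only so the total lands at roughly two milliseconds; the point is that shaving 20% off one small term barely moves the sum.

```python
# Illustrative end-to-end latency budget for one I/O, in microseconds.
# All component values are hypothetical, picked only to show the arithmetic.
BASELINE_US = {
    "application + OS stack": 50.0,
    "HBA": 10.0,
    "switch fabric": 5.0,
    "storage array + media": 2000.0,
}


def end_to_end_us(budget):
    """Total latency is the sum of every component in the data path."""
    return sum(budget.values())


def improvement_pct(before, after):
    """Relative end-to-end improvement, in percent."""
    return 100.0 * (before - after) / before


# Model an HBA with 20% lower latency; every other component is unchanged.
improved = dict(BASELINE_US)
improved["HBA"] *= 0.8

before = end_to_end_us(BASELINE_US)
after = end_to_end_us(improved)
print(f"end-to-end: {before:.1f} us -> {after:.1f} us "
      f"({improvement_pct(before, after):.2f}% better)")
```

With these assumed numbers the script prints `end-to-end: 2065.0 us -> 2063.0 us (0.10% better)`: a 20% gain on the HBA becomes about a tenth of a percent end to end.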

Todd Owens

09:33

Yeah, I totally agree with you. 20% of 1% isn't much when you look at the end-to-end performance figures. What about the IOPS numbers, the input/output operations per second? Where does that fit into the equation?

Nishant Lodha

09:49

Well, actually, this one is worse. It's even less understood than latency. Think about the story of NVMe and IOPS. There are two important things to remember and consider about IOPS. First of all, IOPS by themselves don't mean anything until you know the size of the I/O at which the performance is being claimed. So the right question is: IOPS at what block size, my dear vendor? Number two is knowing your application. What is its I/O size

10:23

profile? What are its needs? Take databases, file servers, and web applications: in fact, the overwhelming majority of applications don't need more than a few hundred thousand IOPS, if even that much, and their I/O sizes are typically 8 KB or larger.
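The "IOPS at what block size" question comes down to simple arithmetic: bandwidth equals IOPS multiplied by block size. The Python sketch below applies that relation; the 1,000,000 and 200,000 IOPS figures are illustrative assumptions, not measurements from any product.

```python
KIB = 1024


def bandwidth_bytes_per_s(iops: int, block_size_bytes: int) -> int:
    """Data moved per second: bandwidth = IOPS x block size."""
    return iops * block_size_bytes


# The same (hypothetical) IOPS figure means very different data movement
# depending on the block size it was measured at.
for block in (512, 4 * KIB, 8 * KIB):
    gb_per_s = bandwidth_bytes_per_s(1_000_000, block) / 1e9
    print(f"1,000,000 IOPS at {block:>4} B = {gb_per_s:.2f} GB/s")

# A hypothetical application profile: a few hundred thousand IOPS at 8 KiB.
app_need = bandwidth_bytes_per_s(200_000, 8 * KIB)
print(f"200,000 IOPS at 8 KiB = {app_need / 1e9:.2f} GB/s")
```

The same IOPS figure corresponds to 16 times more data at 8 KiB than at 512 bytes, which is why an IOPS number quoted without its block size can't be compared with an application's requirement.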

Todd Owens

10:42

That's the most misunderstood part of the IOPS number: it is completely dependent on block size. The hero numbers are always done at what's called 512-byte blocks. I can't think of any real-world applications that actually run at that size. Like you said, they're all at 8K or larger. So the key element is that IOPS and block size are absolutely joined at the hip.

Nishant Lodha

11:09

A lot of vendors, like you mentioned, claim these millions of IOPS at irrelevant block sizes, like 512 bytes. I call that straight-up dark stuff.

Todd Owens

11:18

Yeah, it can be some dark stuff, especially in some of the third-party reports we see out there, where they build these perfect environments, like the one you mentioned earlier, with all the cores and all the storage targets that are needed. Okay, so I see that IOPS are important, but you need to understand the block size. How do I go about doing the POC part of this? I really need to test it in my

11:45

environment, but I can't always make a test site look like the real-world environment. So what is your coaching around that when I'm actually testing the product in my environment?

Nishant Lodha

11:58

At some point, you want to get to the bottom of things, right? In fact, the step before that is figuring out what's right and what's wrong. What are the uncomfortable questions you need to ask your vendors? Instead of asking about latency, ask about end-to-end latency. Instead of asking about IOPS, ask about IOPS at a

12:20

specific block size. Second, sometimes you need to invest in a proof of concept. Use the equipment and the applications that you would actually expect to deploy in production, and actually measure what the performance can be. Sometimes there is no substitute for getting your hands dirty.
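When you measure latency in a POC, how you summarize the samples matters as much as how you collect them. This is a minimal Python sketch, assuming you already have per-I/O latencies in microseconds from whatever load generator or application you use; the sample data is made up to show how an average can hide a slow tail.

```python
import statistics


def summarize_latency_us(samples):
    """Summarize per-I/O latencies (microseconds) from a POC run."""
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 98 is p99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(samples),
        "p50": cuts[49],
        "p99": cuts[98],
        "max": max(samples),
    }


# Hypothetical run: most I/Os finish in 500 us, a few stall at 4 ms
# (for example, during a burst of congestion).
samples = [500.0] * 95 + [4000.0] * 5
summary = summarize_latency_us(samples)
for name, value in summary.items():
    print(f"{name:>4}: {value:.1f} us")
```

Here the mean (675 us) looks reasonable while the p99 is eight times the median, the kind of inconsistency a single averaged number would hide.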

Todd Owens

12:42

Right, I totally agree. But even with POCs you have to be a little careful, because your real world isn't necessarily as static as your POC environment. Can you expand on that just a little bit?

Nishant Lodha

12:56

Yeah, a lot of customers I talk to raise that challenge: a POC depicts a static environment, but a real-world environment is actually evolving. It changes with the time of day, with the situation, and with congestion. We all follow the mantra of getting to consistent performance, but building a system that is intelligent enough to react to the unknown and not overreact out of paranoia is one of the hardest problems to solve. It is quite a

13:26

challenge. I would say that we at Marvell, within the QLogic Fibre Channel team, are on a mission to solve this. We call this our StorFusion technology. But the details are for another episode, I would say.

Todd Owens

13:39

Well, I really appreciate your expertise and the discussion today, and look forward to having another conversation with you soon.

Nishant Lodha

13:46

Good talking, Todd. Thank you.

Christopher Banuelos

13:49

Thank you for listening to the Marvell Essential Technology Podcast. As always, please feel free to visit our website to learn more, and we'll see you on the next episode.
