S5E17 - Azure Data Box - move big data to Azure efficiently

May 10, 2024 · 39 min · Season 5 · Ep. 17

Episode description

This week Alan and Sam discuss Azure Data Box. It is a ruggedised appliance designed to simplify the process of transferring large volumes of data to Azure cloud storage, offering high-speed data transfer and secure encryption. It enables organisations to overcome bandwidth limitations and easily migrate large datasets, ensuring efficient and reliable data transfer.

  • What is Azure Data Box?
  • What can you back up with it?
  • How does the process work?
  • How much does it cost?

What did you think of this episode? Give us some feedback via our contact form, or leave us a voice message in the bottom right corner of our site.

Transcript

Hello and welcome to the Let's Talk Azure podcast, with your hosts, Sam Foote and Alan Armstrong.

If you're new here, we're a pair of Azure and Microsoft 365 focused IT security professionals. It's episode 17 of season five. Alan and I recently had a discussion around Azure Data Box. It's a ruggedized appliance designed to simplify the process of transferring large volumes of data to Azure. Here are a few things that we covered: What is Azure Data Box, and what can you back up with it? How does the process work, and how much does it cost? We've noticed a large number of you aren't subscribed, so if you do enjoy our podcast, please do consider subscribing. It would mean a lot to us for you to show your support for the show. It's a really great episode, so let's jump in. Hey, Alan, how are you doing this week?

Hey, Sam. Not doing too bad. How are you? Yeah, not doing too bad, thank you. I feel like this week has been a Copilot week for me. That seems to be the theme: I've been doing a lot of prompting in Microsoft 365 Copilot. I mean, yeah, to be fair, I've used a lot of Copilot for Microsoft 365 this week, with a sprinkling of Copilot for Security, I guess.

But, yeah, it's RSA at the moment, isn't it, if I remember rightly. So there have been quite a few announcements. So the next news episode is going to be streaming with updates, I think. Yeah, even more. Even more. Yeah, it's good. Really good to see. Cool. Okay, so what are we talking about this week then, Sam?

Yeah, so this is a new-to-me service, that's for sure, and it seems really cool. So, yeah, we're going to talk about Azure Data Box and the Data Box service, which is, I would say, an old-school way of transferring data into Azure. That's the vibe I get off this one. Okay, so let's get started. So I guess, to start with: what is the problem Azure Data Box is targeting? Yeah. So.

The first, and I suppose most obvious, use case is just being able to transfer a huge amount of data into Azure. Imagine you've got some sort of use case like, let's say, you're migrating to Azure. Let's say you're, I don't know, Netflix (I think they're on AWS, but you could insert any media streaming service). Imagine you've got a huge media library that you want to move into Azure. Maybe you're migrating your service, maybe you want to use it for disaster recovery as a backup. Maybe you've just got a huge amount of data that would be infeasible to transfer into Azure, not only because of the ingress costs, but because of the amount of time you would spend uploading it and the bandwidth you would consume at your remote site. Imagine the building that you're in has a metered Internet connection, or data caps, or a speed issue. It could take you weeks, if not months, to transfer that information up. So it's really targeting those types of scenarios. In the documentation, they really target 40 terabytes and above as a sort of tipping point. I've never been in a scenario where I've had to upload 40 terabytes to Azure, right? But I can imagine that anything below that might not justify a physical appliance, and above that is where this kicks in. One scenario they use this for is disaster recovery: having an off-site backup if you've got truly large amounts of data. The other part I want to talk about is not just uploading into Azure, but also exporting from Azure as well. So it works both ways, essentially.
So if you're migrating away from Azure, or you want to pull out a good amount of data, or you want to use something else as a backup for your data in Azure, then this service can help you as well. I think the key benefit really is speed. We'll talk about the actual devices in a bit, but you have a local network connection or local interface to the devices you're using, so you get local speeds while you're doing your transfer. I don't know how you would quantify the data speed of a truck driving your Data Box to a data center. That part is maybe quite a bit slower, but because you've got so much space there, you're transferring a lot of information in one go. So if you've got 40 terabytes that you're moving, as an example, you copy that onto the box as fast as you can, and then the 40 terabytes gets loaded into a truck and sent to Azure. And yes, end to end, that process might take five days, ten days, I don't know what the shipping is like, but if you've got a really slow Internet connection it could take you a lot longer than that anyway. There's also security built in, and I'll go into it in a bit more depth, but you can have secure end-to-end transfer even though you're handing off the physical device to somebody else.
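To put rough numbers on the 40 TB tipping point discussed above, here is a minimal Python sketch comparing an upload over your own connection with shipping a device. The 80% link utilisation, the use of decimal terabytes, and the ship-and-ingest day count you pass in are illustrative assumptions, not Microsoft figures; `upload_days` and `network_is_slower` are hypothetical helper names.

```python
# Rough comparison: upload over the network vs. ship a Data Box.
# Assumptions (illustrative only): 1 TB = 10**12 bytes, a sustained fraction
# of the link is available for the transfer, and shipping plus ingestion
# takes a fixed number of days that you supply.


def upload_days(data_tb: float, link_mbps: float, utilisation: float = 0.8) -> float:
    """Days needed to push data_tb over a link_mbps connection at the given utilisation."""
    if link_mbps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("link speed must be positive and utilisation in (0, 1]")
    bits = data_tb * 10**12 * 8
    seconds = bits / (link_mbps * 10**6 * utilisation)
    return seconds / 86_400


def network_is_slower(data_tb: float, link_mbps: float, ship_days: float,
                      utilisation: float = 0.8) -> bool:
    """True if uploading would take longer than the assumed ship-and-ingest time."""
    return upload_days(data_tb, link_mbps, utilisation) > ship_days


if __name__ == "__main__":
    for mbps in (100, 500, 1000):
        print(f"40 TB over {mbps} Mbps at 80% utilisation: {upload_days(40, mbps):.1f} days")
```

On those assumptions, 40 TB over a 100 Mbps link takes roughly 46 days, which is the kind of gap described in the episode; at 1 Gbps it drops to under five days, and the case for a physical device gets much weaker.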

Okay, interesting, because I know previously there was a service where you could do something similar, but it was more around 365. Again, I don't know exactly what you do on the Azure service side of things, but with 365 I think you could provide them with your hard drives, and they'd securely collect them and then, in effect, upload them into SharePoint or OneDrive, and things like that, or you could transfer your emails from your on-prem Exchange to Exchange Online. So yeah, I'm definitely interested in understanding what this does. And with some of your scenarios, I was just thinking of some of my relations. They've got their own businesses, smaller organizations that maybe want to back up the NASes they've got at home. Maybe they do video editing or something, and they've got loads of data there that they might want to back up into Azure as a one-time thing as well.

Yeah. I think if you've got a large-file workflow, video editing would be a really good one, because there's a huge amount of raw data there that you might want to preserve for a DR scenario. Especially with video editing, those types of companies pay a lot of money to have that video footage shot, and there's massive value in retaining and having access to that data in the future, right? And you might not need it, so it might make sense to have it backed up somewhere else, off site.

Yeah. Okay. So I guess we've talked about what problems it's designed to tackle in some scenarios, but what is the service, Azure Data Box, and how does it work?

Okay. Yeah, so as I've mentioned, you can import and export data between Azure and your site in both directions. I'll talk about the workflow and the process in a bit more detail. What are the targets you can send data to in Azure? I think this might be what you were referring to, Alan: you can use Azure Data Box and the SharePoint Migration Tool to take file content into SharePoint Online. So this may be a revision of that service, or just an expansion of it into other areas. You can replicate files into an Azure file share with Azure File Sync, and you can also target Azure Backup for critical infrastructure and Recovery Services vaults. They also call out that you can use third-party tooling, so Veeam, for example, is supported for backing up your Hyper-V machines to your Data Box and then storing them in the cloud. Let's quickly talk about the workflow before we get into the hardware itself. You place an order in the Azure portal (I believe you need an EA or CSP agreement in order to do that) and you put in all of your shipping information. They prepare the disks and the hardware and deliver it to you. You plug it in (I'll talk about the connectivity of the devices), you copy the data onto the disks, and then you prepare and ship the disks directly back to Azure, and they do the ingestion on their side. The real benefit here is that Microsoft owns the hardware and effectively loans it to you. There are three devices, and they're all Microsoft-owned equipment that you receive, use and return. So why would you use them? It's an end-to-end system that is completely managed by Microsoft. You can transfer securely, and I'll talk about security in a bit more depth, but the appliances themselves are also wiped clean by Microsoft.
They're in tamper-resistant housings, and you can have full customer-managed encryption from end to end as well. So they've thought about secure file transfer, and when they wipe the disks and clean the information, they follow the NIST 800-88 r1 standard. I don't know it exactly; I assume it's a NIST directive on data sanitization and wiping, and I don't know how high up that is in terms of government regulatory compliance. But yeah, you do have a designed, end-to-end workflow there that is secure. It's supported in most major regions, so you shouldn't really have any issue with region availability. The only real caveat is that you can't ship across regions. For instance, UK South is supported, so your disks have to be sent and received within the UK itself. You can have up to five disks or appliances out at any one time, I believe, and that's per region. That's how it's limited on that side of things. Let's talk about some of the limits. Not the size limits, because I'll cover those with the hardware. Actually, no, let's go through the hardware and I'll talk about each device's limits as we go. I'll start from the bottom and work my way up. The smallest solution is Azure Data Box Disk. This is a physical hard disk. You can order them in blocks of up to five disks at a time, I believe, to be delivered to you. You plug them in via USB 3 and copy content to them. I believe each disk is eight terabytes, so with five you can do one combined data transfer of 40 terabytes in one go. Only a single storage account is supported with Data Box Disk; I assume that's one disk to one storage location. I'm guessing you can have a maximum of 100,000 files, but inside that storage account you can have up to 512 containers or shares to segregate that information.
There are a bunch of caveats about what you can and can't store: Azure block blobs, page blobs, and different formats are supported. So it's not just flat files; you can upload object-store content as well, and you format it in different ways, essentially to direct where it should actually land. The requirements from an operating system perspective are very basic: Windows Server and Windows are supported, Ubuntu, Debian and CentOS are all supported as well, and the requirements to copy data across are pretty low. I believe you can also encrypt it yourself: it's actually just a SATA hard drive, so I believe you can mount the drive and do block encryption on the device if you want end-to-end encryption. So that's the most basic process: actual disks that get sent out to you. You've got a usable space of around seven terabytes per disk, so 35 terabytes in total, because of file systems and formatting. And each one is less than two pounds in weight, which is going to become more relevant as we talk about the other sizes of Data Box. So I'm going to call that entry level.
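As a rough planning aid for Data Box Disk, the sketch below turns a data size into a disk count and an order count, using the figures quoted in the episode (about 7 TB usable per disk, up to five disks per order). The function name and constants are hypothetical, and the limits should be checked against current Microsoft documentation.

```python
import math

# Figures as quoted in the episode; verify against current Microsoft docs.
USABLE_TB_PER_DISK = 7   # ~8 TB raw, ~7 TB usable after formatting
MAX_DISKS_PER_ORDER = 5  # disks are ordered in blocks of up to five


def plan_disk_orders(data_tb: float) -> tuple[int, int]:
    """Return (disks_needed, orders_needed) for a Data Box Disk import."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    disks = math.ceil(data_tb / USABLE_TB_PER_DISK)
    orders = math.ceil(disks / MAX_DISKS_PER_ORDER)
    return disks, orders


if __name__ == "__main__":
    for size in (7, 35, 36):
        disks, orders = plan_disk_orders(size)
        print(f"{size} TB -> {disks} disk(s) across {orders} order(s)")
```

Anything beyond roughly 35 TB spills into a second order in this model, which is around where the larger Data Box starts to make more sense.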

Yeah, I guess, like you said, they're in effect just USB drives, aren't they? But they've been prepared just for this sort of transfer. Maybe they've got some other software on them, in the controller or something? Yeah, I don't know how the encryption's done. I wonder if it's software encryption in the application that you run. They're just standard SATA hard drives, and it's the application that does all the encryption and the secure end-to-end transfer. Yeah. Okay.

Right, let's jump up a gear to Data Box itself. Data Box is a ruggedized device. On its side it takes up 7U of rack space, so if you need to put it in a rack to copy data onto it, and you want to put a shelf in, you're going to need 7U with it on its side. The raw storage capacity of the device is 100 terabytes, and there's 80 terabytes of usable capacity after RAID 5 protection is applied to the disks. It's got a 700-watt power supply, and it typically draws 375 watts when it's in use. It's got multiple network interfaces: there are two 10-gigabit Ethernet interfaces for ingesting data, a 1-gigabit management port, and a third, 1-gigabit data port as well. Over a 10-gigabit Ethernet connection, you can fill the 80 terabytes in a single day. There's a local web UI on the box itself, so you can use the management interface to connect to it remotely. It's got an Azure-like interface with blades, and I believe it also phones back to the Azure portal, so you can manage it remotely from Azure itself. I assume there are some connectivity requirements there. It's got end-to-end encryption like the disks have; I assume that's all managed in the box itself. This one is 50 pounds. What did we say the other one was? Two pounds. So this is 50 pounds, so it's what, 25 times the weight? And the usable capacity is "only" 80 terabytes. So the unit is quite a bit bigger, but it's a completely self-contained solution. I don't believe there's a host requirement to run the software; everything runs on the box itself. So yeah, we're starting to ramp up in size here. Yeah, the last one... Oh, sorry, go on, Alan.
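The "fill 80 terabytes in a single day" figure can be sanity-checked with simple arithmetic. This Python sketch assumes decimal units and an optional `efficiency` factor, a hypothetical parameter standing in for protocol overhead and small-file slowdowns; at full line rate on one 10 GbE port it comes out at just under 18 hours.

```python
def fill_hours(capacity_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours to copy capacity_tb (decimal TB) over a link_gbps link.

    efficiency is an assumed fraction of line rate actually achieved (0 < efficiency <= 1).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bytes_per_second = link_gbps * 10**9 / 8 * efficiency
    return capacity_tb * 10**12 / bytes_per_second / 3600


if __name__ == "__main__":
    print(f"80 TB at 10 Gbps, line rate: {fill_hours(80, 10):.1f} h")
    print(f"80 TB at 10 Gbps, 60% efficiency: {fill_hours(80, 10, 0.6):.1f} h")
```

At an assumed 60% efficiency the same copy takes about 30 hours, so "a single day" is best read as a figure for copies running close to line rate.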

I was interested in what you said about rack mounting it. Just looking at it, it says it takes 7U of space, but you still can't rack mount it, because it's got to be on its side to even fit in the rack.

Yeah, yeah. You'd have to put it on a shelf. It's not rack mountable. If you want to stick it in a rack, you're going to need 7U of space, and that's on its side, right? So the last option on the physical side is Azure Data Box Heavy. And this thing is like a trolley. It's got four wheels and a handle. I'm describing it because we're on a podcast, but yeah, it definitely cannot be rack mounted. This is its own physical device: one petabyte of raw storage. There are 70 14-terabyte disks in there, with 770 terabytes usable. You've got two 40-gig interfaces. I assume they're SFP, maybe fiber; I don't know anything more than that. Four built-in PSUs, and a typical power draw of 1,200 watts. This one is 500 pounds. I don't even want to call this a device. It's like a tanker or something like that. So if you've got one petabyte of data to transfer in, then there's an option for you here, basically. And then there's the logistics of handling that type of equipment. It is on wheels, so I don't know how it gets delivered to you. It must come on a double pallet or something like that, I assume. So you've got to think about the logistics of getting it in and out of your own data center or office. Yeah, really quite something else. Just like the standard Data Box, it's got its own web interface and a connection back to Azure.

It looks insane. I just quickly looked at it. Yeah, if you look at the pictures, it's like an actual trolley. It's ridiculous. I'm trying to think of what it looks like. It's almost the size of a massive storage container that you might put files in; there's the irony. But how many drives? Like 18 or something like that? Did you say 17? Or 70? No. Yeah, I think it's 70. Let me just check, I've got the specs here: 70 14-terabyte disks.

I mean, that's going to be really heavy. Yeah, it's 500 pounds. Yeah, yeah. Let alone the hardware, you know, the rest of it. Yes. So you get these devices for ten days, and we'll talk about that when we talk about pricing. Let me just double-check that it's the same for the Heavy. No, for the Heavy you get 20 days to fill it. You can have extra days, but that's what you get with the base cost.
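Pulling the three options together, here is a minimal Python sketch that picks the smallest single device whose usable capacity covers a given data size. The capacities and included days are the figures quoted in the episode, treating a full five-disk Data Box Disk order as one unit; the `Device` class and `smallest_fit` function are hypothetical helpers, not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:
    name: str
    usable_tb: float     # usable capacity per order, as quoted in the episode
    included_days: int   # days included in the base service fee


# Figures from the episode; verify against current Microsoft documentation.
DEVICES = (
    Device("Data Box Disk (5 disks)", 35, 3),
    Device("Data Box", 80, 10),
    Device("Data Box Heavy", 770, 20),
)


def smallest_fit(data_tb: float) -> Optional[Device]:
    """Return the smallest single device that can hold data_tb, or None if none can."""
    for device in sorted(DEVICES, key=lambda d: d.usable_tb):
        if data_tb <= device.usable_tb:
            return device
    return None


if __name__ == "__main__":
    for size in (20, 60, 500, 1000):
        match = smallest_fit(size)
        print(size, "TB ->", match.name if match else "needs multiple devices")
```

Anything over the Heavy's 770 TB returns `None` here; in practice that means multiple orders, which this sketch does not model.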

Yeah. Okay. Interesting. Okay. You talked about the order process, didn't you? And were there any other limitations around the box itself? No. Let me just see if I can pull up the limits on the Data Box Heavy. I didn't have any notes on that. No, I can't see that in my notes. No, I've got nothing else to cover on those ones.

Okay. So I guess, and it's quite interesting actually, what happens if you lose it? I mean, you wouldn't want to lose it, but because, like you said, you're technically renting it from Microsoft, in effect, to get your data up to Azure, what happens if you lose it? I don't know if you could lose a Data Box Heavy, but a Data Box Disk you could maybe misplace.

Okay, so for Data Box Disk, which is the single eight-terabyte disk, the damaged or lost device fee is $2,500. Wow. For a single disk. The Data Box is $40,000 if you lose or damage it. Okay, I dread the next one. The Data Box Heavy: the lost or damaged device cost is $250,000. So I assume there's an insurance situation here that you've got to think about, right? Because what you're effectively saying is that a Data Box Heavy is worth $250,000, a quarter of a million.

Yeah. Yeah. And I could imagine... would you lose it? I don't know, but it's quite a valuable thing. It's got 70 14-terabyte disks in it, right? So you could imagine a robbery situation, or a damage situation, a transit damage situation, right? Like you say, walking it to the data center and something happens to it. Yeah, that's probably the most likely, isn't it? I guess, yeah. But yeah, that's insane.

I do just need to talk about the Data Box Gateway, actually, because there's a virtual appliance that you can put in place. I've seen it on the diagrams, but I couldn't find much documentation about it. It's essentially a virtual appliance that you can use. I don't know if it's a gateway in Azure or a gateway on-prem that you use to broker data up to Azure, but I'll talk about it in the pricing. To be honest with you, I could only find public docs on the actual offline data process. The Data Box documentation links on the website just don't work properly; they take you to lots of 404 pages and things like that. It's a bit strange, but it is a really niche service, because I don't know anybody that's ever had to do this, and I don't know how busy the service is.

Yeah, I mean, there must be some demand for it for it to be there, I guess for large amounts of data. We just don't work in the types of organizations that need to transfer that much data ourselves. Yeah, exactly. We've just got really low data requirements, haven't we?

Yeah. And really fast Internet. Okay, so I guess we need to talk about cost, then. We've heard about the what-happens-if-I-lose-it cost, the insurance implications that might have, and the extra bubble wrap you might need when moving it between locations. But what does it actually cost to get it, use it, send it back and get your data transferred?

Okay, so for Data Box Disk, the base cost for each disk is $50. You can have the disk for three days before the daily usage fee kicks in. After your three days, you're charged $10 per day until the courier scans it back in on the return, so it runs from the day you receive it to the day the courier picks it up and scans it. The standard shipping fee for one full round trip per package, which can contain up to five disks, is $35. So to move eight terabytes with one single disk, if you manage to fill the disk and return it in three days, you're looking at an $85 round-trip cost either way, whether you're exporting from Azure or importing into it.

Okay, I was just thinking about that. If you split it out so that your three days were costing you $10 a day, then in theory the actual rental of the disk is like $20, isn't it? In theory, if you worked it out that way, plus three days of waiting. You save a lot on shipping if you do five disks, but you've still got the $50 per-unit cost per disk.

Yeah. Okay. I guess it suits somewhere where you don't have capacity on your Internet connection, because I guess there are two scenarios there, aren't there? Like you said, you might have a metered Internet connection, or you may have fast Internet connections but no spare capacity to do the transfer up to Azure. I suppose, yeah.

So with a standard Data Box, you get ten days included in your service fee. One unit is $250, each extra day on top of the ten days is $15 per day, and the standard shipping fee is $113 round trip. So if you manage to fill it in your time, you pay $363 round trip to move 80 terabytes, formatted, into Azure. That's about $4.54 per terabyte. Okay.

Data Box Heavy: you get 20 days included with your one unit for $4,000. An extra day is $100 per day, and the round-trip shipping fee varies by location because it's freight-shipped to you, but it starts from $1,500 round trip. So what was it, 770 terabytes formatted, for a starting price of $5,500 round trip? That works out at roughly $7.14 a terabyte.
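The pricing walk-through above is easy to get wrong by hand, so here is a minimal Python sketch of the same arithmetic. The USD figures are the ones quoted in the episode (May 2024), the Heavy's shipping is only a "from" price, and treating the $10 Data Box Disk day rate as per disk is an assumption; `Pricing`, `order_cost` and `cost_per_tb` are hypothetical names. Check the Azure pricing page before budgeting.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    service_fee: float        # per unit, covers the included days
    included_days: int
    extra_day_fee: float      # per unit, per day beyond the included days
    shipping: float           # round trip, per package
    units_per_package: int = 1


# USD list prices as quoted in the episode; they will change over time.
PRICES = {
    "disk": Pricing(50, 3, 10, 35, units_per_package=5),
    "databox": Pricing(250, 10, 15, 113),
    "heavy": Pricing(4000, 20, 100, 1500),  # shipping "from" $1,500, varies by location
}


def order_cost(device: str, units: int = 1, days_kept: int = 0) -> float:
    """Total cost in USD; days_kept at or below the included days adds no daily fees."""
    p = PRICES[device]
    extra_days = max(0, days_kept - p.included_days)
    packages = math.ceil(units / p.units_per_package)
    return units * (p.service_fee + extra_days * p.extra_day_fee) + packages * p.shipping


def cost_per_tb(device: str, usable_tb: float, **kwargs) -> float:
    """Cost per usable terabyte for a single order."""
    return order_cost(device, **kwargs) / usable_tb


if __name__ == "__main__":
    print(order_cost("disk"))       # one disk, returned within three days
    print(order_cost("databox"))    # one Data Box, returned within ten days
    print(round(cost_per_tb("heavy", 770), 2))
```

Within the included days this gives $85 for a single disk, $363 for a Data Box and from $5,500 for a Data Box Heavy, or roughly $4.54 and $7.14 per usable terabyte for the latter two. Keeping a Data Box for 12 days instead of 10 adds $30.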

Yeah. And one thing I haven't talked about is actually exporting data from Azure in that process, so let's quickly talk about that. You essentially go into the Data Box part of the Azure portal, and I believe you can choose Data Box, or you can send your own disks in to Azure: it seems you can actually provide your own disks for export from Azure as well. So you select the medium you're extracting with, link your data selection to it in the portal, and order your disks; they export the data and send it to you. You can still do things like customer-managed keys for encryption, and you set your shipping and contact details. As you go through, you can use a key vault to store your customer-managed key.

Is the cost the same for renting the disk? It doesn't matter about the actual data? Yeah, I believe so, because it's just the other direction. Basically, you've got ten days to get it extracted to wherever it's going. So I suppose that's another consideration if you're exporting: you've got to be able to ingest that somewhere. Yeah. You can't just bring it home and leave it for a month. Yeah, exactly. While you get your storage array up. Interesting. Okay, cool.

Yeah, so that's basically it. If you've got so much data that you physically need it moved into Azure, they've got options for you there. They feel relatively expensive, but we're talking about a physical device. I mean, if you've got nearly a petabyte's worth of data that you need to move into Azure, is $5,500 really an issue for you?

And I guess you're also not worried about connectivity, about oddities along the route, maybe the transfer failing and you not knowing what you've already transferred. I know there are other capabilities that can check what you've transferred and things like that, but you've just got all that other unknown across the Internet, haven't you? Yeah, yeah, exactly.

Just generally, not to Azure specifically, but the actual routing and everything; I'm just talking about that side of things really. But with this, like you said, it's done locally. It's on the box, and when Microsoft plug it in, it's going to transfer locally there and go into the right locations, etc., as specified.

So yeah, I'd probably also like to call out the docs that Microsoft have got about going through all the different data transfer options in and out of Azure, because we've only talked about one of them. There are some really good docs on the Learn site about how you compare the solutions and what you should use in different scenarios. It looks like they've really worked through a lot of those scenarios ahead of time. So definitely check those out if this is something you seriously need to consider for these larger data sizes.

Yeah, if they still exist, it might be worth us looking at StorSimple appliances. I saw those in action well back, three, four, five years ago, as a sort of synchronization with Azure. There are probably better ways now, but that was an interesting thing. Okay. Is there anything else you can think of, then, Sam? I mean, I think we've covered it quite well.

Yeah. Well, considering it was just a bunch of boxes of hard drives, I think we've done all right to get ourselves to over 35 minutes. Cool. Are there any other episodes to call out? I seem to be on a theme at the moment of picking backup-related content, but in season five, episode 14, so just a few episodes ago, I did an overview of Azure backup solutions. So yeah, go and check that out, because it's vaguely relevant to this topic. Alan, what are you covering next week?

Yes, I'm going to cover managing macOS devices with Microsoft technology, so with Intune and Entra. Partly because of some recent announcements, there's definitely some value in showing how you can manage them and understanding whether it's actually possible, because I know a lot of organizations use Jamf for managing macOS and iOS, those kinds of things. I think there's a lot of talk now that Intune is capable of doing the majority of things for managing Mac endpoints, especially with this new single sign-on capability that just got announced at RSA. So yeah, we'll run through that, and we'll check out some of this new stuff as well before that episode.

Nice. Thanks, Alan. Yeah, that should be a good episode. Cool. So, did you enjoy this episode? If so, please do consider leaving us a review on Apple Podcasts or Spotify. This really helps us reach more people like yourselves. If you have any specific feedback or suggestions, we have a link in our show notes to get in contact with us. Yeah. And if you've made it this far, thanks very much for listening, and we'll catch you on the next one. Yeah, thanks all. Bye.

Transcript source: Provided by creator in RSS feed.