After CrowdStrike, are we at risk of another major tech outage?

Jul 22, 2024 (14 min)

Episode description

Our Y2K bug fears became a reality when millions of computers were disabled worldwide over the weekend.

Microsoft estimates the outage meant about 8.5 million computers were disabled when an update from global cybersecurity company, CrowdStrike, went wrong.

Banks, airports, supermarkets, media companies and retailers were left scrambling – and the company’s chief executive says it could be ‘some time’ before systems are completely back to normal.

It’s believed to have been the biggest tech outage in history – so what can we learn from this to prevent it from happening again? And does it expose weaknesses in a globally connected computer network?

Today on The Front Page, to analyse the next steps from this tech crisis, we’re joined by Aura Information Security, Advisory Consultant, Alastair Miller.

Follow The Front Page on iHeartRadio, Apple Podcasts, Spotify or wherever you get your podcasts.

You can read more about this and other stories in the New Zealand Herald, online at nzherald.co.nz, or tune in to news bulletins across the NZME network.

Host: Chelsea Daniels
Sound Engineer: Paddy Fox
Producer: Ethan Sills

See omnystudio.com/listener for privacy information.

Transcript

Speaker 1

Kia ora. I'm Chelsea Daniels and this is The Front Page, a daily podcast presented by the New Zealand Herald. Our Y2K bug fears became a reality when millions of computers were disabled worldwide over the weekend. Microsoft estimates the outage meant eight point five million computers were disabled when an update from global cybersecurity company CrowdStrike went wrong. Banks, airports, supermarkets, media companies and retailers were left scrambling, and the company's chief executive says it could be some time before systems are completely back to normal. It's believed to have been the biggest tech outage in history. So what can we learn from this to prevent it from happening again? And does it expose weaknesses in a globally connected computer network?

Today on The Front Page, to analyze the next steps from this tech crisis, we're joined by Aura Information Security advisory consultant Alastair Miller. Alastair, what was your reaction on Friday night when these problems started to be reported globally?

Speaker 2

I have to say I was a bit worried for all the IT teams who are going to have to deal with them. As the problems escalated, and as we've now found out, it's at least eight point five million computers. That's a lot of work for a lot of people to sort out.

Speaker 1

This issue has been blamed on a software update, but the effects were pretty widespread. Planes were canceled, banking systems were inaccessible, security systems went down. The list goes on. How standard is it that so many different and varied systems and industries would be linked to the same software?

Speaker 3

Unfortunately, it's quite common.

Speaker 2

There's sort of only a small handful of big players in a lot of the IT spaces, and they pretty much cover it all. CrowdStrike especially is at the top end of the market, so that will be with all those big players, those banks, big transport companies, who we probably haven't seen in the press but probably have suffered, because it's going to take a bit longer there, and other big players are going to be impacted by those things.

Speaker 4

The scale of the outage unprecedented, affecting flights, banks, healthcare systems, and TV stations. The trouble first occurred in Australia and here industries hit in America, across Europe and Asia.

Speaker 3

Chaos for airlines there are canceled Amy drive.

Speaker 5

We'll have to try and get a flight home somewhere, sometime, don't know.

Speaker 3

St John lost computers for taking calls and for dispatching.

Speaker 4

Literally a paper process, having to go back to writing things and using paper and paper boat system.

Speaker 1

We know the issue's being traced back to CrowdStrike, a company that produces antivirus software. You mentioned eight point five million Microsoft devices were affected. What do you think went wrong here?

Speaker 2

It appears it's a bit of a lack of quality assurance, so they didn't do their testing with the latest Microsoft update and released a bit of software that conflicted badly with the Microsoft update and took all those PCs and servers out.

Speaker 1

And CrowdStrike, I believe, has around a twenty percent market share. Is this one example where one company being so dominant in the market really doesn't work in people's favor?

Speaker 2

Well, I mean there is that risk, and again they cater to the high-end market, so the ones who are much more noticeable. But it's just very common in the IT space, where there are a few big players who can dominate the markets, and then obviously when they run into trouble, it hits everyone. I mean, Microsoft and Amazon are the prime examples really, as the biggest cloud providers, along with Google, not so big in New Zealand but big worldwide. When they have issues, the whole world feels that pain. CrowdStrike is the company behind the software that brought everything down.

Speaker 1

We're deeply sorry for the impact that we've caused customers, to travelers, to anyone affected by this, including our companies. So we know what the issue is. We're resolving and have resolved the issue.

Speaker 2

Now.

Speaker 1

Is this really our Y2K fears realized twenty-four years later?

Speaker 2

It certainly demonstrates that the people who put in that hard work back then for Y2K, this is what they were trying to mitigate, these kinds of issues where it takes down all kinds of systems and things that you didn't necessarily expect, and causes you a lot of pain.

Speaker 1

We were talking on this podcast actually last week about if we need to break up the banking sector or supermarkets locally for more competition. Is that something that's even possible in the tech sector, or do you need a large company like CrowdStrike, say, to actually have the capacity to manage millions of devices globally?

Speaker 3

They have a lot of competitors.

Speaker 2

I think our problems are more that people have not got good business continuity and disaster response plans. A lot of New Zealand businesses have nothing, and a lot of the other ones will have something written on a piece of paper that's never been exercised at all. So people actually need to be prepared for problems happening, because we all know they do. You know, we have COVID, we have earthquakes. I mean, you have plans for, you know, a fire escape and you practice that drill once a year. So companies have to have plans for what happens if we lose all our servers and how do we get around that?

Speaker 1

So is this really an issue with companies worldwide not having those fire escape drills?

Speaker 3

I'd say it is.

Speaker 2

I mean, I think we'll find that some of them have recovered within twenty-four to forty-eight hours, other ones will drag on for weeks, and the ones that drag on for weeks will be the ones without the good plans.

Speaker 1

Does New Zealand have any companies based here that could take on some of this workload?

Speaker 2

To try and fix it all manually? I mean, New Zealand has an IT skills shortage, which is one of the reasons why some companies are going to take a long time to recover, because they're going to be looking to outside providers. But I mean all companies can either talk to themselves or talk to consultancies and other experts in the area to get advice on what is a good business continuity or disaster response plan.

Speaker 1

I've seen the term single point of failure emerge in some of our analysis of this. Can you explain what that actually is?

Speaker 2

So, a single point of failure, or as people lovingly call it, a SPOF, is basically where everything points to one thing. Often in an organization, it can be a person. When that person is away, everything grinds to a halt. It's that one place where, even if you had multiple things around it, they all have to deal with that system, talk to that person or get sort of channeled through it, and when that goes down, the whole business goes down with it. In this case, it's not quite a single point of failure, but it's the inability to have a failover for when that thing fails that's more of the issue in this case.

Speaker 1

How vulnerable are global systems to these types of issues? Does this make us more susceptible to hackers or malicious attacks?

Speaker 2

Indirectly, it may well do, where people lose the ability to monitor things and see what's going on on a wider scale for the issues. Obviously, the attackers immediately jumped on it, and were looking to scam people with fake websites, fake emails of help and things like that. But I think what it really reflects is how interconnected the world is nowadays, how much we depend on our IT systems, and we need them to be resilient and robust to cope with things like this.

Speaker 5

We've long been encouraged to keep our devices' software up to date, but there are now tough questions as to how one faulty update could have brought so much chaos. To be clear, this wasn't a hack or cyber attack, so you don't need to change any passwords. It only affected machines running Microsoft, but every single one will need a manual reboot in safe mode. Microsoft says some people have to do this as many as fifteen times.

Speaker 1

For years now, we've all been talking about society moving more and more online. Do you think this incident will make some governments and companies stop and have a bit of a rethink?

Speaker 2

Well, I mean, I think they'll have a think about it and they'll have to consider what to do. I mean, often a good business continuity plan is to have some paper and some forms available, tell your staff about it, train them in it, and when the computers go down, they go back to those pieces of paper and carry on in a sort of more outdated world. But at least they know what they're doing and they know how to do it.

Speaker 3

Yeah.

Speaker 1

I saw in some supermarkets they actually had those things that we used to have and...

Speaker 3

Yeah, the little stubs of paper.

Speaker 1

Yeah, the stubs of paper or those things where you'd push the thing across for the card reader.

Speaker 2

Oh yes, yes. And that's brilliant that they still had them and the staff knew what they were doing. I mean, you do need these fallbacks. Hopefully we never have to use them, or very infrequently, but to have them there is absolutely key.

Speaker 1

Well, a cashless society is one area that's caused concern for some. The idea of us removing physical money and going fully digital, given some people couldn't access their accounts or use pay systems, is that one area particularly that might need to be re-examined?

Speaker 2

It certainly puts it at a very high risk of the repercussions should we not be able to access that money. Obviously, if you can't get to an ATM to withdraw the money because the ATM's fallen over, even with cash, you're kind of stuck. But it's definitely something to consider going forward where everything is either on your phone or on a card.

Speaker 1

What's the likelihood of a global outage like this happening again, and if so, how do we go about preventing it?

Speaker 2

I think it's fairly likely that within the next couple of years something similar will happen. It's just, as we've been saying, that the number of providers out there is limited in some spaces, and some of them tend to monopolize certain sectors, so it's going to hit. And so it's really having those plans and getting people to exercise them, at the very least round a tabletop, so everyone gets around that for a few hours and practices it, and then every now and then actually doing a physical test of it, knocking over a system and seeing how you recover from it.

Speaker 6

Microsoft's going to be looking at this closely to see how this never happens again. From a brand perspective, they're going to have to spend weeks and months ahead trying to contain that. But it does speak to just a broader, the strong get stronger in tech. We're not talking about thirty, forty companies. There's really a core ten to twenty that run the global IT infrastructure.

Speaker 1

Should we have some kind of worldwide regulatory agreement on something like this?

Speaker 2

It's a very interesting idea. International regulation is absolutely a nightmare as far as I can tell, so it would be a good idea, but I imagine getting anything through the United Nations or some similar body would be very hard work in terms of managing it.

Speaker 1

From a New Zealand point of view, is there anything our government can do to prevent something like this happening again, or at least make it a better turnaround, so to speak, to get everything back online?

Speaker 2

Well, interestingly, the FMA, who regulates financial bodies, recently put out a sort of not quite a regulation, but a requirement that businesses actually get ready for this kind of thing, improve their resiliency and redundancy plans. So parts of the government are taking this seriously. What they really need to do is kind of have a bit more of a stick about it and go, "We're going to come and audit you every year and check these plans are real and that you've been practicing them." It should probably be in all areas where things really matter to us, so you know, the health, transport, financial, those kinds of things are the ones that probably need that level of plans in place.

Speaker 1

And in terms of the general public, should we not get our money out of the banks and bury it in the backyard?

Speaker 3

I think that may be an overreaction.

Speaker 2

I would imagine that's probably going to have as many repercussions as doing that. But we certainly need to be prepared ourselves, have our own sort of fire drill or earthquake escape plan on what happens if we can't, you know, use our card. Do we have, you know, a couple of hundred bucks in a drawer that we could then go and get groceries or whatever we need? So we ourselves are not kind of beyond having little plans.

Speaker 1

Yeah, a bit of a wake up call for everyone.

Speaker 2

Yeah, it is. It's that complacency where we get to where it's "someone else will always fix that problem." We don't have to worry about it because it's taken care of.

Speaker 1

Thanks for joining us, Alastair.

Speaker 3

No problem, Chelsea. It was a pleasure to talk to you.

Speaker 1

That's it for this episode of The Front Page. You can read more about today's stories and extensive news coverage at nzherald.co.nz. The Front Page is produced by Ethan Sills with sound engineer Paddy Fox. I'm Chelsea Daniels. Subscribe to The Front Page on iHeartRadio or wherever you get your podcasts, and tune in tomorrow for another look behind the headlines.

Transcript source: Provided by creator in RSS feed