Thursday, May 18, 2017

We Didn't Know What We Didn't Know: WannaCry and the Case for SaaS

“There are known knowns. These are things we know that we know.
There are known unknowns. That is to say, there are things that we know we don't know.
But there are also unknown unknowns. There are things we don't know we don't know.”
Donald Rumsfeld. Former US Secretary of Defense.

“Hedley Lamarr: Unfortunately there is one thing standing between me and that property: The rightful owners.”
Harvey Korman. Blazing Saddles.

“Plan to be spontaneous tomorrow.”
Steven Wright.

I watched in horror last week, as did many of you I suspect, as the WannaCry ransomware crippled thousands of systems around the world and wreaked havoc in almost every country on the planet. I have, you might say, slightly more than a casual interest in the matter. For several years, I managed the team responsible for the SMB protocol, the vector used for the attack, and I was also the head of security for the company for a few years.

I didn't personally write a single line of code in that protocol. That task was delegated to much smarter people, but I did manage the teams responsible for building, testing, maintaining, advancing, and securing it for several years. I was shocked, like everyone else, to see it at the center of an international meltdown of unprecedented proportions. Almost two hundred countries impacted? Over two hundred thousand businesses stymied? Hospitals? Emergency rooms? WTF?!?

For the uninitiated, here's a brief summary of the situation. If I understand correctly, the ransomware takes advantage of a previously undisclosed bug in the SMB file sharing protocol to take over a computer, encrypts all the files, and puts up a message telling the owner to pay up or lose their data. The NSA had known about the bug for years but didn't disclose its existence so it could be used as an espionage weapon. It only became public last month when a group called the Shadow Brokers leaked a trove of NSA hacking tools online. Microsoft had fixed the bug back in March, but it had been present in all versions of Windows for years, and many companies were caught off-guard because they didn’t realize the potential impact and didn’t deploy the patch on their systems.

This is a protocol, mind you, that was designed back in the eighties when computer networks were still few and far between, was first shipped in 1990, was standardized as part of the CIFS protocol later in the nineties, and has been used for the past twenty-five years for file sharing in every Windows and Windows-compatible product in the world.

Now you take one of these servers that, benignly, implements this file sharing protocol so you can… guess what… share files across a supposedly secure local area network, you add a pinch of magic dust and send it a really screwy malformed request, one that no sane human being would ever send in a reasonably written piece of software. This malformed request, in turn, triggers a bug in the implementation of the SMB protocol that allows the caller to gain supervisor access to the system. Game over. You can encrypt all my data behind my back and ask for ransom to release it.

Microsoft’s Brad Smith immediately blogged about the need for all parties to share this kind of vulnerability information in order to secure software. It is inconceivable to me, knowing what I know about the teams and process at Microsoft, that they would not have fixed this bug had they known about it. I am not here to apologize for Microsoft or the Windows team or the SMB protocol or the history of computer science. I’m here only to say simply that more such bugs will be found in the future, for the simple reason that “we didn’t know what we didn’t know back then” and it’s crazy to continue to depend on such software in today’s world where billions of people are connected to the internet, where nefarious actors abound, and where automated tools can be used to sniff out vulnerabilities.

We spent years designing this software. We spent years testing it. We spent years standardizing it in cross-industry committees and sharing it with partners. We spent years building a community around a protocol that is supported by millions of servers around the world. Our goals at the time were primarily interoperability, usability, and compatibility. We even spent thousands of man years fuzz testing the APIs to make sure attackers couldn't trigger vulnerabilities in the code. We used specialized tools that generated all kinds of random patterns in the arguments and we worked hard with the community of white hat security experts around the world to, responsibly, document and fix security related bugs in all our software.

But guess what. No one tried this particular random pattern of bits - except the NSA. And they chose to keep it to themselves because they felt they could use it to spy on people. That's the story as I've seen reported. Feel free to correct me if you have other data.
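To make the point about fuzz testing concrete, here is a toy sketch in Python. It is emphatically not SMB code and not the tooling we used: the request layout, the opcode and flag values, and the planted bug are all made up for illustration. It shows how a random fuzzer can pound on a parser without ever stumbling onto the one combination of fields that matters, while someone who already knows the pattern hits it on the first try.

```python
import random
import struct

# Hypothetical wire format (an assumption for illustration, not SMB):
# 16-bit opcode, 16-bit flags, 32-bit declared payload length, little-endian.
HEADER = struct.Struct("<HHI")


def parse_request(packet: bytes) -> str:
    """Toy parser with a deliberately planted bug on one rare code path."""
    if len(packet) < HEADER.size:
        return "rejected: too short"
    opcode, flags, declared_len = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:]
    # Planted bug: for one specific opcode/flags pair the declared length
    # is trusted instead of being checked against the real payload size.
    if opcode == 0x00A5 and flags == 0x8001:
        if declared_len > len(payload):
            return "BUG: declared length trusted beyond end of buffer"
        return "ok"
    if declared_len != len(payload):
        return "rejected: length mismatch"
    return "ok"


def random_fuzz(trials: int, seed: int = 0) -> int:
    """Throw random byte strings at the parser and count bug hits."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        size = rng.randint(0, 32)
        packet = bytes(rng.getrandbits(8) for _ in range(size))
        if parse_request(packet).startswith("BUG"):
            hits += 1
    return hits


if __name__ == "__main__":
    # A purely random packet matches the magic opcode/flags pair with
    # probability of about 1 in 2**32, so hits are extremely unlikely here.
    print("random fuzz hits:", random_fuzz(10_000))
    # Someone who knows the pattern triggers the bug immediately.
    crafted = HEADER.pack(0x00A5, 0x8001, 4096)
    print("crafted packet:", parse_request(crafted))
```

Real fuzzers are far smarter than this (they mutate valid requests and use code coverage to guide the search), but the asymmetry is the same: the defender has to cover every path, while the attacker needs only one.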

Note that “automated updates” (a la Windows Update) are not a solution. Unlike consumers, most organizations around the world spend months retesting Microsoft patches after they are released in order to make sure they don’t break compatibility with business critical applications, then they spend several more months rolling them out through their complicated networks of thousands of servers. The very same corporations and entities who are the slowest to adopt released security patches are the ones most in need of them: the ones that are highly regulated, fairly antiquated in their processes, and entirely unprepared to deal with a global security event of these proportions.

To me, this is the last nail in the coffin of onprem shrink-wrapped software and the reason more and more services will move to a SaaS delivery model. I’ve blogged several times in the past about the public cloud (on private vs. public clouds, on the death of onprem infrastructure and its rebirth in the cloud, and on the architectural advantages of the public cloud). I hope WannaCry will serve as a wakeup call for all those continuing to depend on onprem shrink-wrapped software.

Much will be written about this event and how it could have been avoided or more quickly remedied. But the real answer is much simpler than all that, so I'll spell it out. We didn't know what we didn't know back then. You are likely to continue to find more bugs - not just in SMB, but also in the millions of lines of operating system code written over the past few decades that are running our businesses today. And the juiciest bugs will be hoarded by hackers and used to wreak even more havoc on our systems. The real problem is that this is a broken model of service delivery, as it relies on local system administrators or, worse, government bureaucrats to decide when to install a patch. We’ve just seen an example of what that means in real life.

So the hackers will keep finding the bugs, knowing that inertia is in their favor. And they will hide them from others - so they can weaponize them, so they can monetize them, so they can benefit from them. Think about that. It's human nature. And we are all in denial of it. The motive - industrial, government, or criminal espionage - is almost secondary in nature.

The days are gone when it made sense to have so much device specific code running onprem. The pipes are so much fatter and faster these days that the same services can be offered much more securely from the cloud. The more code you have on your system, the more “attack surface”. The more compatibility you offer with legacy systems, the more successful you are as a platform with the onprem software delivery model, the longer the tail of companies that will be at risk of exposure for years to come. As an industry, we figured all this out a while ago and moved to the cloud as a much more robust and supportable service delivery model, but the rest of the world hasn't caught up with that model yet. They're still running 1990s-era software. Legacy is a bitch.

We can sit here and blame Microsoft but that would be a mistake. It's true that every one of the thousands of eyeballs that looked at that particular piece of code didn't notice that it would misbehave in a peculiar way when handed parameters that it was never designed to handle. Some smart kid somewhere figured it out and it became weaponized. Trust me, there are many other such pieces of code out there. You and I and the rest of the world will pay the price for the next two decades, guaranteed. That's how long it takes to replace these systems in regulated industries. Did I mention that this particular version of the protocol was officially deprecated by Microsoft four years ago exactly because it was known to have fundamental security flaws in the design? Not that it matters. As became obvious last week, hundreds of thousands of businesses were still depending on it to run their applications.

The cloud model of service delivery, where the vast majority of the code runs in the cloud and is always up to date and the most recent version, conceptually bypasses all of these operational problems. If the code is running on our servers in the cloud, instead of on your servers onprem, it's so much easier to patch problems quickly before they become a liability. And trust us; we know how to better manage and patch and upgrade the servers running the code. Better than you, Mr. Hospital in the U.K., anyway.

Fundamentally more coherent and elegant architectural solutions have evolved over the past two decades that cleanly address most, if not all, of the security concerns we deal with every day in an enterprise context. Yet we continue to rely on twenty-year-old technology and complain vociferously as it fails to stand up when measured against our latest requirements and innovations. Continuing to run ancient software in today's hyper-connected world is akin to riding a horse and buggy down the freeway, complaining that it can't keep up with the neighbor’s latest Google-controlled autonomous vehicle, and blaming the poor horse when its knees buckle under the pressure.

If you think your particular application isn't offered over the web as a service, I urge you to do another Google search. Meanwhile, depending on software designed thirty years ago, implemented twenty years ago, and deprecated ten years ago to run your business, and trusting government bureaucrats to know when and how to maintain those systems, is a recipe for disaster. It is naive and it is irresponsible in the world we live in.

WannaCry is just the first of many. There will be more and they will be worse. I'm sure of it.
