Killing Bad Encryption

If you run a website, early TLS is bad. If you run a payment service, early TLS is about to be outlawed. Read on.

You see the flowers in the post image? They represent connections to our web applications, over various shades of TLS. Some of them sadly die in this story.

Background

Transport Layer Security, or TLS, is a (relatively) modern encryption protocol, used to protect data travelling between clients and servers. It doesn't protect the data where it's stored, but it does encrypt the connection between those points. So, it sort of provides a safe tunnel between you and, say, the banking or e-commerce website you're connecting to.

Its predecessor also worked at the transport layer, but was known as Secure Sockets Layer, or SSL. In recent years, SSL was shown to be easily compromised and therefore an unsafe mechanism for protecting communications between networked 'things'.

More recently, clever folks that know their cryptographic onions identified flaws in the early 1.0 version of TLS, which meant it could no longer be trusted.

From that point, and as proofs of concept for cracking that type of encryption started emerging, firms were encouraged to end support for that particular version.

Then, along came a mandate: the Payment Card Industry Security Standards Council (the people responsible for ensuring that companies processing card payments do so securely, via the PCI DSS) weighed in with a hard and fast requirement to stop supporting SSL and TLS1.0 entirely. Originally the deadline for this was June 2016, after which continued support would represent a material breach of the standard and potentially result in a whacking set of penalties (including fines and no longer being able to process card payments).

Given that the PCI SSC made this declaration in April 2015, it was probably (no, definitely) too short notice for everyone on the planet to take appropriate steps to comply by the following June. The PCI SSC recognised this and extended the deadline out to June 30, 2018. There may have been some pressure from enterprise at play...

In any event, organisations were given an extra two years to sort out their affairs and so here we are, one and a bit months from the deadline. So, what are we doing?

It's not a simple case of switching things off

Let's be clear: the PCI DSS requirement affects systems 'in scope', by which I mean card payment systems, used either in e-commerce or cardholder-present scenarios. In other words, your payment processing websites or the point of sale terminals used in your retail outlets.

So, when those systems talk to other systems, which then depend on other systems, and when some of those systems are a decade plus old or are operated by third parties, you start to get the idea that this isn't a simple case of updating the config on a bunch of servers. No, it's a complex programme of work that will inevitably break things along the way and also leave some clients unable to make connections to our applications, because they're not cool with leaving Windows XP behind (for example). More on that later.

The practicalities

Helpfully, most of our estate is Windows-based, so removing support for early TLS is made slightly easier, simply because we're dealing predominantly with a single vendor stack. That's about the only advantage to us though.

Having said that, we have literally hundreds of applications that have been hard coded (over the years) to talk over either SSLvX or TLSv1.0. So, maybe not that easy after all.
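
For the avoidance of doubt, the sort of hard coding I mean typically looks like this in a .NET Framework app. This is a made-up snippet, not lifted from any of our actual applications:

```csharp
// Hypothetical example (not from a real app): a .NET Framework service
// pinning itself to old protocols at startup, regardless of what the
// other end is willing and able to speak.
using System.Net;

static class LegacyTlsPinning
{
    public static void ConfigureOutboundCalls()
    {
        // Every outbound HTTPS call from this process is now stuck on
        // SSL3/TLS1.0, even if both ends are perfectly capable of TLS1.2.
        ServicePointManager.SecurityProtocol =
            SecurityProtocolType.Ssl3 | SecurityProtocolType.Tls;
    }
}
```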

Some good news though: if we could massage later versions of the .NET Framework into our applications (those that don't even know what TLS1.2 is), then we could see if those applications would obey the version mandated by the server. Worth a try!

So, we looked into this and found that a pretty simple registry change would cater for the majority of cases. Yes, there were a few apps that, in code, insisted on literally forcing TLS1.0, but thankfully they were very few. We spotted and fixed those in our test environments.
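
I'll spare you our exact scripts, but the commonly documented switches for this are SchUseStrongCrypto and SystemDefaultTlsVersions under the .NET Framework registry key, which tell 4.x applications to use strong crypto and defer protocol selection to the OS. A rough sketch of that kind of change (in practice you'd push it out as a script or via Group Policy):

```csharp
// A sketch, not our actual deployment tooling: set the Microsoft-documented
// switches that make .NET Framework 4.x apps use strong crypto and defer
// protocol selection to the OS. Run elevated; Wow6432Node covers 32-bit apps.
using Microsoft.Win32;

static class NetFrameworkTlsDefaults
{
    public static void Apply()
    {
        string[] keys =
        {
            @"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319",
            @"HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319"
        };

        foreach (var key in keys)
        {
            // TLS1.2-capable defaults...
            Registry.SetValue(key, "SchUseStrongCrypto", 1, RegistryValueKind.DWord);
            // ...and let the operating system choose the protocol version.
            Registry.SetValue(key, "SystemDefaultTlsVersions", 1, RegistryValueKind.DWord);
        }
    }
}
```

For the handful of apps that pinned the protocol in code, the fix was simply to remove the pin so they pick up the same behaviour.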

Awesome, we have a plan

With increasing confidence that we wouldn't destroy everything by making this pretty trivial, yet ultimately significant, change, we applied it to all of our non-production environments. One (business critical) thing went "BANG!", but that was it, and again the reason was a directive at code level. We fixed that easily, with dev support.

We were verging on 100% confident that making these changes in live would simply work (remember, we're talking about 100+ web applications). Make that number nearer 200.

Let's do this

OK, we started with what we considered to be lower risk (not low risk; lower risk) servers. Servers we knew didn't have any critical dependencies as such. We ran our updates, tested the hell out of things and hey - things worked! Does browsing work? Does placing an order work? Well yeah! You get the idea.

Next up we delved a little deeper: application servers, web services servers - things that talk to things (some of them really old things). Again, the scripts were run, things got tested and they worked.
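
I won't share the scripts themselves, but on a Windows estate disabling the old protocols server-side boils down to the SChannel registry keys, along these lines (illustrative only, not our production tooling):

```csharp
// A sketch of the server-side change: disable TLS1.0 and TLS1.1 for inbound
// connections via SChannel. Run elevated; the change only fully applies
// after a reboot.
using Microsoft.Win32;

static class SchannelHardening
{
    const string ProtocolsKey =
        @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols";

    public static void DisableLegacyServerProtocols()
    {
        foreach (var protocol in new[] { "TLS 1.0", "TLS 1.1" })
        {
            string serverKey = $@"{ProtocolsKey}\{protocol}\Server";
            Registry.SetValue(serverKey, "Enabled", 0, RegistryValueKind.DWord);
            Registry.SetValue(serverKey, "DisabledByDefault", 1, RegistryValueKind.DWord);
        }
    }
}
```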

There were a couple of bumps, but again they were apps that intrinsically insisted on talking over bad TLS, and a code change here and there fixed that. Nothing major.

Again, testing proved that core services were available and working: can I place an order? Can I modify a service? Stuff just worked.
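
Alongside the business-level checks, it's worth probing the protocol behaviour directly. Something like this rough sketch (hypothetical host name, not our real test suite) confirms that anything below TLS1.2 now gets turned away:

```csharp
// A rough protocol probe: attempt a handshake at each TLS version against a
// host and report what the server accepts. After the change, Tls and Tls11
// should be refused and Tls12 accepted.
using System;
using System.IO;
using System.Net.Security;
using System.Net.Sockets;
using System.Security.Authentication;

class TlsProbe
{
    static void Main()
    {
        const string host = "shop.example.com"; // placeholder host name

        foreach (var proto in new[] { SslProtocols.Tls, SslProtocols.Tls11, SslProtocols.Tls12 })
        {
            try
            {
                using (var tcp = new TcpClient(host, 443))
                using (var ssl = new SslStream(tcp.GetStream()))
                {
                    ssl.AuthenticateAsClient(host, null, proto, false);
                    Console.WriteLine($"{proto}: accepted ({ssl.SslProtocol})");
                }
            }
            catch (Exception ex) when (ex is AuthenticationException || ex is IOException)
            {
                Console.WriteLine($"{proto}: refused");
            }
        }
    }
}
```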

Meanwhile, we started to see little flowers dying in our garden. They were the flowers of TLS1.0/1.1 connections no longer being attempted, while the flowers of TLS1.2 connections began to overgrow our garden.

We've still got a few servers to deal with (ones that speak to third party organisations, or deal with really legacy services), but we're confident that these will follow the same patterns we've seen with all of the others, i.e. they'll just behave.

As a bonus, because the OS across the piece is now in charge of the protocol in use, it should naturally allow us to roll over to TLS1.3 once it's properly out there. This is great.
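
For the curious, that "OS in charge" behaviour corresponds to the SystemDefault setting in .NET Framework 4.7 and later (with the caveat that TLS1.3 also needs an OS version that actually supports it). A minimal illustration, nothing here being specific to our estate:

```csharp
// On .NET Framework 4.7+ the default, SystemDefault, hands protocol
// negotiation to SChannel, so newer protocols like TLS1.3 arrive with OS
// and framework updates rather than with application code changes.
using System.Net;

static class OsDrivenTls
{
    public static void Configure()
    {
        // Equivalent to not setting the property at all on 4.7+, and to the
        // registry switches sketched earlier for apps built against older targets.
        ServicePointManager.SecurityProtocol = SecurityProtocolType.SystemDefault;
    }
}
```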

What's left

Well, quite a few things actually, and one of them is really a bit of a biggie: clients. One thing we can't control is end users and whatever dated tech they're using to try and interact with our systems.

Come the end of May, we'll cease support for TLS<1.2 on all of our public facing web applications and that's gonna cause an issue. Remember, we have to do this to maintain our PCI DSS compliance, so we don't have to apologise for it; we're protecting our customers' best interests.

But there does come a price: some random Windows XP user will whack our firm's domain into their dead version of IE, get a connection refused error and assume we've ceased trading. What can we do about that? All we can do is notify the citizens of the web publicly that we're taking their information security into account and updating our systems and infrastructure accordingly.

We're not doing this purely because the PCI Security Standards Council is telling us to.

We're killing bad encryption, because it's the right thing to do.

Update

So, today, we turned off support for TLS1.0 / 1.1 across our entire public facing web application estate. Remember, we're talking about a lot of software.

No one noticed, or at least no one got in touch with an issue.

The conclusion here is that the slow but sure modernisation of clients towards better protocols has worked, that our testing was pretty damn thorough when we made our changes server-side, and that all of this in combination resulted in a bit of an anti-climax once we finally 'pulled the lever'.

No one noticed.

Brilliant.
