#040: Making a hash of it
Let’s imagine you’re building the next new social network, and right now, you’re working on allowing users to create accounts and log in. Being a security-minded person, you know that any data you store might be exposed in some way to an attacker trying to break into your system, and you also know that most people don’t choose really good passwords to begin with, and re-use them everywhere. So simply storing a users’ password is probably not a good idea.
Luckily, mathematics has something in store for you: A one-way function, or hashing, function.
The idea is pretty simple: You create a function that takes arbitrary data, and returns a random-looking number (the hash) of that data. Your algorithm inside this function works so the number hash generated is easy to generate, but hard to do in reverse. That is, given the hash, it would take an attacker an extraordinary amount of time to find the data that was used to generate the hash.
This is perfect for your new social network! Instead of storing the password, you instead compute the hash of the password, and store that. Every time a user tries to log in providing their username and password, you again hash the given password, and compare it to the stored hash. If they match, it’s the correct password.
So, if anyone ever does get at your data, they’ll only have the (to them) useless hash, and not the users’ password.
And some time ago, this was actually all it took to make storing passwords fairly secure. But then human laziness threatened to bring all of this down.
As mentioned above, humans are not good at choosing good, random passwords. Even worse, if you have a lot of accounts, you would need to also remember a lot of passwords, and this is something else humans are not necessarily the best at.
So someone clever figured, instead of trying to reverse a hash, they could simply take a list of the most popular passwords, and compute the hash for each one of them. The result is a rainbow table. Then, whenever they got their hands on other users’ accounts, they just had to search for their hashes in the rainbow table, and voila, you now know their password.
In response, two things happened:
First, to prevent using such rainbow tables, programmers started “salting” their hashes. They generated a random string for each user, added this random string to the users’ password, and then stored the hash of the result, along with the “salt”. This doesn’t entirely preclude rainbow tables, but now someone has to re-compute the entire rainbow table using the “salt” before they can search for the hash. In combination with the second step, this makes using rainbow tables infeasible.
The second thing was developing new hashing algorithms with the explicit goal of them being slow. “Slow” in this context means still extremely fast for us humans, but now a computer trying to crack a hash would only be able to make far fewer guesses at the hash than before.
How much slower? Well, let’s put the difference into numbers, using three hashing algorithms to illustrate the difference. The first algorithm is called MD5, and is a fairly old and outdated hashing algorithm. The second one is SHA-256, a member of a fairly new and fast family of hashing functions, and finally we’ll look at bcrypt, a modern “slow” algorithm.
Using a single, modern computer with a very fast graphics card (turns out, they can’t just generate pretty graphics, but also crack hashes really fast), you can expect around 200.000.000.000 guesses per second if you’re using MD5.
With SHA-256, you can expect around 23.000.000.000 guesses per second. That’s only a factor of 10 less than MD5, so not really suitable for to use for our password storage.
Finally, with bcrypt, you could get around 100.000 guesses per second. That’s 2 million times slower than MD5! And bcrypt can actually be tuned to make it harder or easier to compute, so it is possible to lower the number of guesses per second even further, making bcrypt an excellent choice to store your salted passwords.
Still, for you as a user, the best defense against stolen accounts is to use a password manager, so you can have long, randomly generated passwords, unique to each website you have an account with. This way, even if someone cracks your password on one site, they can’t use it to access any of your other accounts.
You might also note that I called SHA-256 a modern algorithm, yet it still allowed a lot of guesses per second. Why do we still need such algorithms? Well, hashing is also useful in a lot of other areas, where people aren’t trying to crack passwords. One such application is cryptocurrencies, which we’ll look at next week.