Posted by Martijn Grooten on Feb 23, 2017
Researchers from Google and CWI Amsterdam have created the first publicly known SHA-1 collision.
SHA-1 is a hashing algorithm: it turns data of arbitrary size (such as a string of text, or a file) into a fixed-length string, with a number of cryptographic properties. Hash functions are ubiquitous in IT in general and security in particular. For example, a security product could whitelist (or blacklist) hashes of files that it has seen before, and digital signatures of documents and files are created by cryptographically signing the hash value of the document.
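As a minimal sketch of both points, the following uses Python's standard `hashlib` module to show that inputs of very different sizes produce digests of the same length, and how a product might look up file hashes it has seen before. The names `sha1_hex`, `known_hashes` and `has_been_seen` are hypothetical and chosen only for illustration.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as 40 hex characters (160 bits)."""
    return hashlib.sha1(data).hexdigest()

# Hypothetical set of hashes of files the product has seen before.
known_hashes = {sha1_hex(b"trusted installer contents")}

def has_been_seen(data: bytes) -> bool:
    """Look up the data by its hash rather than by its full content."""
    return sha1_hex(data) in known_hashes

# Inputs of 3 bytes and of 1,000,000 bytes both yield 40 hex characters.
short_digest = sha1_hex(b"abc")
long_digest = sha1_hex(b"x" * 1_000_000)
print(short_digest, len(short_digest))
print(long_digest, len(long_digest))
```

A lookup like this is only as trustworthy as the hash function: if two different files can share a hash, the product cannot tell them apart.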
What the researchers have done is to find a way to create two PDF documents that differ in an essential way, but whose SHA-1 value is the same. They haven't yet released their algorithm for doing this, but adhering to Google's standard 90-day disclosure policy, they will do so in May 2017. At the equivalent of 110 years of GPU computation, the attack is not cheap, but it is not beyond the reach of a moderately funded adversary.
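Anyone handed a pair of suspect files can check for such a collision themselves. The sketch below is one way to do it, under the assumption that a differing SHA-256 digest is good enough evidence that the contents differ; the function names `digest` and `is_sha1_collision` are hypothetical.

```python
import hashlib

def digest(path, algorithm):
    """Hash a file in chunks so that large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_sha1_collision(path_a, path_b):
    """True if the two files share a SHA-1 digest but not a SHA-256 digest.

    Assumption: differing SHA-256 digests are taken to mean the contents
    differ, so identical files are not reported as a collision.
    """
    same_sha1 = digest(path_a, "sha1") == digest(path_b, "sha1")
    same_sha256 = digest(path_a, "sha256") == digest(path_b, "sha256")
    return same_sha1 and not same_sha256
```

For any two ordinary files this returns False; it would return True only for a pair like the researchers' two PDFs.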
The SHA-1 algorithm has long been known to be less than ideal, and today's announcement shouldn't come as a surprise to anyone. Nevertheless, the algorithm remains widely used.
It is important to realise that the collision and the soon-to-be-published algorithm do not mean that SHA-1 can be broken in every possible use case. In particular, pre-image attacks (where, given a hash value, one is able to create a byte string with that same hash value) aren't possible: to create a collision, the adversary still needs to have quite a lot of control over the content of both byte strings.
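The gap between the two kinds of attack can be illustrated with a toy example. The sketch below is not the researchers' method: it runs generic brute-force searches against SHA-1 truncated to 16 bits (an assumption made purely so that the searches finish quickly), and the names `toy_hash`, `find_collision` and `find_preimage` are hypothetical. For an n-bit hash, a generic collision search needs on the order of 2^(n/2) attempts, whereas a generic pre-image search needs on the order of 2^n.

```python
import hashlib
from itertools import count

TRUNCATED_BYTES = 2  # keep only the first 16 bits of each SHA-1 digest

def toy_hash(data: bytes) -> bytes:
    """A deliberately weak hash: SHA-1 truncated to 16 bits."""
    return hashlib.sha1(data).digest()[:TRUNCATED_BYTES]

def find_collision():
    """The attacker picks both messages and stops at the first repeated digest."""
    seen = {}
    for i in count():
        msg = b"msg-%d" % i
        h = toy_hash(msg)
        if h in seen:
            return seen[h], msg, i + 1  # two distinct messages, attempts used
        seen[h] = msg

def find_preimage(target: bytes):
    """The digest is fixed in advance, so every guess must match that one value."""
    for i in count():
        msg = b"guess-%d" % i
        if toy_hash(msg) == target:
            return msg, i + 1  # matching message, attempts used

first, second, collision_attempts = find_collision()
target = toy_hash(b"a document the attacker does not control")
preimage, preimage_attempts = find_preimage(target)
print("collision after", collision_attempts, "attempts")
print("pre-image after", preimage_attempts, "attempts")
```

At 16 bits, a collision is expected after a few hundred attempts and a pre-image after tens of thousands. At SHA-1's full 160 bits the same generic bounds are about 2^80 and 2^160, which is why a practical collision attack says little about the feasibility of pre-image attacks.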
Even in TLS certificates, possibly the best known case in which until recently SHA-1 hashes were widely used, there are a number of mitigations in place that make the creation of a "rogue certificate" likely still to be some time off. For SHA-1's predecessor, MD5, it took eight years from the first discovered collision before a powerful adversary was able to create a rogue certificate that was used by the Flame malware, and then it was only thanks to some mistakes made by Microsoft.
So there's no need to panic: the Internet isn't suddenly broken, and it may take some time before an adversary is able to take advantage of this new development in the wild. But it will happen eventually, and we should make sure we get rid of the use of SHA-1 everywhere, without spending too much time trying to determine the likelihood of attacks in each particular use case.
While we're at it, we should stop using MD5 as well.