The Joy of GUIDs

Like many developers, my first experience of primary keys was the Autonumber field in MS Access. At the time, this looked like a pretty neat idea. When I outgrew Access and moved on to using SQL Server, I adopted the convention of an identity(1,1) field as a primary key - a pattern that's used extensively both within my own code and throughout the industry.

Recently, though, I've been working on big distributed systems where the identity field really doesn't work too well. The biggest drawback is that if you create a new entity in any of your data stores, you need to wait for the response from the database before you can find it again. For RESTful systems where your Create method might just return an HTTP 202 "Accepted for processing" and leave you to work out whether your request eventually succeeded or not, this approach just plain doesn't work. Enter the GUID - globally unique identifier. The idea is that every GUID ever created is completely unique - it uses various system/hardware IDs, the system clock and a degree of randomness to create unique 128-bit values. The lovely thing about working with GUIDs is that when you create your new customer, or order, or invoice, YOU can assign the ID that will identify the object for the rest of its life.

Thing is, working with GUIDs is very, very different to working with ints. For starters, a lot of security practises around integers just don't apply.

A great example is URL-hacking. If you've ever seen an address in your browser's address bar that looks like:

http://www.myshop.com/basket.asp?basketid=2789

Try editing it. Try changing the query string to be basketid=2788 - on many (poorly written!) systems, that'll show you what's in someone else's basket. It doesn't take a lot of work to write a little script that'll GET all the basket IDs from 0-999999 and voila! you've got a folder full of other people's shopping baskets. Even if the developers do something terribly clever - like using identity(27897,17) instead of identity(1,1) - if you can do ten HTTP requests per second, you can still sweep the entire ID range from 0-1,000,000 in just over 24 hours. Sure, only 1 in 17 of your requests will succeed - but who cares? That's still 58,823 compromised shopping baskets.

Now, imagine instead of sequental integers, you use a GUID:

http://www.myshop.com/basket.asp?basketid=fa8f6423-e7d6-ff40-7afc-ae874c755c00

Try hacking it. It won't work. There are 3.4×1038 possible GUIDs, and roughly seven billion people on earth. If every single person on the planet used your website every day, at the end of a year you'd have 10,000,000,000 * 365 shopping baskets in your system. Let's see how long it would take an attacker to steal those baskets by URL-hacking:

Number of possible basket IDs Bp = 3.4x10^38

Number of actual basket IDs Ba = 3.65x10^10

Likelihood of a randomly chosen GUID finding a shopping basket = Bp/Ba = 1.0x10-28

Let's assume the website we're attacking is running on some serious distributed hardware so it'll remain responsive throughout our attack, and you have a massive botnet so you can check baskets until the server starts to struggle under the load. Facebook manages 6 million page-views per minute, or 100,000 requests per second. At 100,000 requests per second, it will take you (on average) 2,953,788,842,323,276 years to find a single shopping basket by URL-hacking.  Given those odds, a lot of security concerns around vulnerabilities like URL-hacking just disappear.

I've sat through numerous conversations that get onto "what'll happen if we get a GUID collision?"

There's really only one sensible answer to this: You'll find the place in your code where you're accidentally re-using the same GUIDs, and fix the bug. GUIDs don't collide - you're doing it wrong. If you're using Guid.NewGuid() or NEWID(), you won't get collisions. It just won't happen. There isn't even any meaningful analogy with daily human experience that can convey just how unlikely a GUID collision is. Things with a probability of 1.0x10-28 just don't happen.