Earlier this year the US Federal Trade Commission (FTC), which handles antitrust and consumer protection enforcement in America, issued a blog post warning that “hashing doesn’t make your data anonymous”. In the post, the FTC outlined its view that hashing, a technique used to anonymise data, isn’t effective enough to be valid in cases where data anonymisation is a legal requirement. “Companies should not act or claim as if hashing personal information renders it anonymised,” the FTC warned. “FTC staff will remain vigilant to ensure companies are following the law and take action when the privacy claims they make are deceptive.”
The post caused a stir in the ad industry. Hashing is used in a lot of identity solutions as a means of protecting consumer data, and is often touted as a means of anonymising data. So what does the FTC’s statement mean for advertising?
What is hashing?
Hashing is a technique for turning a piece of data into a random-looking string of characters, by passing it through a mathematical function (the hash). Hash functions are designed so that their output appears random and unpredictable, yet is deterministic: given the same input, they will always produce the same output. Importantly, the ‘inverse function’ (which would let you reverse a hash, taking the random-looking string of characters and recovering the original piece of data) is unknown, and isn’t feasible to find.
The idea behind hashing is that it takes a piece of data which might be sensitive, or might be used to recognise an individual (like an email address), and turns it into a bunch of numbers and letters (an example given by the FTC is b0254c86634ff9d0800561732049ce09a2d003e1).
So if an ad tech company wants to use email addresses as an identifier, but doesn’t want to send a list of email addresses through its supply chain, it can hash those email addresses, and then use those hashed email addresses instead. They’re still useful as an identifier, since they’re consistent – every time the ad tech business hashes an email address, it will get the same output. But other companies who handle that identifier can’t see the original data point.
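In code, that idea looks something like the following minimal sketch, using Python’s standard-library hashlib. The choice of SHA-256 and the normalisation step (lowercasing and trimming whitespace) are illustrative assumptions – real identity solutions vary in which hash function and which normalisation rules they use.

```python
import hashlib

def hash_email(email: str) -> str:
    """Normalise an email address, then hash it with SHA-256.

    Normalisation matters: without it, 'User@example.com' and
    'user@example.com' would produce completely different hashes.
    """
    normalised = email.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

# The same input always yields the same output, so the hash
# works as a consistent identifier.
print(hash_email("user@example.com"))
print(hash_email("user@example.com") == hash_email("User@Example.com "))  # True
```

Because the output is stable, every company in the chain can match records on the hash without ever seeing the underlying address.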
The ad tech company itself can keep a database of the email addresses it has access to and their respective hashes, making it easy to link the hashed email back to the original email address (and thus link it to any other data linked to that email address). However other businesses are unable to do so.
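That asymmetry – the original company can reverse its own hashes, but outsiders can’t – comes from a simple lookup table, sketched here under the same illustrative SHA-256 assumption as above:

```python
import hashlib

def hash_email(email: str) -> str:
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# The company holding the raw emails keeps a reverse index,
# mapping each hash back to the address it came from.
known_emails = ["alice@example.com", "bob@example.com"]
reverse_index = {hash_email(e): e for e in known_emails}

# When a hashed identifier comes back through the supply chain,
# the company can recover the original address instantly...
incoming = hash_email("alice@example.com")
print(reverse_index.get(incoming))  # alice@example.com

# ...while a partner without the raw email list has no such table.
```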
Making a hash of things
The FTC has two primary issues with hashing. The first is that there actually are ways to find the original data point from its hash (for example, to figure out an email address from its hash). As stated, a hashing function will always create the same output from the same input. So bad actors can find the original data point through brute force.
Imagine a company received a hashed email address and wanted to find out the original email address. It could run many, many possible email addresses through the original hash function, and compare the outputs to the hashed email address it holds. When it finds a match, it simply sees which input produced that matching output – that’s the original email address.
This sounds like a lot of work, and it also requires the attacker to know which hash function was used. But because designing a secure hash function is so complex, only a relatively small number of them are in common use (and the bad actor may well know which one a given company uses).
And it’s actually very quick to calculate a large number of hashes, especially when you know the format the data point is likely to take. The FTC gave the example of a social security number – there are only a billion possible SSNs, and a bad actor could hash every possible SSN “in less time than it takes you to get a cup of coffee”.
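The brute-force attack described above can be sketched in a few lines. This is a toy illustration, not a real attack: the candidate list, the target address, and the SHA-256 function are all assumptions, and a real attacker would draw candidates from leaked lists of genuine email addresses.

```python
import hashlib

def hash_email(email: str) -> str:
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# The hash the attacker intercepted (they don't know the address behind it).
target = hash_email("jane.doe42@example.com")

# Hash every candidate and compare against the target. In practice the
# candidate pool would be millions of real addresses, but hashing is so
# fast that this barely slows the attack down.
candidates = [f"jane.doe{n}@example.com" for n in range(100)]
recovered = next((c for c in candidates if hash_email(c) == target), None)
print(recovered)  # jane.doe42@example.com
```

The same logic applies to the FTC’s social security number example: with only a billion possible inputs, exhaustively hashing all of them is trivial on modern hardware.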
There are techniques which can be used to strengthen hashing – the best known of which is called ‘salting’. Salting adds random extra data to the original data point, thus changing the hashed output.
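A simplified sketch of salting, continuing with the same illustrative SHA-256 setup – real systems differ in how the salt is generated, stored, and combined with the input:

```python
import hashlib
import secrets

def salted_hash(email: str, salt: bytes) -> str:
    """Prepend a salt to the normalised email before hashing."""
    normalised = email.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalised).hexdigest()

# A random salt, which must be kept secret from would-be attackers.
salt = secrets.token_bytes(16)
print(salted_hash("user@example.com", salt))

# Without knowing the salt, an attacker can no longer precompute hashes
# of candidate emails: the same address under a different salt produces
# an entirely different output.
```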
However, in a comment underneath an older blog post on hashing, the FTC’s chief technologist at the time agreed with other commenters that salting is not rock solid itself, and thus not enough to guarantee data privacy.
Strictly anonymous
The FTC’s second issue with hashing is much simpler. One of hashing’s strengths is that it creates a stable identifier which can be passed between different actors without revealing the original data point (barring the methods just described). However, that very stability means it can still be used by third parties to track and monitor an individual. They may not know anything about that individual, bar what they see through monitoring the hashed identifier. But they’re still able to monitor that individual, through an identifier which is uniquely assigned to them.
Crucially on this count, the FTC appears to take a pretty strict view on anonymity.
It is sometimes assumed that data is anonymous so long as it can’t be tied back to personally identifiable information – for example a person’s name, phone number, email address, physical address.
But the FTC says that “data is only anonymous when it can never be associated back to a person. If data can be used to uniquely identify or target a user, it can still cause that person harm.” The emphasis on ‘or’ is our own – this ‘or’ suggests that even an identifier which can’t be tied back to any PII is still not considered anonymised if it can be used to target an individual.
It’s important to note too that these principles don’t only apply to hashing. The FTC made explicit in the blog post that it is looking at other anonymisation techniques too, applying the same high standards.
What does this mean for ad tech?
There are clearly implications for advertising here. One of the cases listed by the FTC in its blog post was a complaint filed against Premom, which shared advertising and device identifiers that were then used both to track individuals and to infer their identities.
The FTC’s declaration doesn’t completely invalidate the use of hashing within identity solutions. The US has less strict laws around data privacy than Europe, and businesses aren’t necessarily required to completely anonymise data which they share with third parties.
But the FTC does require that businesses fulfil any privacy claims they make. Thus any company promising users that their data is anonymised should ask themselves: is there any way for that anonymised data to be linked back to an individual? And even if it can’t be linked back to an individual, can it be used to track and target an individual?
If the answer to either of those questions is ‘yes’, then that company isn’t fulfilling its promises in the eyes of the FTC, and could thus find itself in the agency’s crosshairs.