July 24, 2019. RT: Online services that claim to anonymize users’ personal data aren’t as secure as we think, according to researchers who found over 99 percent of people can be identified from a handful of supposedly anonymous data points.
Doubters will be quickly silenced by the online tool the researchers developed to accompany their paper, published in Nature Communications on Tuesday. Using just three commonly requested demographic attributes (birthdate, ZIP code, and gender), the tool correctly identifies users about 83 percent of the time. With five or more data points, the machine-learning model gets it right more than 99 percent of the time.
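The intuition behind these numbers is that each extra attribute splits a dataset into smaller groups, until many people sit in a group of one. The sketch below is not the researchers' model. It only illustrates that idea by measuring how many records are unique for a chosen set of attributes, assuming a small made-up table; the records and the function name `uniqueness_rate` are hypothetical.

```python
from collections import Counter

# Hypothetical toy records; real datasets have far more rows and attributes.
RECORDS = [
    {"birthdate": "1985-03-14", "zip": "02139", "gender": "F"},
    {"birthdate": "1985-03-14", "zip": "02139", "gender": "M"},
    {"birthdate": "1990-07-01", "zip": "02139", "gender": "F"},
    {"birthdate": "1990-07-01", "zip": "10001", "gender": "F"},
    {"birthdate": "1972-11-30", "zip": "10001", "gender": "M"},
    {"birthdate": "1972-11-30", "zip": "10001", "gender": "M"},
]


def uniqueness_rate(records, attributes):
    """Return the fraction of records whose values for `attributes` occur exactly once."""
    if not records:
        return 0.0
    # Build one key per record from the chosen attributes.
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    # A record is unique (and therefore easy to link to a named person
    # via another dataset) if no other record shares its key.
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(records)


if __name__ == "__main__":
    for attrs in (["gender"], ["gender", "zip"], ["gender", "zip", "birthdate"]):
        print(attrs, round(uniqueness_rate(RECORDS, attrs), 2))
```

On this toy table, uniqueness rises from 0.0 with gender alone to 0.33 with gender and ZIP code, and to 0.67 with all three attributes. Uniqueness inside a small released sample does not by itself show whether a match is correct for the whole population; estimating that likelihood is what the researchers' model is for.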
The researchers, from Université Catholique de Louvain in Belgium and Imperial College London, wrote that these alarming results “question whether current de-identification practices satisfy the anonymization standards of modern data protection laws.” They point out that “data that does not contain obvious identifiers but might be re-identifiable” is still covered by privacy laws such as the EU’s GDPR and the California Consumer Privacy Act, which protect sensitive personal information from being shared without users’ consent. Popular anonymization techniques, such as adding noise to data or releasing only a sample of it, simply do not provide enough protection against malicious actors reverse-engineering them. That is a major problem as more and more personal information, especially health data, moves into the cloud.
“The goal of anonymization is so we can use data to benefit society,” Yves-Alexandre de Montjoye, one of the researchers, told CNBC. “This is extremely important but should not and does not have to happen at the expense of people’s privacy.”
Given how many data points online platforms hold, de-anonymization is not merely a risk but a certainty. Credit reporting agencies, for example, hoard as many as 248 separate demographic attributes about their customers. Nearly three-quarters of Americans are concerned about sharing personal information online, according to a 2015 Harvard Business Review survey. That concern is justified given the near-daily frequency of major hacks, breaches, and leaks, yet few people understand what actually happens to their data once they submit it.