For the second time in less than two weeks, a set of data released by the Australian government has been taken offline over fears it wasn't securely anonymized, posing a possible privacy risk.
The data comprises a census of Australian federal employees that was conducted over May and June by the Australian Public Service Commission. The online survey, which gauges opinions about a range of aspects concerning public service employment, is administered by an employee research company, ORC International.
This was the fourth year the survey was conducted, covering 105 agencies and almost 97,000 respondents. According to a fact sheet, the data is aggregated and is supposed to be scrubbed so that no response can be traced back to an individual.
Numeric Code Problems
Names are not collected as part of the census. But five questions are mandatory: gender, age, workplace location and two questions related to the respondent's civil service employee classification, which is similar to a rank. The census itself is voluntary, but employees are encouraged to complete it.
There was one significant change to the federal census this year: Individual federal agencies were identified by a numeric code. The data was published on a website that's part of a large initiative to make government-collected information more accessible to the public.
But the census data was removed from that website after concerns the numeric code could be used, in part, to link responses back to individual public servants, according to The Canberra Times.
In a statement provided to Information Security Media Group, the APSC asserted it was incorrect to call the incident a data breach. It maintained that it did not publish individually identifiable information, and that no individual could be identified with certainty. Still, the agency felt it was necessary to remove the data pending a review.
"We decided that extra care should be taken to ensure individual officers could not be inadvertently identified if cross-referenced with a range of other publicly available data," according to a statement.
Linkage Attacks
The APSC declined to make officials available for an interview, so it wasn't possible to learn more technical details about what errors the agency may have made in anonymizing data. The Canberra Times noted the data had been downloaded 58 times before it was taken offline.
Anonymizing large data sets is a complex problem. The worry is that information left in a data set could be combined with publicly available clues to reveal, or at least support a good guess at, the data that has been masked. Such techniques are known as linkage attacks.
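To make the idea concrete, here is a minimal Python sketch of how a linkage attack might work in principle. It is not a reconstruction of the APSC data or of any method used against it: the field names, the sample records and the "public profile" are all hypothetical, loosely modeled on the mandatory census questions plus an agency code. The sketch counts how many records share each combination of attributes, then matches outside information against any combination that appears only once.

```python
from collections import Counter

# Hypothetical attribute names, loosely modeled on the mandatory census
# questions plus the agency code. These are assumptions for illustration only.
QUASI_IDENTIFIERS = ("agency_code", "gender", "age_band", "location", "classification")


def combo(record, keys=QUASI_IDENTIFIERS):
    """Return the tuple of attribute values that could help single out a record."""
    return tuple(record[k] for k in keys)


def risky_groups(records, k=2, keys=QUASI_IDENTIFIERS):
    """Return attribute combinations shared by fewer than k records."""
    sizes = Counter(combo(r, keys) for r in records)
    return {c: n for c, n in sizes.items() if n < k}


def candidate_links(records, public_profiles, keys=QUASI_IDENTIFIERS):
    """Pair each public profile with a survey record whose combination is unique.

    A pair is only a candidate match: the person may not have responded,
    or a non-respondent may share the same attributes.
    """
    unique = {c for c, n in risky_groups(records, k=2, keys=keys).items() if n == 1}
    by_combo = {combo(r, keys): r for r in records if combo(r, keys) in unique}
    return [
        (p["name"], by_combo[combo(p, keys)])
        for p in public_profiles
        if combo(p, keys) in by_combo
    ]


# Entirely made-up sample data.
survey = [
    {"agency_code": 17, "gender": "F", "age_band": "40-44", "location": "Canberra",
     "classification": "EL1", "answer": "dissatisfied"},
    {"agency_code": 17, "gender": "M", "age_band": "30-34", "location": "Canberra",
     "classification": "APS6", "answer": "satisfied"},
    {"agency_code": 17, "gender": "M", "age_band": "30-34", "location": "Canberra",
     "classification": "APS6", "answer": "neutral"},
]
public = [
    {"name": "Hypothetical Person A", "agency_code": 17, "gender": "F",
     "age_band": "40-44", "location": "Canberra", "classification": "EL1"},
]

for name, record in candidate_links(survey, public):
    print(name, "may have answered:", record["answer"])
```

In this toy example, the first record is the only one with its combination of attributes, so anyone who knows a colleague fitting that description has a plausible guess at that person's answers. The two remaining records share a combination, which is why they are not flagged. Real data sets are far larger, but the more attributes that are published, the more combinations tend to shrink to a single record.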
Australia's Department of Health recently encountered the same problem. The agency published a large data set covering 30 years' worth of medical and pharmaceutical claims for about 10 percent of Australia's population (see Australian Health Breach Exposes Danger of 'Anonymous' Data).
In that incident, the ID numbers for patients and for medical service providers were encrypted. But researchers at the University of Melbourne found that a weak algorithm had been used to encrypt the service provider IDs. They cracked the service provider ID codes, which are the same as those that appear on invoices that go to Medicare, Australia's national health service.
The data set was taken offline pending a review and an investigation by Australia's Privacy Commissioner. The government also quickly responded, amending the Privacy Act 1988 to make it an offense to de-anonymize government data.
But as most technologists know, illegality rarely deters hackers. Also, once insecure data has been released, it's impossible to pull it back. The APSC says it expects the data to be re-released in the next week.