The threat intel team at Recorded Future, a US-based cyber-security firm, claims to have identified the hacker who assembled and then sold a massive collection of email addresses and passwords known as Collection #1.
The company's experts believe a hacker who goes by the online pseudonym "C0rpz" is the person who meticulously collected billions of user records over the past three years. This includes records from companies that were hacked in the past and whose data was posted or sold online.
Recorded Future says that C0rpz is responsible not only for assembling and selling Collection #1, a data trove of 773 million unique email addresses and just under 22 million unique passwords that grabbed headlines at the start of the year, but also for many other data collections.
Researchers say Collection #1 was part of a larger package containing seven "collections" in total.
"ANTIPUBLIC #1" (102.04 GB)"AP MYR & ZABUGOR #2" (19.49 GB)"Collection #1" (87.18 GB)"Collection #2" (528.50 GB)"Collection #3" (37.18 GB)"Collection #4" (178.58 GB)"Collection #5" (40.56 GB)Of the seven, the AntiPublic collection had already leaked online and had been shared among other hackers since April 2017. The rest appear to be new items, that hadn't been seen online until this month.
In total, these databases appear to contain more than 3.5 billion user records, in combinations such as email addresses and passwords, usernames and passwords, and cell phone numbers and passwords.
Recorded Future says C0rpz sold this data to other hackers, who are now disseminating it for free via online sharing portal MEGA and via torrent magnet links.
Some of the hackers who bought this data from C0rpz are Sanix, another hacker who infosec journalist Brian Krebs first identified as the source of Collection #1, and Clorox, the person who initially shared Collection #1 for free on Raid Forums at the start of the month, inadvertently exposing this huge data trove to security researchers and journalists.
"Neither of three actors has ever been on our radar," Andrei Barysevich, Director of Advanced Collection at Recorded Future, told ZDNet in an email today. "However, we did find a previous online footprint on all actors, which does not suggest that these actors are sophisticated."
Barysevich also told ZDNet that his team didn't find "any proof" that the three named actors, including C0rpz, are hackers responsible for actual breaches at any company.
"We believe they have merely aggregated the data over the time," Barysevich told us.
But Recorded Future experts aren't 100 percent sure of their attribution of these data collections to C0rpz, as no attribution that involves self-aggrandizing, boastful hackers can ever truly be 100 percent certain. Experts are also looking into another possible source of the leak, which they have not yet named.
"On January 10, 2019, an actor on a well-known Russian-speaking hacker forum posted both a magnet link and a direct download link to a database containing 100 billion user accounts hosted on a personal website," Recorded Future said in a report published earlier today. "The following week, the actor made clear that the data dump referenced in Troy Hunt's [Collection #1] article was included in their dump as well."
To be fair, it doesn't really matter who assembled, sold, or shared this data in the end. All of this data had already been available for years. The difference is that in the past, it was shared in individual packages, one per site of origin.
Only recently has it become a trend for data hoarders (hackers who collect data from hacked sites) to assemble these smaller leaks and breaches into gigantic packages.
The trend emerged because more and more companies are getting hacked, so the value of any individual leak has dropped. Data sellers adapted and started merging leaks together to continue to make a profit.
There are likely hundreds of similar mega-packages being shared on hacking forums out of the public eye as we speak, which have not yet seen the light of day.
Eventually, they will. When that happens, cyber-crime groups will collect these aggregated leaks, extract any new user records they don't have, and use this information to spam our email inboxes, attempt brute-force attacks against our online accounts, or, even worse, use these details for extortion or financial fraud.
It is highly likely that most of our data has already leaked online by now. All we, the users, can do is protect our accounts with strong passwords that are unique per site, enable multi-factor authentication wherever possible, and avoid entrusting our data to any company that asks for our details for no good reason.
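For readers who want to see what "strong and unique per site" can look like in practice, here is a minimal Python sketch. It is not a recommendation from Recorded Future or a substitute for a proper password manager. The function names, the 20-character default, the 12-character minimum, and the example site labels are all assumptions made for illustration.

```python
import secrets
import string

# Character pool for generated passwords (an illustrative choice).
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Return a random password drawn from a cryptographically secure source."""
    if length < 12:
        # The 12-character floor is an assumption, not an official standard.
        raise ValueError("use at least 12 characters")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for_sites(sites: list[str], length: int = 20) -> dict[str, str]:
    """Generate an independent password for every site, so one leak cannot unlock another account."""
    return {site: generate_password(length) for site in sites}


if __name__ == "__main__":
    # Hypothetical site labels; real passwords belong in a password manager.
    for site, password in passwords_for_sites(["example-mail", "example-bank"]).items():
        print(site, password)
```

Because each password is generated independently, a password exposed in one site's breach reveals nothing about the passwords used elsewhere, which is the property that makes aggregated dumps like these far less useful to attackers.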
Now, if we could only get journalists to stop blowing these "collections" out of proportion every time one of them surfaces online.