We’ve grown accustomed to seeing data leaks on a daily basis, but every now and then one of them is a spectacular doozy. With 4 billion user accounts involved, all on one server, the sheer number of records could populate a small galaxy. The only fortunate thing, if anything fortunate can be said about this or any data breach or leak, is that the data itself was not as critically personal as it often is: no Social Security numbers, credit card data or passwords were in this leak.
The data this time is more social in nature. It affects Facebook, Twitter and LinkedIn profiles and includes cell phone numbers, home phone numbers, email addresses, work histories and other profile information. Four terabytes of personal information was exposed. Some of the records are duplicates, so the number of unique users affected is lower, at over 1.2 billion, which still ranks this as one of the largest data leaks ever.
Dark web researcher Vinny Troia discovered the exposed Elasticsearch server on October 16 while searching for other leaks with colleague Bob Diachenko.
Huge Number of Records Exposed; No One Knows Where They Came From
The data appears to have mixed origins, so its source isn’t clearly identifiable yet. Troia traced three of the four datasets to San Francisco data broker People Data Labs (PDL). PDL advertises on its own website data on 1.5 billion people, 260 million of whom are in the US. Among the data it promotes are over a billion personal emails, Facebook URLs and IDs, 420 million LinkedIn URLs and 400 million personal phone numbers (200 million in the US). However, a PDL cofounder states that PDL does not own the server that held the exposed data. Researchers believe this is likely true, though they can’t yet identify how the leaked data got there.
A fourth dataset is tagged OXY, likely referring to Oxydata, a company based in Wyoming. This dataset represented 380 million profiles of consumers and employees across 85 industries in 195 countries.
A Huge Data Leak Among Many
This leak ranks alongside other mega leaks and breaches. In March of this year, Troia and Diachenko discovered another 809 million exposed records from Verifications.io. In 2018, marketing firm Exactis leaked 340 million personal records, and Apollo exposed billions of data points.
The Elasticsearch server holding the records of all 1.2 billion people in this leak was unguarded and could be accessed with a browser at http://22.214.171.124:9200. Anyone visiting that address could reach the data without a password or any other form of authentication or access restriction.
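To make the failure concrete, here is a minimal defensive sketch in Python that sends a request with no credentials to an Elasticsearch endpoint you administer and reports whether the server answered anyway, which is exactly the condition described above. The function names and the "open"/"protected"/"unknown"/"unreachable" labels are hypothetical, and treating 401/403 as "protected" assumes some form of authentication is enabled in front of the cluster.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map the HTTP status of an unauthenticated request to a (hypothetical) verdict."""
    if 200 <= status < 300:
        return "open"  # the server answered without any credentials
    if status in (401, 403):
        return "protected"  # the server demanded or refused credentials
    return "unknown"  # anything else needs a human to look at it


def check_unauthenticated_access(base_url: str, timeout: float = 5.0) -> str:
    """GET base_url with no Authorization header and classify the response."""
    request = urllib.request.Request(base_url, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; err.code holds the status
        return classify_status(err.code)
    except OSError:
        # URLError (DNS failure, refused connection) and socket timeouts land here
        return "unreachable"
```

For example, `check_unauthenticated_access("http://localhost:9200")` run against your own cluster should return "protected"; an "open" result means anyone who finds the address can read the indexes, as happened here. Only point it at servers you own or are authorized to test.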
The different Elasticsearch indexes (databases) on the exposed server
Data Enrichment Companies Played a Role
Data enrichment companies played a role in how users’ social profiles ended up in this leak. These companies take a single piece of information, such as an email address, and attach additional (“enriched”) information to it. Their services are inexpensive and can expand a user profile considerably, by up to hundreds of new data points, including household, financial, income, political and religious information.
No one oversees the resulting profiles, which leaves the door open for anyone to access a person’s personal and social information easily.
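The enrichment step amounts to a join on a shared identifier. The minimal Python sketch below assumes each source is a dictionary keyed by email address; the function name, field names and sample values are hypothetical and are not taken from PDL, OXY or any real dataset. It shows how one identifier grows into a richer profile as more sources are merged, which is why stacking datasets from several brokers on one server compounds the exposure.

```python
from typing import Dict, List

# A profile is a flat mapping of field name -> value (hypothetical schema).
Profile = Dict[str, str]


def enrich(email: str, sources: List[Dict[str, Profile]]) -> Profile:
    """Merge every record keyed by `email` from each source into one profile.

    When two sources disagree on a field, the earlier source wins.
    """
    profile: Profile = {"email": email}
    for source in sources:
        record = source.get(email, {})
        for field, value in record.items():
            profile.setdefault(field, value)  # keep the first value seen
    return profile


# Hypothetical example datasets; every field and value here is made up.
social = {"jane@example.com": {"linkedin_url": "linkedin.com/in/jane-example", "employer": "Acme"}}
contact = {"jane@example.com": {"phone": "555-0100", "employer": "Acme Corp"}}

print(enrich("jane@example.com", [social, contact]))
# {'email': 'jane@example.com', 'linkedin_url': 'linkedin.com/in/jane-example', 'employer': 'Acme', 'phone': '555-0100'}
```

A single email address went in and a four-field profile came out; each additional source keyed on the same address adds more fields in the same way.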
Will Anyone Be Held Liable?
The exposed server was hosted on Google Cloud, but information about Google Cloud customers is protected by privacy rules. The FBI can make requests but doesn’t have the authority to compel an organization to disclose a breach. And the question remains of who is responsible: PDL as the data owner, or the owner of the server. A court order may be required to obtain enough information to make that determination.
Willy Leichter, VP of Marketing, Virsec, Says the Time to Act Is Now
The data exposed appears to have been handled by at least two “data enrichment companies.” These organizations aren’t so different from the credit reporting agencies that collect our data. Oftentimes, we don’t know what’s in there, and there’s little recourse to correct it. Well-founded privacy concerns are the major impetus behind the California Consumer Privacy Act, GDPR and other state and national privacy laws now in the works. The goal of these laws is to let users explicitly control their data that’s “out there.” There’s been no “opt in” for consumers who don’t want their data shared, and now the challenge is how to put the Genie back in the bottle.
The time to act is NOW. The reality is that the compiled and consolidated data that massive companies are now monetizing is a small fraction of what will be exposed in the years to come. As more companies use increasingly advanced AI to predict consumer behavior, there is enormous potential for both intrusions into and limitations on the average consumer’s life.
Religious preferences, social activities, spending patterns, educational potential and more may become mere data points by which consumers are targeted or limited. Just as so many companies are now using consumer behavioral data to predict shopping, travel patterns and more, they could use customer data, including illegally sourced data, in ways that have the potential to be detrimental on entirely new levels.
The data Genie is growing daily. It’s urgent that authorities pass and uniformly enforce laws that give consumers legal control over their data. It’s equally urgent that individuals take greater care with their data in the absence of such laws, and that companies be far more diligent with the data they collect than we’ve seen in the last few years.