Data Breach: Reputation Risk Intelligence Company Left 30TB Server Exposed

First published: Mar 1, 2021

Updated 2 times since publishing

Written by Cyber Research Team WizCase

The WizCase CyberResearch Team, led by Ata Hakcil, recently discovered a huge breach affecting Polecat, the UK leader in reputation intelligence, exposing 30TB of data and billions of records.

Polecat, which successfully predicted the outcome of the 2016 US Presidential Election, may have conducted similar research less than a week before the 2020 US Election. In 2016, the company explained its research methodology as follows:

“Polecat’s award winning technology interrogates global conversations online and in social media, with over 7,000 keywords associated with 270 risk topics, such as inequality, unemployment, harassment, corruption and climate change.”

Polecat’s 2016 election prediction featured on the front page of the Financial Times

Based on our research, we assume that Polecat conducted its 2020 research using a similar process, but it appears that they never published their latest election predictions.

What’s Going On?

Data leak discovered: 29th October 2020
Polecat contacted: 30th October 2020; 1st November 2020
Attacked by Meow: 30th October 2020
OVH contacted: 1st November 2020
Response received: 2nd November 2020
Server secured: 2nd November 2020

How Did the Data Breach Happen?

Polecat left an Elasticsearch server exposed to the internet: no authentication was required and no encryption was in place.
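
For administrators who want to confirm that a server they operate does not share this weakness, the check can be sketched in a few lines of Python. This is only an illustrative sketch, not the method our team used: the function names are hypothetical, and it assumes the cluster answers plain HTTP requests at the address you pass in (Elasticsearch listens on port 9200 by default). A server that answers a credential-free request with 200 is serving data to anyone, while 401 or 403 indicates that some access control is in place.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Interpret the HTTP status returned to a request sent without credentials."""
    if status == 200:
        return "exposed"      # the server answered without asking who we are
    if status in (401, 403):
        return "protected"    # credentials were demanded or access was refused
    return "unknown"          # anything else needs a manual look


def check_own_server(base_url: str, timeout: float = 5.0) -> str:
    """Send one credential-free GET to a server you operate and classify the reply."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return classify_status(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx replies; the code is still informative.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Calling check_own_server("http://localhost:9200") against a test cluster you control would return "exposed" if security features are switched off. Only run checks like this against systems you are authorized to test.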

It is important to mention that the server exposed some well-protected usernames and hashed passwords belonging to Polecat’s employees. This shows that the company is aware of the security measures required to protect its data, and that the server exposure was likely a result of human error.

The database contained 30TB of data, exposing over 12 billion records, including:

Over 6.5 billion tweets
Almost 5 billion records labeled “social”, all of which appeared to be tweets
Over 1 billion social posts across different blogs and websites

The day after the server was exposed, our team observed that a Meow attack had already erased half of the data. A few more terabytes were then deleted by an unknown actor (either Polecat itself or another hacker), leaving the database with just over 2 billion records, accounting for around 4TB.

Soon after, a third actor seemingly deleted most of the remaining records, leaving behind a ransom note asking for 0.04 bitcoin (around $550 at the time) to get the data back. It’s important to note that these types of scams and ransom demands are usually automated and sent to many open databases.

According to our findings, Polecat started to harvest the data back in July 2019, despite the server containing records dating back to 2007. We could see that new indices were being added even on the day we discovered the leak, with many new records added every second. This means that even after the aforementioned attacks, the database was still being continuously updated with new records.

According to our estimates, about 20-50 million tweets had been harvested daily since the end of July 2019, which represents roughly 4-10% of the approximately 500 million tweets sent each day.
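
The share quoted here follows from simple division, which can be reproduced with the short Python sketch below. The constant and function names are our own illustrative choices, and the inputs are the estimates stated above rather than exact measurements. Note that the lower bound of 20 million tweets works out to about 4% of the daily total.

```python
# Estimates taken from the paragraph above; names are illustrative only.
DAILY_HARVEST_LOW = 20_000_000    # lower estimate of tweets harvested per day
DAILY_HARVEST_HIGH = 50_000_000   # upper estimate of tweets harvested per day
TWEETS_PER_DAY = 500_000_000      # approximate total tweets sent per day


def share_of_daily_tweets(harvested: int, total: int = TWEETS_PER_DAY) -> float:
    """Return the harvested volume as a percentage of all daily tweets."""
    return 100 * harvested / total


low = share_of_daily_tweets(DAILY_HARVEST_LOW)
high = share_of_daily_tweets(DAILY_HARVEST_HIGH)
print(f"{low:.0f}% to {high:.0f}% of daily tweets")  # prints "4% to 10% of daily tweets"
```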

The breach affected tweets and posts from users all over the world, written in many different languages.

The data exposed included:

For tweets:
- Tweet content
- Tweet ID
- Author username
- Language
- Time it was harvested
- Hashtags
- Views/Follower Count
- Tweet URL

For website/blog content:
- Post content
- URL
- Time it was harvested
- Language
- Publisher
- Region
- Post title

After analyzing a sample of the data, we realized that a big part of the content was related to topics such as racism, propaganda, firearms, Covid, and healthcare, as well as politicians such as Trump, Obama, Putin, and more.

A sample pro-Trump tweet from a user in the US

Interestingly, a sizable portion of the data didn’t seem to have a direct correlation with topics that could be screened for the latest US elections. As mentioned earlier, Polecat used over 7,000 keywords in 2016 to analyze the reputation of the candidates to give accurate predictions.

A Twitter user unhappy with Dr. Fauci

A user from England saying Disney villains should not get the best songs

If the breached server was used by Polecat to help prepare its 2020 election outcome prediction, its exposure may have jeopardized the company’s analysis, since almost all of the records were deleted by ill-intentioned actors. That said, it is possible that Polecat saved the data on a separate (secured) server or completed its work before attackers deleted the content.

Whose Data Was Exposed and What Are the Consequences?

The data exposed is public data, most likely harvested with the tools Polecat is promoting. Yet, the number of tweets harvested seems very high.

Some of the users whose data was harvested seemed to have all of their tweets exposed in the database, while others had only a few.

Corporate Espionage

As previously mentioned, a few perpetrators had already discovered the database within the first few days it was exposed. If any other hacker found and downloaded the data, they could try to sell it to Polecat’s competitors.

Even if there had not been an upcoming election, the data could still have been used to analyze other trends, especially considering the amount of data the leak contained.

Blackmail and Threats

We’ve all heard about people facing backlash for past posts: things they published years ago when they were “young and irresponsible” and have likely forgotten about. If anyone whose entire Twitter history was uploaded to Polecat’s server had shared potentially controversial content in years past, hackers who found the database could try to blackmail them with that content. People could receive threats about things they posted more than 10 years ago, and could have their lives and reputations damaged if such posts were resurfaced.

What Can I Do to Protect My Data?

In this specific case, most Twitter users are less likely to receive threats and blackmail than Polecat itself. However, as mentioned above, you could still face some risks.

If your Twitter account is private, you shouldn’t worry about your tweets ending up in this database. But if your account is public, we suggest you review your history and make sure you’ve posted nothing that could come back to compromise you, even years after you shared it. We always recommend being careful with the data you share publicly and thinking twice before publishing something you might later regret.

How and Why We Discovered the Breach

At WizCase, we are constantly scanning random sections of the internet to find data breaches and to get them secured before cybercriminals can find and abuse them.

As this Elasticsearch server was left open, without any encryption to protect its data, it could have been discovered and accessed by anyone with the server URL. The server also contained credentials to other platforms that were properly encrypted, so although security mechanisms existed, they were not applied consistently.

Who Is WizCase?

WizCase is one of the biggest international online security websites, with content translated into 30 different languages. We provide tools, tricks, and best practices for online safety and security, including detailed VPN reviews and tutorials.

Our online web security team of White Hat hackers has uncovered some of the biggest data breaches, including research into unsecured webcams and a report exposing a data breach affecting the Bing app.

We release reports to the public and also disclose them to the impacted companies, empowering them to enhance server security and foster a safer environment for all.