Correlating sentiments and topics with spam waves on social networks

Daniel Dichiu Bitdefender
Lucian Lupsescu Bitdefender
Irina Rancea Bitdefender

Many papers have analysed Facebook vulnerabilities, threats and spam. Few, however, have used a very large corpus of posts to extract correlations between the content a user posts and the spam that user receives.

Each status update on a social network carries a piece of information that might be used for targeted attacks. How users' status updates can be collected is outside the scope of this paper. Instead, we work backwards from the spam a user receives to see whether there is a connection between the type of status update (e.g. a positive event in the user's life, or a particular topic) and the spam received. With this information, faster and more powerful spam filters can be designed.

The system we use has two components: one for sentiment analysis and one for topic extraction. The former consists of two Support Vector Machine classifiers trained on an annotated corpus, which predict a status update's level of subjectivity and its polarity, respectively. The latter uses either Latent Semantic Indexing or Latent Dirichlet Allocation (both unsupervised methods) to extract topics from the status updates posted just before a spam message is received.

Examples of correlations:

level of subjectiveness vs. spam waves;
polarity vs. spam waves;
status updates' topics vs. spam waves.
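One hedged way to quantify a correlation of this kind is a Pearson coefficient between a per-day signal derived from status updates (for instance, the share of positive updates) and the per-day spam count, optionally shifted by a lag so that updates are compared with the spam that follows them. The abstract does not name the correlation measure used; Pearson correlation, the daily aggregation, and the function names pearson and lagged_correlation are assumptions made for this sketch.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, spam, lag=0):
    """Correlate signal[t] with spam[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(signal, spam)
    return pearson(signal[:-lag], spam[lag:])


# Invented example values: daily share of positive updates and daily spam counts.
positive_share = [0.10, 0.35, 0.20, 0.50, 0.15, 0.40]
spam_received = [3, 2, 9, 4, 11, 5]
same_day = lagged_correlation(positive_share, spam_received, lag=0)
next_day = lagged_correlation(positive_share, spam_received, lag=1)
```

Comparing the coefficient at several lags gives a rough indication of whether spam tends to follow a particular kind of status update rather than merely coincide with it.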

The information is extracted from a corpus of status updates from more than 150,000 users.