2006-07-01
Abstract
Olivier Guillion and John Graham-Cumming describe how it is possible to 'blind' the POPFile spam filter with a single-word attack.
Copyright © 2006 Virus Bulletin
In the four years since Paul Graham's 'A plan for spam' [1] was published, Bayesian spam filters have proven their efficiency, versatility and simplicity of operation.
Bayesian spam filters work by computing the 'spaminess value' of a message (the probability that a message is spam; typically a message with probability 1 is certain to be spam, while a message with probability 0 is certain to be legitimate). The spaminess value of a message is calculated according to the probability of each of the words in the message appearing in a spam message. These probabilities are calculated by training a Bayesian filter with samples of spam and legitimate (ham) messages.
A Bayesian filter operates by splitting the header and body of a message into tokens. These are usually words, but may include other meta-information such as the presence of HTML or attached images. The probability that the token appears in spam messages is computed for each token in the message.
After having kept only the most significant tokens (those that have a probability that is significantly different from neutral; i.e. far away from 0.5), the remaining values are combined by following the Bayes formula [2], to obtain a single probability that tells the filter whether the message should be considered as spam.
To determine the spam probability of each token, the filter is trained by the user: it has to be taught what is spam and what is not. During this training process all tokens are collected into a database specific to each user, called a corpus, that keeps track of the number of times each token has been found in each category of message (spam or ham). This 'known tokens' set, along with the spam probability of each token, differs from one user to another, so it becomes impossible for a spammer to find a single word that will be considered the same way by each of their potential victims.
How the probability for each token is computed differs from filter to filter. Filters such as SpamBayes [3] and DSPAM [4] count the number of messages in which a token appears and ignore whether the token appears once, twice, or 100 times in a single message. POPFile [5], on the other hand, counts the total number of tokens seen in all the training data, and the number of times a word is seen in a message including duplicates. In this article we demonstrate how this feature of POPFile can be exploited to get spam messages delivered.
A large number of filters were tested in the TREC 2005 Spam Track and performed well (see [6]). However, spammers have tried to poison Bayesian filters by a variety of methods. A summary of such attacks can be found in [7].
To work around this form of spam filtering, spammers have tried many tricks [8], with particular focus on manipulating HTML to hide likely spammy words (e.g. 'Viagra') from spam filters.
The most common form of attack on Bayesian filters is to add to the spam messages English words chosen randomly from a dictionary. However, both [9] and [10] have shown that this actually increases the catch rate for spam because most of these words are uncommon and by picking randomly, spammers are more likely to hit words seen in spam messages than in ham (because they themselves are adding random words to spam, whereas ham messages use a more restricted vocabulary).
If a spammer had knowledge of a set of words that are likely to appear as ham in most of the corpuses of trained Bayesian filters, they would be able to conduct statistical attacks, by maximizing the probability for their messages to contain a number of ham words large enough to move the probability towards ham [11]. However, it is possible to construct an efficient attack that works against POPFile by choosing words that are likely to be ham for most people.
POPFile computes the spaminess probability of a message in a different fashion from most spam filters:
It does not apply any 'windowing' to the tokens it finds in a message; all tokens are used in the probability calculation. This is based on the assumption that each token, even if it doesn't obviously belong to a given category, can be important in the final decision.
When a token is found multiple times in a message, the multiple occurrences are processed individually. This means that if the word 'Viagra' appears in a message ten times, POPFile will apply the probability for 'Viagra' ten times. This is done because POPFile considers that a token that appears several times is more important than one that appears only once.
POPFile maintains a 'stop word' list of common words that are ignored when processing messages. This list contains many common English words.
Instead of choosing words randomly from a dictionary, a spammer can maximize his chances of bypassing the spam filter by selecting very common words, that are likely to appear in ham messages. Lists of common words are readily available - see, for example, [12]. Figure 1 shows the 100 most common English words from [12], with the POPFile 'stop words' removed.
they one hot word what some other put use how said each which their time way about many then them write like these long make thing see two look more day come number sound most people over know water than call first who down side now find new work part take get place made live where after back little only round man year came show every good give under name very through just sentence great think say help low line differ turn cause much mean before move right boy old too same tell set three want air well play end
Figure 1: 100 most common English words seen by POPFile.
Examination of the corpuses of a number of long-time POPFile users has shown that between 50% and 80% of these are hammy words (i.e. have a probability that indicates that they are more likely to appear in ham messages than in spam). For many of them, the probability is close to 0.5 - indicating that they appear often in both spam and ham, but they are, nevertheless, hammy.
POPFile counts duplicates of tokens in a message, so when a single word from the common word list is repeated several times, each of its occurrences will be processed. If the number of repetitions is large enough, even a slightly hammy word could 'blind' POPFile - the probability for that single token would outweigh all the other tokens in the message and force POPFile to classify the message into the category associated with that token (ham).
Thus, by selecting only one word from the common English word list, and repeating it in a message, the spammer can cause POPFile to classify their message wrongly as ham between 50% and 80% of the time.
Figure 2 shows a genuine spam message received by one of the authors and classified correctly by his POPFile installation as spam.
Return-Path: <[email protected]> Delivered-To: [email protected] Received: (qmail 22081 invoked from network); 9 May 2006 19:09:12 0000 Received: from unknown (HELO guillion.net) (195.50.152.74) by mrelay5-1.guillion.net with SMTP; 9 May 2006 19:09:12 0000 To: <[email protected]> From: “Ernest” <[email protected]> Date: Mon, 15 Dec 2003 01:07:08 GMT Subject: DISCREET OVERNIGHT PHARMACY Content-Type: text/plain; Pay a ton less regular price drugs http://ffutkd.slicebred.info/?35319980
Figure 2: A sample spam identified correctly by POPFile.
Table 1 shows that POPFile identified 27 tokens that had spam probabilities associated with them, resulting in a spam probability of 0.999829 for the message in Figure 2. POPFile also saw 24 hammy tokens with a ham probability of 1.713392e-004. POPFile considers the message to be spam.
Classification | Count | Probability |
---|---|---|
Spam | 27 | 0.999829 |
Ham | 24 | 1.713392e-004 |
Table 1. POPFile probability calculation for spam in Figure 2.
To attack POPFile we chose a word randomly from the 1,000 most common English words and added it to the message 1,000 times. In this example, the chosen word was 'one'. Table 2 shows POPFile's display of the probability for the word 'one'.
Classification | Probability |
---|---|
Spam | 0.4643718161 |
Ham | 0.5356281839 |
Table 2. POPFile probability display for the word 'one'.
Hence 'one' is close to being a neutral word (i.e. having a probability of 0.5), but nevertheless it is more likely to appear in ham messages than spam messages. Figure 3 shows the original spam message blinded by adding the word 'one' 1,000 times.
Return-Path: <[email protected]> Delivered-To: [email protected] Received: (qmail 22081 invoked from network); 9 May 2006 19:09:12 0000 Received: from unknown (HELO guillion.net) (195.50.152.74) by mrelay5-1.guillion.net with SMTP; 9 May 2006 19:09:12 0000 To: <[email protected]> From: “Ernest” <[email protected]> Date: Mon, 15 Dec 2003 01:07:08 GMT Subject: DISCREET OVERNIGHT PHARMACY Content-Type: text/plain; Pay a ton less regular price drugs http://ffutkd.slicebred.info/?35319980 one one one one one one one one one one [followed by 99 identical lines]
Figure 3: A spam message blinded with the word 'one'.
When examined by POPFile, the message is wrongly classified as ham and passes through the filter. Table 3 shows the updated probability display from POPFile. POPFile now clearly believes that the message is ham (with a probability of 0.999999).
By using this method against POPFile, a spammer can make a large number of their messages pass through the filter. A ratio of 50 to 80%, compared to the average false negative rate reported by POPFile users of less than 1%, is more than attractive.
Moreover, if the user reclassifies these messages as spam (as users are encouraged to do when spam is misclassified), the chosen word ('one' in the case above) will see its spam word count increased by 1,000, which means that, from then on, it will be likely to be considered by POPFile as spammy.
If the operation is performed several times (for example, if the user receives a number of spams because of this blinding technique), making enough innocent and common words become heavily spammy, the risk of regular ham messages being misclassified as spam will increase significantly.
Figure 4 shows a legitimate email that was classified initially by POPFile as ham.
Return-Path: <[email protected]> Delivered-To: [email protected] Received: (qmail 17698 invoked from network); 31 Mar 2006 06:57:47 0000 Received: from wp043.hoster.de (80.217.132.52) Received: by wp043.hoster.de running Exim 4.43 using esmtpsa (TLSv1:RC4-SHA:128) Date: Fri, 31 Mar 2006 08:58:09 +0200 From: Kurt Stahl <[email protected]> X-Mailer: The Bat! (v3.71.03) Professional X-Priority: 3 (Normal) Message-ID: <[email protected]> To: [email protected] Subject: Re[2]: mail archive In-Reply-To: <[email protected]> References: <[email protected]> <[email protected] > MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: 8bit Hi, thanks for your mail. I’ll have a look at it next week. Have a nice week-end! Kurt
Figure 4: A ham message seen by POPFile.
As shown in Table 4, POPFile considers this to be a ham message with high probability (0.999999). To attack POPFile we sent three spam messages and added the words 'thanks', 'next' and 'end' to each message 1,000 times. These three words all appear in the 1,000 most common English words.
Classification | Count | Probability |
---|---|---|
Ham | 46 | 0.999999 |
Spam | 42 | 1.650895e-008 |
Table 4. POPFile probability display for message in Figure 4.
The words were chosen specifically to attack the ham message shown in Figure 4, but it is easy to see how a spammer sending hundreds of messages to the same user could choose these same words even using random selection from the list. POPFile was blinded by each message and classified the three spams as hams. Then each message was reclassified in POPFile's user interface, teaching it that the three messages were in fact spam and not ham. Table 5 shows the probability display for the same ham message (from Figure 4) after the three spams had been reclassified.
Classification | Count | Probability |
---|---|---|
Spam | 42 | 0.999829 |
Ham | 46 | 1.713392e-004 |
Table 5. POPFile probability display for message in Figure 4 after blinding
Hence the spammer has managed to introduce a false positive by causing POPFile to classify the message in Figure 4 as spam. If the spammer repeats the added word significantly more than 1,000 times, say 10,000 times, the corpus corruption could become much more problematic. In that case reclassifying the false positive may not work; POPFile may still think the message is a spam. However, that would require that the spammer be willing to add 10,000 words to a message. At an average of five letters per word the spammer has to add more than 58kB to each spam sent, increasing their bandwidth cost or time to send.
Several countermeasures are possible against this attack:
Ignoring a wide list of common words: by adding more words to the POPFile stop word list, this attack could be defeated easily.
Capping the number of times a word is counted in a single message: any token that appears above some capping value would either be disregarded completely or capped at the maximum repetition value. Instead of using a fixed value for this bound, a reasonable solution might be to calculate it as the frequency with which the word appears in the message and exclude words that appear very frequently for an individual message.
Automatically ignoring words with a probability close to 0.5. Since many of the common words appear in both spam and ham with approximately equal probability, ignoring words with a probability close to 0.5 could be effective in deterring this attack.
[1] Graham, P. 'A plan for spam'. August 2002. http://www.paulgraham.com/spam.html.
[2] Graham-Cumming, J. 'Naïve Bayesian text classification'. Dr Dobbs Journal. May 2005. http://www.ddj.com/184406064.
[3] SpamBayes. http://spambayes.sf.net/.
[4] DSPAM. http://dspam.nuclearelephant.com/.
[5] POPFile. http://getpopfile.org/.
[6] TREC 2005 Spam Track. http://plg.uwaterloo.ca/~gvcormac/trecspamtrack05/.
[7] Graham-Cumming, J. 'Does Bayesian Poisoning exist?'. Virus Bulletin, February 2006, pp.S2-S4.
[8] Graham-Cumming, J. 'The Spammers' Compendium'. 2003-2006. http://www.jgc.org/tsc/.
[9] Graham-Cumming, J. 'How to beat an adaptive spam filter'. MIT Spam Conference 2004. http://www.jgc.org/SpamConference011604.pps.
[10] Lowd, D, Meek, C. 'Good Word Attacks on Statistical Spam Filters'. CEAS 2005. http://www.cs.washington.edu/homes/lowd/ceas05lowd.ppt.
[11] Stern, H, Mason, J, Shepherd, M. 'A linguistics-based attack on personalised statistical e-mail classifiers'. Technical Report, Dalhousie University. March 2004. http://www.cs.dal.ca/research/techreports/2004/CS-2004-06.shtml.
[12] About.com. '1,000 Most Common Vocabulary Words in English'. http://esl.about.com/library/vocabulary/bl1000_list1.htm.