2009-08-01
Abstract
Gordon Cormack describes the highlights of the sixth Conference on Email and Anti-Spam (CEAS 2009).
Copyright © 2009 Virus Bulletin
Since its inception in 2004, CEAS (the Conference on Email and Anti-Spam) has been held in Silicon Valley. Perhaps the biggest news to come from the 2009 event was that it is tentatively set to move north to Seattle next year, to collocate with SOUPS, the Symposium on Usable Privacy and Security (http://cups.cs.cmu.edu/soups/2009/). The proposed move reflects the importance of human, social and societal issues – in addition to technical ones – in facilitating electronic communication while mitigating abuse.
The keynote speech on the first day of the event was given by Dave Dittrich, who explored possible ways in which spammers or spam service providers might be punished for their activities. Given the expansive international underground network that supports spam and related activities, no clear conclusion emerged.
Lorrie Cranor, organizer of SOUPS, addressed delegates on the second day of the event, emphasizing the need to coordinate the detection of phishing with education, so that users can learn how to respond to phishing attacks and alerts. A demonstration of educational materials employing the animated character PhishGuru is available online at http://phishguru.org/.
In theme, the 23 contributed papers selected for CEAS 2009 ranged from understanding spammers to understanding users, and from pure technical solutions to those designed to engage the spammer or user.
The first session explored spammer behaviour. The first paper observed that spam is generated through vast networks, and that pinpointing its origin is difficult. Next, ‘Spamology: a study of spam origins’ explored the propagation of email addresses through spammers’ mailing lists. ‘Spamming botnets: are we losing the war?’ observed that the distribution of spam IP addresses is becoming more diverse, indicating that there are ever fewer ‘safe’ subnets that can be assumed to be uncompromised by spambots. Finally, in ‘How much did shutting down McColo help?’, Richard Clayton observed that while the volume of spam decreased acutely as a result of shutting down the large spam service provider, the decrease was temporary and consisted mainly of ‘easy to filter’ spam messages. The net effect was perhaps less than might be assumed from elementary measurements.
The second session considered the role of user input in spam filtering. First in this session was a paper considering the contrasting models of server-side vs. personal spam filtering, which observed that a personal spam filter can be trained using the results of a commercial server-side filter, requiring no input from the user. Next, ‘Going mini: extreme lightweight spam filters’ considered the problem of providing personalized spam filters in a server environment where a very limited amount of memory is available per user. The final paper in this session addressed the same problem by hashing personal and community judgements into a common feature space. The results presented indicate that this approach improves filter performance for those users who train the system as well as for those who don’t – a win-win situation.
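The hashing approach described in the final paper of this session can be illustrated with a small sketch. The idea, as reported, is to map personal and community judgements into a common fixed-size feature space; the function below is illustrative only (the names, the use of MD5 and the personalization scheme are assumptions, not details from the paper):

```python
import hashlib

def hash_features(tokens, user_id=None, dim=2**16):
    """Hash message tokens into a fixed-size feature vector.

    Community (global) features share one slot across all users, while
    personal features are keyed by the user ID so that the same token
    maps to a different slot for each user. Illustrative sketch only.
    """
    vec = [0.0] * dim
    for tok in tokens:
        # Global (community) feature slot, shared by all users
        g = int(hashlib.md5(tok.encode()).hexdigest(), 16) % dim
        vec[g] += 1.0
        if user_id is not None:
            # Personalized feature slot, keyed by this user's ID
            p = int(hashlib.md5(f"{user_id}|{tok}".encode()).hexdigest(), 16) % dim
            vec[p] += 1.0
    return vec
```

Because the global slots are shared, users who never train the system still benefit from community judgements, while the personal slots let the filter specialize for users who do – consistent with the win-win result reported.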
The third session considered anti-spam techniques that might be employed by a large email service provider. The first paper, ‘Router-level spam filtering using TCP fingerprints’, presented an approach to identifying spam from the router’s perspective, where packets rather than complete messages are handled. Next, ‘An anti-spam filter combination framework for text-and-image emails’ considered how to combine the results of image- and text-based filters to improve overall accuracy. A group from Texas A&M University presented a tool designed to translate SpamAssassin regular expression rules into POSIX. SpamAssassin is slow, in large part because it uses patterns written in Perl, an interpreted language. When translated into POSIX regexp syntax, the patterns can be compiled and executed much more efficiently. The translation is inexact, but yields good results. The final paper in this session explored the idea of using new spam and old ham to train products – based on the premise that spam changes much more quickly than non-spam, but is also much easier to collect.
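Why the translation is inexact can be seen from a toy checker: some Perl/PCRE constructs simply have no POSIX ERE equivalent, while others (such as shorthand character classes) can be rewritten mechanically. The construct list and rewrite rules below are an illustrative sketch, not the Texas A&M tool’s actual logic:

```python
import re

# Perl/PCRE constructs with no direct POSIX ERE equivalent (partial, illustrative list).
PERL_ONLY = [
    r"\(\?=",                 # lookahead
    r"\(\?!",                 # negative lookahead
    r"\(\?<",                 # lookbehind
    r"\\d", r"\\w", r"\\s",   # shorthand classes (POSIX uses [[:digit:]] etc.)
    r"[*+?]\?",               # non-greedy quantifiers
]

def is_posix_translatable(pattern):
    """Return True if the pattern avoids the listed Perl-only constructs."""
    return not any(re.search(p, pattern) for p in PERL_ONLY)

def to_posix(pattern):
    """Naive mechanical rewrite of shorthand classes to POSIX bracket expressions."""
    return (pattern.replace(r"\d", "[[:digit:]]")
                   .replace(r"\w", "[[:alnum:]_]")
                   .replace(r"\s", "[[:space:]]"))
```

Patterns that pass the check can be compiled once (e.g. with POSIX `regcomp`) and then executed repeatedly at much lower cost than re-interpreting Perl patterns per message.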
In the next session, ‘An empirical analysis of phishing blacklists’ explored the impact of phishing page warning messages on the user – phishing detection is of little use unless the user heeds the warning. ‘Anti-phishing landing page: turning a 404 into a teachable moment for end users’ investigated a user interface design in which links from phishing pages lead to educational pages explaining why the user was duped, and how to be more wary. The final paper in this session examined the issue of inadvertently addressing email to the wrong user, and proposed a mechanism to warn the user in many such cases.
The paper ‘Training SpamAssassin with active semi-supervised learning’ considered the idea of asking the user to label a small subset of messages – selected by the filter – as spam or non-spam. The overall impact is to lessen the burden on the user, while providing better personalized filtering. The paper ‘Feature weighting for improved classifier robustness’ considered the problem of incorrect training examples: spam messages labelled as non-spam, and vice versa. Such examples may occur due to user error or due to a spammer being able to label messages (for example, in a collaborative filtering system).
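The selection step in the active learning paper – the filter choosing which few messages to ask the user about – is commonly done by uncertainty sampling: pick the messages whose scores fall closest to the decision boundary. The sketch below illustrates that general idea; the function name and selection rule are assumptions, not the paper’s exact method:

```python
def select_for_labelling(messages, score, k=5, threshold=0.5):
    """Pick the k messages whose spam scores lie closest to the decision
    threshold -- the ones the filter is least certain about.
    (Uncertainty sampling; an illustrative sketch, not the paper's rule.)
    """
    return sorted(messages, key=lambda m: abs(score(m) - threshold))[:k]
```

Labelling only these borderline messages gives the filter the most informative training data per user judgement, which is how the burden on the user is kept small.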
In the final session, ‘Extracting product information from email receipts using Markov logic’ considered the problem of identifying electronic transactions. For example, a participant in CEAS may have subscribed to an information list, used a web system to submit a paper, and a different web system to register. How can an email system recognize and accumulate the various messages related to the conference? ‘CentMail: rate limiting via certified micro-donations’ considered an approach to engage both sender and recipient, excluding the spammer. Like all previous proposals for proof of payment, this paper generated controversy. Finally, ‘A human factors approach to spam filtering’ suggested that the user should be engaged differently, labelling rather than filtering spam.
In the wrap-up meeting, the organizers solicited suggestions for a new name for CEAS, while preserving the acronym. The issues and technologies underlying the use and abuse of email are converging with those for other forms of electronic communication and collaboration – including the web, social networks, text messaging and collaborative recommender systems. For example, ‘C’ could stand for ‘collaboration’ or ‘communication’; ‘E’ could represent ‘electronic’; ‘A’ could be ‘adversarial’ or ‘abuse’; ‘S’ – ‘symposium’, perhaps?
From my perspective, the most interesting papers fell at the boundary between technology and human factors. Usability is as important as technology, and it makes no sense to study the two separately. The future collocation of the event with SOUPS (and perhaps another yet-to-be-named workshop) will provide valuable cross-pollination of interests and expertise.
The papers from this year’s conference are available at http://www.ceas.cc/.