Reza Rajabiun COMDOM Software
As anti-spam filters have improved in their capacity to process text-based messages, spammers have learned to 'envelope' their communications in a number of different formats. These envelopes include documents, PDF files and graphical formats. Although it is relatively easy to construct filters that read and process content embedded in some of these envelopes, image spam has challenged the analytical capacity of academic and industry researchers.
The most pressing problem raised by image spam is the large amount of computational power needed to process incoming content with traditional Optical Character Recognition (OCR) techniques. For this reason, many network administrators have simply restricted their end users' ability to receive messages containing images. This simple solution has the disadvantage of limiting the usefulness of email as a communication medium for business and personal use. Less restrictive options have been offered more recently by Dredze et al. (2007), who introduce a simple feature selection algorithm resembling the ad hoc challenge-response methods used in text-based anti-spam products of the late 1990s. Additionally, Wang et al. (2007) extend the 'fuzzy signature' method, standard for processing text in the mid-2000s, to the detection of image spam.
This paper introduces and demonstrates a novel approach to accurate, high-speed processing of image spam that: a) does not suffer from the well-known shortcomings of challenge-response and signature-based systems, notably their ease of manipulation by spammers; and b) imposes much lower hardware costs than OCR. Image Part Recognition (IPR) decomposes an image into its constituent parts in order to read the characters used to construct spam messages. In combination with a high-capacity Bayesian classifier, IPR offers a promising approach to fast and robust processing of image spam. Given the growing importance of sophisticated image spam over the past months, for instance in 'pump and dump' schemes used to manipulate the price of corporate securities, IPR significantly lowers the hardware and end-user costs of filtering 'smart spam' enveloped in graphical images.
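The abstract does not specify how IPR segments an image or represents its parts, so the following is only a minimal illustrative sketch of the general idea (decompose an image into parts, turn each part into a token, and score the tokens with a Bayesian classifier), not COMDOM's implementation. It assumes a pre-binarized image given as a grid of 0/1 pixels, treats each 4-connected component of ink pixels as a "part", and describes each part by a hypothetical coarse shape token (bounding-box size plus ink count). The tokens feed a standard Laplace-smoothed multinomial naive Bayes model; all function and class names here are hypothetical.

```python
import math
from collections import Counter, deque

Image = list[list[int]]  # assumed input: binarized image, 1 = ink, 0 = background


def connected_parts(img: Image) -> list[list[tuple[int, int]]]:
    """Split a binary image into 4-connected components (the 'parts')."""
    h = len(img)
    w = len(img[0]) if img else 0
    seen = [[False] * w for _ in range(h)]
    parts = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # Breadth-first flood fill collects every pixel of this part.
                queue = deque([(y, x)])
                seen[y][x] = True
                pixels = []
                while queue:
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                parts.append(pixels)
    return parts


def part_token(pixels: list[tuple[int, int]]) -> str:
    """Hypothetical shape signature: 'HEIGHTxWIDTH:INK_PIXELS'."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return f"{max(ys) - min(ys) + 1}x{max(xs) - min(xs) + 1}:{len(pixels)}"


def image_tokens(img: Image) -> list[str]:
    """Decompose an image into parts and describe each part as a token."""
    return [part_token(p) for p in connected_parts(img)]


class NaiveBayes:
    """Multinomial naive Bayes over part tokens with Laplace smoothing."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, tokens: list[str], label: str) -> None:
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def spam_log_odds(self, tokens: list[str]) -> float:
        """log P(spam | tokens) - log P(ham | tokens); positive means spam."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total = sum(self.docs.values())
        score = math.log((self.docs["spam"] + 1) / (total + 2))
        score -= math.log((self.docs["ham"] + 1) / (total + 2))
        for label, sign in (("spam", 1), ("ham", -1)):
            n = sum(self.counts[label].values())
            for t in tokens:
                score += sign * math.log((self.counts[label][t] + 1) / (n + len(vocab)))
        return score


# Toy usage with two tiny, made-up training images.
spam_img = [[1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 0, 0]]
ham_img = [[1, 1, 1], [0, 0, 0], [0, 1, 0]]
nb = NaiveBayes()
nb.train(image_tokens(spam_img), "spam")
nb.train(image_tokens(ham_img), "ham")
print(image_tokens(spam_img))  # ['3x1:3', '2x2:4']
print(nb.spam_log_odds(image_tokens(spam_img)) > 0)  # True
```

Because each part is tokenized independently, a classifier of this kind never needs to recognize whole words as OCR does; a real system would need a far more robust part descriptor than bounding-box size, which is exactly the detail the abstract leaves open.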