How inbound traffic control improves spam filtering

2008-03-01

Ken Simpson

MailChannels, Canada
Editor: Helen Martin

Abstract

Ken Simpson considers the implications of rising spam volumes despite the increasing accuracy of content filters.


To understand why managing email connections is an essential aspect of today’s multi-tiered anti-spam infrastructure, it is important to look at the history of spamming and understand how spammers have evolved their techniques to outwit countermeasures.

Many industry observers believe that 2002 was the year that spam changed from being a mere nuisance into a significant problem. The dot-com boom had connected millions to the Internet, creating a critical mass of email users for spammers to exploit to great financial benefit. This was also a time (fondly remembered by system administrators everywhere) when the majority of spam was sent from servers residing in legitimate Internet colocation facilities. Venture-backed companies even created powerful ‘spam cannons’, which perhaps unwittingly assisted seedy email marketers like Scott Richter to reach the masses and fund the first of his exotic cars.

In response to the first major wave of spam, the first commercial and open source spam filters arrived – Symantec Brightmail, Sophos PureMessage and SpamAssassin to name just a few. This first generation of filters applied sets of filter rules to each message received, using regular expressions to identify spammy features within messages.

In response to regular expression filters, spammers began obfuscating the content of their messages. Rather than sending a pure HTML message advertising Viagra, for example, the spammer might chop the message into small HTML pieces which, while unrecognizable to the spam filter, would still render into legible text for the message recipient. Regular expression filters added more rules to catch these obfuscations, prompting the spammers to innovate further, ad nauseam. New anti-spam approaches emerged, leveraging the best text classification research, but the spammers’ goal remained the same: beat enough of the filters temporarily to get a bit of mail through and generate a quick profit.

Prohibition induces ‘bot-legging’

Spamming is a tragedy of the commons, in which a finite resource (our time and attention) is abused at low cost by a minority (the spammers). In many such tragedies throughout human history, prohibition has been seen as the answer. In 2003, American legislators passed the CAN-SPAM Act, which made it illegal to send unsolicited bulk email messages with a deceptive subject line and forced legitimate senders to identify themselves with a full mailing address.

CAN-SPAM is rightly criticized for not ending the spam problem, but its most significant side effect was to force spamming underground and out of the reach of law enforcement. Faced with service interruptions, in early 2004 spammers began to migrate their operations to a highly scalable distribution platform that was immune to law enforcement: the botnet. By the end of that year, the majority of spam was being delivered by networks such as Phatbot – and nowadays by Storm, Mega-D and Srizbi – leaving little hope for Bill Gates’ famous pronouncement that spam would be beaten before the end of 2006.

Once-promising proposals for a final solution to spam

Researchers at Microsoft and elsewhere had devised two techniques that they believed would eradicate spam. The first was SenderID, in which email senders publish a list of the servers permitted to send email for users within their domain. The idea was that SenderID would enable a permanent, ironclad whitelist of trustworthy domains that never send spam, so that recipients could simply block everything not on the whitelist and thereby eradicate spam.

Another idea pitched in 2004 was the computational challenge. Upon connecting to a receiving email server, senders would have to spend considerable CPU cycles computing the answer to a mathematical challenge provided by the receiving server. Bill Gates believed this approach would stop spam by making it too costly to send the high volumes of email required to make spamming profitable.
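The best-known concrete instance of this idea is the hashcash-style proof-of-work stamp. Here is a minimal sketch of the sender’s side (in Python; the 20-bit difficulty and the simplified stamp format are illustrative choices, not the parameters of any deployed scheme):

```python
import hashlib
from itertools import count

DIFFICULTY_BITS = 20   # illustrative; tuned so minting takes on the order of a second

def mint_stamp(resource: str) -> str:
    # Search for a nonce such that SHA-1(resource:nonce) begins with
    # DIFFICULTY_BITS zero bits. Verifying a stamp costs a single hash.
    for nonce in count():
        stamp = f'{resource}:{nonce}'
        digest = hashlib.sha1(stamp.encode()).digest()
        if int.from_bytes(digest, 'big') >> (160 - DIFFICULTY_BITS) == 0:
            return stamp

print(mint_stamp('recipient@example.com'))
```

The asymmetry is the point: verification costs the receiver one hash, while minting costs the sender around 2^20 attempts – enough, in theory, to make million-message mailings uneconomic.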

Unfortunately, neither SenderID nor the computational challenge resolved the spam problem. Computational challenges were rejected as too costly for legitimate bulk email senders (airlines, banks, open-source mailing lists, etc.), and SenderID, while eventually enjoying widespread adoption, proved difficult to implement and so prone to errors that it has remained useful mostly for accepting legitimate email rather than rejecting spam.

By 2005, the one thing the anti-spam community had got right was content filtering. By the time filters had more or less universally reached a 90 per cent accuracy level, spam had transitioned from a problem of content to a problem of volume alone.

Spammer economics

Spammers now earn billions of dollars annually (see http://www.ironport.com/company/pp_channel_news_12-01-2007.html). The business is efficient, hierarchical, and organized. In much the same way as the global trade in narcotics involves every conceivable method of smuggling (from submarines to drug mules), the spam trade employs software engineers to develop increasingly sophisticated delivery technologies. And just as the trade in drugs will continue until the end of humanity, so too will the illegal delivery of spam.

To understand how spamming has become such an intractable problem, it serves to analyse the economics that drive spamming. Spammers make money even if only one in every 30,000 recipients makes a purchase. Given this response rate, a spammer advertising pharmaceutical products can expect to make roughly $5,000 per million email messages sent.

Finding out what it costs to send spam is not difficult: botnet operators advertise their spamming services via online forums. One forum mentioned a price of $100 to send one million spam messages. If we assume that $100 is the cost per million spam messages, and $5,000 is the revenue, then the gross margin from spamming is approximately 98 per cent.

Although some spam filters provide better accuracy than others, filter accuracy across the board is approximately 90 per cent, meaning that only one in ten spam messages reach a recipient. If global anti-spam effectiveness could be improved from 90 to 95 per cent, earning $5,000 from spamming would require the sending of 2 million spam messages, rather than 1 million. This increase in volume would reduce the spammers’ profit margin from 98 per cent to 96 per cent (assuming sending costs remained constant). If global anti-spam accuracy were to reach 99 per cent – a figure that experts will tell you is nearly inconceivable given the innovative nature of spammers – the cost of sending spam would reduce the profit margin to 80 per cent. Consider Google, one of the world’s most profitable advertising companies, which has a reported margin of 25 per cent – now imagine an enterprise with a margin of 80 per cent. The spamming business won’t be going away any time soon.
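For readers who want to check the arithmetic, the margin figures above can be reproduced in a few lines of Python (the $100 sending cost per million messages and the $5,000 revenue target are the estimates quoted earlier; everything else follows from them):

```python
# Reproduces the spammer-margin arithmetic above for various filter accuracies.
COST_PER_MILLION = 100.0   # botnet operator's fee (estimate quoted above)
REVENUE_TARGET = 5000.0    # revenue the spammer aims to earn

def gross_margin(filter_accuracy: float) -> float:
    # At the 90% baseline, 1 million messages earn the full revenue target,
    # so the required volume scales with how much harder delivery becomes.
    delivered_fraction = 1.0 - filter_accuracy
    millions_needed = 0.10 / delivered_fraction   # 1M messages at 90% accuracy
    cost = millions_needed * COST_PER_MILLION
    return (REVENUE_TARGET - cost) / REVENUE_TARGET

for accuracy in (0.90, 0.95, 0.99):
    print(f'{accuracy:.0%} filtering -> {gross_margin(accuracy):.0%} gross margin')
# 90% -> 98%, 95% -> 96%, 99% -> 80%
```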

Why botnets are so difficult to stop

Figure 1. The botnet architecture.

Before botnets arrived on the scene, spammers could be stopped by blocking their IP addresses. Since the introduction of botnets, blocking alone no longer provides an effective solution to the spam problem.

The largest botnets contain millions of ‘zombie’ machines. A botnet is controlled by a bot herder (or sometimes by the spammer directly, via a special network appliance supplied by the botnet owner), who uses sophisticated encryption and peer-to-peer networking techniques to ensure the resilience and permanence of his creation. While the people who control a botnet remain the same, the individual zombies within it change constantly. Spam does not come from a predictable set of computers – it comes from all over the place in a completely unpredictable manner. By leveraging the diversity of IP addresses available via botnets, spammers have rendered the blocking approach far less effective than it once was.

Furthermore, as the number of broadband subscribers continues to grow – most rapidly in China, Eastern Europe and other developing economies – the number of computers available for recruitment into botnets is expanding. As botnets increase in size and sophistication, attempting to identify where the ‘bad stuff’ is coming from is becoming less and less worthwhile.

Indeed, in 2006 researchers at Georgia Tech discovered in a survey of data from the Spamhaus blacklist that only 5 per cent of botnet IP addresses ever end up listed in the Spamhaus database. In another paper, the same researchers found that 85 per cent of spam zombies sent fewer than ten email messages to their honeypot server over the course of about 18 months, as shown in the graph in Figure 2.

Figure 2. 85% of spam zombies sent fewer than ten email messages to researchers’ honeypots.

In late 2007, the zombie at 201.21.174.207 (a Brazilian broadband subscriber address) began sending approximately three spam messages each day into one of our honeypot systems. It took 19 days for the first real-time blackhole list (RBL) to identify this IP address and cause it to be blocked. By sending only a very light trickle of email, zombies can evade detection.

Blocking spam in 2008

Botnet operators get paid by the spammer only when a message is actually delivered to the receiving email server. In other words, the botnet operator gets paid only once the receiving server has responded with ‘250 OK’ at the end of the DATA phase. So in order to make lots of money, both the spammer and the botnet operator have to send as much as possible from the botnet in the shortest possible time. If a zombie is being blocked, the botnet operator doesn’t make any money.

Spamming software that sends spam to your server from a zombie is impatient. In programming terms, spamming software uses very short timeouts. The SMTP RFC recommends that email senders wait at least three minutes for each chunk of data they send to be received by the receiving server and acknowledged via a TCP acknowledgement packet. Furthermore, the RFC recommends that senders wait at least ten minutes for the final message delivery acknowledgement.

These long timeouts were established because in the early days of the Internet the infrastructure was slow and unreliable, and the machines were easily overloaded, leading to frequent message delivery delays. Today, email servers and our networks are much faster, processing incoming messages in a matter of seconds. Delays still occur, but the timeouts defined in the RFC are significantly longer than required in today’s world.

Since botnet operators don’t get paid until they receive the 250 OK, their software earns a higher profit by disconnecting after a few seconds and seeking out new victims whose servers respond more quickly.
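Receivers can exploit this impatience. The sketch below shows the essence of an SMTP ‘tarpit’ that stalls before every reply (the delay values are illustrative and the command handling is deliberately naive – this is not the implementation of any particular product): a patient legitimate sender will wait it out, while a zombie with a few-second timeout disconnects before it ever sees 250 OK.

```python
import asyncio

BANNER_DELAY = 20   # seconds to stall before the greeting (illustrative)
REPLY_DELAY = 10    # seconds to stall before each reply (illustrative)

async def handle(reader, writer):
    await asyncio.sleep(BANNER_DELAY)             # stall before the 220 greeting
    writer.write(b'220 mx.example.com ESMTP\r\n')
    await writer.drain()
    while True:
        line = await reader.readline()
        if not line:                              # the impatient client gave up
            break
        await asyncio.sleep(REPLY_DELAY)          # stall before every reply
        writer.write(b'250 OK\r\n')               # naive: a real server parses each command
        await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, '0.0.0.0', 2525)
    async with server:
        await server.serve_forever()

asyncio.run(main())
```

Stalling every sender indiscriminately would, of course, punish legitimate mail too; the traffic-shaping approach described below applies such delays selectively.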

Now let’s take a minute to reiterate a few points.

A few years ago, the MIT Spam Conference was a very interesting place to be. Each year, bright-eyed graduate students and even intrepid industry types would present new filtering techniques that pushed the accuracy of spam filters to new levels. For the past three years, the conference has been much less exciting: a great result nowadays is a paper that shows an accuracy improvement of half a per cent.

Spam filtering has reached the limits of what computer science can offer, and there isn’t much more we can do but tweak things so that, at the very least, we don’t fall behind the spammers.

Similarly, reputation systems that identify suspicious IP addresses have reached the point of diminishing returns. The spread of botnets has created a virtually inexhaustible supply of new IP addresses, which spam us a few times and then disappear forever. Most of the large anti-spam companies now maintain comprehensive blacklists that are updated every minute.

In other words, we are blocking everything we possibly can, and yet the spam problem continues to grow. So what can we do?

Slowing things down

Bill Gates was right in 2004 when he boldly posited that the way to solve the spam problem was to introduce a cost barrier that made spamming unprofitable. But unfortunately for Bill, spammers created botnets, which have given them more computing power than most governments. One way to think of the problem is this: the spammers have millions of computers; you have only a handful, and you have to pay for yours. Who’s going to win? While we can’t win the spam war with better filters or better blacklists, there is something we can do.

We can make botnets unprofitable by slowing down spam traffic.

The drawing on the left of Figure 3 shows how a typical email system deals with spam. Zombies pour messages into the top and the email server receives the messages as quickly as it can. The spam filter analyses the mail and tries to filter out any messages that appear to be spam. Filters are effective at separating spam from legitimate email, but they do nothing to stem the rising volume of spam. As volume grows, the server becomes overloaded, which results in delivery delays and temp-failing of messages. The problem with this approach is that as spam volumes increase, so does the CPU power required to process all the mail. Keeping up with volume using filters alone is a never-ending cost. In short, spam filters aren’t getting much more accurate, but they are getting more computationally complex.

Figure 3. Traditional email system vs. traffic-controlled email.

The drawing on the right side of Figure 3 shows another approach to receiving email that we have been developing for the past three years. Data flows through a transparent proxy from the Internet to the organization’s existing email infrastructure. Sources with good traffic are prioritized while sources that are sending spam are restricted. The system identifies abusive senders at the SMTP protocol layer and throttles those connections back. Senders of spam are literally not permitted to deliver packets to the network, eliminating abusive traffic before it is delivered.

The result is a clean mail stream at less than 30 per cent of its original volume. Limiting the bandwidth and resources available to spamming sources causes spam software to time out and move on to more vulnerable targets. This slowing-down approach works by traffic-shaping the TCP connection, using methods similar to those of a network load-balancing device.
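A minimal sketch of such a shaping proxy follows (the host name mail.internal, the byte rates and the is_suspect() reputation check are hypothetical placeholders, not details of the actual product):

```python
import asyncio

GOOD_RATE = 64 * 1024   # bytes/second for trusted sources (illustrative)
SLOW_RATE = 64          # bytes/second for suspect sources (illustrative)

def is_suspect(ip: str) -> bool:
    # Hypothetical reputation check; a real system would score behaviour,
    # RBL listings, sending patterns and so on.
    return False

async def shaped_copy(src, dst, rate):
    # Relay bytes from src to dst at no more than `rate` bytes per second.
    while True:
        data = await src.read(rate)
        if not data:
            break
        dst.write(data)
        await dst.drain()
        await asyncio.sleep(1.0)   # the shaping step: one chunk per second
    dst.close()

async def handle(client_r, client_w):
    ip = client_w.get_extra_info('peername')[0]
    rate = SLOW_RATE if is_suspect(ip) else GOOD_RATE
    server_r, server_w = await asyncio.open_connection('mail.internal', 25)
    await asyncio.gather(shaped_copy(client_r, server_w, rate),
                         shaped_copy(server_r, client_w, rate))

async def main():
    proxy = await asyncio.start_server(handle, '0.0.0.0', 25)
    async with proxy:
        await proxy.serve_forever()

asyncio.run(main())
```

The essential line is the asyncio.sleep() in shaped_copy(): the proxy still accepts the connection, but a suspect sender is drip-fed so slowly that its short timeout expires long before the message is delivered.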

Real-world scenarios

Despite all the money invested in anti-spam solutions, the volume of spam continues to rise, and the organizations receiving the spam bear the cost. One company to have implemented the TCP traffic-shaping approach is a major Fortune 500 firm that was being flooded with so much spam that legitimate email was delayed for hours at a time while the spam filters caught up with the backlogged traffic.

The company’s administrators were using all the blacklists they could find, but even though the blacklists got rid of 50 to 70 per cent of the spam coming from well-known sources, the spam that remained was still a very serious problem. They deployed our network traffic-shaping technology to restrict the suspect traffic.

The result was that spam volume dropped overnight from 70 per cent of all traffic to 20 per cent (see Figure 4), and as an experiment the company turned off four of the six servers that had been handling inbound mail. More importantly, the administrators no longer needed to waste time maintaining content filters, adding more servers or putting up with slow SMTP responses.

Figure 4. Effect of deploying traffic-shaping technology.

There are limitations to every anti-spam technology. Filtering is effective at separating spam from legitimate email, but only as one layer in a multi-tiered anti-spam architecture that applies the technology best suited to each task. Applying traffic shaping at the network edge ensures that legitimate senders get excellent quality of service and their mail flows quickly, while spammers get very poor quality of service and their mail is kept out of your network.

Problems with throttling

Slowing traffic from spammers works well. It decreases spam volume, contains infrastructure costs, and allows admins to deal effectively with the large proportion of senders that are not yet included in a blacklist. The problem with slowing down spammers is that it increases the number of TCP connections to the email server.

In the previous example, the customer used to handle 100 connections at a time; after traffic shaping, they see upwards of 1,000 concurrent connections. This ten-fold increase in the number of connections would utterly destroy most email servers. To illustrate the problem, consider that under normal circumstances it takes up to two seconds to deliver an email message, whereas slowing down a spam zombie causes its connection to last an average of 40 seconds. If a significant proportion of connections last 20 times longer than normal, the number of connections open at any one time grows accordingly.
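A back-of-the-envelope calculation using Little’s law (concurrent connections = arrival rate × average connection duration) shows why; the arrival rate and throttled share below are assumptions chosen to match this example, not measurements:

```python
arrival_rate = 50           # new SMTP connections per second (assumed)
normal_duration = 2.0       # seconds per delivery under normal conditions
throttled_duration = 40.0   # seconds once a suspect sender is slowed down
throttled_share = 0.45      # assumed fraction of connections being throttled

avg_duration = (throttled_share * throttled_duration
                + (1 - throttled_share) * normal_duration)

print(arrival_rate * normal_duration)   # ~100 concurrent connections before shaping
print(arrival_rate * avg_duration)      # ~955 concurrent connections after shaping
```

With these assumptions, the same arrival rate produces roughly ten times the concurrency – in line with the jump from 100 to 1,000 connections described above.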

Figure 5 shows the number of SMTP connections being handled by a single server at a large university using traffic-shaping technology. Note that the number of concurrent connections hovers around 500. The red line represents the total number of connections. The green line indicates the number of connections that the traffic control software is choosing to slow down.

Figure 5. SMTP connections.

Administrators running Sendmail or Postfix will note that 500 concurrent connections is a large number. The amount of memory required to handle 500 concurrent Sendmail processes, plus any associated spam-filtering processes, is considerable. If we were passing this number of connections through to Sendmail, the email server would almost certainly become overloaded.

One approach to improving the scalability of email systems is to redesign the email server completely with a new, highly scalable software architecture. But redesigning the email server is difficult, and changing the email system is a large commitment. Instead, to solve the scalability challenge posed by traffic shaping, we built an asymmetric SMTP proxy that performs real-time SMTP multiplexing. The proxy accepts thousands of connections from the Internet and multiplexes them onto a much smaller pool of connections to the existing email server (see Figure 6). Unlike an email server, the proxy doesn’t save messages to disk, which makes it far less complex and means it consumes few system resources.
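The skeleton below sketches the multiplexing idea (the pool size, the host name and the elided SMTP handling are assumptions for illustration – the production proxy is considerably more involved):

```python
import asyncio

UPSTREAM_POOL = asyncio.Semaphore(50)   # at most 50 connections to the real MTA

async def receive_message(reader, writer):
    # Speak just enough SMTP to collect one complete message from a
    # (possibly throttled) client. Greatly simplified for illustration.
    writer.write(b'220 proxy ESMTP\r\n')
    await writer.drain()
    message = bytearray()
    # ... handle HELO/MAIL/RCPT/DATA and accumulate the message bytes ...
    return bytes(message)

async def forward_upstream(message: bytes):
    # Thousands of client connections funnel through the small upstream
    # pool; the message waits in memory, never on disk.
    async with UPSTREAM_POOL:
        reader, writer = await asyncio.open_connection('mail.internal', 25)
        # ... replay the envelope and the buffered message upstream ...
        writer.close()

async def handle(reader, writer):
    message = await receive_message(reader, writer)
    await forward_upstream(message)
    writer.write(b'250 OK\r\n')   # the sender is only acknowledged after relay
    await writer.drain()
    writer.close()
```

Because the semaphore caps upstream concurrency, thousands of slow client connections translate into only a few dozen connections to the real email server.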

Figure 6. Real-time SMTP multiplexing.

Figure 7 shows the number of connections to the email server of the large university mentioned previously. The red line shows that the number of connections to the email server now hovers around 50 – well within the range a typical email server can handle. By multiplexing the SMTP connections, the system achieves a 5:1 to 10:1 reduction in the number of connections the email server has to deal with. Moreover, because the proxy absorbs the concurrency, a large proportion of the incoming connections can be throttled, getting rid of a great deal of spam traffic in the process.

Figure 7. SMTP connections to the university server after multiplexing.

Conclusion

Spamming is an arms race – and the real arms race today is one of sheer volume, between the amount of traffic spammers can send and the amount administrators can successfully receive. Despite the anti-spam mechanisms in place worldwide, spam volumes continue to rise. Some analysts believe that the filters themselves may have contributed to the increase: better filtering only causes spammers to send more messages to improve their chances of getting through. Planning and provisioning the capacity needed to deal with whatever spammers throw at you is extremely difficult when unseen sources can overwhelm your content filters at any moment. With botnets, spammers have a highly scalable delivery infrastructure, and receiving and filtering messages will be more demanding than ever before.
