Evading CAPTCHA

2008-08-01

Martin Overton

Independent researcher, UK

Editor: Helen Martin

Abstract

Martin Overton looks at the use of CAPTCHAs in web security and at how cybercriminals are making increasing attempts to evade them.

Table of contents


Attack, attack
Striptease!
The rise of the machines
How big a problem is this really?
Conclusions

‘Evading CAPTCHA’ may sound like a theme for a spy or war story, but this has nothing to do with spies or traditional conflicts in war zones. In this article I will cover the use of CAPTCHAs and how cybercriminals are trying to evade them so that they can create bogus accounts on web services such as Google Mail, Yahoo! Mail and Microsoft Live Hotmail.

The following is a brief description of a CAPTCHA [1]:

‘The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University. At the time, they developed the first CAPTCHA to be used by Yahoo ... A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot. For example, humans can read distorted text ... but current computer programs can’t.’

Figure 1 shows an example of the distorted text displayed in a CAPTCHA.

Figure 1. Example CAPTCHA (actually a reCAPTCHA [2]).

If you have created webmail or similar accounts at Yahoo! Mail, Google Mail or Microsoft Live Hotmail you will have had to solve a CAPTCHA to complete the sign-up form and prove that you are a human and not a machine. Many websites also use CAPTCHAs for forum sign-ups, feedback forms, etc. The idea is to make it too difficult or time consuming for the bad guys and girls to bother filling in sign-up and feedback/contact forms and to stop them from automating the process using bots and botnets. Love them or hate them, CAPTCHAs have their place in web security.

Spammers, scammers and malware authors have started to move to the likes of the Google Mail, Yahoo! Mail and Microsoft Live Hotmail web mail services to try and improve the chances of their output bypassing anti-spam defences.

This is because anti-spam defences are now in place almost everywhere, as even home users have finally woken up to the spam problem (commercial organizations, academia and government departments have mostly been on the ball for quite a few years).

But why are the cybercriminals bothering to use these web mail systems? Simply because anti-spam defences such as Domain Keys (aka DKIM) [3] are used by both Yahoo! Mail and Google Mail to prove that emails have originated from their systems; this in turn gives any email sent via their systems extra credibility and makes them less likely to be filtered as spam at the receiving server.

Microsoft Live Hotmail uses a similar technique known as Sender ID [4], which is heavily based on SPF. This, like DKIM, is seen to add credibility to emails and make them less likely to be flagged as spam.

Now do you see why the bad guys and girls are interested in CAPTCHA-evading/solving techniques and tools?

Attack, attack

So what sort of techniques can be used to evade or beat CAPTCHA-based sign-ups?

The types of attack that have been shown to work include computer character recognition (OCR or shape matching and object recognition) [5], social engineering (humans) and bots as well as mixtures of these attack vectors. So, let us have a look at each of these methods. We will start with the easiest and most effective, which almost certainly has the highest accuracy rate: social engineering.

Striptease!

At the end of October 2007 (most anti-malware descriptions show that this was first discovered on 1 November, a few claim that it was 31 October). we saw a very interesting technique being used to try to make unsuspecting users help the criminals evade or beat CAPTCHAs. This was called Troj/CAPTCHA-A (Sophos). The following is a brief explanation of how it works [6].

‘The Troj/CAPTCHA-A Trojan horse poses as a sexy game, offering increasingly saucy photographs of a blonde model called Melissa in exchange for the user correctly unscrambling an image. The obfuscated image is a CAPTCHA used by websites to ensure that requests are being made by a human being and not a bot ... every time a CAPTCHA is entered correctly Melissa donates another item of clothing to charity.’

This particular CAPTCHA attack was aimed squarely at breaking those used by Yahoo! Mail. Figure 2 shows a series of screenshots from Troj/CAPTCHA-A.

Figure 2. Troj/CAPTCHA/A screenshots

Not surprisingly, this trojan-assisted attack worked quite well as it used one of the key social-engineering hooks: sex.

However, it isn’t the only way that the bad guys and girls encourage humans to solve CAPTCHAs for them, they also use another common social-engineering hook: greed. Yes, they simply pay people to solve them!

Websense found a document [7] that appears to instruct workers on the art of solving CAPTCHAs. It states (translated from Russian):

'If you are unable to recognize a picture or it is not loaded (picture appears black, empty picture), just press Enter. Do not enter random characters! If there is a delay in downloading images, exit from your account, refresh the page and go again.’

It is not known how much the person gets paid for each CAPTCHA solved [8], but the original document does state ‘No more than one payout per day. Minimum balance to be paid out is $3’.

To those of us in the developed and wealthy parts of the world the level of payments being offered seems a pittance, however many of those who live in the poorer parts of the world would see this as a golden opportunity to be grasped with both hands. It is not known whether those who run this service actually pay out, and if they do, how.

The rise of the machines

It was suggested by some researchers earlier this year [9] that bots and botnets are now being used successfully to break the CAPTCHAs used by Google Mail (aka Gmail):

‘Gmail is being targeted in recent spammer tactics. Spammers in these attacks managed to create bots that are capable of signing up and creating random Gmail accounts for spamming purposes.’

However, the research seems to indicate that the attacks on Google Mail require the use of several bots – a sort of tag-team wrestling approach:

‘The Gmail signing process involves two botted hosts (or CAPTCHA breaking hosts) ... On average, only one in every five CAPTCHA breaking requests are successfully including both algorithms used by the bot, approximating a success rate of 20%.’

It isn’t just Google Mail that has been targeted using bots, both Yahoo! Mail and Microsoft Live Hotmail [10] have also been attacked successfully by using bots to solve their CAPTCHAs.

According to Websense, this is how the Microsoft Live Hotmail account sign-up is automated using a single bot:

‘First, the bot is observed to request the Live Mail registration page and it begins filling in the necessary form fields (as any ordinary user would be required to) with random data. When it comes to the CAPTCHA verification test, the bot sends the CAPTCHA image to its CAPTCHA breaking service for the text in the image.

‘...on average, one in every three CAPTCHA breaking requests succeeds – setting the bot’s success rate at around 30–35%.’

This is quite an amazing success rate for something that a computer is not supposed to be able to do.

However, I don’t believe that the current success rates using bots and botnets are completely accurate I suspect, as do others, that this is more of a cyborg-based [11] attack, with the work using both bots and humans to defeat automated account sign-ups and CAPTCHA solving. The report from Websense on bots being used to solve Google Mail CAPTCHAs seems to confirm my suspicions.

How big a problem is this really?

An article which appeared in The Register in March 2008 [12] stated:

‘An analysis of spam trends in February 2008 by net security firm MessageLabs revealed that 4.6 per cent of all spam originates from web mail-based services.

‘The proportion of spam from Gmail increased two-fold from 1.3 per cent in January to 2.6 per cent in February, most of which spamvertised skin-flick websites. Yahoo! Mail was the most abused web mail service, responsible for sending 88.7 per cent of all web mail-based spam.’

This shows that the problem is still quite small (4.6%) when compared with global spam quantities. Unfortunately I suspect that the use of webmail services may well take over from the current almost exclusive use of botnets to send spam. However, the criminals will need to come up with some more efficient ways of evading or solving CAPTCHAs first.

Conclusions

Is the CAPTCHA still useful? Yes, and more complex and harder-to-defeat systems have been developed, including 3D [13] and image-recognition [14] (rather than text-based) varieties.

It seems that spammers are intent on continuing their assault on our inboxes, offering things as diverse as university degrees and penny stocks to pills and potions to make various body parts larger or firmer.

The funny thing is that this market would soon collapse and become financially non-viable if the 11 per cent of recipients [15] (or 22 per cent of British consumers [16]) who currently buy the items advertised in spam would just stop doing so. (Yes, I know, that is about as likely to happen as world peace.)

Until then those that push spam will continue to look for ways to ensure – or at least improve the chances – that their ‘crud’ will end up in inboxes all over the world.

Repeat after me: ‘I will not buy from spam’.

Bibliography

[1] http://www.captcha.net/.

[2] http://recaptcha.net/learnmore.html.

[3] http://en.wikipedia.org/wiki/DomainKeys.

[4] http://en.wikipedia.org/wiki/SenderID.

[5] http://www.cs.sfu.ca/~mori/research/gimpy/.

[6] http://www.sophos.com/security/blog/2007/11/737.html.

[7] http://securitylabs.websense.com/content/Blogs/2919.aspx.

[8] http://bits.blogs.nytimes.com/2008/03/13/breaking-google-captchas-for-3-a-day/index.html?ref=technology.

[9] http://securitylabs.websense.com/content/Blogs/2919.aspx.

[10] http://securitylabs.websense.com/content/Blogs/2907.aspx.

[11] http://en.wikipedia.org/wiki/Cyborg.

[12] http://www.theregister.co.uk/2008/03/14/captcha_serfs/.

[13] http://spamfizzle.com/CAPTCHA.aspx.

[14] http://research.microsoft.com/asirra/.

[15] http://www.informationweek.com/news/security/vulnerabilities/showArticle.jhtml?articleID=165701785.

[16] http://www.theregister.co.uk/2004/12/10/spam_buyers_survey_bsa/.

Latest articles:

Nexus Android banking botnet – compromising C&C panels and dissecting mobile AppInjects

Aditya Sood & Rohit Bansal provide details of a security vulnerability in the Nexus Android botnet C&C panel that was exploited to compromise the C&C panel in order to gather threat intelligence, and present a model of mobile AppInjects.

Cryptojacking on the fly: TeamTNT using NVIDIA drivers to mine cryptocurrency

TeamTNT is known for attacking insecure and vulnerable Kubernetes deployments in order to infiltrate organizations’ dedicated environments and transform them into attack launchpads. In this article Aditya Sood presents a new module introduced by…

Collector-stealer: a Russian origin credential and information extractor

Collector-stealer, a piece of malware of Russian origin, is heavily used on the Internet to exfiltrate sensitive data from end-user systems and store it in its C&C panels. In this article, researchers Aditya K Sood and Rohit Chaturvedi present a 360…

Fighting Fire with Fire

In 1989, Joe Wells encountered his first virus: Jerusalem. He disassembled the virus, and from that moment onward, was intrigued by the properties of these small pieces of self-replicating code. Joe Wells was an expert on computer viruses, was partly…

Run your malicious VBA macros anywhere!

Kurt Natvig wanted to understand whether it’s possible to recompile VBA macros to another language, which could then easily be ‘run’ on any gateway, thus revealing a sample’s true nature in a safe manner. In this article he explains how he recompiled…

Bulletin Archive