Lucian Constantin
CSO Senior Writer

Researchers defeat CAPTCHA on popular websites

Nov 1, 2011

A new tool is capable of solving CAPTCHA tests on Wikipedia, eBay, CNN, and other sites

Researchers from Stanford University have developed an automated tool that is capable of deciphering text-based anti-spam tests used by many popular websites with a significant degree of accuracy.

Researchers Elie Bursztein, Matthieu Martin and John C. Mitchell presented the results of their year-and-a-half-long CAPTCHA study at the recent ACM Conference on Computer and Communications Security in Chicago.

CAPTCHA stands for ‘Completely Automated Public Turing test to tell Computers and Humans Apart’ and consists of challenges that only humans are supposed to be capable of solving. Websites use such tests in order to block spambots that automate tasks like account registration and comment posting.

There are various types of CAPTCHAs, some using audio and others using math problems, but the most common implementations ask users to type back distorted text. The Stanford team devised various methods of removing the background noise deliberately added to the images and of breaking text strings into individual characters for easier recognition, a technique called segmentation.
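To make the two steps concrete, here is a minimal Python sketch of noise cleanup followed by segmentation. It is not the Decaptcha algorithm, whose details the article does not give. It assumes a grayscale image stored as a list of rows of 0-255 values, treats isolated dark pixels as noise, and splits characters wherever a column contains no ink. The function names and the threshold are illustrative assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]


def binarize(image: Image, threshold: int = 128) -> Image:
    """Map grayscale pixels (0-255) to 1 for dark 'ink' and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def remove_specks(binary: Image, min_neighbors: int = 1) -> Image:
    """Erase ink pixels that have fewer than min_neighbors ink pixels around them."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            # Count ink among the (up to) eight surrounding pixels.
            neighbors = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


def segment_columns(binary: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, of consecutive inked columns."""
    if not binary:
        return []
    width = len(binary[0])
    has_ink = [any(row[x] for row in binary) for x in range(width)]
    segments = []
    start = None
    for x, inked in enumerate(has_ink):
        if inked and start is None:
            start = x  # a character candidate begins
        elif not inked and start is not None:
            segments.append((start, x))  # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, width))
    return segments
```

Each returned column range would then be cropped and handed to a character recognizer. A splitter this simple fails as soon as characters touch or a line crosses the text.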

Some of their CAPTCHA-breaking algorithms are inspired by those used by robots to orient themselves in various environments and were built into an automated tool dubbed Decaptcha. This tool was then run against CAPTCHAs used by 15 high-profile websites.

The results revealed that tests used by Visa’s Authorize.net payment gateway could be beaten 66 percent of the time, while attacks on Blizzard’s World of Warcraft portal had a success rate of 70 percent.

Other interesting results were registered on eBay, whose CAPTCHA implementation failed 43 percent of the time, and on Wikipedia, where one in four attempts was successful. Lower, but still significant, success rates were found on Digg, CNN and Baidu — 20, 16 and 5 percent respectively.

The only tested sites where CAPTCHAs couldn’t be broken were Google and reCAPTCHA. The latter is an implementation originally developed at Carnegie Mellon University and bought by the Internet search giant in September 2009.

Authorize.net and Digg have switched to reCAPTCHA since these tests were performed, but it’s not clear if the other websites made changes as well. Nevertheless, the Stanford researchers came up with several recommendations to improve CAPTCHA security.

These include randomizing the length of the text string, randomizing the character size, applying a wave-like effect to the output, and either collapsing the characters together or adding lines in the background. Another noteworthy conclusion was that using complex character sets has no security benefit and is bad for usability.
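As a rough illustration of the first three recommendations, the Python sketch below generates the parameters for one challenge: a random-length string drawn from a simple character set, a random size for each character, and a sine-wave vertical offset that a renderer could apply column by column. The class and function names and all numeric ranges are assumptions chosen for the example, not values from the study. Drawing the image, and adding collapsing or background lines, is left out.

```python
import math
import secrets
import string
from dataclasses import dataclass
from typing import List

# A plain character set: the study found complex sets add no security.
ALPHABET = string.ascii_uppercase + string.digits


@dataclass
class CaptchaSpec:
    text: str
    char_sizes: List[int]   # one font size (in pixels) per character
    wave_amplitude: float   # peak vertical shift in pixels
    wave_period: float      # horizontal length of one full wave in pixels


def _rand_between(low: int, high: int) -> int:
    """Return an unpredictable integer in [low, high], both ends inclusive."""
    return low + secrets.randbelow(high - low + 1)


def make_spec(min_len: int = 5, max_len: int = 8,
              min_size: int = 24, max_size: int = 40) -> CaptchaSpec:
    """Pick a random length, random text, and a random size per character."""
    length = _rand_between(min_len, max_len)
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    sizes = [_rand_between(min_size, max_size) for _ in text]
    return CaptchaSpec(
        text=text,
        char_sizes=sizes,
        wave_amplitude=float(_rand_between(3, 8)),
        wave_period=float(_rand_between(40, 90)),
    )


def wave_offset(x: int, spec: CaptchaSpec) -> int:
    """Vertical shift, in pixels, to apply to image column x for the wave effect."""
    return round(spec.wave_amplitude * math.sin(2 * math.pi * x / spec.wave_period))
```

The sketch uses `secrets` rather than `random` so that a bot cannot predict the next challenge from earlier ones.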

Bursztein and his team have had earlier breakthroughs in this field. In May, they developed techniques to break audio CAPTCHAs on sites like Microsoft, eBay, Yahoo and Digg, and they plan to continue improving their Decaptcha tool.
