What Google Gains With reCAPTCHA Technology In Its Trickbag
First, the move gives Google a reliable Web fraud-prevention tool. The reCAPTCHA technology asks users to identify images of squiggly letters that a computer would have trouble interpreting, increasing the likelihood that whatever is accessing the Web site is human. According to reCAPTCHA, the technology is used by more than 100,000 Web sites and services.
Google said it intends to use reCAPTCHA not only to aid its own security efforts but also to help streamline its massive and controversial book- and newspaper-scanning project.
"In this way, reCAPTCHA's unique technology improves the process that converts scanned images into plain text, known as Optical Character Recognition (OCR). This technology also powers large scale text scanning projects like Google Books and Google News Archive Search," wrote Google in a Wednesday blog post co-authored by Google product manager Will Cathcart and reCAPTCHA co-founder Luis von Ahn.
Many of the words in reCAPTCHA images are already taken from scanned newspapers and books, which will help Google's efforts, they explained.
"Having the text version of documents is important because plain text can be searched, easily rendered on mobile devices and displayed to visually impaired users. So we'll be applying the technology within Google not only to increase fraud and spam protection for Google products but also to improve our books and newspaper scanning process," they wrote.
The CAPTCHA technology was originally developed at Carnegie Mellon University.
Google's book and newspaper digitization project has been the subject of much controversy. It has been heavily criticized by the Authors Guild, which along with the Association of American Publishers filed suit against Google in 2005, claiming copyright infringement.
In October 2008, both sides agreed to a $125 million settlement to avoid a trial. The settlement has been in limbo ever since, thanks to a Department of Justice review and stepped-up criticism of Google from high-profile digital book and technology companies such as Amazon, Microsoft and Yahoo, and from associations such as the Open Book Alliance.
"Millions of copyright owners around the world who did not participate in and do not even know about this litigation -- many of whose works Google has not even scanned -- stand on the verge of having their copyrights infringed in ways exponentially greater than the conduct challenged in the complaints," reads Microsoft's complaint. "Monopolization is the wrong means to carry out the worthy goal of digitizing and increasing the accessibility of books."
Google late last week defended its scanning efforts, saying that it was interested in creating open platforms with publishers and e-reading device makers to build a digital book ecosystem.
The U.S. Justice Department is expected to offer its official opinion on Google's scanning project to the U.S. District Court for the Southern District of New York later this week. A later hearing is scheduled in District Court for Oct. 7.