04 November 2008

Scanned docs can be searched via Google

A picture of a thousand words?

10/30/2008 02:33:00 PM
(Note: Click on the first result in each of the search results pages linked to throughout the post to see this feature in action.)

A scanner is a wonderful tool. Every day, people all over the world post scanned documents online -- everything from official government reports to obscure academic papers. These files usually contain images of text, rather than the text itself. But all of these documents have one thing in common: someone somewhere thought they were valuable enough to share with the world.

In the past, scanned documents were rarely included in search results because we couldn't be sure of their content. We had occasional clues from references to the document -- so you might get a search result with a title but no snippet highlighting your query. Today, that changes. We are now able to perform Optical Character Recognition (OCR) on any scanned documents that we find stored in Adobe's PDF format. This technology lets us convert a picture (of a thousand words) into a thousand words -- words that can be searched and indexed, so that these valuable documents are more easily found. This is a small but important step forward in our mission of making all the world's information accessible and useful.
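
To make "searched and indexed" a little more concrete, here is a minimal sketch in Python of what can happen once a scanned page has been turned into plain text. It is only an illustration, not a description of how Google's systems work: the OCR step itself is assumed to have already run, and the file names, sample text, and the helper functions `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_text_by_doc):
    """Map each word to the set of document names whose text contains it."""
    index = defaultdict(set)
    for doc_name, text in ocr_text_by_doc.items():
        for word in tokenize(text):
            index[word].add(doc_name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the documents matching the first word, then narrow down.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical OCR output, keyed by made-up file names (assumption:
# the text was already extracted from the scanned images).
ocr_text_by_doc = {
    "report.pdf": "Annual report on regional water quality",
    "paper.pdf": "A study of water flow in porous media",
}

index = build_index(ocr_text_by_doc)
matches = search(index, "water quality")
```

Before OCR, neither file would have any words to put in the index, so a query could match them only through outside clues such as a link's text. Once the text exists, a multi-word query can be answered by intersecting the document sets for each word.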

URL: http://googleblog.blogspot.com/2008/10/picture-of-thousand-words.html