Monday, May 22, 2006

OCR for Free

A fairly frequent question in translators' fora is "how do I convert a (jpg, tif, or pdf) file so that I can work on it in MS Word?"

For pdf files, the solution is sometimes as simple as opening the file in Acrobat and saving it as an MS Word or rtf file. But often this approach doesn't work (for example, because the pdf was created from a graphic file rather than a text one, so it contains only an image of the page and no text to export).

For graphic files such as jpg or tif files, of course, "saving as" a Word file is not an option.

So the solution usually offered is to use an OCR package. Professional ones may give good results (if the quality of the original is good), but they cost money. The free OCR applications that come with a scanner are usually very disappointing: they may not recognize accented letters, or may fail to preserve the layout of the page (after all, they are given away to induce customers to upgrade to the full version).

A better alternative, at least for users of the latest versions of MS Office, is to take advantage of Microsoft Office Document Imaging: it is better than most other "free" OCR applications, may be upgraded (if necessary) to one of the leading "pro" OCR packages, and, on its own, already recognizes things such as tables and accented characters.

1 comment:

  1. I don't have documents to OCR often, but when I do I use A Billion Billion because it's free.

