raleighpublicrecord/dochive · GitHub: DocHive has two prerequisites, ImageMagick and Tesseract. It converts PDF pages to images and then OCRs each image. Its purpose is to extract numeric statistical tables from PDFs for import into spreadsheets.
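The two-step pipeline described above (render a PDF page to an image, then OCR the image) can be sketched roughly as follows. This is not DocHive's actual code. It assumes the ImageMagick `convert` and `tesseract` command-line tools are installed and on the PATH (newer ImageMagick releases also offer a `magick` entry point, and PDF rendering typically needs Ghostscript). The function names, the 300 dpi density, and the file names are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def render_command(pdf_path, page_index, image_path, density=300):
    """Build an ImageMagick command that rasterizes one PDF page to an image."""
    # "file.pdf[0]" selects a single page (zero-based) in ImageMagick syntax.
    return ["convert", "-density", str(density),
            f"{pdf_path}[{page_index}]", str(image_path)]


def ocr_command(image_path, output_base):
    """Build a Tesseract command; Tesseract appends ".txt" to output_base."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_pdf_page(pdf_path, page_index, work_dir="."):
    """Render one page and OCR it, returning the recognized text."""
    work_dir = Path(work_dir)
    image_path = work_dir / f"page-{page_index}.png"
    output_base = work_dir / f"page-{page_index}"
    subprocess.run(render_command(pdf_path, page_index, image_path), check=True)
    subprocess.run(ocr_command(image_path, output_base), check=True)
    return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Calling `ocr_pdf_page("report.pdf", 0)` would return the raw text of the first page. Turning that text into spreadsheet rows is a separate step that depends on the table layout.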
OCRopus is an OCR system written in Python, NumPy, and SciPy, focusing on the use of large-scale machine learning for addressing problems in document analysis. Early versions relied on Tesseract for character recognition.
Free-OCR.com is a free online OCR (Optical Character Recognition) tool. You can use this to perform OCR on any image you supply. This service is free, no registration necessary. We also do not need your email address. Just upload your image files. Free-OCR takes JPG, GIF, TIFF, BMP, or PDF files (for PDFs, only the first page is processed). The only restrictions are that images must not be larger than 2MB or wider or higher than 5000 pixels, and there is a limit of 10 image uploads per hour.
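The upload limits listed above can be checked locally before submitting a file. The following is a minimal sketch of such a pre-flight check, not part of the Free-OCR service. It assumes that "2MB" means 2 × 1024 × 1024 bytes and that only the listed file extensions are accepted. The function name and the problem messages are made up for illustration.

```python
MAX_BYTES = 2 * 1024 * 1024      # assumption: "2MB" read as 2 MiB
MAX_SIDE_PX = 5000               # maximum width and height in pixels
MAX_UPLOADS_PER_HOUR = 10
ALLOWED_EXTENSIONS = {"jpg", "gif", "tiff", "bmp", "pdf"}  # as listed above


def upload_problems(filename, size_bytes, width_px, height_px, uploads_this_hour):
    """Return a list of reasons the upload would exceed the stated limits."""
    problems = []
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if extension not in ALLOWED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file too large")
    if width_px > MAX_SIDE_PX or height_px > MAX_SIDE_PX:
        problems.append("image too wide or tall")
    if uploads_this_hour >= MAX_UPLOADS_PER_HOUR:
        problems.append("hourly upload limit reached")
    return problems
```

An empty list means the file fits within the limits as stated here. The service itself remains the authority on what it accepts.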
Audiveris is an Optical Music Recognition (OMR) module. Starting from the image of a music sheet, it provides high-level logical music information compliant with the MusicXML definition. Other tools, such as a MIDI sequencer or a composition editor, can then read and update this standard data.
There are already commercial tools in this area but Audiveris is, to our knowledge, the first Java open-source OMR tool. It is a cross-platform tool, written entirely in Java, and tested on Windows, Solaris, Linux and Mac OS.
Audiveris works with printed music sheets only, the task of recognizing hand-written scores being significantly harder.
hOCR is a format for representing OCR output, including layout information, character confidences, bounding boxes, and style information. It embeds this information invisibly in standard HTML. By building on standard HTML, it automatically inherits well-defined support for most scripts, languages, and common layout options. Furthermore, unlike previous OCR formats, the recognized text and OCR-related information coexist in the same file and survive editing and manipulation. hOCR markup is independent of the presentation.
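To illustrate how hOCR carries OCR data inside ordinary HTML, here is a small sketch that reads word text, bounding boxes, and confidences from a hand-written sample. It relies on the hOCR convention of storing properties such as `bbox x0 y0 x1 y1` and `x_wconf` in the `title` attribute of elements with class `ocrx_word`. The sample markup and the helper names are illustrative assumptions, and the parser is simplified: it assumes word elements are `span`s with no nested `span`s inside them.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Extract the bbox and word confidence from an hOCR title attribute."""
    props = {}
    for field in title.split(";"):
        parts = field.split()
        if not parts:
            continue
        if parts[0] == "bbox" and len(parts) == 5:
            props["bbox"] = tuple(int(value) for value in parts[1:])
        elif parts[0] == "x_wconf" and len(parts) == 2:
            props["confidence"] = float(parts[1])
    return props


class HocrWordParser(HTMLParser):
    """Collect text, bbox, and confidence for each ocrx_word element."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # the word currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "ocrx_word" in (attrs.get("class") or "").split():
            self._current = parse_title(attrs.get("title") or "")
            self._current["text"] = ""

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(html):
    """Return a list of word dictionaries found in an hOCR string."""
    parser = HocrWordParser()
    parser.feed(html)
    return parser.words


# Hand-written sample for illustration, not real OCR output.
SAMPLE = """<span class='ocr_line' title='bbox 10 20 300 50'>
<span class='ocrx_word' title='bbox 10 20 90 50; x_wconf 96'>Total</span>
<span class='ocrx_word' title='bbox 100 20 180 50; x_wconf 91'>1,204</span>
</span>"""

for word in parse_hocr_words(SAMPLE):
    print(word["text"], word["bbox"], word["confidence"])
```

Because the data lives in attributes, a browser renders the same markup as plain text, which is what the description above means by embedding the information invisibly.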