TakOCR : Easy OCR for Mac
Tako : Japanese for Octopus
OCRopus : Great Open Source OCR project
TakOCR is a project that fills a need I had: a GUI for an OCR engine that my dad could use. He’s not really the compile-it-and-use-the-command-line type of guy. He is, however, a Mac-using guy, so here are the results for your enjoyment.
Latest downloads
TakOCR.pkg version 1 md5: a7a620e1bbef92c454764c42ce1b4b8e
All packages, sources, uninstaller, etc.
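If you want to check a download against the md5 listed above, a minimal Ruby sketch could look like the following. The file name `TakOCR.pkg` in the current directory and the helper name `md5_matches?` are assumptions for illustration only.

```ruby
require "digest"

# Checksum copied from the download line above.
EXPECTED_MD5 = "a7a620e1bbef92c454764c42ce1b4b8e"

# Returns true when the file's MD5 digest equals the expected hex string.
# (md5_matches? is a hypothetical helper, not part of TakOCR.)
def md5_matches?(path, expected)
  Digest::MD5.file(path).hexdigest == expected.downcase
end

if __FILE__ == $PROGRAM_NAME
  path = ARGV.fetch(0, "TakOCR.pkg") # assumed download location
  if File.exist?(path)
    puts(md5_matches?(path, EXPECTED_MD5) ? "checksum OK" : "checksum MISMATCH")
  else
    warn "#{path} not found"
  end
end
```

Run it with the path to the downloaded package as the only argument.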
NOTICE:
TakOCR is no longer supported. If the existing program works for you, great! If it does not work, I hope you find something else that does.
If someone wants to give me a Mac with the latest version of OSX, I would be happy to update this software. :-)
Usage
Run the installer program, then just drop images onto the program. The OCRed output will be displayed in a window which will pop up.
You will need to quit TakOCR before dropping more images onto it.
What’s Included, Copyrights
TakOCR is really just a bundle of OCRopus, ImageMagick, Ghostscript and a little wrapper application to tie it all together. ImageMagick and Ghostscript let you OCR PDFs, TIFFs, JPEGs, and many more formats.
The wrapper script is just a little Ruby program made into a droplet application with the help of Platypus.
All of the software included is available under Open Source compatible licenses. You may download the sources at the link above and read each package’s license if you wish. The software included is: ImageMagick, uilib, libjpeg, leptonlib, libpng, ocropus, OpenFST, tesseract, libtiff, zlib, ghostscript.
TakOCR itself and the script behind the scenes are both placed in the Public Domain.
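To give a rough idea of what such a wrapper does, here is a minimal sketch. It is not the actual TakOCR script. It assumes that `convert` (ImageMagick) and `ocroscript` (OCRopus) are on the PATH, that each dropped file is first turned into PNG pages, and that each page is then passed to `ocroscript recognize`. The 300 dpi density, the page file names and all helper names are made up for illustration.

```ruby
require "tmpdir"

# Build the ImageMagick command that turns one input file into PNG pages.
# The 300 dpi density is an assumed value, not taken from TakOCR.
def convert_command(input, out_dir)
  ["convert", "-density", "300", input, File.join(out_dir, "page-%03d.png")]
end

# Build the OCRopus command for a single page image.
def recognize_command(page)
  ["ocroscript", "recognize", page]
end

# Hypothetical driver: convert the file, then OCR every page it produced.
# The recognized output goes to standard output.
def ocr_file(input)
  Dir.mktmpdir("takocr") do |dir|
    system(*convert_command(input, dir)) or raise "convert failed for #{input}"
    Dir.glob(File.join(dir, "page-*.png")).sort.each do |page|
      system(*recognize_command(page)) or raise "ocroscript failed for #{page}"
    end
  end
end

# A droplet passes the dropped files as command-line arguments.
ARGV.each { |path| ocr_file(path) } if __FILE__ == $PROGRAM_NAME
```

The commands are built as argument arrays and passed to `system` one element at a time, so no shell is involved and a file name is never split apart on its way to `convert`.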









I fail to use the program.
I made a screenshot from a text editor window with some sample text, which merely contains black characters on a white background without any visual obstacles whatsoever, and saved it into the file “pic1.png”.
I dropped pic1.png onto TakOCR. The result is a window with an empty text field!
Double-clicking TakOCR (without dropping any file, in programmer’s language “starting the program without any arguments”) puts the “Usage info” into the text field, which is equivalent to the command line “ocroscript recognize” (I’ve had a look into “/Applications/TakOCR.app/Contents/Resources/script”).
What version of OSX are you using? TakOCR was compiled for OSX 10.5 and probably will only work on that version. Unfortunately I don’t have access to a Mac with developer tools any more and don’t have any way to update or fix any issues you may be encountering. Sorry!
Hi
I wish you would post a tutorial of how to use it correctly. I got it to work, but I still have no idea what steps and pieces are essential.
Here are my steps:
1. Downloaded the TakOCR.pkg and installed on my Mac OS X 10.5.8 Macbook.
– When I run this, it opens a window which I first thought was displaying errors. Then I dragged a pdf file twice to it. It accepts it the second time, and there is a “reluctant” four icon menu which sometimes appears. The options are zoom in, zoom out, download to Download folder, and Preview. None of these seem to help me convert my pdf file to text.
2. Downloaded and unzipped the full package. Wasn’t ready to mix it in with lots of my production related stuff in /usr/local directory, so hand copied only /usr/local/share/ghostscript and /usr/local/share/fonts.
– This did nothing to change the behavior of the application package’s box mentioned in step 1.
3. Downloaded the tacocr.rb file to my home directory. I have ruby 1.8.7 installed on my machine. So I ran this file with the following command:
~/tacocr.rb myinput.pdf > ocrtextoutput.html
- This magically created the parsed OCR.
Please explain this.
Thanks for the wonderful work you have done. I didn’t have to struggle with the compiles and makes.
Have you considered adding a formula to brew (Homebrew)? That would make this whole process a one-line command for anyone who has brew. Brew includes uninstall.
Anil,
I’m afraid I don’t even have a Mac with which to test this program any more!
If I remember correctly, I would install TakOCR into the /Applications folder and it would work. To use it I would just drag a PDF to the TakOCR icon and drop it on the icon. If that’s not working, I’m afraid I don’t know what to suggest.
Best of luck!
I’m on OSX.5 but sadly couldn’t get it to work… drag and drop comes up with either a blank screen or with a string of
convert: unable to open image `SITES/9': No such file or directory @ blob.c/OpenBlob/2411.
error messages. You might want to consider fixing this software & releasing it; I’ve spent several hours trying to find a cheap / free OCR for Mac that works, and it’s driving me crazy. There is an Intel-only program out there, but for us PPC users that’s no good. Anyway, thanks for your efforts.
bash$ sudo port search ocr
Password:
cuneiform @1.1.0 (textproc, graphics)
Cuneiform is an OCR system with layout analysis.
gocr @0.49 (graphics)
Optical Character Recognition, converts images back to text
ocrad @0.21 (graphics)
ocrad is an optical character recognition program
ocropus @0.4-62bdc7b8be62 (textproc)
The OCRopus open source document analysis and OCR system
py31-djvubind @1.1.0 (python, graphics)
A tool to create highly compressed djvu files with positional ocr, metadata, and bookmarks
tesseract @3.00 (textproc, graphics, pdf)
Open source OCR engine
xmoto @0.5.9 (games)
X-Moto is a challenging 2D motocross platform game
Found 7 ports.
bash$
…
You could also try a bit of crippleware called: PDF OCR X.app
I find that ocrad, installed via port, works fine and is great for automating the process with the scripting language of your choice. See http://www.macports.org if you are not familiar with the terminal command “port”.
In case you’re curious, I still have it on my Mac upgraded to Mountain Lion (OS X 10.8) and it works, though it hasn’t worked with a PDF so far. I think possibly it depends on what sort of pdf it is (some pdfs are just a batch of images, others have a text layer). I think the batch-of-images kind works; otherwise you can convert it to tiff, or you could drag each page out to your desktop and then onto TakOCR.
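As an illustration of that kind of automation, here is a small Ruby sketch. It assumes `ocrad` is on the PATH, that the input images are already in a format ocrad can read (PNM, for instance), and that ocrad prints the recognized text on standard output. The helper names are made up for this example.

```ruby
require "open3"

# Name of the text file written next to the image, e.g. page.pnm -> page.txt.
def text_path_for(image_path)
  "#{image_path.delete_suffix(File.extname(image_path))}.txt"
end

# Run ocrad on one image and save whatever it prints on standard output.
# (Hypothetical helper; error handling is kept to a bare minimum.)
def ocr_with_ocrad(image_path)
  text, status = Open3.capture2("ocrad", image_path)
  raise "ocrad failed on #{image_path}" unless status.success?
  File.write(text_path_for(image_path), text)
end

# Process every image named on the command line.
ARGV.each { |path| ocr_with_ocrad(path) } if __FILE__ == $PROGRAM_NAME
```

Passing the program name and the file name to `Open3.capture2` as separate arguments avoids the shell, so file names containing spaces are handed to ocrad unchanged.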
Anything that’s tiff works fine as far as I can tell. I have the latest tesseract for 10.8 and that probably helps.
Actually, the PDFs aren’t a problem, I just had spaces in the file name. But the ones that are just images come out vastly better than the ones with a text layer.
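For anyone scripting around this, the space problem is easy to reproduce in Ruby. The sketch below is only an illustration with a made-up path. It shows how a file name with spaces falls apart when it is pasted into a shell command string, and how `Shellwords.escape` keeps it in one piece. Whether TakOCR’s own script builds its commands this way is a guess on my part.

```ruby
require "shellwords"

# Naive version: the path is pasted into the command string as-is.
def naive_command(path)
  "convert #{path} out.png"
end

# Escaped version: spaces in the path are protected from the shell.
def escaped_command(path)
  "convert #{Shellwords.escape(path)} out.png"
end

path = "SITES/9 scans/page one.pdf" # hypothetical file name containing spaces

# shellsplit breaks a string into words the way a POSIX shell would.
puts naive_command(path).shellsplit.inspect
puts escaped_command(path).shellsplit.inspect
```

In the naive case the shell would see `SITES/9` as a file name of its own, which is the same shape as the “unable to open image” message quoted in an earlier comment.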
Thanks for the updates, and I’m glad it’s still out there helping someone!