TakOCR

TakOCR: Easy OCR for Mac

Tako: Japanese for octopus
OCRopus: a great open-source OCR project

TakOCR is a project to fill a need I had. I needed a GUI for an OCR engine for my dad. He’s not really the compile-it-and-use-the-command-line type of guy. He is, however, a Mac-using guy, so here are the results for your enjoyment.

Latest downloads

TakOCR.pkg (version 1), MD5: a7a620e1bbef92c454764c42ce1b4b8e
All packages, sources, uninstaller, etc.

NOTICE:

TakOCR is no longer supported.  If the existing program works for you, great!  If it does not work, I hope you find something else that does.

If someone wants to give me a Mac with the latest version of OSX, I would be happy to update this software. :-)

Usage

Run the installer, then just drop images onto the TakOCR application. The OCRed text will be displayed in a window that pops up.

You will need to quit TakOCR before dropping more images onto it.

What’s Included, Copyrights

TakOCR is really just a bundle of OCRopus, ImageMagick, Ghostscript and a little wrapper application to tie it all together. ImageMagick and Ghostscript let you OCR PDFs, TIFFs, JPEGs, and many more formats.

The wrapper script is just a little Ruby program made into a droplet application with the help of Platypus.
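To make the "little wrapper" idea concrete, here is a hypothetical Ruby sketch of what a Platypus droplet script of this kind might look like. It is not the actual TakOCR script. It assumes that Platypus hands dropped files to the script as command-line arguments, that ImageMagick's `convert` (which relies on Ghostscript for PDFs) and OCRopus' `ocroscript recognize` are on the PATH, and that `ocroscript` prints its result to standard output. The 300 dpi density, the `page-%03d.tif` naming, and the helper names `convert_command`, `ocr_command` and `ocr_file` are made up for illustration.

```ruby
#!/usr/bin/env ruby
# Hypothetical sketch of a TakOCR-style droplet script; NOT the original
# TakOCR script. Assumes ImageMagick's `convert` (using Ghostscript for
# PDFs) and OCRopus' `ocroscript` are installed and on the PATH.
require "tmpdir"

# Build the ImageMagick command that rasterizes one input file (PDF, TIFF,
# JPEG, PNG, ...) into numbered TIFF pages inside +workdir+.
# Returning an argv array instead of a shell string keeps paths that
# contain spaces intact.
def convert_command(input, workdir)
  ["convert", "-density", "300", input, File.join(workdir, "page-%03d.tif")]
end

# Build the OCRopus command for a single page image (assumed syntax).
def ocr_command(page)
  ["ocroscript", "recognize", page]
end

# Convert one dropped file to TIFF pages, then OCR each page in order.
def ocr_file(input)
  Dir.mktmpdir("takocr") do |workdir|
    system(*convert_command(input, workdir)) or abort("convert failed on #{input}")
    Dir.glob(File.join(workdir, "page-*.tif")).sort.each do |page|
      # ocroscript's output goes straight to stdout, which a Platypus
      # text-window droplet would display.
      system(*ocr_command(page)) or warn("ocroscript failed on #{page}")
    end
  end # the temporary directory and its pages are removed here
end

# Each dropped file arrives as one command-line argument.
if __FILE__ == $PROGRAM_NAME
  ARGV.each { |path| ocr_file(path) }
end
```

Passing each path to `system` as a separate argument, rather than splicing it into a shell string, means a file name such as `My Web Sites/scan.pdf` reaches `convert` as one argument instead of being split at the spaces.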

All of the included software is available under Open Source compatible licenses. You may download the sources at the link above and read the individual packages’ licenses if you wish. The included software is: ImageMagick, iulib, libjpeg, leptonlib, libpng, ocropus, OpenFST, tesseract, libtiff, zlib, ghostscript.

TakOCR itself and the script behind the scenes are both placed in the Public Domain.

This entry was posted in Digitization, Programming, Projects.

11 Responses to TakOCR

  1. Pingback: Ocropus on OS X: frustrations « History Research Hacks

  2. Stefan Nowak says:

    I can’t get the program to work.

    I made a screenshot from a text editor window with some sample text, which merely contains black characters on a white background without any visual obstacles whatsoever, and saved it into the file “pic1.png”.

    I dropped pic1.png onto TakOCR. The result is a window with an empty text field!

    Double-clicking TakOCR without dropping any file (in programmer’s language, “starting the program without any arguments”) puts the “Usage info” into the text field, which is equivalent to running “ocroscript recognize” on the command line (I’ve had a look at “/Applications/TakOCR.app/Contents/Resources/script”).

    • stuporglue says:

      What version of OS X are you using? TakOCR was compiled for OS X 10.5 and will probably only work on that version. Unfortunately, I don’t have access to a Mac with the developer tools anymore, so I have no way to update it or fix any issues you may be encountering. Sorry!

  3. Anil says:

    Hi

    I wish you would post a tutorial on how to use it correctly. I got it to work, but I still have no idea which steps and pieces are essential.

    Here are my steps:
    1. Downloaded the TakOCR.pkg and installed it on my Mac OS X 10.5.8 MacBook.
    – When I run this, it opens a window that I first thought was displaying errors. Then I dragged a PDF file onto it twice; it accepted it the second time, and a “reluctant” four-icon menu sometimes appears. The options are zoom in, zoom out, download to Download folder, and Preview. None of these seem to help me convert my PDF file to text.

    2. Downloaded and unzipped the full package. I wasn’t ready to mix it in with lots of my production-related stuff in the /usr/local directory, so I hand-copied only /usr/local/share/ghostscript and /usr/local/share/fonts.
    – This did nothing to change the behavior of the application package’s box mentioned in step 1.

    3. Downloaded the tacocr.rb file to my home directory. I have Ruby 1.8.7 installed on my machine, so I ran this file with the following command:

    ~/tacocr.rb myinput.pdf > ocrtextoutput.html

    - This magically created the parsed OCR.

    Please explain this.

    Thanks for the wonderful work you have done. I didn’t have to struggle with the compiles and makes.

    Have you considered adding a formula to Homebrew? That would make this whole process a one-line command for anyone who has brew, and brew includes uninstall.

    • stuporglue says:

      Anil,

      I’m afraid I don’t even have a Mac with which to test this program anymore!

      If I remember correctly, I would install TakOCR into the /Applications folder and it would work. To use it, I would just drag a PDF onto the TakOCR icon and drop it there. If that’s not working, I’m afraid I don’t know what to suggest.

      Best of luck!

  4. Lex says:

    I’m on OS X 10.5 but sadly couldn’t get it to work. Drag and drop comes up with either a blank screen or with a string of

    convert: unable to open image `SITES/9': No such file or directory @ blob.c/OpenBlob/2411.

    error messages. You might want to consider fixing this software and releasing it; I’ve spent several hours trying to find a cheap or free OCR program for the Mac that works, and it’s driving me crazy. There is some Intel-only software out there, but for us PPC users that’s no good. Anyway, thanks for your efforts.

    • agitatedString says:

      bash$ sudo port search ocr
      Password:
      cuneiform @1.1.0 (textproc, graphics)
      Cuneiform is an OCR system with layout analysis.

      gocr @0.49 (graphics)
      Optical Character Recognition, converts images back to text

      ocrad @0.21 (graphics)
      ocrad is an optical character recognition program

      ocropus @0.4-62bdc7b8be62 (textproc)
      The OCRopus open source document analysis and OCR system

      py31-djvubind @1.1.0 (python, graphics)
      A tool to create highly compressed djvu files with positional ocr, metadata, and bookmarks

      tesseract @3.00 (textproc, graphics, pdf)
      Open source OCR engine

      xmoto @0.5.9 (games)
      X-Moto is a challenging 2D motocross platform game

      Found 7 ports.
      bash$

      You could also try a bit of crippleware called PDF OCR X.app.
      I find that ocrad, installed via port, works fine and is great for automating the process with the scripting language of your choice. See http://www.macports.org if you are not familiar with the terminal command “port”.

  5. Marion Delgado says:

    In case you’re curious, I still have it on my Mac upgraded to Mountain Lion (OS X 10.8) and it works, though it hasn’t worked with a PDF so far. I think it may depend on what sort of PDF it is (some PDFs are just a batch of images, others have a text layer). I think the image-only kind works; otherwise you can convert the PDF to TIFF, or drag each page out to your desktop and then onto TakOCR.

    Anything that’s TIFF works fine as far as I can tell. I have the latest tesseract for 10.8, and that probably helps.

  6. Pingback: Looking for OCR software that recognizes old scripts (breite Kanzlei)
