Re: OCR for Kubuntu Feisty?

Discussion in 'NZ Computing' started by Lodi, Sep 18, 2007.

  1. Lodi

    Lodi Guest

    On Mon, 17 Sep 2007 23:47:36 +0200, Lodi wrote:

    > Hi all... Anyone got any OCR recommendations for Kubuntu Feisty?
    > Preferably something you have actually used, so that I know the
    > software actually works under Feisty.
    >
    > Will start playing around with different programmes myself later in the
    > week but would appreciate any advice or pointers from the collected
    > wisdom of the group.
    >
    > Regards
    > Lodi


    No need to reply, folks. I just had a quick look through ubuntuforums.org
    and installed ocrad.

    I scanned the document, converted the image to a pbm file, then ran
    ocrad and hey presto, one text file of the original document. Half a
    dozen wrong guesses by the software, but easily fixed.
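
For anyone who wants to repeat that, here is a minimal sketch of the scan-to-pbm-to-text steps as a shell function. The function name ocr_with_ocrad and the file name scan.bmp are made up for illustration, and using ImageMagick's convert for the pbm step is an assumption (the post doesn't say which tool did the conversion).

```shell
#!/bin/sh
# Sketch of the ocrad workflow: scanned image -> PBM -> text file.
# ocr_with_ocrad and scan.bmp are placeholder names, not from the post.

ocr_with_ocrad() {
    input="$1"              # scanned image, e.g. scan.bmp
    base="${input%.*}"      # same path with the extension stripped

    # ocrad reads PNM-family images (pbm/pgm/ppm), so convert first.
    # Assumes ImageMagick's "convert" is installed.
    convert "$input" "$base.pbm" || return 1

    # -o names the file that receives the recognised text.
    ocrad -o "$base.txt" "$base.pbm" || return 1

    # Report where the text ended up.
    echo "$base.txt"
}
```

Calling `ocr_with_ocrad scan.bmp` should leave scan.pbm and scan.txt next to the original image.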

    Working OCR software ten minutes after starting to look through
    ubuntuforums. Amazing. (And it's free :)

    Regards
    Lodi
    Lodi, Sep 18, 2007

  2. Lodi

    Guest

    Just a follow-up... For posterity (and Google searches on OCR,
    Tesseract and Kubuntu Feisty): try Tesseract, because it's even better
    than ocrad (and easier to use).

    It converted five 3,000-word documents into text files and ended up
    with maybe two dozen errors. Absolute magic. I still had to reapply the
    different font sizes (headings/body) and reset some bold/italics etc.,
    but it saved the tedium of retyping all of the documents. All in all,
    well impressed. And it's free too :)

    Here's a really easy how-to:

    1 - Scan and save the document as filename.bmp
    If possible, scan using Black and White LineArt at 300dpi (or similar).

    2 - If needed, change ownership using:
    sudo chown YourUserName filename.bmp

    3 - If needed, open filename.bmp in Gimp to tidy/rotate the image.

    4 - Convert the bmp to tif using the command:
    convert filename.bmp filename.tif

    If you get an error message, you need to install ImageMagick:
    sudo apt-get install imagemagick

    5 - Do the OCR using the command:
    tesseract filename.tif finalfilename

    If you get an error message, you need to install Tesseract (this
    install command only works in Feisty):
    sudo apt-get install tesseract-ocr

    Tesseract will do its thing and produce a file called finalfilename.txt
    Open the text file in OpenOffice and re-format it as you so desire.
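
If you have several scans to get through, steps 4 and 5 can be wrapped in a loop. This is only a rough sketch: the function name ocr_dir and the one-directory layout are made up for illustration, and it assumes every scan was saved as a .bmp.

```shell
#!/bin/sh
# Batch sketch of steps 4 and 5: every .bmp in a directory -> .tif -> .txt
# ocr_dir is a hypothetical helper name, not part of any package.

ocr_dir() {
    dir="${1:-.}"    # directory holding the scans (default: current dir)

    # Stop early if a tool is missing; install with
    #   sudo apt-get install imagemagick tesseract-ocr
    for tool in convert tesseract; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing: $tool" >&2
            return 1
        }
    done

    for bmp in "$dir"/*.bmp; do
        [ -e "$bmp" ] || continue           # nothing matched the pattern
        base="${bmp%.bmp}"

        # Step 4: bmp -> tif (ImageMagick)
        convert "$bmp" "$base.tif" || continue

        # Step 5: tesseract adds the .txt extension to the output name itself
        tesseract "$base.tif" "$base" || continue

        echo "$base.txt"
    done
}
```

Running `ocr_dir ~/scans` (any directory will do) prints the name of each text file as it is produced; a scan that fails to convert is skipped rather than stopping the whole run.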

    The power of the command line and Open Source software. Totally rules.

    Regards
    Lodi
    Lodi, Sep 19, 2007
