Who is an OCR guru?

Discussion in 'Computer Support' started by Murgi, Jul 1, 2004.

  1. Murgi

    Murgi Guest

    Before purchasing Omnipage Pro 14, TextBridge 11, or Finereader Pro, I need
    to know whether it can handle the following:




    I have been translating part lists for the automotive industry on a daily
    basis for many years. The source text is provided as printed media.

    Recently I was asked to type a 6-digit part number (non-sequential numbers)
    in front of every translated part. This is tedious work unless it can be
    automated.
    I tried scanning the numbers, but the result isn't usable for a simple reason:
    the underlining interferes. Each number is on its own line and is
    "underlined" like this:

    123456 texttexttext

    ------
    235673 texttexttext
    ------
    735499 texttexttext
    ------

    The underlining actually runs just 1 mm beneath the text (color: black). (I
    don't know how to reproduce this style here in the newsreader.)


    How can I remove these perforated underlinings? Obviously it depends on the
    OCR software. Simple OCR software packages won't handle this task!
    The print is otherwise clean enough to scan without problems. Which OCR
    software can do what I want to achieve?

    It was suggested that Omnipage, TextBridge or Finereader might have a
    function to eliminate this "noise". Does anybody know whether this works?

    I really want to automate this task since I am wasting too much time in
    typing thousands of stupid numbers.


    Do you have any ideas/suggestions how to solve the problem?

    Thanks,
    Murgi
     
    Murgi, Jul 1, 2004
    #1

  2. Murgi

    D.Currie Guest


    No matter how good the OCR is, you're still going to have to check for
    errors. When you're scanning normal text, it's pretty simple because the OCR
    software tags anything that's not in its dictionary. Then, when you read it
    again outside of the OCR software, it should be relatively simple to spot
    words that were OCR'd wrong because they usually won't make sense in context.
    But when you're reading bunches of numbers and random letters, none of it is
    going to be in its dictionary, so it's all going to be tagged as a
    misspelling. So you're going to have to check it all, character by tedious
    character. If you don't, and you just accept the OCR, you'd have to be
    pretty confident that the software got it all right. If you were starting
    with a very clean copy, you might have a chance, but I wouldn't bet on it in
    a critical application.
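
    Part of that checking can be mechanised. Here is a minimal Python sketch,
    assuming every good line has the shape shown in the original post (a 6-digit
    number, a space, then the description). The names PART_LINE and
    split_checked are made up for this illustration. It only catches misreads
    that break the pattern, such as a letter inside the number. A digit misread
    as another digit still passes, so proofreading the numbers is still needed.

```python
import re

# Assumed line shape: a 6-digit part number, whitespace, then the description.
PART_LINE = re.compile(r"^(\d{6})\s+(\S.*)$")

def split_checked(lines):
    """Separate lines that match the expected shape from lines needing review."""
    ok, suspect = [], []
    for line_no, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        match = PART_LINE.match(text)
        if match:
            ok.append((match.group(1), match.group(2)))
        else:
            suspect.append((line_no, text))  # keep the line number for review
    return ok, suspect

# "23S673" simulates an OCR misread of a digit as a letter.
sample = ["123456 texttexttext", "23S673 texttexttext", "", "735499 texttexttext"]
ok, suspect = split_checked(sample)
print(ok)
print(suspect)
```

    Only the lines that end up in suspect need a look right away; the rest
    still deserve a spot check against the printed list.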

    You might want to see if any of the OCR software packages offer a trial
    version. Maybe one of them is better than the others for your particular
    application, but it's something you'd have to test under your particular
    circumstances. It's doubtful that anyone else has experience with exactly
    your scanner, your kind of OCR requirements, and the font you're scanning,
    and has also used all the programs you've listed.
     
    D.Currie, Jul 1, 2004
    #2

  3. Murgi

    Murgi Guest


    OK, the actual numbers come out OK... but these "----------------" generate
    other numbers (smaller in size).
    Can the OCR program be TRAINED to skip these characters?
    The numbers I actually need are checked after the job is done.


    Murgi
     
    Murgi, Jul 1, 2004
    #3
  4. Murgi

    D.Currie Guest

    You'd have to check the documentation for all three products. I don't have
    current versions of any -- I've got an old OmniPage and something else that
    came with a scanner, but they aren't even installed on my production
    machine. When I need to OCR, I just use the built-in OCR in Word -- it's
    good enough for what I do. I used to do a lot more scanning of text, but now
    everything comes electronically instead of on paper, so I've had no need to
    use better OCR software for some time.

    You might want to see if there's an option to turn off detection for fonts
    below a certain point size.
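
    If the software has no such option but can export each recognised word
    together with its height, the same idea could be applied afterwards. The
    Python sketch below is purely hypothetical: the (text, height) pairs, the
    function name drop_small_tokens and the 0.6 ratio are assumptions made for
    illustration, not the output format or settings of any of the products
    mentioned.

```python
from statistics import median

def drop_small_tokens(tokens, ratio=0.6):
    """Keep tokens whose height is at least `ratio` times the median height.

    `tokens` is a list of (text, height) pairs; this shape is an assumption.
    """
    if not tokens:
        return []
    cutoff = ratio * median(height for _, height in tokens)
    return [(text, height) for text, height in tokens if height >= cutoff]

# Made-up heights: the small tokens stand in for the spurious "numbers"
# produced by the underlining.
tokens = [("123456", 30), ("texttexttext", 31), ("11111", 8),
          ("235673", 29), ("71717", 9)]
kept = drop_small_tokens(tokens)
print(kept)
```

    This relies on the real text making up most of the tokens. If the spurious
    small tokens outnumber the real ones, the median shifts and a fixed height
    limit would be the safer choice.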

    There are also settings to keep or remove formatting, and something in there
    might be useful. If you could get the software to see the lines as graphic
    lines instead of text, it might ignore them. Maybe by darkening the lines
    themselves? I don't know if it's worth it, but running over the lines with a
    fine-tip marker might convince the OCR software that it's a graphic element
    instead of text.

    As far as training the OCR software, you might be more successful training
    it to see the characters as something else, rather than trying to train it
    to ignore the characters. For example, you could try training it to see if
    it would recognize them as dashes or asterisks, then just delete those from
    the finished product. Whether the training sticks depends on the
    software.
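
    The "delete those from the finished product" step could be done in a word
    processor with search and replace, or with a few lines of script. A minimal
    Python sketch, assuming the underlining comes out as runs of two or more
    dashes, asterisks or underscores (the name strip_placeholders and the
    choice of characters are assumptions for this example):

```python
import re

# Assumption: the underlining was recognised as runs of these characters.
# Requiring two or more in a row leaves single hyphens in part names alone.
PLACEHOLDER_RUN = re.compile(r"[-*_]{2,}")

def strip_placeholders(ocr_text):
    """Remove placeholder runs and drop any lines left empty afterwards."""
    cleaned = []
    for line in ocr_text.splitlines():
        line = PLACEHOLDER_RUN.sub("", line).strip()
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)

raw = ("123456 texttexttext\n------\n"
       "235673 texttexttext\n******\n"
       "735499 texttexttext\n------")
print(strip_placeholders(raw))
```

    A description that legitimately contains a double hyphen would lose it, so
    the pattern would need adjusting to the actual part lists.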

    When I did a lot of OCRing, I was getting text from a lot of different
    sources, using different fonts, etc., so training was pretty futile. If
    you're always using the same sources, you should have better luck.
     
    D.Currie, Jul 1, 2004
    #4
  5. Murgi

    Murgi Guest


    I approached the company that makes Omnipage and TextBridge... and
    may never receive an answer.
    But "Finereader" sent me a good response within a day. If this one doesn't
    do the trick, I'll just outsource these daily jobs to a housewife in the
    neighborhood who wants to make some extra money.

    *********************

    ABBYY FineReader 7.0 can recognize underlined text and allow you to erase
    the underlining. Also, you can save the recognized text into a Microsoft
    Word document and uncheck underlining.

    You can download a trial copy of ABBYY FineReader 7.0 from our web-site:
    http://www.abbyy.com/ocr_products.asp?param=28844
    Or contact our sales-manager () to purchase a licensed copy of
    ABBYY FineReader 7.0.

    With best regards,
    Julia Mosenkova
    Technical Support Service
    ABBYY Software House
    Phone: +7(095)7833700
    E-mail:
    http://www.abbyy.com
     
    Murgi, Jul 4, 2004
    #5
  6. Murgi

    Saddles Guest

    I'm not an OCR guru, but I've been using ABBYY FineReader for years. I can
    vouch for it. It will do it, just as they explained. Scan, then erase the
    lines on the "canvas" (zoom windows) before reading, or read, send to Word,
    then remove the underline format. In fact, I duplicated your issue and it
    did it.
     
    Saddles, Jul 4, 2004
    #6
  7. Murgi

    VV Guest

    An interesting problem.

    I tried it on your sample, and it didn't work on the first line but
    worked on the other lines. I understand that your sample differs
    from the original image.
    Could you send a small graphic sample (3-4 lines) of your actual work
    as a JPEG or GIF file to ?


    Regards

    VV
     
    VV, Jul 7, 2004
    #7