MSOffice OCR scan

Discussion in 'Computer Support' started by Jim S, Oct 25, 2008.

  1. Jim S

    If I use MSoffice Document Imaging from my scanner then use <Save Text to
    Word>, the result is accurately OCR'd but is in the default word new page
    format and not formatted as the original document. Is there a way of
    retaining the original layout?
    WinXP
    OfficeXP (2003?)
    --
    Jim S
    Tyneside UK
     
    Jim S, Oct 25, 2008
    #1

  2. Jim S

    On Sat, 25 Oct 2008 08:02:28 -0700, Mike Easter wrote:

    > Jim S wrote:
    >> If I use MSoffice Document Imaging from my scanner then use <Save Text
    >> to Word>, the result is accurately OCR'd but is in the default word new
    >> page format and not formatted as the original document. Is there a way
    >> of retaining the original layout?
    >> WinXP
    >> OfficeXP (2003?)

    >
    > What do you mean? If the 'original' (the printed page you start with)
    > isn't a 'document' (a digital document format), then when you scan the
    > original you are creating a collection of pixels like a picture or image
    > which is 'originally' (new/secondary original) in TIFF or MDI MS document
    > imaging format. Then, when you perform OCR on the picture/image tiff/mdi
    > pixel arrangement, you are 'hoping to' have the arrangement of pixels be
    > estimated/ guessed at/ the text which was/ used to be/ present in the
    > originally original original (x3 !) digital document format that was used
    > to print the original original paper output you scanned.
    >
    > OTOH, if you are the one who produced the original printed page, then you
    > should save the original digital document. If someone else has the
    > original digital document and you can get it from them, then that is the
    > best source.
    >
    > Otherwise, you are left to scanning the original printed page at a very
    > high resolution so that you can reproduce it as accurately as possible,
    > and not go with the OCR conversion process at all.


    Noo....
    The scanned document contained paragraphs, some of which were left aligned
    and some were centred. Part was in one font and part in another.
    When I use <Save Text to Word>, the Word document is in my default font and
    all left aligned.
    --
    Jim S
    Tyneside UK
     
    Jim S, Oct 25, 2008
    #2

  3. Jim S

    On Sun, 26 Oct 2008 22:22:48 +1300, PeeCee wrote:

    > "Jim S" <> wrote in message
    > news:...
    >> If I use MSoffice Document Imaging from my scanner then use <Save Text to
    >> Word>, the result is accurately OCR'd but is in the default word new page
    >> format and not formatted as the original document. Is there a way of
    >> retaining the original layout?
    >> WinXP
    >> OfficeXP (2003?)
    >> --
    >> Jim S
    >> Tyneside UK

    >
    >
    > OCR "Optical Character Recognition"
    >
    > As such the main thrust of these programs is to automatically extract the
    > text information from the scanned document.
    > Other aspects of the scanned document like pictures and layout require
    > considerably greater program sophistication and user input.
    > I would guess your OCR program is the one that came with your scanner
    > (ABBYY FineReader?).
    > As such it will have the minimum necessary to enable the vendor of the
    > scanner to claim an OCR program has been included with the product.
    > These bundled programs concentrate on extracting the text and just ignore
    > formatting, fonts and other characteristics of the original scan.
    >
    > So when you OCR from your scanner to Word, all the OCR program does is
    > recognise the text only component and squirt it at Word.
    > Word in its turn accepts the text stream as if it were being typed in, hence
    > the use of Word's default font and paragraph style.
    >
    > To get the results you are apparently expecting you will have to look at
    > something like OmniPage.
    > It's a while since I used it, but I seem to remember OmniPage came up with a
    > preprocessing screen that enabled you to define which areas of the scan were
    > Graphic elements and which were Text elements. This was achieved by the user
    > drawing boxes around these various elements.
    > On top of that you could tell OmniPage the order to process the text boxes
    > and whether to follow the original layout or not.
    >
    > OmniPage is an expensive program so you can understand why it is not bundled
    > with your average scanner.
    > I'm sure that if it is what you want, you'll spring for it, but don't
    > expect it to recreate your original as accurately as you might like.
    >
    > Best
    > Paul.
    >
    >


    Thanks
    I dug out an old copy of TextBridge Pro 98.
    It's not as accurate as the MSoffice one, but it works well enough if used
    manually. It retains pictures and formatting too.
    --
    Jim S
    Tyneside UK
     
    Jim S, Oct 27, 2008
    #3
