Velocity Reviews - Computer Hardware Reviews



Extracting text from .png images

 
 
Henrik Berg Nielsen
 
      10-01-2003
Hi group!

I need to extract some text (well, numbers actually) from a bunch of
similar-looking .png images. After extraction the numbers will be fed to a
Python script for further processing. Any good ideas on how to go about
this? I have no idea whatsoever how to extract the numbers from the
images...

Thanks in advance,

Henrik


 
 
 
 
 
John J. Lee
 
      10-01-2003
"Henrik Berg Nielsen" <(E-Mail Removed)> writes:

> I need to extract some text (well numbers actually) from a bunch of
> similarly looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about with
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...


OCR is the TLA you're looking for ("Optical Character Recognition").

Dunno if there are any good free OCR engines. With these sorts of
hard algorithms, you tend to get what you pay for.


John
 
 
 
 
 
Indigo Moon Man
 
      10-01-2003
Henrik Berg Nielsen <(E-Mail Removed)> spake thusly:
>
> I need to extract some text (well numbers actually) from a bunch of
> similarly looking .png images. After extraction the numbers will be fed
> to a Python script for further processing. Any good ideas on how to go
> about with this? I have no idea whatsoever about how to extract the
> numbers out of the images...
>

This might help you out...
http://www.pricelessware.org/2003/PL...tm#Convert-OCR

I'm not sure if it handles PNG; you might have to convert the file to TIFF or
BMP or something.
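
If the OCR tool turns out not to read PNG, the conversion step could be scripted as well. Below is a minimal sketch in Python using only the standard library. It assumes an 8-bit, non-interlaced grayscale or RGB PNG and writes binary PGM/PPM, on the further assumption that the OCR program accepts that format (check its documentation). The name `png_to_pnm` is made up for illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def _paeth(a, b, c):
    """Paeth predictor from the PNG spec: pick the neighbour closest to a+b-c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c


def png_to_pnm(data):
    """Decode simple PNG bytes into binary PGM (gray) or PPM (RGB) bytes.

    Assumption: 8 bits per sample, colour type 0 or 2, no interlacing.
    Chunk CRCs are not verified in this sketch.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos, idat, header = 8, [], None
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC
        if ctype == b"IHDR":
            header = struct.unpack(">IIBBBBB", body)
        elif ctype == b"IDAT":
            idat.append(body)
        elif ctype == b"IEND":
            break
    if header is None:
        raise ValueError("missing IHDR chunk")
    width, height, depth, color, _, _, interlace = header
    if depth != 8 or color not in (0, 2) or interlace:
        raise ValueError("unsupported PNG variant for this sketch")

    bpp = 1 if color == 0 else 3          # bytes per pixel
    stride = width * bpp                  # bytes per scanline, without filter byte
    raw = zlib.decompress(b"".join(idat))
    out = bytearray()
    prev = bytearray(stride)              # the row above the first row is all zeros
    for y in range(height):
        start = y * (stride + 1)
        ftype = raw[start]                # per-row filter type
        line = bytearray(raw[start + 1:start + 1 + stride])
        for i in range(stride):
            a = line[i - bpp] if i >= bpp else 0   # left (already reconstructed)
            b = prev[i]                            # above
            c = prev[i - bpp] if i >= bpp else 0   # above-left
            if ftype == 1:
                line[i] = (line[i] + a) & 0xFF
            elif ftype == 2:
                line[i] = (line[i] + b) & 0xFF
            elif ftype == 3:
                line[i] = (line[i] + (a + b) // 2) & 0xFF
            elif ftype == 4:
                line[i] = (line[i] + _paeth(a, b, c)) & 0xFF
            elif ftype != 0:
                raise ValueError("unknown filter type %d" % ftype)
        out += line
        prev = line
    magic = b"P5" if color == 0 else b"P6"
    return magic + b"\n%d %d\n255\n" % (width, height) + bytes(out)
```

Usage would be something like reading the .png in binary mode, calling `png_to_pnm` on the bytes, and writing the result to a .pgm or .ppm file. For palette images, alpha channels, 16-bit samples or interlacing, a full imaging library is the safer choice.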


--
Audio Bible Online:
http://www.audio-bible.com/


 
 
Lee Harr
 
      10-01-2003
In article <wbDeb.2223$(E-Mail Removed)2net.dk>, Henrik Berg Nielsen wrote:
> Hi group!
>
> I need to extract some text (well numbers actually) from a bunch of
> similarly looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about with
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...
>



http://www.claraocr.org/

 
 
Skip Montanaro
 
      10-01-2003
John> OCR is the TLA you're looking for ("Optical Character Recognition").

John> Dunno if there are any good free OCR engines. With these sorts of
John> hard algorithms, you tend to get what you pay for.

Which often means there's a piece of free software out there which works
better than the most expensive commercial solutions. <wink>

A little googling suggests this might be a candidate:

http://www.claraocr.org/

I have no idea if there's an exported library and/or a Python wrapper, but
it's probably worth a look.
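
Lacking a wrapper, one workable route is to run a command-line OCR program from Python and pick the numbers out of whatever text it prints. A rough sketch, assuming the program takes the image path as its last argument and writes the recognized text to stdout. `some-ocr-tool` is a placeholder, not a real command, and `extract_numbers` and `ocr_numbers` are illustrative names.

```python
import re
import subprocess

# Matches integers and simple decimals, with an optional sign.
NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every integer or decimal found in the OCR output, as floats."""
    return [float(match) for match in NUMBER_RE.findall(text)]


def ocr_numbers(image_path, ocr_command):
    """Run an external OCR program on one image and parse its stdout.

    ocr_command is a list such as ["some-ocr-tool"] (hypothetical name);
    the image path is appended as the final argument. Raises
    subprocess.CalledProcessError if the program exits with an error.
    """
    result = subprocess.run(
        ocr_command + [image_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_numbers(result.stdout)
```

Note that the pattern treats thousands separators as breaks, so "1,234" comes back as two numbers. If the images use them, strip the commas from the text before matching.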

Skip

 
 
Tim Roberts
 
      10-02-2003
"Henrik Berg Nielsen" <(E-Mail Removed)> wrote:
>
>I need to extract some text (well numbers actually) from a bunch of
>similarly looking .png images. After extraction the numbers will be fed to a
>Python script for further processing. Any good ideas on how to go about with
>this? I have no idea whatsoever about how to extract the numbers out of the
>images...


Are you hoping to extract the "password" characters from the pictures
presented by the whois checks? If so, you should give up now, because
those images are SPECIFICALLY designed to make them almost impervious to
automated recognition.
--
- Tim Roberts, (E-Mail Removed)
Providenza & Boekelheide, Inc.
 
 
Lukas Ccenovsky
 
      10-02-2003
Henrik Berg Nielsen wrote:
> Hi group!
>
> I need to extract some text (well numbers actually) from a bunch of
> similarly looking .png images. After extraction the numbers will be fed to a
> Python script for further processing. Any good ideas on how to go about with
> this? I have no idea whatsoever about how to extract the numbers out of the
> images...


Hi,
I'm dealing with a similar problem now. My pictures are very complicated
(construction drawings). I am trying to use Gamera
(http://dkc.jhu.edu/gamera/) for OCR and it seems very promising.

--
-- Lukas


 
 
Bengt Richter
 
      10-02-2003
On Wed, 01 Oct 2003 20:25:45 -0700, Tim Roberts <(E-Mail Removed)> wrote:

>"Henrik Berg Nielsen" <(E-Mail Removed)> wrote:
>>
>>I need to extract some text (well numbers actually) from a bunch of
>>similarly looking .png images. After extraction the numbers will be fed to a
>>Python script for further processing. Any good ideas on how to go about with
>>this? I have no idea whatsoever about how to extract the numbers out of the
>>images...

>
>Are you hoping to extract the "password" characters from the pictures
>presented by the whois checks? If so, you should give up now, because
>those images are SPECIFICALLY designed to make them almost impervious to
>automated recognition.

Sounds interesting as a problem, but I wouldn't want to create a skeleton key
for any bad guys.

Regards,
Bengt Richter
 
 
 
 