On Thu, 25 Sep 2008 17:51:48 -0700 Jim Gibson <> wrote:
JG> In article
JG> <820d8d96-2839-45ed-8ca9->,
JG> william <> wrote:
>> I'm writing Perl scripts to retrieve data from email messages. Here
>> are two .txt files:
>> ACNI050124_05_04_59.txt
>>
>> received fifteen thousand dollars ...
>>
>> ZLDV060318_19_32_11.txt
>> We have received one hundred thirty five thousand ...
>> I want to produce the following output as an Excel table:
>>
>> filename                  dollars  shares
>> ACNI050124_05_04_59.txt   15000    -9
>> ZLDV060318_19_32_11.txt   -9       135000
>>
>> -9 simply means that no information about shares or dollars was
>> found in the file.
(the comments are for the OP mainly)
Have you considered using empty fields instead of a special value to
denote a missing value? In particular, you may later need negative
share counts if you want to distinguish buying from selling.
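As an illustration, here is a minimal sketch in Python (the same idea works in Perl with the Text::CSV module from CPAN). The `rows` list and the `to_csv` helper are hypothetical; the values are just the two sample rows above. It writes CSV that Excel can open and leaves a cell empty when nothing was found, instead of writing -9:

```python
import csv
import io

# Hypothetical extracted results; None marks "not found in the file".
rows = [
    ("ACNI050124_05_04_59.txt", 15000, None),
    ("ZLDV060318_19_32_11.txt", None, 135000),
]


def to_csv(rows):
    """Return CSV text with an empty cell wherever a value is None."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["filename", "dollars", "shares"])
    for name, dollars, shares in rows:
        # csv.writer already writes None as an empty string; the explicit
        # conversion just makes the intent obvious to the reader.
        writer.writerow([name,
                         "" if dollars is None else dollars,
                         "" if shares is None else shares])
    return buf.getvalue()


print(to_csv(rows), end="")
```

With blank cells meaning "missing", a negative number in the shares column stays free to mean a sale.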
>>
>> It seemed like a simple task at first, but I realized it is quite
>> complicated once I started writing the script. Any suggestions would
>> be highly appreciated.
JG> It doesn't seem simple at all. You are trying to parse free-form
JG> English written by various people and extract numerical data from
JG> alphabetic number names. My suggestion is to give it up before you
JG> start.
It's not impossible, and it's certainly interesting. Perhaps MontyLingua
(http://web.media.mit.edu/~hugo/montylingua/) will be useful; it has Java
and Python interfaces, and a Perl interface may be doable. At the very
least, you can parse the MontyLingua analyzer output.
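For phrasings as simple as the two samples, a small hand-written converter gets surprisingly far before you need a real language analyzer. Below is a minimal sketch in Python; the vocabulary, the `words_to_number` and `extract` helpers, and the assumption that the number words come immediately before "dollars" or "shares" are all mine, not taken from the OP's data. It will miss digits such as "$15,000", abbreviations like "15K", and most other real-world variations:

```python
import re

# Deliberately small vocabulary (an assumption); real mail will need more.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19, "twenty": 20, "thirty": 30, "forty": 40,
    "fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(phrase):
    """Convert e.g. 'one hundred thirty five thousand' to 135000.

    Returns None if the phrase has a word outside the vocabulary
    or contains no number word at all.
    """
    total = 0      # sum of completed thousand/million/billion groups
    current = 0    # the group being built, below 1000
    seen = False
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        seen = True
        if word in UNITS:
            current += UNITS[word]
        elif word == "hundred":
            current = (current or 1) * 100   # bare "hundred" means 100
        elif word in SCALES:
            total += (current or 1) * SCALES[word]
            current = 0
        else:
            return None
    return total + current if seen else None


# Number words, longest first so e.g. "sixteen" is tried before "six".
_WORDS = sorted(list(UNITS) + ["hundred", "and"] + list(SCALES),
                key=len, reverse=True)
# A run of number words (space- or hyphen-separated) right before the unit.
PATTERN = re.compile(
    r"\b((?:(?:%s)[\s-]+)+)(dollars|shares)\b" % "|".join(_WORDS),
    re.IGNORECASE,
)


def extract(text):
    """Return {'dollars': n, 'shares': n}, with None where nothing was found.

    Only the first match for each unit is kept.
    """
    found = {"dollars": None, "shares": None}
    for m in PATTERN.finditer(text):
        unit = m.group(2).lower()
        if found[unit] is None:
            found[unit] = words_to_number(m.group(1))
    return found


print(extract("received fifteen thousand dollars ..."))
# prints {'dollars': 15000, 'shares': None}
```

It returns None for anything it does not recognize, which maps naturally onto the empty-cell convention suggested above; everything beyond that is where a proper analyzer earns its keep.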
Ted