On Sep 25, 6:20 pm, william <huxian...@gmail.com> wrote:
> I'm writing perl scripts to retrieve data from email messages. Here
> are two .txt files.
> ACNI050124_05_04_59.txt
>
> received fifteen thousand dollars from
> an unaffiliated third party
>
> Section 27A of the Securities Act of 1933 and Section 21E of the
> Securities Exchange Act of 1934,
>
> involve a number of risks
> and uncertainties which could cause actual results to differ
> materially from those presently anticipated.
>
> ZLDV060318_19_32_11.txt
> We have received one hundred thirty five thousand free trading shares
> from a
> third party not an officer, director or affiliate shareholder for our
> services. We intend to
> sell all these shares now, which could cause the stock to go down,
> resulting in losses for you.
> Do your due diligence before you invest.
>
> I want to achieve the following output to an excel table.
>
> filename                   dollars    shares
> ACNI050124_05_04_59.txt    15000      -9
> ZLDV060318_19_32_11.txt    -9         135000
>
> -9 simply means that we don't find any information related to shares
> or dollars in the file.
>
> It seems to be a simple task at first. But I realize that it is quite
> complicated when I start to write the script. Any suggestions from you
> will be highly appreciated.
>
> William
This is a common problem in VXML: converting spoken words to text. In
fact, since spoken language varies so much more than written language
(due to accent, tone of voice, quality of voice, etc.), it's harder
there than it is with written text like yours.
Construct a hash using all known number words as keys to convert them
into numeric values:
my %value = (
    one     => 1,
    two     => 2,
    hundred => 100,
);
and so on. Be careful of variations like 'fifteen hundred' vs. 'one
thousand five hundred'.
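A minimal sketch of how such a hash might be used, assuming the number
phrase has already been isolated from the message text. The helper name
words_to_number and the split into %unit and %scale are illustrative
assumptions, not something from the posts above. Treating 'hundred' as a
multiplier on the running group, rather than as a plain value, is what
lets both 'fifteen hundred' and 'one thousand five hundred' come out the
same.

```perl
use strict;
use warnings;

# Assumed word list; extend it to cover whatever the messages contain.
my %unit = (
    zero => 0, one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9, ten => 10,
    eleven => 11, twelve => 12, thirteen => 13, fourteen => 14,
    fifteen => 15, sixteen => 16, seventeen => 17, eighteen => 18,
    nineteen => 19, twenty => 20, thirty => 30, forty => 40,
    fifty => 50, sixty => 60, seventy => 70, eighty => 80, ninety => 90,
);
my %scale = ( thousand => 1_000, million => 1_000_000 );

# Hypothetical helper: returns the numeric value of a number phrase,
# or undef if the phrase contains a word that is not in the tables.
sub words_to_number {
    my ($phrase) = @_;
    my ($total, $current) = (0, 0);
    for my $word (split /[\s-]+/, lc $phrase) {
        next if $word eq '' || $word eq 'and';
        if (exists $unit{$word}) {
            $current += $unit{$word};             # 'thirty' + 'five'
        }
        elsif ($word eq 'hundred') {
            $current = ($current || 1) * 100;     # 'fifteen hundred'
        }
        elsif (exists $scale{$word}) {
            $total += ($current || 1) * $scale{$word};
            $current = 0;                         # start the next group
        }
        else {
            return;                               # unknown word
        }
    }
    return $total + $current;
}

print words_to_number('fifteen thousand'), "\n";                  # 15000
print words_to_number('one hundred thirty five thousand'), "\n";  # 135000
print words_to_number('fifteen hundred'), "\n";                   # 1500
print words_to_number('one thousand five hundred'), "\n";         # 1500
```

When the helper returns undef, the caller can write -9 into that column,
matching the "not found" convention in the desired table.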
CC