Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Perl > Perl Misc > Words to numbers


Words to numbers

 
 
william
      09-25-2008
I'm writing Perl scripts to extract data from email messages. Here
are two sample .txt files:
ACNI050124_05_04_59.txt

received fifteen thousand dollars from
an unaffiliated third party

Section 27A of the Securities Act of 1933 and Section 21E of the
Securities Exchange Act of 1934,

involve a number of risks
and uncertainties which could cause actual results to differ
materially from those presently anticipated.

ZLDV060318_19_32_11.txt

We have received one hundred thirty five thousand free trading shares
from a third party not an officer, director or affiliate shareholder
for our services. We intend to sell all these shares now, which could
cause the stock to go down, resulting in losses for you.
Do your due diligence before you invest.


I want to produce the following output as an Excel table:

filename                   dollars    shares
ACNI050124_05_04_59.txt      15000        -9
ZLDV060318_19_32_11.txt         -9    135000

-9 means that no information about dollars or shares was found in
the file.
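Writing the table itself is the easy part. A minimal sketch, assuming the amounts have already been extracted into a hash (the name %found and the helper results_to_csv are hypothetical; the values are the ones from the desired output above). It writes comma-separated values, which Excel opens directly, and fills in -9 whenever a value is missing.

```perl
use strict;
use warnings;

# Hypothetical results: file name => { dollars => ..., shares => ... }.
# A missing key means nothing was found for that column in the file.
my %found = (
    'ACNI050124_05_04_59.txt' => { dollars => 15000 },
    'ZLDV060318_19_32_11.txt' => { shares  => 135000 },
);

# Build CSV text with a header row, using -9 for any missing value.
sub results_to_csv {
    my ($results) = @_;
    my $csv = "filename,dollars,shares\n";
    for my $file (sort keys %$results) {
        my $row = $results->{$file};
        my @cols;
        for my $key (qw(dollars shares)) {
            push @cols, defined $row->{$key} ? $row->{$key} : -9;
        }
        $csv .= join(',', $file, @cols) . "\n";
    }
    return $csv;
}

print results_to_csv(\%found);
```

Redirect the output to a file (for example `perl script.pl > results.csv`, where script.pl is a placeholder name) and open it in Excel. A plain join(',') is only safe here because these file names contain no commas or quotes; for arbitrary fields, the CPAN module Text::CSV handles quoting.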

It seemed like a simple task at first, but I realized it is quite
complicated once I started writing the script. Any suggestions will
be highly appreciated.

William
 
cartercc
      10-14-2008
On Sep 25, 6:20 pm, william <(E-Mail Removed)> wrote:


This is a common problem in VXML: converting spoken words to text. In
fact, it's harder there, since spoken language varies much more than
written language (accent, tone of voice, quality of voice, etc.).

Construct a hash with every known number word as a key, mapping it to
its numeric value:

my %word_value = (
    one     => 1,
    two     => 2,
    hundred => 100,    # a multiplier: 'five hundred' is 5 * 100
);

and so on. Be careful with variations like 'fifteen hundred' vs. 'one
thousand five hundred', which both mean 1500.
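The hash approach can be turned into a small converter. A minimal sketch, assuming English number words up to the millions and an amount that appears at most two words before the unit word (as in "free trading shares"). The subroutine names words_to_number and find_amount are hypothetical, and the word table is deliberately incomplete. 'hundred' multiplies the running group, while 'thousand' and 'million' close a group, which is what makes 'fifteen hundred' and 'one thousand five hundred' both come out as 1500.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete table of number words.
my %small = (
    one => 1, two => 2, three => 3, four => 4, five => 5,
    six => 6, seven => 7, eight => 8, nine => 9, ten => 10,
    eleven => 11, twelve => 12, thirteen => 13, fourteen => 14,
    fifteen => 15, sixteen => 16, seventeen => 17, eighteen => 18,
    nineteen => 19, twenty => 20, thirty => 30, forty => 40,
    fifty => 50, sixty => 60, seventy => 70, eighty => 80, ninety => 90,
);

# Words that close a group and scale it ("fifteen thousand").
my %scale = ( thousand => 1_000, million => 1_000_000 );

# Convert a phrase such as "one hundred thirty five thousand" to a number.
# Returns undef if the phrase contains a word that is not in the tables.
sub words_to_number {
    my ($phrase) = @_;
    my ($total, $group) = (0, 0);
    for my $word (split /[\s-]+/, lc $phrase) {
        next if $word eq 'and' || $word eq '';   # "one hundred and five"
        if (exists $small{$word}) {
            $group += $small{$word};
        }
        elsif ($word eq 'hundred') {
            # Multiplies the current group: "fifteen hundred" is 15 * 100.
            $group = ($group || 1) * 100;
        }
        elsif (exists $scale{$word}) {
            $total += ($group || 1) * $scale{$word};
            $group = 0;
        }
        else {
            return undef;
        }
    }
    return $total + $group;
}

# Find the number-word phrase before $unit (allowing up to two other
# words in between, as in "free trading shares") and convert it.
# Returns -9, the "not found" code from the question, if there is none.
sub find_amount {
    my ($text, $unit) = @_;
    my $num = join '|', keys(%small), 'hundred', keys(%scale);
    if ($text =~ /\b((?:$num)(?:[\s-]+(?:$num|and))*)\b(?:\s+\w+){0,2}?\s+\Q$unit\E\b/i) {
        my $value = words_to_number($1);
        return defined $value ? $value : -9;
    }
    return -9;
}

my $text = 'We have received one hundred thirty five thousand free trading shares';
print find_amount($text, 'shares'), "\n";    # prints 135000
print find_amount($text, 'dollars'), "\n";   # prints -9
```

Looping over the .txt files and calling find_amount on each file's contents with 'dollars' and then 'shares' gives the two columns of the table. Digits such as "1933" and amounts written as "$15,000" are not handled by this sketch.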

CC
 