Velocity Reviews - Computer Hardware Reviews

text parsing

 
 
Shalini Joshi
Guest
Posts: n/a
 
      06-18-2004
Hi!

I am relatively new to Perl and am looking to parse a non-delimited
text file. The file consists of records that always begin with 'FPR'
and can span multiple lines, and I would like to extract only some
relevant records from it.

The criterion is the number given by characters 7 through 18. I have
a vague idea how to go about parsing this, but with so much
information (on the website and in the other postings on the group) it's
kind of confusing what the best way to do it is. Initially I am
interested in just getting the script to work.

I would like to extract these records and create a new file where I
store them in the same format. Is there any way I can do that directly? Or
would I necessarily have to parse the info into an array or something
before I dump it into the new file?

Thanks and looking forward to any kind of help and tips.

--Shalini
 
 
 
 
 
Jeff 'japhy' Pinyan
Guest
Posts: n/a
 
      06-18-2004
[posted & mailed]

On 18 Jun 2004, Shalini Joshi wrote:

>I am relatively new to perl and am looking to parse a non-delimited
>text file. What I would like to do is out of this file of records
>which always begin with 'FPR' and could span multiple lines, extract
>only some relevant records.
>
>The criterion is the number denoted by characters 7 through 18..I have
>a vague idea how to go about parsing this, but with so much
>information(on the website and the other postings on the group) it's
>kind of confusing what the best way to do it is..I am initially
>interested in just getting the script to work...


It would be helpful if you showed us what code you're already trying.

The first thing I would do is read one 'record' from the file at a time.
You could use $/ to do this, but in your case it would be a little
trickier than usual, so I'll avoid that approach. Here, I'm just reading
until I get to a line that starts with "FPR".

my @records;

open RECORDS, "< file.txt" or die "can't read file.txt: $!";

# get the FIRST line of the record
local $_ = <RECORDS>;
{
    # put the first line into $rec
    my $rec = $_;

    # get all subsequent lines of the record
    while (<RECORDS>) {
        # stop if we encounter an FPR line
        last if /^FPR/;

        # tack this line onto the $rec variable
        $rec .= $_;
    }

    # add this record to the array
    push @records, $rec;

    # go back to the top of the block
    # NOTE: at this point, $_ is the
    # first line of the NEXT record
    redo;
}
close RECORDS;

Now you have an array, @records, that contains the FPR-marked records in
your file. What you do with that array is up to you.

--
Jeff Pinyan RPI Acacia Brother #734 RPI Acacia Corp Secretary
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)



 
 
 
 
 
Gunnar Hjalmarsson
Guest
Posts: n/a
 
      06-18-2004
Shalini Joshi wrote:
> I am relatively new to perl and am looking to parse a non-delimited
> text file. What I would like to do is out of this file of records
> which always begin with 'FPR' and could span multiple lines,
> extract only some relevant records.
>
> The criterion is the number denoted by characters 7 through 18..


Not easy to suggest anything without sample data, but how about
something like this:

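open OLDFILE, $oldfile or die $!;
open NEWFILE, "> $newfile" or die $!;
{
    # read up to each 'FPR' marker instead of up to each newline
    local $/ = 'FPR';
    # copy everything before the first record (the header) unchanged
    print NEWFILE scalar <OLDFILE>;
    while (<OLDFILE>) {
        # the leading 'FPR' was consumed as the separator, so
        # characters 7 through 18 of the record start at offset 3
        my $num = substr($_, 3, 12);
        if ( ... some tests of $num ... ) {
            print NEWFILE $_;
        }
    }
}
close NEWFILE;
close OLDFILE;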

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

 
 
Shalini Joshi
Guest
Posts: n/a
 
      06-18-2004
I also have the following data from the data file:

Sorry for not including all the information in one post. Will keep all
this in mind in all my future ones.

RHR001PRICE REFRESHER2004052620040526*AETNA*V (Header)
FPR001AET001A20AET81082004052601863063601863063600 000000000ING VP
Growth and Income M&E 1.40% (Data)
FPR001AET001A70AETN2062004052601384691901384691900 000000000ING VP
Growth and Income M&E 1.40% (Data)
RTR001PRICE REFRESHER000000569 (Trailer)


Of course the data doesn't wrap in the file; each record is one single line (funny,
when I printed it on paper it gave me the data on multiple lines).

I am not sure why the code in Jeff's reply doesn't work. When I run it,
it just sits in an infinite loop and I have to kill it to get back to my
prompt.

Thanks for all the helpful posts.

Regards

Shalini
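
Given the sample lines above, the selection criterion (characters 7 through 18, counting the leading FPR) can be picked out with substr at offset 6 and length 12. The following is only a minimal sketch: the record strings are prefixes of the two sample FPR lines, cut off after the date, and both `record_key` and the `%wanted` list are hypothetical, since the thread does not say which values count as relevant.

```perl
use strict;
use warnings;

# Prefixes of the two sample FPR lines from the post (cut off after the date)
my @records = (
    "FPR001AET001A20AET810820040526\n",
    "FPR001AET001A70AETN20620040526\n",
);

# Hypothetical selection list: the thread does not say which keys are relevant
my %wanted = map { $_ => 1 } ('AET001A70AET');

# Characters 7 through 18 (1-based) correspond to offset 6, length 12
sub record_key {
    my ($record) = @_;
    return substr($record, 6, 12);
}

# Keep only the records whose key is in the selection list
my @selected = grep { $wanted{ record_key($_) } } @records;

# Print the selected records unchanged, so the output keeps the input format
print for @selected;
```

Printing to a filehandle opened for writing instead of STDOUT would store the selected records in a new file in the same format, without any further parsing.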
 
 
Shalini Joshi
Guest
Posts: n/a
 
      06-19-2004
Hi, this is regarding my earlier post.

I got it working. Apparently because of the redo, it was going into an
infinite loop. Here's what I did, and it works when I print out the
array elements now.

#! /usr/bin/perl
use strict;
my @records;
my $i;
open RECORDS, "< AETNA17.R3617.txt" or die "Can't read file: $!";

# Get the first line of the record
local $_ = <RECORDS>;
#print $_ ;
while (<RECORDS>){
    # Put the first line into $rec
    my $rec = $_;

    # get all subsequent lines of the record
    while (<RECORDS>) {
        # stop if we encounter an FPR line
        last if /^FPR/;

        # tack this line onto the $rec variable
        $rec .= $_;
    }
    push @records, $rec;

    # Go back to the top of the block
    # At this point, $_ is the first line of the NEXT FPR record
    print "$_";
}
pop @records; # To remove the trailer that is stored before condition is tested
close RECORDS;

foreach $i (@records)
{
    print "$i";
}


I am now working on processing the array elements.

Thanks a bunch for the help.

Regards,

Shalini
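
One possible way to avoid both the endless redo and the extra read in the outer `while (<RECORDS>)` is to read each line exactly once and then decide what to do with it. In the redo version nothing ends the block once the file is exhausted, and in the reworked loop the outer while fetches a fresh line on every pass, so the FPR line that stopped the inner loop appears to be overwritten before it is stored. The sketch below is only an illustration under stated assumptions: `read_fpr_records` is a hypothetical helper, lines starting with RHR or RTR are assumed to be the header and trailer, and the in-memory sample stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect FPR records from a
# filehandle, reading every line once and stopping cleanly at end of file.
sub read_fpr_records {
    my ($fh) = @_;
    my @records;
    my $current;    # record being built; undef before the first FPR line

    while (my $line = <$fh>) {
        if ($line =~ /^FPR/) {
            # a new record starts: store the previous one, if any
            push @records, $current if defined $current;
            $current = $line;
        }
        elsif ($line =~ /^R[HT]R/) {
            # assumption: RHR/RTR lines are the header and trailer; they
            # end the current record and are not stored themselves
            push @records, $current if defined $current;
            undef $current;
        }
        elsif (defined $current) {
            # continuation line of a multi-line record
            $current .= $line;
        }
    }

    # store the last record if the file did not end with a trailer
    push @records, $current if defined $current;
    return @records;
}

# Usage with an in-memory file, so the sketch runs without the real data
my $sample = join '',
    "RHR001PRICE REFRESHER\n",
    "FPR001AAA first\n",
    "continued\n",
    "FPR001BBB second\n",
    "RTR001PRICE REFRESHER\n";

open my $fh, '<', \$sample or die "can't open in-memory file: $!";
my @records = read_fpr_records($fh);
close $fh;

print scalar(@records), " records read\n";    # prints "2 records read"
```

Because the trailer is recognized explicitly, no `pop @records` is needed afterwards, and every FPR line ends up in the array regardless of how many records the file holds.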
 