Reading Data File Records

 
 
Graham
      09-09-2003
I'm a little frustrated with Perl's line-by-line file reading and I am
hoping that someone can help me.

I have a data file that looks like:

--
! Comment 1
! Comment 2
! Comment ...
5 ! number of levels
*aaa [aaa units] ! space deliminated is common
1.0 2.0 3.0 4.0 5.0
*bbb [bbb units] ! csv is possible
1.0, 2.0, 3.0,
4.0 5.0
*ccc [ccc units] ! the file is written from fortran and the number of
columns is not fixed
10.0
20.0
30.0
40.0
50.0
....
--

Essentially, there is a header block that always begins with '!' in
the first column. This is followed by the number of elements in each
data block and an unknown number of data blocks having a set number of
elements.

The file is generated using about five lines of FORTRAN, so it seems
somewhat surprising that I am up to 30 lines of Perl with almost no
end in sight... Does anyone have an example showing how to process a
file in blocks using Perl?

Thanks,
Graham
 
Brian Wakem
      09-09-2003

"Graham" <(E-Mail Removed)> wrote in message ...
> I'm a little frustrated with Perl's line-by-line file reading and I am
> hoping that someone can help me.
>
> I have a data file that looks like:
>
> --
> ! Comment 1
> ! Comment 2
> ! Comment ...
> 5 ! number of levels
> *aaa [aaa units] ! space deliminated is common
> 1.0 2.0 3.0 4.0 5.0
> *bbb [bbb units] ! csv is possible
> 1.0, 2.0, 3.0,
> 4.0 5.0
> *ccc [ccc units] ! the file is written from fortran and the number of
> columns is not fixed
> 10.0
> 20.0
> 30.0
> 40.0
> 50.0
> ...
> --
>
> Essentially, there is a header block that always begins with '!' in
> the first column. This is followed by the number of elements in each
> data block and an unknown number of data blocks having a set number of
> elements.
>
> The file is generated using about five lines of FORTRAN so it seems
> somewhat surprising that I am up to 30 lines of perl with almost no
> end in sight... Does anyone have an example showing how to process a
> file in blocks using Perl?



What do you want to do with it?

--
Brian Wakem


 
James Willmore
      09-09-2003
On 9 Sep 2003 08:14:57 -0700
(E-Mail Removed) (Graham) wrote:
<snip>
> The file is generated using about five lines of FORTRAN so it seems
> somewhat surprising that I am up to 30 lines of perl with almost no
> end in sight... Does anyone have an example showing how to process
> a file in blocks using Perl?


Post your code - I have no idea what you are trying to do. Maybe it's
just me.

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
You cannot kill time without injuring eternity.

 
Tulan W. Hu
      09-09-2003
"Graham" <(E-Mail Removed)> wrote in message ...
[snip..]
> The file is generated using about five lines of FORTRAN so it seems
> somewhat surprising that I am up to 30 lines of perl with almost no
> end in sight... Does anyone have an example showing how to process a
> file in blocks using Perl?


I would download the File::Slurp module from CPAN and install it.
http://search.cpan.org/author/MUIR/F...urp-2004.0904/

====
#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

# read_file returns one element per line in list context
my @allLines = read_file("data_file_name");
foreach my $line (@allLines) {
    # in case you need to process each line
    if ($line =~ /^!/) {
        # comment lines
    }
    else {
        # data lines
    }
}


 
Jay Tilton
      09-09-2003
(E-Mail Removed) (Graham) wrote:

: I have a data file that looks like:
:
: --
: ! Comment 1
: ! Comment 2
: ! Comment ...
: 5 ! number of levels
: *aaa [aaa units] ! space deliminated is common
: 1.0 2.0 3.0 4.0 5.0
: *bbb [bbb units] ! csv is possible
: 1.0, 2.0, 3.0,
: 4.0 5.0
^
^
Should there be a comma between those two values?

: *ccc [ccc units] ! the file is written from fortran and the number of
: columns is not fixed

Is this really how the data file is formatted, or did your newsreader
word-wrap that line for you?

: 10.0
: 20.0
: 30.0
: 40.0
: 50.0
: ...
: --
:
: Essentially, there is a header block that always begins with '!' in
: the first column. This is followed by the number of elements in each
: data block and an unknown number of data blocks having a set number of
: elements.

The problem is determining where one block ends and another begins when
the only thing known about the block is how many elements it contains.
There's no apparent consistency or predictability to how the blocks may
be formatted, or to how the elements are separated. Altering the input
record separator, $/, then reading in a number of records isn't going to
work.

What might work would be to read lines of data until a block's requisite
number of elements have been acquired, but the elements themselves will
need to have a consistent, recognizable format, and a newline character
has to mark the boundary between blocks. From the sample data, the
elements all seem to be numbers with one place after the decimal.

As a first approximation of workable code,

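#!perl
use warnings;
use strict;
my $elems_per_block;
# skip the '!' header lines, then take the element count
# from the first line that is not a comment
while(<DATA>) {
    next if /^!/;
    ($elems_per_block) = /^(\d+)/;
    last;
}
my @blocks;
# each pass of the outer loop starts on a block's '*' header line
while(<DATA>) {
    my $block = $_;
    my $n = 0;
    # append lines until the block holds the requisite number of elements
    while(<DATA>) {
        $block .= $_;
        last if $elems_per_block == ($n += () = /(\b\d+\.\d\b)/g);
    }
    push @blocks, $block;
}
for( @blocks ) {
    # whatever processing each block needs
    print "Block:\n$_\n";
}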

__DATA__
! Comment 1
! Comment 2
! Comment ...
5 ! number of levels
*aaa [aaa units] ! space deliminated is common
1.0 2.0 3.0 4.0 5.0
*bbb [bbb units] ! csv is possible
1.0, 2.0, 3.0,
4.0 5.0
*ccc [ccc units] ! the file is written from fortran and the number of
columns is not fixed
10.0
20.0
30.0
40.0
50.0
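
The `$n += () = /.../g` expression in the inner loop relies on the list-assignment counting idiom: a list assignment evaluated in scalar context yields the number of elements on its right-hand side, here the number of matches. A minimal sketch of just that step, assuming the same one-decimal-place pattern; `count_elements` is a hypothetical helper name, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: count the one-decimal numbers on a single line.
# Assigning the matches to an empty list and then reading that
# assignment as a scalar gives the number of matches.
sub count_elements {
    my ($line) = @_;
    my $count = () = $line =~ /\b\d+\.\d\b/g;
    return $count;
}

print count_elements("1.0, 2.0, 3.0,\n"), "\n";      # prints 3
print count_elements("*bbb [bbb units]\n"), "\n";    # prints 0
```

Because the pattern only recognises numbers with exactly one digit after the decimal point, values written in any other form would not be counted and the block boundaries would drift.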

: The file is generated using about five lines of FORTRAN so it seems
: somewhat surprising that I am up to 30 lines of perl with almost no
: end in sight...

Why should that be surprising? You're trying to build a modicum of
intelligence into one tool to compensate for another's lack of
sophistication. The Perl program would have a much easier time reading
if the FORTRAN program was only a little better at writing.

 
James Willmore
      09-10-2003
On 9 Sep 2003 15:41:03 -0700
(E-Mail Removed) (Graham) wrote:
> It seems it isn't just you. All I am trying to do is get the data
> blocks into a suitable perl structure so I can calculate some simple
> statistics and reformat it for another program. See comments in the
> second while loop.
>
> I really appreciate the help. I have a pile of files with this type
> of structure (a legacy of an ancient postdoc) that I need to
> manipulate and reformat.


First, let me say that each language is going to handle files and
variables differently. I say this because you commented on using
FORTRAN. I know nothing about FORTRAN, but have had _some_ dealings
with COBOL. Some functionality in COBOL is unavailable in Perl (such
as strictly defining variables). By the same token, there's
functionality in Perl that is not available in COBOL (such as regular
expressions). Having said that, here is some untested code that _may_
fit the bill for you. Again, it's untested and may _not_ be exactly
what you're looking for. If I'm off, I'm hoping someone will point
out where the errors are.

==untested==
#!/usr/bin/perl
use strict;
use warnings;

# define the name of the file
my $file = 'name_of_file_here';

# define a hash (associative array) for your records
my %records;

# open a file handle to the file - die if we can't open it
open(my $fh, '<', $file)
    or die "Can't open file $file: $!\n";

# skip the header - every line that leads with a "!"
my $line;
while (defined($line = <$fh>)) {
    last unless $line =~ /^!/;
}
die "No data found in $file\n" unless defined $line;

# if you want the number of levels, get the portion before the first "!"
# can be done with substr - regular expression used for
# demonstration purposes
my ($numLev) = $line =~ /^\s*(\d+)/;

# the block id of the block currently being read
my $key;

# while the file is open and does not return eof
while (<$fh>) {
    # chomp the newline off the line
    chomp;
    # strip the comments (everything from the first "!" on)
    # again - substr could be used
    s/!.*//;
    # a line leading with "*" starts a new block - the block id
    # becomes the key for the record
    if (/^\*(\S+)/) {
        $key = $1;
        next;
    }
    # skip anything that comes before the first block id
    next unless defined $key;
    # split the line on whitespace and/or commas, keep the pieces that
    # contain a digit, and store them in the hash under the block id
    push @{ $records{$key} }, grep { /\d/ } split /[\s,]+/;
}
close($fh);

# to retrieve the records ...
foreach my $k (sort keys %records) {
    print "$k => ", join(" ", @{ $records{$k} }), "\n";
}
==untested==

HTH
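
Once the blocks sit in a hash of arrays such as `%records`, the simple statistics mentioned in the question take only a few lines. A sketch using the core List::Util module; the sample values are copied from the example file, and `block_stats` is a hypothetical helper name introduced only for illustration:

```perl
use strict;
use warnings;
use List::Util qw(sum min max);

# Hypothetical helper: count, minimum, maximum and mean of one block.
sub block_stats {
    my @values = @_;
    return unless @values;    # nothing to summarise for an empty block
    my %stats = (
        n    => scalar @values,
        min  => min(@values),
        max  => max(@values),
        mean => sum(@values) / scalar @values,
    );
    return \%stats;
}

# Sample data shaped like %records (values from the example file).
my %records = (
    aaa => [ 1.0, 2.0, 3.0, 4.0, 5.0 ],
    ccc => [ 10.0, 20.0, 30.0, 40.0, 50.0 ],
);

foreach my $k (sort keys %records) {
    my $s = block_stats(@{ $records{$k} });
    printf "%s: n=%d min=%.1f max=%.1f mean=%.1f\n",
        $k, @{$s}{qw(n min max mean)};
}
```

The same loop is also a natural place to reformat each block for another program, since every value is already a separate list element.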

--
Jim

Copyright notice: all code written by the author in this post is
released under the GPL. http://www.gnu.org/licenses/gpl.txt
for more information.

a fortune quote ...
What this country needs is a good five cent microcomputer.

 
Anno Siegel
      09-10-2003
Jay Tilton <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> (E-Mail Removed) (Graham) wrote:


> : The file is generated using about five lines of FORTRAN so it seems
> : somewhat surprising that I am up to 30 lines of perl with almost no
> : end in sight...
>
> Why should that be surprising? You're trying to build a modicum of
> intelligence into one tool to compensate for another's lack of
> sophistication. The Perl program would have a much easier time reading
> if the FORTRAN program was only a little better at writing.


Also, parsing input is generally harder than generating output. Printing
what comes along is easy. To read it back in, you must often (as in
the OP's case) understand what you have read so far to know how to
proceed.

The C functions printf() and scanf() are an attempt to make printing
and scanning symmetric. A look at their respective frequency of use
shows that the attempt wasn't a full success.

Anno
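
A small Perl illustration of that asymmetry, offered as a sketch rather than a recommendation: writing a row is one format string, while reading it back means deciding what counts as a comment, a separator and a value. The sub names `write_row` and `read_row` are hypothetical, and the accepted number format is an assumption:

```perl
use strict;
use warnings;

# Writing: one format string, no decisions to make.
sub write_row {
    my @values = @_;
    return join(", ", map { sprintf "%.1f", $_ } @values) . "\n";
}

# Reading: strip the "!" comment, accept commas and/or whitespace as
# separators, and keep only tokens that look like decimal numbers.
sub read_row {
    my ($line) = @_;
    $line =~ s/!.*//s;
    return grep { /^-?\d+(?:\.\d+)?$/ } split /[\s,]+/, $line;
}

my $row = write_row(1, 2.5, 3);
print $row;                               # prints "1.0, 2.5, 3.0"
print join("|", read_row($row)), "\n";    # prints "1.0|2.5|3.0"
```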
 
Mike Flannigan
      09-10-2003


Graham wrote:

>
> It seems it isn't just you. All I am trying to do is get the data
> blocks into a suitable perl structure so I can calculate some simple
> statistics and reformat it for another program. See comments in the
> second while loop.
>
> I really appreciate the help. I have a pile of files with this type
> of structure (a legacy of an ancient postdoc) that I need to
> manipulate and reformat.


snip


Don't be afraid to slurp the whole file. I slurp 400,000+
line files very quickly and do the processing. The only
trouble is if you do it more than once in the program.
You might see a big slowdown - at least on Win2000.

I haven't found a good solution to this yet, so I just
run a bunch of individual Perl scripts - one for each
file.

If you find a better solution, let us know.


Mike
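
A sketch of the slurp approach applied to the file format from the original question, without any CPAN module: the whole file is read into one string by localising `$/`, then split at every line that starts with `*`. The `slurp` and `parse_blocks` names are hypothetical. It is assumed that `!` always starts a comment and that every data value contains a digit; the element count from the header is not checked here.

```perl
use strict;
use warnings;

# Read a whole file into one string by undefining the input
# record separator for the duration of the read.
sub slurp {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Can't open file $file: $!\n";
    local $/;
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Split slurped text into blocks keyed by the name after each "*".
sub parse_blocks {
    my ($text) = @_;
    $text =~ s/!.*//g;    # drop comments up to the end of each line
    # everything before the first "*" line is header and is discarded
    my (undef, @chunks) = split /^\*/m, $text;
    my %blocks;
    for my $chunk (@chunks) {
        my ($head, $body) = split /\n/, $chunk, 2;
        my ($name) = $head =~ /^(\S+)/ or next;
        $body = '' unless defined $body;
        # commas and whitespace both separate values; keep digit tokens
        $blocks{$name} = [ grep { /\d/ } split /[\s,]+/, $body ];
    }
    return \%blocks;
}

# Process each file named on the command line.
for my $file (@ARGV) {
    my $blocks = parse_blocks(slurp($file));
    print "$file $_ => @{ $blocks->{$_} }\n" for sort keys %$blocks;
}
```

Run as, for example, `perl script.pl file1 file2` with a script name of your choosing; whether handling several files in one run is fast enough depends on the platform, as noted above.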


 