Velocity Reviews - Computer Hardware Reviews

Parsing long files by using read

 
 
gavs
      01-16-2004
Hi,

I am fairly new to Perl and need to split a fairly large file that
contains no newlines. The records in this file are fixed-length. I
have written the following code to split this long record into
600-byte records, appending a newline to each. After executing
this program, the file size doubles.

For example: the long record in this file splits into 3 records of
600 bytes each, so the original file is 1800 bytes long.

$size is the size of the original file.

while($bytes_read < $size) {
    my $record;
    $bytes_read += read(FIN, $record, $record_len, $offset);
    print "Bytes read # $bytes_read, OFFSET=$offset\n";

    $record .= "\n";

    print FOUT $record;
    $offset += $record_len;
}

fclose(FIN);
fclose(FOUT);

Viewing the output file in vi shows the following:
"a" 3 lines, 3603 characters (1800 null characters)

Where are the extra 1800 bytes coming from? How do I get rid of them?

Thanks.
gavs
 
Uri Guttman
      01-16-2004

perldoc perlvar. look for $/ and assigning it a ref to an integer.

uri

--
Uri Guttman ------ (E-Mail Removed) -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
 
Ben Morrow
      01-16-2004

(E-Mail Removed) (gavs) wrote:
> I am fairly new to Perl and need to split a fairly large file that
> contains no newlines. The records in this file are fixed-length. I
> have written the following code to split this long record into
> 600-byte records, appending a newline to each. After executing
> this program, the file size doubles.
>
> For example: the long record in this file splits into 3 records of
> 600 bytes each, so the original file is 1800 bytes long.
>
> $size is the size of the original file.
>
> while($bytes_read < $size) {
>     my $record;
>     $bytes_read += read(FIN, $record, $record_len, $offset);
>     print "Bytes read # $bytes_read, OFFSET=$offset\n";
>
>     $record .= "\n";
>
>     print FOUT $record;
>     $offset += $record_len;
> }
>
> fclose(FIN);
> fclose(FOUT);


Perl has no fclose function (the builtin is close). Please show us your real code.

>
> Viewing the output file in vi shows the following:
> "a" 3 lines, 3603 characters (1800 null characters)
>
> Where are the extra 1800 bytes coming from? How do I get rid of them?


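The 'offset' parameter to read() is an offset into the string, not
into the file. The bytes are read from the file starting wherever the
last read left off. If OFFSET is past the end of the string, read()
first pads the string with "\0" bytes up to that position, so your
second record gets 600 NULs in front of it and your third gets 1200:
600 + 1200 = 1800, exactly the count vi reports. However, the whole
thing looks more like C than Perl.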
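To see this concretely, here is a small self-contained sketch that runs the original loop and a corrected one against an in-memory file. The test data (`$data`, three made-up 600-byte records) is hypothetical, and opening a string as a filehandle (Perl 5.8+) just avoids needing a real file; with real files you would open paths instead.

```perl
use strict;
use warnings;

# Hypothetical test data: three 600-byte records with no newlines,
# opened below as an in-memory file instead of a real one.
my $data       = ('A' x 600) . ('B' x 600) . ('C' x 600);
my $size       = length $data;
my $record_len = 600;

# Mirrors the original loop: OFFSET is a position inside $record, not the file.
open my $in, '<', \$data or die "can't open in-memory file: $!";
my ($bad_out, $bytes_read, $offset) = ('', 0, 0);
while ($bytes_read < $size) {
    my $record = '';
    $bytes_read += read($in, $record, $record_len, $offset);
    $bad_out .= "$record\n";    # $record was first padded with $offset NULs
    $offset  += $record_len;
}
close $in;

# The fix: drop OFFSET; each read() continues where the previous one stopped.
open $in, '<', \$data or die "can't open in-memory file: $!";
my $good_out = '';
while (read($in, my $record, $record_len)) {    # stops at EOF (0) or error (undef)
    $good_out .= "$record\n";
}
close $in;

printf "buggy: %d bytes, %d NULs\n", length $bad_out,  ($bad_out  =~ tr/\0//);
printf "fixed: %d bytes, %d NULs\n", length $good_out, ($good_out =~ tr/\0//);
```

The buggy loop should report 3603 bytes with 1800 NULs, matching what vi showed; the fixed loop should report 1803 bytes and no NULs.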

Here's how I'd do it (untested):

{
    local $/ = \600;    # 600-byte input records
    local $\ = "\n";    # see perldoc perlvar

    open my $IN,  ... or die "can't open input: $!";
    open my $OUT, ... or die "can't open output: $!";

    print $OUT $_ while <$IN>;
}
# no need for close() as the filehandles are closed when they go out
# of scope.
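For anyone wanting to try the $/ approach without real files, here is a runnable variant of the block above. The in-memory `$input` and `$output` strings are hypothetical stand-ins for the elided `open` arguments, chosen only so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real files: three 600-byte records,
# no newlines, read from and written to in-memory strings.
my $input  = ('A' x 600) . ('B' x 600) . ('C' x 600);
my $output = '';

{
    local $/ = \600;    # readline returns fixed 600-byte records, not lines
    local $\ = "\n";    # print appends this after every call

    open my $IN,  '<', \$input  or die "can't open input: $!";
    open my $OUT, '>', \$output or die "can't open output: $!";

    print $OUT $_ while <$IN>;
}   # $IN and $OUT go out of scope here and are closed automatically

printf "%d bytes, %d lines\n", length $output, ($output =~ tr/\n//);
```

This should print "1803 bytes, 3 lines": the 1800 data bytes plus one newline per record, with no NUL padding.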

or indeed

perl -lpe'BEGIN { $/ = \600 }' < in > out

Ben

--
Joy and Woe are woven fine,
A Clothing for the Soul divine
Under every grief and pine
Runs a joy with silken twine.
    William Blake, 'Auguries of Innocence'
(E-Mail Removed)
 
Walter Roberson
      01-16-2004
In article <bu9gku$1jb$(E-Mail Removed)>,
Ben Morrow <(E-Mail Removed)> wrote:
: local $/ = \600; # 600-byte input records

How does that work, Ben? When I look at the documentation for $/
there does not appear to be an option for setting a record size.
And a reference to a scalar looks odd there...
--
Rump-Titty-Titty-Tum-TAH-Tee -- Fritz Leiber
 
Uri Guttman
      01-16-2004
>>>>> "WR" == Walter Roberson <(E-Mail Removed)-cnrc.gc.ca> writes:

WR> In article <bu9gku$1jb$(E-Mail Removed)>,
WR> Ben Morrow <(E-Mail Removed)> wrote:
WR> : local $/ = \600; # 600-byte input records

WR> How does that work, Ben? When I look at the documentation for $/
WR> there does not appear to be an option for setting a record size.
WR> And a reference to a scalar looks odd there...

what docs are you looking at? perldoc perlvar says this:

Setting "$/" to a reference to an integer, scalar
containing an integer, or scalar that's convertible
to an integer will attempt to read records instead
of lines, with the maximum record size being the
referenced integer. So this:

$/ = \32768; # or \"32768", or \$var_containing_32768
open(FILE, $myfile);
$_ = <FILE>;

will read a record of no more than 32768 bytes from
FILE. If you're not reading from a record-oriented
file (or your OS doesn't have record-oriented
files), then you'll likely get a full chunk of data
with every read. If a record is larger than the
record size you've set, you'll get the record back
in pieces.
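A quick sketch of the "in pieces" behaviour on an ordinary (non-record-oriented) file, also using the \$var form the docs mention. The 1500-byte `$data` string is made up so that its length is not a multiple of the record size, which means the last read should come back short.

```perl
use strict;
use warnings;

# Hypothetical data: 1500 bytes, not a multiple of the 600-byte record size.
my $data = 'x' x 1500;
my $len  = 600;

open my $fh, '<', \$data or die "can't open in-memory file: $!";
my @sizes;
{
    local $/ = \$len;    # a reference to a scalar holding the record size
    while (my $rec = <$fh>) {
        push @sizes, length $rec;
    }
}
close $fh;

print "@sizes\n";
```

This should print "600 600 300": two full records, then whatever is left before end of file.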


seems to be clearly documented to me.

uri

 
gnari
      01-16-2004
"Walter Roberson" <(E-Mail Removed)-cnrc.gc.ca> wrote in message
news:bu9ht4$mh4$(E-Mail Removed)...
> In article <bu9gku$1jb$(E-Mail Removed)>,
> Ben Morrow <(E-Mail Removed)> wrote:
> : local $/ = \600; # 600-byte input records
>
> How does that work, Ben? When I look at the documentation for $/
> there does not appear to be an option for setting a record size.


see http://perldoc.com/perl5.8.0/pod/perlvar.html
look for $/, where it says:

Setting $/ to a reference to an integer, scalar containing an integer,
or scalar that's convertible to an integer will attempt to read records
instead of lines, with the maximum record size being the referenced
integer.

gnari



 