Slurp large files into an array, first is quick, rest are slow

 
 
gdtrob@gmail.com
      12-28-2005
I am slurping a series of large .csv files (6MB) directly into an array
one at a time (then querying). The first time I slurp a file it is
incredibly quick. The second time I do it, the slurping is very slow
despite the fact that I close the file (using a filehandle) and undef
the array. Here is the relevant code:

open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open targetfile: $!";
print "opened";
@chrfile = <TARGETFILE>;   # slurp the chromosome-specific repeat file into memory
print "slurped";

(and after each loop)

close (TARGETFILE);
undef @chrfile;

If it is possible to quickly/simply fix this, I would much rather keep
this method than set up line-by-line input to the array. The first
slurp is very efficient.

I am using ActiveState Perl 5.6 on a Win32 system with 1 GB of RAM.
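For context, the overall structure is roughly this (sketched from memory; @chromosomes and the query step stand in for the real code):

@chromosomes = (1 .. 22, 'X', 'Y');   # placeholder for the real file list

foreach $chromosome (@chromosomes) {
    open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open targetfile: $!";
    print "opened";
    @chrfile = <TARGETFILE>;   # slurp the chromosome-specific repeat file into memory
    print "slurped";

    # ... query @chrfile here ...

    close (TARGETFILE);
    undef @chrfile;
}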

 
Kevin Collins
      12-28-2005
In article <(E-Mail Removed).com>, (E-Mail Removed) wrote:
> I am slurping a series of large .csv files (6MB) directly into an array
> one at a time (then querying). The first time I slurp a file it is
> incredibly quick. The second time I do it the slurping is very slow
> despite the fact that I close the file (using a filehandle) and undef
> the array. here is the relevant code:
>
> open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open

^^^^^^^^^^^^^

No need to quote this. It should be either:
open (TARGETFILE,"CanRPT".$chromosome.".csv") || die "can't open targetfile: $!";
or
open (TARGETFILE,"CanRPT$chromosome.csv") || die "can't open targetfile: $!";

> targetfile: $!";
> print "opened";
> @chrfile = <TARGETFILE>; #slurp the chromosome-specific repeat file
> into memory
> print "slurped";
>
> (and after each loop)
>
> close (TARGETFILE);


Not that it answers your question, but you should be able to close your file
immediately after slurping it in, rather than after a loop...
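In other words, something like this (just the reordering, using your existing globals):

open (TARGETFILE,"CanRPT".$chromosome.".csv") || die "can't open targetfile: $!";
@chrfile = <TARGETFILE>;     # slurp
close (TARGETFILE);          # nothing needs the handle once the file is in memory

# ... query @chrfile here ...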

> undef @chrfile;
>
> If it is possible to quickly/simply fix this I would much rather keep
> this method than setting up a line by line input to the array. The
> first slurp is very efficient.
>
> I am using activestate perl 5.6 on a win32 system with 1 gig ram:



Kevin


--
Unix Guy Consulting, LLC
Unix and Linux Automation, Shell, Perl and CGI scripting
http://www.unix-guy.com
 
Mark Clements
      12-28-2005
(E-Mail Removed) wrote:
> I am slurping a series of large .csv files (6MB) directly into an array
> one at a time (then querying). The first time I slurp a file it is
> incredibly quick. The second time I do it the slurping is very slow
> despite the fact that I close the file (using a filehandle) and undef
> the array. here is the relevant code:
>
> open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open
> targetfile: $!";
> print "opened";
> @chrfile = <TARGETFILE>; #slurp the chromosome-specific repeat file
> into memory
> print "slurped";
>
> (and after each loop)
>
> close (TARGETFILE);
> undef @chrfile;
>
> If it is possible to quickly/simply fix this I would much rather keep
> this method than setting up a line by line input to the array. The
> first slurp is very efficient.
>
> I am using activestate perl 5.6 on a win32 system with 1 gig ram:
>


I'd argue you'd be better off processing one line at a time, but anyway...

You need more detailed timing data: you are assuming that the extra time
is being spent in the slurp, but you have no timing data to prove this.

Use something like

Benchmark::Timer

to provide a detailed breakdown of where the time is being spent. You
may be surprised. It would be an idea to display file size and number of
lines at the same time.
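Something along these lines, for example (just a sketch - the file list and the query step are placeholders):

use strict;
use warnings;
use Benchmark::Timer;

my $t = Benchmark::Timer->new;

for my $chromosome (1, 2, 3) {                 # placeholder file list
    my $filename = "CanRPT$chromosome.csv";

    $t->start('slurp');
    open my $fh, "<", $filename
        or die "could not open $filename for read: $!";
    my @chrfile = <$fh>;
    close $fh;
    $t->stop('slurp');

    printf "%s: %d bytes, %d lines\n",
        $filename, -s $filename, scalar @chrfile;

    $t->start('query');
    # ... whatever querying is done against @chrfile ...
    $t->stop('query');
}

print $t->report;    # per-tag timing breakdown over all iterations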

Running with

use strict;
use warnings;

will save you a lot of heartache. Also, it is now recommended to use
lexically scoped filehandles:

open my $fh, "<", $filename
    or die "could not open $filename for read: $!";

You may also want to check out one of the CSV parsing modules available,
e.g.

DBD::CSV
Text::CSV_XS
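For example, a minimal Text::CSV_XS sketch (the filename and column layout here are made up):

use strict;
use warnings;
use Text::CSV_XS;

my $filename = "CanRPT1.csv";                        # made-up example name
my $csv      = Text::CSV_XS->new({ binary => 1 });

open my $fh, "<", $filename
    or die "could not open $filename for read: $!";
while (my $row = $csv->getline($fh)) {
    my ($name, $start, $end) = @$row;                # made-up column layout
    # ... query logic against the parsed fields ...
}
close $fh;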

Mark
 
A. Sinan Unur
      12-28-2005
(E-Mail Removed) wrote in
news:(E-Mail Removed) oups.com:

> I am slurping a series of large .csv files (6MB) directly into an
> array one at a time (then querying). The first time I slurp a file it
> is incredibly quick. The second time I do it the slurping is very slow
> despite the fact that I close the file (using a filehandle) and undef
> the array. here is the relevant code:
>
> open (TARGETFILE,"CanRPT"."$chromosome".".csv") || die "can't open
> targetfile: $!";
> print "opened";
> @chrfile = <TARGETFILE>; #slurp the chromosome-specific repeat file
> into memory
> print "slurped";
>
> (and after each loop)
>
> close (TARGETFILE);
> undef @chrfile;


Here is what the loop body would look like if I were writing this:

{
    my $name = sprintf 'CanRPT%s.csv', $chromosome;
    open my $target, $name
        or die "Cannot open '$name': $!";
    my @chrfile = <$target>;

    # do something with @chrfile
}

> If it is possible to quickly/simply fix this I would much rather keep
> this method than setting up a line by line input to the array. The
> first slurp is very efficient.
>
> I am using activestate perl 5.6 on a win32 system with 1 gig ram:


I am assuming the problem has to do with your coding style. You don't
seem to be using lexicals effectively, and the fact that you are
repeatedly slurping is a red flag.

Can't you read the file once (slurped or line-by-line) and build the
data structure it represents, and then use that data structure for
further processing?
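For example (a rough sketch only - I am guessing at both the file list and what the rows look like):

use strict;
use warnings;

my @chromosomes = (1 .. 22, 'X', 'Y');   # placeholder list
my %repeats;                             # chromosome => array ref of parsed rows

for my $chromosome (@chromosomes) {
    my $name = sprintf 'CanRPT%s.csv', $chromosome;
    open my $target, '<', $name
        or die "Cannot open '$name': $!";
    while (my $line = <$target>) {
        chomp $line;
        push @{ $repeats{$chromosome} }, [ split /,/, $line ];
    }
    close $target;
}

# Every later query works against %repeats; no file is read more than once.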

It is impossible to tell without having seen the program, but the
constant slurping might be causing memory fragmentation and therefore
excessive pagefile hits. Dunno, really.

Sinan
--
--
A. Sinan Unur <(E-Mail Removed)>
(reverse each component and remove .invalid for email address)

comp.lang.perl.misc guidelines on the WWW:
http://mail.augustmail.com/~tadmc/cl...uidelines.html
 
Larry
      12-28-2005
(E-Mail Removed) wrote:

> I am using activestate perl 5.6 on a win32 system with 1 gig ram:


You may want to consider upgrading... 5.8 has been out for several
years.

 
Smegal
      12-28-2005
Thanks everyone,

I thought this might be a simple slurp usage problem, i.e. freeing up
memory or something, because the program runs - it's just really slow
after the first slurp. But I wasn't able to find anything by searching
Google. I'll look into improving my coding as suggested and see if the
problem persists.

Grant

 
Eric J. Roode
      12-29-2005
"A. Sinan Unur" <(E-Mail Removed)> wrote in
news:Xns973A9C0195EA7asu1cornelledu@127.0.0.1:

> my $name = sprintf 'CanRPT%s.csv', $chromosome;


OOC, why use sprintf here instead of

my $name = "CanRPT$chromosome.csv";

?

--
Eric
`$=`;$_=\%!;($_)=/(.)/;$==++$|;($.,$/,$,,$\,$",$;,$^,$#,$~,$*,$:,@%)=(
$!=~/(.)(.).(.)(.)(.)(.)..(.)(.)(.)..(.)......(.)/,$"),$=++;$.++;$.++;
$_++;$_++;($_,$\,$,)=($~.$"."$;$/$%[$?]$_$\$,$:$%[$?]",$"&$~,$#,);$,++
;$,++;$^|=$";`$_$\$,$/$:$;$~$*$%[$?]$.$~$*${#}$%[$?]$;$\$"$^$~$*.>&$=`
 
Big and Blue
      12-30-2005
(E-Mail Removed) wrote:

> undef @chrfile;


Why bother? You are about to replace this with the read of the next
file. This means that you chuck away all of the memory allocation you
have, just for Perl to reassign it all. This may lead to heap memory
fragmentation.
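In other words, something like this is all that is needed (a sketch of the point only, with the loop and file list paraphrased):

foreach $chromosome (@chromosomes) {      # @chromosomes as in the original loop
    open (TARGETFILE,"CanRPT$chromosome.csv") || die "can't open targetfile: $!";
    @chrfile = <TARGETFILE>;   # the assignment itself replaces the previous contents
    close (TARGETFILE);

    # ... query @chrfile ...
    # no "undef @chrfile" needed here
}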

--
Just because I've written it doesn't mean that
either you or I have to believe it.
 