m// on very long lines leaks memory

 
 
ShaunJ
      03-13-2008
The following snippet leaks memory until it breaks and falls down when
m// is used on a very long line. It works fine if the line lengths are
short. Try
./test.pl /usr/share/dict/words /usr/share/dict/words
Depending on your dictionary, you'll see that compiling the regexes
takes about 200 MB. However, the matching loop that follows leaks memory
at an alarming rate. Start up `top` and watch it run. I'm using Perl
5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
or deny this behaviour for other architectures or versions of Perl,
that would be interesting too.

Cheers,
Shaun

#!/usr/bin/perl
use strict;
use English;

# Compile one regex per line of the first file.
open REFILE, '<' . shift;
chomp (my @restrings = <REFILE>);
close REFILE;
my @re = map { qr/$_/ } @restrings;

# Join the second file into one very long line.
open TEXTFILE, '<' . shift;
chomp (my @text = <TEXTFILE>);
close TEXTFILE;
my $text = join '', @text;

# Match each regex against the long line and print the match offset.
foreach my $re (@re) {
    if ($text =~ m/$re/) {
        print $LAST_MATCH_START[0], "\n";
    }
}
 
John W. Krahn
      03-13-2008
ShaunJ wrote:
> The following snippet leaks memory until it breaks and falls down when
> m// is used on a very long line. It works fine if the line lengths are
> short. Try
> ./test.pl /usr/share/dict/words /usr/share/dict/words
> Depending on your dictionary, you'll see that compiling the regex
> takes about 200 MB. However the following matching loop leaks memory
> at an alarming rate. Start up `top` and watch it run. I'm using Perl
> 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
> or deny this behaviour for other architectures or version of Perl,
> that would be interesting too.
>
> Cheers,
> Shaun
>
> #!/usr/bin/perl
> use strict;
> use English;
> open REFILE, '<' . shift;
> chomp (my @restrings = <REFILE>);
> close REFILE;
> my @re = map { qr/$_/ } @restrings;
>
> open TEXTFILE, '<' . shift;
> chomp (my @text = <TEXTFILE>);
> close TEXTFILE;
> my $text = join '', @text;
>
> foreach my $re (@re) {
> if ($text =~ m/$re/) {
> print $LAST_MATCH_START[0], "\n";
> }
> }


I tested it and if I remove the English module it works fine.
(So don't use English.pm!)



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
 
John W. Krahn
      03-13-2008
John W. Krahn wrote:
> ShaunJ wrote:
>> The following snippet leaks memory until it breaks and falls down when
>> m// is used on a very long line. It works fine if the line lengths are
>> short. Try
>> ./test.pl /usr/share/dict/words /usr/share/dict/words
>> Depending on your dictionary, you'll see that compiling the regex
>> takes about 200 MB. However the following matching loop leaks memory
>> at an alarming rate. Start up `top` and watch it run. I'm using Perl
>> 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
>> or deny this behaviour for other architectures or version of Perl,
>> that would be interesting too.
>>
>> Cheers,
>> Shaun
>>
>> #!/usr/bin/perl
>> use strict;
>> use English;
>> open REFILE, '<' . shift;
>> chomp (my @restrings = <REFILE>);
>> close REFILE;
>> my @re = map { qr/$_/ } @restrings;
>>
>> open TEXTFILE, '<' . shift;
>> chomp (my @text = <TEXTFILE>);
>> close TEXTFILE;
>> my $text = join '', @text;
>>
>> foreach my $re (@re) {
>> if ($text =~ m/$re/) {
>> print $LAST_MATCH_START[0], "\n";
>> }
>> }

>
> I tested it and if I remove the English module it works fine.
> (So don't use English.pm!)


Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:

use English qw( -no_match_vars );
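
For reference, here is a minimal sketch of the original matching loop with that import in place. The sample patterns, the sample text, and the helper name match_offsets are made up for illustration and are not from the original script. @LAST_MATCH_START (the English name for @-) is still exported with -no_match_vars; only $PREMATCH, $MATCH and $POSTMATCH are withheld.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English qw( -no_match_vars );   # skip $PREMATCH, $MATCH, $POSTMATCH

# Hypothetical sample data standing in for the two input files.
my @restrings = qw( needle hay zzz );
my $text      = ('hay' x 1000) . 'needle';

# Return the start offset of each pattern that matches $text.
sub match_offsets {
    my ($text, @patterns) = @_;
    my @offsets;
    foreach my $re (map { qr/$_/ } @patterns) {
        if ($text =~ m/$re/) {
            push @offsets, $LAST_MATCH_START[0];
        }
    }
    return @offsets;
}

my @offsets = match_offsets($text, @restrings);
print "$_\n" for @offsets;
```

The loop body is unchanged from the original post; only the import line differs.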



John
 
xhoster@gmail.com
      03-13-2008
ShaunJ wrote:
> The following snippet leaks memory until it breaks and falls down when
> m// is used on a very long line. It works fine if the line lengths are
> short. Try
> ./test.pl /usr/share/dict/words /usr/share/dict/words
> Depending on your dictionary, you'll see that compiling the regex
> takes about 200 MB. However the following matching loop leaks memory
> at an alarming rate. Start up `top` and watch it run. I'm using Perl
> 5.8.6 built for darwin-thread-multi-2level. If anyone cares to confirm
> or deny this behaviour for other architectures or version of Perl,
> that would be interesting too.


Technically, this does not seem to be a leak. If I throw an infinite
loop around your foreach my $re (@re) loop, then memory grows only
until the inner loop completes its first pass, reaching 15.5 Gig. On
the next iteration of the outer loop, memory stops growing. So it seems
to be an inefficiency rather than a leak. As idle speculation, I'd say
that each $re maintains some kind of independent state, that this state
is proportional to the size of the string it was last used on, and that
the storage is reused the next time that $re gets invoked, but not
before then.
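
A rough sketch of that experiment, for anyone who wants to repeat it. The outer loop is bounded here instead of infinite, the sample patterns and text are made-up stand-ins for the dictionary files, and rss_kb is a hypothetical helper that assumes a ps accepting "-o rss= -p PID". With inputs this small and a recent Perl, the numbers will probably not move much; the effect described above was seen with very large inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use English;    # kept unrestricted, as in the original script

# Resident set size of this process in KB, or undef if it cannot be read.
# Assumption: ps understands "-o rss= -p PID".
sub rss_kb {
    my $out = `ps -o rss= -p $$ 2>/dev/null`;
    return (defined($out) && $out =~ /(\d+)/) ? $1 : undef;
}

# Hypothetical stand-ins for the regex list and the very long line.
my @re   = map { qr/$_/ } qw( alpha gamma omega );
my $text = ('alphabet' x 1000) . 'gamma';

my $passes = 3;    # bounded here; the experiment above looped forever
my $hits   = 0;
for my $pass (1 .. $passes) {
    foreach my $re (@re) {
        $hits++ if $text =~ m/$re/;
    }
    my $rss = rss_kb();
    printf "pass %d: rss %s KB\n", $pass, defined $rss ? $rss : 'unknown';
}
```

If the growth is an inefficiency rather than a leak, the figure printed after the first pass should be about the same as the figures for later passes.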

Xho

 
ShaunJ
      03-13-2008
On Mar 13, 2:53 pm, "John W. Krahn" wrote:
...
> > I tested it and if I remove the English module it works fine.
> > (So don't use English.pm!)

>
> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
>
> use English qw( -no_match_vars );


Wow, thanks! If I use either English.pm or $& (even without
English.pm) it uses up tons of memory with Perl 5.8.6 (on Mac OS X
10.4.11). If I use neither English.pm nor $& it works fine.

If I use Perl 5.10.0 built from source it works for every case.

Cheers,
Shaun
 
Uri Guttman
      03-13-2008
>>>>> "S" == ShaunJ writes:

S> On Mar 13, 2:53 pm, "John W. Krahn" <(E-Mail Removed)> wrote:
S> ...
>> > I tested it and if I remove the English module it works fine.
>> > (So don't use English.pm!)

>>
>> Or at least don't use the $PREMATCH, $MATCH, or $POSTMATCH variables:
>>
>> use English qw( -no_match_vars );


S> Wow, thanks! If I use either English.pm or $& (even without
S> English.pm) it uses up tons of memory with Perl 5.8.6 (on MacOSX
S> 10.4.11). If I use neither English.pm or $& it works fine.

i was going to mention that but didn't want to get into this thread. $&
(which English.pm uses unless you pass that option) is a known memory hog
(not a leak). since $& is global, once perl has seen it anywhere in the
program it must copy the entire string being matched for each regex
match, in case $& is used later. this is a well known issue; google for
more about it or see the discussion in perldoc perlvar.
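
a minimal sketch of one way to get the same three strings without touching $& at all, using the match offsets in @- and @+ (perlvar documents $& as equivalent to a substr built from $-[0] and $+[0]). the helper name match_parts and the sample string are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (prematch, match, postmatch) for the first match of $re in $text,
# built from the offsets in @- and @+ rather than from $`, $& and $'.
sub match_parts {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    my ($start, $end) = ($-[0], $+[0]);
    return (
        substr($text, 0, $start),
        substr($text, $start, $end - $start),
        substr($text, $end),
    );
}

my ($pre, $match, $post) = match_parts('the quick brown fox', qr/qu\w+/);
print "[$pre] [$match] [$post]\n";    # prints "[the ] [quick] [ brown fox]"
```

only the match that actually needs the text pays for the substr copies; other matches in the program are unaffected.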

S> If I use Perl 5.10.0 built from source it works for every case.

they seem to have fixed this problem (partially from what i heard but i
could be wrong) in 5.10. i still recommend never using $& and no one who
knows perl uses English.pm.
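
for what it's worth, one thing 5.10 did add is the /p modifier together with ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}, which preserve the matched string only for the matches that ask for it. whether that is the fix referred to above is a guess; the sample string below is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # the /p modifier and ${^MATCH} need Perl 5.10 or later

my $text = 'memory hog, not a leak';

my ($pre, $match, $post);
if ($text =~ /hog/p) {
    # Only this match asks perl to preserve the matched string.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "[$pre] [$match] [$post]\n";    # prints "[memory ] [hog] [, not a leak]"
}
```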

uri

 