regular expression too big

 
 
brabuhr@gmail.com
Guest
Posts: n/a
 
      11-23-2006
On 11/17/06, (E-Mail Removed) <(E-Mail Removed)> wrote:
> On 11/12/06, Ross Bamford <(E-Mail Removed)> wrote:
> > On Sun, 12 Nov 2006 15:01:56 -0000, Peter Schrammel
> > <(E-Mail Removed)> wrote:
> > > Why is there a limitation at all? I implemented the same thing in perl
> > > and it doesn't complain ...
> > > Is the regexp engine of perl that much better?
> > >
> >
> > Irrespective of whether regex is the best solution for your needs, it seems
> > Oniguruma will improve the situation somewhat with respect to large
> > regular expressions.
>
> I built a local version of 1.8.5 with the oniguruma engine:
> http://raa.ruby-lang.org/project/oniguruma/
>
> And re-ran (a slight variation of) my test program:


I thought I'd try running under jruby too:

$ ruby long_regex_test.rb
Took 0.000153 seconds to convert 1 words into a regex 17 bytes long.
Took 0.000381 seconds to convert 2 words into a regex 20 bytes long.
Took 0.000393 seconds to convert 4 words into a regex 36 bytes long.
Took 0.000629 seconds to convert 8 words into a regex 93 bytes long.
Took 0.001359 seconds to convert 16 words into a regex 180 bytes long.
Took 0.002261 seconds to convert 32 words into a regex 360 bytes long.
Took 0.007304 seconds to convert 64 words into a regex 741 bytes long.
Took 0.013601 seconds to convert 128 words into a regex 1348 bytes long.
Took 0.028273 seconds to convert 256 words into a regex 2746 bytes long.
Took 0.066228 seconds to convert 512 words into a regex 5345 bytes long.
Took 0.177105 seconds to convert 1024 words into a regex 10017 bytes long.
Took 0.330573 seconds to convert 2048 words into a regex 19597 bytes long.
Took 1.390542 seconds to convert 4096 words into a regex 37345 bytes long.
long_regex_test.rb:26:in `match': regular expression too big:
/(?:A(?:cr(?:edula|opora)|d(?:ar|elochorda|ventis[mt])|frogaean|hepatokla|ileen|l(?:adinist|l(?:a(?:manda|sch)|otheria)|ticamelus)|m(?:bystomidae|ericanly|ioidei|phioxidae)|n(?:chisaurus|d(?:aman|romache)|olympiad|t(?:echinomys|h(?:ophila|ropozoic)))|patornis|r(?:ab|chelenis|istarch)|s(?:caridia|elli|hantee|ilidae|terias)|tropa|u(?:riculidae|stroasiatic))|B(?:a(?:cchus|eria|haism|iera|k(?:shaish|wiri)|re|sili(?:ca|scus))|e(?:atrice|l(?:g(?:ae|ic)|shazzaresque)|mbex|rn(?:inesque|oullian))|i(?:elid|lati|smarck|tis)|lackfoot|o(?:hemia|llandist|rrovian)|ra(?:m|nchiopulmonata)|u(?:nga|phthalmum)|yroni(?:cs|te))|C(?:a(?:ctales|l(?:edonia|li(?:carpa|stephus)|ochortaceae|vados|ycophorae)|m(?:bodian|orra)|ntabri|p(?:ito(?:line)?|sidae)|r(?:olan|tist)|s(?:sandra|tanospermum)|thari)|e(?:ntrarchidae|strian)|h(?:arontas|e(?:lura|makuan)|rist(?:ianomastix|li(?:keness|ness)|mas))|lathrus|o(?:bleskill|fane|l(?:letidae|ossian)|m(?:melinaceae|us)|rybantic)|rocus|u(?:cumariidae|thbert)|y(?:clospondy
(RegexpError)
from long_regex_test.rb:26
from long_regex_test.rb:15:in `times'
from long_regex_test.rb:15

$ /opt/ruby/v1.8.5-oniguruma/bin/ruby long_regex_test.rb
Took 0.000211 seconds to convert 1 words into a regex 5 bytes long.
Took 0.000334 seconds to convert 2 words into a regex 24 bytes long.
Took 0.000215 seconds to convert 4 words into a regex 52 bytes long.
Took 0.000836 seconds to convert 8 words into a regex 92 bytes long.
Took 0.000885 seconds to convert 16 words into a regex 173 bytes long.
Took 0.002779 seconds to convert 32 words into a regex 345 bytes long.
Took 0.004934 seconds to convert 64 words into a regex 725 bytes long.
Took 0.009765 seconds to convert 128 words into a regex 1369 bytes long.
Took 0.020761 seconds to convert 256 words into a regex 2737 bytes long.
Took 0.088759 seconds to convert 512 words into a regex 5408 bytes long.
Took 0.144276 seconds to convert 1024 words into a regex 10131 bytes long.
Took 0.246762 seconds to convert 2048 words into a regex 19531 bytes long.
Took 0.667575 seconds to convert 4096 words into a regex 37498 bytes long.
Took 1.677037 seconds to convert 8192 words into a regex 71352 bytes long.
Took 2.971277 seconds to convert 16384 words into a regex 133499 bytes long.
Took 6.078681 seconds to convert 32768 words into a regex 245318 bytes long.
Took 13.001538 seconds to convert 65536 words into a regex 433611 bytes long.
Took 26.791838 seconds to convert 131072 words into a regex 713229 bytes long.
Took 47.691109 seconds to convert 262144 words into a regex 1061186 bytes long.
Took 71.050324 seconds to convert 524288 words into a regex 1354567 bytes long.

$ export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Home
$ ~/Desktop/jruby-0.9.1/bin/jruby long_regex_test.rb
Took 0.032 seconds to convert 1 words into a regex 9 bytes long.
Took 0.012 seconds to convert 2 words into a regex 18 bytes long.
Took 0.624 seconds to convert 4 words into a regex 40 bytes long.
Took 0.033 seconds to convert 8 words into a regex 95 bytes long.
Took 0.095 seconds to convert 16 words into a regex 156 bytes long.
Took 0.057 seconds to convert 32 words into a regex 358 bytes long.
Took 0.171 seconds to convert 64 words into a regex 743 bytes long.
Took 0.309 seconds to convert 128 words into a regex 1402 bytes long.
Took 0.40900000000000003 seconds to convert 256 words into a regex 2692 bytes long.
Took 1.863 seconds to convert 512 words into a regex 5341 bytes long.
Took 0.838 seconds to convert 1024 words into a regex 10328 bytes long.
Took 1.504 seconds to convert 2048 words into a regex 19733 bytes long.
Took 2.814 seconds to convert 4096 words into a regex 37334 bytes long.
Took 8.177 seconds to convert 8192 words into a regex 71593 bytes long.
Took 15.181000000000001 seconds to convert 16384 words into a regex 133779 bytes long.
Took 30.695 seconds to convert 32768 words into a regex 244280 bytes long.
Took 61.555 seconds to convert 65536 words into a regex 432751 bytes long.
Took 155.94400000000002 seconds to convert 131072 words into a regex 713573 bytes long.
Took 224.93 seconds to convert 262144 words into a regex 1060079 bytes long.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
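
For anyone who wants to reproduce runs like the ones above: long_regex_test.rb itself is not shown in this message, so what follows is only a rough sketch of what such a test might look like, not the actual script. It folds a word list into a prefix-sharing (trie-style) pattern similar in shape to the one in the error message, then times the build plus one match while doubling the word count. All method names (build_trie, trie_to_source, time_regex_build, run_doubling_test) and the tiny inline word list are made up for illustration.

```ruby
require 'benchmark'

# Build a nested Hash trie; the :end key marks that a word stops at this node.
def build_trie(words)
  trie = {}
  words.each do |word|
    node = trie
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[:end] = true
  end
  trie
end

# Turn a trie node into regex source, sharing common prefixes.
def trie_to_source(node)
  branches = node.reject { |key, _| key == :end }
                 .sort
                 .map { |ch, child| Regexp.escape(ch) + trie_to_source(child) }
  return '' if branches.empty?
  optional = node.key?(:end)            # a word may also end right here
  return branches.first if branches.length == 1 && !optional
  "(?:#{branches.join('|')})#{'?' if optional}"
end

# Compile the anchored pattern and run one match, returning [regex, seconds].
def time_regex_build(words)
  regex = nil
  seconds = Benchmark.realtime do
    regex = Regexp.new("\\A#{trie_to_source(build_trie(words))}\\z")
    regex.match(words.first)
  end
  [regex, seconds]
end

# Double the sample size each step, as in the logs above.
def run_doubling_test(all_words, steps)
  steps.times do |i|
    count = 2**i
    break if count > all_words.size
    regex, seconds = time_regex_build(all_words.sample(count))
    puts format('Took %.6f seconds to convert %d words into a regex %d bytes long.',
                seconds, count, regex.source.bytesize)
  end
end

# Hypothetical stand-in word list; a real run would read a dictionary file.
run_doubling_test(%w[Acredula Acropora Adar Bacchus Baeria Bembex Cactales Crocus], 4)
```

Because the sample is random, byte counts will differ from run to run for the same word count, which is consistent with the three logs above reporting different regex sizes.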

 
 
 
 
 
x1
Guest
Posts: n/a
 
      11-23-2006
#would_never_do.rb
matched = `perl -e 'if ("Hello" =~ m/(Hello)/) {print "true";}'`
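
A pure-Ruby alternative to shelling out, offered only as a sketch: if one big pattern raises RegexpError, split the word list in half and compile each half separately, then test the input against each piece. The helper names (compile_in_chunks, any_match?) and the max_bytes parameter are assumptions for illustration; max_bytes just simulates a size limit so the splitting can be seen on an engine that would otherwise accept the whole pattern.

```ruby
# Compile words into one or more anchored regexes, halving the list
# whenever compilation fails with RegexpError.
# max_bytes is a hypothetical knob that simulates an engine size limit.
def compile_in_chunks(words, max_bytes = Float::INFINITY)
  source = "\\A(?:#{words.map { |w| Regexp.escape(w) }.join('|')})\\z"
  raise RegexpError, 'pattern exceeds max_bytes' if source.bytesize > max_bytes
  [Regexp.new(source)]
rescue RegexpError
  raise if words.size <= 1              # cannot split any further
  mid = words.size / 2
  compile_in_chunks(words[0...mid], max_bytes) +
    compile_in_chunks(words[mid..-1], max_bytes)
end

# True if any of the chunked regexes matches the text.
def any_match?(regexes, text)
  regexes.any? { |re| re.match(text) }
end

regexes = compile_in_chunks(%w[alpha beta gamma delta], 25)
puts regexes.size
puts any_match?(regexes, 'gamma')
```

The cost is that a lookup may have to try several regexes in turn, so this trades match speed for staying under whatever limit the engine imposes.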




 