regexp and stack overflow

Une bévue

I have a regexp:

SCRIPT_RE = Regexp.new('<script[^>]*>((.|\n)(?!/script))*</script>', Regexp::EXTENDED, 'N')

which is supposed to strip out everything inside:
<script ...>(part suppressed...)</script>
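
For comparison, here is a minimal sketch of the same intent (consume any character, newlines included, up to the closing tag) written without an alternation inside the repeated group. The name SCRIPT_BLOCK_RE and the sample string are assumptions made for illustration, and whether this avoids the overflow depends on the regexp engine in use.

```ruby
# Assumed alternative, not the pattern from the post; SCRIPT_BLOCK_RE is a hypothetical name.
# /m makes "." match newlines, /i ignores case, and ".*?" stops at the first "</script>".
SCRIPT_BLOCK_RE = /<script\b[^>]*>.*?<\/script>/mi

# Made-up sample input, only to show the call.
sample = "before<script type=\"text/javascript\">\nvar a = 1;\n</script>after"
stripped = sample.gsub(SCRIPT_BLOCK_RE, '')
```

The lazy quantifier plays the role of the (?!/script) lookahead: matching ends at the first closing tag rather than the last one.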

It works well for some HTML files but crashes on others with the following
error message:
RegexpError: Stack overflow in regexp matcher:
method gsub
in check_files.rb at line 38
method stripHTML
in check_files.rb at line 38

Line 38 is:

self.gsub(SCRIPT_RE, '').gsub(TAGS_RE, '').gsub(/\s+/, ' ').gsub(NBSP_RE, '')

with:

SCRIPT_RE = Regexp.new('<script[^>]*>((.|\n)(?!/script))*</script>', Regexp::EXTENDED, 'N')

What I want to do:

Strip out all the contents of scripts and all the HTML tags with their
attributes. I also have to add stripping of any CSS declarations (not
done yet).
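
The three steps (script contents, style contents, then the remaining tags) could be sketched roughly as follows. Everything here is an assumption for illustration: element_re, ANY_TAG_RE, NBSP_ENTITY_RE and strip_html_sketch are hypothetical names, not the definitions from check_files.rb, and regexps remain a rough tool for HTML.

```ruby
# Hypothetical helper: builds a pattern matching a whole <name ...>...</name> element.
# MULTILINE lets "." cross newlines; ".*?" stops at the first closing tag.
def element_re(name)
  Regexp.new("<#{Regexp.escape(name)}\\b[^>]*>.*?</#{Regexp.escape(name)}>",
             Regexp::MULTILINE | Regexp::IGNORECASE)
end

ANY_TAG_RE     = /<[^>]*>/  # assumed stand-in for TAGS_RE
NBSP_ENTITY_RE = /&nbsp;/   # assumed stand-in for NBSP_RE

# Removes script and style elements first, so their contents never reach
# the generic tag pass, then collapses whitespace.
def strip_html_sketch(html)
  text = html.gsub(element_re('script'), '')
  text = text.gsub(element_re('style'), '')
  text = text.gsub(ANY_TAG_RE, '')
  text = text.gsub(NBSP_ENTITY_RE, ' ')
  text.gsub(/\s+/, ' ').strip
end
```

Unlike line 38, this sketch replaces the non-breaking-space entity with a space before collapsing whitespace; that ordering is a design choice, not something taken from the original code.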

The program fails on a file containing the following script parts:
<script type="text/javascript"
src="Mac-roman-utf-8_fichiers/wikibits.js"><!-- wikibits js --></script>
<script type="text/javascript"
src="Mac-roman-utf-8_fichiers/index.php"><!-- site js --></script>
<style type="text/css">/*<![CDATA[*/
"/w/index.php?title=MediaWiki:Common.css&action=raw&ctype=text/css&smaxa
"/w/index.php?title=MediaWiki:Monobook.css&action=raw&ctype=text/css&sma
@import "/w/index.php?title=-&action=raw&gen=css&maxage=2678400";
/*]]>*/</style></head><body class="ns-0 ltr">

and also some scripts inside divs in the body:
<script type="text/javascript"> if (window.isMSIE55) fixalpha();
<script type="text/javascript"> if (window.runOnloadHook)

I can't use tidy for this, because the reason for stripping out all the
HTML and keeping only the text is to help a program detect the encoding
of the file.
une bévue