too greedy of a regexp

 
 
Dave Rose
      11-09-2006
I have a regexp, /(^BillHead(.*))(^Bill_End(.*))/m, that's too greedy for
processing a billing extract file containing:
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
BillHead...<<<much information here>>\n
<<one or more detail lines here\n>>
Bill_End...<<<much information here>>\n
...etc.... to EOF....

I get the whole file matched, but I just want each invoice.
It will eventually be in a one-liner like
a=File.read("billfile").scan(regexp)

So what is the non-greedy way for the above regexp to properly match
each invoice?

--
Posted via http://www.ruby-forum.com/.

 
Jan Svitok
      11-09-2006
On 11/9/06, Dave Rose wrote:
> i have a regexp: /(^BillHead(.*))(^Bill_End(.*))/m that's too greedy for
> processing
> a billing extract file containing:
> BillHead...<<<much information here>>\n
> <<one or more detail lines here\n>>
> Bill_End...<<<much information here>>\n
> BillHead...<<<much information here>>\n
> <<one or more detail lines here\n>>
> Bill_End...<<<much information here>>\n
> ...etc.... to EOF....
>
> ..i get the whole file matched....i just want each invoice...
> it will eventually be in a oneliner like
> a=File.read("billfile").scan(regexp)
>
> so what is the non-greedy way for the above regexp to properly match
> each invoice...


try:

/(^BillHead(.*?))(^Bill_End(.*?))\n/m

or

/(^BillHead(.*?))(^Bill_End([^\n].*))\n/m

Notice the .*? instead of .*

*? has some peculiarities that were discussed here some time ago, so
you may want to look them up in the archives (search for 'greedy'
or 'regex'; I don't remember which now).
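
A minimal sketch of the difference between .* and .*?, assuming a hypothetical two-invoice sample in place of the real billing extract (the variable names and the sample text are made up for illustration):

```ruby
# Hypothetical sample standing in for the real billing extract.
data = <<~TEXT
  BillHead A1
  detail line 1
  Bill_End A1
  BillHead B2
  detail line 2
  detail line 3
  Bill_End B2
TEXT

greedy = /(^BillHead(.*))(^Bill_End(.*))/m       # pattern from the question
lazy   = /(^BillHead(.*?))(^Bill_End(.*?))\n/m   # first suggestion above

greedy_matches = data.scan(greedy)
lazy_matches   = data.scan(lazy)

puts greedy_matches.size   # 1: a single match running from the first BillHead to the last Bill_End
puts lazy_matches.size     # 2: one match per invoice

# Each match is an array of the four capture groups; groups 0 and 2 hold
# the head-plus-details part and the Bill_End line respectively.
lazy_matches.each { |m| puts m[0] + m[2] }
```

With /m the dot also matches newlines, so the greedy .* runs to the end of the input and backtracks only as far as the last Bill_End, while .*? stops at the first one.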

 
Robert Klemme
      11-09-2006
Jan Svitok wrote:
> try:
>
> /(^BillHead(.*?))(^Bill_End(.*?))\n/m
>
> or
>
> /(^BillHead(.*?))(^Bill_End([^\n].*))\n/m
>
> Notice the .*? instead of .*
>
> *? has some peculiarities that were discussed here some time ago, so
> you may want to look them up in the archives (search for 'greedy'
> or 'regex'; I don't remember which now).


I would also remove the last .* because that likely eats up the rest of
the document. So that would be

/^BillHead(.*?)^Bill_End/m
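
A rough sketch of what that shorter pattern returns from scan, assuming the end marker is spelled Bill_End as in the sample data from the question. The sample string is hypothetical, and the second pattern (with [^\n]* to keep the rest of the Bill_End line) is an illustrative variant, not something proposed in this thread:

```ruby
# Hypothetical sample standing in for the real billing extract.
data = "BillHead A1\ndetail 1\nBill_End A1\nBillHead B2\ndetail 2\nBill_End B2\n"

# One capture group, so scan yields one-element arrays; flatten them.
bodies = data.scan(/^BillHead(.*?)^Bill_End/m).flatten
p bodies

# Variant (an assumption): no groups, so scan returns whole matches.
# [^\n]* consumes the rest of the Bill_End line and stops at the newline.
invoices = data.scan(/^BillHead.*?^Bill_End[^\n]*/m)
p invoices
```

The first form keeps only the text between the two markers; the second keeps each invoice whole, which is closer to what the one-liner in the question needs.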

Another approach is to do

s.split(/^(Bill(?:Head|End))/m)

and then go through the array.

irb(main):006:0> "BillHead\nfoo\nbar\nBillEnd".split(/^(Bill(?:Head|End))/m)
=> ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]
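
One possible way to "go through the array" after the split, sketched under the assumption that markers always come in BillHead/Bill_End pairs and that the end marker is spelled Bill_End as in the question's sample. The sample string and variable names are hypothetical:

```ruby
# Hypothetical sample standing in for the real billing extract.
data = "BillHead A1\ndetail 1\nBill_End A1\nBillHead B2\ndetail 2\nBill_End B2\n"

# Because the pattern has a capture group, split keeps the markers in the result.
parts = data.split(/^(Bill(?:Head|_End))/)

# parts[0] is whatever precedes the first marker (empty here). After dropping
# it, the array alternates marker, following text, marker, following text, ...
records = parts.drop(1).each_slice(2).map(&:join)

# Join each BillHead record with the Bill_End record that follows it.
invoices = records.each_slice(2).map(&:join)

invoices.each { |inv| puts inv, "----" }
```

This does more bookkeeping than scan, but it fails more visibly when a marker is missing: an odd number of records then shows up as an incomplete final slice.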

Kind regards

robert

 
Dave Rose
      11-09-2006
Robert Klemme wrote:
> Jan Svitok wrote:
>> /(^BillHead(.*?))(^Bill_End(.*?))\n/m
>>
>> or
>>
>> /(^BillHead(.*?))(^Bill_End([^\n].*))\n/m
>>
>> notice the .*? instead of .*
>
> => ["", "BillHead", "\nfoo\nbar\n", "BillEnd"]
>
> Kind regards
>
> robert

I played around in irb with a shortened extract file and found that:
b=File.read("drbilp.txt").scan(/(^BillHead(.*?))(^Bill_End(\d*)(\s*UBPBILP1\n)(.*? ))/m)
works, in that it separates each invoice into a sub-array of size 6,
in which b[x][0]+b[x][2] completes the task of reading and scanning
correctly
and putting it all in a Ruby 'container' that I can do an each on. Thanks,
dave



 