Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Why do look-ahead and look-behind have to be fixed-width patterns?

Reply
Thread Tools

Why do look-ahead and look-behind have to be fixed-width patterns?

 
 
inhahe
Guest
Posts: n/a
 
      01-28-2005
Hi i'm a newbie at this and probably always will be, so don't be surprised
if I don't know what i'm talking about.

but I don't understand why regex look-behinds (and look-aheads) have to be
fixed-width patterns.

i'm getting the impression that it's supposed to make searching
exponentially slower otherwise

but i just don't see how.

say i have the expression (?<=.*?:.*?.*
all the engine has to do is search for .*?:.*?:.*, and then in each result,
find .*?:.*?: and return the string starting at the point just after the
length of the match.
no exponential time there, and even that is probably more inefficient than
it has to be.



 
Reply With Quote
 
 
 
 
John Machin
Guest
Posts: n/a
 
      01-28-2005

inhahe wrote:
> Hi i'm a newbie at this and probably always will be, so don't be

surprised
> if I don't know what i'm talking about.
>
> but I don't understand why regex look-behinds (and look-aheads) have

to be
> fixed-width patterns.
>
> i'm getting the impression that it's supposed to make searching
> exponentially slower otherwise
>
> but i just don't see how.
>
> say i have the expression (?<=.*?:.*?.*
> all the engine has to do is search for .*?:.*?:.*, and then in each

result,
> find .*?:.*?: and return the string starting at the point just after

the
> length of the match.
> no exponential time there, and even that is probably more inefficient

than
> it has to be.


But that's not what you are telling it to do. You are telling it to
firstly find each position which starts a match with .* -- i.e. every
position -- and then look backwards to check that the previous text
matches .*?:.*?:

To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:

>>> import re
>>> q = re.compile(r'.*?:.*?.*)').search
>>> def grab(s):

.... m = q(s)
.... if m:
.... print m.group(1)
.... else:
.... print 'not found!'
....
>>> grab('')

not found!
>>> grab('::::')

::
>>> grab('a:b:yadda')

yadda
>>>>>> grab('a:b:c:d')

c:d
>>> grab('a:b:')


>>>


 
Reply With Quote
 
 
 
 
Steven Bethard
Guest
Posts: n/a
 
      01-28-2005
John Machin wrote:
> To grab the text after the 2nd colon (if indeed there are two or more),
> it's much simpler to do this:
>
>>>>import re
>>>>q = re.compile(r'.*?:.*?.*)').search
>>>>def grab(s):

>
> ... m = q(s)
> ... if m:
> ... print m.group(1)
> ... else:
> ... print 'not found!'
> ...
>
>>>>grab('')

> not found!
>
>>>>grab('::::')

> ::
>
>>>>grab('a:b:yadda')

> yadda
>
>>>>>>>grab('a:b:c:d')

> c:d
>
>>>>grab('a:b:')

>
>


Or without any regular expressions:

py> def grab(s):
.... try:
.... first, second, rest = s.split(':', 2)
.... print rest
.... except ValueError:
.... print 'not found!'
....
py> grab('')
not found!
py> grab('a:b:yadda')
yadda
py> grab('a:b:c:d')
c:d
py> grab('a:b:')

py>

To the OP: what is it you're trying to do? Often there is a much
cleaner way to do it without regular expressions...

Steve
 
Reply With Quote
 
Diez B. Roggisch
Guest
Posts: n/a
 
      01-28-2005
> but I don't understand why regex look-behinds (and look-aheads) have to be
> fixed-width patterns.
>
> i'm getting the impression that it's supposed to make searching
> exponentially slower otherwise


That's because of the underlying theory of regular expressions. They are
modelled using so called finite state automata (FSM). These are very much
limited in the complexity of things they can do, and so are regular
expressions. Explaining that further would require to dig deep into the
details of FSM, grammars and languages - deeper than I'm currently willing
to do But I wanted to point out that there is a "real" technical reason
for that, not just a lack of feature or willingness to implement one.

--

Regards,

Diez B. Roggisch
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Why does String have to_str and Integer have to_int? Nanostuff Ruby 9 03-01-2007 09:44 PM
why why why why why Mr. SweatyFinger ASP .Net 4 12-21-2006 01:15 PM
findcontrol("PlaceHolderPrice") why why why why why why why why why why why Mr. SweatyFinger ASP .Net 2 12-02-2006 03:46 PM
Why I have error message "not have an entry point defined"? =?Utf-8?B?ZGF2aWQ=?= ASP .Net 6 08-18-2005 11:09 AM



Advertisments