Velocity Reviews > Match First Sequence in Regular Expression?

# Match First Sequence in Regular Expression?

Roger L. Cauvin

 01-26-2006
Say I have some string that begins with an arbitrary sequence of characters
and then alternates repeating the letters 'a' and 'b' any number of times,
e.g.

"xyz123aaabbaabbbbababbbbaaabb"

I'm looking for a regular expression that matches the first, and only the
first, sequence of the letter 'a', and only if the length of the sequence is
exactly 3.

Does such a regular expression exist? If so, any ideas as to what it could
be?

--
Roger L. Cauvin
(E-Mail Removed) (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com

Sybren Stuvel

 01-26-2006
Roger L. Cauvin enlightened us with:
> I'm looking for a regular expression that matches the first, and
> only the first, sequence of the letter 'a', and only if the length
> of the sequence is exactly 3.

1) You're looking for the first, and only the first, sequence of the
letter 'a'. If the length of this first, and only the first,
sequence of the letter 'a' is not 3, no match is made at all.

2) You're looking for the first, and only the first, sequence of
length 3 of the letter 'a'.
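Sybren's two readings can be told apart without any regular expression. The following is a minimal reference sketch, assuming Python 3. The helper names a_runs, first_run_is_three and first_run_of_three are hypothetical. It uses itertools.groupby to split the string into maximal runs of identical characters, which gives a plain oracle to test candidate patterns against. It does not include Roger's later requirement that the run be followed directly by 'b'.

```python
from itertools import groupby


def a_runs(s):
    """Return (start_index, length) for every maximal run of 'a' in s."""
    runs = []
    pos = 0
    for ch, group in groupby(s):
        n = len(list(group))
        if ch == 'a':
            runs.append((pos, n))
        pos += n
    return runs


def first_run_is_three(s):
    """Reading 1: the first run of 'a' exists and has length exactly 3."""
    runs = a_runs(s)
    return bool(runs) and runs[0][1] == 3


def first_run_of_three(s):
    """Reading 2: start index of the first run of exactly three 'a's, else None."""
    for start, n in a_runs(s):
        if n == 3:
            return start
    return None


for s in ['xyz123aaabbaabbbbababbbbaaabb', 'xyz123aabbaaab']:
    print(repr(s), first_run_is_three(s), first_run_of_three(s))
```

On Roger's example string both readings agree. On 'xyz123aabbaaab', reading 1 fails, while reading 2 finds the run that starts at index 10.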

What is it?

Sybren
--
The problem with the world is stupidity. Not saying there should be a
capital punishment for stupidity, but why don't we just take the
safety labels off of everything and let the problem solve itself?
Frank Zappa

Christoph Conrad

 01-26-2006
Hello Roger,

> I'm looking for a regular expression that matches the first, and only
> the first, sequence of the letter 'a', and only if the length of the
> sequence is exactly 3.

import re

if __name__ == '__main__':
    m = re.search('a{3}', 'xyz123aaabbaaabbbbababbbbaabb')
    print(m.group(0))
    print("Preceded by: \"" + m.string[0:m.start(0)] + "\"")

Best wishes,
Christoph

Tim Chase

 01-26-2006
> Say I have some string that begins with an arbitrary
> sequence of characters and then alternates repeating the
> letters 'a' and 'b' any number of times, e.g.
> "xyz123aaabbaabbbbababbbbaaabb"
>
> I'm looking for a regular expression that matches the
> first, and only the first, sequence of the letter 'a', and
> only if the length of the sequence is exactly 3.
>
> Does such a regular expression exist? If so, any ideas as
> to what it could be?
>

I'm not quite sure what your intent here is, as the
resulting find would obviously be "aaa", of length 3.

If you mean that you want to test against a number of
things, and only find items where "aaa" is the first run of
"a"s on the line, you might try something like

import re
listOfStringsToTest = [
'helloworld',
'xyz123aaabbaabababbab',
'cantalopeaaabababa',
'baabbbaaabbbbb',
'xyzaa123aaabbabbabababaa']
r = re.compile("[^a]*(a{3})b+(a+b+)*")
matches = [s for s in listOfStringsToTest if r.match(s)]
print(repr(matches))

If you just want the *first* triad of "aaa", you can change
the regexp to

r = re.compile(".*?(a{3})b+(a+b+)*")
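Running both of Tim's patterns over the same list shows how they differ. This sketch assumes Python 3 (print() rather than the print statement). The last string, with a run of four 'a's, is a hypothetical extra case. re.match anchors only at the start of the string. As a result, with the lazy .*? prefix, the second pattern accepts any string containing 'aaa' followed by 'b', even when that 'aaa' is the tail of a longer run.

```python
import re

strings = [
    'helloworld',
    'xyz123aaabbaabababbab',
    'cantalopeaaabababa',
    'baabbbaaabbbbb',
    'xyzaa123aaabbabbabababaa',
    'xyz123aaaabb',  # hypothetical extra case: first run of 'a' has length 4
]

first_run = re.compile("[^a]*(a{3})b+(a+b+)*")  # Tim's first pattern
lazy = re.compile(".*?(a{3})b+(a+b+)*")         # Tim's lazy variant

first_hits = [s for s in strings if first_run.match(s)]
lazy_hits = [s for s in strings if lazy.match(s)]
print(first_hits)
print(lazy_hits)
```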

With a little more detail as to the gist of the problem,
perhaps a better solution can be found. In particular, are
there items in the listOfStringsToTest that should be found
but aren't with either of the regexps?

-tkc

Alex Martelli

 01-26-2006
Tim Chase <(E-Mail Removed)> wrote:
...
> I'm not quite sure what your intent here is, as the
> resulting find would obviously be "aaa", of length 3.

But that would also match 'aaaa'; I think he wants negative lookbehind
and lookahead assertions around the 'aaa' part. But then there's the
spec about matching only if the sequence is the first occurrence of
'a's, so maybe he wants '^[^a]*' instead of the lookbehind (and maybe
parentheses around the 'aaa' to somehow 'match' it specially?).
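Alex's two suggestions might look like this in Python 3's re module. The first uses negative lookbehind and lookahead around 'aaa'. The second replaces the lookbehind with a start-anchored prefix of non-'a' characters. This is an illustrative sketch, and the pattern names are made up.

```python
import re

# An 'aaa' that is neither preceded nor followed by another 'a',
# i.e. a run of exactly three 'a's anywhere in the string.
exact_run = re.compile(r'(?<!a)aaa(?!a)')

# Anchored variant: no 'a' may appear before the run, so it must be the
# first run of 'a' in the string; (?!a) still rules out 'aaaa'.
first_exact_run = re.compile(r'^[^a]*(aaa)(?!a)')

s = 'xyz123aabbaaab'
m = exact_run.search(s)
print(m.span() if m else None)    # the later run of three is found
print(first_exact_run.search(s))  # None, because the first run is 'aa'
```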

It's definitely not very clear what exactly the intent is, no...

Alex

Roger L. Cauvin

 01-26-2006
"Sybren Stuvel" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) r.org...
> Roger L. Cauvin enlightened us with:
>> I'm looking for a regular expression that matches the first, and
>> only the first, sequence of the letter 'a', and only if the length
>> of the sequence is exactly 3.
>
> 1) You're looking for the first, and only the first, sequence of the
> letter 'a'. If the length of this first, and only the first,
> sequence of the letter 'a' is not 3, no match is made at all.
>
> 2) You're looking for the first, and only the first, sequence of
> length 3 of the letter 'a'.
>
> What is it?

The first option describes what I want, with the additional restriction that
the "first sequence of the letter 'a'" is defined as 1 or more consecutive
occurrences of the letter 'a', followed directly by the letter 'b'.

--
Roger L. Cauvin
(E-Mail Removed) (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com

Roger L. Cauvin

 01-26-2006
"Christoph Conrad" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)-berlin.de...
> Hello Roger,
>
>> I'm looking for a regular expression that matches the first, and only
>> the first, sequence of the letter 'a', and only if the length of the
>> sequence is exactly 3.
>
> import re
>
> if __name__ == '__main__':
>     m = re.search('a{3}', 'xyz123aaabbaaabbbbababbbbaabb')
>     print(m.group(0))
>     print("Preceded by: \"" + m.string[0:m.start(0)] + "\"")

The correct pattern should reject the string:

'xyz123aabbaaab'

since the length of the first sequence of the letter 'a' is 2. Yours
accepts it, right?
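Roger's objection is easy to confirm. This small check assumes Python 3, and the third string is a hypothetical extra case. re.search scans the whole string, so 'a{3}' is satisfied by any three consecutive 'a's, whether they belong to a later run or form the start of a longer one.

```python
import re

pattern = 'a{3}'  # Christoph's first attempt
strings = [
    'xyz123aaabbaabbbbababbbbaaabb',  # should be accepted
    'xyz123aabbaaab',                 # should be rejected: first run is 'aa'
    'xyz123aaaabb',                   # hypothetical: first run is 'aaaa'
]
for s in strings:
    m = re.search(pattern, s)
    print(repr(s), 'accepted at %d' % m.start() if m else 'rejected')
```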

--
Roger L. Cauvin
(E-Mail Removed) (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com

Roger L. Cauvin

 01-26-2006
"Alex Martelli" <(E-Mail Removed)> wrote in message
news:1h9reyq.z7u4ziv8itblN%(E-Mail Removed). ..
> Tim Chase <(E-Mail Removed)> wrote:
> ...
>> I'm not quite sure what your intent here is, as the
>> resulting find would obviously be "aaa", of length 3.
>
> But that would also match 'aaaa'; I think he wants negative lookbehind
> and lookahead assertions around the 'aaa' part. But then there's the
> spec about matching only if the sequence is the first occurrence of
> 'a's, so maybe he wants '^[^a]*' instead of the lookbehind (and maybe
> parentheses around the 'aaa' to somehow 'match' it specially?).
>
> It's definitely not very clear what exactly the intent is, no...

Sorry for the confusion. The correct pattern should reject all strings
except those in which the first sequence of the letter 'a' that is followed
by the letter 'b' has a length of exactly three.

Hope that's clearer . . . .
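The clarified requirement can be written as one pattern: reject every string unless the first run of 'a' directly followed by 'b' has length exactly three. The sketch below assumes Python 3. FIRST_AB_RUN_IS_THREE and accepts are hypothetical names. The "tempered" prefix (?:(?!a+b).)* is a technique swapped in here, not one proposed in the thread. The prefix consumes one character at a time but will not start at any position where a run of 'a' followed by 'b' begins. It therefore stops at the first such run, which must then be exactly 'aaab'.

```python
import re

# The prefix may not step onto a position where 'a...ab' begins, so it halts
# at the first run of 'a' that is followed by 'b'; that run must be 'aaa'.
FIRST_AB_RUN_IS_THREE = re.compile(r'(?:(?!a+b).)*(aaa)b', re.DOTALL)


def accepts(s):
    """True if the first run of 'a' followed by 'b' is exactly 'aaa'."""
    return FIRST_AB_RUN_IS_THREE.match(s) is not None


for s in ['xyz123aaabbaabbbbababbbbaaabb',  # first such run is 'aaa'
          'xyz123aabbaaab',                 # first such run is 'aa'
          'xyz123aaaabb',                   # first such run is 'aaaa'
          'cantalopeaaabababa']:            # earlier lone 'a's precede no 'b'
    print(repr(s), accepts(s))
```

Under this sketch's reading of the definition, runs of 'a' that are not followed by 'b' (such as the two in 'cantalope') are stepped over. re.DOTALL lets the prefix cross newlines.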

--
Roger L. Cauvin
(E-Mail Removed) (omit the "nospam_" part)
Cauvin, Inc.
Product Management / Market Research
http://www.cauvin-inc.com

Christoph Conrad

 01-26-2006
Hello Roger,

> since the length of the first sequence of the letter 'a' is 2. Yours
> accepts it, right?

Yes, I misunderstood your requirements. It has to be modified to
essentially what Tim Chase wrote:

m = re.search('^[^a]*a{3}b', 'xyz123aabbaaab')
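Christoph's revised pattern can be checked against the same cases. This sketch assumes Python 3. The expected verdicts follow Roger's stated requirement, and the string with four 'a's is a hypothetical extra case.

```python
import re

pattern = '^[^a]*a{3}b'  # Christoph's revised pattern

# Expected verdicts under Roger's spec; the third string is hypothetical.
cases = {
    'xyz123aaabbaabbbbababbbbaaabb': True,
    'xyz123aabbaaab': False,   # first run of 'a' is 'aa'
    'xyz123aaaabb': False,     # first run of 'a' is 'aaaa'
}
results = {s: re.search(pattern, s) is not None for s in cases}
print(results == cases)

# Caveat: [^a]* cannot step over an 'a', so a stray 'a' in the prefix that is
# not followed by 'b' makes the whole string fail.
print(re.search(pattern, 'cantalopeaaabababa'))
```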

Best wishes from Germany,
Christoph

Christos Georgiou

 01-26-2006
On Thu, 26 Jan 2006 14:09:54 GMT, rumours say that "Roger L. Cauvin"
<(E-Mail Removed)> might have written:

>Say I have some string that begins with an arbitrary sequence of characters
>and then alternates repeating the letters 'a' and 'b' any number of times,
>e.g.
>
>"xyz123aaabbaabbbbababbbbaaabb"
>
>I'm looking for a regular expression that matches the first, and only the
>first, sequence of the letter 'a', and only if the length of the sequence is
>exactly 3.
>
>Does such a regular expression exist? If so, any ideas as to what it could
>be?

Is this what you mean?

^[^a]*(a{3})(?:[^a].*)?$
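Christos's pattern can be exercised the same way. This minimal sketch assumes Python 3, and apart from Roger's example the test strings are hypothetical. The (?:[^a].*)?$ tail is what excludes a fourth 'a': after the three 'a's, the string must either end or continue with a non-'a' character.

```python
import re

pattern = re.compile(r'^[^a]*(a{3})(?:[^a].*)?$')

for s in ['xyz123aaabbaabbbbababbbbaaabb',  # accepted
          'xyz123aabbaaab',                 # rejected: first run is 'aa'
          'xyz123aaaabb',                   # rejected: first run is 'aaaa'
          'xyz123aaa']:                     # accepted, although no 'b' follows
    m = pattern.match(s)
    print(repr(s), m.span(1) if m else None)
```

The last case shows that the pattern also accepts a trailing run of three 'a's with no 'b' after it. Roger's clarification (the run must be followed directly by 'b') would rule that out.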

--
TZOTZIOY, I speak England very best.
"Dear Paul,
The Corinthians