make RE more clever to avoid inappropriate sre_constants.error: redefinition of group name

 
 
aspineux
      03-29-2007

I want to parse

'foo@bar' or '<foo@bar>' and get the email address foo@bar

the regex is

r'<\w+@\w+>|\w+@\w+'

now, I want to give it a name

r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

sre_constants.error: redefinition of group name 'email' as group 2; was group 1

But because I use |, only one of the two alternatives can match, so I will only ever get one group named 'email'!

Any comments?

PS: I know the solution for this case is to use r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'
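To reproduce both points, here is a minimal sketch. It assumes a current Python 3, where the exception can be caught as re.error (as far as I know, the sre_constants.error in the traceback is the same exception under its older module name). The two patterns are the ones from this post; the helper names compiles and extract_email are made up for illustration.

```python
import re

# Pattern from the post: the name 'email' is used on both sides of '|'.
DUPLICATE_NAME = r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'

# Pattern from the PS: an optional '<' is captured as 'lt', and the
# conditional (?(lt)>) demands a closing '>' only if 'lt' matched.
CONDITIONAL = re.compile(r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)')


def compiles(pattern):
    """Return True if `pattern` compiles, False if re rejects it."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def extract_email(text):
    """Return the address from 'foo@bar' or '<foo@bar>', else None."""
    match = CONDITIONAL.fullmatch(text)
    return match.group('email') if match else None


if __name__ == '__main__':
    print(compiles(DUPLICATE_NAME))
    print(extract_email('foo@bar'))
    print(extract_email('<foo@bar>'))
    print(extract_email('<foo@bar'))
```

fullmatch is used so that an unbalanced input such as '<foo@bar' is rejected instead of matching only part of the string.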

 
attn.steven.kuo@gmail.com
      03-29-2007
Regular expressions, alternation, named groups ... oh my!

It tends to get quite complex, especially if you need
to reject cases where the string contains a left bracket
but not the right, or vice versa.

>>> import re
>>> pattern = re.compile(r'(?P<email><\w+@\w+>|(?<!<)\b\w+@\w+\b(?!>))')
>>> for email in ('foo@bar', '<foo@bar>', '<start@without_end_bracket'):
...     matched = pattern.search(email)
...     if matched is not None:
...         print(matched.group('email'))
...
foo@bar
<foo@bar>


I suggest you try some other solution (maybe pyparsing).
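
pyparsing is a third-party package, so as one possible illustration of "some other solution", here is a sketch that swaps in the standard library's email.utils.parseaddr instead. This is not what the post above proposes, and parseaddr follows mail-header rules rather than the \w+@\w+ pattern used in this thread, so its idea of an acceptable address will differ. address_of is a hypothetical helper name.

```python
from email.utils import parseaddr


def address_of(text):
    """Return the bare address from 'foo@bar' or '<foo@bar>'.

    parseaddr returns a (display name, address) pair; an empty
    address is turned into None here.
    """
    _display_name, address = parseaddr(text)
    return address or None


if __name__ == '__main__':
    print(address_of('foo@bar'))
    print(address_of('<foo@bar>'))
```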

--
Hope this helps,
Steven

 
aspineux
      03-29-2007
On 29 mar, 16:22, "aspineux" <(E-Mail Removed)> wrote:
> I want to parse
>
> 'foo@bar' or '<foo@bar>' and get the email address foo@bar
>
> the regex is
>
> r'<\w+@\w+>|\w+@\w+'
>
> now, if I want to give it a name
>
> r'<(?P<email>\w+@\w+)>|(?P<email>\w+@\w+)'
>
> sre_constants.error: redefinition of group name 'email' as group 2;
> was group 1
>
> BUT because I use a | , I will get only one group named 'email' !


So my regex is meaningful, the error is meaningless, and
something should be changed in 're'.

But maybe I'm wrong?

>
> Any comment ?


I'm trying to start a discussion about something that could be improved
in 're'; I'm not looking for a solution to email parsing.


>
> PS: I know the solution for this case is to use r'(?P<lt><)?(?P<email>\w+@\w+)(?(lt)>)'



 
Paddy
      03-29-2007
Use two group names, one for each alternative, and if you don't
care which one matched, do something like the following:

>>> import re
>>> s1 = 'foo@bare'
>>> s2 = '<foo@bare>'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s1)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bare'
>>> matchobj = re.search(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)', s2)
>>> matchobj.groupdict()['email1'] or matchobj.groupdict()['email2']
'foo@bare'
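
The same idea can be wrapped in a small function. This is only a sketch: matched_email is a made-up name, and instead of spelling out each group with "or" it relies on the match object's lastgroup attribute, which holds the name of the last capturing group that matched. With one named group per alternative, that is the group of whichever branch succeeded.

```python
import re

# One differently named group per alternative, as in the session above.
EMAIL = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')


def matched_email(text):
    """Return the address found in `text`, or None if there is none."""
    match = EMAIL.search(text)
    if match is None:
        return None
    # lastgroup names the group that took part in the match.
    return match.group(match.lastgroup)


if __name__ == '__main__':
    print(matched_email('foo@bare'))
    print(matched_email('<foo@bare>'))
```

Adding a third alternative then only means adding another named group to the pattern; the function body stays the same.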


- Paddy.

 
aspineux
      03-30-2007
On 30 mar, 00:13, "Paddy" <(E-Mail Removed)> wrote:
> use two group names, one for each alternate form and if you are not
> concerned with whichever matched do something like the following:
>

The problem is the way I create this regex

regex={}
regex['email']=r'(?P<email1>\w+@\w+)'

path=r'<%(email)s>|%(email)s' % regex

Once more, the original question is:
Is it normal to get an error when the same group name is used on both
sides of a | ?
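
One way to keep the template approach while staying within what 're' accepts is to generate a distinct group name each time the fragment is used. The sketch below assumes that numbered suffixes (email1, email2) are acceptable; email_group and email_from are hypothetical helper names, not part of 're'.

```python
import re


def email_group(suffix):
    """Return the reusable email fragment with a unique group name."""
    return r'(?P<email%d>\w+@\w+)' % suffix


# Same template idea as above, but each use of the fragment gets its
# own name, so the compiled pattern has no duplicate group names.
path = r'<%s>|%s' % (email_group(1), email_group(2))
pattern = re.compile(path)


def email_from(text):
    """Return the address matched by either alternative, or None."""
    match = pattern.search(text)
    if match is None:
        return None
    # Only the alternative that matched has a non-None group value.
    values = [v for v in match.groupdict().values() if v is not None]
    return values[0]


if __name__ == '__main__':
    print(path)
    print(email_from('foo@bar'))
    print(email_from('<foo@bar>'))
```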

Paddy
      03-30-2007
Groups are numbered left-to-right irrespective of the expression
contents.
I am quite happy with the names being merely a pseudonym for the
positional group number, and I don't see a problem with not allowing
multiple occurrences of the same group name.
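
The name-to-number mapping can be inspected directly. A small sketch, using the two-name pattern from earlier in the thread and the compiled pattern's groupindex attribute:

```python
import re

pattern = re.compile(r'<(?P<email1>\w+@\w+)>|(?P<email2>\w+@\w+)')

# groupindex maps each group name to its left-to-right group number.
name_to_number = dict(pattern.groupindex)

match = pattern.search('foo@bar')
# The name and the number refer to the same group.
same_group = match.group('email2') == match.group(2)

if __name__ == '__main__':
    print(name_to_number)
    print(same_group)
```

Group 2 keeps its number even though group 1 took no part in this match, which is the left-to-right numbering described above.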
I did see an article about REs and their speed. It seems that if
Python's RE package distinguished between 'grep style' REs and the
full set of Python REs, then much faster and more efficient algorithms
would be available for the grep-style subset.

- Paddy.

 