Question: Optional Regular Expression Grouping

 
 
galyle
      10-10-2011
Hi, I've looked through this forum, but I haven't been able to find a
resolution to the problem I'm having (maybe I didn't look hard enough
-- I have to believe this has come up before). The problem is this:
I have a file whose lines have 0, 2, or 3 groups that I'd like to record;
however, in the case of 3 groups, the third group is correctly
captured, but the first two groups get collapsed into just one group.
I'm sure that I'm missing something in the way I've constructed my
regular expression, but I can't figure out what's wrong. Does anyone
have any suggestions?

The demo below showcases the problem I'm having:

import re

valid_line = re.compile('^\[(\S+)\]\[(\S+)\](?:\s+|\[(\S+)\])=|\s+[\d\[\']+.*$')
line1 = "[field1][field2] = blarg"
line2 = " 'a continuation of blarg'"
line3 = "[field1][field2][field3] = blorg"

m = valid_line.match(line1)
print('Expected: ' + m.group(1) + ', ' + m.group(2))
m = valid_line.match(line2)
print('Expected: ' + str(m.group(1)))
m = valid_line.match(line3)
print('Uh-oh: ' + m.group(1) + ', ' + m.group(2))
 
MRAB
      10-10-2011
On 10/10/2011 22:57, galyle wrote:
> [snip]


Instead of "\S" I'd recommend using "[^\]]", or using a lazy repetition
"\S+?".

You'll also need to handle the space before the "=" in line3.

valid_line = re.compile(r'^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')
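
For reference, a minimal sketch of this kind of pattern run against the three sample lines from the original post. It assumes the intended character class is [^\]] with no backslash before its opening bracket, and it uses Python 3 print; the names samples and results are only for illustration.

```python
import re

# Negated character class: a field may contain anything except "]".
valid_line = re.compile(
    r"^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$"
)

samples = [
    "[field1][field2] = blarg",
    " 'a continuation of blarg'",
    "[field1][field2][field3] = blorg",
]

# groups() returns None for any group that did not take part in the match.
results = [valid_line.match(line).groups() for line in samples]
for line, groups in zip(samples, results):
    print(repr(line), "->", groups)
```

The continuation line matches through the second alternative, so all three of its groups come back as None.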
 
Vlastimil Brom
      10-10-2011
2011/10/10 galyle:
> [snip]


Hi,
I believe the space before = is causing problems (or rather, the pattern
is missing it); you also need non-greedy quantifiers +? to match as
little as possible, as opposed to the greedy default:

valid_line = re.compile('^\[(\S+?)\]\[(\S+?)\](?:\s+|\[(\S+)\])\s*=|\s+[\d\[\']+.*$')

or you can use patterns explicitly excluding the closing ], like:

valid_line = re.compile('^\[([^\]]+)\]\[([^\]]+)\](?:\s+|\[([^\]]+)\])\s*=|\s+[\d\[\']+.*$')

hth
vbr
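
To make the greedy/non-greedy difference concrete, here is a small sketch on line3 from the original post. The patterns are cut down to the first two bracketed fields for brevity; that simplification, and the variable names, are mine rather than part of the suggestion above.

```python
import re

line3 = "[field1][field2][field3] = blorg"

# Greedy: \S+ grabs as much as it can, including "][", then backtracks.
greedy = re.compile(r"^\[(\S+)\]\[(\S+)\]")
# Lazy: \S+? stops at the first position that lets the rest match.
lazy = re.compile(r"^\[(\S+?)\]\[(\S+?)\]")

greedy_groups = greedy.match(line3).groups()
lazy_groups = lazy.match(line3).groups()

print(greedy_groups)  # ('field1][field2', 'field3')
print(lazy_groups)    # ('field1', 'field2')
```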
 
Ian Kelly
      10-10-2011
On Mon, Oct 10, 2011 at 4:49 PM, MRAB wrote:
> Instead of "\S" I'd recommend using "[^\]]", or using a lazy repetition
> "\S+?".


Preferably the former. The core problem is that the regex matches
ambiguously on the problem string. Lazy repetition doesn't remove
that ambiguity; it merely attempts to make the module prefer the match
that you prefer.
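
A rough illustration of that point, using a made-up malformed line (odd_line is hypothetical, not from the OP's file) and patterns trimmed to two fields: the lazy version still accepts the line by letting a "]" slip into the first group, while the negated class rejects it outright.

```python
import re

lazy = re.compile(r"^\[(\S+?)\]\[(\S+?)\]\s*=")
strict = re.compile(r"^\[([^\]]+)\]\[([^\]]+)\]\s*=")

odd_line = "[a]b][c] = x"  # hypothetical malformed input

lazy_match = lazy.match(odd_line)
strict_match = strict.match(odd_line)

# Lazy only expresses a preference; it expands until something matches.
print(lazy_match.groups())  # ('a]b', 'c')
# The negated class can never cross a "]", so there is no match at all.
print(strict_match)         # None
```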

Other notes to the OP: Always use raw strings (r'') when writing
regex patterns, to make sure the backslashes are escape characters in
the pattern rather than in the string literal.
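
A quick sketch of why this matters, using \b as the example (my choice of escape; the OP's pattern happens not to contain one that Python rewrites): in a normal string literal "\b" is a backspace character, while in a raw string it reaches the regex engine as a word boundary.

```python
import re

plain = "\bfoo\b"  # "\b" is turned into a backspace character here
raw = r"\bfoo\b"   # backslash and "b" are kept; re sees a word boundary

text = "a foo b"
plain_hit = re.search(plain, text)
raw_hit = re.search(raw, text)

print(len(plain), len(raw))  # 5 7
print(plain_hit)             # None: the text contains no backspaces
print(raw_hit.group(0))      # foo
```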

The '^foo|bar$' construct you're using is wonky. I think you're
writing this to mean "match if the entire string is either 'foo' or
'bar'". But what that actually matches is "anything that either
starts with 'foo' or ends with 'bar'". The correct way to do the
former would be either '^foo$|^bar$' or '^(?:foo|bar)$'.
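
A small sketch of the difference, with foo and bar as stand-ins just as in the paragraph above (the candidates list is only for illustration):

```python
import re

loose = re.compile(r"^foo|bar$")      # "^foo" OR "bar$"
tight = re.compile(r"^(?:foo|bar)$")  # whole string is "foo" or "bar"

candidates = ["foo", "bar", "foobaz", "bazbar"]
loose_hits = [s for s in candidates if loose.search(s)]
tight_hits = [s for s in candidates if tight.search(s)]

print(loose_hits)  # ['foo', 'bar', 'foobaz', 'bazbar']
print(tight_hits)  # ['foo', 'bar']
```

search() is used here rather than match() so that the "bar$" branch is tried at every position, not just at the start of the string.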
 
galyle
      10-10-2011
On Oct 10, 4:59 pm, Vlastimil Brom <vlastimil.b...@gmail.com> wrote:
> [snip]

Thanks, I had a feeling that greedy matching in my expression was
causing the problem. Your suggestion makes sense to me, and works quite
well.
 