regex: multiple matching for one string

 
 
scriptlearner@gmail.com
      07-23-2009
For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
would like to take out the values (valuea, valueb, and valuec). How do
I do that in Python? The group method will only return the matched
part. Thanks.

p = re.compile('#a=*;b=*;c=*;')
m = p.match(line)
if m:
    print m.group(),
 
rurpy@yahoo.com
      07-23-2009
On Jul 22, 7:45 pm, "(E-Mail Removed)"
<(E-Mail Removed)> wrote:
> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
> will like to take out the values (valuea, valueb, and valuec). How do
> I do that in Python? The group method will only return the matched
> part. Thanks.
>
> p = re.compile('#a=*;b=*;c=*;')
> m = p.match(line)
> if m:
>     print m.group(),


p = re.compile('#a=([^;]*);b=([^;]*);c=([^;]*);')
m = p.match(line)
if m:
    print m.group(1), m.group(2), m.group(3),

Note that "=*;" in your regex will match
zero or more "=" characters -- probably not
what you intended.
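
A minimal sketch of why the original pattern never matches, written for Python 3 (the snippets in this thread use Python 2 print statements). The variable names and the second test string are made up for illustration.

```python
import re

# The original pattern: each "*" repeats only the "=" right before it.
broken = re.compile('#a=*;b=*;c=*;')

line = "#a=valuea;b=valueb;c=valuec;"

# No match: after "#a=" the pattern requires ";" but the text has "v".
no_match = broken.match(line)

# This odd-looking string does match, which shows what "=*" really means.
odd_match = broken.match("#a==;b;c=;")

print(no_match)           # None
print(odd_match.group())  # #a==;b;c=;
```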

"[^;]*" will match any string up to the next
";" character, which will be a value (assuming
you don't have or care about embedded whitespace).
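
A short sketch of how the negated character class behaves, assuming Python 3. The sample strings are invented for illustration. Since "[^;]" accepts every character except a semicolon, spaces around a value end up inside the captured text.

```python
import re

value = re.compile(r'#a=([^;]*);')

# The class stops at the first ";" and never crosses into the next field.
plain = value.match("#a=valuea;b=valueb;").group(1)

# Whitespace is not special to [^;], so it is captured as part of the value.
spaced = value.match("#a= value a ;b=valueb;").group(1)

# An empty value also matches, because "*" allows zero characters.
empty = value.match("#a=;b=valueb;").group(1)

print(repr(plain), repr(spaced), repr(empty))
```

If surrounding spaces are unwanted, calling .strip() on each captured value is one simple option.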

You might also want to consider using an r'...'
(raw) string for the regex, which will make including
backslash characters easier if you need them
at some future time.
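
To illustrate the raw-string point, here is a small sketch (Python 3). The use of "\b" as a word boundary is an assumption chosen for the demonstration; it is not part of the pattern above.

```python
import re

# In a normal string literal, "\b" is a single backspace character.
# In a raw string, r"\b" is a backslash followed by "b", which the
# re module reads as a word boundary.
plain_literal = "\bb="
raw_literal = r"\bb="

line = "#a=valuea;b=valueb;c=valuec;"

found_plain = re.search(plain_literal + "([^;]*);", line)
found_raw = re.search(raw_literal + "([^;]*);", line)

print(len(plain_literal), len(raw_literal))  # 3 4
print(found_plain)                           # None
print(found_raw.group(1))                    # valueb
```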
 
Mark Lawrence
      07-23-2009
(E-Mail Removed) wrote:
> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
> will like to take out the values (valuea, valueb, and valuec). How do
> I do that in Python? The group method will only return the matched
> part. Thanks.
>
> p = re.compile('#a=*;b=*;c=*;')
> m = p.match(line)
> if m:
>     print m.group(),


IMHO a regex for this is overkill, a combination of string methods such
as split and find should suffice.
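
One way the split-and-find idea could look, as a rough sketch in Python 3. It is only one possible reading of the suggestion, and it does not check the field names.

```python
line = "#a=valuea;b=valueb;c=valuec;"

values = []
# Drop the leading "#", then cut the line into "name=value" fields.
for field in line.lstrip('#').split(';'):
    if not field:
        continue  # the trailing ";" leaves one empty field behind
    eq = field.find('=')  # index of the first "=", or -1 if absent
    if eq == -1:
        raise ValueError('missing "=" in field %r' % field)
    values.append(field[eq + 1:])

print(values)  # ['valuea', 'valueb', 'valuec']
```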

Regards.

 
Bill Davy
      07-23-2009
"Mark Lawrence" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> (E-Mail Removed) wrote:
>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
>> will like to take out the values (valuea, valueb, and valuec). How do
>> I do that in Python? The group method will only return the matched
>> part. Thanks.
>>
>> p = re.compile('#a=*;b=*;c=*;')
>> m = p.match(line)
>> if m:
>>     print m.group(),

>
> IMHO a regex for this is overkill, a combination of string methods such as
> split and find should suffice.
>
> Regards.
>



For the OP, it can be done with regex by grouping:

p = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')
m = p.match(line)
if m:
    print m.group(1),

m.group(1) has valuea in it, etc.

This may not be the best way, but it is reasonably terse.
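
A brief sketch of the different ways to read captured groups from a match object, assuming Python 3 and the "[^;]*" form of the pattern. The variable names are illustrative.

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"
m = re.match(r'#a=([^;]*);b=([^;]*);c=([^;]*);', line)

if m:
    whole = m.group(0)       # the entire matched text (same as m.group())
    first = m.group(1)       # text captured by the first pair of parentheses
    pair = m.group(2, 3)     # several groups at once, returned as a tuple
    everything = m.groups()  # all captured groups as a tuple
    print(first, pair, everything)
```

Calling m.group() with no argument returns the whole match, which is why the original code printed only "the matched part".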


 
tiefeng wu
      07-23-2009
2009/7/23 (E-Mail Removed) <(E-Mail Removed)>:
> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
> will like to take out the values (valuea, valueb, and valuec). How do
> I do that in Python? The group method will only return the matched
> part. Thanks.
>
> p = re.compile('#a=*;b=*;c=*;')
> m = p.match(line)
> if m:
>     print m.group(),


Maybe like this:
>>> import re
>>> p = re.compile(r'#?\w+=(\w+);')
>>> l = re.findall(p, '#a=valuea;b=valueb;c=valuec;')
>>> for r in l: print(r)
...
valuea
valueb
valuec
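
A possible variation on the findall approach, sketched for Python 3: capturing the name as a second group (an addition not in the session above) lets the result be turned into a dictionary.

```python
import re

line = "#a=valuea;b=valueb;c=valuec;"

# With two groups in the pattern, findall returns (name, value) tuples.
pairs = re.findall(r'(\w+)=(\w+);', line)
by_name = dict(pairs)

print(pairs)         # [('a', 'valuea'), ('b', 'valueb'), ('c', 'valuec')]
print(by_name['b'])  # valueb
```

Note that findall silently skips any text that does not fit the pattern, so this form extracts values but does not validate the overall layout of the line.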

tiefeng wu
2009-07-23
 
rurpy@yahoo.com
      07-24-2009
Nick Dumas wrote:
> Agreed. Two string.split()s, first at the semi-colon and then at the
> equal sign, will yield you your value, without having to fool around
> with regexes.
>
> On 7/23/2009 9:23 AM, Mark Lawrence wrote:
>> (E-Mail Removed) wrote:
>>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
>>> will like to take out the values (valuea, valueb, and valuec). How do
>>> I do that in Python? The group method will only return the matched
>>> part. Thanks.
>>>
>>> p = re.compile('#a=*;b=*;c=*;')
>>> m = p.match(line)
>>> if m:
>>>     print m.group(),

>>
>> IMHO a regex for this is overkill, a combination of string methods such
>> as split and find should suffice.


You're saying that something like the following
is better than the simple regex used by the OP?

[untested]
values = []
parts = line.split(';')
if len(parts) != 4: raise ValueError()
for p, expected in zip (parts[:-1], ('#a','b','c')):
    name, x, value = p.partition ('=')
    if name != expected or x != '=':
        raise ValueError()
    values.append (value)
print values[0], values[1], values[2]

Blech, not in my book. The regex checks the
format of the string, extracts the values, and
does so very clearly. Further, it is easily
adapted to other similar formats, or evolutionary
changes in format. It is also (once one is
familiar with regexes -- a useful skill outside
of Python too) easier to get right (at least in
a simple case like this.)
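
As an illustration of the adaptability claim, here is a sketch for Python 3 that assumes a hypothetical format change allowing spaces around "=" and ";". The named groups and the re.VERBOSE layout are choices made for this example, not something used elsewhere in the thread.

```python
import re

# Hypothetical variation of the format: spaces may surround "=" and ";".
# re.VERBOSE ignores pattern whitespace and treats "#" as a comment
# start, so the literal "#" is escaped.
pattern = re.compile(r'''
    \#a \s*=\s* (?P<a>[^;]*?) \s*;
    \s* b \s*=\s* (?P<b>[^;]*?) \s*;
    \s* c \s*=\s* (?P<c>[^;]*?) \s*;
''', re.VERBOSE)

strict = pattern.match("#a=valuea;b=valueb;c=valuec;")
loose = pattern.match("#a = valuea ; b=valueb; c = valuec ;")

print(strict.groupdict())
print(loose.group('a'), loose.group('b'), loose.group('c'))
```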

The only reason I can think of to prefer
a split-based solution is if this code were
performance-critical in that I would expect
the split code to be faster (although I don't
know that for sure.)
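
The speed question can be checked rather than guessed. The sketch below (Python 3) uses timeit. The two helper functions are hypothetical, and the split version does only the extraction with none of the format checks, so the comparison is rough. No result is shown here because timings depend on the machine and the Python version.

```python
import re
import timeit

line = "#a=valuea;b=valueb;c=valuec;"
pattern = re.compile(r'#a=([^;]*);b=([^;]*);c=([^;]*);')

def with_regex():
    return pattern.match(line).groups()

def with_split():
    # Minimal extraction only; it performs none of the format checks.
    return tuple(f.split('=', 1)[1] for f in line.split(';') if f)

regex_time = timeit.timeit(with_regex, number=100000)
split_time = timeit.timeit(with_split, number=100000)
print('regex: %.3fs  split: %.3fs' % (regex_time, split_time))
```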

This is a perfectly fine use of a regex.
 
rurpy@yahoo.com
      07-25-2009
Scott David Daniels wrote:
> (E-Mail Removed) wrote:
>> Nick Dumas wrote:
>>> On 7/23/2009 9:23 AM, Mark Lawrence wrote:
>>>> (E-Mail Removed) wrote:
>>>>> For example, I have a string "#a=valuea;b=valueb;c=valuec;", and I
>>>>> will like to take out the values (valuea, valueb, and valuec). How do
>>>>> I do that in Python? The group method will only return the matched
>>>>> part. Thanks.
>>>>>
>>>>> p = re.compile('#a=*;b=*;c=*;')
>>>>> m = p.match(line)
>>>>> if m:
>>>>>     print m.group(),
>>>> IMHO a regex for this is overkill, a combination of string methods such
>>>> as split and find should suffice.

>>
>> You're saying that something like the following
>> is better than the simple regex used by the OP?
>> [untested]
>> values = []
>> parts = line.split(';')
>> if len(parts) != 4: raise ValueError()
>> for p, expected in zip (parts[:-1], ('#a','b','c')):
>>     name, x, value = p.partition ('=')
>>     if name != expected or x != '=':
>>         raise ValueError()
>>     values.append (value)
>> print values[0], values[1], values[2]

>
> I call straw man: [tested]
>     line = "#a=valuea;b=valueb;c=valuec;"
>     d = dict(single.split('=', 1)
>              for single in line.split(';') if single)
>     d['#a'], d['b'], d['c']
> If you want checking code, add:
>     if len(d) != 3:
>         raise ValueError('Too many keys: %s in %r' % (
>             sorted(d), line))


OK, that seems like a good solution. It certainly
wasn't an obvious solution to me. I still have no
problem maintaining that

[tested]
line = "#a=valuea;b=valueb;c=valuec;"
m = re.match ('#a=(.*);b=(.*);c=(.*);', line)
m.group(1, 2, 3)
(If you want checking code, nothing else required.)

is still simpler and clearer (with the obvious
caveat that one is familiar with regexes.)
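
A small sketch of what the "(.*)" version accepts and rejects, assuming Python 3. The extract helper and the malformed sample lines are hypothetical. The last case shows that ".*" can run across semicolons, which "[^;]*" would not do.

```python
import re

pattern = re.compile(r'#a=(.*);b=(.*);c=(.*);')

def extract(line):
    """Return the three values, or None when the line has the wrong shape."""
    m = pattern.match(line)
    return m.groups() if m else None

print(extract("#a=valuea;b=valueb;c=valuec;"))  # ('valuea', 'valueb', 'valuec')
print(extract("#a=valuea;c=valuec;"))           # None (the b field is missing)
print(extract("#a=x;b=y;b=valueb;c=valuec;"))   # ('x;b=y', 'valueb', 'valuec')
```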

>> Blech, not in my book. The regex checks the
>> format of the string, extracts the values, and
>> does so very clearly. Further, it is easily
>> adapted to other similar formats, or evolutionary
>> changes in format. It is also (once one is
>> familiar with regexes -- a useful skill outside
>> of Python too) easier to get right (at least in
>> a simple case like this.)

> The posted regex doesn't work; this might be homework, so
> I'll not fix the two problems. The fact that you did not
> see the failure weakens your claim of "does so very clearly."


Fact? Maybe you should have read the whole thread before
spewing claims that I did not see the regex problem.
The fact that you did not bother to weakens any claims
you make in this thread.
(Of course this line of argumentation is stupid anyway --
even had I not noticed the problem, it would say nothing
about the general case. My advice to you is not to try
to extrapolate when the sample size is one.)
 