checking a string against multiple patterns

 
 
tomasz
      12-18-2007
Hi,

here is a piece of pseudo-code (taken from Ruby) that illustrates the
problem I'd like to solve in Python:

str = 'abc'
if str =~ /(b)/         # Check if str matches a pattern
  str = $` + $1         # Perform some action
elsif str =~ /(a)/      # Check another pattern
  str = $1 + $'         # Perform some other action
elsif str =~ /(c)/
  str = $1
end

The task is to check a string against a number of different patterns
(containing groupings).
For each pattern, different actions need to be taken.

In Python, a single match of this kind can be done as follows:

import re

str = 'abc'
match = re.search( '(b)' , str )
if match:
    # I'm not sure if this way of accessing 'pre-match' is optimal,
    # but let's ignore it now
    str = str[0:match.start()] + match.group(1)

The problem is that you can't extend this example to multiple
matches with 'elif', because the match must be performed separately
from the conditional.

This obviously won't work in Python:

if match=re.search( pattern1 , str ):
    ...
elif match=re.search( pattern2 , str ):
    ...

So the only way seems to be:

match = re.search( pattern1 , str )
if match:
    ...
else:
    match = re.search( pattern2 , str )
    if match:
        ...
    else:
        match = re.search( pattern3 , str )
        if match:
            ...

and we end up with very nasty, multiply nested code.

Is there an alternative to it? Am I missing something? Python doesn't
have special variables $1, $2 (right?) so you must assign the result
of a match to a variable, to be able to access the groups.

I'd appreciate any hints.

Tomasz
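
On Python 3.8 and later, which postdates this thread, assignment expressions (PEP 572) permit the flat if/elif chain asked for in the post above. A minimal sketch, assuming the three patterns from the Ruby example; the function name `transform` is made up for illustration:

```python
import re

def transform(s):
    # Each branch binds the match object inside the condition (Python 3.8+).
    if m := re.search(r'(b)', s):
        return s[:m.start()] + m.group(1)   # pre-match text + group 1
    elif m := re.search(r'(a)', s):
        return m.group(1) + s[m.end():]     # group 1 + post-match text
    elif m := re.search(r'(c)', s):
        return m.group(1)
    return s                                # no pattern matched

print(transform('abc'))
```

On older interpreters this is a syntax error, so the workarounds discussed in the replies still apply there.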







 
kib
      12-18-2007
tomasz wrote:

> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.
>

Hi Tomasz,

See, for example:

http://www.regular-expressions.info/python.html [Search and Replace section]

And you'll see that Python supports numbered groups and even named
groups in regular expressions.

Christophe K.
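
To illustrate the numbered and named groups mentioned above, here is a small sketch using Python's `re` module. The pattern and the sample string are invented for the example:

```python
import re

m = re.search(r'(?P<key>\w+)=(?P<value>\d+)', 'width=42')
if m:
    print(m.group(1))         # numbered access to the first group
    print(m.group('value'))   # named access to the second group
    print(m.groupdict())      # all named groups as a dict
```

Named groups are still numbered, so `m.group(2)` and `m.group('value')` refer to the same text.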
 
Gabriel Genellina
      12-18-2007
On 18 Dec, 09:41, tomasz wrote:

> and we end up with very nasty, multiply nested code.


Define a small function with each test+action, and iterate over them
until a match is found:

def check1(input):
    match = re.search(pattern1, input)
    if match:
        return input[:match.end(1)]

def check2(input):
    match = re.search(pattern2, input)
    if match:
        return ...

def check3(input):
    match = ...
    if match:
        return ...

for check in check1, check2, check3:
    result = check(input)
    if result is not None:
        break
else:
    pass  # no match found
--
Gabriel Genellina
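
A runnable sketch of the check-function approach above, filled in with the three patterns from the original Ruby example. The function bodies and the names `check_b`, `check_a`, `check_c` and `first_match` are assumptions for illustration, not part of the post:

```python
import re

def check_b(s):
    m = re.search(r'(b)', s)
    if m:
        return s[:m.start()] + m.group(1)   # text before the match + group 1

def check_a(s):
    m = re.search(r'(a)', s)
    if m:
        return m.group(1) + s[m.end():]     # group 1 + text after the match

def check_c(s):
    m = re.search(r'(c)', s)
    if m:
        return m.group(1)

def first_match(s, checks=(check_b, check_a, check_c)):
    # Try each check in order; a check returns None when its pattern fails.
    for check in checks:
        result = check(s)
        if result is not None:
            return result
    return None  # no pattern matched

print(first_match('abc'))
```

Testing against `None` rather than truthiness matters here: a check may legitimately return an empty string.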
 
grflanagan
      12-18-2007
On Dec 18, 1:41 pm, tomasz wrote:
> The task is to check a string against a number of different patterns
> (containing groupings).
> For each pattern, different actions need to be taken.
>


In the `re.sub` function (and `sub` method of regex object), the
`repl` parameter can be a callback function as well as a string:

http://docs.python.org/lib/node46.html

Does that help?

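E.g.:

import logging
import re

log = logging.getLogger(__name__)

def multireplace(text, mapping):
    # One alternation of all keys, escaped so they match literally
    rx = re.compile('|'.join(re.escape(key) for key in mapping))
    def callback(match):
        key = match.group(0)
        repl = mapping[key]
        log.info("Replacing '%s' with '%s'", key, repl)
        return repl
    # subn returns a tuple: (new_text, number_of_replacements)
    return rx.subn(callback, text)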

(I'm not sure, but I think I adapted this from: http://effbot.org/zone/python-replace.htm)

Gerard
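
A shorter illustration of passing a callable as `repl` to `re.sub`, as described above. The pattern, the sample sentence and the helper name `shout` are invented for this sketch:

```python
import re

def shout(match):
    # The callback receives the match object and returns the replacement text.
    return match.group(0).upper()

# Upper-case every standalone three-letter lowercase word.
result = re.sub(r'\b[a-z]{3}\b', shout, 'the cat sat on a mat')
print(result)
```

Because the callback sees the whole match object, it can branch on groups and so perform a different action per pattern inside a single substitution pass.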
 
Tim Chase
      12-18-2007
> Define a small function with each test+action, and iterate over them
> until a match is found:


Or, one could even create a mapping of regexps->functions:

def function1(match):
    do_something_with(match)

def function2(match):
    do_something_with(match)

def default_function(input):
    do_something_with(input)

function_mapping = (
    (re.compile(pattern1), function1),
    (re.compile(pattern2), function2),
    (re.compile(pattern3), function1),
)

def match_and_do(input, mapping):
    for regex, func in mapping:
        m = regex.match(input)
        if m: return func(m)
    return default_function(input)

result = match_and_do("Hello world", function_mapping)

In addition to having a clean separation between patterns and
functions, and the mapping between them, this also allows wiring
multiple patterns to the same function (e.g. pattern3->function1)
and also allows specification of the mapping evaluation order.

-tkc
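
The mapping approach above can be sketched in runnable form with the patterns from the original Ruby example. The handler names and `HANDLERS` are hypothetical, and this version uses `search` instead of `match` because the original patterns may occur anywhere in the string:

```python
import re

def keep_prefix(m):
    return m.string[:m.start()] + m.group(1)   # pre-match text + group 1

def keep_suffix(m):
    return m.group(1) + m.string[m.end():]     # group 1 + post-match text

def keep_group(m):
    return m.group(1)

# Ordered (compiled pattern, handler) pairs; the first hit wins.
HANDLERS = (
    (re.compile(r'(b)'), keep_prefix),
    (re.compile(r'(a)'), keep_suffix),
    (re.compile(r'(c)'), keep_group),
)

def match_and_do(s, handlers=HANDLERS):
    for regex, func in handlers:
        m = regex.search(s)
        if m:
            return func(m)
    return s   # default: return the input unchanged

print(match_and_do('abc'))
```

The handlers need only the match object, since `m.string` gives access to the text that was searched.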



 
Hrvoje Niksic
      12-18-2007
tomasz writes:

> here is a piece of pseudo-code (taken from Ruby) that illustrates the
> problem I'd like to solve in Python:

[...]

I asked the very same question in
http://groups.google.com/group/comp....eb5631ade8b393
It seems that people either write more elaborate constructs or learn
to tolerate the nesting.

> Is there an alternative to it?


A simple workaround is to write a trivial function that returns a
boolean and also stores the match object, either in global storage
or on an object. It's not really elegant, especially in smaller
scripts, but it works:

import re

def search(pattern, s, store):
    match = re.search(pattern, s)
    store.match = match
    return match is not None

class MatchStore(object):
    pass # irrelevant, any object with a 'match' attr would do

where = MatchStore()
if search(pattern1, s, where):
    ...  # pattern1 matched, matchobj in where.match
elif search(pattern2, s, where):
    ...  # pattern2 matched, matchobj in where.match
# ... and so on for further patterns
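
A self-contained variant of the same idea, sketched with the patterns from the original Ruby example. The class name `Matcher` and the function `rewrite` are hypothetical, chosen only for this illustration:

```python
import re

class Matcher:
    """Remembers the most recent match so it can be tested and used in one chain."""

    def __init__(self, text):
        self.text = text
        self.match = None

    def search(self, pattern):
        # Store the match object (or None) and report success as a boolean.
        self.match = re.search(pattern, self.text)
        return self.match is not None

def rewrite(s):
    m = Matcher(s)
    if m.search(r'(b)'):
        return s[:m.match.start()] + m.match.group(1)
    elif m.search(r'(a)'):
        return m.match.group(1) + s[m.match.end():]
    elif m.search(r'(c)'):
        return m.match.group(1)
    return s

print(rewrite('abc'))
```

Keeping the state on a per-call object, not in a global, avoids surprises if the function is called from several places.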
 
Duncan Booth
      12-18-2007
tomasz wrote:

> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.


Look for repetition in your code and remove it. That will almost always
remove the nesting. Or, combine your regular expressions into one large
expression and branch on the existence of relevant groups. Using named
groups stops all your code breaking just because you need to change one
part of the regex.

e.g. This would handle your example, but it is just one way to do it:

import re
from string import Template

def sub(patterns, s):
    for pat, repl in patterns:
        m = re.match(pat, s)
        if m:
            return Template(repl).substitute(m.groupdict())
    return s

PATTERNS = [
    (r'(?P<start>.*?)(?P<b>b+)', 'start=$start, b=$b'),
    (r'(?P<a>a+)(?P<tail>.*)$', 'Got a: $a, tail=$tail'),
    (r'(?P<c>c+)', 'starts with c: $c'),
]

>>> sub(PATTERNS, 'abc')
'start=a, b=b'
>>> sub(PATTERNS, 'is a something')
'is a something'
>>> sub(PATTERNS, 'a something')
'Got a: a, tail= something'
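
The other suggestion in this post, one combined expression with branching on which named group is set, could look roughly like the sketch below. The config-line format, `LINE` and `classify` are invented for illustration. Because `match` is anchored at the start of the string, the alternatives are tried left to right and their order acts as the priority; with `search` the leftmost match in the string wins instead, which is not the same as trying separate patterns in order:

```python
import re

# One combined pattern; each alternative carries its own named group(s).
LINE = re.compile(r'(?P<comment>#.*)|(?P<key>\w+)\s*=\s*(?P<value>.*)')

def classify(line):
    m = LINE.match(line)
    if m is None:
        return ('other', line)
    if m.group('comment') is not None:
        # The first alternative matched.
        return ('comment', m.group('comment'))
    # Otherwise the key = value alternative matched.
    return ('setting', m.group('key'), m.group('value'))

print(classify('# a note'))
print(classify('depth = 3'))
```

Groups belonging to an alternative that did not take part in the match are `None`, which is why the branch tests `is not None` and not truthiness: a group can match the empty string.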

 
Jonathan Gardner
      12-18-2007
On Dec 18, 4:41 am, tomasz wrote:
> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.
>
> I'd appreciate any hints.
>


Don't use regexes for something as simple as this. Try find().

Most of the time (90%+) that I use regexes in Perl, I am doing something
that can be done much better with the string methods and some simple
operations. Plus, it usually turns out to be faster than Perl.
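
For the literal single-character patterns in the original example, the `find()` suggestion could be sketched as follows. The function name `transform_plain` is made up, and the approach only carries over while the patterns stay plain substrings:

```python
def transform_plain(s):
    i = s.find('b')
    if i != -1:
        return s[:i + 1]   # everything up to and including the first 'b'
    i = s.find('a')
    if i != -1:
        return s[i:]       # from the first 'a' to the end of the string
    if 'c' in s:
        return 'c'
    return s               # none of the substrings occur

print(transform_plain('abc'))
```

`str.find` returns -1 when the substring is absent, so the tests compare against -1 and do not rely on truthiness (index 0 is a valid hit).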
 