checking a string against multiple patterns
Hi,
here is a piece of pseudo-code (taken from Ruby) that illustrates the problem I'd like to solve in Python:

    str = 'abc'
    if str =~ /(b)/        # Check if str matches a pattern
      str = $` + $1        # Perform some action
    elsif str =~ /(a)/     # Check another pattern
      str = $1 + $'        # Perform some other action
    elsif str =~ /(c)/
      str = $1
    end

The task is to check a string against a number of different patterns (containing groupings). For each pattern, different actions need to be taken.

In Python, a single match of this kind can be done as follows:

    str = 'abc'
    match = re.search('(b)', str)
    if match:
        str = str[0:match.start()] + match.group(1)
        # I'm not sure if this way of accessing 'pre-match'
        # is optimal, but let's ignore it now

The problem is that you can't extend this example to multiple matches with 'elif', because the match must be performed separately from the conditional.

This obviously won't work in Python:

    if match=re.search( pattern1 , str ):
        ...
    elif match=re.search( pattern2 , str ):
        ...

So the only way seems to be:

    match = re.search(pattern1, str)
    if match:
        ....
    else:
        match = re.search(pattern2, str)
        if match:
            ....
        else:
            match = re.search(pattern3, str)
            if match:
                ....

and we end up with very nasty, multiply-nested code.

Is there an alternative to it? Am I missing something? Python doesn't have special variables $1, $2 (right?) so you must assign the result of a match to a variable, to be able to access the groups.

I'd appreciate any hints.

Tomasz
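On the 'pre-match' question: Ruby's $` and $' correspond to slices of the string on either side of the match span. A minimal sketch, where the names `pre_match` and `post_match` are purely illustrative and `s` is used instead of `str` to avoid shadowing the built-in:

```python
import re

s = 'abc'
match = re.search(r'(b)', s)
if match:
    pre_match = s[:match.start()]    # counterpart of Ruby's $`
    post_match = s[match.end():]     # counterpart of Ruby's $'
    s = pre_match + match.group(1)   # same action as the first Ruby branch
```

Slicing with `match.start()` and `match.end()` is cheap, so there is probably little to optimize here.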
Re: checking a string against multiple patterns
tomasz wrote:
> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.

Hi Tomasz,

See, e.g., http://www.regular-expressions.info/python.html (the "Search and Replace" section). You'll see that Python supports numbered groups and even named groups in regular expressions.

Christophe K.
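As a rough illustration of numbered and named groups with the standard `re` module (the group names `first` and `second` are made up for this sketch):

```python
import re

pattern = r'(?P<first>a)(?P<second>b)'

match = re.search(pattern, 'abc')
by_number = match.group(1)        # numbered group
by_name = match.group('second')   # named group
as_dict = match.groupdict()       # all named groups as a dict

# In a replacement string, \1 refers to a numbered group
# and \g<name> to a named one.
swapped = re.sub(pattern, r'\g<second>\1', 'abc')
```

Note that the groups still live on the match object, so this does not by itself remove the need to assign the match to a variable.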
Re: checking a string against multiple patterns
On 18 Dec, 09:41, tomasz <tmkm...@googlemail.com> wrote:
> [...]
> and we end up having a very nasty, multiply-nested code.

Define a small function with each test+action, and iterate over them until a match is found:

    def check1(input):
        match = re.search(pattern1, input)
        if match:
            return input[:match.end(1)]

    def check2(input):
        match = re.search(pattern2, input)
        if match:
            return ...

    def check3(input):
        match = ...
        if match:
            return ...

    for check in check1, check2, check3:
        result = check(input)
        if result is not None:
            break
    else:
        # no match found
        pass

--
Gabriel Genellina
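Filled in for the three patterns of the original 'abc' example, the idea might look like the sketch below. The function names and the choice to return the input unchanged when nothing matches are assumptions made for illustration, not part of the reply above.

```python
import re

def check_b(text):
    match = re.search(r'(b)', text)
    if match:
        return text[:match.start()] + match.group(1)   # pre-match + group 1

def check_a(text):
    match = re.search(r'(a)', text)
    if match:
        return match.group(1) + text[match.end():]     # group 1 + post-match

def check_c(text):
    match = re.search(r'(c)', text)
    if match:
        return match.group(1)

def first_result(text, checks=(check_b, check_a, check_c)):
    # Try each check in order; the first non-None result wins.
    for check in checks:
        result = check(text)
        if result is not None:
            return result
    return text  # assumption: leave the input alone if nothing matched
```

Wrapping the loop in a function replaces the for/else with a plain early return.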
Re: checking a string against multiple patterns
On Dec 18, 1:41 pm, tomasz <tmkm...@googlemail.com> wrote:
> [...]
> The task is to check a string against a number of different patterns
> (containing groupings).
> For each pattern, different actions need to be taken.

In the `re.sub` function (and the `sub` method of a regex object), the `repl` parameter can be a callback function as well as a string:

http://docs.python.org/lib/node46.html

Does that help? E.g.:

    import logging
    import re

    # 'log' was not defined in the original post; a standard logger is assumed.
    log = logging.getLogger(__name__)

    def multireplace(text, mapping):
        # One alternation built from all the (escaped) keys.
        rx = re.compile('|'.join(re.escape(key) for key in mapping))

        def callback(match):
            key = match.group(0)
            repl = mapping[key]
            log.info("Replacing '%s' with '%s'", key, repl)
            return repl

        # subn returns a (new_text, number_of_replacements) tuple.
        return rx.subn(callback, text)

(I'm not sure, but I think I adapted this from: http://effbot.org/zone/python-replace.htm)

Gerard
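One caveat with joining keys by `|`: regex alternation takes the first alternative that matches at a position, not the longest one, so a key that is a prefix of another key can shadow it. A hedged variant that sorts the keys longest-first (the name `multireplace_longest_first` is hypothetical):

```python
import re

def multireplace_longest_first(text, mapping):
    # Longest keys first, so 'ab' is tried before its prefix 'a'.
    keys = sorted(mapping, key=len, reverse=True)
    rx = re.compile('|'.join(re.escape(key) for key in keys))
    # The callback receives the match object and returns the replacement.
    return rx.sub(lambda match: mapping[match.group(0)], text)

result = multireplace_longest_first('abc', {'a': 'x', 'ab': 'y', 'c': 'z'})
```

Replaced text is not rescanned, so a replacement that happens to contain another key is left as is.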
Re: checking a string against multiple patterns
> Define a small function with each test+action, and iterate over them
> until a match is found:
> [...]

Or, one could even create a mapping of regexps->functions:

    def function1(match):
        return do_something_with(match)

    def function2(match):
        return do_something_with(match)

    def default_function(input):
        return do_something_with(input)

    function_mapping = (
        (re.compile(pattern1), function1),
        (re.compile(pattern2), function2),
        (re.compile(pattern3), function1),
    )

    def match_and_do(input, mapping):
        for regex, func in mapping:
            m = regex.match(input)
            if m:
                return func(m)
        return default_function(input)

    result = match_and_do("Hello world", function_mapping)

In addition to having a clean separation between patterns and functions, and the mapping between them, this also allows wiring multiple patterns to the same function (e.g. pattern3->function1) and lets you specify the order in which the mapping is evaluated.

-tkc
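Applied to the original 'abc' example, the mapping could be sketched as follows. The handler names are invented for illustration, and `search` is used instead of `match` on the assumption that the patterns should be found anywhere in the string, as in the Ruby code. The match object's `string` attribute gives each handler access to the pre- and post-match text without passing the input separately.

```python
import re

def keep_pre_and_group(match):
    return match.string[:match.start()] + match.group(1)

def keep_group_and_post(match):
    return match.group(1) + match.string[match.end():]

def keep_group(match):
    return match.group(1)

# Order matters: the first pattern that matches decides the action.
RULES = (
    (re.compile(r'(b)'), keep_pre_and_group),
    (re.compile(r'(a)'), keep_group_and_post),
    (re.compile(r'(c)'), keep_group),
)

def match_and_do(text, rules, default=lambda text: text):
    for regex, func in rules:
        m = regex.search(text)   # unanchored, unlike regex.match
        if m:
            return func(m)
    return default(text)
```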
Re: checking a string against multiple patterns
tomasz <tmkmarc@googlemail.com> writes:
> here is a piece of pseudo-code (taken from Ruby) that illustrates the
> problem I'd like to solve in Python:
[...]

I asked the very same question in http://groups.google.com/group/comp....eb5631ade8b393

It seems that people either write more elaborate constructs or learn to tolerate the nesting.

> Is there an alternative to it?

A simple workaround is to write a trivial function that returns a boolean, and also stores the match object in either a global storage or an object. It's not really elegant, especially in smaller scripts, but it works:

    def search(pattern, s, store):
        match = re.search(pattern, s)
        store.match = match
        return match is not None

    class MatchStore(object):
        pass   # irrelevant, any object with a 'match' attr would do

    where = MatchStore()
    if search(pattern1, s, where):
        # pattern1 matched, matchobj in where.match
        ...
    elif search(pattern2, s, where):
        # pattern2 matched, matchobj in where.match
        ...
    # ....
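The store object exists only because assignment is a statement in the Python of this thread. Assuming Python 3.8 or later, which postdates the discussion, an assignment expression (`:=`) gives the same flat if/elif shape without a helper. A sketch for the 'abc' example, with `transform` as a made-up name:

```python
import re

def transform(s):
    # Each branch binds the match object and tests it in one step.
    if (m := re.search(r'(b)', s)):
        return s[:m.start()] + m.group(1)    # pre-match + group 1
    elif (m := re.search(r'(a)', s)):
        return m.group(1) + s[m.end():]      # group 1 + post-match
    elif (m := re.search(r'(c)', s)):
        return m.group(1)
    return s  # assumption: unchanged when no pattern matches
```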
Re: checking a string against multiple patterns
tomasz <tmkmarc@googlemail.com> wrote:
> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.

Look for repetition in your code and remove it. That will almost always remove the nesting. Or, combine your regular expressions into one large expression and branch on the existence of relevant groups.

Using named groups stops all your code breaking just because you need to change one part of the regex.

e.g. This would handle your example, but it is just one way to do it:

    import re
    from string import Template

    def sub(patterns, s):
        for pat, repl in patterns:
            m = re.match(pat, s)
            if m:
                return Template(repl).substitute(m.groupdict())
        return s

    PATTERNS = [
        (r'(?P<start>.*?)(?P<b>b+)', 'start=$start, b=$b'),
        (r'(?P<a>a+)(?P<tail>.*)$', 'Got a: $a, tail=$tail'),
        (r'(?P<c>c+)', 'starts with c: $c'),
    ]

    >>> sub(PATTERNS, 'abc')
    'start=a, b=b'
    >>> sub(PATTERNS, 'is a something')
    'is a something'
    >>> sub(PATTERNS, 'a something')
    'Got a: a, tail= something'
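The other suggestion, one large expression with a branch on which groups took part, could be sketched like this for the original 'abc' example. The group names and the `transform` function are illustrative assumptions. Each alternative starts with a lazy `.*?` and the whole pattern is applied with `match`, so the alternatives are tried in order from the start of the string and earlier ones take priority, as in the if/elsif chain.

```python
import re

COMBINED = re.compile(
    r'(?P<pre_b>.*?)(?P<b>b)'          # first choice: a 'b' anywhere
    r'|.*?(?P<a>a)(?P<post_a>.*)'      # second choice: an 'a' anywhere
    r'|.*?(?P<c>c)',                   # last choice: a 'c' anywhere
    re.DOTALL,
)

def transform(s):
    m = COMBINED.match(s)
    if m is None:
        return s  # assumption: unchanged when nothing matches
    # Groups from alternatives that did not take part are None.
    if m.group('b') is not None:
        return m.group('pre_b') + m.group('b')
    if m.group('a') is not None:
        return m.group('a') + m.group('post_a')
    return m.group('c')
```

A plain `(b)|(a)|(c)` with `search` would behave differently: it reports the leftmost match in the string, so 'abc' would hit the `a` branch first.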
Re: checking a string against multiple patterns
On Dec 18, 4:41 am, tomasz <tmkm...@googlemail.com> wrote:
> Is there an alternative to it? Am I missing something? Python doesn't
> have special variables $1, $2 (right?) so you must assign the result
> of a match to a variable, to be able to access the groups.
>
> I'd appreciate any hints.

Don't use regexes for something as simple as this. Try find(). Most of the time (90%+) that I use regexes in Perl, I am doing something that can be done much better using the string methods and some simple operations. Plus, it usually turns out to be faster than Perl.
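For the literal single-character patterns in the original example, a string-method version might look like the following sketch (`transform` is a hypothetical name). It only applies while the patterns stay fixed strings rather than real regular expressions.

```python
def transform(s):
    pos = s.find('b')          # find() returns -1 when the substring is absent
    if pos != -1:
        return s[:pos] + 'b'   # text before 'b', plus 'b' itself
    pos = s.find('a')
    if pos != -1:
        return 'a' + s[pos + 1:]   # 'a' plus everything after it
    if 'c' in s:
        return 'c'
    return s  # assumption: unchanged when none of the substrings occur
```

Early returns keep the chain flat, so the nesting problem does not arise here either.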