Velocity Reviews - Computer Hardware Reviews


how to handle repetitive regexp match checks

 
 
Matt Wette
      03-18-2005

Over the last few years I have converted from Perl and Scheme to
Python. There is one task that I do often that is really slick in Perl
but escapes me in Python. I read in a text line from a file, check
it against several regular expressions, and do something once I find a match.
For example, in Perl ...

if ($line =~ /struct {/) {
    do something
} elsif ($line =~ /typedef struct {/) {
    do something else
} elsif ($line =~ /something else/) {
} ...

I am having difficulty doing this cleanly in Python. Can anyone help?

import re

rx1 = re.compile(r'struct {')
rx2 = re.compile(r'typedef struct {')
rx3 = re.compile(r'something else')

m = rx1.match(line)
if m:
    pass  # do something
else:
    m = rx2.match(line)
    if m:
        pass  # do something
    else:
        m = rx3.match(line)
        if m:
            pass  # do something
        else:
            raise ValueError("no pattern matched")  # error

(In Scheme I was able to do this cleanly with macros.)

Matt
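One detail matters when translating the Perl directly: `$line =~ /.../` looks for the pattern anywhere in the line, which corresponds to Python's `re.search`, whereas `re.match` only tries at the start of the string. A short sketch of the difference, using the `rx1` pattern from the post and a sample line chosen for illustration:

```python
import re

rx1 = re.compile(r'struct {')
line = 'typedef struct {'   # sample input, chosen for illustration

# re.match only tries position 0, like a Perl pattern anchored with ^.
at_start = rx1.match(line)    # None: the line begins with 'typedef'

# re.search scans the whole string, like Perl's $line =~ /struct {/.
anywhere = rx1.search(line)   # finds 'struct {' starting at index 8
```

With search semantics the order of the tests matters: `struct {` also occurs inside `typedef struct {`, so in the Perl chain as written a typedef line would take the first branch. Testing the more specific pattern first avoids that.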
 
David M. Cooke
      03-18-2005
Matt Wette writes:

> I am having difficulty doing this cleanly in python. Can anyone help?


I usually define a class like this:

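class Matcher:
    """Hold a line of text and remember the most recent match on it."""
    def __init__(self, text):
        self.m = None
        self.text = text
    def match(self, pat):
        # Try pat at the start of the text; keep the result for later use.
        self.m = pat.match(self.text)
        return self.m
    def __getitem__(self, name):
        # m['name'] returns a named group from the last successful match.
        return self.m.group(name)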

Then use it like this:

for line in fo:  # fo is an open file object
    m = Matcher(line)
    if m.match(rx1):
        pass  # do something
    elif m.match(rx2):
        pass  # do something
    else:
        raise ValueError("no pattern matched")  # error
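The `__getitem__` method is what makes the object convenient when the patterns have named groups. A self-contained sketch (the class is repeated so the block runs on its own; the `tag` group, the two patterns, and the `describe` function are made up for illustration):

```python
import re

class Matcher:
    def __init__(self, text):
        self.m = None
        self.text = text

    def match(self, pat):
        # Remember the result so the caller can test it and then use it.
        self.m = pat.match(self.text)
        return self.m

    def __getitem__(self, name):
        # m['tag'] returns a named group from the last successful match.
        return self.m.group(name)

# Hypothetical patterns, each capturing the struct's tag name.
rx_typedef = re.compile(r'typedef struct (?P<tag>\w+) \{')
rx_struct = re.compile(r'struct (?P<tag>\w+) \{')

def describe(line):
    m = Matcher(line)
    if m.match(rx_typedef):
        return 'typedef of ' + m['tag']
    elif m.match(rx_struct):
        return 'struct ' + m['tag']
    else:
        raise ValueError('unrecognised line: %r' % line)
```

For example, `describe('typedef struct point {')` returns `'typedef of point'`.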

--
|>|\/|<
David M. Cooke
cookedm(at)physics(dot)mcmaster(dot)ca
 
Duncan Booth
      03-18-2005
Matt Wette wrote:

> I am having difficulty doing this cleanly in python. Can anyone help?
>
> (In Scheme I was able to do this cleanly with macros.)


My preferred way to do this is something like this:

import re

# One combined pattern: each alternative is a named group. re.VERBOSE
# ignores literal whitespace, so spaces in the targets are written as \s.
RX = re.compile(r'''
    (?P<rx1> struct\s{ )|
    (?P<rx2> typedef\sstruct\s{ )|
    (?P<rx3> something\selse )
''', re.VERBOSE)

class Matcher:
    def rx1(self, m):
        print("rx1 matched", m.group(0))

    def rx2(self, m):
        print("rx2 matched", m.group(0))

    def rx3(self, m):
        print("rx3 matched", m.group(0))

    def processLine(self, line):
        m = RX.match(line)
        if m:
            # m.lastgroup names the alternative that matched;
            # call the method with the same name.
            getattr(self, m.lastgroup)(m)
        else:
            print("error", repr(line), "did not match")

matcher = Matcher()
matcher.processLine('struct { something')
matcher.processLine('typedef struct { something')
matcher.processLine('something else')
matcher.processLine('will not match')
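A variation on the same idea keeps the handlers in a dictionary keyed by group name instead of looking them up with `getattr`, so the handlers can be plain functions and the mapping from group name to action sits in one place. A sketch under that assumption (the group names and handler bodies are illustrative only):

```python
import re

RX = re.compile(r'''
    (?P<typedef> typedef\sstruct\s\{ ) |
    (?P<struct>  struct\s\{ )          |
    (?P<other>   something\selse )
''', re.VERBOSE)

# Hypothetical handlers; each receives the match object.
HANDLERS = {
    'typedef': lambda m: 'typedef struct',
    'struct': lambda m: 'plain struct',
    'other': lambda m: 'matched %r' % m.group(0),
}

def process_line(line):
    m = RX.match(line)
    if m is None:
        return None          # no alternative matched at the start of line
    # lastgroup is the name of the alternative that matched.
    return HANDLERS[m.lastgroup](m)
```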

 
GiddyJP
      03-18-2005
Matt Wette wrote:
> I am having difficulty doing this cleanly in python. Can anyone help?


I had a similar situation, with the added requirement that the text to be
scanned was read in chunks. After looking at the Python re module
and various other regex packages, I eventually wrote my own multiple-pattern
scanning matcher.

However, since then I've discovered that the sre Python module has a
Scanner class that does something similar.
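That class is still present in current CPython as `re.Scanner`, but it has never been documented, so treat the following as a sketch that relies on an implementation detail. Each lexicon entry pairs a pattern with a callback (called with the scanner and the matched text) or with `None` to consume the text silently; `scan()` returns the callback results plus whatever trailing text could not be matched. The token labels are invented for illustration.

```python
import re

# re.Scanner is undocumented; this relies on CPython's implementation.
scanner = re.Scanner([
    (r'typedef struct \{', lambda s, tok: ('typedef', tok)),
    (r'struct \{', lambda s, tok: ('struct', tok)),
    (r'something else', lambda s, tok: ('other', tok)),
    (r'\s+', None),  # None: consume whitespace, produce no result
])

results, remainder = scanner.scan('struct { typedef struct {  something else')
```

Scanning stops at the first position where no pattern matches, so `scanner.scan('struct { ???')` yields one result and leaves `'???'` as the remainder.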

Anyway, you can see my code at:
http://users.cs.cf.ac.uk/J.P.Giddy/p...respass/2.0.0/

Using it, your code could look like:

# do this once
import Trespass
pattern = Trespass.Pattern()
pattern.addRegExp(r'struct {', 1)
pattern.addRegExp(r'typedef struct {', 2)
pattern.addRegExp(r'something else', 3)

# do this for each line
match = pattern.match(line)
if match:
    value = match.value()
    if value == 1:
        # struct
        pass  # do something
    elif value == 2:
        # typedef
        pass  # do something
    elif value == 3:
        # something else
        pass  # do something
else:
    raise ValueError("no pattern matched")  # error
 
Jonathan Giddy
      03-18-2005
GiddyJP wrote:
>
> # do this once
> import Trespass
> pattern = Trespass.Pattern()
> pattern.addRegExp(r'struct {', 1)
> pattern.addRegExp(r'typedef struct {', 2)
> pattern.addRegExp(r'something else', 3)


Minor correction... in this module { always needs to be escaped if not
indicating a bounded repeat:
pattern.addRegExp(r'struct \{', 1)
pattern.addRegExp(r'typedef struct \{', 2)
pattern.addRegExp(r'something else', 3)
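The escaping rule above is specific to that module. Python's own `re` treats a `{` that does not begin a valid `{m}` or `{m,n}` quantifier as a literal brace, so the original patterns work there unescaped; escaping is still harmless and arguably clearer, and for fixed strings `re.escape()` adds the backslashes for you. A small check (the sample line is illustrative):

```python
import re

line = 'typedef struct { int x; }'

unescaped = re.compile(r'struct {')   # '{' taken literally by Python's re
escaped = re.compile(r'struct \{')    # explicit escape, same meaning
# re.escape() backslash-escapes regex metacharacters in a literal string.
from_literal = re.compile(re.escape('typedef struct {'))

found_unescaped = unescaped.search(line)
found_escaped = escaped.search(line)
found_literal = from_literal.match(line)
```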
 
Paul McGuire
      03-18-2005
Matt -

Pyparsing may be of interest to you. One of its core features is the
ability to associate an action method with a parsing pattern. During
parsing, the action is called with the original source string, the
location within the string of the match, and the matched tokens.

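Your code would look something like:

from pyparsing import Literal

lbrace = Literal('{')
typedef = Literal('typedef')
struct = Literal('struct')
rx1 = struct + lbrace
rx2 = typedef + struct + lbrace
rx3 = Literal('something') + Literal('else')

def rx1Action(strg, loc, tokens):
    pass  # .... put stuff to do here...

def rx2Action(strg, loc, tokens):
    pass  # .... likewise for 'typedef struct {' ...

def rx3Action(strg, loc, tokens):
    pass  # .... likewise for 'something else' ...

rx1.setParseAction( rx1Action )
rx2.setParseAction( rx2Action )
rx3.setParseAction( rx3Action )

# read code into Python string variable 'code'
patterns = (rx1 | rx2 | rx3)
# scanString is a generator: the parse actions fire only as it is iterated
for tokens, start, end in patterns.scanString( code ):
    pass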

(I've broken up some of your literals, which allows for variable
intervening whitespace: Literal('struct') + Literal('{') will accept
any amount of whitespace (none, one, or more blanks, even line breaks)
between the 'struct' and the '{'.)

Get pyparsing at http://pyparsing.sourceforge.net.

-- Paul

 
Jeff Shannon
      03-18-2005
Matt Wette wrote:

> For example, in perl ...
>
> if ($line =~ /struct {/) {
> do something
> } elsif ($line =~ /typedef struct {/) {
> do something else
> } elsif ($line =~ /something else/) {
> } ...
>
> I am having difficulty doing this cleanly in python. Can anyone help?


If you don't need the match object as part of "do something", you
could do a fairly literal translation of the Perl:

if rx1.match(line):
    pass  # do something
elif rx2.match(line):
    pass  # do something else
elif rx3.match(line):
    pass  # do other thing
else:
    raise ValueError("...")
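Since Python 3.8 (well after this thread), the assignment expression `:=` from PEP 572 removes the need to give up the match object: each branch can bind and test it in one step. A sketch, where the named `name` group and the returned tuples are illustrative assumptions:

```python
import re

rx1 = re.compile(r'struct (?P<name>\w+) \{')
rx2 = re.compile(r'typedef struct (?P<name>\w+) \{')
rx3 = re.compile(r'something else')

def handle(line):
    # Each condition binds m and tests it, so the chain stays flat.
    if m := rx1.match(line):
        return ('struct', m['name'])
    elif m := rx2.match(line):
        return ('typedef', m['name'])
    elif rx3.match(line):
        return ('other', None)
    else:
        raise ValueError('no pattern matched %r' % line)
```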

Alternatively, if each of the "do something" phrases can be easily
reduced to a function call, then you could do something like:

import re

def do_something(line, match): ...
def do_something_else(line, match): ...
def do_other_thing(line, match): ...

table = [ (re.compile(r'struct {'), do_something),
          (re.compile(r'typedef struct {'), do_something_else),
          (re.compile(r'something else'), do_other_thing) ]

for pattern, func in table:
    m = pattern.match(line)
    if m:
        func(line, m)
        break
else:
    raise ValueError("...")

The for/else pattern may look a bit odd, but the key feature here is
that the else clause only runs if the for loop terminates normally --
if you break out of the loop, the else does *not* run.
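The same break/else shape, wrapped in a small function so both outcomes are easy to see. Here the table pairs each compiled pattern with a label rather than a handler; the labels and the `classify` name are invented for this sketch.

```python
import re

TABLE = [
    (re.compile(r'struct {'), 'struct'),
    (re.compile(r'typedef struct {'), 'typedef'),
    (re.compile(r'something else'), 'other'),
]

def classify(line, table=TABLE):
    for pattern, label in table:
        if pattern.match(line):
            break  # leaving the loop via break skips the else clause
    else:
        # Reached only when every pattern was tried without a break.
        raise ValueError('no pattern matched %r' % line)
    return label
```

An empty table also ends up in the `else` clause, since the loop finishes without ever reaching `break`.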

Jeff Shannon

 