Python re repetitive matching

 
 
Rich, 12-22-2003
I'm new to regexes and can't quite figure out how to get them to work. What
I want is a tuple of all the matches from the regex. I've simplified my
actual problem and still can't get it to work.

So far I've got this:

import re
print(re.findall(r'(@\d+)|(\w+)', "@5489 heel all and thumb toe"))

This does exactly what I want, except that every match comes back as a tuple
with a slot for each group, so I end up with a list full of tuples, each with
a blank element... so close.

I also tried my original idea:

a = re.match(r'(@\d+)\s+(\w+)', "@5489 heel all and thumb toe")
print(a.groups())

This matches the number and the first word, so I thought the following
should rematch after the first word and give me what I wanted... but it
doesn't for some reason:

a = re.match(r'(@\d+)\s+(?:(\w+)\s*)', "@5489 heel all and thumb toe")
print(a.groups())

This is my next iteration; it still gives me the number (first group) and
only one word (the second group). So I extended it to:

a = re.match(r'(@\d+)\s+(?:(\w+)\s*)*', "@5489 heel all and thumb toe")
print(a.groups())

Now this gives me the number and the last-but-one word. Why?

My logic suggests that this should do what I want. What am I missing?
I've spent all night trying to figure this out.

Cheers

Rich
 
Francis Avila, 12-23-2003
Rich wrote in message ...
>I'm new to regexes and can't quite figure out how to get them to work. What
>I want is a tuple of all the matches from the regex. I've simplified my
>actual problem and still can't get it to work.


For the following answers I assume you only feed one line at a time. (If
this is an unacceptable restriction, things get uglier.)

First, consider whether you need regular expressions at all; they should
be a last resort. In this particular case, it seems to me that

s = "@5489 heel all and thumb toe"
s.split(' ', 1)

is all you need. If you need more precision (and the digit sequence is
always 4 chars long), the basic pattern is as follows:

re.split(r'(?<=@\d{4}) (?=.*)', s)
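As a quick check of the two suggestions above, here is a minimal sketch that runs both on the sample line from the question. The variable names by_str and by_re are only for illustration.

```python
import re

s = "@5489 heel all and thumb toe"

# Plain string method: split once, at the first space.
by_str = s.split(' ', 1)

# Regex version: split at the single space that follows '@' plus four digits.
# The lookbehind is fixed-width (5 characters), which Python's re requires.
by_re = re.split(r'(?<=@\d{4}) (?=.*)', s)

print(by_str)
print(by_re)
```

Both calls give the same two-item list on this input, which is the argument for preferring the string method here.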

>So far I've got this:
>print(re.findall(r'(@\d+)|(\w+)', "@5489 heel all and thumb toe"))


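You need nongrouping parens, (?:...). With capturing groups, findall
returns one tuple per match, with an empty string for the alternative that
did not match. Also, \w+ will split the text into individual words.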

Split into the number and the rest of the line, discarding nothing:
re.findall(r'(?:@\d{4})|(?:.+)', s)

Split each item separately, discarding whitespace:
re.findall(r'(?:@\d{4})|(?:\w+)', s)
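To make the difference concrete, here is a small sketch comparing the original pattern with a nongrouping version. It uses \d+ rather than \d{4} so the two patterns differ only in their parentheses; the variable names are made up for the example.

```python
import re

s = "@5489 heel all and thumb toe"

# Capturing groups: findall returns one tuple per match, with '' for the
# alternative that did not take part in that match.
with_groups = re.findall(r'(@\d+)|(\w+)', s)

# Non-capturing groups: findall returns the matched text itself.
without_groups = re.findall(r'(?:@\d+)|(?:\w+)', s)

print(with_groups)
print(tuple(without_groups))  # the question asked for a tuple
```

The first call reproduces the list of half-empty tuples described in the question; the second gives one plain string per item.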

>I also tried my original idea:
>
>a = re.match(r'(@\d+)\s+(\w+)', "@5489 heel all and thumb toe")
>print(a.groups())


re.match( r'(@\d+) (.+)', s ).groups()

>This matches the number and the first word, so I thought the following
>should rematch after the first word and give me what I wanted... but it
>doesn't for some reason:


It doesn't because '\w' matches word characters, i.e. [0-9a-zA-Z_] for
ASCII text. It doesn't match spaces, so once it comes up against a space,
it stops.
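A short sketch of that point, using the sample line from the question: the same match with \w+ and with .+ as the second group. The names first_word and rest_of_line are illustrative only.

```python
import re

s = "@5489 heel all and thumb toe"

# \w+ stops at the first non-word character, here the space after "heel".
first_word = re.match(r'(@\d+)\s+(\w+)', s).groups()

# .+ also matches spaces, so the second group runs to the end of the line.
rest_of_line = re.match(r'(@\d+) (.+)', s).groups()

print(first_word)
print(rest_of_line)
```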

>
>a = re.match(r'(@\d+)\s+(?:(\w+)\s*)', "@5489 heel all and thumb toe")
>print(a.groups())


So you do know about nongrouping parens? Anyway, this doesn't match after
the first word because nothing repeats the inner group: it matches one word
and any trailing whitespace, then stops.

>This is my next iteration; it still gives me the number (first group) and
>only one word (the second group). So I extended it to:
>
>a = re.match(r'(@\d+)\s+(?:(\w+)\s*)*', "@5489 heel all and thumb toe")
>print(a.groups())
>
>Now this gives me the number and the last-but-one word. Why?


Because * does not magically make new groups: a group that is repeated
keeps only the text from its last repetition. It seems to me it should
match the last word, though, instead of next-to-last, but I won't think
about it too much because this re is hideous as it is, and shouldn't be
used.
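Here is a minimal sketch of that behaviour. It assumes the pattern in the question was meant to be (@\d+)\s+(?:(\w+)\s*)* (the posted text is garbled), and the two-step workaround underneath is just one possible approach, not something from the original posts.

```python
import re

s = "@5489 heel all and thumb toe"

# A group inside a repeated (?:...)* still counts as one group; each
# repetition overwrites what it captured before.
m = re.match(r'(@\d+)\s+(?:(\w+)\s*)*', s)
print(m.groups())

# One way to collect every word: match the prefix, then scan the rest.
prefix = re.match(r'(@\d+)\s+', s)
words = re.findall(r'\w+', s[prefix.end():])
result = (prefix.group(1),) + tuple(words)
print(result)
```

On a current Python the repeated group ends up holding the last word, so the match alone can never return all of them; the second half sidesteps that by letting findall do the repetition.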

>My logic suggests that this should do what I want. What am I missing?
>I've spent all night trying to figure this out.


Your first error was using regular expressions:

'Some people, when confronted with a problem, think "I know, I'll use
regular expressions". Now they have two problems.' --Jamie Zawinski,
comp.lang.emacs

Use string methods, especially split().

Also, I am no longer sure whether you want every item/word as a separate
group, or one group for the number and one for the rest of the words. Either
one is trivial for string methods:

s.split() for each item in its own group.
s.split(' ', 1) for only two groups.

However, the first one is impossible for REs (I think) if the number of
groups is variable, and ugly if the number of groups is fixed. The second
one I've done ad nauseam here.
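For the variable-length case, a string-method sketch might look like the following. The helper name parse_line and its validation rule (first field is '@' followed by digits) are assumptions made for this example, not something from the original posts.

```python
def parse_line(line):
    """Return (tag, word, word, ...) for a line such as '@5489 heel toe'."""
    fields = line.split()  # any run of whitespace separates fields
    # Hypothetical sanity check: the first field must be '@' plus digits.
    if not fields or not fields[0].startswith('@') or not fields[0][1:].isdigit():
        raise ValueError("expected a line starting with '@' and digits: %r" % line)
    return tuple(fields)


print(parse_line("@5489 heel all and thumb toe"))
```

This gives one tuple element per item however many words follow the number, with no regular expression involved.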

See the RE Howto:
http://www.amk.ca/python/howto/regex/

There's also an O'Reilly book, "Mastering Regular Expressions", which is said
to be excellent. Mertz wrote "Text Processing with Python" (or something like
that), which is likewise said to be excellent, and he has a bunch of online
columns on Python, all of which are very good. But my guess is that you don't
really need any of these.
--
Francis Avila

 