Velocity Reviews - Computer Hardware Reviews



Help with Regular Expressions

 
 
Harlin Seritt
 
      08-10-2005
I have been looking at the Python re module and have been trying to
make sense of a simple function that I'd like to do. However, no amount
of reading or googling has helped me with this. Forgive my
stone-headedness. I have done this with .NET and Java in the past but
damn if I can't get it done with Python for some reason. As such I am
sure it is something even simpler.

I am trying to find some matches and have them put into a list when
processing is done. I'll use a simple example like email addresses.

My input is the following:
wordList = ['myname1', '(E-Mail Removed)', '(E-Mail Removed)',
'myname4@domain', '(E-Mail Removed)']

My regular expression would be something like '\w\@\w\.\w' (I realize
it could and should be more detailed but that's not the point for now).

I would like to find out how to output the matches for this expression
of my 'wordList' into a neat list variable. How do I get this done?

Thanks,

Harlin Seritt

 
Devan L
 
      08-10-2005
Harlin Seritt wrote:
> I have been looking at the Python re module and have been trying to
> make sense of a simple function that I'd like to do. However, no amount
> of reading or googling has helped me with this. Forgive my
> stone-headedness. I have done this with .NET and Java in the past but
> damn if I can't get it done with Python for some reason. As such I am
> sure it is something even simpler.
>
> I am trying to find some matches and have them put into a list when
> processing is done. I'll use a simple example like email addresses.
>
> My input is the following:
> wordList = ['myname1', '(E-Mail Removed)', '(E-Mail Removed)',
> 'myname4@domain', '(E-Mail Removed)']
>
> My regular expression would be something like '\w\@\w\.\w' (I realize
> it could and should be more detailed but that's not the point for now).
>
> I would like to find out how to output the matches for this expression
> of my 'wordList' into a neat list variable. How do I get this done?
>
> Thanks,
>
> Harlin Seritt


If you want the pieces back individually, you need to enclose the '\w's in
parentheses: the groups() method of a match object only returns the parts
of the pattern that are enclosed in parentheses (capturing groups). Also,
you need to use '+' so that \w doesn't just match a single word character
(letter, digit or underscore), but one or more. Keep the '.' escaped,
because an unescaped '.' matches any character; the '@' doesn't need a
backslash at all. So your regular expression would be more like

r'(\w+)@(\w+)\.(\w+)'
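To see the difference, here is a small Python 3 comparison of the original pattern and the one above. The address 'alice@example.com' is made up for illustration, since the thread's real addresses were redacted:

```python
import re

# Harlin's original pattern: each \w matches exactly ONE word character
single = re.compile(r'\w\@\w\.\w')
# Devan's pattern: \w+ matches one or more, and the parentheses capture each part
grouped = re.compile(r'(\w+)@(\w+)\.(\w+)')

sample = 'alice@example.com'  # hypothetical sample address

print(single.match(sample))               # None: 'alice' is longer than one character
print(single.match('a@b.c') is not None)  # True: only one-character parts match

m = grouped.match(sample)
print(m.group(0))  # the whole match: alice@example.com
print(m.groups())  # only the parenthesized parts: ('alice', 'example', 'com')
```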

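Anyway, you can use a list comprehension and the groups() method of a
match object to build a list of tuples. re.match() returns None for
entries that don't match, so skip those before calling groups():

[m.groups() for m in [re.match(r'(\w+)@(\w+)\.(\w+)', address)
for address in wordList] if m]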

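On a side note, some of the entries in your list ('myname1' and
'myname4@domain') aren't complete addresses, so re.match() returns None
for them and calling .groups() on that fails. If you only need complete
addresses, use

wordList = ['(E-Mail Removed)', '(E-Mail Removed)',
'(E-Mail Removed)']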
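Devan's list-comprehension idea as a self-contained Python 3 sketch. The input list is hypothetical (the thread's addresses were redacted); re.match() returns None for non-matching entries, so those are dropped before .groups() is called:

```python
import re

pattern = re.compile(r'(\w+)@(\w+)\.(\w+)')

# Hypothetical input: two complete addresses plus two entries that won't match
word_list = ['myname1', 'alice@example.com', 'bob@example.org', 'myname4@domain']

# Match every entry once; non-matching entries give None
matches = [pattern.match(word) for word in word_list]
# Keep only successful matches and collect their captured parts
parts = [m.groups() for m in matches if m is not None]

print(parts)  # [('alice', 'example', 'com'), ('bob', 'example', 'org')]
```

On Python 3.8 and later, an assignment expression folds this into one comprehension: [m.groups() for w in word_list if (m := pattern.match(w))].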

 
Fredrik Lundh
 
      08-10-2005
Harlin Seritt wrote:

> I am trying to find some matches and have them put into a list when
> processing is done. I'll use a simple example like email addresses.
>
> My input is the following:
> wordList = ['myname1', '(E-Mail Removed)', '(E-Mail Removed)',
> 'myname4@domain', '(E-Mail Removed)']
>
> My regular expression would be something like '\w\@\w\.\w' (I realize
> it could and should be more detailed but that's not the point for now).
>
> I would like to find out how to output the matches for this expression
> of my 'wordList' into a neat list variable. How do I get this done?


that's more of a list manipulation question than a regular expression
question, of course. to apply a regular expression to all items in a
list, apply it to all items in a list. a list comprehension is the shortest
way to do this:

>>> import re
>>> out = [word for word in wordList if re.match(r"\w+@\w+\.\w+", word)]
>>> out
['(E-Mail Removed)', '(E-Mail Removed)', '(E-Mail Removed)']
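Fredrik's one-liner as a standalone Python 3 sketch, compiling the pattern once and showing the built-in filter() as an equivalent. The word list is hypothetical:

```python
import re

# Compile once and reuse the pattern object for every item
email_like = re.compile(r"\w+@\w+\.\w+")

# Hypothetical input list (the thread's real addresses were redacted)
word_list = ['myname1', 'alice@example.com', 'bob@example.org', 'myname4@domain']

# Keep each word for which the pattern matches at the start of the string
out = [word for word in word_list if email_like.match(word)]
print(out)  # ['alice@example.com', 'bob@example.org']

# Equivalent: filter() keeps items for which the bound match method returns a match
out2 = list(filter(email_like.match, word_list))
print(out2 == out)  # True
```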

</F>



 
Harlin Seritt
 
      08-10-2005
Ahh, that's it, Fredrik. That's what I was looking for. I'll take care
of the regular expression problems, but I first wanted to walk before
running.

Thanks,

Harlin Seritt

 
Harlin Seritt
 
      08-10-2005
Forgive another question here, but what is the 'r' for in an expression
like r'\w+...'?

 
Benjamin Niemann
 
      08-10-2005
Harlin Seritt wrote:

> Forgive another question here, but what is the 'r' for when used with
> expression: r'\w+...' ?


r'..' or r".." are "raw strings", in which backslashes do not introduce an
escape sequence, so you don't have to write '\\' when you need a backslash
in the string; e.g. r'\w+' == '\\w+'.
That's useful for regular expressions (because the re module parses the
'\X' sequences itself) and for Windows paths (e.g. r'C:\newfile.txt', where
a plain string would turn '\n' into a newline).
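A quick Python 3 check of the raw-string behaviour described above. The path is just the example string from the post; no file is opened:

```python
# A raw string and a doubled backslash spell the same characters: '\', 'w', '+'
print(r'\w+' == '\\w+')  # True
print(len(r'\w+'))       # 3

# Why it matters for paths: in a normal string, '\n' becomes a newline character
plain = 'C:\newfile.txt'
raw = r'C:\newfile.txt'
print('\n' in plain)          # True: the "\n" was turned into a newline
print('\\' in raw)            # True: the backslash is kept as-is
print(len(plain), len(raw))   # 13 14
```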

And you should append a '$' to the regular expression: re.match() only
anchors at the start of the string, so r"\w+@\w+\.\w+" would match
'(E-Mail Removed)-+*junk', too.
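A Python 3 sketch of the anchoring point, using a made-up address with trailing junk. It also shows re.fullmatch() (available since Python 3.4), which requires the whole string to match without adding '$':

```python
import re

# Hypothetical input: a plausible address followed by junk
text = 'alice@example.com-+*junk'

loose = re.compile(r"\w+@\w+\.\w+")      # re.match() anchors only at the start
anchored = re.compile(r"\w+@\w+\.\w+$")  # '$' also requires the end of the string

print(loose.match(text) is not None)                    # True: the junk is ignored
print(anchored.match(text) is not None)                 # False
print(anchored.match('alice@example.com') is not None)  # True

# fullmatch() rejects the junk without needing a '$' in the pattern
print(loose.fullmatch(text) is not None)  # False
```

One caveat: '$' also matches just before a single trailing newline, so 'alice@example.com\n' still passes the anchored pattern; fullmatch() or '\Z' is stricter.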

--
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
 
Paul McGuire
 
      08-10-2005
If your re demands get more complicated, you could take a look at
pyparsing. The code is a bit more verbose, but many find it easier to
compose their expressions using pyparsing's classes, such as Literal,
OneOrMore, Optional, etc., plus a number of built-in helper functions
and expressions, including delimitedList, quotedString, and
cStyleComment. Pyparsing is intended for writing recursive-descent
parsers, but can also be used (and is best learned) with simple
applications such as this one.

Here is a simple script for parsing your e-mail addresses. Note the
use of results names to give you access to the individual parsed fields
(re's also support a similar capability).

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

from pyparsing import (Literal, Word, Optional,
                       delimitedList, alphanums)

# define format of an email address
AT = Literal("@").suppress()
emailWord = Word(alphanums + "_")
emailDomain = delimitedList(emailWord, ".", combine=True)
emailAddress = (emailWord.setResultsName("user") +
                Optional(AT + emailDomain).setResultsName("host"))

# parse each word in wordList
wordList = ['myname1', '(E-Mail Removed)', '(E-Mail Removed)',
            'myname4@domain', '(E-Mail Removed)']

for w in wordList:
    addr = emailAddress.parseString(w)
    print(w)
    print(addr)
    print("user:", addr.user)
    print("host:", addr.host)
    print()

Will print out:
myname1
['myname1']
user: myname1
host:

(E-Mail Removed)
['myname1', 'domain.tld']
user: myname1
host: domain.tld

(E-Mail Removed)
['myname2', 'domain.tld']
user: myname2
host: domain.tld

myname4@domain
['myname4', 'domain']
user: myname4
host: domain

(E-Mail Removed)
['myname5', 'domain.tldx']
user: myname5
host: domain.tldx
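Paul mentions that re supports something similar to pyparsing's results names: named groups, written (?P<name>...). Below is a rough re-only counterpart to the script above, as a Python 3 sketch with hypothetical sample inputs. The optional host part plays the role of the Optional(...) term, and the dotted-domain subpattern stands in for delimitedList(emailWord, "."):

```python
import re

# Named groups (?P<name>...) play the role of pyparsing's setResultsName()
email_re = re.compile(r"(?P<user>\w+)(?:@(?P<host>\w+(?:\.\w+)*))?$")

# Hypothetical inputs mirroring the shapes in the thread
word_list = ['myname1', 'alice@example.com', 'myname4@domain', 'bob@mail.example.org']

records = []
for w in word_list:
    m = email_re.match(w)
    if m is None:
        continue  # skip entries that look like neither "user" nor "user@host"
    # groupdict() maps group names to matched text; an unmatched optional group gives None
    records.append(m.groupdict())

for rec in records:
    print("user:", rec["user"], "| host:", rec["host"] or "")
```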

 
Jeff Schwab
 
      08-10-2005
Harlin Seritt wrote:

> I am trying to find some matches and have them put into a list when
> processing is done. I'll use a simple example like email addresses.
>
> My input is the following:
> wordList = ['myname1', '(E-Mail Removed)', '(E-Mail Removed)',
> 'myname4@domain', '(E-Mail Removed)']
>
> My regular expression would be something like '\w\@\w\.\w' (I realize
> it could and should be more detailed but that's not the point for now).


FYI, matching all standards-compliant email addresses is ridiculously
complicated. Before you spend too much time on it, you might want to
borrow the complete and thoroughly explained example in Mastering Regular
Expressions (O'Reilly):

http://www.oreilly.com/catalog/regex/
 
Cappy2112
 
      08-10-2005
Be careful with that book, though: its RE examples are Perl-centric, and
Perl's regex flavor is not exactly the same as Python's. Still, it's a
good place to start.

This will also be useful:
http://www.amk.ca/python/howto/regex/

 
Christopher Subich
 
      08-10-2005
Paul McGuire wrote:
> If your re demands get more complicated, you could take a look at
> pyparsing. The code is a bit more verbose, but many find it easier to
> compose their expressions using pyparsing's classes, such as Literal,
> OneOrMore, Optional, etc., plus a number of built-in helper functions
> and expressions, including delimitedList, quotedString, and
> cStyleComment. Pyparsing is intended for writing recursive-descent
> parsers, but can also be used (and is best learned) with simple
> applications such as this one.


As a slightly unrelated pyparsing question: is there a good set of API
documentation for pyparsing?

I've looked into it for my MUD client, but for now I've gone with DParser,
because I sometimes need (or at least want) custom token generation.
Pyparsing looks easier to internationalize, though.
 