Problem with REs....

 
 
Alessandro
      01-09-2006
Problem with REs....
I have this string "3 HOURS, 22 MINUTES, and 28 SECONDS" and I need to
'split' it into its three parts "3 HOURS", "22 MINUTES", "28 SECONDS".
REs seem like a natural fit for this... (needless to say, I am very much
a beginner with REs and Python)
Question: why is it that when I run this code

>>> import re
>>> regex = re.compile("[0-9]+ (HOUR|MINUTE|SECOND)")
>>> print(regex.findall("22 MINUTE, 3 HOUR, AND 28 SECOND"))

I get this output:

['MINUTE', 'HOUR', 'SECOND']


and not what I expected:

['22 MINUTE', '3 HOUR', '28 SECOND']


Regards and many thanks...
Alessandro

 
Xavier Morel
      01-09-2006
Alessandro wrote:
> Problem with REs....
> I have this string "3 HOURS, 22 MINUTES, and 28 SECONDS" and I need to
> 'split' it into its three parts "3 HOURS", "22 MINUTES", "28 SECONDS".
> REs seem like a natural fit for this... (needless to say, I am very much
> a beginner with REs and Python)
> Question: why is it that when I run this code
>
> >>> regex = re.compile("[0-9]+ (HOUR|MINUTE|SECOND)")
> >>> print(regex.findall("22 MINUTE, 3 HOUR, AND 28 SECOND"))
>
> I get this output:
>
> ['MINUTE', 'HOUR', 'SECOND']
>
> and not what I expected:
>
> ['22 MINUTE', '3 HOUR', '28 SECOND']
>
> Regards and many thanks...
> Alessandro

This would probably have been slightly easier had you written it in English,
but basically the issue is the capturing group.

A capturing group is defined by parentheses in the regular expression;
yours is "(HOUR|MINUTE|SECOND)". When a pattern contains exactly one
capturing group, findall returns only the text matched by that group, not
the whole match.

You need to capture the number as well, and you can use a non-capturing
group for the time unit (with (?: ) instead of () ) so that it does not
add an extra entry to your captured groups.

>>> pattern = re.compile(r"([0-9]+ (?:HOUR|MINUTE|SECOND))")
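
To make the findall behaviour described above concrete, here is a small sketch using the input string from the question. The variable names (one_group, no_group, two_groups) are only illustrative and are not from the original post.

```python
import re

text = "22 MINUTE, 3 HOUR, AND 28 SECOND"

# One capturing group: findall returns only what the group matched.
one_group = re.findall(r"[0-9]+ (HOUR|MINUTE|SECOND)", text)
print(one_group)   # ['MINUTE', 'HOUR', 'SECOND']

# Only a non-capturing group: findall returns the whole match.
no_group = re.findall(r"[0-9]+ (?:HOUR|MINUTE|SECOND)", text)
print(no_group)    # ['22 MINUTE', '3 HOUR', '28 SECOND']

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r"([0-9]+) (HOUR|MINUTE|SECOND)", text)
print(two_groups)  # [('22', 'MINUTE'), ('3', 'HOUR'), ('28', 'SECOND')]
```

As the second call suggests, the outer parentheses in the pattern above are optional: with no capturing group at all, findall already returns the full match.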


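Other improvements:
* \d is a shortcut for "any digit"; for ASCII input it is equivalent to
[0-9] and slightly clearer. (In Python 3, \d in a str pattern also matches
other Unicode decimal digits unless re.ASCII is set.)
* You may use the re.I (or re.IGNORECASE) flag to match both lowercase and
uppercase units.
* You can easily handle an optional trailing "s" with s?.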

Improved regex:

>>> pattern = re.compile(r"(\d+ (?:hour|minute|second)s?)", re.I)
>>> pattern.findall("3 HOURS 22 MINUTES 28 SECONDS")
['3 HOURS', '22 MINUTES', '28 SECONDS']
>>> pattern.findall("1 HOUR 22 MINUTES 28 SECONDS")
['1 HOUR', '22 MINUTES', '28 SECONDS']
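
If the number and the unit are needed separately, one possible variation is to give each its own capturing group and iterate with finditer. This is only a sketch run against the string from the original question (with its commas and "and"); the names parts and pairs are illustrative, not from the thread.

```python
import re

# Same idea as the improved regex, but the number and the unit are
# captured in separate groups; the alternation stays non-capturing.
pattern = re.compile(r"(\d+) ((?:hour|minute|second)s?)", re.I)

text = "3 HOURS, 22 MINUTES, and 28 SECONDS"

# group(0) is the whole match; group(1) is the number, group(2) the unit.
parts = [m.group(0) for m in pattern.finditer(text)]
pairs = [(int(m.group(1)), m.group(2)) for m in pattern.finditer(text)]

print(parts)  # ['3 HOURS', '22 MINUTES', '28 SECONDS']
print(pairs)  # [(3, 'HOURS'), (22, 'MINUTES'), (28, 'SECONDS')]
```

Using finditer rather than findall gives access to both the full match and the individual groups from the same match object.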

If you want to learn more about regular expressions, I suggest browsing
http://regular-expressions.info/, which is a good source of information,
and trying Kodos, which is quite a good Python regex debugger.
 
Alessandro
      01-09-2006
Thanks for the reply, it's OK!!!
The language? I selected the wrong newsgroup in my
newsreader!!!... sorry...

Thanks...

Alessandro...

 