re.match and non-alphanumeric characters

 
 
The Web President
      11-16-2008
Dear all,

this is really driving me nuts and any help would be extremely
appreciated.

I have a string that contains some numeric data. I want to isolate
these data using re.match, as follows.

bogus = "IFC(35m)"
data = re.match(r'(\d+)',bogus)
print data.group(1)

I would expect to have "35" printed out to screen, but instead I get
an error that the regular expression did not match:

Traceback (most recent call last):
  File "C:\Documents and Settings\Mattia\Desktop\Neeltje\read.py", line 20, in <module>
    print data.group(1)
AttributeError: 'NoneType' object has no attribute 'group'

Note that the same holds if I look for "35" straight, instead of
"\d+". If instead I look for "IFC" it works fine. That is, apparently
re.match will match only up to the first non-alphanumeric character
and ignore anything after a "(", "_", "[" and god knows what else.

I am using Python 2.6 (r26:66721, latest stable version). Am I missing
something very big and very important?
 
r
      11-16-2008


Try re.search or re.findall. re.match only matches at the beginning of
the string; I almost never use it.

>>> re.search(r'(\d+)', bogus).group()
'35'
>>> re.search(r'(\d+)', bogus).span()
(4, 6)
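
A minimal sketch of the re.findall option mentioned above, which the session does not show. The second sample string is made up for illustration and is not from the original post. It uses single-argument print() so that it should run on both Python 2.6 and Python 3.

```python
import re

bogus = "IFC(35m)"

# findall returns every non-overlapping match as a list of strings.
# With exactly one group in the pattern, it returns the group contents.
numbers = re.findall(r'(\d+)', bogus)
print(numbers)

# Hypothetical string with several numbers, not from the original post.
several = "IFC(35m) IFC(120m)"
all_numbers = re.findall(r'\d+', several)
print(all_numbers)
```

Unlike re.search, findall returns an empty list rather than None when nothing matches, so there is no .group() call that can fail.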
 
MRAB
      11-16-2008


re.match() anchors the match at the start of the string. What you need
is re.search(). It's all in the documentation!
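
To make the difference concrete, here is a small sketch using the string from the original post. The variable names m, ifc, s and number are just illustrative. Both functions return None when there is no match, which is where the AttributeError comes from, so it is probably worth testing the result before calling .group().

```python
import re

bogus = "IFC(35m)"

# re.match only tries the pattern at position 0, where "I" is not a digit.
m = re.match(r'(\d+)', bogus)
print(m is None)

# "IFC" does sit at position 0, which is why that pattern appeared to work.
ifc = re.match(r'IFC', bogus)
print(ifc.group(0))

# re.search scans forward until the pattern matches somewhere.
s = re.search(r'(\d+)', bogus)

# Guard against None before asking for a group.
if s is not None:
    number = s.group(1)
else:
    number = None
print(number)
```

So the parentheses and other non-alphanumeric characters are not the problem; the digits simply do not start at the beginning of the string.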
 
 
Gabriel Genellina
      11-16-2008
On Sun, 16 Nov 2008 14:33:42 -0200, The Web President
<(E-Mail Removed)> wrote:

> I have a string that contains some numeric data. I want to isolate
> these data using re.match, as follows.
>
> bogus = "IFC(35m)"
> data = re.match(r'(\d+)',bogus)
> print data.group(1)
>
> I would expect to have "35" printed out to screen, but instead I get
> an error that the regular expression did not match:


http://docs.python.org/library/re.ht...g-vs-searching

--
Gabriel Genellina

 
 
Diez B. Roggisch
      11-16-2008


Yep - re.search. Match matches the whole string. You want searching.


Diez
 
 
John Machin
      11-16-2008
On Nov 17, 4:44 am, "Diez B. Roggisch" <(E-Mail Removed)> wrote:

> Match matches the whole string.


*ONLY* if the pattern ends with "$" or r"\Z"
 
 
Diez B. Roggisch
      11-16-2008
John Machin wrote:
> On Nov 17, 4:44 am, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
>
>> Match matches the whole string.
>
> *ONLY* if the pattern ends with "$" or r"\Z"


You think so?

import re

rex = re.compile("abc.*def")

if rex.match("abc0123455678def"):
    print "matched"



Diez
 
 
Steve Holden
      11-16-2008
Diez B. Roggisch wrote:
> John Machin wrote:
>> On Nov 17, 4:44 am, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
>>
>>> Match matches the whole string.
>>
>> *ONLY* if the pattern ends with "$" or r"\Z"
>
> You think so?
>
> import re
>
> rex = re.compile("abc.*def")
>
> if rex.match("abc0123455678def"):
>     print "matched"
>

Your test is inconclusive: necessary, but not sufficient.

>>> rex = re.compile("abc.*def")
>>>
>>> if rex.match("abc0123455678defPLUSEXTRASTUFF"):
...     print "Matched"
...
Matched
>>>


regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

 
 
John Machin
      11-17-2008
On Nov 17, 10:19 am, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
> John Machin wrote:
>
> > On Nov 17, 4:44 am, "Diez B. Roggisch" <(E-Mail Removed)> wrote:
> >
> >> Match matches the whole string.
> >
> > *ONLY* if the pattern ends with "$" or r"\Z"
>
> You think so?
>
> import re
>
> rex = re.compile("abc.*def")
>
> if rex.match("abc0123455678def"):
>     print "matched"
>


OK, I'll try again:

The following 3-tuples represent (pattern, string,
matched_portion_of_string):
('abc', 'abc', 'abc')
('abc', 'abcdef', 'abc')
('abc$', 'abc', 'abc')
('abc$', 'abcdef', '<no match>')

Saying "Match matches the whole string" is incorrect; see the second
case. If you want to ensure that the whole string matches the pattern,
the pattern needs to be terminated by "$" or "\Z".
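
The four cases above can be reproduced with a short sketch. The helper name matched_portion is made up for illustration and is not part of the re module. The last two lines show one difference between the two anchors: "$" also matches just before a trailing newline, while r"\Z" matches only at the very end of the string.

```python
import re

def matched_portion(pattern, string):
    # Hypothetical helper: the text re.match consumed, or '<no match>'.
    m = re.match(pattern, string)
    if m is None:
        return '<no match>'
    return m.group(0)

cases = [
    ('abc', 'abc'),
    ('abc', 'abcdef'),
    ('abc$', 'abc'),
    ('abc$', 'abcdef'),
]
results = [(p, s, matched_portion(p, s)) for p, s in cases]
for row in results:
    print(row)

# "$" tolerates one trailing newline; r"\Z" does not.
print(matched_portion('abc$', 'abc\n'))
print(matched_portion(r'abc\Z', 'abc\n'))
```

On Python 3.4 and later there is also re.fullmatch, which requires the whole string to match without an explicit anchor; it is not available on the Python 2.6 used in this thread.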
 