# Matching exactly a 4-digit number in Python

harijay
 11-21-2008
Hi,
I am a few months into Python. I have used regexps before in Perl
and Java, but I am a little confused by this problem.

I want to parse a number of strings and extract only those that
contain a 4-digit number anywhere inside the string.

However, the regexp
p = re.compile(r'\d{4}')

matches even sentences that contain numbers longer than 4 digits,
for example it matches "I have 3324234 and more".

I am very confused. Shouldn't the \d{4,} match exactly four-digit
numbers, so that a sentence with a 5-digit number is not matched?

Here is the output of my test program, followed by the program itself.
Harijay

PyMate r8111 running Python 2.5.1 (/usr/bin/python)
>>> testdigit.py

Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED A has 13123123
SKIPPED A 13123
Matched 1312 times I have told you
DONE

#!/usr/bin/python
import re

x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves",
     " and and as dad", " A has 13123123", " A 13123", "123 adn",
     "1312 times I have told you"]

# Note the trailing space inside the pattern.
p = re.compile(r'\d{4} ')

for elem in x:
    if re.search(p, elem):
        print("Matched " + elem)
    else:
        print("SKIPPED " + elem)

print("DONE")

Mr.SpOOn
 11-21-2008
Try with this:

p = re.compile(r'\d{4}$')

The $ character matches the end of the string. It should work.
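
For reference, a minimal sketch of what an end-of-string anchor does with a few of the strings from the original list. It is written in Python 3 syntax, and the `samples` list is only an illustrative subset. The anchored pattern finds four digits only when they sit at the very end of the string, so it seems to skip a number that is followed by a space and still accepts the tail of a longer number.

```python
import re

# Four digits immediately followed by the end of the string.
p = re.compile(r'\d{4}$')

# Illustrative subset of the original test strings (an assumption for this sketch).
samples = [" I have 2004 rupees ", " As 3233", " A 13123", "2323423414 is good"]

results = {}
for s in samples:
    m = p.search(s)
    # Record the matched text, or None when nothing matched.
    results[s] = m.group() if m else None
    print(repr(s), "->", results[s])
```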

John Machin
 11-21-2008
No it doesn't. When used with re.search on that string it matches
3324, it doesn't "match" the whole sentence.
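
A small sketch of the point being made here, in Python 3 syntax: `re.search` reports the first place where the pattern fits, and the match object covers only that substring rather than the whole sentence.

```python
import re

s = "I have 3324234 and more"
m = re.search(r'\d{4}', s)

# The match is just the first run of four digits found in the string.
print(m.group())   # 3324
print(m.span())    # (7, 11)
```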

>
> I am very confused. Shouldn't the \d{4,} match exactly four-digit
> numbers, so that a sentence with a 5-digit number is not matched?

{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}
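
The difference between the three quantifier forms can be checked with a short sketch like the one below (Python 3 syntax). It uses `re.fullmatch`, which succeeds only when the entire string fits the pattern, so the quantifier is tested in isolation; the `outcome` dictionary is just a helper for this illustration.

```python
import re

outcome = {}
for pattern in (r'\d{4}', r'\d{4,4}', r'\d{4,}'):
    # fullmatch() requires the whole string to fit the pattern.
    four = re.fullmatch(pattern, "1234") is not None
    five = re.fullmatch(pattern, "12345") is not None
    outcome[pattern] = (four, five)
    print(pattern, "1234:", four, "12345:", five)
```

Only `{4,}` accepts the five-digit string, since it means "four or more".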

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
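
As a rough sketch (Python 3 syntax), here is one literal reading of that hint run against the test data above. The second pattern, which applies the same condition in front of the digits as well, is an assumption added for comparison and is not taken from the post.

```python
import re

data = ["xxx1234", "xxx12345", "xxx1234xxx", "xxx12345xxx",
        "xxx1234xxx1235xxx", "xxx12345xxx1234xxx"]

# Literal reading of the hint: 4 digits, then a non-digit or end of string.
trailing_only = re.compile(r'\d{4}(?:\D|$)')
# Assumption for comparison: the same condition on both sides of the digits.
both_sides = re.compile(r'(?:^|\D)\d{4}(?:\D|$)')

trailing_hits = [bool(trailing_only.search(s)) for s in data]
both_hits = [bool(both_sides.search(s)) for s in data]

for s, t, b in zip(data, trailing_hits, both_hits):
    print(s, "trailing only:", t, "both sides:", b)
```

With only the trailing condition, a five-digit run still matches through its last four digits, which is why the test data includes cases such as xxx12345.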

skip@pobox.com
 11-21-2008

>>> pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")
>>> for s in x:
...     m = pat.search(s)
...     print repr(s),
...     print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'1312 times I have told you' matches

George Sakkis
 11-21-2008
No, why should it? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
''', re.VERBOSE)
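
One thing worth noting about this form is that the `\D` alternatives consume the neighbouring characters, which can matter when collecting every number in a string. The sketch below (Python 3 syntax) compares a compact version of the pattern above with the lookaround form suggested elsewhere in this thread; the sample string is made up for illustration.

```python
import re

# Compact form of the verbose pattern above: neighbours are consumed by \D.
consuming = re.compile(r'(?:\D|\b)(\d{4})(?:\D|\b)')
# Lookaround form: neighbours are checked but not consumed.
lookaround = re.compile(r'(?<!\d)(\d{4})(?!\d)')

s = "1234x5678"  # hypothetical input with a single separator character

consumed_span = consuming.search(s).group(0)
all_consuming = consuming.findall(s)
all_lookaround = lookaround.findall(s)

print(consumed_span)    # 1234x
print(all_consuming)    # ['1234']
print(all_lookaround)   # ['1234', '5678']
```

For a plain yes/no test with `search` the two behave alike on the strings in this thread; the difference shows up with `findall` or `finditer`.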

HTH,
George

MRAB
 11-21-2008
You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
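
A short sketch of that pattern in use, in Python 3 syntax, pulling the numbers out of some of the strings from the original list with `findall`. The subset of strings and the `found` dictionary are choices made for this illustration.

```python
import re

pat = re.compile(r'(?<!\d)\d{4}(?!\d)')

# Subset of the original test strings.
x = [" I have 2004 rupees ", " I have 3324234 and more",
     "4444 dc sav 2412441 asdf ", "random1341also and also",
     " A 13123", "1312 times I have told you"]

# Map each string to the list of standalone 4-digit numbers it contains.
found = {s: pat.findall(s) for s in x}
for s, nums in found.items():
    print(repr(s), nums)
```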

harijay
 11-21-2008
Thanks John Machin and Mark Tolonen.
So I guess the correct approach is to use the word boundary metacharacter
"\b",

so r'\b\d{4}\b' is what I need, since it reads as

a 4-digit number in between word boundaries.
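
The word-boundary pattern and the lookaround pattern from earlier in the thread are not quite equivalent, because letters and digits both count as word characters. A minimal sketch of the difference, in Python 3 syntax, using three of the original test strings:

```python
import re

word_boundary = re.compile(r'\b\d{4}\b')
lookaround = re.compile(r'(?<!\d)\d{4}(?!\d)')

samples = ["random1341also and also", " a 1331 saves", " A 13123"]

# For each string: (matched by \b pattern, matched by lookaround pattern).
compare = {s: (bool(word_boundary.search(s)), bool(lookaround.search(s)))
           for s in samples}
for s, (wb, la) in compare.items():
    print(repr(s), "word boundary:", wb, "lookaround:", la)
```

Digits embedded in letters, as in "random1341also", are rejected by `\b` but accepted by the lookarounds, so the choice depends on whether such embedded numbers should count.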

Thanks a tonne. This is only my second post to comp.lang.python, and I
am always amazed at how helpful everyone on this group is.

Hari

