matching exactly a 4 digit number in python

 
 
harijay
      11-21-2008
Hi,
I am a few months into Python. I have used regexps before in Perl
and Java, but I am a little confused by this problem.

I want to parse a number of strings and extract only those that
contain a 4-digit number anywhere inside the string.

However, the regexp
p = re.compile(r'\d{4}')

matches even sentences that contain numbers longer than 4 digits.
For example, it matches "I have 3324234 and more".

I am very confused. Shouldn't \d{4,} match exactly four-digit
numbers, so that a sentence with a 5-digit number is not matched?

Here is my test program's output, with the test program given below.
Thanks for your help,
Harijay

PyMate r8111 running Python 2.5.1 (/usr/bin/python)
>>> testdigit.py


Matched I have 2004 rupees
Matched I have 3324234 and more
Matched As 3233
Matched 2323423414 is good
Matched 4444 dc sav 2412441 asdf
SKIPPED random1341also and also
SKIPPED
SKIPPED 13
Matched a 1331 saves
SKIPPED and and as dad
SKIPPED A has 13123123
SKIPPED A 13123
SKIPPED 123 adn
Matched 1312 times I have told you
DONE

#!/usr/bin/python
import re

x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves",
     " and and as dad", " A has 13123123", " A 13123", "123 adn",
     "1312 times I have told you"]

# Note the trailing space inside the pattern as run.
p = re.compile(r'\d{4} ')

for elem in x:
    if re.search(p, elem):
        print "Matched " + elem
    else:
        print "SKIPPED " + elem

print "DONE"
 
Mr.SpOOn
      11-21-2008
2008/11/21 harijay:
> Hi
> I am a few months new into python. I have used regexps before in perl
> and java but am a little confused with this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However the regexp
> p = re.compile(r'\d{4}')
>
> Matches even sentences that have longer than 4 numbers inside
> strings ..for example it matches "I have 3324234 and more"


Try with this:

p = re.compile(r'\d{4}$')

The $ character matches the end of the string. It should work.
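A quick check of how this end-anchored pattern behaves on a few strings from the thread. This is only a sketch, written for Python 3 rather than the Python 2.5 used above; the names `end_anchored`, `samples`, and `results` are made up for illustration.

```python
import re

# The pattern suggested above: four digits immediately before the end of the string.
end_anchored = re.compile(r'\d{4}$')

samples = ["I have 2004 rupees", "xxx1234", "xxx12345", "A has 13123123"]

# True where re.search finds the pattern somewhere in the string.
results = {s: bool(end_anchored.search(s)) for s in samples}

for s, hit in results.items():
    print(repr(s), "matches" if hit else "does not match")
```

With `$`, only digits at the very end of the string are considered, so a 4-digit number in the middle of a sentence is skipped, while the last four digits of a longer number at the end of the string still match.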
 
John Machin
      11-21-2008
On Nov 22, 8:46 am, harijay <hari...@gmail.com> wrote:
> Hi
> I am a few months new into python. I have used regexps before in perl
> and java but am a little confused with this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However the regexp
> p = re.compile(r'\d{4}')
>
> Matches even sentences that have longer than 4 numbers inside
> strings ..for example it matches "I have 3324234 and more"


No it doesn't. When used with re.search on that string it matches
3324, it doesn't "match" the whole sentence.
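To see this point concretely, one can inspect the match object that re.search returns. A minimal Python 3 sketch (the variable names `m` and `matched_text` are illustrative):

```python
import re

m = re.search(r'\d{4}', "I have 3324234 and more")

# group() returns only the substring that matched, not the whole sentence.
matched_text = m.group()
print(matched_text, m.span())
```

The search succeeds because some run of four digits exists inside the longer number; nothing in the pattern looks at what comes before or after those four digits.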

>
> I am very confused. Shouldnt the \d{4,} match exactly four digit
> numbers so a 5 digit number sentence should not be matched .


{4} does NOT mean the same as {4,}.
{4} is the same as {4,4}
{4,} means {4,INFINITY}

Ignoring {4,}:

You need to specify a regex that says "4 digits followed by (non-digit
or end-of-string)". Have a try at that and come back here if you have
any more problems.

some test data:
xxx1234
xxx12345
xxx1234xxx
xxx12345xxx
xxx1234xxx1235xxx
xxx12345xxx1234xxx
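One possible reading of the hint above, tried against this test data. This is a sketch in Python 3, not necessarily the regex the poster had in mind; `follow_only` and `both_sides` are illustrative names. The first pattern takes the hint literally; the second also requires a non-digit (or the start of the string) before the four digits.

```python
import re

test_data = ["xxx1234", "xxx12345", "xxx1234xxx", "xxx12345xxx",
             "xxx1234xxx1235xxx", "xxx12345xxx1234xxx"]

# The hint taken literally: 4 digits followed by a non-digit or end of string.
follow_only = re.compile(r'\d{4}(?:\D|$)')

# Same idea on both sides: start of string or a non-digit must come first.
both_sides = re.compile(r'(?:^|\D)\d{4}(?:\D|$)')

follow_hits = [bool(follow_only.search(s)) for s in test_data]
both_hits = [bool(both_sides.search(s)) for s in test_data]

for s, f, b in zip(test_data, follow_hits, both_hits):
    print(f"{s:20} follow_only={f!s:5} both_sides={b}")
```

Checking only what follows the digits is not enough on its own: in "xxx12345" the substring "2345" is followed by the end of the string, so the left-hand side needs a condition too.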

 
skip@pobox.com
      11-21-2008

>> I am a few months new into python. I have used regexps before in perl
>> and java but am a little confused with this problem.


>> I want to parse a number of strings and extract only those that
>> contain a 4 digit number anywhere inside a string


>> However the regexp
>> p = re.compile(r'\d{4}')


>> Matches even sentences that have longer than 4 numbers inside strings
>> ..for example it matches "I have 3324234 and more"


Try this instead:

>>> pat = re.compile(r"(?<!\d)(\d{4})(?!\d)")
>>> for s in x:
...     m = pat.search(s)
...     print repr(s),
...     print (m is not None) and "matches" or "does not match"
...
' I have 2004 rupees ' matches
' I have 3324234 and more' does not match
' As 3233 ' matches
'2323423414 is good' does not match
'4444 dc sav 2412441 asdf ' matches
'random1341also and also' matches
'' does not match
'13' does not match
' a 1331 saves' matches
' and and as dad' does not match
' A has 13123123' does not match
'A 13123' does not match
'123 adn' does not match
'1312 times I have told you' matches

--
Skip Montanaro - http://smontanaro.dyndns.org/
 
George Sakkis
      11-21-2008
On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:

> Hi
> I am a few months new into python. I have used regexps before in perl
> and java but am a little confused with this problem.
>
> I want to parse a number of strings and extract only those that
> contain a 4 digit number anywhere inside a string
>
> However the regexp
> p = re.compile(r'\d{4}')
>
> Matches even sentences that have longer than 4 numbers inside
> strings ..for example it matches "I have 3324234 and more"
>
> I am very confused. Shouldnt the \d{4,} match exactly four digit
> numbers so a 5 digit number sentence should not be matched .


No, why should it ? What you're saying is "give me 4 consecutive
digits", without specifying what should precede or follow these
digits. A correct expression is a bit more hairy:

p = re.compile(r'''
    (?:\D|\b)   # find a non-digit or word boundary..
    (\d{4})     # .. followed by the 4 digits to be matched as group #1..
    (?:\D|\b)   # .. which are followed by non-digit or word boundary
''', re.VERBOSE)


HTH,
George
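One property of the expression above worth knowing: the `\D` alternatives consume the neighbouring characters, which can matter when several 4-digit numbers are separated by a single character. A small Python 3 sketch comparing it with a lookaround-based pattern (the sample string and the names `consuming` and `lookaround` are made up for illustration):

```python
import re

# The expression from the post, written on one line.
consuming = re.compile(r'(?:\D|\b)(\d{4})(?:\D|\b)')

# Zero-width alternative: assert "no digit here" without consuming anything.
lookaround = re.compile(r'(?<!\d)(\d{4})(?!\d)')

text = "1234x5678"

# findall returns the contents of group 1 for each non-overlapping match.
consuming_found = consuming.findall(text)
lookaround_found = lookaround.findall(text)

print(consuming_found, lookaround_found)
```

In this string the first match swallows the "x", so the second number has neither a non-digit nor a word boundary left in front of it. For a plain yes/no test with re.search the difference does not show up.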
 
MRAB
      11-21-2008
George Sakkis wrote:
> On Nov 21, 4:46 pm, harijay <hari...@gmail.com> wrote:
>
>> Hi
>> I am a few months new into python. I have used regexps before in perl
>> and java but am a little confused with this problem.
>>
>> I want to parse a number of strings and extract only those that
>> contain a 4 digit number anywhere inside a string
>>
>> However the regexp
>> p = re.compile(r'\d{4}')
>>
>> Matches even sentences that have longer than 4 numbers inside
>> strings ..for example it matches "I have 3324234 and more"
>>
>> I am very confused. Shouldnt the \d{4,} match exactly four digit
>> numbers so a 5 digit number sentence should not be matched .

>
> No, why should it ? What you're saying is "give me 4 consecutive
> digits", without specifying what should precede or follow these
> digits. A correct expression is a bit more hairy:
>
> p = re.compile(r'''
>     (?:\D|\b)   # find a non-digit or word boundary..
>     (\d{4})     # .. followed by the 4 digits to be matched as group #1..
>     (?:\D|\b)   # .. which are followed by non-digit or word boundary
> ''', re.VERBOSE)
>

You want to match a sequence of 4 digits: \d{4}
not preceded by a digit: (?<!\d)
not followed by a digit: (?!\d)

which is: re.compile(r'(?<!\d)\d{4}(?!\d)')
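A short Python 3 sketch of this lookaround pattern pulling the numbers themselves out of some of the strings posted at the top of the thread. The names `exactly_four` and `extracted` are illustrative, and only part of the original list is used.

```python
import re

# Four digits, not preceded by a digit and not followed by a digit.
exactly_four = re.compile(r'(?<!\d)\d{4}(?!\d)')

x = [" I have 2004 rupees ", " I have 3324234 and more", " As 3233 ",
     "2323423414 is good", "4444 dc sav 2412441 asdf ",
     "random1341also and also", "", "13", " a 1331 saves"]

# Map each string to the list of standalone 4-digit numbers found in it.
extracted = {s: exactly_four.findall(s) for s in x}

for s, numbers in extracted.items():
    print(repr(s), numbers)
```

Because lookbehind and lookahead are zero-width, findall returns just the digits, and an empty list means the string contains no run of exactly four digits.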
 
harijay
      11-21-2008
Thanks John Machin and Mark Tolonen.
So I guess the correct approach is to use the word boundary metacharacter
"\b",

so r'\b\d{4}\b' is what I need, since it reads as

a 4-digit number in between word boundaries.

Thanks a tonne. This is my second post to comp.lang.python, and I
am always amazed at how helpful everyone on this group is.

Hari
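The word-boundary version and the lookaround version from earlier in the thread agree on most inputs but not all, because letters and underscores count as word characters for `\b`. A Python 3 sketch of the difference; "id_1234" is a made-up sample, and the names `word_bounded`, `digit_bounded`, and `comparison` are illustrative:

```python
import re

word_bounded = re.compile(r'\b\d{4}\b')             # pattern chosen above
digit_bounded = re.compile(r'(?<!\d)\d{4}(?!\d)')   # lookaround pattern from the thread

samples = ["I have 2004 rupees", "random1341also and also", "id_1234", "A 13123"]

# For each sample: (matches with \b, matches with lookarounds)
comparison = {s: (bool(word_bounded.search(s)), bool(digit_bounded.search(s)))
              for s in samples}

for s, (wb, db) in comparison.items():
    print(repr(s), "word-boundary:", wb, "lookaround:", db)
```

Which behaviour is right depends on whether a number glued to letters, as in "random1341also", should count as a 4-digit number. The `\b` form rejects it and the lookaround form accepts it.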


 