sre is broken in SuSE 9.2

 
 
Denis S. Otkidach
      02-10-2005
On all platforms, \w matches all Unicode letters when used with the flag
re.UNICODE, but this doesn't work on SuSE 9.2:

Python 2.3.4 (#1, Dec 17 2004, 19:56:4
[GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(ur'\w+', re.U).match(u'\xe4')
>>>


BTW, unicodedata correctly recognizes this character as a lowercase letter:
>>> import unicodedata
>>> unicodedata.category(u'\xe4')
'Ll'

I've looked through all the SuSE patches applied, but found nothing related.
What is the reason for the broken behavior? Incorrect configure options?

--
Denis S. Otkidach
http://www.python.ru/ [ru]
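
For anyone who wants to repeat this check on their own build, here is a minimal sketch. It assumes Python 3 (the session above is Python 2.3, where the ur'' prefix is needed), and the helper name letters_missed_by_word_class is made up for illustration. The idea is the same as above: any character that unicodedata classifies as a letter should also match \w.

```python
import re
import unicodedata

# In Python 3, str patterns use Unicode matching by default;
# re.UNICODE is spelled out only to mirror the session above.
WORD = re.compile(r'\w+', re.UNICODE)

def letters_missed_by_word_class(chars):
    """Return the characters that unicodedata calls letters but \\w rejects."""
    missed = []
    for ch in chars:
        is_letter = unicodedata.category(ch).startswith('L')
        if is_letter and WORD.match(ch) is None:
            missed.append(ch)
    return missed

if __name__ == '__main__':
    # a-umlaut, A-umlaut, o-umlaut, sharp s, Cyrillic zhe
    sample = '\xe4\xc4\xf6\xdf\u0436'
    missed = letters_missed_by_word_class(sample)
    print('letters not matched by \\w:', missed or 'none')
```

A non-empty result would point at the same kind of mismatch between the regex engine and the Unicode database as reported here.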
 
 
Serge Orlov
      02-10-2005
Denis S. Otkidach wrote:
> On all platforms, \w matches all Unicode letters when used with the flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
>
> Python 2.3.4 (#1, Dec 17 2004, 19:56:4
> [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> re.compile(ur'\w+', re.U).match(u'\xe4')
> >>>
>
> BTW, unicodedata correctly recognizes this character as a lowercase letter:
> >>> import unicodedata
> >>> unicodedata.category(u'\xe4')
> 'Ll'
>
> I've looked through all the SuSE patches applied, but found nothing related.
> What is the reason for the broken behavior? Incorrect configure options?


I can get the same result on RedHat's Python 2.2.3 if I pass the re.L
option, so it looks like this option is implicitly set on SuSE.

Serge
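
To look at the re.L effect in isolation, here is a rough sketch. It assumes Python 3, where re.LOCALE is only accepted for bytes patterns, so the Latin-1 byte 0xE4 stands in for u'\xe4'. The function name is illustrative only. With re.LOCALE the answer comes from the C library's ctype tables for the current locale, while a str pattern uses Python's own Unicode tables.

```python
import locale
import re

def word_match_under_c_locale():
    """Compare \\w under re.LOCALE (bytes) with Unicode matching (str)."""
    saved = locale.setlocale(locale.LC_CTYPE)      # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, 'C')     # the plain "C"/POSIX locale
        # Locale-dependent: asks the C library whether byte 0xE4 is alphanumeric.
        by_locale = re.compile(br'\w+', re.LOCALE).match(b'\xe4') is not None
        # Unicode-aware: uses Python's own tables, independent of the locale.
        by_unicode = re.compile(r'\w+').match('\xe4') is not None
        return by_locale, by_unicode
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)   # restore the previous locale

if __name__ == '__main__':
    print(word_match_under_c_locale())
```

The first value depends on the platform's C library; the second should not.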

 
 
Denis S. Otkidach
      02-10-2005
On 10 Feb 2005 03:59:51 -0800
"Serge Orlov" <(E-Mail Removed)> wrote:

> > On all platforms, \w matches all Unicode letters when used with the flag
> > re.UNICODE, but this doesn't work on SuSE 9.2:
> [...]
> I can get the same result on RedHat's Python 2.2.3 if I pass the re.L
> option, so it looks like this option is implicitly set on SuSE.


Looks like you are right:

>>> import re
>>> re.compile(ur'\w+', re.U).match(u'\xe4')
>>> from locale import *
>>> setlocale(LC_ALL, 'de_DE')
'de_DE'
>>> re.compile(ur'\w+', re.U).match(u'\xe4')
<_sre.SRE_Match object at 0x40375560>

But I see nothing related to an implicit re.L option in their patches,
and the sources themselves are the same as on other platforms. I'd prefer
to find the source of the problem.

--
Denis S. Otkidach
http://www.python.ru/ [ru]
 
 
Daniel Dittmar
      02-10-2005
Denis S. Otkidach wrote:

> On all platforms, \w matches all Unicode letters when used with the flag
> re.UNICODE, but this doesn't work on SuSE 9.2:

I think Python on SuSE 9.2 uses UCS4 for Unicode strings (as does
RedHat); check sys.maxunicode.

This is not an explanation, but perhaps a hint where to look.

Daniel
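
The sys.maxunicode check can be sketched like this. The function name unicode_build_width is made up for illustration. Narrow (UCS2) builds report 0xFFFF and wide (UCS4) builds report 0x10FFFF; note that from Python 3.3 onward the value is always 0x10FFFF, so the distinction only matters on older interpreters such as the 2.3 discussed here.

```python
import sys

def unicode_build_width():
    """Return 'UCS4' for wide builds and 'UCS2' for narrow ones."""
    # Narrow builds report 0xFFFF; wide builds report 0x10FFFF.
    return 'UCS4' if sys.maxunicode > 0xFFFF else 'UCS2'

if __name__ == '__main__':
    print(hex(sys.maxunicode), unicode_build_width())
```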
 
 
Denis S. Otkidach
      02-10-2005
On Thu, 10 Feb 2005 16:23:09 +0100
Daniel Dittmar <(E-Mail Removed)> wrote:

> Denis S. Otkidach wrote:
>
> > On all platforms, \w matches all Unicode letters when used with the flag
> > re.UNICODE, but this doesn't work on SuSE 9.2:
>
> I think Python on SuSE 9.2 uses UCS4 for Unicode strings (as does
> RedHat); check sys.maxunicode.
>
> This is not an explanation, but perhaps a hint where to look.


Yes, it uses UCS4. But the Debian build with UCS4 works fine, so this is
not the problem. Could the --with-wctype-functions configure option be the
source of the problem?

--
Denis S. Otkidach
http://www.python.ru/ [ru]
 
 
Fredrik Lundh
      02-10-2005
Denis S. Otkidach wrote:

>> > On all platforms, \w matches all Unicode letters when used with the flag
>> > re.UNICODE, but this doesn't work on SuSE 9.2:
>>
>> I think Python on SuSE 9.2 uses UCS4 for Unicode strings (as does
>> RedHat); check sys.maxunicode.
>>
>> This is not an explanation, but perhaps a hint where to look.
>
> Yes, it uses UCS4. But the Debian build with UCS4 works fine, so this is
> not the problem. Could the --with-wctype-functions configure option be the
> source of the problem?


yes.

that option disables Python's own Unicode database, and relies on the C library's
wctype.h (iswalpha, etc) to behave properly for Unicode characters. this isn't true
for all environments.

is this an official SuSE release? do they often release stuff that hasn't been tested
at all?

</F>
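
One way to check whether an interpreter was configured with this option is to look at the recorded configure arguments. This is only a sketch: it assumes the sysconfig module, which is newer than the Python 2.3 in this thread (distutils.sysconfig.get_config_var would be the rough equivalent there), and the function name is made up for illustration.

```python
import sysconfig

def built_with_wctype_functions():
    """Return True if the recorded configure arguments include the option."""
    # CONFIG_ARGS holds the configure command line on Unix-like builds;
    # it may be missing (None) elsewhere, e.g. on Windows.
    config_args = sysconfig.get_config_var('CONFIG_ARGS') or ''
    return '--with-wctype-functions' in config_args

if __name__ == '__main__':
    print('configure args:', sysconfig.get_config_var('CONFIG_ARGS'))
    print('--with-wctype-functions:', built_with_wctype_functions())
```

A packager may also pass options in ways that are not recorded here, so a False result is a hint rather than proof.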



 
 
Serge Orlov
      02-10-2005
Denis S. Otkidach wrote:
> On 10 Feb 2005 03:59:51 -0800
> "Serge Orlov" <(E-Mail Removed)> wrote:
>
> > > On all platforms, \w matches all Unicode letters when used with the flag
> > > re.UNICODE, but this doesn't work on SuSE 9.2:
> [...]
> > I can get the same result on RedHat's Python 2.2.3 if I pass the re.L
> > option, so it looks like this option is implicitly set on SuSE.
>
> Looks like you are right:
>
> >>> import re
> >>> re.compile(ur'\w+', re.U).match(u'\xe4')
> >>> from locale import *
> >>> setlocale(LC_ALL, 'de_DE')
> 'de_DE'
> >>> re.compile(ur'\w+', re.U).match(u'\xe4')
> <_sre.SRE_Match object at 0x40375560>
>
> But I see nothing related to an implicit re.L option in their patches,
> and the sources themselves are the same as on other platforms. I'd
> prefer to find the source of the problem.


I found that

print u'\xc4'.isalpha()
import locale
print locale.getlocale()

produces different results on SuSE (Python 2.3.3):

False
(None, None)

and on RedHat (Python 2.2.3):

1
(None, None)

Serge.
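
The same comparison can be widened from one character to a whole range. A small sketch, assuming Python 3 (where every str is Unicode and chr() replaces unichr()); the function name is illustrative. It lists the code points where str.isalpha() disagrees with the letter categories in unicodedata, which is the mismatch seen above for u'\xc4'.

```python
import unicodedata

def isalpha_disagreements(code_points):
    """List code points where str.isalpha() and unicodedata disagree."""
    disagreements = []
    for cp in code_points:
        ch = chr(cp)
        in_database = unicodedata.category(ch).startswith('L')
        if ch.isalpha() != in_database:
            disagreements.append(hex(cp))
    return disagreements

if __name__ == '__main__':
    # Scan Latin-1, which contains both u'\xc4' and u'\xe4' from the thread.
    print(isalpha_disagreements(range(0x100)))
```

On a build that uses Python's own Unicode database the list should come back empty.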

 
 
Denis S. Otkidach
      02-10-2005
On Thu, 10 Feb 2005 17:46:06 +0100
"Fredrik Lundh" <(E-Mail Removed)> wrote:

> > Could the --with-wctype-functions configure option be the
> > source of the problem?
>
> yes.
>
> that option disables Python's own Unicode database, and relies on the C library's
> wctype.h (iswalpha, etc) to behave properly for Unicode characters. this isn't true
> for all environments.
>
> is this an official SuSE release? do they often release stuff that hasn't been tested
> at all?


Yes, it's an official release:
# rpm -qi python
Name        : python                       Relocations: (not relocatable)
Version     : 2.3.4                        Vendor: SUSE LINUX AG, Nuernberg, Germany
Release     : 3                            Build Date: Tue Oct 5 02:28:25 2004
Install date: Fri Jan 28 13:53:49 2005     Build Host: gambey.suse.de
Group       : Development/Languages/Python Source RPM: python-2.3.4-3.src.rpm
Size        : 15108594                     License: Artistic License, Other License(s), see package
Signature   : DSA/SHA1, Tue Oct 5 02:42:38 2004, Key ID a84edae89c800aca
Packager    : http://www.suse.de/feedback
URL         : http://www.python.org/
Summary     : Python Interpreter
<snip>

BTW, where did they find something under the Artistic License in Python?

--
Denis S. Otkidach
http://www.python.ru/ [ru]
 
 
Serge Orlov
      02-10-2005
Denis S. Otkidach wrote:
> On all platforms, \w matches all Unicode letters when used with the flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
>
> Python 2.3.4 (#1, Dec 17 2004, 19:56:4
> [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> re.compile(ur'\w+', re.U).match(u'\xe4')
> >>>
>
> BTW, unicodedata correctly recognizes this character as a lowercase letter:
> >>> import unicodedata
> >>> unicodedata.category(u'\xe4')
> 'Ll'
>
> I've looked through all the SuSE patches applied, but found nothing
> related. What is the reason for the broken behavior? Incorrect
> configure options?


To summarize the discussion: either it's a bug in glibc, or there is an
option to specify a modern POSIX locale. The POSIX locale consists of
characters from the portable character set, and Unicode is certainly
portable.

Serge.

 
 
Fredrik Lundh
      02-10-2005
Peter Maas wrote:

>> To summarize the discussion: either it's a bug in glibc, or there is an
>> option to specify a modern POSIX locale. The POSIX locale consists of
>> characters from the portable character set, and Unicode is certainly
>> portable.
>
> What about the environment variable LANG? I have SuSE 9.1 and
> LANG = de_DE.UTF-8. Your example runs fine on my computer.


Python's Unicode subsystem shouldn't depend on the system's LANG
setting.

</F>
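
A quick way to test that expectation is to run the same match under several locale settings and compare the results. This is a sketch, assuming Python 3; the function name and the default list of locales ('C' and '', meaning "take it from the environment, e.g. LANG") are choices made for illustration.

```python
import locale
import re

WORD = re.compile(r'\w+')  # str pattern: Unicode matching in Python 3

def matches_under_locales(text, locale_names=('C', '')):
    """Return {locale name: whether \\w matches text} for each usable locale."""
    saved = locale.setlocale(locale.LC_CTYPE)      # remember the current setting
    results = {}
    try:
        for name in locale_names:
            try:
                locale.setlocale(locale.LC_CTYPE, name)  # '' = from environment
            except locale.Error:
                continue                                 # locale not available
            results[name] = WORD.match(text) is not None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)   # restore the previous locale
    return results

if __name__ == '__main__':
    print(matches_under_locales('\xe4'))
```

If the Unicode subsystem is independent of the locale, every entry should have the same value.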



 