Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Python (http://www.velocityreviews.com/forums/f43-python.html)
-   -   python3 Unicode is slow (http://www.velocityreviews.com/forums/t702872-python3-unicode-is-slow.html)

Dale Gerdemann 10-25-2009 12:12 PM

python3 Unicode is slow
 
I've written simple code in 2.6 and 3.0 to read every charcter of a
set of files and print out some information for each of these
characters. I tested each program on a large Cyrillic/Latin text. The
result was that the 2.6 version was about 5x faster. Here are the two
programs:

#!/usr/bin/env python

import sys
import codecs
import unicodedata
for path in sys.argv[1:]:
lines = codecs.open(path, encoding='UTF-8',
errors='replace').readlines()

for line in lines:
for c in line:
name = unicodedata.name(c,'unknown')
prnt = prnt_rep = c.encode('utf8')
if name == 'unknown':
prnt = ' '
if ord(c) > 127:
print('%s %-14r U+%04x %s' % (prnt, prnt_rep, ord(c),
name))
else:
if ord(c) == 9:
name = 'tab'
prnt = ' '
elif ord(c) == 10:
name = 'LF'
prnt = ' '
elif ord(c) == 13:
name = 'CR'
prnt = ' '
print("{0:s} '\\x{1:02x}' U+{2:04x}
{3:s}".format(
prnt, ord(c), ord(c), name))


#!/usr/bin/env python3

import sys
import unicodedata

for path in sys.argv[1:]:
lines = open(path, errors='replace').readlines()

for line in lines:
for c in line:
code_point = ord(c)
utf8 = c.encode()
if ord(c) <= 127:
utf8 = "b'\\" + hex(ord(c))[1:] + "'"
name = unicodedata.name(c,'unknown')
if name == 'unknown':
c = ' '
if code_point == 9:
c = ' '
name = 'tab'
elif code_point == 10:
c = ' '
name = 'LF'
elif code_point == 13:
c = ' '
name = 'CR'
print("{0:s} {1:15s} U+{2:04x} {3:s}".format(
c, utf8, code_point, name))



John Machin 10-25-2009 01:11 PM

Re: python3 Unicode is slow
 
On Oct 25, 11:12*pm, Dale Gerdemann <dale.gerdem...@googlemail.com>
wrote:
> I've written simple code in 2.6 and 3.0 to read every charcter of a
> set of files and print out some information for each of these
> characters. I tested each program on a large Cyrillic/Latin text. The
> result was that the 2.6 version was about 5x faster.


3.0? Nowadays nobody wants to know about benchmarks of 3.0. Much of
the new 3.X file I/O stuff was written in Python. It has since been
rewritten in C. In general AFAICT there is no good reason to be using
3.0. Consider updating to 3.1.1.


All times are GMT. The time now is 04:15 AM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.