Performance of int/long in Python 3

 
 
rusi
04-03-2013
On Apr 3, 12:37 pm, Neil Hodgson <(E-Mail Removed)> wrote:
>     Reran the programs taking a bit more care with the encoding of the
> file. This had no effect on the speeds. There are only a small number
> of paths that don't fit into ASCII:
>
> ASCII 1076101
> Latin1 218
> BMP 113
> Astral 0
>
> # encoding:utf-8
> import codecs, os, time
> from os.path import join, getsize
> with codecs.open("filelist.txt", "r", "utf-8") as f:
>     paths = f.read().split("\n")
> bucket = [0, 0, 0, 0]
> for p in paths:
>     if not p:    # guard: split() can yield an empty final entry
>         continue
>     b = 0
>     maxChar = max([ord(ch) for ch in p])
>     if maxChar >= 65536:
>         b = 3
>     elif maxChar >= 256:
>         b = 2
>     elif maxChar >= 128:
>         b = 1
>     bucket[b] = bucket[b] + 1
> print("ASCII", bucket[0])
> print("Latin1", bucket[1])
> print("BMP", bucket[2])
> print("Astral", bucket[3])
>
>     Neil


Can you please try one more experiment Neil?
Knock off all non-ASCII strings (paths) from your dataset and try
again.

[It should take little more than converting your above code into a
filter: keep a path when b == 0 (pure ASCII), drop it otherwise. A
sketch follows below.]
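
A minimal sketch of such a filter, reusing the reading code from Neil's
script above (the output filename "filelist_ascii.txt" is made up for
illustration):

# encoding:utf-8
# Keep only the pure-ASCII paths (bucket 0 in Neil's script).
import codecs

with codecs.open("filelist.txt", "r", "utf-8") as f:
    paths = f.read().split("\n")

with codecs.open("filelist_ascii.txt", "w", "utf-8") as out:
    for p in paths:
        if p and max(ord(ch) for ch in p) < 128:  # b == 0: ASCII only
            out.write(p + "\n")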
 
jmfauth
04-03-2013
--------

This FSR is wrong by design. A naive way to embrace Unicode.

jmf

 
Dave Angel
04-03-2013
On 04/03/2013 04:22 AM, Neil Hodgson wrote:
> rusi:
>
>> Can you please try one more experiment Neil?
>> Knock off all non-ASCII strings (paths) from your dataset and try
>> again.

>
> Results are the same 0.40 (well, 0.001 less but I don't think the
> timer is that accurate) for Python 3.2 and 0.78 for Python 3.3.
>
> Neil


That would seem to imply that the speed regression on your data is NOT
caused by the differing size encodings. Perhaps it is the difference in
MSC compiler version, or other changes made between 3.2 and 3.3.

Of course, I can't then explain why Steven didn't get the same results.
Perhaps the difference between 32-bit and 64-bit Python on Windows? Or
perhaps you have significantly more (or significantly fewer)
"collisions" than Steven did.


Before I saw this message, I was thinking of suggesting that you supply
a key= parameter to sort, specifying as the key a string in which every
character is 65536 higher than the original. That way all the keys to
be sorted would be 32 bits in size. If this made the timings change
noticeably, it could be a big clue.
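
A sketch of that experiment (my illustration, not from the thread):
offsetting every character by 65536 pushes it into the astral range,
so 3.3 has to store each key string in its 4-byte representation.

# Force every sort key into the 4-byte kind by shifting each
# character up by 65536 (0x10000). Order is preserved because
# the shift is monotonic. 'paths' is the list being sorted.
def widen(s):
    return "".join(chr(ord(c) + 65536) for c in s)

paths_sorted = sorted(paths, key=widen)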

--
DaveA
 
Mark Lawrence
04-03-2013
On 03/04/2013 09:08, jmfauth wrote:
> --------
>
> This FSR is wrong by design. A naive way to embrace Unicode.
>
> jmf
>


The hole you're digging for yourself is getting bigger and bigger and
I'm loving it.

--
If you're using GoogleCrap™ please read this
http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence

 
Dave Angel
04-03-2013
On 04/03/2013 07:05 AM, Neil Hodgson wrote:
> Dave Angel:
>
>> That would seem to imply that the speed regression on your data is NOT
>> caused by the differing size encodings. Perhaps it is the difference in
>> MSC compiler version, or other changes made between 3.2 and 3.3

>
> It's not caused by there actually being different size encodings, but
> by the code checking the encoding size 2-4 times for each character.
>
> Back in 3.2 the comparison loop looked like:
>
> while (len1 > 0 && len2 > 0) {
>     Py_UNICODE c1, c2;
>
>     c1 = *s1++;
>     c2 = *s2++;
>
>     if (c1 != c2)
>         return (c1 < c2) ? -1 : 1;
>
>     len1--; len2--;
> }
>
> For 3.3 this has changed to:
>
> for (i = 0; i < len1 && i < len2; ++i) {
>     Py_UCS4 c1, c2;
>     c1 = PyUnicode_READ(kind1, data1, i);
>     c2 = PyUnicode_READ(kind2, data2, i);
>
>     if (c1 != c2)
>         return (c1 < c2) ? -1 : 1;
> }
>
> with PyUnicode_READ being:
>
> #define PyUnicode_READ(kind, data, index) \
>     ((Py_UCS4) \
>     ((kind) == PyUnicode_1BYTE_KIND ? \
>         ((const Py_UCS1 *)(data))[(index)] : \
>         ((kind) == PyUnicode_2BYTE_KIND ? \
>             ((const Py_UCS2 *)(data))[(index)] : \
>             ((const Py_UCS4 *)(data))[(index)] \
>         ) \
>     ))
>
> There are either 1 or 2 kind checks in each call to PyUnicode_READ
> and 2 calls to PyUnicode_READ inside the loop. A compiler may decide to
> move the kind checks out of the loop and specialize the loop, but MSVC
> 2010 appears not to do so.


I don't know how good MSC's template logic is, but it seems this would
be a good case for an explicit template, typed on the 'kind' values.
Or are all C++ features disabled when compiling Python? Failing that,
just code up 9 cases, and do a switch on the kinds.

I'm also puzzled. I thought that the sort algorithm used a hash of all
the items to be sorted, and only reverted to a raw comparison of the
original values when the hash collided. Is that not the case? Or is
the code you post here only used when the hash collides?


> The assembler (32-bit build) for each
> PyUnicode_READ looks like:
>
>     mov     ecx, DWORD PTR _kind1$[ebp]
>     cmp     ecx, 1
>     jne     SHORT $LN17@unicode_co@2
>     lea     ecx, DWORD PTR [ebx+eax]
>     movzx   edx, BYTE PTR [ecx+edx]
>     jmp     SHORT $LN16@unicode_co@2
> $LN17@unicode_co@2:
>     cmp     ecx, 2
>     jne     SHORT $LN15@unicode_co@2
>     movzx   edx, WORD PTR [ebx+edi]
>     jmp     SHORT $LN16@unicode_co@2
> $LN15@unicode_co@2:
>     mov     edx, DWORD PTR [ebx+esi]
> $LN16@unicode_co@2:


It appears that the compiler is keeping the three pointers in three
separate registers (eax, esi and edi) even though those are 3 aliases
for the same pointer. This is preventing it from putting other values
in those registers.

It'd probably do better if the C code manipulated the pointers, rather
than using an index i each time. But if it did, perhaps gcc would
generate worse code.

If I were coding the assembler by hand (Intel only), I'd be able to
avoid the multiple cmp operations, simply by comparing first to 2, then
doing a jne and a ja. I dunno whether the compiler would notice if I
coded the equivalent in C. (make both comparisons to 2, one for less,
and one for more)

>
> The kind1/kind2 variables aren't even going into registers and at
> least one test+branch and a jump are executed for every character. Two
> tests for 2 and 4 byte kinds. len1 and len2 don't get to go into
> registers either.
>
> Here's the full assembler output for unicode_compare:
>
> ; COMDAT _unicode_compare
> _TEXT SEGMENT
> _kind2$ = -20 ; size = 4
> _kind1$ = -16 ; size = 4
> _len2$ = -12 ; size = 4
> _len1$ = -8 ; size = 4
> _data2$ = -4 ; size = 4
> _unicode_compare PROC ; COMDAT
> ; _str1$ = ecx
> ; _str2$ = eax
>
> ; 10417: {
>
>     push    ebp
>     mov     ebp, esp
>     sub     esp, 20 ; 00000014H
>     push    ebx
>     push    esi
>     mov     esi, eax
>
> ; 10418: int kind1, kind2;
> ; 10419: void *data1, *data2;
> ; 10420: Py_ssize_t len1, len2, i;
> ; 10421:
> ; 10422: kind1 = PyUnicode_KIND(str1);
>
>     mov     eax, DWORD PTR [ecx+16]
>     mov     edx, eax
>     shr     edx, 2
>     and     edx, 7
>     push    edi
>     mov     DWORD PTR _kind1$[ebp], edx
>
> ; 10423: kind2 = PyUnicode_KIND(str2);
>
>     mov     edx, DWORD PTR [esi+16]
>     mov     edi, edx
>     shr     edi, 2
>     and     edi, 7
>     mov     DWORD PTR _kind2$[ebp], edi
>
> ; 10424: data1 = PyUnicode_DATA(str1);
>
>     test    al, 32 ; 00000020H
>     je      SHORT $LN9@unicode_co@2
>     test    al, 64 ; 00000040H
>     je      SHORT $LN7@unicode_co@2
>     lea     ebx, DWORD PTR [ecx+24]
>     jmp     SHORT $LN10@unicode_co@2
> $LN7@unicode_co@2:
>     lea     ebx, DWORD PTR [ecx+36]
>     jmp     SHORT $LN10@unicode_co@2
> $LN9@unicode_co@2:
>     mov     ebx, DWORD PTR [ecx+36]
> $LN10@unicode_co@2:
>
> ; 10425: data2 = PyUnicode_DATA(str2);
>
>     test    dl, 32 ; 00000020H
>     je      SHORT $LN13@unicode_co@2
>     test    dl, 64 ; 00000040H
>     je      SHORT $LN11@unicode_co@2
>     lea     edx, DWORD PTR [esi+24]
>     jmp     SHORT $LN30@unicode_co@2
> $LN11@unicode_co@2:
>     lea     eax, DWORD PTR [esi+36]
>     mov     DWORD PTR _data2$[ebp], eax
>     mov     edx, eax
>     jmp     SHORT $LN14@unicode_co@2
> $LN13@unicode_co@2:
>     mov     edx, DWORD PTR [esi+36]
> $LN30@unicode_co@2:
>     mov     DWORD PTR _data2$[ebp], edx
> $LN14@unicode_co@2:
>
> ; 10426: len1 = PyUnicode_GET_LENGTH(str1);
>
>     mov     edi, DWORD PTR [ecx+8]
>
> ; 10427: len2 = PyUnicode_GET_LENGTH(str2);
>
>     mov     ecx, DWORD PTR [esi+8]
>
> ; 10428:
> ; 10429: for (i = 0; i < len1 && i < len2; ++i) {
>
>     xor     eax, eax
>     mov     DWORD PTR _len1$[ebp], edi
>     mov     DWORD PTR _len2$[ebp], ecx
>     test    edi, edi
>     jle     SHORT $LN2@unicode_co@2
>
> ; 10426: len1 = PyUnicode_GET_LENGTH(str1);
>
>     mov     esi, edx
>     mov     edi, edx
>
> ; 10428:
> ; 10429: for (i = 0; i < len1 && i < len2; ++i) {
>
>     sub     ebx, edx
>     jmp     SHORT $LN4@unicode_co@2
> $LL28@unicode_co@2:
>     mov     edx, DWORD PTR _data2$[ebp]
> $LN4@unicode_co@2:
>     cmp     eax, ecx
>     jge     SHORT $LN29@unicode_co@2
>
> ; 10430: Py_UCS4 c1, c2;
> ; 10431: c1 = PyUnicode_READ(kind1, data1, i);
>
>     mov     ecx, DWORD PTR _kind1$[ebp]
>     cmp     ecx, 1
>     jne     SHORT $LN17@unicode_co@2
>     lea     ecx, DWORD PTR [ebx+eax]
>     movzx   edx, BYTE PTR [ecx+edx]
>     jmp     SHORT $LN16@unicode_co@2
> $LN17@unicode_co@2:
>     cmp     ecx, 2
>     jne     SHORT $LN15@unicode_co@2
>     movzx   edx, WORD PTR [ebx+edi]
>     jmp     SHORT $LN16@unicode_co@2
> $LN15@unicode_co@2:
>     mov     edx, DWORD PTR [ebx+esi]
> $LN16@unicode_co@2:
>
> ; 10432: c2 = PyUnicode_READ(kind2, data2, i);
>
>     mov     ecx, DWORD PTR _kind2$[ebp]
>     cmp     ecx, 1
>     jne     SHORT $LN21@unicode_co@2
>     mov     ecx, DWORD PTR _data2$[ebp]
>     movzx   ecx, BYTE PTR [eax+ecx]
>     jmp     SHORT $LN20@unicode_co@2
> $LN21@unicode_co@2:
>     cmp     ecx, 2
>     jne     SHORT $LN19@unicode_co@2
>     movzx   ecx, WORD PTR [edi]
>     jmp     SHORT $LN20@unicode_co@2
> $LN19@unicode_co@2:
>     mov     ecx, DWORD PTR [esi]
> $LN20@unicode_co@2:
>
> ; 10433:
> ; 10434: if (c1 != c2)
>
>     cmp     edx, ecx
>     jne     SHORT $LN31@unicode_co@2
>     mov     ecx, DWORD PTR _len2$[ebp]
>     inc     eax
>     add     edi, 2
>     add     esi, 4
>     cmp     eax, DWORD PTR _len1$[ebp]
>     jl      SHORT $LL28@unicode_co@2
> $LN29@unicode_co@2:
>     mov     edi, DWORD PTR _len1$[ebp]
> $LN2@unicode_co@2:
>
> ; 10436: }
> ; 10437:
> ; 10438: return (len1 < len2) ? -1 : (len1 != len2);
>
>     cmp     edi, ecx
>     jge     SHORT $LN23@unicode_co@2
>     pop     edi
>     pop     esi
>     or      eax, -1
>     pop     ebx
>
> ; 10439: }
>
>     mov     esp, ebp
>     pop     ebp
>     ret     0
> $LN31@unicode_co@2:
>
> ; 10435: return (c1 < c2) ? -1 : 1;
>
>     sbb     eax, eax
>     pop     edi
>     and     eax, -2 ; fffffffeH
>     pop     esi
>     inc     eax
>     pop     ebx
>
> ; 10439: }
>
>     mov     esp, ebp
>     pop     ebp
>     ret     0
> $LN23@unicode_co@2:
>
> ; 10436: }
> ; 10437:
> ; 10438: return (len1 < len2) ? -1 : (len1 != len2);
>
>     xor     eax, eax
>     cmp     edi, ecx
>     pop     edi
>     pop     esi
>     setne   al
>     pop     ebx
>
> ; 10439: }
>
>     mov     esp, ebp
>     pop     ebp
>     ret     0
> _unicode_compare ENDP
>
> Neil



--
DaveA
 
Roy Smith
04-03-2013
In article <(E-Mail Removed) >,
Neil Hodgson <(E-Mail Removed)> wrote:

> Roy Smith:
>
> > On the other hand, how long did it take you to do the directory tree
> > walk required to find those million paths? I'll bet a lot longer than
> > 0.78 seconds, so this gets lost in the noise.

>
> About 2 minutes. But that's just getting an example data set. Other
> data sets may be loaded more quickly from databases or files or be
> created by processing. Reading the example data from a file takes around
> the same time as sorting.


Fair enough. In fact, given that reading the file from disk is O(n) and
sorting it is O(n log n), at some point, the sort will totally swamp the
input time. Your original example just happened to be one of the
unusual cases where the sort time is not the rate-limiting factor in the
overall process.
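
One way to see where that crossover falls for a particular data set is
to time the two steps separately; a rough sketch (my code, reusing the
filelist.txt from earlier in the thread):

import time

t0 = time.time()
with open("filelist.txt", encoding="utf-8") as f:
    paths = f.read().split("\n")
t1 = time.time()
paths.sort()   # O(n log n) comparisons vs. the O(n) read above
t2 = time.time()
print("read: %.3fs  sort: %.3fs" % (t1 - t0, t2 - t1))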

I remember reading somewhere that more CPU cycles in the entire history
of computing have been spent doing sorting than anything else.
 
Roy Smith
04-03-2013
In article <(E-Mail Removed)>,
Chris Angelico <(E-Mail Removed)> wrote:

> On Wed, Apr 3, 2013 at 3:03 PM, Neil Hodgson <(E-Mail Removed)> wrote:
> > rusi wrote:
> > "Every program attempts to expand until it can read mail. Those programs
> > which cannot so expand are replaced by ones which can."

>
> In my personal experience, it's calculators. I put command-line
> calculators into *everything*... often in the form of more general
> executors, and thus restricted to admins, but it's still a calculator.
>
> For some reason, the ability to type "calc 1+2" and get back 3 is very
> satisfying to me. You know, in case I ever forget what one plus two
> makes.


I discovered recently that Spotlight (the OSX built-in search engine)
can do this.
 
Chris Angelico
04-03-2013
On Thu, Apr 4, 2013 at 12:25 AM, Roy Smith <(E-Mail Removed)> wrote:
>
> Fair enough. In fact, given that reading the file from disk is O(n) and
> sorting it is O(n log n), at some point, the sort will totally swamp the
> input time.


But given the much larger fixed cost of disk access, that might take
an awful lot of strings...

ChrisA
 
Chris Angelico
04-03-2013
On Thu, Apr 4, 2013 at 12:28 AM, Roy Smith <(E-Mail Removed)> wrote:
> In article <(E-Mail Removed)>,
> Chris Angelico <(E-Mail Removed)> wrote:
>
>> On Wed, Apr 3, 2013 at 3:03 PM, Neil Hodgson <(E-Mail Removed)> wrote:
>> > rusi wrote:
>> > "Every program attempts to expand until it can read mail. Those programs
>> > which cannot so expand are replaced by ones which can."

>>
>> In my personal experience, it's calculators. I put command-line
>> calculators into *everything*... often in the form of more general
>> executors, and thus restricted to admins, but it's still a calculator.
>>
>> For some reason, the ability to type "calc 1+2" and get back 3 is very
>> satisfying to me. You know, in case I ever forget what one plus two
>> makes.

>
> I discovered recently that Spotlight (the OSX built-in search engine)
> can do this.


Good feature, not surprising. Google Search has had that feature for a
while, and it just "feels right" to be able to look up information the
same way regardless of its source.

ChrisA
 
Roy Smith
04-03-2013
In article <515be00e$0$29891$c3e8da3$(E-Mail Removed) om>,
Steven D'Aprano <(E-Mail Removed)> wrote:

> On Wed, 03 Apr 2013 18:24:25 +1100, Chris Angelico wrote:
>
> > On Wed, Apr 3, 2013 at 6:06 PM, Ian Kelly <(E-Mail Removed)> wrote:
> >> On Wed, Apr 3, 2013 at 12:52 AM, Chris Angelico <(E-Mail Removed)>
> >> wrote:
> >>> Hmm. I was about to say "Can you just do a quick collections.Counter()
> >>> of the string widths in 3.3, as an easy way of seeing which ones use
> >>> BMP or higher characters", but I can't find a simple way to query a
> >>> string's width. Can't see it as a method of the string object, nor in
> >>> the string or sys modules. It ought to be easy enough at the C level -
> >>> just look up the two bits representing 'kind' - but I've not found it
> >>> exposed to Python. Is there anything?
> >>
> >> 4 if max(map(ord, s)) > 0xffff else 2 if max(map(ord, s)) > 0xff else 1

> >
> > Yeah, that's iterating over the whole string (twice, if it isn't width
> > 4).

>
> Then don't write it as a one-liner
>
> n = max(map(ord, s))
> 4 if n > 0xffff else 2 if n > 0xff else 1


This has to inspect the entire string, no? I posted (essentially) this
a few days ago:

if all(ord(c) <= 0xffff for c in s):
    return "it's all bmp"
else:
    return "it's got astral crap in it"

I'm reasonably sure all() is smart enough to stop at the first False
value.
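
It is; all() consumes its iterable lazily and returns False as soon as
it sees a false element. A quick demonstration (my example, not from
the thread):

def check(ch):
    print("checking", hex(ord(ch)))
    return ord(ch) <= 0xffff

# Stops at the first astral character; 'c' and 'd' are never examined.
print(all(check(c) for c in "ab\U0001F600cd"))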


> (sys.getsizeof(s) - sys.getsizeof(''))/len(s)
>

I wouldn't trust getsizeof() to return exactly what you're looking for.
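
For what it's worth, a quick probe of the three string kinds (my
sketch; the exact byte counts vary across builds and versions, which
is part of the problem):

import sys

for s in ("abcd", "abc\u0100", "abc\U0001F600"):
    # stored as 1, 2 and 4 bytes per character respectively under PEP 393
    print(repr(s), sys.getsizeof(s))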
 