Performance of int/long in Python 3

 
 
rusi
 
      04-02-2013
On Apr 2, 8:17 pm, Ethan Furman <(E-Mail Removed)> wrote:

> Simmons (too many Steves!), I know you're new so don't have all the history with jmf that many
> of us do, but consider that the original post was about numbers, had nothing to do with
> characters or unicode *in any way*, and yet jmf still felt the need to bring unicode up.


Just for reference, here is the starting para of Chris' original mail
that started this thread.

> The Python 3 merge of int and long has effectively penalized
> small-number arithmetic by removing an optimization. As we've seen
> from PEP 393 strings (jmf aside), there can be huge benefits from
> having a single type with multiple representations internally. Is
> there value in making the int type have a machine-word optimization in
> the same way?


i.e. it mentions numbers, strings, PEP 393 *AND jmf*. So while it is
true that jmf has been butting into completely unrelated threads with
his trollish unicode rants, the same cannot be said of this thread.
 
 
 
 
 
rusi
 
      04-02-2013
On Apr 2, 8:12 pm, jmfauth <(E-Mail Removed)> wrote:
>
> Sorry, I never claimed this; I'm just seeing how Python is becoming
> less Unicode friendly.


jmf: I suggest you try to use less emotionally loaded and more precise
language if you want people to pay heed to your technical observations/
contributions.
In particular, while you say unicode, your examples always (as far as
I remember) refer to the BMP.
Also, words like 'friendly' are so emotionally charged that people stop
being friendly.

So may I suggest that you rephrase your complaint as
"I am seeing that Python is becoming poorly performant on BMP chars at the
expense of correct support for the whole (Unicode 6.0?) character set"

(assuming that's what you want to say).

In any case, PLEASE note that 'performant' and 'correct' are different
for most practical purposes.
If you don't respect this distinction, people are unlikely to pay heed to
your complaints.
 
 
 
 
 
jmfauth
 
      04-02-2013
On 2 avr, 18:57, rusi <(E-Mail Removed)> wrote:
> i.e. it mentions numbers, strings, PEP 393 *AND jmf*. So while it is
> true that jmf has been butting into completely unrelated threads with
> his trollish unicode rants, the same cannot be said of this thread.


-----

That's because you did not understand the analogy, int/long <-> FSR.

Another illustration:

>>> def AddOne(i):
...     if 0 < i <= 100:
...         return i + 10 + 10 + 10 - 10 - 10 - 10 + 1
...     elif 100 < i <= 1000:
...         return i + 100 + 100 + 100 + 100 - 100 - 100 - 100 - 100 + 1
...     else:
...         return i + 1
...

Does it work? Yes.
Is it "correct"? That can be discussed.

Now replace i with a char, a representative of each "subset"
of the FSR, select a method where the FSR behaves badly,
and take a look at what happens.


>>> timeit.repeat("'a' * 1000 + 'z'")

[0.6532032148133153, 0.6407248807756699, 0.6407264561239894]
>>> timeit.repeat("'a' * 1000 + '9'")

[0.6429508479509245, 0.6242782443215589, 0.6240490311410927]
>>>


>>> timeit.repeat("'a' * 1000 + '€'")

[1.095694927496563, 1.0696347279235603, 1.0687741939041082]
>>> timeit.repeat("'a' * 1000 + 'ẞ'")

[1.0796421281222877, 1.0348612767961853, 1.035325216876231]
>>> timeit.repeat("'a' * 1000 + '\u2345'")

[1.0855414137412112, 1.0694677410017164, 1.0688096392412945]
>>>


>>> timeit.repeat("'œ' * 1000 + '\U00010001'")

[1.237314015362017, 1.2226262553064657, 1.21994619397816]
>>> timeit.repeat("'œ' * 1000 + '\U00010002'")

[1.245773635836997, 1.2303978424029651, 1.2258257877430765]
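
For reference, a minimal sketch (assuming CPython 3.3+ with PEP 393 strings) of what these timings exercise: the per-code-point storage width can be estimated by differencing sys.getsizeof(), which exposes the 1-, 2- and 4-byte internal representations.

import sys

def char_width(s):
    # Doubling the string adds len(s) more code points at the same width,
    # so the size difference divided by len(s) gives bytes per code point.
    return (sys.getsizeof(s * 2) - sys.getsizeof(s)) / len(s)

for sample in ('a' * 1000,             # ASCII: 1 byte per code point
               '€' * 1000,             # BMP beyond U+00FF: 2 bytes per code point
               '\U00010001' * 1000):   # astral plane: 4 bytes per code point
    print(repr(sample[0]), char_width(sample))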

Where does this come from? Simple: the FSR breaks the
simple rules used in all coding schemes (Unicode or not):
1) a unique set of chars;
2) the "same" algorithm for all chars.

And again, that is why utf-8 works very smoothly.

The "corporates" which understood this very well and
wanted to incorporate, let say, the used characters
of the French language had only the choice to
create new coding schemes (eg mac-roman, cp1252).

In unicode, the "latin-1" range is real plague.

After years of experience, I'm still fascinated to see that
the corporates have solved this issue easily while "free
software" is still relying on latin-1.
I have never succeeded in finding an explanation.

Even the TeX folks, when they shifted to the Cork
encoding in 199?, were aware of this and consequently
provided special package(s).

No offense, but this is in my mind why "corporate software"
will always be "corporate software" and "hobbyist software"
will always stay at the level of "hobbyist software".

A French Windows user, understanding nothing about the
coding of characters, assuming he is even aware of its
existence (!), certainly has no problem.


Fascinating how it is possible to use Python to teach,
to illustrate, to explain the coding of characters. No?


jmf

 
 
rusi
 
      04-02-2013
On Apr 2, 11:22 pm, jmfauth <(E-Mail Removed)> wrote:
> Where does this come from? Simple: the FSR breaks the
> simple rules used in all coding schemes (Unicode or not):
> 1) a unique set of chars;
> 2) the "same" algorithm for all chars.


Can you give me a source for this requirement?
Numbers are, after all, numbers. So we should use the same code/
algorithms/machine-instructions for floating-point and integers?

>
> And again, that is why utf-8 works very smoothly.


How wonderful. Here's a suggestion:
code up UTF-8 and any of the Python string representations in C and profile
them.
Please come back and tell us if UTF-8 outperforms any of the Python
representations for strings on any operation (except straight copy).
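
To make the comparison concrete, here is a rough sketch (not a real C profile, and the helper name is invented) of one operation where a fixed-width representation like PEP 393 wins: random indexing. Finding the n-th code point in a UTF-8 buffer needs a linear scan, while indexing a Python 3.3 str is constant time.

import timeit

text = '€' * 10000               # stored as 2 bytes per code point under PEP 393
utf8 = text.encode('utf-8')      # 3 bytes per code point in UTF-8

def nth_codepoint_offset(buf, n):
    """Byte offset of the n-th code point: scan, counting non-continuation bytes."""
    count = -1
    for i, b in enumerate(buf):
        if b & 0xC0 != 0x80:     # a lead byte (not of the form 10xxxxxx)
            count += 1
            if count == n:
                return i
    raise IndexError(n)

print(timeit.timeit(lambda: text[9000], number=1000))                     # O(1) per lookup
print(timeit.timeit(lambda: nth_codepoint_offset(utf8, 9000), number=1000))  # O(n) per lookup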

You troll with eclat and elan!
 
 
Ian Kelly
 
      04-02-2013
On Tue, Apr 2, 2013 at 3:20 AM, jmfauth <(E-Mail Removed)> wrote:
> It is somehow funny to see, the FSR "fails" precisely
> on problems Unicode will solve/handle, eg normalization or
> sorting [3].


Neither of these problems has anything to do with the FSR. Can you
give us an example of normalization or sorting where Python 3.3 fails
and Python 3.2 does not?
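
For what it's worth, a minimal sketch of the kind of check I have in mind (my own example, not a counter-example from jmf): normalization goes through unicodedata and behaves identically on 3.2 and 3.3, since the FSR only changes how code points are stored.

import unicodedata

s = 'e\u0301'                                        # 'e' followed by a combining acute accent
print(unicodedata.normalize('NFC', s))               # 'é' (U+00E9), same on 3.2 and 3.3
print(unicodedata.normalize('NFC', s) == '\u00e9')   # True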

> [3] I only test and tested these "chars" blindly with the help
> of the doc I have. Btw, when I test complicated "Arabic chars",
> I noticed Py33 "crashes"; it does not really crash, it gets stuck
> in some kind of infinite loop (or is it due to "timeit"?).


Without knowing what the actual test you ran was, we have no way
of answering that. Unless you give us more detail, my assumption
would be that the number of repetitions you passed to timeit was
excessively large for the particular test case.
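
As an illustration (the statement below is a stand-in, since the actual test was never posted): timeit.repeat defaults to one million executions per repeat, so an expensive statement can look like a hang; passing number= explicitly keeps the run short.

import timeit

stmt = "'\u0627\u0644\u0633\u0644\u0627\u0645' * 1000 + '\u0639'"   # hypothetical "Arabic chars" case
print(timeit.repeat(stmt, number=10000, repeat=3))                  # bounded run instead of the default 1,000,000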

> [4] Am I the only one who test this kind of stuff?


No, you're just the only one who considers it important.
Micro-benchmarks like the ones you have been reporting are *useful*
when it comes to determining what operations can be better optimized,
but they are not *important* in and of themselves. What is important
is that actual, real-world programs are not significantly slowed by
these kinds of optimizations. Until you can demonstrate that real
programs are adversely affected by PEP 393, there is not in my opinion
any regression that is worth worrying over.
 
 
Terry Jan Reedy
 
      04-02-2013
On 4/2/2013 11:12 AM, jmfauth wrote:
> On 2 avr, 16:03, Steven D'Aprano <steve (E-Mail Removed)> wrote:


>> I'm sure you didn't intend to be insulting, but some of us *have* taken
>> JMF seriously, at least at first. His repeated overblown claims of how
>> Python is destroying Unicode ...


... = "usability in Python" or some variation on that.

> Sorry, I never claimed this; I'm just seeing how Python is becoming
> less Unicode friendly.


Let us see what Jim has claimed, starting in 2012 August.

http://mail.python.org/pipermail/pyt...st/628826.html
"Devs are developing sophisticed tools based on a non working basis."

http://mail.python.org/pipermail/pyt...st/629514.html
"This "Flexible String Representation" fails."

http://mail.python.org/pipermail/pyt...st/629554.html
"This flexible representation is working absurdly."

Readers can decide whether 'non-working', 'fails', and 'working absurdly' are
closer to 'destroying Unicode usability' or just 'less friendly'.

On speed:

http://mail.python.org/pipermail/pyt...st/628781.html
"Python 3.3 is "slower" than Python 3.2."

http://mail.python.org/pipermail/pyt...st/628762.html
"I can open IDLE with Py 3.2 ou Py 3.3 and compare strings
manipulations. Py 3.3 is always slower. Period."

False. Period. Here is my followup at the time,
http://mail.python.org/pipermail/python-list/2012-August/628779.html
"You have not tried enough tests.

On my Win7-64 system:
from timeit import timeit

print(timeit(" 'a'*10000 "))
3.3.0b2: .5
3.2.3: .8

print(timeit("c in a", "c = ''; a = 'a'*10000"))
3.3: .05 (independent of len(a)!)
3.2: 5.8 (100 times slower! Increase len(a) and the ratio can be made as
high as one wants!)

print(timeit("a.encode()", "a = 'a'*1000"))
3.2: 1.5
3.3: .26"

If one runs stringbench.py with its 40 or so tests, 3.2 is sometimes
faster and 3.3 is sometimes faster.


On to September:

"http://mail.python.org/pipermail/python-list/2012-September/630736.html"
"Avoid Py3.3"

In other words, ignore all the benefits and reject the release because a
couple of selected microbenchmarks show a slowdown.

http://mail.python.org/pipermail/pyt...er/631730.html
"Py 3.3 succeeded to somehow kill unicode"

I will stop here and let Jim explain how 'kill unicode' is different
from 'destroy unicode'.

--
Terry Jan Reedy


 
 
Joshua Landau
 
      04-02-2013
The initial post posited:
"The Python 3 merge of int and long has effectively penalized
small-number arithmetic by removing an optimization. As we've seen
from PEP 393 strings (jmf aside), there can be huge benefits from
having a single type with multiple representations internally. Is
there value in making the int type have a machine-word optimization in
the same way?"

Thanks to the fervent response jmf has gotten, the point above has been
mostly abandoned. May I request that next time such an obvious diversion
(aka jmf) occurs, responses happen in a different thread?
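
For anyone who wants to return to that point, a minimal sketch of the comparison the quoted paragraph invites (run it under both a Python 2 and a Python 3 interpreter; no numbers are claimed here):

import timeit

# Small-number arithmetic that stays well inside a machine word.
print(timeit.timeit('x + 1', setup='x = 1000', number=10000000))
print(timeit.timeit('x * x', setup='x = 1000', number=10000000))
# For contrast, arithmetic that is a bignum on every version.
print(timeit.timeit('x + 1', setup='x = 10**30', number=10000000))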

 
 
Lele Gaifax
 
      04-02-2013
jmfauth <(E-Mail Removed)> writes:

> Now replace i with a char, a representative of each "subset"
> of the FSR, select a method where the FSR behaves badly,
> and take a look at what happens.


You insist on cherry-picking a single "method where the FSR behaves
badly", even when it is so obviously a corner case (IMHO it is not
really a common case to have relatively big chunks of ASCII
characters to which you are adding one single non-ASCII char...)

Anyway, these are my results on the opposite case, where you have a big
chunk of non-ASCII characters and a single ASCII char added:

Python 2.7.3 (default, Jan 2 2013, 13:56:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.repeat("'€' * 1000 + 'z'")
[0.2817099094390869, 0.2811391353607178, 0.2811310291290283]
>>> timeit.repeat("u'œ' * 1000 + u'\U00010001'")
[0.549591064453125, 0.5502040386199951, 0.5490291118621826]
>>> timeit.repeat("u'\U00010001' * 1000 + u'œ'")
[0.3823568820953369, 0.3823089599609375, 0.3820679187774658]
>>> timeit.repeat("u'\U00010002' * 1000 + 'a'")
[0.45046305656433105, 0.45000195503234863, 0.44980502128601074]

Python 3.3.0 (default, Mar 18 2013, 12:00:52)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.repeat("'€' * 1000 + 'z'")
[0.23264244200254325, 0.23299441300332546, 0.2325888039995334]
>>> timeit.repeat("'œ' * 1000 + '\U00010001'")
[0.3760241370036965, 0.37552819900156464, 0.3755163860041648]
>>> timeit.repeat("'\U00010001' * 1000 + 'œ'")
[0.28259182300098473, 0.2825558360054856, 0.2824251129932236]
>>> timeit.repeat("'\U00010002' * 1000 + 'a'")
[0.28227063300437294, 0.2815949220021139, 0.2829978369991295]

IIUC, while it may be true that Py3 is slightly slower than Py2 when the
string operation involves an internal representation change (all your
examples, and the second operation above), in the much more common case
it is considerably faster. This, and the fact that Py3 actually handles
the whole Unicode space without glitches, make it a better environment
in my eyes. Kudos to the core team!
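
A small aside (my own illustration, not a benchmark): one concrete "whole Unicode space" win of 3.3 is that astral characters are single code points again, where narrow 2.x/3.2 builds exposed surrogate pairs.

s = '\U00010001'
print(len(s))       # 1 on 3.3+ (and on wide builds); 2 on narrow builds
print(s[0] == s)    # True on 3.3+; on narrow builds s[0] is just the high surrogate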

Just my 0.45-0.28 cents,
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
(E-Mail Removed) | -- Fortunato Depero, 1929.

 
 
rusi
 
      04-03-2013
On Apr 3, 8:31 am, Neil Hodgson <(E-Mail Removed)> wrote:

> Sorting a million string list (all the file paths on a particular
> computer) went from 0.4 seconds with Python 3.2 to 0.78 with 3.3 so
> we're out of the 'not noticeable by humans' range. Perhaps this is still
> a 'micro-benchmark' - I'd just like to avoid adding email access to get
> this over the threshold.


What does that last statement mean?
 
 
Roy Smith
 
      04-03-2013
In article
<(E-Mail Removed)>,
rusi <(E-Mail Removed)> wrote:

> On Apr 3, 8:31 am, Neil Hodgson <(E-Mail Removed)> wrote:
>
> > Sorting a million string list (all the file paths on a particular
> > computer) went from 0.4 seconds with Python 3.2 to 0.78 with 3.3 so
> > we're out of the 'not noticeable by humans' range.


On the other hand, how long did it take you to do the directory tree
walk required to find those million paths? I'll bet a lot longer than
0.78 seconds, so this gets lost in the noise.

Still, it is unfortunate if sort performance got hurt significantly. My
mind was blown a while ago when I discovered that Python could sort a
file of strings faster than the Unix command-line sort utility. That's
pretty impressive.
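
A hedged sketch of how one could reproduce the kind of measurement Neil describes without touching the file system (the paths below are synthetic, so only the sort itself is timed, which also sidesteps the directory-walk cost above):

import random
import time

# A synthetic million-entry path list; real file paths would come from os.walk().
paths = ['/usr/share/doc/pkg%06d/README' % i for i in range(1000000)]
random.shuffle(paths)

start = time.time()
paths.sort()
print('sorted %d paths in %.2f seconds' % (len(paths), time.time() - start))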
 