Performance of int/long in Python 3

 
 
rurpy@yahoo.com
 
      03-29-2013
On 03/28/2013 02:31 PM, Ethan Furman wrote:
> On 03/28/2013 12:54 PM, (E-Mail Removed) wrote:
>> On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
>> For someone who delights in pointing out the logical errors of
>> others you are often remarkably sloppy in your own logic.
>>
>> Of course language can be both helpful and excessively strong. That
>> is the case when language less strong would be equally or more
>> helpful.

>
> It can also be the case when language less strong would be useless.


I don't get your point.
I was pointing out the fallacy in Steven's logic (which you cut).
How is your statement relevant to that?

>> Further, "liar" is both so non-objective and so pejoratively
>> emotive that it is a word much more likely to be used by someone
>> interested in trolling than in a serious discussion, so most
>> sensible people here likely would not bite.

>
> Non-objective? If today poster B says X, and tomorrow poster B says
> s/he was unaware of X until just now, is not "liar" a reasonable
> conclusion?


Of course not. People forget what they posted previously, change
their mind, don't express what they intended perfectly, sometimes
express a complex thought that the reader inaccurately perceives
as contradictory, don't realize themselves that their thinking
is contradictory, ...
And of course who among us is *not* a "liar", since we all lie from
time to time?

Lying involves intent to deceive. I haven't been following jmfauth's
claims since they are not of interest to me, but going back and quickly
looking at the posts that triggered the "liar" and "idiot" posts, I
did not see anything that made me think that jmfauth was not sincere
in his beliefs. Being wrong and being sincere are not exclusive.
Nor did Steven even try to justify the "liar" claim. As to Mark
Lawrence, that seemed like a pure "I don't like you" insult whose
proper place is /dev/null.

Even if the odds are 80% that the person is lying, why risk your
own credibility by making a nearly impossible to substantiate claim?
Someone may praise some company's product constantly online and be
discovered to be a salesperson at that company. Most of the time
you would be right to accuse the person of dishonesty. But I knew
a person who was very young and naive, who really believed in the
product and truly didn't see anything wrong in doing that. That
doesn't make it good behavior but those who claimed he was hiding
his identity for personal gain were wrong (at least as far as I
could tell, knowing the person personally.) Just post the facts
and let people draw their own conclusions; that's better than making
aggressive and offensive claims that can never be proven.

Calling people liars or idiots not only damages the reputation of
the Python community in general [*1] but hurts your own credibility
as well, since any sensible reader will wonder if other opinions
you post are more influenced by your emotions than by your intelligence.

>>> I hope that we all agree that we want a nice, friendly,
>>> productive community where everyone is welcome.

>>
>> I hope so too but it is likely that some people want a place to
>> develop and assert some sense of influence, engage in verbal duels,
>> instigate arguments, etc. That can be true of regulars here as
>> well as drive-by posters.
>>
>>> But some people simply cannot or will not behave in ways that are
>>> compatible with those community values. There are some people
>>> whom we *do not want here*

>>
>> In other words, everyone is NOT welcome.

>
> Correct. Do you not agree?


Don't ask me, ask Steven. He was the one who wrote two sentences
earlier, "...we want a...community where everyone is welcome."

I'll snip the rest of your post because it is your opinions
and I've already said why I disagree. Most people are smart enough
to make their own evaluations of posters here and if they are not,
and reject python based on what they read from a single poster
who obviously has "strong" views, then perhaps that's for the
best. That possibility (which I think is very close to zero) is
a tiny price to pay to avoid all the hostility and noise.

----
[*1] See for example the blog post at
http://joepie91.wordpress.com/2013/0...ould-feel-bad/
which was recently discussed in this list and in which the
author wrote, "the community around Python is one of the most
hostile and unhelpful communities around any programming-related
topic that I have ever seen".
 
 
Ethan Furman
 
      03-29-2013
On 03/29/2013 02:26 PM, (E-Mail Removed) wrote:
> On 03/28/2013 02:31 PM, Ethan Furman wrote:
>> On 03/28/2013 12:54 PM, (E-Mail Removed) wrote:
>>> On 03/28/2013 01:48 AM, Steven D'Aprano wrote:
>>> For someone who delights in pointing out the logical errors of
>>> others you are often remarkably sloppy in your own logic.
>>>
>>> Of course language can be both helpful and excessively strong. That
>>> is the case when language less strong would be equally or more
>>> helpful.

>>
>> It can also be the case when language less strong would be useless.

>
> I don't get your point.
> I was pointing out the fallacy in Steven's logic (which you cut).
> How is your statement relevant to that?


Ah. I thought you were saying that in all cases helpful strong language would be even more helpful if less strong.


>>> Further, "liar" is both so non-objective and so pejoratively
>>> emotive that it is a word much more likely to be used by someone
>>> interested in trolling than in a serious discussion, so most
>>> sensible people here likely would not bite.

>>
>> Non-objective? If today poster B says X, and tomorrow poster B says
>> s/he was unaware of X until just now, is not "liar" a reasonable
>> conclusion?

>
> Of course not. People forget what they posted previously, change
> their mind, don't express what they intended perfectly, sometimes
> express a complex thought that the reader inaccurately perceives
> as contradictory, don't realize themselves that their thinking
> is contradictory, ...


I agree, which is why I resisted my own impulse to call him a liar; however, he has been harping on this subject for
months now, so I would be surprised if he actually was surprised and had forgotten...


> Lying involves intent to deceive. I haven't been following jmfauth's
> claims since they are not of interest to me, but going back and quickly
> looking at the posts that triggered the "liar" and "idiot" posts, I
> did not see anything that made me think that jmfauth was not sincere
> in his beliefs. Being wrong and being sincere are not exclusive.
> Nor did Steven even try to justify the "liar" claim. As to Mark
> Lawrence, that seemed like a pure "I don't like you" insult whose
> proper place is /dev/null.


After months of jmf's antagonistic posts, I don't blame them.

>>>> I hope that we all agree that we want a nice, friendly,
>>>> productive community where everyone is welcome.
>>>
>>> I hope so too but it is likely that some people want a place to
>>> develop and assert some sense of influence, engage in verbal duels,
>>> instigate arguments, etc. That can be true of regulars here as
>>> well as drive-by posters.
>>>
>>>> But some people simply cannot or will not behave in ways that are
>>>> compatible with those community values. There are some people
>>>> whom we *do not want here*
>>>
>>> In other words, everyone is NOT welcome.

>>
>> Correct. Do you not agree?

>
> Don't ask me, ask Steven. He was the one who wrote two sentences
> earlier, "...we want a...community where everyone is welcome."


Ah, right -- missed that!

--
~Ethan~
 
 
jmfauth
 
      03-31-2013
------

Neil Hodgson:

"The counter-problem is that a French document that needs to include
one mathematical symbol (or emoji) outside Latin-1 will double in size
as a Python string."

Serious developers/typographers/users know that you can not compose
a text in French with "latin-1". This is now also the case with
German (Germany).

---

Neil's comment is correct,

>>> import sys
>>> sys.getsizeof('a' * 1000 + 'z')
1026
>>> sys.getsizeof('a' * 1000 + '€')
2040

This is not really the problem. "Serious users" may
notice sooner or later, Python and Unicode are walking in
opposite directions (technically and in spirit).

>>> import timeit
>>> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
>>> timeit.repeat("'a' * 1000 + 'z'")
[0.6362570846925735, 0.6159128762502917, 0.6200501673623791]


(Just an opinion)

jmf
 
 
Steven D'Aprano
 
      03-31-2013
On Sun, 31 Mar 2013 00:35:23 -0700, jmfauth wrote:


> This is not really the problem. "Serious users" may notice sooner or
> later, Python and Unicode are walking in opposite directions
> (technically and in spirit).
>
> >>> timeit.repeat("'a' * 1000 + 'ẞ'")
> [1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
> >>> timeit.repeat("'a' * 1000 + 'z'")
> [0.6362570846925735, 0.6159128762502917, 0.6200501673623791]


Perhaps you should stick to Python 3.2, where ASCII strings are no faster
than non-ASCII strings.


Python 3.2 versus Python 3.3, no significant difference:

# 3.2
py> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.7418999671936035, 1.7198870182037354, 1.763346004486084]

# 3.3
py> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.8083378580026329, 1.818592812011484, 1.7922867869958282]



Python 3.2, ASCII vs Non-ASCII:

py> timeit.repeat("'a' * 1000 + 'z'")
[1.756322135925293, 1.8002049922943115, 1.721085958480835]
py> timeit.repeat("'a' * 1000 + 'ẞ'")
[1.7209150791168213, 1.7162668704986572, 1.7260780334472656]



In other words, if you stick to non-ASCII strings, Python 3.3 is no
slower than Python 3.2.
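
For anyone who wants to repeat these measurements, here is a minimal sketch. The names `STATEMENTS` and `best_times`, and the `repeat`/`number` values, are illustrative assumptions and not anything used in the thread. The operands are bound in `setup` rather than written as literals, because some CPython versions may precompute constant expressions, which would leave nothing to time.

```python
import timeit

# Operands are created in setup so the interpreter cannot fold the whole
# expression into a constant at compile time.
STATEMENTS = {
    "ascii": ("n = 1000; tail = 'z'", "'a' * n + tail"),
    # U+1E9E (capital sharp s) lies outside Latin-1, as in the thread's example.
    "non_latin1": ("n = 1000; tail = '\u1e9e'", "'a' * n + tail"),
}


def best_times(repeat=3, number=100_000):
    """Return the best (minimum) total time in seconds for each statement."""
    return {
        name: min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))
        for name, (setup, stmt) in STATEMENTS.items()
    }


if __name__ == "__main__":
    for name, seconds in best_times().items():
        print(f"{name}: {seconds:.4f} s")
```

Taking the minimum of the repeats is the usual way to reduce noise from other processes. The absolute numbers depend on the machine and the Python build, so only the ratio between the two cases on the same interpreter is meaningful.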



--
Steven
 
 
Mark Lawrence
 
      03-31-2013
On 31/03/2013 08:35, jmfauth wrote:
> ------
>
> Neil Hodgson:
>
> "The counter-problem is that a French document that needs to include
> one mathematical symbol (or emoji) outside Latin-1 will double in size
> as a Python string."
>
> Serious developers/typographers/users know that you can not compose
> a text in French with "latin-1". This is now also the case with
> German (Germany).
>
> ---
>
> Neil's comment is correct,
>
> >>> sys.getsizeof('a' * 1000 + 'z')
> 1026
> >>> sys.getsizeof('a' * 1000 + '€')
> 2040
>
> This is not really the problem. "Serious users" may
> notice sooner or later, Python and Unicode are walking in
> opposite directions (technically and in spirit).
>
> >>> timeit.repeat("'a' * 1000 + 'ẞ'")
> [1.1088995672090292, 1.0842266613261913, 1.1010779011941594]
> >>> timeit.repeat("'a' * 1000 + 'z'")
> [0.6362570846925735, 0.6159128762502917, 0.6200501673623791]
>
>
> (Just an opinion)
>
> jmf
>


I'm feeling very sorry for this horse, it's been flogged so often it's
down to bare bones.

--
If you're using GoogleCrap™ please read this
http://wiki.python.org/moin/GoogleGroupsPython.

Mark Lawrence

 
 
rusi
 
      04-01-2013
On Mar 31, 5:55 pm, Mark Lawrence <(E-Mail Removed)> wrote:

<snipped jmf's broken-record whine>

> I'm feeling very sorry for this horse, it's been flogged so often it's
> down to bare bones.


While I am now joining the camp of those fed up with jmf's whining, I
do wonder if we are shooting the messenger…

From a recent Roy mysqldb-unicode thread:
> My unicode-fu is a bit weak. Are we looking at a Python problem, a
> MySQLdb problem, or a problem with the underlying MySQL server? We've
> certainly inserted utf-8 data before without any problems. It's
> possible this is the first time we've tried to handle a character
> outside the BMP.

:
:
> OK, that leads to the next question. Is there any way I can (in Python
> 2.7) detect when a string is not entirely in the BMP? If I could find
> all the non-BMP characters, I could replace them with U+FFFD
> (REPLACEMENT CHARACTER) and life would be good (enough).


Steven's:
> But it means that if you're one of the 99.9% of users who mostly use
> characters in the BMP,


And from http://www.tlg.uci.edu/~opoudjis/uni...de_astral.html
> The informal name for the supplementary planes of Unicode is "astral planes", since
> (especially in the late '90s) their use seemed to be as remote as
> the theosophical "great beyond".
> As of this writing for instance, Dreamweaver MX for MacOSX (which I am currently using
> to prepare this) will let you paste BMP text into its WYSIWYG window; but pasting
> Supplementary Plane text there will make it crash.


So I really wonder: Is python losing more by supporting SMP with
performance hit on BMP?
The problem as I see it is that a choice that is sufficiently skew is
no more a straightforward choice. An example will illustrate:

I can choose to drive or not -- a choice.
Statistics tell me that on average there are 3 fatalities every day; I
am very concerned that I could get killed so I choose not to drive.
Which neglects that there are a couple of million safe-drives at the
same time as the '3 fatalities'

[What, if anything, this has to do with jmf's rants I don't know, because
I don't know if anyone (including jmf) knows what he is ranting about.]
 
 
Ian Kelly
 
      04-01-2013
On Sun, Mar 31, 2013 at 11:33 PM, rusi <(E-Mail Removed)> wrote:
>
> So I really wonder: Is python losing more by supporting SMP with
> performance hit on BMP?


I don't believe so. Although performance is undeniably worse for some
benchmarks, it is also better for some others. Nobody has yet
demonstrated an actual, real-world program that is affected negatively
by the Unicode change. All of jmf's complaints amount to
cherry-picking data and leaping to conclusions.
 
 
Chris Angelico
 
      04-01-2013
On Mon, Apr 1, 2013 at 4:33 PM, rusi <(E-Mail Removed)> wrote:
> So I really wonder: Is python losing more by supporting SMP with
> performance hit on BMP?


If your strings fit entirely within the BMP, then you should see no
penalty compared to previous versions of Python. If they happen to fit
inside ASCII, then there may well be significant improvements. But
regardless, what you gain is the ability to work with *any* string,
regardless of its content, without worrying about it. You can count
characters regardless of their content. Imagine if a tuple of integers
behaved differently if some of those integers flipped to being long
ints:

x = (1, 2, 4, 8, 1<<30, 1<<300, 1<<10)

Wouldn't you be surprised if len(x) returned 8? I certainly would be.
And that's what a narrow build of Python does with Unicode.

Unicode strings are approximately comparable to tuples of integers. In
fact, they can be interchanged fairly readily:

string = "Treble clef: \U0001D11E"
array = tuple(map(ord,string))
assert(len(array) == 14)
out_string = ''.join(map(chr,array))
assert(out_string == string)

This doesn't work in Python 2.6 on Windows, partly because of
surrogates, but also because chr() isn't designed for Unicode strings.
There's probably a solution to the second, but not really to the
first. The tuple of ords should match the way the characters are laid
out to a human.
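
To make the narrow-build behaviour concrete, here is a small sketch that runs on Python 3.3 or later. It assumes that counting UTF-16 code units is a fair stand-in for what `len()` reported on a narrow build; the helper name `utf16_length` is made up for illustration.

```python
def utf16_length(s):
    """Count UTF-16 code units: each non-BMP character takes two (a surrogate pair)."""
    # "utf-16-le" emits no byte-order mark, so every code unit is exactly 2 bytes.
    return len(s.encode("utf-16-le")) // 2


string = "Treble clef: \U0001D11E"

code_points = len(string)            # what Python 3.3+ reports
code_units = utf16_length(string)    # what a UTF-16 based length would report

print(code_points, code_units)       # 14 15
```

The one-unit difference is the surrogate pair for U+1D11E, which is the same kind of surprise as a seven-element tuple reporting a length of eight.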

ChrisA
 
 
Steven D'Aprano
 
      04-01-2013
On Sun, 31 Mar 2013 22:33:45 -0700, rusi wrote:

> On Mar 31, 5:55 pm, Mark Lawrence <(E-Mail Removed)> wrote:
>
> <snipped jmf's broken-record whine>
>
>> I'm feeling very sorry for this horse, it's been flogged so often it's
>> down to bare bones.

>
> While I am now joining the camp of those fed up with jmf's whining, I do
> wonder if we are shooting the messenger…


No. The trouble is that the messenger is shouting that the Unicode world
is ending on December 21st 2012, and hasn't noticed that was over three
months ago and the world didn't end.


[...]
>> OK, that leads to the next question. Is there any way I can (in Python
>> 2.7) detect when a string is not entirely in the BMP? If I could find
>> all the non-BMP characters, I could replace them with U+FFFD
>> (REPLACEMENT CHARACTER) and life would be good (enough).


Of course you can do this, but you should not. If your input data
includes character C, you should deal with character C and not just throw
it away unnecessarily. That would be rude, and in Python 3.3 it should be
unnecessary.

Although, since the person you are quoting is stuck in Python 2.7, it may
be less bad than having to deal with potentially broken Unicode strings.


> Steven's:
>> But it means that if you're one of the 99.9% of users who mostly use
>> characters in the BMP, …


Yes. "Mostly" does not mean exclusively, and given (say) a billion
computer users, that leaves about a million users who have significant
need for non-BMP characters.

If you don't agree with my estimate, feel free to invent your own


> And from http://www.tlg.uci.edu/~opoudjis/uni...de_astral.html
>> The informal name for the supplementary planes of Unicode is "astral
>> planes", since (especially in the late '90s) their use seemed to be as
>> remote as the theosophical "great beyond". …


That was nearly two decades ago. Two decades ago, the idea that the
entire computing world could standardize on a single character set,
instead of having to deal with dozens of different "code pages", seemed
as likely as people landing on the Moon seemed in 1940.

Today, the entire computing world has standardized on such a system;
"code pages" (encodings) are mostly only needed for legacy data and
shitty applications, but most implementations don't support the entire
Unicode range. A couple of programming languages, including Pike and
Python, support Unicode fully and correctly. Pike has never had the same
high-profile as Python, but now that Python can support the entire
Unicode range without broken surrogate support, maybe users of other
languages will start to demand the same.


> So I really wonder: Is python losing more by supporting SMP with
> performance hit on BMP?


No.

As many people have demonstrated, both with code snippets and whole-
program benchmarks, Python 3.3 is *as fast* or *faster* than Python 3.2
narrow builds. In practice, Python 3.3 saves enough memory by using
sensible string implementations that real world software is faster in
Python 3.3 than in 3.2.
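
The memory saving comes from the flexible string representation that Python 3.3 introduced (PEP 393): each string is stored with 1, 2, or 4 bytes per character, depending on the widest code point it contains. The following is a rough sketch for observing this on CPython 3.3 or later. The helper name `bytes_per_char` is hypothetical, and `sys.getsizeof` results are implementation details that other Python implementations need not share.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage for a string consisting only of `ch`."""
    # Comparing two lengths cancels out the fixed per-object header.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


samples = {
    "ASCII 'a'": "a",
    "Latin-1 U+00E9": "\xe9",
    "BMP U+20AC (euro sign)": "\u20ac",
    "astral U+1D11E (treble clef)": "\U0001D11E",
}

for label, ch in samples.items():
    print(f"{label}: {bytes_per_char(ch)} byte(s) per character")
```

This also accounts for the doubling reported earlier in the thread: adding a single euro sign to an otherwise ASCII string moves the whole string from the 1-byte to the 2-byte representation.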


> The problem as I see it is that a choice that is sufficiently skewed is no
> longer a straightforward choice. An example will illustrate:
>
> I can choose to drive or not -- a choice. Statistics tell me that on
> average there are 3 fatalities every day; I am very concerned that I
> could get killed so I choose not to drive. Which neglects that there are
> a couple of million safe-drives at the same time as the '3 fatalities'


Clear as mud. What does this have to do with supporting Unicode?




--
Steven
 
 
Roy Smith
 
      04-01-2013
In article <515941d8$0$29967$c3e8da3$(E-Mail Removed) om>,
Steven D'Aprano <(E-Mail Removed)> wrote:

> [...]
> >> OK, that leads to the next question. Is there any way I can (in Python
> >> 2.7) detect when a string is not entirely in the BMP? If I could find
> >> all the non-BMP characters, I could replace them with U+FFFD
> >> (REPLACEMENT CHARACTER) and life would be good (enough).

>
> Of course you can do this, but you should not. If your input data
> includes character C, you should deal with character C and not just throw
> it away unnecessarily. That would be rude, and in Python 3.3 it should be
> unnecessary.


The import job isn't done yet, but so far we've processed 116 million
records and had to clean up four of them. I can live with that.
Sometimes practicality trumps correctness.

It turns out, the problem is that the version of MySQL we're using
doesn't support non-BMP characters. Newer versions do (but you have to
declare the column to use the utf8mb4 character set). I could upgrade
to a newer MySQL version, but it's just not worth it.

Actually, I did try spinning up a 5.5 instance (one of the nice things
about being in the cloud) and experimented with that, but couldn't get it
to work there either. I'll admit that I didn't invest a huge amount of
effort to make that work before just writing this:

def bmp_filter(self, s):
    """Filter a unicode string to remove all non-BMP (basic
    multilingual plane) characters. All such characters are
    replaced with U+FFFD (Unicode REPLACEMENT CHARACTER).

    """
    if all(ord(c) <= 0xffff for c in s):
        return s
    else:
        self.logger.warning("making %r BMP-clean", s)
        bmp_chars = [(c if ord(c) <= 0xffff else u'\ufffd') for c in s]
        return ''.join(bmp_chars)
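
For comparison, the same idea can be sketched as standalone functions for Python 3.3 or later, where every non-BMP character is a single code point. This is an assumption-laden variant and not the code used in the import job: the names `has_non_bmp` and `bmp_clean` are invented here, and logging is left out.

```python
import re

# Matches any code point outside the Basic Multilingual Plane.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def has_non_bmp(s):
    """Return True if `s` contains at least one non-BMP character."""
    return _NON_BMP.search(s) is not None


def bmp_clean(s):
    """Replace every non-BMP character in `s` with U+FFFD."""
    return _NON_BMP.sub(REPLACEMENT, s)


print(has_non_bmp("Treble clef: \U0001D11E"))    # True
print(ascii(bmp_clean("Treble clef: \U0001D11E")))
```

On a Python 2.7 narrow build the character class would not behave this way, because non-BMP characters are stored there as surrogate pairs.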
 