String concatenation benchmarking weirdness

 
 
Rotwang
      01-11-2013
Hi all,

the other day I 2to3'ed some code and found it ran much slower in 3.3.0
than 2.7.2. I fixed the problem but in the process of trying to diagnose
it I've stumbled upon something weird that I hope someone here can
explain to me. In what follows I'm using Python 2.7.2 on 64-bit Windows
7. Suppose I do this:

from timeit import timeit

# find out how the time taken to append a character to the end of a byte
# string depends on the size of the string

results = []
for size in range(0, 10000001, 100000):
    results.append(timeit("y = x + 'a'",
                          setup = "x = 'a' * %i" % size, number = 1))

If I plot results against size, what I see is that the time taken
increases approximately linearly with the size of the string, with the
string of length 10000000 taking about 4 milliseconds. On the other
hand, if I replace the statement to be timed with "x = x + 'a'" instead
of "y = x + 'a'", the time taken seems to be pretty much independent of
size, apart from a few spikes; the string of length 10000000 takes about
4 microseconds.
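
Since every data point above comes from a single run (number = 1), the spikes may simply be timer noise. A minimal sketch of a steadier measurement, assuming a hypothetical helper called best_time (not part of the original post) that keeps the fastest of several single-shot runs:

```python
from timeit import repeat

def best_time(stmt, size, repeats=5):
    """Return the fastest of `repeats` single-shot timings of `stmt`.

    best_time is a hypothetical helper. The setup string is re-executed
    before every repeat, so each run starts from a freshly built x.
    """
    timings = repeat(stmt, setup="x = 'a' * %i" % size,
                     repeat=repeats, number=1)
    return min(timings)

if __name__ == "__main__":
    for size in (0, 1000000, 10000000):
        # first column: result bound to a new name; second: rebinding x
        print(size,
              best_time("y = x + 'a'", size),
              best_time("x = x + 'a'", size))
```

Taking the minimum rather than the mean is a common choice with timeit, because the slower runs mostly reflect interference from other processes.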

I get similar results with strings (but not bytes) in 3.3.0. My guess is
that this is some kind of optimisation that treats strings as mutable
when carrying out operations that result in the original string being
discarded. If so it's jolly clever, since it knows when there are other
references to the same string:

timeit("x = x + 'a'", setup = "x = y = 'a' * %i" % size, number = 1)
# grows linearly with size

timeit("x = x + 'a'", setup = "x, y = 'a' * %i, 'a' * %i"
       % (size, size), number = 1)
# stays approximately constant

It also can see through some attempts to fool it:

timeit("x = ('' + x) + 'a'", setup = "x = 'a' * %i" % size, number = 1)
# stays approximately constant

timeit("x = x*1 + 'a'", setup = "x = 'a' * %i" % size, number = 1)
# stays approximately constant

Is my guess correct? If not, what is going on? If so, is it possible to
explain to a programming noob how the interpreter does this? And is
there a reason why it doesn't work with bytes in 3.3?


--
I have made a thing that superficially resembles music:

http://soundcloud.com/eroneity/we-be...-own-crapiness
 
Rotwang
      01-11-2013
On 11/01/2013 20:16, Ian Kelly wrote:
> On Fri, Jan 11, 2013 at 12:03 PM, Rotwang <(E-Mail Removed)> wrote:
>> Hi all,
>>
>> the other day I 2to3'ed some code and found it ran much slower in 3.3.0 than
>> 2.7.2. I fixed the problem but in the process of trying to diagnose it I've
>> stumbled upon something weird that I hope someone here can explain to me.
>>
>> [stuff about timings]
>>
>> Is my guess correct? If not, what is going on? If so, is it possible to
>> explain to a programming noob how the interpreter does this?

>
> Basically, yes. You can find the discussion behind that optimization at:
>
> http://bugs.python.org/issue980695
>
> It knows when there are other references to the string because all
> objects in CPython are reference-counted. It also works despite your
> attempts to "fool" it because after evaluating the first operation
> (which is easily optimized to return the string itself in both cases),
> the remaining part of the expression is essentially "x = TOS + 'a'",
> where x and the top of the stack are the same string object, which is
> the same state the original code reaches after evaluating just the x.
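
The two ingredients described above can be poked at from Python itself. This is only a CPython-specific sketch with made-up function names: sys.getrefcount shows the extra reference that a second name creates, and dis.dis prints the bytecode for the concatenation (the exact opcode names differ between CPython versions).

```python
import dis
import sys

def extra_references():
    """Return how many references `y = x` adds to the string (CPython only)."""
    n = 1000
    x = 'a' * n          # built at run time, so it is a fresh, unshared object
    before = sys.getrefcount(x)
    y = x                # a second name bound to the same object
    after = sys.getrefcount(x)
    return after - before

def concat(x):
    x = x + 'a'          # the statement whose bytecode we want to inspect
    return x

if __name__ == "__main__":
    print(extra_references())
    dis.dis(concat)
```

When that count shows a second reference, the string cannot safely be resized in place, which matches the linear timing seen with "x = y = ...".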


Nice, thanks.


> The stated use case for this optimization is to make repeated
> concatenation more efficient, but note that it is still generally
> preferable to use the ''.join() construct, because the optimization is
> specific to CPython and may not exist for other Python
> implementations.
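
For comparison, a minimal sketch of the two styles being discussed (the function names are hypothetical). Both return the same string; only the second avoids relying on the CPython-specific optimisation.

```python
def build_with_plus(pieces):
    """Concatenate with +=; efficient only where the in-place trick applies."""
    result = ''
    for piece in pieces:
        result += piece
    return result

def build_with_join(pieces):
    """Collect the pieces first, then concatenate once with ''.join()."""
    return ''.join(pieces)

if __name__ == "__main__":
    words = ['spam', 'eggs', 'ham'] * 3
    print(build_with_plus(words) == build_with_join(words))
```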


The slowdown in my code was caused by a method that built up a string of
bytes by repeatedly using +=, before writing the result to a WAV file.
My fix was to replace the bytes string with a bytearray, which seems
about as fast as the rewrite I just tried with b''.join. Do you know
whether the bytearray method will still be fast on other implementations?
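
For reference, the two approaches compared above look roughly like this. The function names and sample chunks are hypothetical, since the actual WAV-writing method is not shown in the thread.

```python
def build_with_bytearray(chunks):
    """Accumulate into a mutable bytearray, then convert once at the end."""
    buf = bytearray()
    for chunk in chunks:
        buf += chunk          # extends the existing buffer in place
    return bytes(buf)

def join_bytes(chunks):
    """Concatenate all chunks with a single b''.join() call."""
    return b''.join(chunks)

if __name__ == "__main__":
    samples = [b'\x00\x01', b'\x02\x03'] * 4
    print(build_with_bytearray(samples) == join_bytes(samples))
```

A bytearray is mutable by definition in the language, so += does not depend on a reference-count check; how quickly the buffer grows is still up to each implementation.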


--
I have made a thing that superficially resembles music:

http://soundcloud.com/eroneity/we-be...-own-crapiness
 
wxjmfauth@gmail.com
      01-12-2013
from timeit import timeit, repeat

size = 1000

r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
print('1:', r)
r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
print('2:', r)
r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
print('3:', r)
r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
print('4:', r)
r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
print('5:', r)
r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
print('6:', r)
r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
print('7:', r)
r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
print('8:', r)



>c:\python32\pythonw -u "vitesse3.py"

1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
>Exit code: 0
>c:\Python33\pythonw -u "vitesse3.py"

1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
>Exit code: 0



The difference between a correct (coherent) unicode handling and ...

jmf
 
Terry Reedy
      01-12-2013
On 1/12/2013 3:38 AM, (E-Mail Removed) wrote:
> from timeit import timeit, repeat
>
> size = 1000
>
> r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
> print('1:', r)
> r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
> print('2:', r)
> r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
> print('3:', r)
> r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
> print('4:', r)
> r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
> print('5:', r)
> r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
> print('6:', r)
> r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
> print('7:', r)
> r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
> print('8:', r)
>
>
>
>> c:\python32\pythonw -u "vitesse3.py"

> 1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
> 2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
> 3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
> 4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
> 5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
> 6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
> 7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
> 8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
>> Exit code: 0
>> c:\Python33\pythonw -u "vitesse3.py"

> 1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
> 2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
> 3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
> 4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
> 5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
> 6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
> 7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
> 8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
>> Exit code: 0

>
>
> The difference between a correct (coherent) unicode handling and ...


By 'correct' Jim means 'speedy', for a subset of string operations*,
rather than 'accurate'. In 3.2 and before, CPython does not handle
extended plane characters correctly on Windows and other narrow builds.
This is, by the way, true of many other languages. For instance, Tcl 8.5
and before (not sure about the new 8.6) does not handle them at all. The
same is true of Microsoft command windows.
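
A small check of what "handling extended plane characters correctly" means in practice. This is a sketch assuming Python 3.3 or later; utf16_units is a hypothetical helper name.

```python
import sys

def utf16_units(s):
    """Number of 16-bit code units needed to store s in UTF-16."""
    return len(s.encode('utf-16-le')) // 2

ch = chr(0x1F600)   # a code point outside the Basic Multilingual Plane

if __name__ == "__main__":
    # On 3.3+ the string holds one character. A narrow build of 3.2 or
    # earlier stored it as two surrogate code units, so len() gave 2 there.
    print(len(ch))
    print(utf16_units(ch))
    print(hex(sys.maxunicode))
```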

* let's try another comparison:

from timeit import timeit
print(timeit("a.encode()", "a = 'a'*10000"))

3.2: 12.1 seconds
3.3: 0.7 seconds

3.3 is 15 times faster!!! (The factor increases with the length of a.)
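
One likely reason for both this speed-up and the slower mixed-width concatenations in the 3.3 timings quoted above is the string representation introduced in CPython 3.3 (PEP 393), which stores each string with 1, 2 or 4 bytes per character depending on its widest character. That explanation is added here and is not stated in the thread. A rough way to observe the storage width, CPython 3.3+ only, with bytes_per_char as a hypothetical helper:

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character by differencing two string sizes.

    Both strings are newly built and share the same internal width, so
    the fixed object header cancels out in the subtraction.
    """
    short = ch * 2
    long_ = ch * (n + 2)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // n

if __name__ == "__main__":
    for ch in ('a', '\xe9', '\u0153', '\u20ac', chr(0x1F600)):
        print(ascii(ch), bytes_per_char(ch))
```

Appending a wider character to an all-ASCII string forces the result to be stored at the wider width, so the ASCII part has to be widened while it is copied.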

A fairer comparison is the approximately 120 micro-benchmarks in
Tools/stringbench.py. Here they are, uncensored, for 3.3.0 and 3.2.3.
Stringbench is in the Tools directory of some distributions but not all
(it is missing on Windows, for one). It can be downloaded from
http://hg.python.org/cpython/file/6f...ls/stringbench

In Firefox, right-click on the stringbench.py link and 'Save link as...'
to somewhere you can run it from.

>>>

stringbench v2.0
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
2013-01-12 06:17:51.685781
bytes unicode
(in ms) (in ms) % comment
========== case conversion -- dense
0.41 0.43 95.2 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.42 0.43 95.8 ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.41 0.43 95.8 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.42 0.43 96.3 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.83 1.95 94.1 s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.10 0.10 98.7 "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.46 2.44 100.9 dna.count("AACT") (*10)
========== count newlines
0.77 0.75 103.6 ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.30 0.27 110.5 ("A"*1000).find("A") (*1000)
0.45 0.06 750.5 "A" in "A"*1000 (*1000)
0.30 0.27 110.4 ("A"*1000).index("A") (*1000)
0.24 0.22 107.2 ("A"*1000).partition("A") (*1000)
0.33 0.29 116.6 ("A"*1000).rfind("A") (*1000)
0.32 0.29 107.9 ("A"*1000).rindex("A") (*1000)
0.20 0.21 94.1 ("A"*1000).rpartition("A") (*1000)
0.42 0.45 93.4 ("A"*1000).rsplit("A", 1) (*1000)
0.39 0.41 95.9 ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.32 0.27 121.1 ("AB"*1000).find("AB") (*1000)
0.45 0.06 729.5 "AB" in "AB"*1000 (*1000)
0.30 0.27 111.2 ("AB"*1000).index("AB") (*1000)
0.23 0.28 85.0 ("AB"*1000).partition("AB") (*1000)
0.33 0.30 110.6 ("AB"*1000).rfind("AB") (*1000)
0.33 0.30 110.5 ("AB"*1000).rindex("AB") (*1000)
0.22 0.27 83.1 ("AB"*1000).rpartition("AB") (*1000)
0.46 0.47 96.7 ("AB"*1000).rsplit("AB", 1) (*1000)
0.44 0.48 90.9 ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.24 0.29 84.0 "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.26 0.28 92.9 "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.25 0.28 90.0 "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A 0.67 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A 0.06 0.0 "A".join("") (*100)
========== join empty string, with 5 character sep
N/A 0.06 0.0 "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
0.87 1.27 68.8 "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.14 1.54 74.0 "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.27 0.37 72.0 "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.32 0.43 75.7 "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A 1.30 0.0 "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A 1.37 0.0 "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
3.25 3.23 100.5 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.79 2.78 100.4 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.98 1.94 102.3 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
3.24 3.23 100.3 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
4.26 3.62 117.7 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23 3.23 100.1 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.32 2.32 100.1 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23 3.21 100.8 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58 3.57 100.4 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.60 3.60 100.0 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
3.60 3.56 101.2 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.62 0.58 106.3 ("AB"*300+"C").find("BC") (*1000)
0.92 0.82 111.8 ("AB"*300+"CA").find("CA") (*1000)
0.73 0.33 218.8 "BC" in ("AB"*300+"C") (*1000)
0.61 0.60 101.0 ("AB"*300+"C").index("BC") (*1000)
0.54 0.82 66.4 ("AB"*300+"C").partition("BC") (*1000)
0.66 0.63 104.6 ("C"+"AB"*300).rfind("CA") (*1000)
0.91 0.88 102.3 ("BC"+"AB"*300).rfind("BC") (*1000)
0.65 0.62 105.1 ("C"+"AB"*300).rindex("CA") (*1000)
0.53 0.56 94.5 ("C"+"AB"*300).rpartition("CA") (*1000)
0.75 0.77 96.6 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65 0.67 97.0 ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.89 0.87 102.3 ("A"*1000).find("B") (*1000)
1.03 0.64 159.1 "B" in "A"*1000 (*1000)
0.67 0.68 98.7 ("A"*1000).partition("B") (*1000)
0.87 0.85 102.8 ("A"*1000).rfind("B") (*1000)
0.67 0.68 98.5 ("A"*1000).rpartition("B") (*1000)
0.87 0.87 99.2 ("A"*1000).rsplit("B", 1) (*1000)
0.86 0.85 101.5 ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.22 1.16 104.9 ("AB"*1000).find("BC") (*1000)
1.93 2.02 95.2 ("AB"*1000).find("CA") (*1000)
1.37 0.94 145.3 "BC" in "AB"*1000 (*1000)
1.39 2.14 65.1 ("AB"*1000).partition("BC") (*1000)
2.32 2.31 100.7 ("AB"*1000).rfind("BC") (*1000)
1.47 1.44 102.1 ("AB"*1000).rfind("CA") (*1000)
2.26 2.27 99.7 ("AB"*1000).rpartition("BC") (*1000)
2.46 2.45 100.2 ("AB"*1000).rsplit("BC", 1) (*1000)
1.15 1.16 99.1 ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.13 0.12 105.0 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12 0.12 105.2 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08 0.10 80.6 "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16 0.18 93.1 "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11 0.13 84.4 "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.39 0.41 94.8 "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
2.02 2.36 85.6 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
3.12 3.23 96.6 dna.replace("ATC", "ATT") (*10)
========== replace single character
0.33 0.40 82.4 "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.75 0.86 87.4 "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.41 0.48 86.1 "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.14 0.18 79.3 ("Here are some words. "*2).partition(" ") (*1000)
0.11 0.14 75.1 ("Here are some words. "*2).rpartition(" ") (*1000)
0.35 0.39 90.3 ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32 0.38 83.9 ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.74 2.02 86.3 "...text...".rsplit("\n") (*10)
1.69 1.97 85.5 "...text...".split("\n") (*10)
1.89 2.55 74.0 "...text...".splitlines() (*10)
========== split newlines
0.35 0.39 88.9 "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.34 0.40 86.4 "this\nis\na\ntest\n".split("\n") (*1000)
0.32 0.40 80.7 "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.28 2.30 99.1 dna.rsplit("ACTAT") (*10)
2.63 2.66 98.9 dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.55 0.69 79.0 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.58 0.70 82.9 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.51 2.12 71.4 human_text.rsplit() (*10)
1.51 2.05 73.6 human_text.split() (*10)
========== split whitespace (small)
0.48 0.68 70.1 ("Here are some words. "*2).rsplit() (*1000)
0.48 0.64 74.9 ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.24 0.25 95.9 "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.24 0.25 95.7 "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.23 0.25 95.4 "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.09 0.21 44.1 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.09 0.12 74.0 "\nHello!".rstrip() (*1000)
0.09 0.12 74.0 "Hello!\n".rstrip() (*1000)
0.09 0.12 71.6 "\nHello!\n".strip() (*1000)
0.09 0.12 73.2 "\nHello!".strip() (*1000)
0.09 0.12 72.9 "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.09 0.13 69.6 "\t \tHello".rstrip() (*1000)
0.09 0.13 72.3 "Hello\t \t".rstrip() (*1000)
0.07 0.08 86.8 "Hello\t \t".strip() (*1000)
========== tab split
0.59 0.65 90.9 GFF3_example.rsplit("\t", (*1000)
0.55 0.59 94.2 GFF3_example.rsplit("\t") (*1000)
0.52 0.57 90.7 GFF3_example.split("\t", (*1000)
0.52 0.57 90.1 GFF3_example.split("\t") (*1000)
108.87 116.31 93.6 TOTAL
>>>

stringbench v2.0
3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)]
2013-01-12 06:23:05.994000
bytes unicode
(in ms) (in ms) % comment
========== case conversion -- dense
0.63 3.01 21.0 ("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.63 2.90 21.5 ("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.84 2.83 29.8 ("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.50 3.47 14.3 ("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.82 1.75 103.9 s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.09 0.08 115.5 "Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.40 2.64 91.1 dna.count("AACT") (*10)
========== count newlines
0.77 0.75 101.6 ...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.19 0.18 101.9 ("A"*1000).find("A") (*1000)
0.39 0.05 824.7 "A" in "A"*1000 (*1000)
0.19 0.19 96.3 ("A"*1000).index("A") (*1000)
0.20 0.22 87.5 ("A"*1000).partition("A") (*1000)
0.20 0.20 101.8 ("A"*1000).rfind("A") (*1000)
0.20 0.20 101.2 ("A"*1000).rindex("A") (*1000)
0.18 0.22 82.5 ("A"*1000).rpartition("A") (*1000)
0.41 0.45 91.7 ("A"*1000).rsplit("A", 1) (*1000)
0.42 0.43 99.0 ("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.19 0.19 102.3 ("AB"*1000).find("AB") (*1000)
0.39 0.05 781.6 "AB" in "AB"*1000 (*1000)
0.19 0.20 97.9 ("AB"*1000).index("AB") (*1000)
0.23 0.33 71.1 ("AB"*1000).partition("AB") (*1000)
0.20 0.20 101.6 ("AB"*1000).rfind("AB") (*1000)
0.20 0.20 100.1 ("AB"*1000).rindex("AB") (*1000)
0.22 0.31 70.4 ("AB"*1000).rpartition("AB") (*1000)
0.47 0.53 90.0 ("AB"*1000).rsplit("AB", 1) (*1000)
0.45 0.52 85.0 ("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.18 0.18 97.6 "Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.18 0.18 100.4 "Andrew".endswith("Anders") (*1000)
========== endswith single character
0.18 0.18 97.1 "Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A 0.53 0.0 "The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A 0.05 0.0 "A".join("") (*100)
========== join empty string, with 5 character sep
N/A 0.05 0.0 "ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
1.02 1.02 99.6 "A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.25 1.48 84.4 "ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.31 0.25 122.9 "A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.36 0.41 88.4 "ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A 1.06 0.0 "A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A 1.22 0.0 "ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
2.52 2.68 94.0 s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.35 3.06 76.9 s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.55 1.61 96.2 s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
2.51 2.68 94.0 s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
3.57 4.66 76.7 s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23 3.24 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.35 2.56 91.7 s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23 3.24 99.8 s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58 3.92 91.4 s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.62 3.96 91.4 s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
2.89 3.38 85.4 s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.52 0.52 99.5 ("AB"*300+"C").find("BC") (*1000)
0.69 0.90 76.5 ("AB"*300+"CA").find("CA") (*1000)
0.67 0.37 179.2 "BC" in ("AB"*300+"C") (*1000)
0.51 0.53 96.8 ("AB"*300+"C").index("BC") (*1000)
0.48 0.81 59.3 ("AB"*300+"C").partition("BC") (*1000)
0.55 0.55 101.5 ("C"+"AB"*300).rfind("CA") (*1000)
0.85 0.85 100.0 ("BC"+"AB"*300).rfind("BC") (*1000)
0.55 0.55 100.3 ("C"+"AB"*300).rindex("CA") (*1000)
0.52 0.60 87.1 ("C"+"AB"*300).rpartition("CA") (*1000)
0.78 0.82 95.4 ("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65 0.72 91.2 ("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.77 0.77 100.6 ("A"*1000).find("B") (*1000)
0.98 0.63 155.1 "B" in "A"*1000 (*1000)
0.66 0.66 99.7 ("A"*1000).partition("B") (*1000)
0.77 0.77 100.4 ("A"*1000).rfind("B") (*1000)
0.66 0.66 99.7 ("A"*1000).rpartition("B") (*1000)
0.88 0.88 100.4 ("A"*1000).rsplit("B", 1) (*1000)
0.88 0.87 101.2 ("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.19 1.21 98.1 ("AB"*1000).find("BC") (*1000)
1.79 2.51 71.2 ("AB"*1000).find("CA") (*1000)
1.28 1.08 119.1 "BC" in "AB"*1000 (*1000)
1.10 2.11 52.1 ("AB"*1000).partition("BC") (*1000)
2.37 2.37 100.0 ("AB"*1000).rfind("BC") (*1000)
1.36 1.36 100.5 ("AB"*1000).rfind("CA") (*1000)
2.25 2.26 99.9 ("AB"*1000).rpartition("BC") (*1000)
2.38 2.62 90.7 ("AB"*1000).rsplit("BC", 1) (*1000)
1.18 1.30 90.1 ("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.12 0.32 37.1 ("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12 0.30 37.9 ("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08 0.09 90.3 "A"*10 (*1000)
========== repeat 1 character 1000 times
0.16 0.19 82.2 "A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11 0.12 98.3 "ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.40 0.58 67.9 "ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
1.95 2.13 91.7 "...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
2.93 3.25 90.3 dna.replace("ATC", "ATT") (*10)
========== replace single character
0.25 0.26 96.6 "This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.73 1.01 72.0 "...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.30 0.34 89.0 "When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.12 0.13 93.3 ("Here are some words. "*2).partition(" ") (*1000)
0.11 0.11 98.8 ("Here are some words. "*2).rpartition(" ") (*1000)
0.32 0.37 86.5 ("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32 0.33 96.9 ("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.76 2.19 80.5 "...text...".rsplit("\n") (*10)
1.72 2.10 81.9 "...text...".split("\n") (*10)
1.87 2.58 72.4 "...text...".splitlines() (*10)
========== split newlines
0.36 0.34 103.9 "this\nis\na\ntest\n".rsplit("\n") (*1000)
0.35 0.33 105.9 "this\nis\na\ntest\n".split("\n") (*1000)
0.31 0.34 89.7 "this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.18 2.34 93.4 dna.rsplit("ACTAT") (*10)
2.50 2.64 94.5 dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.59 0.62 95.3 "this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.55 0.59 93.1 "this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.54 2.34 65.5 human_text.rsplit() (*10)
1.51 2.22 68.3 human_text.split() (*10)
========== split whitespace (small)
0.46 0.60 76.5 ("Here are some words. "*2).rsplit() (*1000)
0.45 0.51 87.6 ("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.18 0.18 97.3 "Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.18 0.18 100.1 "Andrew".startswith("Anders") (*1000)
========== startswith single character
0.17 0.18 96.8 "Andrew".startswith("A") (*1000)
========== strip terminal newline
0.11 0.21 52.0 s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.06 0.07 92.1 "\nHello!".rstrip() (*1000)
0.06 0.07 92.2 "Hello!\n".rstrip() (*1000)
0.06 0.07 91.2 "\nHello!\n".strip() (*1000)
0.06 0.07 91.1 "\nHello!".strip() (*1000)
0.06 0.07 91.1 "Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.07 0.07 89.4 "\t \tHello".rstrip() (*1000)
0.07 0.07 91.4 "Hello\t \t".rstrip() (*1000)
0.04 0.05 88.7 "Hello\t \t".strip() (*1000)
========== tab split
0.57 0.56 100.8 GFF3_example.rsplit("\t", (*1000)
0.53 0.53 100.7 GFF3_example.rsplit("\t") (*1000)
0.49 0.49 101.2 GFF3_example.split("\t", (*1000)
0.51 0.49 103.5 GFF3_example.split("\t") (*1000)
102.13 125.57 81.3 TOTAL

--
Terry Jan Reedy


 
Ian Kelly
      01-12-2013
On Sat, Jan 12, 2013 at 1:38 AM, <(E-Mail Removed)> wrote:
> The difference between a correct (coherent) unicode handling and ...


This thread was about byte string concatenation, not unicode, so your
rant is not even on-topic here.
 