
 Jonas Galvez 06-18-2004 11:40 PM

String concatenation

Is it true that joining the string elements of a list is faster than
concatenating them via the '+' operator?

"".join(['a', 'b', 'c'])

vs

'a'+'b'+'c'

If so, can anyone explain why?

\\ jonas galvez
// jonasgalvez.com

 Peter Hansen 06-21-2004 06:02 AM

Re: String concatenation

Jonas Galvez wrote:

> Is it true that joining the string elements of a list is faster than
> concatenating them via the '+' operator?
>
> "".join(['a', 'b', 'c'])
>
> vs
>
> 'a'+'b'+'c'
>
> If so, can anyone explain why?

It's because the latter one has to build a temporary
string consisting of 'ab' first, then the final string
with 'c' added, while the join can (and probably does) add up
all the lengths of the strings to be joined and build the final
string all in one go.

Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
probably on par with the join technique for both performance
and (lack of) readability.

Much more importantly, however, note that you should probably
not pick the join approach over the concatenation approach
based on performance. Concatenation is more readable in the
above case (ignoring the fact that it's a contrived example).

The reason joining lists is popular is the poor performance
of += when you gradually build
up a string in pieces, rather than appending to a list and
then doing join at the end.

So

l = []
l.append('a')
l.append('b')
l.append('c')
s = ''.join(l)

is _much_ faster (therefore better) in real-world cases than

s = ''
s += 'a'
s += 'b'
s += 'c'

With the latter, if you picture longer and many more strings,
and realize that each += causes a new string to be created
consisting of the contents of the two old strings joined together,
steadily growing longer and requiring lots of wasted copying,
you can see why it's very bad on memory and performance.
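
To put rough numbers on the copying described above, here is a minimal sketch that only counts characters copied under a simple model: each += builds a brand-new string, while join copies every character once. The function names are made up for illustration, and a real interpreter may do better than this model (CPython, for instance, can sometimes extend a string in place on +=).

```python
def chars_copied_by_concat(pieces):
    """Model repeated s += piece: each step builds a string of the new total length."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # old contents plus the new piece are copied
    return copied


def chars_copied_by_join(pieces):
    """Model ''.join(pieces): every character is copied once into the result."""
    return sum(len(piece) for piece in pieces)


pieces = ["x" * 10] * 1000          # 1000 pieces of 10 characters each
concat_cost = chars_copied_by_concat(pieces)
join_cost = chars_copied_by_join(pieces)
print(concat_cost, join_cost)
```

Under this model the concatenation cost grows with the square of the number of pieces, while the join cost grows linearly.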

The list approach doesn't copy the strings at all, but just
holds references to them in a list (which does grow in a
similar but much more efficient manner). The join figures
out the sizes of all of the strings and allocates enough
space to do only a single copy from each.
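
As a rough illustration of the "measure first, then copy once" idea, here is a pure-Python model of such a join. It is only a sketch: the name two_pass_join is hypothetical, it is restricted to ASCII strings to keep the buffer handling simple, and the encode step is an artifact of modelling this in Python rather than part of the idea.

```python
def two_pass_join(pieces):
    """Join ASCII strings in two passes: add up the sizes, then copy each piece once."""
    total = sum(len(p) for p in pieces)   # pass 1: total length of the result
    buf = bytearray(total)                # one allocation of the final size
    pos = 0
    for p in pieces:                      # pass 2: copy each piece into place
        data = p.encode("ascii")
        buf[pos:pos + len(data)] = data
        pos += len(data)
    return buf.decode("ascii")


print(two_pass_join(['a', 'b', 'c']))
```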

Again though, other than the += versus .append() case, you should
probably not pick ''.join() over + since readability will
suffer more than your performance will improve.

-Peter

 Duncan Booth 06-21-2004 08:05 AM

Re: String concatenation

Peter Hansen <peter@engcorp.com> wrote in
news:xvydnWNN7t2X50vdRVn-gw@powergate.ca:

> Jonas Galvez wrote:
>
>> Is it true that joining the string elements of a list is faster than
>> concatenating them via the '+' operator?
>>

> Note that there's also '%s%s%s' % ('a', 'b', 'c'), which is
> probably on par with the join technique for both performance
> and (lack of) readability.

A few more points.

Yes, the format string in this example isn't the clearest, but if you have
a case where some of the strings are fixed and others vary, then the format
string can be the clearest.

e.g.

'<a href="%s" alt="%s">%s</a>' % (uri, alt, text)

rather than:

'<a href="'+uri+'" alt="'+alt+'">'+text+'</a>'

In many situations I find I use a combination of all three techniques.
Build a list of strings to be concatenated to produce the final output, but
each of these strings might be built from a format string or simple
addition as appropriate.
On the readability of ''.join(), I would suggest never writing it more than
once. That means I tend to do something like:

concatenate = ''.join
...
concatenate(myList)

Or

def concatenate(*args):
    return ''.join(args)
...
concatenate('a', 'b', 'c')

depending on how it is to be used.

It's also worth saying that a lot of the time you find you don't want the
empty separator at all (e.g. maybe a newline is more appropriate), and in
this case the join really does become easier than simple addition, but
again it is worth wrapping it so that your intention at the point of call
is clear.
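
For instance, with a newline separator the wrapper might look like this (the name join_lines is just an example):

```python
# Bound method of the separator string, given a name that states the intent.
join_lines = '\n'.join

report = join_lines(['first line', 'second line', 'third line'])
print(report)
```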

Finally, a method call on a bare string (''.join, or '\n'.join) looks
sufficiently bad that if, for some reason, you don't want to give it a name
as above, I would suggest using the alternative form for calling it:

str.join('\n', aList)

rather than:

'\n'.join(aList)

 David Fraser 06-23-2004 08:32 AM

Re: String concatenation

Peter Hansen wrote:
> Jonas Galvez wrote:
>
>> Is it true that joining the string elements of a list is faster than
>> concatenating them via the '+' operator?
>>
>> "".join(['a', 'b', 'c'])
>>
>> vs
>>
>> 'a'+'b'+'c'
>>
>> If so, can anyone explain why?

> It's because the latter one has to build a temporary
> string consisting of 'ab' first, then the final string
> with 'c' added, while the join can (and probably does) add up
> all the lengths of the strings to be joined and build the final
> string all in one go.

An idea sprang to mind: often (particularly when generating web pages) one
wants to do lots of += without thinking about "".join.
So what about creating a class that will do this quickly?
The following class does this and is much faster when adding together
lots of strings. I only seem to see performance gains above about 6000
strings...

David

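class faststr(str):
    def __init__(self, *args, **kwargs):
        # Pieces added with + or += are collected here instead of copied.
        self.appended = []
        # str is immutable and takes its value in __new__, so no
        # arguments are passed on to str.__init__.
        str.__init__(self)
    def __add__(self, otherstr):
        # Remember the piece and return self, so that += stays cheap.
        self.appended.append(otherstr)
        return self
    def getstr(self):
        # Build the real string once, at the end.
        return str(self) + "".join(self.appended)

# The def line was lost from the post; the name testadd is assumed.
def testadd(start, n):
    for i in range(n):
        start += str(i)
    if hasattr(start, "getstr"):
        return start.getstr()
    else:
        return start

if __name__ == "__main__":
    import sys
    if len(sys.argv) >= 3 and sys.argv[2] == "fast":
        start = faststr("test")
    else:
        start = "test"
    # Assumed final step: the string count is taken from the first argument.
    testadd(start, int(sys.argv[1]))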
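
The same "collect now, join later" idea can also be sketched without subclassing str, which avoids having an object that looks like a string but does not yet hold its full value. This is a different technique from the class above, not a reconstruction of it, and the name StringAccumulator is made up for illustration.

```python
class StringAccumulator:
    """Collect pieces with += and join them only when the string is needed."""

    def __init__(self, initial=""):
        self._parts = [initial]

    def __iadd__(self, other):
        # Appending to a list is cheap; nothing is copied yet.
        self._parts.append(other)
        return self

    def __str__(self):
        # A single join builds the final string.
        return "".join(self._parts)


acc = StringAccumulator("test")
for i in range(5):
    acc += str(i)
print(str(acc))
```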

 Pierre-Frédéric Caillaud 06-23-2004 02:21 PM

Re: String concatenation

Let's try this :

def test_concat():
    s = ''
    for i in xrange( test_len ):
        s += str( i )
    return s

def test_join():
    s = []
    for i in xrange( test_len ):
        s.append( str( i ))
    return ''.join(s)

def test_join2():
    return ''.join( map( str, range( test_len ) ))

Results, with and without psyco :

test_len = 1000
String concatenation (normal) 4.85290050507 ms.
[] append + join (normal) 4.27646517754 ms.
map + join (normal) 2.37970948219 ms.

String concatenation (psyco) 2.0838675499 ms.
[] append + join (psyco) 2.29129695892 ms.
map + join (psyco) 2.21130692959 ms.

test_len = 5000
String concatenation (normal) 40.3251230717 ms.
[] append + join (normal) 23.3911275864 ms.
map + join (normal) 13.844203949 ms.

String concatenation (psyco) 9.65108215809 ms.
[] append + join (psyco) 13.0564379692 ms.
map + join (psyco) 13.342962265 ms.

test_len = 10000
String concatenation (normal) 163.02690506 ms.
[] append + join (normal) 47.6168513298 ms.
map + join (normal) 28.5276055336 ms.

String concatenation (psyco) 19.6494650841 ms.
[] append + join (psyco) 26.637775898 ms.
map + join (psyco) 26.7823898792 ms.

test_len = 20000
String concatenation (normal) 4556.57429695 ms.
[] append + join (normal) 92.0199871063 ms.
map + join (normal) 56.7145824432 ms.

String concatenation (psyco) 42.247030735 ms.
[] append + join (psyco) 58.3201909065 ms.
map + join (psyco) 53.8239884377 ms.

Conclusion :

- join is faster but worth the annoyance only if you join 1000s of strings
- map is useful
- psyco makes join useless if you can use it (depends on which web
  framework you use)
- python is really pretty fast even without psyco (it runs about one
  mips!)

Note :

Did I mention psyco has a special optimization for string concatenation ?

 Steve Holden 06-25-2004 12:35 PM

Re: String concatenation

Duncan Booth wrote:

[...]
> Finally, a method call on a bare string (''.join, or '\n'.join) looks
> sufficiently bad that if, for some reason, you don't want to give it a name
> as above, I would suggest using the alternative form for calling it:
>
> str.join('\n', aList)
>
> rather than:
>
> '\n'.join(aList)

This is, of course, pure prejudice. Not that there's anything wrong with
that ...

regards
Steve
