Velocity Reviews - Computer Hardware Reviews

StringBuilder

 
 
Arne Vajhøj
Guest
Posts: n/a
 
      09-11-2011
On 9/11/2011 10:35 AM, Wanja Gayk wrote:
> In article <4e655caa$0$308$(E-Mail Removed)>, (E-Mail Removed)
> says...
>>> Whereby this code is slower:
>>>
>>> String res="";
>>> for (int i=0; i<100; i++) {
>>> res+=i+"*"+i+"="+(i*i)+"\n";
>>> }
>>> System.out.println(res);
>>>
>>> It is translated to the following code by the compiler, and
>>> thus uses 100 new and 100 toString():
>>>
>>> String res="";
>>> for (int i=0; i<100; i++) {
>>> StringBuilder _buf=new StringBuilder(res);
>>> _buf.append(i);
>>> _buf.append("*");
>>> _buf.append(i);
>>> _buf.append("=");
>>> _buf.append((i*i));
>>> _buf.append("\n");
>>> res=_buf.toString();
>>> }
>>> System.out.println(res);
>>>
>>> For more information see for example here:
>>> http://caprazzi.net/posts/java-bytec...stringbuilder/

>>
>> That has been known for 10-15 years.
>>
>> It should be in any Java book above beginners level.

>
> Like other ancient performance-practices that have been obsoleted by
> today's compilers?


No.

What we are talking about is the "+=" part, and that part is
still relevant for today's compilers.

Actually, JB's example uses = but to be equivalent to the first code
it needs to be +=.

And besides what he mentions as reasons, there are also
the 100 new strings.
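
To make the "+=" point concrete, here is a minimal sketch that builds the same table both ways. The class and method names (ConcatDemo, withPlusEquals, withBuilder) are made up for illustration; only the loop body comes from the quoted code. How the first variant is compiled depends on the compiler version, so treat the comments as a description of the translation quoted above, not of every javac.

```java
final class ConcatDemo {
    // Variant from the quoted post: every += produces a new String
    // (and, with the translation quoted above, a new builder per iteration).
    static String withPlusEquals(int n) {
        String res = "";
        for (int i = 0; i < n; i++) {
            res += i + "*" + i + "=" + (i * i) + "\n";
        }
        return res;
    }

    // One explicit builder for the whole loop, one toString() at the end.
    static String withBuilder(int n) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < n; i++) {
            buf.append(i).append('*').append(i).append('=').append(i * i).append('\n');
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.print(withBuilder(3));
        // Both variants must produce identical text.
        System.out.println(withPlusEquals(100).equals(withBuilder(100)));
    }
}
```

Both methods return the same string; they differ only in how many intermediate objects are created along the way.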

> We have been told that using StringBuffer was the better alternative.
> Now compilers have switched from using StringBuffer for the above
> example to the unsynchronized StringBuilder. Those who have manually
> used the StringBuffer have stopped the compiler from doing that for them
> and must rely on the JIT's lock elision algorithm.


That is an interesting little quirk.

The outer string += part (across loop iterations) is more efficient
with an explicit StringB*. And I am pretty sure that the
StringBuffer/StringBuilder difference is negligible there.

But the inner string + (within one iteration) is probably a little
bit faster with StringBuilder than with StringBuffer.

You can consider that insignificant compared to the first.

Or you could argue for using an explicit StringB* and appending the
result of string +.

> So as long as this part of the code does not represent a critical
> performance-bottleneck, I would recommend to use the simple, stupid
> "slow" variant and hope for future compilers to detect and optimize that
> pattern.


As long as it is not critical then readability should be the deciding
factor.

Arne

 
 
 
 
 
Roedy Green
Guest
Posts: n/a
 
      09-17-2011
On Sat, 17 Sep 2011 02:54:28 +0200, Wanja Gayk <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>
>Is simple pattern that could be detected by tomorrows JIT-compilers and
>transformed into:


I jittered at the Jet people about such optimisations. They seemed to
feel that string concatenation was not important enough an operation
for people to worry about optimising. I disagreed, since so much of my
code is about fiddling with text. I don't have a handle on its
importance generally.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Your top priority should be fixing bugs. If you carry on development,
you are just creating more places you will have to search for them.

 
 
 
 
 
Stanimir Stamenkov
Guest
Posts: n/a
 
      09-17-2011
Mon, 05 Sep 2011 05:27:15 +0200, /Jan Burse/:

> If you then explicitly use StringBuilder you are
> faster, because you save the new StringBuilder() and toString().
>
> So this is faster, since it uses 1 new and 1 toString():


The StringBuilder.toString() is really fast - that's the point, and
I don't think it is worth mentioning it.

--
Stanimir
 
 
Jan Burse
Guest
Posts: n/a
 
      09-17-2011
Stanimir Stamenkov schrieb:
> Mon, 05 Sep 2011 05:27:15 +0200, /Jan Burse/:
>
>> If you then explicitly use StringBuilder you are
>> faster, because you save the new StringBuilder() and toString().
>>
>> So this is faster, since it uses 1 new and 1 toString():

>
> The StringBuilder.toString() is really fast - that's the point, and I
> don't think it is worth mentioning it.
>


I am not sure whether I can agree directly.

The StringBuilder is a mutable object. The String is an
immutable object. Therefore the obvious fast implementation,
which would share the buffer between StringBuilder and
String, does not work, because the following code would
break the immutability of String:

StringBuilder buf=new StringBuilder();

buf.append("Hello World!");

String str=buf.toString();

buf.replace(6,11,"Java");

System.out.println("str="+str);

As a side effect of buf.replace, the value of the
string str would change. Therefore we find the following
slow implementation of toString() in the reference
implementation. Please note the comment:

public String toString() {
    // Create a copy, don't share the array
    return new String(value, 0, count);
}

http://kickjava.com/src/java/lang/St...ilder.java.htm

And if we look at the constructor that is used, it really does
make a copy. There is a non-public constructor
in String that allows some sharing, and that is for
example used to implement substring. But this time
a constructor is used that does not share:

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count < 0) {
        throw new StringIndexOutOfBoundsException(count);
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    char[] v = new char[count];
    System.arraycopy(value, offset, v, 0, count);
    this.offset = 0;
    this.count = count;
    this.value = v;
}

http://kickjava.com/src/java/lang/String.java.htm
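
The effect of that copy can be observed through the public API alone. The following is a small sketch (the class name ToStringCopyDemo and the method name are invented here) that reuses the "Hello World!" snippet: the String taken before the replace keeps its old contents, whatever the implementation does internally.

```java
final class ToStringCopyDemo {
    // Takes a String snapshot, then mutates the builder afterwards.
    static String snapshotThenMutate() {
        StringBuilder buf = new StringBuilder();
        buf.append("Hello World!");
        String str = buf.toString();   // snapshot of the current contents
        buf.replace(6, 11, "Java");    // replaces "World" in the builder only
        return str + "|" + buf;
    }

    public static void main(String[] args) {
        // prints: Hello World!|Hello Java!
        System.out.println(snapshotThenMutate());
    }
}
```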

Possibly some program analysis would allow sharing.
But the copying also has a positive effect: when
the StringBuilder has, through manipulation, gained a much
greater capacity than necessary, the copy gets
a smaller char array, so that less space is used
as soon as the StringBuilder is reclaimed.

But maybe you are right that toString() is nevertheless
fast, since a) allocating objects is usually fast and
b) System.arraycopy can also be fast. Together
with the capacity-reducing effect, this could all add up
to a small overhead.
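
The gap between capacity and content is easy to see with capacity() and length(). This is only a sketch (CapacityDemo is a hypothetical name); the size of the array behind the resulting String is not visible through the public API, so the code reports just the lengths.

```java
final class CapacityDemo {
    // Returns { builder capacity, builder length, string length }.
    static int[] sizes() {
        StringBuilder buf = new StringBuilder(1000); // deliberately oversized
        buf.append("abc");
        String s = buf.toString();
        return new int[] { buf.capacity(), buf.length(), s.length() };
    }

    public static void main(String[] args) {
        int[] r = sizes();
        System.out.println("capacity=" + r[0] + " length=" + r[1] + " string=" + r[2]);
    }
}
```

With a copying toString() the String only needs room for the 3 characters, while the builder keeps its 1000-char buffer until it is collected.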

BTW: OpenJDK uses the same code. In Harmony we find
a shared flag in the AbstractStringBuilder, and a
heuristic that decides whether to share or not. The non-public
String constructor is used for sharing:

public String toString() {
    if (count == 0) {
        return ""; //$NON-NLS-1$
    }
    // Optimize String sharing for more performance
    int wasted = value.length - count;
    if (wasted >= 256
            || (wasted >= INITIAL_CAPACITY &&
                wasted >= (count >> 1))) {
        return new String(value, 0, count);
    }
    shared = true;
    return new String(0, count, value);
}

http://www.java2s.com/Open-Source/Ja...ilder.java.htm

There is then a little overhead in the basic operations
of StringBuilder to check for sharing, and in case that
there is sharing, a copy is made.

final void replace0(int start, int end, String string) {
    [...]
    if (!shared) {
        // index == count case is no-op
        System.arraycopy(value, end, value, start
                + stringLength, count - end);
    } else {
        char[] newData = new char[value.length];
        System.arraycopy(value, 0, newData, 0, start);
        // index == count case is no-op
        System.arraycopy(value, end, newData, start
                + stringLength, count - end);
        value = newData;
        shared = false;
    }
    [...]
}

Probably the gain in speed from sharing compensates for
this little extra check needed everywhere. So probably
toString() is relatively fast here, assuming that sharing
happens often enough. When we look at the loop example,
we can positively influence sharing by giving
a good initial capacity, because then the waste is small.

But giving an initial capacity for the whole loop is
probably non-trivial: how does the digit count of the
squares develop? So assume our StringBuilder grows
according to its enlargeBuffer rule. In the case of
Harmony the capacity grows by a factor of 1.5 plus 2.

So right after an enlargement we have waste >= count/2,
and hence no sharing: because of the added 2,
waste = count/2 + 2. After appending n more characters
we have waste' = count/2 + 2 - n and count' = count + n.
Sharing requires waste' < count'/2, which holds only
when 2 - n < n/2, i.e. when n > 4/3.
So sharing is only certain again after at least 2 more
characters have been appended.

So the heuristic has a little glitch. But never mind.
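
The arithmetic can be checked with a small simulation of the quoted condition. This is a sketch only: the class and method names are invented, INITIAL_CAPACITY = 16 is an assumption (the excerpt does not show its value), and the growth rule is taken from the description above (factor 1.5 plus 2), not from Harmony's source.

```java
final class SharingHeuristicDemo {
    // Assumed value; the quoted excerpt does not show the constant.
    static final int INITIAL_CAPACITY = 16;

    // Mirrors the condition in the quoted toString():
    // returns true when the array would be shared instead of copied.
    static boolean wouldShare(int capacity, int count) {
        if (count == 0) {
            return false; // the quoted code returns "" here, nothing to share
        }
        int wasted = capacity - count;
        boolean copy = wasted >= 256
                || (wasted >= INITIAL_CAPACITY && wasted >= (count >> 1));
        return !copy;
    }

    // Growth rule as described in the post: factor 1.5 plus 2.
    static int enlarged(int capacity) {
        return capacity + (capacity >> 1) + 2;
    }

    public static void main(String[] args) {
        int count = 100;
        int capacity = enlarged(count); // buffer was full at 100 chars, now 152
        for (int n = 0; n <= 3; n++) {
            System.out.println("n=" + n + " share=" + wouldShare(capacity, count + n));
        }
    }
}
```

For a buffer that was full at 100 characters, this reports no sharing for n = 0 and n = 1 and sharing from n = 2 onwards, which matches the derivation.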

Best Regards
 
 
Roedy Green
Guest
Posts: n/a
 
      09-17-2011
On Sat, 17 Sep 2011 19:56:50 +0300, Stanimir Stamenkov
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone who
said :

>The StringBuilder.toString() is really fast - that's the point, and
>I don't think it is worth mentioning it.


The place where StringBuilder wastes CPU cycles is when you have a
bad estimate and it has to create a new buffer and copy what it has
done so far into it. If the estimate is too low, you can get repeated
doublings. If it is too high, you fill up RAM too quickly and force
premature GC.
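
The cost of a low estimate can be made visible by counting how often the capacity changes while appending. A rough sketch follows; GrowthCountDemo and growths are hypothetical names, and the exact number of enlargements for the default builder depends on the JDK's growth rule, so only "zero versus more than zero" should be read from it.

```java
final class GrowthCountDemo {
    // Counts how often capacity() changes while appending n single chars.
    static int growths(StringBuilder buf, int n) {
        int growths = 0;
        int cap = buf.capacity();
        for (int i = 0; i < n; i++) {
            buf.append('x');
            if (buf.capacity() != cap) {
                growths++;
                cap = buf.capacity();
            }
        }
        return growths;
    }

    public static void main(String[] args) {
        System.out.println("default:  " + growths(new StringBuilder(), 1000));
        System.out.println("presized: " + growths(new StringBuilder(1000), 1000));
    }
}
```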

My solution was FastCat, for which it is quite easy to get a bang-on
estimate. See http://mindprod.com/products1.html#FASTCAT

StringBuilder composes its string in a char[]. Unfortunately it can't
simply plop that into a String object at the end. It has to allocate
yet another buffer, copy into it, and that becomes your string object.
The JVM is worried there might be encumbrances (pointers to) the
char[]. So it has to copy rather than reference. Perhaps a little
native code could bypass the final copy.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Your top priority should be fixing bugs. If you carry on development,
you are just creating more places you will have to search for them.

 
 
Jan Burse
Guest
Posts: n/a
 
      09-17-2011
Roedy Green schrieb:
> The JVM is worried there might be encumbrances (pointers to) the char[].


Well, strictly speaking, a pointer alone is not harmful and quite
impossible since the field is for example package local in the
AbstractStringBuilder class and the accessor is also package local.

The problem is invoking a method of an object that has access to
the char[] and will do some write into the array between offset
and offset+count of the string.

Bye

 
 
Jan Burse
Guest
Posts: n/a
 
      09-17-2011
Jan Burse schrieb:
> quite impossible


Well not impossible. Via reflection we can access it
even when it is package local:

http://download.oracle.com/javase/6/...bleObject.html

Hm ...


 
 
Roedy Green
Guest
Posts: n/a
 
      09-18-2011
On Sun, 18 Sep 2011 01:33:41 +0200, Jan Burse <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

>The problem is invoking a method of an object that has access to
>the char[] and will do some write into the array between offset
>and offset+count of the string.


Exactly. If there exists a reference to the char[] after it is inside
the String, that is a security breach, since it could be used to
modify the String.
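
The public String(char[]) constructor guards against exactly this by copying the caller's array. A minimal sketch (DefensiveCopyDemo is an invented name):

```java
final class DefensiveCopyDemo {
    // Writes to the caller's array after the String has been created.
    static String afterArrayWrite() {
        char[] chars = { 'J', 'a', 'v', 'a' };
        String s = new String(chars); // the public constructor copies the array
        chars[0] = 'L';               // later write through the old reference
        return s;
    }

    public static void main(String[] args) {
        // prints: Java
        System.out.println(afterArrayWrite());
    }
}
```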
--
Roedy Green Canadian Mind Products
http://mindprod.com
Your top priority should be fixing bugs. If you carry on development,
you are just creating more places you will have to search for them.

 
 
Jan Burse
Guest
Posts: n/a
 
      09-18-2011
Peter Duniho schrieb:

> In any case, if Java does _not_ implement it that way, I suspect that's


Well, I wouldn't say it's a matter of *Java*. It's a matter
of how the given JDK implements it. *Java* defines
the contract, but there are many implementations.

From my post you can read off the following findings:

Oracle: Always does a copy (I only checked rt.jar,
not some alt-rt.jar).
OpenJDK: Always does a copy.
Harmony: Mostly shares the array and flags the
builder (similar to how you describe
the .NET implementation).

But beware, the list is not complete; for example, I
didn't check the Apple classes.jar or IBM's rt.jar.
Also, the findings only hold for JRE 1.6 and can
change at any time if the provider of the JRE decides
so. They might also differ between 32-bit and 64-bit, etc.

> different decision-making process rather than ignorance. In other
> words, they have already considered whether it's a worthwhile
> optimization and decided


There is not only a plurality of *they*, as can be seen
above, but also a plurality of what *worthwhile*
means. An implementation with more of a
reference-implementation character, such as the Oracle
JRE, might take the simple route, since the focus is
then more on functional requirements than on
non-functional requirements.

> Which strongly suggests that anyone worrying a priori
> about the performance of StringBuilder before they have
> demonstrated it's an actual bottleneck in their program
> is wasting their time.


I guess it is more about performance tuning than removing
bottlenecks. And any performance gain is only seen in
programs that make heavy use of StringBuilder. Such
measurements have already been done over and over. See
for example (not exactly measuring toString()):

+=: 546 ms
StringBuilder, default initial capacity: 30ms
StringBuilder, exact initial capacity: 10ms

http://christian.bloggingon.net/arch...ngbuilder.aspx

But detailed measurements will vary depending on JDK
and machine. Whether a given JDK changes the measurement
fundamentally depends very much on the algorithms
available for the functional requirements and on how
these algorithms behave non-functionally.
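
For anyone who wants to repeat such a measurement on their own JDK, here is a crude timing sketch. It is not the benchmark from the linked article; ConcatTiming and its methods are made-up names, and a single System.nanoTime() run like this is noisy (JIT warm-up, GC), so the numbers are only indicative.

```java
final class ConcatTiming {
    // Repeated += on a String.
    static String plusEquals(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += "x";
        }
        return s;
    }

    // capacity <= 0 means "use the default initial capacity".
    static String builder(int n, int capacity) {
        StringBuilder buf = capacity > 0 ? new StringBuilder(capacity) : new StringBuilder();
        for (int i = 0; i < n; i++) {
            buf.append('x');
        }
        return buf.toString();
    }

    // Wall-clock time of one run, in microseconds.
    static long micros(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        int n = 20_000;
        int[] sink = new int[1]; // keeps the results "used"
        System.out.println("+=:               " + micros(() -> sink[0] += plusEquals(n).length()) + " us");
        System.out.println("default capacity: " + micros(() -> sink[0] += builder(n, 0).length()) + " us");
        System.out.println("exact capacity:   " + micros(() -> sink[0] += builder(n, n).length()) + " us");
        System.out.println("total chars: " + sink[0]);
    }
}
```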

But I would say the consumers of JDKs are we the
programmers, so it is good to measure, inspect and
debate JDKs and not blindly trust any *they*.

Bye

 
 
Jan Burse
Guest
Posts: n/a
 
      09-18-2011
Jan Burse schrieb:
> for example (not exactly measuring toString()):
>
> +=: 546 ms
> StringBuilder, default initial capacity: 30ms
> StringBuilder, exact initial capacity: 10ms
>
> http://christian.bloggingon.net/arch...ngbuilder.aspx
>


Oops, it does somehow measure toString(), since multiple
toString() calls are involved in the += test case.


 
 
 
 