std::string and std::ostringstream performances

 
 
Bala2508
Guest
Posts: n/a
 
      10-31-2007
Hi,

I have a C++ application that makes extensive use of std::string and
std::stringstream, in a manner similar to the code below:

std::string msgHeader;

msgHeader = "<";
msgHeader += a;
msgHeader += "><";

msgHeader += b;
msgHeader += "><";

msgHeader += c;
msgHeader += ">";

It uses ostringstream in a similar way, and the function that does
this is called for almost every message that my application receives on
the socket. I use it to construct an XML message to
be sent to another application.

When we ran collect/analyzer on the application, it showed
that the majority of the CPU time is spent dealing with these two
datatypes: their memory allocation through std::allocator and related
operations. CPU usage sometimes goes as high as 100%.

I would like advice on the following points:
1. Is there a better way to use std::string / std::stringstream than
the way I have been using them?
2. Am I using the wrong datatype for this kind of operation, and
should I move to something else? Any suggestions as to what the
datatype should be?

I ultimately need these datatypes because the external library that I
use to send this data out needs it in std::string /
std::stringstream format.

I would appreciate any suggestions for bringing down the CPU utilization.

Thanks,
Bala

 
Jim Langston
Guest
Posts: n/a
 
      10-31-2007
One suggestion would be .reserve(), i.e.
std::string msgHeader;
msgHeader.reserve( 100 );

That way the string msgHeader won't need to allocate more memory
until it has used the initial 100 characters allocated. Some library
implementations are better at preallocating a default number of bytes than
others; sometimes they have to be given a hint. Figure out a good size to
reserve (big enough that you won't need reallocations, small enough that
you're not running out of memory), then profile again and see if
it helps.
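
To make the reserve() suggestion concrete, here is a minimal sketch. The function name buildHeader and the assumption that a, b and c are std::string values are mine, not the original poster's. It reserves the exact final length instead of a fixed 100, which is only possible because the pieces are known up front.

```cpp
#include <string>

// Hypothetical helper: a, b and c stand in for the fields from the original post.
std::string buildHeader(const std::string& a,
                        const std::string& b,
                        const std::string& c)
{
    std::string msgHeader;
    // "<" a "><" b "><" c ">" adds 6 punctuation characters in total.
    msgHeader.reserve(a.size() + b.size() + c.size() + 6);

    msgHeader += '<';
    msgHeader += a;
    msgHeader += "><";
    msgHeader += b;
    msgHeader += "><";
    msgHeader += c;
    msgHeader += '>';
    return msgHeader;
}
```

Whether this is measurably faster than the unreserved version depends on the library implementation, so it is worth profiling both.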


 
Bala
Guest
Posts: n/a
 
      10-31-2007

I also clear the string using the msgHeader.str("") method once I am done
sending the message. When this function gets
called again, the same sequence of events happens. Wouldn't that release the
allocated memory once I call msgHeader.str("")? How does reserving
help in this scenario?

 
Erik Wikström
Guest
Posts: n/a
 
      10-31-2007

To clear the string, use clear() instead; that is what it is meant for.
clear() will not affect the capacity of the string, so if you do
something like

std::string str;
str.reserve(100);
str.clear();

you will still be able to put 100 characters into the string before it
needs to reallocate.

Of course, if msgHeader is declared in the function that gets called, it
will go out of scope when the function returns, and a new string will be
constructed when the function is called again, so operations on the string
will have no effect across different calls. If, on the other hand,
msgHeader is external to the function, then you will probably benefit from
using reserve. BTW, calling reserve() with an argument that is smaller than
or equal to the current capacity never increases the capacity (for
std::string it is at most a non-binding request to shrink).
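
A minimal sketch of the "external to the function" case, assuming the caller owns the string and passes it in by reference. The name buildHeaderInto and the std::string parameters are hypothetical. The standard does not strictly promise that clear() keeps the capacity, but the common implementations do.

```cpp
#include <string>

// Hypothetical helper: the caller owns `out`, so its buffer survives between calls.
void buildHeaderInto(std::string& out,
                     const std::string& a,
                     const std::string& b,
                     const std::string& c)
{
    out.clear();   // drops the contents; common implementations keep the capacity
    out += '<';
    out += a;
    out += "><";
    out += b;
    out += "><";
    out += c;
    out += '>';
}
```

The caller would create one std::string, optionally call reserve(100) on it once, and then pass the same object to buildHeaderInto for every message, so that later calls can reuse the buffer instead of allocating again.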

--
Erik Wikström
 
Bala
Guest
Posts: n/a
 
      10-31-2007

msgHeader is local to the function, and its maximum size would not be
more than 100 bytes. So I plan to modify my code to use reserve and
clear as you suggested and will measure the performance again. I hope
it helps.

BTW, a general question.
If I don't use reserve and my function looks something like this:

void somefunction ()
{
    std::string msgHeader;
    msgHeader = "<";
    msgHeader += a;
    msgHeader += "><";

    msgHeader += b;
    msgHeader += "><";

    msgHeader += c;
    msgHeader += ">";
}

How is the actual memory allocation done? My understanding is that
the string library reallocates memory on every statement.
That is, when it reaches the statement msgHeader = "<"; it
allocates, say, 1 byte for msgHeader.
Then at the next statement it reallocates msgHeader to the size of a plus
the current size of msgHeader, and so on.
Is this correct? If so, then I am sure using reserve would improve
the performance dramatically.

Thanks,
Bala

 
Erik Wikström
Guest
Posts: n/a
 
      10-31-2007

I do not know, and I do not think the standard says anything about it.
But a good implementation will probably use a resizing scheme similar to
the one used for vectors, such as (at least) doubling the capacity every
time it resizes.

--
Erik Wikström
 
James Kanze
Guest
Posts: n/a
 
      11-01-2007
On Oct 31, 8:21 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
[...]
> Of course, if msgHeader is declared in the function that gets called it
> will go out of scope when the function returns and will be reallocated
> when it is called again, in which case a new string will be constructed
> in which case the operations on the string will have not effect over two
> different calls. If msgHeader on the other hand is external to the
> function then you will probably benefit from using reserve. BTW, when
> calling reserve() with an argument that is smaller than or equal to the
> current capacity no action is taken.


If you (re-)use a string with static lifetime, you probably
don't need reserve. It will very quickly reach the capacity of
the largest header, and never shrink.

In my own work, I tend to use std::vector<char> a lot for this
sort of thing, using ostrstream (initialized to use the space in
the vector) for formatting. My own experience is that the
implementations of std::vector tend to be better optimized than
those of std::string, and you have a lot more guarantees
concerning the allocation strategy. In my case, this is rather
simple, since I am dealing with fixed length records and fields
(so something like:

size_t pos = v.size() ;
v.resize( v.size() + fieldSize ) ;
ostrstream formatter( &v[0] + pos, fieldSize ) ;
formatter << ... ;

works perfectly). But it's something that may be worth
considering. (If called with arguments, ostrstream will format
directly in place, with no dynamic allocation.)
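
A rough sketch of the same idea: grow the vector once per field, then format straight into the new space. ostrstream has been deprecated since the first C++ standard, so this version swaps in std::snprintf as a stand-in; that substitution is mine. The helper name appendField, the long value type and the left-justified, space-padded layout are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical helper: appends `value` as a left-justified, space-padded
// field of exactly fieldSize characters, written in place into v.
void appendField(std::vector<char>& v, long value, std::size_t fieldSize)
{
    std::size_t pos = v.size();
    v.resize(pos + fieldSize, ' ');          // grow once, pre-filled with padding

    char digits[32];                         // ample for any long in decimal
    int n = std::snprintf(digits, sizeof digits, "%ld", value);
    if (n < 0) {
        return;                              // formatting failed; field stays blank
    }
    std::size_t len = static_cast<std::size_t>(n);
    if (len > fieldSize) {
        len = fieldSize;                     // truncate rather than overrun the field
    }
    if (len > 0) {
        std::memcpy(&v[pos], digits, len);   // copy digits without the terminating NUL
    }
}
```

The only dynamic allocation here is whatever v.resize() needs, and a vector that is reused across records soon stops reallocating.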

Although not currently guaranteed, something similar using
std::string will actually work with all current implementations,
and will be guaranteed in the next version of the standard.

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 
James Kanze
Guest
Posts: n/a
 
      11-01-2007
On Oct 31, 9:58 pm, Bala <R.Balaji.I...@gmail.com> wrote:
> On Oct 31, 3:21 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:
> > On 2007-10-31 19:48, Bala wrote:


[...]
> If i dont use reserve and my function looks somewhat like this below


> void somefunction ()
> {
> std::string msgHeader
> msgHeader = "<";
> msgHeader += a;
> msgHeader += "><";


> msgHeader += b;
> msgHeader += "><";


> msgHeader += c;
> msgHeader += ">";
> }


> How is the actual memory allocation done?


However the implementation wants. There are no real
requirements.

In practice, I think most implementations today do something
similar to what they do in std::vector (which requires some sort
of exponential growth strategy). Many implementations also use
the small string optimization---there is no dynamic allocation
whatsoever if the string is small enough (typically something
between 8 and 32 bytes).

> My understanding is that the string library tries to
> reallocate memory on every statement.


It might, but it probably doesn't.

You can find out by tracing the capacity of the string after
each +=. (If the capacity of the empty string immediately after
construction is greater than 0, then the implementation probably
uses the small string optimization.)
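
A small sketch of that tracing, with a hypothetical helper name (traceCapacity). It records capacity() right after construction and again after every append. The numbers it reports are implementation-specific, so no expected values are given.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: appends each piece to an initially empty string and
// records capacity() after construction and after every +=.
std::vector<std::size_t> traceCapacity(const std::vector<std::string>& pieces)
{
    std::string s;
    std::vector<std::size_t> caps;
    caps.push_back(s.capacity());            // capacity of the empty string
    for (std::size_t i = 0; i < pieces.size(); ++i) {
        s += pieces[i];
        caps.push_back(s.capacity());        // capacity after this append
    }
    return caps;
}
```

Printing the returned values for the pieces "<", a, "><", b, "><", c, ">" shows how often the capacity actually changes. Each change corresponds to a reallocation; steps where the value stays the same cost no allocation at all.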

> That is, initially when it finds the statement "msgHeader = "<";", it
> allocates say 1 byte to the msgHeader.
> Then at the next statement it reallocates msgHeader as sizeof (a) +
> current memory of msgHeader and so on.
> Is this correct? If yes, then I am sure using reserve would improve
> the performance dramatically.


Anytime you can set a reasonable maximum for the length, reserve
is likely to help. How much depends largely on the
implementation, however.

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 
James Kanze
Guest
Posts: n/a
 
      11-01-2007
On Oct 31, 10:55 pm, Erik Wikström <Erik-wikst...@telia.com> wrote:

[...]
> I do not know, and I do not think the standard says anything
> about it. But a good implementation will probably use a
> resizing scheme similar to the one used for vectors, such as
> (at least) doubling the capacity every time it resizes.


Doubling is actually not a very good strategy; multiplying by,
say, 1.5 is considerably better. (As a general rule, the
multiplier should be less than (1+sqrt(5))/2, about 1.6. 1.5
is close enough, and easy to calculate.) In memory-tight
situations, of course, the multiplier should be even smaller.

The original STL implementation did use 2, and I suspect that
many implementations still do, even though we now know that it
isn't such a good idea.
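
The usual argument behind that bound can be sketched with a toy model. The model is my assumption, not something from the thread or measured on a real allocator: all released blocks are taken to form one contiguous, coalesced region, and the current block has to stay alive while its contents are copied. The function name firstReuseStep is hypothetical.

```cpp
// Idealized model (an assumption): every released block sits in one
// contiguous region, and the live block cannot be freed until the copy
// into the new block is done.
// Returns the first growth step at which the released space is large enough
// to hold the new block, or -1 if that never happens within maxSteps.
int firstReuseStep(double factor, int maxSteps)
{
    double released = 0.0;   // total size of blocks already given back
    double current = 1.0;    // size of the live block, in arbitrary units
    for (int step = 1; step <= maxSteps; ++step) {
        double next = current * factor;
        if (released >= next) {
            return step;     // the new block fits into previously freed space
        }
        released += current; // the old block is freed once the copy is done
        current = next;
    }
    return -1;
}
```

In this model a factor of 2.0 never gets to reuse released memory, because 1 + 2 + ... + 2^(n-1) is always smaller than 2^(n+1), while a factor of 1.5 reaches the reuse condition after a handful of growth steps. Real allocators fragment and coalesce differently, so treat this as intuition only.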

--
James Kanze (GABI Software) email:
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

 
Roland Pibinger
Guest
Posts: n/a
 
      11-01-2007
On Wed, 31 Oct 2007 20:58:33 -0000, Bala wrote:
>How is the actual memory allocation done? My understanding is that
>the string library tries to reallocate memory on every statement.
>That is, initially when it finds the statement "msgHeader = "<";", it
>allocates say 1 byte to the msgHeader.


Probably yes.

>Then at the next statement it reallocates msgHeader as sizeof (a) +
>current memory of msgHeader and so on.
>Is this correct? If yes, then I am sure using reserve would improve
>the performance dramatically.


Probably for string, but you cannot call reserve() on an ostringstream.
Neither std::string nor std::stringstream is meant to be used
"extensively". Consider using a library for writing XML, e.g.
http://www.tbray.org/ongoing/When/20.../20/GenxStatus
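
Since an ostringstream has no reserve(), about the only lever is to keep one stream alive and reset it between messages. A minimal sketch follows; the helper name formatHeader and the parameter types are hypothetical. Note that clear() on a stream only resets the error flags, while str("") is what empties the text. Whether the internal buffer is reused after str("") is up to the implementation.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats a header through a caller-owned stream.
std::string formatHeader(std::ostringstream& oss,
                         const std::string& a, int b, double c)
{
    oss.str("");   // discard the text left over from the previous message
    oss.clear();   // reset any error flags (this does not touch the contents)
    oss << '<' << a << "><" << b << "><" << c << '>';
    return oss.str();
}
```

Reusing the stream at least avoids constructing a new ostringstream, with its locale and buffer setup, for every message.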


--
Roland Pibinger
"The best software is simple, elegant, and full of drama" - Grady Booch
 