a = b or memset/cpy?

 
 
Shao Miller
 
      02-07-2012
On 2/7/2012 12:02, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>


You might find that the implementation actually translates a '= { 0
};'-style initializer into a call to 'memset'. An experiment might
reveal whether or not that's the case.
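One way to run that experiment, as a rough sketch: put the two forms in tiny functions, compile to assembly (for example with your compiler's equivalent of '-S'), and compare what comes out. The helper names 'zero_by_init' and 'zero_by_memset' are made up for illustration; the 'struct' is the one from the original post.

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

/* Hypothetical helper: zero a local with an initializer, then copy it out. */
void zero_by_init(X *out)
{
    X init = {0};
    *out = init;
}

/* Hypothetical helper: zero the target directly with memset. */
void zero_by_memset(X *out)
{
    memset(out, 0, sizeof *out);
}
```

If the generated code for the two functions is the same, or both contain a call to 'memset', there is nothing to gain by switching forms on that implementation and at that optimization level.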

> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.
>


I'm not sure how you could gain anything unless the call to 'memset'
actually translates differently than a '= { 0 };'-style initializer.

Did you know that after all subobjects that are explicitly initialized
(by the initializer-list) have been so, the rest are initialized to what
they would have been had the object been declared with 'static' storage
duration? The whole containing object is thus "touched."
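A small sketch of that rule, using a made-up 'struct sample' that is not from the original post: only the first member is named in the initializer, yet every other member ends up with its "static" default.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct sample {
    int    count;
    double ratio;
    char  *name;
    char   tag[8];
};

/* Only 'count' gets an explicit value. The remaining members are
   initialized as if the object had static storage duration:
   ratio is 0.0, name is a null pointer, tag is all '\0'. */
struct sample make_sample(void)
{
    struct sample s = { 42 };
    return s;
}
```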

> However, can I gain performance improvements when zeroing out say some
> global element in an array like so:
>
> typedef struct x { int var0; char var1[20]; } X;
>
> X gX[30];
>
> void f(int slot)
> {
> X init = {0};
>
> gX[slot] = init;
>
> ...
> }
>
> vs.
> void f(int slot)
> {
> memset(&gX[slot], 0, sizeof(X));
>
> ...
> }
>


Well these aren't the same. The former initializes all sub-objects to
the "zeroey" values that would initialize a 'static'-storage-duration
object having the same type as the sub-object and having no explicit
initializer.

The latter fills the object with bytes with the 'unsigned char' value
'0', which is all-bits-zero.

In your example, the 'struct' type 'X' has an 'int' member. The object
representation of an 'int' can have padding bits that can be used any
way the implementation pleases.

If filling the padding bits with zeroes results in a trap representation
for an 'int', then you might be in for a surprise.

There are similar concerns for other types, including pointers, where a
null pointer value might not be all-bits-zero.

That is why I believe some people consider a '= { 0 };'-style
initializer to be more portable than 'memset'. If portability isn't a
concern, oh well.
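To make the difference concrete, here is a sketch with a made-up 'struct node' (not from the original post) that has a pointer and a 'double' member. The function names are hypothetical too.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type, for illustration only. */
struct node {
    struct node *next;
    double       weight;
    int          id;
};

/* Portable: 'next' is a null pointer and 'weight' is 0.0 on every
   conforming implementation, whatever their bit patterns are. */
void node_reset_portable(struct node *n)
{
    static const struct node zero = {0};
    *n = zero;
}

/* All-bits-zero: gives the same values on most current platforms, but
   the standard does not promise that all-bits-zero is a null pointer
   or a floating-point zero. */
void node_reset_bytes(struct node *n)
{
    memset(n, 0, sizeof *n);
}
```

Using a 'static const' zero object here is just one choice; a local '= { 0 }' object would express the same thing.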

> Normally I wouldn't look for a micro-optimization like this but I'm
> kind of stuck with the parameters I'm given.


Optimizing for speed and keeping code portable might not always be
compatible. If you have a particular set of implementations as your
target(s), there might be "compiler intrinsics" that you can use:
implementation-specific extensions to C that could offer you speed
advantages.

For example, some Microsoft compilers offer '__movsd':

http://msdn.microsoft.com/en-us/library/9d196b9h.aspx
 
 
 
 
 
Eric Sosman
 
      02-08-2012
On 2/7/2012 12:02 PM, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>
> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.
>
> However, can I gain performance improvements when zeroing out say some
> global element in an array like so:
>
> typedef struct x { int var0; char var1[20]; } X;
>
> X gX[30];
>
> void f(int slot)
> {
> X init = {0};
>
> gX[slot] = init;
>
> ...
> }
>
> vs.
> void f(int slot)
> {
> memset(&gX[slot], 0, sizeof(X));
>
> ...
> }


The official answer is: The definition of the C language says
nothing about which constructs are faster or slower than others.

That said, I would expect memset() to be faster, usually, if
the wind is not unfavorable and the Moon is in the right quarter.
Argument: In the assignment version, the code must allocate the auto
variable `init', zero it, and then copy all those zeroes to `gX[slot]';
on the face of it, this sounds like more work than just zeroing
`gX[slot]' to begin with.

It is just possible that a very smart compiler could (1) realize
that the `init' variable is not actually necessary, (2) decide to
clear `gX[slot]' directly instead of clearing `init' and copying,
and (3) clear `gX[slot]' more efficiently than memset() can, perhaps
with in-line code. My suspicion, though, is that a compiler smart
enough for (1,2,3) would not at the same time be so dumb as to
implement memset() with an actual call to an actual external function;
you'd need a strange combination of brilliance and stupidity to get
an advantage for initialize-and-copy.

... and, of course, measurement is the only way to be sure.
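A minimal timing sketch, assuming nothing beyond standard C. 'clear_by_assign', 'clear_by_memset' and 'time_clear' are names invented here; the two clearing functions are the OP's two variants.

```c
#include <string.h>
#include <time.h>

typedef struct x { int var0; char var1[20]; } X;

X gX[30];

/* The OP's assignment variant. */
void clear_by_assign(int slot)
{
    X init = {0};
    gX[slot] = init;
}

/* The OP's memset variant. */
void clear_by_memset(int slot)
{
    memset(&gX[slot], 0, sizeof(X));
}

/* Hypothetical harness: CPU seconds spent calling 'clear' 'reps' times,
   cycling through all 30 slots. */
double time_clear(void (*clear)(int), long reps)
{
    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        clear((int)(i % 30));
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call 'time_clear' on each variant with a large 'reps' and compare. Keep in mind that going through a function pointer adds call overhead and can hide what in-lining would do in the real program, so treat the numbers as a hint and profile the actual code as well.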

> Normally I wouldn't look for a micro-optimization like this but I'm
> kind of stuck with the parameters I'm given.


My prejudice (and I admit it's something of a prejudice) would be
to take a hard look at those memset() and memcpy() calls, with a view
toward eliminating at least some of them -- if you can eliminate a
call you get an infinite speedup, as opposed to a mere hundredfold!
Making copies of bits you've already computed usually doesn't advance
the state of the computation very much; making many duplicates of a
single byte is also not usually a great addition to the program's
"knowledge." There are, of course, exceptions: qsort() just rearranges
bits you already own, for example, but can be useful nonetheless.
Still, if memset() and memcpy() are dominating the run time, it seems
likely that there may be a lot of needless setting and copying going
on. See what you can jettison.
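A sketch of the kind of call that can often be jettisoned: clearing an object just before every member is overwritten anyway. 'fill_slow' and 'fill_fast' are hypothetical names; the 'struct' is the OP's.

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

/* Before: the memset is wasted work, because both members are fully
   written right afterwards (strncpy pads the rest of var1 with '\0'). */
void fill_slow(X *dst, int value, const char *text)
{
    memset(dst, 0, sizeof *dst);
    dst->var0 = value;
    strncpy(dst->var1, text, sizeof dst->var1 - 1);
    dst->var1[sizeof dst->var1 - 1] = '\0';
}

/* After: same member values, one library call fewer. */
void fill_fast(X *dst, int value, const char *text)
{
    dst->var0 = value;
    strncpy(dst->var1, text, sizeof dst->var1 - 1);
    dst->var1[sizeof dst->var1 - 1] = '\0';
}
```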

--
Eric Sosman
 
 
 
 
 
Shao Miller
 
      02-08-2012
On 2/7/2012 18:58, Shao Miller wrote:
>>
>> typedef struct x { int var0; char var1[20]; } X;
>>
>> X gX[30];
>>
>> void f(int slot)
>> {
>> X init = {0};
>>
>> gX[slot] = init;
>>
>> ...
>> }
>>
>> vs.
>> void f(int slot)
>> {
>> memset(&gX[slot], 0, sizeof(X));
>>
>> ...
>> }
>>

>
> Well these aren't the same. The former initializes all sub-objects to
> the "zeroey" values that would initialize a 'static'-storage-duration
> object having the same type as the sub-object and having no explicit
> initializer.
>
> The latter fills the object with bytes with the 'unsigned char' value
> '0', which is all-bits-zero.
>
> In your example, the 'struct' type 'X' has an 'int' member. The object
> representation of an 'int' can have padding bits that can be used any
> way the implementation pleases.
>
> If filling the padding bits with zeroes results in a trap representation
> for an 'int', then you might be in for a surprise.
>


Ben Bacarisse proved in another thread that my claim for a potential
surprise is false; there is no potential for all-zero-bits in an
integer's object representation to be a trap representation. Sorry
about that!
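For the record, a tiny check of the corrected claim on the OP's 'struct'. As far as I know the guarantee is the sentence in C11 6.2.6.2 saying that, for any integer type, the all-bits-zero object representation is a representation of the value zero. The function name is made up.

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

/* Hypothetical check: returns 1 if a memset-cleared X reads back with
   every member equal to zero, which the integer rule above guarantees
   for both the int and the char members. */
int memset_gives_zero_members(void)
{
    X obj;
    memset(&obj, 0, sizeof obj);
    if (obj.var0 != 0)
        return 0;
    for (size_t i = 0; i < sizeof obj.var1; i++)
        if (obj.var1[i] != 0)
            return 0;
    return 1;
}
```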

> There are similar concerns for other types, including pointers, where a
> null pointer value might not be all-bits-zero.
>


Still applies for other things, like pointers.
 
 
Jorgen Grahn
 
      02-08-2012
On Tue, 2012-02-07, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>
> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.


For copying with memcpy(), I much prefer assignment since it doesn't
bypass the type system, and is more readable.
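A short sketch of what "bypassing the type system" looks like in practice. 'Y' is a made-up look-alike type and the two function names are hypothetical.

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;
typedef struct y { char var1[20]; int var0; } Y;  /* hypothetical look-alike */

/* Assignment: the compiler checks that both sides have type X. */
void copy_by_assign(X *dst, const X *src)
{
    *dst = *src;
}

/* memcpy: the parameters are void pointers, so any object pointer is
   accepted and the size has to be spelled out by hand. */
void copy_by_memcpy(X *dst, const X *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* Given 'X x; Y y;':
     x = y;                      is rejected at compile time,
     memcpy(&x, &y, sizeof x);   compiles and silently scrambles members. */
```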

I won't comment on the memset() part.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
 
Joe keane
 
      02-08-2012
nroberts wrote:
>memset and memcpy are turning up in profiles a lot.


Indeed.

>Sometimes it is clear that using = to initialize a local would be
>better than memset.


It's a shame if you call a function with a size parameter, when in fact
the size is a compile-time constant. You also probably know a bit about
alignment, whereas those guys have to assume the worst.

>I might not gain anything, but at least there's a chance.


Please to use real data! 'gprof' is very good at this. It works [so
far as i have seen] on stdlib calls as well as your functions.

It can tell you where you're getting killed by function call overhead,
and where the copy is taking a long time, such that you may go to more
length to avoid it. It can also (by switching back to a function) tell
you where your 'optimization' does nothing except increase code size.
 
 
Ian Collins
 
      02-08-2012
On 02/09/12 09:05 AM, Joe keane wrote:
> In article<(E-Mail Removed)>,
> nroberts<(E-Mail Removed)> wrote:
>> memset and memcpy are turning up in profiles a lot.

>
> Indeed.
>
>> Sometimes it is clear that using = to initialize a local would be
>> better than memset.

>
> It's a shame if you call a function with a size parameter, when in fact
> the size is a compile-time constant. You also probably know a bit about
> alignment, whereas those guys have to assume the worst.


A decent compiler will inline the call to memset() in this case, so
there is no call overhead. Whether the inline memset() is faster or
slower than an assignment from a const initialiser is something the OP
would have to measure.

>> I might not gain anything, but at least there's a chance.

>
> Please to use real data! 'gprof' is very good at this. It works [so
> far as i have seen] on stdlib calls as well as your functions.


Assuming the OP uses GNU tools...

> It can tell you where you're getting killed by function call overhead,
> and where the copy is taking a long time, such that you may go to more
> length to avoid it. It can also (by switching back to a function) tell
> you where your 'optimization' does nothing except increase code size.


Assuming there is a function call...

--
Ian Collins
 
 
Jens Gustedt
 
      02-09-2012
Am 02/08/2012 12:58 AM, schrieb Shao Miller:
> On 2/7/2012 12:02, nroberts wrote:


> I'm not sure how you could gain anything unless the call to 'memset'
> actually translates differently than a '= { 0 };'-style initializer.


The gain is in the knowledge of the optimizer. If you have a memset
initialization it is difficult (but not impossible) for the optimizer
to keep track of initializations. If it knows of initializations and
it encounters an assignment of a field of the struct before it is ever
read, the optimizer is allowed to omit the initialization. Modern
optimizers can be quite good at tracking individual struct or array
members.
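A sketch of the situation described above, with invented function names. In both functions 'var0' is zeroed and then overwritten before anything reads it, so the zeroing of 'var0' is a dead store that an optimizer may drop.

```c
#include <string.h>

typedef struct x { int var0; char var1[20]; } X;

/* Initializer form: the compiler sees directly that the zero stored in
   var0 is overwritten before any read, so it is free to skip it. */
X make_with_init(int value)
{
    X obj = {0};
    obj.var0 = value;
    return obj;
}

/* memset form: same observable result. To drop the overlapping part of
   the clear, the optimizer first has to understand what memset does. */
X make_with_memset(int value)
{
    X obj;
    memset(&obj, 0, sizeof obj);
    obj.var0 = value;
    return obj;
}
```

Whether a given compiler actually removes the redundant store in either form is something to check in the generated code.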

Jens

 
 
Tim Prince
 
      02-14-2012
On 02/07/2012 12:02 PM, nroberts wrote:
> memset and memcpy are turning up in profiles a lot. I'd like to speed
> things up a bit.
>
> Sometimes it is clear that using = to initialize a local would be
> better than memset. I might not gain anything, but at least there's a
> chance.
>
> However, can I gain performance improvements when zeroing out say some
> global element in an array like so:
>
> typedef struct x { int var0; char var1[20]; } X;
>
> X gX[30];
>
> void f(int slot)
> {
> X init = {0};
>
> gX[slot] = init;
>
> ...
> }
>
> vs.
> void f(int slot)
> {
> memset(&gX[slot], 0, sizeof(X));
>
> ...
> }
>
> Normally I wouldn't look for a micro-optimization like this but I'm
> kind of stuck with the parameters I'm given.


Certain compilers make such transformations automatically. For only 30
elements, presumably with reasonable alignment (which the compiler can
see via in-lining), in-line code may be best, but compilers may prefer
memset() to reduce code size. It may make a difference when one or the
other applies a cache bypass (non-temporal stores on Intel
architectures) once the move is seen as large enough to need it, which
30 elements clearly is not.
 