
Re: Checksum in a struct

 
 
Eric Sosman
 
      07-11-2012
On 7/11/2012 10:56 AM, pozz wrote:
> I have a function that computes a 16-bit checksum (following whatever
> algorithm) of a memory space:
>
> unsigned int checksum(const void *buffer, size_t size);
>
> I want to embed this checksum in a struct:
>
> struct PStruct {
> int x;
> unsigned int y;
> char z[13];
> ...
> unsigned int checksum;
> };
>
> How to use the checksum() function above? I propose:
>
> struct PStruct ps;
> ...
> ps.checksum = checksum(&ps, offsetof(struct PStruct, checksum));
>
> Is there a better mechanism?


You'd better hope so.

A problem with the approach you've outlined is that the
checksum computation will include the values of any padding
bytes -- the size of `z' in your example almost begs for some
padding bytes to be inserted. Since padding bytes are not
necessarily preserved when assigning structs or even when
assigning to struct elements, a checksum that includes padding
bytes is unlikely to be very useful. Similar concerns apply to
bit-field elements: The values of un-named bits are not necessarily
preserved. For that matter, if `z' holds a string (as opposed to a
generic batch of chars), the bytes after '\0' should probably be
omitted from a checksum since they're not part of the "value."
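To see how much padding is actually in play, the layout can be measured with offsetof and sizeof. What follows is only a sketch: the struct is the one from the question with the elided members left out, and the helper names padding_before_checksum() and total_padding() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the question, without the elided members. */
struct PStruct {
    int x;
    unsigned int y;
    char z[13];
    unsigned int checksum;
};

/* Number of padding bytes between the end of z and the start of checksum. */
size_t padding_before_checksum(void)
{
    size_t end_of_z = offsetof(struct PStruct, z)
                    + sizeof(((struct PStruct *)0)->z);
    size_t start_of_checksum = offsetof(struct PStruct, checksum);

    assert(end_of_z <= start_of_checksum);   /* members never overlap */
    return start_of_checksum - end_of_z;
}

/* Number of bytes in the struct that belong to no member at all. */
size_t total_padding(void)
{
    return sizeof(struct PStruct)
         - (sizeof(int) + sizeof(unsigned int)
            + sizeof(((struct PStruct *)0)->z) + sizeof(unsigned int));
}
```

On a typical implementation with a 4-byte, 4-byte-aligned unsigned int, padding_before_checksum() would be expected to return 3, but the exact figure is up to the implementation. Any nonzero result means checksum(&ps, offsetof(struct PStruct, checksum)) is summing bytes whose values are unspecified.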

One possibility would be to checksum the fields individually,
perhaps with a variadic function:

ps.checksum = checksum(&ps.x, sizeof ps.x,
                       &ps.y, sizeof ps.y,
                       ps.z, strlen(ps.z) + 1,
                       ...,
                       (void*)NULL);

It seems to me this would be cumbersome, and also prone to error:
somebody could omit a field by accident, or (for checksums that
are non-commutative) get them in the wrong order. Also, it can't
handle bit-fields since you can't point at them.

A preferable approach would be to write a checksum function
specifically for struct PStruct objects, even if that function
winds up making the cumbersome call(s) to the true underlying
checksummer:

unsigned int PSChecksum(const struct PStruct *);
ps.checksum = PSChecksum(&ps);

Such a function could even handle bit-fields by copying their
values to addressable local variables before applying the low-
level computation.
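A minimal sketch of such a function, under some assumptions: the thread never says which checksum algorithm is in use, so checksum_update() here is a hypothetical running 16-bit byte sum standing in for the real one, and `z' is assumed to hold a '\0'-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The struct from the question, without the elided members. */
struct PStruct {
    int x;
    unsigned int y;
    char z[13];              /* assumed to hold a '\0'-terminated string */
    unsigned int checksum;   /* deliberately left out of the computation */
};

/* Hypothetical low-level checksummer: a running 16-bit sum of bytes. */
unsigned int checksum_update(unsigned int sum, const void *buffer, size_t size)
{
    const unsigned char *p = buffer;

    while (size-- > 0)
        sum = (sum + *p++) & 0xFFFFu;
    return sum;
}

/* Checksums only the meaningful bytes of each member, in a fixed order.
   Padding bytes and the bytes after the string terminator are never read. */
unsigned int PSChecksum(const struct PStruct *ps)
{
    unsigned int sum = 0;

    assert(ps != NULL);
    sum = checksum_update(sum, &ps->x, sizeof ps->x);
    sum = checksum_update(sum, &ps->y, sizeof ps->y);
    sum = checksum_update(sum, ps->z, strlen(ps->z) + 1);
    return sum;
}
```

Because the member list lives in exactly one place, adding a field means touching one function rather than every call site.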

--
Eric Sosman
 
 
 
 
Eric Sosman
 
      07-12-2012
On 7/12/2012 10:42 AM, pozz wrote:
> Il 11/07/2012 17:59, Eric Sosman ha scritto:
>> A problem with the approach you've outlined is that the
>> checksum computation will include the values of any padding
>> bytes -- the size of `z' in your example almost begs for some
>> padding bytes to be inserted. Since padding bytes are not
>> necessarily preserved when assigning structs or even when
>> assigning to struct elements, a checksum that includes padding
>> bytes is unlikely to be very useful. Similar concerns apply to
>> bit-field elements: The values of un-named bits are not necessarily
>> preserved. For that matter, if `z' holds a string (as opposed to a
>> generic batch of chars), the bytes after '\0' should probably be
>> omitted from a checksum since they're not part of the "value."

>
> Yes, they are considerations I also made. In my application (running on
> a single processor), I have to read/write the struct from/to a file and
> use it in memory. I'm not interested in a standard format file (it's a
> custom configuration for the application) and I'll never need to
> read/write the struct on a different processor.
>
> I know other better standard file formats for configuration settings are
> available (INI, XML, ...), but I'm working on an embedded simple
> processor and I don't want to increase the complexity of the software
> just for the configuration.


The fact that you intend to use the struct only locally and
only on one processor doesn't change anything: Padding bytes will
still contain random and potentially non-constant garbage, bytes
after the '\0' terminating a string are probably garbage, and so
on. It's unlikely, but the mere act of storing the checksum into
the struct could in principle change the padding bytes -- if it
does, the checksum is self-invalidating!

If you want to write a struct and a checksum to a file and
verify the checksum when you read it back, keep the checksum as
a separate variable and don't put it inside the struct.

--
Eric Sosman
 
 
 
 
Jorgen Grahn
 
      07-13-2012
On Thu, 2012-07-12, pozz wrote:
> Il 11/07/2012 17:59, Eric Sosman ha scritto:
>> A problem with the approach you've outlined is that the
>> checksum computation will include the values of any padding
>> bytes -- the size of `z' in your example almost begs for some
>> padding bytes to be inserted. Since padding bytes are not
>> necessarily preserved when assigning structs or even when

...
> Yes, they are considerations I also made. In my application (running on
> a single processor), I have to read/write the struct from/to a file and
> use it in memory. I'm not interested in a standard format file (it's a
> custom configuration for the application) and I'll never need to
> read/write the struct on a different processor.
>
> I know other better standard file formats for configuration settings are
> available (INI, XML, ...), but I'm working on an embedded simple
> processor


How simple? Your target has a file system, at least.

> and I don't want to increase the complexity of the software
> just for the configuration.


Just bear in mind that your solution increases complexity in other
ways. For example, debugging is harder when the files are in a binary
ad-hoc format. You may need to document the format. Extending the
amount of configuration data can be tricky, if you need to upgrade a
system without losing the current config. And so on.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
 
Les Cargill
 
      07-13-2012
pozz wrote:
> Il 11/07/2012 17:59, Eric Sosman ha scritto:
>> A problem with the approach you've outlined is that the
>> checksum computation will include the values of any padding
>> bytes -- the size of `z' in your example almost begs for some
>> padding bytes to be inserted. Since padding bytes are not
>> necessarily preserved when assigning structs or even when
>> assigning to struct elements, a checksum that includes padding
>> bytes is unlikely to be very useful. Similar concerns apply to
>> bit-field elements: The values of un-named bits are not necessarily
>> preserved. For that matter, if `z' holds a string (as opposed to a
>> generic batch of chars), the bytes after '\0' should probably be
>> omitted from a checksum since they're not part of the "value."

>
> Yes, they are considerations I also made. In my application (running on
> a single processor), I have to read/write the struct from/to a file and
> use it in memory. I'm not interested in a standard format file (it's a
> custom configuration for the application) and I'll never need to
> read/write the struct on a different processor.
>
> I know other better standard file formats for configuration settings are
> available (INI, XML, ...), but I'm working on an embedded simple
> processor and I don't want to increase the complexity of the software
> just for the configuration.
>


So do this: ( WARNING! - THIS CODE PROBABLY DOES NOT COMPILE! )

#include <stdint.h>

typedef enum {
    v_int32,
    v_short,    /* or int16 */
    v_float,
    v_double,
    v_int8,     /* used for scalar chars */
    v_string
} TYPTYP;

typedef struct {
    char name[128];
    TYPTYP type;
    void *ptr;
} cfgElem;

extern int32_t thing1;
extern double thing2;
...

cfgElem configTable[] = {
    { "Thing1", v_int32 , &thing1 },
    { "Thing2", v_double , &thing2 },
    ...
};

Then write a simple parser for .ini files ( if it's more
than 80 lines you did it wrong ) that exploits this table.
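For what it's worth, here is one way a table-driven parser of that kind could look. It is a sketch under assumptions, not the code from the post: it handles only "name = value" lines and three of the types, and cfg_apply_line(), cfg_load(), the `size' member, and `thing3' are all invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { v_int32, v_double, v_string } TYPTYP;

typedef struct {
    char name[128];
    TYPTYP type;
    void *ptr;
    size_t size;    /* capacity in bytes; only consulted for v_string */
} cfgElem;

/* Hypothetical settings owned by the application. */
int32_t thing1;
double thing2;
char thing3[16];

cfgElem configTable[] = {
    { "Thing1", v_int32,  &thing1, sizeof thing1 },
    { "Thing2", v_double, &thing2, sizeof thing2 },
    { "Thing3", v_string, thing3,  sizeof thing3 },
};

/* Applies one "name = value" line. Returns 0 if a table entry was
   updated, -1 for blank lines, comments, sections and unknown names. */
int cfg_apply_line(const char *line)
{
    char name[64], value[64];
    size_t i;

    assert(line != NULL);
    if (sscanf(line, " %63[^= \t\r\n] = %63[^\r\n]", name, value) != 2)
        return -1;
    if (name[0] == ';' || name[0] == '#' || name[0] == '[')
        return -1;

    for (i = 0; i < sizeof configTable / sizeof configTable[0]; i++) {
        cfgElem *e = &configTable[i];

        if (strcmp(e->name, name) != 0)
            continue;
        switch (e->type) {
        case v_int32:
            *(int32_t *)e->ptr = (int32_t)strtol(value, NULL, 10);
            break;
        case v_double:
            *(double *)e->ptr = strtod(value, NULL);
            break;
        case v_string:
            /* snprintf truncates and always terminates the copy. */
            snprintf((char *)e->ptr, e->size, "%s", value);
            break;
        }
        return 0;
    }
    return -1;
}

/* Feeds every line of an already-open file through cfg_apply_line(). */
void cfg_load(FILE *fp)
{
    char line[160];

    while (fgets(line, sizeof line, fp) != NULL)
        cfg_apply_line(line);
}
```

Supporting a new setting is then one more row in configTable, and the file stays readable in any text editor.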

You will not regret it unless you are somehow trying
to make somebody's life more difficult.

At least write a 'C' ( or perl, python, Tcl - although
'C' has advantages here ) program to manipulate these files
for you on a PC. Trust me on this - you need it, and
it will save you time in the long run. If need be, do it
at home on a Saturday - you'll get a few Saturdays back in the
end....

you can even ... *gasp* ... write a logger that polls
the configuration store and tells you every time something
changes.

You may not be interested in configuration management,
but configuration management is interested in you.

--
Les Cargill
 
 
Eric Sosman
 
      07-15-2012
On 7/15/2012 3:56 AM, pozz wrote:
> Il 12/07/2012 17:08, Eric Sosman ha scritto:
>> On 7/12/2012 10:42 AM, pozz wrote:
>>> Il 11/07/2012 17:59, Eric Sosman ha scritto:
>>>> A problem with the approach you've outlined is that the
>>>> checksum computation will include the values of any padding
>>>> bytes -- the size of `z' in your example almost begs for some
>>>> padding bytes to be inserted. Since padding bytes are not
>>>> necessarily preserved when assigning structs or even when
>>>> assigning to struct elements, a checksum that includes padding
>>>> bytes is unlikely to be very useful. Similar concerns apply to
>>>> bit-field elements: The values of un-named bits are not necessarily
>>>> preserved. For that matter, if `z' holds a string (as opposed to a
>>>> generic batch of chars), the bytes after '\0' should probably be
>>>> omitted from a checksum since they're not part of the "value."
>>>
>>> Yes, they are considerations I also made. In my application (running on
>>> a single processor), I have to read/write the struct from/to a file and
>>> use it in memory. I'm not interested in a standard format file (it's a
>>> custom configuration for the application) and I'll never need to
>>> read/write the struct on a different processor.
>>>
>>> I know other better standard file formats for configuration settings are
>>> available (INI, XML, ...), but I'm working on an embedded simple
>>> processor and I don't want to increase the complexity of the software
>>> just for the configuration.

>>
>> The fact that you intend to use the struct only locally and
>> only on one processor doesn't change anything: Padding bytes will
>> still contain random and potentially non-constant garbage, bytes
>> after the '\0' terminating a string are probably garbage, and so
>> on. It's unlikely, but the mere act of storing the checksum into
>> the struct could in principle change the padding bytes -- if it
>> does, the checksum is self-invalidating!

>
> So a possible solution is to store the checksum outside the struct as
> a different variable.


Yes, as I suggested in the very next paragraph:

>> If you want to write a struct and a checksum to a file and
>> verify the checksum when you read it back, keep the checksum as
>> a separate variable and don't put it inside the struct.

>
> Could I ignore the "randomness" of the padding bytes? I read that
> the padding bytes can be randomly changed even assigning a value to a
> field of the struct.


Now, *where* could you have read such a thing?

> My application should work in this way:
>
> - at startup, read the configuration file, calculate and verify the
> checksum: if it isn't correct, use a default struct;


Right: You'd read the struct's bytes directly into an instance
of itself using fread(), say, rather than making field assignments.
Then you'd read the stored checksum into an independent variable,
re-calculate the struct's checksum, and compare. It's your choice
what to do about a mismatch.

> - when a field changes (after assigning it the new value), calculate
> the new checksum and save both (struct and checksum) to the file;


Right again: Calculate the new checksum, store it in a free-
standing variable, and write the bytes of both to the file. Again,
it's up to you to decide how frequently you want to do this: On
every change, only at program shutdown, or something in between.

> - during the normal execution of the application, the fields of the
> struct are accessed many times.
>
> In this situation, could I calculate the checksum on the entire
> memory area of the struct (with padding bytes)? I read the padding
> bytes can be randomly changed when a value is assigned to a field, but
> in this case I re-calculate the checksum. What happens if I access a
> field? Could the padding bytes also be changed by read operations?


Padding bytes are "vulnerable" when their fellow travellers are
stored to (6.2.6.1p6). There's no similar language for read accesses,
which I interpret as meaning reads won't change them. Note that this
applies only to the padding in the instance that's being read; if you
copy a padded struct from one instance to another

struct padded s1 = ...;
// Suppose the padding bytes in s1 have values p1,p2,...
struct padded s2 = s1;
// s1's padding is still p1,p2,... but s2's can differ.

... the padding in the original doesn't change, but the padding in
the copy need not agree with it. So when you're moving data back
and forth to files, be sure to do the checksum calculations on the
exact same struct instance that you use for the I/O, not on a copy.
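The save and load steps discussed above might be sketched as follows. Assumptions are labeled: struct Config stands in for the real configuration struct, checksum() has the signature from the original question but its body here is only a placeholder 16-bit byte sum, and config_save() and config_load() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the real configuration struct. */
struct Config {
    int x;
    unsigned int y;
    char z[13];
};

/* Placeholder for the real algorithm: a 16-bit sum over all bytes. */
unsigned int checksum(const void *buffer, size_t size)
{
    const unsigned char *p = buffer;
    unsigned int sum = 0;

    while (size-- > 0)
        sum = (sum + *p++) & 0xFFFFu;
    return sum;
}

/* Writes the struct's bytes followed by a free-standing checksum.
   The checksum is computed on the very instance that is written,
   with no stores to it in between. Returns 0 on success, -1 on error. */
int config_save(FILE *fp, const struct Config *cfg)
{
    unsigned int sum;

    assert(fp != NULL && cfg != NULL);
    sum = checksum(cfg, sizeof *cfg);
    if (fwrite(cfg, sizeof *cfg, 1, fp) != 1)
        return -1;
    if (fwrite(&sum, sizeof sum, 1, fp) != 1)
        return -1;
    return 0;
}

/* Reads the bytes straight into *cfg, then the stored checksum into a
   separate variable, and compares. Returns 0 if the checksum matches,
   1 on a mismatch, -1 on a read error. */
int config_load(FILE *fp, struct Config *cfg)
{
    unsigned int stored;

    assert(fp != NULL && cfg != NULL);
    if (fread(cfg, sizeof *cfg, 1, fp) != 1)
        return -1;
    if (fread(&stored, sizeof stored, 1, fp) != 1)
        return -1;
    return checksum(cfg, sizeof *cfg) == stored ? 0 : 1;
}
```

On a nonzero return from config_load() the caller would fall back to its default struct, as described in the question. The contents of *cfg should not be trusted in that case.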

--
Eric Sosman
 
Stefan Ram
 
      07-15-2012
pozz writes:
>that correspond to any padding bytes take unspecified values." (6.2.6.1p6)
>Here unspecified values means random data.


»unspecified« does not imply that the data will pass
tests for randomness.

»unspecified behavior« is behavior »where each
implementation documents how the choice is made«.

 
 
Eric Sosman
 
      07-15-2012
On 7/15/2012 1:30 PM, Stefan Ram wrote:
> pozz writes:
>> that correspond to any padding bytes take unspecified values." (6.2.6.1p6)
>> Here unspecified values means random data.

>
> »unspecified« does not imply that the data will pass
> tests for randomness.
>
> »unspecified behavior« is behavior »where each
> implementation documents how the choice is made«.


No; that's "implementation-defined behavior" (3.4.1p1).
"Unspecified behavior" (3.4.4p1) is

use of an unspecified value, or other behavior where this
International Standard provides two or more possibilities
and imposes no further requirements on which is chosen in
any instance

"No further requirements" implies "No requirement to document."

--
Eric Sosman
 
Stefan Ram
 
      07-15-2012
Eric Sosman writes:
>On 7/15/2012 1:30 PM, Stefan Ram wrote:
>>(...)

>No; that's "implementation-defined behavior" (3.4.1p1).
>"Unspecified behavior" (3.4.4p1) is


You are right. I did not read carefully enough. I just saw:

»unspecified behavior where each implementation
documents how the choice is made«

and thought it was from a table or so, so that one can
read »is « in front of »where«, but I erred.

 
 
alex
 
      07-16-2012
On Wed, 11 Jul 2012 11:59:42 -0400, Eric Sosman wrote:
> A problem with the approach you've outlined is that the
> checksum computation will include the values of any padding bytes -- the
> size of `z' in your example almost begs for some padding bytes to be
> inserted. Since padding bytes are not necessarily preserved when
> assigning structs or even when assigning to struct elements, a checksum
> that includes padding bytes is unlikely to be very useful.


Are you sure about this?? I would expect a struct assign/deepcopy to be
implemented "under the hood" using memcpy(), not { s1.a=s2.a;
s1.b=s2.b; } etc. Pretty sure that's what GCC does.
 
 
Keith Thompson
 
      07-16-2012
alex writes:
> On Wed, 11 Jul 2012 11:59:42 -0400, Eric Sosman wrote:
>> A problem with the approach you've outlined is that the
>> checksum computation will include the values of any padding bytes -- the
>> size of `z' in your example almost begs for some padding bytes to be
>> inserted. Since padding bytes are not necessarily preserved when
>> assigning structs or even when assigning to struct elements, a checksum
>> that includes padding bytes is unlikely to be very useful.

>
> Are you sure about this?? I would expect a struct assign/deepcopy to be
> implemented "under the hood" using memcpy(), not { s1.a=s2.a;
> s1.b=s2.b; } etc. Pretty sure that's what GCC does.


A compiler can do it either way. Using the equivalent of a memcpy()
call is certainly a likely approach, but there's no guarantee that
it's done that way.

Code that assumes padding bytes are preserved is likely to work
perfectly until the moment you demonstrate it to an important client.
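If a byte-for-byte copy is what the code actually depends on, it can ask for one explicitly instead of hoping that struct assignment happens to be compiled that way. A small sketch, with struct padded and padded_copy_bytes() as illustrative names only:

```c
#include <assert.h>
#include <string.h>

/* A struct that has padding after c on most implementations. */
struct padded {
    char c;
    int i;
};

/* Copies every byte of *src, padding included, so that afterwards the
   two objects compare equal under memcmp(). Plain assignment
   (*dst = *src) only promises that the members compare equal. */
void padded_copy_bytes(struct padded *dst, const struct padded *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```

A checksum taken over the whole object then agrees between the original and the copy, at least until the next store into either one.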

--
Keith Thompson (The_Other_Keith) kst- <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 