Data alignment question, structures

 
 
mathog
01-12-2013
(Apologies if this is a duplicate post, glitch on first attempt.)

Consider the following struct:

typedef struct {
uint16_t one;
uint16_t twothree[2];
uint16_t four;
uint16_t five;
} Mystruct;

It contains data storage which is actually:

{ uint16_t, uint32_t, uint16_t, uint16_t }

Accessing 'twothree' as a uint32_t directly, for instance with:

Mystruct instance;
printf("twothree is:%" PRIu32 "\n", *(uint32_t *)&(instance.twothree[0]));

will be a nonaligned memory access. That will do nothing untoward on
x86 CPUs but will blow up on many others. OK so far?

Now, what happens when this struct is embedded in a binary file stored
in a character array "buffer[]" such that its position "i" is not
guaranteed to be aligned on the same boundary that

Mystruct *instance2 = malloc(sizeof(Mystruct));

would have used, but will always be on a multiple of 2.

1. Will this always work?

function1((Mystruct *) &(buffer[i]));

2. How about this?

function2(*(Mystruct *) &(buffer[i]));

Assume that within the function access to the data is by memcpy() to the
appropriate offset for the uint32_t and by "=" for the uint16_t fields.
My guess is that (2) is likely to blow up more often than not, since
it is trying to pass a Mystruct unaligned.

What does the language standard say should happen in these two cases?

Thanks,

David Mathog
 

Ben Bacarisse
01-12-2013
mathog writes:

> (Apologies if this is a duplicate post, glitch on first attempt.)
>
> Consider the following struct:
>
> typedef struct {
> uint16_t one;
> uint16_t twothree[2];
> uint16_t four;
> uint16_t five;
> } Mystruct;
>
> It contains data storage which is actually:
>
> { uint16_t, uint32_t, uint16_t, uint16_t }
>
> Accessing 'twothree" as a uint32_t directly, for instance with:
>
> Mystruct instance;
> printf("twothree is:%d\n",*(uint32_t *)&(instance.twothree[0]);
>
> will be a nonaligned memory access. That will do nothing untoward on
> x86 CPUs but will blow up on many others. OK so far?
>
> Now, what happens when this struct is embedded in a binary file stored
> in a character array "buffer[]" such that its position "i" is not
> guaranteed to be aligned on the same boundary that
>
> Mystract *instance2 = malloc(sizeof(Mystruct));
>
> would have used, but will always be on a multiple of 2.
>
> 1. Will this always work?
>
> function1((Mystruct *) &(buffer[i]));
>
> 2. How about this?
>
> function2(*(Mystruct *) &(buffer[i]));
>
> Assume that within the function access to the data is by memcpy() to
> the appropriate offset for the uint32_t and by "=" for the uint16_t
> fields. My guess is that (2) is likely to blow up more often than not,
> since it is trying to pass a Mystruct unaligned.
>
> What does the language standard say should happen in these two cases?


I hope you'll permit a meta-answer. Having pulled my hair out porting
yards of code like this in the 80s, I am almost certain that there is a
better way to do whatever you are trying to do. More often than not,
the best way is through a small set of functions that extract basic types
from a buffer regardless of alignment and byte order. Whether you then
choose to use the values directly or to put them into a struct -- whose
members can now be aligned any way the compiler chooses (because you
never play address tricks with them) -- depends on the rest of the code.
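
A minimal sketch of the kind of accessor functions described above. It assumes the file stores values in little-endian order and that each buffer element holds one octet; neither is stated in the thread. The names get_u16_le, get_u32_le, Record and record_get are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value starting at p[0].
   Only individual bytes are accessed, so any address is fine. */
uint16_t get_u16_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Read a 32-bit little-endian value starting at p[0]. */
uint32_t get_u32_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* In-memory form of the record. The compiler may pad and align it
   however it likes, because no pointer into the buffer is ever cast. */
typedef struct {
    uint16_t one;
    uint32_t twothree;
    uint16_t four;
    uint16_t five;
} Record;

/* Decode the 10-byte record that starts at buf[i]. */
Record record_get(const unsigned char *buf, size_t i)
{
    Record r;
    r.one      = get_u16_le(buf + i);
    r.twothree = get_u32_le(buf + i + 2);
    r.four     = get_u16_le(buf + i + 6);
    r.five     = get_u16_le(buf + i + 8);
    return r;
}
```

Because the result is returned by value, callers never hold a struct pointer into the buffer, and the same functions work whether i is even or odd.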

--
Ben.
 

Eric Sosman
01-12-2013
On 1/12/2013 1:15 PM, mathog wrote:
> (Apologies if this is a duplicate post, glitch on first attempt.)
>
> Consider the following struct:
>
> typedef struct {
> uint16_t one;
> uint16_t twothree[2];
> uint16_t four;
> uint16_t five;
> } Mystruct;
>
> It contains data storage which is actually:
>
> { uint16_t, uint32_t, uint16_t, uint16_t }
>
> Accessing 'twothree" as a uint32_t directly, for instance with:
>
> Mystruct instance;
> printf("twothree is:%d\n",*(uint32_t *)&(instance.twothree[0]);
>
> will be a nonaligned memory access. That will do nothing untoward on
> x86 CPUs but will blow up on many others. OK so far?


Mostly. The nature of the "blow up" varies, though: Some
systems ignore offending low-order address bits, and would
print a mixture of instance.one and instance.twothree[0].
(There's also the possibility of padding bytes, although the
possibility appears remote for this particular struct.)

> Now, what happens when this struct is embedded in a binary file stored
> in a character array "buffer[]" such that its position "i" is not
> guaranteed to be aligned on the same boundary that
>
> Mystract *instance2 = malloc(sizeof(Mystruct));
>
> would have used, but will always be on a multiple of 2.
>
> 1. Will this always work?
>
> function1((Mystruct *) &(buffer[i]));


Not "always," no. 6.3.2.3p7: "A pointer to an object type
may be converted to a pointer to a different object type. If the
resulting pointer is not correctly aligned for the referenced
type, the behavior is undefined. [...]" There's no guarantee
that the alignment of `buffer' suffices for `Mystruct'.

> 2. How about this?
>
> function2(*(Mystruct *) &(buffer[i]));


Not "always;" same reasoning (with even more force this time,
since the potentially misaligned pointer is not only computed,
but also dereferenced).

> Assume that within the function access to the data is by memcpy() to the
> appropriate offset for the uint32_t and by "=" for the uint16_t fields.
> My guess is that (2) is likely to blow up more often than not, since
> it is trying to pass a Mystruct unaligned.
>
> What does the language standard say should happen in these two cases?


"The behavior is undefined."

What problem are you trying to solve? As Ben Bacarisse says,
safer and saner approaches are likely to exist.

--
Eric Sosman

Shao Miller
01-12-2013
On 1/12/2013 13:15, mathog wrote:
> (Apologies if this is a duplicate post, glitch on first attempt.)
>
> Consider the following struct:
>
> typedef struct {
> uint16_t one;
> uint16_t twothree[2];
> uint16_t four;
> uint16_t five;
> } Mystruct;
>
> It contains data storage which is actually:
>
> { uint16_t, uint32_t, uint16_t, uint16_t }
>
> Accessing 'twothree" as a uint32_t directly, for instance with:
>
> Mystruct instance;
> printf("twothree is:%d\n",*(uint32_t *)&(instance.twothree[0]);
>
> will be a nonaligned memory access. That will do nothing untoward on
> x86 CPUs but will blow up on many others. OK so far?
>
> Now, what happens when this struct is embedded in a binary file stored
> in a character array "buffer[]" such that its position "i" is not
> guaranteed to be aligned on the same boundary that
>
> Mystract *instance2 = malloc(sizeof(Mystruct));
>
> would have used, but will always be on a multiple of 2.
>
> 1. Will this always work?
>
> function1((Mystruct *) &(buffer[i]));
>
> 2. How about this?
>
> function2(*(Mystruct *) &(buffer[i]));
>
> Assume that within the function access to the data is by memcpy() to the
> appropriate offset for the uint32_t and by "=" for the uint16_t fields.
> My guess is that (2) is likely to blow up more often than not, since
> it is trying to pass a Mystruct unaligned.
>
> What does the language standard say should happen in these two cases?


If you have implementation-specific knowledge of what will happen, then
I'd suggest that you do whatever you want to accomplish your goal.

If you actually care about portability (you seem to), I'd suggest
formally serializing and deserializing data to and from the file.

You might be interested in modern C, which is C11. It includes the
'_Alignof' and '_Alignas' keywords, which allow you to consider and
choose alignments, respectively.
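
For illustration, a small C11 sketch of both keywords applied to the struct from this thread. The buffer size and the helper name offset_is_aligned are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t one;
    uint16_t twothree[2];
    uint16_t four;
    uint16_t five;
} Mystruct;

/* _Alignas chooses an alignment: the start of this byte buffer is
   suitably aligned for a Mystruct. */
_Alignas(Mystruct) unsigned char buffer[512];

/* _Alignof reports the requirement. Since the buffer itself is aligned,
   an offset is suitable exactly when it is a multiple of that value.
   Returns 1 if buffer + offset is aligned for Mystruct, else 0. */
int offset_is_aligned(size_t offset)
{
    assert(offset < sizeof buffer);
    return offset % _Alignof(Mystruct) == 0;
}
```

This only helps when you control where the data lands; it says nothing about a record found at an arbitrary position inside a file image.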

Prior to C11, you could try to use something like:

#include <stddef.h> /* For 'offsetof' */
#define Alignof(type) (offsetof(struct { char c; type t; }, t))

to consider alignment, and could use unions to choose alignment, if you
know the complete object type whose alignment needs to be satisfied:

union {
char buffer[512];
Mystruct as_mystruct;
} buffer_and_mystruct;

Here, you know that 'buffer_and_mystruct.buffer' will be properly
aligned for an instance of your structure type.

You can always use 'memcpy' to copy some bytes into a member of an
instance of your structure type; then it'll be safe to examine that
member, as long as it hasn't been populated with a trap representation.
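
A minimal sketch of that memcpy approach for a single value, assuming the bytes in the buffer are already in the host's byte order (no swapping is done). The name load_u32_native is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy sizeof(uint32_t) bytes starting at buf[pos] into a properly
   aligned object and return it. memcpy works byte by byte, so the
   source position may have any alignment. */
uint32_t load_u32_native(const unsigned char *buf, size_t pos)
{
    uint32_t value;
    assert(buf != NULL);
    memcpy(&value, buf + pos, sizeof value);
    return value;
}
```

The exact-width types such as uint32_t have no padding bits, so every bit pattern copied in this way is a valid value.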

--
- Shao Miller
--
"Thank you for the kind words; those are the kind of words I like to hear.

Cheerily," -- Richard Harter
 

mathog
01-13-2013
Shao Miller wrote:

> If you have implementation-specific knowledge of what will happen, then
> I'd suggest that you do whatever you want to accomplish your goal.


That's the point - I want this code to work wherever, without having
access to all possible platforms to test it first. Implicitly
implementation specific is exactly what I am trying to avoid.

>
> If you actually care about portability (you seem to), I'd suggest
> formally serializing and deserializing data to and from the file.


Yeah, I was afraid that was going to be the only standard compliant way
of doing this. The key thing that I have learned is that passing a
pointer to an unaligned structure isn't portable (6.3.2.3p7 cited by
Eric Sosnan earlier in this thread). There are some instances of that
in the code that swaps bytes (big endian to little endian), so those
routines will need to have their interfaces adjusted. They are
otherwise "serial". Better to do this now, before the code is released,
than later. It also seems unavoidable that explicit recordtype_get()
functions will be needed, with the data passed by void * or char *.

> You might be interested in modern C, which is C11. It includes the
> '_Alignof' and '_Alignas' keywords, which allow you to consider and
> choose alignments, respectively.


This doesn't help with the struct offset problem, since we do not know
the offset ahead of time. In a particular file the struct could be
anywhere in the buffer, at 100,102,..,500,502,etc. Or am I
misunderstanding the purpose of the _Alignof and _Alignas?

Thanks,

David Mathog
 

Shao Miller
01-13-2013
On 1/13/2013 13:17, mathog wrote:
> Shao Miller wrote:
>
>> If you have implementation-specific knowledge of what will happen, then
>> I'd suggest that you do whatever you want to accomplish your goal.

>
> That's the point - I want this code to work wherever, without having
> access to all possible platforms to test it first. Implicitly
> implementation specific is exactly what I am trying to avoid.
>
>>
>> If you actually care about portability (you seem to), I'd suggest
>> formally serializing and deserializing data to and from the file.

>
> Yeah, I was afraid that was going to be the only standard compliant way
> of doing this. The key thing that I have learned is that passing a
> pointer to an unaligned structure isn't portable (6.3.2.3p7 cited by
> Eric Sosnan


(Sosman)

> earlier in this thread). There are some instances of that
> in the code that swaps bytes (big endian to little endian), so those
> routines will need to have their interfaces adjusted. They are
> otherwise "serial". Better to do this now, before the code is released,
> than later.


Based on what you've typed, I am guessing that the file format is not
intended to be portable across different platforms. Is that right?
That is, copying a saved file that was made on an x86 isn't expected to
load properly in the same program on PowerPC? I ask because the padding
within structures might also be a concern, for you.

By formally serializing/deserializing, you can have a portable file format.

> It also seems unavoidable that explicit recordtype_get()
> functions will be needed, with the data passed by void * or char *.
>
>> You might be interested in modern C, which is C11. It includes the
>> '_Alignof' and '_Alignas' keywords, which allow you to consider and
>> choose alignments, respectively.

>
> This doesn't help with the struct offset problem, since we do not know
> the offset ahead of time. In a particular file the struct could be
> anywhere in the buffer, at 100,102,..,500,502,etc. Or am I
> misunderstanding the purpose of the _Alignof and _Alignas?


Ah, no. If the structure can appear at any offset in the buffer, then
these don't help you as much, unless the offset happens to be a multiple
of the alignment requirement. But as mentioned, you can 'memcpy' from
the offset into an aligned location, which you might be interested in
doing anyway, if a structure can be truncated by being near the end of
your buffer.
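
A sketch of such a copy with a truncation check. It assumes the file stores the record in exactly the in-memory layout of Mystruct (same size, padding and byte order), which is an implementation-specific assumption and not something the standard promises. The name mystruct_copy_out is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t one;
    uint16_t twothree[2];
    uint16_t four;
    uint16_t five;
} Mystruct;

/* Copy the record at buf[pos] into the aligned object *out.
   len is the number of valid bytes in buf.
   Returns 1 on success, 0 if the record would run past the end. */
int mystruct_copy_out(const unsigned char *buf, size_t len,
                      size_t pos, Mystruct *out)
{
    assert(buf != NULL && out != NULL);
    if (pos > len || len - pos < sizeof *out)
        return 0;                       /* truncated record */
    memcpy(out, buf + pos, sizeof *out);
    return 1;
}
```

The check is written as len - pos < sizeof *out so that a large pos cannot cause an overflow in pos + sizeof *out.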

At least you can automate some of the serializing/deserializing... You
can have:

struct s_foo {
double d;
int i;
char c;
};

size_t foo_offsets[] = {
offsetof(struct s_foo, d),
offsetof(struct s_foo, i),
offsetof(struct s_foo, c),
};

f_serialize * foo_serials[] = {
serialize_double,
serialize_int,
serialize_char,
};

where 'f_serialize' is a function typedef for a serialization function.
Then you can iterate through these arrays with a loop in a generic
"structure serialization" function, instead of having a monolithic:

void serialize_foo(struct s_foo * foo) {
serialize_double(&foo->d);
serialize_int(&foo->i);
serialize_char(&foo->c);
}

for each of your different structure types.
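
One way the generic loop might look. The signature chosen for 'f_serialize' (write one field to an output buffer, return the byte count) and the driver name serialize_struct are assumptions for this sketch, and the three field serializers are stand-ins that merely copy the native representation; real ones would emit a defined byte order and width.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct s_foo {
    double d;
    int i;
    char c;
};

/* Hypothetical signature: serialize one field into out, return bytes written. */
typedef size_t f_serialize(const void *field, unsigned char *out);

/* Stand-in serializers: copy the native bytes unchanged. */
size_t serialize_double(const void *field, unsigned char *out)
{
    memcpy(out, field, sizeof(double));
    return sizeof(double);
}

size_t serialize_int(const void *field, unsigned char *out)
{
    memcpy(out, field, sizeof(int));
    return sizeof(int);
}

size_t serialize_char(const void *field, unsigned char *out)
{
    memcpy(out, field, sizeof(char));
    return sizeof(char);
}

/* Parallel tables describing struct s_foo. */
const size_t foo_offsets[] = {
    offsetof(struct s_foo, d),
    offsetof(struct s_foo, i),
    offsetof(struct s_foo, c),
};

f_serialize *const foo_serials[] = {
    serialize_double,
    serialize_int,
    serialize_char,
};

/* Generic driver: walk the two tables for any structure type.
   Returns the total number of bytes written to out. */
size_t serialize_struct(const void *obj, const size_t *offsets,
                        f_serialize *const *serials, size_t nfields,
                        unsigned char *out)
{
    size_t written = 0;
    assert(obj != NULL && out != NULL);
    for (size_t k = 0; k < nfields; k++) {
        const unsigned char *field = (const unsigned char *)obj + offsets[k];
        written += serials[k](field, out + written);
    }
    return written;
}
```

A call such as serialize_struct(&foo, foo_offsets, foo_serials, 3, out) then replaces the hand-written serialize_foo; the caller has to supply an output buffer that is large enough.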

Or, here is one library you might be interested in:

http://www.leonerd.org.uk/code/libpack/intro.html

--
- Shao Miller
--
"Thank you for the kind words; those are the kind of words I like to hear.

Cheerily," -- Richard Harter
 

Jorgen Grahn
01-15-2013
On Sat, 2013-01-12, Ben Bacarisse wrote:
....
> I hope you'll permit a meta-answer. Having pulled my hair out porting
> yards of code like this in the 80s, I am almost certain that there is a
> better way to do whatever you are trying to do. More often than not,
> the best way is though a small set of function that extract basic types
> from a buffer regardless of alignment and byte order. Whether you then
> choose to use the values directly or to put them into a struct -- whose
> members can now be aligned any way the compiler chooses (because you
> never play address tricks with them) -- depends on the rest of the code.


AOL, except in my case "porting yards of code like this in the 80s"
would be "being bogged down in code like this on and off 1996--present".

Everything just becomes so much simpler and safer if you treat octet
buffers as octet buffers, and structs as structs.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 

Eric Sosman
01-15-2013
On 1/15/2013 3:14 PM, Jorgen Grahn wrote:
> On Sat, 2013-01-12, Ben Bacarisse wrote:
> ...
>> I hope you'll permit a meta-answer. Having pulled my hair out porting
>> yards of code like this in the 80s, I am almost certain that there is a
>> better way to do whatever you are trying to do. More often than not,
>> the best way is though a small set of function that extract basic types
>> from a buffer regardless of alignment and byte order. Whether you then
>> choose to use the values directly or to put them into a struct -- whose
>> members can now be aligned any way the compiler chooses (because you
>> never play address tricks with them) -- depends on the rest of the code.

>
> AOL, except in my case "porting yards of code like this in the 80s"
> would be "being bogged down in code like this on and off 1996--present".
>
> Everything just becomes so much simpler and safer if you treat octet
> buffers as octet buffers, and structs as structs.


Let's add "powerful" and "flexible" to the "so much more" list.

Powerful: It's quite common to want to read and write not just
a struct, but a data structure. As a simple example, consider
`struct person { char *name; struct person *spouse; }': It will
do you no good at all to send these two pointers to another program.
You need to read and write this data with functions that are aware
of the semantics. (Also, they can be smart enough to handle "I'm
writing John; John's spouse is Mary so I'll also write Mary; Mary's
spouse is John so I'll also write John; John's spouse is Mary so ...")

Flexible: So you've settled on a serialized form (in the old
days we had "wire formats"), and along comes someone braying "Ad-hoc
binary formats are s-o-o-o twentieth-century! Management orders
you to get rid of all that well-tested, highly reliable, blazingly
efficient cruft, and use this shiny new XML DTD instead. Hop to it!"
If your programs are full of fread() and fwrite() calls you've got a
headache; if they already use functions that separate internal and
external representations you'll have a much easier time.

--
Eric Sosman

mathog
01-18-2013
Eric Sosman wrote:
>> Now, what happens when this struct is embedded in a binary file stored
>> in a character array "buffer[]" such that its position "i" is not
>> guaranteed to be aligned on the same boundary that
>>
>> Mystract *instance2 = malloc(sizeof(Mystruct));
>>
>> would have used, but will always be on a multiple of 2.
>>
>> 1. Will this always work?
>>
>> function1((Mystruct *) &(buffer[i]));

>
> Not "always," no. 6.3.2.3p7: "A pointer to an object type
> may be converted to a pointer to a different object type. If the
> resulting pointer is not correctly aligned for the referenced
> type, the behavior is undefined. [...]" There's no guarantee
> that the alignment of `buffer' suffices for `Mystruct'.


On further consideration, how does one know what "correctly aligned"
means for a given struct? Consider these two examples:

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
} Mystruct4;

typedef struct {
uint16_t one;
uint16_t two;
uint16_t three;
uint16_t four;
uint16_t five;
} Mystruct5;

The first would presumably be 8 bytes and the second 10. Is it safe to
access these in a memory buffer via a pointer so long as that pointer
is aligned on a 2 byte boundary? I would assume yes because all of the
data within each struct has that alignment, and I am also assuming that
this code should always work:

Mystruct5 array[10]; /* valid */
Mystruct5 *aptr;
uint16_t x;
array[1].four=20;
aptr = &array[1];
/* the next three lines should all print: 1,4 is:20 */
printf("1,4 is:%u\n",aptr->four);
somefunction1(aptr);
somefunction2(*aptr);

....

void somefunction1(Mystruct5 *aptr){
printf("1,4 is:%u\n",aptr->four);
}
void somefunction2(Mystruct5 aptr){
printf("1,4 is:%u\n",aptr.four);
}


Or may a compiler make some assumption for Mystruct4 that requires it to
be aligned on a 4 byte boundary too? That would not prevent it from
being used in an array, and would not break the above example (after
changing Mystruct5 -> Mystruct4). But it would break both of the two
function calls if aptr was pointing to a buffer in memory where the data
was only 2 byte aligned for the structure. Conversely, adding a 4 byte
boundary alignment requirement on Mystruct5 would add two pad bytes, for
no obvious reason, but it would not break the above code example. I see
no reason for the compiler to add the 4 byte alignment
requirement for these structures, but is it nevertheless free to do so?

Thanks,

David Mathog

 

James Kuyper
01-18-2013
On 01/18/2013 01:24 PM, mathog wrote:
> Eric Sosman wrote:
>>> Now, what happens when this struct is embedded in a binary file stored
>>> in a character array "buffer[]" such that its position "i" is not
>>> guaranteed to be aligned on the same boundary that
>>>
>>> Mystract *instance2 = malloc(sizeof(Mystruct));
>>>
>>> would have used, but will always be on a multiple of 2.
>>>
>>> 1. Will this always work?
>>>
>>> function1((Mystruct *) &(buffer[i]));

>>
>> Not "always," no. 6.3.2.3p7: "A pointer to an object type
>> may be converted to a pointer to a different object type. If the
>> resulting pointer is not correctly aligned for the referenced
>> type, the behavior is undefined. [...]" There's no guarantee
>> that the alignment of `buffer' suffices for `Mystruct'.

>
> On further consideration, how does one know what "correctly aligned"
> means for a given struct? Consider these two examples:


In C99 and earlier, there were only a few cases where you could infer
that a pointer was correctly aligned for a given operation. For
instance, a struct must have alignment requirements at least as strict
as those of any of its members, but they can be stricter.

In C2011, several alignment-oriented features were added, such as
_Alignof(), and it's now possible to determine exactly whether or not
the alignment of any arbitrary type allows an operation.

> typedef struct {
> uint16_t one;
> uint16_t two;
> uint16_t three;
> uint16_t four;
> } Mystruct4;
>
> typedef struct {
> uint16_t one;
> uint16_t two;
> uint16_t three;
> uint16_t four;
> uint16_t five;
> } Mystruct5;
>
> The first would presumably be 8 bytes and the second 10. Is it safe to
> access these in a memory buffer via a pointer so long as that pointer
> is aligned on a 2 byte boundary? I would assume yes because all of the
> data within each struct has that alignment, and I am also assuming that
> this code should always work:


You've made several assumptions not guaranteed by the standard:
sizeof(uint16_t) == 2
_Alignof(uint16_t) == 2
_Alignof(Mystruct4) == _Alignof(uint16_t)
_Alignof(Mystruct5) == _Alignof(uint16_t)

For implementations where all of those things are true, your conclusion
holds, but it's not necessarily the case that any of those things are true.
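
With a C11 compiler those four assumptions can be checked instead of taken on trust. A sketch, in which the function name two_byte_alignment_suffices is hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t one;
    uint16_t two;
    uint16_t three;
    uint16_t four;
} Mystruct4;

typedef struct {
    uint16_t one;
    uint16_t two;
    uint16_t three;
    uint16_t four;
    uint16_t five;
} Mystruct5;

/* Returns 1 if all four assumptions listed above hold on this
   implementation, else 0. */
int two_byte_alignment_suffices(void)
{
    /* This much is guaranteed: a struct is at least as strictly
       aligned as any of its members. */
    assert(_Alignof(Mystruct4) >= _Alignof(uint16_t));
    assert(_Alignof(Mystruct5) >= _Alignof(uint16_t));

    return sizeof(uint16_t) == 2
        && _Alignof(uint16_t) == 2
        && _Alignof(Mystruct4) == _Alignof(uint16_t)
        && _Alignof(Mystruct5) == _Alignof(uint16_t);
}
```

Every operand here is a constant expression, so the same conditions could also be written as _Static_assert declarations to stop the build on a platform where they fail.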

> Or may a compiler make some assumption for Mystruct4 that requires it to
> be aligned on a 4 byte boundary too? ...


Yes.

> ... I see
> no reason for the compiler to add the 4 byte alignment
> requirement for these structures, but is it nevertheless free to do so?


Yes.
 