On 10/25/2012 2:31 PM, Greg Martin wrote:
> I have heard it said, but not confirmed, that the only guarantee that
> the standard gives with regards to structs is that the first element is
> aligned with the structures first byte and that the order of the members
> will not be changed.
That's it, mostly. We know that members are properly aligned
for their types and there's some special language pertaining to
bit-fields, but you're essentially correct.
> Does that mean that code like that below should
> print "Hello" but after that anything would be possible?
>
>
> Hello, World
> struct words: 14
> char[] str: 13
>
>
>
> /***********************************************/
>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h>
>
> struct words {
> char hello[5];
> char comma;
> char space;
> char world[5];
> char exclaim;
> char term;
> };
We know that the "hello" member begins at the struct's first
byte, and that the later members appear in order, not overlapping:
    offsetof(struct words, hello)   == 0
    offsetof(struct words, comma)   >= 0 + 5
    offsetof(struct words, space)   >= offsetof(struct words, comma) + 1
    offsetof(struct words, world)   >= offsetof(struct words, space) + 1
    offsetof(struct words, exclaim) >= offsetof(struct words, world) + 5
    offsetof(struct words, term)    >= offsetof(struct words, exclaim) + 1
Finally, we know that the struct is at least as large as the
sum of its element sizes and any padding between them:
    sizeof(struct words) >= offsetof(struct words, term) + 1
... hence sizeof(struct words) >= 14 (== 5 + 1 + 1 + 5 + 1 + 1).
Since none of the members requires any special alignment, it's
quite likely that sizeof(struct words) will in fact be 14 exactly.
Perhaps the next most likely value is 16, if a compiler decides to
put two padding bytes at the end to make the whole thing fit in two
8-byte units. Descending even further on the likelihood scale, a
compiler might insert one padding byte before `world' and one more
at the end, so each array would be contained in a single 8-byte
unit. Other padding arrangements seem extremely unlikely -- though
as you observe, they're permitted.
> int main (int argc, char* argv[]) {
> char str[] = "Hello, World";
> struct words w;
Okay, `w' occupies >=14 bytes of storage.
> memcpy (&w, str, sizeof (str));
This fills the first 13 bytes of `w' with a copy of the string.
The 14th byte (and any others) remains uninitialized. Since `w'
has sufficient space for everything that's being copied into it,
there's no problem up to this point.
Note that memcpy() makes no use of the "struct-ness" of
the target. In C, any addressable object can be viewed as an
array of bytes, without regard to the object's actual type.
That's what memcpy() does: It just copies bytes, and doesn't
care what type the bytes represent.
> char *cp = (char*) &w;
>
> while (*cp != '\0') {
> printf ("%c", *cp);
> ++cp;
> }
Here, you're doing much the same thing as memcpy() did: You
are not using `w' as a struct, but only as a bag of bytes. If
there are padding bytes, you're using them on exactly the same
basis as you use member bytes: They're all just bytes. The
output *will* be "Hello, World" whether there's padding or not.
Using the "struct-ness" might (in principle) have produced
some surprises, because any padding between members would shift
the later members away from the bytes that memcpy() stored:
    printf("%.5s", w.hello);   // fine so far
    printf("%c", w.comma);     // BZZT!
    printf("%c", w.space);     // BZZT!
    printf("%.5s", w.world);   // BZZT!
There's no telling (in principle) what the final three lines
would have done.
> printf ("\n");
>
> printf ("struct words: %d\nchar[] str: %d\n",
> sizeof (w), sizeof (str));
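Nit-pick: "%d" is for `int', which `size_t' is not (it is an
unsigned type, possibly wider than `int'). Use "%zu" in C99 or
later, or cast to `unsigned long' and use "%lu". I've used systems
where this would have printed the two sizes as 14 and 0 thanks to
the mismatch; in principle, worse things could happen.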
> return 0;
> }
--
Eric Sosman