Velocity Reviews


Shivanand Kadwadkar 02-24-2011 10:11 AM

Structure byte padding rule
 
Can anyone give the rule behind how structure byte padding works?

Does it depend on the machine word size, the size of the largest data
type, or something else?

bert 02-24-2011 10:44 AM

Re: Structure byte padding rule
 
On Feb 24, 10:11 am, Shivanand Kadwadkar
<shivanand.kadwad...@gmail.com> wrote:
> can any one give rule behind the how structure byte padding works
>
> Is it depends on machine word size or size of the largest data type or
> something else.


No. Whatever one implementation does,
another one can do quite differently,
so long as well-defined code (that is,
code that does not depend on how the
padding bytes are implemented) works
as it is expected to.

Eric Sosman 02-24-2011 12:59 PM

Re: Structure byte padding rule
 
On 2/24/2011 5:11 AM, Shivanand Kadwadkar wrote:
> can any one give rule behind the how structure byte padding works


There can be padding after any element, including the last.
(Bit-field elements are special, and complicated, so let's just
ignore them -- besides, you can't point at them anyhow, so their
position within the larger struct doesn't matter much.)

> Is it depends on machine word size or size of the largest data type or
> something else.


The implementation is free to use as much or as little padding
as it wants, and to arrange the padding any way it wants, provided
any padding bytes come after struct elements (that is, there can be
no padding before the first element). Usually, an implementation
will insert the smallest amount of padding necessary to satisfy the
alignment requirements of the element's own type. For example, on
a system where a `double' is eight bytes long and must be aligned
on a four-byte boundary, the struct

struct s { char x; double y; char z; };

... will probably have six padding bytes: three after `x' so that
`y' begins four bytes in (and will be four-byte-aligned if the
struct itself is), and another three after `z' so that in an array
of `struct s' objects the second array element will be four-byte-
aligned if the array itself is. If you want to discover how a given
implementation has padded a given struct, you can use the offsetof()
macro from <stddef.h>:

printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));

However, the alignment requirements for various data types are
also entirely up to the implementation. Thus, different compilers
may pad the same source-code struct differently to satisfy their
differing alignment needs, and the values printed by this code may
differ from one system to another. (Except that the offset of `x'
will always be zero; no padding before the first element.)
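
For anyone who wants to try this, here is a minimal sketch that wraps the four printf calls above in a function with the headers they need. The name report_layout is made up for illustration and is not from the post. The casts to int are kept as written, which is harmless for a struct this small.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */
#include <stdio.h>

struct s { char x; double y; char z; };

/* Print the size of struct s and the offset of each member.
   report_layout is a hypothetical name, not part of the original post. */
void report_layout(void)
{
    /* The one layout fact the standard guarantees: no padding
       before the first member. */
    assert(offsetof(struct s, x) == 0);

    printf("struct s takes %d bytes\n", (int)sizeof(struct s));
    printf("x starts %d bytes in\n", (int)offsetof(struct s, x));
    printf("y starts %d bytes in\n", (int)offsetof(struct s, y));
    printf("z starts %d bytes in\n", (int)offsetof(struct s, z));
}
```

Call report_layout() from main to see the numbers for your own compiler. On the hypothetical system described above (eight-byte double with four-byte alignment) it would probably report 16, 0, 4 and 12; other systems may well differ.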

--
Eric Sosman
esosman@ieee-dot-org.invalid

BGB 02-24-2011 09:09 PM

Re: Structure byte padding rule
 
On 2/24/2011 3:11 AM, Shivanand Kadwadkar wrote:
> can any one give rule behind the how structure byte padding works
>
> Is it depends on machine word size or size of the largest data type or
> something else.


as others have noted, the specifics are somewhat compiler/target specific.


however, there are a few common "rules of thumb" (for compilers/targets
which use padding):
most base types have a power-of-2 size (note 1);
most base types require an alignment which is the same as their size
(note 2) often up to a certain limit (note 3);
....

note 1: except "long double", which even on x86, differs widely between
compilers and CPU mode. 80, 96, and 128 bit storage sizes exist, as well
as some compilers which simply treat them as double.

note 2: this is not always consistent, as targets may require an
alignment smaller than the size for some types. an example is in 32-bit
x86, where sometimes "long long" will only require a 32-bit alignment
despite being a 64 bit type, and other compilers will still align it to
64 bits out of principle.

note 3: often an architecture will only care about alignment up to a
certain point (such as the native word size, address size, or bus
width), and past this point no greater alignment is needed (even if
larger sizes may exist). for example, on x86 at present such limit is 16
bytes (128 bits), but this may change later if/when larger CPU registers
are added...
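
to see how these rules of thumb hold up on a particular compiler, something like the following sketch could be used. it assumes a C11 compiler (for _Alignof); the REPORT macro and the report_base_types function are hypothetical names. the two asserts check only what should hold everywhere: an alignment is a power of two, and a type's size is a multiple of its alignment (otherwise arrays of it could not be laid out).

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdio.h>

/* Print the size and alignment of one type (REPORT is a hypothetical
   helper macro). Requires C11 for _Alignof. */
#define REPORT(T)                                               \
    do {                                                        \
        size_t size = sizeof(T), align = _Alignof(T);           \
        /* alignments are always powers of two */               \
        assert(align != 0 && (align & (align - 1)) == 0);       \
        /* size must be a multiple of alignment for arrays */   \
        assert(size % align == 0);                              \
        printf("%-12s size %2lu  alignment %2lu\n", #T,         \
               (unsigned long)size, (unsigned long)align);      \
    } while (0)

/* Report the base types mentioned in the notes above. */
void report_base_types(void)
{
    REPORT(char);
    REPORT(short);
    REPORT(int);
    REPORT(long);
    REPORT(long long);
    REPORT(float);
    REPORT(double);
    REPORT(long double);
    REPORT(void *);
}
```

the interesting lines are usually "long long", "double" and "long double", since those are where size and alignment most often differ between compilers.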


so, usual strategy:
for each struct member, the compiler figures out the needed alignment and
the current offset within the struct (directly following the prior member);
if the offset is not aligned, it is padded up to the needed alignment;
following the last member, the struct may in turn be padded up to its
own needed alignment (so that structs pack nicely into arrays), which is
usually the greatest alignment needed by any member of the struct.
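
the strategy above can be sketched as a small function. this is only a model of what a typical compiler does, not something any compiler is required to do; the names member, align_up and layout are invented for the example, and the sizes and alignments are supplied by the caller rather than taken from a real target.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* Size and alignment of one struct member, as supplied by the caller. */
struct member { size_t size; size_t align; };

/* Round offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Model of the "usual strategy": store each member's offset in
   offsets[] and return the total (tail-padded) struct size. */
size_t layout(const struct member *m, size_t n, size_t *offsets)
{
    size_t offset = 0, max_align = 1, i;

    for (i = 0; i < n; i++) {
        /* the bit trick in align_up needs a power-of-two alignment */
        assert(m[i].align != 0 && (m[i].align & (m[i].align - 1)) == 0);

        offset = align_up(offset, m[i].align);  /* pad before member */
        offsets[i] = offset;
        offset += m[i].size;                    /* step past member */
        if (m[i].align > max_align)
            max_align = m[i].align;
    }
    /* tail padding so that array elements stay aligned */
    return align_up(offset, max_align);
}
```

fed with { {1,1}, {8,4}, {1,1} } (a char, an 8-byte double with 4-byte alignment, a char) this gives offsets 0, 4, 12 and a total size of 16; with the double aligned to 8 instead it gives 0, 8, 16 and a total of 24.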


or such...

sandeep 02-24-2011 09:51 PM

Re: Structure byte padding rule
 
Eric Sosman writes:
> The implementation is free to use as much or as little padding
> as it wants, and to arrange the padding any way it wants, provided any
> padding bytes come after struct elements (that is, there can be no
> padding before the first element). Usually, an implementation will
> insert the smallest amount of padding necessary to satisfy the alignment
> requirements of the element's own type. For example, on a system where
> a `double' is eight bytes long and must be aligned on a four-byte
> boundary, the struct
>
> struct s { char x; double y; char z; };
>
> ... will probably have six padding bytes: three after `x' so that `y'
> begins four bytes in (and will be four-byte-aligned if the struct itself
> is), and another three after `z' so that in an array of `struct s'
> objects the second array element will be four-byte-aligned if the array
> itself is. If you want to discover how a given implementation has
> padded a given struct, you can use the offsetof() macro from <stddef.h>:
>
> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));


Unfortunately though, this code will invoke an undefined behavior on an
implementation where sizeof(struct s) is bigger than INTMAX. I would
advise using the %z argument to printf, this matches the return type of
sizeof() and offsetof() so no explicit casts will be needed.

Keith Thompson 02-24-2011 10:33 PM

Re: Structure byte padding rule
 
sandeep <nospam@nospam.com> writes:
> Eric Sosman writes:
>> The implementation is free to use as much or as little padding
>> as it wants, and to arrange the padding any way it wants, provided any
>> padding bytes come after struct elements (that is, there can be no
>> padding before the first element). Usually, an implementation will
>> insert the smallest amount of padding necessary to satisfy the alignment
>> requirements of the element's own type. For example, on a system where
>> a `double' is eight bytes long and must be aligned on a four-byte
>> boundary, the struct
>>
>> struct s { char x; double y; char z; };
>>
>> ... will probably have six padding bytes: three after `x' so that `y'
>> begins four bytes in (and will be four-byte-aligned if the struct itself
>> is), and another three after `z' so that in an array of `struct s'
>> objects the second array element will be four-byte-aligned if the array
>> itself is. If you want to discover how a given implementation has
>> padded a given struct, you can use the offsetof() macro from <stddef.h>:
>>
>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>
> Unfortunately though, this code will invoke an undefined behavior on an
> implementation where sizeof(struct s) is bigger than INTMAX. I would
> advise using the %z argument to printf, this matches the return type of
> sizeof() and offsetof() so no explicit casts will be needed.


A struct containing a char, a double, and a char is vanishingly
unlikely to exceed INT_MAX bytes.

But yes, using "%zu" would make the code a bit cleaner (assuming your
implementation supports it; not all do).

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ben Bacarisse 02-24-2011 10:52 PM

Re: Structure byte padding rule
 
sandeep <nospam@nospam.com> writes:

> Eric Sosman writes:

<snip>
>> struct s { char x; double y; char z; };

<snip>
>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>
> Unfortunately though, this code will invoke an undefined behavior on an
> implementation where sizeof(struct s) is bigger than INTMAX.


It's not undefined behaviour -- it's implementation-defined.

> I would
> advise using the %z argument to printf,


Presumably you mean %zu. 'z' is just a length modifier.

> this matches the return type of
> sizeof() and offsetof() so no explicit casts will be needed.


If you don't have a C99 version of printf, the most portable solution is
to cast to unsigned long (so there is not even any implementation-
defined behaviour) and use %lu as the format.
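
A small sketch of both options, for illustration only. The helper name print_size is hypothetical. Note that the preprocessor test checks the compiler's language version, which does not strictly guarantee that the library's printf understands %zu, so treat the switch as an assumption about your toolchain.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */
#include <stdio.h>

/* Print "label: value" for a size_t and return printf's result.
   print_size is a hypothetical helper name. */
int print_size(const char *label, size_t value)
{
    assert(label != NULL);
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    /* C99 and later: %zu is the conversion for size_t, no cast needed. */
    return printf("%s: %zu\n", label, value);
#else
    /* C90 fallback: conversion to an unsigned type is fully defined,
       so no implementation-defined behaviour is involved. */
    return printf("%s: %lu\n", label, (unsigned long)value);
#endif
}
```

Used as print_size("struct s", sizeof(struct s)), either branch prints the same text for any size that fits in unsigned long.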

However (as I am sure you know) even this advice is over the top for the
code in question!

--
Ben.

Keith Thompson 02-24-2011 11:07 PM

Re: Structure byte padding rule
 
Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
> sandeep <nospam@nospam.com> writes:
>
>> Eric Sosman writes:

> <snip>
>>> struct s { char x; double y; char z; };

> <snip>
>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>
>> Unfortunately though, this code will invoke an undefined behavior on an
>> implementation where sizeof(struct s) is bigger than INTMAX.

>
> It's not undefined behaviour -- it's implementation-defined.

[...]

An overflowing conversion to a signed type either yields an
implementation-defined result or raises an implementation-defined signal
(C99 6.3.1.3p3). The consequences of raising an implementation-defined
signal are (at least potentially) undefined.

The permission to raise a signal is new in C99, and I've never
heard of any compiler taking advantage of it.
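
If the out-of-range case is a real concern, it can be sidestepped by checking the value before converting, so that the conversion covered by 6.3.1.3p3 never happens. A minimal sketch, with size_to_int as a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>   /* INT_MAX */
#include <stddef.h>   /* size_t */

/* Convert a size_t to int only when the value fits.
   Returns 1 and stores the result in *out on success; returns 0 and
   leaves *out untouched when the value exceeds INT_MAX.
   size_to_int is a hypothetical helper name. */
int size_to_int(size_t value, int *out)
{
    assert(out != NULL);
    /* Compare in an unsigned type so no signed conversion is needed. */
    if (value > (unsigned int)INT_MAX)
        return 0;
    *out = (int)value;   /* in range, so the result is exact */
    return 1;
}
```

With this in place the cast itself is always in range, and the caller decides what to do about values that are not.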

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

BGB 02-24-2011 11:17 PM

Re: Structure byte padding rule
 
On 2/24/2011 3:33 PM, Keith Thompson wrote:
> sandeep <nospam@nospam.com> writes:
>> Eric Sosman writes:
>>> The implementation is free to use as much or as little padding
>>> as it wants, and to arrange the padding any way it wants, provided any
>>> padding bytes come after struct elements (that is, there can be no
>>> padding before the first element). Usually, an implementation will
>>> insert the smallest amount of padding necessary to satisfy the alignment
>>> requirements of the element's own type. For example, on a system where
>>> a `double' is eight bytes long and must be aligned on a four-byte
>>> boundary, the struct
>>>
>>> struct s { char x; double y; char z; };
>>>
>>> ... will probably have six padding bytes: three after `x' so that `y'
>>> begins four bytes in (and will be four-byte-aligned if the struct itself
>>> is), and another three after `z' so that in an array of `struct s'
>>> objects the second array element will be four-byte-aligned if the array
>>> itself is. If you want to discover how a given implementation has
>>> padded a given struct, you can use the offsetof() macro from <stddef.h>:
>>>
>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>
>> Unfortunately though, this code will invoke an undefined behavior on an
>> implementation where sizeof(struct s) is bigger than INTMAX. I would
>> advise using the %z argument to printf, this matches the return type of
>> sizeof() and offsetof() so no explicit casts will be needed.

>
> A struct containing a char, a double, and a char is vanishingly
> unlikely to exceed INT_MAX bytes.
>
> But yes, using "%zu" would make the code a bit cleaner (assuming your
> implementation supports it; not all do).
>


a struct exceeding INT_MAX bytes on any "reasonable" architecture seems
itself exceedingly unlikely...

on a 16-bit target, having a struct this large would be itself a problem
(yes, yes, say on DOS one could have a far pointer and a 64kB struct,
but how likely is this?...).

on most 32-bit systems, this can't practically happen (would need a 2GB
struct, which would have problems fitting into most address spaces).

on 64-bit systems, it could happen, but seriously, how likely is it in
the near future that there will be >=2GB structs?...

unless, maybe:
struct foo_s
{
int arr[1000][1000][1000];
};
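
as a rough check of the arithmetic, the following sketch inspects only the type and never creates an object of it. it assumes a 64-bit target with a 4-byte int and a 32-bit INT_MAX (about 4 * 10^9 bytes versus 2147483647); on a 32-bit target the declaration itself may be rejected. foo_exceeds_int_max is a made-up name.

```c
#include <assert.h>
#include <limits.h>   /* INT_MAX */
#include <stddef.h>   /* size_t */
#include <stdio.h>

struct foo_s
{
    int arr[1000][1000][1000];
};

/* Return 1 if sizeof(struct foo_s) does not fit in an int, else 0.
   Only the type is inspected; no object this large is ever created.
   foo_exceeds_int_max is a hypothetical name. */
int foo_exceeds_int_max(void)
{
    size_t size = sizeof(struct foo_s);

    /* the struct is at least as large as its only member */
    assert(size >= sizeof(int[1000][1000][1000]));

    printf("struct foo_s takes %lu bytes; INT_MAX is %d\n",
           (unsigned long)size, INT_MAX);
    return size > (unsigned int)INT_MAX;
}
```

under those assumptions the function returns 1, so casting this particular sizeof to int would be out of range.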


more subtly, there is the question of whether existing 64-bit systems
have memory managers which allow objects this large (such as via
malloc/free...).

or, additionally, the last time I did a multi-GB memory allocation (on
64-bit Windows, via "VirtualAlloc()"...), the computer lagged so hard
(due to swapping) that I worried a crash was likely (although, I changed
it to not use COMMIT on the memory, and problem fixed...).


OT:

mostly though this was for a region for my "code/data/bss heap":
basically, for dynamically generated machine code, which has a +-2GB limit.
x86-64 doesn't allow direct 64-bit memory addressing or jumps, meaning
one either has to load addresses into a register and use an indirect
addressing, or use the new RIP-relative addressing and live with a +-2GB
limit, or have all code/data/bss sections within the lower 4GB.

but, if one uses a single 2GB region, they can assure that any local
accesses will be within the +-2GB window, and thus use the cheaper
direct addressing (non-local calls then being handled via trampoline
thunks, and non-local global variables being assumed to be invalid).


Ben Bacarisse 02-24-2011 11:54 PM

Re: Structure byte padding rule
 
Keith Thompson <kst-u@mib.org> writes:

> Ben Bacarisse <ben.usenet@bsb.me.uk> writes:
>> sandeep <nospam@nospam.com> writes:
>>
>>> Eric Sosman writes:

>> <snip>
>>>> struct s { char x; double y; char z; };

>> <snip>
>>>> printf ("struct s takes %d bytes\n", (int)sizeof(struct s));
>>>> printf ("x starts %d bytes in\n", (int)offsetof(struct s, x));
>>>> printf ("y starts %d bytes in\n", (int)offsetof(struct s, y));
>>>> printf ("z starts %d bytes in\n", (int)offsetof(struct s, z));
>>>
>>> Unfortunately though, this code will invoke an undefined behavior on an
>>> implementation where sizeof(struct s) is bigger than INTMAX.

>>
>> It's not undefined behaviour -- it's implementation-defined.

> [...]
>
> An overflowing conversion to a signed type either yields an
> implementation-defined result or raises an implementation-defined signal
> (C99 6.3.1.3p3). The consequences of raising an implementation-defined
> signal are (at least potentially) undefined.


I don't see how, except as a rather extreme reading of the standard. The
implementation-defined signal must be "set" to either SIG_IGN or
SIG_DFL. The SIG_IGN case is well-defined; that of SIG_DFL says that
"default handling for that signal will occur". That's maybe a bit vague
but J.3.2 says of implementation-defined behaviour that "[t]he set of
signals, their semantics, and their default handling" must be
documented.

Of course, you could say that the implementation may document the
default handling as being "undefined behaviour", but that seems to me to
be a perverse interpretation. In effect it requires that
implementation-defined behaviour may be defined as undefined!

<snip>
--
Ben.

