Velocity Reviews > Calculating storage alignment

# Calculating storage alignment

Darren Cubitt
Guest
Posts: n/a

 03-15-2009
Hi CLC,

For a while I've been using a pool allocator I wrote for fixed sized
objects. Basically, it allocates chunks of 32 (or however many bits in
an int) objects of a given size, and returns pointers to the individual
objects as required.

One problem I needed to deal with is, when a user calls the free
function, the allocator needs to be able to work out where the object is
with respect to the allocated chunk.

I got around this by allocating an extra int per object. This int stores
the index of this object in the chuck, from which the chuck address can
be calculated. This will give you the general idea:

int * ptr = malloc(sizeof_object + sizeof(int));
ptr[0] = index_of_object;
return ptr + 1;

Thanks to reading this group, I now know this is a Bad Idea(tm) due to
memory alignment issues - something I have been ignorant of in the past.

It seems what I need to do is instead of using sizeof(int) I need to
calculate the "universal alignment" value - for lack of a better name.
Obviously, there must be one or malloc() couldn't work.

So if I create a union of all the intrinsic types, and can then
calculate the alignment for that union, I should be set, right?

In short, would this code be correct and portable:

#include "intrinsic_types.h" /* intrinsic_types_t */

size_t a = offsetof(struct {char a; instrinsic_types_t b;}, b);

Also, is using a struct definition in offsetof() ok? It compiles fine in
MSVC, but that is hardly reassuring.

--
Darren (aka: qarnos)

Darren Cubitt
Guest
Posts: n/a

 03-15-2009
Malcolm McLean wrote:

> You don't need an integer to obtain the index. Subtract the base from
> the pointer instead.

This is the problem though - I don't know what the base is.

The allocation function returns a pointer to the object, which in
reality will be a pointer to somewhere inside the chunk.

When the free function is called, it gets this pointer back - but it
doesn't know the base address. It could be the first object in the
chunk, or the third, or eighth, or anything!

I store an integer offset since the code uses a set of int flags to mark
which objects within each chunk have already been allocated and this
flag need to be cleared - this is why the number of objects per chunk is
tied to the number of bits in an int. It just makes things easier. I
could just as easily store a direct pointer to the base, but then I'd
have to calculate the offset and divde by the size to get the flag bit

> Which means you don't need to align. A union of all types is one way to
> obtain a universal alignment, however. You can take the size rather than
> relying on the offsetof hack.

Hey, offsetof is great! I've written some cool code with that.

But I'm not sure if taking the size would work. Just because the largest
type is X chars, doesn't mean universal alignment is X chars.

The reason I choose offsetof() is that any struct needs to be compatible
with a malloc()ed address. malloc() guarantees alignment, so if you make
a struct with a char followed by the types-union, the compiler should
insert sufficient padding to guarantee alignment of the union, which
means offsetof(a, b) will give you the magic value.

Unless there is an error in my logic, of course. It wouldn't be the
first time.

> Because of C's array rules size also
> equals maximum alignment strictness.

I'm not sure I'm following with that bit about array size rules. Maybe
this is another part of the standard I need to farmiliarise myself with.
Could you explain further or give me an RTFM link?

Thanks.

--
Darren (aka: qarnos)

Darren Cubitt
Guest
Posts: n/a

 03-15-2009
Darren Cubitt wrote:

> The reason I choose offsetof() is that any struct needs to be compatible
> with a malloc()ed address. malloc() guarantees alignment, so if you make
> a struct with a char followed by the types-union, the compiler should
> insert sufficient padding to guarantee alignment of the union, which
> means offsetof(a, b) will give you the magic value.
>
> Unless there is an error in my logic, of course. It wouldn't be the
> first time.

Actually, there is an error in my logic which I'll correct before anyone
else does.

I should use:

struct { int a; intrinsic_types_t b; }

struct { char a; intrinsic_types_t b; }

Since it is an int that I want to reserve space for and there is no
guarantee that the alignment would be less than sizeof(int).

--
Darren (aka: qarnos)

Flash Gordon
Guest
Posts: n/a

 03-15-2009
Darren Cubitt wrote:
> Malcolm McLean wrote:
>
>
>> You don't need an integer to obtain the index. Subtract the base from
>> the pointer instead.

>
> This is the problem though - I don't know what the base is.
>
> The allocation function returns a pointer to the object, which in
> reality will be a pointer to somewhere inside the chunk.
>
> When the free function is called, it gets this pointer back - but it
> doesn't know the base address. It could be the first object in the
> chunk, or the third, or eighth, or anything!

I'm guessing you have multiple chunks and you, of course, don't know
which chunk the object the pointer refers to is in.

> I store an integer offset since the code uses a set of int flags to mark
> which objects within each chunk have already been allocated and this
> flag need to be cleared - this is why the number of objects per chunk is
> tied to the number of bits in an int. It just makes things easier. I
> could just as easily store a direct pointer to the base, but then I'd
> have to calculate the offset and divde by the size to get the flag bit

Still possible. The best method depends on all sorts of things,
including how portable the code needs to be. There are solutions which
would work on 95% of all systems currently in production (remembering
that 97% of all statistics are made up) which might be good enough, and
other methods which use up more memory but are completely portable, and
a range in between.

>> Which means you don't need to align. A union of all types is one way
>> to obtain a universal alignment, however. You can take the size rather
>> than relying on the offsetof hack.

>
> Hey, offsetof is great! I've written some cool code with that.
>
> But I'm not sure if taking the size would work. Just because the largest
> type is X chars, doesn't mean universal alignment is X chars.
>
> The reason I choose offsetof() is that any struct needs to be compatible
> with a malloc()ed address. malloc() guarantees alignment, so if you make
> a struct with a char followed by the types-union, the compiler should
> insert sufficient padding to guarantee alignment of the union, which
> means offsetof(a, b) will give you the magic value.
>
> Unless there is an error in my logic, of course. It wouldn't be the
> first time.

It will give you an alignment that will work for the types in the union,
but it might be larger than required. The same applies to the sizeof
method Malcolm suggests.

> > Because of C's array rules size also
> > equals maximum alignment strictness.

>
> I'm not sure I'm following with that bit about array size rules. Maybe
> this is another part of the standard I need to farmiliarise myself with.
> Could you explain further or give me an RTFM link?

Basically there are no gaps between elements in an array. So, for
example, if as far as the processor is concerned a long double takes 10
byte but it has to be aligned on a 4 byte boundary the C implementation
would have to make sizeof(long double) 12 and have two padding bytes
which are part of the long double as far as C is concerned. Obviously
using 12 as your alignment would be very wasteful in this situation!

Another option you could use is a hash table. You could then look up the
pointer passed to your "free" function and find all the other data you
need. Obviously this has some memory overheads. There are also some
portability problems in that there could be multiple pointer
representations could actually refer to the same pointer value, and
there is no guarantee you get back the same representation your passed!

Another option with different portability problems is to compare the
pointer passed to your "free" function to the start/end of each of your
chunks. However, C leave pointer comparisons (other than for equality)
undefined when the pointers do not refer to locations within the same
object.

Another potentially good way is to have a header file with
implementation specifics such as the alignment value. Then you just have
to set all the appropriate values when porting to a new system and know
exactly where to find them!

No solution is perfect, you just have to decide on the best compromise.
--
Flash Gordon

Darren Cubitt
Guest
Posts: n/a

 03-15-2009
Flash Gordon wrote:

> Basically there are no gaps between elements in an array. So, for
> example, if as far as the processor is concerned a long double takes 10
> byte but it has to be aligned on a 4 byte boundary the C implementation
> would have to make sizeof(long double) 12 and have two padding bytes
> which are part of the long double as far as C is concerned. Obviously
> using 12 as your alignment would be very wasteful in this situation!

Oh, I see where I am going wrong - sizeof returns the size of the object

Otherwise, if we take you example of a 10-byte long double aligned to
4-bytes, something like this woudln't work:

long double * ptr = &a_double_array_with_at_least_2_elements;

*++ptr = 0;

So sizeof(long double) would need to return 12, not 10? I am assuming
that ++ptr is equivelant to:

ptr = (long double *)((char *)ptr + sizeof(long double));

This is why I coudln't see how Malcom's solution would work. I knew I
was missing something. Something obvious, too, now that I think about it.

> Another option you could use is a hash table. You could then look up the
> pointer passed to your "free" function and find all the other data you
> need. Obviously this has some memory overheads. There are also some
> portability problems in that there could be multiple pointer
> representations could actually refer to the same pointer value, and
> there is no guarantee you get back the same representation your passed!

I used to some programming way back in IMB PC real mode. The
segment/offset point system had some advantages... but not enough to
make you want to rip your hair out.

> Another option with different portability problems is to compare the
> pointer passed to your "free" function to the start/end of each of your
> chunks. However, C leave pointer comparisons (other than for equality)
> undefined when the pointers do not refer to locations within the same
> object.
>
> Another potentially good way is to have a header file with
> implementation specifics such as the alignment value. Then you just have
> to set all the appropriate values when porting to a new system and know
> exactly where to find them!
>
> No solution is perfect, you just have to decide on the best compromise.

All good ideas, but I'm trying to keep it simple (this coming from a
person who always over-complicates things). The whole point of the code
is to make things more efficient! I was hoping for a magic bullet
solution but, alas, it just might not exist.

Thanks for your help.

--
Darren (aka: qarnos)

Flash Gordon
Guest
Posts: n/a

 03-15-2009
Darren Cubitt wrote:
> Flash Gordon wrote:
>
>> Basically there are no gaps between elements in an array. So, for
>> example, if as far as the processor is concerned a long double takes
>> 10 byte but it has to be aligned on a 4 byte boundary the C
>> implementation would have to make sizeof(long double) 12 and have two
>> padding bytes which are part of the long double as far as C is
>> concerned. Obviously using 12 as your alignment would be very wasteful
>> in this situation!

>
> Oh, I see where I am going wrong - sizeof returns the size of the object
> including alignment padding.

Padding can be for reasons other than alignment, but yes.

> Otherwise, if we take you example of a 10-byte long double aligned to
> 4-bytes, something like this woudln't work:
>
> long double * ptr = &a_double_array_with_at_least_2_elements;
>
> *++ptr = 0;
>
> So sizeof(long double) would need to return 12, not 10? I am assuming
> that ++ptr is equivelant to:
>
> ptr = (long double *)((char *)ptr + sizeof(long double));

Yes, you are correct.

Another reason it works like this is that array indexing is defined in
terms of pointer arithmetic.

> This is why I coudln't see how Malcom's solution would work. I knew I
> was missing something. Something obvious, too, now that I think about it.

A lot of things are obvious once you understand why they are done that
way

>> Another option you could use is a hash table. You could then look up
>> the pointer passed to your "free" function and find all the other data
>> you need. Obviously this has some memory overheads. There are also
>> some portability problems in that there could be multiple pointer
>> representations could actually refer to the same pointer value, and
>> there is no guarantee you get back the same representation your passed!

>
> I used to some programming way back in IMB PC real mode. The
> segment/offset point system had some advantages... but not enough to
> make you want to rip your hair out.

These things come and go. The hashing could also hit problems due to
padding bits in the pointer which are not guaranteed to be maintained.

>> Another option with different portability problems is to compare the
>> pointer passed to your "free" function to the start/end of each of
>> your chunks. However, C leave pointer comparisons (other than for
>> equality) undefined when the pointers do not refer to locations within
>> the same object.
>>
>> Another potentially good way is to have a header file with
>> implementation specifics such as the alignment value. Then you just
>> have to set all the appropriate values when porting to a new system
>> and know exactly where to find them!
>>
>> No solution is perfect, you just have to decide on the best compromise.

>
> All good ideas, but I'm trying to keep it simple (this coming from a
> person who always over-complicates things). The whole point of the code
> is to make things more efficient! I was hoping for a magic bullet
> solution but, alas, it just might not exist.

The simplest and most efficient solutions, in my opinion, is the header
file that specifies the alignment. A value of 4 is probably appropriate
for your current targets, and people targeting other environments can
easily change it.

Keeping system specifics in one (or a small number) of files to allow
easy porting is a standard technique. I've got commercial software
distributed as source where the file is called port.h to make it easy

> Thanks for your help.

No problem.
--
Flash Gordon

gw7rib@aol.com
Guest
Posts: n/a

 03-15-2009
On 15 Mar, 09:21, Darren Cubitt <(E-Mail Removed)> wrote:
> One problem I needed to deal with is, when a user calls the free
> function, the allocator needs to be able to work out where the object is
> with respect to the allocated chunk.
>
> I got around this by allocating an extra int per object. This int stores
> the index of this object in the chuck, from which the chuck address can
> be calculated. This will give you the general idea:
>
> * * * * int * ptr = malloc(sizeof_object + sizeof(int));
> * * * * ptr[0] = index_of_object;
> * * * * return ptr + 1;

You may be overlooking the obvious here. Firstly, is there some reason
you can't simply add the extra int to the object structure? Or, if the
object definition is fixed for some reason, why you can't create a new
structure:

struct myobject {
struct object obj;
int index; };

which includes both the desired object and your new int?

Hope that helps.
Paul.

CBFalconer
Guest
Posts: n/a

 03-16-2009
Darren Cubitt wrote:
>

.... snip ...
>
> All good ideas, but I'm trying to keep it simple (this coming from
> a person who always over-complicates things). The whole point of
> the code is to make things more efficient! I was hoping for a
> magic bullet solution but, alas, it just might not exist.

I gather you want to make efficient ways of allocating single
integers. Lets assume those integers are 4 bytes long, and that
anything of any other size will use malloc.

To control allocation we need one bit per item. If we limit the
count of items to something line 250, and each integer has 16 bits,
we need a 16 integer array to hold the bits. If we round total
storage used up to about 1k bytes, that will allow for about 240
integers.

So we have to allocate such a package early in program operation.
Once allocated, a request for less or more than 1 integer will
divert to the original malloc, as will all requests when the
package is totally used. Now we just have to find an integer whose
allocation bit is 0, set that allocation bit, and return the ints
address. Free just has to reset the allocation bit. Somewhere a
single integer will keep the count of items allocated.

Note that the system will be highly sensitive to misuse.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>

Chris M. Thomasson
Guest
Posts: n/a

 03-16-2009

"Darren Cubitt" <(E-Mail Removed)> wrote in message
news:rM3vl.28771\$(E-Mail Removed)...
> Hi CLC,
>
> For a while I've been using a pool allocator I wrote for fixed sized
> objects. Basically, it allocates chunks of 32 (or however many bits in an
> int) objects of a given size, and returns pointers to the individual
> objects as required.
>
> One problem I needed to deal with is, when a user calls the free function,
> the allocator needs to be able to work out where the object is with
> respect to the allocated chunk.
>
> I got around this by allocating an extra int per object. This int stores
> the index of this object in the chuck, from which the chuck address can be
> calculated. This will give you the general idea:
>
> int * ptr = malloc(sizeof_object + sizeof(int));
> ptr[0] = index_of_object;
> return ptr + 1;
>
> Thanks to reading this group, I now know this is a Bad Idea(tm) due to
> memory alignment issues - something I have been ignorant of in the past.

You say the pool allocator works with only a single object size. Well, why
not do something simple like... Round all object sizes up to a multiple of
`sizeof(void*)'. Then allocate a large chunk of memory, say 4096 bytes of
object space + size of the chunk header + extra for alignment purposes, and
align the object space on a 4096 byte boundary. Object space must be a
multiple of the alignment requirements of at least a void*, or the given
object, which ever one is greater. Header alignment must be at least big
enough to accommodate its strictest type. Now, create a simple bump pointer
and a free list in the header. Initial allocations use the bump pointer. If
the pointer is at the end, it uses the freelist. Frees simply link objects
as-is right into the free list. To find the chunk header of any object, you
simple round its address down to 4096 boundary and subtract size of the
header. This is a fairly common technique, and it does not require an extra
wasteful int per object. The wasted space only gets detected when the user
associates the pool with an object whose sizeof value is less than that of
sizeof(void*).

I can whip up some simple working code if your interested. Many
state-of-the-art multi-threaded allocators use this technique. Intel TBB,
StreamFlow, ect...

Another simple technique is to use a bitmap; Here is simple example of
region allocator than can actually free individual objects:

> It seems what I need to do is instead of using sizeof(int) I need to
> calculate the "universal alignment" value - for lack of a better name.
> Obviously, there must be one or malloc() couldn't work.
>
> So if I create a union of all the intrinsic types, and can then calculate
> the alignment for that union, I should be set, right?
>
> In short, would this code be correct and portable:
>
>
> #include "intrinsic_types.h" /* intrinsic_types_t */
>
> size_t a = offsetof(struct {char a; instrinsic_types_t b;}, b);
>
> Also, is using a struct definition in offsetof() ok? It compiles fine in
> MSVC, but that is hardly reassuring.

This "offsetof" trick works fine:

I used this to create a simple conservative region allocator:

This can carve objects out of a given buffer, and maintain alignment without
resorting to a so called max alignment value. It works well in space
confined environments.