
[Slightly OT] Memory management and custom allocator

 
 
BartC
Guest
Posts: n/a
 
      01-03-2012


"Kleuske" <(E-Mail Removed)> wrote in message
news:jdupbb$om0$(E-Mail Removed)...
> On Mon, 02 Jan 2012 18:19:29 -0500, Eric Sosman saw fit to publish the
> following:
> <snip>
>
>> Yes, folks, yes, there certainly *are* circumstances where power-
>> of-two sizes are Just What The Doctor Ordered. But there's no good
>> reason to make everything in sight a power of two! If you allocate 128
>> array elements to represent the 88 keys of a piano, or sixteen for the
>> Twelve Days of Christmas, or thirty-two for your fingers and toes,
>> you're just being silly. Or superstitious.

>
> Bravo! I couldn't have put it better than that.


For user applications, I also agree with all this; there shouldn't be all
these mysterious powers-of-two floating around for no good reason. (And if
you've ever been involved in writing user manuals, the less you have to
explain, the better.)

For higher level, rapid development languages, they are also largely
irrelevant.

But C is a somewhat lower-level language, and these powers of two do get
everywhere. You can completely ignore them if you wish, but it often pays to
be aware of them.

Anything that can only be described, or indexed, in N bits will have 2^N
possibilities; you can't get away from that. (And hardware seems to prefer N
itself as a power-of-two; so 8 bits in a byte is more common than 5, 7 or
11.)

And part of the context of this thread is memory allocation strategies,
where there might well be merit in a scheme where allocated blocks can only
change in size by a power of two (and which, if implemented too thinly on
top of malloc(), is affected by malloc()'s own overheads).

I don't think that is being obsessed with it.

--
Bartc
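
[As a concrete illustration of the kind of power-of-two sizing Bart describes (a sketch only, not his allocator; the 16-byte floor and the names are my own assumptions), rounding each request up to the next size class can look like this:

#include <stddef.h>

/* Round a request up to the next power of two, with a 16-byte floor.
   A hypothetical size-class helper; a real allocator would also add an
   overflow check and an upper bound before falling back to the OS. */
static size_t round_up_pow2(size_t n)
{
    size_t size = 16;
    while (size < n)
        size <<= 1;          /* 16, 32, 64, 128, ... */
    return size;
}

Every block then belongs to one of a handful of size classes, which keeps the free lists simple at the cost of some internal fragmentation.]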

 
 
 
 
 
Joe keane
Guest
Posts: n/a
 
      01-03-2012
In article <(E-Mail Removed)>,
FtM <(E-Mail Removed)> wrote:
>That question is obviously plain wrong, because I *do* like the
>allocator taking cycles if this means that, when memory is low, my
>complex data structure resides in a single page and the system does
>not have to swap hysterically from disk to deal with pathological
>fragmentation.


IMHE, people who try to make the allocation 'as fast as possible' are
just chasing a squirrel up the wrong tree.

No sir, you want to make your clients' code as fast as possible.

Sometimes people recommend a class-based linked list ('look! it's two
instructions!'). In my onion, this is disastrous. Works great for toy
programs, but when you have lots-of-data and long-running combined, your
memory turns into an omelet.

You may as well send invitations to Mr. Cache Miss and Ms. TLB Fail...

If you have a 'fast' allocator that doesn't have the slightest idea of
locality, and a 'slow' allocator that takes some care of it, the
'slower' code will kick the 'faster' code's arse every time.
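
[To make the locality point concrete, here is a minimal pool sketch (my own example, not Joe's code): nodes handed out from one contiguous slab land on the same few pages and cache lines, whereas nodes obtained from separate malloc() calls can end up scattered across the heap.

#include <stdlib.h>

struct node { struct node *next; int payload; };

/* A trivial fixed-capacity pool: all nodes live in one contiguous
   block, so walking a list built from it touches far fewer pages and
   cache lines than a list whose nodes came from individual mallocs. */
struct pool {
    struct node *slab;    /* contiguous storage for every node */
    size_t next_free;     /* index of the next unused node */
    size_t capacity;
};

static int pool_init(struct pool *p, size_t capacity)
{
    p->slab = malloc(capacity * sizeof *p->slab);
    p->next_free = 0;
    p->capacity = capacity;
    return p->slab != NULL;
}

static struct node *pool_alloc(struct pool *p)
{
    if (p->next_free == p->capacity)
        return NULL;                  /* pool exhausted */
    return &p->slab[p->next_free++];
}

static void pool_destroy(struct pool *p)
{
    free(p->slab);                    /* releases every node at once */
}

The allocation path really is only a couple of instructions, but the point is where the nodes end up, not how fast they are handed out.]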
 
 
 
 
 
88888 Dihedral
Guest
Posts: n/a
 
      01-03-2012
On Tuesday, January 3, 2012 at 9:14:46 PM UTC+8, Joe keane wrote:
> In article <(E-Mail Removed)>,
> FtM <(E-Mail Removed)> wrote:
> >That question is obviously plain wrong, because I *do* like the
> >allocator taking cycles if this means that, when memory is low, my
> >complex data structure resides in a single page and the system does
> >not have to swap hysterically from disk to deal with pathological
> >fragmentation.

>
> IMHE, people who try to make the allocation 'as fast as possible' are
> just chasing a squirrel up the wrong tree.
>
> No sir, you want to make your clients' code as fast as possible.
>
> Sometimes people recommend a class-based linked list ('look! it's two
> instructions!'). In my onion, this is disastrous. Works great for toy
> programs, but when you have lots-of-data and long-running combined, your
> memory turns into an omelet.
>
> You may as well send invitations to Mr. Cache Miss and Ms. TLB Fail...
>
> If you have a 'fast' allocator that doesn't have the slightest idea of
> locality, and a 'slow' allocator that takes some care of it, the
> 'slower' code will kick the 'faster' code's arse every time.


This is a system-level problem. When there is a huge DRAM space with plenty
of free memory remaining, the allocator can easily be designed for speed.

But if the DRAM is nearly fully allocated, then a good heap walker that can
swap pages out to the hard disk matters more for reliability than the speed
of the allocator; shaving cycles there is not what keeps the system fast.




 
 
Joe keane
Guest
Posts: n/a
 
      01-04-2012
In article <jdsact$nr6$(E-Mail Removed)>,
Eric Sosman <(E-Mail Removed)> wrote:
> If you have a forty-four-byte struct and allocate sixty-four
>bytes to hold it, how have you avoided *either* sort of "waste?"


If the cache line is 16 bytes, we know that each object will take only
three lines; the fourth is never used [it may be worse with the VM].
For this case 48 would work just as well (the compiler will do that for
you if it thinks 8-byte alignment is a good idea). But then if the
cache line is 32 bytes, same argument applies.
 
 
Joe keane
Guest
Posts: n/a
 
      01-04-2012
In article <jdte21$im7$(E-Mail Removed)>,
Eric Sosman <(E-Mail Removed)> wrote:
> I don't know when the number Two starts to exercise its mystical
>fascination for programmers, but the fascination certainly exists: You
>find people reaching for powers of two "just because," with no actual
>reason for the choice.


You have 'struct x', you add padding, test it, it is faster. You have
'struct y', you add padding, test it, it is faster. You have 'struct
z', you add padding, test it, it is faster. You have 'struct w', you
add padding, test it, it is slower [oops doesn't always work].

You have 'struct a', and your boss says 'can't test! just guess!'.
What would you do?
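
[For what it's worth, the usual 'guess' looks something like the sketch below, which assumes a 64-byte cache line (common, but exactly the kind of assumption Joe says you should test; the field layout is invented for illustration):

/* Pad a hot structure out to an assumed 64-byte cache line so that
   consecutive array elements never share a line.  Whether this actually
   helps is what the testing above is for. */
#define ASSUMED_CACHE_LINE 64

struct a {
    long key;
    long value;
    long counters[4];
    char pad[ASSUMED_CACHE_LINE - 6 * sizeof(long)];
};

/* C11: fail the build if the guess about the size is wrong. */
_Static_assert(sizeof(struct a) == ASSUMED_CACHE_LINE,
               "struct a is not exactly one assumed cache line");]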
 
 
Karl Malbrain
Guest
Posts: n/a
 
      01-05-2012
On Jan 1, 4:54 am, "BartC" <(E-Mail Removed)> wrote:
> "Markus Wichmann" <(E-Mail Removed)> wrote in message
>
> news:(E-Mail Removed)...
>
> > On 31.12.2011 18:52, FtM wrote:
> >> Simply because it can't know if I'm going to do 200 little allocations
> >> or 1 big one.

> > For one, it _never_ uses brk(). That syscall is just plain old and does
> > nothing mmap() couldn't do better. Then it has linked lists prepared for
> > objects of sizes of powers of two, starting at 16 and stopping at 2048.

>
> That's exactly what I do in my own allocator! Except I stop at 1024. Although
> I probably manage them very differently; for example, once a small block is
> allocated at, say, 64 bytes, it always stays that size.
>
> And one thing about malloc() I don't like is the overhead of having to
> remember the block sizes; if I allocate 16 bytes, malloc() seems to reserve
> either 20 or, more often, 24 bytes on the few compilers I've tried. That's a
> 50% overhead!
>
> Also if you're being sensible and trying to keep to power-of-two
> allocations, this 4- or 8-byte overhead is going to screw that up.
>
> My allocators have never stored a block size, and it has never really been a
> problem. Either you will know the size (a struct for example), or you need
> to keep track of it yourself, in which case it is very handy, when you have
> an allocation that can grow, never to have to keep running back to
> realloc().
>
> As an illustration, take this simple example of a string growing from one to
> ten million characters:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main(void)
> {
>     char *p;
>     int i, len;
>
>     p = malloc(1);
>     *p = 0;
>     len = 0;
>
>     for (i = 1; i <= 10000000; ++i) {
>         ++len;
>         p = realloc(p, len+1);
>         p[len-1] = '*';
>         p[len] = 0;
>         // puts(p);
>     }
>
>     printf("Length= %d\n", (int)strlen(p));
> }
>
> On three C compilers, this took about 12 seconds (on DMC, any length of
> string resulted in a runtime of 0.4 seconds; a bit odd).
>
> Using my strategy, of avoiding any calls to malloc or realloc unless
> absolutely necessary, and using that inside an interpreter, this bit of
> script took about 0.1 seconds:
>
> string a
>
> a:=""
> to 10 million do
>     a+:="*"
> od
> println "Length=",a.len
>
> Of course, real C code wouldn't use such a naive strategy; it would also
> seek to minimise malloc()/realloc() calls, but that's also an admission that
> these calls have some overheads which it would be desirable to avoid. Hence
> I can understand the OP's attempt to experiment with his own versions.
>
> --
> Bartc


You can take a look at my bit-mapped memory allocator:
code.google.com/p/arena-memory-allocation
for an example of out-of-band storage of the block sizes.

Karl m
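
[For comparison with the quoted realloc-per-character loop, a sketch of the usual C way to avoid most of those calls (my example, not Bart's interpreter) is to track the capacity yourself and grow it geometrically, so ten million appends cost only about twenty reallocations:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    size_t len = 0, cap = 16;        /* capacity tracked by the caller */
    char *p = malloc(cap);

    if (p == NULL)
        return 1;
    p[0] = 0;

    for (long i = 1; i <= 10000000; ++i) {
        if (len + 1 >= cap) {        /* no room for one more char + NUL */
            char *q;
            cap *= 2;                /* geometric growth: ~20 reallocs total */
            q = realloc(p, cap);
            if (q == NULL) {
                free(p);
                return 1;
            }
            p = q;
        }
        p[len++] = '*';
        p[len] = 0;
    }

    printf("Length= %zu\n", strlen(p));
    free(p);
    return 0;
}]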
 
 
Eric Sosman
Guest
Posts: n/a
 
      01-05-2012
On 1/4/2012 6:42 PM, Joe keane wrote:
> In article<jdte21$im7$(E-Mail Removed)>,
> Eric Sosman<(E-Mail Removed)> wrote:
>> I don't know when the number Two starts to exercise its mystical
>> fascination for programmers, but the fascination certainly exists: You
>> find people reaching for powers of two "just because," with no actual
>> reason for the choice.

>
> You have 'struct x', you add padding, test it, it is faster. You have
> 'struct y', you add padding, test it, it is faster. You have 'struct
> z', you add padding, test it, it is faster. You have 'struct w', you
> add padding, test it, it is slower [oops doesn't always work].
>
> You have 'struct a', and your boss says 'can't test! just guess!'.
> What would you do?


I'd leave it alone, because there are more important things
to attend to.

--
Eric Sosman
(E-Mail Removed)d
 
 
Eric Sosman
Guest
Posts: n/a
 
      01-05-2012
On 1/4/2012 4:52 PM, Joe keane wrote:
> In article<jdsact$nr6$(E-Mail Removed)>,
> Eric Sosman<(E-Mail Removed)> wrote:
>> If you have a forty-four-byte struct and allocate sixty-four
>> bytes to hold it, how have you avoided *either* sort of "waste?"

>
> If the cache line is 16 bytes, we know that each object will take only
> three lines; the fourth is never used [it may be worse with the VM].
> For this case 48 would work just as well (the compiler will do that for
> you if it thinks 8-byte alignment is a good idea). But then if the
> cache line is 32 bytes, same argument applies.


All I can say is that you and I have divergent notions of
efficiency. I'm thinking "I can run over the entire array with
just N*44/C cache misses; Joe would prefer N*64/C." That is, I
think you'll incur 46% more cache misses than I will. Tell me
again, please, what I have wasted by using 31% fewer cache line
loads than you would?

I'm also thinking "I need N*44/P pages for my array; Joe would
rather use N*64/P." That is, I think your working set size will
be 46% larger than mine, your likelihood of taking a page fault
will be 46% greater than mine. Tell me again, please, what have
I wasted by using 31% fewer memory pages?

It's an odd sort of "waste" that spends fewer resources.

--
Eric Sosman
(E-Mail Removed)d
 
 
BartC
Guest
Posts: n/a
 
      01-05-2012
"Karl Malbrain" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...


> You can take a look at my bit-mapped memory allocator:
> code.google.com/p/arena-memory-allocation
> for an example of out-of-band storage of the block sizes.


Thanks, I'll take a closer look later.

But a few things struck me immediately:

(1) It had a comment block at the beginning that actually said what the code
was and gave a brief description, instead of the usual GPL waffle.

(2) Its vfree() function requires the size to be supplied, just like mine
does! So I'm not alone in thinking the way C does it is not the only way.

(3) It has plenty of powers-of-two sprinkled about. With the comments in
this thread I was starting to feel like some sort of dinosaur.

--
Bartc
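
[A sketch of what a size-passed interface looks like from the caller's side (hypothetical names; Karl's actual vfree() is in the code linked above): because the caller supplies the size back at free time, the allocator needn't prepend a header to every block.

#include <stdlib.h>

/* Hypothetical sized-allocation interface: the size travels with the
   call rather than being stored in a hidden header.  Here it simply
   forwards to malloc()/free(); a real implementation would use the
   size to pick a size class or free list. */
void *my_alloc(size_t size)
{
    return malloc(size);
}

void my_free(void *ptr, size_t size)
{
    (void)size;   /* a size-class allocator would use this to find the right list */
    free(ptr);
}

/* The caller usually knows the size anyway, e.g. a struct: */
struct widget { int id; double weight; };

void demo(void)
{
    struct widget *w = my_alloc(sizeof *w);
    if (w != NULL)
        my_free(w, sizeof *w);
}]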

 
 
BartC
Guest
Posts: n/a
 
      01-05-2012


"Eric Sosman" <(E-Mail Removed)> wrote in message
news:je2uhu$6s5$(E-Mail Removed)...
> On 1/4/2012 4:52 PM, Joe keane wrote:
>> In article<jdsact$nr6$(E-Mail Removed)>,
>> Eric Sosman<(E-Mail Removed)> wrote:
>>> If you have a forty-four-byte struct and allocate sixty-four
>>> bytes to hold it, how have you avoided *either* sort of "waste?"

>>
>> If the cache line is 16 bytes, we know that each object will take only
>> three lines; the fourth is never used [it may be worse with the VM].
>> For this case 48 would work just as well (the compiler will do that for
>> you if it thinks 8-byte alignment is a good idea). But then if the
>> cache line is 32 bytes, same argument applies.

>
> All I can say is that you and I have divergent notions of
> efficiency. I'm thinking "I can run over the entire array with
> just N*44/C cache misses; Joe would prefer N*64/C." That is, I
> think you'll incur 46% more cache misses than I will. Tell me
> again, please, what I have wasted by using 31% fewer cache line
> loads than you would?


I think he means that the last 16 bytes of the 64 might never be accessed,
so will never be fetched from memory in the first place; there is no cache
miss.

(That does depend on how the 64 bytes are used; if the whole struct is
copied, the compiler might access all 64 bytes anyway.)

And if the cache blocks are 32 bytes, then it doesn't matter.

> I'm also thinking "I need N*44/P pages for my array; Joe would
> rather use N*64/P." That is, I think your working set size will
> be 46% larger than mine, your likelihood of taking a page fault
> will be 46% greater than mine. Tell me again, please, what have
> I wasted by using 31% fewer memory pages?


No one was advocating wasting 20 bytes, but finding a use for them, perhaps
moving data there that would have been stored elsewhere.

> It's an odd sort of "waste" that spends fewer resources.


If you had a heavily used struct, and it turned out it had 65 bytes, that
wouldn't bother you any? You wouldn't try and get it down to 64?

--
Bartc

 
 
 
 