[Slightly OT] Memory management and custom allocator

 
 
Ben Bacarisse
 
      01-01-2012
"BartC" <(E-Mail Removed)> writes:

> "BartC" <(E-Mail Removed)> wrote in message
> news:LPYLq.355$(E-Mail Removed)2...
>
> [growing a string from empty, to 10 million chars]
>> char *p;
>> int i,len;
>>
>> p=malloc(1);
>> *p=0;
>> len=0;
>>
>> for (i=1; i<=10000000; ++i) {
>>     ++len;
>>     p=realloc(p,len+1);
>>     p[len-1]='*';
>>     p[len]=0;
>> }

>
>> On three C compilers, this took about 12 seconds (on DMC, any length
>> of string resulted in a runtime of 0.4 seconds; a bit odd).


On my system it takes 0.15 seconds

> Measuring the number of times the value of p changes, with the slow
> compilers, this was respectively 2335, 2336, and 2337. With DMC, it
> was just 46.


and changes p 156 times. (Off-topic extra: a C++ program using s += '*'
on a std::string takes 0.07s on the same machine.)

> Clearly DMC overallocates more than the others (growing allocations by
> sqrt(2) apparently).


I don't see how you can conclude that. The allocated storage might be
able to grow (i.e. it is not initially over-allocated) without changing
the address.
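
For anyone wanting to reproduce the pointer-change counts mentioned above,
a minimal sketch of the measurement (comparing the old pointer value after
a successful realloc() is not strictly sanctioned by the standard, but it
is good enough for a rough count):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *p = malloc(1), *old;
    int i, len = 0, moves = 0;

    if (p == NULL)
        return 1;
    *p = 0;
    for (i = 1; i <= 10000000; ++i) {
        ++len;
        old = p;
        p = realloc(p, len + 1);
        if (p == NULL)
            return 1;
        if (p != old)
            ++moves;              /* the block was relocated */
        p[len - 1] = '*';
        p[len] = 0;
    }
    printf("realloc moved the block %d times\n", moves);
    free(p);
    return 0;
}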

<snip>
--
Ben.
 
BartC
 
      01-02-2012
"Ben Bacarisse" <(E-Mail Removed)> wrote in message
news:0.0749b13cccb174981f23.20120101235442GMT.87ip (E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:


>>> for (i=1; i<=10000000; ++i) {
>>> p=realloc(p,len+1);


>>> On three C compilers, this took about 12 seconds (on DMC, any length
>>> of string resulted in a runtime of 0.4 seconds; a bit odd).

>
> On my system it takes 0.15 seconds


>> Measuring the number of times the value of p changes, with the slow
>> compilers, this was respectively 2335, 2336, and 2337. With DMC, it
>> was just 46.

>
> and changes p 156 times.


Your implementation must be pretty good then, or maybe it has optimised a few
realloc() calls away.

Running the test by calling realloc() with the same argument (so no new
allocations are needed) still takes about 1 second on those three slow
compilers (gcc, lcc-win32 and Pelles C). It seems like two of them have ripped
off the third's inefficient realloc() implementation!

> (Off-topic extra: a C++ program using s += '*'
> on a std::string takes 0.07s on the same machine.)


Sounds about right when calls to realloc() are minimised. The question I
suppose is how much buffering/allocation logic should be done outside of
malloc()/realloc(), or do you just let those functions do most of the work;
you can afford to do that with fast implementations like yours.

>> Clearly DMC overallocates more than the others (growing allocations by
>> sqrt(2) apparently).

>
> I don't see how you can conclude that.


Just a guess. The 46th root of ten million is around 1.42 (1.42^46 ≈ 10^7),
which is close to sqrt(2) ≈ 1.414.

> The allocated storage might be
> able to grown (i.e. it is not initially over-allocated) without changing
> the address.


That's true; with nothing else going on memory-wise, even an over-allocated
block could be further expanded at the same location, depending on how the
allocator works.

--
Bartc
 
Kaz Kylheku
 
      01-02-2012
On 2012-01-01, BartC <(E-Mail Removed)> wrote:
> for (i=1; i<=10000000; ++i) {
> ++len;
> p=realloc(p,len+1);


Growing linearly, one character at a time, is silly.

You have O(N*N) behavior here.

If you grow exponentially (e.g. doubling the buffer), you get amortized O(N)
behavior.

I.e. this is a poor algorithm. It's the O(N*N) time that's killing this,
not malloc.
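
For comparison, a minimal sketch of the same loop with geometric growth
(doubling the capacity, so realloc() is called only about 20 times for
ten million characters, instead of ten million times):

#include <stdlib.h>

int main(void)
{
    size_t i, len = 0, cap = 16;
    char *p = malloc(cap);

    if (p == NULL)
        return 1;
    p[0] = 0;
    for (i = 1; i <= 10000000; ++i) {
        if (len + 2 > cap) {          /* room for the new char plus the terminator */
            char *q = realloc(p, cap * 2);
            if (q == NULL) {
                free(p);
                return 1;
            }
            p = q;
            cap *= 2;
        }
        p[len++] = '*';
        p[len] = 0;
    }
    free(p);
    return 0;
}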
 
88888 Dihedral
 
      01-02-2012
On Saturday, December 31, 2011 at 10:56:53 PM UTC+8, FtM wrote:
> [xpost comp.lang.c, comp.unix.programmer, comp.os.ms-
> windows.programmer.win32]
> Hello NG,
> I have a very specific question, but first of all I'll describe what
> I'm facing, as I'm open to alternative general solutions.
> I'm refactoring some old C code that deals with very basic data
> structures (lists, priority queues, hashtables and so on) and that
> is meant to be very portable. Regarding memory management, I have
> an internal wrapper that, on *nix and Windows systems, maps the malloc/
> calloc/realloc/free functions one-to-one to the default ones, but it's
> possible to switch to a custom allocator that works pretty much
> like a standard allocator (a big chunk of memory split/joined, intended
> for embedded systems).
> I would like to implement my custom allocator for the *nix and Windows
> systems too (of course losing some portability for a performance gain),
> and maybe implement something like the C++ allocators to improve the
> memory management of the different algorithms (e.g. rare large
> allocations for vectors / frequent small allocations for lists), but
> I have some questions:
> - On these systems, to get decent performance, I was thinking of
> allocating page-sized (and page-aligned) chunks so that small
> allocations end up in the same page, but how can I request
> memory with these characteristics? Will it help the OS memory manager?
> - Does all of this make any sense?
> Thanks for any suggestion!
> Ciao!
>
> P.S. I don't really need a "performance boost", but I'm dealing with
> a general, low-level library that is already in use, and as long as
> I'm refactoring the code I'd like to do it in the best possible way
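
For reference, the kind of wrapper FtM describes can be built around a small
table of function pointers; a minimal sketch (the names are made up for
illustration, and calloc is left out for brevity):

#include <stdlib.h>

struct allocator {
    void *(*alloc)(size_t size);
    void *(*resize)(void *p, size_t size);
    void  (*release)(void *p);
};

/* default: forward straight to the standard library */
static struct allocator current = { malloc, realloc, free };

/* switch the whole library over to a custom allocator at start-up */
void mem_set_allocator(const struct allocator *a) { current = *a; }

void *mem_alloc(size_t size)           { return current.alloc(size); }
void *mem_resize(void *p, size_t size) { return current.resize(p, size); }
void  mem_release(void *p)             { current.release(p); }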


Kind of interesting here. I thought you were describing a customized
heap manager that can support page swapping to a file on the hard disk.

A graph of nodes of buffers is required.

If you don't have to page out to the secondary storage system, then
the malloc, memcpy, memset and free in the GCC source code are available.


 
FtM
 
      01-02-2012
On Jan 2, 12:34 am, "BartC" <(E-Mail Removed)> wrote:
> "FtM" <(E-Mail Removed)> wrote in message
>
> news:(E-Mail Removed)...
>
> > - inside, I'll have three different "objects": the first one is for
> > little allocations, the second one for the others, the third one for
> > the medium/big allocations that possibly will need to realloc(). The
> > third allocator follows the "normal" strategy (and could simply use
> > the system calls for that matter), while the first and the second one
> > could simply have some preallocated blocks and a vector to mark if
> > they are used or not, skipping the natural overhead needed by the
> > general purpose allocator.

>
> > What do you think about this approach (despite the fact that in the
> > end it probably gives no noticeable advantages)?

>
> I don't think there's enough information here; what sort of sizes are the
> three classes of allocator handling?
>


16<=x<=128 (little structures, one-shot char buffers, etc.)
128<x<=2048 (array of pointers, bigger structures, etc.)
x>2048 (realloc()atable memory)


> Will any of them always be a fixed size?


Yes, the first two types. (Of course, if a realloc() is requested for
memory of the first or second type it should be switched to the third one,
but this is not likely to happen, as the user (me) is aware of that
behaviour!) I didn't mention it, but obviously I will have the
possibility of calling the objects directly, choosing the preferred
allocation strategy.

> Will they need deallocating?


Sure they will. The first two by marking their own map-vector, the
third in the usual way, by joining adjacent free blocks.

> What strategy will be used to grow the
> heap from which the blocks will come; in fact, does this heap need to
> grow?


Each "object" will have some pages to start with; when one of them
fills its pages completely will ask the system for some more
(increasing its allocation vector on the fly too). I don't know the
"some" quantity, but I think no one do before runtime.. Making it a
compile-(or maybe run-)time parameter will help monitoring and testing
the overall performances, but this is just an idea.
Naturally, speaking of the first two "objects", the space of the
allocation vector should be taken into account, but as it is a bit-
masked vector it will occupy a very little space.
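
For what it's worth, the bit-masked allocation vector for one fixed-size
class might look roughly like this (a minimal sketch assuming 128-byte
blocks carved from a 4096-byte chunk; the names are made up):

#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 128
#define NBLOCKS    32                  /* 32 * 128 = one 4096-byte chunk */

struct pool {
    uint32_t used;                     /* bit i set => block i is in use */
    unsigned char mem[NBLOCKS * BLOCK_SIZE];
};

static void *pool_alloc(struct pool *p)
{
    int i;
    for (i = 0; i < NBLOCKS; ++i) {
        if (!(p->used & (1u << i))) {
            p->used |= 1u << i;
            return p->mem + (size_t)i * BLOCK_SIZE;
        }
    }
    return NULL;                       /* pool full: go and get another chunk */
}

static void pool_free(struct pool *p, void *q)
{
    size_t i = (size_t)((unsigned char *)q - p->mem) / BLOCK_SIZE;
    p->used &= ~(1u << i);
}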

> Where does the memory come from in the first place?


mmap (thus even leaving the possibility to swap to persistent storage
or share the memory between processes).
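
A minimal sketch of getting page-sized, page-aligned chunks that way
(assuming POSIX mmap() with MAP_ANONYMOUS, as on Linux and the BSDs; on
Windows, VirtualAlloc() plays the same role):

#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* request npages pages of anonymous, page-aligned memory */
void *get_chunk(size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;  /* mmap returns page-aligned memory */
}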

Ciao!
 
88888 Dihedral
 
      01-02-2012
On Monday, January 2, 2012 at 6:18:45 PM UTC+8, FtM wrote:
> > Where does the memory come from in the first place?

>
> mmap (thus even leaving the possibility to swap to persistent storage
> or share the memory between processes).


This requires customized data compression modules, not just the heap manager.

If a program only uses 20 to 30 percent of the total heap space on a typical
platform, then a lot of people won't bother to write a new heap manager at all.

>
> Ciao!


 
FtM
 
      01-02-2012
On Jan 2, 11:34 am, 88888 Dihedral <(E-Mail Removed)>
wrote:
> On Monday, January 2, 2012 at 6:18:45 PM UTC+8, FtM wrote:
>
> > > Where does the memory come from in the first place?

>
> > mmap (thus even leaving the possibility to swap to persistent storage
> > or share the memory between processes).

>
> > This requires customized data compression modules, not just the heap manager.
>


Maybe. Or maybe more disk space? I don't think it's relevant
here.

> If a program only uses 20 to 30 percent of the total heap space on a typical
> platform, then a lot of people won't bother to write a new heap manager at all.
>


I don't see your point. In the OSes we were talking about there's no
fixed "heap space"; the physical memory is "recycled" based on
current needs. Or were you thinking of something different?

Ciao!
 
88888 Dihedral
 
      01-02-2012
On Monday, January 2, 2012 at 6:43:33 PM UTC+8, FtM wrote:
> I don't see your point. In the OSes we were talking about there's no
> fixed "heap space"; the physical memory is "recycled" based on
> current needs. Or were you thinking of something different?


I think the heap space on a 32-bit platform is somewhat limited.
Of course, in a 64-bit OS a lot of things will be different.

But as long as the hard disk and the DRAM are both of finite size,
I just model the heap space as finite in my programs.

I am not interested in trivial programs.



 
BartC
 
      01-02-2012
"FtM" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On Jan 2, 12:34 am, "BartC" <(E-Mail Removed)> wrote:



>> What strategy will be used to grow the
>> heap from which the blocks will come; in fact, does this heap need to
>> grow?

>
> Each "object" will have some pages to start with; when one of them
> fills its pages completely will ask the system for some more
> (increasing its allocation vector on the fly too). I don't know the
> "some" quantity, but I think no one do before runtime.. Making it a
> compile-(or maybe run-)time parameter will help monitoring and testing
> the overall performances, but this is just an idea.
> Naturally, speaking of the first two "objects", the space of the
> allocation vector should be taken into account, but as it is a bit-
> masked vector it will occupy a very little space.
>
>> Where does the memory come from in the first place?

>
> mmap (thus even leaving the possibility to swap to persistent storage
> or share the memory between processes).


It sounds like you are more interested in playing with the virtual memory
side of things and doing your own paging.

Be aware that the OS tends to take care of this, and does it rather well
in conjunction with the hardware! (If you look in particular at the memory
management routines on the MSDN website, you'll see it can be done rather
comprehensively there too. I usually take one look, then go back to the friendly
little malloc, realloc and free functions of C.)

Also, if you just want to experiment, then possibly malloc() can provide the
initial memory within which you can do your own allocations. Just ask for
the largest block of memory you want to work with, and look for the first
address within it where the bottom 12 bits are zero; that will likely be on
a page boundary. (In a real program, malloc()ed memory may not grow in the
way you might like, so your mmap() method might be needed.)
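
A minimal sketch of that trick (assuming 4096-byte pages; purely for
experimenting, as said, and the name is made up):

#include <stdint.h>

#define PAGE 4096u                     /* assumed page size */

/* round an address up to the next page boundary */
static void *page_align(void *block)
{
    uintptr_t a = (uintptr_t)block;
    return (void *)((a + PAGE - 1) & ~(uintptr_t)(PAGE - 1));
}

Ask malloc() for the size you want plus one extra page, then carve your
allocations out of the region starting at page_align(block).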

--
Bartc

 
BartC
 
      01-02-2012


"Kaz Kylheku" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> On 2012-01-01, BartC <(E-Mail Removed)> wrote:
>> for (i=1; i<=10000000; ++i) {
>> ++len;
>> p=realloc(p,len+1);

>
> Growing linearly, one character at a time, is silly.


Nevertheless, that's what the program wants to do. (As it happens I quite
often have to build strings this way, and without knowing the final size.)


> You have O(N*N) behavior here.
>
> If you grow exponentially (e.g. doubling the buffer), you get amortized
> O(N)
> behavior.


But do you worry about this stuff directly in your code, or let the
language's runtime take care of it, or have some intermediate functions deal
with these details?

(And in the case of a simple C string, there is nowhere to put any extra
information about what the current allocated size is; it doesn't even know
its length!)
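
If you do want somewhere to keep that information, a tiny wrapper struct
is enough; a minimal sketch (the names are hypothetical):

#include <stdlib.h>

struct strbuf {
    char *s;
    size_t len, cap;
};

/* append one character, doubling the buffer whenever it runs out */
static int strbuf_push(struct strbuf *b, char c)
{
    if (b->len + 2 > b->cap) {
        size_t ncap = b->cap ? b->cap * 2 : 16;
        char *q = realloc(b->s, ncap);
        if (q == NULL)
            return -1;
        b->s = q;
        b->cap = ncap;
    }
    b->s[b->len++] = c;
    b->s[b->len] = 0;
    return 0;
}

Start with struct strbuf b = { NULL, 0, 0 } and call strbuf_push() in the
loop; the doubling keeps the number of realloc() calls logarithmic in the
final length.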

> I.e. this is a poor algorithm. It's the O(N*N) time that's killing this,
> not malloc.


Yet some C implementations coped reasonably enough:

"Ben Bacarisse" <(E-Mail Removed)> wrote in message

> On my system it takes 0.15 seconds


--
Bartc

 