Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > [Slightly OT] Memory management and custom allocator

Reply
Thread Tools

[Slightly OT] Memory management and custom allocator

 
 
Phil Carmody
Guest
Posts: n/a
 
      01-05-2012
(Joe keane) writes:
> In article <jdte21$im7$>,
> Eric Sosman <> wrote:
> > I don't know when the number Two starts to exercise its mystical
> >fascination for programmers, but the fascination certainly exists: You
> >find people reaching for powers of two "just because," with no actual
> >reason for the choice.

>
> You have 'struct x', you add padding, test it, it is faster. You have
> 'struct y', you add padding, test it, it is faster. You have 'struct
> z', you add padding, test it, it is faster. You have 'struct w', you
> add padding, test it, it is slower [oops doesn't always work].
>
> You have 'struct a', and your boss says 'can't test! just guess!'.
> What would you do?


I'd ask my boss if he's heard of Knuth.

However, it's hypothetical, I'd probably not be working at such a
company in the first place.

Phil
--
Unix is simple. It just takes a genius to understand its simplicity
-- Dennis Ritchie (1941-2011), Unix Co-Creator
 
Reply With Quote
 
 
 
 
88888 Dihedral
Guest
Posts: n/a
 
      01-06-2012
88888 Dihedral於 2012年1月3日星期二UTC+8下午10時07分20秒 寫道:
> Joe keane於 2012年1月3日星期二UTC+8下午9時14分46秒 道:
> > In article <f500e5bf-e...@t16g2000vba.googlegroups.com>,
> > FtM <fma...@gmail.com> wrote:
> > >That question is obviously plain wrong, because I *do* like the
> > >allocator taking cycles if this means that, when the memory is low, my
> > >complex data structure resides in one single page and the system has
> > >not to hysterically swap from disk to deal with a pathological
> > >fragmentation.

> >
> > IMHE, people who try to make the allocation 'as fast as possible' are
> > just chasing a squirrel up the wrong tree.
> >
> > No sir, you want to make your clients' code as fast as possible.
> >
> > Sometimes people recommend a class-based linked list ('look! it's two
> > instructions!'). In my onion, this is disasterous. Works great for toy
> > programs, but when you have lots-of-data and long-running combined, your
> > memory turns into an omelet.
> >
> > You may as well send invitations to Mr. Cache Miss and Ms. Tlb Fail...
> >
> > If you have a 'fast' allocator that doesn't have the slighest idea of
> > locality, and a 'slow' allocator that takes some care of it, the
> > 'slower' code will kick the 'faster' code arse every time.

>
> This is a system level problem. The allocator under a huge dram space with
> a lot free space remained should be designed toward fast easily.
>
> But if the dram is near fully allocated then a good heap walker that can
> swap pages to the hard disk is important toward reliability not the speed
> part in the allocator to degrade the system speed.


The heap managing module is very important in the OS.

Normally all allocations from malloc are allocated in sizes of 2's powers
that are 2 to S powers for S=4, 5, 6, ..12 to 16.

A page might be 2K bytes or 64K or even 4M(20 bits) bytes in various systems.

There are trivial details.

The heap walker part for each node in the heap space is more important.

After a heap walk, the buddy algorithm can be applied.

Do we need to discuss the details here?





 
Reply With Quote
 
 
 
 
Phil Carmody
Guest
Posts: n/a
 
      01-07-2012
"BartC" <> writes:
> "Eric Sosman" <> wrote in message
> news:je2uhu$6s5$...
> > On 1/4/2012 4:52 PM, Joe keane wrote:
> >> In article<jdsact$nr6$>,
> >> Eric Sosman<> wrote:
> >>> If you have a forty-four-byte struct and allocate sixty-four
> >>> bytes to hold it, how have you avoided *either* sort of "waste?"
> >>
> >> If the cache line is 16 bytes, we know that each object will take only
> >> three lines; the fourth is never used [it may be worse with the VM].
> >> For this case 48 would work just as well (the compiler will do that for
> >> you if it thinks 8-byte alignment is a good idea). But then if the
> >> cache line is 32 bytes, same argument applies.

> >
> > All I can say is that you and I have divergent notions of
> > efficiency. I'm thinking "I can run over the entire array with
> > just N*44/C cache misses; Joe would prefer N*64/C." That is, I
> > think you'll incur 46% more cache misses than I will. Tell me
> > again, please, what I have wasted by using 31% fewer cache line
> > loads than you would?

>
> I think he means that the last 16-bytes of the 64 might never be accessed,
> so will never be fetched from memory in the first place; there is no cache
> miss.


Were there to be no hits on google for "64 bytes (one cache line)", then
you might have a point. The fact that most of such hits refer to the single
most popular general purpose computer architecture in the world diminishes
your point significantly. Those bytes are being pulled into the cache whether
you access them or not.

> (That does depend on how the 64-bytes are used; if the whole struct is
> copied, the compiler might access all 64-bytes anyway.)
>
> And if the cache blocks are 32-bytes, then it doesn't matter.


Nonsense. An adjacent bunch of Eric's structures being pulled into the
working set will displace fewer cache lines than the same number of your
structs. It's only if the cache lines are 16 bytes (i.e. not any of the
half dozen architectures I've worked with in recent years) that both
will cause 48 bytes to be churned.

> > I'm also thinking "I need N*44/P pages for my array; Joe would
> > rather use N*64/P." That is, I think your working set size will
> > be 46% larger than mine, your likelihood of taking a page fault
> > will be 46% greater than mine. Tell me again, please, what have
> > I wasted by using 31% fewer memory pages?

>
> No one was advocating wasting 20 bytes, but finding a use for them, perhaps
> moving data there that would have been stored elsewhere.


How about the "use" of "still being available for subsequent allocations"?

> > It's an odd sort of "waste" that spends fewer resources.

>
> If you had a heavily used struct, and it turned out it had 65 bytes, that
> wouldn't bother you any? You wouldn't try and get it down to 64?


Straw man.

Phil
--
Unix is simple. It just takes a genius to understand its simplicity
-- Dennis Ritchie (1941-2011), Unix Co-Creator
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      01-07-2012


"Phil Carmody" <> wrote in message
news:...
> "BartC" <> writes:


>> I think he means that the last 16-bytes of the 64 might never be
>> accessed,
>> so will never be fetched from memory in the first place; there is no
>> cache
>> miss.

>
> Were there to be no hits on google for "64 bytes (one cache line)", then
> you might have a point. The fact that most of such hits refer to the
> single
> most popular general purpose computer architecture in the world diminishes
> your point significantly. Those bytes are being pulled into the cache
> whether
> you access them or not.


I've no idea what cache-line sizes are being used these days. If it's 64
instead of 16 then perhaps we'd be talking about a 176-byte versus a
256-byte struct instead.

>> And if the cache blocks are 32-bytes, then it doesn't matter.

>
> Nonsense. An adjacent bunch of Eric's structures being pulled into the
> working set will displace fewer cache lines than the same number of your
> structs.


That's true, if you touch any element of a 64-byte struct, and the cache
lines are 64-bytes, and the struct is aligned on 64-bytes, then a set of,
say, 1000 serial accesses will require 1000 memory fetches, instead of 688
if they were only 44 bytes. That's why I advocated not wasting that extra 20
bytes per struct. (In fact I suggested making do with 32 in the first
place.)

The problem with 44-bytes in this case, is that if you're touching two parts
of the struct at a time, and you're doing random access (which is what
arrays are for), you'll frequently be fetching two cache lines instead of
one, as the structs will straddle a 64-byte boundary.

And if you are only touching one member of the struct at a time, and are
doing serial accesses, then perhaps you'd be better off using a set of
parallel arrays, for at least some of the struct members. If a field is
4-bytes wide, then that would require only about 1/11 of the memory fetches
as the 44-byte struct.

You're quite likely to find these narrower array elements happen to have
power-of-two widths..

Anyway, look at this struct (compiled with default padding):

struct record{
char c1;
double d1;
char c2;
double d2;
char c3;
double d3;
char c4;
} x;

On my machine, these 28 bytes end up occupying 56, exactly double!

Rearrange these fields so all the small ones are at the end, and still you
get 32, not 28!

Wasting bytes is obviously OK when the compiler does it...

(BTW I put that struct into another language, and it consistently gave a
size of 28 bytes no matter how the fields are ordered.)

--
Bartc

 
Reply With Quote
 
88888 Dihedral
Guest
Posts: n/a
 
      01-07-2012
FtM於 2011年12月31日星期*UTC+8下午10時56分53 寫道:
> [xpost comp.lang.c, comp.unix.programmer, comp.os.ms-
> windows.programmer.win32]
> Hello NG,
> I have a very specific question, but first of all I describe you what
> I'm facing as I'm open to alternative general solutions..
> I'm refactoring some old C code that deals with very basilar data
> structures (like lists, priority queues, hashtables and so on) that
> pretends to be very portable. Regarding the memory management, I have
> an internal wrapper that, on *nix and windows systems, maps the malloc/
> calloc/realloc/free functions one-to-one to the default ones, but it's
> possibile to switch to another custom allocator that works pretty much
> as a standard allocator (a big chunk of memory splitted/joined, tought
> for embedded systems).
> I would like to implement my custom allocator for the *nix and windows
> systems too (of course loosing some portability for performance gain),
> and maybe implement something like the C++ allocators to improve the
> memory management of the different algorithms (like rare large
> allocations on vectors / frequent small allocations on lists), but
> I've some questions:
> - In these systems, to have decent performances, I was thinking to
> allocate page-sized (and page-aligned) chunks to have small
> allocations in the same page, but how is it possible to request some
> memory with this characteristics? Will it help the OS memory manager?
> - Does all of this make any sense?
> Thanks for any suggestion!
> Ciao!
>
> P.S. I don't really have the necessity of a "performance boost", but
> I'm dealing with a general, low-level and already used library, and as
> long as I'm refactoring the code I'd like to do it the better possible
> way


I mean the allocation requests for the heap space can be modeled
as a discrete random process X[n] for n=0,1,2,3... in an application
and so is the free process Y[m] for m=0, 1,2,3 ....

Therefore the amount of memory used can be counted as

for(i=0, s1=0;i<n;i++) s1+=X[i];
for(i=0, s2=0;i<m;i++) s2+=Y[i];

The number s1-s2 is the amount of space used is another random process, too..

The OS part does not know the two random processes better than the programmer
who is the author of the application. Thus a customized heap manager
can beat the OS as proved in the above.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
C++ Memory Management Innovation: GC Allocator xushiwei C++ 1 04-22-2008 11:51 AM
custom allocator using templates Robert Frunzke C++ 4 01-29-2006 05:30 PM
Custom memory allocator - alignment Romeo Colacitti C Programming 4 03-07-2005 01:02 AM
Custom allocator sample code for vector Alex Vinokur C++ 16 08-16-2004 11:25 AM
Idea for custom thread-safe STL allocator? Brian Genisio C++ 12 01-15-2004 03:41 PM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57