Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > sizeof([ALLOCATED MEMORY])

Reply
Thread Tools

sizeof([ALLOCATED MEMORY])

 
 
Howard Hinnant
Guest
Posts: n/a
 
      05-04-2006
In article <(E-Mail Removed)>,
Kelsey Bjarnason <(E-Mail Removed)> wrote:

> [snips]
>
> On Thu, 04 May 2006 14:18:31 +0000, Howard Hinnant wrote:
>
> > The data structure as described above has an efficiency concern:
> > Repeated growth in size can lead to quadratic expense due to continually
> > reallocating for a slightly larger size.

>
> So don't do that. In simplest terms, don't add, multiply.


Did you read my entire post or just stop here?

-Howard
 
Reply With Quote
 
 
 
 
Stephen Sprunk
Guest
Posts: n/a
 
      05-04-2006
"Howard Hinnant" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> No matter what the growth strategy for capacity, its existence is a
> major motivation for finding out the amount of memory actually
> allocated. create_array_short(N) may request only N*sizeof(short)
> bytes, but may receive slightly more than that (for alignment purposes
> or whatever). If create_array_short(N) had some way to find out how
> much memory it actually received, then it makes perfect sense to set
> capacity to size_received/sizeof(short) instead of to N. It makes for
> higher performing code in the average case.


You're over-optimizing here. If malloc() returns more memory than you asked
for, when you expand the array, you'll get access to it. On average, you'll
perform the same number of realloc()s with or without this optimization
unless you're expanding by a very small amount each time, in which case a
large fraction of your realloc()s will be no-ops for the implementation.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin


*** Posted via a free Usenet account from http://www.teranews.com ***
 
Reply With Quote
 
 
 
 
Stephen Sprunk
Guest
Posts: n/a
 
      05-04-2006
"CBFalconer" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Chris McDonald wrote:
>> In short, you cannot determine the allocated size, from the
>> allocated memory itself.

>
> However, you can determine how much memory is needed by some means
> or other, and perform a realloc to that size. This can lead to
> snarling dogs playing tug-of-war.


My first thought was "how clever", but there are serious problems with this:

1. it assumes the pointer the callee received is to the start of the object
2. realloc() may relocate the object

The latter is a serious problem if you can't communicate that fact back to
the caller; it is cleaner for the interface to provide a way to indicate the
object's size to the callee in the first place so that this hack isn't
needed.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin


*** Posted via a free Usenet account from http://www.teranews.com ***
 
Reply With Quote
 
Howard Hinnant
Guest
Posts: n/a
 
      05-04-2006
In article <445a47c0$0$29357$(E-Mail Removed)>,
"Stephen Sprunk" <(E-Mail Removed)> wrote:

> "Howard Hinnant" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
> > No matter what the growth strategy for capacity, its existence is a
> > major motivation for finding out the amount of memory actually
> > allocated. create_array_short(N) may request only N*sizeof(short)
> > bytes, but may receive slightly more than that (for alignment purposes
> > or whatever). If create_array_short(N) had some way to find out how
> > much memory it actually received, then it makes perfect sense to set
> > capacity to size_received/sizeof(short) instead of to N. It makes for
> > higher performing code in the average case.

>
> You're over-optimizing here. If malloc() returns more memory than you asked
> for, when you expand the array, you'll get access to it. On average, you'll
> perform the same number of realloc()s with or without this optimization
> unless you're expanding by a very small amount each time, in which case a
> large fraction of your realloc()s will be no-ops for the implementation.


Let's go through a specific example. I'm going to assume the struct I
showed before:

struct array_of_short
{
short* data;
size_t size;
size_t capacity;
};

Let's say that size == capacity == 100. Now for some reason my
algorithm decides it needs 2 more shorts. What do I do?

Today I call realloc similar to:

data = realloc(data, 3*capacity/2 * sizeof(short));

I.e. I up the capacity by 50% (or whatever growth factor you want to
use). Now under the hood lets say (and this is quite reasonable) that
the heap had already allocated 4 more bytes to data, and had another 64
bytes beyond that which it could have expanded that chunk in place. But
because my code is ignorant of the current heap structure in the
neighborhood of my pointer, the best thing I know to do is blindly ask
for a realloc expanding to 50 more shorts (assuming a 2 byte short for
this example). That in turn will trigger the heap to relocate my 200
byte array to another location.

If I had more access to the current heap structure, I could have known
that there was already room for two extra shorts allocated to me. I
could've avoided a call to realloc altogether.

Continuing along with this example: let's say instead of 2 more shorts,
my algorithm actually needed 10 more shorts. Here's my choices:

1. realloc for 10 more shorts.
2. realloc for a 50% bigger capacity.

Choice 1 risks the poor efficiency of an incremental growth strategy
(known performance problem).

Choice 2 completely wastes the 68 bytes that were sitting beyond my 200
byte buffer but had no way to find out about.

Wouldn't it be nice if I could:

A: Discover those extra 2 shorts that were allocated to me anyway.
B: Discover that I could expand in place by another 32 shorts without
the expense of relocating my array.

?

These seem like significant and obvious optimizations to me.

Indeed, I've coded them up and tested them. Given the right underlying
allocation interface one can achieve very decent performance increases
with very little effort. It won't make your code 10 times faster. But
it sure is an easy way to gain say 15% while minimizing heap
fragmentation and thrashing.

But it is only easy if you have the right allocation interface. Indeed
today the only way to achieve this type of optimization in C is to write
your own malloc/realloc/free functions (adding to the interface as
appropriate). Anyone who's written these functions (I have) knows that
this exercise can't be labeled easy.

-Howard
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      05-04-2006


Howard Hinnant wrote On 05/04/06 17:26,:
> [...]
>
> If I had more access to the current heap structure, I could have known
> that there was already room for two extra shorts allocated to me. I
> could've avoided a call to realloc altogether.


IMHO, this isn't such a big deal. Or to put it another
way: if you've got an array that's (1) big enough to be a
problem and (2) expands and expands and expands and expands,
it's time to think about other data structures.

Are you familiar with any present-day file systems
that require each file to occupy contiguous disk sectors?

--
http://www.velocityreviews.com/forums/(E-Mail Removed)

 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      05-04-2006
Howard Hinnant <(E-Mail Removed)> writes:
> In article <445a47c0$0$29357$(E-Mail Removed)>,
> "Stephen Sprunk" <(E-Mail Removed)> wrote:
>> "Howard Hinnant" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed)...
>> > No matter what the growth strategy for capacity, its existence is a
>> > major motivation for finding out the amount of memory actually
>> > allocated. create_array_short(N) may request only N*sizeof(short)
>> > bytes, but may receive slightly more than that (for alignment purposes
>> > or whatever). If create_array_short(N) had some way to find out how
>> > much memory it actually received, then it makes perfect sense to set
>> > capacity to size_received/sizeof(short) instead of to N. It makes for
>> > higher performing code in the average case.

>>
>> You're over-optimizing here. If malloc() returns more memory than
>> you asked for, when you expand the array, you'll get access to it.
>> On average, you'll perform the same number of realloc()s with or
>> without this optimization unless you're expanding by a very small
>> amount each time, in which case a large fraction of your realloc()s
>> will be no-ops for the implementation.

>
> Let's go through a specific example. I'm going to assume the struct I
> showed before:
>
> struct array_of_short
> {
> short* data;
> size_t size;
> size_t capacity;
> };
>
> Let's say that size == capacity == 100. Now for some reason my
> algorithm decides it needs 2 more shorts. What do I do?
>
> Today I call realloc similar to:
>
> data = realloc(data, 3*capacity/2 * sizeof(short));
>
> I.e. I up the capacity by 50% (or whatever growth factor you want to
> use). Now under the hood lets say (and this is quite reasonable) that
> the heap had already allocated 4 more bytes to data, and had another 64
> bytes beyond that which it could have expanded that chunk in place. But
> because my code is ignorant of the current heap structure in the
> neighborhood of my pointer, the best thing I know to do is blindly ask
> for a realloc expanding to 50 more shorts (assuming a 2 byte short for
> this example). That in turn will trigger the heap to relocate my 200
> byte array to another location.
>
> If I had more access to the current heap structure, I could have known
> that there was already room for two extra shorts allocated to me. I
> could've avoided a call to realloc altogether.


No, you couldn't necessarily avoid the realloc altogether, unless the
extra memory is actually reserved. There might happen to be another
two shorts worth of storage available immediately after your allocated
space, but the system might later choose to use it for another
allocation.

More generally, if I call, say, malloc(300), the system might reserve
512 bytes for me, and might (internally) guarantee that it won't use
any of it for anything else until I call free() -- or it might
allocate 512 bytes, but remember that I only asked for 300, and later
carve out the last 128 bytes for another allocation. Or another call
to free() might result in my 300-byte allocation being at the
beginning of a contiguous chunk of 1024 bytes.

In other words, the number of bytes that are *really* reserved for a
given malloc() call isn't necessarily a meaningful question, and may
vary over time depending on things outside the program's control.

If I were going to suggest a new feature to address this (perhaps
straying off-topic a bit), I'd probably propose a new function similar
to realloc(), except that it's guaranteed not to relocate the object.
Let's call it xrealloc() (not a great name, but it will do for now).
If realloc() would have expanded or contracted the object in place,
xrealloc() is equivalent to realloc(). If realloc() would have
relocated the object, xrealloc() fails. (We wouldn't require
xrealloc() to succeed in *exactly* the same circumstances where
realloc() would expand or shrink the object in place, but it would
usually work out that way.)

If I've allocated space for 300 bytes, and I find that I need 302
bytes, I can call xrealloc(), asking for just 302 bytes. If this
succeeds, I'm done, and the call was probably very cheap. If it
fails, I can then do a more expensive realloc() call, requesting 400
or 500 bytes to leave room for future growth. And of course existing
code can ignore xrealloc() and continue to work as it always has.

This would allow an implementation to allocate more memory than you
ask for, while still permitting it to take back some of the extra
space if it needs it.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
 
Reply With Quote
 
Howard Hinnant
Guest
Posts: n/a
 
      05-04-2006
In article <1146780106.641149@news1nwk>,
Eric Sosman <(E-Mail Removed)> wrote:

> Howard Hinnant wrote On 05/04/06 17:26,:
> > [...]
> >
> > If I had more access to the current heap structure, I could have known
> > that there was already room for two extra shorts allocated to me. I
> > could've avoided a call to realloc altogether.

>
> IMHO, this isn't such a big deal. Or to put it another
> way: if you've got an array that's (1) big enough to be a
> problem and (2) expands and expands and expands and expands,
> it's time to think about other data structures.


Having multiple data structures in your tool box is always a good thing.
Sometimes you need contiguous array, sometimes you need a linked list,
sometimes a hash table or tree, etc. That being said, the contiguous
array is the workhorse of data structures. It's the one I find myself
reaching for most often. Even when I have no idea how much it will grow.

> Are you familiar with any present-day file systems
> that require each file to occupy contiguous disk sectors?


Nope. But I am intimately familiar with an in-ram data structure which
stores a set of contiguous arrays (and makes it look contiguous). Every
once in a while it comes in very handy.

-Howard
 
Reply With Quote
 
Howard Hinnant
Guest
Posts: n/a
 
      05-04-2006
In article <(E-Mail Removed)>,
Keith Thompson <(E-Mail Removed)> wrote:

> > If I had more access to the current heap structure, I could have known
> > that there was already room for two extra shorts allocated to me. I
> > could've avoided a call to realloc altogether.

>
> No, you couldn't necessarily avoid the realloc altogether, unless the
> extra memory is actually reserved. There might happen to be another
> two shorts worth of storage available immediately after your allocated
> space, but the system might later choose to use it for another
> allocation.


There are two issues here (I'm guilty of introducing the second one in
my last post).

1. The allocator may give you extra memory for free but currently has
no way to tell you that.

2. The allocator may have extra memory located directly after what it
has allocated to you and that amount may vary with time.

In my previous example I attempted to address both of these.

I heartily agree that insight into #2 is the bigger performance win.
However, insight into #1 is practically cost free (there's a dollar on
the sidewalk, do you pick it up, or is it too much trouble to bend
over?).

> If I were going to suggest a new feature to address this (perhaps
> straying off-topic a bit), I'd probably propose a new function similar
> to realloc(), except that it's guaranteed not to relocate the object.
> Let's call it xrealloc() (not a great name, but it will do for now).
> If realloc() would have expanded or contracted the object in place,
> xrealloc() is equivalent to realloc(). If realloc() would have
> relocated the object, xrealloc() fails. (We wouldn't require
> xrealloc() to succeed in *exactly* the same circumstances where
> realloc() would expand or shrink the object in place, but it would
> usually work out that way.)
>
> If I've allocated space for 300 bytes, and I find that I need 302
> bytes, I can call xrealloc(), asking for just 302 bytes. If this
> succeeds, I'm done, and the call was probably very cheap. If it
> fails, I can then do a more expensive realloc() call, requesting 400
> or 500 bytes to leave room for future growth. And of course existing
> code can ignore xrealloc() and continue to work as it always has.
>
> This would allow an implementation to allocate more memory than you
> ask for, while still permitting it to take back some of the extra
> space if it needs it.


I think that's an excellent idea. Perhaps there is some refinement
still to be done here. But the basic idea is fundamentally a win-win.

-Howard
 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      05-05-2006
Howard Hinnant wrote:
> "Stephen Sprunk" <(E-Mail Removed)> wrote:
>> "Howard Hinnant" <(E-Mail Removed)> wrote in message
>>
>>> No matter what the growth strategy for capacity, its existence
>>> is a major motivation for finding out the amount of memory
>>> actually allocated. create_array_short(N) may request only
>>> N*sizeof(short) bytes, but may receive slightly more than that
>>> (for alignment purposes or whatever). If create_array_short(N)
>>> had some way to find out how much memory it actually received,
>>> then it makes perfect sense to set capacity to
>>> size_received/sizeof(short) instead of to N. It makes for
>>> higher performing code in the average case.

>>
>> You're over-optimizing here. If malloc() returns more memory
>> than you asked for, when you expand the array, you'll get
>> access to it. On average, you'll perform the same number of
>> realloc()s with or without this optimization unless you're
>> expanding by a very small amount each time, in which case a
>> large fraction of your realloc()s will be no-ops for the
>> implementation.

>
> Let's go through a specific example. I'm going to assume the
> struct I showed before:


Why bother with all these gyrations? The malloc package for your
system probably understands your system better than you do, and can
take advantage of its peculiarities. As am example, here is an
excerpt from the realloc code in my nmalloc package for DJGPP,
which attempts to avoid moving data (among other things)

/* if decreasing simply reduce size and move excess to free */
if (szneed <= m->sz) {
DBGPRTR(EOL " Realloc is reducing");
if ((m->sz - szneed) >= MINSAVE) {
m = split(&m1, szneed);
mv2freelist(m1);
}
/* else just return old pointer, i.e. NOP */
}
else if (szneed > ((ulong)(INT_MAX - 65536))) {
/* reject excessive size request */
p = NULL; goto exeunt;
}
else if (ISFREE(m->next) &&
(szneed <= (m->sz + m->next->sz)) ) {
/* the 'next' block is free and adequate so use it */
DBGPRTR(EOL " Realloc is combining, next is free");
m = m1 = combinehi(m);
/* now split off the excess, if any */
if ((m->sz - szneed) >= MINSAVE) {
m = split(&m1, szneed);
mv2freelist(m1);
}
/* else m is the oversized return block */
}
else if ((lastsbrk == m->next) &&

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>


 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      05-05-2006
Stephen Sprunk wrote:
> "CBFalconer" <(E-Mail Removed)> wrote in message
>> Chris McDonald wrote:

>
>>> In short, you cannot determine the allocated size, from the
>>> allocated memory itself.

>>
>> However, you can determine how much memory is needed by some means
>> or other, and perform a realloc to that size. This can lead to
>> snarling dogs playing tug-of-war.

>
> My first thought was "how clever", but there are serious problems
> with this:
>
> 1. it assumes the pointer the callee received is to the start of
> the object
> 2. realloc() may relocate the object


No it doesn't, and besides I do not recommend the strategy. There
is never any guarantee that realloc will return the same pointer.
Some systems try to.

>
> The latter is a serious problem if you can't communicate that fact
> back to the caller; it is cleaner for the interface to provide a
> way to indicate the object's size to the callee in the first place
> so that this hack isn't needed.


It was intended to point out the absurdity of such strategies.
Unless you are excessively amused by snarling dogs playing
tug-of-war.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off




Advertisments