Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C Programming (http://www.velocityreviews.com/forums/f42-c-programming.html)
-   -   How to choose buffer size? (http://www.velocityreviews.com/forums/t724870-how-to-choose-buffer-size.html)

sandeep 06-05-2010 05:24 PM

How to choose buffer size?
 
In this program I'm writing I want to pre-allocate space for a
indeterminate (but potentially large) number of instances of a
given structure. And if this amount of memory proves insufficient,
I'll add another large chunk with realloc. Etc.

So somewhere in my code I'll have an initial pre-allocation that
looks like

size_t _siz = INITIAL_SIZE;
foo* foos;
int next_slot = 0;

foos = (foo *)malloc(_siz);
/* error check */

/* ... */

while (size <= next_slot * sizeof foo) {
size *= 2;
foos = (foo *)realloc(foos, size);
/* error check */
}

/* ... */

Is there a way to choose a good value for INITIAL_SIZE? Should I
just pick a number out of a hat? 101? 1001? 5091? Or should I
pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
there a better way?

Seebs 06-05-2010 08:17 PM

Re: How to choose buffer size?
 
On 2010-06-05, Richard Heathfield <rjh@see.sig.invalid> wrote:
> It's best to avoid identifiers that use leading underscores. There is a
> rule against it. Although it's not an all-embracing rule, in practice it
> turns out to be easiest to pretend that it /is/ all-embracing.


Strong agreement here. There is simply no POINT in using such identifiers,
and if you never use one, you can be TOTALLY confident that you will not
mistakenly trip the rule.

Now if only the str*, is*, to*, and E* rules were as easy to remember.

> Depends on your data. One way is to pick a value that will be big enough
> to cover, say, 25% of your typical average requirement. That way, most
> of the time you'll probably only have to realloc once or twice, and
> you're unlikely ever to be wasting too much unrequired memory.


There is a similar tradeoff in how much to grow at a time. I tend to favor
*1.5 over *2, myself, but I'm not sure whether this has any justification
past "it seemed like a good idea at the time".

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Barry Schwarz 06-05-2010 09:17 PM

Re: How to choose buffer size?
 
On Sat, 5 Jun 2010 17:24:57 +0000 (UTC), sandeep <nospam@nospam.com>
wrote:

>In this program I'm writing I want to pre-allocate space for a
>indeterminate (but potentially large) number of instances of a
>given structure. And if this amount of memory proves insufficient,
>I'll add another large chunk with realloc. Etc.
>
>So somewhere in my code I'll have an initial pre-allocation that
>looks like
>
> size_t _siz = INITIAL_SIZE;


Why are you using an identifier whose name violates 7.1.3?

> foo* foos;
> int next_slot = 0;
>
> foos = (foo *)malloc(_siz);


Why are you using a meaningless cast?

While not incorrect, the more common idiom is to keep track of the
number of objects, not the total size they consume.

> /* error check */
>
> /* ... */
>
> while (size <= next_slot * sizeof foo) {


foo is a type. When the operand of sizeof is a type, the parentheses
are required.

If the left operand equals the right operand, you will overflow your
allocated memory. You probably want <, not <=.

> size *= 2;


Did you mean _siz instead of size?

What will happen if you execute this block multiple times (think of
the wheat and chessboard fable)? You might want to consider
size += INITIAL_SIZE;
which will have the same effect on the first execution but result in a
significantly smaller number on subsequent executions.

> foos = (foo *)realloc(foos, size);


If realloc fails, you have created a memory leak and cannot access any
of the previously created objects.

> /* error check */
> }
>
> /* ... */
>
>Is there a way to choose a good value for INITIAL_SIZE? Should I
>just pick a number out of a hat? 101? 1001? 5091? Or should I
>pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
>there a better way?


Analysis is almost always superior to guessing.

Even if the exact number is unknown, do you know anything about the
number of objects you are likely to need? If you run the program
repeatedly, do you know anything about the distribution of this value?
Does you program spend most of its time processing the data (so that
minimizing calls to malloc might be useful) or is it I/O bound (in
which case it wouldn't matter if your call to realloc increased the
space for only one more object)?

Since INITIAL_SIZE is not the quantity of objects but the amount of
space they consume, it needs to be a multiple of sizeof(foo). If you
change the initialization of _siz to
INITIAL_SIZE * sizeof(foo)
(or change the argument to malloc) then your numbers above might be
reasonable.

Most systems today use a virtual memory model. Virtual memory is
allocated in terms of pages. Each page has a fixed size determined by
the operating system and hardware. There may be some benefit to
allocating memory in quantities that match page boundaries.

How sophisticated would you like your growth algorithm to be?
Increasing by a constant factor (such as your doubling) or by a
constant quantity (such as my suggestion) are dirt simple approaches.
There are others, such as increasing by 100%, 75%, 50%, and 25% for
the first four changes and then if more increases are needed start
raising the percentage again. If you are obtaining the data from a
file, you could use a system dependent method of getting the file size
and computing an initial estimate from that.

--
Remove del for email

Gene 06-05-2010 11:19 PM

Re: How to choose buffer size?
 
On Jun 5, 1:24*pm, sandeep <nos...@nospam.com> wrote:
> In this program I'm writing I want to pre-allocate space for a
> indeterminate (but potentially large) number of instances of a
> given structure. *And if this amount of memory proves insufficient,
> I'll add another large chunk with realloc. *Etc.
>
> So somewhere in my code I'll have an initial pre-allocation that
> looks like
>
> * size_t _siz = INITIAL_SIZE;
> * foo* foos;
> * int next_slot = 0;
>
> * foos = (foo *)malloc(_siz);
> * /* error check */
>
> * /* ... */
>
> * while (size <= next_slot * sizeof foo) {
> * * size *= 2;
> * * foos = (foo *)realloc(foos, size);
> * * /* error check */
> * }
>
> * /* ... */
>
> Is there a way to choose a good value for INITIAL_SIZE? *Should I
> just pick a number out of a hat? *101? *1001? *5091? *Or should I
> pick a "nice power of 2" out of a hat, like 1024 or 4096? *Or is
> there a better way?


I suggest implementing an internal memory allocation interface that
the rest of your system uses rather than directly calling malloc/
calloc/realloc/strdup/free. Start by implementing the interface with
a malloc() call once per object. Profile. Then only if memory
allocation is a bottleneck or you have fragmentation problems, etc.
implement your block allocation scheme behind the interface. If your
program has a memory utilization profile consisting of many
allocations (the ones you'd use the blocks for) followed by
deallocation, modern malloc/frees are going to be a few 10s of
instructions, so multiple allocation won't buy much.

Seebs 06-06-2010 12:00 AM

Re: How to choose buffer size?
 
On 2010-06-05, Richard Heathfield <rjh@see.sig.invalid> wrote:
> Seebs wrote:
>> There is a similar tradeoff in how much to grow at a time. I tend to favor
>> *1.5 over *2, myself, but I'm not sure whether this has any justification
>> past "it seemed like a good idea at the time".


> That's an interesting empirical reach toward the theoretical ideal of
> 1.618033988... (i.e. (1.0 + sqrt(5.0)) / 2).


Yes, it is. :)

> I have an interesting proof of this, but... well, Usenet's so *small*,
> isn't it?


It's sufficient to observe that it's "probably close enough", with perhaps
the advantage that "*1.5" is SOOO much faster to calculate that, in a run
where you expand from an initial pool of a few thousand objects to a pool
of several million, clearly you will save NANOSECONDS by using something that
can be clearly and expressively written as:
i = (i < 2) ? (i + (i | 1)) : (i + (i >> 1));

.... Sorry, long day. :)

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nospam@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!


All times are GMT. The time now is 08:11 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.