How to choose buffer size?

sandeep

In this program I'm writing, I want to pre-allocate space for an
indeterminate (but potentially large) number of instances of a
given structure. And if this amount of memory proves insufficient,
I'll add another large chunk with realloc. Etc.

So somewhere in my code I'll have an initial pre-allocation that
looks like

size_t _siz = INITIAL_SIZE;
foo* foos;
int next_slot = 0;

foos = (foo *)malloc(_siz);
/* error check */

/* ... */

while (size <= next_slot * sizeof foo) {
size *= 2;
foos = (foo *)realloc(foos, size);
/* error check */
}

/* ... */

Is there a way to choose a good value for INITIAL_SIZE? Should I
just pick a number out of a hat? 101? 1001? 5091? Or should I
pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
there a better way?
 
Seebs

It's best to avoid identifiers that use leading underscores. There is a
rule against it. Although it's not an all-embracing rule, in practice it
turns out to be easiest to pretend that it /is/ all-embracing.

Strong agreement here. There is simply no POINT in using such identifiers,
and if you never use one, you can be TOTALLY confident that you will not
mistakenly trip the rule.

Now if only the str*, is*, to*, and E* rules were as easy to remember.

Depends on your data. One way is to pick a value that will be big enough
to cover, say, 25% of your typical average requirement. That way, most
of the time you'll probably only have to realloc once or twice, and
you're unlikely ever to be wasting too much unrequired memory.

There is a similar tradeoff in how much to grow at a time. I tend to favor
*1.5 over *2, myself, but I'm not sure whether this has any justification
past "it seemed like a good idea at the time".

-s
 
Barry Schwarz

In this program I'm writing, I want to pre-allocate space for an
indeterminate (but potentially large) number of instances of a
given structure. And if this amount of memory proves insufficient,
I'll add another large chunk with realloc. Etc.

So somewhere in my code I'll have an initial pre-allocation that
looks like

size_t _siz = INITIAL_SIZE;

Why are you using an identifier whose name violates 7.1.3?
foo* foos;
int next_slot = 0;

foos = (foo *)malloc(_siz);

Why are you using a meaningless cast?

While not incorrect, the more common idiom is to keep track of the
number of objects, not the total size they consume.
/* error check */

/* ... */

while (size <= next_slot * sizeof foo) {

foo is a type. When the operand of sizeof is a type, the parentheses
are required.

If the left operand equals the right operand, you will overflow your
allocated memory. You probably want <, not <=.
size *= 2;

Did you mean _siz instead of size?

What will happen if you execute this block multiple times (think of
the wheat and chessboard fable)? You might want to consider
size += INITIAL_SIZE;
which will have the same effect on the first execution but result in a
significantly smaller number on subsequent executions.
foos = (foo *)realloc(foos, size);

If realloc fails, you have created a memory leak and cannot access any
of the previously created objects.
/* error check */
}
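
The usual way around that leak is to assign the result of realloc to a
temporary first; a sketch, using the variable names from the quoted code:

  foo *tmp = realloc(foos, size);
  if (tmp == NULL) {
      /* allocation failed: foos still points at the old, smaller block,
         so the existing objects can still be used or freed */
  } else {
      foos = tmp;
  }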

/* ... */

Is there a way to choose a good value for INITIAL_SIZE? Should I
just pick a number out of a hat? 101? 1001? 5091? Or should I
pick a "nice power of 2" out of a hat, like 1024 or 4096? Or is
there a better way?

Analysis is almost always superior to guessing.

Even if the exact number is unknown, do you know anything about the
number of objects you are likely to need? If you run the program
repeatedly, do you know anything about the distribution of this value?
Does your program spend most of its time processing the data (so that
minimizing calls to malloc might be useful) or is it I/O bound (in
which case it wouldn't matter if your call to realloc increased the
space for only one more object)?

Since INITIAL_SIZE is not the quantity of objects but the amount of
space they consume, it needs to be a multiple of sizeof(foo). If you
change the initialization of _siz to
INITIAL_SIZE * sizeof(foo)
(or change the argument to malloc) then your numbers above might be
reasonable.
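
In other words, something along these lines (a sketch; `capacity` is just
an illustrative name for the object count):

  size_t capacity = INITIAL_SIZE;               /* number of objects, not bytes */
  foo *foos = malloc(capacity * sizeof *foos);  /* bytes computed at the call site */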

Most systems today use a virtual memory model. Virtual memory is
allocated in terms of pages. Each page has a fixed size determined by
the operating system and hardware. There may be some benefit to
allocating memory in quantities that match page boundaries.
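
For example, on a POSIX system (this is outside standard C, so it is
inherently system dependent) the page size can be queried and the request
rounded up to a whole number of pages:

  #include <unistd.h>     /* sysconf(), POSIX only */

  size_t page    = (size_t) sysconf(_SC_PAGESIZE);
  size_t request = INITIAL_SIZE * sizeof(foo);

  request = ((request + page - 1) / page) * page;   /* round up to a page multiple */
  foos = malloc(request);

Note that this only rounds the requested quantity; malloc itself promises
nothing about page alignment of the block it returns.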

How sophisticated would you like your growth algorithm to be?
Increasing by a constant factor (such as your doubling) or by a
constant quantity (such as my suggestion) are dirt simple approaches.
There are others, such as increasing by 100%, 75%, 50%, and 25% for
the first four changes and then, if more increases are needed, start
raising the percentage again. If you are obtaining the data from a
file, you could use a system dependent method of getting the file size
and computing an initial estimate from that.
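
A rough sketch of that file-size approach (fseek/ftell works for this on
most systems but is not strictly guaranteed by the standard, which is why
it counts as system dependent; the file name and BYTES_PER_RECORD are
hypothetical):

  #include <stdio.h>

  FILE *fp = fopen("data.bin", "rb");
  size_t initial_count = INITIAL_SIZE;          /* fallback guess */

  if (fp != NULL && fseek(fp, 0L, SEEK_END) == 0) {
      long bytes = ftell(fp);
      rewind(fp);
      if (bytes > 0)
          initial_count = (size_t) bytes / BYTES_PER_RECORD + 1;
  }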
 
Gene

In this program I'm writing, I want to pre-allocate space for an
indeterminate (but potentially large) number of instances of a
given structure.  And if this amount of memory proves insufficient,
I'll add another large chunk with realloc.  Etc.

So somewhere in my code I'll have an initial pre-allocation that
looks like

  size_t _siz = INITIAL_SIZE;
  foo* foos;
  int next_slot = 0;

  foos = (foo *)malloc(_siz);
  /* error check */

  /* ... */

  while (size <= next_slot * sizeof foo) {
    size *= 2;
    foos = (foo *)realloc(foos, size);
    /* error check */
  }

  /* ... */

Is there a way to choose a good value for INITIAL_SIZE?  Should I
just pick a number out of a hat?  101?  1001?  5091?  Or should I
pick a "nice power of 2" out of a hat, like 1024 or 4096?  Or is
there a better way?

I suggest implementing an internal memory allocation interface that
the rest of your system uses rather than directly calling malloc/
calloc/realloc/strdup/free. Start by implementing the interface with
a malloc() call once per object. Profile. Then only if memory
allocation is a bottleneck or you have fragmentation problems, etc.
implement your block allocation scheme behind the interface. If your
program has a memory utilization profile consisting of many
allocations (the ones you'd use the blocks for) followed by
deallocation, a modern malloc/free is going to cost a few tens of
instructions per call, so block allocation won't buy you much.
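
A sketch of such an interface (the x* names are a common convention, not
anything from this thread):

  #include <stdlib.h>

  /* The rest of the program calls only these, so the strategy behind them
     (plain malloc now, pools or block allocation later) can be swapped
     without touching any caller. */
  void *xmalloc(size_t n);
  void *xrealloc(void *p, size_t n);
  void  xfree(void *p);

  void *xmalloc(size_t n)
  {
      void *p = malloc(n);
      /* one central place to handle failure, gather statistics, etc. */
      return p;
  }

  void *xrealloc(void *p, size_t n)
  {
      return realloc(p, n);
  }

  void xfree(void *p)
  {
      free(p);
  }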
 
Seebs

That's an interesting empirical reach toward the theoretical ideal of
1.618033988... (i.e. (1.0 + sqrt(5.0)) / 2).

Yes, it is. :)
I have an interesting proof of this, but... well, Usenet's so *small*,
isn't it?

It's sufficient to observe that it's "probably close enough", with perhaps
the advantage that "*1.5" is SOOO much faster to calculate that, in a run
where you expand from an initial pool of a few thousand objects to a pool
of several million, clearly you will save NANOSECONDS by using something that
can be clearly and expressively written as:
i = (i < 2) ? (i + (i | 1)) : (i + (i >> 1));

.... Sorry, long day. :)
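
For what it's worth, a quick check of the sequence that expression
produces (just an illustration):

  #include <stdio.h>

  int main(void)
  {
      size_t i = 0;
      int step;

      for (step = 0; step < 12; step++) {
          i = (i < 2) ? (i + (i | 1)) : (i + (i >> 1));
          printf("%lu ", (unsigned long) i);
      }
      putchar('\n');    /* prints: 1 2 3 4 6 9 13 19 28 42 63 94 */
      return 0;
  }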

-s
 
