Pool question

A

a_linux_user

I am creating a large graph object in which I add nodes whenever I
need them, and I create them by 'new node' individually. I know this
is slow if I am going to allocate thousands of nodes individually. I
want to replace this allocation by an allocation from a pool. One
simple method I can think of is a manual allocation of a large array
of nodes (since I have an estimate of how many nodes there will be),
and use pointers to array elements, and then of course keep track of
used nodes. I think I can do this easily because in my current version
there is no freeing of nodes (until the end, when everything is
freed), so it is manageable without writing a complicated pool
mechanism. But potentially such a simplistic method will be
inadequate. Moreover I am trying to make a transition to C++ from
another language, so I want to know how it is done in C++ style. So I
would like to know if there is some pool facility in the library. I
have seen that there is a Boost pool library but I don't know if it is
the simplest choice.
 
K

Kai-Uwe Bux

a_linux_user said:
I am creating a large graph object in which I add nodes whenever I
need them, and I create them by 'new node' individually. [snip]

First, I would change the code so that it uses an allocator instead of using
new and delete directly. You can test the code using the standard
allocator.

Then I would replace that allocator with a pool allocator. You can check out
various alternatives and also write your own. In the end, you can choose
the fastest.
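A minimal sketch of this two-step approach, assuming a hypothetical Node type (std::allocator to start with; a pool allocator type can be substituted via the template parameter later):

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

struct Node {                       // hypothetical node type for illustration
    int id;
    explicit Node(int i) : id(i) {}
};

// Graph coded against an allocator: test with std::allocator first,
// then swap in a pool allocator through the template parameter.
template <typename Alloc = std::allocator<Node>>
class Graph {
public:
    Node* add_node(int id) {
        Node* p = alloc_.allocate(1);   // raw memory only
        new (p) Node(id);               // construct in place
        nodes_.push_back(p);
        return p;
    }
    std::size_t size() const { return nodes_.size(); }
    ~Graph() {
        for (Node* p : nodes_) {
            p->~Node();                 // destroy first...
            alloc_.deallocate(p, 1);    // ...then release the memory
        }
    }
private:
    Alloc alloc_;
    std::vector<Node*> nodes_;
};
```

Because allocation and construction are separated here, only the allocate/deallocate calls change when a pool is plugged in.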


Best

Kai-Uwe Bux
 
A

Adem

a_linux_user said:
I am creating a large graph object in which I add nodes whenever I
need them, and I create them by 'new node' individually. [snip]

Look for "operator new()" in your C++ manual(s) and on the net.
By this mechanism you can have your own allocator.
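One way this mechanism can look (a sketch only; the arena class and its names are illustrative, and it deliberately matches the "no freeing until the end" usage described in the original post):

```cpp
#include <cstddef>
#include <new>
#include <vector>

// A crude arena: hands out slots from big blocks and only releases
// everything at once, when the arena itself is destroyed.
class NodeArena {
public:
    void* allocate(std::size_t sz) {
        if (blocks_.empty() || used_ + sz > BlockSize) {
            blocks_.push_back(static_cast<char*>(::operator new(BlockSize)));
            used_ = 0;
        }
        void* p = blocks_.back() + used_;
        used_ += sz;
        return p;
    }
    ~NodeArena() {
        for (char* b : blocks_) ::operator delete(b);
    }
private:
    static constexpr std::size_t BlockSize = 4096;
    std::vector<char*> blocks_;
    std::size_t used_ = 0;
};

NodeArena node_arena;   // one arena shared by all Nodes

struct Node {
    int id;
    // Class-scope operator new: every `new Node` now draws from the arena.
    static void* operator new(std::size_t sz) { return node_arena.allocate(sz); }
    // No per-node free; the arena releases everything at the end.
    static void operator delete(void*) {}
};
```

Note that because all allocations here are the same size, slot alignment falls out naturally; a general-purpose arena would have to round `used_` up to a suitable alignment.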
 
J

Juha Nieminen

a_linux_user said:
I am creating a large graph object in which I add nodes whenever I
need them, and I create them by 'new node' individually. I know this
is slow if I am going to allocate thousands of nodes individually. I
want to replace this allocation by an allocation from a pool.

If all the nodes have the same size, you can use this:

http://warp.povusers.org/FSBAllocator/
 
S

Stephen Horne

a_linux_user said:
I am creating a large graph object in which I add nodes whenever I
need them, and I create them by 'new node' individually. I know this
is slow if I am going to allocate thousands of nodes individually. I
want to replace this allocation by an allocation from a pool.

Why not hold all your nodes in an std::vector, and refer to them by
subscript?

I've rarely felt the need for custom allocators. I usually just choose
data structures that don't need single-item allocation.
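The subscript idea above can be sketched like this (the Node and Graph shapes are illustrative, not from the thread): nodes live contiguously in one vector and edges refer to nodes by index, so indices remain valid even when the vector reallocates on growth.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    int id;
    std::vector<std::size_t> neighbours;   // indices into Graph::nodes
};

class Graph {
public:
    std::size_t add_node(int id) {
        nodes.push_back(Node{id, {}});
        return nodes.size() - 1;           // the new node's index
    }
    void add_edge(std::size_t from, std::size_t to) {
        nodes[from].neighbours.push_back(to);
    }
    std::vector<Node> nodes;
};
```

This gives one bulk allocation strategy and good cache locality without any custom allocator at all.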
 
A

a_linux_user

Thanks to all of you who responded. To begin with I am going to look
into using the standard allocator. Trying to read section 19.4 from
Stroustrup, but so far haven't figured how to use it. Later I will try
to post here little pieces of code to get your comments. Thank you
once again.
 
J

Juha Nieminen

a_linux_user said:
Thanks to all of you who responded. To begin with I am going to look
into using the standard allocator.

I don't really understand what it is that you are going to gain from
that. The standard allocator does 'new' and 'delete' in the way you are
already doing.
 
A

a_linux_user

Juha Nieminen said:
I don't really understand what it is that you are going to gain from
that. The standard allocator does 'new' and 'delete' in the way you are
already doing.

Thanks for pointing this out. I didn't realize that. I was always
under the impression that doing 'new' for each node was somehow bad.
 
S

Stephen Horne

a_linux_user said:
Thanks for pointing this out. I didn't realize that. I was always
under the impression that doing 'new' for each node was somehow bad.

I think this is a language issue.

You need to look up custom allocators. The standard allocator *is* bad
(for some requirements), which is why it can be overridden using
custom allocators.

Both standard and custom allocators are accessed through the new
operator. This is because the new operator is responsible for calling
the constructor - not just allocating the memory.

Containers have defaulted template parameters that give a kind of
allocation policy ("policy" being a kind of template design pattern) -
it may be more appropriate to target this than the new operator.
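A minimal sketch of targeting that template parameter (the allocator here just forwards to ::operator new/delete, but a pool could sit behind allocate/deallocate instead; the class name is illustrative):

```cpp
#include <cstddef>
#include <list>

// Minimal allocator satisfying the interface containers expect.
template <typename T>
struct SimpleAlloc {
    typedef T value_type;
    SimpleAlloc() = default;
    template <typename U> SimpleAlloc(const SimpleAlloc<U>&) {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const SimpleAlloc<T>&, const SimpleAlloc<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const SimpleAlloc<T>&, const SimpleAlloc<U>&) { return false; }

// The allocation policy is the container's last template parameter:
typedef std::list<int, SimpleAlloc<int>> IntList;
```

Swapping the policy changes where every node of the list comes from, without touching the code that uses the container.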
 
S

Stephen Horne

Furthermore, it is often a waste of time to start optimizing constant
factors before you even have a working program.

Very good point. Another reason for it is that you can end up
overcommitted to poor algorithm/data structure choices simply because
you've put so much time into developing and optimising them. A
variation on the KISS principle, basically.
 
S

Stephen Horne

I think this is a language issue.

That wasn't clear - I meant English language rather than C++. I got
the impression that a_linux_user was getting a tad confused.
 
J

James Kanze

Juha Nieminen said:
I don't really understand what it is that you are going to
gain from that. The standard allocator does 'new' and
'delete' in the way you are already doing.

The standard allocators are required to use the global operator
new() and operator delete() functions to acquire memory. They
do NOT use new or delete expressions, however, as they are
required to separate allocation and initialization.

a_linux_user said:
Thanks for pointing this out. I didn't realize that. I was
always under the impression that doing 'new' for each node was
somehow bad.

What's wrong with it? It's the obvious solution, and is only
"bad" if it causes the program to fail to meet some requirement.

Also, the standard requires the standard allocator to use
::operator new() and ::operator delete() to acquire and free
memory. However, it also explicitly states that "it is
unspecified when or how often this function [::operator new()]
is called."
 
J

Juha Nieminen

James said:
What's wrong with it? It's the obvious solution, and is only
"bad" if it causes the program to fail to meet some requirement.

In many systems with many compilers the default allocator can be quite
slow in some situations (for example when allocating and freeing
enormous amounts of small objects). Using a faster allocator can speed
up the allocation by a large factor. If your program is very allocation
(and deallocation) heavy, your entire program can speed up by a large
factor.

(The most common solution to this problem is to use large arrays
rather than allocating each object individually, as arrays do not suffer
from the speed issues of 'new' and 'delete'. However, arrays cannot
always be used in every possible such situation.)
 
A

a_linux_user

Juha Nieminen said:
In many systems with many compilers the default allocator can be quite
slow in some situations (for example when allocating and freeing
enormous amounts of small objects). Using a faster allocator can speed
up the allocation by a large factor. If your program is very allocation
(and deallocation) heavy, your entire program can speed up by a large
factor.

(The most common solution to this problem is to use large arrays
rather than allocating each object individually, as arrays do not suffer
from the speed issues of 'new' and 'delete'. However, arrays cannot
always be used in every possible such situation.)

I tested two versions:
1. new and delete a node structure a billion times (took about 52
seconds)
2. calloc an array of a million nodes, set the id of each node, then
free the array (and repeat this 1000 times) (took 22 seconds)
so clearly the second alternative does run much faster, but in my
graph program, where I allocate only 10 million nodes, this is not a
significant difference given that I am doing other computations (35
seconds vs 28 seconds).
This is what I expected, and that is what I wanted to ask about in the
first post. But I have to do a bit of learning about various allocation
mechanisms available in C++ to fully understand all comments that have
been made above, and I want to thank you all.
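The comparison described above can be sketched as a small timing harness (a reduced-scale sketch only; the function names are illustrative and the absolute numbers depend entirely on the platform and the malloc implementation):

```cpp
#include <chrono>

struct Node { int id; };

// Run a callable and return the wall-clock time it took, in seconds.
template <typename F>
double seconds(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Strategy 1: new and delete each node individually.
double bench_individual(int n) {
    return seconds([n] {
        for (int i = 0; i < n; ++i) {
            Node* p = new Node;
            p->id = i;
            delete p;
        }
    });
}

// Strategy 2: one bulk allocation for all n nodes.
double bench_bulk(int n) {
    return seconds([n] {
        Node* a = new Node[n];
        for (int i = 0; i < n; ++i) a[i].id = i;
        delete[] a;
    });
}
```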
 
J

James Kanze

Juha Nieminen said:
In many systems with many compilers the default allocator can
be quite slow in some situations (for example when allocating
and freeing enormous amounts of small objects).

I know that used to be the case (over ten years ago), but I've
not found it to be true recently. Most implementations of
operator new just call malloc, and most modern implementations
of malloc have special optimizations for small allocations,
handling them basically just as you would in your custom
allocator which chopped the big block.
Juha Nieminen said:
Using a faster allocator can speed up the allocation by a
large factor. If your program is very allocation (and
deallocation) heavy, your entire program can speed up by a
large factor.

Been there, done that. It's only relevant IF you have a
performance problem. And even then, you might consider just
replacing ::operator new and ::operator delete with something
more intelligent.
Juha Nieminen said:
(The most common solution to this problem is to use large
arrays rather than allocating each object individually, as
arrays do not suffer from the speed issues of 'new' and
'delete'. However, arrays cannot always be used in every
possible such situation.)

That's typically what the standard allocator will do, on
platforms where there is a problem. As I said, however, I've
not found it to be a problem on modern systems---the last time I
had to tune allocation was something like 15 years ago.
 
J

James Kanze

a_linux_user said:
I tested two versions:
1. new and delete a node structure a billion times (took about 52
seconds)
2. calloc an array of a million nodes, set the id of each node, then
free the array (and repeat this 1000 times) (took 22 seconds)
so clearly the second alternative does run much faster, but in my
graph program, where I allocate only 10 million nodes, this is not a
significant difference given that I am doing other computations (35
seconds vs 28 seconds).

A custom fixed length allocator would have to do a little bit
more than that (since you have to consider the case where some
of the elements are freed and then have to be reused). It's
possible to get it down to less than about ten machine
instructions, which is two or three fewer than a well written
standard allocator would need; the typical difference shouldn't
be more than about 25% of the time spent strictly allocating,
however. If you're spending less than 90% of your time in
the allocator, then it's definitely not worth it. (I have had
code where the profiler showed 99.98% of the time in the
allocator. In that particular case, the solution was to
allocate a lot less, however, not to optimize the allocator.)
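A fixed-size allocator of the kind described can be sketched like this (illustrative names; freed slots go onto an intrusive free list, so both allocate and deallocate are only a handful of instructions, and each slot must be at least pointer-sized):

```cpp
#include <cstddef>
#include <new>
#include <vector>

class FixedPool {
public:
    explicit FixedPool(std::size_t slot, std::size_t per_block = 1024)
        : slot_(slot < sizeof(void*) ? sizeof(void*) : slot),
          per_block_(per_block), free_(nullptr) {}

    void* allocate() {
        if (!free_) grow();
        void* p = free_;
        free_ = *static_cast<void**>(free_);   // pop the free list
        return p;
    }
    void deallocate(void* p) {
        *static_cast<void**>(p) = free_;       // push onto the free list
        free_ = p;
    }
    ~FixedPool() {
        for (char* b : blocks_) ::operator delete(b);
    }
private:
    void grow() {
        char* b = static_cast<char*>(::operator new(slot_ * per_block_));
        blocks_.push_back(b);
        // Thread every slot of the new block onto the free list.
        for (std::size_t i = 0; i < per_block_; ++i)
            deallocate(b + i * slot_);
    }
    std::size_t slot_, per_block_;
    void* free_;
    std::vector<char*> blocks_;
};
```

For types whose alignment exceeds their size rounding would be needed, but for typical small node types the pointer-size minimum suffices.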
 
J

Juha Nieminen

James said:
A custom fixed length allocator would have to do a little bit
more than that (since you have to consider the case where some
of the elements are freed and then have to be reused). It's
possible to get it down to less than about ten machine
instructions. Which is two or three less than a well written
standard allocator would need; the typical difference shouldn't
be more than about 25%, however.

Allocation speed is not only a question of how many machine
instructions the allocator executes.

The allocation strategy can play a crucial role in how fast the
allocator is. This is because memory cache behavior can cause an
allocator to be an order of magnitude slower than another allocator
which uses a different allocation strategy.

One major problem with most default memory allocators is that they are
unable to optimize their list of free blocks. When you randomly allocate
and deallocate objects, the list of free memory blocks gets very
randomized. This means that when the memory allocator follows the list
of free blocks, it will jump randomly in memory. This will cause tons of
cache misses, making the allocator very slow.

An allocator which is able to optimize the list of free memory blocks
in such a way that it's able to traverse it linearly in memory can be much
faster than the default allocator, even if it has to do much more work
(in terms of machine code) to keep the list optimized.

(Also there are situations where keeping the list optimized is fairly
easy. For example an allocator which allocates elements of a fixed size
can be fairly easily optimized this way.)

Even a completely naive garbage collector which makes a linear sweep
through the allocated memory from time to time can, rather surprisingly,
make the program faster if it can optimize the list of free blocks to be
linear. Cache misses are that heavy.
 
S

Stephen Horne

Juha Nieminen said:
An allocator which is able to optimize the list of free memory blocks
in such a way that it's able to traverse it linearly in memory can be much
faster than the default allocator, even if it has to do much more work
(in terms of machine code) to keep the list optimized.

Yes, but I tend to deal with that issue other ways.

The standard library associative containers are typically implemented
as binary trees, which can be problematic. Multiway trees were designed
to be far better in terms of locality from the start.

There are overheads in terms of item copying on inserts, deletes, node
splits etc - but the locality advantage normally outweighs that.
Particularly for sequential access, since the trees I use are more B+
tree than B tree, so a sequential traversal is just a
linked-list-of-small-arrays traversal.

Using a different kind of container means that custom allocators are
much less of an issue - though obviously such containers aren't suitable
for everything.
 
