execution time becomes unpredictable?!

R

REH

Stephen Sprunk said:
Non-determinism isn't necessarily bad as long as you can prove the _maximum_
time required for all the functions you call (plus your own function)
doesn't exceed the deadline.
If you can prove there is a max. time, then it *is* deterministic.
The problem with malloc() and free() is that
there is usually no way to prove the maximum time required for one or the
other (or sometimes both). I'm not even sure it's possible in theory for
both.
Yes, hence, non-deterministic. Sheesh.
 
K

Keith Thompson

REH said:
If you can prove there is a max. time, then it *is* deterministic.

If the time required has a fixed upper bound, but the time for any
instance depends on the phase of the moon divided by a random number,
I'd call that non-deterministic (with a fixed upper bound).

If the word "deterministic" has some other meaning in some specific
field, it's not one I'm familiar with.
 
R

REH

Keith Thompson said:
If the time required has a fixed upper bound, but the time for any
instance depends on the phase of the moon divided by a random number,
I'd call that non-deterministic (with a fixed upper bound).

If the word "deterministic" has some other meaning in some specific
field, it's not one I'm familiar with.
Well, then we'll have to agree to disagree. To me, and everyone I have ever
worked with in the real-time embedded world, non-deterministic means it
cannot be bounded. Whether or not you can predict the time an individual
run will take, the fact that it still has a fixed upper bound makes it
deterministic.
 
M

Minti

The implementation of malloc used for System V Unix (which was used by
compilers beyond System V Unix) gets slower and slower after repeated
usage. In fact I have seen it actually eventually lose memory
allocations. Some implementations of malloc/free are just really
poor.


Just for my reference what algorithm did it use?
Basically the C language standard does not mandate anything about
performance, and real implementations of memory heaps *really do*
execute in unpredictable amounts of time. That said, there certainly
do exist O(1) implementations of malloc/free (and realloc, if we ignore
the memcpy overhead). So malloc() is only slow on *some* systems --
but it may be hard to know which such systems. As far as gcc goes,
because it's open source and its "Lea" malloc implementation is
reasonably documented, we know that its memory allocation scheme is
fairly reasonable. OTOH, I've measured MSVC's memory allocation speed,
and it's *MUCH* faster than gcc's, but since it's closed source, I
can't sit here and tell you that it's even O(1) in all cases.

This O(1) is, I believe, only guaranteed as long as the memory requests
are satisfied using memory pools; once your memory pools are exhausted,
all bets are off. One more thing: I came across a comment by James
Gosling that in many cases on parallel machines, Java's "new" was much
faster than C's "malloc". I believe this is _only_ because the C
implementation was not "parallelized"; any other suspicions?



Note to OP: Try to get to know your compiler and operating system,
specifically how they handle memory allocations. What sort of memory
pools are kept? If the memory pools don't meet your requirements,
allocate your own memory pools and handle requests using your own
memory pool. There are several algorithms. I am not sure if CBFalconer
is offering source code along with it. But if he is, good news.
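
For illustration, a fixed-block pool of the kind described might look
like the sketch below. Every name in it (pool_init, pool_alloc,
pool_free, BLOCK_SIZE, BLOCK_COUNT) is invented for the example, and a
real pool would also have to worry about alignment guarantees and
locking; the point is just that both allocation and release are O(1),
which is what a deadline analysis needs.

#include <stddef.h>

#define BLOCK_SIZE   64    /* fixed payload size per block (example value) */
#define BLOCK_COUNT  128   /* number of blocks in the pool (example value) */

/* Each free block doubles as the link to the next free block. */
union block {
    union block  *next;
    unsigned char payload[BLOCK_SIZE];
};

static union block  pool[BLOCK_COUNT];
static union block *free_list;

/* Chain every block onto the free list; done once at start-up. */
void pool_init(void)
{
    size_t i;
    free_list = NULL;
    for (i = 0; i < BLOCK_COUNT; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* O(1): pop the head of the free list, or NULL if the pool is exhausted. */
void *pool_alloc(void)
{
    union block *b = free_list;
    if (b != NULL)
        free_list = b->next;
    return b;
}

/* O(1): push the block back onto the free list. */
void pool_free(void *p)
{
    union block *b = p;
    if (b != NULL) {
        b->next = free_list;
        free_list = b;
    }
}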

--
Imanpreet Singh Arora
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
 
J

James McIninch

<posted & mailed>

It doesn't have anything to do with C per se.

Namely, if you dynamically request memory, you don't know in advance what
the system is going to do to satisfy the request. Is it going to hand you a
chunk already allocated to your program? Does it need to search a
free-memory list to find one for you? Will it rearrange memory to get you a
chunk the size you want? Will it allocate space in a page file or in swap?

You can't know, thus you can't predict how long it may take to answer the
request. 1 millisecond? 100 milliseconds? 2 seconds? You have no idea.
 
F

Flash Gordon

REH said:
Well, then we'll have to agree to disagree. To me, and everyone I have ever
worked with in the real-time embedded world, non-deterministic means it
cannot be bounded. Whether or not you can predict the time an individual
run will take, the fact that it still has a fixed upper bound makes it
deterministic.

Regardless of whether having a fixed upper bound makes it deterministic,
there are times when you need the execution time to be as consistent as
possible. I know this because I've worked on such systems.
 
K

Keith Thompson

REH said:
Well, then we'll have to agree to disagree. To me, and everyone I have ever
worked with in the real-time embedded world, non-deterministic means it
cannot be bounded. Whether or not you can predict the time an individual
run will take, the fact that it still has a fixed upper bound makes it
deterministic.

Ok, fair enough. Since I don't work in the real-time embedded world,
I'm not familiar with its jargon. (You can probably expect similar
confusion from others outside your field -- just as we C standards
junkies run into confusion about terms like "byte" and "object".)
 
C

Christian Bau

Might be off topic but I don't know where to post this question. Hope
somebody clears my doubt.

The coding standard of the project which I am working on says not to
use malloc. When I asked my lead (I have just started working) he said we
should not use dynamic allocation in real-time systems, since the code will
not run in a predictable time period. Can anybody tell me what he means?
Why does the execution time become unpredictable?

Well, can you predict it? If you can't, it is unpredictable (at least to
you). If you write software that has to complete a task in a given time
with one hundred percent certainty, then you have to be able to predict
how long it takes or at least give an upper bound.

So how long does a call to malloc(1000) take? Give me an answer, then
prove that it is correct.
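
One way to see the point is simply to measure it. Below is a rough
sketch in standard C that times batches of malloc(1000) calls and
frees every other block so later batches run against a more fragmented
heap (clock() is too coarse to time a single call, so batches are the
best it can do); the spread it reports will vary wildly from one
allocator to the next, which is exactly the problem for hard deadlines.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BATCHES   50
#define PER_BATCH 2000

int main(void)
{
    static void *ptrs[BATCHES * PER_BATCH];
    double best = -1.0, worst = -1.0;
    int b, i;

    for (b = 0; b < BATCHES; b++) {
        clock_t t0, t1;
        double secs;

        t0 = clock();
        for (i = 0; i < PER_BATCH; i++)
            ptrs[b * PER_BATCH + i] = malloc(1000);
        t1 = clock();

        secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || secs < best)
            best = secs;
        if (secs > worst)
            worst = secs;

        /* free every other block so the heap gets progressively messier */
        for (i = 0; i < PER_BATCH; i += 2)
            free(ptrs[b * PER_BATCH + i]);
    }

    printf("fastest batch: %g s, slowest batch: %g s\n", best, worst);
    return 0;   /* the blocks still held are reclaimed at process exit */
}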
 
W

Walter Roberson

Well, then we'll have to agree to disagree. To me, and everyone I have ever
worked with in the real-time embedded world, non-deterministic means it
cannot be bounded. Whether or not you can predict the time an individual
run will take, the fact that it still has a fixed upper bound makes it
deterministic.

Though it helps to know the range of time distributions, and
to know whether the worst-case scenarios can be avoided by careful
use of the resource, and to know whether the worst-case scenarios
can be somehow cleaned up during spare compute cycles (e.g.,
if the worst case involves a whole bunch of searches, spare cycles
might be used to perform an interruptable resort-by-size algorithm.)

Time variations are not unimportant in real-time work: it's still
good to know, for example, how well the system will stand up when
pushed beyond its guaranteed-to-handle limits.

e.g. does the 101st phone call just result in a dialtone 20 ms later
than would otherwise be the case, or does it result in all 100 existing
calls being dropped? Does the 101st incoming missile still get shot
down but closer to its target than is comfortable, or does the 101st
incoming missile get ignored and left to hit the target untracked, or
do all the missiles get ignored as soon as the 101st shows up, or
does it lose track of a random missile for a moment and regain track
of it later, effectively "juggling" which ones are being tracked?
 
W

Walter Roberson

Regardless of whether having a fixed upper bound makes it deterministic,
there are times when you need the execution time to be as consistent as
possible. I know this because I've worked on such systems.

Would you be able to give us an example without compromising NDAs?

I can think of one example: information about security systems can
be gleaned by careful monitoring of the processor work -- cycle
counts, even which -section- of a microprocessor is being used.
On highly secure devices, you want to avoid having your opponents
do the equivalent of putting the stethoscope to the lock and deducing
the numbers one at a time by hearing the lock workload difference
when the key is varying degrees of correct.
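
A small, concrete illustration of the defence against that kind of
probe is a comparison routine whose running time does not depend on
where the first mismatch occurs (a sketch only, not production crypto
code):

#include <stddef.h>

/* Returns 0 if the two buffers are equal.  Unlike memcmp(), it always
   touches every byte, so the time taken does not reveal how many
   leading bytes of a guess were correct. */
int constant_time_equal(const unsigned char *a, const unsigned char *b,
                        size_t n)
{
    unsigned char diff = 0;
    size_t i;

    for (i = 0; i < n; i++)
        diff |= (unsigned char)(a[i] ^ b[i]);

    return diff;   /* 0 means equal, nonzero means different */
}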
 
C

CBFalconer

.... snip ...

Basically the C language standard does not mandate anything about
performance, and real implementations of memory heaps *really do*
execute in unpredictable amounts of time. That said, there certainly
do exist O(1) implementations of malloc/free (and realloc, if we ignore
the memcpy overhead). So malloc() is only slow on *some* systems --
but it may be hard to know which such systems. As far as gcc goes,
because it's open source and its "Lea" malloc implementation is
reasonably documented, we know that its memory allocation scheme is
fairly reasonable. OTOH, I've measured MSVC's memory allocation speed,
and it's *MUCH* faster than gcc's, but since it's closed source, I
can't sit here and tell you that it's even O(1) in all cases.

Do try to be reasonably accurate. malloc is a part of the library,
so it has nothing to do with either gcc or MSVC, but everything to
do with the runtime library. Both sources (many in the case of
gcc) are available, but the MSVC stuff cannot be published without
permission. gcc operates on many systems with different
libraries. MSVC operates on an x86 under Windoze only. Depending
on where malloc gets the memory to hand out, the execution time can
be perfectly predictable, or at least have an upper bound put on
it. The timing binds appear during free.

In practice you can often play implementor and link in a new malloc
package before linking the runtime library. There are no
guarantees once you do this.
 
W

websnarf

Rob said:
I hope they don't still use SysV malloc.

Right. Very few platforms do. But it clearly has stuck in the
memories/experiences of some people. This is why embedded developers
very often stay away from malloc. I was just diagnosing why people
might feel this way (apparently SCO Unix still uses this malloc.)

And the language standard is of no use here. Compare this to C++'s STL
vector sorting. The C++ language standard actually requires that the
algorithm have O(n*ln(n)) performance and that it performs a "stable
sort" and so on. Something as simple as "malloc and free must have
O(1) performance" would be extremely valuable.
[...] The time spent can be split up between malloc and free
in different ways. For example you could split it up so it
works like this:-

malloc 1
------
m1 Look at size asked for, find the nearest bin
m2 Get a pointer to that bin from the free list
m3 attach it to the used list (? maybe if there is a used list)
m4 store some data just before the pointer location
m5 give it to the program

free 1
----
f1 Go to the ptr location look at the data you put in just before
f2 Add it back onto the relevant free list
f3 Do some merging (? maybe)
f4 Update the used list (? maybe)

Here, if there is no used list everything malloc does is fast,
but free must traverse the free list. It may also have to merge.

Both malloc and free can use "binning". My own testing indicates that
merging is necessary to avoid "lost allocations" or a failure to
allocate even when memory is available. I.e., merging is not optional.
There are also other interesting ideas/tricks that are worth
consideration.
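
As a sketch of what "binning" buys you: requests are rounded up to one
of a small set of size classes, and each class has its own free list,
so locating a candidate block is an index computation rather than a
search. The class layout below (16-byte steps up to 512 bytes, powers
of two above that) is invented for the example; real allocators choose
their classes more carefully.

#include <stddef.h>

#define SMALL_BIN_STEP   16
#define SMALL_BIN_LIMIT  512
#define NUM_SMALL_BINS   (SMALL_BIN_LIMIT / SMALL_BIN_STEP)   /* 32 bins */

/* Map a request size to a bin index.  Small requests use fixed-width
   16-byte classes; larger ones use power-of-two classes.  The loop is
   bounded by the word size, so the mapping is effectively O(1). */
static size_t size_to_bin(size_t size)
{
    size_t bin, limit;

    if (size == 0)
        size = 1;
    if (size <= SMALL_BIN_LIMIT)
        return (size + SMALL_BIN_STEP - 1) / SMALL_BIN_STEP - 1;

    /* 513..1024 -> bin 32, 1025..2048 -> bin 33, and so on */
    bin = NUM_SMALL_BINS;
    limit = 1024;
    while (size > limit) {
        if (limit > (size_t)-1 / 2)   /* guard against overflow */
            break;
        limit <<= 1;
        bin++;
    }
    return bin;
}
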
But the algorithm can be done the other way around, so in
step 1 malloc traverses one of the free lists and takes an
element from the end. Some of the other data structures
you could use would have the same problem (but I'm sure
there'd be one that would be reasonably fast at both ends
at the expense of space).

It could be that GNU malloc splits the time up differently
to MS malloc. Also, it could be that it has an interface
into the operating system that allows it to work faster from
knowledge gleaned from inside the OS. I don't think GNU
malloc does that, even on Linux.

Well the LEA allocator, and UNIX allocators in general, assume that you
have an sbrk(), which Windows doesn't have. So presumably sbrk() is
emulated somehow (perhaps slowly). However, this only affects
allocations that are made when expanding the program's maximal
footprint -- my benchmarks lean more heavily on the quiescent state
(i.e., doing mallocs after lots of frees have happened, which therefore
should be obtained from free lists, not sbrk()).
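
As an aside, a missing sbrk() can be faked on top of whatever large
block of address space the platform hands you (a Windows port might
reserve that block with VirtualAlloc). A toy, single-threaded sketch
over a static arena, with invented names:

#include <stddef.h>

#define ARENA_SIZE (1u << 20)   /* 1 MB arena, an arbitrary example size */

static unsigned char arena[ARENA_SIZE];
static size_t        arena_used;

/* Emulated sbrk(): move the "program break" within the arena.  Returns
   the old break on success and (void *)-1 on failure, like the real one. */
void *my_sbrk(ptrdiff_t increment)
{
    size_t old_used = arena_used;

    if (increment > 0 && (size_t)increment > ARENA_SIZE - arena_used)
        return (void *)-1;              /* arena exhausted */
    if (increment < 0 && (size_t)-increment > arena_used)
        return (void *)-1;              /* would move below the start */

    arena_used = (size_t)((ptrdiff_t)arena_used + increment);
    return arena + old_used;
}
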
Anyway, a "large machine" malloc would probably want to
use some devious way to avoid fragmentation, which would
probably make it unlikely to be O(1). A malloc for a small
machine might not need to be so thorough, though.

Ok here's the thing -- there is no way to make a malloc that
deterministically "avoids fragmentation". Think of the following
scenario:

/* assume p is an array of pointers large enough for every allocation */
for (i = 0; ; i++) {
    if (NULL == (p[i] = malloc (16364))) break;
}
/* What should the mallocs be returning to make sure the heap
   is not fragmented at this point?  There doesn't seem to
   be anything better than packed, stack-like allocations. */

for (j = 0; j < i; j += 2) free (p[j]);
/* Presumably we've recovered half our memory, but what does the
   heap look like now?  No matter what we got back from the
   calls above, one way or another there is a way to force the
   heap to fragment horrendously with some sequence of frees
   such as this, even with lots of memory recovered. */

p[0] = malloc (32768);
/* Well, there should be plenty of memory available, but is this
   even possible now? */

This of course ignores "memory mapping tricks" -- but even then, one is
still limited by address ranges anyway.

Practical defragmentation (i.e., just worried about the most common
cases) is more straightforward -- just make sure allocations are as
packed together as possible. This will naturally increase the size of
your larger contiguous blocks. One way to try to effect this is, when
given a choice, try to return allocations with the "lowest" address.
You can, of course, use heuristics, like assuming that programs will
tend to allocate many of the same sized blocks -- and thus, upon
detection, try to return these as pieces broken off from larger
contiguous blocks.
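
A sketch of the "prefer the lowest address" idea, using a hypothetical
free-list node type (in a binned allocator this would scan one bin's
short list, not the whole heap):

#include <stddef.h>
#include <stdint.h>

struct free_block {
    size_t             size;
    struct free_block *next;
};

/* Of the blocks big enough for the request, return the one lowest in
   memory (NULL if none fits).  Packing live allocations toward low
   addresses tends to leave larger unbroken runs free at the high end. */
struct free_block *pick_lowest_fit(struct free_block *list, size_t want)
{
    struct free_block *best = NULL;
    struct free_block *p;

    for (p = list; p != NULL; p = p->next) {
        if (p->size >= want &&
            (best == NULL || (uintptr_t)p < (uintptr_t)best))
            best = p;
    }
    return best;
}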

Since context-independent malloc/free designs cannot deterministically
avoid degenerate scenarios that cause arbitrary amounts of
fragmentation anyways, I don't think trying to use non-O(1) algorithms
is the right way to go. Heuristics do fairly well in combating
fragmentation in practice while leaving good algorithms as O(1)
anyways. So personally, I would never recommend an algorithm that went
over the top in trying to combat fragmentation at the expense of
algorithmic complexity.
(I'm aware I'm talking about speed to Paul Hsieh, which
is rather like talking to Genghis Khan about conquering Asia.
Don't flame me too much for any of the above that is wrong.)

Lol! Nah, we're all somewhere on the journey to better understanding
of this stuff, and certainly you seem a lot further than most.
 
W

websnarf

Minti said:
Just for my reference what algorithm did it use?

I wasn't able to analyze it deeply enough to truly understand
everything about it. But what I know for sure is that in its inner
loop it performed some sort of linear search for new blocks when it did
mallocs (presumably looking for a block that was big enough). So as
the number of blocks increases (due to fragmentation I presume) the
performance obviously just goes down. Basically, it doesn't use any
form of binning.
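
For concreteness, the inner loop of such an un-binned allocator looks
roughly like the first-fit walk below (the structure names are
invented). Every malloc costs a pass over the free list, so as
fragmentation multiplies the number of free chunks, every allocation
gets slower:

#include <stddef.h>

struct chunk {
    size_t        size;   /* usable bytes in this free chunk */
    struct chunk *next;   /* next chunk on the single free list */
};

static struct chunk *free_head;   /* one unsorted, un-binned free list */

/* First-fit search: walk the list until some chunk is big enough.
   With F free fragments this is O(F) per call, and F tends to grow
   as the heap ages, so the allocator slows down over time. */
static struct chunk *first_fit(size_t want)
{
    struct chunk *prev = NULL;
    struct chunk *c;

    for (c = free_head; c != NULL; prev = c, c = c->next) {
        if (c->size >= want) {
            if (prev != NULL)          /* unlink the chunk and hand it out */
                prev->next = c->next;
            else
                free_head = c->next;
            return c;
        }
    }
    return NULL;   /* nothing big enough: would have to ask the OS for more */
}
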
This O(1) is, I believe, only guaranteed as long as the memory
requests are satisfied using memory pools; once your memory pools
are exhausted, all bets are off.

Uhh ... that's possible, but usually that just means you have to fetch
from sbrk() or VirtualAlloc() or some other system call. Technically,
yes, that could take an unknown amount of time, but it still should be
reducible to O(ln(system-bits)) time, which is effectively O(1).
[...] One more thing: I came across a comment by James
Gosling that in many cases on parallel machines, Java's "new" was
much faster than C's "malloc". I believe this is _only_ because
the C implementation was not "parallelized"; any other
suspicions?

Well actually I would be rather surprised if "new" wasn't *always*
faster than malloc() hands down. Garbage collection has the advantage
that it can always procure its memory from bigger blocks (and thus be
very stack-like), since it knows that there will not be common
interleaving frees and it doesn't have to support realloc. The real
complexity is supposed to be at garbage collection time, when the free
blocks have to be re-coalesced into blocks ready for use by "new".
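
The "stack-like" behaviour described above is essentially a bump
allocator: carve each request off the front of a large block by
advancing a pointer, and let the collector (not interleaved frees)
reclaim the space later. A toy sketch with invented names; a real GC
nursery also deals with alignment promises, per-thread blocks and the
collection itself.

#include <stddef.h>

#define NURSERY_SIZE (256 * 1024)   /* example size */

static unsigned char nursery[NURSERY_SIZE];
static size_t        nursery_top;   /* offset of the next free byte */

/* Bump allocation: O(1), just a bounds check and a pointer bump.
   There is no per-object free(); the whole nursery is reclaimed at
   once when the collector runs (not shown here). */
void *gc_new(size_t size)
{
    void *p;

    size = (size + 7u) & ~(size_t)7u;   /* keep allocations 8-byte aligned */

    if (size > NURSERY_SIZE - nursery_top)
        return NULL;                /* nursery full: a real GC would collect */

    p = nursery + nursery_top;
    nursery_top += size;
    return p;
}
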
Note to OP: Try to get to know your compiler and operating
system, specifically how they handle memory allocations. What sort of
memory pools are kept? If the memory pools don't meet your
requirements, allocate your own memory pools and handle requests
using your own memory pool.

Indeed. The gcc LEA allocator, we know, is O(1), as is the BSD
allocator (however, it may reject allocations that should be possible,
due to excessive pre-binning). And as I said, if you are willing to
trust Microsoft, their allocator *seems* fast according to my
measurements. If the algorithm doesn't use some sort of "binning" I
would be highly suspicious of it -- in fact I believe it's not possible
to be O(1) and still return non-NULL allocations when it should be
possible, without something like binning or an equivalent strategy.
There are also companies that *sell* high-performance memory allocators,
such as Microquill: http://www.microquill.com/ (many CPU companies use
this heap when testing their systems on the SPEC CPU benchmark.)
[...] There are several algorithms. I am not sure if CBFalconer
is offering source code along with it. But if he is, good news.

I would be highly suspicious of any code CBFalconer offers (his idea of
a "general hash table" is one that can hold at most 8.5 million
entries). Certainly you should heavily audit the code before use.
 
C

CBFalconer

Minti wrote:
.... snip ...
[...] There are several algorithms. I am not sure if CBFalconer
is offering source code along with it. But if he is, good news.

I would be highly suspicious of any code CBFalconer offers (his idea of
a "general hash table" is one that can hold at most 8.5 million
entries). Certainly you should heavily audit the code before use.

This comes from someone who can't recognize that a hash function
that requires a length parameter has to measure the length of an
input string, or who can't recognize that a 32 bit quantity won't
fit into an int and be portable. He is also a proponent of shift
operations on signed quantities. Similarly he seems unable to
recognize that the 8.5 million limit he mentioned is on entries,
each of which requires data storage, and thus represents something
between 5 and 100 times that in storage bytes. Besides which,
if the user wishes, he simply adds an entry to a table to increase
the maximum. Meanwhile, all smaller quantities are handled with
minimum wastage (about 50% max).

<Aside to Hsieh - hash tables are intended to be in-memory storage,
and tend to be less efficient when they cause thrashing in a
virtual memory system. This puts an inherent maximum on the item
count usable on a given class of systems. If this is not clear to
you I suggest you use e-mail to get an explanation in smaller words
- this is not the appropriate forum>

<For everyone: The following is a quote from the hashlib source,
and is the area that creates this horrendous 8 mega-entries
limitation. I think how to adjust the upper limit should be fairly
obvious to any reasonably skilled C programmer.>

/* table of k where 2**n - k is prime, for n=8 up. 0 ends */
/* These numbers are chosen so that memory allocation will */
/* usually allow space for system overhead in a 2**n block */
/* http://www.utm.edu/research/primes/lists/2small/0bit.html */
#define FIRSTN 8
static int primetbl[] = {45, 45, 41, 45, 45, 45, 45, 49,
                         57, 49, 41, 45, 59, 55, 57, 61, 0};
/* So the prime of interest, vs index i into above table, */
/* is ( 2**(FIRSTN + i) ) - primetbl[i] */
/* The above table suffices for about 8,000,000 entries. */

Hsieh also seems unable to recognize that the operation of the
malloc package is independent of the compiler, or that the
cryptographic qualities of a hash function have nothing to do with
its suitability for a hash table. There seems to be a lesson here,
that mathematical ability and knowledge have very little to do with
the ability to create clean and trouble-free code. However he is
making some progress, in that he has recognized that the action of
sbrk (or an equivalent system function) can affect the timing of a
malloc call. Further hint: this is one of the reasons malloc
is part of the system and is generally not portable. Alignment
also raises its head here.

At any rate, you (the public) may judge for yourself. My code is
generally available at:

<http://cbfalconer.home.att.net/download/>

and I recommend the hashlib and nmalloc packages for this
controversy, together with id2id-20 and hshftst. The latter arose
out of Hsieh's earlier erroneous claims.

I concede that I am basing my low opinion of his code on his
published hash function. It is possible that this is an inadequate
sample. At any rate, I will not get excited and irrational if told
that there is a problem with my code, and I welcome any such
reports.
 
F

Flash Gordon

Walter said:
Would you be able to give us an example without compromising NDAs?

I can describe an algorithm where this is important without going into
the specific application.

The SW is controlling a motor which is moving something back and forth
in a saw-tooth wave, so the output motion looks like:
  /|  /|  /|
 / | / | / |
/  |/  |/  |
etc.

The "flyback" time (i.e. the down stroke) is very fast in this instance
but obviously not instantaneous!

Every x ms you do the following:
1) Measure the position
2) Calculate the current error
3) Calculate the required drive to apply to the motor (see 5 below)
4) Apply drive
5) Calculate a correction to apply earlier in the next mechanical
cycle to avoid getting the error in the first place

This is obviously triggered off an interrupt.

The motor is a very basic job, basically an electromagnet pulling
against a spring, so the drive is the force applied against the spring.
So if you apply the force late it will start slowing; if you apply
it early it will speed up. So if there is a variation in time between
steps 1 and 4 you will introduce a random error which will be fed into
the next mechanical cycle, when the timing variation might be in the
opposite direction, leading to more error. So basically you lose
mechanical accuracy if you have any time variation.
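
In code, one tick of such a loop might look like the sketch below.
All the names (read_position_sensor, set_motor_drive, the gains and
table sizes) are hypothetical; the property that matters is that the
body does a fixed amount of work with no allocation, so the time from
reading the sensor to applying the drive stays as constant as possible.

/* Hypothetical hardware hooks, provided elsewhere by the board support code. */
extern int  read_position_sensor(void);   /* current position, sensor units */
extern void set_motor_drive(int drive);   /* drive applied to the electromagnet */

#define CYCLE_STEPS 100                   /* control steps per mechanical cycle */

static int target[CYCLE_STEPS];           /* desired saw-tooth position */
static int feedforward[CYCLE_STEPS];      /* correction learned from the last cycle */
static int step;                          /* where we are in the cycle */

/* Called from the timer interrupt every x ms.  No malloc, no data-dependent
   loops: the work per tick is essentially fixed. */
void control_step(void)
{
    int position, error, drive;

    position = read_position_sensor();            /* 1) measure the position  */
    error    = target[step] - position;           /* 2) current error         */

    /* 3) drive = proportional term + correction from the previous cycle
       (the 3/2 gain and the /4 learning rate below are made-up numbers) */
    drive = (error * 3) / 2 + feedforward[step];

    set_motor_drive(drive);                        /* 4) apply the drive       */

    feedforward[step] += error / 4;                /* 5) adjust the next cycle */

    step = (step + 1) % CYCLE_STEPS;
}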

Obviously, in this instance I did not have any reason to even think of
using dynamic memory allocation. However, control loops of this form
where stability of timing is important are not *that* uncommon where you
are trying for accurate control.

Another instance where accurate timing, rather than timing with a known
maximum, is important is when they are giving your heart a shock to
stabilise its rhythms (when you have arrhythmia as opposed to being
flatlined). If you give the shock too late OR too early it can damage
your heart. This is only hearsay though, whereas the previous example
was one I actually worked on myself.
I can think of one example: information about security systems can
be gleaned by careful monitoring of the processor work -- cycle
counts, even which -section- of a microprocessor is being used.
On highly secure devices, you want to avoid having your opponents
do the equivalent of putting the stethoscope to the lock and deducing
the numbers one at a time by hearing the lock workload difference
when the key is varying degrees of correct.

I'm not an expert on such systems, so can't really comment about them.
 
C

Chris Croughton

Well, then we'll have to agree to disagree. To me, and everyone I have ever
worked with in the real-time embedded world, non-deterministic means it
cannot be bounded. Whether or not you can predict the time an individual
run will take, the fact that it still has a fixed upper bound makes it
deterministic.

Not anywhere I've been in the RT embedded world in the last 25+ years!
Deterministic means "can be determined in advance" (and in particular
can be reproduced with the same input conditions). If the time can be
bounded, we call it 'bounded' (duh!). There are some functions where
the bound is so small that it doesn't matter much so the behaviour of
the system is 'nearly' deterministic, but it still isn't deterministic
if the behaviour changes at all.

And there's no reason that dynamic memory allocation has an upper bound
on its time; it could sit there forever shuffling memory to try to find
some that's free. It's not even deterministic whether it succeeds or
not.

Chris C
 
S

Stephen Sprunk

REH said:
If you can prove there is a max. time, then it *is* deterministic.

That depends on if "deterministic" is defined as fixed time or bounded time.
I've always understood it to mean the former.

S
 
R

REH

Stephen Sprunk said:
That depends on if "deterministic" is defined as fixed time or bounded time.
I've always understood it to mean the former.
In the company I work for, it was always the latter. No big whoop, I guess.

REH
 
R

Rob Thorpe

Right. Very few platforms do. But it clearly has stuck in the
memories/experiences of some people. This is why embedded developers
very often stay away from malloc. I was just diagnosing why people
might feel this way (apparently SCO Unix still uses this malloc.)

And the language standard is of no use here. Compare this to C++'s STL
vector sorting. The C++ language standard actually requires that the
algorithm have O(n*ln(n)) performance and that it performs a "stable
sort" and so on. Something as simple as "malloc and free must have
O(1) performance" would be extremely valuable.

I think it would be for people doing embedded software. It's not so
important as the average speed for the rest of us. Maybe it should be
in an embedded C standard.
[...] The time spent can be split up between malloc and free
in different ways. For example you could split it up so it
works like this:-

malloc 1
------
m1 Look at size asked for, find the nearest bin
m2 Get a pointer to that bin from the free list
m3 attach it to the used list (? maybe if there is a used list)
m4 store some data just before the pointer location
m5 give it to the program

free 1
----
f1 Go to the ptr location look at the data you put in just before
f2 Add it back onto the relevant free list
f3 Do some merging (? maybe)
f4 Update the used list (? maybe)

Here, if there is no used list everything malloc does is fast,
but free must traverse the free list. It may also have to merge.

Both malloc and free can use "binning". My own testing indicates that
merging is necessary to avoid "lost allocations" or a failure to
allocate even when memory is available. I.e., merging is not optional.
There are also other interesting ideas/tricks that are worth
consideration.

I was giving this as an example of a malloc/free implementation to
show how the slow bits could be put in different places. I hope that
people don't write malloc routines without merging these days.

That said, it depends on how insistent the programmer is on writing
good-quality software. Some mallocs are very wasteful and people
don't seem to mind much; then again, waste is a little less of a crime
than prematurely running out of memory.
Well the LEA allocator, and UNIX allocators in general, assume that you
have an sbrk(), which Windows doesn't have. So presumably sbrk() is
emulated somehow (perhaps slowly). However, this only affects
allocations that are made when expanding the program's maximal
footprint -- my benchmarks lean more heavily on the quiescent state
(i.e., doing mallocs after lots of frees have happened, which therefore
should be obtained from free lists, not sbrk()).

You mentioned you were using GCC, but not which library. Cygwin,
DJGPP and GLibc use different Mallocs. I think maybe the Linux kernel
has a different Malloc again. All these versions are somewhat
similar.

It sounds like you did a fairly thorough test of your version. Given
that, it probably is just a slow implementation of malloc/free.
Anyway, a "large machine" malloc would probably want to
use some devious way to avoid fragmentation, which would
probably make it unlikely to be O(1). A malloc for a small
machine might not need to be so thorough, though.

Ok here's the thing -- there is no way to make a malloc that
deterministically "avoids fragmentation". Think of the following
scenario:

/* assume p is an array of pointers large enough for every allocation */
for (i = 0; ; i++) {
    if (NULL == (p[i] = malloc (16364))) break;
}
/* What should the mallocs be returning to make sure the heap
   is not fragmented at this point?  There doesn't seem to
   be anything better than packed, stack-like allocations. */

for (j = 0; j < i; j += 2) free (p[j]);
/* Presumably we've recovered half our memory, but what does the
   heap look like now?  No matter what we got back from the
   calls above, one way or another there is a way to force the
   heap to fragment horrendously with some sequence of frees
   such as this, even with lots of memory recovered. */

p[0] = malloc (32768);
/* Well, there should be plenty of memory available, but is this
   even possible now? */


I see what you mean, the user of malloc controls the pattern of the
frees, and therefore the fragmentation of memory, so there's no total
solution, only some approximations for common program behaviour.
This of course ignores "memory mapping tricks" -- but even then, one is
still limited by address ranges anyway.

Practical defragmentation (i.e., just worried about the most common
cases) is more straightforward -- just make sure allocations are as
packed together as possible. This will naturally increase the size of
your larger contiguous blocks. One way to try to effect this is, when
given a choice, try to return allocations with the "lowest" address.
You can, of course, use heuristics, like assuming that programs will
tend to allocate many of the same sized blocks -- and thus, upon
detection, try to return these as pieces broken off from larger
contiguous blocks.

Doug Lea's malloc also uses "deferred coalescing". Say a program
mallocs a number of pieces the same size, then frees some of them.
Instead of merging the pieces left after the frees straight away it
waits to see if the program is going to allocate more chunks that
size, which is likely.

Also, if a program uses one particular size often then malloc/free can
create a special bin for that size, there can be a set of such bins.
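
A sketch of the deferred-coalescing idea with invented names: free()
just pushes the chunk onto a per-size "quicklist", a malloc() of that
size pops it straight back off, and only when the quicklist grows past
a limit does the chunk go to the normal coalescing path. (The size is
passed to fast_free() explicitly to keep the sketch short; a real
allocator reads it from the chunk header, and slow_alloc() /
slow_free_and_coalesce() stand in for that normal path.)

#include <stddef.h>

#define QUICK_SIZE   48   /* the one popular size this sketch caches */
#define QUICK_LIMIT  32   /* past this many cached chunks, start coalescing */

struct qchunk { struct qchunk *next; };

static struct qchunk *quicklist;
static int            quickcount;

/* The allocator's ordinary binned / coalescing path (assumed, not shown). */
extern void *slow_alloc(size_t size);
extern void  slow_free_and_coalesce(void *p);

void *fast_malloc(size_t size)
{
    if (size == QUICK_SIZE && quicklist != NULL) {
        struct qchunk *c = quicklist;        /* O(1): reuse a cached chunk */
        quicklist = c->next;
        quickcount--;
        return c;
    }
    return slow_alloc(size);
}

void fast_free(void *p, size_t size)
{
    if (size == QUICK_SIZE && quickcount < QUICK_LIMIT) {
        struct qchunk *c = p;                /* O(1): defer the merge */
        c->next = quicklist;
        quicklist = c;
        quickcount++;
        return;
    }
    slow_free_and_coalesce(p);               /* the expensive path, taken rarely */
}
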
Since context-independent malloc/free designs cannot deterministically
avoid degenerate scenarios that cause arbitrary amounts of
fragmentation anyways, I don't think trying to use non-O(1) algorithms
is the right way to go. Heuristics do fairly well in combating
fragmentation in practice while leaving good algorithms as O(1)
anyways. So personally, I would never recommend an algorithm that went
over the top in trying to combat fragmentation at the expense of
algorithmic complexity.

Fragmentation brings up interesting problems, and interesting
solutions. But a little done about it goes a long way. On
reflection, you're probably right: a good, solid malloc/free
implementation could probably be written that is completely
O(1). For modern PCs and other big machines, though, people who write
malloc routines probably don't consider it that important so long as
it's fast most of the time.

For small embedded machines I expect people drop all the frills, take
slightly worse fragmentation and try to keep malloc and free O(1). A
bunch of Smart-card programmers work in the office next to mine, I'll
ask them.
Lol! Nah, we're all somewhere on the journey to better understanding
of this stuff, and certainly you seem a lot further than most.

Thank you.

Speed is certainly curious stuff to understand. One of the most
interesting attitudes to it I've seen is the garbage collector in GCC:
it only kicks in when the compiler uses several megs of memory. This
means that on most compiles it never kicks in at all, so in many cases
no complex memory management even happens. The problems here are
firstly running many instances of the same program and secondly the
point where it does kick in.
 
