Software memory management reengineering


Eric Sosman

Richard said:
jacob navia said:



I have already given an example, upthread, of what you call a "transient"
program which became a "non-transient" program five years later. Skimp on
the housekeeping now, and pay for it many times over in the future.

Engineering is about balancing considerations that are
sometimes in conflict. No one consideration or goal is always
paramount. Besides, even when "The program must be easy to
modify" is important, the designer faces the challenge of
trying to anticipate the likely modifications. Sometimes
the crystal ball works, sometimes it's cloudy, and hindsight
is always twenty-twenty.
No matter how many times you repeat an incorrect argument, it remains
incorrect.

"Incorrect" is too absolute a term to apply here. Can
you think of a good reason for the Unix `shutdown' program
to free() its allocated memory before returning? Yes, a
leaky `shutdown' would be hard to use in a loop, but ...

For what it's worth, many of my own programs follow the
general line Jacob describes, at least in part. "Library"
or "utility" functions are scrupulous about their memory
bookkeeping because they don't know the context they run in;
they're careful about memory for the same reasons they don't
call abort(). But if there's a bunch of long-lived data
that represents the program's "global state," I see nothing
wrong with letting exit() clean it up.
 

Richard Harter

I can see you have never worked on embedded systems.

You fail to realise that your leaky directory listing routine may get
invoked recursively in something like a backup utility, or get called
hundreds or even thousands of times in a test suite. Just clean up your
mess, it isn't that hard.

While what you say is quite true, you are talking about something
different than what Jacob is talking about. Jacob is talking
about transient *programs*, which you changed into *routines*.
If I write a program that lists a single directory specified in
its calling sequence then it can't be "invoked recursively". The
backup utility executes the program multiple times, each
execution being a separate process (or equivalent concept in the
OS of your choice).

Jacob's notion is plausible on the surface - if you have a
collection of utilities that are bounded in their own right and
you connect them in scripts, then one could save execution
time by not bothering to free storage. That said, there are
several objections that occur to me.

The first is that people do take standalone programs and convert
them to internal functions. What was acceptable in the original
is not in the conversion. Secondly, it is not necessarily the
case that a simple standalone program will have bounded memory
usage. For example, a program that reads lines from stdin and
prints them will exhaust memory if a new buffer is allocated for
each line. Thirdly, the savings in execution time are strictly
nominal; the costs of starting up and terminating these little
transient programs dominate the cost of freeing storage.



Richard Harter, (e-mail address removed)
http://home.tiac.net/~cri, http://www.varinoma.com
But the rhetoric of holistic harmony can degenerate into a kind of
dotty, Prince Charles-style mysticism. -- Richard Dawkins
 

Richard Heathfield

Eric Sosman said:
Engineering is about balancing considerations that are
sometimes in conflict. No one consideration or goal is always
paramount.

Certainly true, but I would not classify "can't be bothered to manage
resources properly" as an engineering consideration, any more than I would
classify "can't be bothered to wash the dishes" as being a household
management consideration.

"Incorrect" is too absolute a term to apply here.

Sorry, Eric, but I can't agree. I consider the argument he is making to be
incorrect.
Can
you think of a good reason for the Unix `shutdown' program
to free() its allocated memory before returning?

Actually, I can't even think of a good reason for 'shutdown' to allocate
memory dynamically.

<snip>
 

Eric Sosman

Richard said:
Eric Sosman said:
[...]
Can
you think of a good reason for the Unix `shutdown' program
to free() its allocated memory before returning?

Actually, I can't even think of a good reason for 'shutdown' to allocate
memory dynamically.

Whether the reason is "good" or not I can't say, but
at least one version certainly does allocate dynamic memory
and does not free it. See line 711 of

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/shutdown/shutdown.c

From the copyright notices it appears that this module
is at least twenty-three years old; there are also stylistic
suggestions that its origins antedate the C Standard. Yet in
all that time, it seems no one has felt impelled to embed
`shutdown' as a routine in a larger program that could call
it in a loop! Maybe they would have, were it not for the
memory leak? What an awful thought: a `shutdown' that gobbled
up all the swap space and crashed the system ...
 

Richard Heathfield

Eric Sosman said:
Richard said:
Eric Sosman said:
[...]
Can
you think of a good reason for the Unix `shutdown' program
to free() its allocated memory before returning?

Actually, I can't even think of a good reason for 'shutdown' to allocate
memory dynamically.

Whether the reason is "good" or not I can't say, but
at least one version certainly does allocate dynamic memory
and does not free it. See line 711 of
http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/ucbcmd/shutdown/shutdown.c

From the copyright notices it appears that this module
is at least twenty-three years old; there are also stylistic
suggestions that its origins antedate the C Standard. Yet in
all that time, it seems no one has felt impelled to embed
`shutdown' as a routine in a larger program that could call
it in a loop!

I can't imagine why...
Maybe they would have, were it not for the
memory leak? What an awful thought: a `shutdown' that gobbled
up all the swap space and crashed the system ...

Yes, it's very bad practice - if you're unlucky, it might even shut down
the machine. Tut tut.
 

Keith Thompson

Richard Heathfield said:
Certainly true, but I would not classify "can't be bothered to manage
resources properly" as an engineering consideration, any more than I would
classify "can't be bothered to wash the dishes" as being a household
management consideration.
[...]

I don't wash paper plates before I throw them away.
 

Richard Heathfield

Keith Thompson said:
Richard Heathfield said:
Certainly true, but I would not classify "can't be bothered to manage
resources properly" as an engineering consideration, any more than I
would classify "can't be bothered to wash the dishes" as being a
household management consideration.
[...]

I don't wash paper plates before I throw them away.

That's the thing about analogies - they are illustrations, not proofs.
 

cr88192

jacob navia said:
Suppose that you have a module that always allocates
memory without ever releasing it because the guy that
wrote it was lazy, as lazy as me.

Now, you want to reuse it in a loop. What do you do?

Contrary to some people that will start crying to that
&@@""#~ programmer that wrote this sh!!!! you keep
your cool and you do the following:

1) Since the program always calls malloc and never free(),
you allocate a big chunk of memory and each time that the
program calls malloc, you replace those calls with another
that allocates from that memory heap.

2) When the program finishes you free all memory with a single
free() call.

actually, this is more or less what I do in my "lower compiler" (converts my
RPN-IL or RIL language to assembler).
it is convenient since the memory in question is not shared between runs.
so, I run the basic parts of the compiler, and simply "rewind" the heap when
I am done.

note that my "upper compiler", which works with ASTs, is garbage collected,
and my assembler/linker core, does its own (manual) memory management.


for anyone that knows:
this was also a major memory-management technique in Quake 1 and 2 (maybe Q3
and later, but I have not really investigated this). these engines actually
had 2 memory managers: the hunk (as above), and Z_Malloc (from what I
remember, a circular linked-list, rover, and memory-block based system,
basically, we scan along until we find a free block, and maybe split and
merge as needed, and a rover keeps track of where in the heap we are). the
hunk was used for most game data (maps, models, textures, worldstate, ...),
whereas the Z_Malloc system was used for longer-lived data (such as the
'cvars', or engine variables).

major cost:
one commits a decent-sized chunk of memory up front to be used in this
manner, and "overflowing" is a potentially fatal condition (in my case, for
example if someone tries to compile some obscenely large source file...).

Caveats. The program should never call directly
realloc. Watch for calloc too.

the way to deal with realloc is simply to relocate the object to the end of
the heap...
this can be done much the same as above, a wrapper to the custom allocator
and a memcpy (downsizing though can be made into a no-op).
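
as a concrete sketch of this heap-rewind approach (arena_init, arena_alloc,
arena_realloc and arena_reset are made-up names here, and the 16-byte
alignment is an arbitrary choice, not something from any real program in
this thread):

/* minimal "allocate from one big chunk, free it all at once" sketch */
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned char *base;   /* the one big chunk obtained up front */
    size_t size;           /* total capacity in bytes             */
    size_t used;           /* bytes handed out so far             */
} Arena;

int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL;
}

void *arena_alloc(Arena *a, size_t n)   /* stand-in for malloc */
{
    void *p;
    n = (n + 15) & ~(size_t)15;         /* keep allocations aligned */
    if (n > a->size - a->used)
        return NULL;                    /* "overflow": fatal in practice */
    p = a->base + a->used;
    a->used += n;
    return p;
}

/* realloc by relocating to the end of the heap and copying; the old
 * block is simply abandoned until the next rewind */
void *arena_realloc(Arena *a, void *old, size_t oldsz, size_t newsz)
{
    void *p;
    if (newsz <= oldsz)
        return old;                     /* downsizing is a no-op */
    p = arena_alloc(a, newsz);
    if (p != NULL && old != NULL)
        memcpy(p, old, oldsz);
    return p;
}

void arena_reset(Arena *a)              /* "rewind" between runs */
{
    a->used = 0;
}

void arena_destroy(Arena *a)            /* or release it all with one free() */
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}

the one wrinkle is that a drop-in realloc replacement has to know the old
block's size, so in practice each block gets a small size header; a calloc
replacement is just arena_alloc followed by a memset.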

I have done this many times. Actually programs that use the
lazy memory allocation strategy are very easy to convert to a heap
based strategy if the need ever arises.

maybe, maybe not.

Programs that leak memory but contain a lot of calls to free()
are much harder to modify like this, since you would have to
repeat the functionality of free().

or, more often the case: code that actually needs to keep track of the memory
after it finishes...
if we need to create, pass around, and return objects, then this approach is
totally ill-suited.
 

Malcolm McLean

Eric Sosman said:
But if there's a bunch of long-lived data
that represents the program's "global state," I see nothing
wrong with letting exit() clean it up.
Writing a clean-up function forces you to keep track of that global data's
pointer structure. If you can't do this easily it is a useful warning that
the program might be getting out of control.
 

Joachim Schmitz

Richard Heathfield said:
Keith Thompson said:
Richard Heathfield said:
Certainly true, but I would not classify "can't be bothered to manage
resources properly" as an engineering consideration, any more than I
would classify "can't be bothered to wash the dishes" as being a
household management consideration.
[...]

I don't wash paper plates before I throw them away.

That's the thing about analogies - they are illustrations, not proofs.
It's not a bad analogy though: if you want to reuse the dish (or code), do
the washing (or free()ing), if you don't, just dump it.

It's quite easy to tell the difference between paper plates and china, it's
not equally easy to tell reusable code from non-reusable though.

Of course I see what you're getting at, and I also have an example where
proper housekeeping would have helped a lot: it was a webserver (the NCSA
one, I think), which in its classical form did a fork() on each incoming
request, let the child deal with the request, and had that child simply exit
when done, without explicitly free()ing its memory. Now, on another platform
fork() was rather expensive, so the webserver got rewritten into a
long-running service handling all the requests itself (actually a bunch of
processes). What happened then was that it leaked memory all over the place
and largely had to be rewritten and redesigned.

Bye, Jojo
 

Richard Heathfield

Joachim Schmitz said:
Richard Heathfield said:
Keith Thompson said:
[...]
Certainly true, but I would not classify "can't be bothered to manage
resources properly" as an engineering consideration, any more than I
would classify "can't be bothered to wash the dishes" as being a
household management consideration.
[...]

I don't wash paper plates before I throw them away.

That's the thing about analogies - they are illustrations, not proofs.
It's not a bad analogy though: if you want to reuse the dish (or code),
do the washing (or free()ing), if you don't, just dump it.

It's quite easy to tell the difference between paper plates and china,
it's not equally easy to tell reusable code from non-reusable though.

Right. The Principle of Conservatism applies.

<good example snipped>
 

Malcolm McLean

Joachim Schmitz said:
It's quite easy to tell the difference between paper plates and china,
it's not equally easy to tell reusable code from non-reusable though.
The writer should decide. "Is this routine sufficiently general to be of use
in more than one program?" There is a grey area - code to print out a car
park ticket might be reusable if you work for a parking-ticket vending
company, but not if you work for an embedded systems company and they are
just one client - and not everything designated as reusable will actually be
reused.

However often the answer will be yes or yes possibly, in which case write
the code in such a way that it is likely to be reused. In particular,
parameterise assumptions and don't make it depend on anything it doesn't
need to depend upon.

For example, we've got a list of records in the form

typedef struct
{
char id[32];
double salary;
} RECORD;

Typical ids
Fred123, Fred1234, Fred1230, Jim100

We want to sort so that the names are alphabetical and the embedded numbers
come in numeric order.

The function we pass to qsort() takes two RECORD *s and is unlikely to be
useful elsewhere. However the problem of sorting strings containing embedded
numbers is quite general. So write a routine to do that which takes char *s
and doesn't depend on RECORD. The qsort() comparator then becomes a trivial
wrapper to it.
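
A sketch of that separation might look like this; natcmp and record_cmp are
made-up names, and the comparison ignores refinements such as leading zeros:

#include <ctype.h>
#include <stdlib.h>

/* general routine: compare strings so that embedded runs of digits are
 * ordered by numeric value, e.g. Fred123 < Fred1230 < Fred1234 */
int natcmp(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
            char *ea, *eb;
            unsigned long na = strtoul(a, &ea, 10);
            unsigned long nb = strtoul(b, &eb, 10);
            if (na != nb)
                return na < nb ? -1 : 1;
            a = ea;
            b = eb;
        } else {
            if (*a != *b)
                return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
            a++;
            b++;
        }
    }
    return (*a != '\0') - (*b != '\0');
}

typedef struct {       /* RECORD as defined above */
    char id[32];
    double salary;
} RECORD;

/* the RECORD-specific comparator is now a trivial wrapper */
int record_cmp(const void *va, const void *vb)
{
    const RECORD *ra = va;
    const RECORD *rb = vb;
    return natcmp(ra->id, rb->id);
}

/* usage: qsort(records, count, sizeof *records, record_cmp); */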
 

CBFalconer

This is reminiscent of the mark/release used on SOME versions of
p-code interpreters for Pascal. It was sufficient to handle the
symbol tables of the compiler, but was sadly wanting for general
use. One of the unique features of PascalP was that it allowed
mark/release to be mixed with new/dispose, and all functioned
correctly.

I.e. experience shows that this is not a satisfactory technique.
 

Erik Trulsson

Richard Heathfield said:
jacob navia said:



I have already given an example, upthread, of what you call a "transient"
program which became a "non-transient" program five years later. Skimp on
the housekeeping now, and pay for it many times over in the future.

And besides, there is no guarantee that the operating system will clean up
after the program. Depending on the OS to do that is a bad idea if you want
your programs to be portable.
 

Erik Trulsson

jacob navia said:
Maybe you haven't noticed it yet, but after a program finishes the
operating system cleans up.

Not all operating systems will do that.
 

Harald van Dijk

Not all operating systems will do that.

True, but the operating systems that don't do that aren't necessarily
operating systems the software might ever be ported to. Another reason to
clean up everything, one that is more interesting to me personally, is
because if I ignore memory leaks that don't cause problems in practice,
tools such as valgrind have no way of warning me of memory leaks that do
cause problems in practice.
 
P

Peter Pichler

jacob said:
1) Since the program always calls malloc and never free(),
you allocate a big chunk of memory and each time that the
program calls malloc, you replace those calls with another
that allocates from that memory heap.

2) When the program finishes you free all memory with a single
free() call.

If you're already diverting malloc to your own replacement, then why not
make it do something useful? Such as tracing where the calls come from,
how much they allocate, etc. - your final free equivalent can then dump
the statistics to aid you in fixing the source. There are ready-made tools
for that, but so are malloc and free :)
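
A rough sketch of that suggestion, with made-up names (xmalloc, xreport,
MALLOC) standing in for whatever the real replacement is called:

#include <stdio.h>
#include <stdlib.h>

static unsigned long total_calls;
static unsigned long total_bytes;

void *xmalloc(size_t n, const char *file, int line)
{
    void *p = malloc(n);
    /* trace each call: where it came from, how much it asked for */
    fprintf(stderr, "%s:%d: malloc(%lu) -> %p\n",
            file, line, (unsigned long)n, p);
    total_calls++;
    total_bytes += (unsigned long)n;
    return p;
}

/* call this where the single big free would otherwise go */
void xreport(void)
{
    fprintf(stderr, "%lu allocations, %lu bytes in total\n",
            total_calls, total_bytes);
}

/* call sites pick up the location automatically:
 *   #define MALLOC(n) xmalloc((n), __FILE__, __LINE__)
 */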
 

cr88192

Harald van Dijk said:
True, but the operating systems that don't do that aren't necessarily
operating systems the software might ever be ported to. Another reason to
clean up everything, one that is more interesting to me personally, is
because if I ignore memory leaks that don't cause problems in practice,
tools such as valgrind have no way of warning me of memory leaks that do
cause problems in practice.

one idea, as a means for detecting memory leaks:
you make a wrapper that wraps malloc, and calls a wrapper function which
also accepts a source file and line number (__FILE__ and __LINE__ or
such...);
this wrapper function keeps track of allocations (what, how many, and where
they were from);
we can periodically display the number of allocations, and info about the
new (remaining) allocations.

(note that 'free' will also be wrapped, and remove an object from the
listing).

we then run the code in question in a loop, and watch for new objects to
appear.
if we have an unexpected accumulation of allocs, then we know at least where
they are coming from, and can try to work out the details from there.


for example, similar code (though more rigged up for detecting memory damage
than leaks), was fairly useful in tracking down certain bugs in my lower
compiler.
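
a rough sketch of the kind of wrapper described above (the fixed-size table,
the linear search, and names like dbg_malloc are just placeholders; a real
tool would hash and grow the table):

#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 4096

static struct tracked {
    void *ptr;
    size_t size;
    const char *file;
    int line;
} table[MAX_TRACKED];

void *dbg_malloc(size_t n, const char *file, int line)
{
    void *p = malloc(n);
    int i;
    if (p != NULL) {
        for (i = 0; i < MAX_TRACKED; i++) {   /* remember the allocation */
            if (table[i].ptr == NULL) {
                table[i].ptr  = p;
                table[i].size = n;
                table[i].file = file;
                table[i].line = line;
                break;
            }
        }
    }
    return p;
}

void dbg_free(void *p)           /* the wrapped free removes the entry */
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (p != NULL && table[i].ptr == p) {
            table[i].ptr = NULL;
            break;
        }
    }
    free(p);
}

void dbg_dump(void)              /* call periodically inside the test loop */
{
    int i;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (table[i].ptr != NULL)
            printf("live: %lu bytes from %s:%d\n",
                   (unsigned long)table[i].size,
                   table[i].file, table[i].line);
    }
}

#define MALLOC(n)  dbg_malloc((n), __FILE__, __LINE__)
#define FREE(p)    dbg_free(p)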
 

cr88192

Peter Pichler said:
If you're already diverting malloc to your own replacement, then why not
making it do something useful? Such as trace where they are called from,
how much they allocate etc - and your final free equivalent can then dump
the statistics to aid you in fixing the source. There are ready-made tools
for that, but so are malloc and free :)

though this is true, this does not rule out the original idea as a
possibility.
there are some cases where variations on the original idea are sensible.

however, usually one does not free and re-allocate the memory; rather, it is
kept around and re-used each time.
the main advantages: it can offer better performance, and can help avoid
fragmenting the main heap.
basically, yes, we create garbage, but one can simply forget about it and it
is gone.

this is useful for temporary-use situations (such as in a dynamic compiler,
where once the output is produced, most of the intermediate state is simply
no longer relevant, not even worth freeing properly).

so, we produce a mass of small objects defining the various declared
functions and variables (consider all the crap contained in a typical set of
system headers...), small chunks of intermediate bytecoded blocks,
register-allocator info, various stacks and contexts, ... easily tearing
through a good number of MB in a matter of milliseconds.

and on to the next source file, where we forget it all and start all over
again...


a tradeoff (more memory-conservative at least) is to keep track of all the
allocations (say, in an array), or maybe to implement a segmented pool, and
then to free everything in the array/pool.
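
a quick sketch of that array-of-allocations variant (pool_alloc and
pool_free_all are just made-up names):

#include <stdlib.h>

typedef struct {
    void **ptrs;     /* every pointer handed out so far */
    size_t count;
    size_t cap;
} Pool;              /* start from Pool p = {0}; */

void *pool_alloc(Pool *pool, size_t n)
{
    void *p;
    if (pool->count == pool->cap) {          /* grow the index array */
        size_t newcap = pool->cap ? pool->cap * 2 : 64;
        void **grown = realloc(pool->ptrs, newcap * sizeof *grown);
        if (grown == NULL)
            return NULL;
        pool->ptrs = grown;
        pool->cap = newcap;
    }
    p = malloc(n);
    if (p != NULL)
        pool->ptrs[pool->count++] = p;
    return p;
}

void pool_free_all(Pool *pool)   /* forget it all and start over */
{
    size_t i;
    for (i = 0; i < pool->count; i++)
        free(pool->ptrs[i]);
    pool->count = 0;             /* the index array itself is kept for reuse */
}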

this is not to say all this is all that good of an idea (or even at all
relevant for most 'general' situations), but in certain cases, it works
pretty well...

or such...
 

karthikbalaguru

one idea, as a means for detecting memory leaks:
you make a wrapper that wraps malloc, and calls a wrapper function which
also accepts a source file and line number (__FILE__ and __LINE__ or
such...);
this wrapper function keeps track of allocations (what, how many, and where
they were from);
we can periodically display the number of allocations, and info about the
new (remaining) allocations.

(note that 'free' will also be wrapped, and remove an object from the
listing).

we then run the code in question in a loop, and watch for new objects to
appear.
if we have an unexpected accumulation of allocs, then we know at least where
they are coming from, and can try to work out the details from there.

for example, similar code (though more rigged up for detecting memory damage
than leaks), was fairly useful in tracking down certain bugs in my lower
compiler.

Interesting Idea !! :):)

Karthik Balaguru
 
