Nick Keighley said:
#the nonsense is almost all [but 2 people not] that write here
#have fear in think or managiament memory they programs use
incomprehensible.
if I guess right, you're suggesting I'm scared to write my own version
of malloc.
I simply don't see the point.
One of the benchmarks[1] I was using on my compiler project was,
miraculously, faster than the gcc-compiled version, no matter what
optimisation level the latter used.
that has also happened to me before.
in the past I had cases where JIT-compiled BGBScript code was
outperforming GCC-compiled C code (usually for trivial tests, like
recursive Fibonacci or doing arithmetic in a loop).
never mind that this is hardly the usual case: being a VM-based
scripting language, currently running as threaded code (almost purely
function calls back into the C-based interpreter logic), it is
generally considerably slower than natively compiled C.
(the issue is that a full JIT is terrible to try to maintain in the face
of a language/VM design that is still a moving target).
I.e. always 4.2 seconds, compared with gcc's 5.6 seconds. Then I
discovered that the timing of this benchmark was dominated by calls to
malloc() and free(). Two other C compilers were both around 2.4 seconds.
I was using lcc-win's library.
I plugged in my own allocator, built on top of malloc(), which was
better at lots of small allocations, and my timing was then 1.2 seconds!
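Roughly, it worked by grabbing large blocks from malloc() and carving
small requests out of them. Something along these lines (a simplified
sketch with made-up names, not my actual code; it also ignores freeing
small objects, which the real thing had to handle):

/* simplified sketch: grab big blocks from malloc() and bump-allocate
   small requests out of them; freeing of small objects is omitted */
#include <stdlib.h>

#define POOL_BLOCK  (1 << 20)   /* carve 1MB at a time from malloc() */
#define SMALL_LIMIT 256         /* bigger requests go straight to malloc() */

static char  *pool_cur;         /* next free byte in the current block */
static size_t pool_left;        /* bytes left in the current block */

void *small_alloc(size_t n)
{
    void *p;
    n = (n + 15) & ~(size_t)15; /* keep 16-byte alignment */
    if (n > SMALL_LIMIT)
        return malloc(n);       /* large requests: not worth pooling */
    if (n > pool_left) {        /* block used up: fetch another */
        pool_cur = malloc(POOL_BLOCK);
        if (!pool_cur) return NULL;
        pool_left = POOL_BLOCK;
    }
    p = pool_cur;
    pool_cur  += n;
    pool_left -= n;
    return p;
}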
yeah.
pretty much how I got into writing memory managers originally was long
ago, off in the land of Linux (although my memory is fuzzy; it may have
been on DJGPP+DOS, I forget, as this was around the late 90s). I was
writing some apps that allocated lots of small strings and similar;
"malloc()" would chew through memory, and then often cause the app to
crash as well.
I later devised a solution:
I allocated a 1MB block of memory and wrote a small linked-list
allocator (the block held a link to the start of the free-list),
using best-fit. I was able to considerably reduce memory-bloat and
crashes by doing so.
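the basic shape of it was something like the following (a from-memory
sketch with invented names, not the original code; splitting and
coalescing of free spans are left out, as is free() itself):

/* rough sketch of a best-fit free-list allocator inside one block */
#include <stddef.h>

typedef struct FreeNode {
    size_t size;              /* usable bytes in this free span */
    struct FreeNode *next;    /* next free span in the list */
} FreeNode;

static FreeNode *free_list;   /* head of the free-list in the 1MB block */

void pool_init(void *block, size_t size)
{
    free_list = block;        /* initially one big free span */
    free_list->size = size - sizeof(FreeNode);
    free_list->next = NULL;
}

/* best-fit: walk the whole list, take the smallest span that fits */
void *pool_alloc(size_t n)
{
    FreeNode **best = NULL, **pp, *node;
    for (pp = &free_list; *pp; pp = &(*pp)->next)
        if ((*pp)->size >= n && (!best || (*pp)->size < (*best)->size))
            best = pp;
    if (!best) return NULL;   /* nothing big enough */
    node = *best;
    *best = node->next;       /* unlink (splitting the remainder omitted) */
    return node + 1;          /* payload starts just past the header */
}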
GC started being used on and off early on as well (it has been a long,
hard road getting it "just right", though).
I later improved density by switching from a free-list to fixed-size
memory cells (originally 8 bytes) with a bitmap for allocation,
improved performance by switching over to first-fit, and improved
capacity by allocating more chunks as needed.
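in rough outline, the bitmap scan looks something like this (an
illustrative sketch with made-up names; the real thing also grows new
chunks on demand rather than just giving up):

/* sketch: fixed-size cells, one bit per cell, first-fit bitmap scan */
#include <stddef.h>
#include <stdint.h>

#define CELL_SIZE  8                        /* originally 8-byte cells */
#define NUM_CELLS  ((1 << 20) / CELL_SIZE)  /* one 1MB chunk */

static uint8_t cell_bitmap[NUM_CELLS / 8];  /* 1 bit per cell; 1 = in use */
static uint8_t chunk[NUM_CELLS * CELL_SIZE];

static int  get_bit(int i) { return (cell_bitmap[i >> 3] >> (i & 7)) & 1; }
static void set_bit(int i) { cell_bitmap[i >> 3] |= (uint8_t)(1 << (i & 7)); }

/* first-fit: take the first run of 'ncells' consecutive free cells */
void *cell_alloc(int ncells)
{
    int i, run = 0;
    for (i = 0; i < NUM_CELLS; i++) {
        run = get_bit(i) ? 0 : run + 1;
        if (run == ncells) {
            int base = i - ncells + 1, j;
            for (j = base; j <= i; j++)
                set_bit(j);                 /* mark the run as allocated */
            return chunk + (size_t)base * CELL_SIZE;
        }
    }
    return NULL;   /* chunk full; the real allocator grabs another chunk */
}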
later on, I switched to 16-byte cells and 4MB chunks, and tended to
fall back on a different allocation strategy (currently pulling the
actual memory from malloc) for larger objects (several kB and up),
mostly as this tends to be faster (the performance of scanning a
bitmap for spans of free cells drops off rapidly as the object size
gets larger).
free-lists were also devised for the small case, mostly by
daisy-chaining free objects of a particular number of cells (checked
prior to a bitmap-based scan); this handled the case of up to 256
cells (4kB).
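roughly, that front-end works like so (again a sketch, not the real
code; the dead object's first pointer-sized word holds the chain link):

/* sketch: per-size free-lists in front of the bitmap scan */
#define MAX_FREE_CELLS 256             /* lists cover 1..256 cells (4kB) */

void *cell_alloc(int ncells);          /* bitmap-scan fallback, as above */

static void *freelists[MAX_FREE_CELLS + 1];  /* one head per cell count */

void cell_free(void *p, int ncells)
{
    if (ncells >= 1 && ncells <= MAX_FREE_CELLS) {
        *(void **)p = freelists[ncells];  /* chain through the dead object */
        freelists[ncells] = p;
    }
    /* (the bitmap would also be updated here) */
}

void *cell_alloc_fast(int ncells)
{
    if (ncells >= 1 && ncells <= MAX_FREE_CELLS && freelists[ncells]) {
        void *p = freelists[ncells];      /* pop an exact-fit free object */
        freelists[ncells] = *(void **)p;
        return p;
    }
    return cell_alloc(ncells);            /* otherwise scan the bitmap */
}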
....
actually, I remember much more recently (on WinXP, likely using either
MinGW or MSVC, I forget) writing a small app which used "malloc()" for
small strings, and it also chewed through absurd amounts of memory (it
processed text files, allocating strings for individual tokens, ..., and
could easily run up against the 3GB process-size limit). so, a custom MM
still makes some sense here as well.
So, you were saying about there not being any point? (Of course this
was not rewriting malloc(), but the large memory blocks I worked with
could just as easily have come from the OS, and would have given the
same results.)
if the object becomes significantly larger than the OS page size (not
necessarily 1:1 with HW pages; for example, Windows uses 4kB HW pages
but a 64kB allocation granularity), then the relative overhead of just
allocating a raw chunk of memory gets much smaller (the waste from one
partially-used page is small vs the size of the object itself).
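(to put rough numbers on it, these being my own figures: a ~1MB object
rounded up to 64kB units wastes at most one partial unit, around 6% in
the worst case; a 40-byte string given a whole 64kB region to itself
wastes over 99.9% of it.)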
[ actually, I have before wondered about bi-level allocation in a
filesystem, say, where 64 or 256 byte cells are used for smaller files
(probably by subdividing blocks), and the usual 4kB / 8kB / 16kB blocks
are used for larger ones, but I have little say over what OS developers
do, and app-local pseudo-filesystems are of little relevance to OS
people. (note that the Linux EXT filesystems instead work by typically
storing small files directly within the "inode" structure, but this
creates a jump between what will fit within the inode, and what will
require full disk blocks). ]
but, if smaller, using a full page to allocate a small string is almost
pure waste.
(Sadly I had to retire this benchmark, as it was not actually testing
generated code.)
[1]
http://shootout.alioth.debian.org/u32/program.php?test=binarytrees&lang=gcc&id=1