How much memory does malloc(0) allocate?

Eric Sosman

[...]
allocator statistics have generally shown that small objects (< 4kB)
represent a good portion of the total memory use, *1, but currently with
a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
arrays).

*1: roughly forming a Gaussian distribution centered on 0, with millions
of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?
 
BGB

[...]
allocator statistics have generally shown that small objects (< 4kB)
represent a good portion of the total memory use, *1, but currently with
a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
arrays).

*1: roughly forming a Gaussian distribution centered on 0, with millions
of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?

well, no. allocations of 0 are not common (0 is "generally invalid", and <0
is invalid and can't be allocated).

but the highest point is (was) 1-15 bytes, and it rapidly drops off from
there.

but, I am not sure of what name there would be for this exact
distribution (Gaussian was the closest I could find).


correction: re-ran the heap-statistics tool, currently the highest point
seems to be 16-31 bytes (followed by 32-47 bytes, ...).

the most common object types are currently "metadata leaves" and
"metadata nodes" (basically, structures related to a hierarchical
database), followed mostly by various other small-structure types.

in any case though, small allocations seem to be pretty common.
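
The heap-statistics tool itself isn't shown in the thread; as a minimal sketch of how such a size histogram can be collected (assuming allocations are routed through a wrapper function, with made-up names and bucket sizes):

/* Minimal size-histogram sketch: route allocations through my_malloc()
 * and count them in 16-byte bins up to 4kB, plus one overflow bin.
 * The wrapper name and bin layout are illustrative, not the actual tool. */
#include <stdio.h>
#include <stdlib.h>

#define BIN_SIZE 16
#define NUM_BINS 256                 /* 256 * 16 = 4096 bytes covered */

static unsigned long size_hist[NUM_BINS + 1];   /* last bin: >= 4kB */

void *my_malloc(size_t n)
{
    size_t bin = n / BIN_SIZE;
    size_hist[bin < NUM_BINS ? bin : NUM_BINS]++;
    return malloc(n);
}

void dump_histogram(void)
{
    for (size_t i = 0; i <= NUM_BINS; i++) {
        if (size_hist[i] == 0)
            continue;
        if (i < NUM_BINS)
            printf("%4zu-%4zu bytes: %lu\n", i * BIN_SIZE,
                   i * BIN_SIZE + (BIN_SIZE - 1), size_hist[i]);
        else
            printf(">= 4096 bytes: %lu\n", size_hist[i]);
    }
}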
 
Eric Sosman

On 7/28/2013 4:59 PM, BGB wrote:
[...]
allocator statistics have generally shown that small objects (< 4kB)
represent a good portion of the total memory use, *1, but currently with
a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
arrays).

*1: roughly forming a Gaussian distribution centered on 0, with millions
of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?

well, no. allocations of 0 are not common (0 is "generally invalid", and <0
is invalid and can't be allocated).

but the highest point is (was) 1-15 bytes, and it rapidly drops off from
there.

but, I am not sure of what name there would be for this exact
distribution (Gaussian was the closest I could find).


correction: re-ran the heap-statistics tool, currently the highest point
seems to be 16-31 bytes (followed by 32-47 bytes, ...).

the most common object types are currently "metadata leaves" and
"metadata nodes" (basically, structures related to a hierarchical
database), followed mostly by various other small-structure types.

in any case though, small allocations seem to be pretty common.

Sounds perhaps Geometric/Exponential (that wouldn't have the spike at 32k).

... and wouldn't be "centered on 0."
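
For reference, one illustrative candidate (a textbook formula, not a fit to the data above) is the geometric distribution over size-bin index k:

    P(K = k) = (1 - p)^k * p,    k = 0, 1, 2, ...,   0 < p <= 1

It is supported only on nonnegative k, has its mode at the smallest bin, and falls off monotonically from there, which is consistent with "highest point at the smallest sizes, dropping off rapidly" and puts no mass at negative sizes.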
 
BGB

On 7/28/2013 4:59 PM, BGB wrote:
[...]
allocator statistics have generally shown that small objects (< 4kB)
represent a good portion of the total memory use, *1, but currently with
a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
arrays).

*1: roughly forming a Gaussian distribution centered on 0, with millions
of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?

well, no. allocations of 0 are not common (0 is "generally invalid", and <0
is invalid and can't be allocated).

but the highest point is (was) 1-15 bytes, and it rapidly drops off from
there.

but, I am not sure of what name there would be for this exact
distribution (Gaussian was the closest I could find).


correction: re-ran the heap-statistics tool, currently the highest point
seems to be 16-31 bytes (followed by 32-47 bytes, ...).

the most common object types are currently "metadata leaves" and
"metadata nodes" (basically, structures related to a hierarchical
database), followed mostly by various other small-structure types.

in any case though, small allocations seem to be pretty common.

Sounds perhaps Geometric/Exponential (that wouldn't have the spike at 32k).

yeah, graphs look about right...

the spike is a break from the pattern, but alas...
 
Malcolm McLean

On 7/28/2013 5:09 PM, Eric Sosman wrote:


correction: re-ran the heap-statistics tool, currently the highest point
seems to be 16-31 bytes (followed by 32-47 bytes, ...).

the most common object types are currently "metadata leaves" and
"metadata nodes" (basically, structures related to a hierarchical
database), followed mostly by various other small-structure types.

in any case though, small allocations seem to be pretty common.

Skewed sample problem.
One program doesn't necessarily represent the typical situation. If you have
a tree-like structure which dominates the total number of objects in the
system, you either have lots of allocations of sizeof(NODE), or a few
large allocations of N * sizeof(NODE). (See my book Basic Algorithms
about how to write a fast fixed-block allocator.) It depends on whether
allocation performance is a concern or not.
Some programs have mainly dynamic strings, others mainly fixed fields.
If you're ultimately storing data in a database like SQL, you might as well
write char str[64], because SQL can't handle arbitrarily long strings.
However, if you're not, generally mallocing strings is neater and more robust.
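
The fixed-block allocator Malcolm refers to isn't reproduced here; as a generic sketch of the technique (a free list threaded through one preallocated slab; the names are made up and this is not the code from his book):

/* Fixed-block pool: one malloc() up front, then O(1) alloc/free by
 * pushing and popping a free list threaded through the unused blocks. */
#include <stddef.h>
#include <stdlib.h>

typedef struct FixedPool {
    void *slab;        /* the single large allocation                 */
    void *free_list;   /* singly linked list of currently free blocks */
} FixedPool;

int pool_init(FixedPool *p, size_t block_size, size_t nblocks)
{
    /* round the block size up so each free block can hold an aligned link */
    if (block_size < sizeof(void *))
        block_size = sizeof(void *);
    block_size = (block_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

    p->slab = malloc(block_size * nblocks);
    if (p->slab == NULL)
        return -1;

    p->free_list = NULL;
    for (size_t i = 0; i < nblocks; i++) {     /* thread the free list */
        void *block = (char *)p->slab + i * block_size;
        *(void **)block = p->free_list;
        p->free_list = block;
    }
    return 0;
}

void *pool_alloc(FixedPool *p)      /* pop one block, or NULL if exhausted */
{
    void *block = p->free_list;
    if (block != NULL)
        p->free_list = *(void **)block;
    return block;
}

void pool_free(FixedPool *p, void *block)      /* push the block back */
{
    *(void **)block = p->free_list;
    p->free_list = block;
}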
 
BGB

On 7/28/2013 5:09 PM, Eric Sosman wrote:


correction: re-ran the heap-statistics tool, currently the highest point
seems to be 16-31 bytes (followed by 32-47 bytes, ...).

the most common object types are currently "metadata leaves" and
"metadata nodes" (basically, structures related to a hierarchical
database), followed mostly by various other small-structure types.

in any case though, small allocations seem to be pretty common.

Skewed sample problem.
One program doesn't necessarily represent the typical situation. If you have
a tree-like structure which dominates the total number of objects in the
system, you either have lots of allocations of sizeof(NODE), or a few
large allocations of N * sizeof(NODE). (See my book Basic Algorithms
about how to write a fast fixed-block allocator.) It depends on whether
allocation performance is a concern or not.
Some programs have mainly dynamic strings, others mainly fixed fields.
If you're ultimately storing data in a database like SQL, you might as well
write char str[64], because SQL can't handle arbitrarily long strings.
However, if you're not, generally mallocing strings is neater and more robust.

this is not to say that they represent the bulk of the memory usage,
only that they held top-place (for the most allocated object type).

they represent around 0.87% of the total memory usage (5MB / 576MB),
with an allocation count of around 1.93M.

they are followed by heap-allocated triangles for skeletal models (~ 21k
allocs), terrain-chunk headers (6k allocs), and around 116 other object
types.

don't have a percentage for object-counts, I would have to add and
calculate this manually.


yeah, there are heap allocated strings and symbols in the mix as well,
but they don't hold as high of a position.

there were previously lots of individually wrapped int/float/double
values as well, but these have since been moved over to using slab
allocators.


to explain the 32kB spike:
this has to do with the voxel terrain logic, which has "chunks", which
are 16x16x16 arrays of 8-byte values (voxels). the chunks cover the
locally active area in terms of 1 meter cubes, and each voxel is
basically a collection of bit-fields.

there are only about 5826 of them, but in the dump data, these represent
32% of the total memory usage (186MB / 576MB).

there are also serialized voxel regions which, while only having 8
allocations (in the dump), represent 7% of the memory use (41MB / 576MB).
regions store the voxels in an RLE-compressed format, for parts of the
terrain that are not currently active.
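
As a rough sketch of what "RLE-compressed" can mean for 8-byte voxel values (one plausible reading, not the engine's actual format; the types are assumptions):

/* Run-length encode an array of 8-byte voxel values into (count, value)
 * pairs. 'out' must have room for up to 'n' runs (worst case: no repeats). */
#include <stddef.h>
#include <stdint.h>

typedef struct VoxelRun {
    uint32_t count;    /* number of consecutive voxels with this value */
    uint64_t value;    /* the 8-byte voxel value                       */
} VoxelRun;

size_t rle_encode(const uint64_t *voxels, size_t n, VoxelRun *out)
{
    size_t nruns = 0;
    for (size_t i = 0; i < n; ) {
        size_t j = i + 1;
        while (j < n && voxels[j] == voxels[i])
            j++;
        out[nruns].count = (uint32_t)(j - i);
        out[nruns].value = voxels[i];
        nruns++;
        i = j;
    }
    return nruns;
}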

then there are occasional other large things, like 9 console buffers
which use 2MB (currently for a 160x90 console with 4 bytes for each
character and formatting).

....


note that some data is also stored in RAM in a "compressed" format, such
as audio data for the mixer.

originally, this data was stored in RAM as raw PCM audio, but this was
kind of bulky (audio data can use a lot of RAM at 16-bit 44.1kHz), so I
developed a custom audio codec which allows random-access decompression,
and stores the audio at 176kbps.

now audio is no longer a significant memory user.
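
For scale (assuming mono 16-bit samples, which the post above doesn't actually specify): raw PCM at 44.1kHz is 44100 * 16 = 705,600 bits/s, roughly 706kbps, so storing at 176kbps is about a 4:1 reduction, i.e. roughly 4 bits per sample (44100 * 4 = 176.4kbps).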


work was also going on recently to allow an alternate in-memory format
for the voxel chunks, which basically would exploit a property:
typically, each chunk only has a small number of unique voxel types;
so, in many cases, eligible chunks could be represented in a form where
they use 8-bit (byte) indices into a table of voxel-values, which would
store an eligible chunk in 6kB rather than 32kB.

but, as-is, this is a fairly involved modification.
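
To make the arithmetic behind both figures concrete (a sketch; the type and field names are assumptions, not the engine's actual declarations): 16*16*16 = 4096 voxels at 8 bytes each is 32768 bytes (32kB), while one byte index per voxel plus a 256-entry table of 8-byte values is 4096 + 2048 = 6144 bytes (6kB).

/* Raw layout: 4096 voxels * 8 bytes = 32768 bytes (32kB). */
#include <stdint.h>

typedef struct ChunkRaw {
    uint64_t voxels[16][16][16];
} ChunkRaw;

/* Palettized layout, usable when a chunk holds at most 256 distinct
 * voxel values: 4096 one-byte indices + 256 * 8-byte table = 6144 bytes. */
typedef struct ChunkPalettized {
    uint8_t  index[16][16][16];   /* per-voxel index into 'palette' */
    uint64_t palette[256];        /* the distinct voxel values      */
} ChunkPalettized;

/* _Static_assert is C11; included here just to document the sizes. */
_Static_assert(sizeof(ChunkRaw) == 32768, "raw chunk should be 32kB");
_Static_assert(sizeof(ChunkPalettized) == 6144, "palettized chunk should be 6kB");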

....
 
Lynn McGuire

[...]
allocator statistics have generally shown that small objects (< 4kB)
represent a good portion of the total memory use, *1, but currently with
a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
arrays).

*1: roughly forming a Gaussian distribution centered on 0, with millions
of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?


Obviously I've been going about handling out-of-memory conditions the
wrong way - I should just malloc a few objects of negative size!

Isn't size_t always an unsigned int?

Lynn
 
James Kuyper

On 7/28/2013 4:59 PM, BGB wrote:
[...]
allocator statistics have generally shown that small objects (< 4kB)
represent a good portion of the total memory use, *1, but currently with
a big spike at 32kB (one of the major subsystems allocates a lot of 32kB
arrays).

*1: roughly forming a Gaussian distribution centered on 0, with millions
of small objects.

Centered on zero? So zero-byte allocations are the commonest
of all? And allocations of 10 to 20 bytes are about as common as
those of -20 to -10?


Obviously I've been going about handling out-of-memory conditions the
wrong way - I should just malloc a few objects of negative size!

Isn't size_t always an unsigned int?

If the distribution of allocation sizes had in fact been centered on
zero, and included any positive allocation sizes, then it would also
necessarily have had to include some negative allocation sizes.
Therefore, it would have had to have been a non-conforming
implementation which used a signed type.

Of course, the description of the curve as being "centered on zero" was
incorrect. It has a peak at 0, but no part of the curve covers negative
values.
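
As a small illustration of that last point (hypothetical values): a negative int passed to malloc() is converted to size_t, so the request becomes enormous rather than negative, and will almost certainly fail:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void)
{
    int n = -16;
    /* conversion to the unsigned size_t wraps: (size_t)-16 == SIZE_MAX - 15 */
    printf("(size_t)-16 = %zu (SIZE_MAX = %zu)\n", (size_t)n, (size_t)SIZE_MAX);

    void *p = malloc((size_t)n);   /* a request for SIZE_MAX - 15 bytes */
    printf("malloc of a \"negative\" size returned %p\n", p);
    free(p);
    return 0;
}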
 
James Kuyper

On 07/29/2013 03:42 PM, Lynn McGuire wrote:
....
Isn't size_t always an unsigned int?

No. It must be an unsigned integer type, but it doesn't have to be
unsigned int. SIZE_MAX must be at least 65535, but even "unsigned short"
is big enough to meet that requirement. On a system where CHAR_BIT==16,
size_t could even be "unsigned char". The only unsigned type that can't
be size_t is _Bool.
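
A quick way to see what a particular implementation actually uses (the output varies by platform; 4 or 8 bytes for size_t is common):

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>

int main(void)
{
    printf("sizeof(size_t)       = %zu bytes\n", sizeof(size_t));
    printf("SIZE_MAX             = %zu\n", (size_t)SIZE_MAX);
    printf("sizeof(unsigned int) = %zu bytes\n", sizeof(unsigned int));
    return 0;
}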
 
Keith Thompson

James Kuyper said:
On 07/29/2013 03:42 PM, Lynn McGuire wrote:
...

No. It must be an unsigned integer type, but it doesn't have to be
unsigned int. SIZE_MAX must be at least 65535, but even "unsigned short"
is big enough to meet that requirement. On a system where CHAR_BIT==16,
size_t could even be "unsigned char". The only unsigned type that can't
be size_t is _Bool.

There's a common confusion between the terms "int" and "integer".

Even though the derivation of the C keyword "int" is obviously as
an abbreviation of the English word "integer", their meanings are
quite distinct.

In C, the word "integer" refers to any of a number of distinct types,
ranging from char to long long.

"int" is a type name that refers to just one of those types.
The keyword "int" can also be used as part of the names for several
other types, such as "short int", and "unsigned long long int", and
so forth, but when used by itself it refers only to that one type.

"const" and "constant" can cause similar confusion; "constant"
means, more or less, evaluable at compile time, but "const" means
"read-only".
 
glen herrmannsfeldt

Keith Thompson said:
(snip)

There's a common confusion between the terms "int" and "integer".

The word "an" above suggests this is appropriate.
Even though the derivation of the C keyword "int" is obviously as
an abbreviation of the English word "integer", their meanings are
quite distinct.

(snip)

However, it does hint that int might be used as an abbreviation
for the word "integer". Following the usual English rules, it should
be followed by a period.

-- glen
 
Keith Thompson

glen herrmannsfeldt said:
The word "an" above suggests this is appropriate.


(snip)

However, it does hint that int might be used as an abbreviation
for the word "integer". Following the usual English rules, it should
be followed by a period.

Using "int", with or without a period, as an abbreviation for "integer"
while discussing C strikes me as a Very Bad Idea. (No offense intended
to Lynn McGuire, who probably just made a minor and unintentional error,
as we all do from time to time.)
 
Geoff

Using "int", with or without a period, as an abbreviation for "integer"
while discussing C strikes me as a Very Bad Idea. (No offense intended
to Lynn McGuire, who probably just made a minor and unintentional error,
as we all do from time to time.)

Or he peeked at one of the header files for his implementation and
found it defined as unsigned int and assumed that is what the standard
specifies.
 
Malcolm McLean

Using "int", with or without a period, as an abbreviation for "integer"
while discussing C strikes me as a Very Bad Idea. (No offense intended
to Lynn McGuire, who probably just made a minor and unintentional error,
as we all do from time to time.)
Almost every integer should be int. Since integers usually end up indexing
arrays (even a char, when you think about it, will probably eventually
end up as an index into a glyph table of some sort), that means that int
needs to be able to index an arbitrary array. Then you don't need any other
types, except to save memory, or for a few algorithms that need huge integers.

We don't need twenty plus integer types in C.
 
Phil Carmody

Eric Sosman said:
The C99 Rationale (I haven't seen a C11 version yet) explains
the Committee's thinking; see section 7.20.3.


Fine -- But in your usage, the assert should precede the
call to malloc(), and not depend on the returned value.

What's a "solid" assert? assert() is one of the most ephemeral
bits of code it's possible to write in C.

Phil
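
As a sketch of the point being argued (the function and its names are hypothetical): the asserts state preconditions on the arguments and vanish entirely under NDEBUG, while the result of malloc() is checked with ordinary, always-present code:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

void *alloc_items(size_t count, size_t item_size)
{
    /* precondition checks: compiled out completely when NDEBUG is defined */
    assert(count > 0 && item_size > 0);
    assert(count <= SIZE_MAX / item_size);   /* the multiply won't overflow */

    void *p = malloc(count * item_size);
    if (p == NULL) {
        /* real error handling belongs here, not in an assert() */
    }
    return p;
}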
 
James Kuyper

On 07/30/2013 05:34 AM, Malcolm McLean wrote:
....
Almost every integer should be int. Since integers usually end up indexing
arrays (even a char, when you think about it, will probably eventually
end up as an index into a glyph table of some sort), that means that int
needs to be able to index an arbitrary array. Then you don't need any other
types, except to save memory, or for a few algorithms that need huge integers.

We don't need twenty plus integer types in C.

If you dismiss all the reasons for doing so as irrelevant, it can seem
pointless to have so many different integer types. Using the same
"logic", we only need one hammer design.
<http://en.wikipedia.org/wiki/Hammer#Gallery>.
 
James Harris (es)

James Kuyper said:
On 07/30/2013 05:34 AM, Malcolm McLean wrote:
...

If you dismiss all the reasons for doing so as irrelevant, it can seem
pointless to have so many different integer types. Using the same
"logic", we only need one hammer design.
<http://en.wikipedia.org/wiki/Hammer#Gallery>.

Similarly, lots of different types of wheels are needed. We wouldn't want to
run our cars and bicycles on Assyrian chariot wheels. If nothing else,
the iron scythes might get in the way of other road users. ;-)

Hence, despite the oft-quoted anti-proverbial, wheels do sometimes need to
be reinvented.

James
 
Lynn McGuire

Using "int", with or without a period, as an abbreviation for "integer"
while discussing C strikes me as a Very Bad Idea. (No offense intended
to Lynn McGuire, who probably just made a minor and unintentional error,
as we all do from time to time.)

I make minor and unintentional errors all the time!

My point was that size_t is unsigned. I was not
thinking about the actual size of size_t. I would
prefer all modern day usage of this kind of data
to be 64 bit. At least.

Lynn
 
