"null Considered Harmful"


Ben Bacarisse

Malcolm McLean said:
void *bbx_malloc(int size)
{
    void *answer;

    if(size == 0)
        size = 1;
    answer = malloc(size);
    if(!answer)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return answer;
}

So we return a one byte allocation on a request for zero bytes.

It's something and nothing. Also it takes an int, preventing anyone from
creating a heap array that cannot be indexed by plain int, so Baby X doesn't
need to mess about with size_t index variables.

You still have to take them into account though. You can't avoid the
fact that malloc's parameter is a size_t so it may be better to include
a test for size < 0.
 

Malcolm McLean

You still have to take them into account though. You can't avoid the
fact that malloc's parameter is a size_t so it may be better to include
a test for size < 0.
One advantage of a signed type is that, in situations where it is being
fed random garbage values, you'll hit a negative value that can be
flagged as an error pretty quickly.
 

BGB

"null Considered Harmful"
http://peat.org/2013/12/07/null-considered-harmful/

"In almost every software project I've been a part of,
the majority of errors I've encountered are caused by
unexpected null references. Many are caught at compile
or test time, but they always creep though, and few
platforms are exempt. They crop up everywhere: web
applications, desktop applications, embedded software,
game consoles, mobile devices -- almost everywhere there
is software, there are null references."

I am speechless.

people also regularly rant about for loops, variable assignment, the
ability to call stuff recursively, ...


some stuff I have done in my own code is also seen as "pure evil" by
some people (some parts of infrastructure are built on ugly hacks with
pointers and function pointers), ...

but, if all this stuff were taken away, you would be left with languages
which can't really do much of anything.


though, some things I wouldn't mind as much, like having a
general-purpose bounds checking mechanism, mostly as array-overruns can
often be difficult to track down and result in far reaching bugs.

MSVC now (added fairly recently) does provide a special-case
bounds-checking mechanism, but I am annoyed at how stupid it can be:
it is either fully on or fully off, apparently not smart enough to
figure out that indexing an array with a constant index will either
always pass or always fail, and so does not need run-time bounds
checks.

luckily, it is easy to side-step: use a pointer to an array, no bounds
checking:
int foo[32];
int *pfoo;

foo[8]=3; //uses bounds checking at runtime...
pfoo=foo;
pfoo[8]=3; //no bounds checking...

in some ways, a totally amazing piece of engineering there...
 

Charlton Wilbur

П> I think better solution is to use exceptions. You can't ignore
П> an exception, whilst return code is very easy to ignore.

Exceptions add overhead to the runtime, which makes them a better
solution only in situations where that overhead is acceptable.

This is why I'm glad they exist in Java and Perl and Javascript, and
glad they don't exist in C.

Charlton
 

BGB

You still have to take them into account though. You can't avoid the
fact that malloc's parameter is a size_t so it may be better to include
a test for size < 0.

hmm...

void *newb_malloc(int size)
{
    void *ptr;

    if(size == 0)
        { size = 1; }
    else if(size < 0)
    {
        newb_logandtrap("Can't allocate negative sizes\n");
    }
    else if(size > (1<<24))
    {
        newb_logandtrap("Not allowed to allocate overly large memory\n");
    }
    else if(size > (1<<22))
    {
        newb_logandtrap("Not good to allocate large memory\n");
    }
    /* ...else, more checks along these lines... */

    ptr = malloc(size);
    if(!ptr)
    {
        newb_logandtrap("Out of memory\n");
    }
    return ptr;
}


where logandtrap could do something like record information about the
event in a log file and then trap to the debugger (such as by raising
an exception or similar).
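for illustration, a minimal sketch of what such a handler might look
like (the log-file name and the SIGTRAP trap are assumptions here, not
the actual code):

#include <stdio.h>
#include <signal.h>

void newb_logandtrap(const char *msg)
{
    FILE *log = fopen("newb_alloc.log", "a"); /* hypothetical log file */
    if(log)
    {
        fprintf(log, "alloc trap: %s", msg);
        fclose(log);
    }
    fprintf(stderr, "%s", msg);
    raise(SIGTRAP); /* POSIX: break to the debugger; abort() also works */
}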

maybe the library could also be set up such that the default handler
displays an error box:
"Something Happened"
Abort, Retry, Ignore, Fail.
or:
OK, Cancel.


better yet if a person could get a little animation and some annoying
sound effects for their box, or make it rapidly shake in place, ...
because, after all, it apparently works so well for website ads.
 

James Kuyper

П> I think better solution is to use exceptions. You can't ignore
П> an exception, whilst return code is very easy to ignore.

Exceptions add overhead to the runtime, which makes them a better
solution only in situations where that overhead is acceptable.

I don't know anything about the details, but it is commonly claimed in
C++ forums that there are well-known techniques, in widespread use, for
implementing C++ exceptions so that they incur no runtime overhead on
the non-throwing path; the cost of those techniques is that actually
throwing an exception is slower than it would be under the more naive
implementations that do incur runtime overhead.
 

Les Cargill

Charlton said:
П> I think better solution is to use exceptions. You can't ignore
П> an exception, whilst return code is very easy to ignore.

Exceptions add overhead to the runtime, which makes them a better
solution only in situations where that overhead is acceptable.

This is why I'm glad they exist in Java and Perl and Javascript, and
glad they don't exist in C.

Charlton


Why is checking all the return codes such an unreasonable burden
anyway?
 

Les Cargill

BGB said:
On 12/11/2013 11:08 AM, Lynn McGuire wrote:
but, if all this stuff were taken away, you would be left with languages
which can't really do much of anything.

Well... that doesn't matter to "standards" enforcers who embrace a
Mandarin mentality.
though, some things I wouldn't mind as much, like having a
general-purpose bounds checking mechanism, mostly as array-overruns can
often be difficult to track down and result in far reaching bugs.

I have never understood how this is such a burden. Just either
globalize any such constraints or pass them as parameters.
MSVC now (added fairly recently) does provide a special case
bounds-checking mechanism, but I am annoyed at how stupid it can be,
like it is either fully on or fully off, apparently not sufficiently
smart as to figure out that indexing an array with a constant index will
either always pass or always fail, and so does not need run-time bounds
checks.

That sounds like a good thing.
luckily, it is easy to side-step: use a pointer to an array, no bounds
checking:
int foo[32];
int *pfoo;

foo[8]=3; //uses bounds checking at runtime...
pfoo=foo;
pfoo[8]=3; //no bounds checking...

in some ways, a totally amazing piece of engineering there...

Doctor, doctor it hurts when I do that....
 

James Kuyper

The commonest argument, and one I have some sympathy with, is that you
end up with code where it's actually hard to find the statements that do
the work as they are lost in five times as much error checking code.

I know what you mean. We have coding guidelines that call for checking
all error return codes, and I'm a believer in that idea, but it often
ends up creating code that looks like its main purpose is the creation
of error messages. I once had someone submit a test driver for a routine
which triggered every single error condition, but which failed to check
whether the routine handled a successful call correctly.
 

BGB

Well... that doesn't matter to "standards" enforcers who embrace a
Mandarin mentality.

yeah.



I have never understood how this is such a burden. Just either
globalize any such constraints or pass them as parameters.

the issue is that lots of code either does not check bounds, or stuff
goes out of bounds due to "simple coding errors", like accidentally
allocating either the wrong amount of memory, or calculating the array
index wrong.


to some extent it is possible to use a "tripwire detector" or similar
built into the memory manager, which can detect when code has run off
the end of an array (either when freeing memory, or periodically, based
on a timer or similar, by doing a scan over the heap to make sure
everything is still in order). this works because memory objects may
have "tripwires" before and after them, which an overrun will in most
cases mangle; when that is detected, it is possible to analyze the
damage and write the results out to a log or similar (the memory
manager also keeps track of things like where memory allocation and
free calls come from, and so can basically say where the offending
memory object is from, ...).
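a minimal sketch of the guard-word idea (the layout, names, and magic
value here are mine, not the actual memory manager; a real allocator
would also worry about alignment and thread safety):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TRIP_MAGIC ((size_t)0x5AFEC0DE) /* arbitrary guard pattern */

/* layout: [size][front guard][user bytes ...][back guard] */
void *trip_malloc(size_t size)
{
    size_t *raw = malloc(3 * sizeof(size_t) + size);
    size_t magic = TRIP_MAGIC;

    if(!raw)
        return NULL;
    raw[0] = size;
    raw[1] = TRIP_MAGIC; /* front tripwire */
    memcpy((char *)(raw + 2) + size, &magic, sizeof magic); /* back tripwire */
    return raw + 2;
}

/* check the guards; a memory manager could call this from free() or
   from a periodic sweep over all live blocks */
int trip_check(void *ptr)
{
    size_t *raw = (size_t *)ptr - 2;
    size_t back;

    memcpy(&back, (char *)ptr + raw[0], sizeof back);
    if(raw[1] != TRIP_MAGIC || back != TRIP_MAGIC)
    {
        fprintf(stderr, "tripwire mangled around block %p\n", ptr);
        return 0;
    }
    return 1;
}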


but, it isn't really possible to catch the code which does so in the act
(like with an explicit trap), and sometimes there might still be memory
corruptions due to code which seemingly escapes the tripwire detector
(writes to addresses not directly adjacent to the memory object in
question).

likewise, writes to random (garbage) addresses (such as from
accidentally using uninitialized variables) may often not trap if the
application uses most of the available address space (it only reliably
traps if the address points at an address range owned by the OS).


a few times I had thought it might be nice if, when an uninitialized
variable is used by code, besides just generating a warning or similar,
the compiler could also set the variable initially to a "known bad" address.

like, to better deal with cases where the programmer has not gone and
faithfully fixed up every "use of uninitialized variable 'foo'" or
"'foo' must return a value" style warning.

well, in general I try to go and clean them up, but this is not
always reliably done.

That sounds like a good thing.

bounds-checking, in general, yes.

not being smart enough to optimize away some constant-index checks, not
always so good.

luckily, it is easy to side-step: use a pointer to an array, no bounds
checking:
int foo[32];
int *pfoo;

foo[8]=3; //uses bounds checking at runtime...
pfoo=foo;
pfoo[8]=3; //no bounds checking...

in some ways, a totally amazing piece of engineering there...

Doctor, doctor it hurts when I do that....

it can matter for code which tends to be CPU bound...

a particular case where it came up was within a video codec, where these
constant-index array accesses ate lots of clock cycles.

code within video codecs tends to be fairly performance-sensitive, as
wasting too many cycles in a codec can very easily lead to performance
problems... (more so when a person is trying to squeeze speeds of like
300+ megapixels per second out of the thing, like for multiple
concurrent streams or for 4K+ or similar...).


variable indexes could theoretically still be checked, as then it is
doing its job, rather than just being stupid and wasting clock cycles.

in a few cases this has led mostly to just using lots of variables
instead of arrays, like:
int i0, i1, i2, i3, ...;
i8=3;

but then it gets kind of annoying, as the compiler's optimizer
sometimes goes and rewrites things in stupid ways.

like sometimes, one will store something to a variable, and the
optimizer is like "hey, I will go inline/repeat this expression every
time that variable is used", which is sometimes not good:
the programmer may have had a specific reason for factoring the
expression out into its own variable (say, because calculating
something once and reusing it can often be cheaper than recalculating
it a bunch of times in a row, ...).


other times, it is like, one writes:
memcpy(dst, src, 8);
and the compiler is like, "I will helpfully rewrite this to":
mov edi, [...]
mov esi, [...]
mov ecx, 8
rep movsb
(*)

and the programmer is left having to manually intervene, as it is
generally preferable to have the compiler generate something like:
mov ecx, [...]
mov edx, [...]
mov [...], ecx
mov [...], edx
or, better yet:
movq xmm0, [...]
movq [...], xmm0


*: actually, "rep movsb" and friends can go "surprisingly fast" for
larger memory copies (several KiB or more), where they can outperform
more generic loops, but this doesn't mean it is the best choice for
small fixed-size copies.


so, sometimes, one is left to wonder if MSVC's code-generator is
brain-damaged, or contains a "cleverness seemingly beyond the
understanding of mortal men".
 

Charlton Wilbur

LC> Why is checking all the return codes such an unreasonable burden
LC> anyway?

It isn't. But if the function in question has 10 lines of logic, and
200 lines of checking for all the possible error conditions, your code
will be unreadable.

Exceptions mean that you can write the 10 lines of logic clearly,
provide a way to handle the exceptions that you can handle, and ignore
all the ones you can't, for them to be caught at a higher level.
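For what it's worth, the nearest analogue in plain C is
setjmp/longjmp; a rough sketch (the names are hypothetical, and real
code has to be careful about what longjmp skips over):

#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf on_error; /* one hypothetical error context */

void do_work(void)
{
    /* ... the 10 lines of logic, uncluttered ... */
    longjmp(on_error, 1); /* the "throw": bail out from deep inside */
}

int main(void)
{
    if(setjmp(on_error)) /* the "catch", at a higher level */
    {
        fprintf(stderr, "operation failed\n");
        return EXIT_FAILURE;
    }
    do_work();
    return 0;
}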

There are other virtues besides performance; simple, readable code that
focuses the maintainer's attention on the common expected cases, in
this day where my *phone* has a dual-core 64 bit processor, is likely to
save more money than code that shaves 3 cycles off each loop iteration.

Charlton
 

Les Cargill

Dr said:
The commonest argument, and one I have some sympathy with, is that you
end up with code where it's actually hard to find the statements that do
the work as they are lost in five times as much error checking code.

I can sympathize somewhat, but it's also possible to do the error
checking in a called function which hides it all.
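Something like this hypothetical wrapper, say; the caller never sees
the check, it just gets a stream that is known to be good:

#include <stdio.h>
#include <stdlib.h>

FILE *xfopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);

    if(!f)
    {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return f;
}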
 

Les Cargill

BGB said:
the issue is that lots of code either does not check bounds, or stuff
goes out of bounds due to "simple coding errors", like accidentally
allocating either the wrong amount of memory, or calculating the array
index wrong.

So I have habits that *preclude* that sort of thing. It's too detailed,
but enforce your constraints with the furniture provided by the language
system.
to some extent it is possible to use a "tripwire detector" or similar
built into the memory manager, which can detect when code has run off
the end of an array (either when freeing memory, or periodically, based
on a timer or similar, by doing a scan over the heap to make sure
everything is still in order). this works because memory objects may
have "tripwires" before and after them, which an overrun will in most
cases mangle; when that is detected, it is possible to analyze the
damage and write the results out to a log or similar (the memory
manager also keeps track of things like where memory allocation and
free calls come from, and so can basically say where the offending
memory object is from, ...).


but, it isn't really possible to catch the code which does so in the act
(like with an explicit trap), and sometimes there might still be memory
corruptions due to code which seemingly escapes the tripwire detector
(writes to addresses not directly adjacent to the memory object in
question).

likewise, writes to random (garbage) addresses (such as from
accidentally using uninitialized variables) may often not trap if the
application uses most of the available address space (it only reliably
traps if the address points at an address range owned by the OS).

*Shudder*. -Wall.
a few times I had thought it might be nice if, when an uninitialized
variable is used by code, besides just generating a warning or similar,
the compiler could also set the variable initially to a "known bad" address.

Ah, the 0xDEADBEEF thing...
like, to better deal with cases where the programmer has not gone and
faithfully fixed up every "use of uninitialized variable 'foo'" or
"'foo' must return a value" style warning.

Agreed. Although you really should fix those before it leaves
your hands.
well, in general I try to go and clean them up, but this is not
always reliably done.

I don't know why not.
That sounds like a good thing.

bounds-checking, in general, yes.

not being smart enough to optimize away some constant-index checks, not
always so good.

luckily, it is easy to side-step: use a pointer to an array, no bounds
checking:
int foo[32];
int *pfoo;

foo[8]=3; //uses bounds checking at runtime...
pfoo=foo;
pfoo[8]=3; //no bounds checking...

in some ways, a totally amazing piece of engineering there...

Doctor, doctor it hurts when I do that....

it can matter for code which tends to be CPU bound...

a particular case where it came up was within a video codec, where these
constant-index array accesses ate lots of clock cycles.

That's probably true.

<snip>
 

Eric Sosman

I can sympathize somewhat, but it's also possible to do the error
checking in a called function which hides it all.

In my experience, that would be the exception (sorry) and
not the rule.

The hard part about dealing with a failure is not checking
a result code, but figuring out what to do when the result says
R-O-N-G. The decision about how to respond typically involves
a large amount of the caller's context -- in fact, a large amount
of the caller's caller's caller's caller's context (which is why
strategies like "call abort() on any failure" are stupid, and
"call a caller-supplied callback, passing it not enough information"
are even stupider).

Checking for errors is usually[*] pretty simple:

char *p = malloc(n);
if (p == NULL) { ... }

It's what goes in the "..." that's difficult.

[*] But not always. Can you spot the gaffe in

FILE *stream = ...;
int ch;
while ((ch = getc(stream)) != EOF ) {
...
}

? Hint: The GNU coding standards forbid you from caring about
it, even if you do spot it.
 

Les Cargill

Eric said:
In my experience, that would be the exception (sorry)

Har! :)

I'd rather see code that handles all the error cases properly
and clearly and up front than try to mix error evaluation
and "real work".

I've worked on a lot of systems where you do *all* the error checking up
front, then a "commit" at the end. You have to manage partial updates,
that sort of thing.

If the "commit" fails, that's a BIG error, and we either crash or
get adult help, or erase as much of the "transaction" as we can. That's
usually a "somebody unplugged is in the 100 microseconds since we did
all the error checking" thing.

I also save off the previous values of things we changed and keep
a list of things we added, in order to restore to the previous state.

These are habits arrived at from old '90s CASE tools and
writing things like SNMP agents.
and
not the rule.

The hard part about dealing with a failure is not checking
a result code, but figuring out what to do when the result says
R-O-N-G. The decision about how to respond typically involves
a large amount of the caller's context -- in fact, a large amount
of the caller's caller's caller's caller's context (which is why
strategies like "call abort() on any failure" are stupid, and
"call a caller-supplied callback, passing it not enough information"
are even stupider).

Fair enough. See below (1) for my high-torque tool of choice for
this.
Checking for errors is usually[*] pretty simple:

char *p = malloc(n);
if (p == NULL) { ... }

It's what goes in the "..." that's difficult.

[*] But not always. Can you spot the gaffe in

FILE *stream = ...;
int ch;
while ((ch = getc(stream)) != EOF ) {
...
}

? Hint: The GNU coding standards forbid you from caring about
it, even if you do spot it.

Ha! That old warhorse. :) You have to break it up or at least call
feof() *AND* check ch. If GNU does not like "while (1)" with
"break;"s, then they are missing a good thing. The preference for
using only predicate trees troubles me - I think it's a good
way to write unreadable code.
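A sketch of the kind of check I mean (copy_stream is a made-up name;
the point is interrogating the stream state after the loop, so you know
whether EOF meant a read error or a real end-of-file):

#include <stdio.h>

int copy_stream(FILE *in, FILE *out)
{
    int ch;

    while((ch = getc(in)) != EOF)
        putc(ch, out);
    if(ferror(in))
        return -1; /* EOF came from a read error */
    return 0;      /* genuine end-of-file */
}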

I haven't used getc() or the like in a very long time. My dominant mode
is to use fread() or read() and unmarshal from a buffer. It's a bit more
code, but the library functions are just a little too weird. And I try
to never use a blocking or buffered channel if I can avoid it.

(1)
Past a certain level of complexity ( say, an if() {} else if () else {}
tree about six long ), I will go to a state machine. I work very hard
at *not* using nested ifs if I can. I prefer "cascaded" ones.

that at least puts all the invariants in a list-like structure for you,
and allows you to manage time-order dependencies between invariants.

In a way, this makes it worse ( it's less like natural language ) but
it's habit now. It's a bit of a trade towards being correct the first
time rather than having nicer, more friendly code later ( there are
soldier words when I have to read them again, too ) but by my lights,
it's usually worth it.
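A minimal sketch of the table-driven style (the states and events are
made up; the point is that every transition sits in one list-like
structure instead of a nested if tree):

#include <stdio.h>

enum state { ST_IDLE, ST_RUNNING, ST_DONE, ST_COUNT };
enum event { EV_START, EV_STOP, EV_COUNT };

static const enum state transition[ST_COUNT][EV_COUNT] = {
    /*               EV_START     EV_STOP  */
    /* ST_IDLE    */ { ST_RUNNING, ST_IDLE },
    /* ST_RUNNING */ { ST_RUNNING, ST_DONE },
    /* ST_DONE    */ { ST_DONE,    ST_DONE },
};

int main(void)
{
    enum state s = ST_IDLE;

    s = transition[s][EV_START]; /* ST_IDLE -> ST_RUNNING */
    s = transition[s][EV_STOP];  /* ST_RUNNING -> ST_DONE */
    printf("final state: %d\n", s);
    return 0;
}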

Also also - for *scripting* languages with an eval(), having "arrays
of text" indexed by strings ( or numbers ) is a heck of a great way to
write state machines.
 

BGB

So I have habits that *preclude* that sort of thing. It's too detailed,
but enforce your constraints with the furniture provided by the language
system.

tends to happen sometimes.

the main issue mostly is the hassle of tracking down these sorts of
issues in a largish codebase.

*Shudder*. -Wall.

"-Wall", specifically, is a GCC feature.

there is "/Wall /WX" though, which basically can do similar to "-Wall
-Werror" though.

Ah, the 0xDEADBEEF thing...

0xDEADBEEF and 0xBAADF00D and similar are useful, but pop up in fairly
specific cases (malloc, ...).

more often local variables just contain whatever was previously in this
location on the stack, which isn't as useful.
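a sketch of where the fill pattern typically comes from, namely a debug
allocator that poisons fresh blocks so reads of uninitialized heap
memory stand out in the debugger (the 0xBA fill byte is an arbitrary
choice here):

#include <stdlib.h>
#include <string.h>

void *dbg_malloc(size_t size)
{
    void *p = malloc(size);

    if(p)
        memset(p, 0xBA, size); /* poison: 0xBABABABA... until overwritten */
    return p;
}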

Agreed. Although you really should fix those before it leaves
your hands.


I don't know why not.

effort, mostly. sometimes, when throwing something together or working
on something else, not all warnings end up being addressed immediately,
though every so often an effort is made to go clean up all the warnings
within a given library or similar.

MSVC now (added fairly recently) does provide a special-case
bounds-checking mechanism, but I am annoyed at how stupid it can be:
it is either fully on or fully off, apparently not smart enough to
figure out that indexing an array with a constant index will either
always pass or always fail, and so does not need run-time bounds
checks.


That sounds like a good thing.

bounds-checking, in general, yes.

not being smart enough to optimize away some constant-index checks, not
always so good.

luckily, it is easy to side-step: use a pointer to an array, no bounds
checking:
int foo[32];
int *pfoo;

foo[8]=3; //uses bounds checking at runtime...
pfoo=foo;
pfoo[8]=3; //no bounds checking...

in some ways, a totally amazing piece of engineering there...



Doctor, doctor it hurts when I do that....

it can matter for code which tends to be CPU bound...

a particular case where it came up was within a video codec, where these
constant-index array accesses ate lots of clock cycles.

That's probably true.

yeah.


I am probably one of those strange people, who primarily develops things
like codecs mostly as plain C code (mostly ASM-free) without relying on
multithreading or GPGPU stuff.

but, granted, it also requires a bit of careful coding (and a little bit
of "magic") to get good performance out of these sorts of things.


most of my DCT-style codecs seem to cap out at just short of 100
Mpix/sec (for comparison, XviD reaches 105 Mpix/sec on my PC, with a
lot of ASM thrown in the mix).

I can reach 300-500 Mpix/sec, but generally this is with VQ-style codecs
targeting special image representations (such as DXT or BC7) rather than
generic RGBA (*).

*: it is harder than it may seem to get pixel arrays filled all that
quickly. a direct comparison between XviD and a VQ codec is cheating
though IMO, and XviD seems pretty hard to beat as far as the speed of
DCT-based codecs goes AFAICT.

FWIW, 62 Mpix/sec is sufficient for 1080p30 (1080p60 needs 125 though,
and 2160p30 needs 249).
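(that 62 is just 1920*1080*30 = 62,208,000 pixels/sec; the 1080p60 and
2160p30 figures scale the same way.)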


but, faster is better, and leaves more time for the CPU to do other things.

likewise, VQ allows higher speeds generally at the cost of worse
compression (the goal being to get high speeds with at least passable
compression).
 

Malcolm McLean

There are other virtues besides performance; simple, readable code that
focuses the maintainer's attention on the common expected cases, in
this day where my *phone* has a dual-core 64 bit processor, is likely to
save more money than code that shaves 3 cycles off each loop iteration.

Basically I agree with you.
But oddly enough, both very high end and very low end hardware tend to have
precious cycles. On very low end hardware, obviously, the device might not
really be powerful enough to do what you want. On very high end hardware,
the reason for using the hardware at all is usually that you have a "heroic"
computation to do. You generally have to book very limited processor time, so
shaving cycles is a great advantage.
 

BGB

In my experience, that would be the exception (sorry) and
not the rule.

The hard part about dealing with a failure is not checking
a result code, but figuring out what to do when the result says
R-O-N-G. The decision about how to respond typically involves
a large amount of the caller's context -- in fact, a large amount
of the caller's caller's caller's caller's context (which is why
strategies like "call abort() on any failure" are stupid, and
"call a caller-supplied callback, passing it not enough information"
are even stupider).

usually, in my case, it amounts to one of:
critical problem: blow up and trap to the debugger;
non-critical problem: maybe log the event and try to handle it
gracefully (usually by passing an error status back to the caller and
otherwise having the call behave as a no-op).

eventually it will reach a point where the error is either ignorable,
becomes part of the normal program logic, or can otherwise be dealt
with as appropriate.


OTOH, simply exiting on error is usually stupid, as then one ends up
with apps which just dump the user back out on the desktop or similar
for no obvious reason. that is one of the few options actually worse
than a "This program has performed an illegal operation and must be
shut down." box, since at least with the box you know that it was a
program crash...

Checking for errors is usually[*] pretty simple:

char *p = malloc(n);
if (p == NULL) { ... }

It's what goes in the "..." that's difficult.

[*] But not always. Can you spot the gaffe in

FILE *stream = ...;
int ch;
while ((ch = getc(stream)) != EOF ) {
...
}

? Hint: The GNU coding standards forbid you from caring about
it, even if you do spot it.
 

BartC

Also also - for *scripting* languages with an eval(), having "arrays
of text" indexed by strings ( or numbers ) is a heck of a great way to
write state machines.

Yes, what about script languages; you can write, for example:

a = b+c

which, where b and c are large data structures, might involve allocating
memory. But what happens when the allocation fails? How do you even test for
it? These languages might offer exceptions, but do you really want to test
every single operation and assignment in hundreds of lines of code?

Implemented in C, however, the suggestion seems to be that you *have* to do
just that: have checks scattered all over the actual code, rather than have
one check in a wrapper around malloc() for example. Or to have such checks
separated out and hidden away from the 'user' code.

(Since, as was mentioned, the hard bit is not checking these things,
even though that will destroy the structure and readability of any
code, as well as introduce more scope for errors; it's what on earth
you do when there is an error.)
 

BartC

OTOH, simply exiting on error is usually stupid

If we're talking about memory allocation failures, then it will usually be
for one of two reasons: there's either a bug in the software that makes it
use up all the memory, or (more rarely) the machine has actually run out of
memory because the program needs too much of it. (Or, sometimes, it might be
other programs that are hogging most of the memory).

In any case, usually there's not much that can be done other than report the
fact and then quit, if this was a routine allocation that was expected to
work. It's too bad if this has happened to a customer; then you need to
establish the reason.

In some cases an out-of-memory condition (ie. an attempt to allocate memory,
that failed) *can* be handled by the software, but the code will know about
it and will use a different allocator that doesn't just quit. So in an
interactive app, for example, it can report that a particular operation
failed, but then carry on so the user can try something else.
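In C terms the split might look like this (hypothetical names,
sketching the two allocators):

#include <stdio.h>
#include <stdlib.h>

void *must_malloc(size_t size)  /* failure was not expected: report and quit */
{
    void *p = malloc(size ? size : 1);

    if(!p)
    {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return p;
}

void *try_malloc(size_t size)   /* caller knows how to carry on after NULL */
{
    return malloc(size ? size : 1);
}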
 
