Implementing my own memcpy


Netocrat

You shouldn't.

You're dictating my personal preferences? Given a choice where all other
things are equal, you are saying that having a preference should not be
allowed?
IMNSHO one should be equally comfortable with testing
against a minimum as with testing against a maximum, just as in
mathematics one should be comfortable both with proof by induction and
proof by infinite descent, and in programming with both recursion and
iteration. Not being comfortable with both directions limits you.

Sure - I have no problems with that statement. I understand and can use
all of those concepts where appropriate - apart from proof by infinite
descent with which I am unacquainted. Where I can see no significant
benefit for using one approach over the other though, I disagree that a
preference is limiting. Often there is a link between a preference and a
level of understanding of a particular technique, but that isn't
necessarily true.
 

Mark F. Haigh

Netocrat said:
The word "complain" was a poor choice, and if you had read it in context
you would have understood that and wouldn't have to pick nits with my
phrasing.

Quit whining. Here "complain" means "produce at least one diagnostic
message" (C99 5.1.1.3). If you did not receive one for your code, you
are not compiling C; rather, you're compiling some other C-like
language (i.e., GNU C). C-like languages are not topical here.

Chuck is telling you or anyone listening how to invoke gcc in a
standards-compliant mode. You should listen to him.
I said that my assumption had been that trying to increment a void
pointer "would not work at all", not that it would generate a "complaint",
and that that assumption is what caused me to falsely believe that the
cast had modified the pointer arithmetic.

The difference between the two is not very meaningful. When you ignore
a diagnostic that pertains to syntax or constraint violations, you get
undefined behavior (UB). So it might work when you run it, it might
not, or it might cause earthquakes off the coast of California. What
happens when you run code containing UB is not defined by the C
standard, which is what we discuss here.
So the sentence you quote was added to confirm that my assumption was
wrong: that gcc does still "work", i.e., compile without error, given code that
increments a void pointer; not that it doesn't generate warnings.
Granted, I didn't use the options in this case but that's beside the point
I was trying to make.
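
For anyone following along, a minimal sketch of the distinction (my
own illustration, not from the thread): incrementing a void pointer
violates a constraint of ISO C, so a conforming compiler must issue a
diagnostic, while gcc in its default mode accepts it as an extension.
The portable form converts to a character pointer first:

#include <stdio.h>

int main(void)
{
    int arr[2] = {1, 2};
    void *vp = arr;
    unsigned char *cp;

    /* vp++; -- constraint violation in ISO C: arithmetic is not
     * defined on void *. gcc -ansi -pedantic diagnoses it; plain
     * gcc accepts it as a GNU extension (sizeof(void) == 1). */

    cp = vp;                     /* portable: use a char pointer */
    cp += sizeof(int);           /* step over one int */
    printf("%d\n", *(int *)cp);  /* prints 2 */
    return 0;
}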

I am aware of those options and usually do use them (apart from the -W
option which now that I read about it looks useful, so I'll add it in
future).

Look, I understand the importance of correctness and precision, especially
in a group about a standardised programming language, but if you're going
to jump on what you see as an error in someone's phrasing - and even go
further and make assumptions about their general use of the compiler - at
least give them the benefit of the doubt and try to see whether what they're
saying can be interpreted correctly given the overall context.

So what's your point? Either speak more precisely or grow a thicker
skin. Consider doing both if you plan on posting here regularly.


Mark F. Haigh
(e-mail address removed)
 

Netocrat

Quit whining. ....
So what's your point? Either speak more precisely or grow a thicker
skin.
Consider doing both if you plan on posting here regularly.

I'll take your advice and leave it there then.
 

Netocrat

On Sun, 26 Jun 2005 19:06:11 +0000, CBFalconer wrote:

The point is not so much what you understand, but what other readers of
your article will understand. My post was intended to correct the
mistaken impression that gcc fails to properly diagnose misuse of a void
*.

Fair enough. As Mark Haigh pointed out, my response was a bit of an
over-reaction. I didn't express myself well or correctly in the first
place so my later complaint against your response does seem petulant.
 

Richard Harter

If you're thinking that n could be negative,
then it's simpler to write the code so that there
are no values of n which cause undefined behavior.

while (n > 0) { --n; }

This won't do; it's best to keep the flow control code out of the loop
body. However, your observation is on point. The formula should work
for both signed and unsigned integer types (no separate forms), should
produce iterate values in the range [n-1 ... 0], and should not
execute the loop body if n < 0. For ascending loops the formula

for (i=0; i<n; i++) {}

handles both signed and unsigned. For descending loops a variant of
Chris's formulation may be best, e.g.,

if (n>0) for(i=n; i-- > 0;) {}
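
To make the hazard concrete, here is a sketch (mine, for
illustration) of why the i-- > 0 form is preferred when the index has
an unsigned type:

#include <stddef.h>

void zero_backwards(int *a, size_t n)
{
    size_t i;

    /* for (i = n - 1; i >= 0; i--) -- broken for unsigned i: the
     * test i >= 0 is always true, and n - 1 wraps when n == 0. */

    for (i = n; i-- > 0;)   /* safe for n == 0; signed types too */
        a[i] = 0;
}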


Richard Harter, (e-mail address removed)
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
 

Richard Harter

You're dictating my personal preferences? Given a choice where all other
things are equal, you are saying that having a preference should not be
allowed?

You're inventing paranoid interpretations?

(IOW: Don't be silly.)
Sure - I have no problems with that statement. I understand and can use
all of those concepts where appropriate - apart from proof by infinite
descent with which I am unacquainted. Where I can see no significant
benefit for using one approach over the other though, I disagree that a
preference is limiting. Often there is a link between a preference and a
level of understanding of a particular technique, but that isn't
necessarily true.

We shall disagree, you and I. Preferences are always limiting because
they create habits that lead you to make the preferred choice without
thinking about it. If there truly is no significant difference that's
a win. However if your habit keeps you from asking the question of
which way to do it, then you lose when the choice does matter.


Richard Harter, (e-mail address removed)
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
 

pete

Richard said:
This won't do; it's best to keep the flow control code out of the loop
body.

There is no flow control in the loop body.
Writing a while loop that checks the value of a variable
which was changed in the body of the loop,
is just the most natural thing.

#include <string.h>   /* for strlen */

char *str_rev(char *s)
{
    char *p, *q, swap;

    if (*s != '\0') {
        p = s;
        q = p + strlen(p + 1);   /* strlen(p + 1) == strlen(p) - 1,
                                  * so q points at the last character */
        while (q > p) {
            swap = *q;
            *q-- = *p;
            *p++ = swap;
        }
    }
    return s;
}
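
A short usage sketch (my addition, assuming the function above is in
scope; the buffer must be writable, so a string literal won't do):

#include <stdio.h>

int main(void)
{
    char buf[] = "hello";

    puts(str_rev(buf));   /* prints "olleh" */
    return 0;
}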
 

Richard Harter

There is no flow control in the loop body.
Writing a while loop that checks the value of a variable
which was changed in the body of the loop,
is just the most natural thing.


Sorry, I wasn't clear. What is wanted here is a formula for which the
iteration variable is referenced within the loop body but not altered,
i.e.,

<expression generating iterate><loop body using unaltered iterate>

Of course one can write loops that change a variable within the loop;
that is the general case. In the general case the flow control code
and the loop body code are mingled together. In the special case we
can cleanly separate them if we choose to do so. AFAIK C lacks any
guaranteed way to keep the loop body from mucking with the flow
control variables; you have to use an idiom.
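
A sketch of the contrast, with invented function names: in the first
loop the body advances the control variable itself; in the second,
the for-clauses generate the iterate and the body only reads it:

#include <stddef.h>

/* General case: flow control and body mingled. */
size_t count_spaces(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        if (*s == ' ')
            count++;
        s++;                /* the body advances the control variable */
    }
    return count;
}

/* Special case: flow control lives entirely in the for-clauses;
 * the body references i but, by convention only, never alters it. */
long sum_first_n(const int *a, int n)
{
    long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += a[i];        /* iterate referenced, never altered */
    return sum;
}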


Richard Harter, (e-mail address removed)
http://home.tiac.net/~cri, http://www.varinoma.com
Save the Earth now!!
It's the only planet with chocolate.
 

Netocrat

You're inventing paranoid interpretations?

(IOW: Don't be silly.)

Should I be allowed to prefer my interpretation?

Don't take that seriously - this time it was _intended_ to be silly.
We shall disagree, you and I. Preferences are always limiting because
they create habits that lead you to make the preferred choice without
thinking about it. If there truly is no significant difference that's a
win. However if your habit keeps you from asking the question of which
way to do it, then you lose when the choice does matter.

Yes, no, perhaps - I suspect that we are more in agreement than
disagreement on this point anyway.

<Off-topic>
A friend once suggested that the vast majority of arguments occur between
people who fundamentally agree but are unable to communicate with the
other person in a way that allows them to understand that they are in
agreement. I've always remembered that idea. Often I find that arguments
actually arise over subtle differences in assumptions or definitions, etc.
that either don't arise during the overt argument or arise after hours of
frustrating conflict - frustrating because neither can understand the
apparent stupidity of the other in not accepting claims that are based on
assumptions or definitions which they falsely assume the other person to
share.

Anyhow this rant is better suited to a newsgroup about conflict resolution
or pop psychology so I'll leave it there.
</Off-topic>
 

Netocrat

The following accesses a draft that includes further modifications, but has
the disadvantage of being a .pdf file rather than a .txt file. This
hampers quick and easy searches, grepping, quoting, etc.

<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf>

Meanwhile, I think I have a suitably formatted and organized version of
n869.txt mounted as:

<http://cbfalconer.home.att.net/download/n869_txt.bz2>

which means you have to have bzip2 on your system to extract it properly,
but minimizes the download time (it is about 200k).

Does your text version include the modifications of the pdf version?

How is it different from the document in the link pete gave originally -
what are the modifications and when were they introduced?

Also C90 and C89 seem to be interchangeable terms - correct?

Finally I understand that C90/C89 had some modifications made prior to C99
- where are those detailed?
 

Netocrat

On Tue, 28 Jun 2005 03:05:48 +1000, Netocrat wrote:

Oh, and strictly speaking, what does ANSI C refer to? The original C89/C90
standard or the updated C99 one or can it be either? Is ANSI still
associated with the later standard?
 

CBFalconer

Netocrat said:
... snip ...

Does your text version include the modifications of the pdf
version?

No, if you mean the final issue pdf version. It cuts the
indentation down so it doesn't generate over-long lines when
quoted, and strips out pagination.

--
"I'm a war president. I make decisions here in the Oval Office
in foreign policy matters with war on my mind." - GWB 2004-2-8
"If I knew then what I know today, I would still have invaded
Iraq. It was the right decision" - G.W. Bush, 2004-08-02
"This notion that the United States is getting ready to attack
Iran is simply ridiculous. And having said that, all options
are on the table." - George W. Bush, Brussels, 2005-02-22
 

Netocrat

No, if you mean the final issue pdf version. It cuts the indentation down
so it doesn't generate over-long lines when quoted, and strips out
pagination.

Actually I meant the pdf you linked to which is quoted above.
 

Michael Wojcik

In addition, I would argue that it is the caller's
responsibility to pass valid arguments. Have you noticed that even in
Unix, library-only functions tend not to set EINVAL much, and certainly
not for invalid pointer arguments? This is because such a requirement
would needlessly slow down and bloat correctly written code, i.e., code
that always passes valid arguments.

So you advocate a design that substitutes premature and very likely
insignificant optimization for safety? Brilliant. It's that sort
of thinking that's given C its sterling reputation for secure
programming.

Here's a better plan: in any function that's not a small, static,
leaf function with only one caller, VALIDATE THE PARAMETERS. It
costs very, very little - if there's a performance impact, the
function call itself must be in the inner loop, so you have other
design issues to examine.
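
A minimal sketch of that advice (the function and its error
convention are invented for illustration, not taken from any real
library):

#include <stddef.h>

/* Copies n bytes; returns 0 on success, -1 on detectably bad
 * arguments. */
int buf_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i;

    if (dst == NULL || src == NULL)   /* validate the parameters */
        return -1;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
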
It is usually functions invoking
system calls that set errno to EINVAL because the operating system
kernel *has to* check all arguments for correctness, so as to avoid
accidental or malicious system corruption.

In typical modern OSes, the kernel "has to check all arguments" as
a side effect of the context switch. EINVAL comes along for free.
I will also point out that your check is incomplete anyway because
``obj'' may contain an invalid address that isn't NULL;

It doesn't verify that the user isn't a moron, either. When C
provides a magic API for validating data against all improper uses,
I'm sure many of us will adopt it. (Some, of course, will continue
to complain that it causes "bloat".)

The point of defensive programming is preventing the problems you
can reasonably prevent. Throwing up your hands because there are
some you cannot prevent is a foolish response.
 

CBFalconer

Michael said:
... snip ...

Here's a better plan: in any function that's not a small, static,
leaf function with only one caller, VALIDATE THE PARAMETERS. It
costs very, very little - if there's a performance impact, the
function call itself must be in the inner loop, so you have other
design issues to examine.

In my book, after validation, if you can put a sensible
interpretation on an invalid parameter, do so rather than causing a
crash. My basic example is interpreting a NULL pointer as pointing
to an empty source string. Obviously this doesn't apply to a
destination. My version of strlcpy and strlcat does this.

<http://cbfalconer.home.att.net/download/strlcpy.zip>

Corollary: If a function return has a meaning, don't alter that
meaning for some peculiar set of input values. In the above,
strlcpy/cat always return the space required for a successful
action, irrespective of actual success.
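
Not the code behind the URL above, but a minimal sketch of the two
rules just described, assuming "space required" means the source
length plus the terminating null:

#include <stddef.h>

/* Sketch only. Treats a NULL source as an empty string, always
 * terminates dst (when sz > 0), and always returns the space a
 * successful copy would need, whether or not everything fit. */
size_t strlcpy_sketch(char *dst, const char *src, size_t sz)
{
    const char *s = (src == NULL) ? "" : src;
    size_t len = 0;
    size_t i, lim;

    while (s[len] != '\0')   /* length of the (possibly empty) source */
        len++;

    if (sz > 0) {
        lim = (len < sz - 1) ? len : sz - 1;
        for (i = 0; i < lim; i++)
            dst[i] = s[i];
        dst[lim] = '\0';
    }
    return len + 1;          /* space required, including the '\0' */
}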
 

Nils Weller

So you advocate a design that substitutes premature and very likely
insignificant optimization for safety? Brilliant. It's that sort
of thinking that's given C its sterling reputation for secure
programming.

I realize that this is a very religious topic involving very strong
feelings (as evidenced by the tone of the paragraph above), and I do not
intend to change anyone's coding style.

It is my opinion that the check under debate does more harm than good
because it not only needlessly slows down and bloats correctly
written code, but also makes it harder to detect and handle the bug.
The errno construct is a particularly bad way to bring the problem to
the caller's attention, because a caller is unlikely to check whether
the function failed because of an invalid argument; after all, the
function would not have been called had the arguments not been
believed to be valid. The bug is likely to be masked. On the other
hand, if the caller does examine errno, it could just as well have
been more careful to avoid the bug in the first place, making the
check within the function superfluous (it would make more sense to
use assert() or similar).
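
For comparison, a sketch of the assert() alternative just mentioned
(my example): the precondition fails loudly during development, and
compiling with NDEBUG removes the check entirely:

#include <assert.h>
#include <stddef.h>

size_t count_byte(const unsigned char *p, size_t n, unsigned char b)
{
    size_t i, count = 0;

    assert(p != NULL);   /* documents the precondition; compiled
                          * out when NDEBUG is defined */
    for (i = 0; i < n; i++)
        if (p[i] == b)
            count++;
    return count;
}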

If you frequently encounter a situation where you erroneously pass a
null pointer to a function, then perhaps it is time to ask yourself why
you could ever lose track of a property as essential as whether or not
your pointer points to a valid object, or perhaps why you cannot write a
few allocator and deallocator functions more carefully rather than
cluttering all other functions with needless argument validations.
Here's a better plan: in any function that's not a small, static,
leaf function with only one caller, VALIDATE THE PARAMETERS. It
costs very, very little - if there's a performance impact, the
function call itself must be in the inner loop, so you have other
design issues to examine.

Some validations make sense, others do not. My point of view is that in
cases like the one we are discussing, the validation results in either
an incomplete and suboptimal solution to a dubious and rare problem, or
duplicate and unneeded work. You are free to disagree.
In typical modern OSes, the kernel "has to check all arguments" as
a side effect of the context switch. EINVAL comes along for free.

This statement doesn't make any sense to me. First of all, you probably
mean a ``mode switch'' rather than a ``context switch''. Second, the
arguments are validated when the system call itself accesses (or tries
to access) the data. And this is because a protection boundary is
crossed and so the kernel has to check all arguments for correctness, so
as to avoid accidental or malicious system corruption, which is what I
said above. It is a deliberate and conscious decision to treat the
program as an enemy rather than a trusted partner, and nothing comes for
free.
The point of defensive programming is preventing the problems you
can reasonably prevent. Throwing up your hands because there are
some you cannot prevent is a foolish response.

Yawn.
 

Michael Wojcik

It is my opinion that the check under debate does more harm than good
because it not only needlessly slows down and bloats correctly
written code,

It does nothing "needlessly". The need is patently obvious from the
miserable security state of a vast number of C programs.

In the vast majority of cases, the increase in execution time and
program size you complain about is negligible.
but also makes it harder to detect and handle the bug.

Nonsense. One possible consequence of undefined behavior is that
the program works correctly anyway, or appears to. This is never
an issue when an error is detected by correctly-written error
handling. Error detection is only harder with error handling if
the probability of getting the error handling seriously wrong is
greater than the probability of the undefined behavior being
not obvious. If those odds are against error detection, I suggest
you find a better programmer to write your error-handling code.
The errno construct is a particularly bad way ...

I didn't advocate using errno; that was another poster. I think
overloading errno for application error signalling is a poor plan.
caller's attention, because a caller is unlikely to check whether the
function failed because of an invalid argument;

Certainly, if the caller was written by someone incompetent. If
we're going to assume that, though, then your assumption that the
caller provided valid arguments looks a bit shaky, doesn't it?
After all, the function
would not have been called had the arguments not been believed to be
valid.

This is one of the silliest arguments I've heard in some time.
Programs do not "believe" anything. Programmers may believe they
have written correct code; they are often wrong.
The bug is likely to be masked. On the other hand, if the caller
does examine errno, it could just as well have been more careful to
avoid the bug in the first place, making the check within the function
superfluous (it would make more sense to use assert() or similar.)

I've explained why I believe assert() is useless in other threads;
I won't reiterate that now. Suffice it to say that I find this line
of argument completely unpersuasive, too.
If you frequently encounter a situation where you erroneously pass a
null pointer to a function,

I do not reserve error handling for conditions I believe will be
frequent.
Some validations make sense, others do not.

A vapid generalization. Some ideas apply to some situations. So?
This statement doesn't make any sense to me. First of all, you probably
mean a ``mode switch'' rather than a ``context switch''.

I meant just what I wrote. A switch between user mode and kernel
mode is a context switch. This usage is well-established.
Second, the
arguments are validated when the system call itself accesses (or tries
to access) the data. And this is because a protection boundary is
crossed and so the kernel has to check all arguments for correctness, so
as to avoid accidental or malicious system corruption, which is what I
said above.

In the traditional SysV Unix implementation, for example, the kernel
has a different page mapping from each user process, for obvious
reasons; and when a user process makes a system call, the parameters
must be remapped. If any of them point to pages which are not mapped
for that process, the VMM traps the access. See Bach, _The Design of
the UNIX Operating System_, 6.4.2. The detection of invalid pointers
happens automatically by the memory management hardware during the
context switch process.

True, some OSes do make explicit checks (I see the Linux 2.4 kernel
does). Others do not. Your generalization was no more correct than
mine was, it appears.

Those who don't care about security are doomed to lose it. Remind me
never to use any software you've written.

--
Michael Wojcik (e-mail address removed)

Against all odds, over a noisy telephone line, tapped by the tax authorities
and the secret police, Alice will happily attempt, with someone she doesn't
trust, whom she can't hear clearly, and who is probably someone else, to
fiddle her tax return and to organise a coup d'etat, while at the same time
minimising the cost of the phone call. -- John Gordon
 

Nils Weller

[...]
The errno construct is particularly bad ...

I didn't advocate using errno; that was another poster. I think
overloading errno for application error signalling is a poor plan.

And I specifically objected to the errno construct used by said other
poster. And then you objected to my objection ... I think we are getting
dragged into an unnecessary discussion about a style issue, and there is
no enlightenment whatsoever for anyone following this thread because
it's just too pointless (yes, I should have known better than to start
talking about a religious topic.)
[...]
Those who don't care about security are doomed to lose it. Remind me
never to use any software you've written.

I think you need to cool down a bit. Nothing in programming is either
good or bad; some things just suck less than others, and there are
always tradeoffs. I believe that being careful to pass valid arguments
and omitting the checks will pay off in the long run - through reduced
(source) code size and perhaps a little more speed, and in addition,
that the errno construct in particular is likely to mask bugs.

Furthermore, it does not even make sense to say that such argument
validation actually buys any security in the presence of a bug, because
the definition of ``security'' depends upon the context of the application.
When writing a desktop or server application, an invalid pointer
argument will probably result in a crash, which is good in some ways
because it makes the bug obvious and helps you find and fix it.

On the other hand, if you're writing control software for a spaceship,
or anything else whose survival is ``mission-critical'', then you would
of course prefer sophisticated error detection and handling over a crash.
But then you probably wouldn't be using C either, and there would be a
whole range of other potential problems to be handled. The choice of an
argument validation policy and how you can react to errors is not a
choice of the Dark Side vs. the Bright Side, but of whatever makes the
most sense in your circumstances.

Note that I'm not even arguing against argument validation in general,
I just prefer to omit it in cases like the one we are discussing, and I
think that the standard C library, including all of string.h, is giving
a good example to follow for library functions in general - If you pass
an invalid argument, you get undefined behavior.
It does nothing "needlessly". The need is patently obvious from the
miserable security state of a vast number of C programs.

So your point is that there are many sloppy programmers who may use the
software, and that you would rather give them an error code than a
crash. This is fine with me, but it doesn't mean that coding with this
purpose in mind results in good software design, nor does it mean that
your choice of policy is better than ``mine'' (again, my comments were
restricted to a very specific case.) While there is a vast number of
broken C programs out there, there is also a vast number of well-working
(though certainly not completely bug-free) C programs out there that
never pass invalid arguments to library functions.

Do HP-UX, AIX and UnixWare with page zero mapped make your software
``more secure'' than, say, Linux or FreeBSD with page zero unmapped,
only because they will permit a program that incorrectly dereferences a
null pointer for reading to continue executing?
In the vast majority of cases, the increase in execution time and
program size you complain about is negligible.

Negligible but still unnecessary if the caller is written correctly. A
caller should take full responsibility to invoke the callee correctly.
This will reduce (source) code size and it may help you enforce a design
where the state of your objects is always well-known.

There are certainly times when this rule of thumb should be lifted
because an invalid invocation seems more likely, but low-level library
functions do not belong in this category, and bad null pointer
arguments should probably never be expected (though there are exceptions
where it makes sense to explicitly define, or overload the meaning of a
null pointer - see free() and fflush() for examples.)
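
For reference, those two exceptions in action; both give a null
pointer a defined meaning instead of undefined behavior:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    free(NULL);     /* defined to do nothing */
    fflush(NULL);   /* defined to flush all output streams */
    return 0;
}
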
but also makes it harder to detect and handle the bug.

Nonsense. One possible consequence of undefined behavior is that
the program works correctly anyway, or appears to. [...]

The errno construct I objected to, and which triggered this pointless
subthread, requires explicit interaction from the caller. A dereferenced
null pointer will yield a crash on a vast number of implementations.
There are platforms where this is not the case, but perhaps that just
means that you should not be doing software development on them if you
can avoid them?

Don't get me wrong, I'm not saying ``let's trade program robustness for
lower code size and programming efforts!'', but something more along the
lines of ``this simply does not affect robustness in a well-written
application, and even if the bug does occur some time, then it is
questionable whether the `defensive' approach does actually save the
day.''
Certainly, if the caller was written by someone incompetent. If
we're going to assume that, though, then your assumption that the
caller provided valid arguments looks a bit shaky, doesn't it?

When was the last time you actually adhered to what you are proposing
here? If you know what you're doing, then you also know with reasonable
certainty whether or not you are invoking a function correctly at any
point in time. Programming is not a lottery, and you always have to make
essential assumptions about the integrity of your program.

Many Unix functions set a wide variety of errno codes, but it is
impractical and nonsensical to test for all of them.

char buf[128];
int rc;
int fd;

/* ... */

rc = read(buf, sizeof buf - 1, fd);
if (rc == -1) {
    /*
     * Would you really test for:
     *   EBADF  (bad file descriptor)
     *   EFAULT (bad address)
     *   EINVAL (bad STREAM or multiplexer)
     * ... and a variety of other obscure
     * errors that you *know* cannot occur
     * in the particular program context?
     */
} else if (rc == 0) {
    /* EOF */
} else {
    buf[rc] = 0;
    /* Use buf */
}


If you ever find yourself writing something like

if (errno == EFAULT) {

... then you should assure yourself that the pointer can *never* be
invalid and solve the actual problem, rather than coding around it.
This is one of the silliest arguments I've heard in some time.
Programs do not "believe" anything. Programmers may believe they
have written correct code; they are often wrong.

See the example above.

This whole thing really boils down to the question of whether or not
compatibility with buggy code is desirable. Programmers may also be able
to get their implementation to enable them to dereference null pointers,
to emulate misaligned instructions, and to make string constants
writable. That doesn't mean that any development relying on these
features will result in stable and well-designed software.
the UNIX Operating System_, 6.4.2. The detection of invalid pointers
happens automatically by the memory management hardware during the
context switch process.

The kernel is also mapped into the process's address space, yet it
should not be accessed through a userland pointer, which is why address
translation does not suffice to ensure validity.
True, some OSes do make explicit checks (I see the Linux 2.4 kernel
does). Others do not. Your generalization was no more correct than
mine was, it appears.

I don't think I made any generalizations; while the pointer stuff is
indeed for the most part covered by the hardware, all other arguments
also need to be checked explicitly. For example, an integer you pass to
the kernel may or may not be a valid file descriptor, but the kernel
cannot take its validity for granted.
 

CBFalconer

Nils said:
Of course I had to goof this one!

rc = read(fd, buf, sizeof buf - 1);

I have no idea what it goes with, because your previous article was
much too long to read. :) However, you have still goofed, because
there is no such standard function as 'read'. Look up fread, which
IS portable.
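
For illustration, the earlier example restructured around fread (a
sketch; the POSIX errno details naturally don't carry over, and it
reads from stdin for concreteness):

#include <stdio.h>

int main(void)
{
    char buf[128];
    size_t rc;

    rc = fread(buf, 1, sizeof buf - 1, stdin);
    if (rc == 0) {
        if (ferror(stdin)) {
            /* read error */
        }
        /* else: end of file with no data */
    } else {
        buf[rc] = '\0';
        /* use buf */
    }
    return 0;
}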
 
