Strange bug

Ian Collins · Nov 23, 2010

Tim Rentsch said:
Tim Rentsch said:

Ian Collins said:

On 11/23/10 01:42 PM, Tim Rentsch wrote:

Well anyway, I have fixed it now with strcpy_safe. Here is the complete
working code.

#include<stdio.h>

You have calls to strlen(). You must declare it before using it,
preferably via "#include<string.h>". [snip]

The return type of strlen() is size_t, and therefore it is undefined
behavior to use strlen() without a '#include<string.h>' directive.

Isn't that what Keith said?

Click to expand...

Maybe it was but I didn't read it that way. I thought
Keith was saying it's okay to declare strlen() without
a #include<string.h>, even though with the #include
would be preferable. My point is that declaring strlen()
"manually", without the #include, still leads to undefined
behavior.

Click to expand...

The standard explictly permits this. C99 7.1.4p2:

Provided that a library function can be declared without
reference to any type defined in a header, it is also permissible
to declare the function and use it without including its
associated header.

(There is almost no sane reason to take advantage of this permission.)

The think the fish-hook there is "declared without reference to any type
defined in a header" given strlen() returns size_t, which is defined in
a header!

Tim Rentsch · Nov 23, 2010

Keith Thompson said:
Tim Rentsch said:

Ian Collins said:

On 11/23/10 01:42 PM, Tim Rentsch wrote:

Well anyway, I have fixed it now with strcpy_safe. Here is the complete
working code.

#include<stdio.h>

You have calls to strlen(). You must declare it before using it,
preferably via "#include<string.h>". [snip]

The return type of strlen() is size_t, and therefore it is undefined
behavior to use strlen() without a '#include<string.h>' directive.

Isn't that what Keith said?

Click to expand...

Maybe it was but I didn't read it that way. I thought
Keith was saying it's okay to declare strlen() without
a #include <string.h>, even though with the #include
would be preferable. My point is that declaring strlen()
"manually", without the #include, still leads to undefined
behavior.

Click to expand...

The standard explictly permits this. C99 7.1.4p2:

Provided that a library function can be declared without
reference to any type defined in a header, it is also permissible
to declare the function and use it without including its
associated header.

Right, but the return type of strlen() is size_t, *which
is defined in a header*. Hence strlen() cannot be so
declared.

Keith Thompson · Nov 23, 2010

Ian Collins said:
Tim Rentsch said:

On 11/23/10 01:42 PM, Tim Rentsch wrote:

Well anyway, I have fixed it now with strcpy_safe. Here is the complete
working code.

#include<stdio.h>

You have calls to strlen(). You must declare it before using it,
preferably via "#include<string.h>". [snip]

The return type of strlen() is size_t, and therefore it is undefined
behavior to use strlen() without a '#include<string.h>' directive.

Isn't that what Keith said?

Maybe it was but I didn't read it that way. I thought
Keith was saying it's okay to declare strlen() without
a #include<string.h>, even though with the #include
would be preferable. My point is that declaring strlen()
"manually", without the #include, still leads to undefined
behavior.

Click to expand...

The standard explictly permits this. C99 7.1.4p2:

Provided that a library function can be declared without
reference to any type defined in a header, it is also permissible
to declare the function and use it without including its
associated header.

(There is almost no sane reason to take advantage of this permission.)

Click to expand...

The think the fish-hook there is "declared without reference to any type
defined in a header" given strlen() returns size_t, which is defined in
a header!

Hmm. I see that I've read that paragraph in a way that made sense to
me rather than in a way that's consistent with what it actually says.

My "sensible" assumption is that if a function is declared in one
header and depends on a type defined in nother header, you can
declare the function yourself as long as the type is visible.

Concretely, I thought that this:

#include <stddef.h>
size_t strlen(const char *s);
int main(void) {
return strlen("");
}

is strictly conforming. But I now see that p2 doesn't apply in
this case.

On the other hand, I'd be astonished to see any implementation where
the above program doesn't work as expected. And I still suspect
that it was the *intent* for that program to be strictly conforming,
even if the standard doesn't say so. If you think about the reason
that declaring a library function is legal, it doesn't make sense
that a type defined in a different header should matter.

For that matter, it might still be possible to infer from the rest
of the standard that it's strictly conforming, though I'm not going
to take the time to figure out how. (If so, then 7.1.4p2 would
be redundant.)

Still, the standard doesn't guarantee what it doesn't guarantee.
And I don't think I'd advocate "fixing" that paragraph, given that
that's what headers are for.

J. J. Farrell · Nov 23, 2010

Super said:
If the function is called with a null string pointer then the program
will not crash but the return value will be undefined. In the normal
"success" case, the return value will be 0.

The function in question is:

> #include <stdio.h>
>
> int strcpy_safe(char *s, char *t)
> {
> int i = 0;
> if(strlen(t) == 0)
> return; // handle null strings
> for(; i < strlen(t); ++i)
> *(s+i) = *(t+i);
> *(s+strlen(t))='\0';
> return 0;
> }

What do you mean by "a null string pointer" and "null strings"? They are
not well-defined terms, only you know for sure what you mean by them; it
would help sensible discussion if you would define them.

There are two parameter values relevant to this discussion which
represent edge cases for your function. These are: a null pointer

strcpy_safe(dest, NULL);

and an empty string

strcpy_safe(dest, "");

In the first case your function has undefined behaviour because it
passes a null pointer to strlen. Undefined behaviour means that anything
at all can happen, and different things are likely to happen in
different C implementations. One of the commoner possibilities is that
the program crashes at that point. You might choose not to worry about
that since your behaviour would be the same as the C library functions,
but it's common to check for null pointers in this sort of case so you
can get more predictable behaviour if your function ever gets one by
accident. The overhead of checking should be negligible.

In the second case, your function does a return without specifying any
value. This breaks a constraint of the language definition, and the
compiler is required to issue a diagnostic to tell you about your error.
If the compiler chooses to carry on and compile the program, once again
anything can happen.

Why are you detecting empty strings and handling them differently
anyway? It's much simpler and more usual to support them like any other
string. If you just delete the lines

> if(strlen(t) == 0)
> return; // handle null strings

you'll both get rid of your error and fully support empty strings.

Someone earlier in this topic suggested using memmove so this code is
left over from when I was trying that out. For efficiency reasons I
decided to use my own strcpy_safe function which is tailored for strings,
not arbitrary memory.

It's extremely unlikely that your own function would be more efficient
than one based on memmove, particularly as currently written with all
those calls to strlen. Something like

memmove(s, t, strlen(t)+1)

would almost certainly be better than calling your strcpy_safe,
especially if you already know the length of the source string so you
can avoid calling strlen here.

Ian Collins · Nov 23, 2010

Ian Collins said:
Ian Collins said:

On 11/23/10 01:42 PM, Tim Rentsch wrote:

Well anyway, I have fixed it now with strcpy_safe. Here is the complete
working code.

#include<stdio.h>

You have calls to strlen(). You must declare it before using it,
preferably via "#include<string.h>". [snip]

The return type of strlen() is size_t, and therefore it is undefined
behavior to use strlen() without a '#include<string.h>' directive.

Isn't that what Keith said?

Maybe it was but I didn't read it that way. I thought
Keith was saying it's okay to declare strlen() without
a #include<string.h>, even though with the #include
would be preferable. My point is that declaring strlen()
"manually", without the #include, still leads to undefined
behavior.

The standard explictly permits this. C99 7.1.4p2:

Provided that a library function can be declared without
reference to any type defined in a header, it is also permissible
to declare the function and use it without including its
associated header.

(There is almost no sane reason to take advantage of this permission.)

Click to expand...

The think the fish-hook there is "declared without reference to any type
defined in a header" given strlen() returns size_t, which is defined in
a header!

Click to expand...

Hmm. I see that I've read that paragraph in a way that made sense to
me rather than in a way that's consistent with what it actually says.

My "sensible" assumption is that if a function is declared in one
header and depends on a type defined in nother header, you can
declare the function yourself as long as the type is visible.

Concretely, I thought that this:

#include<stddef.h>
size_t strlen(const char *s);
int main(void) {
return strlen("");
}

is strictly conforming. But I now see that p2 doesn't apply in
this case.

Yes, it took a while for the penny to drop for me as well. Maybe

"Provided that a library function can be declared without reference to
any type defined in the standard header where the function is declared"

Would be better wording.

Eric Sosman · Nov 23, 2010

Well anyway, I have fixed it now with strcpy_safe. Here is the complete
working code.

#include<stdio.h>

int strcpy_safe(char *s, char *t)
{
int i = 0;
if(strlen(t) == 0)
return; // handle null strings

"Don't handle," you mean. A reasonable copy operation would
make an empty copy of an empty string; yours just leaves the result
as whatever it was beforehand:

char result[] = "I met a man who wasn't there.";
char empty[] = "";
strcpy_safe(result, empty);
assert (strlen(result) == strlen(empty)); // Oops!

Oh, and just for laughs: Your use of strlen() invokes undefined
behavior, and really will fail on some real machines.

for(; i< strlen(t); ++i)
*(s+i) = *(t+i);
*(s+strlen(t))='\0';

It is astonishing that you've had the nerve to use the word
"efficiency," and then offer a QUADRATIC copy operation!

return 0;
}

static int TrimWSpaceL(char *s)
{
int memmove(char *,char*);

You've criticized the Standard library as "stupid," yet you
haven't the wit to write a proper declaration of memmove(). Worse
still, you haven't the wit *not* to write such a declaration! And
here we are in undefined-behavior-land yet again.

if(isspace(*s)) {

Undefined behavior, for yet another reason that has been
explained here oft and oft again.

If you're "Super Ted," I shudder to think how imbecilic Plain Ted
must be.

Jens Thoms Toerring · Nov 23, 2010

If the function is called with a null string pointer then the program
will not crash but the return value will be undefined. In the normal
"success" case, the return value will be 0.

Please define what you mean by "null string pointer", otherwise
discussions are rather futile. As far as I understood your code
it is supposed to mean an empty string, i.e. a char array with
the first element being '\0'. That's the shortest possible string,
i.e. a string that doesn't contain a single character. That's a
completely "legal" string (and not uncommon at all).

But then there is also the possibility that your funcion receives a
NULL pointer as (at least) one of its arguments. And a NULL pointer
is a completely different beast from a pointer to an empty string.
Please compare

char *p = "";
char *q = NULL;

The first one is a pointer to a (constant) empty string that all
standard functions will deal with correctly (your strcpy_safe()
function doesn't though). The second one is a pointer that points
to an invalid memory location, i.e. an address that you never can
use. Most standard functions don't expect to receive such a pointer
and if they do will make the program crash - you're not supposed
to pass such a pointer to one of them. But then you use strlen()
(in a function you annotated with the word "safe") all over the
place without checking for NULL pointers. But most implementations
of strlen() won't react gracefully when receiving a NULL pointer
and crash - but they won't when receiving a pointer to a string
that doesn't hold a single character. So your use of the word
"safe" seems to be a bit strange.

And your claim that the program will not crash when you return
nothing from a function that is advertised as returning an int
may be correct on a number of implementations. But ask yourself:
what do I do with the return value as the caller of a function
that, under some circumstances, doesn't return anything? I.e.
what will be the value of the variable 'x' when you use your own
strcpy_safe() function:

int x = strcpy_safe( s, t );

(assuming that 's' and 't' are pointers to strings) with t
pointing to an empty (but completely valid) string? If you
check the return value you do it because you want to test if
things worked out the way you hoped for and be able to deal
with situations were it didn't - otherwise having an int re-
turn value is rather useless here. But what will be the value
of 'x' if 't' was a "null string pointer" (whatever you men by
that)? Since your function didn't return anything it can't have
a well-defined value. On most implementations 'x' will probably
end up with some more or less random value - which might have
a good chance of being 0, i.e. the value you otherwise return
on success. So, even on obliging implementations you would end
up with a useless value - everything your function will return
is suspect. And other implementations might take that as an
excuse for crashing your program...

Someone earlier in this topic suggested using memmove so this code is
left over from when I was trying that out. For efficiency reasons I
decided to use my own strcpy_safe function which is tailored for strings,
not arbitrary memory.

A string is (more or less arbitrary) memory, just with an extra
twist, i.e. that you know where it ends from the '\0' character
at the end. Actually, all strings are char arrays and the only
thing that distinguishes them from non-string char arrays is
that they have a '\0' character somewhere. But even if you see a
big difference between a string and a char array it still doesn't
make sense to declare a standard function instead of including the
appropriate header file, especially when getting the declaration
completely wrong - even for testing purposes.

And nothing in your functions was efficient by any conventional
measures - they actually were about as inefficient as one can
get without seriously considering how to do it worse. Sorry for
having to be that blunt. If you want to talk about efficiency
please try to understand at least the difference between e.g.

size_t i;
for ( i = 0; i < strlen( t ); i++ )
* ( s + i ) = * ( t + i );
* ( s + strlen( t ) ) = '\0';

and

while ( *s++ = *t++ )
;

Start by figuring out that (or if) they have the same result.
Count how many times you have to call strlen(). Try to figure
out what strlen() needs to do to return the correct value (may-
be by writing your own strlen() function (just try to call it
the same;-) to get an idea). If you've done that ask yourself:
how many memory accesses does the first version take for a
string with N characters and how many the second (each memory
access does take some time)? How many function calls are re-
quired in the two versions (calling a function does have an
overhead, so not calling a function tends to be cheaper)?

And if you're done with that it might be time for trying to find
out why there's a memcpy() and a memmove() function - one of them
allowing copies between overlapping mamory regions and the other
not doing that. You may find that this isn't about stupid or not
stupid but about the capabilities of the underlying hardware and
thus efficiency, at least on a number of CPUs.

Regards, Jens

Francois Grieu · Nov 23, 2010

(I had a go at implementing a fast version of the OP's Trim function; it was
about a magnitude faster, but the recursive content was so obviously
contrived, that it wasn't worth bothering with, causing a 3x slowdown over a
version using iteration. Not a good example.)

The OP's algorithm has a worst case run time O(n^2) where n is the
length of the string, when the straight algorithm runs O(n).
The slowdown due to recursion in this case is by more than a
constant factor.

Francois Grieu

Francois Grieu · Nov 23, 2010

Francois Grieu said:
Francois Grieu said:

[snip]

I note that the archetypal learning use case for recursion are
factorials and the "Tower of Hanoi", although both happen to have
a non-recursive solution which is much simpler:
http://en.wikipedia.org/wiki/Tower_of_Hanoi#Non-recursive_solution
This should be a giveaway that recursion is seldom useful, since
instructors fail to find a good use case for it.

Click to expand...

I think the reasoning here is bad. These examples are chosen
because they usually are already familiar to students outside the
realm of computing. More specifically, they are chosen to *explain*
recursion, not to *motivate* recursion. I think they do a good job
in this respect -- in both cases the recursive solutions are simple
enough and easy to understand. Calling the referenced non-recursive
tower-of-hanoi algorithm simpler, let alone much simpler, is highly
misleading; the *steps* may be simpler, but seeing how and why the
algorithm works certainly is not. Conversely, for a recursive
version it's easy to see at a glance both how it works and why, and
that's the point of the exercise - to present a simple example where
recursion can be understood. Deeper understanding, about when it
might be good to use recursion, and when not, comes only later.

I do acknowledge that
- the most natural solution to Tower of Hanoi is recursive,
simple and correct.
- Tower of Hanoi is thus a good introductory problem for
recursion in a computer class.

I stand by my points, as clarified below:
- Tower of Hanoi is *not* a good use case for recursion,
unless when learning/teaching is the purpose.
- Recursion is seldom useful in practice, where non-recursive
solutions often exist and lead to simpler code.
- There are so few counterexamples to the above that recursion
is taught with problems like Tower of Hanoi that are both
artificial and with a shorter and more efficient
non-recursive algorithm.

Similarly, Mergesort, which someone in this thread sees as a
good use case of recursion, is indeed a good problem for
teaching recursion. It can also be written simply and elegantly
with a single size_t variable rather than reentrant call or
stack array; once this is grasped, the code gets shorter and
more efficient, and arguably as easy to derive.
In my view Mergesort is thus not a good use case for recursion
in the sense that implies a reentrant function.

Quicksort is an excellent problem for teaching on recursion,
because
- Applying to a simple Quicksort just what was learned from
recursion in Tower of Hanoi leads to code that superficially
works, but on many architectures will overflow the execution
stack for specially constructed input. Such failures are part
of the process of learning.
- There is a portable way out of that (using recursion for
the smallest problem only, solving that one first, and
replacing tail recursion for the rest by iteration).

Several qsort() C library source that I looked at do not use
recursion in the sense that implies a reentrant function.
In fact, having no reentrant function is a design objective
of some C libraries, for portability, testability, and
efficiency reasons.

Francois Grieu

Francois Grieu · Nov 23, 2010

To change my recursive 8 queens program into a 20 queens
program, all I have to do is change this line:
#define SQUARES 8
to
#define SQUARES 20
I think it would be a lot more work than that,
to change an iterative version.

Depends on the iterative code you are starting from.
A short, readable, conformant C program can count solutions
to the SQUARES queens problem (or show one), with the above
property, and no function recursively called.
I'm not considering using the macro processor to produce
nested loops, but using an array to implement a stack
SQUARES deep.

Francois Grieu

BartC · Nov 23, 2010

Francois Grieu said:
The OP's algorithm has a worst case run time O(n^2) where n is the
length of the string, when the straight algorithm runs O(n).
The slowdown due to recursion in this case is by more than a
constant factor.

My version of it called strlen() and memmove() just once. Recursion was used
just for counting spaces, and that was linear time I think, the same as
iteration.

lawrence.jones · Nov 24, 2010

Ian Collins said:
Yes, it took a while for the penny to drop for me as well. Maybe

"Provided that a library function can be declared without reference to
any type defined in the standard header where the function is declared"

Would be better wording.

I think it should be more like "...defined only in the standard
header...". If the type is also defined in a different header, then
including that header should suffice. In short, I'm pretty sure the
intent was that declaring a function by hand should work as long as you
can do it "correctly". But it's hard to say "correctly" in standardese.

Tim Rentsch · Nov 24, 2010

Francois Grieu said:
Francois Grieu said:

[snip]

I note that the archetypal learning use case for recursion are
factorials and the "Tower of Hanoi", although both happen to have
a non-recursive solution which is much simpler:
http://en.wikipedia.org/wiki/Tower_of_Hanoi#Non-recursive_solution
This should be a giveaway that recursion is seldom useful, since
instructors fail to find a good use case for it.

Click to expand...

I think the reasoning here is bad. These examples are chosen
because they usually are already familiar to students outside the
realm of computing. More specifically, they are chosen to *explain*
recursion, not to *motivate* recursion. I think they do a good job
in this respect -- in both cases the recursive solutions are simple
enough and easy to understand. Calling the referenced non-recursive
tower-of-hanoi algorithm simpler, let alone much simpler, is highly
misleading; the *steps* may be simpler, but seeing how and why the
algorithm works certainly is not. Conversely, for a recursive
version it's easy to see at a glance both how it works and why, and
that's the point of the exercise - to present a simple example where
recursion can be understood. Deeper understanding, about when it
might be good to use recursion, and when not, comes only later.

Click to expand...

I do acknowledge that
- the most natural solution to Tower of Hanoi is recursive,
simple and correct.
- Tower of Hanoi is thus a good introductory problem for
recursion in a computer class.

Okay, that's all good.

I stand by my points, as clarified below:
- Tower of Hanoi is *not* a good use case for recursion,
unless when learning/teaching is the purpose.

I think that depends on what one is trying to optimize. It's
probably faster to write a recursive version that is clearly
correct, and if time-to-market is a key concern then a recursive
solution is probably the right choice here (assuming of course
that ToH is a necessary part of whatever software is being
delivered).

- Recursion is seldom useful in practice, where non-recursive
solutions often exist and lead to simpler code.

I'll just flat out disagree here. Recursion is often useful in
practice, and even when non-recursive solutions exist there
frequently are recursive solutions that are easier to understand,
or result in faster code, or both.

- There are so few counterexamples to the above that recursion
is taught with problems like Tower of Hanoi that are both
artificial and with a shorter and more efficient
non-recursive algorithm.

That's a bogus conclusion. Tower of Hanoi and other simple
example problems are used to illustrate recursion because the
problems are already familiar to students, not because of any
shortage of examples where a recursive solution is viable.

Similarly, Mergesort, which someone in this thread sees as a
good use case of recursion, is indeed a good problem for
teaching recursion.
Okay.

It can also be written simply and elegantly
with a single size_t variable rather than reentrant call or
stack array;

Certainly mergesort can be written iteratively using O(1) storage.
The idea that this can be done "simply and elegantly" is (a)
nebulously vague, (b) completely subjective, (c) an unsupported
opinion, (d) at best debatable, or (e) several of the above.

once this is grasped, the code gets shorter and
more efficient,

That's not consistent with the timing results pete reported.
Do you have any kind of supporting evidence to offer?

and arguably as easy to derive.

If you offer up both a program and an informal correctness proof
I might agree, but as an unsupported statement I don't buy it.
I have programmed myself both a recursive mergesort and an
iterative mergesort, and understood both well, and the iterative
version was definitely harder to get right than the recursive
version. If you want to convince people otherwise you should
be prepared to provide a counterexample.

In my view Mergesort is thus not a good use case for recursion
in the sense that implies a reentrant function.

In my view this conclusion rests on some faulty premises,
as explained above.

Quicksort is an excellent problem for teaching on recursion,
because
- Applying to a simple Quicksort just what was learned from
recursion in Tower of Hanoi leads to code that superficially
works, but on many architectures will overflow the execution
stack for specially constructed input. Such failures are part
of the process of learning.
- There is a portable way out of that (using recursion for
the smallest problem only, solving that one first, and
replacing tail recursion for the rest by iteration).

What you're saying is that quicksort is good mostly for teaching
some of the *problems* associated with recursion, and one way of
getting around such problems. Forgive my cynicism, but this sounds
like you're subtly promoting an anti-recursion bias in the students.
Of course, I agree that it's important to understand the problem of
overly deep recursion, but this can be illustrated more easily with
a recursive function for measuring the length of a list, which in
turn can be used to show how to turn a non-tail-recursive function
into a function that is tail recursive (and thus easily iterizable
if one chooses to do that).

Several qsort() C library source that I looked at do not use
recursion in the sense that implies a reentrant function.
In fact, having no reentrant function is a design objective
of some C libraries, for portability, testability, and
efficiency reasons.

As a counterpoint, the paper "Engineering a Sort Function", one of
the most widely cited papers about how to do a good job implementing
qsort, embraces recursion down *both* branches of sorting subarrays.
Quoting from the paper:

We have not adopted many customary improvements. By
managing a private stack we could cut stack space from
nearly 20 variables to 2 per level. By sorting the larger
side of the partition last and eliminating tail recursion,
we could reduce worst-case stack growth from linear in /n/
to logarithmic. Neither trick is worth the effort.

Personally I'm more likely to be convinced by Jon Bentley and Doug
McIlroy than by the (presumed) statements of anonymous developers
working on some unknown C libraries.

Tim Rentsch · Nov 24, 2010

I think it should be more like "...defined only in the standard
header...". If the type is also defined in a different header, then
including that header should suffice. In short, I'm pretty sure the
intent was that declaring a function by hand should work as long as you
can do it "correctly". But it's hard to say "correctly" in standardese.

I've read this opinion twice now, and it surprised me both times.
The plain reading is clear enough, and seems to make no effort in
the "correctly" direction. It seems just as likely, or perhaps
even more likely, that the intention was to allow declarations of
functions that existed before the Standard, but any function that
needed a "new" type (ie, a type defined in some system header)
must be declared by #include'ing the appropriate header. This
would allow old code to work K&R style, but new code must do the
right thing. Doesn't that seem like a more plausible explanation?

Keith Thompson · Nov 24, 2010

Tim Rentsch said:
I've read this opinion twice now, and it surprised me both times.
The plain reading is clear enough, and seems to make no effort in
the "correctly" direction. It seems just as likely, or perhaps
even more likely, that the intention was to allow declarations of
functions that existed before the Standard, but any function that
needed a "new" type (ie, a type defined in some system header)
must be declared by #include'ing the appropriate header. This
would allow old code to work K&R style, but new code must do the
right thing. Doesn't that seem like a more plausible explanation?

Not to me.

I think the point is that the function declarations provided by headers
are just C function declarations. A declaration needs to be visible in
order to call a library function, and it doesn't matter to the compiler
whether that declaration comes from a #included header or is part of the
source file being compiled.

In what plausible way could this program:

#include <stddef.h>
int main(void)
{
size_t strlen(const char *s);
return strlen("");
}

fail in a real implementation? The standard, as currently worded,
fails to define its behavior, but IMHO a conceptually *simpler*
rule would cover this program.

Again, that's not to say that writing such code is a good idea,
but the rule in the standard does reinforce the idea that standard
functions like strlen() can be treated like ordinary functions.

Nick · Nov 24, 2010

Francois Grieu said:
Similarly, Mergesort, which someone in this thread sees as a
good use case of recursion, is indeed a good problem for
teaching recursion. It can also be written simply and elegantly
with a single size_t variable rather than reentrant call or
stack array; once this is grasped, the code gets shorter and
more efficient, and arguably as easy to derive.
In my view Mergesort is thus not a good use case for recursion
in the sense that implies a reentrant function.

To help me follow this point, could you re-write this without the
recursion -
http://groups.google.com/group/comp.lang.c/msg/16d6d06ec56651e3?hl=en

It's not obvious to me how to do it while keeping it simple and elegant.

lawrence.jones · Nov 24, 2010

Keith Thompson said:
I think the point is that the function declarations provided by headers
are just C function declarations. A declaration needs to be visible in
order to call a library function, and it doesn't matter to the compiler
whether that declaration comes from a #included header or is part of the
source file being compiled.
Exactly.

In what plausible way could this program:

#include <stddef.h>
int main(void)
{
size_t strlen(const char *s);
return strlen("");
}

fail in a real implementation? The standard, as currently worded,
fails to define its behavior, but IMHO a conceptually *simpler*
rule would cover this program.

The difficulty is that declaring strlen as:

unsigned strlen(const char *s);

is perfectly correct on a platform where size_t is a synonym for
unsigned int, but is not correct in general. I'm struggling to come
up with words that unambiguously allow your declaration but not mine.

Keith Thompson · Nov 25, 2010

The difficulty is that declaring strlen as:

unsigned strlen(const char *s);

is perfectly correct on a platform where size_t is a synonym for
unsigned int, but is not correct in general. I'm struggling to come
up with words that unambiguously allow your declaration but not mine.

I think that "unsigned strlen(const char *s)" *should* be valid
if size_t and unsigned are the same type. The declaration is ugly
and non-portable, but the compiler shouldn't be able to reject it
if it's compatible with the standard form. If we have "typedef
unsigned size_t;", the compiler shouldn't be able to distinguish
between unsigned and size_t.

The current wording is a little weak on that point, even without
covering functions that are declared in one header and depend on
types "defined" in another. It says:

Provided that a library function can be declared without
reference to any type defined in a header, it is also permissible
to declare the function and use it without including its
associated header.

It doesn't say you have to declare it *correctly*. One possible
reading is that:

double strlen(int s);
double x = strlen(42);

is permissible.

Loosely, if you declare a standard library function yourself, the
declaration has to specify the same type as the actual function.
The phrase "same type" probably needs tweaking ("compatible type",
maybe?). If it doesn't, the behavior is undefined. (Does the
declaration itself introduce UB, or just a call?)

Basically, if a user declaration of strlen would be valid in
the presence of #include <string.h>, it's valid in its absence;
if it would be invalid, then it's an incompatible declaration
and the behavior of any call that refers to the user declaration
is undefined.

Yet another reason that it's a bad idea to depend on this is that
standard function declarations can change from one version of
the standard to another. A program that gets the declaration for
strcpy() by #include'ing <string.h> can easily be portable across
C90, C99, and future standards; a program that tries to declare
it itself can get hung up on the "restrict" qualifiers that were
added in C99.

Keith Thompson · Nov 25, 2010

Keith Thompson said:
Provided that a library function can be declared without
reference to any type defined in a header, it is also permissible
to declare the function and use it without including its
associated header.

It doesn't say you have to declare it *correctly*. One possible
reading is that:

double strlen(int s);
double x = strlen(42);

is permissible.

[...]

On second thought, I think that's covered by C99 6.5.2.2p9:

If the function is defined with a type that is not compatible
with the type (of the expression) pointed to by the expression
that denotes the called function, the behavior is undefined.

I'm assuming here that strlen() has a "definition" somewhere.
On the other hand, it might be written in a language other than
C, or implemented via pure compiler magic. (Note that I'm not
referring to the permission to define strlen() as a macro; in that
case, any declaration is irrelevant unless the caller bypasses the
macro definition.)

So perhaps having 7.1.4p2 refer to 6.5.2.2, and state that all
this stuff works as if the standard library functions have actual
C definitions (is that already stated somewhere?), would cover all
the bases.

Mark Wooding · Nov 25, 2010

Nick said:
To help me follow this point, could you re-write this without the
recursion -
http://groups.google.com/group/comp.lang.c/msg/16d6d06ec56651e3?hl=en

This is Chris Torek's recursive linked-list mergesort.

It's not obvious to me how to do it while keeping it simple and
elegant.

Chris's algorithm is a `top-down' mergesort. It's not especially easy
to make such a beast nonrecursive without an explicit stack. But
there's a very close cousin, a `bottom-up' mergesort, which is easy to
make nonrecursive.

The difference is as follows.

* A top-down mergesort splits its input into two sublists of similar
length, recursively sorts the two halves, and merges them together.
So if you start with N items, then you split them into two lists of
N/2 items at the top level, four lists of N/4 items at the next, and
so on until you have only trivial sublists.

* A bottom-up mergesort builds up islands of sortedness. It starts by
cutting the input into pairs of sublists of (say) length 1 which are
obviously sorted, merging the pairs, and concatenating the merged
results. Then every pair of items is sorted; so we cut the list
into pairs of lists of length 2, merging and concatenating.

This latter is quite easy to implement nonrecursively: all you need to
remember is how long the sublists you're meant to work with are. I
think the resulting code is fairly elegant, but I'll let you be the
final judge. Note the triple indirection in `mergelists': I think this
is one of the more obvious natural applications for a triply-indirected
pointer.

/* Merge together two sorted lists A and B into a single sorted list.
* The sort order is determined by the CMP function; if CMP decides that two
* elements are equivalent then prefer the element from list A. On exit,
* **TAILP is set to point to the start of the merged list, and *TAILP is the
* address of the `next' link of the last node of the merged list, so that
* further stuff can be appended later: several merged lists can be
* accumulated simply by passing the same TAILP argument repeatedly to
* `mergelists'; to terminate the list finally, set **TAILP = 0.
*/
static void mergelists(struct list ***tailp,
struct list *a, struct list *b,
int (*cmp)(const struct list *,
const struct list *))
{
struct list **tail = *tailp;

/* Choose the lesser item from the two list heads and advance. Continue
* until we run out of one of the lists; when done; set A to the (possibly)
* nonempty list.
*/
for (;

{
if (!b) break;
else if (!a) { a = b; break; }
else if (cmp(a, b) <= 0) { *tail = a; tail = &a->next; a = a->next; }
else { *tail = b; tail = &b->next; b = b->next; }
}

/* Append the (possibly) nonempty list onto the accumulated output, and
* find the end of it.
*/
*tail = a;
if (a) {
while (a->next) a = a->next;
tail = &a->next;
}

/* Tell the caller where the end of the list is. */
*tailp = tail;
}

/* Sort a list L using in-place list mergesort.
* The CMP function is a preorder, outputting <0, 0, or >0 according to
* whether the payload of its left argument is less than, equivalent to, or
* greater than, its right argument; `equivalence' must indeed be an
* equivalence relation, and CMP must be transitive. The sort is stable: two
* nodes with equivalent payloads remain in the same relative order.
*/
static struct list *mergesort(struct list *l,
int (*cmp)(const struct list *,
const struct list *))
{
size_t i, n = 1;
struct list *head, **tail;
struct list *a, *b, **p;
int done = 0;

/* An empty input is a nasty edge case. Dispose of it in advance. */
if (!l) return (0);

/* Main loop. We cut the list into pairs of sublists of length N, and
* merge each pair together, concatenating the results to form a new list
* in which each consecutive run of 2 N elements is sorted. Then we double
* N. When N is longer than the list, we're done.
*/
do {

/* Collect the merged sublist pairs. */
tail = &head;

/* While there is more input list to consume, cut out two sublists of
* length N.
*/
while (l) {

/* Find the start and end of the first sublist. Note that the extra
* level of indiration in P allows it to lag one node behind, so that
* we can terminate the sublist when we've counted enough.
*/
p = &l; a = *p;
for (i = n; i && *p; i--) p = &(*p)->next;

/* Find the start and end of the second sublist. */
b = *p; *p = 0; p = &b;
for (i = n; i && *p; i--) p = &(*p)->next;
l = *p; *p = 0;

/* If we've hit the end of the input, and haven't produced any output,
* then our two sublists must contain all of the elements. Therefore,
* when we've merged them, we'll have sorted the entire input, and
* we'll be finished.
*/
if (!l && tail == &head) done = 1;

/* Merge the sublists, and append to our output. The A list consists
* of elements earlier than the B list, and `mergelists' will preserve
* the relative order of equivalent nodes, so the sort is stable.
*/
mergelists(&tail, a, b, cmp);
}

/* Terminate the output list, and make it be the new input. */
*tail = 0;
l = head;
n <<= 1;
} while (!done);

/* Finished. */
return (l);
}

-- [mdw]

Fibonacci	0	May 13, 2023
C pipe	1	Dec 9, 2021
Adding adressing of IPv6 to program	1	Feb 16, 2023
Please help with C programming to save GPS reception data in Raspberry Pi.	0	Dec 8, 2022
STRING - Remove small letters from string	1	Jan 20, 2023
I can't figure out why my program is not working.	4	May 23, 2024
C language. work with text	3	Dec 10, 2021
Can someone tell me what's wrong with this question on StackOverflow?	0	Aug 19, 2023

Strange bug

Ian Collins

Tim Rentsch

Keith Thompson

J. J. Farrell

Ian Collins

Eric Sosman

Jens Thoms Toerring

Francois Grieu

Francois Grieu

Francois Grieu

BartC

lawrence.jones

Tim Rentsch

Tim Rentsch

Keith Thompson

Nick

lawrence.jones

Keith Thompson

Keith Thompson

Mark Wooding

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads