is it tail recursion

S

Stephen Horne

ok explain me this one .
int factorial (int n)
{
if (n==0) return 1;
else
factorail(n-1)*n;
}

why this recursion is tail recursion and harmonic is not.

This is not tail recursion. There are tail recursive implementations
of factorial, but this is not one of them.

The following is IIRC the standard tail-recursive approach to the
factorial...


int factorial_internal (int p_So_Far, int n)
{
if (n == 0) return 1;
return factorial_internal (p_So_Far * n, n - 1);
}

int factorial (int n)
{
return factorial_internal (1, n);
}


It's a bit crappy to look at, but it isn't a real-world example of
where tail recursion is a good idea in my view - just a common
illustration of how a non-tail-recursive function can be translated
into a tail-recursive form by replacing working variables with
parameters. You can find this kind of thing in Haskell and ML
tutorials.

My personal view is that if you need to do this kind of thing, the
explicitly iterative form is a better choice than the tail recursive
form anyway.


BTW - I have given some thought to Pete Beckers assertion that C and
C++ compilers have been optimising recursion into iteration for some
time. It's actually quite a bit cleverer than it sounds on the
surface. A naive conversion - or rather a naive test for whether the
conversion is valid - could easily go badly wrong.

When working with tree data structures, a common approach I use is to
plan the modification before doing it, making node splitting and
rebalancing decisions and perhaps allocating memory for new nodes and
so on, but not modifying the existing nodes. This means I don't have
serious problems in cases where, e.g., it turns out I'm unable to
allocate enough memory for new nodes. I can still leave the tree in an
unmodified and therefore self-consistent state because I detect the
problem while planning. That said, the planning itself has to be
lightweight, at least when the data structure is in memory - yet the
plan has to be a variable size data structure since I cannot limit the
depth of the tree. Library code might be in use for a very long time.

My favorite approach is basically to build a plan as a linked list on
the stack, with each plan node being a local variable in one layer of
the recursion. When the plan is completed, it is either executed or
cleaned up by iterating through the list before the recursion unwinds.
This keeps the plan off the heap, and so long as the parameters to the
recursive call are handled carefully (generally limited to a single
pointer-to-plan-node) it's pretty efficient.

Trouble is, it will *look* tail recursive until you examine the
detail. The code *is* tail recursive, but there's a serious problem
with stack use. After all, each one of those plan nodes is just a
local variable at one layer of the recursion. They all have to be
separate. A naive conversion to an iterative implementation would mean
that all those plan nodes are found at the same location on the stack,
which is clearly invalid.


There's not necessarily any way for the compiler to know that that
looks-like-tail-recursion cannot be naively optimised. Data flow
analysis cannot do the job, in general, because the data flow can pass
through other nested function calls. For example...


void recursive (plan_node* p_prev)
{
if (planning_done (p_prev))
{
execute_plan (p_prev);
return;
}

plan_node l_next;

l_next.m_link = p_prev;
// other plan setup here

recursive (do_stuff (&l_next));
}


That is a pretty good representation of the structure of what I'd do
for a tree modification (particularly working leaf up, such as an
insert into a particular leaf node). The only oddity is the do_stuff
function - I can't think of a good reason for it in this case, but the
compiler cannot make assumptions about the limits of programmers
imaginations. It's there to make the point about data flow, though. In
this case, the compiler cannot know if the parameter to recursive will
be a pointer to l_next or not.


Of course this is just a variation on common dangling pointer issues
where there are pointers to local variables in functions that have
exited. The difference is that in this case, the function *hasn't*
exited, at least according to the promises made by the C and C++
languages.

Just about the only optimisation of recursion that can be done safely
in C++ is to spot a sequence in the generated assembler where a call
is followed pretty much immediately by a ret. This isn't a tail
recursion optimisation because it's much more general than even
recursion - many call-then-ret sequences can be optimised the same way
irrespective of whether they are recursive.

Even this needs some care WRT stack frame handling - you cannot assume
that the current functions locals will not be referenced somehow by
the called function, so in a sense you have to grow a larger stack
frame for the function with each level of recursion - only a part of
which is considered current, but the whole lot being popped when an
unoptimised ret is finally encountered.

That means that while stack growth may be reduced a little, it cannot
normally be completely avoided, and there is a cost even for the
reduction - a more complex stack-frame setup for the optimised
function call.

In other words, the promises made by claims of tail recursion
elimination are not met by this call optimisation approach.

This kind of thing is not an issue in Scheme since pretty much
everything is stored in dynamically allocated memory and garbage
collected, at least in principle. The tail recursion elimination is
guaranteed, whereas the stack storage optimisation is a matter for the
compiler to sort out, and Scheme compilers may fail to use stack
memory in cases where a C++ programmer would use a local variable.


Therefore, full tail recursion elimination of the kind done in Scheme
and other functional languages simply isn't viable in C or C++. The
nature of the languages means that the compiler can only determine
validity in certain special cases. Of course the qualification of
exactly what can and can't be handled probably isn't documented very
well for most compilers because the detail is probably rather complex,
because it probably changes between releases, and because the whole
point of automatic optimisation is to leave it to the optimiser and
not fuss about it.


Now, having made that assertion with absolute confidence, I'll sit
back and wait to be told just how wrong I am ;-)
 
S

Stephen Horne

Just about the only optimisation of recursion that can be done safely
in C++ is to spot a sequence in the generated assembler where a call
is followed pretty much immediately by a ret. This isn't a tail
recursion optimisation because it's much more general than even
recursion - many call-then-ret sequences can be optimised the same way
irrespective of whether they are recursive.

Whoops - this is exactly what functional types call tail recursion (or
more accurately, tail call) elimination. C and C++ are doing nothing
different in this respect.

I'll get back to you later with the excuse for why it wasn't stupid to
say that - it's going to take a fair bit of creative thinking ;-)


The point about stack use and references to local variables stands,
however. C and C++ compilers have to be pessimistic about validity in
cases where most functional languages guarantee to optimise tail
calls.
 
S

Stephen Horne

What happens if you write the return statement as:

return n * fac(n-1);

Don't focus so much on the ordering in the source code. The order of
execution is the important thing. You can't do the multiplication
until you know what values you are going to multiply - ie until after
the recursive call has completed.

This is more immediately obvious in Scheme, where the order of
execution and the order of the text are generally much more closely
related (in an inverse kind of way). It's even more immediately
obvious in Forth, though I have no idea whether Forth compilers do
tail call elimination.
 
S

Stephen Horne

On Nov 4, 4:04 pm, (e-mail address removed) wrote:

The only sure way in some functional languages, probably.
There's no sure way in C++. The way to do this in C++ is to
write whatever is most natural, without worrying about whether
the compiler will find tail recursion or not, and then, if the
profiler shows it's too slow, optimize manually (probably
eliminating the recursion).

Even in functional languages, it's easy to make mistakes - to think
something is true tail recursion when it is not, and get a broken
program (which still probably passes unit testing) as a result.

Which is why I think tail call optimisation should be explicitly
requested when needed. It's not just an optimisation - it's a
correctness issue.

In C++, though, not a big deal - the assumption of tail call
optimisation isn't there and, as you say, the idioms are different.
 
S

Stephen Horne

Sigh. The text that you snipped described an optimization that certain
compilers did, and my question was specifically about how a small
variation in the code affected that optimization.

As you say, sigh.

Yes. You swapped the order in which two parameters to the
multiplication operator were written. That might change the order in
which those two parameters are evaluated, but it won't change when the
multiplication is done.

Multiplication is commutative. In a simple example like this, if
reordering the arguments could enable more optimisations, the compiler
would probably already be doing it. If there were other function calls
(not just the directly recursive one), maybe not, as it might need to
be pessimistic about possible side-effects of expression evaluation.
But in this case, I seriously doubt that that's an issue.

So... Don't focus so much on the ordering in the source code, etc etc.
 
S

Stephen Horne

OTOH, on 32bit architectures 'int' will overflow at 16!,
which is long before you run into stack problems, so on
these architectures I don't see a need to not to apply
the use the most obvious implementation of factorial. it
would seem premature optimization to me.

The basic issue is that you're unlikely to use a simple factorial for
32 bit integers anyway.

The factorial is mostly a building block for defining other, more
directly useful functions. An obvious example would be counting
combinations or permutations. Those more useful functions often have
large numbers of cases where the factorial goes out of bounds, but the
needed result doesn't - often because one factorial is divided by
another. An iterative solution that does the whole job at once is both
more efficient and less prone to overflow.

That is, instead of...

int somefunc (int p1, int p2)
{
return factorial (p1) / factorial (p2);
}

You'd more likely have...

int somefunc (int p1, int p2)
{
int temp = 1;
for (int i = p2+1; i <= p1; ++i) temp *= i;
return temp;
}

If you are doing this kind of thing for anything serious, of course,
you're probably using a big integer library. Still quite hard to
overflow the stack using recursion, but possible - the maximum size of
the stack isn't all that big in general.


Besides, the real point of the factorial function is only to
illustrate a principle. Real world functions that require tail
recursion elimination would generally not have that problem.

Another common illustration from the functional world would be a
simple list summary such as an 'any' function...

struct bool_list_node
{
bool_list_node* m_link;
bool m_value;
};

bool any (bool_list_node* p)
{
if (p == 0) return false;
if (p->m_value) return true;

return any (p->m_link);
}

The tail recursion here has no numeric overflow issues, but could
easily overrun the stack. Fortunately, this is not the kind of code
that C++ programmers would normally write.

And in truth, you wouldn't normally write that in a functional
language either - you'd use a library 'any' function, or maybe a
'fold' or 'reduce' function. It's still just a simple illustration of
a principle.
 
R

robertwessel2

It's even more immediately
obvious in Forth, though I have no idea whether Forth compilers do
tail call elimination.


It's reasonably common in Forth compilers that generate direct
threaded code in batch mode. It's not so much tail recursion
elimination, but rather the optimization of tail calls in general (IOW
replacing the final CALL/RET in a word with a JMP), if the compiler
can tell that the called words don't care about the contents of the
return stack at entry. Since the return stack is exposed in Forth,
the called word can look to see what's on top of the return stack,
which will obviously be different in the CALL/RET and JMP cases.
Fortunately the vast majority of words do not, and only use the return
stack for additional local data storage - loop counters are
particularly popular.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,167
Messages
2,570,913
Members
47,455
Latest member
Delilah Code

Latest Threads

Top