Algorithms in C - Robert Sedgewick


arnuld

This code is from chapter 2 of Algorithms in C:

Sequential Search:

int search(int a[], int v, int l, int r)
{
    int i;
    for (i = 1; i <= r; i++)
        if (v == a[i]) return i;

    return -1;
}




Binary Search:

int search(int a[], int v, int l, int r)
{
    while (r >= 1)
    {
        int m = (l + r)/2;
        if (v == a[m]) return m;
        if (v < a[m]) r = m - 1;
        else l = m + 1;
    }

    return -1;
}




Now the author explains the average cost of S-Search as:

(1 + 2 + 3 + ... + N)/N = (N + 1)/2

and the worst-case cost of B-Search as:

T_N <= T_{N/2} + 1 for N >= 2, with T_1 = 1




I am going to ask a very simple question. By just looking at the code of
both searches, any man with brains can simply conclude that B-Search, on
a sorted input, is always cheaper as compared to S-Search. If it is so
clear from the code, why do I have to go through the incomprehensible
junk of N, T, C_N etc. etc.? The book is about Algorithms in C, but when
I started to read, the author is teaching me a whole recurrence series
(which of course is an obtuse thing to understand):

C_N = C_{N-1} + N for N >= 2, with C_1 = 1
C_N = C_{N-2} + (N - 1) + N
    = C_{N-3} + (N - 2) + (N - 1) + N
    = ...
    = C_1 + 2 + ... + (N - 2) + (N - 1) + N
    = 1 + 2 + 3 + ... + (N - 2) + (N - 1) + N
    = N(N+1)/2


Is it necessary to understand whole exponential series and recurrences
and permutations and combinations and logarithmic series before you even
start to comprehend a search or sorting or any algorithm at all? If that
is so, why would I use Sedgewick? I could just read Hall & Knight's
"Higher Algebra".

My experience says it's enough to know that N is less than N^2, which is
less than 2^N, and log N is the least of them all; and if B-Search takes
log N on a sorted input and S-Search takes N, then we can go with the
B-Search based algorithm, provided you have estimated the cost of sorting
first.

So should I go with reading all the maths or just go to some random
chapter and start working on an algorithm in C ?
 

Eric Sosman

arnuld said:
This code is from chapter 2 of Algorithms in C:

Sequential Search:

int search(int a[], int v, int l, int r)
{
    int i;
    for (i = 1; i <= r; i++)
        if (v == a[i]) return i;

    return -1;
}


I don't have the book, but this looks very un-C-like.
Ordinarily, one would expect element [0] of an array to
hold something useful, and might not expect element [r] to
exist at all. Is this offered as real C code, or just as
the first stage in a transliteration from a language with
[1]-based arrays, possibly Fortran?
Binary Search:

int search(int a[], int v, int l, int r)
{
    while (r >= 1)
    {
        int m = (l + r)/2;
        if (v == a[m]) return m;
        if (v < a[m]) r = m - 1;
        else l = m + 1;
    }

    return -1;
}

The termination criterion looks completely wrong. Are
you sure you've transcribed the code correctly?
Now the author explains the average cost of S-Search as:

(1 + 2 + 3 + ... + N)/N = (N + 1)/2

and the worst-case cost of B-Search as:

T_N <= T_{N/2} + 1 for N >= 2, with T_1 = 1
I am going to ask a very simple question. By just looking at the code of
both searches, any man with brains can simply conclude that B-Search, on
a sorted input, is always cheaper as compared to S-Search.

Yes, any man with brains could conclude that, if his brains
didn't function very well. The sequential search will take more
steps than the binary search, it's true -- but the "step" of one
is simpler (cheaper) than the "step" of the other. Look at the
sequential "step:" two comparisons and an addition. Look at
the binary "step:" three comparisons, two additions, and a
division.
Is it necessary to understand whole exponential series and recurrences
and permutations and combinations and logarithmic series before you even
start to comprehend a search or sorting or any algorithm at all?

No: You can choose to learn things by rote and not bother
with understanding them. Carry your grimoire everywhere, and
hope you never encounter a situation that's just a little bit
different, something that isn't written down for you in exactly
the form you encounter it.

Or you can arm yourself with the ability to do a little
analysis unaided, so you won't be thrown for a loop if things
are just a little different from those you saw in the book. For
example, consider applying the binary search to a sorted array
containing some equal elements, the task being to find the low
and high indices of the span of elements equal to `v', or to
determine that `v' is not present. There are at least three
obvious ways to approach the task; how will you choose one?
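For concreteness, here is one of those ways, sketched with 0-based
indices rather than the book's 1-based ones: run two biased binary
searches, one for the first position not less than `v', one for the
first position greater than `v'. (The function name and interface
are made up for illustration.)

/* Find the span [*lo, *hi) of elements equal to v in the sorted
 * array a[0..n-1].  Returns 0 and fills in *lo and *hi if v is
 * present, -1 otherwise.  A sketch, not the book's code. */
int equal_span(const int a[], int n, int v, int *lo, int *hi)
{
    int l = 0, r = n;

    while (l < r) {               /* first index with a[i] >= v */
        int m = l + (r - l) / 2;
        if (a[m] < v) l = m + 1;
        else          r = m;
    }
    *lo = l;

    r = n;
    while (l < r) {               /* first index with a[i] > v */
        int m = l + (r - l) / 2;
        if (a[m] <= v) l = m + 1;
        else           r = m;
    }
    *hi = l;

    return (*lo < *hi) ? 0 : -1;
}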
My experience says it's enough to know that N is less than N^2, which is
less than 2^N, and log N is the least of them all; and if B-Search takes
log N on a sorted input and S-Search takes N, then we can go with the
B-Search based algorithm, provided you have estimated the cost of sorting
first.

Have you ever noticed that practical Quicksort implementations
often switch to insertion sort or something of the kind when the
sub-files get short enough? Why would they abandon the O(N*logN)
partition-exchange method in favor of an O(N*N) method? Are they
just being perverse -- or is there something you've failed to
notice?
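The usual shape of such a hybrid, as a sketch -- the cutoff value
here is invented; real implementations tune it empirically:

#define CUTOFF 10                 /* hypothetical threshold */

static void insertion_sort(int a[], int l, int r)
{
    int i;
    for (i = l + 1; i <= r; i++) {
        int v = a[i], j = i;
        while (j > l && a[j - 1] > v) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = v;
    }
}

void quicksort(int a[], int l, int r)
{
    int i = l, j = r, p;

    if (r - l < CUTOFF) {         /* short sub-file: cheap steps win */
        insertion_sort(a, l, r);
        return;
    }
    p = a[l + (r - l) / 2];
    while (i <= j) {              /* Hoare-style partition */
        while (a[i] < p) i++;
        while (a[j] > p) j--;
        if (i <= j) {
            int t = a[i]; a[i] = a[j]; a[j] = t;
            i++; j--;
        }
    }
    quicksort(a, l, j);
    quicksort(a, i, r);
}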
So should I go with reading all the maths or just go to some random
chapter and start working on an algorithm in C ?

It's your choice: memorize, or comprehend. Each approach has
its advantages and disadvantages.
 

Charles Richmond

arnuld said:
[snip...] [snip...] [snip...]

I am going to ask a very simple question. By just looking at the code of
both searches, any man with brains can simply conclude that B-Search, on
a sorted input, is always cheaper as compared to S-Search.

But a binary search is *not* always cheaper than a sequential
search. For just a few items, sequential search can be cheaper.
There is a "breakpoint" where the binary search becomes cheaper.
This may be somewhere between five and nine items.
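If you want to see where the breakpoint falls on your own machine,
a crude harness like the following will show the crossover (the
sizes and repetition count are arbitrary; the answer varies with
compiler, CPU, and cache):

#include <stdio.h>
#include <time.h>

static int seq_search(const int a[], int n, int v)
{
    int i;
    for (i = 0; i < n; i++)
        if (v == a[i]) return i;
    return -1;
}

static int bin_search(const int a[], int n, int v)
{
    int l = 0, r = n - 1;
    while (r >= l) {
        int m = l + (r - l) / 2;
        if (v == a[m]) return m;
        if (v < a[m]) r = m - 1;
        else          l = m + 1;
    }
    return -1;
}

int main(void)
{
    int a[64], n, i, k, sink = 0;
    for (n = 2; n <= 64; n *= 2) {
        clock_t t0, t1, t2;
        for (i = 0; i < n; i++) a[i] = i;
        t0 = clock();
        for (k = 0; k < 1000000; k++) sink += seq_search(a, n, k % n);
        t1 = clock();
        for (k = 0; k < 1000000; k++) sink += bin_search(a, n, k % n);
        t2 = clock();
        printf("n=%2d  seq=%ld  bin=%ld\n",
               n, (long)(t1 - t0), (long)(t2 - t1));
    }
    return sink == 42;            /* keep the optimizer honest */
}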


--
+----------------------------------------+
|      Charles and Francis Richmond      |
|                                        |
|  plano dot net at aquaporin4 dot com   |
+----------------------------------------+
 

Default User

Eric said:
arnuld said:
This code is from chapter 2 of Algorithms in C:

Sequential Search:

int search(int a[], int v, int l, int r)
{
    int i;
    for (i = 1; i <= r; i++)
        if (v == a[i]) return i;

    return -1;
}


I don't have the book, but this looks very un-C-like.
Ordinarily, one would expect element [0] of an array to
hold something useful, and might not expect element [r] to
exist at all. Is this offered as real C code, or just as
the first stage in a transliteration from a language with
[1]-based arrays, possibly Fortran?


As I recall, the original book was written in Pascal, and Sedgewick
used 1-based arrays for the C version as well.

The code is on his web site:

<http://www.cs.princeton.edu/~rs/Algs3.c1-4/code.txt>



Brian
 

Julienne Walker

arnuld said:
Binary Search:
int search(int a[], int v, int l, int r)
{
    while (r >= 1)
    {
        int m = (l + r)/2;
        if (v == a[m]) return m;
        if (v < a[m]) r = m - 1;
        else l = m + 1;
    }

    return -1;
}

     The termination criterion looks completely wrong.  Are
you sure you've transcribed the code correctly?

While I don't have that book on hand at the moment to verify the
transcription, I can safely say that the code in it is generally quite
poor, in terms of readability, style, and overall correctness. I spent
more time fixing the code than learning from it, though otherwise the
book is excellent and I still recommend it with a warning not to rely
on the code.
 

James Kuyper

arnuld said:
This code is from chapter 2 of Algorithms in C:

Sequential Search:

int search(int a[], int v, int l, int r)
{
    int i;
    for (i = 1; i <= r; i++)
        if (v == a[i]) return i;

    return -1;
}


That looks like Fortran-inspired C code. I'm not personally familiar
with this book; it might have some very useful information about
algorithms. However, I recommend not paying too much attention to the C
code; you're better off learning a more idiomatic C programming style.

....
Now the author explains the average cost of S-Search as:

(1 + 2 + 3 + ... + N)/N = (N + 1)/2

and the worst-case cost of B-Search as:

T_N <= T_{N/2} + 1 for N >= 2, with T_1 = 1




I am going to ask a very simple question. By just looking at the code of
both searches, any man with brains can simply conclude that B-Search, on
a sorted input, is always cheaper as compared to S-Search.

It is not always cheaper, and in some cases it can be more expensive.

The first issue is the frequency with which items on the list are
searched for. If some items in a list are searched for a lot more
frequently than others, and the list is sorted in decreasing order by
frequency of use, it's quite possible for the linear search to be as
fast or even faster than a binary search. In order for this to be true,
the chance that a given item will be searched for has to be very high
for the first few items in the list, and that chance has to drop very
rapidly with each of the first several items on the list. Lists with
those characteristics are not commonplace, but neither are they as rare
as you might think.

The second issue is the length of the list - binary search is a little
more complicated to set up than a linear search, and each binary search
step is a bit more complicated than the corresponding linear search
step. It isn't much of a difference, but for sufficiently short lists
it's a difference that can make a linear search as fast as, if not
faster than, a binary search. How short is "sufficiently short"? That's
highly implementation-dependent. However, I'd never even dream of using
a binary search on a list with as few as 5 elements; the break-even
point is usually quite a bit higher than that.

The first case is really just an extension of the second. A list
where, most of the time, only the first few items have to be searched
through, is effectively very similar to a list that only has a few items.
 

Default User

Eric Sosman wrote:

Well, possibly -- I wouldn't know. But in the case at hand, it
seems more than likely that the O.P. has mis-transcribed "l" as "1",
don't you think? And I sort of wonder what other transcription
errors may have been introduced along with that one ...

That's not the case.



Brian
 

arnuld

I don't have the book, but this looks very un-C-like.
Ordinarily, one would expect element [0] of an array to hold something
useful, and might not expect element [r] to exist at all. Is this
offered as real C code, or just as the first stage in a transliteration
from a language with [1]-based arrays, possibly Fortran?

I guess Pascal. I got that from accu.org.


Binary Search:

int search(int a[], int v, int l, int r)
{
    while (r >= 1)
    {
        int m = (l + r)/2;
        if (v == a[m]) return m;
        if (v < a[m]) r = m - 1;
        else l = m + 1;
    }

    return -1;
}

The termination criterion looks completely wrong. Are
you sure you've transcribed the code correctly?

Well, the condition inside the while loop was typed incorrectly: it's
(r >= l), that is l (ell), not 1 (one). They are so confusing to me all
the time. If you look at the code I have posted in CLC and CLC++ in the
last 2 years, you will never see me using l as a variable name.
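With the condition as intended, the code reads:

int search(int a[], int v, int l, int r)
{
    while (r >= l)                /* l (ell), not the constant 1 */
    {
        int m = (l + r)/2;        /* l + (r - l)/2 would also avoid */
                                  /* overflow when l + r is huge    */
        if (v == a[m]) return m;
        if (v < a[m]) r = m - 1;
        else l = m + 1;
    }

    return -1;
}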



No: You can choose to learn things by rote and not bother
with understanding them. Carry your grimoire everywhere, and hope you
never encounter a situation that's just a little bit different,
something that isn't written down for you in exactly the form you
encounter it.
...SNIP....
Have you ever noticed that practical Quicksort implementations
often switch to insertion sort or something of the kind when the
sub-files get short enough? Why would they abandon the O(N*logN)
partition-exchange method in favor of an O(N*N) method? Are they just
being perverse -- or is there something you've failed to notice?

Yes, I failed to notice that. It's a new insight you have given me; I
could never have deduced it myself.


It's your choice: memorize, or comprehend. Each approach has
its advantages and disadvantages.

Well, I am willing to buy a very basic book ("Higher Algebra" by Hall &
Knight) and learn Maths. Trouble is, I still don't get how math is
connected with finding which kind of loop in C will take more CPU time,
or whether algebra fits in finding which piece of C code will make
efficient use of the CPU; and then where do I need to start to
understand? C^2, C_{2^n + 1}, or log N?

Math always seemed incomprehensible to me. It looks like Newton's Law of
Gravitation to me. Before Newton discovered the law of gravitation,
apples still fell to the ground; gravitation existed before Newton saw
the apple falling. I feel the same about math: the fundamentals that
math explains exist without math. I am not arguing, I am just trying to
understand.
 

arnuld

Math always seemed incomprehensible to me. It looks like Newton's Law of
Gravitation to me. Before Newton discovered the law of gravitation,
apples still fell to the ground; gravitation existed before Newton saw
the apple falling. I feel the same about math: the fundamentals that
math explains exist without math. I am not arguing, I am just trying to
understand.


What I exactly meant is: why shouldn't I understand those fundamentals
directly, the ones which exist before Math? We are writing code, C or
Lisp or whatever; we are doing programming, which is a different skill
from Maths.
 

Keith Thompson

Richard Heathfield said:
1, 2, 3, 4, 5, 6, 7, 8, 9, ..... 1048573, 1048574, 1048575.

To find the key 1 via binary search will take 20 comparisons, whereas
it will only take 1 comparison via sequential search.

Nevertheless, *on average* binary search is much, much quicker. The
average sequential search will take half a million comparisons for
the above list, whereas a binary search on the above list (assuming
it is a straightforward sorted linear list, that is) will never take
more than 20 comparisons.

The average *successful* search will take half a million comparisons.
An unsuccessful search will take a million. The number of comparisons
required for an average search depends on how many of them are
successful (and on the distribution of the values successfully
searched for).
 

Phil Carmody

arnuld said:
This code is from chapter 2 of Algorithms in C:

Sequential Search:

int search(int a[], int v, int l, int r)
{
    int i;
    for (i = 1; i <= r; i++)

Don't learn Algorithms in C.

Learn algorithms. Implement them in C later.

Don't learn Numerical Recipes in C either.

Phil
 

James Kuyper

arnuld wrote:
....
Well, I am willing to buy a very basic book ("Higher Algebra" by Hall &
Knight) and learn Maths. Trouble is, I still don't get how math is
connected with finding which kind of loop in C will take more CPU time,
or whether algebra fits in finding which piece of C code will make
efficient use of the CPU; and then where do I need to start to
understand? C^2, C_{2^n + 1}, or log N?

It won't help you find that loop; it will help you identify it as such,
once you've found it. Looking for an apple, without knowing how to
identify it as an apple when you find it, is pretty difficult; the same
is true of efficient algorithms.
Math always seemed incomprehensible to me. It looks like Newton's Law of
Gravitation to me. Before Newton discovered the law of gravitation,
apples still fell to the ground; gravitation existed before Newton saw
the apple falling. I feel the same about math: the fundamentals that
math explains exist without math. I am not arguing, I am just trying to
understand.

Math provides a powerful language for describing how the apple falls; it
is that language that allows an understanding of gravity that is
sufficiently detailed and precise to allow the design of complicated
systems that either depend upon or have to cope with the effects of
gravity.

Math provides an equally powerful language for describing the complexity
of algorithms; you'll never get really good at managing the complexity
of the algorithms you design without having mastered enough of that
language to describe them.

If you can't master that language (and I've known many people who had a
lot of trouble with it), you will thereby restrict your career path to
those roles where mastery of that language is not required. Those are
generally lower paid and less interesting than the ones where it is
required.
 

arnuld

Er, quite so. I should have pointed that out. Thank you.

By that, do you mean the worst-case successful search (found on the 20th
comparison) or a successful search which is not the worst case (e.g.
found on the 10th, 15th or even 3rd comparison)?
 

arnuld

... SNIP...
If you can't master that language (and I've known many people who had a
lot of trouble with it), you will thereby restrict your career path to
those roles where mastery of that language is not required.
Those are generally lower paid

That is not a problem.

and less interesting than the ones where it is required.

This is a very big problem. I was contemplating Knuth's volume 1 vs.
Hall & Knight. I will go with Hall and Knight.
 

arnuld

..SNIP...
You rightly say that the fundamentals that mathematics explains exist
without mathematics. Of course they do. But mathematics has proved to be
a tremendously useful explanatory tool. And, by facilitating a clear
visualisation of those fundamentals, it can offer fresh insights.

Actually, I have a "fear of Maths", fear because I think I am incapable
of having a prodigious brain like Knuth or Sawyer or Tesla. I am just an
average guy with an average brain. Algebra and integration take the life
out of me.


NOTE: But you, Heathfield, once said, while replying to one of my posts,
that anyone who can write quality code definitely has more than average
intelligence.


Now, you say in your original article that anyone with brains can
conclude that binary search on a sorted input (by which I take it you
mean a sorted list) is always cheaper than a sequential search. Well,
nice try but it isn't true. Here's the sorted list:

1, 2, 3, 4, 5, 6, 7, 8, 9, ..... 1048573, 1048574, 1048575.

To find the key 1 via binary search will take 20 comparisons, whereas it
will only take 1 comparison via sequential search.

Gosh!.. Why did such a simple example not come to *my* mind?
 

Nick Keighley

arnuld said:
This code is from chapter 2 of Algorithms in C:
Sequential Search:
int search(int a[], int v, int l, int r)
{
    int i;
    for (i = 1; i <= r; i++)

Don't learn Algorithms in C.

Learn algorithms. Implement them in C later.

yes but some code is nice. I own the corresponding
"Algorithms In Pascal" but I never bothered getting the C version.
Don't learn Numerical Recipes in C either.

another CORTRAN (C being Fortran) book
 

Eric Sosman

arnuld said:
arnuld wrote:
[...]
So should I go with reading all the maths or just go to some random
chapter and start working on an algorithm in C ?
It's your choice: memorize, or comprehend. Each approach has
its advantages and disadvantages.

Well, I am willing to buy a very basic book ("Higher Algebra" by Hall &
Knight) and learn Maths. Trouble is, I still don't get how math is
connected with finding which kind of loop in C will take more CPU time,
or whether algebra fits in finding which piece of C code will make
efficient use of the CPU; and then where do I need to start to
understand? C^2, C_{2^n + 1}, or log N?

Most of the analysis you'll need to get along as a programmer
won't be "higher," although some extremely deep questions occur
now and then. For example, the simple-seeming problem of how to
compute X to the N'th power for arbitrary large N is not (as far
as I know) fully solved. For "practical" N, though, there's no
difficulty -- even with only "lower" math you could always write
yourself a program to search out the shortest path to a given
non-huge N. (We'll suppose X is a big matrix or something, where
multiplications cost enough to make the search worthwhile.)
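For everyday N, the familiar square-and-multiply scheme gets
within a factor of two of the best possible, even though it is
not always optimal -- a sketch:

/* Compute x to the n'th power in O(log n) multiplications by
 * scanning the bits of n.  Not always the shortest addition
 * chain -- that is the hard, still-open problem -- but close
 * enough for most purposes. */
double power(double x, unsigned n)
{
    double result = 1.0;
    while (n > 0) {
        if (n & 1) result *= x;   /* low bit set: fold x in */
        x *= x;                   /* square for the next bit */
        n >>= 1;
    }
    return result;
}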

Recursions of the kind that you illustrated earlier arise
just about every time you turn around, because divide-and-conquer
is such a common solution technique. You wind up with a program
that solves a big problem by dividing it into smaller sub-problems,
solving them (perhaps by dividing them even further), and somehow
stitching the individual solutions together into a Grand Answer.
Merge sorting, many kinds of search procedures, some kinds of
image manipulation ... You'll run into the divide-and-conquer
approach very very often, and the answer is *not* always just
"O(N log N)," and you'd better have some understanding of why.
(True story: I once found an O(N*N*logN) sort in actual use in
production code -- yes, folks, *worse* than Bubblesort! It turned
out that the sort routine itself was O(N logN) as one would expect,
but silly programmer had given it a comparison routine that took
O(N) time *per* *call* ...) As the programmer writing the call
to that searching routine or sorting routine, you'd better have
at least some notion that an O(N) comparison would be a Bad Thing.
Math always seemed incomprehensible to me. It looks like Newton's Law of
Gravitation to me. Before Newton discovered the law of gravitation,
apples still fell to the ground; gravitation existed before Newton saw
the apple falling. I feel the same about math: the fundamentals that
math explains exist without math. I am not arguing, I am just trying to
understand.

One use of math is to improve your models of phenomena. You
start with "Oh, that apple fell" and later "Oh, another apple
fell" and after a while you generalize to "Apples fall," which is
your model of the world. You've made progress, because you can
now predict the behavior of apples everywhere without needing to
go watch each one as it splats. (Whether your predictions are
accurate is another matter having to do with how effective your
model is; at the moment, we're just concerned with the modeling
process.)

Okay, so you've now formed a description that applies (you
hope) to all apples: They fall. But are there other questions
you might want answered? How about "How hard does the apple
land?" which might be of interest if you were thinking about
having a little sit-down beneath the tree, and wondering whether
your little sit-down might become your eternal lie-down. You
could experiment, placing friends under apple trees of various
heights and observing who survived and who didn't, and from the
data you might come up with ranges of tree heights labeled
"Safe," "Iffy," and "Send Flowers."

But if you haven't got enough friends willing to risk their
crania for the sake of your experiments, you're going to need
another way to figure out how hard the apple hits. And that's
where the mathematical model enters the picture. Newton is not
immortal because he saw that apples fall (if indeed he did; the
story may be apocryphal), but because he invented a mathematical
model that described their falling. Thanks to Newton, you now
need only two numbers to predict how hard an apple hits: Its
mass, and the height it drops from. Now, you can assess the
safety of any particular apple tree without sacrificing a whole
bunch of ex-friends first. (Again, we focus on the making of
the model and not on its accuracy: Newton's predictions would
be pretty badly in error for an apple tree ten miles tall,
since air resistance would come into play. But models are like
that: Their predictions are "good enough" over "practical ranges"
and the models keep on working until greater accuracy and/or
wider ranges are needed.)

Okay; that's my idea of one reason why mathematics can be a
useful description of a natural phenomenon, even though the
phenomenon occurs with no mathematicians in the vicinity. Take
it or leave it. But I do not think you'll make it as a programmer
unless you have at least *some* grasp of mathematics. You will
*need* to be able to count sets and combinations of various kinds.
You will *need* to make choices based on probabilistic arguments.
You may or may not ever need to calculate a contour integral,
but there's a nugget of mathematics you just can't do without.
 

Eric Sosman

Keith said:
The average *successful* search will take half a million comparisons.
An unsuccessful search will take a million. The number of comparisons
required for an average search depends on how many of them are
successful (and on the distribution of the values successfully
searched for).

Since the array being searched is sorted, the average for
an unsuccessful search is also about half a million.

Actually, given the array as shown both the successful
and unsuccessful searches are O(1):

if (x < 1 || x > 1048575) return NOT_FOUND;
return array[x-1];

:) (Nor is this an entirely whimsical quibble: Consider the
code typically generated for a `switch' with a "dense" set of
`case' labels ...)
 

Eric Sosman

arnuld said:
By that, do you mean the worst-case successful search (found on the 20th
comparison) or a successful search which is not the worst case (e.g.
found on the 10th, 15th or even 3rd comparison)?

Aha! Here's a perfect example of the kind of "Why
mathematics matters" issue you asked about earlier. No
search, successful or unsuccessful, will take more than
twenty comparisons. Some "lucky" successful searches
will take fewer, by happening to hit on the desired key
while the "active" interval is still relatively wide.
One particularly lucky successful search will find its
key on the very first comparison! How does the "luck
factor" skew the average? Is the effect significant, or
is it negligible? *That's* the kind of question that faces
a programmer often and often again. Care to have a go?


*** S P O I L E R   S P A C E ***


One way to tackle the problem is to set up one of those
recursions you seem to dread. Let's work "forward," following
the comparisons as they're made. The very first comparison in
an array of N elements lands on the desired key with probability
1/N. Otherwise (with probability 1-1/N) it misses, but leaves
you with only (N-1)/2 array locations still to be searched.
Letting C(N) be the average number of comparisons to find a
key in an array of size N (we're assuming a successful search),
we've got

C(N) = 1 * (1/N) + (1 + C((N-1)/2)) * (1 - 1/N)

... and if you can solve this recursion, you're all set. (The
decision of whether to start with C(1)=0 or C(1)=1 depends on
how sure you are that the key is actually present.)

By the way, the recursion I've shown is not quite right,
as it glosses over one crucial detail. Can you spot it? (Hint:
is N even or odd?)
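If you'd rather check the answer numerically than solve the
recursion by hand, a brute-force counter does the job. This
sketch counts one probe (one loop iteration) per array slot
inspected, searching for every key in turn:

#include <stdio.h>

/* The array is implicitly 0, 1, ..., n-1, so the value stored
 * at index m is m itself and we can compare indices directly. */
static int probes(int n, int target)
{
    int l = 0, r = n - 1, count = 0;
    while (r >= l) {
        int m = l + (r - l) / 2;
        count++;
        if (target == m) break;
        if (target < m) r = m - 1;
        else            l = m + 1;
    }
    return count;
}

int main(void)
{
    int n = 1048575, i;               /* 2^20 - 1, as in the example */
    double total = 0.0;
    for (i = 0; i < n; i++)
        total += probes(n, i);
    printf("average probes = %.5f\n", total / n);
    return 0;
}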

Another way to attack the problem is to visualize the search
as a binary tree. The root node is the first array slot you
compare against. The left sub-tree is rooted at the slot you
check second if the original comparison comes out <, and the right
sub-tree starts at the second-compared slot if the first comparison
comes out >, and so on through the third, ..., twentieth comparison.
There are N nodes in the tree altogether. The number of comparisons
you make to arrive at any particular node K is one plus K's distance
from the root of the tree. So the average number of comparisons
is the tree's "internal path length," divided by N, plus one. If
you know something about the mathematics of trees, you are once
again all set.
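(If you work it out for a perfectly balanced tree on N = 2^20 - 1
nodes, the internal path length gives an average of almost exactly
19 probes against a worst case of 20 -- so the "luck factor" is
real, but it buys you only about one comparison.)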
 

Keith Thompson

Eric Sosman said:
Since the array being searched is sorted, the average for
an unsuccessful search is also about half a million.

Actually, given the array as shown both the successful
and unsuccessful searches are O(1):
[snip]

Conclusion: The cost of a search depends critically on the
assumptions you can make about both the key and the contents of
the array.

Another case we both missed: If we know the array is sorted but
we can't assume the values are contiguous (i.e., we really have to
search), then a search for an element outside the range of stored
values can be O(1); if the stored values range from 1 to 1 million,
it doesn't take long to determine that 1 billion doesn't appear.
On the other hand, pre-checking the upper bound imposes a non-zero
cost on *all* searches, which might not be worth it; we might save
time overall by searching all 1 million elements for the value 1
billion if it makes most searches very slightly faster.

Even linear searches can be non-trivial.
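The pre-check itself is only a couple of lines. A sketch, assuming
a sorted array a[0..n-1]:

    /* O(1) rejection of keys outside the stored range, at the
     * cost of two extra comparisons on every search */
    if (v < a[0] || v > a[n - 1])
        return -1;                /* cannot be present */

Whether those two comparisons pay for themselves depends entirely
on the mix of queries you expect.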
 
