Implementing strstr

blmblm · Mar 29, 2010

Well hoo fucking ray: but it's bad style to /* FALLTHROUGH */, Alice
in Wonderland. It happens to be a basic violation of structured
programming rules, and it's rarely necessary. The resulting switch
cannot with fallthrough be recoded as if then else, and switch isn't
one of the Bohm Jacopini primitives.

Depends where the fallthrough occurs, no?

switch (i) {
    case 0:
    case 1: printf("%d is 0 or 1\n", i); break;
    case 2:
    case 3: printf("%d is 2 or 3\n", i); break;
    default: printf("%d is not in 0..3\n", i);
}

can be recoded as

if ((i == 0) || (i == 1)) {
    printf("%d is 0 or 1\n", i);
}
else if ((i == 2) || (i == 3)) {
    printf("%d is 2 or 3\n", i);
}
else {
    printf("%d is not in 0..3\n", i);
}

and in fact the "switch" version is, as far as I know, a fairly
standard idiom -- as I think Seebs pointed out.

(Aside to Seebs: I was going to include /* FALLTHROUGH */ comments
but realized I wasn't sure where they should go. ? )

blmblm · Mar 29, 2010

So why are there cases for them in the first case?

Just my nose, Dweebach.

You know, I looked briefly at the code in question, in

http://github.com/wrpseudo/pseudo/blob/master/pseudo.c

and to me it seems not to be doing either a mkdir *or* a mknod,
but performing processing common to two cases identified as
MKNOD and MKDIR. That there would be processing common to
"make a directory (mkdir)" and "make a special file (mknod)" --
it doesn't strike me as far-fetched or bad design. <shrug>

I'm quite aware of your convention of coding explicit cases for things
you never expect with fallthrough to an error. It seems designed to
mislead.

And you haven't answered the question. Why does a mkdir do a mknod?

I'm not sure it does, but perhaps the author of the code can clarify.

You're NOT expressing things "analaguous to" (it's "analogous") to the
logical or above. Anyone who thinks that doesn't know C. This is
because there's usually no code associated with the Or statement
"cases" UNLESS they are in the form

x == VALUE_ONE ? ( <code>, TRUE) : FALSE

The only "analogy" is that switch like || is lazy, evaluating all
cases down to the first true case.

Huh? I don't understand what point you're making here.

[Parenthetically, one wonders what a pseudo root is for.

Same thing as fakeroot -- to let you create things like package archives
or disk images which reflect specified ownership, permissions, or device
nodes, without requiring root privileges on the machine where you create
them.

If this was useful, it exists coded by someone competent.

That's because you don't understand the semantics of switch().

What ev er. I've written compilers for languages with switch(). You
haven't.

Pretty much.

What error? There is no possible error. by_ino and by_path are not
pointers, they're objects. They are initialized at the top of the
function. If data of either type have been found, we copy the best
fit into db_header.

A competent programmer would know that under two circumstances, the
"data" DOESN'T HAVE to be of "either" type. Either an external
error or a modification to the surrounding code would falsify this,
and as far as I can see you failed to plan for this.

The usage, later, is:

if (found_ino || found_path) {
    *msg = db_header;
} else {
    msg->result = RESULT_FAIL;
}

Which is to say, IF we found either of them, we copy its data in, otherwise,
the operation has failed.

We're not talking about the later code. The upper code leaves
db_header with an undetermined value in the formal sense.

Good guess.

I see no reason to assign it a null value, when I am guaranteed that
I will always assign it one or another of two values, both of which
are necessarily initialized.

But: you don't.

No, there's at least two others. There's fakeroot (which we used to
use) and fakeroot-ng. Both are, for various reasons, unsuitable to
our purposes.

Yes.

/* warning: GNU getopt permutes arguments, which is just plain
* wrong. The + suppresses this annoying behavior, but may not
* be compatible with sane option libraries.
*/

The issue here is as follows:

1. GNU's implementation of getopt() defaults to reordering arguments,
which violates both the POSIX spec and reasonable (IMHO, but also
in the HO of many other perhaps better-qualified people) expectations.
2. This program is necessarily run on systems where the system getopt()
is GNU getopt().
3. To suppress the undesired behavior, I use the GNU getopt() extension
which removes this behavior.
4. Because a future porter might migrate this code to a system wherein
this extension doesn't exist (and isn't needed), I explain what
extension I'm using and why I'm using it.

I stand by the comment and implementation choice at this time.

(Note that getopt() is a POSIX extension, not a standard C feature, so
this is only marginally topical; however, I think the general question of
what to comment on when different implementations respond to a spec
differently is worth talking about.)
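
For readers who haven't met the extension, here is a minimal sketch of the '+' prefix in use. The helper name parse_leading_options and the options -a and -b are made up for illustration and are not taken from pseudo. It assumes a getopt() that either honors the GNU '+' prefix or never permutes in the first place.

```c
#define _POSIX_C_SOURCE 200809L /* ask for getopt() under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not taken from pseudo: parse the leading options
 * in argv and return the index of the first non-option argument, or -1
 * on an unrecognized option.
 * The leading '+' in the option string is the GNU extension described
 * above; it asks GNU getopt() to stop at the first non-option argument
 * instead of permuting argv.
 */
int
parse_leading_options(int argc, char **argv, int *saw_a, int *saw_b) {
    int c;

    assert(argv != NULL && saw_a != NULL && saw_b != NULL);
    *saw_a = *saw_b = 0;
    optind = 1;     /* start scanning at argv[1] */
    opterr = 0;     /* keep getopt() quiet about unknown options */
    while ((c = getopt(argc, argv, "+ab")) != -1) {
        switch (c) {
        case 'a':
            *saw_a = 1;
            break;
        case 'b':
            *saw_b = 1;
            break;
        case '?':
        default:
            /* '?' is an unknown option; default catches an option that
             * was added to the option string but not to this switch.
             */
            return -1;
        }
    }
    return optind;
}
```

Given the arguments "prog -a file -b", scanning should stop at "file", so -b is left alone for whatever command follows.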

There's no default handler because the function's spec is defined very
clearly to never produce values other than the option characters specified,
a question mark, or -1. Since -1 was handled above, ? is the only other
possibility. I suppose I should have a default: case for the possible
case where a new option is added to the option string but not to the switch
statement, but it hasn't yet come up.

After two months we would expect you to have one, Dweebach.

Great programmers anticipate even endogenous errors, meaning that they
code so that any statement that is not changed in a future release has
a high probability of still working when other statements are changed!

Now this strikes me as a reasonable criticism. Seebs?

blmblm · Mar 29, 2010

Maybe in the general population. Among those interested in programming,
perhaps not?

(Gag reflex) (Eye roll) (Crotch grab) (Tongue waggle)

You'd never get an entry level job with Mies van Der Rohe. God, he
said, is in the details. He'd fire you if you turned in a sloppy
drawing for a cornice in one of his buildings. He'd fire you for cause
if you told him that "but Mies, poor widdle me gots ADHD and you must
give me high level tasks so me isn't bored!"

Hm. What you apparently perceive as a plea for sympathy or
special treatment I perceive as a simple statement of fact.
That probably shouldn't surprise me, though, given that I also
perceived the long-ago statement about not having taken any CS
classes as a simple statement of fact, while you have repeatedly
called it a boast.

[ snip ]

blmblm · Mar 29, 2010

[ snip ]

Yes, since any time you write checking code you are risking more bugs.
The final approval of a computer system cannot be the computer itself.

I think there are tradeoffs in any such situation -- is it easier
and/or more reliable to write and debug the checking code, or to
rely on human ability to compare expected with actual output?
In this particular case it may be the latter. I think with your
string-replacement code it would have been worth the trouble to
write checking code (and in fact I did).

Instead, because computers were invented for both an evil purpose (in
order to slaughter the people of Hiroshima and Nagasaki by calculating
what would be needed at Alamogordo) and a good purpose (to ease the
burden of calculation on people), the good purpose would demand that
they not be left to run by themselves and check their own output.

It takes in fact seconds to read the output and approve it. I find
your comments offensive since I don't see this level of "regression
testing" anywhere else in this intellectual slum and looney bin,

It just seems to me that if you're going to go to as much
trouble as you already have, it might make sense to go one more
step. I will readily admit that when I write throwaway code I
often don't do even as good a job as you've done here of writing
self-documenting tests.

and I
suspect that you, like many academic and corporate females, are trying
to normalize deviance with sweet talk.

I have no idea what this means -- but that's probably just as well,
since if I understood it I suspect I'd be offended.

Well, now that I think about it, I suppose that's not as bad as
it might be, since it would be easy enough -- in my preferred
development environment anyway [*] -- to copy the expected output
into a text file, capture actual output in another text file,
and have the computer compare the two.

Get Dweebach to write you the script.

Hm, what parts of this are scriptable, and would I need help
writing scripts for the parts that are [*] ....

The "capture output and compare" part doesn't really seem worth
the trouble of scripting, but it would be easy enough.
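
For what it's worth, the compare step is only a few lines even without a script. A rough sketch in C, where the helper name first_difference is invented for this example and both outputs are assumed to be available as open text streams:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: compare two open text streams byte by byte,
 * starting from their current positions.
 * Returns 0 if they are identical, otherwise the 1-based number of the
 * line on which the first difference occurs.
 */
long
first_difference(FILE *expected, FILE *actual) {
    long line = 1;
    int e, a;

    assert(expected != NULL && actual != NULL);
    for (;;) {
        e = getc(expected);
        a = getc(actual);
        if (e != a)
            return line;    /* also covers one stream ending early */
        if (e == EOF)
            return 0;       /* both ended together with no mismatch */
        if (e == '\n')
            ++line;
    }
}
```

Reporting a line number rather than a plain yes/no keeps a human in the loop: the reader still looks at the offending line, but does not have to find it.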

The "copy expected output into a text file" part .... well,
now, I suppose it might be interesting to think a little about
the feasibility of automating/scripting that, beyond what would
be easy to do in a text editor. Or not.

[*] For short C programs -- text-based tools under Linux.

Including bash scripts. So I might not need help.

blmblm · Mar 29, 2010

spinoza1111 said:

In the replace() program of last month's flame festival, a little
program was trying to get out. Here it is: an implementation of strstr
including a call that returns the offset of the found substring. Two
hours including all comments and dedicatory ode, written for this
occasion.

If I were going to put as much effort into writing comments as you
appear to have done with this program, I would explicitly discuss
the function's parameters and return value -- i.e., I would say
how the return value depends on the function's inputs and also
describe any side effects [*].

I wuv it: you see some effort, so you suggest something less useful as
an outlet for my energies. No, dear heart, a thousand times no.

"Wuv"? "Dear heart"? Feh. Well, let that go ....

I disagree. Less entertaining for you [*], perhaps, but less useful?

[*] And perhaps for some readers. Your verse doesn't appeal to me,
but others might enjoy it.

(Before I go on -- I should probably apologize just a bit for the
admonitory tone of the previous post. On review it comes across
as lecturing in a way that I'm not sure I intended.)

I teach poetry, based on the Norton Anthology and John Lennard's The
Poetry Handbook. I note in my classes that there ain't no such thing
as free verse. Instead, a poet like TS Eliot and Ginsberg uses old
forms, and new forms, in new ways instead of trying to meet the
requirements of a traditional form.

In so doing, the poet sets up certain expectations which he either
meets or disappoints in the reader's first reading of the poem. When
he disappoints them, this means something. For example, Lennard shows
how a poem starts out like a sonnet and contains fourteen lines: but
some lines are fragmentary like a bombed city, since the sonnet is
about a Holocaust victim in a time when Adorno said we can't write any
more goddamn poetry.

Here, the comment block sets up the expectation that Nilges has
written some code and is gonna tell us, overall, what the code does.
So I do. Next, some but by no means all programs in the old days (when
programmers were more literate than today) would often contain a quote
AFTER the intro and BEFORE the code, in the same relative position a
dedicatory quote appears in a book, such as Shakespeare's First Folio,
which contains some dedicatory material AFTER the title page and
BEFORE the first play (which is for some silly reason the last
complete play of Shakespeare, The Tempest).

The reader is then either amused (if civilized and urbane) or outraged
(if like many programmers) but goes on to find, yes, functions without
my usual custom, which is to set up a visual fence as a line comment,
followed by the function name, followed by a statement as to what it
does, and yes indeedy, side effects...although great programmers make
fewer side effects.

But here, no such comments exist because

(1) If the code is small, let it document itself

(2) I would like to give the aliterate programmer-reader a break after
having him eyeball the ode.

Well, I will admit that I didn't read the part of the comments that
were in verse; I find your verse off-putting and so usually skip it.

No competent programmer expects commenting to tell her what the code
does.

I don't agree. In my opinion, a careful and competent programmer
documents his or her code in such a way that another programmer
can learn *what* the code does (in terms of how outputs and side
effects relate to inputs) without reading the code to find out
*how* it does that. This is what makes multi-person programming
projects possi .... Oh wait, you don't approve of those, do you?
Well, I'd claim it's still helpful if the code is to be modular
and reusable -- if you want to use the code again a year from
now, do you really want to read it again .... Oh, maybe you do.

Then again, where does one stop with reading the code rather than
hoping to find out what it does from its comments? Can one use
library functions without reading *their* source code? How about
other tools?
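
To make the "what, not how" point concrete, here is one way such a header comment might look for a search function that also reports an offset. This is only a sketch: the name find_with_offset and its interface are invented for illustration, it is not the strstrWithIndex code under discussion, and it simply delegates to the library strstr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* find_with_offset (hypothetical name and interface)
 *
 * Searches haystack for the first occurrence of needle.
 *
 * Parameters:
 *   haystack  string to search; must not be NULL
 *   needle    string to look for; must not be NULL; an empty needle
 *             matches at offset 0
 *   offset    if not NULL and a match is found, *offset receives the
 *             position of the match, counted from the start of haystack
 *
 * Returns: a pointer to the start of the match within haystack, or
 *          NULL if needle does not occur.
 * Side effects: writes *offset on success only; otherwise none.
 */
char *
find_with_offset(const char *haystack, const char *needle, size_t *offset) {
    char *hit;

    assert(haystack != NULL && needle != NULL);
    hit = strstr(haystack, needle);
    if (hit != NULL && offset != NULL)
        *offset = (size_t) (hit - haystack);
    return hit;
}
```

A caller can tell from the comment alone that *offset is left untouched when nothing is found, which is exactly the kind of detail that otherwise requires reading the body.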

And simply reading the code would show you that the ptrIndex
parameter is needed when we want to get the offset of the string
without an extra calculation. Had you troubled to read the code, which
we must as programmers do whether or not there are comments, you would
find that strstrWithIndex and strstr are in a polymorphic relation to
each other in OO terms.

How is this polymorphism?

[*] The only case in which I would omit such discussion would be
when the names of the function and/or parameters are so descriptive
as to make discussion superfluous. In that regard, I find the
names used in the man-page documentation of strstr ("needle" and
"haystack") more descriptive if less formal than your "target"
and "master".

You've never taught English in China. It's offensive to people with
good English as a second language to use idioms in the cutesy way of
unix man pages.

Is it really ("offensive") ....

Well, I'll agree that English prose meant for a wide audience that
might include non-native speakers should probably avoid idioms
that might be unfamiliar to some of the audience. I hadn't really
thought about that, and it's a good point.

[ snip ]

blmblm · Mar 29, 2010

It was both, because Seebach (because he didn't study computer
science) has no commitment to the Copernican revolution of structured
programming, and he thinks it's cute to fall through like Alice in the
Rabbit hole. Most competent instructors would lower his grade for
using this "feature".

I would not. Whether this makes me an exception to the rule or marks
me as not competent -- <shrug>.

Now, I might suggest to a student who turned in the code above that it
would probably be a good idea to document, at the point at which ACK
and NAK are treated as errors, *why* they're errors.

My first published work on programming was written because on the IBM
mainframe in 1976 I was able only to use unstructured cobol and
assembler, but in these languages I found that programs with a
"structured flowchart" restricted to the Bohm/Jacopini primitives were
far more reliable than unstructured programs. I would have preferred
to use C at the time because it was structured therefore I have
nothing but the utmost contempt for programmers who think it's cute to
use fallthrough, especially when those clowns go around calling people
morons and insane, you dig me?

There are, are there not, TWO WAYS to support what used to be called
"open subroutines" in C: preprocessor macros and inline. Therefore, if
the "overhead" of subroutine calling an error handler is too much
(despite the fact that if an error has occurred, "efficiency" is
usually not as important as handling the goddamn error), you have TWO
WAYS to call the error handler inside the error case, avoiding
fallthrough, which creates a new control structure, an unnecessary and
confusing control structure.

I don't agree that fallthrough as used in the code in question
is confusing, nor that packaging the code to be executed as a
macro or inline function would be clearer.

Ben Bacarisse · Mar 29, 2010

spinoza1111 <[email protected]> wrote:

The following code, from Peter's "pseudo root simulator", is submitted
to this discussion group [...]

int
pseudo_server_response(pseudo_msg_t *msg, const char *tag) {
    switch (msg->type) {
    case PSEUDO_MSG_PING:
        msg->result = RESULT_SUCCEED;
        if (opt_l)
            pdb_log_msg(SEVERITY_INFO, msg, tag, "ping");
        return 0;
        break;
    case PSEUDO_MSG_OP:
        return pseudo_op(msg, tag);
        break;
    case PSEUDO_MSG_ACK:
    case PSEUDO_MSG_NAK:
    default:
        pdb_log_msg(SEVERITY_WARN, msg, tag, "invalid message");
        return 1;
    }
}

It was both, because Seebach (because he didn't study computer
science) has no commitment to the Copernican revolution of structured
programming, and he thinks it's cute to fall through like Alice in the
Rabbit hole. Most competent instructors would lower his grade for
using this "feature".

I would not. Whether this makes me an exception to the rule or marks
me as not competent -- <shrug>.

No, neither would I, and there is a significant technical issue here.

The code above does not "fall through" in the way the term is usually
used -- it just labels some code with more than one case label.
This is simply what you do when more than one case should be handled
in the same way and there is nothing unstructured about it. Some
languages permit the multiple cases to be abbreviated, but that does
not alter the structure.

What is (at first glance) a little odd is that two explicit cases have
been added to a default case. One has to ask if the added
documentary value in making these explicit is worthwhile. I'd say
that it is -- particularly if all switch statements that switch on the
message type explicitly include these four cases.
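
The distinction can be checked mechanically. In the sketch below the message types are hypothetical stand-ins loosely modeled on the quoted switch, and the function names are invented for the example. The first function stacks several labels on one block, and the second expresses the same decision as a plain if/else.

```c
#include <assert.h>

/* Hypothetical message types, loosely modeled on the quoted switch. */
enum msg_type { MSG_PING, MSG_OP, MSG_ACK, MSG_NAK };

/* Several labels on one block: no statement sits between the labels,
 * so no code runs on the way from one label to the next.
 */
int
is_invalid_switch(int type) {
    switch (type) {
    case MSG_PING:
    case MSG_OP:
        return 0;
    case MSG_ACK:
    case MSG_NAK:
    default:
        return 1;
    }
}

/* The same decision written as a plain if/else. */
int
is_invalid_if(int type) {
    if (type == MSG_PING || type == MSG_OP)
        return 0;
    else
        return 1;
}

/* Both versions agree on every defined type and on values just
 * outside the defined range.
 */
void
check_equivalence(void) {
    int t;

    for (t = -1; t <= MSG_NAK + 1; ++t)
        assert(is_invalid_switch(t) == is_invalid_if(t));
}
```

The explicit ACK and NAK labels change nothing about the control flow; they only record that those two values were considered and deliberately treated as invalid.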

<snip>

Seebs · Mar 29, 2010

// * Arduously and ignobly like unto the meanest Hind *
// * That knoweth not his Elbow from his Behind. *
printf("Expect '0': %c\n", *strstr("0123456789", "0")); [etc.]

This is quite possibly the worst C code
I have ever had the misfortune of seeing.

And unless that's intended as a test of the library's code, there's not
even any guarantee that it's doing what it's supposed to do, because
strstr() is in implementation namespace.

But yeah, his code seems to be like that. Thus far, his big complaint about
my code has been that he doesn't know how switch() works and he can't tell
a pointer from a plain object.

-s

Seebs · Mar 29, 2010

Then you're missing lots and lots of nasty slurs .... I gather
that this doesn't bother you, though,

Honestly, it does. I really feel like I'm missing some prime humor here,
but it's really hard to read his hilarious nonsense without feeling compelled
to try to correct some reasonable subset of the technical errors, whereupon
everyone yells at me for wasting bandwidth.

I'm trying to
only quote when I think there's a technical point that might be of
interest to other readers.

Well, his "technical criticisms" of pseudo have indeed been fascinating to
me. I admit that his technical ability surprised me; I thought I'd
established a reasonable working lower bound for him, but I was wrong again.

-s

Seebs · Mar 29, 2010

Depends where the fallthrough occurs, no?

Furthermore, why on earth would I care?

I do actually have at least one case where a fallthrough is used to
allow two cases in a switch to share code with several other cases.
(Note that this can be handled by a nested if.)

switch (x) {
case 0:
    printf("roly poly ");
    /* FALLTHROUGH */
case 1: case 2:
    printf("fish heads\n");
    break;
default:
    printf("sorry, can't think of any other songs.\n");
    break;
}

===

if (x == 0 || x == 1 || x == 2) {
    if (x == 0) {
        printf("roly poly ");
    }
    printf("fish heads\n");
} else {
    printf("sorry, can't think of any other songs.\n");
}

But who CARES? I'm not here to comply with an arbitrary set of abstract
"rules of structured programming". I'm here to write clear, legible, C,
conforming to the conventions and idioms of that language.

(Aside to Seebs: I was going to include /* FALLTHROUGH */ comments
but realized I wasn't sure where they should go. ? )

There isn't a completely general convention. Usually, they go precisely
where the "break;" would have gone if there were no fallthrough, except
that sometimes they're put on the case labels rather than at the end of
the material under the case.

This criticism is particularly illucid, simply because the original complaint
given was that I should not have called an error routine for those cases.

But in fact, I did exactly that -- I wrote code to call an error routine
if those ever occur.

It feels like you're dumber than I could possibly imagine. The key
distinction, I think, being that my criticisms of Schildt are actually
correct, whereas your criticisms of my code so far show an astounding
lack of comprehension of basic C.

Up to a point, yes. There is a reason to have high-level documentation,
though, which is that no amount of "readable" makes up for the reader
not understanding the basic goal of the program.

Colorless green ideas sleep furiously!

But that's not client code.

Come on, really. What function had the disputed MSG_ACK and MSG_NAK?

"pseudo_server_response()"

I do not think it is reasonable for anyone to suspect that this is
"client code".

As to whether I should put in cases for an "invalid" condition: One of the
reasons pseudo exists is that other tools we'd used were not robust enough;
they could crash or otherwise fail. As such, pseudo is full of tests for
conditions that are obviously impossible *if the rest of the code is free
of bugs*. What this means is that even when there have been bugs (and there
have), the program has continued to work, with a grand total of one reported
failure in the field.

At one point during internal testing, I had a bug in the server which caused
it to crash under fairly common circumstances. It went unnoticed for a
week or two because the recovery code kept things running smoothly anyway.

-s

Seebs · Mar 29, 2010

You know, I looked briefly at the code in question, in

http://github.com/wrpseudo/pseudo/blob/master/pseudo.c

and to me it seems not to be doing either a mkdir *or* a mknod,
but performing processing common to two cases identified as
MKNOD and MKDIR. That there would be processing common to
"make a directory (mkdir)" and "make a special file (mknod)" --
it doesn't strike me as far-fetched or bad design. <shrug>

Exactly.

Actually, I could probably combine them fairly effectively with OP_CREAT,
too. The reason they're not the same is that MKNOD and MKDIR are the two
normal cases where a new entry is being created for something which is not
a plain file. CREAT is currently handled differently because I don't
necessarily assume that any existing file with the same inode is a bug,
but that may be a design flaw.

I'm not sure it does, but perhaps the author of the code can clarify.

Both mknod and mkdir do "clear any previous entries with this path or
inode number, then store this new file in the database".

While there certainly are both fakeroot and fakeroot-ng, and I would not
choose to call the authors "incompetent", they do not solve the problems we
need solved, and retrofitting the functionality we want was impractical.

You're quite right!

Similarly, when I write:

z = x + y;
z = z - y;
/* assume z = x */

I am not planning for the possibility of external changes.

Which is to say: There's a practical limit to how often you should recheck
that the code you just wrote hasn't changed. The only possible cases are:
1. Neither a matching path nor a matching inode was found.
We pick up the zeroed-out header.
2. Either a matching path or a matching inode was found, or both.
Use the best fit.

The only "external errors" that could occur would be for the find-by-inode
or find-by-path to fail, in which case, I'd think they had failed, leading
us towards case 1 above. Any modification to the surrounding code would,
indeed, require this to be changed. Yup. When I make changes within a
function to values used later in that function, sometimes I have to change
the way I use those values later in that function. OH WOE IS ME.

No, it doesn't. It leaves db_header with either a zeroed-out structure or
the best match found.
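
The general shape of that pattern, as a sketch only: the struct, the function name pick_best, and the rule that a path match beats an inode match are all assumptions made for illustration, since the real pseudo code is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type; the real pseudo structures are not shown
 * in this thread.
 */
struct record {
    long inode;
    int mode;
};

/* Return the best available record, or a zeroed-out record if neither
 * lookup found anything. Preferring the path match over the inode
 * match is an assumption made for this example.
 */
struct record
pick_best(int found_ino, const struct record *by_ino,
          int found_path, const struct record *by_path) {
    struct record header = { 0 };   /* well-defined even if nothing matched */

    assert(by_ino != NULL && by_path != NULL);
    if (found_path)
        header = *by_path;
    else if (found_ino)
        header = *by_ino;
    return header;
}
```

Because header is initialized before either test, every path through the function leaves it with a determinate value, whichever flags are set.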

Except I do. Really, this isn't hard.

Now this strikes me as a reasonable criticism. Seebs?

It actually sort of is! Admittedly, it's right only in a way that makes his
original criticism wrong, but yup, that's right. I've added a 'default' next
to the ?.

Apart from that, this is the usual sort of Nilges rant -- tons of complaints
based on totally misunderstanding basic idioms, and huge, epic, rants about
how I should be even more careful and paranoid. (Which I'm somewhat
sympathetic to just because for the most part, I take it as a personal
offense if pseudo manages to fail in a way that doesn't produce a clear
diagnostic as to what went wrong.)

-s

Seebs · Mar 29, 2010

Maybe in the general population. Among those interested in programming,
perhaps not?

Maybe not, but I don't know all that many ADHD programmers, and I'm on
the fringe so far as extremity of traits.

Wow, Nilges was right about something!

Now, the key question is: Would I care?

Hm. What you apparently perceive as a plea for sympathy or
special treatment I perceive as a simple statement of fact.
That probably shouldn't surprise me, though, given that I also
perceived the long-ago statement about not having taken any CS
classes as a simple statement of fact, while you have repeatedly
called it a boast.

Exactly. I'm not expecting sympathy or praise, merely explaining how it
comes to be that I find hard things easier than easy things.

Basically, my brain's only marginally useful outside of flow*. The tradeoff
is that, once I am in a working state, I'm extremely good. Obviously, this
imposes some limits on my choice of jobs or lifestyles. So I adapt to those
limits, because what else would I do?

-s
[*] http://en.wikipedia.org/wiki/Flow_(psychology)

Seebs · Mar 29, 2010

Is it really ("offensive") ....

It is not. Part of the point of developing *good* <language> as a second
language is to develop familiarity with idioms.

And yes, I do feel competent to talk on the issue, as:
1. I spent a year in China while my mom was coaching people for the TOEFL
and GRE, and we discussed effective teaching techniques.
2. A large number of my coworkers are Chinese, and have thanked me greatly
for using and explaining idioms, because it helps them understand real-world
usage of English. (I think the thanks may be more tied to the explaining
than to the use, but they don't seem to find the use offensive; rather, they
view it as an opportunity for learning.)

-s

Seebs · Mar 29, 2010

What is (at first glance) a little odd is that two explicit cases have
been added to a default case. One has to ask if the added
documentary value in making these explicit is worthwhile. I'd say
that it is -- particularly if all switch statements that switch on the
message type explicitly include these four cases.

Most of the time (but not always), I test for every defined value
(usually excluding things like FOO_NONE or FOO_MAX) explicitly, and "default"
is there to handle the possibility that a value not in the defined range
at all will occur.

-s

Nick · Mar 29, 2010

Seebs said:
Basically, my brain's only marginally useful outside of flow*. The tradeoff
is that, once I am in a working state, I'm extremely good. Obviously, this
imposes some limits on my choice of jobs or lifestyles. So I adapt to those
limits, because what else would I do?

Well you could post thousands of lines a day to comp.lang.c about how
the world isn't structured exactly the way you want it to be, and about
how no-one else uses words in the same humpty-dumpty fashion you do, and
about bizarre industrial relations history of the 1970s and why that
means some programming languages follow paradigms invented by medieval
painters (I may have got the last of these a bit wrong). It seems to
work for at least one person.

Seebs · Mar 29, 2010

Well you could post thousands of lines a day to comp.lang.c about how
the world isn't structured exactly the way you want it to be, and about
how no-one else uses words in the same humpty-dumpty fashion you do, and
about bizarre industrial relations history of the 1970s and why that
means some programming languages follow paradigms invented by medieval
painters (I may have got the last of these a bit wrong). It seems to
work for at least one person.

Good thought!

Actually, I think I'm gonna take advantage of being on my lunch break
to write up my own implementation of strstr(), plus a test harness.

Because, hey, it'll be fun. And I'm sure that the various funny mistakes
I make in the process will amuse.

-s

Squeamizh · Mar 29, 2010

(Thanks for quoting this, I never see his garbage except when quoted.)

You killfiled spinoza so that you don't have to see his garbage. Then
you thank someone for quoting his garbage and circumventing your
killfile. I suggest you remove spinoza from your killfile.

Seebs · Mar 29, 2010

Okay, this sounded fun. Here's a trivial test harness and a trivial strstr.
Feel free to find bugs. The only one I caught in testing was that I had
the order of the two arguments reversed -- I did (needle, haystack) rather
than (haystack, needle). Feel free to present broken cases; I have no
illusions that I'll have gotten this bug-free on the first try.

This program assumes that the library strstr() works.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char *my_strstr(const char *s1, const char *s2);

/* for test harness, not for implementation */
#define MAX_LEN 128

/* imagine that a large buffer is divided into five parts:
 *   1 2 3 4 5
 * test_strstr() copies the argument strings into slots 2 and 4, so
 * that results can be compared for relative position, not just equality,
 * because they're all in a single object.
 */
void
test_strstr(char *haystack, char *needle) {
    size_t needle_len, haystack_len;
    char big_buffer[MAX_LEN * 5] = { 0 };
    char *store_needle = big_buffer + (MAX_LEN * 1);
    char *store_haystack = big_buffer + (MAX_LEN * 3);

    char *lib_return, *my_return;

    if (!needle || !haystack) {
        fprintf(stderr, "error: test_strstr() needs valid strings.\n");
        return;
    }
    needle_len = strlen(needle);
    haystack_len = strlen(haystack);
    if (needle_len > MAX_LEN) {
        fprintf(stderr, "test_strstr: needle too long (limit %d)\n",
            MAX_LEN);
        return;
    }
    if (haystack_len > MAX_LEN) {
        fprintf(stderr, "test_strstr: haystack too long (limit %d)\n",
            MAX_LEN);
        return;
    }
    strcpy(store_needle, needle);
    strcpy(store_haystack, haystack);
    lib_return = strstr(store_haystack, store_needle);
    my_return = my_strstr(store_haystack, store_needle);

    printf("test: <%s> in <%s>: ",
        store_needle, store_haystack);
    /* offsets are at most MAX_LEN, so they are safe to print as int */
    if (lib_return == my_return) {
        if (lib_return)
            printf("<%s> (offset %d).\n",
                lib_return, (int) (lib_return - store_haystack));
        else
            printf("No match.\n");
        return;
    }
    /* they don't match... what went wrong? */
    if (!my_return) {
        printf("I found no match, lib found <%s> (offset %d).\n",
            lib_return, (int) (lib_return - store_haystack));
        return;
    }
    if (my_return < store_haystack) {
        printf("return (%p) underruns haystack (%p - %p)?\n",
            (void *) my_return, (void *) store_haystack,
            (void *) (store_haystack + haystack_len));
        return;
    }
    if (my_return > store_haystack + haystack_len) {
        printf("return (%p) overruns haystack (%p - %p)?\n",
            (void *) my_return, (void *) store_haystack,
            (void *) (store_haystack + haystack_len));
        return;
    }
    /* so at this point, my_return is non-null and in the haystack */
    if (!lib_return) {
        printf("lib found no match, I found <%s> (offset %d).\n",
            my_return, (int) (my_return - store_haystack));
        /* lib_return is null, so it must not reach the %s below */
        return;
    }
    printf("lib/my return mismatch:\n");
    printf("\tlib: <%s> at %d\n", lib_return,
        (int) (lib_return - store_haystack));
    printf("\tme: <%s> at %d\n", my_return,
        (int) (my_return - store_haystack));
}

char *
my_strstr(const char *haystack, const char *needle) {
    const char *found = 0;
    const char *next_candidate = 0;
    const char *h, *n;

    /* an empty string is found immediately, even in an empty string */
    if (!*needle)
        found = haystack;
    for (; !found && *haystack; ++haystack) {
        if (*haystack == *needle) {
            next_candidate = 0;
            h = haystack;
            n = needle;
            /* stash the next spot matching the first
             * character of the needle. The end of the
             * needle is tested first, so we never compare
             * past either string's terminator.
             */
            while (*++n && *n == *++h) {
                if (*n == *haystack && !next_candidate)
                    next_candidate = h;
            }
            /* we reached the end of the needle, so
             * we are done
             */
            if (!*n) {
                found = haystack;
            } else {
                if (next_candidate)
                    haystack = next_candidate - 1;
                else
                    haystack = h - 1;
            }
        }
    }

    /* sorry, no way around it; we can't return a possibly-qualified
     * pointer, so we have to drop const in case the original strings
     * were not really const.
     */
    return (char *) found;
}

int
main(void) {
    test_strstr("foobar", "foo");
    test_strstr("foobar", "bar");
    test_strstr("foobar", "baz");
    test_strstr("banana", "na");
    test_strstr("bananas", "nas");
    test_strstr("iced tea", "a");
    test_strstr("blah", "");
    return 0;
}
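
Since the harness trusts the library strstr() as its oracle, a brute-force version can serve as a second, independent reference. This is a sketch rather than anything posted in the thread, and naive_strstr is a made-up name. It tries every starting position and never reads past either terminator, which matters most when the needle sits at the very end of the haystack.

```c
#include <assert.h>
#include <stddef.h>

/* Brute-force reference (hypothetical, for cross-checking only):
 * try every starting position in haystack in turn.
 * Returns a pointer to the first match, or NULL if there is none.
 */
char *
naive_strstr(const char *haystack, const char *needle) {
    const char *start, *h, *n;

    assert(haystack != NULL && needle != NULL);
    for (start = haystack; ; ++start) {
        h = start;
        n = needle;
        /* advance while the needle has characters left and they match */
        while (*n != '\0' && *h == *n) {
            ++h;
            ++n;
        }
        if (*n == '\0')
            return (char *) start;  /* the whole needle matched */
        if (*h == '\0')
            return NULL;            /* haystack ran out before the needle */
    }
}
```

It is quadratic in the worst case, which is fine for a test oracle; the point is that it is short enough to check by eye.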

Keith Thompson · Mar 29, 2010

Squeamizh said:
You killfiled spinoza so that you don't have to see his garbage. Then
you thank someone for quoting his garbage and circumventing your
killfile. I suggest you remove spinoza from your killfile.

Or keep him there and stop replying to him. Your call.

Seebs · Mar 29, 2010

Or keep him there and stop replying to him. Your call.

Actually, I like it the way it is -- I get a filtered feed of an occasional
actual technical question, without the flood of irrelevant insults.

The guy's unambiguously a usenet kook, but his questions on technical
issues are occasionally interesting, for much the same reason that it
can be occasionally interesting to try to answer questions asked by
other novice-level programmers. It's just not very rewarding to me
to search through several-hundred line posts full of tinfoil hat nonsense
to find an occasional gem like his observation that there should be
a default case in a getopt() switch because someone could modify the
argument string but not remember to add the corresponding case. That
was actually a good idea, I think.

So I appreciate it when people who have more patience with his rambling
nonsense than I do filter out the occasional things worth responding to
and make them noticeable.

-s
