substring finding problem!

Ben Bacarisse · Feb 18, 2010

fedora said:
I added the one because when both string and sub string are equal length the
diff will give zero, so to make the while loop below work properly, i add
one.

i cant compare remaining_len >= 0 since that'll always be true for
unsigned.

I think you are right. The bug is a bit further on... I mixed two
related issues up.

The problem is not the + 1, but the fact that you subtract one from
remaining_len every time, even when anchor has jumped by more than
one. Put:

assert(!substr || substr + lsub <= str + lstr);

just before the return and run with "aaxbx" and "xa" and you will see
that the assert will fire because the second time round the loop
anchor points to the 'b' and remaining_len is 3 (when it should be 2).

One solution is to replace the decrement of remaining_len with a new
calculation of it:

remaining_len = str + lstr - anchor - lsub + 1;

If you do that, there is really no need to have it in a variable.
Just pass this value to strFirstCh and loop while anchor is not null
but I suspect that you can come up with a simpler way to write the
whole function if you try.

Ben Bacarisse · Feb 18, 2010

fedora said:
Ike said:

Ike Naar wrote:

if (((orig_str + orig_lstr) - str) <= step) break;

[snip]

(orig_str + orig_lstr <= str + step)

[snip]

Initially i thought of writing it exactly like yours but then when str is
< step bytes before it's end, adding step to it would be undefined
behaviour so i coded like i did.

Click to expand...

Whoops you're right I did not think about that.

Maybe i can cast step to signed long and then compare
with (orig_str+orig_lstr) - str? is this ok?

Click to expand...

That is a possibility. If your compiler supports prtdiff_t then perhaps
it's better to cast step to that type, so that the types on both sides
of the comparison operator are the same.

What about (orig_str + orig_lstr - step <= str) ?

Click to expand...

This is perfect thanks!! After changing that line to the comparison above,
i've stopped getting the strange seg faults as far as i can see.

so it seems this failed test allowed lstr to wrap around to very big values
and access foreign memory locations. now it works!

No, sorry, your program is still wrong. You are looking at a symptom
not a cause. The change alters thing so it /seems/ to work but the
bug I pointed out (though I miss-described it) is still there. This
new test (which I don't think is needed if you correct strSubstr) is
hiding the undefined effect of going outside the array, but the
behaviour is still undefined even though you may not see a crash. See
my other post about that.

lstr "wrapped round" only because strSubstr is returning a pointer too
close to the end of the string. This line:

lstr = (orig_str + orig_lstr) - str;

sets an unsigned lstr to a possibly signed pointer difference. With
the + error in strSubstr, this difference can be -1. If you fix
strSubstr everything is OK again. Of course, I may have missed
another problem, but i am pretty sure about this one!

spinoza1111 · Feb 18, 2010

No, it's not an ad hominem fallacy. It's the very well supported view
that what Nilges accepts or doesn't accept should not be a component of any
technical decision. I'm not saying that an argument is wrong because of
the person who is making it. I'm saying that a conclusion should be ignored
(neither accepted nor rejected) based on the person who has offered it.

....and that's an ad hominem fallacy. If you'd troubled to take a class
in informal logic, you'd have discovered that there is a VALID
argument based on applicable authority, but none based on anti-
authority. Nobody except a Fascist or a child refuses to believe
something because of his hatred of a person.

Which is to say, if you know someone is a clown, knowing his position on an
issue tells you nothing for or against the issue. Now, if he had advanced
an argument, it could be worth discussing that argument, but as long as
we're just talking about his conclusion, it's not an ad hominem fallacy to
suggest disregarding the conclusions reached by someone who is demonstrably
very bad at the topic in question.

I'd be careful with this, dear boy. You're the one who posted a
"solution" for replacing %s (and bugger all) with something else, that
didn't work, and who posted a strlen with a crude off by one bug. Many
people here (but not me) may well use your demonstrated incompetence
as a reason for ignoring your input on technical matters. I haven't:
most recently, I used the fact that your latest stupidity in the off-
by-one strlen was so glaringly obvious, because you used short
identifiers, to admit that my long and literate identifies might need
to be reduced.

For some purposes, yes.

For manipulation of sequences of non-NUL chars, terminated by a char, not so
much.

You haven't responded to Malcolm's concerns at all.

True. This is addressed in no small part by the multibyte stuff, which you
would presumably use for multibyte strings.

....as an exception (in other words, a bug in waiting)

I'm not aware of "strfind".

While the various interfaces are certainly flawed, consider that the Nilges
alternative is to duplicate the flaws of the interface without even the
benefit of already having been debugged. Or, worse, to just not even come
close.

Incoherent. How do I "duplicate the flaws of the interface?" My
replace merely starts with Nul terminated strings because this is what
I'm given. You're just issuing words out of hatred at this point.

There are certainly cases where the <string.h> functions are not the right
tool. However, that Nilges argues against it is not an argument either way
in that. His arguments might be an argument either way; his conclusion is
not.

Empty words.

There is a techie named Seebach
Who sense and skill doth lack
Who uses words
Like little turds
That incompetent techie named Seebach

His code it had an ugly Bug
Off by one, snug as a rug
It was obvious to all
And the sensitive this bug did appall
It made them say, oh Ugh

But we would like to forgive him
Programming is hard for all men
And when he withdraws his Schildt rant
We will do so, like Immanuel Kant.

spinoza1111 · Feb 18, 2010

This whole project thread has been filled how not to engineer software.
Application code specifically avoiding libraries, Not Invented Here,
random design with moving target specifications or no specifications,
ad hoc testing with a dose of 20+ Year old unresolved office battles,
interpersonal rivalry and off topic rants.

As several have stated not the environment that we are used to.

You're in denial. In fact, Seebach and Heathfield insist on making
this recreational environment, in which we could in the absence of job
pressures engage in a common search for truth, representative of the
usual sort of development environment. The fact is that most high tech
firms produce low tech software consistently, and they do so because
correct software takes "too much time", and requires for its
developers, free human beings unafraid of being laid off into a savage
state of unemployment when they are the targets of office bullies.

spinoza1111 · Feb 18, 2010

Yes, but it's important to be prepared to program in some of the many
environments which real programmers often end up having to work in.

To be fair, I've never had a coworker in the same class as Nilges. Not
even particularly close. But I have had to work with arbitrary or
bad specifications, specifications which change repeatedly during
implementation, old office battles, and vehement opposition to things which
were Not Invented Here.

At one point, I was asked to develop a linked list implementation. The
proposed design looked like this:

struct list_node {
struct list_node *next;
void *data;
};

struct list {
struct list_node *head;
struct list_node *tail;
};

The specification was much as you'd expect. Except for one TINY detail..
Which was that the formal specification was that
(struct list *) (x->tail->next) == x
whenever tail was not null.

That is to say, if the list contained any members, the "next" pointer for
the last member of the list was a pointer (suitably converted) to the list
object.

So iteration would look roughly like:
for (l = x->head; l->next != x; l = l->next) {
/* ... */
}

It took a day or so of effort for me to round up enough senior developers
to all sit on the guy and tell him that:

1. He was wrong.
2. He was micro-managing, which is presumptively wrong.

before we were allowed to use a more conventional design.

Having had to deal with things like a database in which the formal schema
description begins with "all fields are VARCHAR for simplicity", I found the
Nilges String Replace Challenge to be a surprisingly good approximation of
what programming work is often like in the real world.

(Disclaimer: All the above memories are faded with age. My current
environment is pretty good about this kind of stuff. I have no clue about
the office politics, as our management put a great deal of time and effort
into ensuring that they are Not Our Problem.)

-s

Fascists like to tell stories about fantasy people and other people in
hopes that their listeners will then confuse the mythical or alternate
person with their target. This of course is easier than going head to
head, where Peter consistently loses.

Peter has not, to my knowledge, posted a solution to my challenge that
works. The only people to have done so are Willem and io_x. Instead,
he started the ball rolling with code with bugs (%s and bugger all to
bugger all) and has since that time posted more code with bugs, that
uses string.h, along with a marvelously compact example of off by one
bugosity.

So of course, it's time to start telling war stories about his
victories over "incompetent" coworkers.

Instead of developing his own competence (for example, by taking
university courses in comp sci), Peter insists on advancing his career
by trashing good people like Schildt and paying his way onto standards
committees.

He is a most amusing creature.

spinoza1111 · Feb 18, 2010

Yes, but it's important to be prepared to program in some of the many
environments which real programmers often end up having to work in.

To be fair, I've never had a coworker in the same class as Nilges. Not

No, I don't think you have.

even particularly close. But I have had to work with arbitrary or
bad specifications, specifications which change repeatedly during
implementation, old office battles, and vehement opposition to things which
were Not Invented Here.

You've memorized the mere names of thought crimes ("not invented
here") without learning your trade; as you have told us, you haven't
taken any university computer science classes at all.

At one point, I was asked to develop a linked list implementation. The
proposed design looked like this:

struct list_node {
struct list_node *next;
void *data;

This is as we know a mistake. You're retailing a story, in the
Fascist's way, in hopes that your auditors will confuse the story with
me. William Butler Yeats noticed that both sides in the Irish
"troubles" preferred to read their own biased newspapers and were
uninterested in dialog with the other side, and in the case of the
Republicans further split when Michael Collins tried to establish the
Irish free state:

The bees build in the crevices
Of loosening masonry, and there
The mother birds bring grubs and flies.
My wall is loosening; honey-bees,
Come build in the empty house of the state.

We are closed in, and the key is turned
On our uncertainty; somewhere
A man is killed, or a house burned,
Yet no clear fact to be discerned:
Come build in he empty house of the stare. (Yeats: The Stare's Nest by
My Window, to be continued)

};

struct list {
struct list_node *head;
struct list_node *tail;
};

The specification was much as you'd expect. Except for one TINY detail..
Which was that the formal specification was that
(struct list *) (x->tail->next) == x
whenever tail was not null.

That is to say, if the list contained any members, the "next" pointer for
the last member of the list was a pointer (suitably converted) to the list
object.

So iteration would look roughly like:
for (l = x->head; l->next != x; l = l->next) {
/* ... */
}

It took a day or so of effort for me to round up enough senior developers
to all sit on the guy and tell him that:

1. He was wrong.
2. He was micro-managing, which is presumptively wrong.

before we were allowed to use a more conventional design.

OK, he specified a circular singly-linked list, and (because by your
own admission you have never taken a single computer science class)
you had never seen this simple data structure, and you wasted an
entire day trying to figure this code out in consequence. For the same
reason you preferred to try destroy the career of Herb Schildt and you
hound me here, this enraged you and you maliciously went after the
original designer in order to ruin his position, because you didn't
know how to code a simple loop (we've seen how incompetent you are at
this task in your strlen that was off by one).

if (x->head == NULL) return;
list * p = x->head;
list * p2 = p;
do { ... p = p->next;} while(p != p2);

You may in fact know more about C than I do in the infantile rote
register, but you can't program, and instead of learning your trade,
you try to destroy reputations. You're a Fascist and an incompetent.
Oh honey-bees, come and build in the empty house of the stare: oh
honey-bees, come and build in the empty house of the state.

Having had to deal with things like a database in which the formal schema
description begins with "all fields are VARCHAR for simplicity", I found the
Nilges String Replace Challenge to be a surprisingly good approximation of
what programming work is often like in the real world.

You can't deal. Like your hero George Bush (whom you supported in
2000) you're incurious and you jump to conclusions about concepts and
people all too readily, because you're not qualified for the job
you're in and you don't want to learn anything except slogans.

A barricade of stone or of wood;
Some fourteen days of civil war;
Last night they trundled down the road
That dead young soldier in his blood:
Come build in the empty house of the stare.

We had fed the heart on fantasies,
The heart's grown brutal from the fare;
More substance in our enmities
Than in our love; O honey-bees,
Come build in the empty house of the stare. (Yeats)

spinoza1111 · Feb 18, 2010

I saw this 20 years ago and it became nothing but a memory as better
more effective approaches prevailed.

Software development practices have improved a lot as applications
have become more complex and requirements better defined.

How can "defining requirements" improve "software development
practices"? "Defining requirements" is being able to express the
requirements in English, other human languages, visuals, and if
necessary mathematical formalisms: if you define them in code, you're
not, by definition, "defining requirements": you are coding.

The fantasy has always been that the people who through intellectual
incuriosity and lack of qualifications (for example, psychology majors
like Seebach) can neither code nor do traditional mathematics will
somehow sit around in meetings and so such great "requirements" that
the code will be easy. The reality is lost baggage, stuck
accellerators, military veterans without health care, children
screaming under stairwells, boys sobbing in armies, and old men
weeping in the parks.

spinoza1111 · Feb 18, 2010

spinoza1111wrote:

This project and your comments show just how far out of touch
you are with software technology. It goes back a long time
when with pride you were debugging in machine code on a 1401
instead of using tools created to debug software.

Huh? I use .Net's rich development environment all the time, but sure,
I don't use bug creation tools like C strings if I don't have to.

NIH and other juvenile behavior at best illustrates the way
not to implement a project and at worst misleading and intellectually
dishonest.

NIH labels a response. Sure, there are companies that have failed
owing to excess internal development and the attitude of

We don't like it 'cause it's not invented here
If you use that solution, you must be a queer

just as any form of development by catchphrase (such as misunderstood
"simplicity") is prone to fail. There are also companies that have
succeeded by developing things internally, including Apple, IBM until
1990 and Northern Telecom until about 1983.

Note that the negation of a catchphrase is still a catchphrase.
Northern Telecom switched catchphrases in 1983 and proceeded to fail
by farming everything out.

On the ground in little data processing shops, NIH is simply a way of
normalizing deviance and forcing smart people to use their skills to
do dumb things, in many cases.

The broad based generalizations that you attribute to software
development companies just doesn't fit the facts.

Actually, I've been using specific examples taken from personal
experience at Bell-Northern Research and while I hope things have
changed since I changed careers, the evidence here is that things have
gotten worse.

And, your riposte is itself a broad-based generalization.

spinoza1111 · Feb 18, 2010

[...]

Yes, strcat and strlen are O(N) - so, where it matters, you remember the
string length, having found it out the first time.

Click to expand...

Bingo! :^)

Yes, strcat and strlen are order N
So be sure to squirrel away their values, like the Hen
Secretes seeds in her cache of seminal treasures
Great programmers, says Dickie, take these measures.

spinoza1111 · Feb 18, 2010

No, he's not saying that an argument is wrong because Mr Nilges is
making it. He's saying that Mr Nilges's support for an argument is of no
value in determining whether or not the argument is valid. That's very
different.

Here is an example of "ad hominem" argument:

"Mr X says strcpy is dangerous. Mr X doesn't know spit about C.
Therefore strcpy is not dangerous". Poor reasoning.

Here is an example of what Seebs is saying, using the above
(hypothetical) case:

"Mr X says strcpy is dangerous. Mr X doesn't know spit about C.
Therefore X's claim that strcpy is dangerous adds no value to the
argument that strcpy is dangerous. Whether it is or isn't dangerous is
another matter entirely."

The fallacy here is that while I know a bit about C, I know in fact
less than the regulars: but the regulars are uneducated in computer
science and who are incompetent programmers (who know a lot of rote
facts about C):

* Peter Seebach has by his own admission never taken a computer
science class, and in consequence makes silly off by one bugs, thinks
designing a common data structure (a circular linked list) is
practically a termination offense, and at least at the time of his
Schildt attack, thought that "the 'heap' is a DOS term"

* Richard Heathfield "engineers" a "reusable" linked list by
physically copying node values into nodes instead of using pointers,
creating a "reusable solution" that is O(M*N), prone to unpredictable
failure in production.

Managers who don't know how to manage free men and women prefer that
headhunters find men and women who are overly specifically "trained"
in platforms, and the result is plain.

That is, he is in effect claiming that it is possible that even someone
who knows little about a subject may know enough about it to make a
correct claim, or may simply make a correct claim by chance. So, moving
back from the abstract to the concrete, he is not dismissing any claim
made by Mr Nilges as being necessarily incorrect. That would be foolish,
not only because it would be an invalid "ad hominem" argument, but also
because Mr Nilges (who is on record as saying that he wishes to cause
maximum damage to this newsgroup) could exploit such a position by
deliberately making claims that are clearly true.

I don't know where you or Julienne get that claim you said I made:
that I wish to destroy this newsgroup. It sounds like the way Arabs
and Iranians, speaking in writing in their own language are
systematically mistranslated (by Israeli translation firms) as saying
"death to Jews" when for the most part they are calling for an end to
the criminal gang that has put the Jews of Israel at risk with their
criminal policies.

In fact, I always am the source, according to all neutral observers,
of interesting content.

Taking your specific points one at a time:

Whilst it is true that strcpy offers no added protection against buffer
overrun, careful programming overcomes this problem. Thus, strcpy does
not get in the way of the programmer who knows full well that his buffer
is sufficiently large - no performance penalty is imposed.

Knowledge is justified true belief, and programmers who decide that
some power of two is "big enough" don't usually verify this with end
users. Furthermore, knowledge is justified true belief, and most end
users don't know the answer either. Therefore, the best policy is to
use an approach in which you don't have to worry about "buffer
overrun", and this is easy in using a modern language such as C Sharp
or Java. In C, it's easy if one has taken a computer science class in
data structures, and had the chance in a true university to create a
linked list without some moron of a manager asking what you're doing.

With the linked list, you still have to worry about malloc failing.
But if you're competent, you call malloc checking its return. Whereas
allocating a buffer that you "think" is "big enough" and then blindly
copying to it is asking for trouble.

Sure, if you're reasonably competent, you will check the current size
of the string in the buffer against the allocated maximum (and, if
you're reasonably incompetent, you will do so with what even you, dear
Richard, knows is an O(n) strlen call, hopefully not using Peter's
famous off by one version). If you are a little more competent you
might even #define a symbol with the maximum rather than hard coding
it in two places.

But this of course doesn't begin to address the fact that in the so-
called "real world" (referring to it as reality, not in mythical
incantatory invocation), the buffer size will under maintenance
eventually become, for trivial jobs, always the maximum amount it
needed to be for unusual jobs, making for a performance pig.

Yes, strcat and strlen are O(N) - so, where it matters, you remember the
string length, having found it out the first time. These two functions
offer simple solutions to a simple task, and as such are very often a

"Simple" to the Simpleton is always a term of Praise:
It means "at last I understand it, to my inner shame'd amaze":
He then imposes this lack of Art upon the sage and Wise
Whenever in Authority, to their shock, and awe'd surprise.
Beggars ride fine horses, whipping them in scorn:
Shouting, we must understand, tho' their comprehension be porn:
Belisarius puts his Foot upon the scholar and the priest,
Commanding them on pain of death to say to him the least:
Yet Vandal, Goth, Visigoth, Ostrogoth and Byzantine:
Need the wisdom of the learned to find the gold they'd glean:
So they can fill themselves with swill and ravish the name of Art
Expelling all at fair Rome's fall in one almighty Fart.

good solution to the task at hand. Where that is not the case, we have
the option of building more powerful tools.

Actually, we don't, since the ability is extinguished by Envy and
Rage:
When Barbarians are in control, the cry is death unto the sage.

(And yes, I agree that
strcat's interface could be improved; for example, it could return a
pointer to the null terminator rather than to the beginning of the string..)

The mighty mountain Heathfield labors, and great Heaves are heard on
Earth:
The Natives stand a-wonderin' what this presages, whether boon, or
dearth:
The storm clouds gather on the peak, no man dares show his face:
Heathfield announces an idea to the assembled people in this place.
It's "return a pointer to the end and not to the beginning"
The Volcano has but Popp'd, and out comes...almost nothing.

Again, I must agree that the const inconsistency with strchr is a bit of
a wart. But the input is const purely to constitute a promise that the
function won't write to the input string. The return value is non-const
because strchr would otherwise be a real pain to use. How could it be
done better?

But Heathfield says so, and it must of Needs be true:
This is the best of all possible worlds, there is no point in Rue:
Don't dream, don't hope, you're not the Pope, you are but a sot and
Bum
Heathfield knows this of you, because that's what he's become.

As for strfind, that's not C's problem. Take it up with the vendor.

To my mind, the sprintf function does not have serious buffer problems.

So lo! Heathfield's code has sprung a secret Leak:
But because he's the boss, no man may dare to speak.
It sprintsf bytes on adjacent pages, the debugger he but rages,
Why did this guy use such a feature, and will it to the ages?
His coworkers say, hey hey hey, you'd better hide your rage away:
The Boss wrote dat code back in his high and palmy Day.
If he hears you you're for the chop, better just rewrite the Code
Secretly and in silent hope you won't be the fall guy not his Toad.

Nevertheless, some people obviously disagree, and C99 provides snprintf
for such people.

The scanf function is basically a mess, and is rarely used correctly. I
am at a loss to understand why it is introduced so early in programming
texts.

The strtok function is of limited use, but there are times when it is
just the ticket. It would be better, however, for it to take a state
pointer. I'm not convinced that its interface is particularly non-intuitive.

And when he doubles a negative, the trouble really starts:
His meaning is conceal'd, it gets out in Fits, and Starts.

spinoza1111 · Feb 18, 2010

spinoza1111wrote:

You just answered your own question.

But dear boy, you said that software improves if we define
requirements. Please explain how this work, and note that industrial
experience is that the success of a software development project is
completely independent of requirements quality. Sometimes, beautiful
requirements produce beautiful code, sometimes, beautiful requirements
beautifully miss the point. Sometimes, Extreme programming projects
start out with no requirements and beauty results.

spinoza1111 · Feb 18, 2010

But dear boy, you said that software improves if we define
requirements. Please explain how this work, and note that industrial
experience is that the success of a software development project is
completely independent of requirements quality. Sometimes, beautiful
requirements produce beautiful code, sometimes, beautiful requirements
beautifully miss the point. Sometimes, Extreme programming projects
start out with no requirements and beauty results.

An interesting factoid about this discussion is that no American or
British posters, save one, have solved the problem I set (write
replace without using string.H).

Willem appears to be Dutch, and io_x appears to be Italian. I am an
American expatriate.

I think this is linked to the "Tea Party" movement in America, which
is ignorance rampant. It's people who absolutely rely on Social
Security and Medicare calling for an end to Social Security and
Medicare. It's also linked to the antics of the "British National
Party".

Chris M. Thomasson · Feb 18, 2010

Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform. It does this by using the length of
the string, (e.g., fits in well with counted string), and initializes a hash
table of the stores characters of the comparand string from right-to-left.
It performs a search from left-to-right, however it compares source against
comparand in right-to-left. If a search character does not exist within the
comparand, the search index can be incremented by the length of the
comparand. For instance:
_______________________________________________________________
src = ChrisHelloJohn
cmp = Hello
hash['o'] = 5
hash['l'] = 4
hash['e'] = 2
hash['H'] = 1
_______________________________________________________________

Okay, we want to find `cmp' in `src. So you put the head index on `s' and
you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
zero. There are more elaborate ways to skip characters and I think you can
use the right-to-left hash list to determine other things as well. Here is
some a very crude initial code sketch of the algorithm I am thinking about:

http://clc.pastebin.com/f7df86b63

Here is the output I get for search for "Hello" in the following string:

"ldsefkdsjehuoeewhs;dfjpewfjHello9438jfgohj"
_______________________________________________________________
right-to-left hash hit 'o'!
right-to-left hash hit 'l'!
right-to-left hash miss 'l' chars
right-to-left hash hit 'e'!
right-to-left hash hit 'H'!
compare(1) 'f' to 'o'
skip 5 chars!
compare(2) 'e' to 'o'
compare(3) 'h' to 'o'
skip 5 chars!
compare(4) 'w' to 'o'
skip 5 chars!
compare(5) 'f' to 'o'
skip 5 chars!
compare(6) 'f' to 'o'
skip 5 chars!
compare(7) 'l' to 'o'
compare(8) 'o' to 'o'
compare(9) 'l' to 'l'
compare(10) 'l' to 'l'
compare(11) 'e' to 'e'
compare(12) 'H' to 'H'
((12) compares + (5) hashes) out of (42) characters target at (27)
Hello9438jfgohj
_______________________________________________________________

The algorithm was able to skip some characters and only perform 12
comparisons in order to find the comparand at index 27 in a string 42
characters wide. I think there are many more optimizations that could be
made...

Chris M. Thomasson · Feb 18, 2010

Chris M. Thomasson said:
Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform. It does this by using the length
of the string, (e.g., fits in well with counted string), and initializes a
hash table of the stores characters of the comparand string from
right-to-left. It performs a search from left-to-right, however it
compares source against comparand in right-to-left. If a search character
does not exist within the comparand, the search index can be incremented
by the length of the comparand. For instance:
_______________________________________________________________
src = ChrisHelloJohn
cmp = Hello
hash['o'] = 5
hash['l'] = 4
hash['e'] = 2
hash['H'] = 1
_______________________________________________________________

Okay, we want to find `cmp' in `src. So you put the head index on `s' and
you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
zero. There are more elaborate ways to skip characters and I think you can
use the right-to-left hash list to determine other things as well. Here is
some a very crude initial code sketch of the algorithm I am thinking
about:

http://clc.pastebin.com/f7df86b63

[...]

I know there should be some bugs in there, so, don't use the code for
anything. I will post fixes as I find them. If you can fix a bug, that would
be very nice as well! Thanks in advance, I really would appreciate it.

:^)

Chris M. Thomasson · Feb 18, 2010

I found a stupid bug; here is fix:

http://clc.pastebin.com/f4f063111

And here is the difference from the previous code:

http://clc.pastebin.com/pastebin.php?diff=f4f063111

Nick Keighley · Feb 18, 2010

I'm not sure this is true !

acquiring good requirements is part of good software development
practice.

yes. Though various graphical representaions are popular as well.

well yes, but he didn't say define the requirements in code

You just answered your own question.

Click to expand...

But [...] you said that software improves if we define
requirements.

no he didn't he said "Software development practices [are] improved
[if] requirements [are] better defined." (the actual quote is at the
beginning of the post).

Please explain how this work, and note that industrial
experience is that the success of a software development project is
completely independent of requirements quality.

really, could you expand on that. I was under the impression that many
projects had failed due to badly defined requirements.

Sometimes, beautiful
requirements produce beautiful code, sometimes, beautiful requirements
beautifully miss the point.

then they aren't good requirements! If you ask a guy to build a garage
then complain he missed the swimming pool out that means you gave him
a poor requirement.

Sometimes, Extreme programming projects
start out with no requirements and beauty results.

in a sense you're deriving the requirement by iterative refinement.
Have you read "Principles of Software Engineering Management" by Tom
Gilb. He was extreme programming before the (slightly silly) name was
invented.

Nick Keighley · Feb 18, 2010

The scanf function is basically a mess, and is rarely used correctly. I
am at a loss to understand why it is introduced so early in programming
texts.

a former clc regular once posted

***
The fscanf equivalent of fgets is so simple
that it can be used inline whenever needed:-
char s[NN + 1] = "", c;
int rc = fscanf(fp, "%NN[^\n]%1[\n]", s, &c);
if (rc == 1) fscanf("%*[^\n]%*c);
if (rc == 0) getc(fp);
***

I think scanf() is seen as a straight forward way to read simple
unvalidated input. I'm not convinced that's a good idea.

Chris M. Thomasson · Feb 18, 2010

Chris M. Thomasson said:
Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform.
[...]

The algorithm was able to skip some characters and only perform 12
comparisons in order to find the comparand at index 27 in a string 42
characters wide. I think there are many more optimizations that could be
made...

Here is a version with some minor optimizations that make it perform better
for certain types of input:

http://clc.pastebin.com/fe41ca06

here is difference:

http://clc.pastebin.com/pastebin.php?diff=fe41ca06

I am learning that this simple algorithm works better with large alphabets.

Chris M. Thomasson · Feb 18, 2010

spinoza1111wrote:

Click to expand...

[...]

I think standard library routines can be written in a language other
than C,
or some mix. For example, hand-written assembly.

Click to expand...

Correct. Wonder how many library routines are written in assembler.
Don't know.

Here is how `strstr()' is implmented on one of my compilers:

http://clc.pastebin.com/m15e4ab43

Chris M. Thomasson · Feb 18, 2010

If found another bug; here is fix:

http://clc.pastebin.com/fdc461ce

and difference:

http://clc.pastebin.com/pastebin.php?diff=fdc461ce

I think I will write a little automated test application.

Fibonacci	0	May 13, 2023
Blue J Ciphertext Program	2	Nov 22, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
Need help finding Segmentation fault C++	0	Apr 16, 2022
Python code problem	2	Apr 23, 2023
Scanf is being prioritized over printf ?	1	Nov 5, 2023
My Status, Ciphertext	2	Nov 28, 2023
Can someone pls help me with a little algorithm script	1	Nov 28, 2024

substring finding problem!

Ben Bacarisse

Ben Bacarisse

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

spinoza1111

Chris M. Thomasson

Chris M. Thomasson

Chris M. Thomasson

Nick Keighley

Nick Keighley

Chris M. Thomasson

Chris M. Thomasson

Chris M. Thomasson

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads