substring finding problem!

spinoza1111 · Feb 18, 2010

spinoza1111wrote:

No, it isn't. Learn to read.

> If you'd troubled to take a class

This isn't about authority. It's about cluelessness.

> Nobody except a Fascist or a child refuses to believe

That's not what he's suggesting. Learn to read. What he's saying is of
the form "X is a bozo. X says Y. Y might be right or it might be wrong.
Either way, X's opinion is irrelevant because X is a bozo." This is very
different from the "ad hominem" fallacy, "X says Y. X is a bozo.
Therefore Y is wrong". Seebs knows you're a bozo, but he's not daft
enough to assume a priori that everything you say is necessarily
incorrect. And this is very wise: despite the apparent difficulty of the
task, I can call several counter-examples to mind.

Knowledge is justified true belief pace Gettier, but Seebs is not a
qualified programmer (incorrect simple examples posted, unable to code
without string.h, ignorance of the heap, no education in comp sci,
unable to properly moderate clcm). Therefore he merely "believes" I'm
a bozo because he can't take constructive criticism.

And he is daft enough, in fact, to reject everything I say no matter
what, so far. Whereas you are not.

spinoza1111 · Feb 18, 2010

Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform. It does this by using the length of
the string, (e.g., fits in well with counted string), and initializes a hash
table of the stores characters of the comparand string from right-to-left..
It performs a search from left-to-right, however it compares source against
comparand in right-to-left. If a search character does not exist within the
comparand, the search index can be incremented by the length of the
comparand. For instance:
_______________________________________________________________
src = ChrisHelloJohn
cmp = Hello
hash['o'] = 5
hash['l'] = 4
hash['e'] = 2
hash['H'] = 1
_______________________________________________________________

Okay, we want to find `cmp' in `src. So you put the head index on `s' and
you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
zero. There are more elaborate ways to skip characters and I think you can
use the right-to-left hash list to determine other things as well. Here is
some a very crude initial code sketch of the algorithm I am thinking about:

http://clc.pastebin.com/f7df86b63

Here is the output I get for search for "Hello" in the following string:

"ldsefkdsjehuoeewhs;dfjpewfjHello9438jfgohj"
_______________________________________________________________
right-to-left hash hit 'o'!
right-to-left hash hit 'l'!
right-to-left hash miss 'l' chars
right-to-left hash hit 'e'!
right-to-left hash hit 'H'!
compare(1) 'f' to 'o'
skip 5 chars!
compare(2) 'e' to 'o'
compare(3) 'h' to 'o'
skip 5 chars!
compare(4) 'w' to 'o'
skip 5 chars!
compare(5) 'f' to 'o'
skip 5 chars!
compare(6) 'f' to 'o'
skip 5 chars!
compare(7) 'l' to 'o'
compare(8) 'o' to 'o'
compare(9) 'l' to 'l'
compare(10) 'l' to 'l'
compare(11) 'e' to 'e'
compare(12) 'H' to 'H'
((12) compares + (5) hashes) out of (42) characters target at (27)
Hello9438jfgohj
_______________________________________________________________

The algorithm was able to skip some characters and only perform 12
comparisons in order to find the comparand at index 27 in a string 42
characters wide. I think there are many more optimizations that could be
made...

Excellent work. Thanks, will examine when I have more time.

spinoza1111 · Feb 18, 2010

I'm not sure this is true !

acquiring good requirements is part of good software development
practice.

yes. Though various graphical representaions are popular as well.

well yes, but he didn't say define the requirements in code

But [...] you said that software improves if we define
requirements.

Click to expand...

no he didn't he said "Software development practices [are] improved
[if] requirements [are] better defined." (the actual quote is at the
beginning of the post).

Please explain how this work, and note that industrial
experience is that the success of a software development project is
completely independent of requirements quality.

Click to expand...

really, could you expand on that. I was under the impression that many
projects had failed due to badly defined requirements.

Requirements are by definition inadequate for if they were adequate,
we could build a compiler to compile them. The demand for
"requirements definition" presupposes that the "analysts", thought to
be of a higher social class, know by virtue of their social class more
than the "mere" programmers.

then they aren't good requirements! If you ask a guy to build a garage
then complain he missed the swimming pool out that means you gave him
a poor requirement.

in a sense you're deriving the requirement by iterative refinement.
Have you read "Principles of Software Engineering Management" by Tom
Gilb. He was extreme programming before the (slightly silly) name was
invented.

Yes, I've read Gilb...a member of the Scandinavian school of data
processing theory, which took the needs of the workers into account.

spinoza1111 · Feb 18, 2010

spinoza1111wrote:

Correct, you finally got it.

Are you asking or telling?

But does the code do what is needed?

That makes no sense. No matter what comes out of a program with no
requirements must be correct, novel approach to guaranteed success

No, the programmers themselves are human beings and able, in some
cases better than users, to express the requirements. The myth of
requirements started as a conspiracy between incompetent and
irresponsible programmers who wanted to be indemnified for their
ignorance and stupid mistakes, and users who wanted credit and
"control".

Nick Keighley · Feb 18, 2010

Software development practices have improved a lot as applications
have become more complex and requirements better defined. [...]
How can "defining requirements" improve "software development
practices"?

Click to expand...

acquiring good requirements is part of good software development
practice.

really, could you expand on that. I was under the impression that many
projects had failed due to badly defined requirements.

Requirements are by definition inadequate for if they were adequate,
we could build a compiler to compile them.[/QUOTE]

comp.lang.lisp are discussing this at the moment. They have a slightly
different view in that they program in the Highest Level Possible
Langage (HLPL) hence if someone is going to write a specification in a
formal language they might as well just write it in Lisp, and then
execute it.

The point was raised that a requirement can be complete but not
specify a program.

- compute 1 million digits of pi
- decrypt an encypted message given only the public key

The demand for
"requirements definition" presupposes that the "analysts", thought to
be of a higher social class, know by virtue of their social class more
than the "mere" programmers.

no, I just think the guy who is going to use the thing ought to give
us vague idea of what he wants and what he expects to do with it. If
you want to call the people who extract this information "analysts",
so be it.

Yes, I've read Gilb...a member of the Scandinavian school of data
processing theory, which took the needs of the workers into account.

and despite this, writes a rattling good book!

He's firmly of the belief that the requirements are never complete. In
part because the program changes its environment. The existence of the
program causes working practices to change and hence requirements to
alter. The solution is "evolutionary delivery". A series of deliveries
that asymptopically approach the (hopefully!) settling requirment.

Walter Banks · Feb 18, 2010

spinoza1111 said:
I don't know where you . . . get that claim you said I made:
that I wish to destroy this newsgroup.

I have never said that.

Is that your wish?

w..

Nick Keighley · Feb 18, 2010

comp.lang.lisp are discussing this at the moment. They have a slightly
different view in that they program in the Highest Level Possible
Langage (HLPL) hence if someone is going to write a specification in a
formal language they might as well just write it in Lisp, and then
execute it.

the thread is called "a defense of ad hoc software development"

***
The difference between specifications and programs is a
difference in degree, not a difference in kind. Once we realize this,
it seems strange to require that one write specifications for a
program before beginning to implement it. If the program has to be
written in a low-level language, then it would be reasonable to
require that it be described in high-level terms first. But as the
programming language becomes more abstract, the need for
specifications begins to evaporate. Or rather, the implementation and
the specifications can become the same thing.
If the high-level language is going to be re-implemented in a
lower-level language, it starts to look even more like
specifications.
[...] in other words, [...] the specifications for C programs could
be
written in Lisp.
***

<snip>

Walter Banks · Feb 18, 2010

spinoza1111 said:
spinoza1111wrote:
Software development practices have improved a lot as applications
have become more complex and requirements better defined.

Click to expand...

I'm not sure this is true !

How can "defining requirements" improve "software development
practices"?

Click to expand...

acquiring good requirements is part of good software development
practice.

"Defining requirements" is being able to express the
requirements in English, other human languages, visuals, and if
necessary mathematical formalisms:

Click to expand...

yes. Though various graphical representaions are popular as well.

if you define them in code, you're
not, by definition, "defining requirements": you are coding.

Click to expand...

well yes, but he didn't say define the requirements in code

You just answered your own question.

Click to expand...

But [...] you said that software improves if we define
requirements.

Click to expand...

no he didn't he said "Software development practices [are] improved
[if] requirements [are] better defined." (the actual quote is at the
beginning of the post).

Please explain how this work, and note that industrial
experience is that the success of a software development project is
completely independent of requirements quality.

Click to expand...

really, could you expand on that. I was under the impression that many
projects had failed due to badly defined requirements.

Click to expand...

Requirements are by definition inadequate for if they were adequate,
we could build a compiler to compile them.

Defining how an application is supposed to behave is very different
from an application design and implementation. Requirements provide
clear input into software design and application testing.

w..

ImpalerCore · Feb 18, 2010

Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform. It does this by using the length
of the string, (e.g., fits in well with counted string), and initializes a
hash table of the stores characters of the comparand string from
right-to-left. It performs a search from left-to-right, however it
compares source against comparand in right-to-left. If a search character
does not exist within the comparand, the search index can be incremented
by the length of the comparand. For instance:
_______________________________________________________________
src = ChrisHelloJohn
cmp = Hello
hash['o'] = 5
hash['l'] = 4
hash['e'] = 2
hash['H'] = 1
_______________________________________________________________

Click to expand...

Okay, we want to find `cmp' in `src. So you put the head index on `s' and
you put the tail on 'C'. You can skip 5 characters if `hash[src[head]]' is
zero. There are more elaborate ways to skip characters and I think you can
use the right-to-left hash list to determine other things as well. Here is
some a very crude initial code sketch of the algorithm I am thinking
about:

Click to expand...

http://clc.pastebin.com/f7df86b63

Click to expand...

[...]

I know there should be some bugs in there, so, don't use the code for
anything. I will post fixes as I find them. If you can fix a bug, that would
be very nice as well! Thanks in advance, I really would appreciate it.

:^)

Would you write me a check for $2.56 for each one

bartc · Feb 18, 2010

Richard said:
Would trust any programmer who comments like this?

,----
| push edi ; Preserve edi, ebx and esi
| push ebx
| push esi
`----

I found the following quite scary:

mov ecx,[esp + 8] ; str2 (the string to be searched for)

push edi ; Preserve edi, ebx and esi
push ebx
push esi
....
mov edi,[esp + 10h] ; str1 (the string to be searched)
....
mov ecx,[esp + 14h] ; str2

First, that it uses hardcoded offsets onto the stack. Secondly, that after
those pushes it just matter-of-factly uses a different set of offsets to
compensate! And perhaps thirdly that it uses hex for the offsets...

Which leads me to think it might be machine generated, with the comments
added later?

Chris M. Thomasson · Feb 19, 2010

[...]

Yes, strcat and strlen are O(N) - so, where it matters, you remember
the
string length, having found it out the first time.

Click to expand...

Bingo! :^)

Click to expand...

Yes, strcat and strlen are order N
So be sure to squirrel away their values, like the Hen
Secretes seeds in her cache of seminal treasures
Great programmers, says Dickie, take these measures.

I personally think it would be wise to stash length values away if you need
to use them again, and again, and again.... Okay, you get the point.

;^)

Chris M. Thomasson · Feb 19, 2010

Na. However, I would praise you for finding a bug Sir! Well, that's not that
much of an incentive in any way, shape or form!!!! Well, thank you anyway!

;^D

If you find one, could you please educate me? IMHO, a dead bug is a good
bug!!!!

:^D

Chris M. Thomasson · Feb 19, 2010

Richard Harter said:
Here is a simple technique that can possibly cut down on the number of
search operations one needs to perform.

Click to expand...

[...]

http://clc.pastebin.com/f7df86b63

Click to expand...

You are well on your way to reinventing Boyer-Moore.

Humm. I don't compute two tables. Anyway, yes you are right! My solution ==
"kind of?" crappy at best! Skipping `cmp_size' chars is good... This just
might be a hard core retarded version of great algorithms?

:^D

blmblm · Feb 19, 2010

[ snip ]

An interesting factoid about this discussion is that no American or
British posters, save one, have solved the problem I set (write
replace without using string.H).

Oh, it's string.H we're to avoid? is it okay to use string.h?
Yeah, yeah, nitpick ....

Well, for the record, I'm an American mostly-lurker, and I've
been tinkering with various solutions over the past few days, not
looking carefully at others' solutions because that would seem
to spoil the fun. I've just posted some of them, in a reply to
Seebs's post with subject line "Efficency [sic] and the standard
library".

For what it's worth, all my solutions use recursion -- that seemed

to me to be a natural way to solve this problem. said:
Willem appears to be Dutch, and io_x appears to be Italian. I am an
American expatriate.

I think this is linked to the "Tea Party" movement in America, which
is ignorance rampant. It's people who absolutely rely on Social
Security and Medicare calling for an end to Social Security and
Medicare. It's also linked to the antics of the "British National
Party".

(Quoted to preserve context. I'll resist the temptation to comment.)

Seebs · Feb 19, 2010

Oh, it's string.H we're to avoid? is it okay to use string.h?
Yeah, yeah, nitpick ....

It's also a silly challenge. If I were going to do that, my first pass
would just be to put "my" in front of the str* functions I use, and
define them.

But why would I do that? How about we set a new challenge, which is to
do matrix multiplication without using the '*' operator. Or, better yet,
how about we iterate through an array without using a for loop, and write
our own routines to interpolate numbers into textual strings, so we don't
have to rely on printf()?

Without a *reason* to not use the string library, this is nothing more
than a challenge to do something stupid.

-s

Ben Pfaff · Feb 19, 2010

Seebs said:
But why would I do that? How about we set a new challenge, which is to
do matrix multiplication without using the '*' operator. Or, better yet,
how about we iterate through an array without using a for loop, and write
our own routines to interpolate numbers into textual strings, so we don't
have to rely on printf()?

Funny, I see posters asking for code to do dumb things like this
regularly.

Nick · Feb 19, 2010

Malcolm McLean said:
That's the ad hominem fallacy. It's not a pretentious term for
"insult" but a common falacy, which is to suppose that an argument is
wrong because of the person who is making it.

In fact there are good reasons for deprecating string.h. chars
effectively have to be octets, whilst often programs need to accept
non-Latin strings. Then the functions are all very old, with certain
weaknesses (no protection from buffer overun in strcpy, an O(N)
performance for strcat and strlen, an inconvenient interface for
strcat, const inconsistencies with strchr, very poor functionality
with strfind and const inconsiencies here too, very serious buffer
problems with sprintf, an overly difficult interface and buffer
problems with sscanf, thread problems with strtok and a non-intuitive
interface.

That, though, is an argument for implementing a different string library
- of which there are dozens. /Then/ you use that to implement your find
and replace.

What you don't do is to re-implement the standard library with all the
API drawbacks.

Once you have a basic dynamic string library, it's pretty trivial -
here's a snippet from my source bucket.

dstring *Do_Fn_Replace(const char *substrate,const char *pattern,const char *replacement) {
dstring *res = dstr_create(D_EXPR);
char *p;

// no activity on empty substrate or pattern
if(*substrate == '\0' || *pattern == '\0') {
dstrset(res,substrate);
return res;
}

dstrset(res,"");
while((p=strstr(substrate,pattern))) {
dstrncat(res,substrate,p-substrate);
dstrcat(res,replacement);
substrate = p + strlen(pattern);
}
dstrcat(res,substrate);

return res;
}

blmblm · Feb 19, 2010

On Feb 18, 12:33 pm, Walter Banks <[email protected]> wrote:

Click to expand...

[ snip ]

An interesting factoid about this discussion is that no American or
British posters, save one, have solved the problem I set (write
replace without using string.H).

Click to expand...

Oh, it's string.H we're to avoid? is it okay to use string.h?
Yeah, yeah, nitpick ....

Click to expand...

My understanding is that the requirement is there to make life
simpler for people not family with the C standard library,
particularly the functions prototyped in string.h, and for people
who have trouble distinguishing between upper and lower case.

Well, for the record, I'm an American mostly-lurker, and I've
been tinkering with various solutions over the past few days, not
looking carefully at others' solutions because that would seem
to spoil the fun. I've just posted some of them, in a reply to
Seebs's post with subject line "Efficency [sic] and the standard
library".

For what it's worth, all my solutions use recursion -- that seemed
to me to be a natural way to solve this problem. <shrug>

Click to expand...

I also happen to be American and I also posted one; however I put
it in a separate thread. See the thread entitled "A substring
replacement implementation". You might want to include it in
your test suite.

Yes, I saw your code, though I really only looked at the main
program and the comments -- both of which I rather like better
than mine.

I think I was rather hoping that the others who had posted
solutions might use my code to benchmark, but really, how hard
would it be for *me* to do that .... Maybe I will. "Stay tuned"?

Seebs · Feb 19, 2010

Once you have a basic dynamic string library, it's pretty trivial -
here's a snippet from my source bucket.

And if you have a sufficiently complete string library, it's COMPLETELY
trivial:

char *do_replace(char *needle, char *haystack, char *replacement) {
return mylib_replace_string(needle, haystack, replacement);
}

-s

Nick · Feb 19, 2010

Seebs said:
And if you have a sufficiently complete string library, it's COMPLETELY
trivial:

char *do_replace(char *needle, char *haystack, char *replacement) {
return mylib_replace_string(needle, haystack, replacement);
}

Well exactly. My first level implements the basic allocate and destroy,
plus "string.h like" functions to assign, extend, get length, etc. You
have to stop somewhere.

Fibonacci	0	May 13, 2023
Blue J Ciphertext Program	2	Nov 22, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
Need help finding Segmentation fault C++	0	Apr 16, 2022
Python code problem	2	Apr 23, 2023
Scanf is being prioritized over printf ?	1	Nov 5, 2023
My Status, Ciphertext	2	Nov 28, 2023
Can someone pls help me with a little algorithm script	1	Nov 28, 2024

substring finding problem!

spinoza1111

spinoza1111

spinoza1111

spinoza1111

Nick Keighley

Walter Banks

Nick Keighley

Walter Banks

ImpalerCore

bartc

Chris M. Thomasson

Chris M. Thomasson

Chris M. Thomasson

blmblm

Seebs

Ben Pfaff

Nick

blmblm

Seebs

Nick

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads