substring finding problem!

bartc · Feb 16, 2010

fedora said:
Hi all!

Have finished my program for spinoza's challenge. rewrote everything and
this time i made each statement as simple as posible, so that i can
understand the program. The allSubstr procedure can search for over
lapping
sub-string too like spinoza wanted, but the replace routine doesnt use
that
since i cant think how to replace over lapping ones!

haven't used any function from string.h! it works for strings i could
think
of but maybe it got bugs since i'm just a beginner...

Seems to be solid enough.

Except, if it can't find a substring (I think for substrings longer than the
text), sometimes it returns the original text unchanged, and sometimes it
returns NULL, eg.

"a", "ab", "" returns NULL, but:

"ab", "x", "" returns "ab"

fedora · Feb 16, 2010

fedora said:
Hi all!

Have finished my program for spinoza's challenge. rewrote everything and
this time i made each statement as simple as posible, so that i can
understand the program. The allSubstr procedure can search for over
lapping sub-string too like spinoza wanted, but the replace routine doesnt
use that since i cant think how to replace over lapping ones!

haven't used any function from string.h! it works for strings i could
think of but maybe it got bugs since i'm just a beginner...

how's mine spinoza111?

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

size_t strLength(char *cstr) {
size_t index = 0;

while (cstr[index] != '\0') ++index;
return index;
}

char *strFirstCh(char *str, char ch, size_t lstr) {
char *chpos = 0;
size_t current;

for (current = 0; current < lstr; current++) {
if (str[current] == ch) {
chpos = str + current;
break;
}
}
return chpos;
}

int strComp(char *s, char *t, size_t len) {
int ret = 0;
size_t index;

for (index = 0; index < len; index++) {
if (s[index] != t[index]) {
ret = 1;
break;
}
}
return ret;
}

char *strSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub) {
char *substr = 0;
char *anchor = str;
size_t remaining_len = (lstr - lsub) + 1;

assert(str && sub && lstr && lsub && lstr >= lsub);
while (remaining_len > 0 && anchor) {
if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
if (strComp(anchor, sub, lsub) == 0) {
substr = anchor;
break;
}
anchor++;
remaining_len--;
}
}
return substr;
}

unsigned allSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub,
char ***ps,
int overlap) {
unsigned occurs = 0;
unsigned ctr;
char *orig_str = str;
size_t orig_lstr = lstr;
size_t step;

if (overlap == 1)
step = 1;
else
step = lsub;

while (lstr >= lsub) {
str = strSubstr(str, sub, lstr, lsub);
if (str == 0)
break;
occurs++;
str += step;
lstr = (orig_str + orig_lstr) - str;
}

if (occurs > 0 && ps) {
str = orig_str;
lstr = orig_lstr;
*ps = malloc(occurs * sizeof **ps);
if (*ps) {
for (ctr = 0; ctr < occurs; ctr++) {
ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
str += step;
lstr = (orig_str + orig_lstr) - str;
}
}
}
return occurs;
}

char *replace(char *str, char *substr, char *rep) {
char *new = 0;
size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
unsigned replacements;
char **subpos;

assert(str && substr && rep);
lstr = strLength(str);
lsubstr = strLength(substr);
lrep = strLength(rep);
if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
return 0;
replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
if (replacements > 0) {
lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
new = malloc(lnew + 1);
if (!new)
return 0;

Oops... mem leak here! I return without giving back the subpos array. THat
should be :-

if (!new) {
free(subpos);
return 0;
}

strc = newc = replaced = 0;
while (strc <= lstr) {
if (str + strc == subpos[replaced]) {
for (repc = 0; repc < lrep; repc++) {
new[newc] = rep[repc];
newc++;
}
replaced++;
strc += lsubstr;
}
else {
new[newc] = str[strc];
strc++;
newc++;
}
}
free(subpos);
}
else {
new = malloc(lstr + 1);
if (!new)
return 0;
for (strc = 0; strc <= lstr; strc++)
new[strc] = str[strc];
}
return new;
}

int main(int argc, char **argv) {
char *newstr;

assert(argc == 4);
newstr = replace(argv[1], argv[2], argv[3]);
if (newstr)
printf("%s\n", newstr);
else
printf("replace() -> null\n");
free(newstr);
return 0;
}

thanks a lot all who helped!

fedora · Feb 16, 2010

bartc said:
Seems to be solid enough.

Except, if it can't find a substring (I think for substrings longer than
the text), sometimes it returns the original text unchanged, and sometimes
it returns NULL, eg.

"a", "ab", "" returns NULL, but:

"ab", "x", "" returns "ab"

thanks bartc!

The reason the first one returns null pointer is because i assume its
incorrect to call the routine with a substring bigger than target string.
maybe i should've put it into the assert but i thought it wasn't serious
enought to crash, so i return null ptr.

In second case it returns original string (not exactly but copy of original
because main routine free()s the replace's returned pointer but we cant
free() argv[] strings!!) because the sub-string doesn't occur and there's
nothing to replace. replace("ab", "ab", "") will give an empty string. hope
all cases are logivcal and consistent!

for right to left, we can simply reverse the target string before sending to
replace(), but i didn't put the functionality for that. The replace
procedure is very untidy to me! it could've been made much more neat and
efficient if i'd written strcpy() too, but not yet done!

anyways, seeing Williem's recursive program makes me ashamed i'm so stupid
beginner!!

thanks all

Phil Carmody · Feb 16, 2010

Chris M. Thomasson said:
Now you are confusing me here. First you say it's not a challenge,
then you seem to contradict yourself. Can you please clear this up for
me? Thanks.

Do not feed the troll.

Phil

Ben Bacarisse · Feb 16, 2010

fedora said:
Have finished my program for spinoza's challenge. rewrote everything and
this time i made each statement as simple as posible, so that i can
understand the program. The allSubstr procedure can search for over lapping
sub-string too like spinoza wanted, but the replace routine doesnt use that
since i cant think how to replace over lapping ones!

haven't used any function from string.h! it works for strings i could think
of but maybe it got bugs since i'm just a beginner...

The result is good, but I am not sure you were right to accept the
peculiar notion of not using standard string functions. If you felt
you had to, why not use standard functions but then plug-in you own
versions? That way you learn about the standard library and get to
write the character-fiddling functions that can be useful learning
exercises.

I'll make a few detailed comments (one is a bug), but on the "big
picture" I don't see why you mix size_t and unsigned all over the
place. I'd stick to size_t.

Finally, I think it is odd to return a null result when the substring
is too long to match. I'd treat is like any other substring that does
not match.

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

size_t strLength(char *cstr) {
size_t index = 0;

while (cstr[index] != '\0') ++index;
return index;
}

char *strFirstCh(char *str, char ch, size_t lstr) {
char *chpos = 0;
size_t current;

for (current = 0; current < lstr; current++) {
if (str[current] == ch) {
chpos = str + current;
break;
}
}
return chpos;
}

int strComp(char *s, char *t, size_t len) {
int ret = 0;
size_t index;

for (index = 0; index < len; index++) {
if (s[index] != t[index]) {
ret = 1;
break;
}
}
return ret;

Small point: ret is the same as index != len. Can you see why? Many
C programmers would just return index != len here. Also, I'd reverse
the sense of the returned value and call the function strEqual.

}

char *strSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub) {
char *substr = 0;
char *anchor = str;
size_t remaining_len = (lstr - lsub) + 1;

assert(str && sub && lstr && lsub && lstr >= lsub);
while (remaining_len > 0 && anchor) {
if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
if (strComp(anchor, sub, lsub) == 0) {
substr = anchor;
break;
}
anchor++;
remaining_len--;
}
}
return substr;
}

unsigned allSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub,
char ***ps,
int overlap) {
unsigned occurs = 0;
unsigned ctr;
char *orig_str = str;
size_t orig_lstr = lstr;
size_t step;

if (overlap == 1)
step = 1;
else
step = lsub;

while (lstr >= lsub) {
str = strSubstr(str, sub, lstr, lsub);
if (str == 0)
break;
occurs++;
str += step;
lstr = (orig_str + orig_lstr) - str;
}

if (occurs > 0 && ps) {
str = orig_str;
lstr = orig_lstr;
*ps = malloc(occurs * sizeof **ps);
if (*ps) {
for (ctr = 0; ctr < occurs; ctr++) {
ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
str += step;
lstr = (orig_str + orig_lstr) - str;
}
}
}
return occurs;
}

char *replace(char *str, char *substr, char *rep) {
char *new = 0;
size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
unsigned replacements;
char **subpos;

assert(str && substr && rep);
lstr = strLength(str);
lsubstr = strLength(substr);
lrep = strLength(rep);
if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
return 0;
replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
if (replacements > 0) {
lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
new = malloc(lnew + 1);
if (!new)
return 0;
strc = newc = replaced = 0;
while (strc <= lstr) {
if (str + strc == subpos[replaced]) {

You have a bug here. replaced can become equal to the size of the
subpos array and, hence, you index outside of it.

for (repc = 0; repc < lrep; repc++) {
new[newc] = rep[repc];
newc++;
}
replaced++;
strc += lsubstr;
}
else {
new[newc] = str[strc];
strc++;
newc++;
}
}
free(subpos);
}
else {
new = malloc(lstr + 1);
if (!new)
return 0;
for (strc = 0; strc <= lstr; strc++)
new[strc] = str[strc];
}
return new;
}

int main(int argc, char **argv) {
char *newstr;

assert(argc == 4);

I don't think this is a good use of assert. It is almost always
wrong to use it to check user input. I'd just use an "if".

newstr = replace(argv[1], argv[2], argv[3]);
if (newstr)
printf("%s\n", newstr);
else
printf("replace() -> null\n");
free(newstr);
return 0;
}

Chris M. Thomasson · Feb 16, 2010

Walter Banks said:
As much as I like recursive solutions for many things including most of
the
parsers I have written.

There are some application areas where recursion is avoided. Most of the
automotive bugs 10 or 15 years ago had a stack depth component and most
code is now written with predictable run time requirements.

I do not "necessarily" want to restrict "potential" user input in order to
get around the limitations of a recursive function in an environment that
has a rather small per-thread stack size. If you can create an iterative
solution, then I say go ahead and do it. This may work out when you realize
that the limits you set on a recursive function are to great to run on a
system that has lower per-task/thread stack size.

Chris M. Thomasson · Feb 16, 2010

Correct. Wonder how many library routines are written in assembler.
Don't know.

Well, a standard library implementation in the form of a deferment to native
OS library calls can be written in 100% assembly language. Perhaps a
standard C header can defer to highly efficient native OS provided
primitives.

Chris M. Thomasson · Feb 16, 2010

[...]

Well, a standard library implementation in the form of a deferment to
native OS library calls can be written in 100% assembly language.

Ummm... The above should probably read as:
__________________________________________________________
Well, a standard library implementation can defer to native OS provided
library calls that happen to be implemented in 100% assembly language...
__________________________________________________________

Is that any better?

;^o

spinoza1111 · Feb 17, 2010

Do not feed the troll.

I am not a "troll", and "troll" is a Nordic racist word, referring as
it does to peoples pushed out of Western Europe by invaders after the
fall of Rome. I have been discussing and submitting code written in C.

A "troll" is one who posts insincerely in order to get a rise out of
people. I do not do so.

Richard Heathfield is by no means a friend of mine, yet he has gone on
record to say that I am not a "troll".

Please keep your comments on-topic if you cannot keep them civil.

spinoza1111 · Feb 17, 2010

I do not "necessarily" want to restrict "potential" user input in order to
get around the limitations of a recursive function in an environment that
has a rather small per-thread stack size. If you can create an iterative
solution, then I say go ahead and do it. This may work out when you realize
that the limits you set on a recursive function are to great to run on a
system that has lower per-task/thread stack size.

Recursive solutions are in the thread where Seebs misspelled
"efficiency" in the header, in C by Willem and in C Sharp by myself.

Each solution will put something on the stack for each occurence of
the target. But if you don't recurse, then you need to create one of
my segments in my linked list. The stack frame is a couple of
addresses as is the segment.

Therefore, it seems to me that there's a minimum storage complexity to
the problem and not just to these two solutions. In a language/
computer where strings could grow magically this would not be the
case, but this would be the existence as if by magic of hardware that
could hey presto realloc strings "under the covers".

Here, the storage complexity is real if hidden. It is the need to keep
finding and reallocing small strings, or getting a big string and
discarding what you don't need.

And if you implement replace() in assembler, you still have the same
storage complexity.

The most storage efficient solution for C's format for strings would
examine the target and replacement strings. If the target length is
greater than or equal to that of the replacement, do the
transformation *in situ* by shifting bytes left. If the target length
is less, you must reallocate, perhaps more than once.

Which is why C programmers need to understand that C doesn't support
strings out of the box. Instead, it provides a silly set of solutions
based on the absurd idea that it makes sense to terminate a string
with a Nul.

C without strings could be a sensible language for low-level
programming of toys. C with strings needs to use a standardized, open
source modern string.H that represents strings as linked lists.

Chris M. Thomasson · Feb 17, 2010

[...]

Fine, since garbage collection is simpler than software design. We
have the right to think of software entities coming into existence and
dying without having to be midwifes or funeral directors.

What about forgetting to set a reference to NULL? Sometimes, you can
unnecessarily extend the lifetimes of objects in a pure GC system if you
forget to set certain object references to NULL. IMHO, a GC does not mean
you have to check you're brain at the door.

spinoza1111 · Feb 17, 2010

[...]

Fine, since garbage collection is simpler than software design. We
have the right to think of software entities coming into existence and
dying without having to be midwifes or funeral directors.

Click to expand...

What about forgetting to set a reference to NULL? Sometimes, you can
unnecessarily extend the lifetimes of objects in a pure GC system if you
forget to set certain object references to NULL. IMHO, a GC does not mean
you have to check you're brain at the door.

think a GC is convenient, and I also feel the same way about certain
library
functions. However, there are times when you do want to "re-invent"
something. For instance, I am okay with using various manual memory
management techniques to help relieve the pressure on a GC.

Click to expand...

True. In my book "Build Your Own .Net Language and Compiler" I have a
methodology for stateless objects which requires the user to dispose
the object calling a dispose() method. This allows the object to set
all its references to objects from the heap to null.

Precisely because you still need a brain when you have a garbage
collector means that you need the garbage collector since there's no
reason to waste fine brains on manual memory management when those
brains could be solving more important problems.

It is true, however, that you might run out of Fun Stuff to Think
About just as airline pilots in modern high tech cockpits traveling
across the Pacific might get bored. Too bad. I need the pilot to be
bored. I don't want him to have fun or be challenged.

spinoza1111 · Feb 17, 2010

spinoza1111wrote:

That is true. I have also gone on record as saying that you're an idiot.

That is correct. And I have gone on record as saying you're a fool.
So, we agree that I am not a troll.

fedora · Feb 17, 2010

Ben said:
The result is good, but I am not sure you were right to accept the
peculiar notion of not using standard string functions. If you felt
you had to, why not use standard functions but then plug-in you own
versions? That way you learn about the standard library and get to
write the character-fiddling functions that can be useful learning
exercises.

Thanks Ben! I didn't use stdlib functions because i wanted to how
easy/difficult it would be to write my own. also spinoza didn't accept
program that called function in string.h.

About plugging in my own versions, do you mean writing routines with the
same name so that at linking the linker finds my lib first and plug it in?
but i read somewhere that using ansi c's namespace and relying on linker is
all undefined and gcc can replace calls to stdlib functions with inline code
so bypassing my code too... but i'll try it.

I'll make a few detailed comments (one is a bug), but on the "big
picture" I don't see why you mix size_t and unsigned all over the
place. I'd stick to size_t.
ok.

Finally, I think it is odd to return a null result when the substring
is too long to match. I'd treat is like any other substring that does
not match.

and just return copy of original target str? Okay, i'll change code to do
that but right now i'm having bigger problems.

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

size_t strLength(char *cstr) {
size_t index = 0;

while (cstr[index] != '\0') ++index;
return index;
}

char *strFirstCh(char *str, char ch, size_t lstr) {
char *chpos = 0;
size_t current;

for (current = 0; current < lstr; current++) {
if (str[current] == ch) {
chpos = str + current;
break;
}
}
return chpos;
}

int strComp(char *s, char *t, size_t len) {
int ret = 0;
size_t index;

for (index = 0; index < len; index++) {
if (s[index] != t[index]) {
ret = 1;
break;
}
}
return ret;

Click to expand...

Small point: ret is the same as index != len. Can you see why? Many
C programmers would just return index != len here. Also, I'd reverse
the sense of the returned value and call the function strEqual.

okay. that's better than modelling after stdlib strcmp. my reasoning was
since there's only one case where two strings can be identical but many
cases where they can differ i'd use the one bool value for the first case
(false) and return different true values for un-equal cases. but i think
i've thought of it wrongly. boolean flase and true are both unique values
and not like C's definition of true.

}

char *strSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub) {
char *substr = 0;
char *anchor = str;
size_t remaining_len = (lstr - lsub) + 1;

assert(str && sub && lstr && lsub && lstr >= lsub);
while (remaining_len > 0 && anchor) {
if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
if (strComp(anchor, sub, lsub) == 0) {
substr = anchor;
break;
}
anchor++;
remaining_len--;
}
}
return substr;
}

unsigned allSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub,
char ***ps,
int overlap) {
unsigned occurs = 0;
unsigned ctr;
char *orig_str = str;
size_t orig_lstr = lstr;
size_t step;

if (overlap == 1)
step = 1;
else
step = lsub;

while (lstr >= lsub) {
str = strSubstr(str, sub, lstr, lsub);
if (str == 0)
break;
occurs++;
str += step;
lstr = (orig_str + orig_lstr) - str;
}

if (occurs > 0 && ps) {
str = orig_str;
lstr = orig_lstr;
*ps = malloc(occurs * sizeof **ps);
if (*ps) {
for (ctr = 0; ctr < occurs; ctr++) {
ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
str += step;
lstr = (orig_str + orig_lstr) - str;
}
}
}
return occurs;
}

char *replace(char *str, char *substr, char *rep) {
char *new = 0;
size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
unsigned replacements;
char **subpos;

assert(str && substr && rep);
lstr = strLength(str);
lsubstr = strLength(substr);
lrep = strLength(rep);
if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
return 0;
replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
if (replacements > 0) {
lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
new = malloc(lnew + 1);
if (!new)
return 0;
strc = newc = replaced = 0;
while (strc <= lstr) {
if (str + strc == subpos[replaced]) {

Click to expand...

You have a bug here. replaced can become equal to the size of the
subpos array and, hence, you index outside of it.

Thanks for spotting! i'd never have seen that. I replaced that line by

if (replaced < replacements && str + strc == subpos[replaced]) {

i made no other changes and compiled and ran... but strange errors are
occurring. below i'm pasting session from gdb... i really appreciate if
anyone can tell me why it's happening. what's irritating is there is no
patter for the seg faults. sometimes it happens, sometimes not.

compiled with gcc -Wall -Wextra -std=c99 -pedantic -o replace replace.c -
ggdb3

# gdb ./replace
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

(gdb) run "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
Starting program: /home/fedora/c/replace
"sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400648 in strFirstCh (str=0x7fff1da36feb "rc/c/replace",
ch=115 's', lstr=18446744073709551546) at replace.c:17
17 if (str[current] == ch) {

how can str be "rc/c/replace"?? It should be some part of the "sdak..."
string as i see...
ch is with right value. but lstr is wrong. how did it get that value? i cant
see any thing in my code that is wrong but i'm not intel.

big thanks to any one who can say why str is "rc/c/replace" and lstr is
wrong...

am thinking if string programing in C is naturally so difficult or i'm just
stupid

for (repc = 0; repc < lrep; repc++) {
new[newc] = rep[repc];
newc++;
}
replaced++;
strc += lsubstr;
}
else {
new[newc] = str[strc];
strc++;
newc++;
}
}
free(subpos);
}
else {
new = malloc(lstr + 1);
if (!new)
return 0;
for (strc = 0; strc <= lstr; strc++)
new[strc] = str[strc];
}
return new;
}

int main(int argc, char **argv) {
char *newstr;

assert(argc == 4);

Click to expand...

I don't think this is a good use of assert. It is almost always
wrong to use it to check user input. I'd just use an "if".

newstr = replace(argv[1], argv[2], argv[3]);
if (newstr)
printf("%s\n", newstr);
else
printf("replace() -> null\n");
free(newstr);
return 0;
}

Click to expand...

Phil Carmody · Feb 17, 2010

Richard Heathfield said:
That is true. I have also gone on record as saying that you're an idiot.

He does it for the strokes. His motives may be different, and the
strokes he seeks may be different (though not vastly different from
some of the sci.math cranks I've encountered), but he's still doing
it to elicit responses. Troll, crank, idiot - you can tick all three
with him.

And Chris' post did nothing but invite him to spew more inane crap
onto c.l.c. Why would anyone want that?

Phil

fedora · Feb 17, 2010

Hi all!

Here's more session output from gdb... i cant figure out why it crashes for
random strings but not for others...

(gdb) run "askdjfhakdfhakdfhasdjkhaksdfhaklsdjfhlakfhlakfhaklfh" "ask" "+"
Starting program: /home/fedora/c/replace
"askdjfhakdfhakdfhasdjkhaksdfhaklsdjfhlakfhlakfhaklfh" "ask" "+"

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400648 in strFirstCh (str=0x7fffe11e6ff5 "ce", ch=97 'a',
lstr=18446744073709551559) at replace.c:17
17 if (str[current] == ch) {
(gdb) run
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaskadhfklajsdhfkajfhklajdfhjkafhkdf"
"a" "+"
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/fedora/c/replace
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaskadhfklajsdhfkajfhklajdfhjkafhkdf"
"a" "+"
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++sk+dhfkl+jsdhfk+jfhkl+jdfhjk+fhkdf

Program exited normally.
(gdb)

i can see that str and lstr are getting corrupt but i cant see where in my
code that is happening... and putting lots of printfs into is really
discouraging.

thanks.

fedora · Feb 17, 2010

Hi all!

now I added two printf calls to strSubstr and allSubstr to trace where str
and lstr are getting wrong values.

I also added one if statement to check if adding step to str would result in
it going beyond str+lstr. that is :-

if (((orig_str + orig_lstr) - str) <= step) break;

below is full code and run in gdb with still seg fault. suddenly lstr is
getting i can see, but which statement is doing that i can't see

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

size_t strLength(char *cstr) {
size_t index = 0;

while (cstr[index] != '\0') ++index;
return index;
}

char *strFirstCh(char *str, char ch, size_t lstr) {
char *chpos = 0;
size_t current;

for (current = 0; current < lstr; current++) {
if (str[current] == ch) {
chpos = str + current;
break;
}
}
return chpos;
}

int strComp(char *s, char *t, size_t len) {
int ret = 0;
size_t index;

for (index = 0; index < len; index++) {
if (s[index] != t[index]) {
ret = 1;
break;
}
}
return ret;
}

char *strSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub) {
char *substr = 0;
char *anchor = str;
size_t remaining_len = (lstr - lsub) + 1;

assert(str && sub && lstr && lsub && lstr >= lsub);
printf("in strSubstr: str = %p\tlstr = %zu\n", (void*)str, lstr);

while (remaining_len > 0 && anchor) {
if (anchor = strFirstCh(anchor, *sub, remaining_len)) {
if (strComp(anchor, sub, lsub) == 0) {
substr = anchor;
break;
}
anchor++;
remaining_len--;
}
}
return substr;
}

unsigned allSubstr(
char *str,
char *sub,
size_t lstr,
size_t lsub,
char ***ps,
int overlap) {
unsigned occurs = 0;
unsigned ctr;
char *orig_str = str;
size_t orig_lstr = lstr;
size_t step;

if (overlap == 1)
step = 1;
else
step = lsub;

while (lstr >= lsub) {
str = strSubstr(str, sub, lstr, lsub);
if (str == 0)
break;
occurs++;
if (((orig_str + orig_lstr) - str) <= step) break;
str += step;
lstr = (orig_str + orig_lstr) - str;
printf("in allSubstr: str = %p\tlstr = %zu\n", (void*)str, lstr);
}

if (occurs > 0 && ps) {
str = orig_str;
lstr = orig_lstr;
*ps = malloc(occurs * sizeof **ps);
if (*ps) {
for (ctr = 0; ctr < occurs; ctr++) {
ps[0][ctr] = str = strSubstr(str, sub, lstr, lsub);
str += step;
lstr = (orig_str + orig_lstr) - str;
}
}
}
return occurs;
}

char *replace(char *str, char *substr, char *rep) {
char *new = 0;
size_t lstr, lsubstr, lrep, lnew, strc, newc, repc, replaced;
unsigned replacements;
char **subpos;

assert(str && substr && rep);
lstr = strLength(str);
lsubstr = strLength(substr);
lrep = strLength(rep);
if (lstr == 0 || lsubstr == 0 || lsubstr > lstr)
return 0;
replacements = allSubstr(str, substr, lstr, lsubstr, &subpos, 0);
if (replacements > 0) {
lnew = (lstr - (replacements * lsubstr)) + (replacements * lrep);
new = malloc(lnew + 1);
if (!new) {
free(subpos);
return 0;
}
strc = newc = replaced = 0;
while (strc <= lstr) {
if (replaced < replacements && str + strc == subpos[replaced]) {
for (repc = 0; repc < lrep; repc++) {
new[newc] = rep[repc];
newc++;
}
replaced++;
strc += lsubstr;
}
else {
new[newc] = str[strc];
strc++;
newc++;
}
}
free(subpos);
}
else {
new = malloc(lstr + 1);
if (!new)
return 0;
for (strc = 0; strc <= lstr; strc++)
new[strc] = str[strc];
}
return new;
}

int main(int argc, char **argv) {
char *newstr;

assert(argc == 4);
newstr = replace(argv[1], argv[2], argv[3]);
if (newstr)
printf("%s\n", newstr);
else
printf("replace() -> null\n");
free(newstr);
return 0;
}

# gdb ./replace
(gdb) run "asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "asd" "+"
Starting program: /home/fedora/c/replace
"asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "asd" "+"
in strSubstr: str = 0x7fff646a86e4 lstr = 47
in allSubstr: str = 0x7fff646a86e7 lstr = 44
in strSubstr: str = 0x7fff646a86e7 lstr = 44
in allSubstr: str = 0x7fff646a8717 lstr = 18446744073709551612
in strSubstr: str = 0x7fff646a8717 lstr = 18446744073709551612

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400698 in strFirstCh (str=0x7fff646a8ff5 "ce", ch=97 'a',
lstr=18446744073709551559) at replace.c:17
17 if (str[current] == ch) {

(gdb) run "asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "as" "+"
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/fedora/c/replace
"asdkhfaklsdjfhakldfhaklsdjfhakldfhakldfhaskldfh" "as" "+"
in strSubstr: str = 0x7fff27e376e5 lstr = 47
in allSubstr: str = 0x7fff27e376e7 lstr = 45
in strSubstr: str = 0x7fff27e376e7 lstr = 45
in allSubstr: str = 0x7fff27e3770f lstr = 5
in strSubstr: str = 0x7fff27e3770f lstr = 5
in strSubstr: str = 0x7fff27e376e5 lstr = 47
in strSubstr: str = 0x7fff27e376e7 lstr = 45
+dkhfaklsdjfhakldfhaklsdjfhakldfhakldfh+kldfh

Program exited normally.

as its shown using "as" for sub-string instead of "asd" makes it work ok.
somewhere some length calculation is getting wrapped around but am not able
to pin point...

thanks all.

Ben Bacarisse · Feb 17, 2010

fedora said:
Thanks Ben! I didn't use stdlib functions because i wanted to how
easy/difficult it would be to write my own. also spinoza didn't accept
program that called function in string.h.

Why do you want to do what spinoza1111 says? I am curious about how
you decided it was something you wanted to do.

His initial programs all used string.h and some versions erroneously
called strlen without including string.h. I don't know why he decided
to change his mind, but a skilled C programmer would use the standard
library and only do something more complex if there was a compelling
reason.

About plugging in my own versions, do you mean writing routines with the
same name so that at linking the linker finds my lib first and plug it in?
but i read somewhere that using ansi c's namespace and relying on linker is
all undefined and gcc can replace calls to stdlib functions with inline code
so bypassing my code too... but i'll try it.

You do it like this:

#define STRLEN my_strelen

and you write STRLEN, STRSTR etc in your code.

Your problems below come from this + 1, I think. It looks wrong.
Sorry I did not spot this first time round.

i made no other changes and compiled and ran... but strange errors are
occurring. below i'm pasting session from gdb... i really appreciate if
anyone can tell me why it's happening. what's irritating is there is no
patter for the seg faults. sometimes it happens, sometimes not.

If you run off an array, all kinds of strange things can happen. It
is often not worthwhile trying to work out exactly why (at least I've
stopped trying -- I just fix the problem).

compiled with gcc -Wall -Wextra -std=c99 -pedantic -o replace replace.c -
ggdb3

# gdb ./replace
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

(gdb) run "sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"
Starting program: /home/fedora/c/replace
"sdakfhaskfhaskdfhaskdfhaksdfhksdfhajksdfhkjsdfhsdfasd" "sda" "+"

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400648 in strFirstCh (str=0x7fff1da36feb "rc/c/replace",
ch=115 's', lstr=18446744073709551546) at replace.c:17
17 if (str[current] == ch) {

how can str be "rc/c/replace"?? It should be some part of the "sdak..."
string as i see...
ch is with right value. but lstr is wrong. how did it get that value? i cant
see any thing in my code that is wrong but i'm not intel.

big thanks to any one who can say why str is "rc/c/replace" and lstr is
wrong...

See above.

am thinking if string programing in C is naturally so difficult or i'm just
stupid

No, it is hard to get the details right but you are not helping
yourself by not breaking your program up into helpful simple
functions. You have some nice functions to help find the strings, but
you stopped there. I'd have some functions to help build the copy
with the replacements.

*Actually*, I'd use (and did use) memcpy and strcpy, but if I had
decided to drink the "no string.h" cool aid, I'd write these myself.
Neither is more than a line or two and they simplify the replace
function a lot.

For my own amusement, I've studied what slows up this replace function
and I have written reasonably fast version that avoids strstr because,
as I've descried elsewhere, it forces the program to re-scan strings
unnecessarily. For very long strings, strstr still wins because of the
sophisticated algorithm that glibc's version uses, but for anything
else it is slightly faster to scan for all the sub-string positions
"by hand". Of course, it still uses the other string functions.

<snip>

bartc · Feb 17, 2010

am thinking if string programing in C is naturally so difficult or i'm
just
stupid

It's just very fiddly.

spinoza1111 · Feb 17, 2010

He does it for the strokes. His motives may be different, and the
strokes he seeks may be different (though not vastly different from
some of the sci.math cranks I've encountered), but he's still doing
it to elicit responses. Troll, crank, idiot - you can tick all three
with him.

All of which may be true. The problem is that in every discussion I'm
in, I am (like Shakespeare's Falstaff) not only Witty (IMO) in myself
but the cause that is of Witte that is in others. It's always a
refreshing change from the tedious efforts in this and other
newsgroups to recreate the dull spirit of the nastiest type of
business office, which is anhedonia gone insane. I somehow manage to
drive discussions that are always above the usual level.

Oh yes, and this "troll, crank, and idiot" here was the first to post
a solution to the problem, the first to debug it, and is the only one
to have posted anything like a correct solution other than Willem and
(as far as I can tell) io_x. The Regular Guys have been posting
idiotic nonsolutions with far more bugs, all of which use string.h.
It's far too late for the Chomsky Type 3 Guys to post anything, since
they'd probably plagiarize Willem or myself.

But if nonconforming to normalized deviance and Eunuch programming is
to be a troll, a crank and an idiot in your book, hey, so I am.

Fibonacci	0	May 13, 2023
Blue J Ciphertext Program	2	Nov 22, 2023
Adding adressing of IPv6 to program	1	Feb 16, 2023
Need help finding Segmentation fault C++	0	Apr 16, 2022
Python code problem	2	Apr 23, 2023
Scanf is being prioritized over printf ?	1	Nov 5, 2023
My Status, Ciphertext	2	Nov 28, 2023
Can someone pls help me with a little algorithm script	1	Nov 28, 2024

substring finding problem!

bartc

fedora

fedora

Phil Carmody

Ben Bacarisse

Chris M. Thomasson

Chris M. Thomasson

Chris M. Thomasson

spinoza1111

spinoza1111

Chris M. Thomasson

spinoza1111

spinoza1111

fedora

Phil Carmody

fedora

fedora

Ben Bacarisse

bartc

spinoza1111

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads