replace substring

magix

Hi,

I have
char* str1 = "d:\temp\data\test.txt"

I want to replace all the "\" with "\\", so that

the string will have "d:\\temp\\data\\test.txt"

Can you help ?

Regards,
Magix
 
Martin Ambuhl

magix said:
Hi,

I have
char* str1 = "d:\temp\data\test.txt"

I want to replace all the "\" with "\\", so that

the string will have "d:\\temp\\data\\test.txt"

Can you help ?

The string to which str1 points is a string literal, and you should not
attempt to modify it. String literals are usually not modifiable and in
any case your new string takes more space than the string literal does,
so even if you could modify it, you would overrun the end of it.

You need to allocate an array of sufficient space for the new string
(including the terminating zero-byte) and then something like

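/* warning: neither tested nor written to be bulletproof
   (or even dart-resistant) */
/* target must have room for strlen(source), plus one extra
   character per backslash, plus the terminating '\0' */
void replaceslash(const char *source, char *target)
{
    for (; *source; source++)
    {
        if (*source == '\\') *target++ = *source;
        *target++ = *source;
    }
    *target = '\0';   /* terminate the new string */
}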

Now, the code above simply produces two '\\' when it sees one.
But on almost any system, it makes more sense to replace '\\' with '/'
(even Windows and DOS have no problem with this), in which case the above
becomes
void replaceslash(const char *source, char *target)
{
    for (; *source; source++)
    {
        if (*source == '\\') *target++ = '/';
        else *target++ = *source;   /* notice the else! */
    }
    *target = '\0';   /* terminate the new string */
}
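Because replacing '\\' with '/' never changes the length of the string, that variant can even be done in place, provided the string lives in a writable array rather than a string literal. A minimal sketch, assuming a writable, '\0'-terminated array; the name `to_forward_slashes` is hypothetical, not from the thread:

```c
/* Replace every backslash with a forward slash, in place.
   The length does not change, so no extra room is needed,
   but s must point into a writable array, NOT a string literal. */
void to_forward_slashes(char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\\')
            *s = '/';
    }
}
```

For example, after `char path[] = "d:\\temp\\data\\test.txt";` and `to_forward_slashes(path);`, `path` holds `d:/temp/data/test.txt`.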
 
Szabolcs Borsanyi

Try
sed ss\\\\s\\\\\\\\sg
This sed solution seems to be off-topic, but, when applied to the OP's
source, could fix the C problem.

The OP should tell us how he got that invalid string literal. If it is his
own typing, he should replace the backslashes with pairs of backslashes,
either by hand or using the sed script above. (But sed is unlikely to be
available on a platform with path names containing lots of backslashes.)

(you=OP)
On the other hand, it is an interesting question how to replace
valid backslash characters in a string with pairs of backslashes, which can
be relevant, e.g. when writing source code or passing the string to
another picky application. If that is the case, you can do this replacement
in the course of copying each character to a separate character array.
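A minimal sketch of that copying approach, assuming the caller wants a freshly allocated result: count the backslashes first so the buffer can be sized exactly, then copy, emitting a second backslash after each one. The name `double_backslashes` is hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of src in which every backslash is
   doubled, or NULL if malloc fails.  The caller must free() it.
   Pass 1 counts backslashes to size the buffer; pass 2 copies. */
char *double_backslashes(const char *src)
{
    size_t extra = 0;
    const char *p;
    char *result, *q;

    for (p = src; *p != '\0'; p++)
        if (*p == '\\')
            extra++;

    result = malloc(strlen(src) + extra + 1);   /* +1 for the '\0' */
    if (result == NULL)
        return NULL;

    for (p = src, q = result; *p != '\0'; p++) {
        *q++ = *p;
        if (*p == '\\')
            *q++ = '\\';        /* emit the second backslash */
    }
    *q = '\0';
    return result;
}
```

Counting first costs a second pass over the string, but it means the buffer is sized exactly rather than with a worst-case `2 * strlen + 1`.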

The third point is that it is possible to put a string into a character
array, like char path[]="d:\\path"; This character array cannot hold
more characters than it did originally, but feel free to put a big number
between the square brackets so that you do not have this limitation.
(This point is surely covered by the FAQ.)
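To make the size difference concrete, a small sketch (the array and function names are hypothetical): `sizeof` on an array initialised from a literal is exactly the string length plus one, while an explicitly sized array keeps extra room for later appends.

```c
#include <string.h>

/* Sized by the initializer: 7 characters plus '\0', i.e. 8 bytes.
   Nothing can be appended to it. */
char exact[] = "d:\\path";

/* Explicit size: 64 bytes; the unused tail is zero-filled,
   so there is room to append. */
char roomy[64] = "d:\\path";

/* Append a file name to roomy.  Safe only because roomy has spare
   capacity: 7 + 9 characters + '\0' = 17 bytes out of 64. */
void append_file_name(void)
{
    strcat(roomy, "\\test.txt");
}
```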

Szabolcs
 
Jens Thoms Toerring

magix said:
I have
char* str1 = "d:\temp\data\test.txt"
I want to replace all the "\" with "\\", so that
the string will have "d:\\temp\\data\\test.txt"

You can't change the above string at all. What you have
there is a pointer that points to memory that you are
not allowed to change - it could be read-only memory.

Now, if you change the definition of 'str1' to

char str1[ ] = "d:\temp\data\test.txt";

things already look a bit better since now 'str1' isn't
just a pointer but a real array, initialized with
the string. And that's now something you can change.
But you couldn't double just one character in that
array since it simply isn't large enough to hold
one more character. To add something to the string
you would first have to get hold of memory where
the resulting string would fit in.

But there is another problem. There's not a single
backslash in that string! E.g. "\t" isn't a backslash
followed by the character 't' but it's the escape
sequence for the tab character. And '\d' is not even
a valid escape sequence, so the compiler should warn
you, and many compilers then simply drop the backslash.
If you try to print that string anyway you will get
something like

d: empdata est.txt

(the details depending on the tab settings on your
machine). So there actually isn't a single backslash
character in your 'str1' since the single backslash
characters you wrote in there are always taken to
"escape" the following character. Whenever you
see a backslash in a string in C (or a character
constant like '\t') remember that this is just a
kind of "operator", changing the meaning of the
following character in the string. And to get a
real backslash character you have to write '\\'
since the backslash "escapes" itself.
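A quick way to see this is to check what the literals actually contain. The sketch below uses only valid escapes (it deliberately avoids `\d`, which is not a valid escape sequence); the variable and function names are hypothetical:

```c
#include <string.h>

/* "\t" is ONE character (a tab), not a backslash followed by 't'. */
const char with_tab[] = "d:\temp";         /* d : TAB e m p -> 6 chars */

/* "\\" is also ONE character: a real backslash. */
const char with_backslash[] = "d:\\temp";  /* d : \ t e m p -> 7 chars */

/* Return 1 if s contains at least one real backslash character. */
int has_backslash(const char *s)
{
    return strchr(s, '\\') != NULL;
}
```

`strlen(with_tab)` is 6 and `with_tab` contains no backslash at all, while `with_backslash` is 7 characters long with a real backslash at index 2.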

Thus if you want backslashes in the string you have to
double them already when you write the string in order
to end up with single backslashes within it at all.
Putting in two backslashes in a row results in a
single backslash character in the string. If you do
that like this

char str1[ ] = "d:\\temp\\data\\test.txt";

then 'str1' contains three (not six) backslashes. And
I guess you're not really interested in doubling those
anymore.
Regards, Jens

PS. If I am not completely misinformed you can also
use a normal slash '/' as a path separator under
Windows (which I guess you're using, judging from the 'd:'
at the start of the string) in a C program. So if
this is supposed to be a path you could use

char str1[ ] = "d:/temp/data/test.txt";

and pass that to e.g. fopen() and it should still
work as intended.
 
Bartc

magix said:
Hi,

I have
char* str1 = "d:\temp\data\test.txt"

I want to replace all the "\" with "\\", so that

the string will have "d:\\temp\\data\\test.txt"

Can you help ?

If str1 really contains those backslashes, then it already represents a valid
Windows filename, if that was the idea.

Doubling up the backslashes I think still yields a valid filename, but is
unnecessary.

But if that is initialisation code from a C program, the backslashes will be
converted or ignored.

Assuming however you do have such a string and want to double up the
backslashes, I tried this somewhat fiddly code. You may want to put str1=str2 at
the end, but it is anyway just to give an idea. Note the double \\ characters
here are really single.


/* Duplicate \ characters in a string */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *str1 = "d:\\temp\\data\\test.txt"; /* old string (literal, read-only) */
    char *str2 = NULL;          /* new string (to be in heap memory) */
    const char *p;
    char *q;
    char c;
    size_t i, len, slashes;

    slashes = 0;
    len = strlen(str1);

    for (i = 0; i < len; ++i)   /* Calculate size of new string */
        if (str1[i] == '\\')
            ++slashes;

    str2 = malloc(len + slashes + 1);

    if (str2 != NULL) {
        p = str1;
        q = str2;
        while ((c = *p++) != '\0') {
            *q++ = c;
            if (c == '\\')
                *q++ = '\\';
        }
        *q = '\0';
        printf("STR1 = %s\n", str1);
        printf("STR2 = %s\n", str2);

        free(str2);  /* drop this if you do str1 = str2 and keep using it */
    }
    else
        puts("No memory");

    return 0;
}
 
Keith Thompson

magix said:
I have
char* str1 = "d:\temp\data\test.txt"

I want to replace all the "\" with "\\", so that

the string will have "d:\\temp\\data\\test.txt"

Can you help ?

You're going to need to be clearer about what you have and what you
want.

Do you mean that you have a declaration:

const char* str1 = "d:\temp\data\test.txt";

(note: I added the "const" and the semicolon) *in a C source file*,
and you want to modify the C source file so the declaration is

const char* str1 = "d:\\temp\\data\\test.txt";

? If so, I'd say the best tool for the job is a text editor; just add
the extra backslashes and save the file. But that's so trivial that I
suspect it's not really what you were asking.

If you want to modify a large number of such incorrect declarations,
it's going to be difficult. For one thing, backslashes can appear
legitimately in string literals; you wouldn't want to change
"hello, world\n"
to
"hello, world\\n"
I can imagine ways to detect which string literals should be
translated and which should be left alone, but it can't be done 100%
reliably, and I wouldn't even make the attempt without a lot more
information.

If you mean that you want to leave the declaration alone and change
the value of the string at run time, you're pretty much out of luck
(and fixing the declaration is a much better idea anyway).

Tell us what you're really trying to do, and we can probably help.
 
Antoninus Twink

The copy part will look something like:

while (*p1) {
if ('\' == (*p2++ = *p1++)) *p2++ = '\';
}

It won't look like that if you want it to pass a code review in any
professional environment. Blech.
 
Richard

CBFalconer said:
No. str1 is a non-writable string. You can create str2 and copy
str1, with modifications, into it. The copy part will look
something like:

while (*p1) {
if ('\' == (*p2++ = *p1++)) *p2++ = '\';
}

Note to magix: never, ever lay your code out like this or it will be
laughed out of any code review. Multiple statements on one line might save a
little vertical real estate but are a pain in the behind to debug
and read. Others might disagree. Only you can decide.

Something like (not compiled or tested but you get the meaning I hope)

char c;
while (c = *d++ = *s++) { /* s==source, d==destination */
if (c=='\')
*d++ = '\';
}
 
Ben Bacarisse

Richard said:
Note to magix: never, ever lay your code out like this or it will be
laughed out of any code review. Multiple statements on one line might save a
little vertical real estate but are a pain in the behind to debug
and read. Others might disagree. Only you can decide.

Something like (not compiled or tested but you get the meaning I hope)

char c;
while (c = *d++ = *s++) { /* s==source, d==destination */
if (c=='\')
*d++ = '\';
}

The layout is up to you, of course, but you've altered the semantics.
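One concrete difference (plausibly the one Ben means) is the terminating '\0'. Both loops below are written with '\\' in place of the invalid '\', and wrapped in functions whose names are hypothetical. CBFalconer's version tests `*p1` before copying, so it stops at the '\0' without copying it; Richard's version copies the '\0' as part of the loop condition.

```c
/* CBFalconer's loop, with '\' corrected to '\\'.  The result is NOT
   terminated; the return value is where the '\0' still has to go. */
char *copy_cbf(char *p2, const char *p1)
{
    while (*p1) {
        if ('\\' == (*p2++ = *p1++))
            *p2++ = '\\';
    }
    return p2;
}

/* Richard's loop, likewise corrected.  The assignment in the
   condition copies the '\0' before the loop ends, so the result
   IS terminated. */
void copy_richard(char *d, const char *s)
{
    char c;

    while ((c = *d++ = *s++)) {
        if (c == '\\')
            *d++ = '\\';
    }
}
```

With CBFalconer's version the caller would follow the loop with `*p2 = '\0';`.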
 
Keith Thompson

CBFalconer said:
No. str1 is a non-writable string. You can create str2 and copy
str1, with modifications, into it. The copy part will look
something like:

while (*p1) {
if ('\' == (*p2++ = *p1++)) *p2++ = '\';
}

Note that '\' is a syntax error. It should be '\\'.
 
Keith Thompson

Note to magix: never, ever lay your code out like this or it will be
laughed out of any code review. Multiple statements on one line might save a
little vertical real estate but are a pain in the behind to debug
and read. Others might disagree. Only you can decide.
[...]
Note that CBF's code does not have multiple statements on one
line.

Yes, it does. It just happens that one of the statements is part of
the other statement.

(Assuming, of course, that '\' is changed to '\\'.)
 
Keith Thompson

Richard Harter said:
while (*p1) {
if ('\' == (*p2++ = *p1++)) *p2++ = '\';
}

Note to magix: never, ever lay your code out like this or it will be
laughed out of any code review. Multiple statements on one line might save a
little vertical real estate but are a pain in the behind to debug
and read. Others might disagree. Only you can decide. [...]
Note that CBF's code does not have multiple statements on one
line.

Yes, it does. It just happens that one of the statements is part of
the other statement.

My bad. You're right. In some languages the whole line is one
statement but not C.

<NITPICK>
Well, it's one statement in C as well, it just happens to contain
another statement within it. In fact, that's the way it is in most
languages: the equivalent of "if (condition) statement" is a compound
statement that contains another statement. C isn't unusual in this
regard.
</NITPICK>

[...]
 
Tomás Ó hÉilidhe

char* str1 = "d:\temp\data\test.txt"

I want to replace all the "\" with "\\", so that

the string will have "d:\\temp\\data\\test.txt"


Do you want to do it "in place"? If so then you'll need to be sure
that the buffer is big enough. Something like

void DoubleBackSlashes(char *src)
{
char register *dst = src;

while ((*dst++ = *src++))
{
if ('\\' == *dst)
{
*dst++ = '\\';
}
}
}
 
Barry Schwarz

Do you want to do it "in place"? If so then you'll need to be sure
that the buffer is big enough. Something like

void DoubleBackSlashes(char *src)
{
char register *dst = src;

while ((*dst++ = *src++))

This doesn't work when src points to a string literal.
{
if ('\\' == *dst)
{
*dst++ = '\\';

This doesn't work at all. It destroys every character following the
first / before that character is processed.


 
Tomás Ó hÉilidhe

This doesn't work when src points to a string literal.


This doesn't work at all.  It destroys every character following the
first / before that character is processed.


Oh Christ what was I thinking :-O

There are a few different ways of writing this algorithm depending upon:
* whether the original buffer is big enough
* whether memory is so scarce that you shouldn't create another
buffer
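For the case where the buffer is big enough but no second buffer is wanted, one way (a sketch, not taken from any post here; the function name and the `cap` parameter are hypothetical) is to count the backslashes first and then copy from the end towards the start. Working backwards keeps the write position at or after the read position, which avoids the overwriting problem Barry points out above.

```c
#include <string.h>

/* Double every backslash in s in place.  cap is the total size of
   the array s lives in.  Returns 0 on success, or -1 (leaving s
   unchanged) if the result would not fit. */
int double_backslashes_in_place(char *s, size_t cap)
{
    size_t len = strlen(s), extra = 0, i, j;

    for (i = 0; i < len; i++)
        if (s[i] == '\\')
            extra++;

    if (len + extra + 1 > cap)
        return -1;

    j = len + extra;            /* index of the new terminator */
    s[j] = '\0';
    for (i = len; i-- > 0; ) {  /* i runs from len-1 down to 0 */
        s[--j] = s[i];          /* j >= i, so nothing unread is lost */
        if (s[i] == '\\')
            s[--j] = '\\';
    }
    return 0;
}
```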
 
Paul Hsieh

I have
char* str1 = "d:\temp\data\test.txt"

I want to replace all the "\" with "\\", so that

the string will have "d:\\temp\\data\\test.txt"

Can you help ?

All the solutions posted so far are incomplete or utter nonsense. Not
too surprising, as the C language is just such garbage for dealing
with strings. Your source data is likely wrong as well, as within
strings \ is always escaped by \\ (though you might mean \t, \d, \t
characters in there, but it doesn't look like it). You want to write
to a char * string, but you have done nothing to make sure it's
writable or to find storage for it. Let's try this:

char * str1 = "d:\\temp\\data\\test.txt";
char * str2 = (char *) malloc (sizeof(char) * (1 + 2 * strlen (str1)));
if (!str2) {
char * d = str2, *s = str1;
while ('\0' != (*d = *s)) {
d++;
if ('\\' == *s) {
*d = *s;
d++;
}
s++;
}
}

return str2; /* or str1 = str2; or however else you want to do this */

It's hacky code and it's very difficult to write code like the above in
a sustainable way (and I haven't reviewed it -- it might contain
bugs). A much easier solution is just to use "The Better String
Library":

bstring b = bfromcstr ("d:\\temp\\data\\test.txt");
static struct tagbstring from = bsStatic ("\\");
static struct tagbstring replace = bsStatic ("\\\\");

if (BSTR_OK != bfindreplace (b, from, replace, 0)) {
bdestroy (b);
b = NULL;
}
/* If you have the memory b will contain the post modified string. */

Life is too short not to have solutions like this on hand.
 
Ben Bacarisse

CBFalconer said:
It works fine to copy the string except that dst cannot be equal to
src. It doesn't work for the rest.

Of course not. It is no longer testing the original src.

I know I'll regret this but... check again. Barry Schwarz is correct
(except for the typo: writing / instead of \).

if ('\\' == *dst) *dst++ = '\\';

looks harmless (even pointless) putting, as it does, a \ where one
already exists, but because src also points there, the increment means
that the loop condition will copy the \ over whatever follows. This
causes both the loop condition and that of the nested 'if' to be
forever true. UB is guaranteed if the string contains \.
 
Flash Gordon

Paul Hsieh wrote, On 06/06/08 02:59:
All the solutions posted so far are incomplete or utter nonsense.

Yours is buggy.

Not too surprising, as the C language is just such garbage for dealing
with strings. Your source data is likely wrong as well, as within
strings \ is always escaped by \\ (though you might mean \t, \d, \t
characters in there, but it doesn't look like it). You want to write
to a char * string, but you have done nothing to make sure its
writable or finding storage for it. Let's try this:

char * str1 = "d:\\temp\\data\\test.txt";
char * str2 = (char *) malloc (sizeof(char) * (1 + 2 * strlen (str1)));

Why the sizeof(char)? You know that it is 1 by definition, so putting it
in is pointless and makes the code harder to read. The cast is also
pointless and may stop the compiler from producing a warning if he
forgets to include stdlib.h

Not bugs, but not helpful in my opinion.
if (!str2) {

Urm, don't you mean "if (str2)" ? I'm sure you did not want to do the
processing if and only if the malloc call failed! This is a bug.
char * d = str2, *s = str1;
while ('\0' != (*d = *s)) {
d++;
if ('\\' == *s) {
*d = *s;
d++;
}
s++;
}
}

I would not separate out all of those increments.
while ((*d++ = *s) != '\0) {
if (*s++ = '\\')
*d++ = '\\'
return str2; /* or str1 = str2; or however else you want to do this */

It's hacky code and it's very difficult to write code like the above in
a sustainable way (and I haven't reviewed it -- it might contain
bugs). A much easier solution is just to use "The Better String
Library":

bstring b = bfromcstr ("d:\\temp\\data\\test.txt");
static struct tagbstring from = bsStatic ("\\");
static struct tagbstring replace = bsStatic ("\\\\");

if (BSTR_OK != bfindreplace (b, from, replace, 0)) {
bdestroy (b);
b = NULL;
}
/* If you have the memory b will contain the post modified string. */

Life is too short not to have solutions like this on hand.

In this instance it does not look like much less code to write. You
could probably do it with fewer lines of code using a regular expression
library, but for all I know that could be a slower solution.

We all have utility libraries we use, and I agree that a string library
such as yours can be useful for some tasks.
 
Paul Hsieh

Paul Hsieh wrote, On 06/06/08 02:59:


Yours is buggy.

I gave a disclaimer that I didn't really review it. At least I didn't
eat myself with an incorrect self-aliasing solution that fails, or run
off the end of some unknown sized array.
Why the sizeof(char)?

Why does your physics professor insist on units at the end of your
calculations? It's self-documenting even if it's unnecessary. I could
have said sizeof(*str2) or something like that I suppose.
[...] You know that it is 1 by definition, so putting it
in is pointless and makes the code harder to read.

And when you decide to substitute char with wchar_t then what?
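The `sizeof` point can be illustrated with a sketch of a wide-character variant (the name `double_backslashes_w` is hypothetical; it uses only the standard `<wchar.h>` functions). Writing the size as `count * sizeof *out` ties it to the pointer's element type, so it stays correct if that type changes. The sketch leaves out the cast, which C does not require when converting from `void *`.

```c
#include <stdlib.h>
#include <wchar.h>

/* Wide-character doubling copy.  Returns a malloc'd string, or NULL
   on failure; the caller must free() it.  Each element is a wchar_t,
   so the byte count must be scaled by sizeof *out. */
wchar_t *double_backslashes_w(const wchar_t *src)
{
    size_t extra = 0;
    const wchar_t *p;
    wchar_t *out, *q;

    for (p = src; *p != L'\0'; p++)
        if (*p == L'\\')
            extra++;

    out = malloc((wcslen(src) + extra + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    for (p = src, q = out; *p != L'\0'; p++) {
        *q++ = *p;
        if (*p == L'\\')
            *q++ = L'\\';
    }
    *q = L'\0';
    return out;
}
```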
[...] The cast is also
pointless and may stop the compiler from producing a warning if he
forgets to include stdlib.h

No it will not. I think at this point, making such claims is willful
deception.
Not bugs, but not helpful in my opinion.


Urm, don't you mean "if (str2)" ? I'm sure you did not want to do the
processing if and only if the malloc call failed! This is a bug.
Right.


I would not separate out all of those increments.
while ((*d++ = *s) != '\0) {
if (*s++ = '\\')
*d++ = '\\'

I suppose you just *had* to reverse the order of the comparisons for
taste reasons didn't you? It really bugs you that much doesn't it?
Take a closer look at your second line.

When you have to run *this* much convoluted nonsense through your head
just to do a simple find and replace, you are bound to make this kind
of mistake. That's my *REAL* point, and you've just helped make it.
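For reference, here is the compact loop as it was presumably intended, wrapped in a function whose name `double_copy_compact` is hypothetical: the second line compares with `==` instead of assigning, and the missing quote, semicolon and closing brace are restored. It assumes `d` points to at least `2 * strlen(s) + 1` bytes.

```c
#include <string.h>

/* Copy s to d, doubling every backslash.  The condition copies each
   character (including the final '\0') before testing it, so the
   result is always terminated.  d needs 2 * strlen(s) + 1 bytes. */
void double_copy_compact(char *d, const char *s)
{
    while ((*d++ = *s) != '\0') {
        if (*s++ == '\\')
            *d++ = '\\';
    }
}
```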

In any event, as to the ramming things into as few lines as possible
thing you are doing here, I have an aversion to code that even *looks*
like it might have UB. The incrementors/decrementors are specifically
troublesome here since their side effect is being hidden by order of
operations, and C's order of operations is already well known to be
broken.
In this instance it does not look like much less code to write.

And I didn't even manage to avoid bugs here either (it should be &from
and &replace -- I blame it on the fact that I wrote the earlier code)
but the library goes a long way to mitigate them (the compiler would
immediately tell you how to fix it.)
[...] You
could probably do it with fewer lines of code using a regular expression
library,

I kind of doubt it -- there is still the problem of managing the
memory for the string result, and that's not typically taken care of
by a regex library.
[...] but for all I know that could be a slower solution.

That too (it is.)
 
