Non-constant constant strings

BartC

char* readonlySourceCode[] =
{
"if (something[9999])\r\n",
"{\r\n",
" // Do something\r\n",
"} else {\r\n",
" // Do something else\r\n",
"}"
};
Is there a way to create the lines in the readonlySourceCode definition so
that they are not read-only? I'm using Visual C++, and am looking for something
like this:

What's wrong with keeping the lines in a text file? Then you can read them
into an allocated string which will necessarily be writeable. Or you can
just edit the file (or run a script on it to make the changes needed).
 
glen herrmannsfeldt

Rick C. Hodgin said:
On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:
(snip)
int i, len;
char* ptr;
for (i = 0; list[i] != NULL; i++)
{
    len = strlen(list[i]) + 1;
    ptr = (char*)malloc(len);
    memcpy(ptr, list[i], len);
    list[i] = ptr;
}

And what, exactly, is wrong with the basic principle of this approach?

What "exactly" is wrong with this approach is that I must do
something manually in code, something that is (a) unnecessary,
(b) rather cumbersome mechanically, and (c) something the compiler
would be capable of doing for me were it not for design protocol
limitations being artificially imposed upon an otherwise valid
data request for a block of read-write memory.

There are an infinite number of features that could be added
to languages and/or compilers, and that could make it easier
for the needs of some people. The commonly needed ones get in,
the rare ones don't.

K&R C had this feature, but not initialized auto arrays.
With the ability to initialize arrays with string constants,
the need to modify string constants was reduced.

(snip)
I don't have need of making copies of my data.
It introduces unnecessary code, complexity, opportunity for
errors. What I do have need of is accessing the data I've
encoded, as it's encoded at compile-time, to be altered at run-time.

-- glen
 
Keith Thompson

Rick C. Hodgin said:
glen herrmannsfeldt said:
(snip)
char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" }; // Goes to read-only memory
I want a way for list[0] to go to the same place as foo.
I am using Visual C++ compiler, but I am writing in C.
I use the C++ compiler because it has some relaxed syntax
constraints.
As I remember it, not having looked recently, the pre-ANSI (K&R)
compilers allowed writable strings. While not the best practice,
it was an allowed and sometimes useful technique.

All versions of the C language, from K&R to ISO C11, have permitted
compilers to make string literals writable. What's changed over
time is that most compilers don't take advantage of that permission.

Ah! That's a shame. :)

Not really, at least not for the vast majority of C programmers.

C string literals are intended to be *constant*. The fact that
compilers are permitted to generate code that crashes on an attempt to
modify the array specified by a string literal makes for better error
checking. The fact that such checks are not required is for backwards
compatibility for code written before the "const" keyword was added to the
language; even if not for that, C tends to make such things undefined
behavior rather than requiring run-time diagnostics.

[...]
I believe the language should operate such that as I've defined a to
point to "foo", and b to point to "foo", and these are separate
strings, then they should be separate strings in memory, the same as
if I'd said char* a="123"; char* b="456".

If *I* write

char *a = "foo";
char *b = "foo";

all I care about is that both a and b point to strings containing
the characters 'f', 'o', and 'o', in that order. (It also means
that I've forgotten the "const" keyword for some reason.) And if
I later write:

printf("%s\n", a);

the compiler is free to generate code that does the equivalent of

puts("foo");

Forbidding the two occurrences of "foo" to occupy the same memory
location would matter only if (a) you want to be able to modify the
contents of the array (which C doesn't permit you to rely on), or
(b) if you care about the result of (a == b).

If you want writable strings, you can get them:

char a_array[] = "foo";
char *a = a_array; /* or &a_array[0] */

It's slightly less convenient for what you're trying to do, but I don't
think that's a common enough case to justify changing the language as
you suggest.

A compatible language change (or compiler-specific extension) that
wouldn't break existing code might be a new kind of string literal, with
a prefix indicating that the array is writable and may not be shared
with other string literals with the same value. Perhaps something like:

char *a = W"foo";
char *b = W"foo";
a[0] = 'F';
printf("%s%s\n", a, b); /* will print "Foofoo" */

If you wanted to take that approach, your options would be:

1. Modify some open-source compiler to implement it as a language
extension (lots of work);
2. Persuade the maintainers of some compiler to provide it (less work
for you, but likely to fail); or
3. Persuade the ISO C committee to add such a feature to the next C
standard (even more likely to fail, and requires waiting at least a
decade before you can use it).

Barring that, you can either use the existing features of the language,
or implement a preprocessing step that translates code using something
like this feature into standard C.

BTW, you might find that compound literals (added to the language by the
1999 standard) are helpful:

This:

#include <stdio.h>

int main(void) {
    char *s = (char[]){"hello"};
    s[0] = 'H';
    puts(s);
}

prints "Hello". But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal inside a function has automatic storage duration (it
ceases to exist when you leave the enclosing block).
 
glen herrmannsfeldt

(snip, I wrote)
All versions of the C language, from K&R to ISO C11, have permitted
compilers to make string literals writable. What's changed over
time is that most compilers don't take advantage of that permission.

Well, also the need to do it has been reduced. For one, initialized
auto arrays in ANSI C helped. But an initialized auto array takes
up twice as much memory, one for the initialization value and
another when it is allocated. For most systems, initialized static
arrays only allocate the one copy and initialize it at program
fetch.

There were lots of tricks used in the small memory days that
went away as memory prices decreased and systems got larger.

A program might have some string data that it needs to print out
once, and never again. It could reuse that memory for something
else later.
They don't *have* to do that unless they make additional guarantees
beyond what the language specifies.
If I write:
char *a = "foo";
char *b = "foo";
a[0] = 'F';
puts(b);
and the puts call is actually executed, the language permits it to
print either "foo", or "Foo" (or "fnord", or a suffusion of yellow).
The behavior of the assignment to a[0] is undefined, and once you
do that all bets are off.
But if a compiler were to guarantee, as a language extension,
that string literals are meaningfully modifiable, then it would
probably have to guarantee that the strings pointed to by a and
b must be distinct (unless the compiler can prove that they're
never modified). The compiler's documentation would have to spell
out just what additional guarantees it offers. (Such an extension
would not make the compiler non-conforming, since any code that
takes advantage of it has undefined behavior.)

Yes. If the extension was called "writable-strings", one would
hope that it supplied separate copies.

-- glen
 
glen herrmannsfeldt

Rick C. Hodgin said:
On Monday, January 20, 2014 11:14:45 AM UTC-5, Keith Thompson wrote:
(snip)
Ah! That's a shame. :)
(snip)

I believe the language should operate such that as I've defined
a to point to "foo", and b to point to "foo", and these are
separate strings, then they should be separate strings in memory,
the same as if I'd said char* a="123"; char* b="456".
(snip)

I personally believe it's a silly requirement to do such a
comparison to save a few bytes of space by default. I'd
rather have it always duplicated and then allow the developer
to provide a manually inserted command line switch which
specifically turns on that kind of checking, and that kind
of substituting.

I believe Java requires String constants in the same class
(maybe only method) to have the same reference value. That is, in

if("string"=="string") ...

the if condition will be true. As far as I know, C doesn't
require that, but allows for it.

-- glen
 
glen herrmannsfeldt

Keith Thompson said:
Rick C. Hodgin said:
I have a need for something like this, except that I need to
edit list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" }; [...]

It would be helpful if you'd format your articles to have lines
no longer than about 72 columns. Usenet is not the web, and
newsreaders don't necessarily deal well with arbitrarily long lines.
(My newsreader does split long lines, but not at word boundaries.)

And some news hosts enforce this.

-- glen
 
Keith Thompson

glen herrmannsfeldt said:
Rick C. Hodgin said:
On Monday, January 20, 2014 9:10:24 AM UTC-5, Aleksandar Kuktin wrote:
(snip)
int i, len;
char* ptr;
for (i = 0; list[i] != NULL; i++)
{
    len = strlen(list[i]) + 1;
    ptr = (char*)malloc(len);
    memcpy(ptr, list[i], len);
    list[i] = ptr;
}
And what, exactly, is wrong with the basic principle of this approach?

What "exactly" is wrong with this approach is that I must do
something manually in code, something that is (a) unnecessary,
(b) rather cumbersome mechanically, and (c) something the compiler
would be capable of doing for me were it not for design protocol
limitations being artificially imposed upon an otherwise valid
data request for a block of read-write memory.

There are an infinite number of features that could be added
to languages and/or compilers, and that could make it easier
for the needs of some people. The commonly needed ones get in,
the rare ones don't.

K&R C had this feature, but not initialized auto arrays.
With the ability to initialize arrays with string constants,
the need to modify string constants was reduced.


Did K&R1 guarantee that string literals are writable and unique?
(My copy is at home; I'll try to check later.)
 
Rick C. Hodgin

My usual solution in that case is to put all the data into a file
of some kind, then, as part of the build process, usually with make,
convert that file into appropriate C, just before compiling it.

Yes. There are many ways to do it. I considered that option as well. It's just a lot of work for something the compiler should be able to do.
Oh, also, one of my favorite C features (Java also has), you
can have the extra comma on the last line. Convenient for program
generated text, though most likely in the standard as it allows
for easy preprocessor conditionals.
-- glen

See, and I think that such an "allowance" is patently absurd and should not be a part of any language. :)

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

What's wrong with keeping the lines in a text file? Then you can read them
into an allocated string which will necessarily be writeable. Or you can
just edit the file (or run a script on it to make the changes needed).


Nothing logically. It's just that if I use the external file, now I'm maintaining an extra file, I had to write the extra code which reads it in, and that all presents many more opportunities for errors at runtime.

Best regards,
Rick C. Hodgin
 
Eric Sosman

[...]
Did K&R1 guarantee that string literals are writable and unique?
(My copy is at home; I'll try to check later.)

K&R guarantees uniqueness ("all strings, even when written
identically, are distinct" -- pg. 181), but I can't find any
promise of modifiability. One might guess that the strings
were intended to be modifiable -- why guarantee uniqueness if
not? -- but if there's any explicit language to that effect I've
overlooked it.

But that was long ago: The uniqueness guarantee (and any
accompanying mutability) was rescinded by the original ANSI C
Standard way back in 1989. Quoth the Rationale:

"String literals are not required to be modifiable. This
specification allows implementations to share copies of
strings with identical text, to place string literals in
read-only memory, and to perform certain optimizations.
[...] Those members of the C89 Committee who insisted that
string literals should be modifiable were content to have
this practice designated a common extension [...]"
 
Rick C. Hodgin

Not really, at least not for the vast majority of C programmers.

How did I guess you were going to say that? :)
C string literals are intended to be *constant*.

They are most often intended to be constant, but not always. There are many
cases where developers allocate something with an initial value, but then
alter it at runtime.

char defaultOption = "4";

In this case, the default option is 4 until the user changes it. It's a constant bit of text, but is not constant. :)
The fact that
compilers are permitted to generate code that crashes on an attempt to
modify the array specified by a string literal makes for better error
checking.

I would like to be able to specify that with a const prefix, as in this
type of syntax:

char* list[] =
{
"foo1",
const "foo2",
"foo3"
}

In this case, I do not want the second element to be changed, but the
first and third... they can change.
The fact that such checks are not required is for backwards
compatibility for code written before the "const" keyword was added to the
language; even if not for that, C tends to make such things undefined
behavior rather than requiring run-time diagnostics.

Well ... there's logic there. It makes sense. I think it's time for a
switchover though. We're getting into multi-processor programming, multiple
threads. Where we are in 2010s and later is not where we were in 1980s.
[...]
I believe the language should operate such that as I've defined a to
point to "foo", and b to point to "foo", and these are separate
strings, then they should be separate strings in memory, the same as
if I'd said char* a="123"; char* b="456".

If *I* write
char *a = "foo";
char *b = "foo";
all I care about is that both a and b point to strings containing
the characters 'f', 'o', and 'o', in that order. (It also means
that I've forgotten the "const" keyword for some reason.) And if
I later write:
printf("%s\n", a);
the compiler is free to generate code that does the equivalent of
puts("foo");
Forbidding the two occurrences of "foo" to occupy the same memory
location would matter only if (a) you want to be able to modify the
contents of the array (which C doesn't permit you to rely on), or
(b) if you care about the result of (a == b).
If you want writable strings, you can get them:
char a_array[] = "foo";
char *a = a_array; /* or &a_array[0] */

It's slightly less convenient for what you're trying to do, but I don't
think that's a common enough case to justify changing the language as
you suggest.

I realize C operates this way and it's fine. I think the future standard
should be that everything is in read-write memory except those things
explicitly prefixed with const, or a new _c("text") macro which identifies
that data explicitly as a constant.
A compatible language change (or compiler-specific extension) that
wouldn't break existing code might be a new kind of string literal, with
a prefix indicating that the array is writable and may not be shared
with other string literals with the same value. Perhaps something like:
char *a = W"foo";
char *b = W"foo";
a[0] = 'F';
printf("%s%s\n", a, b); /* will print "Foofoo" */
If you wanted to take that approach, your options would be:
1. Modify some open-source compiler to implement it as a language
extension (lots of work);
2. Persuade the maintainers of some compiler to provide it (less work
for you, but likely to fail); or
3. Persuade the ISO C committee to add such a feature to the next C
standard (even more likely to fail, and requires waiting at least a
decade before you can use it).
Yup.

Barring that, you can either use the existing features of the language,
or implement a preprocessing step that translates code using something
like this feature into standard C.

BTW, you might find that compound literals (added to the language by the
1999 standard) are helpful:
This:
#include <stdio.h>
int main(void) {
char *s = (char[]){"hello"};
s[0] = 'H';
puts(s);
}
prints "Hello". But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal inside a function has automatic storage duration (it
ceases to exist when you leave the enclosing block).

Best regards,
Rick C. Hodgin
 
Ian Collins

Rick said:
There are lots of solutions and workarounds. I'm looking for a
compiler directive that will override the default behavior of
allocating constant strings to read-only memory, and instead allocate
them to read-write memory.

char foo[] = "Rick"; // Goes to read-write memory
char* list[] = { "Rick" } // Goes to read-only memory

I want a way for list[0] to go to the same place as foo. I am using
Visual C++ compiler, but I am writing in C. I use the C++ compiler
because it has some relaxed syntax constraints.

For what you are doing, you would be better off using C++ strings.
Given you have the tool to hand, you may as well use it.
 
Rick C. Hodgin

K&R guarantees uniqueness ("all strings, even when written
identically, are distinct" -- pg. 181)

To me, this is the only way that makes sense. If I want to use the same string I can reference it. Or, I could introduce a compiler switch which introduces an option to combine similar strings marked const.
But that was long ago: The uniqueness guarantee (and any
accompanying mutability) was rescinded by the original ANSI C
Standard way back in 1989.

And I bet you could hear the thuds on the floor as many developers screamed "WHAT!" and then passed out.
Quoth the Rationale:
"String literals are not required to be modifiable. This
specification allows implementations to share copies of
strings with identical text, to place string literals in
read-only memory, and to perform certain optimizations.

Insanity I say! :)
[...] Those members of the C89 Committee who insisted that
string literals should be modifiable were content to have
this practice designated a common extension [...]"

The word "common" being used very loosely there. LOL! :)

Best regards,
Rick C. Hodgin
 
Eric Sosman

How did I guess you were going to say that? :)


They are most often intended to be constant, but not always. There are many
cases where developers allocate something with an initial value, but then
alter it at runtime.

char defaultOption = "4";

Constraint violation, requiring a diagnostic. Presumably
you meant one of

char defaultOption = '4';
or
char *defaultOption = "4";
In this case, the default option is 4 until the user changes it. It's a constant bit of text, but is not constant. :)

No, not at all. `defaultOption' (either version) is not a
constant, but a variable. It has an initial value, that's all.
You can change the variable's value with one of

if (do_z)
defaultOption = 'Z';
or
if (pile_it_deeply)
defaultOption = "Gomer Pyle";

There's really no difference between any of these and

int defaultOption = 42;
...
if (behave_differently)
defaultOption = -17;

In none of these cases is there any need to change the value
of a constant, nor any reason to want to do so.
 
Eric Sosman

To me, this is the only way that makes sense. If I want to use the same string I can reference it. Or, I could introduce a compiler switch which introduces an option to combine similar strings marked const.


And I bet you could hear the thuds on the floor as many developers screamed "WHAT!" and then passed out.

Did you miss the part about "Those members ... were content?"

At that time I was a developer with not quite twenty years'
worth of C experience, and I neither screamed nor thudded. YMMV.
Quoth the Rationale:
"String literals are not required to be modifiable. This
specification allows implementations to share copies of
strings with identical text, to place string literals in
read-only memory, and to perform certain optimizations.

Insanity I say! :)
[...] Those members of the C89 Committee who insisted that
string literals should be modifiable were content to have
this practice designated a common extension [...]"

The word "common" being used very loosely there. LOL! :)

See Appendix J.
 
Keith Thompson

Rick C. Hodgin said:
How did I guess you were going to say that? :)


They are most often intended to be constant, but not always. There are many
cases where developers allocate something with an initial value, but then
alter it at runtime.

char defaultOption = "4";

In this case, the default option is 4 until the user changes it. It's
a constant bit of text, but is not constant. :)

Did you mean "char *defaultOption" or "char defaultOption[]" rather than
"char defaultOption", or did you mean '4' rather than "4"?
The fact that
compilers are permitted to generate code that crashes on an attempt to
modify the array specified by a string literal makes for better error
checking.

I would like to be able to specify that with a const prefix, as in this
type of syntax:

char* list[] =
{
"foo1",
const "foo2",
"foo3"
}

In this case, I do not want the second element to be changed, but the
first and third... they can change.

If I were to suggest a new language feature to support that, I'd want an
explicit marker for a string that I *do* want to be able to change.

In your proposed C-like language, what would this snippet print?

for (int i = 0; i < 2; i ++) {
    char *s = "hello";
    if (i == 0) {
        s[0] = 'H';
    }
    puts(s);
}

In C as it's currently defined, the string literal "hello" corresponds
to an anonymous array object with static storage duration; attempting to
modify it has undefined behavior. As I understand it, you want to
remove the second part of that. The above code has one occurrence of a
string literal, but it's being used in the initializer for two distinct
objects. On the second iteration, does s point to a string with
contents "hello" or "Hello"?

Either interpretation is problematic.
Well ... there's logic there. It makes sense. I think it's time for a
switchover though. We're getting into multi-processor programming, multiple
threads. Where we are in 2010s and later is not where we were in 1980s.

I fail to see how this argues for modifiable string literals.
[...]
I believe the language should operate such that as I've defined a to
point to "foo", and b to point to "foo", and these are separate
strings, then they should be separate strings in memory, the same as
if I'd said char* a="123"; char* b="456".

If *I* write
char *a = "foo";
char *b = "foo";
all I care about is that both a and b point to strings containing
the characters 'f', 'o', and 'o', in that order. (It also means
that I've forgotten the "const" keyword for some reason.) And if
I later write:
printf("%s\n", a);
the compiler is free to generate code that does the equivalent of
puts("foo");
Forbidding the two occurrences of "foo" to occupy the same memory
location would matter only if (a) you want to be able to modify the
contents of the array (which C doesn't permit you to rely on), or
(b) if you care about the result of (a == b).
If you want writable strings, you can get them:
char a_array[] = "foo";
char *a = a_array; /* or &a_array[0] */

It's slightly less convenient for what you're trying to do, but I don't
think that's a common enough case to justify changing the language as
you suggest.

I realize C operates this way and it's fine. I think the future standard
should be that everything is in read-write memory except those things
explicitly prefixed with const, or a new _c("text") macro which identifies
that data explicitly as a constant.

As a language design issue, I *strongly* disagree with this.
Personally, I like the idea of making everything read-only unless you
explicitly say you want to be able to modify it. (Obviously C isn't
defined this way; equally obviously, this is merely my own opinion.)

BTW, your _c("text") macro would still have to be defined somehow;
a new kind of string literal would probably make more sense.

The bottom line is that standard C cannot, and IMHO should not, cater to
every obscure coding practice. A language can have:
1. mutable string literals;
2. immutable string literals; or
3. both, with distinct syntax.

C has chosen option 2, and it has served us well. I would not strongly
object to option 3, but I'm not convinced that it would be worth the
extra complexity. You're welcome to push for option 1, but don't expect
to succeed.
 
BartC

Oh, also, one of my favorite C features (Java also has), you
can have the extra comma on the last line. Convenient for program
generated text, though most likely in the standard as it allows
for easy preprocessor conditionals.

A nice feature I came across (I think from the poster known as 'BGB') was a
form of include where the text in the included file was inserted as a string
constant. So if the text in the file was:

one
two
three

then including that file would be equivalent to:

"one\ntwo\nthree\n"

in the source code. (Obviously in C it would need to be allowed inside an
expression.)
 
glen herrmannsfeldt

(snip, I wrote)
Did K&R1 guarantee that string literals are writable and unique?
(My copy is at home; I'll try to check later.)

I have one somewhere, but I found K&R2, appendix C, summary
of changes: (from K&R1)

"Strings are no longer modifiable, and may be placed in
read-only memory."

Doesn't say about unique, might need an actual K&R1.

Reminds me of a rarely used feature in Fortran, though maybe
gone by now. You can read over H format descriptors.

READ(5,1)
1 FORMAT(20HSOMETHING GOES HERE.)
WRITE(6,1)

The first READ writes over the contents of the H descriptor,
the WRITE then writes the new value.

Seems like a similar reuse of otherwise not needed memory, but
pretty strange now.

Most often not so useful, as there is no way to get carriage
control in place.

-- glen
 
Seebs

And I bet you could hear the thuds on the floor as many developers
screamed "WHAT!" and then passed out.

I don't think so. Back in the late 80s, when I was just starting to
learn C, I was aware that if you had two string literals, and one was
the same characters as the tail end of the other, the compiler might
use the same storage for both. It's really easy to obtain a modifiable
string if I want one, so I don't expect literals to be modifiable, or
indeed, even to occur in code or storage anywhere if they don't really
have to.

-s
 
glen herrmannsfeldt

Rick C. Hodgin said:
I have a need for something like this, except that I need to
edit list[N]'s data, as in memcpy(list[0], "eno", 3):
char* list[] = { "one", "two", "three", "four" };
I have a work-around like this:
char one[] = "one";
char two[] = "two";
char three[] = "three";
char four[] = "four";
char* list[] = { one, two, three, four };
However, this is clunky because I want to be able to change the
items because in the actual application it is source code that
I'm coding within the compiler for an automatic processor.

I thought I should go back to the beginning of this discussion,
to see what you were actually doing.

Seems to me that when most people do this, or at least when I do,
I substitute the copy just before writing it out. You need a loop
that goes through and writes out the lines, inside that loop copy
each line to a line buffer, modify as appropriate, then write
it out.

That also allows for variable length substitution, though sometimes
constant length is better.

It is common for error messages where the appropriate context,
such as line number, is substituted, or an error code.

It is common for macro processors, such as Mortran 2, or TeX,
where the macro arguments are substituted in the expansion.
Both Mortran and TeX use # to indicate a substitution, such
as #1 for the first argument, #2 for the second.

One that I have done takes advantage of an interesting C feature.
Put %d where you want the number to go, and printf it:

for(i=0;i<sizeof(x)/sizeof(*x);i++) {
    fprintf(outfile, x[i], n, n, n, n, n, n, n, n, n, n);
}

Now up to 10 %d's in the line will be replaced by the value of n.
(I like to put in extra to avoid the problem of not having enough.)

Very little work to write.

-- glen
 
