int * vs char *

Ian Collins

I suspect that the vast majority of code that attempts to modify
string literals does so as the result of bugs. A lot more code
uses string literals in contexts that don't treat them as const,
but doesn't actually try to modify them; for example:

void func(char *s) {
    printf("In func(), s = \"%s\"\n", s);
}

...

func("hello");

In short, I think the issue is not that anyone wants to modify
string literals; it's that making them const would break existing
code that *doesn't* actually modify string literals.
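A minimal sketch of the point above, assuming nothing beyond standard C: declaring the parameter as `const char *` lets the same function accept both string literals and writable arrays, and it would keep compiling even if literals became const. The helper name `format_greeting` is hypothetical; it fills a caller-supplied buffer instead of printing so the result is easy to check.

```c
#include <stdio.h>

/* Hypothetical helper: reads s but never writes through it, so the
   parameter can be const-qualified without changing behaviour. */
int format_greeting(char *out, size_t out_size, const char *s)
{
    /* snprintf returns the length the full string would have had. */
    return snprintf(out, out_size, "In func(), s = \"%s\"", s);
}
```

Calling `format_greeting(buf, sizeof buf, "hello")` works whether or not the literal is treated as const, which is the property the original `char *s` signature lacks.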

(Stroustrup was able to do this in C++ because there was no existing
C++ code before he invented the language.)

Not quite, the change came in with the 1998 C++ standard so there was
plenty of existing code. That's why there was a "special case" which
allows your example to pass without requiring a diagnostic.
 
Keith Thompson

Ian Collins said:
Not quite, the change came in with the 1998 C++ standard so there was
plenty of existing code. That's why there was a "special case" which
allows your example to pass without requiring a diagnostic.

The change I was referring to was making string literals const.

I'm not familiar with the "special case"; I'll have to look into it
(elsewhere).
 
Ian Collins

The change I was referring to was making string literals const.

That was the change introduced by the standard. Prior to that, C++ had
the same rule as C. The change was a hot topic of discussion at the
ACCU conference held just after the standard was ratified.
 
Tim Rentsch

Noob said:
Keith said:
Even if you corrected the type mismatch by writing:

char *s = "hello";
*s = 'w';

your program's behavior would be undefined. s points to a string
literal (more precisely, it points to the first element of the static
array associated with the string literal), and any attempt to modify a
string literal has undefined behavior.
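To illustrate the distinction, here is a small sketch (the function name `make_wello` is invented for this example). Initializing an array from a literal copies the characters into storage the program owns, so writing to it is well defined, whereas writing through a pointer to the literal itself is not.

```c
#include <string.h>

/* Hypothetical example function: copies "hello" into caller-owned
   storage and modifies the copy, which is well defined. */
void make_wello(char *out, size_t out_size)
{
    char s[] = "hello";   /* array initialized from the literal: a writable copy */
    s[0] = 'w';           /* fine: modifies the array, not the literal */

    if (out_size >= sizeof s)
        memcpy(out, s, sizeof s);   /* includes the terminating '\0' */
    else if (out_size > 0)
        out[0] = '\0';

    /* By contrast, the following would be undefined behavior:
       char *p = "hello";
       *p = 'w';
    */
}
```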

Since modifying a character string literal already has UB in
the current standard, then why doesn't the next standard
specify that string literal have type const char[] instead
of just char[] ?

Because the cost is large and the benefit is small.

I daresay it's pretty easy to get diagnostics for
non-const uses of string literals if one wants
them. Given that, there is no compelling reason
to force everyone to change, especially since it
can be useful for an implementation to define
string literals so that they are usefully writeable.
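As one example of such a diagnostic, GCC and Clang accept `-Wwrite-strings`, which, when compiling C, gives string literals the type `const char[N]` so that non-const uses draw a warning. The sketch below is only an assumption about how code might be structured to stay clean under that option; the function names `first_char` and `greeting` are hypothetical.

```c
/* Hypothetical helper: takes a pointer-to-const, so passing a string
   literal is clean even when literals are treated as const char[]. */
char first_char(const char *s)
{
    return s[0];
}

/* Stays warning-free under -Wwrite-strings (GCC/Clang). */
const char *greeting(void)
{
    const char *ok = "hello";
    /* char *bad = "hello";   <- this line would draw a warning with
                                 -Wwrite-strings enabled */
    return ok;
}
```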
 
Ian Collins

Noob said:
Keith said:
Even if you corrected the type mismatch by writing:

char *s = "hello";
*s = 'w';

your program's behavior would be undefined. s points to a string
literal (more precisely, it points to the first element of the static
array associated with the string literal), and any attempt to modify a
string literal has undefined behavior.

Since modifying a character string literal already has UB in
the current standard, then why doesn't the next standard
specify that string literal have type const char[] instead
of just char[] ?

Because the cost is large and the benefit is small.

The benefit can be the difference between something failing to compile
or failing horribly at run time.
I daresay it's pretty easy to get diagnostics for
non-const uses of string literals if one wants
them. Given that, there is no compelling reason
to force everyone to change, especially since it
can be useful for an implementation to define
string literals so that they are usefully writeable.

Under what circumstances?
 
Alan Curry

What is the correct return type of these C functions (and I've
probably missed a few) which return a pointer into somewhere into
the string passed as a first argument, given that the first argument
might be a C string literal (which we're now changing to const) or
it might be a writable character array? If it's a writable character
array, you might want to use the returned pointer to write into the
string.

strchr()
strrchr()
strstr()
strpbrk()

All should return an integer offset from the beginning of the input string.
In their present form they are dangerous.
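A sketch of what such an interface might look like, with a hypothetical name `str_offset_of` (not a standard function). Returning an offset sidesteps the const question, because the caller indexes into whichever pointer it already holds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical offset-returning variant of strchr(): returns the index
   of the first occurrence of c in s, or -1 if c does not occur. */
ptrdiff_t str_offset_of(const char *s, int c)
{
    const char *p = strchr(s, c);
    return p ? p - s : -1;
}
```

A caller holding a writable array can then write `buf[str_offset_of(buf, 'e')] = 'a';` (after checking for -1), while a caller holding a `const char *` never receives a non-const pointer.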
 
Stephen Sprunk

What is the correct return type of these C functions (and I've
probably missed a few) which return a pointer into somewhere into
the string passed as a first argument, given that the first argument
might be a C string literal (which we're now changing to const) or
it might be a writable character array? If it's a writable character
array, you might want to use the returned pointer to write into the
string.

strchr()
strrchr()
strstr()
strpbrk()

It seems you are stuck with one of several bad choices:
(1) dragging in C++ function overloading (does that even solve
the problem? Can you overload on const/non-const arguments
of otherwise the same type?),

AIUI, you can. I see the problem, though; in fact, while researching
that question, one page I found actually listed the various overloaded
forms of strchr() offered by one compiler.

If one weren't to add function overloading (which has its own appeal),
the only solution would be to deprecate those functions and design
replacements with a const-friendly interface. However, that would break
so much code that it's simply not feasible.

S
 
Ian Collins

AIUI, you can. I see the problem, though; in fact, while researching
that question, one page I found actually listed the various overloaded
forms of strchr() offered by one compiler.

The C++ standard replaces the C strchr() with two overloads:

const char* strchr(const char* s, int c);
char* strchr( char* s, int c);
 
Stephen Sprunk

The C++ standard replaces the C strchr() with two overloads:

const char* strchr(const char* s, int c);
char* strchr( char* s, int c);

Compare to the only option C provides:

char *strchr(const char *s, int c);

This takes a const or non-const argument but always returns a non-const
pointer; the potential loss of const-ness invites bugs. Thanks to
overloading, the C++ version is able to return a pointer that matches
the const-ness of its argument, preventing bugs.
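Without overloading, C11's `_Generic` can approximate this dispatch. The following is only a sketch: the names `strchr_const`, `strchr_mut`, and `STRCHR` are made up for illustration and are not part of any standard library.

```c
#include <string.h>

/* Hypothetical const-preserving wrappers around strchr(). */
const char *strchr_const(const char *s, int c)
{
    return strchr(s, c);
}

char *strchr_mut(char *s, int c)
{
    return strchr(s, c);
}

/* Selects a wrapper based on the pointer type of s (C11 _Generic). */
#define STRCHR(s, c) \
    _Generic((s), const char *: strchr_const, char *: strchr_mut)((s), (c))
```

Note that a bare string literal still has type `char[N]` in C, so passing one directly selects the non-const wrapper; the macro only helps once the caller's own pointers are const-correct.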

S
 
Shao Miller

Shao Miller said:
On 06/22/11 10:25 PM, James Kuyper wrote: [...]
The existing code problem was acknowledged when C++ changed the type
of string literals. Compilers may choose not to issue a diagnostic for
this case. Now we have had over a decade to fix the smelly code, I
believe a diagnostic is now required by the new C++ standard.

C could and should have done the same, but as usual those worried about
breaking already broken code appear to have won the day.

Again, why should a C implementation be rendered non-conforming [to some
future Standard] thusly?

In a "bare metal" environment, one might very well wish to overwrite
their string literals' storage, no? The "bare metal" implementation
might need to define such action as being appropriate.

By using "proper" static arrays, we lose out on the "shared storage"
benefit. Writing for bare metal, hopefully one knows what one is doing.

Is this example silly?

If there's a need for a "bare metal" environment to be able to modify
string literals, that can be provided as an extension.

As in, a documented extension?
Any code that
currently takes advantage of that ability already has undefined
behavior.

Exactly the nature of my question. If there is at least one real
program that depends on this "undefined behaviour" (use the Standard
definition, please :) ), then that program's source code might have to
be adjusted for future versions of an implementation, depending on how
that future implementation implements the extension.
If such a feature were desirable, we could have an optional 'M' (for
modifiable) prefix for string literals, similar to the existing 'L'
prefix for wide string literals. For example, "hello" could be of type
const char[6], and M"hello" could be of type char[6].

An interesting idea! 'M()' could get close, perhaps.

#define M(string) ((char[]){string})

I guess we'd need 'ML', too?
The fact that I've never heard of anyone implementing something like
this suggests (though not strongly) that there's no demand for it.

Suppose you've a program loaded into memory via a serial line. Suppose
the program is loaded into writable memory (that seems pretty likely).
Suppose the program offers a CLI. Suppose a user can rename commands or
variables, or redefine preset scripts. Yes, all of these could be done
cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
convenient for some people to simply type the string literal right into
some spot in the source code where it's used and forget about coming up
with a meaningful (and possibly redundant) identifier, i.e.
'csz_Hello__world___And_how_are_you__today_'
I suspect that the vast majority of code that attempts to modify
string literals does so as the result of bugs.

That seems probable to me, too.
A lot more code
uses string literals in contexts that don't treat them as const,
but doesn't actually try to modify them; for example:

void func(char *s) {
    printf("In func(), s = \"%s\"\n", s);
}

...

func("hello");

In short, I think the issue is not that anyone wants to modify
string literals;

For the right use case, I would.
it's that making them const would break existing
code that *doesn't* actually modify string literals.

(Stroustrup was able to do this in C++ because there was no existing
C++ code before he invented the language.)

Well doesn't 'const' "come from" C++, anyway?

And why isn't there a write-only counterpart, for symmetry, such as a
memory-mapped port that mustn't be read from?

And during very early development (and Usenet code examples), can it be
pleasant to avoid 'const' concerns altogether and then to analyze and
refine, gradually sprinkling 'const' in where appropriate? I don't
advocate this, but imagine that some folks might get "stuck" in
"analysis paralysis" if they had to think 'const'ness through at every
corner. I could be mistaken.
 
Ian Collins

Suppose you've a program loaded into memory via a serial line. Suppose
the program is loaded into writable memory (that seems pretty likely).
Suppose the program offers a CLI. Suppose a user can rename commands or
variables, or redefine preset scripts.

A recipe for disaster! What happens if a new name is longer than the old?
Yes, all of these could be done
cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
convenient for some people to simply type the string literal right into
some spot in the source code where it's used and forget about coming up
with a meaningful (and possibly redundant) identifier, i.e.
'csz_Hello__world___And_how_are_you__today_'

Yuck, what a contrived example! As you say, there is a method that does
not use undefined behaviour.
That seems probable to me, too.


For the right use case, I would.


Well doesn't 'const' "come from" C++, anyway?

No, it's just used properly there!
And why isn't there a write-only counterpart, for symmetry, such as a
memory-mapped port that mustn't be read from?

That case isn't uncommon in embedded systems (watchdog reset being a
common example). However reading is generally harmless.
And during very early development (and Usenet code examples), can it be
pleasant to avoid 'const' concerns altogether and then to analyze and
refine, gradually sprinkling 'const' in where appropriate? I don't
advocate this, but imagine that some folks might get "stuck" in
"analysis paralysis" if they had to think 'const'ness through at every
corner. I could be mistaken.

Not really, if you don't know whether the function you are writing will
modify its arguments, you are in big trouble!
 
Keith Thompson

Shao Miller said:
Shao Miller said:
On 6/22/2011 2:56 PM, Ian Collins wrote:
On 06/22/11 10:25 PM, James Kuyper wrote: [...]
The existing code problem was acknowledged when C++ changed the type
of string literals. Compilers may choose not to issue a diagnostic for
this case. Now we have had over a decade to fix the smelly code, I
believe a diagnostic is now required by the new C++ standard.

C could and should have done the same, but as usual those worried about
breaking already broken code appear to have won the day.


Again, why should a C implementation be rendered non-conforming [to some
future Standard] thusly?

In a "bare metal" environment, one might very well wish to overwrite
their string literals' storage, no? The "bare metal" implementation
might need to define such action as being appropriate.

By using "proper" static arrays, we lose out on the "shared storage"
benefit. Writing for bare metal, hopefully one knows what one is doing.

Is this example silly?

If there's a need for a "bare metal" environment to be able to modify
string literals, that can be provided as an extension.

As in, a documented extension?

Well, yes, documenting it would certainly be a nice touch.
Exactly the nature of my question. If there is at least one real
program that depends on this "undefined behaviour" (use the Standard
definition, please :) ),

If you were correcting my spelling from "behavior" to "behaviour", the
Standard uses the US-style "behavior" spelling. If not, what do you
mean?

(Hmm, does the UK standard body, its equivalent of the US ANSI,
"translate" ISO standards into UK spellings?)
then that program's source code might have to
be adjusted for future versions of an implementation, depending on how
that future implementation implements the extension.
If such a feature were desirable, we could have an optional 'M' (for
modifiable) prefix for string literals, similar to the existing 'L'
prefix for wide string literals. For example, "hello" could be of type
const char[6], and M"hello" could be of type char[6].

An interesting idea! 'M()' could get close, perhaps.

#define M(string) ((char[]){string})

I guess we'd need 'ML', too?
The fact that I've never heard of anyone implementing something like
this suggests (though not strongly) that there's no demand for it.

Suppose you've a program loaded into memory via a serial line. Suppose
the program is loaded into writable memory (that seems pretty likely).
Suppose the program offers a CLI. Suppose a user can rename commands or
variables, or redefine preset scripts. Yes, all of these could be done
cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
convenient for some people to simply type the string literal right into
some spot in the source code where it's used and forget about coming up
with a meaningful (and possibly redundant) identifier, i.e.
'csz_Hello__world___And_how_are_you__today_'

And suppose the user wants to replace the content with a string that's
longer than the original literal.

[snip]
For the right use case, I would.

Perhaps.

There are benefits and drawbacks both to allowing modifications of
string literals, and to disallowing them. IMHO the benefits of
disallowing modifications (catching potential errors) far outweigh the
benefit for a few obscure use cases.

My M"..." proposal *could* give us the best of both, and I wouldn't
object if it showed up in a future standard. I just don't think
it's sufficiently useful.
Well doesn't 'const' "come from" C++, anyway?

I think so.
And why isn't there a write-only counterpart, for symmetry, such as a
memory-mapped port that mustn't be read from?

Lack of usefulness, I suppose. "const" (which perhaps should have been
called "readonly") is massively useful. "writeonly" would be probably
useful only in some very low-level code. And an implementation could
provide a #pragma that does the same thing (have any done so?).
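C has no `writeonly` qualifier, but the intent can be approximated by convention. The sketch below uses hypothetical names (`wo_reg`, `wo_reg_write`) and wraps a `volatile` location so that the only access function provided is a write. Nothing stops a determined caller from reading the member directly, so this documents intent rather than enforcing it.

```c
#include <stdint.h>

/* Hypothetical wrapper for a write-only register (e.g. a watchdog reset).
   Only a write function is provided; reads are simply not offered. */
typedef struct {
    volatile uint32_t *addr;
} wo_reg;

static inline void wo_reg_write(wo_reg reg, uint32_t value)
{
    *reg.addr = value;   /* volatile store through the wrapped address */
}
```

On real hardware `addr` would point at the memory-mapped register; in a test it can point at an ordinary variable.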
And during very early development (and Usenet code examples), can it be
pleasant to avoid 'const' concerns altogether and then to analyze and
refine, gradually sprinkling 'const' in where appropriate? I don't
advocate this, but imagine that some folks might get "stuck" in
"analysis paralysis" if they had to think 'const'ness through at every
corner. I could be mistaken.

I think designing const into your code from the start is a lot
easier.

My personal preference is to declare everything "const" *unless*
I specifically need to modify it. In fact, if I were designing
a new language (without concern for backward compatibility),
declared objects would be read-only by default, with some special
syntax ("var"?) to make them writable. With sufficiently flexible
initialization, I suspect most objects don't need to be modified
after their creation. (I do not for one moment suggest making
such a change to C.)
 
Ian Collins

If you were correcting my spelling from "behavior" to "behaviour", the
Standard uses the US-style "behavior" spelling. If not, what do you
mean?

(Hmm, does the UK standard body, its equivalent of the US ANSI,
"translate" ISO standards into UK spellings?)

Alas, no. We have to suffer the cultural imperialism!
 
Keith Thompson

Ian Collins said:
On 06/25/11 07:39 AM, Shao Miller wrote: [...]
Well doesn't 'const' "come from" C++, anyway?

No, it's just used properly there!

See the ANSI C Rationale, at
<http://www.lysator.liu.se/c/rat/c5.html#3-5-3>:

The Committee has added to C two type qualifiers: const and
volatile. Individually and in combination they specify the
assumptions a compiler can and must make when accessing an
object through an lvalue.

The syntax and semantics of const were adapted from C++; the
concept itself has appeared in other languages. volatile is
an invention of the Committee; it follows the syntactic model
of const.
 
Ian Collins

Ian Collins said:
On 06/25/11 07:39 AM, Shao Miller wrote: [...]
Well doesn't 'const' "come from" C++, anyway?

No, it's just used properly there!

See the ANSI C Rationale, at
<http://www.lysator.liu.se/c/rat/c5.html#3-5-3>:

The Committee has added to C two type qualifiers: const and
volatile. Individually and in combination they specify the
assumptions a compiler can and must make when accessing an
object through an lvalue.

The syntax and semantics of const were adapted from C++; the
concept itself has appeared in other languages. volatile is
an invention of the Committee; it follows the syntactic model
of const.

OK, but it was standardised in C long before C++.
 
Tim Rentsch

Shao Miller said:
[discussing modifiable string literals]

An interesting idea! 'M()' could get close, perhaps.

#define M(string) ((char[]){string})

Different storage duration.
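
To spell out the difference: a string literal's array has static storage duration, while a compound literal that appears inside a function body has automatic storage duration and lives only until the enclosing block ends. A small sketch, reusing the `M()` macro from the quoted post (the function names are invented for this example):

```c
#include <string.h>

#define M(string) ((char[]){string})

/* Fine: the compound literal is used within the block that creates it. */
int modified_length(void)
{
    char *s = M("hello");   /* automatic storage, writable */
    s[0] = 'w';
    return (int)strlen(s);
}

/* Fine: the literal's array has static storage duration. */
const char *static_text(void)
{
    return "hello";
}

/* Not fine: returning M("hello") from a function would hand back a
   pointer to storage that no longer exists once the function returns. */
```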
 
Tim Rentsch

Ian Collins said:
Noob said:
Keith Thompson wrote:

Even if you corrected the type mismatch by writing:

char *s = "hello";
*s = 'w';

your program's behavior would be undefined. s points to a string
literal (more precisely, it points to the first element of the static
array associated with the string literal), and any attempt to modify a
string literal has undefined behavior.

Since modifying a character string literal already has UB in
the current standard, then why doesn't the next standard
specify that string literal have type const char[] instead
of just char[] ?

Because the cost is large and the benefit is small.

The benefit can be the difference between something failing to compile
or failing horribly at run time.

But no language change is needed to obtain that benefit; for
those who want it, it's available today through compiler options.
Under what circumstances?

I can easily imagine an implementation providing a compiler
option to make string literals modifiable - not turned on
all the time, but having the option. When might that option
be useful? Some examples:

1. Compiling legacy code that assumes literals are writable.

2. To force string literals to unique locations to help track
where various strings appear in the program (working under the
assumption that a writable-literals option would force different
literals to distinct locations, which it should).

3. During debugging, it might be handy to be able to change
a particular string literal, eg a printf() format, to help
explore program behavior.

I admit these examples may not occur very often. Still, why give
up the flexibility to preserve them, since the language as it
exists today also allows the option of checking string literals
being used "const"-inappropriately -- what benefit would we get
that we don't already have?
 
Keith Thompson

Tim Rentsch said:
I can easily imagine an implementation providing a compiler
option to make string literals modifiable - not turned on
all the time, but having the option. When might that option
be useful? Some examples:

1. Compiling legacy code that assumes literals are writable.

Valid, but I think a lot of such code, perhaps most of it, has been
fixed by necessity, after it blew up when it was recompiled by a
compiler that makes literals non-writable.
2. To force string literals to unique locations to help track
where various strings appear in the program (working under the
assumption that a writable-literals option would force different
literals to distinct locations, which it should).

That could be done by an option that just forces literals to unique
locations without making them writable.
3. During debugging, it might be handy to be able to change
a particular string literal, eg a printf() format, to help
explore program behavior.

Ok, but I don't think I've ever felt the need to do that.
I admit these examples may not occur very often. Still, why give
up the flexibility to preserve them, since the language as it
exists today also allows the option of checking string literals
being used "const"-inappropriately -- what benefit would we get
that we don't already have?

We'd get the benefit of a guarantee that programs that accidentally
attempt to write to string literals will be caught and fixed more
easily, regardless of which conforming compiler we're using.
A hypothetical compiler option doesn't do me much good if the
compiler I'm using doesn't provide it (recompiling with a different
compiler isn't always an option).

Similar arguments could be made in favor of making modifying a
const-qualified object undefined behavior rather than a constraint
violation:

const int x = 42;
x = 43;

In my opinion, the only good reason to consider allowing string
literals to be modifiable is for compatibility with very old
implementations. I suspect that if string literals had been const
from the beginning (which would have required inventing "const"
many years sooner), we wouldn't be having this discussion.
 
Shao Miller

A recipe for disaster! What happens if a new name is longer than the old?

When it's time to overwrite, count how many characters before the null
character and limit accordingly. This'd mean that redefining these
could narrow the strings and one couldn't redefine with a larger string
afterwards, but a script mightn't need to redefine more than once; i.e.
defaults.
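
That limiting step might look like the sketch below. The function name `overwrite_bounded` is hypothetical, and the example deliberately operates on a named writable array, since writing to an actual string literal remains undefined behaviour in standard C.

```c
#include <string.h>

/* Hypothetical helper: overwrites the string at dst with src, but never
   writes past dst's current terminator. Returns the number of characters
   copied (src is truncated if it is longer than the old contents). */
size_t overwrite_bounded(char *dst, const char *src)
{
    size_t room = strlen(dst);        /* characters before the '\0' */
    size_t n = strlen(src);
    if (n > room)
        n = room;                     /* truncate to the old length */
    memcpy(dst, src, n);
    dst[n] = '\0';                    /* may shorten the string for good */
    return n;
}
```

Overwriting `"status"` with `"st"` and then with `"longer"` leaves `"lo"`: once narrowed, the string cannot grow back, which is the limitation described above.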

But I wouldn't worry about that as much as I'd worry about the potential
for shared storage. ;)
Yes, all of these could be done
cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
convenient for some people to simply type the string literal right into
some spot in the source code where it's used and forget about coming up
with a meaningful (and possibly redundant) identifier, i.e.
'csz_Hello__world___And_how_are_you__today_'

Yuck, what a contrived example! As you say, there is a method that does
not use undefined behaviour.

Yes, it is contrived. Yes, it's cleaner the other way. But ought our
opinions to be enforced globally via such a change in the Standard?
Some programmers might just like it. Since I'm not one of them, maybe I
shouldn't be attempting to defend their position. Heh.
No, it's just used properly there!

I really thought it did come from C++. Oops.
That case isn't uncommon in embedded systems (watchdog reset being a
common example). However reading is generally harmless.

But a "write-only" attribute still adds useful information. A
programmer coming along to work on someone else's code mightn't realize
that there is no expectation whatsoever for the value of an object, once
read. Such an attribute could allow them to find that out at
translation time. It's a digression, anyway. :)
Not really, if you don't know whether the function you are writing will
modify its arguments, you are in big trouble!

That seems just a bit beginner-unfriendly, to me. I've seen beginners
struggle with 'const'-ness quite a bit, especially with one or more
levels of indirection.
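
For readers who find the combinations confusing, here is a short sketch of the cases for a single level of indirection. The function name is invented, and the commented-out lines mark what a compiler is required to diagnose.

```c
/* Hypothetical demo of where 'const' can sit in a pointer declaration. */
char const_positions_demo(void)
{
    static char a[] = "ab";
    static char b[] = "cd";

    const char *p = a;        /* pointer to const char: p may change, *p may not */
    char *const q = a;        /* const pointer to char: *q may change, q may not */
    const char *const r = b;  /* neither r nor *r may change */

    p = b;                    /* ok: reseat p */
    *q = 'x';                 /* ok: write through q, a is now "xb" */
    /* *p = 'y';   constraint violation: *p is const */
    /* q = b;      constraint violation: q is const */

    return (char)(a[0] + (p[0] - r[0]));   /* p and r both point at b, so this is a[0] */
}
```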

My point just there is that the behaviour of a program can be the same
with 'const' completely removed. So it seems more like something that
"ought to" be used rather than "must" be used. But hey, that's just an
opinion.
 
Shao Miller

Shao Miller said:
On 6/22/2011 2:56 PM, Ian Collins wrote:
On 06/22/11 10:25 PM, James Kuyper wrote:
[...]
The existing code problem was acknowledged when C++ changed the type
of string literals. Compilers may choose not to issue a diagnostic for
this case. Now we have had over a decade to fix the smelly code, I
believe a diagnostic is now required by the new C++ standard.

C could and should have done the same, but as usual those worried about
breaking already broken code appear to have won the day.


Again, why should a C implementation be rendered non-conforming [to some
future Standard] thusly?

In a "bare metal" environment, one might very well wish to overwrite
their string literals' storage, no? The "bare metal" implementation
might need to define such action as being appropriate.

By using "proper" static arrays, we lose out on the "shared storage"
benefit. Writing for bare metal, hopefully one knows what one is doing.

Is this example silly?

If there's a need for a "bare metal" environment to be able to modify
string literals, that can be provided as an extension.

As in, a documented extension?

Well, yes, documenting it would certainly be a nice touch.
Exactly the nature of my question. If there is at least one real
program that depends on this "undefined behaviour" (use the Standard
definition, please :) ),

If you were correcting my spelling from "behavior" to "behaviour", the
Standard uses the US-style "behavior" spelling. If not, what do you
mean?

Oops; no. Just an attempt to direct the interpretation of "that depends
on this \"undefined behaviour\"". It's previously been demonstrated
that it might be interpreted as plain English rather than the precise
Standard definition.
(Hmm, does the UK standard body, its equivalent of the US ANSI,
"translate" ISO standards into UK spellings?)

Heh. I've no idea.
Suppose you've a program loaded into memory via a serial line. Suppose
the program is loaded into writable memory (that seems pretty likely).
Suppose the program offers a CLI. Suppose a user can rename commands or
variables, or redefine preset scripts. Yes, all of these could be done
cleanly (in my opinion) via 'static' 'char[]'s, but it can be more
convenient for some people to simply type the string literal right into
some spot in the source code where it's used and forget about coming up
with a meaningful (and possibly redundant) identifier, i.e.
'csz_Hello__world___And_how_are_you__today_'

And suppose the user wants to replace the content with a string that's
longer than the original literal.

So let them. That's a discussion about user expectations. The user
might even be the programmer, wishing to override some defaults for a
particular client's needs.

Above is just an example for why a programmer might not want string
literals to be 'const'-qualified, especially in an environment where
'const' mightn't have any physical meaning; where all program memory is
writable, and usefully so.
[snip]
For the right use case, I would.

Perhaps.

There are benefits and drawbacks both to allowing modifications of
string literals, and to disallowing them. IMHO the benefits of
disallowing modifications (catching potential errors) far outweigh the
benefit for a few obscure use cases.

They probably do out-weigh in terms of use cases, in my opinion.
My M"..." proposal *could* give us the best of both, and I wouldn't
object if it showed up in a future standard. I just don't think
it's sufficiently useful.

Agreed.


I think so.


Lack of usefulness, I suppose. "const" (which perhaps should have been
called "readonly") is massively useful.

I find it useful as a quality measure, potential optimization
opportunity, and for interface specification, but would find
"write-only" useful for the same reasons.
"writeonly" would be probably
useful only in some very low-level code. And an implementation could
provide a #pragma that does the same thing (have any done so?).

Well MS has '_in', '_out', and '_inout'[1].
I think designing const into your code from the start is a lot
easier.

That strikes me as a useful habit for:
- A programmer that understands 'const'
- A programmer that perceives benefits from using 'const'

I'm just not sure that's the set of all programmers.
My personal preference is to declare everything "const" *unless*
I specifically need to modify it. In fact, if I were designing
a new language (without concern for backward compatibility),
declared objects would be read-only by default, with some special
syntax ("var"?) to make them writable. With sufficiently flexible
initialization, I suspect most objects don't need to be modified
after their creation. (I do not for one moment suggest making
such a change to C.)

What are the major benefits of having such a read-only attribute as a
default? I think that there are lots of concepts translating to code
constructs and patterns that make for high-quality code (such as
'const'), but do they have meaning in every environment? As an example,
if "storage" can be "readable" and "writable," isn't it kind of pleasant
that subsets can be defined by explicit inclusion of a predicate such as
'const'? As another example, if "storage" can be "addressable" and
"non-addressable," isn't it kind of pleasant that subsets can be defined
by explicit inclusion of a predicate such as 'register'?

But maybe you're right and maybe 'const'-default string literals could
raise the average quality of C code. :)

[1] http://msdn.microsoft.com/en-us/library/aa383701(v=VS.85).aspx
 
