Non-constant constant strings

Keith Thompson

Rick C. Hodgin said:
I do not believe in automatic storage.

You apparently do not understand automatic storage.

Any variable defined inside a function (without the "static" keyword)
has automatic storage duration, and that's equally true if all such
variables are defined at the top of the function. Such variables are
allocated on entry to the function, and deallocated on return from it.

Variables defined in an inner block also have automatic storage
duration, and at least notionally are allocated and deallocated on entry
to and exit from the nearest enclosing block (compound statement,
delimited by "{" and "}"). I think that's what you "do not believe in"
(by which I presume you mean "dislike"). BTW, a compiler could simply
allocate all such variables on function entry and deallocate them on exit.

[...]
 
Keith Thompson

Ben Bacarisse said:
Rick C. Hodgin said:
In moving forward, I would write some new ability to create the list
properly, giving the compiler a new ability to handle variable items:

char *archiveFormats[] =
{
#elementif CPIO_SUPPORTED "cpio"
#elementif TAR_SUPPORTED "tar"
#elementif ZIP_SUPPORTED "ZIP"
#elementif APK_SUPPORTED "apk"
null // Optional
};

Something like this might suit:

#define OPT(which, what) which(what)

#define YES(x) x,
#define NO(x)

#define X YES
#define Y NO

const char *list[] = {
OPT(X, "abc")
OPT(Y, "def")
};

<snip>

The whole point is that you don't know, for each element, whether it's
going to be the last in the list.
 
James Kuyper

I do not believe in automatic storage. I think it's something that hides
operations and removes them from overt, explicit, code execution.

I can't imagine why you would think that. How does the hiding occur, and
in what context?

None of your C functions take parameters? All function parameters have
automatic storage duration.

I guess you must either avoid recursion, or have a strong aversion to it
- implementing recursion in C without using objects of automatic storage
duration is feasible, but very complicated.

....
I define
all of my variables at the top and then include an explicit section which
initializes them below, usually as the first thing done in the function. In
that way, anyone stepping through the code or looking at it would see,
step by step, exactly what's happening. There would be no room for confusion,
nor would one have to deduce what's going on.

I can't imagine why you would think that making use of objects with
automatic storage duration would cause confusion or make people
have to deduce what's going on. Could you explain?
FWIW, I would never, under any circumstances, code something like Keith's
example. In fact, when I saw that code I had to try it out to see if it
was actually legal. :) I looked at the assembly it was doing and it was
interesting. There was an issue in Visual C++ with the char* result = malloc()
bit. malloc()'s void* return needed to be cast to char* to compile. :)

You must have been compiling in C++ mode; the rules governing such
statements are different in the two languages. You need to make sure
which language you're talking about - otherwise you'll confuse not only
other people, but also yourself.
I just don't code like that. It seems overtly confusing. I would much
rather have defined blocks which do everything explicitly.

There's nothing about automatic storage duration that is less explicit
than it is for static storage duration. Quite the contrary, actually.
Objects with static storage duration that are not explicitly initialized
are implicitly zero-initialized.
 
Rick C. Hodgin

That's mixing levels. Your remarks were about what should be seen at
the language level.

The language is only a mechanism to get something down to the mechanical
level. It takes human readable source code and converts it into 1s and
0s in a particular pattern which represents the peculiarities of the CPU's
machine code and data models.

I look at computing as real computing, meaning what takes place inside of
the CPU. It reads data into its registers. It executes them through a
processing engine (an adder, shifter, etc.), and then produces output which
is stored back into registers, or into memory.

There are NO computers anywhere which do not do this. They all require
input, do something with that input, and generate output. It is read/write.
All of it.
Most computers work with voltages, but there are no voltage setting
commands in most programming languages.

There are settings at the CPU level which can affect these things. Intel
has defined several MSRs that affect all kinds of motherboard features,
including voltages. C has no concept of this, but the C I use is able to
include assembly opcodes which assemble into real machine instructions.
Kaz's point is
that programming does not imply what you seemed to be talking about --
modification of things (variables, objects and so on) in the programming
language.

It always does. Period. I defy anyone to point to one example where
computer programming does not relate to modifying something. Even if it's
only going through an instruction stream, the instruction pointer is being
modified. It is reading input, executable code, and producing output,
stepping ahead, and more than likely doing real work in between (and not
just an endless series of NOP instructions).
Yet you know Kaz is categorically wrong? Surely you accept that there
might be people who know more about programming languages and language
design than you do?

In this case, yes. Computers process.
My FWIW is that Kaz's remark is pertinent to your goal of massively
parallel programming. The read/write model is highly problematic for
many parallel architectures and, if I *had* to put money on it, I would say
that a functional programming language (albeit probably not a purely
functional one) will be the winner in this field. You might want to look
into the idea rather than dismiss it as categorically wrong.

I have a solution for that, but it requires a new hardware model designed
around that thing I've been discussing.

Best regards,
Rick C. Hodgin
 
James Kuyper

Then my ro and rw would be new type qualifiers which alter the declaration
of the variable to be either a constant read-only or variable read-write.
By default it would always be rw, unless the compiler switch -RO was used,
and any variable can be explicitly type qualified using ro or rw based upon
need in source code.

How does C handle read-only variables allocated and initialized at runtime?

Any C object which is itself declared 'const' can be allocated in
read-only memory - but whether or not that actually happens is up to the
compiler. Note: the precise position of "const" in the type
specification relative to "*" is very important. In the declaration

char const *p = "Hello ";

p itself is not declared "const", only the thing that p points at, so p
will generally not be stored in read-only memory (if your code never
actually tries to change p, the compiler might notice that fact, and
choose to put it in read-only memory anyway, even though it is not
declared "const"). This is what you should do with any pointer that
points at a string literal. In

char target[] = "World!";
char * const q = target;

q itself is declared const, and therefore could be stored in read-only
memory.

All C objects are always allocated and initialized at run time. The
difference you may be thinking about is the one between objects with
static storage duration, which are (at least conceptually) allocated and
initialized at run time before main() starts executing, and those with
automatic storage duration, which are allocated and initialized after
main() starts. Objects with automatic storage duration can be stored in
read-only memory, if declared "const".
I know in Windows and Visual C++ we can mark a memory range to be a particular
type (such as read-only), change it to read-write, populate a new variable,
and then set it back, but that is an OS convention, not a language convention.

Is there a C method to handle runtime-allocated read-only variables apart
from the const type qualifier, which would've been only enforceable at
compile time?

Yes, const-qualified objects with automatic storage duration can be
implemented that way; whether they actually are implemented that way is
up to the compiler.
 
Keith Thompson

Rick C. Hodgin said:
The language is only a mechanism to get something down to the mechanical
level. It takes human readable source code and converts it into 1s and
0s in a particular pattern which represents the peculiarities of the CPU's
machine code and data models.

Then you have a very different point of view from mine.

The way I look at it, a programming language is a mechanism for specifying
*behavior*. How the implementation translates my C code into machine
language is of little concern to me as long as the resulting program
behaves the way I want it to. I don't even necessarily know or care
which flavor of CPU my code is running on.

[...]
 
Keith Thompson

James Kuyper said:
On 01/22/2014 05:28 PM, Rick C. Hodgin wrote: [...]
How does C handle read-only variables allocated and initialized at runtime?

Any C object which is itself declared 'const' can be allocated in
read-only memory - but whether or not that actually happens is up to the
compiler.

Not necessarily:

const int r = rand();

(unless the target system has some way of initializing memory at run
time *and then* making it read-only -- which would be an interesting
feature).

To answer Rick's question, marking an object as "const" means that the
compiler will reject (or at least warn about) any attempt to modify that
object directly. Whether it can be stored in physical or virtual
read-only memory is another matter. Attempting to modify a
const-qualified object (which you can do indirectly) has undefined
behavior.

For example, let's say we have:

const int ro = 42;

at file scope. The compiler may or may not store ro in read-only
memory. If it does, then executing

*(int*)&ro = 43;

will likely cause the program to crash. If it's not stored in read-only
memory, then executing that same statement will likely modify the value
stored in ro -- but since the behavior is undefined, a later attempt to
read the value of ro might yield 42, or 43, or anything else. It could
plausibly yield 43 if the generated code simply retrieves the value
stored at that address. It could plausibly yield 42 if the compiler,
recognizing that ro was declared const, *assumes* that its value hasn't
changed.

The phrase "read-only memory" might refer either to physical ROM or to
RAM that's protected from modification by the virtual memory system.
 
James Kuyper

James Kuyper said:
On 01/22/2014 05:28 PM, Rick C. Hodgin wrote: [...]
How does C handle read-only variables allocated and initialized at runtime?

Any C object which is itself declared 'const' can be allocated in
read-only memory - but whether or not that actually happens is up to the
compiler.

Not necessarily:

const int r = rand();

(unless the target system has some way of initializing memory at run
time *and then* making it read-only -- which would be an interesting
feature).

Rick did in fact mention such a feature later on in the same message,
and I believe I've heard of such a feature from other people, as well.
If such a feature does exist, making use of it for 'r' would not
violate any requirement of the C standard, which is the only thing that
matters from my perspective.
 
Kaz Kylheku

The language is only a mechanism to get something down to the mechanical
level.

Sorry, no it isn't. The language is also the medium in which we express the
solution, and with the help of which we think about the solution.

How a task is solved in different languages can be radically different. (Take
a look at the Rosetta Code website, where volunteers develop solutions to
numerous problems, in numerous languages, for comparative purposes.)

A programming language can supply us with a mental model of computation which
is quite different from the one which is "native" to the electronic computer
which realizes it, and we can work with the program at a level which is below
that mental model.

Someone fine-tuning the Cascading Style Sheets (CSS) of a set of web pages hardly
needs to think about machine registers. The paradigm is one of selectors which
match elements of the document, and apply properties. Any lower level
consideration of the semantics is pointless.

And note that your utilitarian statement above is at odds with why you came
here.

You already had your solution working at the mechanical level; and you
were looking to polish that turd, without changing the mechanical level.

Why? Obviously, because you care about how the solution is expressed in the
source language.
I look at computing as real computing, meaning what takes place inside of
the CPU.

Then why do you care about the difference between

char x[] = "abc";
char *y = { x };

versus some other declarative layout which produces the same bits?

Are those differences about "real" computing?
 
Seebs

Interesting. To recap, my reasoning relates to the nature of software and
computers in general. They are just that: computers. They take input, process
it, and generate output. That, by definition, means read-write. The only
cases where I would like to have something be read-only is when I explicitly
cast for it, indicating that this variable should not be modified. In that
way the compiler can help catch errors, as can the runtime environment.

I feel that way about many things, but literal constants are not among them.

I mean, consider that FORTRAN actually did exactly what you suggest, to such
an extent that you could indeed modify integer constants in some cases. That
didn't go over well with most people.

-s
 
Seebs

Understandably. I believe if you want something to be constant, you should
declare it as const, and then it is created at executable load time into an
area of memory which is explicitly marked read-only. It will signal a fault
when an attempt to write to it is made.

C runs on machines which don't have the ability to trigger such faults. :)

But seriously, you think that a plain literal number in a program ought to have
to be explicitly declared constant? Consider a program like this:

#include <stdio.h>

int add(int x, int y) {
    y = y + x;
    x = x + 1;
    return y;
}

int main(void) {
    int i;
    int y = 0;
    for (i = 0; i < 10; ++i) {
        y = add(1, y);
    }
    printf("y: %d\n", y);
    return 0;
}

You think that this program ought to produce, not 10, but some larger value,
because the literal 1 is not declared specifically to be constant, and thus
it ought to be modifiable?

(Not "will produce", since C is generally pass-by-value, but "should" as a
matter of design philosophy.)
I disagree. A literal should be a typed bit of text which explicitly conveys
a value of known type at compile time.
Okay...

It should not be read-only unless it
is explicitly cast with a const prefix. I would also change that to not
require the bulky keyword const, but to use C"text" or some other shorthand
equivalent.

What about non-string literals, like numbers?
I'm not talking about changing C. C will be the way C is forever because
there is so much legacy code written in it. I'm talking about a new standard
for another version of C.

It doesn't sound like a "version of C".
I think you must mean that how strings are physically packed together in
memory by the compiler is undefined, and therefore overwriting a string
boundary, for example, would cause undefined behavior.
No.

If it's
otherwise ... I'm confused because writing to a string is a fundamental
component of data processing, and is something that should be wholly
defined. :)

But writing to the contents of a pointer to a literal is intentionally
undefined, because there is no guarantee that literals were stored in
writeable memory.

-s
 
Rick C. Hodgin

I feel that way about many things, but literal constants are not among them.

I mean, consider that FORTRAN actually did exactly what you suggest, to such
an extent that you could indeed modify integer constants in some cases. That
didn't go over well with most people.


There's another misunderstanding taking place here. The literal is the
value used to populate the constant. The literal cannot change, by
definition; only variables can change. The literal only conveys
compile-time information. What exists after that is no longer the literal,
but whatever it was converted into by the compiler. In some cases it will
be part of the assembly instruction. In other cases it will be some data
in read-only memory. In other cases it will be some data in read-write
memory. The point is ... literals cannot change. They only convey some
information for a time, and are then discarded, being substituted by what
they were changed into. So when I'm talking about modifying some value,
it is the placeholder variable that the literal was assigned to, which
now has a memory address and can be changed.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

C runs on machines which don't have the ability to trigger such faults. :)

Yes. But not in any of the cases where I program. I live in the world of
x86 and ARM CPUs. They trigger faults.
But seriously, you think that a plain literal number in a program ought
to have to be explicitly declared constant?

No. I consider a literal number to already be a constant. It is an
explicit value of determinate quantity used for whatever purpose. However,
if you assign that literal number to something like:
int foo = 5;

...then the variable foo must be declared const if you don't want its value
to change.
Consider a program like this:

Must I? :)
#include <stdio.h>

int add(int x, int y) {
    y = y + x;
    x = x + 1;
    return y;
}

int main(void) {
    int i;
    int y = 0;
    for (i = 0; i < 10; ++i) {
        y = add(1, y);
    }
    printf("y: %d\n", y);
    return 0;
}

You think that this program ought to produce, not 10, but some larger value,
because the literal 1 is not declared specifically to be constant, and thus
it ought to be modifiable?

No. I consider all literal numbers to already be constants. What I mean is
that every variable created should be read-write by default, unless explicitly
type qualified as const. In such a case, then it becomes a read-only constant.
Until that time, all variables are read-write.

Example:
char* list[] = { "one", "two", "three" };

In this case, list[0] should contain a writable character array of four bytes,
list[1] likewise, and list[2] should be six bytes. However, today unless
you use the compound literal syntax on GCC, the string literals are converted
to read-only constants. In my view, that should not be.

I think string literals should only exist as constants when used like this:
printf("Hi, mom!\n");

Or when they are explicitly type qualified, as in:
const char foo[] = "Hi, mom!";

Without using the const prefix, those string literals should always be
converted at compile-time into read-write values.
(Not "will produce", since C is generally pass-by-value, but "should" as a
matter of design philosophy.)

You could use int& x, though it would not work in this case because 1 is
a literal number ... right? :)
What about non-string literals, like numbers?

See above.
It doesn't sound like a "version of C".

It uses nearly all C syntax and conventions. It just greatly relaxes many
restraints and constraints, converting errors into warnings. This code
will not compile in Visual C++ without an error:

#include <stdlib.h>
int main(int argc, char* argv[])
{
union {
char* p;
int _p;
};

p = malloc(5);
return(_p);
}

It will fail on the malloc() line because malloc() returns void*. Well,
it's not really an error. It's a violation of C's pointer protocol that
says you must explicitly cast malloc()'s return as (char*) before you assign
it to p. My language will not require that, but only generate a warning.
My IDE will also let you permanently silence warnings that you've already
"signed off on" .... and until the associated source code line changes
again, you'll never see that warning on that line.
But writing to the contents of a pointer to a literal is intentionally
undefined, because there is no guarantee that literals were stored in
writeable memory.

Oh! I misunderstood what you were saying.

I don't see this as undefined behavior. The string literal only conveyed
information at compile time, unless it was used in an expression, in which
case it should not be undefined behavior, but rather it should signal a fault
(on those machines which can signal a fault) and simply conduct the write on
those machines that do not have protected memory and therefore cannot isolate
the inappropriate writes from the appropriate ones.

void foo(char* p)
{
p[0] = '5';
}

char data1[] = "Hi, mom!";
const char data2[] = "Hi, mom!";

int main(int argc, char* argv[])
{
foo("Hi, mom!");
}

Use of the string literal foo("Hi, mom!") should signal a fault.
Use of the string literal used to populate data1 should work fine.
Use of the string literal used to populate data2 should signal a fault.

It is not the string literal that will ever change. It is the thing that
it is populated into. In the case of foo("Hi, mom!") it will have created
an externally unnamed variable in memory somewhere. On machines which have
read-only and read-write memory, it will have gone to read-only. On other
machines it will have gone to read-write memory. In any event, at that
point the string literal no longer exists, but rather the memory location
the compiler assigned for it does exist.

It should not be undefined behavior, but rather defined. There may need to
be exceptions of unimplemented features on certain architectures which cannot
support fault signaling, for example, but the behavior should be defined.

Best regards,
Rick C. Hodgin
 
Ben Bacarisse

Rick C. Hodgin said:
Something like this might suit:

#define OPT(which, what) which(what)

#define YES(x) x,
#define NO(x)

#define X YES
#define Y NO

const char *list[] = {
OPT(X, "abc")
OPT(Y, "def")
};

Aren't you now required to include somewhere in your source code an explicit
list of operators for each optional entry? And with code generators, don't
those usually come from external sources?

I don't understand your point at all. I was suggesting a way to get the
sort of conditional behaviour you wanted without a new directive. I may
have misunderstood what you wanted.

If you are asking a follow-up question about getting different results
from that same list, there is a common technique that uses an external
list to generate different things in different places:

You have a file with the list like this:
==== ops.inc ===
X(cpio)
X(tar)
X(zip)
X(apk)
================

To get a list of strings in an array you write:

const char *list[] = {
#define X(s) #s,
#include "ops.inc"
#undef X
};

To get a set of function declarations and function pointers you write:

typedef void operation(void);
#define X(s) operation do_##s;
#include "ops.inc"
#undef X

operation *ops[] = {
#define X(s) do_##s,
#include "ops.inc"
#undef X
};

<snip>
 
Ian Collins

Rick said:
On Wednesday, January 22, 2014 9:08:04 PM UTC-5, Seebs wrote:

Example:
char* list[] = { "one", "two", "three" };

In this case, list[0] should contain a writable character array of four bytes,
list[1] likewise, and list[2] should be six bytes.

How can it when list is an array of pointers?
However, today unless
you use the compound literal syntax on GCC, the string literals are converted
to read-only constants. In my view, that should not be.

Compound literals are a part of C, not a gcc extension.
I think string literals should only exist as constants when used like this:
printf("Hi, mom!\n");

Or when they are explicitly type qualified, as in:
const char foo[] = "Hi, mom!";

Without using the const prefix, those string literals should always be
converted at compile-time into read-write values.

You haven't declared a literal, you have declared an array of char which
is writable.
It uses nearly all C syntax and conventions. It just greatly relaxes many
restraints and constraints, converting errors into warnings. This code
will not compile in Visual C++ without an error:

Because you are compiling it as C++.
#include <stdlib.h>
int main(int argc, char* argv[])
{
union {
char* p;
int _p;
};

This isn't valid C.
p = malloc(5);
return(_p);

Both p and _p are undeclared variables in C.
}

It will fail on the malloc() line because malloc() returns void*. Well,
it's not really an error.

It is in C++.
 
Rick C. Hodgin

Rick said:
On Wednesday, January 22, 2014 9:08:04 PM UTC-5, Seebs wrote:

Example:
char* list[] = { "one", "two", "three" };

In this case, list[0] should contain a writable character array of four bytes,
list[1] likewise, and list[2] should be six bytes.

How can it when list is an array of pointers?

The pointers should point to read-write values.
Compound literals are a part of C, not a gcc extension.

They are not implemented on all C compilers. Visual C++ does not support
the syntax, but GCC does. I used it as a well-known example of a compiler
that supports it.
I think string literals should only exist as constants when used like this:
printf("Hi, mom!\n");
Or when they are explicitly type qualified, as in:
const char foo[] = "Hi, mom!";
Without using the const prefix, those string literals should always be
converted at compile-time into read-write values.

You haven't declared a literal, you have declared an array of char which
is writable.

I've used a string literal to initially populate it.
It uses nearly all C syntax and conventions. It just greatly relaxes many
restraints and constraints, converting errors into warnings. This code
will not compile in Visual C++ without an error:

Because you are compiling it as C++.
#include <stdlib.h>
int main(int argc, char* argv[])
{
union {
char* p;
int _p;
};

This isn't valid C.

I did not know that. It's why I use a C++ compiler to compile my C code.
It has many syntax allowances C does not.

union {
char* p;
int _p;
};
Both p and _p are undeclared variables in C.

It is in C++.

Yeah, but not really. It's only a violation of C++'s protocol for pointer
exchange. As far as the machine goes, it's a pointer and they can be
exchanged. In my opinion, the compiler should allow it, and only warn
about it.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

Ian Collins

Ian, do I know you from somewhere? Your name is very familiar to me.

Best regards,
Rick C. Hodgin
 
Ben Bacarisse

Rick C. Hodgin said:
The language is only a mechanism to get something down to the mechanical
level. It takes human readable source code and converts it into 1s and
0s in a particular pattern which represents the peculiarities of the CPU's
machine code and data models.

Would all computer programs stop meaning anything if all computers
vanished from the world? The more high-level the language, the more its
meaning is independent of the hardware. For some problems (and
parallelism is one) that is an important separation of levels. There's
not a lot of leeway in C for the language to help with parallelism. It's
the language's view of what a program is that imposes this restriction.
That all programs in all languages eventually become zeros and ones is a
truism, but it misses that crucial point.
I look at computing as real computing, meaning what takes place inside of
the CPU. It reads data in to its registers. It executes them through a
processing engine (an adder, shifter, etc.), and then produces output which
is stored back into registers, or into memory.

There are NO computers anywhere which do not do this. They all require
input, do something with that input, and generate output. It is read/write.
All of it.

Yes, but there are languages that don't expose this restriction to the
programmer. In C, the order of expression evaluation is not specified.
That lets the implementation make some optimisations, but it can't look
at f() + g() and decide to run f and g concurrently on separate
processors. In a purely functional language it could. The machines are
just as messy in both cases, but if the language can't express assignment,
f() + g() can always be made concurrent. How the language expresses
things is more important than what all machines end up doing at the
lowest level.

In this case, yes. Computers process.

But you missed his point. You may be correct in what you are saying,
but that does not mean that everyone else is wrong. "It's all
read/write" does not negate "assignment is not needed" for example.

<snip>
 
Kaz Kylheku

There's another misunderstanding taking place here. The literal is the
value used to populate the constant. The literal cannot change, by
definition, but only variables can change. The literal only conveys

Literal: Short form of "literal constant". Piece of datum which is part of
the body of a computer program. Use of a literal is a form of
self-reference. Modification of a literal is self-modification.

Constant: 1. A run-time storage location resembling a variable, but
which can only be initialized, and not mutated.
The manner by which a storage location is attributed as
constant varies: if storage locations have type, it can be
a type attribute; or it can be some special attribute of
storage locations.
2. A symbol, occurrences of which are effectively replaced by a
literal by a compiler or interpreter. (Also "manifest constant").
Global constants of type 1. above can be treated as de-facto
type 2. by compilers as an optimization.
3. Another short form of "literal constant".

"Modification of a constant" can mean: self-modifying code (somehow making 3
evaluate to 4, or mutating string literals or other kinds of literals in
languages that have them, like quoted lists in Lisp); modification of a
compile-time symbol (manifest constant) to have some other value (like #undef
FOO in C, followed by #define FOO 42); or modification of a run-time storage
location (defeating const with casts, and then overwriting a declared object in
C).
 
Kaz Kylheku

I did not know that. It's why I use a C++ compiler to compile my C code.
It has many syntax allowances C does not.

Modern C also has syntax allowances that C++ does not.

The C99 literal casting trick you've become enamored with isn't in C++, oops.
 
