Non-constant constant strings

James Kuyper

Rick said:
On Wednesday, January 22, 2014 9:08:04 PM UTC-5, Seebs wrote:

Example:
char* list[] = { "one", "two", "three" };

In this case, list[0] should contain a writable character array of four bytes,
list[1] likewise, and list[2] should be six bytes.

How can it when list is an array of pointers?

The pointers should point to read-write values.

Note that there is a big difference between "contain" and "point at".
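
For illustration, a minimal sketch of that distinction (the array names are invented for the example):

#include <stdio.h>

int main(void)
{
    /* An array of three pointers: each element *points at* a string
       literal, which the implementation may place in read-only memory. */
    char *list[] = { "one", "two", "three" };

    /* An array of three arrays: each element *contains* a writable copy
       of the characters (6 bytes each, enough for "three" plus '\0'). */
    char words[][6] = { "one", "two", "three" };

    words[0][0] = 'O';       /* fine: the array is writable             */
    /* list[0][0] = 'O'; */  /* undefined behaviour: modifies a literal */

    printf("%s %s\n", list[0], words[0]);
    return 0;
}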

....
They are not implemented on all C compilers.

It is supported on all fully conforming implementations of C99, which
has been the current version of the C standard for nearly 15 years now.

....
I did not know that. It's why I use a C++ compiler to compile my C code.
It has many syntax allowances C does not.

It also disallows many kinds of syntax that C supports. This is a pretty
normal case when you're talking about two different languages. There's
nothing wrong with compiling code using C++; but when you do so, you
should discuss the consequences in comp.lang.c++, not comp.lang.c. Doing
so will avoid a lot of confusion, both yours and other people's.
union {
    char* p;
    int _p;
};


Yeah, but not really. It's only a violation of C++'s protocol for pointer
exchange. ...

The proper term is "type conversion", not "pointer exchange". The rules
of both C and C++ allow an int* to have a different size,
representation, and alignment requirement from that of void*, and that
was done precisely because there have been real machines where that
was actually the case. Therefore, type conversion can be something
considerably more complicated than just moving something from one
register to another, as is implied by your use of "exchange".
... As far as the machine goes, it's a pointer and they can be
exchanged. In my opinion, the compiler should allow it, and only warn
about it.

Well, the people who designed C++ disagree with you about that; they
think that type safety is very important, so important as to justify
treating it as an error when such a type conversion is not explicitly
requested. This is one case where the C rules are more in alignment with
your opinions: C worries less about type safety than C++, and therefore
allows the conversion of void* to other pointer types to occur implicitly.
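
A small sketch of that difference, using malloc only as a convenient source of a void* (the variable name is invented):

#include <stdlib.h>

int main(void)
{
    /* C: the void* returned by malloc converts to int* implicitly. */
    int *p = malloc(10 * sizeof *p);

    /* A C++ compiler rejects the line above as an error; it demands an
       explicit cast:  int *p = (int *)malloc(10 * sizeof *p);        */

    free(p);
    return 0;
}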
 
David Brown

There are no computers anywhere in existence, be they man or machine, that
do not:

(1) Take some form of input
(2) Perform some operation on it
(3) Generate output

A random number generator might have no input.
A copier does not operate on its input.
A Windows machine on BSOD produces no output no matter how much you
hammer on the keyboard.
It is the fundamental definition of a "computer," i.e. "one who computes" or
"something that computes". They all follow this course:

(1) input
(2) process
(3) output

This is the story in all kids' books with titles like "My first book
about computers".

There is a sense in which all computers get some input, do some
processing, and produce some output - but only in a fairly meaningless
way, just like the "moving parts" in a digital watch.
And in all cases, that absolutely means read-write. There are no exceptions.

No, that's wrong - by "read-write" you mean that a single "thing" has
its value modified. Examples you gave were register values that
changed. Many processes are easily modelled using read-write systems,
but it is certainly not necessary. Your post is an input, I am
processing it and producing an output - I haven't /changed/ or
/modified/ anything. Of course you can say that, since synapses have fired
on and off in my brain, my brain state has been modified, but that
is an irrelevant detail of the implementation. Stick to what is
/relevant/ here, and you will /learn/ something. Please read up a
little on functional programming languages (the Wikipedia article will
do for starters). In particular, you should be looking at /pure/
functional programming, and side-effect free programming.
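
By way of a rough illustration in C terms (my own example, not from the thread): a side-effect-free function takes input, processes it, and produces output without modifying anything the caller can see.

#include <stdio.h>

/* Output depends only on the inputs; no state anywhere is read or written. */
static long sum_of_squares(int x, int y)
{
    return (long)x * x + (long)y * y;
}

int main(void)
{
    printf("%ld\n", sum_of_squares(3, 4));   /* 25 */
    return 0;
}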
 
David Brown

I occasionally meet people who insist that "marriage" is "obsolete".
Some people still seem to find it appealing. Similarly, even
though we pretty much have the technology to do all fertilization
in labs, people seem to like the "obsolete" method.

Things which are old, and therefore presumably obsolete, include:
* Having friends.

Replaced by Facebook.

Now we have Youtube.
* Written language.

SMS LOL.
* Spoken language.

SMS again.
* Pets.
Tamagotchi

* Carbon-and-oxygen metabolisms.

Within the next few years, we'll all be solar and wind powered to cut
down on our CO2 emissions.

:)
 
glen herrmannsfeldt

(snip)
Your example declares an array "list" of three pointers-to-char. The
compiler (I'm using the term loosely - some of this is handled by the
linker and maybe even the OS link/loader) will place the strings "one",
"two" and "three" into a section of memory marked "read only" and
usually linked along with the code section. It will allocate space in
the "data" section for an array of three pointers. It will create a
block of data in the initialised data section containing the addresses
of the three string literals within the code section. And it will
arrange for the pre-main startup code to copy these three addresses into
the read-write memory identified by "list".
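
Written out by hand, the arrangement described above might be pictured roughly like this (a sketch of what the toolchain effectively does, not what any particular compiler literally emits):

/* For:  char *list[] = { "one", "two", "three" };  */

static const char lit0[] = "one";     /* string literals: read-only section */
static const char lit1[] = "two";
static const char lit2[] = "three";

char *list[] = {                      /* the pointer array: writable data   */
    (char *)lit0, (char *)lit1, (char *)lit2
};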

Reminds me, the OP was concerned about the memory waste using an
array of longer strings.

On pretty much all systems that allow for blocks of read-only
memory, it has to be a whole block, such as a 4K byte page.

On average a half page, or 2K bytes, will be wasted so he can
have the read-only constants.

-- glen
 
James Kuyper

On 01/22/2014 11:14 AM, Rick C. Hodgin wrote:
....
That's why a new language is needed. C will never change.

C has already changed significantly several times since first
standardized, most notably in 1995, 1999, and 2011. Why do you expect it
to suddenly stop changing? The C committee has made a commitment to
avoid making changes that break backwards compatibility, which means
that some bad decisions from the past are unlikely to ever be corrected,
but even that's not certain. Such changes are only to be "avoided", not
"prohibited". For instance, in C2011 they did finally get around to
deprecating gets(), though they took a couple of decades longer than
they should have.
 
Ben Bacarisse

BartC said:
BartC said:
And even in C, if you see
something like:

fn(x=(char[]){"ABC"});

you might reasonably expect that "ABC" is passed to fn each time this line
is executed, not something different every time!

(I tried to see if fn could in fact be passed a string other than ABC
because of manipulation via x after previous calls.

But each time it was passed "ABC". That was because, in my gcc, the string
was reconstructed at the assignment point above.

It should only be re-initialised. x should get the same value every
time if gcc is getting it right. This behaviour is what the language
says -- it's not up to the implementation.
This just makes the behaviour of compound literals even more of a
mystery:

But any program construct is mysterious if you don't know what it does.
The mystery is probably just due to some expectation not being met.
if I *did* in fact want that behaviour, then this wouldn't work!

You'd have to avoid re-initialising the array on every call. That's
easy to arrange by doing const char *tmp = (char []){"ABC"}; once and
setting x from tmp each time: fn(x = tmp);
It also, rather peculiarly, for shorter strings, constructed the string
32 bits at a time at each call, while for longer ones, the string
was stored in read-only memory! And copied presumably to local
writeable memory.)

That sounds reasonable to me. What's peculiar about it?
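
A small sketch of both behaviours (fn, x, and tmp here are stand-ins for the names in the thread; assumes a C99 compiler):

#include <stdio.h>

static void fn(char *s)
{
    puts(s);
    s[0] = 'X';                 /* scribble on the array fn was given */
}

int main(void)
{
    char *x;

    /* The compound literal's array is re-initialised every time execution
       reaches it, so fn sees "ABC" on every iteration. */
    for (int i = 0; i < 3; i++)
        fn(x = (char[]){"ABC"});

    /* Create the array once and reuse the pointer: now the modification
       survives across iterations. Prints ABC, XBC, XBC. */
    char *tmp = (char[]){"ABC"};
    for (int i = 0; i < 3; i++)
        fn(x = tmp);

    return 0;
}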
 
James Kuyper

BartC said:
And even in C, if you see
something like:

fn(x=(char[]){"ABC"});

you might reasonably expect that "ABC" is passed to fn each time this line
is executed, not something different every time!

(I tried to see if fn could in fact be passed a string other than ABC
because of manipulation via x after previous calls.

But each time it was passed "ABC". That was because, in my gcc, the string
was reconstructed at the assignment point above.

This just makes the behaviour of compound literals even more of a mystery:
if I *did* in fact want that behaviour, then this wouldn't work!

The behavior of that code is essentially the same as

char temp[] = "ABC";
fn(x = temp);

Compound literals are simply a convenience, in that they allow you to
avoid having to declare "temp". Does that help unveil the mystery?
 
BartC

James Kuyper said:
On 01/23/2014 06:08 AM, BartC wrote:
This just makes the behaviour of compound literals even more of a
mystery:
if I *did* in fact want that behaviour, then this wouldn't work!

The behavior of that code is essentially the same as

char temp[] = "ABC";
fn(x = temp);

It's not quite the same. If the fn() call is in a loop (together with code
that modifies via x), then the "ABC" value isn't restored at each iteration.

That only happens here if the char temp declaration is inside the loop block
too, and it is not modified between the declaration and the function call.
 
BartC

Ben Bacarisse said:
BartC said:
fn(x=(char[]){"ABC"});

you might reasonably expect that "ABC" is passed to fn each time this
line
is executed, not something different every time!
It also, rather peculiarly, for shorter strings, constructed the string
32 bits at a time at each call, while for longer ones, the string
was stored in read-only memory! And copied presumably to local
writeable memory.)

That sounds reasonable to me. What's peculiar about it?

1. When passing a string to a function, I generally expect it to just push a
pointer. Not do a dozen assignments (it was doing this with 30+ char
strings).

2. For both long and short strings, actual string data was stored in either
program memory (as immediate data) or in read-only memory, not in R/W
memory, even though the big deal about this construct is that the strings
are in writeable memory! (For the purpose the OP wants it for however, it
works as he expects.)
 
James Kuyper

James Kuyper said:
On 01/23/2014 06:08 AM, BartC wrote:
This just makes the behaviour of compound literals even more of a
mystery:
if I *did* in fact want that behaviour, then this wouldn't work!

The behavior of that code is essentially the same as

char temp[] = "ABC";
fn(x = temp);

It's not quite the same. If the fn() call is in a loop (together with code
that modifies via x), then the "ABC" value isn't restored at each iteration.

That only happens here if the char temp declaration is inside the loop block
too, and it is not modified between the declaration and the function call.

I intended to imply that temp[] would be declared immediately before the
fn() call. Therefore, if fn() occurs inside a block, then temp is
declared inside that same block. Therefore, it is reinitialized each
time execution of the program reaches the declaration of temp, just as
the compound literal is reinitialized each time execution reaches the
compound literal (6.2.4p6).
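
Spelled out as code, that equivalence reads like this (again fn and x are placeholders; a sketch assuming a C99 compiler):

#include <stdio.h>

static char *x;

static void fn(char *s)
{
    puts(s);
}

int main(void)
{
    for (int i = 0; i < 3; i++) {
        /* Reaching the declaration re-initialises temp (6.2.4p6), just as
           reaching a compound literal re-initialises its unnamed array. */
        char temp[] = "ABC";
        fn(x = temp);
        x[0] = 'X';     /* undone on the next iteration: prints ABC 3 times */
    }
    return 0;
}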
 
Ben Bacarisse

BartC said:
Ben Bacarisse said:
BartC said:
fn(x=(char[]){"ABC"});

you might reasonably expect that "ABC" is passed to fn each time
this line
is executed, not something different every time!
It also, rather peculiarly, for shorter strings, constructed the string
32 bits at a time at each call, while for longer ones, the string
was stored in read-only memory! And copied presumably to local
writeable memory.)

That sounds reasonable to me. What's peculiar about it?

1. When passing a string to a function, I generally expect it to just
push a pointer. Not do a dozen assignments (it was doing this with 30+
char strings).

Then just pass the string. You wrote an initialisation. I'm not saying
you should have known -- your example is a corner case, but you wrote it to
find out what happens. Try not to be too surprised! Someone writing
that in real code presumably wants the defined semantics: copy "ABC"
into a writable array (which may already exist), assign to x a pointer
to that array, and then call fn passing it that pointer value.
2. For both long and short strings, actual string data was stored in
either program memory (as immediate data) or in read-only memory, not
in R/W memory, even though the big deal about this construct is that
the strings are in writeable memory!

The big deal is that the *array* is writable. That the compiler puts
literal data used for initialisation in read-only memory (in code or
otherwise) is unexceptional. You can't (except by using specialist
tools) usually control where a compiler puts such data.
(For the purpose the OP wants it for however, it works as he expects.)

Yes. And to be clear, not by some accident of the implementation. It
works as expected because that is the defined semantics of the
construct.
 
Kaz Kylheku

(snip)


Reminds me, the OP was concerned about the memory waste using an
array of longer strings.

On pretty much all systems that allow for blocks of read-only
memory, it has to be a whole block, such as a 4K byte page.

On average a half page, or 2K bytes, will be wasted so he can
have the read-only constants.

The guy has no real-world, "big picture" view on what it means to actually
waste resources.

This is some code-generating tool whose invocation probably lives for a
fraction of a second and then returns all the memory that it used to the OS.

Memory is a serially reusable resource: it is not permanently consumed
by a program. And there is tons of free memory in systems that are used for
running programming tools.

Scraping for bytes on systems that are measured in gigabytes is dumber than
reusing toilet paper.
 
Kaz Kylheku

There are no computers anywhere in existence, be they man or machine, that
do not:

(1) Take some form of input
(2) Perform some operation on it
(3) Generate output

It is the fundamental definition of a "computer," i.e. "one who computes" or
"something that computes".

It is YOUR fundamental definition of a computer, not THE fundamental definition
of a computer.

You do not appear to have the necessary background to be giving everyone the
definition of what a computer is all by yourself (and neither do I);
so a citation would be appreciated.
And in all cases, that absolutely means read-write. There are no exceptions.

Trivial counterexample: a purely combinatorial logic circuit.

This computes a string of M bits out of a string of N bits without ever
clobbering a variable.

The circuits have elements which have state, like stray capacitances, but these
do not affect what it computes; they only delay its settling time.

Not only can we omit these from circuit diagrams, but we can use abstract gate
symbols to capture the circuit, rather than transistor-level schematics.
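
For instance, a one-bit full adder expressed purely over its inputs; nothing is stored or overwritten between uses (my own illustration, not Kaz's):

#include <stdio.h>

/* One-bit full adder: three input bits in, two output bits out, computed
   as pure expressions over the inputs. Returns (carry << 1) | sum. */
static unsigned full_adder(unsigned a, unsigned b, unsigned cin)
{
    unsigned sum   = a ^ b ^ cin;
    unsigned carry = (a & b) | (a & cin) | (b & cin);
    return (carry << 1) | sum;
}

int main(void)
{
    printf("%u\n", full_adder(1, 1, 0));   /* prints 2: carry=1, sum=0 */
    return 0;
}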
 
Kaz Kylheku

Too counterintuitive, rejected. It's "a, b", not "a ,b".

There is some sense in that when commas are separators. Firstly, you can
have a left factored grammar for this:

expr -> term
expr -> term , expr

For this, we might write a recursive descent parser which looks ahead
for a leading comma, and that signals the presence of another term to be
handled, which can be done in a loop.
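
In C, that loop might look roughly like this (a sketch for a comma-separated list of integers; the names are invented):

#include <stdio.h>
#include <stdlib.h>

/* Parse "term (, term)*": read one term, then keep going for as long as a
   leading comma signals that another term follows. */
static int parse_list(const char *p, int *out, int max)
{
    char *end;
    int count = 0;

    out[count++] = (int)strtol(p, &end, 10);    /* first term */
    p = end;

    while (*p == ',' && count < max) {          /* leading comma: one more */
        p++;                                    /* consume the comma       */
        out[count++] = (int)strtol(p, &end, 10);
        p = end;
    }
    return count;
}

int main(void)
{
    int vals[8];
    int n = parse_list("10,20,30", vals, 8);
    for (int i = 0; i < n; i++)
        printf("%d\n", vals[i]);
    return 0;
}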

In C++ programming, I usually write base class initializers in a constructor
like this:

foo::foo(int arg1, char *arg2)
    : base(arg1)
    , otherbase(arg2)
    , thirdbase(42)
{
}

The colon means the first initializer is coming, then commas signal
additional ones, and it nicely formats this way.
 
BartC

For code which will run on computers that use other programs, if I am writing
or reading text files, I want the C runtime to handle native text files
appropriately so I don't have to think about it. Text mode does this.

I agree with the OP on this.

There are only 3 types of line-endings in general use: CR/LF (Windows), LF
(Linux), CR (Mac), or at least those are all I've ever encountered.

Relying on the C runtime to deal with any combination won't always work. And
it seems to be on Windows that it can be a problem because Linux seems to
like to inflict its LF-coded files on everyone else.

(And in Linux it's easy to be careless because binary and text modes are the
same. I've seen one file that mixed text and binary, which presumably worked
under Linux, but in Windows/C it expanded LF to CR,LF, putting the following
binary data out of step.)
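
One way to sidestep the mixed-endings problem is to open the file in binary mode and normalise line endings by hand (a sketch; error handling kept minimal, and the file name is just an example):

#include <stdio.h>

/* Read a file opened in binary mode and print it with every CR/LF,
   lone CR, or lone LF treated as one line ending. */
static void print_normalised(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return;

    int c;
    while ((c = getc(f)) != EOF) {
        if (c == '\r') {
            int next = getc(f);
            if (next != '\n' && next != EOF)
                ungetc(next, f);   /* lone CR: keep the next byte */
            putchar('\n');
        } else {
            putchar(c);            /* '\n' passes through as-is */
        }
    }
    fclose(f);
}

int main(void)
{
    print_normalised("example.txt");   /* file name is illustrative */
    return 0;
}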
 
Rick C. Hodgin

A random number generator might have no input.

Yes it does. It samples some real-world thing to obtain the random number,
or it follows a pre-programmed set of rules to produce pseudo-random output.
A copier does not operate on its input.

Yes it does. Someone presses a button. It reads the color content from
the original, processes it for error correction, and instructs,
via its output, the re-printing mechanism.
A Windows machine on BSOD produces no output no matter how much you
hammer on the keyboard.

Yes it does. Internally there are still interrupts firing each time a
key is pressed, the mouse is moved, and so on. The fact that the screen
remains blue indicates the video card is updating its signal to the monitor
some 60+ times per second, and so on.

There are things we don't see, but all computers everywhere take input,
process it in some way, and produce output.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

Reminds me, the OP was concerned about the memory waste using an
array of longer strings.

On pretty much all systems that allow for blocks of read-only
memory, it has to be a whole block, such as a 4K byte page.

On average a half page, or 2K bytes, will be wasted so he can
have the read-only constants.

Those are entirely separate issues. They are completely outside of my
control. The goal I have in my application is to be minimally wasteful.
I even use 1-byte alignment on my compiler settings, sacrificing a little
speed for some extra packing. :)

Best regards,
Rick C. Hodgin
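
For what it's worth, the trade-off Rick describes is easy to see (the pragma form below is accepted by gcc, clang, and MSVC, but packing pragmas are compiler-specific, and the struct is invented for the example):

#include <stdio.h>

struct natural {            /* default alignment: padding after 'c' */
    char c;
    int  n;
};

#pragma pack(push, 1)
struct packed {             /* 1-byte alignment: smaller, but some CPUs
                               access the misaligned 'n' more slowly */
    char c;
    int  n;
};
#pragma pack(pop)

int main(void)
{
    printf("natural: %zu bytes\n", sizeof(struct natural));  /* typically 8 */
    printf("packed:  %zu bytes\n", sizeof(struct packed));   /* typically 5 */
    return 0;
}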
 
Rick C. Hodgin

C has already changed significantly several times since first
standardized, most notably in 1995, 1999, and 2011. Why do you expect it
to suddenly stop changing?

It has inertia. It can't reasonably undo what's been done because there
is such a large existing code base. Of course it will change, but it won't
undo what's been done for the sake of looking ahead. A branch-away is
required to do that.
The C committee has made a commitment to avoid making changes that break
backwards compatibility, which means that some bad decisions from the past
are unlikely to ever be corrected, but even that's not certain.

That's exactly what I meant by my wording. I just used different words.
Such changes are only to be "avoided", not
"prohibited". For instance, in C2011 they did finally get around to
deprecating gets(), though they took a couple of decades longer than
they should have.

Without knowing more, I'd wager it's a rarely used feature that does not
have any significant impact on existing code bases should it be removed.

And so it's clear ... I don't have a problem with C or C++ or any other
language maintaining their backward compatibility. I plan to do so as
well. However, from the beginning I am going to create something that
works with the way I code, and have coded, throughout my career using
x86 CPUs, and more recently, ARM CPUs.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

The guy has no real-world, "big picture" view on what it means to actually
waste resources.

This is some code-generating tool whose invocation probably lives for a
fraction of a second and then returns all the memory that it used to the OS.

Memory is a serially reusable resource: it is not permanently consumed
by a program. And there is tons of free memory in systems that are used for
running programming tools.

Scraping for bytes on systems that are measured in gigabytes is dumber than
reusing toilet paper.

Kaz, you are making a lot of mistakes in your assessment of me and my
understanding of things. I think this comes from me not being formally
trained in C, and not knowing the appropriate words to use, and therefore
coming across like I do not understand certain things because I seem to
misspeak.

The truth is I come from an assembly background. I have written my own
operating system. I have a broad knowledge base on x86 architecture. I
used to write articles for geek.com and tgdaily.com, and I've been on site
to interview dozens of scientists in all fields from computer programming
to hardware architecture to future devices. I do understand what I'm
talking about ... I just don't use the defined words as per the C specs.
I have never been educated in them, so I use the terms as I have come
to understand them.

I apologize for the confusion this causes you.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

It is YOUR fundamental definition of a computer, not THE fundamental
definition of a computer.

The two are the same. Input, process, output.
You do not appear to have the necessary background to be giving everyone the
definition of what a computer is all by yourself (and neither do I);
so a citation would be appreciated.

The definitive answer? I don't know. I did a Google search and found
this on the Raspberry Pi. Seems definitive. There are other sources as
well.

http://www.slideshare.net/corb201/computer-systems-input-process-output

Specifically slide 5. :)
Trivial counterexample: a purely combinatorial logic circuit.

A circuit is not a computer. It's a mechanism. Old-fashioned mechanical
cash registers had hand cranks to activate their mechanical compute
mechanisms within. The computer was the whole device, not just the hand
crank.
This computes a string of M bits out of a string of N bits without ever
clobbering a variable.

Untrue. It has a bit accumulator somewhere. That accumulator is constantly
being overwritten at each call. There is a bus architecture conveying
digital data from A to B continually, which is going through many devices,
the L1 cache, L2 cache, out to main memory, and so on. That bus is
constantly saturated with empties, fills, transmission timing, reads,
and so on.

It's all read-write because something was read (N bits) and something
was processed (the algorithm) and something was output (M bits).
The circuits have elements which have state, like stray capacitances,
but these do not affect what it computes, but only delay its settling
time.

Circuits are components of a device. They are elements of the computer and
affected by those oddities. It's why it took so long for the semiconductor
industry to move below 180nm. At those levels the oddities began showing
up in circuitry and new geometry had to be created to counteract their
effects.
Not only can we omit these from circuit diagrams, but we can use abstract
gate symbols to capture the circuit, rather than transistor-level
schematics.

You can only omit them from certain levels of diagrams, as when working in
pure theory. Any real-world model would have to consider them because they
do exist.

Best regards,
Rick C. Hodgin
 
