Non-constant constant strings


Rick C. Hodgin

The whole lifestyle which enables you to write programs is wasteful compared
to living in the wild.

YES! You get it! People are what's important. Everything that God made in
creation was made for people, for their benefit, to bring glory to Him. And
it is exactly that ... He created things for us to use, not that they be
wasted, but that they be used to make us better able to help one another, to
serve Him, and not necessarily in that order.
If you don't want to be wasteful, don't drive a car, don't consume anything
that was produced more than 20 miles from where you live, etc.

I don't want to be wasteful in my software's memory consumption. I always
set it to pack structures to single-byte alignment, and I don't employ the
solution you proposed which forces lots of N-length data structures when the
actual source code on each line might be an average of N/16 in real length
(because the one necessary long line mandates that all of the other much
shorter lines consume that much as well).
You're talking about wasting *bytes* on a machine whose RAM is measured in
*gigabytes*. Moreover, those bytes are not permanently wasted; they are only
briefly occupied, and then released and re-used for something else.

Yes. It's an unnecessary feature with Joe's compound literal solution. I
don't understand why you wouldn't prefer Joe's solution, which does not
waste anything, to yours, which wastes much? They both ultimately accomplish
the same thing in this case (where the string lengths of the literals are
not changing, just some of their contents).
This "relative waste" is less, relatively, than the scraps that remain on
your plate at dinner, probably even if you lick the plate "clean".

Yes. That's why I used the word "relative" ... because it is relative. In
this case it's relative to Joe's solution, which is not at all wasteful.
Except that it needs a compiler for the C99 dialect, which is larger, written
to a standard which almost tripled the page count compared to C90.

That's a minor issue because a C99 compiler exists, and I believe I can link
the output object file into my own, being as it's only data. If not ... I
still have the solution I've been using for the past few weeks.

Best regards,
Rick C. Hodgin
 

James Kuyper

On Wednesday, January 22, 2014 1:08:14 PM UTC-5, James Kuyper wrote: ....

FWIW, I would estimate that any reasonably intelligent C developer would be
able to figure out from context that when I use null, I mean NULL. If not...
well then... there we are. :)

A reasonably intelligent C developer would conclude from your use of
null rather than NULL that you weren't very familiar with the language,
and didn't understand the difference. The first part, at least, seems
very accurate, and the second is a reasonable conclusion, even though in
this case it seems to be false.
FWIW, I think the trailing null is the best current solution. And I
like the term sentinel. :) I would use it for variable code.
In moving forward, I would write some new ability to create the list
properly, giving the compiler a new ability to handle variable items:
char *archiveFormats[] =
{
#elementif CPIO_SUPPORTED "cpio"

For consistency with the naming of existing preprocessing directives,
that should be #elementifdef. Would you also propose #elementif, which
works like a #if, and #elementifndef, which would work like #ifndef?

I would probably change that as well. I think the pound sign prefix is
not so good. I think I would make all code items that are to be affected
at compile time as computations leading to variable code be prefixed with
leading dots, and then use more or less the same syntax as the rest of
the language.

You're talking about designing a new language, without bothering to
consider the issue of backwards compatibility with the existing C
language. There's nothing inherently wrong with doing that, but this is
not the right forum for such a discussion. You've already been
redirected toward a different forum, one specialized for the discussion
of new programming languages.
No idea what you're talking about, though I assume it has to do with the
current design of at least one C compiler, possibly GCC.

No, it has to do with the C standard, specifically section 5.1.1.2,
"Translation phases". If the language you're designing is intended to be
so different from C that the contents of the current C standard are
irrelevant, then this forum is definitely the wrong one for discussing it.
 

Rick C. Hodgin

In that case, with an alterable "template text" I would have made it a
text file, loaded a copy of that text into memory and altered the copy
on the fly as needed. This allows replacement of the text without
recompiling all the source. Alternatively, putting the text into a
resource DLL and copying the resource would have solved the problem
without exposing the text to unauthorized alteration.

I considered that as well. Presently I process out line-by-line in the
source file as elements in the array. I would need to alter my algorithms
to handle the single file at one time. Someone else had a suggestion that
used a concatenation of multiple lines of text using \r\n\0 at the end of
each component, which was along those lines.

All of these are doable.

Best regards,
Rick C. Hodgin
 

Seebs

My algorithm actually looks for either ASCII-10 or ASCII-13, in any order, and
then looks at the next character. If it's the corresponding char, as in "\r\n"
or "\n\r" then it accepts that as a line feed. If it doesn't, then it reads it
in as a single-character line ending and continues processing. This allows
combinations like "\r\r" or "\n\n" to be recognized as two lines.
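
The rule described above can be sketched roughly as follows. This is not the poster's actual code; the function names are hypothetical, and the sketch assumes a NUL-terminated buffer already held in memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the length (0, 1 or 2) of the line ending
   that starts at s[0]. '\r' and '\n' each end a line on their own; when one
   is immediately followed by the other ("\r\n" or "\n\r"), the pair counts
   as a single ending. */
static size_t line_ending_length(const char *s)
{
    assert(s != NULL);
    if (s[0] != '\r' && s[0] != '\n')
        return 0;
    char other = (s[0] == '\r') ? '\n' : '\r';
    /* s[1] is safe to read: at worst it is the terminating NUL. */
    return (s[1] == other) ? 2 : 1;
}

/* Hypothetical helper: counts the line endings in a NUL-terminated buffer. */
static size_t count_line_endings(const char *text)
{
    size_t count = 0;
    assert(text != NULL);
    while (*text != '\0') {
        size_t n = line_ending_length(text);
        if (n > 0) {
            count++;
            text += n;
        } else {
            text++;
        }
    }
    return count;
}
```

Under this rule "\r\n" and "\n\r" each count once, while "\r\r" and "\n\n" each count twice, which matches the description in the quoted paragraph.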

This is fascinating. My algorithm just has to look for a single character
which is always a line-ending and never has to worry about it.
It's pretty universal ... "in Windows."

.... I am starting to suspect that we are being trolled.
I don't have to type in the extra "\r", but I choose to do so because when we
do periodically bring up the text files in editors that care about the line
ending combination, it doesn't generate that error. It works perfectly fine
with or without the "\r" ... it's just done as a nicety.

You're missing the point. If you were doing this the sane way, you would
not have to type the \r, *and* you would never have problems in text editors.
I use binary files with this logic on Linux as well. It works the same. My
logic accounts for combinations that I've seen, so any combination of \r or
\n, repeated or not repeated, all parses out properly.

Except that you're creating files which have extra characters in them which
will sometimes be interpreted as literal characters which are part of the line.

I am wondering if perhaps you have been confused by the existence of binary
and text modes for ftp, where you really do always want binary.

-s
 

Seebs

Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!
Not all Unix tools tolerate Windows-style line endings. For example,
if you write:
if [ "$x" = 42 ] ; then
echo ok
fi
in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)
Blindly using "foreign" format text files on any system is not a good
idea.
It's why my algorithm looks for \r or \n in any order, and then checks the
character after for the alternate (\r\n or \n\r combinations).

This is completely irrelevant.

We are not talking about what *your* code can *parse*.

We are talking about what happens when *your* code *generates* files which
have embedded Windows-specific line endings, and those files are then ever
exposed to other programs.

-s
 

Keith Thompson

Rick C. Hodgin said:
const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio",
#endif
#if TAR_SUPPORTED
"tar",
#endif
#if ZIP_SUPPORTED
"ZIP",
#endif
#if APK_SUPPORTED
"apk",
#endif
null
};

That alternative is still ugly, and IMO, still a fairly convincing
argument for the convenience of allowing a terminal comma on a list.

I think all those #if..#endif blocks are ugly. It clutters the code and
makes it hard for me to read.
However, it's also a real, if slight, improvement over your suggestion.

It seems odd to me that developers would reject the trailing null
solution using existing C abilities, and instead ask that a trailing
comma ability be included to save the 4-bytes a typical 32-bit pointer
would consume. Most modern compilers align data on 2^N boundaries for
speed anyway.

Developers asked for the trailing comma because it's convenient.

And the number of elements in the archiveFormats array is simply
sizeof archiveFormats / sizeof archiveFormats[0]
which can easily be wrapped in a macro:
#define ARRAY_LENGTH(arr) (sizeof (arr) / sizeof ((arr)[0]))
... ARRAY_LENGTH(archiveFormats) ...

Of course you have to be careful to apply this only to arrays,
not to pointers. If you're going to pass a pointer to a function
(using syntax that, unfortunately IMHO, makes it look like you're
passing an array), you need to pass the length explicitly -- or
use a sentinel value.
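
A minimal sketch of the two approaches side by side. The array contents and the helper function names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_LENGTH(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* A trailing comma after the last initializer is permitted. */
static const char *const formats[] = {
    "cpio",
    "tar",
    "zip",
};

/* The same list, terminated by a NULL sentinel instead. */
static const char *const formats_with_sentinel[] = {
    "cpio", "tar", "zip", NULL
};

/* Hypothetical helper: receives only a pointer, so the caller has to
   pass the element count explicitly. */
static size_t total_chars(const char *const *list, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(list[i]);
    return total;
}

/* Hypothetical helper: walks the list until it reaches the NULL sentinel. */
static size_t count_until_sentinel(const char *const *list)
{
    size_t n = 0;
    assert(list != NULL);
    while (list[n] != NULL)
        n++;
    return n;
}
```

ARRAY_LENGTH(formats) is computed at compile time and only works where the array itself is in scope; inside total_chars the parameter is a pointer, so sizeof would give the pointer size rather than the array size.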

A sentinel value can be useful, but it's not always necessary.
Forcing the use of sentinel values because you don't like trailing
commas is absurd.

Think about this. Semicolons are terminators in C; commas are
separators. What's wrong with permitting commas to act like
terminators when it's convenient? I don't expect you to change
your opinion, but you might at least consider that those of us who
disagree with you do so for valid reasons.

[snip]
 

Rick C. Hodgin

A reasonably intelligent C developer would conclude from your use of
null rather than NULL that you weren't very familiar with the language,
and didn't understand the difference. The first part, at least, seems
very accurate, and the second is a reasonable conclusion, even though in
this case it seems to be false.

I think you can use "null" in Java. It's just habit I've picked up. Here
is my C code: https://github.com/RickCHodgin/libsf

If you have Visual Studio 2008 or later, load the vvm.sln file in
\libsf\vvm\core\ and you can see for yourself.
You're talking about designing a new language, without bothering to
consider the issue of backwards compatibility with the existing C
language.

I'm designing a new language. I don't have to worry about ANY backwards
compatibility with C. My new language will be C-like, but it won't be C.
There's nothing inherently wrong with doing that, but this is
not the right forum for such a discussion. You've already been
redirected toward a different forum, one specialized for the discussion
of new programming languages.

Then stop responding to me. :)
No, it has to do with the C standard, specifically section 5.1.1.2,
"Translation phases". If the language you're designing is intended to be
so different from C that the contents of the current C standard are
irrelevant, then this forum is definitely the wrong one for discussing it.

I agree.

Best regards,
Rick C. Hodgin
 

Keith Thompson

Geoff said:
In that case, with an alterable "template text" I would have made it a
text file, loaded a copy of that text into memory and altered the copy
on the fly as needed. This allows replacement of the text without
recompiling all the source. Alternatively, putting the text into a
resource DLL and copying the resource would have solved the problem
without exposing the text to unauthorized alteration.

Or write your element block in some simple plain text format, write a
small tool that translates it to C source code, and include that tool in
your automated build process.
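
The core of such a translation tool might look like the sketch below. The function name is hypothetical, and the sketch assumes each input line has already had its line ending stripped and contains no other control characters; only backslash and double quote are escaped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes `line` into `out` as a C string literal,
   including the surrounding double quotes.
   Returns 0 on success, -1 if `out` is too small. */
static int to_c_literal(const char *line, char *out, size_t out_size)
{
    size_t used = 0;
    assert(line != NULL && out != NULL);
    if (out_size < 3)               /* room for two quotes and the NUL */
        return -1;
    out[used++] = '"';
    for (; *line != '\0'; line++) {
        char c = *line;
        size_t need = (c == '\\' || c == '"') ? 2 : 1;
        /* Keep room for the closing quote and the terminating NUL. */
        if (used + need + 2 > out_size)
            return -1;
        if (need == 2)
            out[used++] = '\\';
        out[used++] = c;
    }
    out[used++] = '"';
    out[used] = '\0';
    return 0;
}
```

A build step could call this once per input line and write each result, followed by a comma, between the braces of an array initializer in a generated .c file.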
 

glen herrmannsfeldt

(snip)
Yes, but that does not cover all cases. In my case I explicitly allocate
a larger number of bytes than is required. My input string is something
like "[9999]" and I currently have only 812 items, so today it will always
only populate up to "[ 812]", but I leave room for expansion into the 1000s.
I notice no comment about the %d method that I suggested, which
takes minimal programming work!
I don't recall this one. I must have missed it. Can you repost?

You put %d where you want the number to go, then printf it:

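#include <stdio.h>

int main(void) {
    size_t i;
    int n = 33;
    /* Each element is a printf format; "%d" marks where the number goes. */
    const char *x[] = {"hi", "th[%d]ere"};
    /* Arguments beyond those a format consumes are evaluated and ignored. */
    for (i = 0; i < sizeof(x) / sizeof(*x); i++) printf(x[i], n, n, n, n, n, n, n);
    return 0;
}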

Note that there are plenty of n's, just in case you need more.

This works best if you have only one number to substitute,
in one or more places.

Note, no change to the constant strings, the library does
all the work while printing it.

About the only other complication is you have to double all %
that you actually want to print.
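
The same idea also works when the result is needed in a buffer rather than on stdout. The following is only a sketch: the function name and the template text are assumptions, and "%4d" is used to mimic the fixed-width "[ 812]" field mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: expands a template containing one %d conversion
   into `out`. The template itself is never modified.
   Returns 0 on success, -1 on error or if the result did not fit. */
static int fill_template(char *out, size_t out_size, const char *tmpl, int n)
{
    int written;
    assert(out != NULL && tmpl != NULL && out_size > 0);
    written = snprintf(out, out_size, tmpl, n);
    return (written >= 0 && (size_t)written < out_size) ? 0 : -1;
}
```

For example, fill_template(buf, sizeof buf, "item [%4d] of 100%%", 812) stores "item [ 812] of 100%" in buf; the doubled %% is what produces the single literal percent sign.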

-- glen
 

glen herrmannsfeldt

I can see the need to remove the trailing null. And I can also
see the need to remove the trailing comma.
But in order to use a list that is populated without a trailing null you
must have a count that is known, so that either exists as a populated
constant that must be maintained, or it is another variable stored in
memory computed by the code generation utility ... either way it's (1)
manual work, or (2) 32-bits of storage taken up.


That is what sizeof(x)/sizeof(*x) is for.

Computers are better at counting than people are.

-- glen
 

glen herrmannsfeldt

Keith Thompson said:
(snip)
#define _rw(x) (char []) { x }
char* bar[] =
{
"jjj",
_rw("kkk"),
};
Works in GCC. Visual C++ is non-compliant though and
doesn't support this syntax. Still, a truly excellent solution!
One thing to watch out for (I think I mentioned this before): the object
associated with a compound literal has a lifetime that depends on where
it appears. If it's at file scope, the lifetime is static, just like a
string literal, but if it appears at block scope (inside a function
definition) it has automatic storage duration.

Can you declare it static, anyway?
That means that this:
const char *foo(void) { return "hello"; }
returns a pointer to a string that exists for the entire execution of
the program, but this:
char *bar(void) { return (char[]){"good-bye"}; }

maybe:

char *bar(void) { return (static char[]){"good-bye"}; }

-- glen
 

David Brown

Agreed. It's not always possible though. Some testing can only be done
at runtime because it relies upon external conditions or environments.

Yes.


Agreed. There are, however, other systems where a program could be tested
which do support the greater debugging features (like fault signaling), so
that when the same code is run on the weaker systems which do not support
such features and catch such common program errors, the issue is a non-issue
because it's already been caught on the better debugging platform.

I assume you are not very familiar with embedded development. Yes,
there are sometimes parts of code that can be compiled and run in a more
"debugger friendly" environment, especially if you are talking about
things like embedded Linux systems. But very often your code can only
sensibly be run on the target platform with whatever limitations that has.
Ridiculous. A literal conveys something known explicitly at compile-time,
but when used in conjunction with a variable name it only initially populates
the variable. A literal is only an always-constant value if it's used
directly in an expression, however that is not always the case either
because there exists the idea of self-modifying code. It's just that C
doesn't provide for those abilities natively.

Of course C does not provide for self-modifying code - it was designed
by sensible people. A literal should /always/ be a constant with the
same value everywhere. You can use it to initialise data and change
that data, but you don't change the literal.
A literal is conveyed quantity initially, but is something that can also
change at any point thereafter when used in a variable.

Wasn't it Humpty Dumpty who said "words mean what I want them to mean"?
What you're referring to as "literal" by your definition is actually a
"constant". They're different. A literal is simply something conveyed
explicitly. A constant is something that cannot be changed (at least
legally, as per facilities given by the language).

No, a "literal" is a constant value in the source code - such as a
number, or a string. I'm sure someone can quote you chapter and verse
from the C standards. Literals exist only at compile time, though they
may result in a copy being placed in the target code.

A "constant" is an object with a target memory allocation that the code
may not legally change (optimisations may result in it not having a
memory allocation, but the code should act as though it had one). Two
constants with the same value will have different memory allocations,
because their addresses must be unique - literals that happen to have
copies in memory can be freely combined by the compiler and/or linker.

Thus with "const int one = 1;", "one" is a constant, while "1" is a
literal. You can't legally change either of them.
 

Kaz Kylheku

I considered that as well. Presently I process out line-by-line in the
source file as elements in the array. I would need to alter my algorithms
to handle the single file at one time.

Odds are it could be done with less code. You could have some $VAR type syntax
in the template files. All you have to do is grab characters of the file with
the getc function and copy, except when you see an unescaped '$' character;
then collect the variable name, look it up in a table, and substitute the
value.
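
A rough sketch of that substitution loop. To keep it short and self-contained it reads the template from a string in memory instead of calling getc on a file; the struct, the function names, and the use of a backslash to escape '$' are all assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct template_var { const char *name; const char *value; };

/* Looks up a variable by name; returns its value or NULL if unknown. */
static const char *lookup_var(const struct template_var *vars, size_t nvars,
                              const char *name, size_t name_len)
{
    for (size_t i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == name_len &&
            strncmp(vars[i].name, name, name_len) == 0)
            return vars[i].value;
    return NULL;
}

/* Appends n bytes to out, keeping it NUL-terminated; -1 if it does not fit. */
static int append_bytes(char *out, size_t out_size, size_t *used,
                        const char *src, size_t n)
{
    if (n >= out_size - *used)
        return -1;
    memcpy(out + *used, src, n);
    *used += n;
    out[*used] = '\0';
    return 0;
}

/* Copies tmpl to out, replacing each unescaped $NAME with its value.
   Returns 0 on success, -1 on an unknown variable or a full buffer. */
static int expand_vars(const char *tmpl, const struct template_var *vars,
                       size_t nvars, char *out, size_t out_size)
{
    size_t used = 0;
    assert(tmpl != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    while (*tmpl != '\0') {
        if (tmpl[0] == '\\' && tmpl[1] == '$') {        /* escaped dollar */
            if (append_bytes(out, out_size, &used, "$", 1) != 0) return -1;
            tmpl += 2;
        } else if (tmpl[0] == '$') {                    /* variable reference */
            size_t len = 0;
            const char *value;
            tmpl++;
            while (isalnum((unsigned char)tmpl[len]) || tmpl[len] == '_')
                len++;
            value = lookup_var(vars, nvars, tmpl, len);
            if (value == NULL) return -1;
            if (append_bytes(out, out_size, &used, value, strlen(value)) != 0)
                return -1;
            tmpl += len;
        } else {                                        /* ordinary character */
            if (append_bytes(out, out_size, &used, tmpl, 1) != 0) return -1;
            tmpl++;
        }
    }
    return 0;
}
```

With a table holding NAME = "world" and N = "812", the template "hello $NAME [$N]" expands to "hello world [812]". A getc-based version would follow the same three branches, one character at a time.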

But why, when there is probably more than one library for this kind of thing,
too.

In ten seconds of searching I found this:

http://libctemplate.sourceforge.net/doc.html

(The only problem with this one is that it has the GNU Public License bug ...)
 

Keith Thompson

glen herrmannsfeldt said:
Keith Thompson said:
(snip)
#define _rw(x) (char []) { x }
char* bar[] =
{
"jjj",
_rw("kkk"),
};
Works in GCC. Visual C++ is non-compliant though and
doesn't support this syntax. Still, a truly excellent solution!
One thing to watch out for (I think I mentioned this before): the object
associated with a compound literal has a lifetime that depends on where
it appears. If it's at file scope, the lifetime is static, just like a
string literal, but if it appears at block scope (inside a function
definition) it has automatic storage duration.

Can you declare it static, anyway?

No, because it's not a declaration, and "static" is not part of the type
(it's a storage-class specifier, not a type qualifier like "const",
"volatile", or "restrict").
That means that this:
const char *foo(void) { return "hello"; }
returns a pointer to a string that exists for the entire execution of
the program, but this:
char *bar(void) { return (char[]){"good-bye"}; }

maybe:

char *bar(void) { return (static char[]){"good-bye"}; }

That's a syntax error.

You can't even write:

static char obj[] = (char[]){"good-bye"};

because it's a non-constant initializer for a static object. But even
if you could, you might as well just write:

static char obj[] = "good-bye";
 

Rick C. Hodgin

Also note that C is used on a huge variety of systems - including many
I assume you are not very familiar with embedded development. Yes,
there are sometimes parts of code that can be compiled and run in a more
"debugger friendly" environment, especially if you are talking about
things like embedded Linux systems. But very often your code can only
sensibly be run on the target platform with whatever limitations that has.

Correct. I have limited experience in embedded development. That consisted
primarily of an 8-bit micro-controller and JTAG debugging. Very primitive,
but the apps I wrote there were also very small.
Of course C does not provide for self-modifying code - it was designed
by sensible people.

LOL! :)
A literal should /always/ be a constant with the
same value everywhere. You can use it to initialise data and change
that data, but you don't change the literal.

This is confusing to me. Literals encode something at compile time. The
data they represent is then initialized at some place, loaded by the abi
loader, and then is available by offset into data space at runtime. That
data is not a literal at that point. It is data. The literal only
describes what was encoded at compile time.

If the block of memory that "literal" occupies as a variable now exists
in read-write memory, then it will not remain a literal. If it exists in
read-only memory then it will remain a literal.
Wasn't it Humpty Dumpty who said "words mean what I want them to mean"?

Yes. But, he was crazy. :)
No, a "literal" is a constant value in the source code - such as a
number, or a string. I'm sure someone can quote you chapter and verse
from the C standards. Literals exist only at compile time, though they
may result in a copy being placed in the target code.

Yes! Alright. This makes sense.
A "constant" is an object with a target memory allocation that the code
may not legally change (optimisations may result in it not having a
memory allocation, but the code should act as though it had one). Two
constants with the same value will have different memory allocations,
because their addresses must be unique - literals that happen to have
copies in memory can be freely combined by the compiler and/or linker.
Agreed.

Thus with "const int one = 1;", "one" is a constant, while "1" is a
literal. You can't legally change either of them.

Agreed. I was just using different words not coming from a C language
education with C language words being imprinted into my brain.

My bad. :)

Best regards,
Rick C. Hodgin
 

Keith Thompson

David Brown said:
No, a "literal" is a constant value in the source code - such as a
number, or a string. I'm sure someone can quote you chapter and verse
from the C standards. Literals exist only at compile time, though they
may result in a copy being placed in the target code.

A "constant" is an object with a target memory allocation that the code
may not legally change (optimisations may result in it not having a
memory allocation, but the code should act as though it had one). Two
constants with the same value will have different memory allocations,
because their addresses must be unique - literals that happen to have
copies in memory can be freely combined by the compiler and/or linker.

Your use of the word "constant", while reasonable, is inconsistent with
the way the word is used by the C standard.

A "constant" (as a noun) is a lexical element, either an
integer-constant, a floating-constant, an enumeration-constant,
or a character-constant. A string literal is not syntactically a
"constant", but it would have been reasonable to call it that (and
in fact K&R1 does use the term "string constant" as a synonym for
"string literal").

"Constant" as an adjective, as in "constant expression", refers to
something that can (and in contexts where it's required, must) be
evaluated at compile time.

The "const" keyword, though it's obviously related to the English word
"constant", really means "read-only".
Thus with "const int one = 1;", "one" is a constant, while "1" is a
literal. You can't legally change either of them.

"one" is a const-qualified object; any attempt to modify it either
violates a constraint or has undefined behavior. It cannot be used in a
constant expression. It's semantically no different from:

const int r = rand();

You can *try* to change the value of "one":

*(int*)&one = 42;

and that might even work (it has undefined behavior). There's no
mechanism to even try to change the value of 1; `&1` is a syntax error.
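
A short sketch of the practical difference between a const-qualified object and a true constant. The identifiers ONE, classify, table and check_values are invented for illustration.

```c
#include <assert.h>

const int one = 1;      /* const-qualified object: read-only, but not a
                           constant expression in C */
enum { ONE = 1 };       /* enumeration constant: an integer constant */

static int classify(int x)
{
    switch (x) {
    case ONE:           /* fine: a case label needs an integer constant
                           expression */
        return 1;
    /* "case one:" here would violate that constraint in C. */
    default:
        return 0;
    }
}

/* Fine: the size of a file-scope array must be a constant expression.
   Writing "one + 1" here instead would not be valid C. */
static int table[ONE + 1];

/* At run time the object and the constant simply hold the same value. */
static void check_values(void)
{
    assert(one == ONE);
}
```

C++ treats a const int initialized with a constant expression differently, which may be one source of the confusion.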
 

Rick C. Hodgin

Odds are it could be done with less code.

Would you like to criticize the pattern I use to mow my lawn as well? :)
You could have some $VAR type syntax
in the template files. All you have to do is grab characters of the file with
the getc function and copy, except when you see an unescaped '$' character;
then collect the variable name, look it up in a table, and substitute the
value.

Now I'm maintaining extra files. Joe's solution was brilliant. Exactly
what I was looking for. I was glad to find it.
But why, when there is probably more than one library for this kind of thing,
too.
In ten seconds of searching I found this:
http://libctemplate.sourceforge.net/doc.html
(The only problem with this one is that it has the GNU Public License bug ...)

I'm not a big fan of GNU because of Richard Stallman. I use various tools
because there is no practical alternative, but I pretty much avoid GNU stuff
where possible, and have on my long-term schedule a purpose to replace
everything today that is GNUish.

Best regards,
Rick C. Hodgin
 

Rick C. Hodgin

Your use of the word "constant", while reasonable, is inconsistent with
the way the word is used by the C standard.

Yes. I was wrong in my understanding. Another poster straightened me out.
Thank you as well. :)

Best regards,
Rick C. Hodgin
 

James Kuyper

On 22/01/14 16:37, Rick C. Hodgin wrote: ....

No, a "literal" is a constant value in the source code - such as a
number, or a string. I'm sure someone can quote you chapter and verse
from the C standards. ...

Actually, no. The C standard describes string literals and compound
literals, but not any other kinds of literals, and it otherwise doesn't
use that term as a noun. There are a lot of things shared between string
literals and compound literals, and those things could be used as the
basis for a definition of "literal" - but the C standard doesn't provide
one.

I am familiar with the use of the word literal in a sense similar to the
one you use. Footnote 21 to section 2.14p1 of the C++ standard says "The
term “literal” generally designates, in this International Standard,
those tokens that are called “constants” in ISO C." With one exception,
the things described in the C standard under section 6.4.4 "Constants"
have corresponding "Literals" in section 2.14 of the C++ standard, and
the definitions of the C++ literals corresponds to the definitions of
the C constants:

6.4.4.1 "Integer Constants" => 2.14.2 "Integer literals"
6.4.4.2 "Floating constants" => 2.14.3 "Floating literals"
6.4.4.3 "Enumeration constants" => 7.2p1 enumerator
6.4.4.4 "Character constants" => 2.14.4 "Character literals"
6.4.5 "String literals" => 2.14.5 "String literals"

Section 2.14 goes on to describe several additional kinds of literals
that have no equivalent in C (boolean, pointer, and user-defined).
... Literals exist only at compile time, though they
may result in a copy being placed in the target code.

A "constant" is an object with a target memory allocation that the code
may not legally change (optimisations may result in it not having a
memory allocation, but the code should act as though it had one). Two
constants with the same value will have different memory allocations,
because their addresses must be unique - literals that happen to have
copies in memory can be freely combined by the compiler and/or linker.

Thus with "const int one = 1;", "one" is a constant, while "1" is a
literal. You can't legally change either of them.

As far as the C standard is concerned, "one" is an identifier that
identifies a const-qualified object, and "1" is an integer constant.
 

Ian Collins

David said:
I assume you are not very familiar with embedded development. Yes,
there are sometimes parts of code that can be compiled and run in a more
"debugger friendly" environment, especially if you are talking about
things like embedded Linux systems. But very often your code can only
sensibly be run on the target platform with whatever limitations that has.

Speaking as an embedded developer, those parts amount to the vast
majority of the code!
 
