Non-constant constant strings

Joe Keane

But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal has automatic storage duration (it ceases to exist when you
leave the enclosing block).

How about this?

@ cat bar.c
char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};
@ cc -S bar.c
@ cat bar.s
.file "bar.c"
.data
.type __compound_literal.0, @object
.size __compound_literal.0, 4
__compound_literal.0:
.string "kkk"
.globl bar
.section .rodata
.LC0:
.string "jjj"
.data
.align 4
.type bar, @object
.size bar, 8
bar:
.long .LC0
.long __compound_literal.0
.ident "GCC: (NetBSD nb2 20110806) 4.5.3"
 
Rick C. Hodgin

I'm one who would not readily change his mind, because (in
part) as things stand I can write stuff like:

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio",
#endif
#if TAR_SUPPORTED
"tar",
#endif
#if ZIP_SUPPORTED
"ZIP",
#endif
#if APK_SUPPORTED
"apk",
#endif
};

It's *possible* to manage this sort of thing without introducing
an extra comma, but it's ugly as all-get-out:
[snip]

Try this, and then just always start at 1 instead of 0, and process until
you reach null:

const char *archiveFormats[] = {
null
#if CPIO_SUPPORTED
,"cpio"
#endif
#if TAR_SUPPORTED
,"tar"
#endif
#if ZIP_SUPPORTED
,"ZIP"
#endif
#if APK_SUPPORTED
,"apk"
#endif
,null
};

Best regards,
Rick C. Hodgin
 
Keith Thompson

Ian Collins said:
Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.
 
James Kuyper

I'm one who would not readily change his mind, because (in
part) as things stand I can write stuff like:

const char *archiveFormats[] = {
#if CPIO_SUPPORTED
"cpio",
#endif
#if TAR_SUPPORTED
"tar",
#endif
#if ZIP_SUPPORTED
"ZIP",
#endif
#if APK_SUPPORTED
"apk",
#endif
};

It's *possible* to manage this sort of thing without introducing
an extra comma, but it's ugly as all-get-out:
[snip]

Try this, and then just always start at 1 instead of 0, and process until
you reach null:

const char *archiveFormats[] = {
null
#if CPIO_SUPPORTED
,"cpio"
#endif
#if TAR_SUPPORTED
,"tar"
#endif
#if ZIP_SUPPORTED
,"ZIP"
#endif
#if APK_SUPPORTED
,"apk"
#endif
,null
};

As he said: ugly. The two extra nulls (and "null" needs to be defined)
seem far worse to me than the extra comma - they survive into the object
file, and even into the final executable, taking up extra space. The
extra comma disappears during translation phase 7 and has no impact on
the actual executable.
 
Keith Thompson

Rick C. Hodgin said:
I'm not sure I would've been keen on that idea. I would rather have
maintained it as a deprecated functionality that would have been
slated to be removed in a few version releases. The old compilers
could've generated object code in a particular version of a compiler
that could be maintained for backward compatibility without negating
the language in moving forward. My opinion. :)

(Reformatting your long lines *again*.)

That's exactly what they did. I gave you a link to a recent draft
of the C standard. Take a look at section 6.11.6:

The use of function declarators with empty parentheses (not
prototype-format parameter type declarators) is an obsolescent
feature.

I'm personally not happy with how long it's taken to actually remove the
feature, but it's been officially obsolescent (which means that it may
be considered for withdrawal in future revisions of the standard) since
1989.

[...]
 
Rick C. Hodgin

Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.

It's why my algorithm looks for \r or \n in any order, and then checks the
character after for the alternate (\r\n or \n\r combinations). If found,
it considers that grouping to be one newline. If not, it considers the single
character to be one newline. Then it continues parsing.

bash sounds like it needs some post-rebirth rehabilitation. :)

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

As he said: ugly. The two extra nulls (and "null" needs to be defined)
seem far worse to me than the extra comma - they survive into the object
file, and even into the final executable, taking up extra space. The
extra comma disappears during translation phase 7 and has no impact on
the actual executable.


There's a part of me that agrees with you. I would go to lengths to avoid
having this kind of issue. Since this is a heavily used feature, I would
probably create and distribute a generic tool that builds such lists on the
fly and emits the resulting source code, handling the commas itself.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

I'm not sure I would've been keen on that idea. I would rather have
(Reformatting your long lines *again*.)

I see the text in a window on Google Groups which is about 72 characters
wide. I have to manually insert carriage returns to break it up. I
sometimes forget. I apologize.

What news reader are you using? Try groups.google.com and subscribe to the
comp.lang.c group.
That's exactly what they did. I gave you a link to a recent draft
of the C standard. Take a look at section 6.11.6:
The use of function declarators with empty parentheses (not
prototype-format parameter type declarators) is an obsolescent
feature.

Awesome! :) To quote the three drones from Voyager (former members of
Unimatrix 01, where Seven of Nine was Tertiary Adjunct; one of them looks
very much like Admiral Forrest from ST:Enterprise), "we have consensus."
I'm personally not happy with how long it's taken to actually remove the
feature, but it's been officially obsolescent (which means that it may
be considered for withdrawal in future revisions of the standard) since
1989.

I hear you. Always that backward compatibility. It's why it's important to
include dates. We have them in our U.S. Constitution even.

From the 21st amendment:
3. This article shall be inoperative unless it shall have been
ratified as an amendment to the Constitution ... within seven
years from the date of the submission hereof to the States
by the Congress.

Seven years is a good time period. It's the biblical period of forgiveness
(Deu 15:1), "At the end of every seven years you must cancel debts." How
the world would be better were that guidance followed.

Best regards,
Rick C. Hodgin
 
Keith Thompson

Rick C. Hodgin said:
In your proposed C-like language, what would this snippet print?
for (int i = 0; i < 2; i ++) {
    char *s = "hello";
    if (i == 0) {
        s[0] = 'H';
    }
    puts(s);
}

In my proposed language, it would print "Hello" both times because the
char* s definition would've been pulled out of the loop and defined as a
function variable.

That's fine; if you don't want to write code like that, you don't have
to. But I didn't ask how you'd re-write it; I asked how *that code*
should behave.

I answered you. How should it behave?

I don't believe you did answer me.
In my compiler, I would pull the variable out and make it a function-variable
defined at the top, so it would've been altered the first time through and
both times would print Hello.

Do you mean by that that you would *change the source code I posted* so
that s is declared at a higher level? The result might be a better
program, but it's a different program than the one I posted, so that
doesn't answer my question at all.

Or do you mean that the compiler would implicitly do the equivalent of
moving s to a higher level? If so, it's unclear what that would mean.

How *should* it behave? In standard C, the behavior is undefined,
because it attempts to modify a string literal. I have no interest in
changing that rule (well, I'd prefer string literals to be const, but I
understand why they're not), so I have no further answer. You're the
one proposing changes; I'm asking you for details on how you can make
those changes consistently.
No ... I'm creating my own new language, RDC, which is C-like, but dumps a
lot of what I view as "hideous baggage left over from a bygone era" ... while
also adding a lot of new features I see as looking to the future of multiple
cores, GUI developer environments, touch screens, eventual 3D interfaces, and
more.

Ok. Then why are you discussing your non-C language in comp.lang.c?
Perhaps comp.lang.misc would be of interest to you.

[snip]
 
Keith Thompson

But the array whose first element s points to is still
just 6 characters long, and unlike string literals, an object created by
a compound literal has automatic storage duration (it ceases to exist when you
leave the enclosing block).

How about this?

@ cat bar.c
char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};
@ cc -S bar.c
@ cat bar.s
[SNIP]

I don't understand assembly language well enough to figure out what
point you're making.
 
glen herrmannsfeldt

(snip)
In my experience, the special features of text mode as compared to
binary mode are conventions associated with operating systems. As such,
files adhering to those conventions can be used to communicate between
any two programs compiled for that operating system, whether or not
they're running on the same platforms or different platforms.
I wouldn't be surprised to learn that there are conventions for the
layout of text files that are associated with things other than
operating systems - but offhand I can't think of any.

I believe that HTTP (and so HTML) are OS independent and,
as far as I know, use "\r\n" line endings.

-- glen
 
glen herrmannsfeldt

(snip)
I don't use that feature, and I don't like it. However, this feature
simplifies the creation of machine-generated C code, and people who
write such generators are apparently sufficiently numerous that the
committee felt a need to accommodate their desires.

It does, and I do sometimes generate look-up tables.

But I believe that simplifying the use of the preprocessor is
a more important use. One can #ifdef table entries, without
a special case for the last one. (Since you don't know which one
will be the last.)

The second-best choice would be to waste the last entry on a
null, zero, or some other useless item. That complicates a lot of
other code, though.

-- glen
 
Keith Thompson

Rick C. Hodgin said:
Most if not all of the programmer's editors I've used on Windows
recognise Unix line endings and gcc on Unix recognises Windows endings.
Text mode is something of a curse!

Not all Unix tools tolerate Windows-style line endings. For example,
if you write:

if [ "$x" = 42 ] ; then
echo ok
fi

in a bash script, and the script file uses Windows-style line endings,
bash will complain that "then\r" is an unrecognized token. (Except that
it will print the "\r" literally, causing a very confusing error
message.)

Blindly using "foreign" format text files on any system is not a good
idea.

It's why my algorithm looks for \r or \n in any order, and then checks the
character after for the alternate (\r\n or \n\r combinations). If found,
it considers that grouping to be one newline. If not, it considers the single
character to be one newline. Then it continues parsing.

That's workable if your tool runs only on systems that use one of \r,
\n, \r\n, or \n\r to mark line endings. (And either you treat \n\n as
an empty line, or you can safely ignore empty lines.) But it breaks
down if you want to *write* text files.

C has text mode for a reason. Take a moment to consider the bare
possibility that the people who designed it were not idiots.
 
Keith Thompson

Rick C. Hodgin said:
I see the text in a window on Google Groups which is about 72 characters
wide. I have to manually insert carriage returns to break it up. I
sometimes forget. I apologize.

What news reader are you using? Try groups.google.com and subscribe to the
comp.lang.c group.

groups.google.com is the problem. Google provides a web interface to
Usenet, something that predates the web and even the Internet. Google
has done a horribly poor job with their interface and has been
unresponsive to complaints.

I use the news.eternal-september.org free Usenet server. The client I
use is Gnus, which runs under Emacs. Mozilla Thunderbird is another
popular client.
 
Rick C. Hodgin

Rick C. Hodgin said:
In your proposed C-like language, what would this snippet print?
for (int i = 0; i < 2; i ++) {
    char *s = "hello";
    if (i == 0) {
        s[0] = 'H';
    }
    puts(s);
}
In my proposed language, it would print "Hello" both times because the
char* s definition would've been pulled out of the loop and defined as a
function variable.

That's fine; if you don't want to write code like that, you don't have
to. But I didn't ask how you'd re-write it; I asked how *that code*
should behave.
I answered you. How should it behave?

I don't believe you did answer me.
In my compiler, I would pull the variable out and make it a function-
variable defined at the top, so it would've been altered the first
time through and both times would print Hello.

Do you mean by that that you would *change the source code I posted* so
that s is declared at a higher level? The result might be a better
program, but it's a different program than the one I posted, so that
doesn't answer my question at all.

Or do you mean that the compiler would implicitly do the equivalent of
moving s to a higher level? If so, it's unclear what that would mean.

The compiler would receive the definition of char* s where it is, but
it would logically create it as a local variable within the single
function. In short, I would not allow scoped variables within a block
within a function. I would have them all defined as local variables,
and they would all be available for use inside or outside of the block
they were defined in.
How *should* it behave? In standard C, the behavior is undefined,
because it attempts to modify a string literal. I have no interest in
changing that rule (well, I'd prefer string literals to be const, but I
understand why they're not), so I have no further answer. You're the
one proposing changes; I'm asking you for details on how you can make
those changes consistently.

In my case, it would not be a constant, but would be a string defined to
be the initial value indicated.
Ok. Then why are you discussing your non-C language in comp.lang.c?
Perhaps comp.lang.misc would be of interest to you.

Perhaps. It is/was all backstory to my original question: the explanation
of why I believe the strings in char* list[] = { "one", "two", "three" }
should be read/write.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

I don't understand assembly language well enough to figure out what
point you're making.

I do understand assembly language, but I still didn't understand the
point being made.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

It's why my algorithm looks for \r or \n in any order, and then checks the
That's workable if your tool runs only on systems that use one of \r,
\n, \r\n, or \n\r to mark line endings. (And either you treat \n\n as
an empty line, or you can safely ignore empty lines.) But it breaks
down if you want to *write* text files.

Not at all. If it finds \r\n it is a single newline. If it finds \n\r it
is a single newline. If it finds \n\n it stops after the first \n and
considers it its own newline, and then continues parsing and encounters
the second \n and it is also its own newline. \n\n would be a double space.
C has text mode for a reason. Take a moment to consider the bare
possibility that the people who designed it were not idiots.

It's interesting that such a handy helper feature like text mode exists
to "help" developers, while other more obvious assistance features are
left completely out -- such as certain variable types not always being a
specified number of bits across platforms.

For the record, I believe C is one of the best languages ever constructed.
I also believe it has many many flaws. I hope to undo many of them with
my effort.

Best regards,
Rick C. Hodgin
 
Rick C. Hodgin

groups.google.com is the problem. Google provides a web interface to
Usenet, something that predates the web and even the Internet. Google
has done a horribly poor job with their interface and has been
unresponsive to complaints.

I use the news.eternal-september.org free Usenet server. The client I
use is Gnus, which runs under Emacs. Mozilla Thunderbird is another
popular client.


I cannot help but consider the fact that Google Groups provides a free
web-based interface which removes shortcomings in the text-based Usenet
group. It allows HTML messages, longer lines with automatic wrapping,
immediate access to many groups, complex searching, and more.

It seems that the future may be speaking, in an attempt to bring Usenet
into the 2010s and beyond.

Text-based interfaces were nice ... they used the technology available at
the time (limited disk space, limited memory, slower clock speeds). But
the technology of the 2010s is significantly beyond anything we've had
previously. Most modern multi-core desktops, with 8+ GB of memory,
1+ TB of disk storage, and an average to high-end GPU, have more computing
power than supercomputers did 15+ years ago.

GUIs provide a far better user experience, and are only becoming more
common as time goes on. Smart phones. Tablets. Touch screen. We're
changing our computing needs.

Best regards,
Rick C. Hodgin
 
Ben Bacarisse

Rick C. Hodgin said:
I do understand assembly language, but I still didn't understand the
point being made.

The listing (from Joe Keane) contains a fragment of C with an excellent
suggestion in it:

char *bar[2] =
{
"jjj",
(char []) { "kkk" },
};

The construct (char []){ "kkk" } is called a compound literal and
represents an anonymous object of the type that heads it up -- in this
example, an array of char. The resulting object is writable.

The array bar contains two pointers to the start of two arrays. The one
built from a string literal is not writable, but the one built by the
compound literal is. Pretty much what you want.
 
Keith Thompson

Rick C. Hodgin said:
Not at all. If it finds \r\n it is a single newline. If it finds \n\r it
is a single newline. If it finds \n\n it stops after the first \n and
considers it its own newline, and then continues parsing and encounters
the second \n and it is also its own newline. \n\n would be a double space.


It's interesting that such a handy helper feature like text mode exists
to "help" developers, while other more obvious assistance features are
left completely out -- such as certain variable types not always being a
specified number of bits across platforms.

Are you acknowledging that text mode is useful?

If C had defined specified sizes for predefined types, then int would
probably be 16 bits and long would be 32 (the sizes they had in early
PDP-11 implementations).

Or perhaps not. The first edition of K&R, the book that defined the
language in 1978, showed int with a size of 16 bits on the PDP-11, 36
bits on the Honeywell 6000, and 32 bits on the IBM 370 and Interdata
8/32. None of those platforms had a 64-bit integer type, because 64-bit
integer arithmetic was not supported on the hardware of the time.

Which of those choices would you want to impose on all C implementations
for all platforms?

On the other hand, if you want fixed-width types, you can use int8_t and
the other exact-width types in <stdint.h>, which were added in C99.

Rick C. Hodgin said:
For the record, I believe C is one of the best languages ever constructed.
I also believe it has many many flaws. I hope to undo many of them with
my effort.

I suggest you need to be more familiar with what's already been done.
Reinventing the wheel is fine, but you may find that someone has already
figured out how to make it round.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,085
Messages
2,570,597
Members
47,218
Latest member
GracieDebo

Latest Threads

Top