Character array initialization

  • Thread starter Christian Kandeler

Christian Kandeler

Hi,

if I want to store the string "123456" in a variable of type char[], I can
do it like this:

char s[] = "123456";

Or like this:

char s[] = { '1', '2', '3', '4', '5', '6', '\0' };

Or like this:

char s[7] = "123456";

These are all equivalent because string literals have an implicit '\0'
character at the end. That's why it is a mistake to write this:

char s[6] = "123456";

Here we reserve one byte less than we need. Now here's my question: Doesn't
that mistake warrant a diagnostic? I was quite baffled today when four
different compilers failed to generate even a warning when fed the above
code, even though the mistake should be easy to detect. Or am I
misunderstanding string literals?


Thanks,
Christian
 

Richard Tobin

Christian Kandeler said:
That's why it is a mistake to write this:

char s[6] = "123456";

It's *often* a mistake.

But try this:

char s[5] = "123456";

I expect your compilers will generate a diagnostic.

What's the difference? There's no requirement in C that arrays of
characters be null-terminated. There are just a bunch of functions
that expect it. It's quite reasonable to initialize an array of 6
characters with 6 non-null characters. On the other hand, it makes no
sense to initialize an array of 5 characters with 6 characters.
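
To illustrate the point about functions that expect a terminator, here is a minimal sketch of working with an exactly-full array. The names `digits`, `has_terminator` and `format_chars` are made up for this example and do not come from the thread; only standard library calls are used.

```c
#include <stdio.h>
#include <string.h>

/* Six characters, no terminator: the array is exactly full. */
const char digits[6] = "123456";

/* Returns 1 if buf[0..n-1] contains a '\0', i.e. holds a C string. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Formats at most n characters of buf into out (capacity outsz).
   The precision in "%.*s" bounds how many bytes are read from buf,
   so buf does not need to be null-terminated. */
int format_chars(char *out, size_t outsz, const char *buf, size_t n)
{
    return snprintf(out, outsz, "%.*s", (int)n, buf);
}
```

Calling `strlen(digits)` or `puts(digits)` would read past the end of the array, whereas length-bounded calls such as `memcpy`, `memchr` or a `%.*s` conversion stay inside it.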

-- Richard
 

pete

Christian said:
That's why it is a mistake to write this:

char s[6] = "123456";

It's not always a mistake.
It's my preferred way of writing:
char s[] = { '1', '2', '3', '4', '5', '6'};

You could have code like this:

const char letter[4] = "DCBA";

if (number > 60) {
    if (number > 99) {
        number = 99;
    }
    grade = letter[(number - 60) / 10];
} else {
    grade = 'F';
}
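
The fragment above can be wrapped into a self-contained function for experimenting. This is only a sketch: the function name `grade_for` is an assumption, and the cut-offs are copied unchanged from the fragment (so a score of exactly 60 still yields 'F').

```c
/* Lookup table of four grades; no terminator is stored or needed,
   because the array is only ever indexed, never treated as a string. */
static const char letter[4] = "DCBA";

/* Maps a numeric score to a letter grade using the table above. */
char grade_for(int number)
{
    if (number > 60) {
        if (number > 99) {
            number = 99;    /* clamp so the index stays within 0..3 */
        }
        return letter[(number - 60) / 10];
    }
    return 'F';
}
```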
 

Default User

Christian Kandeler wrote:
Here we reserve one byte less than we need. Now here's my question: Doesn't
that mistake warrant a diagnostic? I was quite baffled today when four
different compilers failed to generate even a warning when fed the above
code, even though the mistake should be easy to detect. Or am I
misunderstanding string literals?

No, because it's legal in C. From the C99 draft standard:


6.7.8 Initialization

[#14] An array of character type may be initialized by a
character string literal, optionally enclosed in braces.
Successive characters of the character string literal
(including the terminating null character if there is room
or if the array is of unknown size) initialize the elements
of the array.
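
A small sketch of the three cases the paragraph distinguishes: unknown size, exact fit with no room for the terminator, and room to spare. The array names and the helper `count_zero_bytes` are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

const char unknown_size[] = "123456";  /* 7 elements: terminator included */
const char exact_fit[6]   = "123456";  /* 6 elements: no room, no '\0'    */
const char with_room[10]  = "123456";  /* '\0' stored, rest zero-filled   */

/* Counts how many of the n bytes in buf are zero. */
size_t count_zero_bytes(const char *buf, size_t n)
{
    size_t zeros = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\0') {
            zeros++;
        }
    }
    return zeros;
}
```

Only `exact_fit` ends up without any zero byte, which is exactly the case the original question asked about.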




Brian Rodenborn
 

Chris Torek

Christian said:
[string literals add a '\0' terminator, and] That's why it is a mistake
to write this:

char s[6] = "123456";

Here we reserve one byte less than we need. Now here's my question: Doesn't
that mistake warrant a diagnostic?

Back in the 1980s when the X3J11 committee was standardizing C
for the first time, this *was* an error and *did* get a diagnostic
from all existing (pre-ANSI) C compilers worthy of the name "C
compiler" (there were some strange compilers back then :) ).

But then someone on the committee decided it would be nice to
have a way to suppress the automatic '\0'-adding for certain
special cases. Whoever it was, proposed that if the programmer
manually counted up the bytes and put the size in an array definition,
and then used a string literal to initialize an array of "char",
that it would be OK to have the array be "just one too small" to
hold the final '\0', in which case the '\0' would be suppressed.

Of course, a MUCH BETTER suggestion was sent in during the review
periods -- a new escape sequence, \z, could be used at the end of
string literals to suppress the final zero byte, so that one could
even write things like x = "\10\2\4\1\z"[i & 3], for instance, to
make a four-byte literal array without the unnecessary '\0' at the
end -- but it was "Not Invented Here" and rejected. Had \z been
accepted, you would have been able to write:

char s[] = "123456\z";

and get an array of size 6, without having to manually count -- or
perhaps mis-count -- the bytes. Leaving out the \z would get a
diagnostic, just as one would expect.
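
Since \z was never adopted, one way to avoid counting by hand in standard C is to let sizeof do the counting. This is a sketch of a workaround, not part of the proposal described above; the macro name `DIGITS` and the helper `same_bytes` are assumptions made for the example.

```c
#include <string.h>

#define DIGITS "123456"

/* sizeof(DIGITS) counts the implicit '\0' (7 here), so subtract one
   to get an array that holds the characters but not the terminator. */
const char s[sizeof(DIGITS) - 1] = DIGITS;

/* Hypothetical helper: true if the n bytes of buf equal the text of lit. */
int same_bytes(const char *buf, size_t n, const char *lit)
{
    return strlen(lit) == n && memcmp(buf, lit, n) == 0;
}
```

If the literal in `DIGITS` changes, the array size follows along, so there is no length to mis-count.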

The X3J11 committee folks went along with the dumb idea :) , so now
that is what we have.
 

Martin Ambuhl

Christian said:
These are all equivalent because string literals have an implicit '\0'
character at the end. That's why it is a mistake to write this:

char s[6] = "123456";

It is *not* a mistake. It declares an array of 6 chars, but not a
string. There is no law that every array of chars must be a string.

Christian said:
Here we reserve one byte less than we need.

No, we reserve exactly the amount of space we need for the char array.

Christian said:
Now here's my question: Doesn't
that mistake warrant a diagnostic?

It is not a mistake and does not require a diagnostic.

Christian said:
I was quite baffled today when four
different compilers failed to generate even a warning when fed the above
code, even though the mistake should be easy to detect.

It is not a mistake.

Christian said:
Or am I
misunderstanding string literals?

You are misunderstanding the initialization of char arrays.
 

Eric Sosman

Chris said:
[...]
Of course, a MUCH BETTER suggestion was sent in during the review
periods -- a new escape sequence, \z, could be used at the end of
string literals to suppress the final zero byte, so that one could
even write things like x = "\10\2\4\1\z"[i & 3], for instance, to
make a four-byte literal array without the unnecessary '\0' at the
end -- but it was "Not Invented Here" and rejected. Had \z been
accepted, you would have been able to write:

char s[] = "123456\z";

and get an array of size 6, without having to manually count -- or
perhaps mis-count -- the bytes. Leaving out the \z would get a
diagnostic, just as one would expect.

Interesting -- but an escape sequence whose effect is
to *suppress* a character rather than generate one would
certainly be an oddity. An escape sequence that has
different effects at different positions in a literal
would be peculiar. An escape sequence that could be
used in a string literal but not in a character literal
would be downright weird!

Hmmm: Would the proposal have outlawed \z except at
the end of a string literal, or would it have had some
other meaning (perhaps implementation-defined or undefined)
at other positions? I'm thinking of things like

#define ABC "ABC\z"
char abc[] = ABC;
#define XYZ "XYZ\z"
char xyz[] = XYZ;
char abcxyz[] = ABC XYZ; // what happens?

Also, whenever a new notation crops up somebody writes
a coding style guide that recommends its use, as in

puts ("Hello, world!\0\z");

The "benefit," of course, is that the programmer can see the
formerly invisible terminator, and perhaps be less likely to
commit the common mistake of overlooking it ... Ugly!
 

Chris Torek

Eric said:
Interesting -- but an escape sequence whose effect is
to *suppress* a character rather than generate one would
certainly be an oddity.

True enough.

Eric said:
... An escape sequence that could be
used in a string literal but not in a character literal
would be downright weird!

It might either be ignored, or diagnosed, when used in "strange"
places:

Eric said:
Hmmm: Would the proposal have outlawed \z except at
the end of a string literal, or would it have had some
other meaning (perhaps implementation-defined or undefined)
at other positions?

I do not recall whether the proposal even covered this, much less
what it might have said. If I were proposing it myself (e.g., if I
thought anyone might listen :) ), I would say it gets ignored in
other positions in string literals:

Eric said:
I'm thinking of things like

#define ABC "ABC\z"
char abc[] = ABC;
#define XYZ "XYZ\z"
char xyz[] = XYZ;
char abcxyz[] = ABC XYZ; // what happens?

Here the concatenation would result in "ABC\zXYZ\z" which "means"
the same as just "ABCXYZ\z", i.e., sizeof abcxyz would be 6.

I would probably vote for a required diagnostic if \z is used in
character constants, so that both 'a\z' and just '\z' are errors.

Eric said:
Also, whenever a new notation crops up somebody writes
a coding style guide that recommends its use, as in

puts ("Hello, world!\0\z");

The "benefit," of course, is that the programmer can see the
formerly invisible terminator, and perhaps be less likely to
commit the common mistake of overlooking it ... Ugly!

Indeed. But note that we can already do something similar in C99:

puts((const char []){'H', 'e', 'l', 'l', 'o', ',', ' ',
'w', 'o', 'r', 'l', 'd', '!', '\0'});

Again, the programmer can now see the formerly-invisible terminator
(if said programmer can see anything at all, in amongst all that
syntax! :) ).
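
For anyone who wants to try the compound-literal form, here is a minimal sketch that passes the same unnamed array to strlen instead of puts, so the result is easy to check. The function name `hello_length` is made up for the example.

```c
#include <string.h>

/* Returns the length of "Hello, world!" spelled out as a C99 compound
   literal, with the terminating '\0' written explicitly. */
size_t hello_length(void)
{
    return strlen((const char []){'H', 'e', 'l', 'l', 'o', ',', ' ',
                                  'w', 'o', 'r', 'l', 'd', '!', '\0'});
}
```

Dropping the final '\0' from the initializer list would leave the array unterminated, and strlen would then read past its end.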
 

CBFalconer

Chris said:
.... snip ...

The X3J11 committee folks went along with the dumb idea :) , so
now that is what we have.

And I wonder which version you supported :) In point of fact I
can see arguments both ways, and I suspect the one that won was
something like: Why have another escape, which nobody has ever heard
of before, for a single purpose that can be handled anyhow?

--
"I'm a war president. I make decisions here in the Oval Office
in foreign policy matters with war on my mind." - Bush.
"Churchill and Bush can both be considered wartime leaders, just
as Secretariat and Mr Ed were both horses." - James Rhodes.
"If I knew then what I know today, I would still have invaded
Iraq. It was the right decision" - G.W. Bush, 2004-08-02
 

Christian Kandeler

Martin said:
char s[6] = "123456";

It is *not* a mistake. It declares an array of 6 chars, but not a
string. There is no law that every array of chars must be a string.

I was aware of the fact that not all character arrays are strings. I was
also aware of the fact that string literals have an implicit '\0' at the
end. What I was not aware of is the fact that this - and only this - last
character of the string literal can legally just magically disappear if
that makes it fit into a character array. And I must say that I find this
behaviour rather, umh, peculiar. Anyway, thanks to all who have answered.


Christian
 
