String constants

MQ

Hi,

I have a question about string constants. I compile the following program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str1[] = "\007";
    char str2[] = "\0" "07";
    char str3[] = { '\0', '0', '7', '\0' };

    printf("str1 = %s\n" "str2 = %s\n" "str3 = %s\n", str1, str2, str3);
    /* sizeof and strlen yield size_t, so cast to match the %lu conversion */
    printf("sizeof(str1) = %lu\n" "sizeof(str2) = %lu\n"
           "sizeof(str3) = %lu\n", (unsigned long)sizeof(str1),
           (unsigned long)sizeof(str2), (unsigned long)sizeof(str3));
    printf("strlen(str1) = %lu\n" "strlen(str2) = %lu\n"
           "strlen(str3) = %lu\n", (unsigned long)strlen(str1),
           (unsigned long)strlen(str2), (unsigned long)strlen(str3));

    return 0;
}

Here is the output:

str1 =
str2 =
str3 =
sizeof(str1) = 2
sizeof(str2) = 4
sizeof(str3) = 4
strlen(str1) = 1
strlen(str2) = 0
strlen(str3) = 0

I understand that yet another obscure C feature is the octal character
specification so that \ddd is one character. However, should not str1 and str2
be the same?

No. str1 contains a single character with value 7 (the ASCII BEL
character), followed by a null terminator, which gives a size of two.
str2 actually holds three characters: '\0', the null character,
followed by the '0' character, followed by the '7' character. With the
null terminator at the end of the string, you have four characters.

str1 appears invisible because ASCII 7 is a non-printable character.
In str2 and str3 you have actually created a string which starts with a
null terminator, making the string appear to be empty (which is why
strlen returns 0 in both of these cases).
 
Walter Roberson

char str1[] = "\007";
char str2[] = "\0" "07";
char str3[] = { '\0', '0', '7', '\0' };

I understand that yet another obscure C feature is the octal character
specification so that \ddd is one character.

True (provided the d are all in the range 0 through 7).

However, should not str1 and str2
be the same? Obscure feature conflict (\ddd vs string concatenation)?

Concatenation of adjacent string literals is not done until after
each literal has been tokenized and its escape sequences converted.

In str3, there is no concatenation taking place: you have
specified, char by char, exactly what should be put into adjacent
locations in the array.

Going back to your second string: would you expect that "\" "007"
would compile the same as "\007" ? It doesn't of course -- the
backslash escapes the double-quote, rather than being held in
suspension in case something is going to show up later.

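The behaviour is well specified in C89: the octal sequence stops
at the first non-octal character, or after three octal digits,
whichever comes first.

Consider a problem in the hex escape sequences: "\xABCD".
That is treated as a single escape with four hex digits, because a hex
escape has no length limit. Its value must still fit in one char, so
this one is out of range when char is 8 bits wide.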
Suppose, though, that you wanted to stop after the \xAB and you
wanted literal C and literal D: how would you do it?
The solution from the standard is that you can use "\xAB" "CD"
because the sequence ends at the first non-hex character
(the second double-quote.) But suppose it were otherwise, that
concatenation took place first and then the result was maximally
tokenized: then in order to get the C to be a C, you would have to
put in the hex value corresponding to "C", and then you'd have to
put in the hex value corresponding to "D", and you'd have to
keep on encoding until finally your text happened to include something
that wasn't interpretable as hex.
 
gthorpe

Hi,

I have a question about string constants. I compile the following program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char str1[] = "\007";
    char str2[] = "\0" "07";
    char str3[] = { '\0', '0', '7', '\0' };

    printf("str1 = %s\n" "str2 = %s\n" "str3 = %s\n", str1, str2, str3);
    /* sizeof and strlen yield size_t, so cast to match the %lu conversion */
    printf("sizeof(str1) = %lu\n" "sizeof(str2) = %lu\n"
           "sizeof(str3) = %lu\n", (unsigned long)sizeof(str1),
           (unsigned long)sizeof(str2), (unsigned long)sizeof(str3));
    printf("strlen(str1) = %lu\n" "strlen(str2) = %lu\n"
           "strlen(str3) = %lu\n", (unsigned long)strlen(str1),
           (unsigned long)strlen(str2), (unsigned long)strlen(str3));

    return 0;
}

Here is the output:

str1 =
str2 =
str3 =
sizeof(str1) = 2
sizeof(str2) = 4
sizeof(str3) = 4
strlen(str1) = 1
strlen(str2) = 0
strlen(str3) = 0

I understand that yet another obscure C feature is the octal character
specification so that \ddd is one character. However, should not str1 and str2
be the same? Obscure feature conflict (\ddd vs string concatenation)?
 
MQ

Also, if you wanted to, for example, use a string containing '\01' '0', how
would you do this unambiguously? As in str2? How about '\0' "01" or '\0' '0'???

I'm not sure what you are trying to achieve, but it seems you are not
understanding how strings work. '\0' is ASCII 0. You cannot just
append this to a string of digits and get a single character out of
it. You will get a string with ASCII 0 at the start (the null
character) plus the string of digits. Can you explain what you are
trying to do so we can suggest a better way?

MQ
 
spibou

Also, if you wanted to, for example, use a string containing '\01' '0', how
would you do this unambiguously?

Since '\01' has value 1 I assume that what you want is a
string whose first byte has the value 1 and second byte the
value '0'. You do that with char str[] = {1, '0', '\0'};
(the explicit '\0' is needed for the array to hold a string).

How about '\0' "01" or '\0' '0'???

It would make your thoughts clearer for us if you
wrote the complete statement you have in mind.

Spiros Bousbouras
 
gthorpe

Also, if you wanted to, for example, use a string containing '\01' '0', how
would you do this unambiguously? As in str2? How about '\0' "01" or '\0' '0'???
 
Chris Torek

I have a question about string constants.

There are a number of tricks you need to "get straight in your head"
in order to deal with this.

First, a C string is actually a data structure, namely, an array
of "char"s in which the first zero-byte is considered the end of the
string.

Second, escapes like '\007' are interpreted by the compiler, and
the lexical rules for the octal version are:

From the backslash, consume up to (but no more than) three
octal digits, stopping when you run out of digits or when
the first "invalid" character occurs.

Hence, if you encounter

\1\29\00345

this "means" \1, then \2, then 9, then \003, then 4, then 5.

Third, string literals usually -- but not always[%] -- mean "generate
an anonymous array containing the characters given in the literal,
with a \0 character appended".

Last, adjacent string literals are concatenated after escape sequence
interpretation, but before adding the final \0.

char str1[] = "\007";

This string literal has one \7 character inside, so generates an
array containing two characters, namely \7 and \0.

char str2[] = "\0" "07";

Here there are two adjacent string literals. The first has one
\0 character inside, and the second has two characters inside,
'0' and '7'. These are concatenated -- giving '\0' '0' '7'
in that order -- and a final \0 is added. The result is the same
as if you wrote either:

char str2[] = "\00007";

or the initializer you gave for str3:

char str3[] = { '\0', '0', '7', '\0' };

Both of these create an array of size 4, containing the four
specified "char"s. Since str2 and str3 both begin with a zero
byte, their strlen()s are zero, even though both arrays continue
(always) to hold four "char"s.

I understand that yet another obscure C feature is the octal character
specification so that \ddd is one character.

Right -- but only if the digits are uninterrupted, and all octal.
(The situation is quite different for \x escapes, as someone else
noted elsethread.)

However, should not str1 and str2 be the same?

No; the order in which the escape-interpretation and
string-literal-concatenation occurs forbids this.

[% The two exceptions are: when the literal is not the last in an
adjacent sequence, so that concatenation occurs before adding the
\0, or when the literal is used as an initializer for an array
whose size was specified, and whose specified size is exactly large
enough to hold the characters in the literal without adding the
\0. Making use of this second exception is particularly annoying;
it reminds me of the Bad Old Days of Hollerith constants in Fortran.]
 
Keith Thompson

Also, if you wanted to, for example, use a string containing '\01'
'0', how would you do this unambiguously? As in str2? How about '\0'
"01" or '\0' '0'???

If you want a string containing the characters '\1' and '0', you can
use "\0010", since an octal escape has at most 3 octal digits. Or you
can split it into two string literals: "\1" "0".

Your second example, '\0' "01" is ill-formed; adjacent string literals
are concatenated, but character constants are not. Assuming you want
{ '\0', '0', '1' }, you can write "\00001", or, more clearly,
"\0" "01".

Similarly, for your third example, you can write "\0000" or "\0" "0".

In each case, of course, there's an implicit trailing '\0' at the end
of each string literal (after concatenation), even if the last
character is an explicit '\0' -- but this is suppressed if the string
literal is an initializer for a character array of exactly the right
size. For example:

const char x[3] = "abc";

initializes x to { 'a', 'b', 'c' }, but

const char y[] = "abc";

initializes y to { 'a', 'b', 'c', '\0' }.
 
Andrew Poelstra

Hi,

I have a question about string constants. I compile the following program:

#include <stdio.h>
#include <string.h>

int main(void)
{
char str1[] = "\007";
If you're using ASCII, \007 is an unprintable character. Hence the
string appears empty.
char str2[] = "\0" "07";
Here you begin a string with a 0, which in C terminates a string. Hence,
the string /is/ empty.
char str3[] = { '\0', '0', '7', '\0' };
And you have the same problem here: '\0' signifies the end of a string.

Here is the output:

str1 =
str2 =
str3 =
sizeof(str1) = 2
sizeof(str2) = 4
sizeof(str3) = 4
strlen(str1) = 1
strlen(str2) = 0
strlen(str3) = 0

I understand that yet another obscure C feature is the octal character
specification so that \ddd is one character. However, should not str1 and
str2 be the same? Obscure feature conflict (\ddd vs string concatenation)?

Concatenation occurs only after escape sequences, including hexadecimal
and octal ones, have been replaced, so the \0 in the first literal is
already a complete character by the time the two literals are joined.
 
