strings

thomas

char a[] = "abc";
char a[] = {'a','b','c','\0'};

Is there ANY difference between these two strings?
 
Maxim Yegorushkin

"abc" has the same binary layout as {'a','b','c','\0'} does, hence no
difference.

Why do you ask?
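
The "same binary layout" claim can be checked directly. This is a minimal sketch, not part of the original post; the names from_literal, from_list and same_layout are made up for illustration, since two arrays in one scope cannot both be called a.

```cpp
#include <cassert>
#include <cstring>

// Two arrays initialized in the two ways discussed in the thread.
// The names are illustrative only.
char from_literal[] = "abc";
char from_list[]    = {'a', 'b', 'c', '\0'};

// Both have type char[4]: three characters plus the terminating '\0'.
static_assert(sizeof(from_literal) == 4, "literal form includes the terminator");
static_assert(sizeof(from_list) == 4, "list form has four explicit elements");

// Returns true when the two arrays hold identical bytes.
bool same_layout() {
    return std::memcmp(from_literal, from_list, sizeof(from_literal)) == 0;
}
```

Dropping the '\0' from the brace list would give a char[3] that is no longer a valid C string, which is the one way the two spellings can diverge.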
 
osmium

My WAG.

He wanted to know the answer.
 
thomas

I thought the first might mean that the string cannot be modified, but
actually it can.
So I cannot find any difference between these two, though I had hoped
one exists.
 
Maxim Yegorushkin

In the first case you are not actually modifying the literal "abc" but
the values inside the 'a' array. That's different from:

const char* a = "abc";

In this case you don't have an array. You have a pointer pointing to a
literal. A modification attempt would modify the literal (and thus is
UB). In the case of the array you are modifying the contents of the
array, not the contents of the literal you used to initialize the array.
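
A small sketch of the array-versus-pointer distinction described above. The function name array_copy_is_writable is hypothetical, and the commented-out line only marks where the compiler would reject a write through the pointer.

```cpp
#include <cassert>
#include <cstring>

// Modifies a local array that was initialized from the literal "abc".
// The array is a separate object, so writing to it is well defined.
bool array_copy_is_writable() {
    char a[] = "abc";        // copies 'a','b','c','\0' into a
    a[0] = 'x';              // changes the array, not the literal
    const char* p = "abc";   // points at the literal itself
    // p[0] = 'x';           // would not compile: p points to const char
    return std::strcmp(a, "xbc") == 0 && std::strcmp(p, "abc") == 0;
}

// A string literal of three characters occupies four bytes.
static_assert(sizeof("abc") == 4, "three characters plus the terminator");
```

The undefined behaviour mentioned above only arises when the const is bypassed, for example with a cast or with the old char* p = "abc"; conversion that C++03 still permitted.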

A bit off topic: from a Unix linker's point of view, char const[] is better
than char const* for global and namespace scope strings.

This is because the latter refers to a string literal, so that there are
two objects: 1) char const* and 2) a string literal to which 1) refers.
In this case the (runtime) linker needs to resolve the addresses of both
objects. On the other hand, when char const[] is initialized with a
string literal, there is only one object char const[] where that string
literal is actually stored, so there is one object to resolve for the
linker.
 
James Kanze

Every linker I've seen (Unix or otherwise) is capable of
handling both without any distinction. There are, however, a
couple of subtle issues.

Maxim Yegorushkin said:
This is because the latter refers to a string literal, so that
there are two objects: 1) char const* and 2) a string literal
to which 1) refers. In this case the (runtime) linker needs
to resolve the addresses of both objects. On the other hand,
when char const[] is initialized with a string literal, there
is only one object char const[] where that string literal is
actually stored, so there is one object to resolve for the
linker.

The issue is a bit more subtle: if "char const*" is used, there
is a non-const object, the pointer; if "char const[]" is used,
the only object in sight is the array of const, and an array of
const is treated as if it were const. And in C++, if an object
has a const type, it defaults to internal linkage. So each
translation unit using the "char const[]" gets its own copy
(with a different address), whereas the variable declared "char
const*" has external linkage: it should be defined only once in
the program (which means that if you use "char const* p" in a
header file, you're likely to get multiple definitions).

The general rule I use (but other solutions are possible) is
that if I want to put the initializer in the header file, and it
doesn't matter that each instance has a different address (and
that the instances are duplicated, increasing executable size),
I use »char const var[] = "...";«. This is also what I'd use if
the variable is only used in a single translation unit (and is
declared in the source file). Otherwise (which is most of the
time when the variable is shared between different translation
units), I use »extern char const var[];« in the header file, and
»extern char const var[] = "...";« in one source file. (Note,
however, that you can't instantiate a template using the plain
»char const var[]« variant without the extern, since template
arguments require external linkage.
This isn't a problem that often, however, as templates which
require a char const* as an argument aren't that frequent.)
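
The rule above can be sketched in one file, with comments marking which lines would go in a header and which in a single source file. All variable names here (per_tu_name, shared_name, pointer_name) and the helper all_readable are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

// Header, variant 1: each translation unit gets its own copy.
// A const array at namespace scope has internal linkage by default.
char const per_tu_name[] = "abc";

// Header, variant 2: declaration only; one shared object for the program.
extern char const shared_name[];

// Exactly one source file: the definition. The extern keeps external
// linkage even though the array is const.
extern char const shared_name[] = "def";

// A non-const pointer to const char has external linkage, so a definition
// like this in a header would be defined once per translation unit.
char const* pointer_name = "ghi";

// Reads all three strings to show they are usable in the same way.
bool all_readable() {
    return std::strcmp(per_tu_name, "abc") == 0
        && std::strcmp(shared_name, "def") == 0
        && std::strcmp(pointer_name, "ghi") == 0;
}
```

In a real project the first two declarations would sit in the header and the definition of shared_name in one .cc file; they are combined here only so the sketch compiles on its own.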

The use of »char const*« is generally limited to tables of
strings, e.g.:
char const* const table[] = { ... };
The same considerations as above apply here to the const after
the *: as written, »table« has internal linkage, but an extern
in front of it can be used to give it external linkage.
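
A minimal sketch of such a table with a lookup over it. The entries, table_size and find_name are hypothetical and not from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical table of names. Both consts matter: the characters are
// const, and so are the pointers, which gives `table` internal linkage
// at namespace scope.
char const* const table[] = { "red", "green", "blue" };

// Number of entries, computed from the array type.
std::size_t const table_size = sizeof(table) / sizeof(table[0]);

// Returns the index of `name` in the table, or table_size if absent.
std::size_t find_name(char const* name) {
    for (std::size_t i = 0; i != table_size; ++i) {
        if (std::strcmp(table[i], name) == 0) {
            return i;
        }
    }
    return table_size;
}
```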
 
Maxim Yegorushkin

James Kanze said:
Every linker I've seen (Unix or otherwise) is capable of
handling both without any distinction.

Not true.

Here is an example:

[max@truth test]$ cat test.cc
char const a[] = "abc";
char const* b = "def";
char const* foo() { return a; }
char const* bar() { return b; }

Let's look at the assembly code of foo() and bar() when they are
compiled as position independent code (intended to be a part of a shared
library):

[max@truth test]$ g++ -O3 -fPIC -S -o - test.cc | c++filt
.file "test.cc"
.text
.p2align 4,,15
.globl foo()
.type foo(), @function
foo():
.LFB0:
.cfi_startproc
.cfi_personality 0x9b,DW.ref.__gxx_personality_v0
leaq a(%rip), %rax
ret
.cfi_endproc
.LFE0:
.size foo(), .-foo()
.p2align 4,,15
.globl bar()
.type bar(), @function
bar():
.LFB1:
.cfi_startproc
.cfi_personality 0x9b,DW.ref.__gxx_personality_v0
movq b@GOTPCREL(%rip), %rax
movq (%rax), %rax
ret
.cfi_endproc
...

In the code of foo() access to string "abc" through variable a requires
no indirection (access relative to the instruction pointer).

In function bar(), however, access to string "def" through b requires
indirection through the global offset table.

Now, let's look at how variables a and b are represented in the same
assembly output:

...
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "def"

.section .data.rel.local,"aw",@progbits
.align 8
.type b, @object
.size b, 8
b:
.quad .LC0

.section .rodata
.type a, @object
.size a, 4
a:
.string "abc"

Here we can see that the value of variable a is a string of 4 bytes and
that string is stored in the read-only data section.

The value of variable b is a pointer (of 8 bytes) and this variable is
stored in section .data.rel.local. This is the section that the run-time
linker fills in at run-time with the actual addresses of referred
symbols. In this case, the value of pointer b is resolved by the dynamic
linker at run-time to point to string .LC0.

In summary, when building a shared library, global and namespace scope
char const[] requires no run-time linker processing, whereas char
const* does.

Making b char const* const instead, effectively making it static, allows
the compiler to optimize out accessing b and access the string referred to
by b directly.

The point I was making is that using char const* for representing global
and namespace scope read-only strings is the worst possible choice from
the linker's point of view. Better choices are char const[] or char
const* const.
 
James Kanze

Maxim Yegorushkin said:
Every linker I've seen (Unix or otherwise) is capable of
handling both without any distinction.

Not true.

Here is an example:
[max@truth test]$ cat test.cc
char const a[] = "abc";
char const* b = "def";
char const* foo() { return a; }
char const* bar() { return b; }

Which proves what I said. The linker handles both cases without
any problems; if it doesn't, it's not compiling C++ correctly.

Have you actually tried compiling and linking this? Does one of
the cases fail when you do?

Maxim Yegorushkin said:
Let's look at the assembly code of foo() and bar() when they
are compiled as position independent code (intended to be a
part of a shared library):

If I were worried about the assembler code, I'd write in
assembler.

Maxim Yegorushkin said:
[...]
In summary, when building a shared library, global and namespace scope
char const[] requires no run-time linker processing, whereas char
const* does.

So? I wouldn't say that that's really an argument one way or
another.

Maxim Yegorushkin said:
Making b char const* const instead, effectively making it
static, allows the compiler to optimize out accessing b and
access the string referred to by b directly.

Let's worry about the semantics before worrying about typically
insignificant optimization issues. My point, as I expanded it,
is that there are significant differences in the semantics
(visibility from other translation units, etc.), and that the
choice should depend on the desired semantics. If the public
interface requires that b be a char const* visible from other
translation units (e.g. so that they can change it), then you
don't have any choice. If the public interface says that b
shouldn't be visible outside the translation unit, then you
can't use just "char const*", except in a local namespace. If
the code requires the value to be used as an argument to a
template, then you can't use just "char const* const", either.

Maxim Yegorushkin said:
The point I was making is that using char const* for representing
global and namespace scope read-only strings is the worst
possible choice from the linker's point of view. Better
choices are char const[] or char const* const.

The best choice is always to say what you mean, until the
profiler says you can't. If what you want is an array, then
char const[] is indicated. If what you (conceptually) want is a
pointer, then char const* is indicated. And as I pointed out,
about the only time I'd use char const* const is in a table.
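
To illustrate the remark about template arguments, here is a sketch under assumptions: Tagged, widget_name and Widget are hypothetical names, not anything from the thread. An array with external linkage works as a char const* template argument, while a pointer initialized from a string literal does not.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical template taking a pointer to char const as a non-type argument.
template <char const* Name>
struct Tagged {
    static char const* name() { return Name; }
};

// An array with external linkage can serve as the template argument.
extern char const widget_name[] = "widget";

// A pointer initialized from a string literal cannot:
//   char const* const p = "widget";
//   Tagged<p> t;   // ill-formed: a template argument may not point to a string literal

typedef Tagged<widget_name> Widget;
```

Widget::name() then returns the address of widget_name itself, so every use of Widget across the program refers to the same array.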
 
