Binary storage of string constants

Ross

Suppose I define a char* variable as follows:

char *s = "€";

What actually gets put into the binary? Presumably, it gets stored in
the encoding of the source file. Am I right? Or is it compiler/platform
dependent?

The C spec suggests that string constants get mapped in an
implementation-defined manner to members of the execution character
set. Does this mean that some compilers perform iconv-esque conversion
between the source and execution character sets at runtime? If so, does
this mean the result of strlen(s) may vary depending on the execution
character set?

Thanks in advance.
 
Chris Dollin

Richard Heathfield said:
I don't know what you wrote there, but on my display I can see a little
white square.

Looks like a Euro here in this newsreader (knode). And when I tried pasting
it into a command window (konsole), it seemed to become a zero-width
character - and backspacing rubbed out a space in the command prompt!
 
Richard Heathfield

Ross said:
Suppose I define a char* variable as follows:

char *s = "€";

I don't know what you wrote there, but on my display I can see a little
white square. (Just a quick tip: use const char * when pointing at string
literals.)
What actually gets put into the binary?

It depends. The value might not even make it into the binary, depending on
whether s gets used. But typically the character's encoded value will
appear in the binary somewhere.
Presumably, it gets stored in
the encoding of the source file. Am I right? Or is it compiler/platform
dependent?

Very much so.

<snip>
 
Ross

Yeah, it was supposed to be a Euro symbol.

Any idea what happens at runtime, then? Is it possible that the string
gets converted into the execution character set, or will it just remain
'as is' in the source character set? Does the same apply to character
constants?
 
Richard Heathfield

Chris Dollin said:
Looks like a Euro here in this newsreader (knode).

I'm using knode too. Perhaps Euros look like little white squares. (I must
admit I thought they were triangular rubber coins 6800 miles on a side, but
I've never actually seen one, so I could be wrong about that.)
And when I tried
pasting it into a command window (konsole), it seemed to become a
zero-width character - and backspacing rubbed out a space in the command
prompt!

Oopsie. If I were you, I'd sue the OP for breach of command prompt.
 
Skarmander

Ross said:
Suppose I define a char* variable as follows:

char *s = "€";

What actually gets put into the binary?

You don't know.
Presumably, it gets stored in the encoding of the source file. Am I
right?

No. The encoding of the source file is in principle completely immaterial to
whatever the compiler output is. Theoretically, the compiler could even
produce code that "computes" your strings just-in-time, so there aren't any
characters in the binary at all.
Or is it compiler/platform dependent?
Yes.

The C spec suggests that string constants get mapped in an
implementation-defined manner to members of the execution character
set. Does this mean that some compilers perform iconv-esque conversion
between the source and execution character sets at runtime?

The compiler isn't allowed to do that. Mapping of characters in the source
character set to the execution character set takes place at translation
time. "Implementation-defined" just means that the way of mapping has to be
documented.
If so, does this mean the result of strlen(s) may vary depending on the
execution character set?
Only insofar as the results of strlen() depend on the execution character
set used at translation (which is when the mapping from source character set
to execution character set happens).

By the time strlen() is called, all that's left are characters stored in bytes.
strlen() counts these characters, which is the same as the number of bytes
they occupy. The result of strlen() on "the same" string may therefore vary
with platform, and even with compilation on the same platform, but not with
execution of the same translated program.

S.
 
Ross

Does this mean that 'execution character set' is referring to the
execution of the compiler, rather than the execution of the compiled
program (as I had assumed)?
 
Ross

Scrub that. I'm pretty sure that 'execution character set' is referring
to the execution of the compiled program. I guess the real question
should be: what is meant by 'translation time'? If it's synonymous with
'compilation time', how can the compiler know what the execution
character set is going to be? Surely this depends on the locale of the
system on which the compiled program is executed?
 
Ross

OK, I see where I'm going wrong. The execution character set is fixed
at compile time and is in no way affected by the locale of the system
in which the binary is executed.

Thanks to one and all.
 
Keith Thompson

Ross said:
Does this mean that 'execution character set' is referring to the
execution of the compiler, rather than the execution of the compiled
program (as I had assumed)?

Does *what* mean that?

Google Groups, for a long time, made it gratuitously difficult to
provide proper context when posting a followup, but I believe the
problem has been corrected.

Please leave enough context in your followups so that they make some
sense even if the previous article isn't available. Those of us who
read everything in this newsgroup can't remember all the details of
every thread.

<http://cfaj.freeshell.org/google/> has information (now obsolete)
about how to work around Google's former bug; it also has a number of
links to articles with good information about posting to Usenet.
 
