How to put non-ascii characters into C string?

wob

Hi there,

I wish to show some special characters, such as the character for "alpha" and
the symbol for degrees. Could anyone please give me some suggestions on how to
do that?

Thank you very much!

Owen
 
Eric Sosman

wob said:
Hi there,

I wish to show some special characters such as the char for "alpha" and the
symbol for degrees.. Anyone please give me some suggestions how to do that?

The C Standard guarantees the presence of a "basic execution
character set" consisting of upper- and lower-case unaccented
letters, the digits zero through nine, assorted punctuation marks,
and a few special-purpose characters like '\n'. No matter what C
implementation you are using, these characters will be present and
available for your use.

The Standard also permits additional characters in the (non-
"basic") "execution character set." However, it does not require
that additional characters exist, nor does it specify any such
additional characters. The consequence is that any such extra
characters are available only at the whim of the implementation,
and different implementations will have different (possibly empty)
sets of extra characters.

The numeric character code that produces "alpha" on one
implementation may produce "upper left corner" on another and
"the Euro symbol" on still another. Thus, the way you ask for
an "alpha" to be produced will be specific to your system and
may not work on others; you might send "the alpha code" to the
output stream and see nothing but a smiley face or a blank.

The standardization of character sets and character codes is
a relatively recent development, and C has not yet caught up with
it. C still maintains a sort of agnosticism towards such matters
(which is why C is easily implemented on systems that use different
character repertoires), but it does complicate things when one wants
to use "exotic" characters. The trouble arises even within the
family of Latin-ish alphabets: One cannot write "Cosí fan tutte"
or "Götterdämmerung" or "Aïda" with the facilities guaranteed
by C.

So: You're stuck with the unhappy task of trying to figure out
what character codes (if any!) produce "alpha" and "degree sign"
on your system, and with the realization that the same codes might
not do anything sensible on the next system you use. As a purely
practical and ad-hoc approach, you could write yourself a little
program that runs through every `char' value and displays the glyph
your system produces for each. If you happen to find "alpha" and
"degree symbol" among the rendered glyphs you're in luck -- but
never forget that the same codes may do something completely
different on other systems.

There was a perfect world around here somewhere, but I think
I left it in the pocket of my other trousers.
 
akarl

wob said:
I wish to show some special characters such as the char for "alpha" and the
symbol for degrees.. Anyone please give me some suggestions how to do that?

On my Fedora Core 4 system (which uses Unicode UTF-8) I can in fact do:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* strlen returns size_t, so print it with %zu (C99) rather than %i */
    printf("The character α occupies %zu bytes.\n", strlen("α"));
    return 0;
}

and the program will output:

The character α occupies 2 bytes.

In Emacs, special Unicode characters can be inserted with
`set-input-method' and argument `TeX'. α is then inserted by typing
`\alpha'. (This message will not display correctly without UTF-8 support.)


August
 
SM Ryan

# Hi there,
#
# I wish to show some special characters such as the char for "alpha" and the
# symbol for degrees.. Anyone please give me some suggestions how to do that?

If the encoding of the glyph doesn't include a null byte, you can use fprintf
or fputs or %s or %c. However, how glyphs are encoded is in flux at the moment.
It depends on the font, or on whether you're using Unicode, which gives one
code for all fonts. If it's Unicode, you still have to know the byte encoding,
such as UTF-8; otherwise you have to know which character set is in use, such
as Latin-1 or MacRoman or ...

It all depends on the context in which you want to specify a character like
"alpha".
 
Charles M. Reinke

----- Original Message -----
From: "Eric Sosman" <[email protected]>
Newsgroups: comp.lang.c
Sent: Wednesday, July 20, 2005 9:09 PM
Subject: Re: How to put non-ascii characters into C string?

[snip]

So: You're stuck with the unhappy task of trying to figure out
what character codes (if any!) produce "alpha" and "degree sign"
on your system, and with the realization that the same codes might
not do anything sensible on the next system you use. As a purely
practical and ad-hoc approach, you could write yourself a little
program that runs through every `char' value and displays the glyph
your system produces for each. If you happen to find "alpha" and
"degree symbol" among the rendered glyphs you're in luck -- but
never forget that the same codes may do something completely
different on other systems.

There was a perfect world around here somewhere, but I think
I left it in the pocket of my other trousers.

The following code works for me, although some of the control characters
mangle the output in places:

#include <stdio.h>
#define MY_CHAR_MAX 256

int main(void) {
    int i;

    /* Three values per row: 0..254 inside the loop, 255 afterwards */
    for (i = 0; i < (MY_CHAR_MAX - 1); i += 3) {
        printf("Char value: %d = %-8c", i, i);
        printf("Char value: %d = %-8c", i + 1, i + 1);
        printf("Char value: %d = %c\n", i + 2, i + 2);
    } /* for i */
    printf("Char value: %d = %c\n", i, i);

    return 0;
} /* main */

Apparently, my implementation has no "alpha", and the degree symbol is value
176.

-Charles
 
Lew Pitcher

Charles M. Reinke said:
[snip]
Apparently, my implementation has no "alpha", and the degree symbol is value
176.

Funny, when I ran your code, I find that value 176 is empty. OTOH, I get

Char value: 192 = {       Char value: 193 = A       Char value: 194 = B
Char value: 195 = C       Char value: 196 = D       Char value: 197 = E
Char value: 198 = F       Char value: 199 = G       Char value: 200 = H
Char value: 201 = I       Char value: 202 =         Char value: 203 =
Char value: 204 = ö       Char value: 205 =         Char value: 206 = ó
Char value: 207 =         Char value: 208 = }       Char value: 209 = J
Char value: 210 = K       Char value: 211 = L       Char value: 212 = M
Char value: 213 = N       Char value: 214 = O       Char value: 215 = P
Char value: 216 = Q       Char value: 217 = R       Char value: 218 =
Char value: 219 =         Char value: 220 =         Char value: 221 =
Char value: 222 =         Char value: 223 =         Char value: 224 = \
Char value: 225 =         Char value: 226 = S       Char value: 227 = T
Char value: 228 = U       Char value: 229 = V       Char value: 230 = W
Char value: 231 = X       Char value: 232 = Y       Char value: 233 = Z

What sort of "Extended ASCII" do I have?

--
Lew Pitcher
IT Specialist, Enterprise Data Systems,
Enterprise Technology Solutions, TD Bank Financial Group

(Opinions expressed are my own, not my employers')
 
Lew Pitcher

Lew Pitcher said:
[snip]
What sort of "Extended ASCII" do I have?

Hint, here's how I compiled and ran the code...

//PITCHLW JOB (0000),' LEW PITCHER',CLASS=E,MSGCLASS=A,
// PRTY=8,NOTIFY=LDP
/*ROUTE PRINT LOCAL
//CC EXEC EDCCLG,
// INFILE='TEST.LDP.SOURCE(CHARTEST)'

 
Charles M. Reinke

Lew Pitcher said:
[snip]
What sort of "Extended ASCII" do I have?

Hint, here's how I compiled and ran the code...

//PITCHLW JOB (0000),' LEW PITCHER',CLASS=E,MSGCLASS=A,
// PRTY=8,NOTIFY=LDP
/*ROUTE PRINT LOCAL
//CC EXEC EDCCLG,
// INFILE='TEST.LDP.SOURCE(CHARTEST)'
[snip]

Let's see, that looks like something from an IBM mainframe, maybe OS/390.
If that's the case, then the character set should be EBCDIC, which according
to Wikipedia is incompatible with ASCII and therefore not any kind of
"Extended ASCII". Based on the output you showed, I'd say you're using the
CCSID 500 (or something similar) variant of EBCDIC, which would explain why
e.g. "Char value: 202" was something other than "J".

Am I close? What did I win?

-Charles
 
Lew Pitcher

Charles M. Reinke said:
[snip]
Am I close? What did I win?

Good analysis. You win not being bothered by me for a week :)


 
CBFalconer

Lew said:
.... snip ...

What sort of "Extended ASCII" do I have?

I'm not sure, but I suspect you have EBCDIC, which is not ASCII.
 
Lew Pitcher

CBFalconer said:
... snip ...
I'm not sure, but I suspect you have EBCDIC, which is not ASCII.

Yah. It was a trick question. :)

I sometimes like to 'remind' clc posters that conforming C isn't restricted to
character sets that map ASCII to the first 128 code points. It's a failing of
mine, I know, and I pay penance for it every day; I'm a COBOL programmer by
trade :-S


--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 
wob

Many thanks to all!

 
