C++ grammar: universal-character-name in identifiers

Francesco · Sep 6, 2009

Hi there,
sorry for posting this as a separate thread, but the other one got off
on the wrong foot.

After posting (there) that C++ program with Chinese characters used
as identifiers, I began to think: what if those identifiers aren't
really valid?

So I started searching to check whether that program really was
valid C++, as I had prematurely claimed.

Searching the web, I wasn't able to find any source that settled this
issue. I was looking for a Unicode table classifying characters as
"digit", "alphabetic" and so on, but I couldn't find one; maybe such a
table doesn't even exist. I found an online interface to a database of
Chinese characters reporting codes, stroke classifications and so on,
but that's about it.

Then, browsing my copy of TC++PL, my eye fell on the grammar.

An identifier is defined this way:
-------
identifier:
  nondigit
  identifier nondigit
  identifier digit
-------
and also:
-------
nondigit: one of
  universal-character-name
  _ a b c [...] x y z
  A B C [...] X Y Z
-------
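The identifier rule above says that an identifier is one nondigit followed by any mix of nondigits and digits. A minimal sketch of a recognizer for it, working on already-decoded code points (`std::u32string`), could look like the following. The function names are hypothetical, and treating every code point above 0x7F as an acceptable universal-character-name is a simplifying assumption that mirrors the reading discussed in this thread, not the standard's actual rule, which restricts the allowed ranges.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: "digit" in the grammar, i.e. only ASCII 0-9.
bool is_grammar_digit(char32_t c) {
    return c >= U'0' && c <= U'9';
}

// Hypothetical helper: "nondigit" in the grammar.
// ASSUMPTION: any code point above 0x7F is treated as an allowed
// universal-character-name; the real rules only allow listed ranges.
bool is_grammar_nondigit(char32_t c) {
    return c == U'_' ||
           (c >= U'a' && c <= U'z') ||
           (c >= U'A' && c <= U'Z') ||
           c > 0x7F;
}

// identifier: nondigit | identifier nondigit | identifier digit
// i.e. one nondigit followed by any sequence of nondigits and digits.
bool matches_identifier_grammar(const std::u32string& s) {
    if (s.empty() || !is_grammar_nondigit(s[0])) {
        return false;  // must start with a nondigit
    }
    for (std::size_t i = 1; i < s.size(); ++i) {
        if (!is_grammar_nondigit(s[i]) && !is_grammar_digit(s[i])) {
            return false;  // e.g. punctuation such as '-'
        }
    }
    return true;
}
```

With this sketch, `U"x1"` and `U"\u4E8C"` are accepted, while `U"1x"` is rejected because it starts with a digit.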

Of course, there is a universal-character-name for each digit,
punctuation sign and so on, but since those are defined as specific
grammar items (e.g. "digit", "preprocessing-op-or-punc" and so on), I
assume that the universal-character-name alternative of "nondigit"
excludes those characters by definition.

So then, does it mean that "universal-character-name" stands for [a
representation of] _any_ character other than those defined by other
parts of the grammar, even if it represents a digit in some other
language?

For instance, take the character 二 ("two"). In case it doesn't display,
the glyph looks like an equals sign "=", just for information.

That's a digit in Chinese: does C++ consider it a digit or a nondigit?

Thank you for your attention,
best regards,
Francesco

Alf P. Steinbach · Sep 6, 2009

The short of it is, as James Kanze remarked in the other thread today (or
was it yesterday?), that while C++ formally supports general Unicode in
names, and did so before Java, most compilers don't support it.

The characters formally accepted by C++ are the set defined by some ISO
standard, IIRC the one used for e.g. JavaScript, and I believe also Java.

There's an appendix at the back of the standard with some more info, but
essentially: don't use them, not even Western-language characters such as ÆØÅ.
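The appendix mentioned here is essentially a table of code point ranges, so checking a character amounts to a range lookup. A minimal sketch of such a lookup, assuming a sorted table of closed ranges searched with `std::upper_bound`: the table is a hypothetical excerpt of only four ranges (assumed to match entries in the later C++11 list of allowed characters, and far from complete; check the standard for the full table), and `in_allowed_excerpt` is an illustrative name, not a standard facility.

```cpp
#include <algorithm>
#include <iterator>

// One closed range [first, last] of code points.
struct CodePointRange {
    char32_t first;
    char32_t last;
};

// ASSUMPTION: illustrative excerpt only, sorted by `first`.
constexpr CodePointRange kAllowedExcerpt[] = {
    {0x00C0, 0x00D6},  // Latin-1 letters up to U+00D6
    {0x00D8, 0x00F6},  // skips U+00D7 (multiplication sign)
    {0x00F8, 0x00FF},  // skips U+00F7 (division sign)
    {0x3040, 0xD7FF},  // block that includes the CJK ideographs
};

// Binary search for the last range whose `first` is <= c,
// then check that c does not go past that range's `last`.
bool in_allowed_excerpt(char32_t c) {
    auto it = std::upper_bound(
        std::begin(kAllowedExcerpt), std::end(kAllowedExcerpt), c,
        [](char32_t value, const CodePointRange& r) { return value < r.first; });
    if (it == std::begin(kAllowedExcerpt)) {
        return false;  // c is below the first range
    }
    --it;
    return c <= it->last;
}
```

With this excerpt, Æ, Ø and Å (U+00C6, U+00D8, U+00C5) and U+4E8C are accepted, while the multiplication sign U+00D7 falls in the gap between two ranges and is rejected.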

Cheers & hth.,

- Alf

Francesco · Sep 6, 2009

Fine, I won't use them in real code.

The purpose of my post was to check whether the code I posted with
Chinese identifiers was really valid and, while I was at it, to fully
understand the point.

Now I see from your post that I should look up that appendix to
clarify this point; I thought my reasoning above was enough to treat
"all characters except those otherwise specified by this grammar" as
valid identifier characters. I'll dig into the appendix.

Thanks a lot,
Francesco

James Kanze · Sep 6, 2009

[...]
It depends. It can't be used as part of a number, but it is
legal in an identifier (even as the first character of an
identifier).
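The answer above can be illustrated with a tiny function that uses U+4E8C, spelled as the universal-character-name `\u4E8C`, both as a complete identifier and after an ASCII letter. This is a sketch that assumes a compiler implementing UCNs in identifiers; as far as I know current GCC, Clang and MSVC accept it, while older GCC releases needed the `-fextended-identifiers` option. `sum_of_twos` is a hypothetical name.

```cpp
// Sketch only: assumes a compiler that accepts universal-character-names
// in identifiers. \u4E8C is U+4E8C, the Chinese numeral "two".
int sum_of_twos() {
    int \u4E8C = 2;            // the UCN alone forms a whole identifier
    int x\u4E8C = \u4E8C + 1;  // the UCN can also follow other characters
    return \u4E8C + x\u4E8C;   // 2 + 3
}
```

Here `sum_of_twos()` returns 5; the ideograph only ever names variables and never takes part in a numeric literal.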

> Fine, I won't use them in real code.

In portable code. I think they work in VC++.

Francesco · Sep 7, 2009

> [...]
>
> It depends. It can't be used as part of a number, but it is
> legal in an identifier (even as the first character of an
> identifier).

Thank you for the confirmation, James, just what I was looking for to
reassure myself about the Chinese program I posted. As for your "It
depends", I suppose you meant that some isdigit()-like function could
return true for that character, which would be good, I suppose.
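On the isdigit() guess: the standard `std::iswdigit` is specified to test only for the decimal digits '0' through '9', so it reports U+4E8C as a non-digit; Unicode likewise classifies the ideograph as a letter (general category Lo) rather than a decimal digit (Nd). A minimal check, with hypothetical function names:

```cpp
#include <cwctype>

// std::iswdigit tests only for the decimal digits '0'..'9', so the
// CJK numeral U+4E8C is not reported as a digit.
bool cjk_two_is_digit() {
    return std::iswdigit(L'\u4E8C') != 0;
}

// For comparison: the ASCII digit '2' is reported as a digit.
bool ascii_two_is_digit() {
    return std::iswdigit(L'2') != 0;
}
```

A library that wanted to treat 二 as the number two would need Unicode numeric-value data, which the standard character classification functions do not provide.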

> In portable code. I think they work in VC++.

Oh yes, of course I should have written "in portable code" up there.

Thanks again,
Francesco
