would it be possible to use minus in identifiers ?

Ike Naar · Jun 11, 2013

fir said:
additional question:

would it be possible to allow spaces in numbers
(between decimal digits) with no downsides?
(I do not like to write int tab[100000]
and would prefer to write tab[100 000] and
so on)

seems that could be done without affecting
anything else, but I am not sure

Some languages allow underscores in integer constants
for readability, so that one could write 100000 as 100_000.
In C this is not possible.
But one can always write 100000 as 100*1000, or as 200*500,
or whatever seems convenient.

James Kuyper · Jun 11, 2013

"Maximal munch" doesn't care about what symbols exist or not. If "-" can be
part of an identifier name, then "b-c" must *always* represent a single
token "b-c", whether such an identifier exists or not.

The maximal munch rule (6.4p4) applies to the parsing of pre-processing
tokens during translation phase 3. In phase 7, each pre-processing token
is converted into a single token (5.1.1.2p7). fir's idea would require
that rule to be changed; a pre-processing token could end up being split
into multiple tokens, depending upon whether it, or part of it, matches
an identifier that is in scope at the point where the preprocessing
token is parsed. This would imply that conversion of preprocessing
tokens to tokens could not be done as a separate sub-phase, independent
of all later sub-phases, as I believe is currently the case (? I've
never implemented a compiler, I might be wrong about that).

Instead, parsing of declarations to fill in the symbol table would have
to be done in parallel with conversion of pre-processing tokens into
tokens. It could never be allowed to fall far enough behind to allow an
in-scope identifier to be missed. This would make it more difficult to
implement a C compiler efficiently; I would not be surprised if someone
managed to come up with a case where the parse would be ambiguous, using
fir's rule - but I haven't come up with any.
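
To make the maximal munch point concrete, here is a minimal sketch of the identifier-scanning step of a lexer. The function name `ident_len` and the `allow_minus` flag are hypothetical, invented for this illustration; a real compiler is organized differently.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: length of the longest identifier starting at s.
 * If allow_minus is nonzero, '-' counts as an identifier character
 * (after the first character), as in the proposed extension. */
size_t ident_len(const char *s, int allow_minus)
{
    size_t n;

    /* An identifier must start with a letter or underscore. */
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;

    /* Maximal munch: keep consuming while the next character could
     * still belong to the token. No symbol table is consulted. */
    for (n = 1; s[n] != '\0'; n++) {
        unsigned char c = (unsigned char)s[n];
        if (isalnum(c) || c == '_')
            continue;
        if (allow_minus && c == '-')
            continue;
        break;
    }
    return n;
}
```

With the standard rules, `ident_len("b-c", 0)` is 1, so `b-c` lexes as `b`, `-`, `c`. With the extension, `ident_len("b-c", 1)` is 3, and the subtraction would have to be written with spaces.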

Keith Thompson · Jun 11, 2013

Ike Naar said:
Some languages allow underscores in integer constants
for readability, so that one could write 100000 as 100_000.
In C this is not possible.
But one can always write 100000 as 100*1000, or as 200*500,
or whatever seems convenient.

There's a potential trap there. 100000 is of whatever type is
big enough to hold it. 100*1000 is always of type int, and will
overflow on systems where int is 16 bits.

You can work around it by writing 100L*1000L -- but you could still
have problems with values too big to fit in a long int.
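
A small sketch of the trap, assuming a C11 compiler (for `_Generic`). The names `TYPE_NAME`, `type_of_constant`, `int_product_overflows` and `table_size` are made up for this example.

```c
#include <limits.h>

/* Names the type of an expression without evaluating it (C11 _Generic). */
#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    long: "long",                  \
    long long: "long long",        \
    default: "other")

/* A decimal constant takes the first of int, long, long long that can
 * hold its value, so this is "int" only where INT_MAX >= 100000. */
const char *type_of_constant(void)
{
    return TYPE_NAME(100000);
}

/* Nonzero when the int product 100 * 1000 would overflow on this system. */
int int_product_overflows(void)
{
    return INT_MAX / 1000 < 100;
}

/* Both operands are long, so the product is computed in long, which is
 * guaranteed to reach at least 2147483647. */
long table_size(void)
{
    return 100L * 1000L;
}
```

`TYPE_NAME(100 * 1000)` names `int` on every implementation, whatever the value, while `TYPE_NAME(100L * 1000L)` names `long`.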

Keith Thompson · Jun 12, 2013

fir said:
additional question:

would it be possible to allow spaces in numbers
(between decimal digits) with no downsides?
(I do not like to write int tab[100000]
and would prefer to write tab[100 000] and
so on)

seems that could be done without affecting
anything else, but I am not sure

I can't think of any problems off the top of my head, but I suspect
that if I spent some time on it I could come up with a contrived
but valid program that this change would break. I suppose it could
be handled similarly to string literal concatenation, which is done
in translation phase 6 -- but that would mean you can't use spaces
in integer literals in preprocessing directives, which are executed
in phase 4.
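
For comparison, this is what the existing phase 6 mechanism looks like for string literals. It is only an illustration of the analogy; the names `joined` and `joined_len` are invented here.

```c
#include <stddef.h>

/* Adjacent string literals are joined in translation phase 6, after all
 * preprocessing directives (phase 4) have already been executed. */
static const char joined[] = "100" "000";      /* same as "100000" */

/* Length of the joined literal, not counting the terminator. */
size_t joined_len(void)
{
    return sizeof joined - 1;
}
```

Because the joining happens that late, the two pieces are still separate tokens when a `#if` is evaluated, which is the limitation a similarly handled `100 000` would have.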

But there's ample precedent in other languages for allowing
underscores in integer literals, and I'd like to see that added
to C. (Maybe binary literals, 0b1100_1001, could be added at the
same time).

You'd have to define some rules for exactly where underscores can
be inserted. I'd allow them in both integer and floating-point
literals, but only between consecutive digits. I'm not sure
whether they should be allowed between digits in an exponent
(1.2e3_4); if your exponents are long enough that underscores make
them more readable, you're dealing with some really really big or
small numbers.
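
One possible reading of the "only between consecutive digits" rule, sketched as a small parser for decimal literals. The function `parse_separated` is hypothetical, handles decimal digits only (no prefixes, suffixes or exponents), and does not check for overflow.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parser for a decimal literal with '_' separators.
 * An underscore is accepted only when it sits between two digits. */
bool parse_separated(const char *s, unsigned long *out)
{
    unsigned long value = 0;

    if (!isdigit((unsigned char)s[0]))
        return false;                     /* must start with a digit */

    for (size_t i = 0; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (isdigit(c)) {
            value = value * 10 + (unsigned long)(c - '0');
        } else if (c == '_') {
            /* s[i - 1] exists here because s[0] is a digit. */
            if (!isdigit((unsigned char)s[i - 1]) ||
                !isdigit((unsigned char)s[i + 1]))
                return false;             /* trailing or doubled '_' */
        } else {
            return false;                 /* not part of a decimal literal */
        }
    }
    *out = value;
    return true;
}
```

Under this rule `100_000` yields 100000, while `_100`, `100_` and `1__0` are rejected. Standard C later took a different route: C23 uses an apostrophe as the digit separator, as in `100'000`.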

One corner case to consider: does allowing underscores only between
consecutive digits mean that 0x_12 (hex) is forbidden, but 0_12
(octal) is allowed? The leading 0 is more of a syntactic marker
than a digit. Come to think of it, a leading 0_ might have been
a better syntax for octal constants than just a leading 0.

Unfortunately, I think underscores in literals would conflict with
user-defined literals, a new feature in C++11. 123_456 already
has a meaning in C++11; it's equivalent to the function call
operator"" _456("123").

Michael Angelo Ravera · Jun 12, 2013

fir said:
would it be possible to use minus sign in C
identifiers (I mean function names and variable
names), for example

int my-foo()
{
}

int some-a-wariable = 10;

I mean would it be possible to write a C-like
compiler that would allow that or not?

I could, without knowing it, risk the theorem
that it would be possible, but I do not know,
maybe some unavoidable syntax conflict would
arise (?) (or will they not?)

You would have a much easier time allowing whitespace within identifiers. You could probably invent a syntax that allowed anything whatever as part of an identifier, if you were willing to delimit the identifier and either double the delimiter or escape it when you wanted to use it as part of the identifier.

int @this-identifier@;
int @@this-identifier@@;

could be made to work, but you would almost always have to delimit these.
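
A rough sketch of how a lexer might scan such a delimited identifier. It assumes one particular convention, namely that `@` opens and closes the identifier and a doubled `@@` inside stands for a literal `@`; the function `scan_delimited` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical scanner for an identifier delimited by '@', where "@@"
 * inside the delimiters stands for a literal '@'. Copies the identifier
 * text into out (capacity cap, terminator included) and returns the
 * number of source characters consumed, or 0 on malformed input. */
size_t scan_delimited(const char *s, char *out, size_t cap)
{
    size_t i = 1, n = 0;

    if (s[0] != '@' || cap == 0)
        return 0;

    for (;;) {
        if (s[i] == '\0')
            return 0;                 /* unterminated */
        if (s[i] == '@') {
            if (s[i + 1] != '@')
                break;                /* closing delimiter */
            i++;                      /* "@@": keep one '@' */
        }
        if (n + 1 >= cap)
            return 0;                 /* output buffer too small */
        out[n++] = s[i++];
    }
    if (n == 0)
        return 0;                     /* empty identifier */
    out[n] = '\0';
    return i + 1;                     /* count the closing '@' too */
}
```

Scanning `@this-identifier@;` consumes 17 characters and yields `this-identifier`, minus sign included, without disturbing how `-` is lexed anywhere else.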

Ike Naar · Jun 12, 2013

Keith Thompson said:

There's a potential trap there. 100000 is of whatever type is
big enough to hold it. 100*1000 is always of type int, and will
overflow on systems where int is 16 bits.

Oops, you are right, I had not thought about that.

To the OP: try to keep the number of unnamed numerical constants
(also known as "magic numbers") in the code to a minimum.
Give them names, and use those names instead of the numbers, e.g.

#define TABSIZE 100000

and then use TABSIZE in the rest of the program, instead of 100000.
This makes the code more readable and easier to maintain in case
you later decide to change that value to, say, 150000; you'll only
have to change it in one place.
As a bonus, when there are fewer instances of "100000" in the code,
there will be less need to write "100000" as "100 000" or "100_000".
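
A short sketch of what that looks like in practice. The array `tab` comes from the question; the helper names `tab_fill` and `tab_count` are invented for the example.

```c
#include <stddef.h>

#define TABSIZE 100000          /* the only place the size is spelled out */

static int tab[TABSIZE];

/* Fills the table; the loop bound follows TABSIZE automatically. */
void tab_fill(int value)
{
    for (size_t i = 0; i < TABSIZE; i++)
        tab[i] = value;
}

/* Number of elements, derived from the array rather than repeated. */
size_t tab_count(void)
{
    return sizeof tab / sizeof tab[0];
}
```

Changing the size to 150000 then means editing the `#define` line and nothing else.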

Keith Thompson · Jun 12, 2013

Ike Naar said:
#define TABSIZE 100000

That's a heck of a big tab size!
