Bartc said:
I thought this was some sort of joke, but yes, these are actually in my C99
document as macros.
But, why? Every C programmer knows and loves && and friends, why did the
standard have a sudden urge to be sensible and turn C into something it
isn't?
Once the C committee had decided to support variants of the ISO-646
character set, they pretty much had to stick with that decision.
ISO-646 specifies a set of 7-bit character encodings which share most,
but not all, of their character set. US-ASCII (which in this context is
known as ISO-646-IRV) is one of them. All the other national variants of
ISO-646 must contain the same set of characters as ASCII, except that
0x23 could be either # or £, 0x24 could be either $ or ¤, and any of the
following characters:
@ [ \ ] ^ ` { | } ~
could be replaced by arbitrary other characters. For instance, the
Spanish-language variant replaces these with
§ ¡ Ñ ¿ ^ ` ° ñ ç ~
(apologies to those using readers lacking Unicode support) and opts to
use £ instead of #. EBCDIC, of course, lacked several of these
characters as well (though I believe it may also have lacked !, which
raises the question of why that character was not given a trigraph
equivalent of its own, and is in fact relied upon in ??!, the trigraph
equivalent of |).
ISO-646 variants were in fairly wide use in the days when limiting
character encodings to 7 bits was important. The above character
replacements obviously make a great deal of sense if you're typing in
Spanish (and note that you still can't get characters like é without a
terminal that supports composition via backspacing and typing a ' over
your e).
Of course, if you're trying to code C on a system that uses this
variant, it's a pain in the butt to code even small programs like:
#include <stdlib.h>
#include <stdio.h>

int
main(int argc, char *argv[])
{
    int res1=0, res2;

    if (argv[ 0 ] != NULL)
        res1 = printf("%s: ", argv[ 0 ]);
    res2 = printf("Hello, world!\n");
    return (res1 < 0 || res2 < 0) ? EXIT_FAILURE : EXIT_SUCCESS;
}
That's the reason trigraphs were invented: to let people get by with the
set of characters that can be counted upon to exist virtually
everywhere, making up for the missing characters that C really needs.
It's not pretty, but thanks to trigraphs in the original C Standard, our
Spanish programmer can at least write the program:
??=include <stdlib.h>
??=include <stdio.h>

int
main(int argc, char *argv??(??))
??<
    int res1=0, res2;

    if (argv??( 0 ??) != NULL)
        res1 = printf("%s: ", argv??( 0 ??));
    res2 = printf("Hello, world!??/n");
    return (res1 < 0 ??!??! res2 < 0) ? EXIT_FAILURE : EXIT_SUCCESS;
??>
(There's a good chance you need to explicitly activate trigraph support
on your implementation, say via a "-trigraphs" switch, to compile the
above. The above example was obviously contrived to use as many of the
variant characters as possible: at the very least, it could have been
written without any use of ??( or ??) ([ or ]).)
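For reference, the nine trigraph sequences the standard defines are:

??=  #      ??(  [      ??)  ]
??<  {      ??>  }      ??!  |
??/  \      ??'  ^      ??-  ~

They are replaced wherever they appear, even inside string literals and
character constants, which is how ??/n above manages to spell "\n".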
Great, so now our Spanish coder can write code. It's still butt-ugly,
though, and about as cumbersome to type as it is to read.
With C95, however, digraphs and the iso646.h header were added, which
drastically improved the readability of such programs:
%:include <iso646.h>
%:include <stdlib.h>
%:include <stdio.h>

int
main(int argc, char *argv<::>)
<%
    int res1=0, res2;

    if (argv<: 0 :> != NULL)
        res1 = printf("%s: ", argv<: 0 :>);
    res2 = printf("Hello, world!??/n");
    return (res1 < 0 or res2 < 0) ? EXIT_FAILURE : EXIT_SUCCESS;
%>
Note that the only place where we still required an ugly trigraph was to
represent the backslash character for the "\n" in "Hello, world!\n".
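For reference, C95 defines only six digraphs:

<:  [      :>  ]      <%  {      %>  }      %:  #      %:%:  ##

None of them stands for \, ^, | or ~, and unlike trigraphs, digraphs are
alternative spellings of tokens rather than raw character replacements,
so they are never substituted inside a string literal anyway.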
C++ also has all the words that are #defined (%:defined?) in C's
<iso646.h>, but as keywords built into the language itself.
And macros in lower-case too! (I suppose some attempt to avoid
user-namespace pollution.)
Namespace pollution is no issue, as anyone #including <iso646.h> is
writing new code (or at least adapting old code), and knows to steer
clear of those names.
Note that these are far from the only examples of lower-case macros
within the C standard.
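For the record, <iso646.h> defines exactly eleven macros:

and     &&      and_eq  &=      bitand  &
bitor   |       compl   ~       not     !
not_eq  !=      or      ||      or_eq   |=
xor     ^       xor_eq  ^=

and assert, errno, offsetof, va_arg and stdin are a few of the other
lower-case macro names the standard requires.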
I think though eyebrows would be raised if anyone actually posted code with
these 'operators' in. Especially mixed with the originals.
Mixed with the originals, yeah. And I think it's fair to say that
virtually no one is using these variants today; not when the friendlier
8-bit ISO-8859-* encodings are so pervasive. Japan still uses theirs, I
believe (which merely replaces \ with ¥), but not to the exclusion of
ASCII. It probably shows up mainly in the ISO-2022-JP encoding.
But not sure about the _eq versions: 'and=', 'or=' and so on would have been
adequate.
Well, no (and it would have to be bitor=, not or=: or stands for ||, and
there is no ||= operator to abbreviate). Consider the fact that |, |=
and = are all distinct, complete tokens. A lexical scanner that
translates | into a BITOR token, = into an ASSIGN token, and |= into a
BITOR_EQ token would translate "bitor=" as BITOR ASSIGN rather than
BITOR_EQ, because macro expansion happens after tokenization; the two
tokens never get glued back together into |=.
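To make that concrete, here's a minimal sketch (the variable and the
values are invented, but the expansions are exactly the ones <iso646.h>
specifies):

#include <iso646.h>

int
main(void)
{
    int flags = 0x01;

    flags or_eq 0x02;       /* or_eq is #defined as the single token |=,
                               so this is exactly flags |= 0x02;          */
#if 0
    flags bitor= 0x04;      /* bitor is #defined as |, so this would become
                               flags | = 0x04; -- two tokens, a syntax error */
#endif
    return flags == 0x03 ? 0 : 1;
}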