Joe Wright said:
... To the point, what about..
((unsigned)(c)+1)&(UCHAR_MAX*2+1)
..as the index? No conditional, no multiple use. Still Wrong?
Looks OK to me, on typical implementations (and if UCHAR_MAX is the
same as UINT_MAX we have problems implementing the entire library
anyway). It has two negative consequences, though:
- It doubles (well, almost) the size of the table, which used to
only be 257 entries (for the typical EOF=-1 through the typical
UCHAR_MAX=255).
- It only works if you *also* make sure that EOF is defined as,
e.g., -129 on machines where plain char is signed.
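To make the two consequences concrete, here is a minimal sketch of the proposed index, assuming the typical UCHAR_MAX of 255. The name wrap_index is made up for illustration; it is not from the quoted article.

```c
#include <limits.h>

/* Hypothetical helper wrapping the index expression quoted above.
   With UCHAR_MAX == 255 the mask is 511, so the table needs 512 entries. */
static unsigned wrap_index(int c)
{
    return ((unsigned)c + 1) & (UCHAR_MAX * 2 + 1);
}
```

Under that assumption, 0..255 map to 1..256 and -128..-2 map to 385..511, but -1 maps to 0 whether it came from EOF or from a signed char. Moving EOF to -129 gives it its own slot, 384.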
If we assume that you are the (sole) implementor, *you* get to
define whether plain char is signed, and you get to #define EOF in
<stdio.h>. You also get to decide on the actual values of UCHAR_MAX,
SCHAR_MIN, and SCHAR_MAX; let us assume you go with the typical
255, -128, and 127.
If you then choose to make both:
    char *p; ... isspace(*p++) ...
and
    int c; ... isspace(c = getc(fp)) ...
work, you can do this more simply by:
  a) in stdio.h,
     #define EOF (-129)
  b) in ctype.h,
     #define isspace(c) (__ctype_table[(c) + 129] & __CT_ISSPACE)
where __ctype_table is an array of size (255+129+1), or 385, since
the index runs from 0 (for EOF) through 384 (for 255). (The
double-underscore names are in your -- the implementor's -- reserved
namespace, so you can be sure no user has used them for anything.
No silly user would go and put "#ifndef __FOO_H / #define __FOO_H /
#endif" in a header file, would they?)
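A user-space sketch of that offset table might look like the following. Everything prefixed my_ or MY_ is a hypothetical stand-in for the implementor's reserved names, and the initializer only covers the "C" locale white-space characters.

```c
#include <stddef.h>

#define MY_EOF       (-129)  /* hypothetical stand-in for EOF */
#define MY_CT_SPACE  0x01    /* hypothetical flag bit */
#define MY_OFFSET    129
#define MY_TABLE_LEN (255 + MY_OFFSET + 1)  /* indices 0..384 */

static unsigned char my_ctype_table[MY_TABLE_LEN];

/* Valid for any argument in -129..255, with no conditional. */
#define my_isspace(c) (my_ctype_table[(c) + MY_OFFSET] & MY_CT_SPACE)

/* Mark the six "C" locale white-space characters; all other
   entries, including the one for MY_EOF, stay zero. */
static void my_ctype_init(void)
{
    static const char spaces[] = " \t\n\v\f\r";
    size_t i;

    for (i = 0; i < sizeof spaces - 1; i++)
        my_ctype_table[spaces[i] + MY_OFFSET] |= MY_CT_SPACE;
}
```

In a locale with characters above 127, the entries for -128..-1 would have to duplicate those for 128..255, so that the signed and unsigned spellings of the same character agree.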
Note that, for ctype.h macros, there are three cases:
- the user passes a plain (or explicitly signed) "char" value;
- the user passes a correctly-converted "unsigned char" value;
- the user passes a value obtained from the getc() family.
In the first case, the possible valid values are -128..127 (we know
this because we, the implementors, just *defined* CHAR_MIN and
CHAR_MAX, while writing the C compiler!). In the second case, the
possible valid values are 0..255 (again, *we* defined these when
we wrote the compiler). In the last case, the valid values are
{EOF = -129, 0..255} -- again, we defined EOF.
Note that if we choose to define EOF as -1, we will not be able
to tell, in our table lookup, an invocation of isspace(EOF) from
"char c = -1; isspace(c)". If character -1 is not a space, that
might be OK (because EOF is not a space either), but character -1
is often y-umlaut ("ÿ", if your Usenet client has not eaten it),
which should produce a true (nonzero) value for some of the is*
functions for which is*(EOF) must be false (zero).
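The collision is easy to demonstrate. The sketch below uses a made-up helper, collides_with_eof, and assumes the usual 8-bit signed char in which byte 0xFF converts to -1 (that conversion is implementation-defined, though nearly universal).

```c
/* Hypothetical helper: returns nonzero if `byte`, stored in a signed
   char and promoted to int, is indistinguishable from eof_value. */
static int collides_with_eof(unsigned char byte, int eof_value)
{
    signed char sc = (signed char)byte;  /* 0xFF becomes -1 on typical machines */
    int promoted = sc;                   /* what a ctype macro would receive */

    return promoted == eof_value;
}
```

With eof_value of -1, byte 0xFF (y-umlaut in ISO 8859-1) collides; with -129, no byte can, because a signed char never goes below -128.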
Alternatively, instead of testing whether the user has written
correct EOF-handling code (i.e., has not assumed that EOF is defined
as -1), and allowing the user to get by with sloppy is*() calls,
we can write the ctype.h macros in the usual fashion, and test
whether the user has written correct is*() calls while letting the
user get by with sloppy EOF-handling code. This shrinks our table
back from 385 entries to 257, and makes incorrect C code break on
*our* machine (whatever it is) in the same cases where it breaks
on Intel machines running Microsoftware, instead of breaking
different incorrect C code. (And: guess which breakage people
accept more easily....)
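For comparison, here is a sketch of the "usual fashion": a 257-entry table indexed from -1, which puts the burden on the caller to convert plain char through unsigned char. As before, the my_ and MY_ names are hypothetical, and only the unaccented letters are marked.

```c
#include <stddef.h>

#define MY_EOF      (-1)   /* the typical value */
#define MY_CT_ALPHA 0x02   /* hypothetical flag bit */

/* Slot 0 is for MY_EOF; slots 1..256 are for characters 0..255. */
static unsigned char my_small_table[257];

/* Valid only for MY_EOF and 0..255; a negative char other than -1
   indexes outside the table. */
#define my_isalpha(c) ((my_small_table + 1)[(c)] & MY_CT_ALPHA)

static void my_small_init(void)
{
    static const char letters[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    size_t i;

    for (i = 0; i < sizeof letters - 1; i++)
        my_small_table[(unsigned char)letters[i] + 1] |= MY_CT_ALPHA;
}

/* A correct caller: the cast keeps every argument within 0..255. */
static size_t count_alpha(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (my_isalpha((unsigned char)*s))
            n++;
    return n;
}
```

Dropping the cast in count_alpha is exactly the sloppy is*() call that this layout exposes on a signed-char machine.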