Are there any GENERIC MACROS in C for INTEGERS, CHARACTERS?

Richard Bos

Keith Thompson said:
The only case I can think of where it makes a real difference is
isspace(EOF), which I don't find particularly useful. (And of course
all this applies equally to the rest of the is*() functions.)

It can be useful, for example, in situations like

while (isspace(getchar())) ;

Admittedly, it's rather more useful in things like isalnum(), and even
more so in tolower(fgetc()) (think case-insensitive indexing, for
example). It wouldn't be a good idea, in any case, to give isspace() a
different interface from the other <ctype.h> functions.

Richard
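
A minimal sketch of the tolower(fgetc()) idiom mentioned above, assuming a hypothetical helper name read_lower. It relies only on the guarantee that getc() returns either EOF or an unsigned char value converted to int; both are valid arguments to tolower(), and EOF comes back unchanged.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: read one character from fp and fold it to lower
 * case. getc() yields EOF or a value in 0..UCHAR_MAX, so the result can
 * go straight into tolower() without a cast. EOF is returned as EOF. */
int read_lower(FILE *fp)
{
    return tolower(getc(fp));
}
```

The caller still has to compare the result against EOF at some point; the helper only saves a separate test before the case conversion.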
 
Eric Sosman

Joe said:
[...]
But, is*(int c) makes it possible to accept EOF outside the
character domain and characters as well. But the macro ..

#define is*(c) (ctype[((c)&255)+1] & *)

.. limits the index to 0..256 regardless of the int value of c.

Actually, it limits the index to 1..255; ctype[0]
will never be used. Are you sure you've transliterated
the macro correctly?

So, my original point, casting the argument to is*() to unsigned
char serves no purpose. You don't need to do it. Ever.

Wrong. R-O-N-G, wrong. See up-thread for detailed
explanations that I don't see any use in repeating here;
if you didn't understand them the first time, you won't
understand them this time either. Just take it on faith:
You're Fire-- er, You're Wrong.
 
Eric Sosman

Eric said:
Joe said:
[...]
But, is*(int c) makes it possible to accept EOF outside the
character domain and characters as well. But the macro ..

#define is*(c) (ctype[((c)&255)+1] & *)

.. limits the index to 0..256 regardless of the int value of c.


Actually, it limits the index to 1..255;

Hmmm. "Actually actually," that should be 1..256.
Sorry for the confusion.
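
The range can be checked with a short sketch. The function name masked_index is hypothetical, and the code assumes a two's complement int, where -1 & 255 is 255. It computes the same index as the macro body quoted above.

```c
/* Index computed by the 1995-style macro body: ctype[((c) & 255) + 1].
 * Assumes two's complement int, where -1 & 255 is 255. */
int masked_index(int c)
{
    return (c & 255) + 1;
}
```

Under that assumption, 0 maps to 1 and 255 maps to 256, so slot 0 is never reached. An argument of -1 (the usual EOF) also maps to 256, the same slot as character 255, which is why that macro cannot treat EOF separately.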
 
CBFalconer

Thomas said:
Well, a succubus is a type of demon. I am just speculating of
course. ;)

I always considered it to be a poorly implemented peripheral
communications scheme for micro-computers. :)
 
Keith Thompson

It can be useful, for example, in situations like

while (isspace(getchar())) ;

Admittedly, it's rather more useful in things like isalnum(), and even
more so in tolower(fgetc()) (think case-insensitive indexing, for
example).

I'd prefer to check for EOF before passing the result to isspace(),
but I suppose you could squeeze out a few cycles by doing it in one
fell swoop. De gustibus et cetera.

It wouldn't be a good idea, in any case, to give isspace() a
different interface from the other <ctype.h> functions.

Agreed; I wouldn't suggest such a thing. (I meant isspace() as an
example covering all the is*() functions, a point I could have made
more clearly.)
 
Joe Wright

Eric said:
Joe said:
[...]
But, is*(int c) makes it possible to accept EOF outside the
character domain and characters as well. But the macro ..

#define is*(c) (ctype[((c)&255)+1] & *)

.. limits the index to 0..256 regardless of the int value of c.


Actually, it limits the index to 1..255; ctype[0]
will never be used. Are you sure you've transliterated
the macro correctly?

So, my original point, casting the argument to is*() to unsigned
char serves no purpose. You don't need to do it. Ever.


Wrong. R-O-N-G, wrong. See up-thread for detailed
explanations that I don't see any use in repeating here;
if you didn't understand them the first time, you won't
understand them this time either. Just take it on faith:
You're Fire-- er, You're Wrong.

Well yes, I am. I just wrote a little program to prove that I was
Right and the program says that I'm Wrong.

If any of you feel I've wasted your time, I apologize.

I'm going to study the 'problem' and its solution and report back
here, if anyone will still listen to me.

Sorry I was Wrong. It doesn't happen often (I hope).
 
CBFalconer

Keith said:
Richard Bos writes:
.... snip ...

I'd prefer to check for EOF before passing the result to
isspace(), but I suppose you could squeeze out a few cycles by
doing it in one fell swoop. De gustibus et cetera.

On the contrary, you may well want to use something like:

int getnonblank(FILE *f)
{
    int ch;

    while (isspace(ch = getc(f))) continue;
    return ch;
}

and let the caller worry about EOF. I can see this called by:

int ch;

ch = getnonblank(f);
while (isdigit(ch)) {
    /* process ch */
    ch = getc(f);
}
return ch;

and the caller of that can still handle EOF conditions.
 
Joe Wright

Joe said:
Eric said:
Joe said:
[...]
But, is*(int c) makes it possible to accept EOF outside the character
domain and characters as well. But the macro ..

#define is*(c) (ctype[((c)&255)+1] & *)

.. limits the index to 0..256 regardless of the int value of c.



Actually, it limits the index to 1..255; ctype[0]
will never be used. Are you sure you've transliterated
the macro correctly?

So, my original point, casting the argument to is*() to unsigned char
serves no purpose. You don't need to do it. Ever.



Wrong. R-O-N-G, wrong. See up-thread for detailed
explanations that I don't see any use in repeating here;
if you didn't understand them the first time, you won't
understand them this time either. Just take it on faith:
You're Fire-- er, You're Wrong.

Well yes, I am. I just wrote a little program to prove that I was Right
and the program says that I'm Wrong.

If any of you feel I've wasted your time, I apologize.

I'm going to study the 'problem' and its solution and report back here,
if anyone will still listen to me.

Sorry I was Wrong. It doesn't happen often (I hope).

But in my own defense, I wasn't sure why I was wrong. Here's what I
found..

1. I did transliterate the macro correctly from a 1995 version of
the code..

#define isspace(c) (ctype[((c)&255)+1] & ISSPACE)

...and I agree it is broken. In a 1998 (and current) version we have..

#define isspace(c) (ctype[(int)(c)+1] & ISSPACE)

...which will accept EOF correctly (the first one didn't) but can
cause all kinds of havoc if other values are not in the range of
0..255. I brought this up with the author of the macro. He said the
Standard required me to present a value 0..255 and if I didn't and
my code blew up, shame on me. So I fixed it..

#define isspace(c) \
(ctype[(unsigned)(c) > 255 ? 0 : ((c)+1)] & ISSPACE)

...and presented it to him. He rejected it out of hand because the
conditional would be too much of a performance hit.

Now I understand better how and why I was wrong. Broken code is OK
if it's fast, even if you have to require your user to perform
unnatural acts.

Merry Christmas
 
Eric Sosman

Joe said:
[...] So I fixed it..

#define isspace(c) \
(ctype[(unsigned)(c) > 255 ? 0 : ((c)+1)] & ISSPACE)

..and presented it to him. He rejected it out of hand because the
conditional would be too much of a performance hit.

Performance aside, it suffers from a problem frequently
encountered with macros: it may evaluate its argument more
than once. This is Very Bad if the argument has side-effects:

if (isspace(*p++)) ...

if (isspace(ch = getchar())) ...

You might enjoy reading P.J. Plauger's "The Standard C
Library" for an exposition of the considerations that go into
implementing these and other (C90) Standard library functions.
I found it both entertaining and educational.
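
The double evaluation can be made visible with a counter. This is only a sketch: INDEX_MACRO, index_fn, next_value and eval_count are hypothetical names, and the macro reproduces just the index part of the range-checked isspace() quoted above, with UCHAR_MAX in place of the literal 255.

```c
#include <limits.h>

/* Counts how many times the argument expression is evaluated. */
int eval_count = 0;

/* An argument with a side effect: each call bumps the counter. */
int next_value(void)
{
    ++eval_count;
    return ' ';
}

/* Macro form of the range-checked index: (c) appears twice. */
#define INDEX_MACRO(c) ((unsigned)(c) > UCHAR_MAX ? 0 : ((c) + 1))

/* Function form: the argument is evaluated once, at the call. */
int index_fn(int c)
{
    return (unsigned)c > UCHAR_MAX ? 0 : c + 1;
}
```

With an in-range argument, INDEX_MACRO(next_value()) calls next_value() twice, once for the comparison and once for the index, while index_fn(next_value()) calls it once. A real function (or a macro whose body mentions the argument only once) avoids the problem.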
 
Joe Wright

Eric said:
Joe said:
[...] So I fixed it..

#define isspace(c) \
(ctype[(unsigned)(c) > 255 ? 0 : ((c)+1)] & ISSPACE)

..and presented it to him. He rejected it out of hand because the
conditional would be too much of a performance hit.


Performance aside, it suffers from a problem frequently
encountered with macros: it may evaluate its argument more
than once. This is Very Bad if the argument has side-effects:

if (isspace(*p++)) ...

if (isspace(ch = getchar())) ...

You might enjoy reading P.J. Plauger's "The Standard C
Library" for an exposition of the considerations that go into
implementing these and other (C90) Standard library functions.
I found it both entertaining and educational.

I'm sure I would learn lots from it. P.J. is a hero. To the point,
what about..

((unsigned)(c)+1)&(UCHAR_MAX*2+1)

...as the index? No conditional, no multiple use. Still Wrong?
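
For reference, here is the proposed index expression wrapped in a function so its results can be inspected. The name masked9_index is hypothetical, and the remarks on values assume UCHAR_MAX is 255.

```c
#include <limits.h>

/* Index expression proposed above: one evaluation of c, no branch.
 * With UCHAR_MAX == 255 the mask is 511, so results lie in 0..511. */
unsigned masked9_index(int c)
{
    return ((unsigned)c + 1) & (UCHAR_MAX * 2 + 1);
}
```

Under that assumption -1 maps to 0 and 0..255 map to 1..256, so the valid arguments stay inside a 257-entry table. A negative plain char such as -23 maps to 490, so a table meant to tolerate such arguments would need 512 entries.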
 
Keith Thompson

Joe Wright said:
In a 1998 (and current) version we have..

#define isspace(c) (ctype[(int)(c)+1] & ISSPACE)

..which will accept EOF correctly (the first one didn't) but can cause
all kinds of havoc if other values are not in the range of 0..255. I
brought this up with the author of the macro. He said the Standard
required me to present a value 0..255 and if I didn't and my code blew
up, shame on me.

He's right (assuming UCHAR_MAX==255).

But note that ISSPACE is in the user's namespace; if your
implementation uses this definition (and not something like _ISSPACE),
it's non-conforming.
 
Chris Torek

... To the point, what about..

((unsigned)(c)+1)&(UCHAR_MAX*2+1)

..as the index? No conditional, no multiple use. Still Wrong?

Looks OK to me, on typical implementations (and if UCHAR_MAX is the
same as UINT_MAX we have problems implementing the entire library
anyway :) ). It has two negative consequences, though:

- It doubles (well, almost) the size of the table, which used to
only be 257 entries (for the typical EOF=-1 through the typical
UCHAR_MAX=255).

- It only works if you *also* make sure that EOF is defined as,
e.g., -129 on machines where plain char is signed.

If we assume that you are the (sole) implementor, *you* get to
define whether plain char is signed, and you get to #define EOF in
<stdio.h>. You also get to decide on the actual values of UCHAR_MAX,
SCHAR_MIN, and SCHAR_MAX; let us assume you go with the typical
255, -128, and 127.

If you then choose to make both:

char *p; ... isspace(*p++) ...
and
int c; ... isspace(c = getc(fp)) ...

work, you can do this more simply by:

a) in stdio.h, #define EOF -129
b) in ctype.h,
#define isspace(c) (__ctype_table[(c) + 129] & __CT_ISSPACE)

where __ctype_table is an array of size (255+129+1) or 385, covering indices 0..384. (The
double underscore names are in your -- the implementor's -- reserved
namespace, so you can be sure no user has used them for anything.
No silly user would go and put "#ifndef __FOO_H / #define __FOO_H_ /
#endif" in a header file, would they? :) )

Note that, for ctype.h macros, there are three cases:

- the user passes a plain (or explicitly signed) "char" value;
- the user passes a correctly-converted "unsigned char" value;
- the user passes a value obtained from the getc() family.

In the first case, the possible valid values are -128..127 (we know
this because we, the implementors, just *defined* CHAR_MIN and
CHAR_MAX, while writing the C compiler!). In the second case, the
possible valid values are 0..255 (again, *we* defined these when
we wrote the compiler). In the last case, the valid values are
{EOF = -129, 0..255} -- again, we defined EOF.

Note that if we choose to define EOF as -1, we will not be able
to tell, in our table lookup, an invocation of isspace(EOF) from
"char c = -1; isspace(c)". If character -1 is not a space, that
might be OK (because EOF is not a space either), but character -1
is often y-umlaut ("ÿ", if your Usenet client has not eaten it),
which should produce a true (nonzero) value for some of the is*
functions for which is*(EOF) must be false (zero).

Alternatively, instead of testing whether the user has written
correct EOF-handling code (i.e., has not assumed that EOF is defined
as -1), and allowing the user to get by with sloppy is*() calls,
we can write the ctype.h macros in the usual fashion, and test
whether the user has written correct is*() calls while letting the
user get by with sloppy EOF-handling code. This shrinks our table
back from 385 entries to 257, and makes incorrect C code break on
*our* machine (whatever it is) in the same cases where it breaks
on Intel machines running Microsoftware, instead of breaking
different incorrect C code. (And: guess which breakage people
accept more easily.... :) )
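
A user-level sketch of the offset-table scheme described above, for illustration only. MY_EOF, MY_SPACE, my_table, my_ctype_init and my_isspace are hypothetical names standing in for what an implementor would put in <stdio.h> and <ctype.h>, and the code assumes UCHAR_MAX is 255 and SCHAR_MIN is -128. The table needs one slot for every value in -129..255, which is 385 entries.

```c
#include <ctype.h>

#define MY_EOF     (-129)            /* hypothetical EOF, below every char value */
#define MY_SPACE   0x01              /* hypothetical classification bit */
#define TABLE_SIZE (255 + 129 + 1)   /* one slot for each value in -129..255 */

static unsigned char my_table[TABLE_SIZE];

/* Fill the table once. A negative plain-char value is given the same
 * entry as its unsigned char conversion, so -31 and 225 agree. */
void my_ctype_init(void)
{
    for (int c = -128; c <= 255; c++) {
        int uc = (unsigned char)c;
        my_table[c + 129] = isspace(uc) ? MY_SPACE : 0;
    }
    my_table[MY_EOF + 129] = 0;      /* EOF is never a space */
}

/* Mask-free lookup: valid for MY_EOF, plain char, and unsigned char values. */
#define my_isspace(c) (my_table[(c) + 129] & MY_SPACE)
```

After my_ctype_init(), my_isspace(-1) and my_isspace(255) give the same answer, and my_isspace(MY_EOF) is zero, because EOF no longer shares a value with any character.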
 
Joe Wright

Chris said:
Joe Wright said:
... To the point, what about..

((unsigned)(c)+1)&(UCHAR_MAX*2+1)

..as the index? No conditional, no multiple use. Still Wrong?


Looks OK to me, on typical implementations (and if UCHAR_MAX is the
same as UINT_MAX we have problems implementing the entire library
anyway :) ). It has two negative consequences, though:

- It doubles (well, almost) the size of the table, which used to
only be 257 entries (for the typical EOF=-1 through the typical
UCHAR_MAX=255).

- It only works if you *also* make sure that EOF is defined as,
e.g., -129 on machines where plain char is signed.

If we assume that you are the (sole) implementor, *you* get to
define whether plain char is signed, and you get to #define EOF in
<stdio.h>. You also get to decide on the actual values of UCHAR_MAX,
SCHAR_MIN, and SCHAR_MAX; let us assume you go with the typical
255, -128, and 127.

If you then choose to make both:

char *p; ... isspace(*p++) ...
and
int c; ... isspace(c = getc(fp)) ...

work, you can do this more simply by:

a) in stdio.h, #define EOF -129
b) in ctype.h,
#define isspace(c) (__ctype_table[(c) + 129] & __CT_ISSPACE)

where __ctype_table is an array of size (255+129+1) or 385, covering indices 0..384. (The
double underscore names are in your -- the implementor's -- reserved
namespace, so you can be sure no user has used them for anything.
No silly user would go and put "#ifndef __FOO_H / #define __FOO_H_ /
#endif" in a header file, would they? :) )

Note that, for ctype.h macros, there are three cases:

- the user passes a plain (or explicitly signed) "char" value;
- the user passes a correctly-converted "unsigned char" value;
- the user passes a value obtained from the getc() family.

In the first case, the possible valid values are -128..127 (we know
this because we, the implementors, just *defined* CHAR_MIN and
CHAR_MAX, while writing the C compiler!). In the second case, the
possible valid values are 0..255 (again, *we* defined these when
we wrote the compiler). In the last case, the valid values are
{EOF = -129, 0..255} -- again, we defined EOF.

Note that if we choose to define EOF as -1, we will not be able
to tell, in our table lookup, an invocation of isspace(EOF) from
"char c = -1; isspace(c)". If character -1 is not a space, that
might be OK (because EOF is not a space either), but character -1
is often y-umlaut ("ÿ", if your Usenet client has not eaten it),
which should produce a true (nonzero) value for some of the is*
functions for which is*(EOF) must be false (zero).

Alternatively, instead of testing whether the user has written
correct EOF-handling code (i.e., has not assumed that EOF is defined
as -1), and allowing the user to get by with sloppy is*() calls,
we can write the ctype.h macros in the usual fashion, and test
whether the user has written correct is*() calls while letting the
user get by with sloppy EOF-handling code. This shrinks our table
back from 385 entries to 257, and makes incorrect C code break on
*our* machine (whatever it is) in the same cases where it breaks
on Intel machines running Microsoftware, instead of breaking
different incorrect C code. (And: guess which breakage people
accept more easily.... :) )

The array remains at 257 of unsigned short. (UCHAR_MAX*2+1) is a
mask of 9 bits instead of 8 so as to accommodate 256. All the values
of interest are among -1..255 contiguous. Casting unsigned and
adding 1 we get 0..256 which is exactly what we want.

All valid values to is*() are EOF or 0..UCHAR_MAX. The comments
about -129 et al. make my head hurt. The characters of interest are
of type char and are positive. On all my systems I have ASCII
characters with value 0..127 (7 bits) so the fact that char is
signed is of no consequence. The 'other' character set is EBCDIC
(0..255 8 bits). Such a system usually has char unsigned. In any
case, EBCDIC is defined in 256 bytes and with EOF -1 fits our model.
 
Dave Thompson

Let's assume CHAR_BIT==8, and plain char is signed. Suppose one of
the arguments contains the character '\xe9' (233 decimal). As a
signed character, its value is -23. isdigit(-23) invokes undefined
behavior. (A given implementation may define isdigit() in such a way
that it doesn't cause any problems, but it's still undefined behavior.)

Changing the condition
isdigit(s)
to either
isdigit((unsigned char)s)
or
isdigit((unsigned)s)
avoids the undefined behavior.


The former does. The latter gives you isdigit( UINT_MAX+1 -23 ) where
UINT_MAX is at least 65535 and thus way out of the range of 8-bit
unsigned char which is 255.

The other safe form is isdigit( * (unsigned char*) &s ) .

- David.Thompson1 at worldnet.att.net
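
The difference between the two casts can be checked directly. A small sketch, assuming CHAR_BIT is 8; the helper names as_uchar, as_unsigned and is_digit_char are hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Conversion to unsigned char: the result is always in 0..UCHAR_MAX. */
int as_uchar(char s)
{
    return (unsigned char)s;
}

/* Conversion to unsigned int: a negative char wraps modulo UINT_MAX + 1,
 * which lands far outside 0..UCHAR_MAX. */
unsigned as_unsigned(char s)
{
    return (unsigned)s;
}

/* The safe way to classify a plain char. */
int is_digit_char(char s)
{
    return isdigit((unsigned char)s) != 0;
}
```

Where plain char is signed, as_uchar('\xe9') is 233 but as_unsigned('\xe9') is UINT_MAX - 22, which is not a valid argument for isdigit().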
 
Keith Thompson

Dave Thompson said:
[...]
Changing the condition
isdigit(s)
to either
isdigit((unsigned char)s)
or
isdigit((unsigned)s)
avoids the undefined behavior.


The former does. The latter gives you isdigit( UINT_MAX+1 -23 ) where
UINT_MAX is at least 65535 and thus way out of the range of 8-bit
unsigned char which is 255.


Whoops, you're right.
 
Chris Torek

All valid values to is*() are EOF or 0..UCHAR_MAX. The comments
about -129 et al. make my head hurt.

I thought your entire goal here was to make incorrect C code of
the form:

char *p;
...
for (p = buf; *p != '\0'; p++)
    *p = toupper(*p); /* WRONG */

"work right".

On all my systems I have ASCII
characters with value 0..127 (7 bits) so the fact that char is
signed is of no consequence. The 'other' character set is EBCDIC
(0..255 8 bits).

On *my* systems I have ISO-Latin-1, and someone might type in his
name as "Pádraig" (P, a-with-accent-acute, d, r, a, i, g). The
second character, when inspected via *p, has value -31.

If all you want is to make *correct* C code work, just require the
user to write:

for (p = buf; *p != '\0'; p++)
    *p = toupper((unsigned char)*p); /* RIGHT */

in the first place, and there is no need for any masking -- toupper()
can just use:

#define toupper(c) (__ctype_map_to_upper[(c) + 1])

(with similar mask-free code for the is* macros). This is the
original <ctype.h> code to which you apparently objected.
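
The correct loop can be packaged as a complete function. This is a sketch with a hypothetical name, upcase_in_place, and it works whether plain char is signed or unsigned.

```c
#include <ctype.h>

/* Hypothetical helper: upper-case a string in place. Each char is
 * converted to unsigned char first, so toupper() never sees a negative
 * value, whatever the signedness of plain char. */
void upcase_in_place(char *buf)
{
    for (char *p = buf; *p != '\0'; p++)
        *p = (char)toupper((unsigned char)*p);
}
```

Wrapping the cast in one helper keeps it out of the calling code, which is where it tends to be forgotten.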
 
