QoI issue: silencing a warning

N

Noob

[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

On this platform, isdigit seems to be a (complex) macro, and the preprocessor
digests my function into...

int foo(const char *ext)
{
int ext_NNN = (((__ctype_ptr__+sizeof(""[ext[1]]))[(int)(ext[1])])&04) && (((__ctype_ptr__+sizeof(""[ext[2]]))[(int)(ext[2])])&04) && (((__ctype_ptr__+sizeof(""[ext[3]]))[(int)(ext[3])])&04);
return ext_NNN;
}

Sweet baby Jesus... IOCCC winner here.

If I want to make the warnings disappear, I see two obvious work-arounds:

1) Set -Wno-char-subscripts
2) Cast isdigit's parameter to int (or to unsigned or to unsigned char)
3) Something else?

My questions :

- Is something wrong with my code, or is this just a QoI issue?

- How would you silence the warnings?

Regards.
 
S

Stephen Sprunk

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
...
If I want to make the warnings disappear, I see two obvious work-arounds:

1) Set -Wno-char-subscripts
2) Cast isdigit's parameter to int (or to unsigned or to unsigned char)
3) Something else?

My questions :

- Is something wrong with my code, or is this just a QoI issue?

- How would you silence the warnings?

isdigit() is defined as taking an int argument, not a char; cast your
argument to int and the warnings will disappear.

S
 
R

Richard Kettlewell

Noob said:
My questions :

- Is something wrong with my code, or is this just a QoI issue?

The bug is in your code. Have a look at the specification of isdigit
(not the implementation!); you will see that its parameter must either
be EOF or the value of an unsigned char.
 
E

Eric Sosman

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}
[...]
- Is something wrong with my code, or is this just a QoI issue?

- How would you silence the warnings?

isdigit() is defined as taking an int argument, not a char; cast your
argument to int and the warnings will disappear.

Almost right, but not quite. The <ctype.h> functions take
int arguments, as you say, but their values must be either EOF
or (and this is the important bit) the value of an *unsigned*
char as converted to int. So the correct cast is

int ext_NNN = isdigit((unsigned char) ext[1]) ...

and so on.

It is tempting to save some typing by using an `unsigned char*'
in the first place, as in

int foo(const char *ext) {
const unsigned char *uxt = (const unsigned char*) ext;
int ext_NNN = isdigit(uxt[1]) ...

I *think* this is okay, but type-punning always makes me just a
little bit queasy, so I'd prefer (and recommend) the first option.
 
N

Noob

Richard said:
The bug is in your code. Have a look at the specification of isdigit
(not the implementation!); you will see that its parameter must either
be EOF or the value of an unsigned char.

So it is wrong to write:

char foo[] = "abcd";
isdigit(foo[1]);


Lemme see...

7.4 Character handling <ctype.h>
The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the value of
which shall be representable as an unsigned char or shall equal the value
of the macro EOF. If the argument has any other value, the behavior is
undefined.


I don't understand your objection, in light of parameter conversion.
(IIUC, the plain char argument is implicitly converted to an int.)

6.5.2.2 Function calls
Semantics
7 If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, to the types of the
corresponding parameters, taking the type of each parameter to be the unqualified version
of its declared type. The ellipsis notation in a function prototype declarator causes
argument type conversion to stop after the last declared parameter. The default argument
promotions are performed on trailing arguments.

What did I miss?

Regards.
 
K

Keith Thompson

Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}
[snip]

N1570 7.4p1:
The header <ctype.h> declares several functions useful for
classifying and mapping characters. In all cases the argument
is an int, the value of which shall be representable as an
unsigned char or shall equal the value of the macro EOF. If
the argument has any other value, the behavior is undefined.

Plain char may be a signed type, which means that a value of type char
may be negative (and unequal to EOF), causing undefined behavior when
it's passed to isdigit().

Just cast the argument to unsigned char:

int ext_NNN = isdigit((unsigned char)ext[1]) &&
isdigit((unsigned char)ext[2]) &&
isdigit((unsigned char)ext[3]);

You're probably about to say that it's ridiculous that a function
intended to operate on characters can't properly handle an argument of
type char -- and I agree. I think it would have made more sense for
isdigit() and friends to take an argument of type (plain) char. The
problem is that these function are required to handle the value EOF
(typically -1), which is *also* a value char value (if plain char is
signed). I'm not convinced that permitting EOF as an argument was
worthwhile, but we're stuck with it.

On the other hand, when -1 is a valid char value, it typically
*isn't* a digit, or a lowercase or uppercase letter, etc.
$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

That's a reasonable warning, since char values can be negative and a
negative subscript value is usually an error. (Not always; you could be
indexing from a pointer into the middle of an array.)
 
C

Casper H.S. Dik

Noob said:
I don't understand your objection, in light of parameter conversion.
(IIUC, the plain char argument is implicitly converted to an int.)


Because it doesn't work when "char" is signed; the char \x255
would be converted to -127 and that is not a valid paramter
to isdigit.

Casper
 
N

Noob

Keith said:
Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}
[snip]

N1570 7.4p1:
The header <ctype.h> declares several functions useful for
classifying and mapping characters. In all cases the argument
is an int, the value of which shall be representable as an
unsigned char or shall equal the value of the macro EOF. If
the argument has any other value, the behavior is undefined.

Plain char may be a signed type, which means that a value of type char
may be negative (and unequal to EOF), causing undefined behavior when
it's passed to isdigit().

OK, I think I see the problem.

1. If my platform's plain chars are signed (which they are)
2. and if I pass a value greater than CHAR_MAX (e.g. '©')
3. and if the implementation of isdigit uses an array

then a negative index sends it into la-la-land.
Just cast the argument to unsigned char:

int ext_NNN = isdigit((unsigned char)ext[1]) &&
isdigit((unsigned char)ext[2]) &&
isdigit((unsigned char)ext[3]);

I think I'll write an isdigit wrapper (or even roll my own).

int myisdigit(char c)
{
return '0' <= c && c <= '9';
}

I think that's a portable solution, right?
You're probably about to say that it's ridiculous that a function
intended to operate on characters can't properly handle an argument of
type char -- and I agree.

You read my mind. This is insane.
I think it would have made more sense for
isdigit() and friends to take an argument of type (plain) char. The
problem is that these function are required to handle the value EOF
(typically -1), which is *also* a value char value (if plain char is
signed). I'm not convinced that permitting EOF as an argument was
worthwhile, but we're stuck with it.

On the other hand, when -1 is a valid char value, it typically
*isn't* a digit, or a lowercase or uppercase letter, etc.
$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

That's a reasonable warning, since char values can be negative and a
negative subscript value is usually an error. (Not always; you could be
indexing from a pointer into the middle of an array.)

The warning is good. I was surprised they used an array, and that
isdigit was a macro. But it helped catch the problem, so... great!

Regards.
 
J

James Kuyper

Richard Kettlewell wrote: ....
The bug is in your code. Have a look at the specification of isdigit
(not the implementation!); you will see that its parameter must either
be EOF or the value of an unsigned char.

So it is wrong to write:

char foo[] = "abcd";
isdigit(foo[1]);


Lemme see...

7.4 Character handling <ctype.h>
The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the value of
which shall be representable as an unsigned char or shall equal the value
of the macro EOF. If the argument has any other value, the behavior is
undefined.


I don't understand your objection, in light of parameter conversion.
(IIUC, the plain char argument is implicitly converted to an int.)

Yes, but plain char can be signed, so values thereof can be negative.
While the values of the basic execution character set are guaranteed to
be non-negative, that is not guaranteed for the extended character set.
When a negative char value is converted to int, its value remains
unchanged, and since it is negative, is inherently incapable of being
represented as an unsigned char.
 
K

Keith Thompson

Noob said:
Richard said:
The bug is in your code. Have a look at the specification of isdigit
(not the implementation!); you will see that its parameter must either
be EOF or the value of an unsigned char.

So it is wrong to write:

char foo[] = "abcd";
isdigit(foo[1]);

That's an interesting corner case that's actually guaranteed to work
correctly, but it's still a bad idea.
Lemme see...

7.4 Character handling <ctype.h>
The header <ctype.h> declares several functions useful for classifying
and mapping characters. In all cases the argument is an int, the value of
which shall be representable as an unsigned char or shall equal the value
of the macro EOF. If the argument has any other value, the behavior is
undefined.

I don't understand your objection, in light of parameter conversion.
(IIUC, the plain char argument is implicitly converted to an int.)

Yes, the plain char argument is implicitly converted to int,
so isdigit(foo[1]) passes an argument of the correct *type*.
The problem is that, in general, it's not guaranteed to be a valid
*value*.

Plain char may be either signed or unsigned. If it's signed, then
a char value can be negative, resulting in a negative int value
when it's converted from char to int. Unless the value happens to
be equal to EOF, passing it to isdigit() has undefined behavior.
(And if it does happen to be equal to EOF, then you're probably
not doing what you want to do -- though you'll almost certainly get
the correct result anyway, since (char)EOF is not a decimal digit
in any character set I've ever heard of.)

But as for your example:

char foo[] = "abcd";
isdigit(foo[1]);

the character 'b' is a member of the basic execution character set,
and all such characters are guaranteed to have nonnegative values
(N1570 6.2.5p3).

Nevertheless, just for the sake of consistency, you should always
cast a char value to unsigned char before passing it any of the is*()
or to*() functions declared in <ctype.h>.
 
J

James Kuyper

Keith said:
#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}
[snip]

N1570 7.4p1:
The header <ctype.h> declares several functions useful for
classifying and mapping characters. In all cases the argument
is an int, the value of which shall be representable as an
unsigned char or shall equal the value of the macro EOF. If
the argument has any other value, the behavior is undefined.

Plain char may be a signed type, which means that a value of type char
may be negative (and unequal to EOF), causing undefined behavior when
it's passed to isdigit().

OK, I think I see the problem.

1. If my platform's plain chars are signed (which they are)
2. and if I pass a value greater than CHAR_MAX (e.g. '�')

It's UCHAR_MAX that matters, not CHAR_MAX, and in either case, that's
not an possible problem with your code. For valid values of i, ext <=
CHAR_MAX, and CHAR_MAX < UCHAR_MAX.
It's the possibility that ext < 0 that is the problem.
3. and if the implementation of isdigit uses an array

then a negative index sends it into la-la-land.

That conclusion is valid, even though it conflicts with the premise you
used to reach that conclusion.
Just cast the argument to unsigned char:

int ext_NNN = isdigit((unsigned char)ext[1]) &&
isdigit((unsigned char)ext[2]) &&
isdigit((unsigned char)ext[3]);

I think I'll write an isdigit wrapper (or even roll my own).

int myisdigit(char c)
{
return '0' <= c && c <= '9';
}

I think that's a portable solution, right?

Rolling your own does work for isdigit(), but this is NOT an ideal
solution for most of the other <ctype.h> functions, because their
behavior is locale-dependent. It's better to use a simple wrapper:

inline int myisupper(char c) { return isupper((unsigned char)c); }
 
K

Keith Thompson

Casper H.S. Dik said:
Because it doesn't work when "char" is signed; the char \x255
would be converted to -127 and that is not a valid paramter
to isdigit.

(char)255 (which would be '\xff') would most likely be converted to -1.

And if EOF == -1, which is usually the case, then it *is* a valid
argument to isdigit, which just adds to the confusion. But negative
char values in general are not valid arguments to isdigit(), and can
cause undefined behavior.

IMHO forcing plain char to be unsigned would make a lot more sense. The
signedness of plain char was made implementation-defined for performance
reasons; I don't know to what extent those reasons still apply.
 
C

Casper H.S. Dik

(char)255 (which would be '\xff') would most likely be converted to -1.

Sorry, I completely failed in binary math.
And if EOF == -1, which is usually the case, then it *is* a valid
argument to isdigit, which just adds to the confusion. But negative
char values in general are not valid arguments to isdigit(), and can
cause undefined behavior.
IMHO forcing plain char to be unsigned would make a lot more sense. The
signedness of plain char was made implementation-defined for performance
reasons; I don't know to what extent those reasons still apply.

Yeah; unfortunately you can't change that as it is now encoded in
ABIs.

Casper
 
K

Keith Thompson

Noob said:
Keith said:
Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}
[snip]

N1570 7.4p1:
The header <ctype.h> declares several functions useful for
classifying and mapping characters. In all cases the argument
is an int, the value of which shall be representable as an
unsigned char or shall equal the value of the macro EOF. If
the argument has any other value, the behavior is undefined.

Plain char may be a signed type, which means that a value of type char
may be negative (and unequal to EOF), causing undefined behavior when
it's passed to isdigit().

OK, I think I see the problem.

1. If my platform's plain chars are signed (which they are)
2. and if I pass a value greater than CHAR_MAX (e.g. '©')
3. and if the implementation of isdigit uses an array

then a negative index sends it into la-la-land.

A value of type char cannot be greater than CHAR_MAX.

The problem is that '©' is very likely *negative* on your system.

That's assuming you're using something like the 8-bit Latin-1 encoding.
My system is conigured to use UTF-8 by default, so when I enter a
literal '©' into a program using my text editor, I get a warning about a
multi-character character constant.

But you're more likely to get such a value by reading data from some
outside source rather than having it in a string or character literal.
Just cast the argument to unsigned char:

int ext_NNN = isdigit((unsigned char)ext[1]) &&
isdigit((unsigned char)ext[2]) &&
isdigit((unsigned char)ext[3]);

I think I'll write an isdigit wrapper (or even roll my own).

int myisdigit(char c)
{
return '0' <= c && c <= '9';
}

I think that's a portable solution, right?

I believe so -- but some of the is*() functions have locale-specific
behavior. If you want to wrap the is*() functions, I suggest using a
cast:

int myisdigit(char c) {
return isdigit((unsigned char)c);
}

You could even use a macro:

#define ISDIGIT(c) isdigit((unsigned char)(c))

[...]
The warning is good. I was surprised they used an array, and that
isdigit was a macro. But it helped catch the problem, so... great!

Any standard library function can be implemented as a macro.
The is*() and to*() functions declared in <ctype.h> happen to be
fairly easy to implement that way.
 
K

Keith Thompson

Noob said:
OK, I think I see the problem.

1. If my platform's plain chars are signed (which they are)
2. and if I pass a value greater than CHAR_MAX (e.g. '©')
3. and if the implementation of isdigit uses an array

then a negative index sends it into la-la-land.

Apart from the "greater than CHAR_MAX" part, that's an accurate
description of the way the problem shows up on your system.
You attempted to index before the beginning of the array, not past
its end.

More generally, passing any value that does not satisfy:

(arg >= 0 && arg <= UCHAR_MAX) || arg == EOF

has undefined behavior.
 
D

David Brown

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}
[...]
- Is something wrong with my code, or is this just a QoI issue?

- How would you silence the warnings?

isdigit() is defined as taking an int argument, not a char; cast your
argument to int and the warnings will disappear.

Almost right, but not quite. The <ctype.h> functions take
int arguments, as you say, but their values must be either EOF
or (and this is the important bit) the value of an *unsigned*
char as converted to int. So the correct cast is

int ext_NNN = isdigit((unsigned char) ext[1]) ...

and so on.

It is tempting to save some typing by using an `unsigned char*'
in the first place, as in

int foo(const char *ext) {
const unsigned char *uxt = (const unsigned char*) ext;
int ext_NNN = isdigit(uxt[1]) ...

I *think* this is okay, but type-punning always makes me just a
little bit queasy, so I'd prefer (and recommend) the first option.

An alternative could be to use the "-funsigned-char" option to make
plain char unsigned. But I don't know what other consequences that
might have on x86.
 
I

Ian Collins

Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.
 
B

Barry Margolin

Ian Collins said:
Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.

Possibly to catch specifically this error.
 
K

Keith Thompson

Ian Collins said:
Noob said:
[ NOTE : cross-posted to comp.lang.c and comp.unix.programmer,
please trim as you see fit ]

Hello,

My compiler (gcc 4.7) is being a little fussy about the following code:
(trimmed to a minimum)

#include <ctype.h>
int foo(const char *ext)
{
int ext_NNN = isdigit(ext[1]) && isdigit(ext[2]) && isdigit(ext[3]);
return ext_NNN;
}

$ gcc -Wall -c tutu.c
tutu.c: In function 'foo':
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]
tutu.c:4:3: warning: array subscript has type 'char' [-Wchar-subscripts]

Whatever the implementation of isdigit, why would a char subscript be
worthy of a diagnostic? Yes char is usually signed, but so are short,
int and long.

Probably because a programmer using a char as a subscript very likely
assumed he could use it to implement a lookup table covering the entire
range of char values. (Which is exactly what happened here, though
indirectly.)

Use of a short, int, or long as an index is less likely to indicate an
attempt to use a lookup table covering the entire range of the type.

It's prone to both false negatives and false positives, but most
heuristics are.
 
G

glen herrmannsfeldt

(snip)
An alternative could be to use the "-funsigned-char" option to make
plain char unsigned. But I don't know what other consequences that
might have on x86.

Seems to me the best suggestion. When you could reasonably rely
on character data being plain ASCII, there might have been some
argument against it, especially on processors that included a way
to convert signed char to int that didn't conveniently also allow
unsigned char to int.

As well as I know it, the PDP-11 was popular in the days that
decision might have been made. Maybe VAX, also.

x86 has a complicated history that traces back at least to the 8080.
The 8086 16 bit registers (AX, BX, CX, DX) could also be referened
as two 8 bit registers (AL, AH, BL, BH, etc). In that case, I believe
it would be more natural (easier) to zero the high byte than to sign
extend it, but it is about as easy either way.

(The 8086 was designed to allow a simple source translation
from 8080 assembly, assigning specific 8086 registers to matching 8080
registers.)

VAX has CVTBW and CVTBL for sign extended byte to (16 bit) word
and (32 bit) longword. There are also MOVZBW and MOVZBL for
zero extended move. Seems to me that it is about as easy either way.

-- glen
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,954
Messages
2,570,116
Members
46,704
Latest member
BernadineF

Latest Threads

Top