Chris Croughton
I think the only valid concern is that tolower(char_type) might
be invoked mistakenly, for some negative (char) value. This
won't happen for the basic character set,
Correct.
nor for the most
common codesets for *defined* character codes,
Incorrect. The most common character sets in western Europe are the
ISO 8859-x ones (ISO 8859-1 is commonly known as Latin-1; Microsoft's
Windows character sets for English-speaking versions are largely based
on that). Their national characters all have the top bit of the char set.
but could happen on some platforms if random garbage values are passed
to tolower().
Or perfectly valid national characters, in many cases with a single
keystroke on a national keyboard.
In practice this could occur when the character
codes come from a hostile user, for example.
They don't have to be hostile -- nor non-English-speaking. Shift-3 on a
UK keyboard (we speak English in the UK, mostly) is the British pound
sign (looks like a stylised L with a line through it), and that is value
0xA3 (163 unsigned, -93 signed). It's very likely to be typed by a user
in a text field or document.
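For illustration, here is a minimal sketch of the usual defensive idiom:
convert to unsigned char before calling anything from ctype.h. The names
ctype_arg and safe_tolower are hypothetical, not from any standard
library.

```c
#include <ctype.h>

/* Hypothetical helper: the value a ctype.h function should be given
   for a plain char. The cast maps negative values into 0..UCHAR_MAX. */
static int ctype_arg(char c)
{
    return (unsigned char)c;
}

/* Hypothetical wrapper: tolower() for a plain char, safe whether
   plain char is signed or unsigned on this platform. */
static char safe_tolower(char c)
{
    return (char)tolower(ctype_arg(c));
}
```

Where plain char is signed and 8 bits wide, (char)0xA3 holds -93;
ctype_arg() turns it back into 163, which tolower() is required to
accept.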
The most likely
actual risk is denial of service due to crashing the process
with an illegal memory reference.
With potential loss of data and revenue as high as you can imagine.
The "more secure library" TR under current development by WG14
Where can I find that? It's listed on the JTC1/SC22/WG14-C page[1]
under the link "TR 24731: Programming language C - Specification for
secure C library functions", but the page that link points to[2] doesn't
mention it (it does mention and provide links to the other TRs in
progress).
[1] http://www.open-std.org/jtc1/sc22/wg14/
[2] http://www.open-std.org/jtc1/sc22/wg14/www/projects#24731
is meant to provide a "drop-in" (easy automated editing) way to
catch such abuses in existing, not-so-carefully-constructed
applications. The alternative is to do a better job in the
original design and coding.
A better alternative would be to (a) make plain char unsigned (some very
few non-conforming programs might have problems) or (b) extend the range
of the ctype.h functions and macros to include the range CHAR_MIN to -1
(which would waste all of 128 bytes on some systems and otherwise hurt
no one).
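As a rough sketch of what (b) could look like, assuming a table-driven
implementation: the table simply starts at CHAR_MIN instead of 0. The
names ext_init and ext_tolower are hypothetical, and a real library
would build its table per locale rather than by calling tolower().

```c
#include <ctype.h>
#include <limits.h>

/* One entry for every value from CHAR_MIN up to UCHAR_MAX. With a
   signed 8-bit char that is 384 entries instead of 256, i.e. the
   128 extra bytes. With an unsigned char nothing changes. */
#define EXT_TABLE_SIZE (UCHAR_MAX - CHAR_MIN + 1)

static unsigned char ext_lower_table[EXT_TABLE_SIZE];

/* Fill the table once. A negative value v gets the same entry as the
   character code it represents, (unsigned char)v. */
static void ext_init(void)
{
    for (int v = CHAR_MIN; v <= UCHAR_MAX; v++) {
        ext_lower_table[v - CHAR_MIN] =
            (unsigned char)tolower((unsigned char)v);
    }
}

/* Accepts any value in CHAR_MIN..UCHAR_MAX. Caveat of this sketch:
   -1 is treated as character code UCHAR_MAX, so it cannot also pass
   EOF through unchanged the way the standard tolower() does. */
static int ext_tolower(int c)
{
    return ext_lower_table[c - CHAR_MIN];
}
```

With this, ext_tolower() gives the same answer for the pound sign
whether the caller passes it as -93 or as 163.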
One could, of course, use unsigned char explicitly for all arrays -- and
then lose all of the functions in string.h (or have to cast for every
use) because they rightly cause diagnostics if called with a pointer to
unsigned char. Or use type punning or multiple pointers to the same
object, both of which are unsafe. All to get round a design flaw which
'saves' all of 128 bytes typically.
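To show the cost of the "unsigned char everywhere" route, here is a
small sketch. The ctype.h call needs no cast, but every string.h call
does. The names ustrlen and lower_in_place are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: strlen() takes const char *, so an unsigned
   char buffer needs a cast at every call (or a wrapper like this). */
static size_t ustrlen(const unsigned char *s)
{
    return strlen((const char *)s);
}

/* Lower-case a string in place. Each element is already in the range
   0..UCHAR_MAX, so it can go straight to tolower(). */
static void lower_in_place(unsigned char *s)
{
    size_t n = ustrlen(s);
    for (size_t i = 0; i < n; i++) {
        s[i] = (unsigned char)tolower(s[i]);
    }
}
```

The same cast or wrapper is needed for strcpy(), strcmp(), strchr()
and the rest, which is the inconvenience described above.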
Chris C