simple answer:
char is normally signed (granted, not all C compilers agree on this, as a
few older/oddball compilers have made it default to unsigned).
<--
who says it's normally signed? I've seen compilers that made it
optional.
Since it hardly ever matters I don't understand why you care.
-->
it is normally signed, since this is what a majority of the compilers on a
majority of the common architectures do.
granted, it is not safe to rely on this, and hence I often use an explicit
signed type if it really matters.
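(as a quick sanity check, CHAR_MIN from <limits.h> is 0 if plain char is
unsigned and negative if it is signed, so a minimal sketch like this will
tell you what a given compiler does...)

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 when plain char is unsigned, negative when signed */
    if (CHAR_MIN < 0)
        printf("plain char is signed here\n");
    else
        printf("plain char is unsigned here\n");
    return 0;
}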
so 'char'=='character' is a misnomer (historical accident?...)
<--
it's an historical fact. It's hardly a misnomer, an accident or even an
error.
char is a C type for holding characters. I agree it might have been
a good idea to have a byte type as well.
-->
but, to have it hold characters, be of a fixed size, and signed?...
I would rather have had a separate byte type, and left "char" as a
machine-dependent type, similar to short or int.
since for
most practical uses, ASCII and UTF-8 chars are better treated as unsigned
<--
why should ASCII be unsigned? ASCII fits in 7 bits. Even extended
ASCIIs
still manage fine as signed values.
-->
errm, not really.
in practice, extended ASCII sets are generally defined as, and assumed to
be, within the 128-255 range...
likewise, signedness will generally not mix well with things like
encoding/decoding UTF-8 chars, ...
so, it is common practice in my case to cast to "unsigned char" when doing
things involving UTF-8, ... but otherwise leave strings as the more
traditional "char *" type.
(we just use 'char' as a matter of tradition, and cast to unsigned char
wherever it matters),
<--
it never matters with character data. I use unsigned char when I'm
manipulating external representations (bytes or octets)
-->
it matters with character data if it happens to be UTF-8.
many simple strategies for working with text can break fairly hard if the
text is UTF-8 and the bytes are treated as signed.
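as a minimal sketch of where it bites (a hypothetical helper, assuming the
usual UTF-8 lead-byte ranges): with a signed char, a lead byte like 0xC3
reads as a negative value, so naive range tests silently fail unless one
casts...

#include <stddef.h>

/* hypothetical helper: bytes in one UTF-8 sequence, from its lead byte.
 * the cast is the important part: if plain char is signed, a lead byte
 * like 0xC3 compares as a negative value and the >= tests never match. */
static size_t utf8_seq_len(const char *s)
{
    unsigned char c = (unsigned char)*s;

    if (c < 0x80)  return 1;   /* plain ASCII */
    if (c >= 0xF0) return 4;
    if (c >= 0xE0) return 3;
    if (c >= 0xC0) return 2;
    return 1;                  /* continuation or invalid byte */
}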
and for most other uses (where we want a signed byte),
<--
that is, hardly ever. I'm tempted to say "never" as I don't think
I've ever needed tiny little integers. But I can imagine uses
for TLIs.
-->
there are many cases, especially if one does things involving image
processing or signal processing...
one needs them much like one needs 16-bit floats, although, granted, there
are other, typically more convenient, ways of shoving floating-point values
into 8- or 16-bit quantities in the absence of a small floating-point type
(typically revolving around log or sqrt...).
'fixed point' is also sometimes appropriate, but in these cases it really
depends on the data.
memory is not free, hence it matters that it not all be wasted
frivolously...
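for example, a minimal sketch of the fixed-point case (hypothetical
pack/unpack helpers, assuming samples normalized to [-1, 1]):

#include <stdint.h>

/* pack a sample in [-1.0, 1.0] into a signed byte via simple fixed-point
 * scaling; a log- or sqrt-based mapping would instead trade resolution
 * near zero for more dynamic range. */
static int8_t pack_s8(float x)
{
    if (x >  1.0f) x =  1.0f;   /* clamp to the representable range */
    if (x < -1.0f) x = -1.0f;
    return (int8_t)(x * 127.0f);
}

static float unpack_s8(int8_t v)
{
    return (float)v / 127.0f;
}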
thinking of 'char' as 'character' is misleading
<--
I disagree
-->
it is misleading if your string happens to be UTF-16...
then, suddenly, char is unable to represent said characters...
even with UTF-8, 'char' is not able to represent a character, only a single
byte which could be part of a multi-byte character.
hence, the issue...
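(to illustrate with a hypothetical snippet: e-acute, U+00E9, is a single
character, but two bytes in UTF-8 and one 16-bit code unit in UTF-16...)

#include <stdint.h>

/* "\xC3\xA9" is U+00E9 (e with acute accent): one character, but two
 * 'char' slots in UTF-8; in UTF-16 it is a single 16-bit unit, which a
 * plain 'char' cannot hold at all. */
static const char     utf8_e[]  = "\xC3\xA9";
static const uint16_t utf16_e[] = { 0x00E9, 0 };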
(note that there are many
cases where a signed 8-bit value actually makes some sense).
<--
such as?
-->
signal-processing related numeric functions, small geometric data, ...
you "could" store everything as floats, and then discover that one is eating
up 100s of MB of memory on data which could easily be stored in much less
space (say, 1/4 the space).
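(a hypothetical example of the sort of data meant here: quantized surface
normals...)

#include <stdint.h>

/* hypothetical example: a unit normal as three floats is 12 bytes;
 * quantized to signed bytes it is 3, which is where the "1/4 the space"
 * sort of figure comes from over large arrays of them. */
struct normal_f32 { float  x, y, z; };   /* 12 bytes */
struct normal_s8  { int8_t x, y, z; };   /*  3 bytes (before padding) */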
many other (newer) languages reinterpret things,
but that doesn't matter
<snip>
<--
> in my own uses, I typically use typedef to define 'byte' as 'unsigned
> char'
I commonly do this
> and 'sbyte' as 'signed char'.
I never do this
> I also use 'u8' and 's8' sometimes.
I dislike these. Seems to be letting the metal show through
-->
s8/u8, s16/u16, s32/u32, ...
these are good for defining structures where values are expected to be
specific sizes...
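for example, a minimal sketch of that sort of thing (the typedefs here just
lean on <stdint.h>, which may or may not match how a given project defines
them; the struct is hypothetical):

#include <stdint.h>

typedef uint8_t  u8;   typedef int8_t   s8;
typedef uint16_t u16;  typedef int16_t  s16;
typedef uint32_t u32;  typedef int32_t  s32;

/* hypothetical example: an on-disk/wire header where every field is a
 * fixed width, so (assuming the usual alignment rules, which add no
 * padding here) the struct layout matches the external format. */
struct blob_header {
    u8  magic[4];   /* format tag          */
    u16 version;    /* format version      */
    u16 flags;
    u32 length;     /* payload size, bytes */
};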
my (newer) x86 interpreter sub-project uses these sorts of types
extensively, mostly because, with x86 machine code, things matter down to
the single bit...
many other tasks may involve similar levels of bit-centric twiddling, and so
the naming may also hint at the possible use of bit-centric logic code...
however, for most more general tasks, I use byte and sbyte instead...