Martin Wells
I'm trying to come up with a fully-portable macro for supplying memset
with an unsigned char rather than an int. I'm going to think out loud
as I go along. . .
I'll take a sample system before I begin:
CHAR_BIT == 16
sizeof(short) == sizeof(int) == 1
Assume none of the integer types have padding bits
Sign-magnitude
Therefore we have:
UCHAR_MAX == 65535
INT_MIN == -32767
INT_MAX == 32767
Let's say we have an array of bytes and we want to set every byte to
65000. We CANNOT use:
memset(data, 65000, sizeof data);
because the conversion from unsigned integer types to signed integer
types "is implementation-defined or an implementation-defined signal
is raised" if the number is out of range.
Therefore we need to supply memset with an int value which, when
converted to unsigned char, will yield the value we want.
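To make that concrete, here's a minimal sketch of the idea on an ordinary system. The helper name fill_with_uchar_max is made up for illustration; the only library fact it relies on is that memset converts its int argument to unsigned char before storing it.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: sets n bytes to UCHAR_MAX by handing memset
   the int value -1, which memset converts to unsigned char. */
unsigned char *fill_with_uchar_max(unsigned char *buf, size_t n)
{
    assert(buf != NULL);
    return memset(buf, -1, n);
}
```

The int -1 is always in range for int, so nothing implementation-defined happens on the way in.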
The rules for converting from signed to unsigned are as follows:
| If the new type is unsigned, the value is converted
| by repeatedly adding or subtracting one more than
| the maximum value that can be represented in the
| new type until the value is in the range of the new type.
The addition method is easier to understand so we'll go with that one.
If we start off with a negative number like -1, then here's what will
happen:
char unsigned c = -1;
is equal to:
/* Let's pretend we have a signed int type that can hold any number */
infinite_range_int x = -1;
while (0 > x || UCHAR_MAX < x) x += UCHAR_MAX + (infinite_range_int)1;
char unsigned c = x;
So on our own system, this is:
while (0 > x || 65535 < x) x += 65536;
Clearly, if x = -1, then it only takes one iteration of the loop to
yield 65535, i.e. UCHAR_MAX.
Therefore, if we want UCHAR_MAX-1, then we'd use (int)-2.
For UCHAR_MAX-2, we'd use (int)-3.
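The rule can be checked with a small runnable sketch. I'm assuming here that long long is wide enough to stand in for the pretend infinite_range_int (true wherever long long is wider than unsigned char), and the function name to_uchar_by_rule is my own invention.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: applies the quoted conversion rule literally,
   adding or subtracting UCHAR_MAX + 1 until the value is in range.
   long long stands in for the "infinite range" integer (assumption). */
long long to_uchar_by_rule(long long x)
{
    const long long modulus = (long long)UCHAR_MAX + 1;

    while (x < 0) x += modulus;          /* the "addition method" */
    while (x > UCHAR_MAX) x -= modulus;  /* the subtraction method */

    assert(x >= 0 && x <= UCHAR_MAX);
    return x;
}
```

Calling to_uchar_by_rule(-1) gives UCHAR_MAX, and the result should agree with a plain (unsigned char) cast for any value.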
The entire set of data looks something like:
   int      char unsigned
    -1          65535
    -2          65534
    -3          65533
    -4          65532
    -5          65531
    -6          65530
    -7          65529
    -8          65528
    -9          65527
   -10          65526
   -11          65525
   -12          65524
  ....
  ....
-32764          32772
-32765          32771
-32766          32770
-32767          32769
-32768          32768   <--
Now I've just realised a problem. An unsigned char can store 65536
different values (i.e. 0 through 65535), but an int can only
store 65535 different values (i.e. -32767 through 32767) if we're
using something other than two's complement. I don't know what I'll do
about that, but for now I'll try to continue with the other two number
systems:
#if NUMBER_SYSTEM != SIGN_MAGNITUDE
#define UC_AS_INT(x) /* Whatever we're going to do */
#endif
My first thought is something like:
#define UC_AS_INT(x) UC_AS_INT_Internal( (char unsigned)(x) )
#define UC_AS_INT_Internal(x) ( (x) > INT_MAX \
? -(int)(UCHAR_MAX - (x)) - 1 \
: (int)(x) )
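Here's the same idea sketched as a function rather than a macro, so it can be compiled and tested. The name uc_as_int is hypothetical, and the assert is my addition: it fires for the one value that has no int counterpart when INT_MIN == -INT_MAX. On a system where int can hold every unsigned char value, the special branch is compiled out.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical function form of UC_AS_INT: returns an int which,
   converted back to unsigned char, yields uc. */
int uc_as_int(unsigned char uc)
{
#if UCHAR_MAX > INT_MAX
    if (uc > (unsigned)INT_MAX) {
        unsigned distance = UCHAR_MAX - uc;  /* 0 .. INT_MAX */
        /* Guard (my assumption about desired behaviour): refuse the
           value that would need INT_MIN - 1 on sign-magnitude or
           one's complement. */
        assert(-(int)distance > INT_MIN);
        return -(int)distance - 1;
    }
#endif
    return (int)uc;
}
```

The intended property is that (unsigned char)uc_as_int(v) == v for every v the int type can express this way.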
Anyway it's Friday and I've stuff to do, but if anyone wants to finish
it off then feel free!
If we can't get all 65536 combinations out of one's complement or sign-
magnitude, then we can just have a macro that changes it to:
char unsigned *p = data;
char unsigned const *const pover = data + sizeof data;
while (pover != p) *p++ = c;
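Wrapped up as a function, that fallback might look like the sketch below. The name fill_bytes and its memset-like signature are my own choices, not anything standard; it takes the fill value as an unsigned char directly, so no int conversion is involved at all.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memset replacement: stores c into each of the n bytes
   starting at dst and returns dst. */
void *fill_bytes(void *dst, unsigned char c, size_t n)
{
    assert(dst != NULL);

    unsigned char *p = dst;
    unsigned char const *const pover = p + n;  /* one past the end */

    while (pover != p) *p++ = c;
    return dst;
}
```

Usage would be something like fill_bytes(data, 65000, sizeof data) on the sample system, where 65000 fits in an unsigned char.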
Martin