James Harris
I'm trying to make sense of the standard C data sizes for floating
point numbers. I guess the standards were written to accommodate some
particular floating point engines that were popular at one time, but I
can only find references to the number of decimal digits, etc.
Basically, if I wanted to specify C-sized reals in a language that
only accepted bit widths, e.g.
float(exponent 8, mantissa 24)
I'm looking for what numbers would be needed for the exponent and
mantissa sizes to accurately mirror the C standard minimum widths. Not
sure my log calcs are correct.
AFAIK the sizes for real numbers must be at least
float: range 10^+/-37, precision 6 digits
double: range 10^+/-37, precision 10 digits
I think this means the number of bits used would be
float: 8 bits for exponent, 20 bits for mantissa
double: 8 bits for exponent, 33 bits for mantissa
These are much smaller than (and thus could be represented by) the
IEEE 754 formats, which have
ieee single precision: 8 bits for exponent, 24 bits for mantissa
ieee double precision: 11 bits for exponent, 53 bits for mantissa
In all cases the mantissa bits include the sign. Are my figures
correct for the number of bits needed for a minimal C representation,
above? The double of 33 bits, especially, looks wrong.
Does the C standard specify /at least/ 10 digits of precision for
doubles or is it /about/ 10 digits? Or should it be at least /9/
digits and the mantissa 32 bits (making 40 in all with the exponent)?