James Kanze said:
Markus Moll wrote in message...
I asked My Second Virtual Cousin (twice added), and he said nothing! <G>
In Assembler, I used to use eight-byte(dd) and ten-byte(dt) types. That's
not even close to "twice the size" (If we're talking number of bits).
[ assembler == a386 ]
And g++ on a PC, at least in some configurations, uses 8 and 12
bytes.
At the hardware level, there are 10 bytes of information in an
Intel long double.
True, with some additional considerations. The commonly used IEEE 754
floating point formats are
single precision: 32 bits including 1 sign bit, 23 significand bits
(with an implicit leading 1, for 24 total), and 8 exponent bits
double precision: 64 bits including 1 sign bit, 52 significand bits
(with an implicit leading 1, for 24 total), and 11 exponent bits
double extended precision: 80 bits including 1 sign bit, 64 significand
bits (no implicit leading 1), and 15 exponent bits.
The native format of the x87 FPU is "double extended". We run into this
from time to time when floating point computations compile with more
optimization give slightly different results. The issue is that one
optimiziation is to hold intermediate results in 80-bit FPU registers
instead of rounding them down to fit in 64-bit memory locations. gcc
offers the -ffloat-store to suppress this optimization for that very
reason.
In addition, the x87 control register allows one to select between
extended (the default), double and single precision. Try this (on
Linux/gcc/glibc), for example:
#include <fpu_control.h>
void set_double(void)
{
unsigned short cw;
_FPU_GETCW(cw);
cw = (cw & ~_FPU_EXTENDED) | _FPU_DOUBLE;
_FPU_SETCW(cw);
}
and similarly for "set_extended". Note that this will make all kinds of
trouble for you because libm depends on having the FPU in extended
precision mode.
Now comes the real kicker: the SSE/SSE2/SSE3 vector co-processors do not
support double extended precision; only single and double precision.
gcc and icc are both using these co-processors pretty extensively now,
so you're not guaranteed to get even intermediate results done in
double-extended arithmetic.
Going back on-topic, if you want to peek at the binary representations
of IEEE floating-point numbers, you might enjoy the template included
below. I supply typedefs for single and double precision, writing the
typedef for double extended precision is left as an exercise to the
reader.
// IEEE Floating-Point template
// Copyright (C) 2007 Charles M. "Chip" Coldwell <
[email protected]>
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <
http://www.gnu.org/licenses/>.
#ifndef IEEEFLOAT_HH
#define IEEEFLOAT_HH
template
<typename _float_t, typename _sint_t, typename _uint_t, int _mbits, int _ebits>
class ieee_float {
public:
typedef _uint_t uint_t;
typedef _sint_t sint_t;
typedef _float_t float_t;
enum { mbits = _mbits, ebits = _ebits };
#ifdef __BIG_ENDIAN__
uint_t s:1;
uint_t e:ebits;
uint_t m:mbits;
#else
uint_t m:mbits;
uint_t e:ebits;
uint_t s:1;
#endif
static const uint_t mdenom = ((uint_t)1 << mbits);
static const uint_t ebias = ((uint_t)1 << (ebits - 1)) - 1;
sint_t sign(void) const { return 1 - 2*s; }
sint_t exponent(void) const { return e ? e - ebias : -(ebias - 1); }
uint_t mantissa(void) const { return ((uint_t)(!!e) << mbits) | m; }
bool infinity(void) const { return (e == ((1 << ebits) - 1)) && (m == 0); }
bool nan(void) const { return (e == ((1 << ebits) - 1)) && (m != 0); }
bool denormal(void) const { return (e == 0) && (m != 0); }
ieee_float(float_t f) { *reinterpret_cast<float_t *>(this) = f; }
ieee_float(uint_t u) { *reinterpret_cast<uint_t *>(this) = u; }
operator float_t() const { return *reinterpret_cast<const float_t *>(this); }
operator uint_t() const { return *reinterpret_cast<const uint_t *>(this); }
};
typedef ieee_float<float, int, unsigned, 23, 8>
single_precision;
typedef ieee_float<double, long long, unsigned long long, 52, 11>
double_precision;
#endif
Chip