IEEE-754

  • Thread starter Roman Töngi

Roman Töngi

IEEE-754 Arithmetic:
Most real numbers can't be stored exactly on a computer, but one can
state the range within which the corresponding machine number lies.

For the following example, I assume double precision, that the rounding
mode in effect is 'round to nearest', and that the number lies within the
normalized range:

Definitions:
x := real number
round(x) := correctly rounded normalized number
eps := machine epsilon (2^(-52) for double precision)
abs(x) := absolute value of x

That is:

round(x) = x*(1 + delta)

with delta:

abs(delta) <= 1/2*eps (round to nearest)

i.e. abs(delta) <= 2^(-53) (double precision)

abs(delta) corresponds to the relative rounding error.

Now I can state the range including round(x):

------------------------------------------
x*(1-2^(-53)) <= round(x) <= x*(1+2^(-53))
------------------------------------------

Is this the correct range according to my assumptions?



Thanks a lot
Roman
 
Boudewijn Dijkstra

On Thu, 23 Aug 2007 12:45:52 +0200, Roman Töngi wrote:
> IEEE-754 Arithmetic:
> Most real numbers can't be stored exactly on a computer, but one can
> state the range within which the corresponding machine number lies.
>
> For the following example, I assume double precision, that the rounding
> mode in effect is 'round to nearest', and that the number lies within the
> normalized range:
>
> Definitions:
> x := real number
> round(x) := correctly rounded normalized number
> eps := machine epsilon (2^(-52) for double precision)
> abs(x) := absolute value of x
>
> That is:
> round(x) = x*(1 + delta)
>
> with delta:
> abs(delta) <= 1/2*eps (round to nearest)
> i.e. abs(delta) <= 2^(-53) (double precision)
>
> abs(delta) corresponds to the relative rounding error.
>
> Now I can state the range including round(x):

Yes, but your assumptions are invalid. How did you arrive at a machine
epsilon of 2^(-52)?
 
Eric Sosman

Roman Töngi wrote on 08/23/07 06:45:
> IEEE-754 Arithmetic:
> Most real numbers can't be stored exactly on a computer, but one can
> state the range within which the corresponding machine number lies.
>
> For the following example, I assume double precision, that the rounding
> mode in effect is 'round to nearest', and that the number lies within the
> normalized range:
>
> Definitions:
> x := real number
> round(x) := correctly rounded normalized number
> eps := machine epsilon (2^(-52) for double precision)
> abs(x) := absolute value of x
>
> That is:
>
> round(x) = x*(1 + delta)
>
> with delta:
>
> abs(delta) <= 1/2*eps (round to nearest)
>
> i.e. abs(delta) <= 2^(-53) (double precision)
>
> abs(delta) corresponds to the relative rounding error.
>
> Now I can state the range including round(x):

It looks right to me for x >= 0 (for x < 0 the
inequalities are backwards), and given suitable hand-
waving for abs(x) very small or very large. It might
be possible (I'm not sure) to sharpen the analysis a
tiny bit and change a `<=' to a `<', but whether that's
worth trying depends on your purpose in obtaining the
bound in the first place.

Note that the C language does not require IEEE
floating-point, nor does it require round-to-nearest,
nor does it specify the value of eps.
 
Roman Töngi

Boudewijn said:
> Yes, but your assumptions are invalid. How did you arrive at a machine
> epsilon of 2^(-52)?

From the IEEE specification for the double format.
 
Boudewijn Dijkstra

On Thu, 23 Aug 2007 18:08:15 +0200, Roman Töngi wrote:
> From the IEEE specification for the double format.

I asked how, not where. Unless it says something like: "the machine
epsilon is 2^(-52); this corresponds to the upper limit of the rounding
error."
 
cr88192

Boudewijn Dijkstra said:
> I asked how, not where. Unless it says something like: "the machine
> epsilon is 2^(-52); this corresponds to the upper limit of the rounding
> error."

my guess (probably OT here, oh well):

it will be this, presumably, unless the machine computes using fewer bits
than the format (for example, if the calculations were internally performed
with floats, or with 48-bit mantissa values, or such).

it may really be a little higher, as presumably the exact values of the
low-order bits will depend on the exact HW.

for example, calculations performed with doubles in SSE are often slightly
off from those performed in the x87 FPU, given that the FPU uses an internal
80-bit representation (with a 64-bit mantissa).

now, if our basic value is 1, and things are properly normalized (I think
this is required, except in the edge case of very small values), then our
epsilon is about the same as the relative weight of our low order bits.


now, if the major value were something other than 1, then the epsilon would
differ, in step with the exponent.

or such...
 
Boudewijn Dijkstra

> now, if our basic value is 1, and things are properly normalized (I think
> this is required, except in the edge case of very small values), then our
> epsilon is about the same as the relative weight of our low order bits.
>
> now, if the major value were something other than 1, then the epsilon
> would differ, in step with the exponent.
>
> or such...

Exactly. The epsilon will be proportional to the exponent.
 
Boudewijn Dijkstra

On Mon, 27 Aug 2007 15:52:30 +0200, Peter J. Holzer wrote:
> And now read the OP again.

You're beyond me now. The OP was talking about a constant epsilon for the
whole range of numbers within the normalized range. Or did you read
something else between lines?
 
CBFalconer

Boudewijn said:
> You're beyond me now. The OP was talking about a constant epsilon
> for the whole range of numbers within the normalized range. Or
> did you read something else between lines?

You are the first I have noted to consider 'proportional' to denote
a constant.
 
Peter J. Holzer

> You're beyond me now. The OP was talking about a constant epsilon for the
> whole range of numbers within the normalized range.

Yes, but that epsilon was always multiplied by the number:

| round(x) = x*(1 + delta)
             ^ here
|
| with delta:
|
| abs(delta) <= 1/2*eps (round to nearest)
|
| i.e. abs(delta) <= 2^(-53) (double precision)
|
| abs(delta) corresponds to the relative rounding error.
|
| Now I can state the range including round(x):
|
| ------------------------------------------
| x*(1-2^(-53)) <= round(x) <= x*(1+2^(-53))
  ^ here                       ^ here
| ------------------------------------------

This is, afaik, the normal use of eps. See for example
http://en.wikipedia.org/wiki/Machine_epsilon.

> Or did you read something else between lines?

No, I just read the lines.

hp
 
Boudewijn Dijkstra

On Tue, 28 Aug 2007 13:59:18 +0200, CBFalconer wrote:
> You are the first I have noted to consider 'proportional' to denote
> a constant.

You could note that, but it'd be more correct to note that I wasn't
denoting a constant, but an entity identified by the OP as a constant.
 
Boudewijn Dijkstra

On Wed, 29 Aug 2007 00:45:30 +0200, Peter J. Holzer wrote:
> Yes, but that epsilon was always multiplied by the number:
>
> | round(x) = x*(1 + delta)
>              ^ here

Yes, you're right. I was being incredibly thick (which doesn't usually
happen (for this long)).
 
