We suggest to specify a datatype resp. a family of datatypes
ExecutableRealKulischMiranker
depending on roundings *r,* which we present later. These datatypes
consist of

- a
**sort***S[b,l,e1,e2]*which corresponds to the set*S*^{*}:=S(b,l,e1,e2)^{*}, - the arithmetic operations
*+, -, *, /,*where these operations respect the law*RG*for the rounding*r,*and - a rounded sum operator, which computes the value
*r(SUM*where_{i=1}^{n}x_{i}),*n*and__>__1*x*_{i}**e***S*for^{*}*1*__<__i__<__n.

At the first sight it seems to be doubtful that formula *(RG)* can be
met by an executable specification. In order to determine the
approximation *r(x ° y),* the exact but unknown result *x ° y*
seems to be required. But [KM81] shows by means of
algorithms for * ° ***e*** {+,*,-,/}* that whenever *x ° y* is
not representable in the finite floating point system, it is
sufficient to replace it by an appropriate and representable value *x
^{õ} y*. The latter will have the property

- monotone downwardly directed rounding
*\¯/:**forall x***e****R**^{*}: \¯/(x):= max{ y**e***S*^{*}| y__<__x }. - monotone upwardly directed rounding
__/\__:*forall x***e****R**^{*}:__/\__(x):= min{ y**e***S*^{*}| y__>__x }. - monotone rounding toward zero
`[]`_{b}:*forall x***e****R**^{*}, x__>__0:`[]`_{b}(x):= \¯/(x) and*forall x***e****R**^{*}, x < 0:`[]`_{b}(x):= -`[]`_{b}(-x). - monotone rounding away from zero
`[]`_{0}:*forall x***e****R**^{*}, x__>__0:`[]`_{0}(x):=__/\__(x) and*forall x***e****R**^{*}, x < 0:`[]`_{0}(x):= -`[]`_{0}(-x). - roundings
for`[]`_{µ},*µ***e***{1,2, ..., b-1}:*

where*forall x***e****R**^{*}, x**e***[0, b*^{e1 -1}):`[]`_{µ}(x):= 0,*forall x***e****R**^{*}, x__>__b^{e1 -1}:`[]`_{µ}(x):= {*\¯/(x)**, x***e***[\¯/(x), s*_{µ}(x))__/\__(x)*, x***e***[s*_{µ}(x),\¯/(x)],*forall x***e****R**^{*}, x < 0:`[]`_{µ}(x):= -`[]`_{µ}(-x),

If*forall x***e****R**^{*}: s_{µ}:= \¯/(x) + (__/\__(x) - \¯/(x) ) ·µ/b.*b*is an even number,denotes the rounding to the nearest floating point number.`[]`_{b/2}

Further, [KM81] present algorithms to compute the result of the rounded sum operator for all of the above mentioned roundings.

Thus we suggest to specify ExecutableRealKulischMiranker for all theses roundings. This leads - depending on the algorithms presented in [KM81] - to one specification with a parameter for the rounding or to a family of specifications, where each specification "implements" a special rounding.

For a comprehensive discussion of the IEEE standards we refer to [Gol91]. He summarizes the standards as follows:

*There are two different IEEE standards for floating-point
computation. IEEE 754 is a binary standard that requires b=2,
l=24 for single precision and l=53 for double precision
[IEE87]. It also specifies the precise layout of bits in a
single and double precision. IEEE 854 allows either b=2 or b=10
and unlike 754, does not specify how floating-point numbers are
encoded into bits [Cod84]. It does not require a particular
value for l, but instead it specifies constraints on the allowable
values of l for single and double precision.*

The standard IEEE 754 e.g. defines the following format parameters:

parameter single single extended double double extended l24>3253>64e2+127>+1023+1023>16383e1-126<+1022-1022<16382

Both IEEE standards cover the basic operations *+,-,*,/,* and also
specify square root, remainder and the two conversions between integer
and floating point numbers and between the internal format and decimal
output. They require all these operations to be exactly rounded. That
is, the result must be computed exactly and then rounded^{9}. There
are several rounding modes to be provided: the default is rounding to
the nearest floating point number. But also rounding toward *0,*
rounding toward *+ infty* and rounding toward *- infty* are covered.

We suggest to specify both standards as ExecutableReal754 and ExecutableReal854 with suitable parameters like single, double, base, etc.

The datatype
ExecutableRealKulischMiranker and
both specifications based on the IEEE standards,
ExecutableReal754 and
ExecutableReal854,
are on the one hand *independent* of the specifications
BasicReal and
AbstractExecutableReal as they rely
on floating-point systems, which can be realized in CASL without any
connection to the sort *Real*. On the other hand they are *closely related* with these specifications by the roundings.

The main differences between the two proposals are:

**underlying floating point systems:**- Both proposals are based on
*finite*floating point systems. This is the reason for the phrase "executable" in their names.- The first proposal includes the values
*- infty*and*+ infty,*while in the second the minimal and maximal elements are real numbers. - In the floating point system of [KM81] the parameters
base
*b,*length*l,*minimal exponent*e1,*and maximal exponent*e2*can be freely chosen, while the IEEE standards choose the base*b***e***{2, 10},*and give at least constraints on the values of*l,**e1,*and*e2.*

- The first proposal includes the values
**roundings:**- The computer arithmetic of [KM81] is based
on the theory of screens and roundings. It has a larger variety of
roundings than the IEEE standards (even if base
*b = 2*). **operations:**- A significant difference between both proposals is
the rounded sum operator of [KM81]. This operator is
important in numerical computations that deal with vectors: to
compute the scalar product of two vectors with
*n*components, with the rounded sum operator only*n+1*roundings are necessary (*n*roundings for the multiplications and one rounding for the sum), while the IEEE standards need*2n-1*roundings.Compared with the proposal based on [KM81] the IEEE standards include as additional operations square root, remainder and the two conversions.

**transcendental functions:**- The
*table maker's dilemma*is the reason for the different treatment of the transcendental functions in both approaches: While [KM81] does not treat them at all, the IEEE standards do not require them to be exactly rounded.

CoFI Note: M-7 -- Version: 0.2 -- 13 April 1999.

Comments to till@informatik.uni-bremen.de