implementing RealFloat decodeFloat

Hi everyone,

I've been working on [0] Haskell bindings for [1] libqd for [2]
double-double and quad-double arithmetic, and have been struggling to
implement [3] RealFloat, in particular [4] decodeFloat, mostly because
of its postcondition but also because of some issues relating to
variable precision:

---8<---
If decodeFloat x yields (m,n), then x is equal in value to m*b^^n,
where b is the floating-point radix, and furthermore, either m and n
are both zero or else b^(d-1) <= m < b^d, where d is the value of
floatDigits x. In particular, decodeFloat 0 = (0,0).
---8<---

(BTW: that should probably really be "... <= abs m < ..."; perhaps this
code could be added to the report and/or documentation, if it is
correct:

validDecode :: RealFloat f => f -> Bool
validDecode f = case decodeFloat f of
    (0, 0) -> True
    (m, e) -> b ^ (d - 1) <= abs m && abs m < b ^ d
  where
    b = floatRadix f
    d = floatDigits f
)

The double-double and quad-double types do not have a fixed precision,
though they do have a fixed minimum precision: this means that
decodeFloat will (in general) lose some precision. This is because, for
example, in:

data DoubleDouble = DoubleDouble !CDouble !CDouble
dd = DoubleDouble a b

the semantics are that "dd = a + b", with |a| >> |b|; coupled with the
IEEE implicit leading 1 bit, this means that there may be large gaps
between exponents: for example, "1 + 0.5**100 :: DoubleDouble".

So far I've got a "mostly working" implementation, thus:

decodeFloat !(DoubleDouble a b) = case (decodeFloat a, decodeFloat b) of
    ((0, 0), (0, 0))     -> (0, 0)
    ((0, 0), (m, e))     -> (m `shiftL` f, e - f)
    ((m, e), (0, 0))     -> (m `shiftL` f, e - f)
    ((m1, e1), (m2, e2)) ->
      let fixup m e =  -- renormalise until the mantissa lies in [mMin, mMax)
            if m < mMin then fixup (m `shiftL` 1) (e - 1)
            else if m >= mMax then fixup (m `shiftR` 1) (e + 1)
            else (m, e)
          mMin = 1 `shiftL` (ff - 1)
          mMax = 1 `shiftL` ff
          ff = floatDigits (0 :: DoubleDouble)
          g = e1 - e2 - f  -- align the low mantissa with the shifted high one
      in fixup ((m1 `shiftL` f) + (m2 `shiftR` g)) (e1 - f)
  where f = floatDigits (0 :: CDouble)

This does meet the postcondition as specified (violating it leads to
breakage in other RealFloat methods), but it has a recursion with no
termination proof (so far), and it is lossy in general:
let check f = uncurry encodeFloat (decodeFloat f) == f
check (1 + 0.5 ** 120 :: DoubleDouble)
False
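(To make the loss concrete: assuming floatDigits (0 :: DoubleDouble) is
2 * 53 = 106, i.e. twice the CDouble precision, the exact value
1 + 0.5**k needs a mantissa of k + 1 bits, so it survives the clamp to
floatDigits only while k + 1 <= 106; under that assumption
1 + 0.5**100 should round-trip, while 1 + 0.5**120 cannot. The
hypothetical helper below just counts those bits:

-- bit length of the integer 2^k + 1, i.e. the mantissa width needed
-- to hold 1 + 0.5**k exactly
exactMantissaWidth :: Int -> Int
exactMantissaWidth k =
  length (takeWhile (> 0) (iterate (`div` 2) (2 ^ k + 1 :: Integer)))

-- exactMantissaWidth 100 == 101   -- fits in the assumed 106 bits
-- exactMantissaWidth 120 == 121   -- does not fit, so the tail is dropped
)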
It does however seem to meet a weaker condition:
let check2 f = (decodeFloat . (`asTypeOf` f) . uncurry encodeFloat . decodeFloat $ f) == decodeFloat f
check2 (1 + 0.5 ** 120 :: DoubleDouble)
True
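(The weaker condition can also be stated as a QuickCheck property, so it
can be exercised over many values rather than a single example. This is
only a sketch: it assumes Arbitrary and Show instances for DoubleDouble,
which the bindings may not provide:

import Test.QuickCheck

-- decodeFloat should be stable under one encodeFloat/decodeFloat round trip
-- (assumes Arbitrary and Show instances for DoubleDouble)
prop_decodeStable :: DoubleDouble -> Bool
prop_decodeStable f =
  decodeFloat (uncurry encodeFloat (decodeFloat f) `asTypeOf` f)
    == decodeFloat f

Running quickCheck prop_decodeStable would then test the same property
as check2 on many random values.)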
Questions:

1. Is this weaker condition likely to be good enough in practice?

2. Can my implementation of decodeFloat be simplified?

Thanks for any insights,


Claude

[0] http://hackage.haskell.org/package/qd
[1] http://crd.lbl.gov/~dhbailey/mpdist/
[2] http://crd.lbl.gov/~dhbailey/dhbpapers/arith15.pdf
[3] http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.htm...
[4] http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.htm...

--
http://claudiusmaximus.goto10.org

Claude Heiland-Allen wrote:
I've been working on [0] Haskell bindings for [1] libqd for [2] double-double and quad-double arithmetic,
At the ICIAM conference in Zurich in 2007, I heard about the summation algorithm using two double-precision floating-point numbers in three talks. :-) Nice to know that those ideas can be found in a library.
The double-double and quad-double types do not have a fixed precision, though they do have a fixed minimum precision: this means that decodeFloat will (in general) lose some precision. This is because for example in:
data DoubleDouble = DoubleDouble !CDouble !CDouble
dd = DoubleDouble a b
the semantics are that "dd = a + b", with |a|>>|b|; coupled with the IEEE implicit leading 1 bit, this means that there may be large gaps between exponents: for example: "1 + 0.5**100 :: DoubleDouble".
They are possible, but do they really occur? I thought the intention of using two CDoubles is to double the precision, not more. I can hardly imagine that the library's algorithms make use of numbers like (1 + 0.5**100). Does the library specify anything about maximum precision?

I would just represent (1 + 0.5**100) as largeInteger * 2^(-largeExponent). You can represent all such numbers without loss of precision, but with loss of memory. :-)
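A minimal sketch of that idea (with DoubleDouble as defined in the
original mail; the result is exact, but the mantissa is no longer
bounded by floatDigits, so this cannot serve as decodeFloat itself):

import Data.Bits (shiftL)
import Foreign.C.Types (CDouble)

data DoubleDouble = DoubleDouble !CDouble !CDouble  -- as in the original mail

-- combine both components exactly into a single Integer mantissa:
-- a + b = ma*2^ea + mb*2^eb = (ma*2^(ea-e) + mb*2^(eb-e)) * 2^e
-- where e = min ea eb, so no bits are dropped
exactDecode :: DoubleDouble -> (Integer, Int)
exactDecode (DoubleDouble a b) =
  let (ma, ea) = decodeFloat a
      (mb, eb) = decodeFloat b
      e = min ea eb
  in (ma `shiftL` (ea - e) + mb `shiftL` (eb - e), e)
-- (DoubleDouble 0 0 still yields (0, 0), since decodeFloat 0 = (0, 0))

The price is exactly the "loss of memory" above: the mantissa width
grows with the gap between the two exponents.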
participants (2):
- Claude Heiland-Allen
- Henning Thielemann