ANN: ieee version 0.7

Patrick Perry

20 Sep 2010 20 Sep '10

1:49 a.m.

ieee is a Haskell library for dealing with IEEE floating numbers. It was originally written to make testing with floating point values less painful. The library provides an "approximate equality" type class, AEq, with approximate equality operator "~==". One property of "~==" is that nearby floating point numbers are deemed equivalent, so that, for example, "1 ~== 1.00000001" evaluates to True. Documentation is on hackage: http://hackage.haskell.org/package/ieee Changes since the last release: * Add IEEE type class with instances for Double, Float, CDouble, and CFloat * Add functions for getting/setting NaN payloads * Add succIEEE/predIEEE for advancing up and down the IEEE number line (ported from Tango's nextUp and nextDown) * Add bisectIEEE for midpoints of two numbers on the IEEE number line (ported from Tango's ieeeMean) * Add identicalIEEE for exact (bitwise) equality of IEEE numbers * Add copySign for setting the sign bit of an IEEE number * Add sameSignificandBits for seeing how many significand bits of two IEEE numbers agree, ported from Tango's feqrel * Add nan, infinity, maxFinite, minNormal constants for IEEE numbers * Add maxNum and minNum * Rename maxF and minF to maxNaN and minNaN * Switch to a simpler "~==" comparison for complex numbers * Make "~==" comparison use sameSignificandBits for IEEE types * Make "===" comparison use bitwise equality for IEEE types * Remove old "eqRel" comparisons. * Remove old epsilon' and delta constants * Remove (RealFloat a) => AEq (Complex a) instance in favor of explicit instances for Complex {Double,Float,CDouble,CFloat}

Show replies by date

Jason Dagit

20 Sep 20 Sep

1:55 a.m.

On Sun, Sep 19, 2010 at 6:49 PM, Patrick Perry wrote:

...

ieee is a Haskell library for dealing with IEEE floating numbers. It was originally written to make testing with floating point values less painful. The library provides an "approximate equality" type class, AEq, with approximate equality operator "~==". One property of "~==" is that nearby floating point numbers are deemed equivalent, so that, for example, "1 ~== 1.00000001" evaluates to True.

Given that IEEE is actually a standards body and they have many standards, wouldn't it be more appropriate to call this library ieee754? Note: The library itself seems cool, thanks for publishing it regardless of name! Jason

Patrick Perry

2:18 a.m.

...

Given that IEEE is actually a standards body and they have many standards, wouldn't it be more appropriate to call this library ieee754?

If it seems important to people, I'd be happy to change the name. I'm not religious about these things. Will it clutter up hackage, though? Patrick

Conrad Parker

3:16 a.m.

On 20 September 2010 11:18, Patrick Perry wrote:

...

...
Given that IEEE is actually a standards body and they have many standards, wouldn't it be more appropriate to call this library ieee754?

If it seems important to people, I'd be happy to change the name. I'm not religious about these things. Will it clutter up hackage, though?

I reckon it's worth making it obvious that this library does 754 and not, say, 1394 or 802.11 ;-) On the other hand if you intend on expanding the package to implement every IEEE standard ... (j/k) Anyway, good work. Does this have any overlap with data-binary-ieee754? There was some recent discussion here about the encoding speed in that package. Conrad.

Patrick Perry

4:27 a.m.

I'm looking at data-binary-754 right now, and it seems pretty complicated. Why don't they just cast the values bitwise to integers and then serialize those? Forgive my naivete-- I don't know much about binary encoding issues. I went the route of implementing everything in C and then using the FFI. There's a lot of lot of bit-twiddling involved when you work with the guts of IEEE754, which is a lot easier to do in C. See http://github.com/patperry/hs-ieee/blob/master/cbits/feqrel_source.c for a truly ugly example. I ported this code from the Tango math library for D ( http://www.dsource.org/projects/tango/browser/trunk/tango/math/IEEE.d ). It's original author, Don Clugston, claims that the function is about as fast as a ">" comparison. Haskell's great, but I don't think it could get nearly as fast for a function like this. Patrick On Sun, Sep 19, 2010 at 11:16 PM, Conrad Parker wrote:

...

On 20 September 2010 11:18, Patrick Perry wrote:

...
...
Given that IEEE is actually a standards body and they have many standards, wouldn't it be more appropriate to call this library ieee754?

If it seems important to people, I'd be happy to change the name. I'm not religious about these things. Will it clutter up hackage, though?

I reckon it's worth making it obvious that this library does 754 and not, say, 1394 or 802.11 ;-) On the other hand if you intend on expanding the package to implement every IEEE standard ... (j/k)

Anyway, good work. Does this have any overlap with data-binary-ieee754? There was some recent discussion here about the encoding speed in that package.

Conrad.

John Millikin

4:31 a.m.

On Sun, Sep 19, 2010 at 20:16, Conrad Parker wrote:

...

Anyway, good work. Does this have any overlap with data-binary-ieee754? There was some recent discussion here about the encoding speed in that package.

I should probably make it more clear that data-binary-ieee754 is for special use cases; for most people, using something like this will be much faster since it doesn't have to poke around the individual bits: putFloat32be :: Float -> Put putFloat32be = putWord32be . unsafeCoerce I needed "real" IEEE754 binary support for round-trip parsing, where (for example) maintaining the particular bit pattern of a NaN is important. For 99% of people, the "unsafe" method will work fine.

Patrick Perry

4:43 a.m.

...

I needed "real" IEEE754 binary support for round-trip parsing, where (for example) maintaining the particular bit pattern of a NaN is important. For 99% of people, the "unsafe" method will work fine.

How does a C-style cast not preserve the bit pattern of a NaN? Again, sorry if this is a stupid question.

John Millikin

4:47 a.m.

On Sun, Sep 19, 2010 at 21:43, Patrick Perry wrote:

...

...
I needed "real" IEEE754 binary support for round-trip parsing, where (for example) maintaining the particular bit pattern of a NaN is important. For 99% of people, the "unsafe" method will work fine.

How does a C-style cast not preserve the bit pattern of a NaN? Again, sorry if this is a stupid question.

It's not a stupid question, and I don't know the answer. But if you plug a C-style cast into the data-binary-ieee754 unit tests, some of them (the fiddly ones, like roundtripping -NaN) will fail. Presumably, this is due to some optimization deep in the bowels of GHC, but I don't understand even a fraction of what goes on in there. For what it's worth, d-b-ieee754 was the very first Haskell library I ever wrote -- and it shows. If anybody knows how to make unsafeCoerce (or equivalent) roundtrip-safe, I would love to rip out all the ugly and make it sane.

Daniel Fischer

10:22 a.m.

On Monday 20 September 2010 06:47:12, John Millikin wrote:

...

On Sun, Sep 19, 2010 at 21:43, Patrick Perry wrote:

...
...
I needed "real" IEEE754 binary support for round-trip parsing, where (for example) maintaining the particular bit pattern of a NaN is important. For 99% of people, the "unsafe" method will work fine.

How does a C-style cast not preserve the bit pattern of a NaN? Again, sorry if this is a stupid question.

It's not a stupid question, and I don't know the answer. But if you plug a C-style cast into the data-binary-ieee754 unit tests, some of them (the fiddly ones, like roundtripping -NaN) will fail. Presumably, this is due to some optimization deep in the bowels of GHC, but I don't understand even a fraction of what goes on in there.

For what it's worth, d-b-ieee754 was the very first Haskell library I ever wrote -- and it shows. If anybody knows how to make unsafeCoerce (or equivalent) roundtrip-safe, I would love to rip out all the ugly and make it sane.

unsafeCoerce is not supposed to work for casts between Integral and Floating types. If you try to unsafeCoerce# between unboxed types, say Double# and Word64#, you're likely to get a compile failure (ghc panic). If you unsafeCoerce between the boxed types, it will probably work, but there are no guarantees. There's a feature request for unboxed coercion (i.e. reinterpretation of the bit-pattern): http://hackage.haskell.org/trac/ghc/ticket/4092

John Millikin

21 Sep 21 Sep

3:18 a.m.

On Mon, Sep 20, 2010 at 03:22, Daniel Fischer wrote:

...

unsafeCoerce is not supposed to work for casts between Integral and Floating types. If you try to unsafeCoerce# between unboxed types, say Double# and Word64#, you're likely to get a compile failure (ghc panic). If you unsafeCoerce between the boxed types, it will probably work, but there are no guarantees.

There's a feature request for unboxed coercion (i.e. reinterpretation of the bit-pattern):

http://hackage.haskell.org/trac/ghc/ticket/4092

Interesting -- in that bug report, Simon Mar says that converting the value using pointers will work correctly. I've changed d-b-ieee754 over to use this method (v 0.4.2); the tests are still passing, so I'll call it success.

Conrad Parker

5:11 a.m.

On 21 September 2010 12:18, John Millikin wrote:

...

On Mon, Sep 20, 2010 at 03:22, Daniel Fischer wrote:

...
unsafeCoerce is not supposed to work for casts between Integral and Floating types. If you try to unsafeCoerce# between unboxed types, say Double# and Word64#, you're likely to get a compile failure (ghc panic). If you unsafeCoerce between the boxed types, it will probably work, but there are no guarantees.

There's a feature request for unboxed coercion (i.e. reinterpretation of the bit-pattern):

http://hackage.haskell.org/trac/ghc/ticket/4092

Interesting -- in that bug report, Simon Mar says that converting the value using pointers will work correctly. I've changed d-b-ieee754 over to use this method (v 0.4.2); the tests are still passing, so I'll call it success.

I've been using unsafeCoerce: getFloat64be :: Get Double getFloat64be = do n <- getWord64be return (unsafeCoerce n :: Double) putFloat64be :: Double -> Put putFloat64be n = putWord64be (unsafeCoerce n :: Word64) but only tested it with quickcheck -- it passes about 10^7 checks, comparing roundtrips in combinatrion with the previous data-binary-ieee754 versions. However could that sometimes behave incorrectly? Should the d-b-iee754-0.4.2 versions with castPtr etc. be even faster? Conrad.

Daniel Fischer

2:10 p.m.

On Tuesday 21 September 2010 07:11:24, Conrad Parker wrote:

...

On 21 September 2010 12:18, John Millikin wrote:

...
On Mon, Sep 20, 2010 at 03:22, Daniel Fischer wrote:

...
unsafeCoerce is not supposed to work for casts between Integral and Floating types. If you try to unsafeCoerce# between unboxed types, say Double# and Word64#, you're likely to get a compile failure (ghc panic). If you unsafeCoerce between the boxed types, it will probably work, but there are no guarantees.

There's a feature request for unboxed coercion (i.e. reinterpretation of the bit-pattern):

http://hackage.haskell.org/trac/ghc/ticket/4092

Interesting -- in that bug report, Simon Mar says that converting the value using pointers will work correctly. I've changed d-b-ieee754 over to use this method (v 0.4.2); the tests are still passing, so I'll call it success.

And I'd expect it to be a heck of a lot faster than the previous implementation. Have you done any benchmarks?

...

I've been using unsafeCoerce:

getFloat64be :: Get Double getFloat64be = do n <- getWord64be return (unsafeCoerce n :: Double)

putFloat64be :: Double -> Put putFloat64be n = putWord64be (unsafeCoerce n :: Word64)

but only tested it with quickcheck -- it passes about 10^7 checks, comparing roundtrips in combinatrion with the previous data-binary-ieee754 versions. However could that sometimes behave incorrectly?

Should the d-b-iee754-0.4.2 versions with castPtr etc. be even faster?

No, Simon says "Not terribly efficient, but better than using the FFI." I would expect unsafeCoerce to be the fastest you can get (so far). From the docs of unsafeCoerce#, I get the impression that will probably work, but it's not listed among the cases where it's *supposed* to work, hence it's probably in fact a little unsafe. One problem I see with both, unsafeCoerce and poke/peek is endianness. Will the bit-pattern of a double be interpreted as the same uint64_t on little-endian and on big-endian machines? In other words, is the byte order for doubles endianness-dependent too? If yes, that's fine, if no, it would break between machines of different endianness.

...

Conrad.

Cheers, Daniel

John Millikin

5:46 p.m.

On Mon, Sep 20, 2010 at 22:11, Conrad Parker wrote:

...

I've been using unsafeCoerce:

getFloat64be :: Get Double getFloat64be = do n <- getWord64be return (unsafeCoerce n :: Double)

putFloat64be :: Double -> Put putFloat64be n = putWord64be (unsafeCoerce n :: Word64)

but only tested it with quickcheck -- it passes about 10^7 checks, comparing roundtrips in combinatrion with the previous data-binary-ieee754 versions. However could that sometimes behave incorrectly?

QuickCheck only generates a subset of possible floating point values; when I tested unsafeCoerce, it sometimes gave incorrect results when dealing with edge cases like signaling NaNs.

...

Should the d-b-iee754-0.4.2 versions with castPtr etc. be even faster?

It should be slightly slower, but not nearly as slow as the bitfield-based parsing. On Tue, Sep 21, 2010 at 07:10, Daniel Fischer wrote:

...

And I'd expect it to be a heck of a lot faster than the previous implementation. Have you done any benchmarks?

Only very rough ones -- a few basic Criterion checks, but nothing extensive. Numbers for put/get of 64-bit big-endian: getWord getFloat putWord putFloat Bitfields (0.4.1) 59 ns 8385 ns 1840 ns 11448 ns poke/peek (0.4.2) 59 ns 305 ns 1840 ns 744 ns unsafeCoerce 59 ns 61 ns 1840 ns 642 ns Note: I don't know why the cast-based versions can put a Double faster than a Word64; Float is (as expected) slower than Word32. Some special-case GHC optimization?

...

One problem I see with both, unsafeCoerce and poke/peek is endianness. Will the bit-pattern of a double be interpreted as the same uint64_t on little-endian and on big-endian machines? In other words, is the byte order for doubles endianness-dependent too? If yes, that's fine, if no, it would break between machines of different endianness.

Endianness only matters when marshaling bytes into a single value -- Data.Binary.Get/Put handles that. Once the data is encoded as a Word, endianness is no longer relevant.

Daniel Fischer

7:08 p.m.

On Tuesday 21 September 2010 19:46:02, John Millikin wrote:

...

On Tue, Sep 21, 2010 at 07:10, Daniel Fischer wrote:

...
And I'd expect it to be a heck of a lot faster than the previous implementation. Have you done any benchmarks?

Only very rough ones -- a few basic Criterion checks, but nothing extensive.

Certainly good enough for an indication.

...

Numbers for put/get of 64-bit big-endian:

getWord getFloat putWord putFloat Bitfields (0.4.1) 59 ns 8385 ns 1840 ns 11448 ns poke/peek (0.4.2) 59 ns 305 ns 1840 ns 744 ns

Yaw. That's a huge difference. I don't think there's much room for doubt that it's much faster (the exact ratios will vary of course).

...

unsafeCoerce 59 ns 61 ns 1840 ns 642 ns

Odd that unsafeCoerce gains 244 ns for get, but only 102 for put.

...

Note: I don't know why the cast-based versions can put a Double faster than a Word64;

Strange. putFloat does a putWord and a transformation, how can that be faster than only the putWord?

...

Float is (as expected) slower than Word32. Some special-case GHC optimization?

...
One problem I see with both, unsafeCoerce and poke/peek is endianness. Will the bit-pattern of a double be interpreted as the same uint64_t on little-endian and on big-endian machines? In other words, is the byte order for doubles endianness-dependent too? If yes, that's fine, if no, it would break between machines of different endianness.

Endianness only matters when marshaling bytes into a single value -- Data.Binary.Get/Put handles that. Once the data is encoded as a Word, endianness is no longer relevant.

I mean, take e.g. 2^62 :: Word64. If you poke that to memory, on a big- endian machine, you'd get the byte sequence 40 00 00 00 00 00 00 00 while on a little-endian, you'd get 00 00 00 00 00 00 00 40 , right? If both bit-patterns are interpreted the same as doubles, sign-bit = 0, exponent-bits = 0x400 = 1024, mantissa = 0 , thus yielding 1.0*2^(1024 - 1023) = 2.0, fine. But if on a little-endian machine, the floating point handling is not little-endian and the number is interpreted as sign-bit = 0, exponent-bits = 0, mantissa = 0x40, hence (1 + 2^(-46))*2^(-1023), havoc. I simply didn't know whether that could happen. According to http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness it could. On the other hand, "no standard for transferring floating point values has been made. This means that floating point data written on one machine may not be readable on another", so if it breaks on weird machines, it's at least a general problem (and not Haskell's).

John Millikin

7:35 p.m.

On Tue, Sep 21, 2010 at 12:08, Daniel Fischer wrote:

...

...
Endianness only matters when marshaling bytes into a single value -- Data.Binary.Get/Put handles that. Once the data is encoded as a Word, endianness is no longer relevant.

I mean, take e.g. 2^62 :: Word64. If you poke that to memory, on a big- endian machine, you'd get the byte sequence 40 00 00 00 00 00 00 00 while on a little-endian, you'd get 00 00 00 00 00 00 00 40 , right?

If both bit-patterns are interpreted the same as doubles, sign-bit = 0, exponent-bits = 0x400 = 1024, mantissa = 0 , thus yielding 1.0*2^(1024 - 1023) = 2.0, fine. But if on a little-endian machine, the floating point handling is not little-endian and the number is interpreted as sign-bit = 0, exponent-bits = 0, mantissa = 0x40, hence (1 + 2^(-46))*2^(-1023), havoc.

I simply didn't know whether that could happen. According to http://en.wikipedia.org/wiki/Endianness#Floating-point_and_endianness it could. On the other hand, "no standard for transferring floating point values has been made. This means that floating point data written on one machine may not be readable on another", so if it breaks on weird machines, it's at least a general problem (and not Haskell's).

Oh, I misunderstood the question -- you're asking about architectures on which floating-point and fixed-point numbers use a different endianness? I don't think it's worth worrying about, unless you want to use Haskell for number crunching on a PDP-11. If you do need to implement IEEE754 parsing for unusual endians (like 3-4-1-2), parse the word yourself and then use 'wordToFloat' and friends to convert it.

Daniel Fischer

7:59 p.m.

On Tuesday 21 September 2010 21:35:21, John Millikin wrote:

...

Oh, I misunderstood the question -- you're asking about architectures on which floating-point and fixed-point numbers use a different endianness?

Basically, I wanted to know whether there are such beasts (apparently yes) and if, whether they're numerous enough to worry about (apparently no).

David Terei

22 Sep 22 Sep

5:42 a.m.

I by pure coincidence was fixing a bug in some code in GHC that involved converting a Haskell float into a hex IEEE form, this is how its done in the code, just to add another way :) castDoubleToWord8Array :: STUArray s Int Double -> ST s (STUArray s Int Word8) castDoubleToWord8Array = castSTUArray doubleToBytes :: Double -> [Int] doubleToBytes d = runST (do arr <- newArray_ ((0::Int),7) writeArray arr 0 d arr <- castDoubleToWord8Array arr i0 <- readArray arr 0 i1 <- readArray arr 1 i2 <- readArray arr 2 i3 <- readArray arr 3 i4 <- readArray arr 4 i5 <- readArray arr 5 i6 <- readArray arr 6 i7 <- readArray arr 7 return (map fromIntegral [i0,i1,i2,i3,i4,i5,i6,i7]) ) The nice thing about this code is it is pure Haskell and doesn't involve using any unsafe methods. You do have to handle the platform endianess with this method though while for the others you don't. On 22 September 2010 05:59, Daniel Fischer wrote:

...

On Tuesday 21 September 2010 21:35:21, John Millikin wrote:

...
Oh, I misunderstood the question -- you're asking about architectures on which floating-point and fixed-point numbers use a different endianness?

Basically, I wanted to know whether there are such beasts (apparently yes) and if, whether they're numerous enough to worry about (apparently no). _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Patrick Perry

20 Sep 20 Sep

12:23 p.m.

New subject: ANN: ieee754 version 0.7.1

Upon popular request, I've renamed the "ieee" package to "ieee754". The new hackage page is http://hackage.haskell.org/package/ieee754 . On Sep 19, 2010, at 11:16 PM, Conrad Parker wrote:

...

On 20 September 2010 11:18, Patrick Perry wrote:

...
...
Given that IEEE is actually a standards body and they have many standards, wouldn't it be more appropriate to call this library ieee754?

If it seems important to people, I'd be happy to change the name. I'm not religious about these things. Will it clutter up hackage, though?

I reckon it's worth making it obvious that this library does 754 and not, say, 1394 or 802.11 ;-) On the other hand if you intend on expanding the package to implement every IEEE standard ... (j/k)

Anyway, good work. Does this have any overlap with data-binary-ieee754? There was some recent discussion here about the encoding speed in that package.

Conrad.