Proposal: Add "fma" to the RealFloat class

This proposal is very much in the spirit of the earlier proposal on adding new float/double functions; for instance, see here: https://mail.haskell.org/pipermail/libraries/2014-April/022667.html

"fma" (a.k.a. fused multiply-add) is one of those functions; it is the workhorse in many HPC applications. The idea is to multiply two floats and add a third with just one rounding, thus preserving more precision. There are a multitude of applications for this operation in engineering and data analysis, and modern processors come with custom implementations and a lot of hardware to support it natively.

I created a ticket along these lines already: https://ghc.haskell.org/trac/ghc/ticket/10364 Edward suggested that the matter be discussed further here.

I think the proposal is rather straightforward and should be uncontroversial. To wit, we shall add a new method to the RealFloat class:

    class (RealFrac a, Floating a) => RealFloat a where
      ...
      fma :: a -> a -> a -> a

The intention is that

    fma x y z = x * y + z

except that the multiplication and addition are done infinitely precisely and then rounded only once, as opposed to the two roundings one would get with the above implementation. Most modern architectures support this operation directly, so we can map to it easily; and in case the architecture does not have it available, we can get it via the C math library, where it appears under the names fma (the double version) and fmaf (the float version).

There should be no default definition, as an incorrect (two-rounding) version would essentially defeat the purpose of having fma in the first place.

While the name "fma" is well established in the arithmetic/hardware community and in the C library, we could also go with "fusedMultiplyAdd" if that is deemed clearer.

Discussion period: 2 weeks.
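For concreteness, here is a minimal sketch of what the proposed method and its C-library fallback could look like. The class name RealFloat' is used only so the snippet stands alone (the actual proposal extends the existing RealFloat), and the FFI plumbing is just one possible implementation:

    {-# LANGUAGE ForeignFunctionInterface #-}

    -- Bindings to C99's fma/fmaf, which round exactly once.
    foreign import ccall unsafe "math.h fma"
      c_fma :: Double -> Double -> Double -> Double

    foreign import ccall unsafe "math.h fmaf"
      c_fmaf :: Float -> Float -> Float -> Float

    -- Stand-in for the proposed extension of RealFloat.
    class (RealFrac a, Floating a) => RealFloat' a where
      -- fma x y z computes x * y + z with a single rounding step.
      fma :: a -> a -> a -> a

    instance RealFloat' Double where fma = c_fma
    instance RealFloat' Float  where fma = c_fmaf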

I think there *should* be a default definition in terms of (+) and (*). A person who defines their own instance for their own purposes should be free to ignore this function if it's not needed for their specific application.

On 29 April 2015 at 17:21, Levent Erkok wrote:

> I think the proposal is rather straightforward and should be uncontroversial. To wit, we shall add a new method to the RealFloat class:
>
> class (RealFrac a, Floating a) => RealFloat a where
>   ...
>   fma :: a -> a -> a -> a

Can we please have a better name (even if it's just "fusedMultiplyAdd")? I have no real opinion on adding this function or not, but just seeing that function (in other code, doing ":info RealFloat" in ghci, etc.) tells me nothing about what it is. Ideally we wouldn't have to rely on reading Haddock to understand random TLAs in code.

--
Ivan Lazar Miljenovic
Ivan.Miljenovic@gmail.com
http://IvanMiljenovic.wordpress.com

On Wed, 29 Apr 2015, Levent Erkok wrote:
> This proposal is very much in the spirit of the earlier proposal on adding new float/double functions; for instance, see here: https://mail.haskell.org/pipermail/libraries/2014-April/022667.html

Btw. what was the final decision with respect to log1p and expm1? I suggest that the decision for 'fma' be made consistently with 'log1p' and 'expm1'.

> "fma" (a.k.a. fused multiply-add) is one of those functions; it is the workhorse in many HPC applications. The idea is to multiply two floats and add a third with just one rounding, thus preserving more precision. There are a multitude of applications for this operation in engineering and data analysis, and modern processors come with custom implementations and a lot of hardware to support it natively.

Ok, the proposal is about increasing precision. One could also hope that a single fma operation is faster than separate addition and multiplication, but as far as I know, fma can even be slower, since it has more data dependencies.

> I think the proposal is rather straightforward and should be uncontroversial. To wit, we shall add a new method to the RealFloat class:
>
> class (RealFrac a, Floating a) => RealFloat a where
>   ...
>   fma :: a -> a -> a -> a

RealFloat excludes Complex.

> There should be no default definition, as an incorrect (two-rounding) version would essentially defeat the purpose of having fma in the first place.

I just read again the whole expm1 thread, and default implementations with possible loss of precision seem to be the best option. This way, one can mechanically replace all occurrences of (x*y+z) by (fma x y z) and will not make anything worse. Types with a guaranteed high precision should be put in a Fused class.

> While the name "fma" is well established in the arithmetic/hardware community and in the C library, we could also go with "fusedMultiplyAdd" if that is deemed clearer.

Although I like descriptive names, the numeric classes already contain mostly abbreviations (abs, exp, sin, tanh, ...). Thus I would prefer the abbreviation, for consistency. Btw., on the DSP56002 the same operation is called MAC (multiply-accumulate).
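Henning's two-level idea, sketched; the class names MulAdd and Fused are illustrative only:

    -- The method gets a possibly-lossy default, so replacing (x*y+z)
    -- by (fma x y z) mechanically is never worse than before.
    class Num a => MulAdd a where
      fma :: a -> a -> a -> a
      fma x y z = x * y + z  -- default: two roundings

    -- Instances additionally guarantee single-rounding (fused) semantics.
    class MulAdd a => Fused a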

On Wed, Apr 29, 2015 at 5:19 AM, Henning Thielemann <lemming@henning-thielemann.de> wrote:

> Btw. what was the final decision with respect to log1p and expm1? I suggest that the decision for 'fma' be made consistently with 'log1p' and 'expm1'.

We decided to add them. Then we didn't do it in 7.10. I'll talk to Herbert about how to proceed to get them into 7.12, though we may wait until we know the outcome of this proposal and fuse the two together into one patch.

> RealFloat excludes Complex.

Good point. If we wanted to, we could push this all the way up to Num given the operations involved, and I could see that you could benefit from it there for types that have nothing to do with floating point; e.g., modular arithmetic could get away with using a single 'mod'.

>> There should be no default definition, as an incorrect (two-rounding) version would essentially defeat the purpose of having fma in the first place.
>
> I just read again the whole expm1 thread, and default implementations with possible loss of precision seem to be the best option. This way, one can mechanically replace all occurrences of (x*y+z) by (fma x y z) and will not make anything worse. Types with a guaranteed high precision should be put in a Fused class.

I argued rather strenuously for this in the expm1/log1p case, but wasn't able to win folks over.

> Although I like descriptive names, the numeric classes already contain mostly abbreviations (abs, exp, sin, tanh, ...). Thus I would prefer the abbreviation, for consistency.

I have no strong preference on the name. fusedMultiplyAdd has the benefit that a non-domain-expert can figure it out; fma is traditional.

-Edward
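The modular-arithmetic saving Edward mentions is easy to see in a toy type (Mod7 is hypothetical, purely for illustration):

    newtype Mod7 = Mod7 Integer deriving (Eq, Show)

    -- One reduction for the whole multiply-add...
    fmaMod7 :: Mod7 -> Mod7 -> Mod7 -> Mod7
    fmaMod7 (Mod7 x) (Mod7 y) (Mod7 z) = Mod7 ((x * y + z) `mod` 7)

    -- ...versus two reductions when (*) and (+) each normalize:
    mulThenAdd :: Mod7 -> Mod7 -> Mod7 -> Mod7
    mulThenAdd (Mod7 x) (Mod7 y) (Mod7 z) =
      Mod7 (((x * y) `mod` 7 + z) `mod` 7)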

On Wed, 29 Apr 2015, Edward Kmett wrote:
> Good point. If we wanted to, we could push this all the way up to Num given the operations involved, and I could see that you could benefit from it there for types that have nothing to do with floating point; e.g., modular arithmetic could get away with using a single 'mod'.

I too advocate that this go in Num. The place I anticipate seeing fma used is in some polymorphic linear-algebra library, and it is not uncommon (having recently done this myself) to do linear algebra on things that aren't RealFloat, e.g., Rational, Complex, or number-theoretic fields.

--ken

I agree that Num is the place to put this function, with a default implementation. In my mind it is a special combination of (+) and (*), which both live in Num as well.

I dislike the name fma, as that is a three-letter acronym with no meaning to people who don't do numeric programming. And by putting the function in Num, the name would end up in the Prelude.

For further bikeshedding: my proposal for a name would be mulAdd. But fusedMulAdd or fusedMultiplyAdd would also be fine.

Twan

+1 for "mulAdd". The "fused" would be a misnomer if there's a default implementation.
Tom

The Num class is defined in GHC.Num, so the Prelude could import GHC.Num hiding (fma) to avoid having another round of Prelude changes breaking code.

I'm somewhat opposed to the Num class in general, and very much opposed to calling floating-point representations "numbers" in particular. How are they numbers when they don't obey associative or distributive laws, let alone cancellation, commutativity, ...? I know Carter disagrees with me, but I'll stand my ground, resolute! I suppose adding some more nonsense into the trash heap won't do too much more harm, but I'd much rather see some deeper thought about how we want to deal with floating point.

Would it make sense to create a new class for operations like fma that has accuracy guarantees as part of its typeclass laws? Or would managing a bunch of typeclasses like that create too much syntactic, conceptual, or performance overhead for actual use?

To me, that seems like it could be better than polluting Num (which, after all, features prominently in the Prelude), but it might make for worse discoverability.

If we do add it to Num, I strongly support having a default implementation. We don't want to make implementing a custom numeric type any more difficult than it has to be, and somebody unfamiliar with fma would just implement it manually without any optimizations anyhow, or leave it out, incomplete-instantiation warnings notwithstanding. Num is already a bit too big for casual use (I rarely care about signum and abs myself), so making it *bigger* is not appealing.

Personally, I'm a bit torn on the naming. Something like mulAdd or fusedMultiplyAdd is great for non-experts, but it feels like fma is something that we only expect experts to care about, so perhaps it's better to name it in line with their expectations.

The main problem that I find in practice with the "just exile it to another class" argument is that it creates a pain point. Do you implement against the worse implementation of exp, or do you use the specialized class that provides harder guarantees for expm1, to avoid destroying all precision very near 1? It means that anything that builds on top of the abstraction you provide gets built at least two ways.

I wound up with a lot of code that was written against Monad and Functor separately, and spent much of my time dealing with nonsensical "made up" organization issues like "is this version the liftM-like one or the fmap-like one?" If it is in the class, then folks can just reach out and use it. (<$) being directly in Functor means you can just reach for it and get better sharing when you 'refill' a functor with a constant. If it were exiled to some other place, there'd always be the worry about whether you should implement for portability or for precision, and you'd never get to stop thinking about it.

-Edward

On Fri, May 1, 2015 at 5:52 PM, David Feuer wrote:

> I'm somewhat opposed to the Num class in general, and very much opposed to calling floating-point representations "numbers" in particular. How are they numbers when they don't obey associative or distributive laws, let alone cancellation, commutativity, ...? I know Carter disagrees with me, but I'll stand my ground, resolute!

TBH I think Num is a lost cause. If you want mathematical numbers, set up a parallel class instead of trying to force a class designed for numbers "in the wild" to be a pure theory class.

This operation in particular is *all about* numbers in the wild: it has no place in theory; it's an optimization for hardware implementations.

--
brandon s allbery kf8nh | sine nomine associates
allbery.b@gmail.com | ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad | http://sinenomine.net

Well said, Brandon. FMA support is absolutely a mathematical-accuracy and performance-engineering thing (except when it hinders performance). It is worth noting that most modern CPUs support several *different* versions of the FMA operation, but that's beyond the scope/goal of this proposal, I think.

But yeah, for all of Num's warts, it is probably the right place to put it, with a default implementation in terms of (*) and (+) (and compiler-supplied primops for the applicable Prelude types).

On Wed, Apr 29, 2015 at 11:48 AM, Edward Kmett wrote:

> Good point. If we wanted to, we could push this all the way up to Num given the operations involved, and I could see that you could benefit from it there for types that have nothing to do with floating point; e.g., modular arithmetic could get away with using a single 'mod'.

I'm strongly in favor of adding fma *somewhere*, even if just as a family of primops; though, of course, it'd be nicer to put it in a type class so we don't have to pull in GHC.Exts. And as far as type classes go, I'm strongly in favor of pushing it all the way up to Num (or rather, to Semiring, if only we had such a thing). There's no conceptual reason for it to live in RealFloat.

--
Live well,
~wren

Hi,

A little information:

General-purpose CPUs use the term "FMA" for the fused "multiply + add" operation and implement special instructions for it:

* x86 (AMD64/Intel 64) has FMA instructions: VFMADD132PD, ...
* ARM has FMA instructions: VMLA, ...

In DSP culture, the same operation is called "MAC" (multiply-accumulate), and traditional DSPs have MAC instructions; for example, TI's C67 has MAC, ...

If you map the "fma" function to a CPU's raw instruction, be careful about rounding and saturation modes.

BTW, the "FMA" operation is defined in the IEEE 754-2008 standard.

Regards,
Takenobu

We have (almost) no tradition of using CPU instruction names for our own functions, and I don't see why now is the time to start. To take a recent example, we have countLeadingZeros and countTrailingZeros rather than clz, ctz, ctlz, cttz, bsf, bsr, etc. We also have popCount instead of popcnt, and use shiftR and shiftL instead of things like shl, shr, sla, sal, sra, sar, etc. Thus I am -1 on calling this thing fma. multiplyAdd seems more reasonable to me.

Thank you for all the feedback on this proposal. Based on the feedback, I came to conclude that the original idea did not really capture what I was after, and hence I think this proposal needs to be shelved for the time being.

I want to summarize the points made so far:

* Almost everyone agrees that we should have this functionality available. (But see below for the direction I want to take it in.)
* There's some disagreement on the name chosen, but I think this is less important for the time being.
* The biggest gripe is where "fma" really belongs. The original suggestion was 'RealFloat', but people pointed out that 'Num' is just as good a place.
* Most folks want a default definition, and see "fma" as an optimization.
It is these last two points, actually, that convinced me this proposal is not really what I want. I do not see "fma" as an optimization. In particular, I'd be very concerned if the compiler substituted "fma x y z" for "x*y+z". The entire reason IEEE754 has an fma operation is that those two expressions have different values in general. By the same token, I'm also against providing a default implementation. I see this not as an increased-precision issue but as a semantic one: "x*y+z" and "fma x y z" *should* produce two different values, per the IEEE754 spec. It's not really an optimization; it is simply how floating-point values work. In that sense "fma" is a separate operation that is related to multiplication and addition, but is not definable in those terms alone.
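The difference is easy to exhibit concretely. Assuming an fma bound to C's correctly rounded fma (as in the FFI sketch near the top of the thread):

    {-# LANGUAGE ForeignFunctionInterface #-}
    foreign import ccall unsafe "math.h fma"
      fmaD :: Double -> Double -> Double -> Double

    -- 0.1 :: Double is really 0.1000000000000000055511151231257827...,
    -- so the exact product 0.1 * 10 is 1 + 2^-54, which rounds to 1.0.
    twoRoundings, oneRounding :: Double
    twoRoundings = 0.1 * 10 - 1      -- 0.0: the residual is lost
    oneRounding  = fmaD 0.1 10 (-1)  -- 5.551115123125783e-17 (exactly 2^-54)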
Having said that, it was also pointed out that for non-float values this can act as an optimization. (Modular arithmetic was given as an example.) I'd think that functionality is quite different from the original proposal, and perhaps should be tackled separately. My original proposal was not aiming for that particular use case.
My original motivation was to give Haskell access to the floating-point circuitry that hardware manufacturers are putting a lot of effort and energy into. It's a shame that modern processors provide a ton of instructions around floating-point operations, yet such operations are simply very hard to use from many high-level languages, including Haskell.
Two other points were raised that also convinced me to seek an alternative solution:

* Tikhon Jelvis suggested these functions should be put in a different class, which would make it clear that we're following IEEE754 and not some idealized model of numbers. I think this suggestion is spot on, and is very much in line with what I wanted to have.
* Takenobu Tani kindly pointed out that a discussion of floats in the absence of rounding modes is a moot one, as the entire semantics is based on rounding. Haskell simply picks "RoundNearestTiesToEven," but there are four other rounding modes defined by IEEE754, and I think we need a way to access those from Haskell in a convenient way.
Based on this analysis, I'm withdrawing the original proposal. I think fma and other floating-point arithmetic operations are very important to support properly, but it should not be done by tacking them onto Num or RealFloat; rather, they belong in a new class that also treats rounding modes properly.
The advantage of the separate-class approach is, of course, that I (or someone else) can create such a class and push it to Hackage, using the FFI to delegate the task of implementation to the land of C, supporting rounding modes and other floating-point weirdness appropriately. Once that class stabilizes and its details are ironed out, we can imagine cooperating with the GHC folks to bypass the FFI and generate native code directly whenever possible.
This is the direction I intend to pursue. Please drop me a line if you'd like to help out and/or have any feedback.
Thanks!
-Levent.

Thanks for taking the time to write this, Levent. Now that you explain it in such detail, it's clear why implementing fma in terms of add and multiply is wrong.

I also have to admit that upon first reading your proposal, I confused RealFloat with RealFrac. Since RealFloat should only be implemented by actual floating-point types, I retract my earlier objection.

The idea of putting the IEEE754-specific functions in a separate class (or even module) sounds reasonable, too.


How would you have an implementation of finite-precision floating point that has the "expected" exact algebraic laws for (*) and (+)?

I would argue that Float and Double do satisfy a form of the standard algebraic laws where equality is approximate: e.g., (a+(b+c)) - ((a+b)+c) <= epsilon, where epsilon is some constant multiple of max(ulp(a), ulp(b), ulp(c)). (A similar idea applies to pretty much any other algebraic law you can state, such as distributivity.)

I do think that it'd be useful if the RealFloat class provided an ulp function (unit of least precision), which is available as part of any IEEE-compliant C float library.

There are MANY computable number representations where the *exact* algebraic laws don't hold but this *approximate* form, which provides some notion of bounded forwards/backwards relative/absolute error, holds in a particularly strong way.

I think we should figure out how to articulate laws that play nicely in both the *exact* and the *approximate* universes.
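For what it's worth, an ulp function can already be sketched from existing RealFloat methods, and the approximate law above is then easy to test (this definition ignores edge cases such as zero, infinities, and NaN):

    -- Spacing of representable values at a finite, nonzero x.
    ulp :: RealFloat a => a -> a
    ulp x = encodeFloat 1 (snd (decodeFloat x))

    -- The associativity defect, bounded by a small multiple of the ulp:
    assocDefect :: Double -> Double -> Double -> Double
    assocDefect a b c = abs ((a + (b + c)) - ((a + b) + c))

    -- e.g. assocDefect 0.1 0.2 0.3 == 1.1102230246251565e-16
    --      ulp (0.6 :: Double)     == 1.1102230246251565e-16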
On Sun, May 3, 2015 at 7:05 PM, Mike Meyer wrote:

> On Sun, May 3, 2015 at 4:11 PM, Levent Erkok wrote:
>
>> * Tikhon Jelvis suggested these functions should be put in a different class, which would make it clear that we're following IEEE754 and not some idealized model of numbers. I think this suggestion is spot on, and is very much in line with what I wanted to have.
>
> This is very much in line with a suggestion I've been toying with for a long time. Basically, we have three different ideas of how floats should behave, and the current implementation isn't any of them. So I've been thinking that we ought to deal with this by moving Float out of the Prelude, or at least large chunks of it.
>
> The three different models are:
>
> 1) Real numbers. We aren't going to get those.
>
> 2) IEEE floats. This is what we've got, except, as noted, there are lots of things that come with this that we don't provide.
>
> 3) Floats that obey the laws of Num. We don't get that, mostly because getting #2 breaks things.
>
> The breakage of #3 creates behavior that's surprising, at least to people who aren't familiar with IEEE floats.
>
> So the proposal I've been toying with is something along the lines of breaking RealFloat up along class lines. Those classes whose laws RealFloat can obey with IEEE float behavior would stay in RealFloat. The rest would move out, and could be gotten by importing either Data.Float.IEEE or Data.Float.Num (or some such).
>
> Ideally, this would leave enough floating-point behavior in the Prelude that simple calculations would just work, at least as well as they ever did. When you start doing things that can currently generate surprising results, you will need to import one of the two options. Figuring out which one means there's a chance you'll also figure out why you sometimes get those surprising results.

On Sun, May 3, 2015 at 6:50 PM, Carter Schonwald wrote:

> How would you have an implementation of finite-precision floating point that has the "expected" exact algebraic laws for (*) and (+)?

That's model #1, which we can't have. So you don't.

> I would argue that Float and Double do satisfy a form of the standard algebraic laws where equality is approximate: e.g., (a+(b+c)) - ((a+b)+c) <= epsilon, where epsilon is some constant multiple of max(ulp(a), ulp(b), ulp(c)). (A similar idea applies to pretty much any other algebraic law you can state, such as distributivity.)

So how do you fix the fact that any comparison between a NaN and a non-NaN is false? Among other IEEE oddities.

> I do think that it'd be useful if the RealFloat class provided an ulp function (unit of least precision), which is available as part of any IEEE-compliant C float library.
>
> There are MANY computable number representations where the *exact* algebraic laws don't hold but this *approximate* form, which provides some notion of bounded forwards/backwards relative/absolute error, holds in a particularly strong way.

True. That's the root of the problem the proposal is trying to solve.

> I think we should figure out how to articulate laws that play nicely in both the *exact* and the *approximate* universes.

We also need laws that play nice for the IEEE universe, because people doing serious numerical work want that one. I believe you will wind up with two different sets of laws, which is why I proposed taking the parts that don't agree out of the Prelude, and letting users import the ones they want to use.

Hi,

On Sunday, 3 May 2015, at 14:11 -0700, Levent Erkok wrote:

> Based on this analysis, I'm withdrawing the original proposal. I think fma and other floating-point arithmetic operations are very important to support properly, but it should not be done by tacking them onto Num or RealFloat; rather, they belong in a new class that also treats rounding modes properly.

Does it really have to be a class? How much genuinely polymorphic code is out there that requires this precise handling of precision?

Have you considered adding these as monomorphic functions fmaDouble, fmaFloat, etc. on Hackage, using the FFI? Then those who need these functions can start to use them.

Furthermore, you can start getting the necessary primops supported in GHC, and have your library transparently use them when available.

And only then, when we have the implementation in place and actual users, can we evaluate whether we need an abstract class for this.

Greetings,
Joachim

--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org
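The monomorphic version Joachim describes is only a few lines with the FFI; the module name here is illustrative:

    {-# LANGUAGE ForeignFunctionInterface #-}
    module Numeric.Fma (fmaDouble, fmaFloat) where

    -- C99 guarantees that fma/fmaf round exactly once.
    foreign import ccall unsafe "math.h fma"
      fmaDouble :: Double -> Double -> Double -> Double

    foreign import ccall unsafe "math.h fmaf"
      fmaFloat :: Float -> Float -> Float -> Float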

Quite a bit actually.

Consider something like:

http://hackage.haskell.org/package/ad-4.2.1.1/docs/src/Numeric-AD-Rank1-Newt...

The step function in there could be trivially adapted to use fused multiply-add, and precision would just improve. If such a member _were_ in Num, I'd use it in a heartbeat there. If it were in an extra class? I'd have to make a second copy of the function to even try to see the precision win.

Most of my numeric code is generic in some fashion, working over vector spaces or simpler number types just as easily.

As this proposal has been withdrawn, the point is more or less moot for now.

-Edward
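
For a flavor of the generic code in question, here is a sketch of Horner polynomial evaluation written against a hypothetical fma method in Num. The local definition is the two-rounding stand-in; the point is that only the instance, never this code, would change.

    -- Horner's rule: one multiply-add per coefficient.  The local fma is
    -- a stand-in with the default two-rounding semantics; a fused
    -- Double/Float instance would sharpen every step of this fold
    -- without this function changing.
    horner :: Num a => [a] -> a -> a        -- coefficients, lowest degree first
    horner coeffs x = foldr (\c acc -> fma acc x c) 0 coeffs
      where fma a b c = a * b + c           -- stand-in for the proposed method

    -- horner [c0, c1, c2] x == (c2*x + c1)*x + c0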

Levent Erkok wrote:
...I think this proposal needs to be shelved for the time being.
I wrote:
Nevertheless, I vote for doing it now.
Edward Kmett wrote:
As this proposal has been withdrawn, the point is more or less moot for now.
OK, let me make myself clearer.

I hereby propose the exact same proposal that Levent originally proposed in this thread and then withdrew, with the caveat that the scope of the proposal is explicitly orthogonal to any large-scale change to the way we do floating point.

Discussion period: 2 weeks, minus time spent so far in this thread since Levent's original proposal.

Thanks, Yitz

Yitz: Thanks for taking over. I do agree that "fma" can just be added to the Num class, with all the ramifications, and treated as an "optimization." But that's a different proposal than what I had in mind, so I'm perfectly happy to see you pursue this version.

Just one comment: The name "FMA" is quite overloaded, and perhaps it should be reserved for the true IEEE754 version. I think someone suggested 'mulAccum' as an alternative, which does make sense if one thinks about the dot-product operation. Please be absolutely clear in the documentation that this is not the IEEE754 fma, but rather a fused multiply-add operation for the Num class, following some idealized notion of numbers. In particular, the compiler should be free to substitute "a*b+c" with "mulAccum a b c".

The latter (i.e., the IEEE754 variant) should be addressed in a different proposal that I intend to work on separately.

-Levent.
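
A sketch of the split being described, with stand-in names rather than a concrete API: mulAccum gets a lawful default that an instance (or the compiler) may replace with a fused version, since idealized-number semantics never promised two roundings in the first place.

    -- A stand-in Num-like class, just to show the shape of the default.
    class MyNum a where
      myAdd, myMul :: a -> a -> a
      mulAccum     :: a -> a -> a -> a
      -- Default: identical semantics to the expanded form, two roundings.
      -- An instance (or the compiler) may substitute a fused version,
      -- because the idealized semantics do not distinguish the two.
      mulAccum a b c = (a `myMul` b) `myAdd` c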

On 05/04/2015 08:36 PM, Levent Erkok wrote:

> In particular, the compiler should be free to substitute "a*b+c" with "mulAccum a b c".

But isn't it unacceptable in some cases? For instance, in this case (taken from Wikipedia):

If x^2 − y^2 is evaluated as (x×x) − y×y using fused multiply-add, then the result may be negative even when x = y, due to the first multiplication discarding low significance bits. This could then lead to an error if, for instance, the square root of the result is then evaluated.
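
The pitfall is easy to reproduce once a fused primitive is in reach; a sketch follows, repeating the hypothetical FFI import from earlier so it stands alone. The fused form computes x·x exactly and subtracts the rounded y·y, so with x = y it returns the negated rounding error of y·y.

    {-# LANGUAGE ForeignFunctionInterface #-}

    foreign import ccall unsafe "math.h fma"
      fmaDouble :: Double -> Double -> Double -> Double

    -- Fused evaluation of x*x - y*y.
    squareDiff :: Double -> Double -> Double
    squareDiff x y = fmaDouble x x (negate (y * y))

    -- With x == y and x*x not exactly representable, squareDiff x x can
    -- come out as a small negative number, so sqrt (squareDiff x x) is
    -- NaN, whereas the unfused x*x - y*y is exactly 0.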

Artyom: That's precisely the point. The true IEEE754 variants where precision does matter should be part of a different class. What Edward and Yitz want is an "optimized" multiply-add where the semantics is the same, but one that goes faster.

On 05/04/2015 08:49 PM, Levent Erkok wrote:

> Artyom: That's precisely the point. The true IEEE754 variants where precision does matter should be part of a different class. What Edward and Yitz want is an "optimized" multiply-add where the semantics is the same but one that goes faster.

No, it looks to me that Edward wants to have a more precise operation in Num:

> I'd have to make a second copy of the function to even try to see the precision win.

Unless I'm wrong, you can't have the following things simultaneously:

1. the compiler is free to substitute a+b*c with mulAdd a b c
2. mulAdd a b c is implemented as fma for Doubles (and is more precise)
3. Num operations for Double (addition and multiplication) always conform to IEEE754

> The true IEEE754 variants where precision does matter should be part of a different class.

So, does it mean that you're fine with not having point #3, because people who need it would be able to use a separate class for IEEE754 floats?

I think `mulAdd a b c` should be implemented as `a*b+c` even for Double/Float. It should only be an "optimization" (as in modular arithmetic), not a semantics-changing operation; that is what justifies the substitution.

"fma" should be the "more precise" version, available for Float/Double. I don't think it makes sense to have "fma" for other types. That's why I'm advocating for "mulAdd" to be part of "Num" for optimization purposes, and "fma" reserved for true IEEE754 types and semantics.

I understand that Edward doesn't like this, as it requires a different class; but really, that's the price to pay if we claim Haskell has proper support for IEEE754 semantics. (Which I think it should.) The operation is just different. It also should account for the rounding modes properly.

I think we can pull this off just fine, and Haskell can really lead the pack here. The situation with floats is even worse in other languages. This is our chance to make a proper implementation, and we have the right tools to do so.
-Levent.
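
One way to read that split, as a sketch in which every name is hypothetical: Num keeps the semantics-preserving mulAdd, while the single-rounding operation lives in an IEEE-only class that is explicit about its rounding mode, anticipating the design questions discussed below.

    -- Hypothetical IEEE-only class, separate from Num.
    data RoundingMode
      = RoundNearestTiesToEven
      | RoundTowardPositive
      | RoundTowardNegative
      | RoundTowardZero

    class IEEEFloat a where
      -- True fused multiply-add: infinitely precise product and sum,
      -- rounded once, in the given mode.
      fmaRM :: RoundingMode -> a -> a -> a -> a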

pardon the wall of text everyone, but I really want some FMA tooling :)

I am going to spend some time later this week and next adding FMA primops to GHC and playing around with different ways to add it to Num (which seems pretty straightforward, though I think we'd all agree it shouldn't be exported by Prelude). And then, depending on how Yitzchak's reproposal (or some iteration thereof) goes, we can get something useful/usable into 7.12.

i have codes (i.e. *dot products*!!!!!) that want a faster direct FMA for *exact numbers* and a higher-precision FMA for *approximate numbers* (i.e. *floating point*), and where I can't sanely use FMA if it lives anywhere but Num, unless I rub Typeable everywhere and do runtime type checks for applicable floating point types, which kinda destroys parametricity in engineering nice things.

@levent: ghc doesn't do any optimization for floating point arithmetic (aside from 1-2 very simple things that are possibly questionable), and until ghc has support for precisely emulating high-precision floating point computation in a portable way, it probably won't do any interesting floating point optimization. Mandating that fma a b c === a*b+c for inexact number datatypes doesn't quite make sense to me. Relatedly, it's a GOOD thing ghc is conservative about optimizing floating point, because it makes doing correct stability analyses tractable! I look forward to the day that GHC gets a bit more sophisticated about optimizing floating point computation, but that day is still a ways off.

relatedly: FMA for float and double is not generally going to be faster than the individual primitive operations, merely more accurate when used carefully.

point being, *i'm +1 on adding some manner of FMA operations to Num* (the only sane place to put it where i can actually use it in a general-use library), and i don't really care if we name it fusedMultiplyAdd, multiplyAndAdd, accursedFusionOfSemiRingOperations, or fma. i'd favor "fusedMultiplyAdd" if we want a descriptive name that will be familiar to experts yet easy to google for the curious.

to repeat: i'm going to do some legwork so that the double and float prims are portably exposed by ghc-prim (i've spoken with several ghc devs about that, and they agree to its value, and that's a decision outside the scope of the libraries purview), and I do hope we can come to a consensus about putting it in Num, so that expert library authors can upgrade the guarantees they provide end users without imposing any breaking changes on end users.

A number of folks have brought up "but Num is broken" as a counterargument to adding FMA support to Num. I emphatically agree Num is broken :), BUT! I also believe that fixing up the Num prelude carries the burden of providing a whole-cloth design for an alternative that we can get broad consensus/adoption on. That will happen by dint of actual experimentation and usage.

Point being, adding FMA doesn't further entrench current Num any more than it already is; it just provides expert library authors with a transparent way of improving the experience of their users, with a free upgrade in answer accuracy if used carefully. Additionally, when Num's "semiring-ish equational laws" are framed with respect to approximate forwards/backwards stability, there is a perfectly reasonable law for FMA. I am happy to spend some time trying to write that up more precisely IFF that will tilt those in opposition to being in favor.

I don't need FMA to be exposed by *prelude/base*, merely by *GHC.Num* as a method therein for Num. If that constitutes a different and *more palatable proposal* than what people have articulated so far (by discouraging casual use by dint of hiding), then I am happy to kick off a new thread with that concrete design choice.

If there's a counterargument that's a bit more substantive than "Num is for exact arithmetic" or "Num is wrong" that will sway me to the other side, i'm all ears, but i'm skeptical of that.

I emphatically support those who are displeased with Num prototyping some alternative designs in userland; I do think it'd be great to figure out a new Num prelude we can migrate Haskell / GHC to over the next 2-5 years, but again, any such proposal really needs to be realized whole cloth before it makes its way to being a libraries list proposal.

again, pardon the wall of text, i just really want to have nice things :)

-Carter
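
Carter's motivating example, sketched against the same hypothetical Num method; the local fma is the two-rounding stand-in, and only the instances would change if the method lands.

    import Data.List (foldl')

    -- One multiply-add per element: the shape where a Num-level fma pays
    -- off for exact types (speed) and floating point (accuracy) alike.
    dot :: Num a => [a] -> [a] -> a
    dot xs ys = foldl' (\acc (x, y) -> fma x y acc) 0 (zip xs ys)
      where fma x y acc = x * y + acc  -- stand-in for the proposed method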

Carter: Wall of text is just fine!

I'm personally happy to see the results of your experiment. In particular, the better the "code-generation" facilities you add around floats/doubles that map to the underlying hardware's native instructions, the better. When we do have proper IEEE floats, we shall surely need all that functionality.

While you're working on this, if you can also watch out for how rounding modes can be integrated into the operations, that would be useful as well. I can see at least two designs:

* One where the rounding mode goes with the operation: `fpAdd RoundNearestTiesToEven 2.5 6.4`. This is the "cleanest" and the functional solution, but it could get quite verbose, and it might be costly if the implementation changes the rounding mode for every operation issued.

* The other is where the operations simply assume RoundNearestTiesToEven, but we have lifted IO versions that can be modified with a "with"-like construct: `withRoundingMode RoundTowardsPositive $ fpAddRM 2.5 6.4`. Note that `fpAddRM` (*not* `fpAdd` as before) will have to return some sort of a monadic value (probably in the IO monad), since it'll need to access the rounding mode currently active.

Neither choice jumps out at me as the best one, and a hybrid might also be possible. I'd love to hear any insight you gain regarding rounding modes during your experiment.
-Levent.
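
The second design can be prototyped today over the C99 <fenv.h> interface. A minimal sketch, assuming nothing beyond base: the combinator name follows the message above, the rest is illustrative, and it deliberately ignores the thread-migration problem raised just below.

    {-# LANGUAGE ForeignFunctionInterface #-}

    import Control.Exception (bracket)
    import Foreign.C.Types (CInt (..))

    -- C99 <fenv.h>: query and set the current rounding mode.
    foreign import ccall unsafe "fenv.h fegetround" fegetround :: IO CInt
    foreign import ccall unsafe "fenv.h fesetround" fesetround :: CInt -> IO CInt

    -- Run an IO action under a given (C-encoded) rounding mode, restoring
    -- the previous mode afterwards, even on exceptions.
    withRoundingMode :: CInt -> IO a -> IO a
    withRoundingMode mode act =
      bracket (fegetround <* fesetround mode)  -- save old mode, set new one
              (\old -> () <$ fesetround old)   -- restore on the way out
              (\_ -> act)

    -- NB: per the discussion below, pure Double arithmetic inside the
    -- action cannot be trusted to observe the mode; real operations
    -- would need IO-modeled primops.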

On 2015-05-05 00:54, Levent Erkok wrote:
The monadic alternative is more readily extensible to handle IEEE 754's sticky flags: inexact, overflow, underflow, divide-by-zero, and invalid.

On Tue, May 5, 2015 at 7:22 AM, Scott Turner <2haskell@pkturner.org> wrote:
This gets messier than you'd think. Keep in mind we switch contexts within our own green threads constantly on shared system threads / capabilities, so the current rounding mode, sticky flags, etc. would become something you'd have to hold per thread, and then change proactively as threads migrate between CPUs / capabilities, which we're basically completely unaware of right now.

This was what I learned when I tried my own hand at it and failed:

http://hackage.haskell.org/package/rounding

In the end I gave up, and moved setting the rounding mode into the custom primitives themselves. But even then you find other problems! The libm versions of almost every combinator don't just give slightly wrong answers when you switch rounding modes, they give _completely_ wrong answers when you switch rounding modes. cos basically starts looking like a random number generator. This is rather amusing given that libm is the library that specified how to change the damn rounding mode; fixing it was blocked by Ulrich Drepper when I last looked.

Workarounds such as using crlibm http://lipforge.ens-lyon.fr/www/crlibm/ exist, but crlibm isn't installed on most platforms, and incurring the dependency would rather dramatically complicate the installation of ghc.

This is why I've switched to using MPFR for anything with known rounding modes and just paying a pretty big performance tax for correctness. (That, and I'm working to release a library that does exact real arithmetic using trees of nested linear fractional transformations -- assuming I can figure out how to keep performance high enough.)

-Edward

Irk. If libm is busted when changing rounding modes, that puts a nasty twist on things.

I do agree that even if that hurdle is jumped, setting the local rounding mode will have to be part of every green-thread context switch. But if libm is hosed, that kinda makes adding that machinery a smidge pointless until there's a good story for that.

Hmm, minefield ahead.. But surely there must be a "correct" compromise? (Even with a huge performance penalty.)

I'll just add that rwbarton had this comment earlier:

"Be aware (if you aren't already) that GHC does not do any management of floating-point control registers, so functions called through FFI should take care to clean up their floating-point state, otherwise the rounding mode can change unpredictably at the level of Haskell code."

So, there are some FFI-related issues even if we just leave the work to C.

I'll also note that the current implementation of arithmetic on Double/Float already has rounding-mode issues: if someone does an FFI call to change the rounding mode via C (the fegetround/fesetround functions) inside some IO block, then the arithmetic in that block cannot be "lifted" out, even though it appears pure to GHC. Perhaps that should be filed as a bug too.
-Levent.

Hey Levent,

I actually looked into how to do rounding-mode setting a while ago, and the conclusion I came to is that those can simply be FFI calls at the top level that do a sort of with-mode bracketing. Or at least, I'm not sure if setting the mode in an inner loop is a good idea.

That said, you are making a valid point, and I will investigate to what extent compiler support is useful for the latter. If bracketed mode setting and unsetting has a small enough performance overhead, adding support in ghc primops would be worthwhile. Note that those primops would have to be modeled as doing something that's like IO or ST, so that when mode switches happen is predictable. Otherwise CSE and related optimizations could result in evaluating the same code in the wrong mode. I'll think through how that can be avoided, as I do have some ideas.

I suspect mode-switching code will wind up using newtype-wrapped floats and doubles that have a phantom index for the mode, and something like `runWithModeFoo :: Num a => Mode m -> (forall s. Moded s a) -> a` to make sure mode choices happen predictably. That said, there might be a better approach that we'll come to after some experimenting.
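
A minimal sketch of that phantom-index idea, with every name hypothetical and the actual mode plumbing elided; the point is only the runST-style type discipline.

    {-# LANGUAGE RankNTypes #-}

    -- A value tagged with a phantom region s, tying it to one mode scope.
    newtype Moded s a = Moded a

    -- Stand-in for a (possibly type-level) rounding-mode descriptor.
    data Mode m = Mode

    -- runST-style escape: the rank-2 type stops Moded values from
    -- leaking between scopes with different rounding modes.  A real
    -- version would set and restore the hardware mode around forcing a.
    runWithMode :: Mode m -> (forall s. Moded s a) -> a
    runWithMode _ m = case m of Moded a -> a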

To clarify: I think there's a bit of an open design question as to how the explicitly moded API would look. I'd suspect it'll look somewhat like Ed's AD lib, and it should live in a userland library, I think.

Hrm, now that I've thought about it a wee bit more, perhaps the rounding-mode info needs to be attached to ghc threads, otherwise there will be some fun bugs in multithreaded code that uses multiple rounding modes. I'll do some investigation.
On May 5, 2015 8:16 AM, "Carter Schonwald"
To clarify: I think theres a bit of an open design question how the explicitly moded api would look. I'd suspect it'll look somewhat like Ed's AD lib, and should be in a userland library I think. On May 5, 2015 7:40 AM, "Carter Schonwald"
wrote: Hey Levent, I actually looked into how to do rounding mode setting a while ago, and the conclusion I came to is that those can simply be ffi calls at the top level that do a sort of with mode bracketing. Or at least I'm not sure if setting the mode in an inner loop is a good idea.
That said, you are making a valid point, and I will investigate to what extent compiler support is useful for the latter. If bracketed mode setting and unsetting has a small enough performance overhead, adding support in ghc primops would be worth while. Note that those primops would have to be modeled as doing something thats like io or st, so that when mode switches happen can be predictable. Otherwise CSE and related optimizations could result in evaluating the same code in the wrong mode. I'll think through how that can be avoided, as I do have some ideas.
I suspect mode switching code will wind up using new type wrapped floats and doubles that have a phantom index for the mode, and something like "runWithModeFoo:: Num a => Mode m->(forall s . Moded s a ) -> a" to make sure mode choices happen predictably. That said, there might be a better approach that we'll come to after some experimenting On May 5, 2015 12:54 AM, "Levent Erkok"
wrote: Carter: Wall of text is just fine!
I'm personally happy to see the results of your experiment. In particular, the better "code-generation" facilities you add around floats/doubles that map to the underlying hardware's native instructions, the better. When we do have proper IEEE floats, we shall surely need all that functionality.
While you're working on this, if you can also watch out for how rounding modes can be integrated into the operations, that would be useful as well. I can see at least two designs:
* One where the rounding mode goes with the operation: `fpAdd RoundNearestTiesToEven 2.5 6.4`. This is the "cleanest" and the functional solution, but could get quite verbose; and might be costly if the implementation changes the rounding-mode at every issue.
* The other is where the operations simply assume the RoundNearestTiesToEven, but we have lifted IO versions that can be modified with a "with" like construct: `withRoundingMode RoundTowardsPositive $ fpAddRM 2.5 6.4`. Note that `fpAddRM` (*not* `fpAdd` as before) will have to return some sort of a monadic value (probably in the IO monad) since it'll need to access the rounding mode currently active.
Neither choice jumps out at me as the best one, and a hybrid might also be possible. I'd love to hear any insight you gain regarding rounding modes during your experiment.
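[For concreteness, a hedged sketch of both shapes; fpAdd, fpAddRM, withRoundingMode, and RoundingMode are hypothetical names taken from the two designs above, and the arithmetic stubs ignore the mode entirely:]

  import Data.IORef
  import System.IO.Unsafe (unsafePerformIO)

  data RoundingMode
    = RoundNearestTiesToEven
    | RoundTowardsPositive
    | RoundTowardsNegative
    | RoundTowardsZero

  -- Design 1: the mode travels with each operation (pure, but verbose).
  fpAdd :: RoundingMode -> Double -> Double -> Double
  fpAdd _mode x y = x + y  -- a real version would round per _mode

  -- Design 2: an ambient mode, consulted by IO-lifted operations.
  -- An IORef stands in for the hardware rounding-mode register.
  currentMode :: IORef RoundingMode
  currentMode = unsafePerformIO (newIORef RoundNearestTiesToEven)
  {-# NOINLINE currentMode #-}

  withRoundingMode :: RoundingMode -> IO a -> IO a
  withRoundingMode m act = do
    old <- readIORef currentMode
    writeIORef currentMode m
    result <- act
    writeIORef currentMode old
    return result

  fpAddRM :: Double -> Double -> IO Double
  fpAddRM x y = do
    _mode <- readIORef currentMode  -- a real version would honor this
    return (x + y)

  -- usage, per the two designs:
  --   fpAdd RoundNearestTiesToEven 2.5 6.4
  --   withRoundingMode RoundTowardsPositive (fpAddRM 2.5 6.4)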
-Levent.
On Mon, May 4, 2015 at 7:54 PM, Carter Schonwald <carter.schonwald@gmail.com> wrote:
pardon the wall of text everyone, but I really want some FMA tooling :)
I am going to spend some time later this week and next adding FMA primops to GHC and playing around with different ways to add it to Num (which seems pretty straightforward, though I think we'd all agree it shouldn't be exported by Prelude). And then, depending on how exactly Yitzchak's reproposal goes (or some iteration thereof), we can get something useful/usable into 7.12.
I have codes (i.e. *dot products*!!!!!) that want a faster direct FMA for *exact numbers* and a higher-precision FMA for *approximate numbers* (*i.e. floating point*), and where I can't sanely use FMA if it lives anywhere but Num, unless I rub Typeable everywhere and do runtime type checks for applicable floating-point types, which kinda destroys parametricity in engineering nice things.
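[To make that parametricity point concrete, a hedged sketch of the kind of code Carter means; no Num-level fma exists today, so the local stand-in below is the whole point of the proposal:]

  -- A generic dot product: if fma were a Num method, this one definition
  -- would get fused/fast behavior for exact types and one-rounding
  -- precision for Float/Double, with no Typeable tricks.
  dot :: Num a => [a] -> [a] -> a
  dot xs ys = foldr step 0 (zip xs ys)
    where
      step (x, y) acc = fma x y acc
      fma x y z = x * y + z  -- stand-in so the sketch runs today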
@levent: GHC doesn't do any optimization for floating-point arithmetic (aside from 1-2 very simple things that are possibly questionable), and until GHC has support for precisely emulating high-precision floating-point computation in a portable way, it probably won't have any interesting floating-point optimizations. Mandating that fma a b c === a*b+c for inexact number datatypes doesn't quite make sense to me. Relatedly, it's a GOOD thing GHC is conservative about optimizing floating point, because it makes doing correct stability analyses tractable! I look forward to the day that GHC gets a bit more sophisticated about optimizing floating-point computation, but that day is still a ways off.
Relatedly: FMA for Float and Double is not generally going to be faster than the individual primitive operations, merely more accurate when used carefully.
Point being, *I'm +1 on adding some manner of FMA operations to Num* (the only sane place to put it where I can actually use it for a general-use library), and I don't really care if we name it fusedMultiplyAdd, multiplyAndAdd, accursedFusionOfSemiRingOperations, or fma. I'd favor "fusedMultiplyAdd" if we want a descriptive name that will be familiar to experts yet easy to google for the curious.
To repeat: I'm going to do some legwork so that the double and float prims are portably exposed by ghc-prim (I've spoken with several GHC devs about that, and they agree to its value; that's a decision outside the scope of the libraries purview), and I do hope we can come to a consensus about putting it in Num, so that expert library authors can upgrade the guarantees that they can provide end users without imposing any breaking changes on end users.
A number of folks have brought up "but Num is broken" as a counter-argument to adding FMA support to Num. I emphatically agree num is borken :), BUT! I do also believe that fixing up the Num prelude carries the burden of providing a whole-cloth design for an alternative that we can get broad consensus/adoption with. That will happen by dint of actual experimentation and usage.
Point being, adding FMA doesn't further entrench current Num any more than it already is; it just provides expert library authors with a transparent way of improving the experience of their users, with a free upgrade in answer accuracy if used carefully. Additionally, when Num's "semiring-ish" equational laws are framed with respect to approximate forwards/backwards stability, there is a perfectly reasonable law for FMA. I am happy to spend some time trying to write that up more precisely IFF that will tilt those in opposition to being in favor.
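[One plausible way to state such a law, as a hedged gloss of the IEEE754 definition rather than Carter's promised write-up:

\[ \mathrm{fma}(x, y, z) \;=\; \circ\,(x \cdot y + z) \]

where \(\circ\) rounds the infinitely-precise product-sum exactly once, so the result is within half an ulp of the exact value under round-to-nearest, whereas the naive x*y + z rounds twice.]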
I don't need FMA to be exposed by *prelude/base*, merely by *GHC.Num* as a method therein for Num. If that constitutes a different and *more palatable proposal* than what people have articulated so far (by discouraging casual use by dint of hiding), then I am happy to kick off a new thread with that concrete design choice.
If there's a counter-argument that's a bit more substantive than "Num is for exact arithmetic" or "Num is wrong" that will sway me to the other side, I'm all ears; but I'm skeptical of that.
I emphatically support those who are displeased with Num prototyping some alternative designs in userland. I do think it'd be great to figure out a new Num prelude we can migrate Haskell/GHC to over the next 2-5 years, but again, any such proposal really needs to be realized whole cloth before it makes its way to being a libraries list proposal.
Again, pardon the wall of text, I just really want to have nice things :) -Carter
On Mon, May 4, 2015 at 2:22 PM, Levent Erkok wrote:
I think `mulAdd a b c` should be implemented as `a*b+c` even for Double/Float. It should only be an "optimization" (as in modular arithmetic), not a semantics-changing operation. Thus justifying the optimization.
"fma" should be the "more-precise" version available for Float/Double. I don't think it makes sense to have "fma" for other types. That's why I'm advocating "mulAdd" to be part of "Num" for optimization purposes; and "fma" reserved for true IEEE754 types and semantics.
I understand that Edward doesn't like this as this requires a different class; but really, that's the price to pay if we claim Haskell has proper support for IEEE754 semantics. (Which I think it should.) The operation is just different. It also should account for the rounding-modes properly.
I think we can pull this off just fine; and Haskell can really lead the pack here. The situation with floats is even worse in other languages. This is our chance to make a proper implementation, and we have the right tools to do so.
-Levent.
On Mon, May 4, 2015 at 10:58 AM, Artyom wrote:
On 05/04/2015 08:49 PM, Levent Erkok wrote:
Artyom: That's precisely the point. The true IEEE754 variants where precision does matter should be part of a different class. What Edward and Yitz want is an "optimized" multiply-add where the semantics is the same but one that goes faster.
No, it looks to me that Edward wants to have a more precise operation in Num:
I'd have to make a second copy of the function to even try to see the precision win.
Unless I'm wrong, you can't have the following things simultaneously:
1. the compiler is free to substitute *a+b*c* with *mulAdd a b c*
2. *mulAdd a b c* is implemented as *fma* for Doubles (and is more precise)
3. Num operations for Double (addition and multiplication) always conform to IEEE754
The true IEEE754 variants where precision does matter should be part of a different class.
So, does it mean that you're fine with not having point #3 because people who need it would be able to use a separate class for IEEE754 floats?

On Tue, May 5, 2015 at 8:16 AM, Carter Schonwald wrote:
To clarify: I think there's a bit of an open design question of how the explicitly moded API would look. I'd suspect it'll look somewhat like Ed's AD lib, and should be in a userland library, I think.
Another concern here is laziness. What happens when you force a thunk of type Double inside a "withRoundingMode" kind of construct?
-Jan

Hi,
Related information. Intel's FMA information (hardware dependent) is here:
Chapter 11, Intel 64 and IA-32 Architectures Optimization Reference Manual
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32...
Of course, it is information that depends on the particular processor, and the abstraction level is too low.
PS: I like Haskell's abstract naming convention more than "fma" :-)
Regards,
Takenobu

Hi,
Is this useful?
BLAS (Basic Linear Algebra Subprograms)
http://www.netlib.org/blas/
http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
Regards,
Takenobu

Hey Takenobu,
Yes, both are super useful! I've certainly used the Intel architecture manual a few times, and I wrote/maintain (in my biased opinion) one of the nicer BLAS FFI bindings on Hackage.
It's worth mentioning that for Haskellers interested in either mathematical computation or performance engineering, the #numerical-haskell channel on freenode is pretty good. Though again, I'm a bit biased about the nice community there.

Hi Carter,
Uh, excuse me, you are a BLAS master [1] ;-)
And thank you for teaching me about #numerical-haskell. I'll learn it. I like effective performance and abstraction.
[1] http://hackage.haskell.org/package/linear-algebra-cblas
Thank you,
Takenobu

Hblas is what I recommend:
https://hackage.haskell.org/package/hblas
It doesn't have everything yet, but the design is a little better.

Hi Carter,
Thank you for teaching me again. I'll learn from it.
Well-established :-)
Thank you,
Takenobu

Joachim:
I do think that a class is needed. IEEE754 is actually quite agnostic
about floating-point types. What IEEE754 specifies for a float is the sizes of
the exponent and the mantissa; let's call them E and M for short. Then, one
can define a floating-point type for each combination of E and M, both of
which are at least 2. The resulting type fits into E+M+1 bits.
We have:
- "Float" is E=8, M=23 (and thus fits into a 32-bit machine word with the sign bit).
- "Double" is E=11, M=52 (and thus fits into a 64-bit machine word with the sign bit).
(In fact IEEE754 defines single/double precision to have at least those E/M
values, but allows for larger. But let's ignore that for a moment.)
You can see that the next thing in line is going to be something that fits
into 128 bits, also known as quad-precision. (Where E=15, M=112, plus 1 for
the sign-bit.)
If we get type-literals into Haskell proper, then these types can all be
nicely represented as "FP e m" for numbers e, m >= 2.
It just happens that Float/Double are what most hardware implementations
support "naturally," but all IEEE-754 operations are defined for all
precisions, and I think it would make sense to capture this nicely in
Haskell, much like we have Int8, Int16, Int32 etc, and have them instances
of this new class.
So, I'm quite against creating "fmaFloat"/"fmaDouble" etc.; rather, we should
collect all these in a true IEEE754 arithmetic class. Float and Double will
be the two instances for today, but one can easily see the extension to
other variants in the future. (C already supports long double to an extent;
its absence from Haskell is one sticking point.)
This class should also address rounding modes, as almost all
float operations only make sense in the context of a rounding mode. The
design space there is also large, but that's a different discussion.
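A minimal sketch of what that could look like with today's type literals (all names here, FP, RoundingMode, and IEEEArith, are hypothetical):

    {-# LANGUAGE DataKinds, KindSignatures #-}
    import GHC.TypeLits (Nat)

    -- A format parameterised by exponent and mantissa widths.
    data FP (e :: Nat) (m :: Nat)

    type Float'  = FP 8 23    -- 32 bits with the sign bit
    type Double' = FP 11 52   -- 64 bits with the sign bit
    type Quad    = FP 15 112  -- 128 bits with the sign bit

    -- The IEEE754-2008 rounding directions.
    data RoundingMode
      = NearestTiesToEven
      | NearestTiesToAway
      | TowardPositive
      | TowardNegative
      | TowardZero

    -- A class of true IEEE754 types; every operation takes a mode.
    class IEEEArith a where
      fmaRM :: RoundingMode -> a -> a -> a -> a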
-Levent.
On Mon, May 4, 2015 at 1:14 AM, Joachim Breitner wrote:
Hi,
On Sunday, 03.05.2015, at 14:11 -0700, Levent Erkok wrote:
Based on this analysis, I'm withdrawing the original proposal. I think fma and other floating-point arithmetic operations are very important to support properly, but that should not be done by tacking them onto Num or RealFloat; rather, it belongs in a new class that also considers rounding modes properly.
Does it really have to be a class? How much genuinely polymorphic code is out there that also requires this precise handling of precision?
Have you considered adding it as monomorphic functions fmaDouble, fmaFloat, etc. on Hackage, using the FFI? Then those who need these functions can start to use them.
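Such bindings are only a few lines of FFI; a minimal sketch (C99 guarantees fma and fmaf in math.h):

    {-# LANGUAGE ForeignFunctionInterface #-}

    -- Direct bindings to the C math library's fused multiply-add.
    -- fma/fmaf are pure, so the results can be imported without IO.
    foreign import ccall unsafe "math.h fma"
      fmaDouble :: Double -> Double -> Double -> Double

    foreign import ccall unsafe "math.h fmaf"
      fmaFloat :: Float -> Float -> Float -> Float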
Furthermore you can start getting the necessary primops supported in GHC, and have your library transparently use them when available.
Only then, when we have the implementation in place and actual users, can we evaluate whether we need an abstract class for this.
Greetings, Joachim
--
Joachim “nomeata” Breitner
mail@joachim-breitner.de • http://www.joachim-breitner.de/
Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F
Debian Developer: nomeata@debian.org

Levent Erkok wrote:
...I think this proposal needs to be shelved for the time being.
Nevertheless, I vote for doing it now.

A better, more featureful, and more principled approach to FP is definitely needed. It would be great if we could tackle that and finally solve it - and I think we can. But that's a huge issue which has been discussed extensively in the past, and it is orthogonal to Levent's proposal.

In the meantime, new functions that provide access to more FP functionality without adding any significant new weirdness are welcome, and will naturally flow into whatever future solution to the broader FP issue we implement.

It makes little difference whether or not we provide a bad but working default implementation; my vote is to provide it. It will prevent breakage in case someone happens to have implemented a manual RealFloat instance out there somewhere, and it won't affect the standard instances because we'll provide implementations for those. Obviously a clear explanatory Haddock comment would be required. Even better, trigger a warning if an instance does not provide an explicit implementation, but I'm not sure if that's possible. I'm still in favor of doing Levent's proposal now even if the consensus is to omit the default.

I vote for the usual practice of a human-readable name, but don't let bikeshedding hold this back.

Thanks, Yitz

I would suggest adding the relevant high-precision versions as direct functions on Float/Double, and then adding the "better" versions as part of Num, as was suggested. Anyone who *needs* the precision can then get it by using those functions directly and forcing a specific type (since I don't think polymorphic code and this sort of precision demand fit well together). This way it's *possible* to write code with the required precision for Float/Double, and anyone using Num gets an optional precision boost; see the sketch below. Cheers, Merijn
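A sketch of how the two layers would be used (mulAdd and fmaDouble are hypothetical stand-ins, defined here only so the example is self-contained):

    -- Stand-ins for the two proposed operations:
    mulAdd :: Num a => a -> a -> a -> a
    mulAdd x y z = x * y + z          -- semantics-preserving; may go faster

    fmaDouble :: Double -> Double -> Double -> Double
    fmaDouble x y z = x * y + z       -- placeholder; the real one rounds once

    -- Generic code keeps Num and gets at most an optional speed boost:
    step :: Num a => a -> a -> a -> a
    step x y acc = mulAdd x y acc

    -- Precision-critical code pins the type and calls the precise version:
    residual :: Double -> Double -> Double -> Double
    residual x y p = fmaDouble x y (negate p)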
On 4 May 2015, at 12:00, Yitzchak Gale wrote:
Levent Erkok wrote:
...I think this proposal needs to be shelved for the time being.
Nevertheless, I vote for doing it now.
A better, more featureful, and more principled approach to FP is definitely needed. It would be great if we could tackle that and finally solve it - and I think we can. But that's a huge issue which has been discussed extensively in the past, and orthogonal to Levent's proposal.
In the meantime, new functions that provide access to more FP functionality without adding any significant new weirdness are welcome, and will naturally flow into whatever future solution to the broader FP issue we implement.
It makes little difference whether or not we provide a bad but working default implementation; my vote is to provide it. It will prevent breakage in case someone happens to have implemented a manual RealFloat instance out there somewhere, and it won't affect the standard instances because we'll provide implementations for those. Obviously a clear explanatory Haddock comment would be required. Even better, trigger a warning if an instance does not provide an explicit implementation, but I'm not sure if that's possible. I'm still in favor of doing Levent's proposal now even if the consensus is to omit the default.
I vote for the usual practice of a human-readable name, but don't let bikeshedding hold this back.
Thanks, Yitz

Agreed. It will be a boon for dot-product-powered algorithms everywhere.
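For illustration, a dot product folded through fma (a sketch only; fma here is a placeholder with the proposed signature, not the single-rounding primitive):

    import Data.List (foldl')

    -- Placeholder with the proposed signature; a real fma rounds once.
    fma :: Double -> Double -> Double -> Double
    fma x y z = x * y + z

    -- Each accumulation step incurs one rounding instead of two.
    dotFMA :: [Double] -> [Double] -> Double
    dotFMA xs ys = foldl' (\acc (x, y) -> fma x y acc) 0 (zip xs ys)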
There's a valid argument for exploring systematically better abstractions for the future in parallel, but that shouldn't preclude making core tooling and primops a bit better in time for 7.12.
I'll start investigating adding the applicable primops to GHC on all supported platforms. Most of the widely used ones have direct instruction support, but some may have to call out to the C fma, e.g. unregisterised builds and perhaps x86_32, unless I'm mistaken on the latter.
On Monday, May 4, 2015, Merijn Verstraaten wrote:
I would suggest adding the relevant high-precision versions as direct functions on Float/Double, and then adding the "better" versions as part of Num, as was suggested. Anyone who *needs* the precision can then get it by using those functions directly and forcing a specific type (since I don't think polymorphic code and this sort of precision demand fit well together). This way it's *possible* to write code with the required precision for Float/Double, and anyone using Num gets an optional precision boost.
Cheers, Merijn
On 4 May 2015, at 12:00, Yitzchak Gale wrote:
Levent Erkok wrote:
...I think this proposal needs to be shelved for the time being.
Nevertheless, I vote for doing it now.
A better, more featureful, and more principled approach to FP is definitely needed. It would be great if we could tackle that and finally solve it - and I think we can. But that's a huge issue which has been discussed extensively in the past, and orthogonal to Levent's proposal.
In the meantime, new functions that provide access to more FP functionality without adding any significant new weirdness are welcome, and will naturally flow into whatever future solution to the broader FP issue we implement.
It makes little difference whether or not we provide a bad but working default implementation; my vote is to provide it. It will prevent breakage in case someone happens to have implemented a manual RealFloat instance out there somewhere, and it won't affect the standard instances because we'll provide implementations for those. Obviously a clear explanatory Haddock comment would be required. Even better, trigger a warning if an instance does not provide an explicit implementation, but I'm not sure if that's possible. I'm still in favor of doing Levent's proposal now even if the consensus is to omit the default.
I vote for the usual practice of a human-readable name, but don't let bikeshedding hold this back.
Thanks, Yitz
participants (22)
- adam vogt
- amindfv@gmail.com
- Artyom
- Brandon Allbery
- Carter Schonwald
- David Feuer
- Edward Kmett
- Henning Thielemann
- Ivan Lazar Miljenovic
- Jan-Willem Maessen
- Joachim Breitner
- Ken T Takusagawa
- Levent Erkok
- Merijn Verstraaten
- Mike Meyer
- Roman Cheplyaka
- Scott Turner
- Takenobu Tani
- Tikhon Jelvis
- Twan van Laarhoven
- wren romano
- Yitzchak Gale