
Even if Ord becomes lawful for floating point, there will still be massive
problems reasoning about it because the Num instances can't support the
ring laws, let alone the ordered ring laws. What should `compare NaN n` be?
If it's an exception, then the ordering is not total, you can't store NaN
in a Set, etc. If it's LT or GT, then you get a total ordering, but a
rather weird one. So yeah, you'd be able to store NaN in a Set and have a
NaN key in a Map, but then as soon as you start looking at where these are
coming from and where they're going, everything goes weird and you need
type-specific code anyway.
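
To make the LT option above concrete, here is a minimal sketch. The wrapper
name TotalDouble is hypothetical (it is not in base or containers); it sorts
NaN below every other value and treats all NaNs as equal, which is only one
of several possible conventions.

```haskell
import qualified Data.Set as Set

-- Hypothetical wrapper, not part of base: gives Double a total order by
-- placing NaN below every other value and treating all NaNs as equal.
newtype TotalDouble = TotalDouble Double
  deriving Show

instance Eq TotalDouble where
  a == b = compare a b == EQ

instance Ord TotalDouble where
  compare (TotalDouble x) (TotalDouble y)
    | isNaN x && isNaN y = EQ
    | isNaN x            = LT
    | isNaN y            = GT
    | otherwise          = compare x y

nan :: Double
nan = 0 / 0

main :: IO ()
main = do
  -- IEEE comparisons on plain Double: every one of these is False for NaN.
  print (nan == nan, nan < 1, nan > 1)
  -- With the wrapper, NaN behaves as an ordinary (smallest) Set element.
  let s = Set.fromList (map TotalDouble [2, nan, 1, nan])
  print (Set.size s)
  print (Set.member (TotalDouble nan) s)
```

The Set now holds NaN once and can find it again, but the order is still
"weird" in the sense described above: arithmetic on the wrapped values would
not respect it, so code that produces or consumes these NaNs still needs to
know about the convention.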

On Thu, Feb 7, 2019, 4:29 PM Carter Schonwald wrote:

to further add weight, i'm still doing preliminary hackery on the
signalling approach, but the signalling for FP state stuff seems to be OS
thread local, so it can be treated as an exception perfectly well!

On Thu, Feb 7, 2019 at 4:27 PM Carter Schonwald <
carter.schonwald@gmail.com> wrote:

@sven and @henning :
i'm actually doing some preliminary work to add save and restore for FPU
state to the GHC RTS, at the green/haskell thread layer. after first
ripping out x87 code gen, which just needs some more docs written out
before it's merged in. note that i'm speaking specifically of the MXCSR
register save and restore, not the more hefty operations you might be
thinking. FPU mode state save and restore is done already on EVERY OS when
switching threads/processes, and in the agner fog latency tables the cost
of manipulating mxcsr registers is pretty small!
https://www.agner.org/optimize/instruction_tables.pdf

LDMXCSR (restore) and STMXCSR (save) have cpu latencies at like 5-20
cycles (more often 8-15), so having the current C ffi calls set the
default C FPU environment (as we currently have ordinarily) is super doable
to ensure no breakage of existing C bindings, plus have a new ccall variant
that inherits the host haskell thread FPU state. we're talking sub 10
nanosecond overhead on x86 and x86_64 platforms (and either way, on those
platforms soon ghc will only be using the sse2 or higher).

point being: aside from like AMD piledriver micro architecture and some
stuff from VIA, the performance of the CPU instruction for the signalling
nans state setup and related rounding mode etc, should work perfectly well,
@Daniel Cartwright

On Thu, Feb 7, 2019 at 12:05 PM Sven Panne wrote:

On Thu, Feb 7, 2019 at 17:22, Henning Thielemann <
lemming@henning-thielemann.de> wrote:

[...] What about calling into foreign code? If I call a BLAS routine and
one element of the result vector is NaN, shall this be trapped? Or shall
it be trapped once I access the NaN element?

IMHO this is the biggest show stopper for some exotic NaN handling, as
correct as it may be mathematically or aesthetically: The floating point
environment is a thread-local (i.e. basically global) entity on most
platforms, and most programming language runtimes expect a "default"
environment, i.e. no traps when NaNs are encountered. So if Haskell wants
to do things differently, the FPE has to be set/reset around foreign calls
and around every Haskell callback. I am not sure if this is really
worth the trouble and the performance loss. For some special applications
it might be OK or even important, but my gut feeling is that trapping NaNs
is the wrong default in our current world...
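
The set/reset discipline around foreign calls described in this thread can
be modelled in a few lines. This is only a sketch: FpEnv, defaultFpEnv and
withDefaultFpEnv are hypothetical names, and the environment is simulated
with an IORef rather than the real MXCSR register or C's fenv.h, so it shows
the shape of the bracketing and nothing about its actual cost.

```haskell
import Control.Exception (bracket)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Stand-in for the real floating-point environment (e.g. MXCSR on x86).
-- Purely a model: nothing here touches the hardware register.
data FpEnv = FpEnv { trapOnInvalid :: Bool }
  deriving (Eq, Show)

-- The environment foreign code expects: no traps on invalid operations.
defaultFpEnv :: FpEnv
defaultFpEnv = FpEnv { trapOnInvalid = False }

-- Hypothetical helper: save the caller's environment, install the default
-- one for the duration of the foreign call, and restore the saved one
-- afterwards, even if the call throws.
withDefaultFpEnv :: IORef FpEnv -> IO a -> IO a
withDefaultFpEnv envRef foreignCall =
  bracket
    (readIORef envRef <* writeIORef envRef defaultFpEnv)  -- save, then reset
    (writeIORef envRef)                                   -- restore
    (const foreignCall)

main :: IO ()
main = do
  -- Pretend the Haskell thread runs with trapping enabled.
  envRef <- newIORef (FpEnv { trapOnInvalid = True })
  -- The "foreign call" here just observes the environment it runs under.
  during <- withDefaultFpEnv envRef (readIORef envRef)
  after  <- readIORef envRef
  print (during, after)
```

The same wrapper would be needed in the other direction for every Haskell
callback invoked from foreign code, which is the overhead being weighed in
the messages above.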
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries