On Thu, Feb 7, 2019 at 4:27 PM Carter Schonwald <carter.schonwald@gmail.com> wrote:

@sven and @henning:
I'm actually doing some preliminary work to add save and restore of FPU state to the GHC RTS, at the green (Haskell) thread layer. That comes after first ripping out the x87 code gen, which just needs some more docs written before it's merged. Note that I'm speaking specifically of saving and restoring the MXCSR register, not the heftier operations you might be thinking of.

FPU mode state is already saved and restored by EVERY OS when switching threads/processes, and according to Agner Fog's instruction tables the cost of manipulating the MXCSR register is pretty small!
https://www.agner.org/optimize/instruction_tables.pdf

LDMXCSR (restore) and STMXCSR (save) have CPU latencies of roughly 5-20 cycles (more often 8-15). So it's very doable to have the existing C FFI calls set the default C FPU environment (which is what they ordinarily see today), so that no existing C bindings break, and to add a new ccall variant that inherits the calling Haskell thread's FPU state. We're talking sub-10-nanosecond overhead on x86 and x86_64 (and either way, GHC will soon only use SSE2 or higher on those platforms).

Point being: aside from the AMD Piledriver microarchitecture and some VIA chips, the CPU instructions for setting up signalling-NaN state, rounding mode, and the like should perform perfectly well.

@Daniel Cartwright I do not support documenting false laws in any enshrined way; it will result in broken code. (Also, I'm actually working on some fixes, if you reread my remarks and merijn's, and I think we can have our cake and eat it too, with the finest floats.) Let's fix stuff and then document true laws!

On Thu, Feb 7, 2019 at 12:05 PM Sven Panne <svenpanne@gmail.com> wrote:
On Thu, Feb 7, 2019 at 17:22, Henning Thielemann <lemming@henning-thielemann.de> wrote:
[...] What about calling into foreign code? If I call a BLAS routine and one
element of the result vector is NaN, shall this be trapped? Or shall it be
trapped once I access the NaN element?

IMHO this is the biggest show stopper for some exotic NaN handling, as correct as it may be mathematically or aesthetically: The floating point environment (FPE) is a thread-local (i.e. basically global) entity on most platforms, and most programming language runtimes expect a "default" environment, i.e. no traps when NaNs are encountered. So if Haskell wants to do things differently, the FPE has to be set/reset around foreign calls and around every Haskell callback. I am not sure if this is really worth the trouble and the performance loss. For some special applications it might be OK or even important, but my gut feeling is that trapping NaNs is the wrong default in our current world...
_______________________________________________
Libraries mailing list
Libraries@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries