[GHC] #8195: Different floating point results with -msse2 on 32bit Linux

#8195: Different floating point results with -msse2 on 32bit Linux --------------------------+------------------------------------------------ Reporter: | Owner: jstolarek | Status: new Type: bug | Milestone: Priority: normal | Version: 7.7 Component: | Operating System: Linux Compiler (NCG) | Type of failure: None/Unknown Keywords: | Test Case: perf/should_run/Conversions Architecture: x86 | Blocking: Difficulty: | Unknown | Blocked By: | Related Tickets: | --------------------------+------------------------------------------------ I noticed strange behaviour of `Conversions` test on 32bit machines. Here's the code of this test: {{{ {-# LANGUAGE BangPatterns #-} -- | Tests that conversions between various primitive types (e.g. -- Word, Double, etc) doesn't allocate. module Main (main) where import Data.Word -- Repeatedly convert Words to Doubles loop :: Floating a => Word -> a loop n = go 0 0.0 where go i !acc | i < n = go (i+1) (acc + fromIntegral i) | otherwise = acc {-# SPECIALISE loop :: Word -> Float #-} {-# SPECIALISE loop :: Word -> Double #-} main :: IO () main = do print (loop 1000000 :: Float) print (loop 1000000 :: Double) }}} This test is expected to produce: {{{ [t-jastol@cam-05-unx : /5playpen/t-jastol/ghc-validate] inplace/bin/ghc- stage2 -O -fforce-recomp testsuite/tests/perf/should_run/Conversions.hs [1 of 1] Compiling Main ( testsuite/tests/perf/should_run/Conversions.hs, testsuite/tests/perf/should_run/Conversions.o ) Linking testsuite/tests/perf/should_run/Conversions ... [t-jastol@cam-05-unx : /5playpen/t-jastol/ghc-validate] ./testsuite/tests/perf/should_run/Conversions 4.9994036e11 4.999995e11 }}} My optimization of self-recursive tail loops (d61c3ac186c94021c851f7a2a6d20631e35fc1ba) broke that and now both `Float` and `Double` results are identical: {{{ 4.999995e11 4.999995e11 }}} I believe my optimisation shouldn't affect this, but clearly it does. Now here's the interesting part: if I generate SSE assembly by adding `-msse2` flag, then I get expected results: {{{ 4.9994036e11 4.999995e11 }}} I'm not sure if we should get different results for different instruction sets. I'm putting it up as a ticket so perhaps we could investigate this more one day if we consider it relevant. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8195 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8195: Different floating point results with -msse2 on 32bit Linux
------------------------------------------------+--------------------------
Reporter: jstolarek | Owner:
Type: bug | Status: new
Priority: normal | Milestone:
Component: Compiler (NCG) | Version: 7.7
Resolution: | Keywords:
Operating System: Linux | Architecture: x86
Type of failure: None/Unknown | Difficulty:
Test Case: perf/should_run/Conversions | Unknown
Blocking: | Blocked By:
| Related Tickets:
------------------------------------------------+--------------------------
Comment (by Jan Stolarek

#8195: Different floating point results with -msse2 on 32bit Linux ------------------------------------------------+-------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Linux | Architecture: x86 Type of failure: None/Unknown | Difficulty: Test Case: perf/should_run/Conversions | Unknown Blocking: | Blocked By: | Related Tickets: ------------------------------------------------+-------------------------- Description changed by jstolarek: Old description:
I noticed strange behaviour of `Conversions` test on 32bit machines. Here's the code of this test:
{{{ {-# LANGUAGE BangPatterns #-}
-- | Tests that conversions between various primitive types (e.g. -- Word, Double, etc) doesn't allocate. module Main (main) where
import Data.Word
-- Repeatedly convert Words to Doubles loop :: Floating a => Word -> a loop n = go 0 0.0 where go i !acc | i < n = go (i+1) (acc + fromIntegral i) | otherwise = acc {-# SPECIALISE loop :: Word -> Float #-} {-# SPECIALISE loop :: Word -> Double #-}
main :: IO () main = do print (loop 1000000 :: Float) print (loop 1000000 :: Double) }}} This test is expected to produce: {{{ [t-jastol@cam-05-unx : /5playpen/t-jastol/ghc-validate] inplace/bin/ghc- stage2 -O -fforce-recomp testsuite/tests/perf/should_run/Conversions.hs [1 of 1] Compiling Main ( testsuite/tests/perf/should_run/Conversions.hs, testsuite/tests/perf/should_run/Conversions.o ) Linking testsuite/tests/perf/should_run/Conversions ... [t-jastol@cam-05-unx : /5playpen/t-jastol/ghc-validate] ./testsuite/tests/perf/should_run/Conversions 4.9994036e11 4.999995e11 }}} My optimization of self-recursive tail loops (d61c3ac186c94021c851f7a2a6d20631e35fc1ba) broke that and now both `Float` and `Double` results are identical: {{{ 4.999995e11 4.999995e11 }}} I believe my optimisation shouldn't affect this, but clearly it does.
Now here's the interesting part: if I generate SSE assembly by adding `-msse2` flag, then I get expected results: {{{ 4.9994036e11 4.999995e11 }}} I'm not sure if we should get different results for different instruction sets. I'm putting it up as a ticket so perhaps we could investigate this more one day if we consider it relevant.
New description: I noticed strange behaviour of `Conversions` test on 32bit machines. Here's a slightly reduced version of that test: {{{ {-# LANGUAGE BangPatterns #-} -- | Tests that conversions between various primitive types (e.g. -- Word, Double, etc) doesn't allocate. module Main (main) where import Data.Word -- Repeatedly convert Words to Doubles loop :: Word -> Float loop n = go 0 0.0 where go i !acc | i < n = go (i+1) (acc + fromIntegral i) | otherwise = acc main :: IO () main = do print (loop 1000000 :: Float) }}} This test is expected to produce: {{{ [t-jastol@cam-05-unx : /5playpen/t-jastol/ghc-validate] inplace/bin/ghc- stage2 -O -fforce-recomp testsuite/tests/perf/should_run/Conversions.hs [1 of 1] Compiling Main ( testsuite/tests/perf/should_run/Conversions.hs, testsuite/tests/perf/should_run/Conversions.o ) Linking testsuite/tests/perf/should_run/Conversions ... [t-jastol@cam-05-unx : /5playpen/t-jastol/ghc-validate] ./testsuite/tests/perf/should_run/Conversions 4.9994036e11 }}} My optimization of self-recursive tail loops (d61c3ac186c94021c851f7a2a6d20631e35fc1ba) change results of that program to be slightly different: {{{ 4.999995e11 }}} I believe my optimisation shouldn't affect this, but clearly it does. Now here's the interesting part: if I generate SSE assembly by adding `-msse2` flag, then I get expected results: {{{ 4.9994036e11 }}} I'm not sure if we should get different results for different instruction sets. I'm putting it up as a ticket so perhaps we could investigate this more one day if we consider it relevant. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8195#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8195: Different floating point results with -msse2 on 32bit Linux ------------------------------------------------+-------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Linux | Architecture: x86 Type of failure: None/Unknown | Difficulty: Test Case: perf/should_run/Conversions | Unknown Blocking: | Blocked By: | Related Tickets: ------------------------------------------------+-------------------------- Comment (by carter): i think 32bit x86 defaults to using the x87 floating point registers, which internally do 80 bit extended precision floating point. SSE2/SSE3 registers in contrast only have the standard double 64bit and float 32 bit precision floating point numbers. is this possibly attributable to the change in which FPU is being used? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8195#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8195: Different floating point results with -msse2 on 32bit Linux ------------------------------------------------+-------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Linux | Architecture: x86 Type of failure: None/Unknown | Difficulty: Test Case: perf/should_run/Conversions | Unknown Blocking: | Blocked By: | Related Tickets: ------------------------------------------------+-------------------------- Comment (by rwbarton): Yeah, the fact that `-msse2` affects the output is expected for the reason carter stated. This issue is discussed generally at http://www.haskell.org/ghc/docs/latest/html/users_guide/bugs.html#bugs-ghc (last bullet point of 14.2.1). The reason your tail loop optimization affected the output is probably that with the optimization the code compiles to a loop in which `acc` lives in an 80-bit x87 register for the entire duration, while before the optimization `acc` was pushed on the stack as a 32-bit value for the recursive call. (Or possibly it was spilled to memory for some other reason.) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8195#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#8195: Different floating point results with -msse2 on 32bit Linux ------------------------------------------------+-------------------------- Reporter: jstolarek | Owner: Type: bug | Status: new Priority: low | Milestone: Component: Compiler (NCG) | Version: 7.7 Resolution: | Keywords: Operating System: Linux | Architecture: x86 Type of failure: None/Unknown | Difficulty: Test Case: perf/should_run/Conversions | Unknown Blocking: | Blocked By: | Related Tickets: ------------------------------------------------+-------------------------- Changes (by ezyang): * priority: normal => low -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8195#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC