[GHC] #9246: GHC generates poor code for repeated uses of min/max

28 Jun 2014

      #9246: GHC generates poor code for repeated uses of min/max
------------------------------+--------------------------------------------
       Reporter:  arotenberg  |             Owner:
           Type:  bug         |            Status:  new
       Priority:  normal      |         Milestone:
      Component:  Compiler    |           Version:  7.8.2
       Keywords:              |  Operating System:  Windows
   Architecture:  x86_64      |   Type of failure:  Runtime performance bug
  (amd64)                     |         Test Case:
     Difficulty:  Unknown     |          Blocking:
     Blocked By:              |
Related Tickets:  #6135       |
------------------------------+--------------------------------------------
 Consider the following module, which intends to implement a
 [http://tavianator.com/2011/05/fast-branchless-raybounding-box-
 intersections/ branchless ray-AABB intersection test]:

 {{{
 module SimpleGeom where

 data Vec3 = Vec3
     {-# UNPACK #-} !Double
     {-# UNPACK #-} !Double
     {-# UNPACK #-} !Double
 data Ray = Ray !Vec3 !Vec3 !Vec3
 data AABB = AABB !Vec3 !Vec3

 testRayAABBIntersection :: Ray -> AABB -> Bool
 testRayAABBIntersection (Ray (Vec3 ox oy oz) _ (Vec3 invDx invDy invDz))
         (AABB (Vec3 minX minY minZ) (Vec3 maxX maxY maxZ)) =
     let tx1 = (minX - ox) * invDx
         tx2 = (maxX - ox) * invDx
         ty1 = (minY - oy) * invDy
         ty2 = (maxY - oy) * invDy
         tz1 = (minZ - oz) * invDz
         tz2 = (maxZ - oz) * invDz
         tmin = min tx1 tx2 `max` min ty1 ty2 `max` min tz1 tz2
         tmax = max tx1 tx2 `min` max ty1 ty2 `min` max tz1 tz2
     in tmax >= max 0 tmin
 }}}

 Everything is strict primitive operations, so GHC should generate very
 simple, fast code, right? But upon compiling with {{{ghc -O -ddump-simpl
 -ddump-to-file SimpleGeom}}}, I found a mess of nested local lambdas and
 similar performance-killing expression forms. (See the attached output
 file.)

 There are two possible issues I can see here.
 1. GHC is trying to expand out all of the branches recursively (I would
 presume via case-of-cases transformation), which is a bad idea in this
 instance compared to just performing the cases sequentially and storing
 their results.
 1. GHC is generating branches for floating-point min/max. Instruction sets
 like SSE2 include non-branching floating-point min/max instructions, which
 is exactly what this algorithm was designed to exploit, but GHC does not
 generate code that could take advantage of these instructions.

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9246
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

GHC

tags

participants (1)