
#8763: forM_ [1..N] does not get fused (10 times slower than go function) -------------------------------------+------------------------------------- Reporter: nh2 | Owner: (none) Type: bug | Status: new Priority: normal | Milestone: 8.6.1 Component: Compiler | Version: 7.6.3 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Test Case: Blocked By: | Blocking: Related Tickets: #7206 | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): Here's an implementation of `efdtIntFB` that fits our requirements: {{{ data Direction = Up | Dn deriving Eq direction :: Int# -> Int# -> Direction direction from to | isTrue# (to >=# from) = Up | otherwise = Dn efdtIntFB :: (Int -> r -> r) -> r -> Int# -> Int# -> Int# -> r efdtIntFB c n x1 x2 y = emit first x1 where -- We can safely emit the first element if an iteration -- 'moves closer' to @y@. That's exactly the case when -- @dir_x2@ coincides with @dir_y@. !first = dir_x2 == dir_y !dir_x2 = direction x1 x2 !dir_y = direction x1 y -- We need the overflow flag in 'emit'. (# delta, delta_ovf #) = x2 `subIntC#` x1 -- | Think of @emit :: Maybe Int -> [Int]@, only unboxed. -- If the argument is 'Nothing', we reached the end of the list. -- If the argument is 'Just', we emit an element, compute -- the next candidate, validate it and recurse. emit False _ = n emit True x = I# x `c` emit next_ok next where -- Check that @next@ didn't move past @y@. -- Also, overflow is only allowed iff the computation for -- @delta@ overflowed. (# next, next_ovf #) = addIntC# x delta !next_ok = isTrue# (next_ovf ==# delta_ovf) && direction next y == dir_y -- TODO: evaluate strict && for branchless code {-# INLINE[0] efdtIntFB #-} }}} Some pros: - I find this much easier to understand. No complicated invariants, etc. - No Up/Dn variants to maintain. Still, if the direction is statically known, constant folding and inlining will simplify stuff to the equivalent code. - As a result, no more duplication of `c` occurrences - Also no more duplication of `n` occurrences Cons: - `emit`s closure is 4 words big (2 words bigger than the closure of the original `go_up` helper) in the most general form. It's unfortunate that we can't pack together `dir_y` and `delta_ovf` into a single word without confusing constant folding. This would need either some kind of constant propagation through bit fields (out of scope for GHC, I think) or a smarter closure allocation theme that packs together non-pointer payloads. - We pay for the generalisation of Up/Dn variants by having to compare with `dir_y` all the time. - `base` lacks `addWordC#` primitives, which I'll probably add now -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/8763#comment:60 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler