GHC worker/wrapper on multiple return values

Recently discovered GHC generates sub-optimal code when you have - two levels of loops in the IO monad - the inner loop doesn't do that much work - the outer loop does a lot of work From dumping core I believe the issue is the worker/wrapper transformation doesn't produce return value unboxing (from the inner loop to the outer in this case) when there are multiple return values (as you windup with the IO monad due to passing around the State# RealWorld behind the scenes). Here is some simple code that demonstrates the issue. With a single return value {-# LANGUAGE BangPatterns #-} module Main (main) where main :: IO () main = do let a = l1 0 100000000 print a l1 :: Int -> Int -> Int l1 !a !n = case n>0 of True -> let m = l2 0 n 3 in l1 (a+m) (n-1) False -> a l2 :: Int -> Int -> Int -> Int l2 !a !n !m = case m>0 of True -> l2 (a+n) (n-1) (m-1) False -> a you get nice clean core that performs no allocations and runs beautifully. Doing the same thing within the IO monad (effectively lifting the return value above into a unboxed tupple) {-# LANGUAGE BangPatterns, MagicHash, UnboxedTuples #-} module Main (main) where main :: IO () main = do a <- l1 0 100000000 print a l1 :: Int -> Int -> IO Int l1 !a !n = do case n>0 of True -> do m <- l2 0 n 3 l1 (a+m) (n-1) False -> return a l2 :: Int -> Int -> Int -> IO Int l2 !a !n !m = do case m>0 of True -> l2 (a+n) (n-1) (m-1) False -> return a you don't get boxing and unboxing the return value between l1 and l2. This has quite a dramatic effect when l2 doesn't do the that much work (massive amounts of additional memory allocation and a 2x slow down). $ time ./Simple-pure 14999999850000000 51,912 bytes allocated in the heap ... real 0m0.322s ... $ time ./Simple-io 14999999850000000 1,600,051,936 bytes allocated in the heap ... real 0m0.634s Should I open a ticket against GHC about this issue? Possibly it wouldn't be that hard to get the work/wrapper stuff to work its unboxing magic on multiple return values too? Cheers! -Tyson [*] https://gist.github.com/twhitehead/5744eee28bcde85b9fa4

On Wednesday, 17 February 2016 23:00:07 UTC-5, Tyson Whitehead wrote:
... Doing the same thing within the IO monad (effectively lifting the return value above into a unboxed tupple)
...
you don't get boxing and unboxing the return value between l1 and l2. This has quite a dramatic effect when l2 doesn't do the that much work (massive amounts of additional memory allocation and a 2x slow down).
That of course should have read "Doing the same thing within the IO monad (...) you get boxing and unboxing of the return value between l1 and l2..." (i.e., no "don't").
participants (1)
-
Tyson Whitehead