a faster, accumulating mapM (was Re: [Haskell-cafe] mapM vs mapM_ performance)

Hello Luke and other Haskellers,
Thanks for the reply, but as I noted before, the amount of memory
allocated (and resident) is roughly the same. Anyhow it's definitely
not a GC issue because I wrote an accumulating version of mapM and got
close to mapM_ 's performance.
In the code below, main1 is mapM_, main2 is the current mapM
(basicallly sequence . map), map3 is a hand-coded accumulating
parameter version, mapM2 is the accumulating parameter mapM and main4
uses mapM2. The timings I get are about 15, 175, 20 and 20 seconds
for main1, main2, main3 and main4 respectively. main2 uses about 2%
less memory than main3 or main4 on this particular run, though I don't
know if that is true generally.
Unless someone can see a reason why mapM2 is not as good as mapM, can
I suggest replacing the implementation of mapM by the implementation
of mapM2. A 10x speedup seems to be a bigger deal than GCing 2% more
memory.
best regards, Ben
module Main where
import System.IO (openFile, IOMode(..), hPutStr)
testlst = let ls = [(i, [(j, (fromIntegral j)::Float) | j <-
[1..5]::[Int]]) | i <- [1..500000]::[Int]]
in ls
main = do
h <- openFile "bardump" WriteMode
mapM_ ((hPutStr h) . show) testlst
main2 = do
h <- openFile "bardump2" WriteMode
result <- mapM ((hPutStr h) . show) testlst
print $ length result
main3 = do
h <- openFile "bardump3" WriteMode
result <- dump h testlst []
print $ length result
where dump h (x:xs) accum = do
hPutStr h $ show x
dump h xs $ ():accum
dump _ [] accum = return accum
mapM2 :: Monad m => (a -> m b) -> [a] -> m [b]
{-# INLINE mapM2 #-}
mapM2 fn lst = mapM2accum fn lst []
where mapM2accum _ [] accum = return accum
mapM2accum fn (x:xs) accum = do
r <- fn x
mapM2accum fn xs (r:accum)
main4 = do
h <- openFile "bardump2" WriteMode
result <- mapM2 ((hPutStr h) . show) testlst
print $ length result
On Thu, Apr 24, 2008 at 1:37 AM, Luke Palmer
On Tue, Apr 22, 2008 at 11:32 AM, Ben
wrote: Hello Haskellers,
I'm running ghc 6.8.2 on vista 64. Consider the following program, which is compiled with -02 -prof -auto-all:
module Main where
import System.IO (openFile, IOMode(..), hPutStr)
testlst = let ls = [(i, [(j, (fromIntegral j)::Float) | j <- [1..5]::[Int]]) | i <- [1..500000]::[Int]] in ls
main2 = do h <- openFile "bardump" WriteMode mapM_ ((hPutStr h) . show) testlst
main = do h <- openFile "bardump2" WriteMode mapM ((hPutStr h) . show) testlst return ()
main and main2 are different in only that mapM_ versus mapM_ are used. But the mapM version runs about 20x slower! I'm running with +RTS -p -hc -RTS and I see that the amount of memory allocated is about the same, and I think the resident memory is about the same too. But the mapM_ version runs in about 8.7 seconds, and the mapM version takes 167 seconds.
My first guess is that the garbage collector is not running at all in the mapM_ version, but is working it's ass off in the mapM version cleaning up the list that will never be used.
You may ask, why use mapM if you're discarding the values? Unfortunately in my real app I need the values, which are more interesting than IO ().
If you need the values, then you've got to pay that price I suppose. If you need the values, I'm going to take a stab that in your real app you use a lot of memory because of this (because presumably you're keeping the values around), whereas you're just seeing a speed hit on this small test program.
Luke

Hello Ben, Friday, April 25, 2008, 1:14:17 AM, you wrote: mapM2 :: Monad m =>> (a -> m b) -> [a] -> m [b]
{-# INLINE mapM2 #-} mapM2 fn lst = mapM2accum fn lst [] where mapM2accum _ [] accum = return accum mapM2accum fn (x:xs) accum = do r <- fn x mapM2accum fn xs (r:accum)
it seems you forget to reverse accum before returning it? -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi Ben,
mapM2 :: Monad m => (a -> m b) -> [a] -> m [b] {-# INLINE mapM2 #-} mapM2 fn lst = mapM2accum fn lst [] where mapM2accum _ [] accum = return accum mapM2accum fn (x:xs) accum = do r <- fn x mapM2accum fn xs (r:accum)
Not that it should matter for performance any, but you really ought to reverse the result list too, or compute the accumulator in the right order. :-) mapM2 :: Monad m => (a -> m b) -> [a] -> m [b] {-# INLINE mapM2 #-} mapM2 fn lst = mapM2accum fn lst id where mapM2accum _ [] accum = return $ accum [] mapM2accum fn (x:xs) accum = do r <- fn x mapM2accum fn xs (accum . (r:)) Cheers, /Niklas

Hello Niklas, Friday, April 25, 2008, 1:25:39 AM, you wrote:
Not that it should matter for performance any, but you really ought to reverse the result list too, or compute the accumulator in the right order. :-)
unfortunately, this affects performance too. reverse costs one more scan through the list and building lot of thunks has its own space and time cost -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On the test case i'm running the performance impacts of reversing the
return list are negligible:
mapM3 :: Monad m => (a -> m b) -> [a] -> m [b]
{-# INLINE mapM3 #-}
mapM3 fn lst = mapM3accum fn lst []
where mapM3accum _ [] accum = return $ reverse accum
mapM3accum fn (x:xs) accum = do
r <- fn x
mapM3accum fn xs (r:accum)
main5 = do
print $ length $ mapM_ (flip replicate ()) [1..11]
time ~ 18 seconds (about the same, faster on my machine probably due
to timing artifacts) and the memory was about the same (strangely less
than the non-reversing one though again that's probably an artifact.)
In any case, I have some questions:
1) Why is the Prelude mapM so slow? It seems like running 10x slower
than mapM_ when generating only 50,000 return values is a problem.
2) Is there a reason to not use mapM3 above?
Thanks and take care, Ben
On Thu, Apr 24, 2008 at 2:33 PM, Bulat Ziganshin
Hello Niklas,
Friday, April 25, 2008, 1:25:39 AM, you wrote:
Not that it should matter for performance any, but you really ought to reverse the result list too, or compute the accumulator in the right order. :-)
unfortunately, this affects performance too. reverse costs one more scan through the list and building lot of thunks has its own space and time cost
-- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

On Thu, Apr 24, 2008 at 11:28 PM, Ben
2) Is there a reason to not use mapM3 above?
Yes, there certainly is. mapM3 is not equivalent to mapM; it is too strict: *Main> take 3 $ head $ mapM return [1,2,3,4,undefined] [1,2,3] *Main> take 3 $ head $ mapM3 return [1,2,3,4,undefined] [*** Exception: Prelude.undefined So, like foldl', mapM3 seems a viable alternative for mapM, but not a replacement. Luke

Luke,
Thanks for the nice answer. So maybe I'll give mapM3 the name mapM'
and put it in my personal library.
But I'm still a bit curious about the performance profile of mapM.
The profiler is telling me they're allocating around the same amount
of memory, so I am not clear what is making it slow. I am guessing it
has something to do with extra thunks due to laziness, but a 10x
slowdown?
Thanks again, B
On Thu, Apr 24, 2008 at 4:45 PM, Luke Palmer
On Thu, Apr 24, 2008 at 11:28 PM, Ben
wrote: 2) Is there a reason to not use mapM3 above?
Yes, there certainly is. mapM3 is not equivalent to mapM; it is too strict:
*Main> take 3 $ head $ mapM return [1,2,3,4,undefined] [1,2,3] *Main> take 3 $ head $ mapM3 return [1,2,3,4,undefined] [*** Exception: Prelude.undefined
So, like foldl', mapM3 seems a viable alternative for mapM, but not a replacement.
Luke

On Fri, Apr 25, 2008 at 12:02 AM, Ben
Luke,
Thanks for the nice answer. So maybe I'll give mapM3 the name mapM' and put it in my personal library.
Except the answer was wrong. I forgot the reverse in my implementation, so that undefined we were seeing was just the last element of the list. But the conclusion is still true :-) *Main> take 3 $ runIdentity $ mapM return (1:2:3:4:undefined) [1,2,3] *Main> take 3 $ runIdentity $ mapM3 return (1:2:3:4:undefined) *** Exception: Prelude.undefined
But I'm still a bit curious about the performance profile of mapM. The profiler is telling me they're allocating around the same amount of memory, so I am not clear what is making it slow. I am guessing it has something to do with extra thunks due to laziness, but a 10x slowdown?
Tail recursion can make a huge difference in strict settings: the difference between a loop and recursion in C. Luke

2) Is there a reason to not use mapM3 above?
Yes, there certainly is. mapM3 is not equivalent to mapM; it is too strict:
*Main> take 3 $ head $ mapM return [1,2,3,4,undefined] [1,2,3] *Main> take 3 $ head $ mapM3 return [1,2,3,4,undefined] [*** Exception: Prelude.undefined
So, like foldl', mapM3 seems a viable alternative for mapM, but not a replacement.
Wow. A 10x slowdown for a very commonly used function that in 99.8% of all use cases has no need for the extra laziness at all. No wonder some people say Haskell is a toy language... /Niklas

niklas.broberg:
2) Is there a reason to not use mapM3 above?
Yes, there certainly is. mapM3 is not equivalent to mapM; it is too strict:
*Main> take 3 $ head $ mapM return [1,2,3,4,undefined] [1,2,3] *Main> take 3 $ head $ mapM3 return [1,2,3,4,undefined] [*** Exception: Prelude.undefined
So, like foldl', mapM3 seems a viable alternative for mapM, but not a replacement.
Wow. A 10x slowdown for a very commonly used function that in 99.8% of all use cases has no need for the extra laziness at all. No wonder some people say Haskell is a toy language...
mapM_ is far more common, and optimised specially. -- Don

2008/4/25, Niklas Broberg
Wow. A 10x slowdown for a very commonly used function that in 99.8% of all use cases has no need for the extra laziness at all. No wonder some people say Haskell is a toy language...
A toy language that is still much faster than many currently popular languages so... Is Ruby/Python/... a toy too ? Still these numbers seems odd, there's probably something that don't optimize very well here. -- Jedaï

Wow. A 10x slowdown for a very commonly used function that in 99.8% of all use cases has no need for the extra laziness at all. No wonder some people say Haskell is a toy language...
A toy language that is still much faster than many currently popular languages so... Is Ruby/Python/... a toy too ?
I didn't say I agree, I most certainly don't. What I meant with my comment was that a slowdown of 10x, just to preserve laziness, is perfect fuel for those who claim that laziness is good in theory but bad in practice. And my alarm was more directed towards the fact that others seemed to find that perfectly acceptable. There are of course mitigating factors, in particular that mapM is rather uncommon over input lists that size, and for smaller list (say 50k instead of 500k) the slowdown isn't even half as bad (more like 2-3x). But I'm glad to hear that Simon is alarmed too. ;-) Cheers, /Niklas

Hi
I didn't say I agree, I most certainly don't. What I meant with my comment was that a slowdown of 10x, just to preserve laziness, is perfect fuel for those who claim that laziness is good in theory but bad in practice.
A bad implementation of laziness will always be slower than a bad implementation of strictness - as we have strict CPU's. However, laziness gives you some really cool opportunities for deforestation and supercompilation - so at some point Haskell will overcome the performance penalty of laziness and start to reap the performance benefits - I think... Thanks Neil

| 1) Why is the Prelude mapM so slow? It seems like running 10x slower | than mapM_ when generating only 50,000 return values is a problem. All this does seem odd. I've submitted a ticket so we don't forget it http://hackage.haskell.org/trac/ghc/ticket/2236 It appears to be some bad (possibly even non-linear) run-time system or garbage collector effect, caused by the very deep stack. Thanks for persisting with this. I'd expect mapM to be a bit slower, but not *that* much. Simon
participants (8)
-
Ben
-
Bulat Ziganshin
-
Chaddaï Fouché
-
Don Stewart
-
Luke Palmer
-
Neil Mitchell
-
Niklas Broberg
-
Simon Peyton-Jones