
GHC is keeping the entire representation of `lengthM` in memory
Do you mean that? lengthM is a function; its representation is just code.
Perhaps you mean that GHC is keeping the entire list [1..1000000] in memory? Now that certainly makes sense… after all, doing so saves allocating (I# 4), (I# 5) etc for each call of the function passed to awaitForever. Granted, it’s probably a bad idea in this case.
If that is your issue (still to be confirmed) the relevant ticket is https://ghc.haskell.org/trac/ghc/ticket/7206; could you add your example to that ticket, as further evidence that something should be done?
See also comment:9 in the ticket, which I have just added.
Simon
From: Haskell-Cafe [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Michael Snoyman
Sent: 28 August 2014 06:57
To: Bryan Vicknair
Cc: Haskell Cafe
Subject: Re: [Haskell-cafe] Conduit+GHC high memory use for simple Sink
I actually just got to an interesting result: sink2 is a red herring. Consider the following program:
import Control.Monad.IO.Class ( liftIO )
import Data.Conduit.Internal (ConduitM (..), Pipe (..), (>+>), runPipe, awaitForever)
main :: IO ()
main = runPipe $
(HaveOutput (Done ()) (return ()) ()) >+>
awaitForever (\_ -> liftIO $ lengthM 0 [1..10000000 :: Int] >>= print)
lengthM :: Monad m => Int -> [a] -> m Int
lengthM cnt [] = return cnt
lengthM cnt (_:xs) =
cnt' `seq` lengthM cnt' xs
where
cnt' = cnt + 1
On my machine, it takes 375MB of memory. What appears to be the cause is that GHC is keeping the entire representation of `lengthM` in memory, which is clearly a pessimization. I still need to research this further, but I thought you'd want to see these results now. (Plus, maybe someone else has some other ideas.)
In case anyone wants, the core for this code is available at:
http://lpaste.net/110125
Michael
On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into this a bit more, looks like it should be interesting.
On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton
mailto:danburton.email@gmail.com> wrote: Michael, I don't see how your code sample for (3) is any different to the compiler than Roman's original sink2.
I also don't see how the original sink2 creates a bad bind tree. I presume that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).
I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.
-- Dan Burton
On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka
mailto:roma@ro-che.info> wrote: * Michael Snoyman
mailto:michael@snoyman.com> [2014-08-27 23:48:06+0300] The problem is the following Sink, which counts how many even/odd Tokens are seen:
type SinkState = (Integer, Integer)
sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)
Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:
1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)