On Thu, Aug 28, 2014 at 11:49 AM, Michael Snoyman <michael@snoyman.com> wrote:

On Thu, Aug 28, 2014 at 11:37 AM, Simon Peyton Jones <simonpj@microsoft.com> wrote:

GHC is keeping the entire representation of `lengthM` in memory

Do you mean that? lengthM is a function; its representation is just code.

At the time I wrote it, I did. What I was seeing in the earlier profiling was that a large number of conduit constructors were being kept in memory, and I initially thought something similar was happening with lengthM. It *does* in fact seem that the memory problem in this later example is simply the list being kept in memory. And there's a far simpler program that demonstrates the problem:

main :: IO ()
main = printLen >> printLen

printLen :: IO ()
printLen = lengthM 0 [1..40000000 :: Int] >>= print

lengthM :: Monad m => Int -> [a] -> m Int
lengthM cnt [] = return cnt
lengthM cnt (_:xs) =
    cnt' `seq` lengthM cnt' xs
  where
    cnt' = cnt + 1

I'll add that as a comment to #7206.
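One workaround worth trying, sketched under an assumption: if the retention comes from GHC's full-laziness pass floating `[1..40000000 :: Int]` out of `printLen` into a shared top-level constant (the kind of floating #7206 is about), then turning that pass off for the module should let each `printLen` call build and consume its own list. `-fno-full-laziness` is a real GHC flag; whether it removes the leak in this particular program is an assumption, not a measured result.

```haskell
{-# OPTIONS_GHC -fno-full-laziness #-}
-- Same program as above, with GHC's full-laziness optimisation disabled
-- for this module only. Assumption: the high memory use comes from
-- [1..40000000] being floated out and shared by both printLen calls.
module Main (main) where

main :: IO ()
main = printLen >> printLen

-- Without full laziness the list expression should stay inside the
-- IO action, so (under the assumption above) it is rebuilt per call
-- and can be garbage-collected as lengthM walks it.
printLen :: IO ()
printLen = lengthM 0 [1..40000000 :: Int] >>= print

-- Strict counting loop: cnt' is forced before each recursive call,
-- so no chain of (+ 1) thunks accumulates.
lengthM :: Monad m => Int -> [a] -> m Int
lengthM cnt [] = return cnt
lengthM cnt (_:xs) =
    cnt' `seq` lengthM cnt' xs
  where
    cnt' = cnt + 1
```

Built with `ghc -O2 -rtsopts` and run with `+RTS -s`, the program prints `40000000` twice either way; comparing the reported maximum residency with and without the pragma would show whether the list is still being retained between the two calls.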

This still doesn't answer what's going on in the original code. I'm concerned that the issue may be the same, but nothing in the Core has jumped out at me as the problem yet. I'll try to look at the code again with fresher eyes later today.

Alright, I've opened up a GHC issue about this:

https://ghc.haskell.org/trac/ghc/ticket/9520

I'm going to continue trying to knock this down to a simpler test case, but it seems that calling `action` twice is sufficient to cause high memory usage.

Michael