Conduit+GHC high memory use for simple Sink

newer
ANN: HacBerlin - Haskell Hackathon...

older
Haskell Platform 2014 build...

Bryan Vicknair

27 Aug 2014 27 Aug '14

2:19 p.m.

Hello Cafe, First I'd like to thank Michael Snoyman and Gabriel Gonzalez for the work they've done on the conduit and pipes stream processing libraries, and all the accompanying tutorial content. I've been having fun converting a text processing app from C to Haskell. I'm seeing unexpectedly high memory usage in a stream-processing program that uses the conduit library. I've created a example that demonstrates the problem. The program accepts gzip files as arguments, and for each one, classifies each line as either Even or Odd depending on the length, then outputs some result depending on the Sink used. For each gzip file: action :: GzipFilePath -> IO () action (GzipFilePath filePath) = do result <- runResourceT $ CB.sourceFile filePath $$ Zlib.ungzip =$ CB.lines =$ token =$ sink2 putStrLn $ show result The problem is the following Sink, which counts how many even/odd Tokens are seen: type SinkState = (Integer, Integer) sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1) When I give this program a few gzip files, it uses hundreds of megabytes of resident memory. When I give the same files as input, but use the following simple Sink, it only uses about 8Mb of resident memory: sink1 :: MonadIO m => Sink Token m () sink1 = awaitForever (liftIO . putStrLn . show) At first I thought that sink2 performed so poorly because the addition thunks were being placed onto the heap until the end, so I added some bang patterns to make it strict. That didn't help however. I've done profiling, but I'm unable to figure out exactly what is being added to the heap in sink2 but not sink1, or what is being garbage collected in sink1, but not sink2. The full source is here: https://bitbucket.org/bryanvick/conduit-mem/src/HEAD/hsrc/bin/mem.hs Or you can clone the repo, which contains a cabal file for easy building: git clone git@bitbucket.org:bryanvick/conduit-mem.git cd comduit-mem cabal sandbox init cabal install --only-dependencies cabal build mem ./dist/build/mem/mem [GIVE SOME GZIP FILES HERE] You can change which sink is used in the 'action' function to see the different memory usage.

Show replies by date

Michael Snoyman

27 Aug 27 Aug

4:48 p.m.

On Wed, Aug 27, 2014 at 9:19 PM, Bryan Vicknair wrote:

...

Hello Cafe,

First I'd like to thank Michael Snoyman and Gabriel Gonzalez for the work they've done on the conduit and pipes stream processing libraries, and all the accompanying tutorial content. I've been having fun converting a text processing app from C to Haskell.

I'm seeing unexpectedly high memory usage in a stream-processing program that uses the conduit library.

I've created a example that demonstrates the problem. The program accepts gzip files as arguments, and for each one, classifies each line as either Even or Odd depending on the length, then outputs some result depending on the Sink used. For each gzip file:

action :: GzipFilePath -> IO () action (GzipFilePath filePath) = do result <- runResourceT $ CB.sourceFile filePath $$ Zlib.ungzip =$ CB.lines =$ token =$ sink2 putStrLn $ show result

The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

When I give this program a few gzip files, it uses hundreds of megabytes of resident memory. When I give the same files as input, but use the following simple Sink, it only uses about 8Mb of resident memory:

sink1 :: MonadIO m => Sink Token m () sink1 = awaitForever (liftIO . putStrLn . show)

At first I thought that sink2 performed so poorly because the addition thunks were being placed onto the heap until the end, so I added some bang patterns to make it strict. That didn't help however.

I've done profiling, but I'm unable to figure out exactly what is being added to the heap in sink2 but not sink1, or what is being garbage collected in sink1, but not sink2.

The full source is here: https://bitbucket.org/bryanvick/conduit-mem/src/HEAD/hsrc/bin/mem.hs

Or you can clone the repo, which contains a cabal file for easy building:

git clone git@bitbucket.org:bryanvick/conduit-mem.git cd comduit-mem cabal sandbox init cabal install --only-dependencies cabal build mem ./dist/build/mem/mem [GIVE SOME GZIP FILES HERE]

You can change which sink is used in the 'action' function to see the different memory usage. http://www.haskell.org/mailman/listinfo/haskell-cafe

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this: 1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.) 2. Instead of writing your `sink2` as you have, express it in terms of Data.Conduit.List.fold, which associates the right way. This looks like: fold add (0, 0) where add (!x, !y) Even = (x + 1, y) add (!x, !y) Odd = (x, y + 1) 3. Allow conduit 1.1's rewrite rules to emulate the same behavior and bypass the expensive monadic bind. This can be done by replacing your current `await` with "await followed by bind", e.g.: sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do await >>= maybe (return state) go where go Even = sink2 (evenCount + 1, oddCount ) go Odd = sink2 (evenCount , oddCount + 1) I'd definitely recommend (1). I'd *also* recommend using (2), as the built in functions will often times be more efficient than something hand-rolled, especially now that stream fusion is being added[2]. With conduit 1.2, step (3) *will* be a bit more efficient still (it avoids create an extra Maybe value), but not in a significant way. Michael [1] https://www.fpcomplete.com/blog/2014/08/iap-speeding-up-conduit [2] https://www.fpcomplete.com/blog/2014/08/conduit-stream-fusion

Roman Cheplyaka

5:55 p.m.

* Michael Snoyman [2014-08-27 23:48:06+0300]

...

...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Interesting. From looking at sink2, it seems that it produces a good, right-associated bind tree. Am I missing something? And what occupies the memory in this case? Roman

Dan Burton

6:39 p.m.

Michael, I don't see how your code sample for (3) is any different to the compiler than Roman's original sink2. I also don't see how the original sink2 creates a bad bind tree. I presume that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3). I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes. -- Dan Burton On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...

* Michael Snoyman [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Interesting. From looking at sink2, it seems that it produces a good, right-associated bind tree. Am I missing something?

And what occupies the memory in this case?

Roman

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Michael Snoyman

28 Aug 28 Aug

midnight

Firstly, I think you mean Bryan's code, right? Secondly, I wrote this email five minutes before going to bed, so please excuse any inaccuracies or incoherence in it. (3) is different to the original because it is form as `await >>= maybe`. There's a rewrite rule for this (providing the conduit 1.1 version): {-# RULES "await >>= maybe" forall x y. await >>= maybe x y = ConduitM (NeedInput (unConduitM . y) (unConduitM . const x)) #-} This gives two advantages: it avoids a costly (at least in conduit 1.1) monadic bind, and avoids creating a Maybe value that we're going to immediately throw away. This is the general idea of church encoding data structures, which is why it should give *some* speedup, but nothing significant. Regarding fold: it *is* identical to (3). I don't think the rewrite rules are playing a big part here. You're right about needing to be careful about fold due to WHNF. However, Bryan's code already used bang patterns to force evaluation of both values, so that isn't a problem. In a real-world application, I'd *also* recommend creating a strict pair datatype to avoid accidentally leaking thunks. But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into this a bit more, looks like it should be interesting. On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...

Michael, I don't see how your code sample for (3) is any different to the compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I presume that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Interesting. From looking at sink2, it seems that it produces a good, right-associated bind tree. Am I missing something?

And what occupies the memory in this case?

Roman

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Bryan Vicknair

1:58 a.m.

Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb. Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive. In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea. Bryan Vicknair [1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64 On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip>

...

But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into this a bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...
Michael, I don't see how your code sample for (3) is any different to the compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I presume that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Michael Snoyman

1:56 a.m.

I actually just got to an interesting result: sink2 is a red herring. Consider the following program: import Control.Monad.IO.Class ( liftIO ) import Data.Conduit.Internal (ConduitM (..), Pipe (..), (>+>), runPipe, awaitForever) main :: IO () main = runPipe $ (HaveOutput (Done ()) (return ()) ()) >+> awaitForever (\_ -> liftIO $ lengthM 0 [1..10000000 :: Int] >>= print) lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1 On my machine, it takes 375MB of memory. What appears to be the cause is that GHC is keeping the entire representation of `lengthM` in memory, which is clearly a pessimization. I still need to research this further, but I thought you'd want to see these results now. (Plus, maybe someone else has some other ideas.) In case anyone wants, the core for this code is available at: http://lpaste.net/110125 Michael On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair wrote:

...

Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb.

Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive.

In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea.

Bryan Vicknair

[1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64

...
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into

...
bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...
Michael, I don't see how your code sample for (3) is any different to

...
...
compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I

On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip> this a the presume

...
...
that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Michael Snoyman

2:31 a.m.

Well, I got it down to a case that depends on only base, and uses its own local implementation of a minimal conduit: http://lpaste.net/110126 But I'm not certain if this is still reproducing the original issue, or if the list getting lifted out here is a different issue. On Thu, Aug 28, 2014 at 8:56 AM, Michael Snoyman wrote:

...

I actually just got to an interesting result: sink2 is a red herring. Consider the following program:

import Control.Monad.IO.Class ( liftIO ) import Data.Conduit.Internal (ConduitM (..), Pipe (..), (>+>), runPipe, awaitForever)

main :: IO () main = runPipe $ (HaveOutput (Done ()) (return ()) ()) >+> awaitForever (\_ -> liftIO $ lengthM 0 [1..10000000 :: Int] >>= print)

lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1

On my machine, it takes 375MB of memory. What appears to be the cause is that GHC is keeping the entire representation of `lengthM` in memory, which is clearly a pessimization. I still need to research this further, but I thought you'd want to see these results now. (Plus, maybe someone else has some other ideas.)

In case anyone wants, the core for this code is available at:

http://lpaste.net/110125

Michael

On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair wrote:

...
Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb.

Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive.

In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea.

Bryan Vicknair

[1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64

...
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into

...
bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...
Michael, I don't see how your code sample for (3) is any different to

...
...
compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I

On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip> this a the presume

...
...
that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
> The problem is the following Sink, which counts how many even/odd Tokens > are > seen: > > type SinkState = (Integer, Integer) > > sink2 :: (Monad m) => SinkState -> Sink Token m SinkState > sink2 state@(!evenCount, !oddCount) = do > maybeToken <- await > case maybeToken of > Nothing -> return state > (Just Even) -> sink2 (evenCount + 1, oddCount ) > (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Simon Peyton Jones

4:37 a.m.

GHC is keeping the entire representation of `lengthM` in memory Do you mean that? lengthM is a function; its representation is just code. Perhaps you mean that GHC is keeping the entire list [1..1000000] in memory? Now that certainly makes sense… after all, doing so saves allocating (I# 4), (I# 5) etc for each call of the function passed to awaitForever. Granted, it’s probably a bad idea in this case. If that is your issue (still to be confirmed) the relevant ticket is https://ghc.haskell.org/trac/ghc/ticket/7206; could you add your example to that ticket, as further evidence that something should be done? See also comment:9 in the ticket, which I have just added. Simon From: Haskell-Cafe [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Michael Snoyman Sent: 28 August 2014 06:57 To: Bryan Vicknair Cc: Haskell Cafe Subject: Re: [Haskell-cafe] Conduit+GHC high memory use for simple Sink I actually just got to an interesting result: sink2 is a red herring. Consider the following program: import Control.Monad.IO.Class ( liftIO ) import Data.Conduit.Internal (ConduitM (..), Pipe (..), (>+>), runPipe, awaitForever) main :: IO () main = runPipe $ (HaveOutput (Done ()) (return ()) ()) >+> awaitForever (\_ -> liftIO $ lengthM 0 [1..10000000 :: Int] >>= print) lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1 On my machine, it takes 375MB of memory. What appears to be the cause is that GHC is keeping the entire representation of `lengthM` in memory, which is clearly a pessimization. I still need to research this further, but I thought you'd want to see these results now. (Plus, maybe someone else has some other ideas.) In case anyone wants, the core for this code is available at: http://lpaste.net/110125 Michael On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair mailto:bryanvick@gmail.com> wrote: Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb. Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive. In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea. Bryan Vicknair [1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64 On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip>

...

But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into this a bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton mailto:danburton.email@gmail.com> wrote:

...
Michael, I don't see how your code sample for (3) is any different to the compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I presume that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka mailto:roma@ro-che.info> wrote:

...
* Michael Snoyman mailto:michael@snoyman.com> [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Michael Snoyman

4:49 a.m.

On Thu, Aug 28, 2014 at 11:37 AM, Simon Peyton Jones wrote:

...

GHC is keeping the entire representation of `lengthM` in memory

Do you mean that? lengthM is a function; its representation is just code.

At the time I wrote it, I did. What I was seeing in the earlier profiling was that a large number of conduit constructors were being kept in memory, and I initially thought something similar was happening with lengthM. It *does* in fact seem like the memory problems with this later example are simply the list being kept in memory. And in fact, there's a far simpler version of this that demonstrates the problem: main :: IO () main = printLen >> printLen printLen :: IO () printLen = lengthM 0 [1..40000000 :: Int] >>= print lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1 I'll add that as a comment to #7206. This still doesn't answer what's going on in the original code. I'm concerned that the issue may be the same, but I'm not seeing anything in the core yet that's jumping out at me as being the problem. I'll try to look at the code again with fresher eyes later today. Michael

...

Perhaps you mean that GHC is keeping the entire list [1..1000000] in memory? Now that certainly makes sense… after all, doing so saves allocating (I# 4), (I# 5) etc for each call of the function passed to awaitForever. Granted, it’s probably a bad idea in this case.

If that is your issue (still to be confirmed) the relevant ticket is https://ghc.haskell.org/trac/ghc/ticket/7206; could you add your example to that ticket, as further evidence that something should be done?

See also comment:9 in the ticket, which I have just added.

Simon

*From:* Haskell-Cafe [mailto:haskell-cafe-bounces@haskell.org] *On Behalf Of *Michael Snoyman *Sent:* 28 August 2014 06:57 *To:* Bryan Vicknair *Cc:* Haskell Cafe *Subject:* Re: [Haskell-cafe] Conduit+GHC high memory use for simple Sink

I actually just got to an interesting result: sink2 is a red herring. Consider the following program:

import Control.Monad.IO.Class ( liftIO ) import Data.Conduit.Internal (ConduitM (..), Pipe (..), (>+>), runPipe, awaitForever)

main :: IO () main = runPipe $ (HaveOutput (Done ()) (return ()) ()) >+> awaitForever (\_ -> liftIO $ lengthM 0 [1..10000000 :: Int] >>= print)

lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1

On my machine, it takes 375MB of memory. What appears to be the cause is that GHC is keeping the entire representation of `lengthM` in memory, which is clearly a pessimization. I still need to research this further, but I thought you'd want to see these results now. (Plus, maybe someone else has some other ideas.)

In case anyone wants, the core for this code is available at:

http://lpaste.net/110125

Michael

On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair wrote:

Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb.

Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive.

In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea.

Bryan Vicknair

[1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64

On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip>

...
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into this a bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...
Michael, I don't see how your code sample for (3) is any different to the compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I presume that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Michael Snoyman

9:43 a.m.

On Thu, Aug 28, 2014 at 11:49 AM, Michael Snoyman wrote:

...

On Thu, Aug 28, 2014 at 11:37 AM, Simon Peyton Jones < simonpj@microsoft.com> wrote:

...
GHC is keeping the entire representation of `lengthM` in memory

Do you mean that? lengthM is a function; its representation is just code.

At the time I wrote it, I did. What I was seeing in the earlier profiling was that a large number of conduit constructors were being kept in memory, and I initially thought something similar was happening with lengthM. It *does* in fact seem like the memory problems with this later example are simply the list being kept in memory. And in fact, there's a far simpler version of this that demonstrates the problem:

main :: IO () main = printLen >> printLen

printLen :: IO () printLen = lengthM 0 [1..40000000 :: Int] >>= print

lengthM :: Monad m => Int -> [a] -> m Int lengthM cnt [] = return cnt lengthM cnt (_:xs) = cnt' `seq` lengthM cnt' xs where cnt' = cnt + 1

I'll add that as a comment to #7206.

This still doesn't answer what's going on in the original code. I'm concerned that the issue may be the same, but I'm not seeing anything in the core yet that's jumping out at me as being the problem. I'll try to look at the code again with fresher eyes later today.

Alright, I've opened up a GHC issue about this: https://ghc.haskell.org/trac/ghc/ticket/9520 I'm going to continue trying to knock this down to a simpler test case, but it seems that it's sufficient to call `action` twice to make the memory usage high. Michael

Michael Snoyman

2:37 a.m.

Can you provide the output from +RTS -s, as well as the heap profile for -hy? I believe I'm now also reproducing the memory leak on conduit 1.2, so it must have been a mistake in my testing last night when I thought 1.2 fixed it. On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair wrote:

...

Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb.

Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive.

In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea.

Bryan Vicknair

[1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64

...
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into

...
bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...
Michael, I don't see how your code sample for (3) is any different to

...
...
compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I

On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip> this a the presume

...
...
that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
...
The problem is the following Sink, which counts how many even/odd Tokens are seen:

type SinkState = (Integer, Integer)

sink2 :: (Monad m) => SinkState -> Sink Token m SinkState sink2 state@(!evenCount, !oddCount) = do maybeToken <- await case maybeToken of Nothing -> return state (Just Even) -> sink2 (evenCount + 1, oddCount ) (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Bryan Vicknair

2:27 p.m.

Michael, When I was first digging into this bug, I also ran into the case where doing an action twice would trigger a large increase in memory usage. Also strange was that doing 'sequence_ [action]' was causing the problem for me, but 'do action' was not. Strangely enough though, on my machine there is no memory difference between running 'action' once or twice in your 1st example [1] in comment 4 of bug #9520. Whether action is done once or twice, the maximum resident memory as reported by /usr/bin/time -v is about 1.1Mb. I get roughly the same memory usage from the second code example in that same comment. Attached are the .prof and .hp files from running the 'mem' binary using sink2 on my machine. Here is the output from the +RTS -s switch: 2,191,403,328 bytes allocated in the heap 4,269,946,560 bytes copied during GC 528,829,096 bytes maximum residency (21 sample(s)) 21,830,752 bytes maximum slop 1070 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 3826 colls, 0 par 0.37s 0.42s 0.0001s 0.0032s Gen 1 21 colls, 0 par 4.02s 14.98s 0.7131s 7.6060s INIT time 0.00s ( 0.00s elapsed) MUT time 0.90s ( 6.22s elapsed) GC time 2.74s ( 11.04s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 1.65s ( 4.35s elapsed) EXIT time 0.04s ( 0.05s elapsed) Total time 5.33s ( 17.31s elapsed) %GC time 51.4% (63.8% elapsed) Alloc rate 2,432,664,887 bytes per MUT second Productivity 17.6% of total user, 5.4% of total elapsed Bryan Vicknair [1] https://ghc.haskell.org/trac/ghc/ticket/9520#comment:4 On Thu, Aug 28, 2014 at 09:37:45AM +0300, Michael Snoyman wrote:

...

Can you provide the output from +RTS -s, as well as the heap profile for -hy?

I believe I'm now also reproducing the memory leak on conduit 1.2, so it must have been a mistake in my testing last night when I thought 1.2 fixed it.

On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair wrote:

...
Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb.

Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but the chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that the action function is expensive.

In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea.

Bryan Vicknair

[1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64

...
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into

...
bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton wrote:

...
Michael, I don't see how your code sample for (3) is any different to

...
...
compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I

On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip> this a the presume

...
...
that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka wrote:

...
* Michael Snoyman [2014-08-27 23:48:06+0300]

...
> The problem is the following Sink, which counts how many even/odd Tokens > are > seen: > > type SinkState = (Integer, Integer) > > sink2 :: (Monad m) => SinkState -> Sink Token m SinkState > sink2 state@(!evenCount, !oddCount) = do > maybeToken <- await > case maybeToken of > Nothing -> return state > (Just Even) -> sink2 (evenCount + 1, oddCount ) > (Just Odd ) -> sink2 (evenCount , oddCount + 1)

Wow, talk about timing! What you've run into here is expensive monadic bindings. As it turns out, this is exactly what my blog post from last week[1] covered. You have three options to fix this:

1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and uses the codensity transform to avoid the problem. (I just tested your code; you get constant memory usage under conduit 1.2.0, seemingly without any code change necessary.)

Michael Snoyman

4:09 p.m.

I just added a comment onto that issue. I forgot to mention that that memory problem only occurs with optimizations turned on (-O or -O2). Can you test it out with one of those flags and let me know what happens? Your heap profile looks pretty similar to what I've been seeing as well, thanks for providing it. On Thu, Aug 28, 2014 at 9:27 PM, Bryan Vicknair wrote:

...

Michael,

When I was first digging into this bug, I also ran into the case where doing an action twice would trigger a large increase in memory usage. Also strange was that doing 'sequence_ [action]' was causing the problem for me, but 'do action' was not.

Strangely enough though, on my machine there is no memory difference between running 'action' once or twice in your 1st example [1] in comment 4 of bug #9520. Whether action is done once or twice, the maximum resident memory as reported by /usr/bin/time -v is about 1.1Mb. I get roughly the same memory usage from the second code example in that same comment.

Attached are the .prof and .hp files from running the 'mem' binary using sink2 on my machine.

Here is the output from the +RTS -s switch:

2,191,403,328 bytes allocated in the heap 4,269,946,560 bytes copied during GC 528,829,096 bytes maximum residency (21 sample(s)) 21,830,752 bytes maximum slop 1070 MB total memory in use (0 MB lost due to fragmentation)

Tot time (elapsed) Avg pause Max pause Gen 0 3826 colls, 0 par 0.37s 0.42s 0.0001s 0.0032s Gen 1 21 colls, 0 par 4.02s 14.98s 0.7131s 7.6060s

INIT time 0.00s ( 0.00s elapsed) MUT time 0.90s ( 6.22s elapsed) GC time 2.74s ( 11.04s elapsed) RP time 0.00s ( 0.00s elapsed) PROF time 1.65s ( 4.35s elapsed) EXIT time 0.04s ( 0.05s elapsed) Total time 5.33s ( 17.31s elapsed)

%GC time 51.4% (63.8% elapsed)

Alloc rate 2,432,664,887 bytes per MUT second

Productivity 17.6% of total user, 5.4% of total elapsed

Bryan Vicknair

[1] https://ghc.haskell.org/trac/ghc/ticket/9520#comment:4

...
Can you provide the output from +RTS -s, as well as the heap profile for -hy?

I believe I'm now also reproducing the memory leak on conduit 1.2, so it must have been a mistake in my testing last night when I thought 1.2 fixed it.

On Thu, Aug 28, 2014 at 8:58 AM, Bryan Vicknair wrote:

...
Thanks for the interesting blog posts Michael. I updated the example project [1] to use conduit 1.2. Unfortunately, on my machine [2], my original sink2 still uses about 500Mb of memory when processing 4 gzip files of about 5Mb each, while sink1 only uses about 8Mb. I added sink3, which does the same as sink2 but uses fold from Conduit.List as you recommended, and that seems to work, using about 8Mb.

Looking at the code for sink2 vs sink3, I don't understand what would be occupying so much memory in sink2 even in the case of expensive monadic binding, or exclusion from stream fusion. I'm curious if sink2 adds thunks to the heap that sink3 doesn't, or if the GC is failing to clean up heap objects in sink2 that is cleans up in sink3. I'm new at memory profiling, but

...
...
chart I get with '+RTS -h' or '+RTS -hr' basically just tells me that

On Thu, Aug 28, 2014 at 09:37:45AM +0300, Michael Snoyman wrote: the the

...
...
action function is expensive.

In the real project that inspired this example I'm going to do some cleanup, replacing manual recursion with higher-level functions from Conduit.List, as that seems like an all around good idea.

Bryan Vicknair

[1] https://bitbucket.org/bryanvick/conduit-mem [2] GHC 7.8.3, Arch Linux 3.16.1 kernel x86-64

...
But looking at the code again with fresher eyes than last night: I really don't understand why it had such abysmal performance. I'll look into

...
bit more, looks like it should be interesting.

On Thu, Aug 28, 2014 at 1:39 AM, Dan Burton < danburton.email@gmail.com> wrote:

...
Michael, I don't see how your code sample for (3) is any different to

...
...
compiler than Roman's original sink2.

I also don't see how the original sink2 creates a bad bind tree. I

On Thu, Aug 28, 2014 at 07:00:41AM +0300, Michael Snoyman wrote: <snip> this a the presume

...
...
that the reason "fold" works is due to the streaming optimization rule, and not due to its implementation, which looks almost identical to (3).

I worry about using fold in this case, which is only strict up to WHNF, and therefore wouldn't necessarily force the integers in the tuples; instead it would create tons of integer thunks, wouldn't it? Roman's hand-coded sink2 avoids this issue so I presume that's not what is causing his memory woes.

-- Dan Burton

On Wed, Aug 27, 2014 at 2:55 PM, Roman Cheplyaka
...
wrote:

...
...
...
* Michael Snoyman [2014-08-27

23:48:06+0300]

...
...
> > The problem is the following Sink, which counts how many even/odd Tokens > > are > > seen: > > > > type SinkState = (Integer, Integer) > > > > sink2 :: (Monad m) => SinkState -> Sink Token m SinkState > > sink2 state@(!evenCount, !oddCount) = do > > maybeToken <- await > > case maybeToken of > > Nothing -> return state > > (Just Even) -> sink2 (evenCount + 1, oddCount ) > > (Just Odd ) -> sink2 (evenCount , oddCount + 1) > > Wow, talk about timing! What you've run into here is expensive monadic > bindings. As it turns out, this is exactly what my blog post from last > week[1] covered. You have three options to fix this: > > 1. Just upgrade to conduit 1.2.0, which I released a few hours ago, and > uses the codensity transform to avoid the problem. (I just tested your > code; you get constant memory usage under conduit 1.2.0, seemingly without > any code change necessary.)

Bryan Vicknair

7:06 p.m.

On Thu, Aug 28, 2014 at 11:09:55PM +0300, Michael Snoyman wrote:

...

I just added a comment onto that issue. I forgot to mention that that memory problem only occurs with optimizations turned on (-O or -O2). Can you test it out with one of those flags and let me know what happens?

Wow, quite a large difference appears when using -O. 4Mb when the action is run only once vs 789Mb when it is run twice. What's interesting is that the bytes allocated in the heap seems to grow by a reasonable amount when action is run twice, but the total resident memory explodes. The results when action is run only once: 1,440,041,040 bytes allocated in the heap 465,368 bytes copied during GC 35,992 bytes maximum residency (2 sample(s)) 21,352 bytes maximum slop 1 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 2756 colls, 0 par 0.01s 0.01s 0.0000s 0.0006s Gen 1 2 colls, 0 par 0.00s 0.00s 0.0001s 0.0001s INIT time 0.00s ( 0.00s elapsed) MUT time 0.19s ( 0.19s elapsed) GC time 0.01s ( 0.01s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.20s ( 0.20s elapsed) %GC time 6.0% (5.3% elapsed) Alloc rate 7,563,673,522 bytes per MUT second Productivity 93.6% of total user, 93.5% of total elapsed Command being timed: "./foo +RTS -s" User time (seconds): 0.16 System time (seconds): 0.03 Percent of CPU this job got: 98% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:00.20 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 4024 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 338 Voluntary context switches: 1 Involuntary context switches: 21 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 The results when action is run twice: 2,080,041,040 bytes allocated in the heap 1,346,503,136 bytes copied during GC 389,736,000 bytes maximum residency (11 sample(s)) 8,871,312 bytes maximum slop 768 MB total memory in use (0 MB lost due to fragmentation) Tot time (elapsed) Avg pause Max pause Gen 0 3968 colls, 0 par 0.28s 0.28s 0.0001s 0.0009s Gen 1 11 colls, 0 par 0.57s 0.57s 0.0519s 0.2668s INIT time 0.00s ( 0.00s elapsed) MUT time 0.32s ( 0.32s elapsed) GC time 0.85s ( 0.85s elapsed) EXIT time 0.03s ( 0.03s elapsed) Total time 1.20s ( 1.21s elapsed) %GC time 71.0% (70.7% elapsed) Alloc rate 6,553,280,639 bytes per MUT second Productivity 29.0% of total user, 28.8% of total elapsed Command being timed: "./foo +RTS -s" User time (seconds): 0.91 System time (seconds): 0.28 Percent of CPU this job got: 99% Elapsed (wall clock) time (h:mm:ss or m:ss): 0:01.20 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 789432 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 196760 Voluntary context switches: 1 Involuntary context switches: 47 Swaps: 0 File system inputs: 0 File system outputs: 0 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0

4081

Age (days ago)

4082

Last active (days ago)

List overview

Download

14 comments

5 participants

participants (5)

Bryan Vicknair
Dan Burton
Michael Snoyman
Roman Cheplyaka
Simon Peyton Jones