
#9520: Running an action twice uses much more memory than running it once
-------------------------------------+-------------------------------------
        Reporter:  snoyberg          |                Owner:
            Type:  bug               |               Status:  closed
        Priority:  normal            |            Milestone:
       Component:  Compiler          |              Version:  7.8.3
      Resolution:  invalid           |             Keywords:
Operating System:  Linux             |         Architecture:  x86_64
 Type of failure:  Runtime           |  (amd64)
  performance bug                    |            Test Case:
      Blocked By:                    |             Blocking:
 Related Tickets:  #8457, #12620     |  Differential Rev(s):
       Wiki Page:                    |
-------------------------------------+-------------------------------------
Changes (by edsko):

 * related:  #8457 => #8457, #12620

@@ -0,0 +1,6 @@
+ '''EDIT''': A detailed analysis of the problems discussed in this ticket
+ can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ .
+ There is no ghc bug here, as such, except perhaps #8457 "-ffull-laziness
+ does more harm than good". See also #12620 "Allow the user to prevent
+ floating and CSE".
+

New description:

 '''EDIT''': A detailed analysis of the problems discussed in this ticket
 can be found at http://www.well-typed.com/blog/2016/09/sharing-conduit/ .
 There is no ghc bug here, as such, except perhaps #8457 "-ffull-laziness
 does more harm than good". See also #12620 "Allow the user to prevent
 floating and CSE".

 This started as a [http://www.haskell.org/pipermail/haskell-cafe/2014-August/115751.html Haskell cafe discussion]
 about conduit. This may be related to #7206, but I can't be certain. It's
 possible that GHC is not doing anything wrong here, but I can't see a way
 that the code in question is misbehaving to trigger this memory usage.
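As a rough, conduit-free illustration of the kind of sharing the EDIT note points to, the sketch below contrasts a lazily produced list that is shared between two runs with one that is rebuilt for each run. It is not the reporter's code and not the analysis from the linked post: `source` and `countItems` are hypothetical stand-ins for the conduit source and for `sink2`, and plain lists are assumed in place of conduit's own data structures.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical stand-in for the conduit source: a lazily produced
-- stream of items. Because it takes an argument, each call can build
-- a fresh list.
source :: Int -> [Int]
source n = [1 .. n]

-- Hypothetical stand-in for sink2: a strict counter that needs only
-- one list cell at a time.
countItems :: [a] -> Int
countItems = foldl' (\acc _ -> acc + 1) 0

main :: IO ()
main = do
  let n = 5000000
      shared = source n

  -- Shared: the second traversal still refers to `shared`, so every
  -- cell forced by the first traversal has to stay alive until the
  -- second one has run. Residency grows with n.
  print (countItems shared)
  print (countItems shared)

  -- Rebuilt: as written, each run gets its own list, whose cells can
  -- be collected as soon as the counter has moved past them.
  print (countItems (source n))
  print (countItems (source n))
```

To compare the two halves, one option is to build with `-O0 -rtsopts`, comment out one pair of `print` lines at a time, and run with `+RTS -s` to read off the maximum residency. With optimisation enabled, GHC is free to change how much is shared (for example by floating an expression outwards or by commoning up equal expressions), so the source-level distinction above is not guaranteed to survive compilation; that is the concern raised in #8457 and #12620.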
 Consider the following code, which depends on conduit-1.1.7 and
 conduit-extra:

 {{{#!hs
 import           Data.Conduit        ( Sink, (=$), ($$), await )
 import qualified Data.Conduit.Binary as CB
 import           System.IO           (withBinaryFile, IOMode (ReadMode))

 main :: IO ()
 main = do
     action "random.gz"
     -- Uncommenting the second call triggers the high memory usage.
     --action "random.gz"

 -- Stream the file through a line splitter and count the chunks.
 action :: FilePath -> IO ()
 action filePath = withBinaryFile filePath ReadMode $ \h -> do
     _ <- CB.sourceHandle h
             $$ CB.lines
             =$ sink2 1
     return ()

 -- Strict counter: consumes every input and returns the final count.
 sink2 :: (Monad m) => Int -> Sink a m Int
 sink2 state = do
     maybeToken <- await
     case maybeToken of
         Nothing -> return state
         Just _  -> sink2 $! state + 1
 }}}

 The code should open up the file "random.gz" (I simply `gzip`ed about 10MB
 of data from /dev/urandom), break it into chunks at each newline
 character, and then count the number of lines. When I run it as-is, it
 uses 53KB of memory, which seems reasonable. However, if I uncomment the
 second call to `action` in `main`, maximum residency shoots up to 45MB
 (this seems to be linear in the size of the input file).

 I additionally tried copying `random.gz` into two files, `random1.gz` and
 `random2.gz`, and changed the two calls to `action` to use different file
 names. It still resulted in large memory usage.

 I'm going to continue working to make this a smaller reproducing test
 case, but I wanted to start with what I had so far. I'll also attach the
 core generated by both the low-memory and high-memory versions.

--

--
Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9520#comment:16
GHC http://www.haskell.org/ghc/
The Glasgow Haskell Compiler