GHC, odd concurrency space leak

A problem with GHC? =================== This post describes some odd behaviour I have seen in GHC 6.12.1 when writing Combinatorrent. The post is literate Haskell so you can run it. The executive summary: A space leak occurs when a new process is spawned from inside another process - and I can't figure out why. I am asking for help on haskell-cafe. We begin by upgrading GHC from Haskell98 to something mature and modern:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
And then we do the import-tango:
module Main where
import Control.Monad.Reader import Control.Monad.State
In our system, a *Process* is an identifier for a thread of execution. Since these heavily makes use of the following imports,
import Control.Concurrent import Control.Concurrent.STM
it is beneficial to make processes live in the IO-monad. However, it turns out that having everything living in the IO monad is a clunky idea. A Process also has a current configuration: The channels and variables on which it can communicate for instance. The configuration is not expected to be changed over the course of the process running. For data which do change when the process runs, we want some state tracking. Luckily, the Xmonad X-monad comes to our rescue. In all its gory and glory details, we generalize it and rewrite it for our processes, which is how they are used in combinatorrent. Our model-kit for building new threads of execution is the following beast:
newtype Process a b c = Process (ReaderT a (StateT b IO) c) deriving (Functor, Monad, MonadIO, MonadState b, MonadReader a)
Note that the automatic derivations of *MonadState b* and *MonadReader a* makes GHC spit our some mkUsageInfo warnings in its generation of the .hi-files. They don't seem to be dangerous. Glueing instructions for our model kit is given by spawning off new threads:
run :: a -> b -> Process a b c -> IO (c, b) run c st (Process p) = runStateT (runReaderT p c) st
spawn :: a -> b -> Process a b () -> IO ThreadId spawn c st p = forkIO $ run c st p >> return ()
Our first dummy --------------- Machinery for having fun is now in place. Here is a crash-test-dummy we would like to play with:
main1 = do spawn () () (forever $ return ()) threadDelay (3 * 1000000)
-- main = main1
Note that the given units () and () are usually much more complicated, but for
the sake of this minimal example units will do. And running that does what we
expect it to do:
./Post +RTS -tstderr
<
p1 :: Process () () () p1 = forever $ return ()
startp1 :: IO ThreadId startp1 = spawn () () p1
startp2 :: IO ThreadId startp2 = spawn () () (forever $ do liftIO startp1 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (10 * 1000000))
main2 = do putStrLn "Main thread starting" startp2 threadDelay (1 * 1000000)
main = main2
Running *this* beast gives:
./Post +RTS -tstderr
Main thread starting
Delaying
<

Jesper Louis Andersen
This post describes some odd behaviour I have seen in GHC 6.12.1 when writing Combinatorrent. The post is literate Haskell so you can run it. The executive summary: A space leak occurs when a new process is spawned from inside another process - and I can't figure out why. I am asking for help on haskell-cafe.
...[snip]...
import Control.Monad.State
Does the problem go away if you use "Control.Monad.State.Strict"?
G
--
Gregory Collins

On Wed, Apr 14, 2010 at 2:13 PM, Gregory Collins
Jesper Louis Andersen
writes: This post describes some odd behaviour I have seen in GHC 6.12.1 when writing Combinatorrent. The post is literate Haskell so you can run it. The executive summary: A space leak occurs when a new process is spawned from inside another process - and I can't figure out why. I am asking for help on haskell-cafe.
...[snip]...
import Control.Monad.State
Does the problem go away if you use "Control.Monad.State.Strict"?
Nope :) That was the first thing I tried here. I tried playing with optimization level too. Next I tried making two versions that were as similar as possible and then comparing the core with ghc-core. I can't see a difference between a version that uses 1MB and a version that uses 160MB (on my system 160MB is the worst I can get it to blow up). The two versions I compared: Low memory: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (return ())
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
Too much memory: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (forever $ return ())
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
The difference is whether or not the threads must keep returning () or if they returns it once. I'm not sure what to make of it. My conclusion is that keeping the thread alive via forever is the problem, but when I test this hypothesis with a threadDelay the space leak goes away: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (liftIO $ threadDelay (100 * 1000000))
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
It will be interesting to hear what fixes this! Jason

On Wed, Apr 14, 2010 at 2:44 PM, Jason Dagit
On Wed, Apr 14, 2010 at 2:13 PM, Gregory Collins
wrote: Jesper Louis Andersen
writes: This post describes some odd behaviour I have seen in GHC 6.12.1 when writing Combinatorrent. The post is literate Haskell so you can run it. The executive summary: A space leak occurs when a new process is spawned from inside another process - and I can't figure out why. I am asking for help on haskell-cafe.
...[snip]...
import Control.Monad.State
Does the problem go away if you use "Control.Monad.State.Strict"?
Nope :) That was the first thing I tried here.
I tried playing with optimization level too.
Next I tried making two versions that were as similar as possible and then comparing the core with ghc-core. I can't see a difference between a version that uses 1MB and a version that uses 160MB (on my system 160MB is the worst I can get it to blow up).
The two versions I compared: Low memory: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (return ())
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
Too much memory: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (forever $ return ())
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
The difference is whether or not the threads must keep returning () or if they returns it once.
I'm not sure what to make of it. My conclusion is that keeping the thread alive via forever is the problem, but when I test this hypothesis with a threadDelay the space leak goes away:
\begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (liftIO $ threadDelay (100 * 1000000))
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
It will be interesting to hear what fixes this!
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.

On Wed, Apr 14, 2010 at 3:13 PM, Daniel Fischer
Am Mittwoch 14 April 2010 23:49:43 schrieb Jason Dagit:
It will be interesting to hear what fixes this!
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.
Not with optimisations.
Thanks for pointing that out. I forgot to say so in my email. Here are two reduced versions of the original program: Good version, ghc --make Terminate.hs: \begin{code} {-# OPTIONS -O0 #-} {-# LANGUAGE GeneralizedNewtypeDeriving #-} module Main where import Control.Monad (forever) import Control.Concurrent import Control.Concurrent.STM spawn :: IO a -> IO ThreadId spawn io = forkIO (io >> return ()) forever' m = do _ <- m forever' m startp4 :: IO ThreadId startp4 = spawn (forever' (return ())) startp3 :: IO ThreadId startp3 = spawn (forever $ do startp4 putStrLn "Delaying" threadDelay (3 * 1000000)) main = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000) \end{code} The bad version, ghc --make NonTermination.hs: \begin{code} {-# OPTIONS -O2 #-} {-# LANGUAGE GeneralizedNewtypeDeriving #-} -- Note: Change the optimization to -O1 to get a terminating version -- that uses much more memory than it should. module Main where import Control.Monad (forever) import Control.Concurrent import Control.Concurrent.STM spawn :: IO a -> IO ThreadId spawn io = forkIO (io >> return ()) startp4 :: IO ThreadId startp4 = spawn (forever (return ())) startp3 :: IO ThreadId startp3 = spawn (forever $ do startp4 putStrLn "Delaying" threadDelay (3 * 1000000)) main = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000) \end{code} Can some core expert please look at these and explain the difference? Thanks! Jason

Am Donnerstag 15 April 2010 00:52:22 schrieb Jason Dagit:
The bad version, ghc --make NonTermination.hs: \begin{code} {-# OPTIONS -O2 #-} {-# LANGUAGE GeneralizedNewtypeDeriving #-}
-- Note: Change the optimization to -O1 to get a terminating version
Doesn't seem to terminate with -O1 here (killed after ~30 seconds). However, it does terminate with -O2 and -fno-state-hack, but leaks (and there's no difference between forever and forever' with -O2 -fno-state- hack).
Can some core expert please look at these and explain the difference?
I'm interested in an explanation too.
Thanks! Jason
& Daniel

On Thu, Apr 15, 2010 at 1:33 AM, Daniel Fischer
Can some core expert please look at these and explain the difference?
I'm interested in an explanation too.
+1 The behaviour is consistent. GHC 6.8.3, 6.10.4, 6.12.1 and 6.13-20100416 all agree on the space leak. Here is the minimal program I have with the leak: \begin{code} {-# LANGUAGE GeneralizedNewtypeDeriving #-} module Main where import Control.Monad.State import Control.Concurrent newtype Process b c = Process (StateT b IO c) deriving (Monad, MonadIO, MonadState b) run :: b -> Process b c -> IO (c, b) run st (Process p) = runStateT p st spawn :: b -> Process b () -> IO ThreadId spawn st p = forkIO $ run st p >> return () p1 :: Process () () p1 = forever $ return () startp1 :: IO ThreadId startp1 = spawn () p1 startp2 :: IO ThreadId startp2 = spawn () (forever $ do liftIO startp1 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (10 * 1000000)) main = do putStrLn "Main thread starting" startp2 threadDelay (1 * 1000000) \end{code} .. so it looks like it is the state monad. I used ghc-core to print out this program in Core-format, killed all the type casts from System-F_c and inspected the code. I can't see what would make any problem there, but that was my first use of Core, so I might have overlooked something. The only thing I can see is that we "split" the State# RealWorld whenever we fork, but I think that is expected behaviour. The only other culprit I could guess at is the exception catch# primops in there. Should I file this as a bug? It has some bug-like qualities to it. In any case, what is going on is quite complicated so a resolution would be nice. If for nothing else to understand what is going on. -- J.

On Fri, Apr 16, 2010 at 2:51 PM, Jesper Louis Andersen < jesper.louis.andersen@gmail.com> wrote:
On Thu, Apr 15, 2010 at 1:33 AM, Daniel Fischer
wrote: Can some core expert please look at these and explain the difference?
I'm interested in an explanation too.
+1
The behaviour is consistent. GHC 6.8.3, 6.10.4, 6.12.1 and 6.13-20100416 all agree on the space leak. Here is the minimal program I have with the leak:
Myself and others posted "simpler" programs that had similar bad behavior, including the space leak (depending on optimizations flags). I realize it's tedious to retest all those versions, but do you think you could check with one of the other versions that doesn't need mtl?
\begin{code} {-# LANGUAGE GeneralizedNewtypeDeriving #-}
module Main where import Control.Monad.State import Control.Concurrent
newtype Process b c = Process (StateT b IO c) deriving (Monad, MonadIO, MonadState b)
run :: b -> Process b c -> IO (c, b) run st (Process p) = runStateT p st
spawn :: b -> Process b () -> IO ThreadId spawn st p = forkIO $ run st p >> return ()
p1 :: Process () () p1 = forever $ return ()
startp1 :: IO ThreadId startp1 = spawn () p1
startp2 :: IO ThreadId startp2 = spawn () (forever $ do liftIO startp1 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (10 * 1000000))
main = do putStrLn "Main thread starting" startp2 threadDelay (1 * 1000000) \end{code}
.. so it looks like it is the state monad.
I don't think so because we were able to produce the space leak without using StateT.
I used ghc-core to print out this program in Core-format, killed all the type casts from System-F_c and inspected the code. I can't see what would make any problem there, but that was my first use of Core, so I might have overlooked something. The only thing I can see is that we "split" the State# RealWorld whenever we fork, but I think that is expected behaviour. The only other culprit I could guess at is the exception catch# primops in there.
Should I file this as a bug? It has some bug-like qualities to it. In any case, what is going on is quite complicated so a resolution would be nice. If for nothing else to understand what is going on.
Well, I think Bulat correctly characterized the non-termination aspect. I didn't think the cooperative aspect of threading applied with the threaded RTS, so I'm not 100% sure I believe his characterization, but otherwise it seems like a reasonable explanation. The space leakiness is a different issue and likely worth a bug report in its own right. Do you think you could try checking for the speak leaking using the compacting garbage collector? I think that one is enabled with +RTS -c -RTS. Thanks for checking on all those different versions of GHC. Jason

On Sat, Apr 17, 2010 at 12:00 AM, Jason Dagit
Myself and others posted "simpler" programs that had similar bad behavior, including the space leak (depending on optimizations flags). I realize it's tedious to retest all those versions, but do you think you could check with one of the other versions that doesn't need mtl?
You got me curious enough that I decided to attack it systematically.
Here is a test-run script:
\begin{code}
#!/bin/bash
GHC68=/usr/local/stow/ghc-6.8.3/bin/ghc
GHC610=/usr/local/stow/ghc-6.10.4/bin/ghc
GHC612=ghc
GHC6HEAD=/usr/local/stow/ghc-6.13-20100416/bin/ghc
run_round () {
EXE=$(basename ${1} .hs)
echo ----------------------------------------------
echo GHC68
${GHC68} --make $2 $1
./${EXE} +RTS -tstderr $3
echo ----------------------------------------------
echo GHC610
${GHC610} --make $2 $1
./${EXE} +RTS -tstderr $3
echo ----------------------------------------------
echo GHC612
${GHC612} --make $2 $1
./${EXE} +RTS -tstderr $3
echo ----------------------------------------------
echo GHC HEAD
${GHC6HEAD} --make -rtsopts $2 $1
./${EXE} +RTS -tstderr $3
}
run_round $1 $2 $3
\end{code}
With this script down, we can run your "Good" version:
jlouis@illithid:~$ sh runner.sh JD-Good.hs
----------------------------------------------
GHC68
[1 of 1] Compiling Main ( JD-Good.hs, JD-Good.o )
Linking JD-Good ...
./JD-Good +RTS -tstderr
Main thread starting
Delaying
<
Well, I think Bulat correctly characterized the non-termination aspect. I didn't think the cooperative aspect of threading applied with the threaded RTS, so I'm not 100% sure I believe his characterization, but otherwise it seems like a reasonable explanation.
It is certainly a valid explanation, and the most plausible at the moment I think.
The space leakiness is a different issue and likely worth a bug report in its own right. Do you think you could try checking for the speak leaking using the compacting garbage collector? I think that one is enabled with +RTS -c -RTS.
Oh, that gives some interesting progress: Here is the run without -c: jlouis@illithid:~$ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.12.1 jlouis@illithid:~$ ghc --make -threaded Post.hs jlouis@illithid:~$ ./Post +RTS -s ./Post +RTS -s Main thread starting Delaying 840,429,800 bytes allocated in the heap 336,183,280 bytes copied during GC 86,294,808 bytes maximum residency (8 sample(s)) 2,648,600 bytes maximum slop 171 MB total memory in use (3 MB lost due to fragmentation) Generation 0: 1596 collections, 0 parallel, 0.35s, 0.33s elapsed Generation 1: 8 collections, 0 parallel, 0.27s, 0.35s elapsed Parallel GC work balance: nan (0 / 0, ideal 1) MUT time (elapsed) GC time (elapsed) Task 0 (worker) : 0.00s ( 0.32s) 0.00s ( 0.00s) Task 1 (worker) : 0.37s ( 0.32s) 0.62s ( 0.68s) Task 2 (worker) : 0.00s ( 0.32s) 0.00s ( 0.00s) Task 3 (worker) : 0.00s ( 0.32s) 0.00s ( 0.00s) SPARKS: 0 (0 converted, 0 pruned) INIT time 0.00s ( 0.00s elapsed) MUT time 0.28s ( 0.32s elapsed) GC time 0.62s ( 0.68s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 0.90s ( 1.00s elapsed) %GC time 68.9% (67.6% elapsed) Alloc rate 3,001,331,338 bytes per MUT second Productivity 31.1% of total user, 27.9% of total elapsed gc_alloc_block_sync: 0 whitehole_spin: 0 gen[0].steps[0].sync_large_objects: 0 gen[0].steps[1].sync_large_objects: 0 gen[1].steps[0].sync_large_objects: 0 And here with the compacting GC: jlouis@illithid:~$ ./Post +RTS -s -c ./Post +RTS -s -c Main thread starting Delaying 12,642,360 bytes allocated in the heap 2,522,160 bytes copied during GC 2,522,584 bytes maximum residency (3 sample(s)) 59,232 bytes maximum slop 4 MB total memory in use (0 MB lost due to fragmentation) Generation 0: 22 collections, 0 parallel, 0.02s, 0.01s elapsed Generation 1: 3 collections, 0 parallel, 5.08s, 5.09s elapsed Parallel GC work balance: nan (0 / 0, ideal 1) MUT time (elapsed) GC time (elapsed) Task 0 (worker) : 0.00s ( 0.01s) 0.00s ( 0.00s) Task 1 (worker) : 0.00s ( 0.01s) 5.10s ( 5.10s) Task 2 (worker) : 0.00s ( 0.01s) 0.00s ( 0.00s) Task 3 (worker) : 0.00s ( 0.01s) 0.00s ( 0.00s) SPARKS: 0 (0 converted, 0 pruned) INIT time 0.00s ( 0.00s elapsed) MUT time 0.01s ( 0.01s elapsed) GC time 5.10s ( 5.10s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 5.10s ( 5.11s elapsed) %GC time 99.9% (99.8% elapsed) Alloc rate 3,159,800,049 bytes per MUT second Productivity 0.0% of total user, 0.0% of total elapsed gc_alloc_block_sync: 0 whitehole_spin: 0 gen[0].steps[0].sync_large_objects: 0 gen[0].steps[1].sync_large_objects: 0 gen[1].steps[0].sync_large_objects: 0 So it looks like it eliminates the space leak, but note how the mutator doesn't get to do any work since we are using up all the time in the GC. We only get to run 22 Gen0 collections and 3 Gen1 collections. In other words, I don't think it does anything to help with the leak. According to heap profiling, two things take memory: PAPs (which are partial applications to the RTS, that is functions which are not yet fully applied), and a closure. It would make sense that it is a PAP when one looks at the core. State monads are s -> (s, a) and StateT with IO as the underlying monad gets translated into s -> ioS -> (ioS, (s, a)), so I am not too confused about the PAP appearing. Thinking more about this might reveal why the PAP appears however. Also, if you need me to run any kind of test against other RTS options or programs, I'll be happy to do it. Just bump me :) -- J.

Hello Jason, Saturday, April 17, 2010, 2:00:04 AM, you wrote:
Well, I think Bulat correctly characterized the non-termination aspect. I didn't think the cooperative aspect of threading applied with the threaded RTS, so I'm not 100% sure I believe his characterization, but otherwise it seems like a reasonable explanation.
it's a well known side of ghc green threads implementation. read notes in sources of Control.Concurrent module: The concurrency extension for Haskell is described in the paper /Concurrent Haskell/ http://www.haskell.org/ghc/docs/papers/concurrent-haskell.ps.gz. Concurrency is "lightweight", which means that both thread creation and context switching overheads are extremely low. Scheduling of Haskell threads is done internally in the Haskell runtime system, and doesn't make use of any operating system-supplied thread packages. GHC implements pre-emptive multitasking: the execution of threads are interleaved in a random fashion. More specifically, a thread may be pre-empted whenever it allocates some memory, which unfortunately means that tight loops which do no allocation tend to lock out other threads (this only seems to happen with pathological benchmark-style code, however). -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Jason Dagit wrote:
On Wed, Apr 14, 2010 at 3:13 PM, Daniel Fischer
mailto:daniel.is.fischer@web.de> wrote: Am Mittwoch 14 April 2010 23:49:43 schrieb Jason Dagit: > > It will be interesting to hear what fixes this! > > > > > > forever' m = do _ <- m > > forever' m > > When I define that version of forever, the space leak goes away.
Not with optimisations.
Thanks for pointing that out. I forgot to say so in my email.
Here are two reduced versions of the original program:
<snip> I find non-termination with a much simpler program than yours (GHC 6.12.1): \begin{code}{-# OPTIONS -O1 #-} import Control.Concurrent import Control.Monad (forever) main = do putStrLn "Main thread starting" forkIO $ do putStrLn "Started thread" forever $ return () putStrLn "Delaying" threadDelay (1 * 1000000) putStrLn "Delayed" \end{code} If I compile that with "ghc --make -threaded" and run it, with -O1 or -O2, it burns CPU and never terminates. With -O0 it terminates. So looks like some optimisation is causing the problem. I might guess it's something to do with the RTS and threadDelay that's causing the problem. "Delayed" is never printed on my system, so it seems like (even when run with +RTS -N2) the original thread is not ever being rescheduled; perhaps the timeout queue isn't checked properly when a thread is burning up the CPU like that, and optimisations are on? Thanks, Neil.

Hello Neil, Thursday, April 15, 2010, 12:37:35 PM, you wrote:
I find non-termination with a much simpler program than yours (GHC 6.12.1): forkIO $ do putStrLn "Started thread" forever $ return ()
ghc multithreading is actually cooperative: it switches only on memory allocation. since almost any haskell code allocates, there is no problem - it works like preemptive one. but sometimes this assumption fails - with optimization enabled, your code doesn't allocate so there are no chances for thread switching. replacing return () with threadDelay call solves the problem -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

I have not been following the details of this, I'm afraid, but I notice this:
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.
What was the old version of forever that led to the leak?
If you can boil down the leak to a simple test case, do submit a Trac ticket.
Simon
From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Jason Dagit
Sent: 14 April 2010 22:50
To: Gregory Collins
Cc: Haskell Cafe
Subject: Re: [Haskell-cafe] GHC, odd concurrency space leak
On Wed, Apr 14, 2010 at 2:44 PM, Jason Dagit
This post describes some odd behaviour I have seen in GHC 6.12.1 when writing Combinatorrent. The post is literate Haskell so you can run it. The executive summary: A space leak occurs when a new process is spawned from inside another process - and I can't figure out why. I am asking for help on haskell-cafe.
...[snip]...
import Control.Monad.State
Does the problem go away if you use "Control.Monad.State.Strict"? Nope :) That was the first thing I tried here. I tried playing with optimization level too. Next I tried making two versions that were as similar as possible and then comparing the core with ghc-core. I can't see a difference between a version that uses 1MB and a version that uses 160MB (on my system 160MB is the worst I can get it to blow up). The two versions I compared: Low memory: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (return ())
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
Too much memory: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (forever $ return ())
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
The difference is whether or not the threads must keep returning () or if they returns it once. I'm not sure what to make of it. My conclusion is that keeping the thread alive via forever is the problem, but when I test this hypothesis with a threadDelay the space leak goes away: \begin{code}
startp4 :: IO ThreadId startp4 = spawn () () (liftIO $ threadDelay (100 * 1000000))
startp3 :: IO ThreadId startp3 = spawn () () (forever $ do liftIO startp4 liftIO $ putStrLn "Delaying" liftIO $ threadDelay (3 * 1000000))
main1 = do putStrLn "Main thread starting" startp3 threadDelay (1 * 1000000)
main = main1 \end{code}
It will be interesting to hear what fixes this!
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.

I have not been following the details of this, I'm afraid, but I notice
Am Samstag 17 April 2010 14:41:28 schrieb Simon Peyton-Jones: this:
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.
What was the old version of forever that led to the leak?
Control.Monad.forever forever :: Monad m => m a -> m b forever m = m >> forever m However, that isn't the problem. In my tests, both variants of forever exhibit the same behaviour, what makes it leak or not is the optimisation level.
If you can boil down the leak to a simple test case, do submit a Trac ticket.
Simon
The code below behaves well if compiled without optimisations (~36K maximum residency). When compiled with optimisations (-O1 or -O2, no discernible difference), it gets stuck in the infinite loop "always (return ())" [no surprise], but it runs in small space (+RTS -M58K for me). With -O2 -fno-state-hack (or -O1 -fno-state-hack), it leaks memory: 469,292,260 bytes allocated in the heap 837,326,332 bytes copied during GC 233,727,956 bytes maximum residency (9 sample(s)) 3,740,036 bytes maximum slop 456 MB total memory in use (4 MB lost due to fragmentation) --------------------------------------------- module Main (main) where import Control.Concurrent {- always :: Monad m => m a -> m b always a = a >> always a -} always :: Monad m => m a -> m b always a = do _ <- a always a spawner :: IO () spawner = always $ do forkIO $ always (return ()) putStrLn "Delaying" threadDelay 1000000 main :: IO () main = do putStrLn "Spawning" forkIO spawner putStrLn "Delaying main" threadDelay 4000000 -------------------------------------------

Daniel Fischer wrote:
I have not been following the details of this, I'm afraid, but I notice
Am Samstag 17 April 2010 14:41:28 schrieb Simon Peyton-Jones: this:
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.
What was the old version of forever that led to the leak?
Control.Monad.forever
forever :: Monad m => m a -> m b forever m = m >> forever m
However, that isn't the problem. In my tests, both variants of forever exhibit the same behaviour, what makes it leak or not is the optimisation level.
This definition, plus sharing, is the source of the space leak. Consider this modification of your code: import Control.Concurrent always :: Monad m => m a -> m b always a = -- let act = a >> act in act do _ <- a always a noop :: IO () noop = return () body :: IO () body = always noop spawner :: IO () spawner = do forkIO $ body putStrLn "Delaying" threadDelay 1000000 body `seq` return () main :: IO () main = do putStrLn "Spawning" forkIO spawner putStrLn "Delaying main" threadDelay 4000000 Note that the 'always' in 'spawner' is gone, but it still exhibits the space leak. The leak goes away if the final line of 'spawner' is removed, hinting at the real problem: 'always' actually creates a long chain of actions instead of tying the knot. Indeed the following definition of 'always' (or 'forever') fares better in that regard, but is more susceptible to producing unproductive loops: always a = let act = a >> act in act (I used noop = yield for avoiding that problem in my tests) regards, Bertram

Hello Bertram, Sunday, April 18, 2010, 12:11:05 AM, you wrote:
always a = -- let act = a >> act in act do _ <- a always a
hinting at the real problem: 'always' actually creates a long chain of actions instead of tying the knot.
can you explain it deeper? it's what i see: always definition is equivalent to always a = do a always a what's the same as always a = a >> always a that looks exactly like your commented out definition, except that it doesn't create value act. but i don't see list of actions here -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Bertram,
Sunday, April 18, 2010, 12:11:05 AM, you wrote:
always a = -- let act = a >> act in act do _ <- a always a
hinting at the real problem: 'always' actually creates a long chain of actions instead of tying the knot.
can you explain it deeper? it's what i see: always definition is equivalent to
always a = do a always a
what's the same as
always a = a >> always a
This expands as always a = a >> always a = a >> a >> always a = a >> a >> a >> always a ... where each >> application is represented by a newly allocated object (or several, I have not looked at it in detail) on the heap. With always a = let act = a >> act in act there's only one >> application being allocated. The principle is the same as with repeat x = x : repeat x versus repeat x = let xs = x : xs in xs HTH, Bertram

Hello Bertram, Sunday, April 18, 2010, 3:36:31 AM, you wrote:
This expands as
always a = a >> always a = a >> a >> always a = a >> a >> a >> always a ... where each >> application is represented by a newly allocated object (or several, I have not looked at it in detail) on the heap.
why you think so? i always thought that >> in ghc just sequentially executes statements, the RealWorld magic exists only at compile-time -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
This expands as
always a = a >> always a = a >> a >> always a = a >> a >> a >> always a ... where each >> application is represented by a newly allocated object (or several, I have not looked at it in detail) on the heap.
why you think so?
At the time I wrote this, because it explains the space leak and because the space leak disappears if I address this precise issue. But I've since verified the theory by inspecting Core and Cmm code.
i always thought that >> in ghc just sequentially executes statements, the RealWorld magic exists only at compile-time
Yes, that's what happens once (>>) gets actually executed in IO. But this fact and the RealWorld token have nothing to do with the whole issue, which is about accumulating a chain of IO actions that have not yet been executed. I'll continue to write a >> b, which in IO, modulo newtypes, stands for \(s :: RealWorld#) -> case a s of (s', _) -> b s' The fact that the state token disappears at runtime does not change that this is a closure, represented by a (function) heap node. So we have some IO action let x = always a Now we run x, but also hold onto the corresponding thunk to reuse it later, say let x = always a in x >> x In order to execute that, x is forced, and evaluated to let x = let x' = always a in a >> x' in x >> x or, equivalently, let x' = always a x = a >> x' in x >> x Then the first step of the IO action is performed, resulting in let x' = always a x = a >> x' in x' >> x And now the same reduction happens again for x', let x2 = always a x' = a >> x2 x = a >> x' in x2 >> x and then again for x2, let x3 = always a x2 = a >> x3 x' = a >> x2 x = a >> x' in x2 >> x and so on, ad infinitum. This leaks memory because x, x', x2 etc. can't be garbage collected - there's still a reference to x. Note that this also explains why the space leak disappears if we remove the 'forever' in the spawner thread in the original example. This would not happen if the 'always a' was reused, i.e. if the code tied a knot as let act = a >> act in act does, but as you can see in the Core (and even Cmm if you look closely enough) that does not happen in those cases where the code leaks memory. HTH, Bertram

Am Samstag 17 April 2010 22:11:05 schrieb Bertram Felgenhauer:
Daniel Fischer wrote:
Am Samstag 17 April 2010 14:41:28 schrieb Simon Peyton-Jones:
I have not been following the details of this, I'm afraid, but I notice
this:
forever' m = do _ <- m forever' m
When I define that version of forever, the space leak goes away.
What was the old version of forever that led to the leak?
Control.Monad.forever
forever :: Monad m => m a -> m b forever m = m >> forever m
However, that isn't the problem. In my tests, both variants of forever exhibit the same behaviour, what makes it leak or not is the optimisation level.
This definition, plus sharing, is the source of the space leak. Consider this modification of your code:
import Control.Concurrent
always :: Monad m => m a -> m b always a = -- let act = a >> act in act do _ <- a always a
noop :: IO () noop = return ()
body :: IO () body = always noop
spawner :: IO () spawner = do forkIO $ body putStrLn "Delaying" threadDelay 1000000 body `seq` return ()
main :: IO () main = do putStrLn "Spawning" forkIO spawner putStrLn "Delaying main" threadDelay 4000000
Note that the 'always' in 'spawner' is gone, but it still exhibits the space leak. The leak goes away if the final line of 'spawner' is removed, hinting at the real problem: 'always' actually creates a long chain of actions instead of tying the knot.
Except that with optimisations turned on, GHC ties the knot for you (at least if always isn't exported). Without -fno-state-hack, the knot is tied so tightly that always (return ()) is never descheduled (and there's no leak). With -fno-state-hack, I get Rec { Main.main_always :: GHC.Types.IO () -> GHC.Types.IO () GblId [Arity 1 NoCafRefs Str: DmdType L] Main.main_always = \ (a_aeO :: GHC.Types.IO ()) -> let { k_sYz :: GHC.Types.IO () LclId [Str: DmdType] k_sYz = Main.main_always a_aeO } in (\ (eta_ann :: GHC.Prim.State# GHC.Prim.RealWorld) -> case (a_aeO `cast` (GHC.Types.NTCo:IO () :: GHC.Types.IO () ~ (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)))) eta_ann of _ { (# new_s_anz, _ #) -> (k_sYz `cast` (GHC.Types.NTCo:IO () :: GHC.Types.IO () ~ (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)))) new_s_anz }) `cast` (sym (GHC.Types.NTCo:IO ()) :: (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)) ~ GHC.Types.IO ()) end Rec } which, despite tying the knot, leaks (so the program at least terminates).
Indeed the following definition of 'always' (or 'forever') fares better in that regard, but is more susceptible to producing unproductive loops:
Indeed, that doesn't terminate with -O2 -fno-state-hack
always a = let act = a >> act in act
(I used noop = yield for avoiding that problem in my tests)
regards,
Bertram

Daniel Fischer wrote:
Except that with optimisations turned on, GHC ties the knot for you (at least if always isn't exported). Without -fno-state-hack, the knot is tied so tightly that always (return ()) is never descheduled (and there's no leak).
Yes, I was concentrating on -O2, without -fno-state-hack.
With -fno-state-hack, I get
Rec { Main.main_always :: GHC.Types.IO () -> GHC.Types.IO () GblId [Arity 1 NoCafRefs Str: DmdType L] Main.main_always = \ (a_aeO :: GHC.Types.IO ()) -> let { k_sYz :: GHC.Types.IO () LclId [Str: DmdType] k_sYz = Main.main_always a_aeO } in (\ (eta_ann :: GHC.Prim.State# GHC.Prim.RealWorld) -> case (a_aeO `cast` (GHC.Types.NTCo:IO () :: GHC.Types.IO () ~ (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)))) eta_ann of _ { (# new_s_anz, _ #) -> (k_sYz `cast` (GHC.Types.NTCo:IO () :: GHC.Types.IO () ~ (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)))) new_s_anz }) `cast` (sym (GHC.Types.NTCo:IO ()) :: (GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)) ~ GHC.Types.IO ()) end Rec }
Which is always = \a_aeO -> let k_sYz = always a_aeO in a_aeO >> k_sYz specialised to IO, and with (>>) inlined. Where is the knot? regards, Bertram

Am Mittwoch 14 April 2010 23:13:13 schrieb Gregory Collins:
Jesper Louis Andersen
writes: This post describes some odd behaviour I have seen in GHC 6.12.1 when writing Combinatorrent. The post is literate Haskell so you can run it. The executive summary: A space leak occurs when a new process is spawned from inside another process - and I can't figure out why. I am asking for help on haskell-cafe.
...[snip]...
import Control.Monad.State
Does the problem go away if you use "Control.Monad.State.Strict"?
No. The problem goes away, however, if I replace p1 with p1 = forever $ liftIO (return () >> threadDelay 0) It is reduced, but still present, for p1 = forever $ liftIO (return () >> yield) It is also reduced by decreasing the delay in p2. I don't know what's going on, though.

* On Wednesday, April 14 2010, Jesper Louis Andersen wrote:
newtype Process a b c = Process (ReaderT a (StateT b IO) c) deriving (Functor, Monad, MonadIO, MonadState b, MonadReader a)
Note that the automatic derivations of *MonadState b* and *MonadReader a* makes GHC spit our some mkUsageInfo warnings in its generation of the .hi-files. They don't seem to be dangerous. Glueing instructions for our model kit is given by
The relevant bug for that is: http://hackage.haskell.org/trac/ghc/ticket/3955 -- Adam
participants (9)
-
Adam Vogt
-
Bertram Felgenhauer
-
Bulat Ziganshin
-
Daniel Fischer
-
Gregory Collins
-
Jason Dagit
-
Jesper Louis Andersen
-
Neil Brown
-
Simon Peyton-Jones