
Attached is a program with a space leak that I do not understand. I have coded a simple 'map' function, once using unsafePerformIO and once without. UnsafePerformIO has a space leak in some circumstances. In the main program I demonstrate cases with and without space leak. Without space leak the program writes a file to the disk until it is full. Any idea? The original problem is a function that is compiled by LLVM and shall be applied to a list in a mapAccumL manner.

Henning Thielemann wrote:
Attached is a program with a space leak that I do not understand. I have coded a simple 'map' function, once using unsafePerformIO and once without. UnsafePerformIO has a space leak in some circumstances. In the main program I demonstrate cases with and without space leak. Without space leak the program writes a file to the disk until it is full. Any idea?
The program relies on the GC doing short-cut evaluation of record selectors to avoid a space leak. If the user of the function splitAtLazy | splitAtLazy :: [b] -> [a] -> ([a],[a]) | splitAtLazy nt xt = | (\ ~(ys,zs) -> (ys,zs)) $ | case (nt,xt) of | (_:ns, x:xs) -> | let (ys,zs) = splitAtLazy ns xs | in (x:ys,zs) | (_, xs) -> ([],xs) somehow holds on to a reference of the returned pair while processing the first part of the list, there will be a space leak, because that means that the whole prefix remains reachable. splitAtLazy itself is not leaky, because the value returned by the recursive call is scrutinized as follows, Main.$wsplitAtLazy = ... (# case ds_sLb of wild_B1 { (ys_agC, zs_agE) -> ys_agC }, case ds_sLb of wild_B1 { (ys_agC, zs_agE) -> zs_agE } #) and ghc turns that into record selector thunks in the code generator. The precise rule can be found in compiler/codeGen/StgCmmBind.hs: | Note [Selectors] | ~~~~~~~~~~~~~~~ | We look at the body of the closure to see if it's a selector---turgid, | but nothing deep. We are looking for a closure of {\em exactly} the | form: | | ... = [the_fv] \ u [] -> | case the_fv of | con a_1 ... a_n -> a_i Now let's look at how the result of splitAtLazy is used. non-leaky version (case 0): Main.lvl1 = case Main.$wsplitAtLazy @ () @ GHC.Types.Char Main.xs Main.xs1 of ww_sLv { (# ww1_sLx, ww2_sLy #) -> Main.go (GHC.Base.++ @ GHC.Types.Char ww1_sLx ww2_sLy) } The return values are passed on to (++) directly. The result pair is actually never built at all, so no reference to it can be kept. leaky version (case 3): Main.ds = case Main.$wsplitAtLazy @ () @ GHC.Types.Char Main.xs Main.xs1 of ww_sKy { (# ww1_sKA, ww2_sKB #) -> (ww1_sKA, ww2_sKB) } This builds the pair returned by splitAtLazy. Main.lvl1 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> prefix_aCf } Use of prefix: it's a record selector. This is fine. Main.lvl2 = Main.go Main.lvl1 The prefix is then passed to some worker function. Main.lvl3 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> Main.go1 suffix_aCh } Use of suffix: Due to the call of Main.go1 this is *not* a record selector. It is compiled to an actual case expression, which to the garbage collector looks just like an ordinary thunk. A reference to Main.ds is kept around until the suffix is about to be processed and a memory leak ensues. If the compiler had produced Main.lvl3 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> suffix_aCh } Main.lvl4 = Main.go1 Main.lvl3 instead, then there would not be a leak. This whole record selector thunk business is very fragile. The good news is that the problem is completely unrelated to unsafePerformIO (the presence of unsafePerformIO makes optimisations more difficult, but any pure function of sufficient complexity would have the same effect). There's a simple fix for the problem, too: Change
let (prefix, suffix) = makeTwoLists 'a'
to let !(prefix, suffix) = makeTwoLists 'a' in which case the compiler produces code similar to the non-leaky case for all alternatives. HTH, Bertram

On Sun, 27 Jun 2010, Bertram Felgenhauer wrote:
If the compiler had produced
Main.lvl3 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> suffix_aCh }
Main.lvl4 = Main.go1 Main.lvl3
instead, then there would not be a leak. This whole record selector thunk business is very fragile.
Bertram, thank you a lot for the detailed analysis! It seems that I have stepped into a nasty implementation detail of GHC.
The good news is that the problem is completely unrelated to unsafePerformIO (the presence of unsafePerformIO makes optimisations more difficult, but any pure function of sufficient complexity would have the same effect).
There's a simple fix for the problem, too: Change
let (prefix, suffix) = makeTwoLists 'a'
to let !(prefix, suffix) = makeTwoLists 'a'
in which case the compiler produces code similar to the non-leaky case for all alternatives.
Actually this fix works for my example program and another one that is based on chunky StorableVector. But when I switch off profiling the StorableVector example runs into a space leak, again, while the one with plain lists that I have sent here still works. I'll investigate into this, but it seems indeed very fragile and I wonder whether there is a more reliable way. Maybe I can combine splitAtLazy and (++) to a function like splitAtAndAppend :: [x] -> ([a] -> [b]) -> ([a] -> [b]) -> [a] -> [b] but I'm afraid I will need pairs temporarily and then I run into the same problems.

On Sun, 27 Jun 2010, Henning Thielemann wrote:
Maybe I can combine splitAtLazy and (++) to a function like splitAtAndAppend :: [x] -> ([a] -> [b]) -> ([a] -> [b]) -> [a] -> [b] but I'm afraid I will need pairs temporarily and then I run into the same problems.
I have now implemented a solution that is tailored to special function types in the first ([a] -> [b]) argument. It works, but it is sad that the general, more declarative solution is so fragile.
participants (2)
-
Bertram Felgenhauer
-
Henning Thielemann