
Henning Thielemann wrote:
Attached is a program with a space leak that I do not understand. I have coded a simple 'map' function, once using unsafePerformIO and once without. UnsafePerformIO has a space leak in some circumstances. In the main program I demonstrate cases with and without space leak. Without space leak the program writes a file to the disk until it is full. Any idea?
The program relies on the GC doing short-cut evaluation of record selectors to avoid a space leak. If the user of the function splitAtLazy | splitAtLazy :: [b] -> [a] -> ([a],[a]) | splitAtLazy nt xt = | (\ ~(ys,zs) -> (ys,zs)) $ | case (nt,xt) of | (_:ns, x:xs) -> | let (ys,zs) = splitAtLazy ns xs | in (x:ys,zs) | (_, xs) -> ([],xs) somehow holds on to a reference of the returned pair while processing the first part of the list, there will be a space leak, because that means that the whole prefix remains reachable. splitAtLazy itself is not leaky, because the value returned by the recursive call is scrutinized as follows, Main.$wsplitAtLazy = ... (# case ds_sLb of wild_B1 { (ys_agC, zs_agE) -> ys_agC }, case ds_sLb of wild_B1 { (ys_agC, zs_agE) -> zs_agE } #) and ghc turns that into record selector thunks in the code generator. The precise rule can be found in compiler/codeGen/StgCmmBind.hs: | Note [Selectors] | ~~~~~~~~~~~~~~~ | We look at the body of the closure to see if it's a selector---turgid, | but nothing deep. We are looking for a closure of {\em exactly} the | form: | | ... = [the_fv] \ u [] -> | case the_fv of | con a_1 ... a_n -> a_i Now let's look at how the result of splitAtLazy is used. non-leaky version (case 0): Main.lvl1 = case Main.$wsplitAtLazy @ () @ GHC.Types.Char Main.xs Main.xs1 of ww_sLv { (# ww1_sLx, ww2_sLy #) -> Main.go (GHC.Base.++ @ GHC.Types.Char ww1_sLx ww2_sLy) } The return values are passed on to (++) directly. The result pair is actually never built at all, so no reference to it can be kept. leaky version (case 3): Main.ds = case Main.$wsplitAtLazy @ () @ GHC.Types.Char Main.xs Main.xs1 of ww_sKy { (# ww1_sKA, ww2_sKB #) -> (ww1_sKA, ww2_sKB) } This builds the pair returned by splitAtLazy. Main.lvl1 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> prefix_aCf } Use of prefix: it's a record selector. This is fine. Main.lvl2 = Main.go Main.lvl1 The prefix is then passed to some worker function. Main.lvl3 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> Main.go1 suffix_aCh } Use of suffix: Due to the call of Main.go1 this is *not* a record selector. It is compiled to an actual case expression, which to the garbage collector looks just like an ordinary thunk. A reference to Main.ds is kept around until the suffix is about to be processed and a memory leak ensues. If the compiler had produced Main.lvl3 = case Main.ds of wild_Xw { (prefix_aCf, suffix_aCh) -> suffix_aCh } Main.lvl4 = Main.go1 Main.lvl3 instead, then there would not be a leak. This whole record selector thunk business is very fragile. The good news is that the problem is completely unrelated to unsafePerformIO (the presence of unsafePerformIO makes optimisations more difficult, but any pure function of sufficient complexity would have the same effect). There's a simple fix for the problem, too: Change
let (prefix, suffix) = makeTwoLists 'a'
to let !(prefix, suffix) = makeTwoLists 'a' in which case the compiler produces code similar to the non-leaky case for all alternatives. HTH, Bertram