Why doesn't this consume all the computer's memory?

I would expect the following to consume all the computer's memory and die due to a buildup of lazy pattern matches for the `y` value. ``` import Data.Either main = print x >> print y where (length -> x, length -> y) = paritionEithers $ repeat (Left ()) ``` That is, `partitionEithers` is ``` partitionEithers :: [Either a b] -> ([a],[b]) partitionEithers = foldr go ([],[]) where go (Left x) ~(xs,ys) = (x:xs,ys) go (Right y) ~(xs,ys) = (xs,y:ys) ``` and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect ``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ``` Our code keeps generating more and more of these thunks as the left-hand side chases down the infinite list of `Left ()` values, and the machine cannot let go of them because, as far as it knows, we are going to reach the end sometime and then need the right-hand side. Thus I expect it would consume all the memory and crash. But it doesn't. It just sits there forever consuming 100% CPU at a constant memory limit. This means my mental model is defective and I'm unable to properly reason about the space usage of my programs. Could someone please enlighten me as to were I'm missing? Is there some sort of optimization going on here? When can it be depend on? Thanks very much! -Tyson I would expect

I believe I actually figured it out. There is not buildup because y
is just forever bound to
y = length . snd $ paritionEithers $ repeat (Left ())
I guess the thing to realize is that this function will traverse the
list twice. That is, what I wrote is essentially
x = length . fst $ paritionEithers $ repeat (Left ())
y = length . snd $ paritionEithers $ repeat (Left ())
where both x and y independently traverse the entire list repeating
any work that needs to be done to generate the elements.
Thanks! -Tyson
On Mon, 5 Nov 2018 at 14:00, Tyson Whitehead
I would expect the following to consume all the computer's memory and die due to a buildup of lazy pattern matches for the `y` value.
``` import Data.Either
main = print x >> print y where (length -> x, length -> y) = paritionEithers $ repeat (Left ()) ```
That is, `partitionEithers` is
``` partitionEithers :: [Either a b] -> ([a],[b]) partitionEithers = foldr go ([],[]) where go (Left x) ~(xs,ys) = (x:xs,ys) go (Right y) ~(xs,ys) = (xs,y:ys) ```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect
``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ```
Our code keeps generating more and more of these thunks as the left-hand side chases down the infinite list of `Left ()` values, and the machine cannot let go of them because, as far as it knows, we are going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it doesn't. It just sits there forever consuming 100% CPU at a constant memory limit. This means my mental model is defective and I'm unable to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there some sort of optimization going on here? When can it be depend on?
Thanks very much! -Tyson
I would expect

It can't be as simple as you make out. The semantics of ViewPatterns cannot be such that (length -> x, length -> y) = partitionEithers $ repeat (Left ()) means x = length . fst $ partitionEithers $ repeat (Left ()) y = length . snd $ partitionEithers $ repeat (Left ()) It must surely mean (px, py) = partitionEithers $ repeat (Left ()) x = length px y = length py With that interpretation my by-hand evaluations show a space leak, see below. The only way I can reconcile this with the observed behaviour is that GHC's garbage collector can "see through" simple case statements. I vaguely remember reading something like this. Can any GHC developer confirm? Tom main = print x >> print y where (length -> x, length -> y) = partitionEithers $ repeat (Left ()) main = case (partitionEithers (repeat (Left ()))) of (px, py) -> let x = length px y = length py in print x >> print y (let z = ([], [])) main = case (foldr go z (repeat (Left ()))) of (px, py) -> ... main = case (foldr go z (Left () : repeat (Left ()))) of (px, py) -> ... main = case (go (Left ()) (foldr go z (repeat (Left ()))) of (px, py) -> ... main = case (let t = foldr go z (repeat (Left ())) in (fst t, snd t)) of (px, py) -> ... t = foldr go z (repeat (Left ())) main = case (fst t, snd t) of (px, py) -> ... t = foldr go z (repeat (Left ())) main = case (fst t, snd t) of (px, py) -> ... t = foldr go z (repeat (Left ())) main = let x = length (fst t) y = length (snd t) in print x >> print y t = foldr go z (repeat (Left ())) main = let x = length (fst t) y = length (snd t) in print x >> print y (omitting a few steps ...) t = go (Left ()) z (foldr go z (repeat (Left ()))) main = let x = length (fst t) y = length (snd t) in print x >> print y t = let t2 = foldr go z (repeat (Left ())) in (fst t2, snd t2) ... t = (fst t2, snd t2) t2 = foldr go z (repeat (Left ())) ... t = (fst t2, snd t2) t2 = (fst t3, snd t3) t3 = foldr go z (repeat (Left ())) ... t = (fst t3, snd t2) t2 = (fst t3, snd t3) t3 = foldr go z (repeat (Left ())) ... On Mon, Nov 05, 2018 at 02:25:50PM -0500, Tyson Whitehead wrote:
I believe I actually figured it out. There is not buildup because y is just forever bound to
y = length . snd $ paritionEithers $ repeat (Left ())
I guess the thing to realize is that this function will traverse the list twice. That is, what I wrote is essentially
x = length . fst $ paritionEithers $ repeat (Left ()) y = length . snd $ paritionEithers $ repeat (Left ())
where both x and y independently traverse the entire list repeating any work that needs to be done to generate the elements.
Thanks! -Tyson On Mon, 5 Nov 2018 at 14:00, Tyson Whitehead
wrote: I would expect the following to consume all the computer's memory and die due to a buildup of lazy pattern matches for the `y` value.
``` import Data.Either
main = print x >> print y where (length -> x, length -> y) = paritionEithers $ repeat (Left ()) ```
That is, `partitionEithers` is
``` partitionEithers :: [Either a b] -> ([a],[b]) partitionEithers = foldr go ([],[]) where go (Left x) ~(xs,ys) = (x:xs,ys) go (Right y) ~(xs,ys) = (xs,y:ys) ```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect
``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ```
Our code keeps generating more and more of these thunks as the left-hand side chases down the infinite list of `Left ()` values, and the machine cannot let go of them because, as far as it knows, we are going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it doesn't. It just sits there forever consuming 100% CPU at a constant memory limit. This means my mental model is defective and I'm unable to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there some sort of optimization going on here? When can it be depend on?

I take back my follow up comment. I still don't understand why there
isn't a buildup of thunks.
I've written an updated/simplified variant and posted it on
r/haskellquestions. Hopefully someone will enlighten me.
https://www.reddit.com/r/haskellquestions/comments/9v6z49/why_does_this_code...
Thanks! -Tyson
On Mon, 5 Nov 2018 at 14:25, Tyson Whitehead
I believe I actually figured it out. There is not buildup because y is just forever bound to
y = length . snd $ paritionEithers $ repeat (Left ())
I guess the thing to realize is that this function will traverse the list twice. That is, what I wrote is essentially
x = length . fst $ paritionEithers $ repeat (Left ()) y = length . snd $ paritionEithers $ repeat (Left ())
where both x and y independently traverse the entire list repeating any work that needs to be done to generate the elements.
Thanks! -Tyson On Mon, 5 Nov 2018 at 14:00, Tyson Whitehead
wrote: I would expect the following to consume all the computer's memory and die due to a buildup of lazy pattern matches for the `y` value.
``` import Data.Either
main = print x >> print y where (length -> x, length -> y) = paritionEithers $ repeat (Left ()) ```
That is, `partitionEithers` is
``` partitionEithers :: [Either a b] -> ([a],[b]) partitionEithers = foldr go ([],[]) where go (Left x) ~(xs,ys) = (x:xs,ys) go (Right y) ~(xs,ys) = (xs,y:ys) ```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect
``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ```
Our code keeps generating more and more of these thunks as the left-hand side chases down the infinite list of `Left ()` values, and the machine cannot let go of them because, as far as it knows, we are going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it doesn't. It just sits there forever consuming 100% CPU at a constant memory limit. This means my mental model is defective and I'm unable to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there some sort of optimization going on here? When can it be depend on?
Thanks very much! -Tyson
I would expect

Sorry for all the noise. I believe I finally tracked down the eat all
the memory/don't eat all the memory trigger. It is the view pattern.
When I calculate the length in a view pattern, memory stays constant,
when I calculate it outside, it explodes.
I'll have to examine the simpl dump some more to see if I can figure
out the difference between these two. Sorry for the noise on the
list.
Cheers! -Tyson
On Wed, 7 Nov 2018 at 23:42, Tyson Whitehead
I take back my follow up comment. I still don't understand why there isn't a buildup of thunks.
I've written an updated/simplified variant and posted it on r/haskellquestions. Hopefully someone will enlighten me.
https://www.reddit.com/r/haskellquestions/comments/9v6z49/why_does_this_code...
Thanks! -Tyson On Mon, 5 Nov 2018 at 14:25, Tyson Whitehead
wrote: I believe I actually figured it out. There is not buildup because y is just forever bound to
y = length . snd $ paritionEithers $ repeat (Left ())
I guess the thing to realize is that this function will traverse the list twice. That is, what I wrote is essentially
x = length . fst $ paritionEithers $ repeat (Left ()) y = length . snd $ paritionEithers $ repeat (Left ())
where both x and y independently traverse the entire list repeating any work that needs to be done to generate the elements.
Thanks! -Tyson On Mon, 5 Nov 2018 at 14:00, Tyson Whitehead
wrote: I would expect the following to consume all the computer's memory and die due to a buildup of lazy pattern matches for the `y` value.
``` import Data.Either
main = print x >> print y where (length -> x, length -> y) = paritionEithers $ repeat (Left ()) ```
That is, `partitionEithers` is
``` partitionEithers :: [Either a b] -> ([a],[b]) partitionEithers = foldr go ([],[]) where go (Left x) ~(xs,ys) = (x:xs,ys) go (Right y) ~(xs,ys) = (xs,y:ys) ```
and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect
``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ```
Our code keeps generating more and more of these thunks as the left-hand side chases down the infinite list of `Left ()` values, and the machine cannot let go of them because, as far as it knows, we are going to reach the end sometime and then need the right-hand side.
Thus I expect it would consume all the memory and crash. But it doesn't. It just sits there forever consuming 100% CPU at a constant memory limit. This means my mental model is defective and I'm unable to properly reason about the space usage of my programs.
Could someone please enlighten me as to were I'm missing? Is there some sort of optimization going on here? When can it be depend on?
Thanks very much! -Tyson
I would expect

On Thu, 8 Nov 2018 at 01:09, Tyson Whitehead
Sorry for all the noise. I believe I finally tracked down the eat all the memory/don't eat all the memory trigger. It is the view pattern.
There was a request to post both codes as it seems a bit unexpected that a view pattern would make that difference. Here they are. I compiled both with `ghc file.hs` using the standard GHC 8.4.3 from NixOS 18.09. Constant memory code (RES 6MB): {-# LANGUAGE ViewPatterns #-} module Main (main) where import Data.Either (length -> lx,length -> ly) = partitionEithers (repeat $ Left ()) main = do print lx print ly Unbounded memory: module Main (main) where import Data.Either (xs, ys) = partitionEithers (repeat $ Left ()) main = do print $ length xs print $ length ys Cheers! -Tyson PS: The constant-memory view-pattern version seems to compile down to lxly = case partitionEithers (repeat $ Left ()) of (xs,ys) -> (length xs,length ys) main = do print (case lxly of (lx,_) -> lx) print (case lxly of (_,ly) -> ly) while the unbounded-memory non-view-pattern one compiles down to xsys = partitionEithers (repeat $ Left ()) xs = case xsys of (xs,_) -> xs ys = case xsys of (_,ys) -> ys main = do print (length xs) print (length ys)

I must admit I'm stumped! I don't see any significant difference between those two programs. On Thu, Nov 08, 2018 at 11:04:34AM -0500, Tyson Whitehead wrote:
Constant memory code (RES 6MB):
{-# LANGUAGE ViewPatterns #-}
module Main (main) where
import Data.Either
(length -> lx,length -> ly) = partitionEithers (repeat $ Left ())
main = do print lx print ly
Unbounded memory:
module Main (main) where
import Data.Either
(xs, ys) = partitionEithers (repeat $ Left ())
main = do print $ length xs print $ length ys
Cheers! -Tyson
PS: The constant-memory view-pattern version seems to compile down to
lxly = case partitionEithers (repeat $ Left ()) of (xs,ys) -> (length xs,length ys)
main = do print (case lxly of (lx,_) -> lx) print (case lxly of (_,ly) -> ly)
while the unbounded-memory non-view-pattern one compiles down to
xsys = partitionEithers (repeat $ Left ()) xs = case xsys of (xs,_) -> xs ys = case xsys of (_,ys) -> ys
main = do print (length xs) print (length ys)

It seems that it only runs in constant space when the two lengths compile to a pre-evaluated CAF. In the below version, at low optimization levels the evaluation of lx/ly is deferred to the "forkIO" thread, and memory use grows linearly with the timeout. At high optimization levels, memory use is constant, but the timeout never happens, and it seems plausible that the CAF is lifted out to the top level, and is evaluated in constant space (but infinite time). So it seems, that as a CAF, the generated code does not attempt to memoize the input infinite list. It may be worth noting that if "repeat" is replaced with "replicate 10000", "replicate 1000000", ... memory usage grows with the size of the generated list. Only the infinite list when pre-computed as a CAF seems to "run" in constant space. (Scare quotes around "run" since in this it never completes the computation. You either never finish, or use unbounded space, pick your poison). ------ snip ------ {-# LANGUAGE LambdaCase #-} {-# LANGUAGE ViewPatterns #-} module Main (main) where import System.Environment import System.Timeout import Control.Concurrent import Control.Concurrent.MVar import Data.List partitionEithers :: [Either a b] -> ([a],[b]) partitionEithers = foldr go ([],[]) where go (Left x) ~(xs,ys) = (x:xs,ys) go (Right y) ~(xs,ys) = (xs,y:ys) main = do n <- getArgs >>= \case [] -> return 100 a:_ -> return $ read a m <- newEmptyMVar forkIO $ do let (length -> lx, length -> ly) = partitionEithers $ repeat $ Left () print lx print ly putMVar m () timeout n $ takeMVar m ------ snip ------
On Nov 8, 2018, at 1:13 PM, Tom Ellis
wrote: I must admit I'm stumped! I don't see any significant difference between those two programs.
On Thu, Nov 08, 2018 at 11:04:34AM -0500, Tyson Whitehead wrote:
Constant memory code (RES 6MB):
{-# LANGUAGE ViewPatterns #-}
module Main (main) where
import Data.Either
(length -> lx,length -> ly) = partitionEithers (repeat $ Left ())
main = do print lx print ly
Unbounded memory:
module Main (main) where
import Data.Either
(xs, ys) = partitionEithers (repeat $ Left ())
main = do print $ length xs print $ length ys
-- Viktor.

Tyson Whitehead wrote:
and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect
``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ```
These two expressions are selectors (accessing a field from a single constructor data type), which the compiler recoginizes (not sure where and how exactly) and implemented in terms of special stg_sel_<nnn> closures. At runtime, applying such a closure results in special THUNK_SELECTOR thunks. These thunks are evaluated by the garbage collector (see rts/sm/Scav.c). So this is a very special optimization in the garbage collector, not a flaw in your general mental model. Cheers, Bertram

Bertram Felgenhauer via Haskell-Cafe wrote:
Tyson Whitehead wrote:
and, in the -ddump-simpl we see the `go Left` branch returns a thunk on both the right and left sides that hold onto the evaluation of (x:xs,ys) as we would expect
``` Left x_aqy -> (GHC.Types.: @ a_a1q8 x_aqy (case ds1_d1rO of { (xs_aqz, ys_aqA) -> xs_aqz }), case ds1_d1rO of { (xs_aqz, ys_aqA) -> ys_aqA }); ```
These two expressions are selectors (accessing a field from a single constructor data type), which the compiler recoginizes (not sure where and how exactly) and implemented in terms of special stg_sel_<nnn> closures. At runtime, applying such a closure results in special THUNK_SELECTOR thunks. These thunks are evaluated by the garbage collector (see rts/sm/Scav.c).
Or rather rts/sm/Evac.c, which has functions unchain_thunk_selectors() and eval_thunk_selector() that do the actual work. Cheers, Bertram
participants (4)
-
Bertram Felgenhauer
-
Tom Ellis
-
Tyson Whitehead
-
Viktor Dukhovni