
The following is in ghci 6.8.2 with default options (e.g., default heap and stack). "G>" denotes the ghci prompt. At some points ghci will use 500MB of memory. Be sure you have enough physical memory. G> :m + Data.List System.Random G> let f n = take n randoms (mkStdGen 0)) :: [Float] I define f to take a parameter so that thunks are created fresh every time I use it. G> sum (f 1000000) *** Exception: stack overflow This overflow is known, and its solution too: G> foldl' (+) 0 (f 1000000) 499985.38 The more interesting part is when we also sort: G> foldl' (+) 0 (sort (f 1000000)) *** Exception: stack overflow At this point everyone hastens to assign blames according to his/her mental model. But some mental models are more mistaken than others. When a mental model is shown to be mistaken by another example, we have a name for the feeling that ensues: "my brain has exploded". And here is an example that likely causes your brain to explode (it also causes ghci to use 500MB of memory): G> foldl' (+) 0 (sort (reverse (f 1000000))) 499992.0 (Don't worry about the different sum. We all know that re-ordering floating point numbers changes the sum.) Some brains haven't exploded yet. They believe that reverse helps by forcing all 1000000 cons cells before sorting. To burst them, let's reverse twice: G> foldl' (+) 0 (sort (reverse (reverse (f 1000000)))) *** Exception: stack overflow You can also reverse 3 times, 4 times... In general, even number of times overflows, odd number of times works. It is now clear that order before sorting matters. But why should it? I now give some hints and step down. I invite you to contemplate and discuss further. * Suppose f 6 is [a,b,c,d,e,f]. What is the size of thunk f? Compare to the size of thunk a, b, etc. Also note how much is shared among them. If you want, the source code of System.Random is at http://www.haskell.org/ghc/docs/latest/html/libraries/random/src/System-Rand... If that is too complicated, a simpler model is available at http://haskell.org/onlinereport/random.html For our present purpose, the simpler model is accurate enough. * Based on that knowledge, you're in a similar situation as http://www.haskell.org/haskellwiki/Stack_overflow#Scans Thus for example "last (f 1000000)" overflows, while "print (f 1000000)" works just fine (if you have the patience). * Debug.Trace.trace is great for verifying relative order of evaluation. If I construct this list [trace "0" a, trace "1" b, trace "2" c, ...] and pass it to sort, I can see which item is forced in what order when sort compares items. This command does: G> :m + Debug.Trace G> sort (zipWith (trace . show) [0..] (f 6)) What do you see? What do you think? Bump up the number 6 to 10, 20... to verify what you guess. * If you are interested in finding out in detail why sort does that, the source code is at http://www.haskell.org/ghc/docs/latest/html/libraries/base/src/Data-List.htm... Note the code of "merge": merge cmp xs [] = xs merge cmp [] ys = ys merge cmp (x:xs) (y:ys) = ... Suppose you re-order the first two lines, i.e., make it merge cmp [] ys = ys merge cmp xs [] = xs merge cmp (x:xs) (y:ys) = ... what will happen? * Prove that the sorting algorithm is only responsible for O(n log n) time and O(log n) stack. Thus any extra stack pressure must come from thunks in the list items.