
[note, the thread is almost a month old] Bernie Pope wrote:
On 23/10/2007, at 8:09 AM, Thomas Hartman wrote:
(Prelude sort, which I think is mergesort, just blew the stack.)
GHC uses a "bottom up" merge sort these days.
It starts off by creating a list of singletons, then it repeatedly merges adjacent pairs of lists until there is only one list left.
I was teaching sorting to my first year students, and I also bumped into the stack overflow at one million elements, using GHC's merge sort.
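For illustration, a bottom-up merge sort of the shape described above might look roughly like the sketch below; this is not the actual Data.List code, just the idea of starting from singleton runs and repeatedly merging adjacent pairs (the name bottomUpSort is made up):

    bottomUpSort :: Ord a => [a] -> [a]
    bottomUpSort [] = []
    bottomUpSort xs = mergeAll (map (:[]) xs)   -- start from singleton runs
      where
        mergeAll [run] = run
        mergeAll runs  = mergeAll (mergePairs runs)

        -- merge adjacent pairs of sorted runs, halving their number each pass
        mergePairs (a:b:rest) = merge a b : mergePairs rest
        mergePairs runs       = runs

        -- standard merge of two sorted lists
        merge as [] = as
        merge [] bs = bs
        merge as@(a:as') bs@(b:bs')
          | a <= b    = a : merge as' bs
          | otherwise = b : merge as bs'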
I think I got to the bottom of this stack overflow. Consider the following snippet:

    sort $ (take (10^6) [1..])

The argument of the sort is a list with 10^6 unevaluated elements, [a=1, b=1+a, c=1+b, d=1+c, ...]. Now it turns out that merge sort as implemented in the base library compares the last two elements of the list first. This forces the evaluation of an expression that is approximately 10^6 applications of (+) deep, and that's where you get the stack overflow. [1]

Thomas Hartman's example is of a similar nature: it also produces a list of unevaluated terms where each term depends on the value of the previous one.

The modification that I proposed in http://www.haskell.org/pipermail/haskell-cafe/2007-October/033617.html has the effect of comparing the first two elements first. I actually believe that this is a reasonable change, because it's more likely to work out fine. But it'll produce a stack overflow on

    sort $ (reverse (take (10^6) [1..]))

instead, which doesn't cause problems currently. The root problem is the creation of deep unevaluated expressions.

Bertram

[1] Note that sort [1..10^6] works just fine: the enumeration [1..10^6] compares each list element to the upper bound as it is generated, so it produces a list of fully evaluated values.
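To make that difference concrete, here is a small, self-contained sketch; the helper thunkChain is made up for illustration, and whether the first print actually overflows depends on the GHC version and stack limit in use:

    -- thunkChain n builds a list whose k-th element is an unevaluated
    -- expression ((..(1+1)..)+1) of depth k, much like take n [1..]
    -- did with the lazy enumeration code of the time.
    thunkChain :: Int -> [Integer]
    thunkChain n = take n (iterate (+1) 1)

    main :: IO ()
    main = do
      -- Forcing the last element has to unwind roughly 10^6 nested (+)
      -- thunks, which needs a correspondingly deep stack; with a bounded
      -- stack this is exactly where the overflow shows up.
      print (last (thunkChain (10^6)))
      -- By contrast, [1 .. 10^6] compares each element to the upper bound
      -- as it is generated, so the elements are already evaluated and no
      -- deep chain is ever built.
      print (last [1 .. 10^6 :: Integer])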