Haskell's non-strict evaluation can often lead to unexpected results when doing tail recursion if you're used to strict functional programming languages. In order to get the desired behavior you will need to force the accumulator (with something like Data.List's foldl', $!, seq, BangPatterns, etc.).

One issue with the tail recursive version is that you're basically sequencing it twice, forcing the result is going to force the whole spine of the list, and then you're walking it again with deepseq. The overhead of the deepseq call with that size list on my machine is 2.2 seconds, so you're basically paying that cost twice with a tail recursive version. allPairs2 only forces what is needed, so deepseq isn't traversing any data structures that are already forced.

I think what's happening with allPairs2 is that GHC is throwing away the the list during the traversal since the result of the computation isn't used at all. I get very different timings (still fast, but no longer orders of magnitude difference) for allPairs2 if the result is still around. You could probably figure this out with the heap profiler.

h> deepseq (allPairs2 [1..10000]) True

True

(2.47 secs, 4403724176 bytes)

h> let x = allPairs2 [1..10000]

(0.00 secs, 510488 bytes)

h> deepseq x True

True

(10.47 secs, 4401625192 bytes)

h> deepseq x True

True

(2.21 secs, 1031864 bytes)

I'm no expert, but I think that allPairs2 is likely as good as you're going to get with straightforward Haskell using lists. It doesn't build up a lot of unevaluated computation, and if you only need to see the first n elements it's not going to have to process the entire input. allPairs1 has most of the good properties of allPairs2 but is a bit worse because of the concatenation, though concatenating in this way isn't a disaster like it would be in a strict functional programming language.

-bob

On Tue, Sep 3, 2013 at 3:28 PM, Scott Pakin <pakin@lanl.gov> wrote:

I'm a Haskell beginner, and I'm baffled trying to reason about code
performance, at least with GHC. For a program I'm writing I needed to
find all pairs of elements of a list. That is, given the list "ABCD"
I wanted to wind up with the list
[('A','B'),('A','C'),('A','D'),('B','C'),('B','D'),('C','D')], where
the order of elements in the resulting list is immaterial, but I don't
want both ('A', 'B') and ('B', 'A'), for example.

My first attempt looked like this:

allPairs1 :: [a] -> [(a, a)]
allPairs1 [] = []
allPairs1 (x:xs) = map (\a -> (x, a)) xs ++ allPairs1 xs

Benchmarking with ghci's ":set +s" reveals the following performance
(median of three runs shown):

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude> :!ghc -c -O2 allpairs.hs
Prelude> :load allpairs
Ok, modules loaded: AllPairs.
Prelude AllPairs> :m +Control.DeepSeq
Prelude Control.DeepSeq AllPairs> :show modules
AllPairs ( allpairs.hs, allpairs.o )
Prelude Control.DeepSeq AllPairs> :set +s
Prelude Control.DeepSeq AllPairs> deepseq (allPairs1 [1..10000]) True
True
(4.85 secs, 4004173984 bytes)

A colleague suggested getting rid of the list append as follows:

allPairs2 :: [a] -> [(a, a)]
allPairs2 [] = []
allPairs2 (x:xs) = allPairs2' x xs xs
where allPairs2' x [] [] = []
allPairs2' x (y:ys) zs = (x,y) : allPairs2' x ys zs
allPairs2' x [] (z:zs) = allPairs2' z zs zs

Not surprisingly, that runs faster:

Prelude Control.DeepSeq AllPairs> deepseq (allPairs2 [1..10000]) True
True
(2.14 secs, 4403686376 bytes)

I then figured I could do even better by rewriting the code
tail-recursively:

allPairs3 :: [a] -> [(a, a)]
allPairs3 [] = []
allPairs3 (x:xs) = allPairs3' x xs xs []
where allPairs3' :: a -> [a] -> [a] -> [(a, a)] -> [(a, a)]
allPairs3' h (x:xs) ys result = allPairs3' h xs ys ((h, x):result)
allPairs3' _ [] (y:ys) result = allPairs3' y ys ys result
allPairs3' _ [] [] result = result

This takes half the memory as the above (yay!) but runs substantially
*slower* (and appears to be thrashing my computer's memory system):

Prelude Control.DeepSeq AllPairs> deepseq (allPairs3 [1..10000]) True
True
(10.58 secs, 2403820152 bytes)

What gives? Why does my tail-recursive implementation run even slower
and presumably produce more garbage than my initial, naive
implementation? Is there some optimization GHC is performing for
allPairs1 and allPairs2 that it can't apply to allPairs3?

Similarly, how should I, as a newcomer who wants to write efficient
Haskell code, reason about whether a code change is likely to improve
rather than degrade performance? Guidelines like "per-iteration
appends = bad" and "tail recursion = good" are great, but the
preceding data indicate that something subtler is taking precedence
over those guidelines with respect to code performance.

Thanks in advance,
-- Scott

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe