
On Tue, Oct 4, 2011 at 2:39 AM, Roman Cheplyaka
Suppose I want a foldl which is evaluated many times on the same list but with different folding functions.
I would write something like this, to perform pattern-matching on the list only once:
module F where myFoldl :: [a] -> (b -> a -> b) -> b -> b myFoldl [] = \f a -> a myFoldl (x:xs) = let y = myFoldl xs in \f a -> y f (f a x)
However, for some reason GHC eta-expands it back.
It seems to be a common misconception that eta-abstracting your functions in this way will speed up or otherwise improve your code. Simon PJ has already provided a good explanation of why GHC eta expands. Let me take another tack and describe why the code you wrote without eta expansion probably doesn't provide you with any actual benefit. Roughly speaking, you're creating a chain of closures whose contents exactly describe the contents of your list (ie you've created something that's isomorphic to your original list structure), and so you should expect no benefit at all. Let's consider a specific concrete call to myFoldl: myFoldl (1:2:3:4:[]) = r1 where rt = \ f a -> a x4 = 4 r4 = \ f a -> rt f (f a x4) x3 = 3 r3 = \ f a -> r4 f (f a x3) x2 = 2 r2 = \ f a -> r3 f (f a x2) x1 = 1 r1 = \ f a -> r2 f (f a x1) Each of the bindings above is a separate heap-allocated object. Now consider the data representation for the function closures r1..r4. Each such closure has two free variables: the previous closure (r2..r4 or rt) and the value of the current list element (x1..x4). If you write that out schematically in box-and-pointer notation, you'll quickly see that the result has the exact same memory structure as your original list. Now, you'll probably want to point out that transforming your list into a string of closures like this eliminates the need to pattern match on the list structure. This is true, but that's because you've replaced that pattern match with a call to an unknown closure. That's because in most circumstances we'll get just one copy of the code for r1..r4. GHC will actually generate code a bit like the following [*]: myFoldl (1:2:3:4:[]) = r1 where closure x r = \ f a -> r f (f a x) rt = \ f a -> a x4 = 4 r4 = closure x4 rt x3 = 3 r3 = closure x3 r4 x2 = 2 r2 = closure x2 r3 x1 = 1 r1 = closure x1 r2 So there are two different kinds of closure that get passed to r in closure x r: 1) rt 2) a call to closure xn r(n+1) Distinguishing these two cases (even if just by branching to a closure code pointer) leads to overheads comparable to (and generally larger than) those of pattern matching. GHC used to distinguish (:) and [] by branching to an unknown function pointer (exactly as is happening here) and switched to pointer tagging instead because it was faster. All this assumes everything has been completely evaluated already. Laziness drowns out most of the relative advantages and disadvantages here (and there's an even-more-involved explanation of why you might lose strictness information in your eta-abstracted function, making things worse still). It also assumes you are not able to specialize your code to a particular higher-order function. Any time you can do that, it's potentially very beneficial. For example, the following *might* be a worthwhile definition of foldl: {-# INLINE foldl #-} foldl f = loop where loop a [] = a loop a (x:xs) = loop (f a x) xs -Jan-Willem Maessen [*] Note that GHC actually treats the free variables of the closure (x and r in this case) a little bit specially, so the code isn't necessarily *literally* equivalent to what I've shown here, but it's pretty close.