
On Mon, Mar 31, 2014 at 11:54 PM, Dan Doel
In the past year or two, there have been multiple performance problems in various areas related to the fact that lambda abstraction is not free, though we tend to think of it as so. A major example of this was deriving of Functor. If we were to derive Functor for lists, we would end up with something like:
instance Functor [] where fmap _ [] = [] fmap f (x:xs) = f x : fmap (\y -> f y) xs
This definition is O(n^2) when fully evaluated,, because it causes O(n) eta expansions of f, so we spend time following indirections proportional to the depth of the element in the list. This has been fixed in 7.8, but there are other examples. I believe lens, [1] for instance, has some stuff in it that works very hard to avoid this sort of cost; and it's not always as easy to avoid as the above example. Composing with a newtype wrapper, for instance, causes an eta expansion that can only be seen as such at the core level.
The obvious solution is: do eta reduction. However, this is not operationally sound currently. The problem is that seq is capable of telling the difference between the following two expressions:
undefined \x -> undefined x
The former causes seq to throw an exception, while the latter is considered defined enough to not do so. So, if we eta reduce, we can cause terminating programs to diverge if they make use of this feature.
Luckily, there is a solution.
Semantically one would usually identify the above two expressions. While I do believe one could construct a semantics that does distinguish them, it is not the usual practice. This suggests that there is a way to not distinguish them, perhaps even including seq. After all, the specification of seq is monotone and continuous regardless of whether we unify ⊥ with \x -> ⊥ x or insert an extra element for the latter.
The currently problematic case is function spaces, so I'll focus on it. How should:
seq : (a -> b) -> c -> c
act? Well, other than an obvious bottom, we need to emit bottom whenever our given function is itself bottom at every input. This may first seem like a problem, but it is actually quite simple. Without loss of generality, let us assume that we can enumerate the type a. Then, we can feed these values to the function, and check their results for bottom. Conal Elliot has prior art for this sort of thing with his unamb [2] package. For each value x :: a, simply compute 'f x `seq` ()' in parallel, and look for any successes. If we ever find one, we know the function is non-bottom, and we can return our value of c. If we never finish searching, then the function must be bottom, and seq should not terminate, so we have satisfied the specification.
Love it. I have always been a fan of "fast and loose" reasoning in Haskell. Abstracting on seq, and treating it as a bottom if it evaluates to one, fits in perfectly.