Hi Edward,

Yes, I'm aware of that. However, I thought Dan's proposal especially droll given that changing seq to a class-based function would be sufficient to make eta-reduction sound, given appropriate instances (or lack thereof). Meaning we could leave the rest of the proposal unevaluated (lazily!).

And if somebody were to suggest that on a different day, +1 from me.

John

On Apr 1, 2014 10:32 AM, "Edward Kmett" <ekmett@gmail.com> wrote:

John,

Check the date and consider the process necessary to "enumerate all Haskell programs and check their types".

-Edward

On Tue, Apr 1, 2014 at 9:17 AM, John Lato <jwlato@gmail.com> wrote:

I think this is a great idea and should become a top priority. I would probably start by switching to a type-class-based seq, after which perhaps the next step forward would become more clear.

John L.

On Apr 1, 2014 2:54 AM, "Dan Doel" <dan.doel@gmail.com> wrote:

In the past year or two, there have been multiple performance problems in
various areas related to the fact that lambda abstraction is not free, though we
tend to think of it as so. A major example of this was deriving of Functor. If we
were to derive Functor for lists, we would end up with something like:

    instance Functor [] where
      fmap _ [] = []
      fmap f (x:xs) = f x : fmap (\y -> f y) xs

This definition is O(n^2) when fully evaluated,, because it causes O(n) eta
expansions of f, so we spend time following indirections proportional to the
depth of the element in the list. This has been fixed in 7.8, but there are
other examples. I believe lens, [1] for instance, has some stuff in it that
works very hard to avoid this sort of cost; and it's not always as easy to avoid
as the above example. Composing with a newtype wrapper, for instance, causes an
eta expansion that can only be seen as such at the core level.

The obvious solution is: do eta reduction. However, this is not operationally
sound currently. The problem is that seq is capable of telling the difference
between the following two expressions:

    undefined
    \x -> undefined x

The former causes seq to throw an exception, while the latter is considered
defined enough to not do so. So, if we eta reduce, we can cause terminating
programs to diverge if they make use of this feature.

Luckily, there is a solution.

Semantically one would usually identify the above two expressions. While I do
believe one could construct a semantics that does distinguish them, it is not
the usual practice. This suggests that there is a way to not distinguish them,
perhaps even including seq. After all, the specification of seq is monotone and
continuous regardless of whether we unify ⊥ with \x -> ⊥ x or insert an extra
element for the latter.

The currently problematic case is function spaces, so I'll focus on it. How
should:

    seq : (a -> b) -> c -> c

act? Well, other than an obvious bottom, we need to emit bottom whenever our
given function is itself bottom at every input. This may first seem like a
problem, but it is actually quite simple. Without loss of generality, let us
assume that we can enumerate the type a. Then, we can feed these values to the
function, and check their results for bottom. Conal Elliot has prior art for
this sort of thing with his unamb [2] package. For each value x :: a, simply
compute 'f x `seq` ()' in parallel, and look for any successes. If we ever find
one, we know the function is non-bottom, and we can return our value of c. If we
never finish searching, then the function must be bottom, and seq should not
terminate, so we have satisfied the specification.

Now, some may complain about the enumeration above. But this, too, is a simple
matter. It is well known that Haskell programs are denumerable. So it is quite
easy to enumerate all Haskell programs that produce a value, check whether that
value has the type we're interested in, and compute said value. All of this can
be done in Haskell. Thus, every Haskell type is programatically enumerable in
Haskell, and we can use said enumeration in our implementation of seq for
function types. I have discussed this with Russell O'Connor [3], and he assures
me that this argument should be sufficient even if we consider semantic models
of Haskell that contain values not denoted by any Haskell program, so we should
be safe there.

The one possible caveat is that this implementation of seq is not operationally
uniform across all types, so the current fully polymorphic type of seq may not
make sense. But moving back to a type class based approach isn't so bad, and
this time we will have a semantically sound backing, instead of just having a
class with the equivalent of the current magic function in it.

Once this machinery is in place, we can eta reduce to our hearts' content, and
not have to worry about breaking semantics. We'd no longer put the burden on
programmers to use potentially unsafe hacks to avoid eta expansions. I apologize
for not sketching an implementation of the above algorithm, but I'm sure it
should be elementary enough to make it into GHC in the next couple versions.
Everyone learns about this type of thing in university computer science
programs, no?

Thoughts? Comments? Questions?

Cheers,
-- Dan

[1] http://hackage.haskell.org/package/lens
[2] http://hackage.haskell.org/package/unamb
[3] http://r6.ca/

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe