
#12618: Add saturated constructor applications to Core -------------------------------------+------------------------------------- Reporter: simonpj | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Description changed by heisenbug: @@ -19,1 +19,1 @@ - So here's another idea, which came up in converstion at ICFP'16: '''add a + So here's another idea, which came up in conversation at ICFP'16: '''add a @@ -97,1 +97,1 @@ - * For `ConApp "(:)" [e1, e2]`, the type arument is just `exprType e1`. + * For `ConApp "(:)" [e1, e2]`, the type argument is just `exprType e1`. @@ -108,1 +108,1 @@ - A simple once-and-for-all analyis on the `DataCon` will establish how to + A simple once-and-for-all analysis on the `DataCon` will establish how to @@ -124,1 +124,1 @@ - types a `TyConApp` is not required to be satureted, but we could review + types a `TyConApp` is not required to be saturated, but we could review @@ -142,1 +142,1 @@ - We'll only know if we try it. I estimiate that it would take less than a + We'll only know if we try it. I estimate that it would take less than a New description: A long-standing performance bug in GHC is its behaviour on nested tuples. * The poster-child case is #5642. * The problem is that when you have a nested tuple application, e.g. `((4,True),('c','d'))`, the size of the type arguments grows quadratically with the expression size. This bites badly in some approaches to generic programming, which use nested tuples heavily. * It's explained in detail in Section 2.3 of [http://research.microsoft.com/en- us/um/people/simonpj/papers/variant-f/index.htm Scrap your type applications]. The same paper describes a solution, a modification of System F called System IF. It's neat, and I did once implement in in GHC. But it was quite complicated; most of the time the win was not big; and I don't know how it'd interact with casts, coercions, and type dependency in the latest version of GHC's Core. So here's another idea, which came up in conversation at ICFP'16: '''add a staturated constructor application to Core'''. So Core looks like {{{ data Expr v = Var v | App (Expr v) (Expr v) ... | ConApp DataCon [Expr v] -- NEW }}} Now I hate the idea of adding a new constructor to Core; I often brag about how few constructors it has. But the idea is this: * A `ConApp` is always saturated; think of it as an uncurried application. * Every data constructor has a wrapper Id. For simple constructors like `(:)`, the wrapper is just a curried version: {{{ (:) = /\a. \(x:a). \(y:[a]). (:) [x,y] }}} where the "`(:) [x,y]`" is just my concrete syntax for a `ConApp` node. * We are used to having an intro and elim form for each type former. So for `(->)` the intro form is `\x.e` and the elim form is `(e1 e2)`. For a data type like `Maybe`, the elim form is `case`, but what's the intro form? `ConApp` of course. * A `ConApp` mentions the `DataCon` explicitly, rather than having it buried inside the `IdDetails` of an `Id`, which somehow seems more honest. The proximate reason for doing this in the first place is to allow us to omit type arguments. Consider `Just True`. Curently we represent this as {{{ Var "Just" `App` Type boolTy `App` Var "True" }}} But with `ConApp` we can say {{{ ConApp "Just" [ConApp "True" []] }}} because it's easy to figure out the `boolTy` type argument from the argument. (We don't really have strings there, but you get the idea.) Can we omit ''all'' the type arguments? No: we can omit only those that appear free in any of the argument types. That is usally all of them (including existentials) but not always. Consider: {{{ data (,) a b where (,):: forall a b. a -> b -> (a,b) data [] a where [] :: forall a. [a] (:) :: forall a. a -> [a] -> [a] data Either a b where Left :: forall a b. a -> Either a b Right :: forall a b. b -> Either a b data (:=:) a b where Refl :: forall a b. (a~b) -> :=: a b data Foo where MkFoo :: a -> (a -> Int) -> Foo }}} For all of these data constructors except `[]` (nil), `Left` and `Right` we can omit all the type arguments, because we can recover them by simple matching against the types of the arguments. A very concrete way to think about this is how to implement {{{ exprType :: Expr Id -> Type exprType (Var v) = varType v exprType (Lam b e) = mkFunTy (varType b) (exprType e) exprType (App f a) = funResultTy (exprType f) ... exprType (ConApp con args) = mkTyConApp (dataConTyCon con) ??? }}} We know that the result type of type of a `ConApp` will be `T t1 ..tn` where `T` is the parent type constructor of the `DataCon`. But what about the (universal) type args `???`? We can get them from the types of the arguments `map exprType args`: * For `ConApp "(:)" [e1, e2]`, the type argument is just `exprType e1`. * For `ConApp "(:=:)" [e]`", we expect `exprType e` to return a type looking like `TyConApp "~" [t1, t2]`. Then `t1` and `t2` are the types we want. So matching is required. * What about an application of `Left`?? We need to recover two type args, but `exprType e1` gives us only one. So we must retain the other one in the application: `ConApp "Left" [Type ty2, e1]`. Similarly for the empty list. A simple once-and-for-all analysis on the `DataCon` will establish how to do the matching, which type args to retain, etc. Tradeoffs: * Pro: We can eliminate almost all type args of almost all data constructors; and for nested tuples we can eliminate all of them. * Pro: it's elegant having the intro/elim duality. * Pro: in GHC we often ask "is this expression a saturated constructor application?" (see `exprIsConApp_maybe`) and `ConApp` makes it easier to answer that question. * Pro: we do exactly this in types: we have `AppTy` and `TyConApp`. (In types a `TyConApp` is not required to be saturated, but we could review that choice.) * Con: adding a constructor is a big deal. In lots of places we'd end up saying {{{ f (App e1 e2) = App (f e1) (f e2) f (ConApp c es) = ConApp c (map f es) }}} In the olden days GHC's `App` had multiple arguments and the continual need to work over the list was a bit tiresome. Still `ConApp` is very simple and uniform; quite often adding `map` won't be difficult; and it may well be that constructors need to be treated differently anyway. * Con: it's not a general solution to the type-argument problem. See #9198 for example. System IF is the only general solution I know, but it seems like too big a sledgehammer. We'll only know if we try it. I estimate that it would take less than a week to work this change through all of GHC. 90% of it is routine. Other possibly-relevant tickets are #8523, #7428, #9630. -- -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12618#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler