
#12626: Remove redundant type applications in Core -------------------------------------+------------------------------------- Reporter: nomeata | Owner: Type: feature request | Status: new Priority: normal | Milestone: Component: Compiler | Version: 8.0.1 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): Replying to [comment:5 nomeata]:
What is special about a data constructor that allows this, that a function (like, say, `flip`) does not have?
Let me try to answer that myself: Essentially, you are proposing a way of compressing the representation of Core, by omitting (type) arguments that can easily and in a syntax- directed manner be recovered. You are implying here the existence of a pair of functions {{{ compressArgs :: DataCon -> [Expr v] -> [Expr v] decompressArgs :: DataCon -> [Expr v] -> [Expr v] }}} that remove and recover implied arguments. For example `compressArgs "Left" [Bool,Int,True] = [Bool, True]`. We obviously want to following identity to hold: {{{ length args == dcArity dc ==> decompressArgs dc (compressArgs dc args) == args }}} What does `compressArgs` and `decompressArgs` need from `DataCon`? Two bits of information: * It’s type (`forall a b. a -> Either a b`) and * its arity (3, counting type arguments). Well, `compressArgs` does not need the arity, because it is just the length of the input list, if the application is saturated. But `decompressArgs` does (why? see example later). So really, we have a pair of functions {{{ compressArgs :: Type -> [Expr v] -> [Expr v] decompressArgs :: Type -> Arity -> [Expr v] -> [Expr v] }}} Here, we want {{{ decompressArgs ty (length args) (compressArgs ty args) == args }}} And with this I can see how the above proposal easily generalizes to functions: Have {{{ | Apps (Expr v) Arity [Expr v] -- NEW }}} Instead of {{{f `App` x `App` y `App` z}}} you can use `Apps f 3 (compressArgs (exprType f) [x,y,z])`. No information is lost (because of the above identity), but any redundant information can be removed by `compressArgs`. Why do we need to store the arity? Because `compressArgs` can produce the same compressed list for different input lists: {{{ compressArgs "forall a. a -> a" [Bool,True] == [True] compressArgs "forall a. a -> a" [Bool] == [Bool] compressArgs "forall a. a -> a" [Type, Bool] == [Bool] }}} I imagine getting rid of all (well, many more) of the redundant type applications throughout Core can be big win, so maybe this generalization should be considered. Oh, and fun fact: It might be possible to remove `App` completely and replace it with a pattern synonym here: Due to `compressArgs ty [x] = [x]`, `Apps f 1 [e]` is equivalent to `App f e`. So this might not actually be a serious change to core, nor might it be increasing the number of constructors: Using a bidirectional smart constructor (which does optimizes the representation under the hood) this can hopefully be a completely transparent optimization of the representation. Using a pattern synonynm works almost; the problem is that the code should be generic in the binder, but then we have no way of knowing the binder’s type. But for what it’s worth: It seems to compile with this change: {{{ data Expr b = Var Id | Lit Literal -- | App (Expr b) (Arg b) | Apps (Expr b) Arity [Expr b] | Lam b (Expr b) | Let (Bind b) (Expr b) | Case (Expr b) b Type [Alt b] -- See #case_invariant# | Cast (Expr b) Coercion | Tick (Tickish Id) (Expr b) | Type Type | Coercion Coercion deriving Data unpackArgs :: Expr b -> Arity -> [Expr b] -> [Expr b] unpackArgs _ _ l = l -- do something smarter here packArgs :: Expr b -> [Expr b] -> [Expr b] packArgs _ l = l -- do something smarter here popArg :: Expr b -> Maybe (Expr b, Expr b) popArg (Apps e a xs) = case unpackArgs e a xs of [x] -> Just (e, x) xs -> Just (Apps e (a-1) (packArgs e (init xs)), last xs) popArg _ = Nothing pattern App e1 e2 <- (popArg -> Just (e1, e2)) where App e1 e2 | (f, args) <- collectArgs e1 = Apps f (length args +1) (packArgs f (args ++ [e2])) }}} I guess the solution to that is adding a type class, such as {{{ class CompressArgs b where unpackArgs :: Expr b -> Arity -> [Expr b] -> [Expr b] unpackArgs _ _ l = l packArgs :: Expr b -> [Expr b] -> [Expr b] packArgs _ l = l }}} or alternatively {{{ class HasType a where hasType :: Expr a -> Type }}} and adding instances for the two type of binders we have (`Var` and `TaggedBndr t`). With a few constraints added in various places (four modules only), this also compiles. It seems the remaining bit is to solve the staging issue: The instance should live in `CoreSyn`, but requires `expType` in `CoreUtils`. If that can be solved, the feature could be added quite painlessly. Then, in all places where the expression is traversed, one can any time make the decision to work on `Apps` directly, instead of `App`. I expect we’d get the memory performance boost we want (exciting!), but I also expect that this encoding/decoding in every traversal will cost runtime. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/12626#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler