[GHC] #14826: Flatten data types extending other data types in STG

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature | Status: new request | Priority: low | Milestone: Component: Compiler | Version: 8.5 Keywords: | Operating System: Unknown/Multiple Architecture: | Type of failure: None/Unknown Unknown/Multiple | Test Case: | Blocked By: Blocking: | Related Tickets: Differential Rev(s): | Wiki Page: -------------------------------------+------------------------------------- This idea was triggered by https://mail.haskell.org/pipermail/haskell- cafe/2018-February/128570.html although it does not solve that particular case. Hence I don’t know if there is any practical use for it, but I wanted to jot it down. Consider a data type {{{ data Result = Ok !Foo | NotOK Error }}} where `Foo` is a concrete algebraic data type with $n$ constructors. Then the following transformation should be safe: * Do not generate an info table for `Ok` * Give the `NotOK` the constructor number $n+1$. * Replace calls to `Ok` with `id` * Replace {{{ case r as b of Of f -> E[b,f] | NotOk e -> E[b,e] }}} with {{{ case r as b of DEFAULT -> E[b,b] | NotOk e -> E[b,e] }}} This effectively makes every constructor or `Foo` a constructor of `Ok`, and eliminates the pointer indirection introduced by `Foo`. Checking if a `Result` is `Ok` is now a simple check of the pointer tag (if `Foo` and `Result` do not have too many constructors). Note that `Result` could have additional constructors, but only one can be eliminated. This one constructor needs to * have one argument * be strict in that argument * that argument must be an algebraic data type (even after this flattening) does not have `NotOk` as a constructor. We currently cannot do this in `Core`, but we could if our safe coercions were directed. Do you think you have a case where this could give a performance boost? Maybe some parser library? You can try this now using pattern synonyms and `unsafeCoerce`. If you think there is something to be gained here, then we can consider implementing this. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): This might be useful to be able to do {{{ data IntSet = NonEmpty !NonEmptyIntSet | Tip data NonEmptyIntSet = Bin … | Leaf … }}} and get more expressive types than the current {{{ data IntSet = Bin … | Leaf … | Tip }}} without having to pay a performance penalty for it. Or even {{{ data [a] = Nil | NonEmpty !(NEList a) data (NEList a) = Cons a [a] }}} (no, I am not proposing to change the `Prelude` :-)) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by sgraf): As I understand your proposal, this will re-use info pointers. Is that right? If so, is that a requirement? I imagine we could encode something more general by lifting any data constructor consisting purely of an unboxed sum of arity n to n distinct constructor tags. That way, we can drop the 'only once' requirement. Well, I guess this still involves non-`id` coercions... Still, we could encode the 'only once' thing, where we re-use constructor tags, as a special case to get erasable coercions. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): One might be able to state this in terms of unboxed sums, yes.
That way, we can drop the 'only once' requirement. Well, I guess this still involves non-id coercions...
You mean the requirement that only one `Foo` can be flattened into a `Result`? Yes, if you do this with multiple ones you either get non-id coercions (which party defeat the purpose). Or, of all types are defined in the same module, you can maybe to a smart constructor tag allocation scheme where `Foo` gets tags 1…3, `Bar` gets tags 4…6, and `data FooBar = Foo !Foo | Bar !Bar` can now be flattened simply to a pointer to a `Foo` or a `Bar` value. But let’s focus on the simple case first that does not require a global constructor pointer allocation pass :-) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata): A real-life example where people use horrible unsafe tricks to achieve essentially this: https://github.com/gregorycollins/hashtables/blob/fd385db6c81e157e2922e74664... -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by simonpj): Interesting. Consider what happens today with the Haskell source {{{ data T = MkT {-# UNPACK #-} !Int Bool f (MkT n b) = MkT (n+1) b }}} We get the following Core {{{ data T = MkT Int# Bool $WMkT :: Int -> Bool -> T $WMkT n = case n of I# n# -> MkT n# b f x = case x of MkT n# b -> let n = I# n# in $WMkT (n+1) b }}} That is * The "real" Core data constructor `MkT` has an `Int#` field. * The wrapper `$WMkT` is an ordinary function that unboxes the field and calls the "real" data constructor * Pattern matching is desugared to rebox the field You could do the same thing here. With your example source {{{ data Foo = A Int | B Bool | C data Result = Ok !Foo | NotOK Error f :: Result -> Int f (Ok x) = wimwam x True x f (NotOk _) = 0 }}} you might desugar to {{{ data Result = Ok_A Int | OK_B Bool | OK_C C | NotOK Error f r = join j x = wimwam x True x in case r of OK_A n -> jump j (A n) OK_B b -> jump j (B b) OK_C -> jump j C NOtOK _ -> 0 }}} Note that I am NOT re-using the data constructor for `Foo` (which would be tricky and confusing). I'm simply generating new ones. ----------------- I think this would be do-able. It uses basically the same concepts as now, but generalises a bit. I wonder if we could leverage pattern synonyms rather than have more built-in stuff. -------------- A concern: it could blow up {{{ data T = MkT !(Maybe Int) !(Maybe Int) }}} Do we generate four constructors? In general a multiplicative number? You suggest just one field, but it'd be a shame of your perf went down the tubes because you added an innocuous field. And actually two or more is fine, provide only one expands to a multiplicity: {{{ data T = MkT !Char !(Maybe Int) Bool }}} is absolutely ok, desugaring to {{{ data T = MkT_Just Char# Int Bool | MkT_Nothing Char# Bool -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Comment (by nomeata):
Note that I am NOT re-using the data constructor for Foo (which would be tricky and confusing).
Why not? In fact, with your desguaring, and #13861, we’d get that: {{{ data Result = Ok_A Int | OK_B Bool | OK_C C | NotOK Error f r = join j x = wimwam x True x in case r as b of OK_A n -> jump j b OK_B b -> jump j b OK_C -> jump j b NOtOK _ -> 0 }}}
I wonder if we could leverage pattern synonyms rather than have more built-in stuff.
Yes, your encoding is all doable with pattern synonyms already: {{{ data Foo = A Int | B Bool | C -- we want: data Result = Ok !Foo | NotOK Error data Result = Ok_A Int | OK_B Bool | OK_C | NotOK Error pattern Ok :: Foo -> Result pattern Ok f <- ((\case Ok_A i -> Just (A i); OK_B b -> Just (B b); OK_C -> Just C; _ -> Nothing) -> Just f) where Ok = \case A i -> Ok_A i; B b -> OK_B b; C -> OK_C }}} or, if people feel bold (and until #13861 does this automatically) {{{ pattern Ok :: Foo -> Result pattern Ok f <- ((\case NotOK _ -> Nothing; foo -> Just (unsafeCoerce foo)) -> Just f) where Ok = unsafeCoerce }}} So since this is somewhat nicely possible, it makes sense for people to play around with either of these encodings using pattern synonyms, and let us know if they notice any gains. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#14826: Flatten data types extending other data types in STG -------------------------------------+------------------------------------- Reporter: nomeata | Owner: (none) Type: feature request | Status: new Priority: low | Milestone: Component: Compiler | Version: 8.5 Resolution: | Keywords: Operating System: Unknown/Multiple | Architecture: | Unknown/Multiple Type of failure: None/Unknown | Test Case: Blocked By: | Blocking: Related Tickets: | Differential Rev(s): Wiki Page: | -------------------------------------+------------------------------------- Changes (by sgraf): * cc: sgraf (added) -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/14826#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC