Unpacking multi-type constructors

I think this is a feasible idea. Whether this is in fact a good idea is an entirely separate question, however, and I want feedback. Currently, I believe that an UNPACK pragma used on a multi-constructor type has no effect. For example, data Node = Node Int {-# UNPACK #-} !(Maybe Node) {-# UNPACK #-} !(Maybe Node) is the same as data Node = Node Int !(Maybe Node) !(Maybe Node) What I'd like is for this to translate into four constructors, one for each combination of constructors for the UNPACK'd fields: data Node = Node0 Int -- Nothing, Nothing | Node1 Int Node -- Just, Nothing | Node2 Int Node -- Nothing, Just | Node3 Int Node Node -- Just, Just The primary counterargument I can think of is that this can result in a single-constructor type being turned into a multi-constructor type, which runs the risk of interfering with GHC's sexcellent handling of single-constructor types. The countercounterargument is, of course, that the {-# UNPACK #-} pragma already behaves differently depending on the single-constructorness of the underlying type, and that the obligation is already on the programmer to check such things. For reference, could somebody point me to the place in GHC that currently takes care of {-# UNPACK #-} pragmas, so I could -- for instance -- figure out whether or not there's another reason that this idea isn't in place already? Louis Wasserman wasserman.louis@gmail.com

Nice idea, but I'm not yet convinced that it would work well in practice. First, there's an exponential explosion in the number of constructors required. Anything with an exponential is worrying. Second, you presumably do not want massive code duplication. So I assume that if you start with case x of { Node i n2 n2) -> e then you'll desugar to something like this: let fj i n1 n2 = e in case x of Node1 I -> fj i Nothing Nothing Node2 I n1 -> fj I (Just n1) Nothing ... etc for other cases... This doesn't look like a win, because you end up allocating the (Just n1) things anyway! So maybe you want to *duplicate* e, so you generate case x of Node1 I -> e[Nothing/n1, Nothing/n2] Node2 I n1 -> e[Just n1/n1, Nothing/n2] ... etc for other cases... But now you not only have an exponential number of branches: in each of those branches you duplicate an arbitrarily large expression e. That doesn't look attractive to me. Simon From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of Louis Wasserman Sent: 21 July 2009 00:20 To: glasgow-haskell-users@haskell.org Subject: Unpacking multi-type constructors I think this is a feasible idea. Whether this is in fact a good idea is an entirely separate question, however, and I want feedback. Currently, I believe that an UNPACK pragma used on a multi-constructor type has no effect. For example, data Node = Node Int {-# UNPACK #-} !(Maybe Node) {-# UNPACK #-} !(Maybe Node) is the same as data Node = Node Int !(Maybe Node) !(Maybe Node) What I'd like is for this to translate into four constructors, one for each combination of constructors for the UNPACK'd fields: data Node = Node0 Int -- Nothing, Nothing | Node1 Int Node -- Just, Nothing | Node2 Int Node -- Nothing, Just | Node3 Int Node Node -- Just, Just The primary counterargument I can think of is that this can result in a single-constructor type being turned into a multi-constructor type, which runs the risk of interfering with GHC's sexcellent handling of single-constructor types. The countercounterargument is, of course, that the {-# UNPACK #-} pragma already behaves differently depending on the single-constructorness of the underlying type, and that the obligation is already on the programmer to check such things. For reference, could somebody point me to the place in GHC that currently takes care of {-# UNPACK #-} pragmas, so I could -- for instance -- figure out whether or not there's another reason that this idea isn't in place already? Louis Wasserman wasserman.louis@gmail.commailto:wasserman.louis@gmail.com

Might be interesting to try the transformation manually and benchmark? simonpj:
Nice idea, but I’m not yet convinced that it would work well in practice.
First, there’s an exponential explosion in the number of constructors required. Anything with an exponential is worrying.
Second, you presumably do not want massive code duplication. So I assume that if you start with
case x of { Node i n2 n2) -> e
then you’ll desugar to something like this:
let fj i n1 n2 = e
in
case x of
Node1 I -> fj i Nothing Nothing
Node2 I n1 -> fj I (Just n1) Nothing
… etc for other cases…
This doesn’t look like a win, because you end up allocating the (Just n1) things anyway!
So maybe you want to *duplicate* e, so you generate
case x of
Node1 I -> e[Nothing/n1, Nothing/n2]
Node2 I n1 -> e[Just n1/n1, Nothing/n2]
… etc for other cases…
But now you not only have an exponential number of branches: in each of those branches you duplicate an arbitrarily large expression e. That doesn’t look attractive to me.
Simon
From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell-users-bounces@haskell.org] On Behalf Of Louis Wasserman Sent: 21 July 2009 00:20 To: glasgow-haskell-users@haskell.org Subject: Unpacking multi-type constructors
I think this is a feasible idea. Whether this is in fact a good idea is an entirely separate question, however, and I want feedback.
Currently, I believe that an UNPACK pragma used on a multi-constructor type has no effect. For example,
data Node = Node Int {-# UNPACK #-} !(Maybe Node) {-# UNPACK #-} !(Maybe Node)
is the same as
data Node = Node Int !(Maybe Node) !(Maybe Node)
What I'd like is for this to translate into four constructors, one for each combination of constructors for the UNPACK'd fields:
data Node = Node0 Int -- Nothing, Nothing
| Node1 Int Node -- Just, Nothing | Node2 Int Node -- Nothing, Just | Node3 Int Node Node -- Just, Just
The primary counterargument I can think of is that this can result in a single-constructor type being turned into a multi-constructor type, which runs the risk of interfering with GHC's sexcellent handling of single-constructor types.
The countercounterargument is, of course, that the {-# UNPACK #-} pragma already behaves differently depending on the single-constructorness of the underlying type, and that the obligation is already on the programmer to check such things.
For reference, could somebody point me to the place in GHC that currently takes care of {-# UNPACK #-} pragmas, so I could -- for instance -- figure out whether or not there's another reason that this idea isn't in place already?
Louis Wasserman wasserman.louis@gmail.com
_______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

Some very rough benchmarks: folding over these two implementations of a
binary tree
data Node1 a = Node1 a (Maybe (Node1 a)) (Maybe (Node1 a))
data Node2 a = Node2A a | Node2B a (Node2 a) | Node2C a (Node2 a) | Node2D a
(Node2 a) (Node2 a)
with the latter of Simon's examples gives the following measurements when
performing 100 traversals over randomly generated trees of size 3000:
min mean +/-sd median max
Packed: 12.001 16.001 4.000 16.001 24.002
Unpacked: 12.000 13.144 1.952 12.001 16.001
(I'm not entirely sure what I'm doing, as this is my first contribution to
the compiler proper, and pointers as to how to obtain more convincing
benchmarks would be appreciated.)
My motivation is more philosophical than anything else, though:
- The behavior of unpacking several multi-constructor types can be highly
useful to the programmer, and is also (precisely because of the exponential
growth) difficult and time-consuming for the programmer to duplicate by
hand.
- Currently, UNPACK pragmas have *no effect* when used on
multi-constructor types, and give no warnings or other hints that they are
having no effect. There is currently no reason for a programmer to use an
UNPACK pragma on a multi-constructor type, so I would not expect to see code
bloat occurring in older programs. The primary exception to this would be
in programmers that use the (already regarded as a sledgehammer)
-funbox-strict-fields option.
- As it stands, it is already the programmer's burden to know how many
constructors an UNPACK'd type has, because the pragma will only have any
effect if it is a single-constructor type. The effect of the proposal is to
attach consequences to UNPACKing multi-constructor types, and placing the
burden on the programmer to be sure that exponentially great code expansion
does not get out of hand.
The first two bullets are really the ones that grate on me -- that use of
the UNPACK pragma in this fashion on multi-constructor types could be
seriously useful to a programmer, but currently has no effect, and as a
result, I would expect that implementing this proposal wouldn't cause
problems in old programs and could be useful in new ones.
I think I'm making sense. Would anyone else care to chime in?
Louis Wasserman
wasserman.louis@gmail.com
On Tue, Jul 21, 2009 at 6:38 PM, Don Stewart
Might be interesting to try the transformation manually and benchmark?
participants (3)
-
Don Stewart
-
Louis Wasserman
-
Simon Peyton-Jones