Computing the final representation type of a TyCon (Was: Unpack primitive types by default in data)

29 Nov 2012

Hi all,

I've decided to try to implement the proposal included at the end of
this message. To do so I need to write a function

    hasPointerSizedRepr :: TyCon -> Bool

This function would check that the TyCon is either

 * a newtype whose representation type has a pointer-sized representation, or
 * an algebraic data type with one field that has a pointer-sized
   representation.

I'm kinda lost in all the data types that GHC defines to represent
types. I've gotten no further than

    hasPointerSizedRepr :: TyCon -> Bool
    hasPointerSizedRepr tc@(AlgTyCon {}) = case algTcRhs tc of
                                             DataTyCon{ data_cons = [data_con] }
                                                         -> ...
                                             NewTyCon { data_con = data_con }
                                                         -> ...
                                             _           -> False
    hasPointerSizedRepr _                = False

I could use some pointers (no pun intended!) at this point. The
function ought to return True for all the following types:

    data A = A Int#
    newtype B = B A
    data C = C !B
    data D = D !C
    data E = E !()
    data F = F !D

One part that confuses me is figuring out the representation type of a
data constructor after unpacking. For example, the function should not
return true if called on G in this example:

    data G = G !H
    data H = H {-# UNPACK #-} !I
    data I = I !Int !Int

because if we unpacked H into G's constructor it would take up two
words, due to I being unpacked.
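To make the expected answers concrete, here is a minimal sketch of the width calculation on a toy model. It does not use GHC's TyCon or DataCon at all: Field, Decl, Env, fieldWidth, declWidth, hasPointerSizedReprModel and exampleEnv are all hypothetical names invented for illustration. It assumes that every strict field whose type is a known single-constructor type gets flattened into its parent, that a newtype behaves like a single strict field, and that any type missing from the environment (such as ()) counts as one pointer. It would also loop on recursive types, which a real implementation would have to guard against.

```haskell
-- Toy model only: these types are hypothetical and are NOT GHC's TyCon/DataCon.

-- | How a constructor field is declared.
data Field
  = Prim            -- a one-word unboxed primitive, e.g. Int#
  | Lazy String     -- lazy field: always stored as a single pointer
  | Strict String   -- strict field (with or without an UNPACK pragma)

-- | A single-constructor declaration, given as its list of fields.
-- A newtype is modelled as one strict field (an assumption of this sketch).
type Decl = [Field]

-- | Maps a type name to its declaration.
type Env = [(String, Decl)]

-- | Words occupied by one field, assuming every strict field of a known
-- single-constructor type is flattened into its parent. Types missing from
-- the environment (such as "()") count as one pointer.
fieldWidth :: Env -> Field -> Int
fieldWidth _   Prim       = 1
fieldWidth _   (Lazy _)   = 1
fieldWidth env (Strict t) = maybe 1 (declWidth env) (lookup t env)

-- | Words occupied by all fields of a constructor after flattening.
-- Note: this does not terminate on recursive types.
declWidth :: Env -> Decl -> Int
declWidth env = sum . map (fieldWidth env)

-- | True when the flattened constructor occupies exactly one word.
hasPointerSizedReprModel :: Env -> String -> Bool
hasPointerSizedReprModel env name =
  case lookup name env of
    Just decl -> declWidth env decl == 1
    Nothing   -> False

-- | The declarations from this message, plus Int modelled as one Int# field.
exampleEnv :: Env
exampleEnv =
  [ ("Int", [Prim])
  , ("A",   [Prim])                         -- data A = A Int#
  , ("B",   [Strict "A"])                   -- newtype B = B A
  , ("C",   [Strict "B"])                   -- data C = C !B
  , ("D",   [Strict "C"])                   -- data D = D !C
  , ("E",   [Strict "()"])                  -- data E = E !()
  , ("F",   [Strict "D"])                   -- data F = F !D
  , ("G",   [Strict "H"])                   -- data G = G !H
  , ("H",   [Strict "I"])                   -- data H = H {-# UNPACK #-} !I
  , ("I",   [Strict "Int", Strict "Int"])   -- data I = I !Int !Int
  ]
```

Loaded in GHCi, map (hasPointerSizedReprModel exampleEnv) ["A", "F", "G"] evaluates to [True,True,False]: G flattens through H into the two words of I.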

Does DataCon contain the unpacked representation of the data
constructor or only the before-optimizations representation?

Cheers,
Johan

On Thu, Feb 16, 2012 at 4:25 PM, Johan Tibell wrote:
...
Hi all,

I've been thinking about this some more and I think we should
definitely unpack primitive types (e.g. Int, Word, Float, Double,
Char) by default.

The worry is that reboxing will cost us, but I realized today that at
least one other language, Java, already does this, and even
though it hurts performance in some cases, it seems to be a win on
average. In Java all primitive fields get auto-boxed/unboxed when
stored in polymorphic fields (e.g. in a HashMap, which stores keys and
values as Object pointers.) This seems analogous to our case, except
we might also unbox when calling lazy functions.

Here's an idea of how to test this hypothesis:

 1. Get a bunch of benchmarks.
 2. Change GHC to make UNPACK a no-op for primitive types (as library
    authors have already worked around the lack of unpacking by using this
    pragma.)
 3. Run the benchmarks.
 4. Change GHC to always unpack primitive types (regardless of the
    presence of an UNPACK pragma.)
 5. Run the benchmarks.
 6. Compare the results.

Number (1) might be what's holding us back right now, as we feel that
we don't have a good benchmark set. I suggest we try with nofib first
and see if there's a difference and then move on to e.g. the shootout
benchmarks.

I imagine that ignoring UNPACK pragmas selectively wouldn't be too
hard. Where is the relevant code?

Cheers,
Johan

Johan Tibell
