Computing the final representation type of a TyCon (Was: Unpack primitive types by default in data)

Hi all,
I've decided to try to implement the proposal included in the end of
this message. To do so I need to write a function
hasPointerSizedRepr :: TyCon -> Bool
This function would check that the TyCon is either
* a newtype whose representation type has a pointer-sized representation, or
* an algebraic data type with a single constructor that has one field with a
pointer-sized representation.
I'm kinda lost in all the data types that GHC defines to represent
types. I've gotten no further than
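hasPointerSizedRepr :: TyCon -> Bool
hasPointerSizedRepr tc@(AlgTyCon {}) = case algTcRhs tc of
    -- a data type with exactly one constructor
    DataTyCon { data_cons = [data_con] }
        -> ...
    -- a newtype always has exactly one constructor; data_con is a
    -- single DataCon here, not a list
    NewTyCon { data_con = data_con }
        -> ...
    _ -> False
hasPointerSizedRepr _ = False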
I could use some pointers (no pun intended!) at this point. The
function ought to return True for all the following types:
data A = A Int#
newtype B = B A
data C = C !B
data D = D !C
data E = E !()
data F = F !D
One part that confuses me is figuring out the representation type of a
data constructor after unpacking. For example, the function should not
return True if called on G in this example:
data G = G !H
data H = H {-# UNPACK #-} !I
data I = I !Int !Int
because if we unpacked H into G's constructor it would take up two
words, due to I being unpacked.
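To make the expected answers concrete, here is a toy model in plain Haskell. It does not use GHC's TyCon/DataCon API; Ty, Field, fieldWords and the other names are hypothetical. It counts how many words a constructor's payload would occupy if every strict field of a single-constructor type were unpacked in place, assuming newtypes are erased, lazy fields and multi-constructor types cost one pointer, and a nullary type such as () costs zero words. Under those assumptions it gives True for A through F and False for G, because I alone contributes two words.

```haskell
module Main where

-- Toy stand-ins for GHC's TyCon/DataCon (hypothetical, not GHC's API).
data Ty
  = Prim String Int      -- an unlifted primitive and its size in words, e.g. Int#
  | Newtype String Ty    -- newtype N = N t
  | Data String [Con]    -- an algebraic data type and its constructors

type Con = [Field]       -- a constructor is modelled as just its fields

data Field = Field { fieldStrict :: Bool, fieldTy :: Ty }

strictF, lazyF :: Ty -> Field
strictF = Field True
lazyF = Field False

-- | Look through newtypes to the type that actually represents them.
expand :: Ty -> Ty
expand (Newtype _ t) = expand t
expand t = t

-- | Words a field occupies in its constructor, ASSUMING every strict field
-- of a single-constructor type is unpacked in place (recursively).
-- Lazy fields and multi-constructor types stay a single pointer.
-- Note: this loops on recursive types such as data T = T !T.
fieldWords :: Field -> Int
fieldWords (Field strict t) = case expand t of
  Prim _ n -> n
  Data _ [con] | strict -> sum (map fieldWords con)
  _ -> 1

-- | Does the fully unpacked payload fit in at most one word?
hasPointerSizedRepr :: Ty -> Bool
hasPointerSizedRepr t = case expand t of
  Prim _ n -> n <= 1
  Data _ [con] -> sum (map fieldWords con) <= 1
  _ -> False

intHash, int, unit :: Ty
intHash = Prim "Int#" 1
int = Data "Int" [[lazyF intHash]]   -- data Int = I# Int#
unit = Data "()" [[]]                -- data () = ()

tyA, tyB, tyC, tyD, tyE, tyF, tyG, tyH, tyI :: Ty
tyA = Data "A" [[lazyF intHash]]
tyB = Newtype "B" tyA
tyC = Data "C" [[strictF tyB]]
tyD = Data "D" [[strictF tyC]]
tyE = Data "E" [[strictF unit]]
tyF = Data "F" [[strictF tyD]]
tyI = Data "I" [[strictF int, strictF int]]
tyH = Data "H" [[strictF tyI]]   -- the source has an UNPACK pragma here
tyG = Data "G" [[strictF tyH]]

main :: IO ()
main = mapM_ report
  [ ("A", tyA), ("B", tyB), ("C", tyC), ("D", tyD)
  , ("E", tyE), ("F", tyF), ("G", tyG) ]
  where
    report (name, t) = putStrLn (name ++ ": " ++ show (hasPointerSizedRepr t))
```

Running main prints True for A through F and False for G. In this model H's UNPACK pragma makes no difference, since every strict single-constructor field is unpacked anyway; a real implementation would instead have to consult the representation GHC actually chooses after unpacking.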
Does DataCon contain the unpacked representation of the data
constructor or only the before-optimizations representation?
Cheers,
Johan
On Thu, Feb 16, 2012 at 4:25 PM, Johan Tibell
Hi all,
I've been thinking about this some more and I think we should definitely unpack primitive types (e.g. Int, Word, Float, Double, Char) by default.
The worry is that reboxing will cost us, but I realized today that at least one other language, Java, already does this, and even though it hurts performance in some cases, it seems to be a win on average. In Java, primitive values get auto-boxed/unboxed when stored in polymorphic fields (e.g. in a HashMap, which stores keys and values as Object pointers). This seems analogous to our case, except that we might also have to rebox when calling lazy functions.
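A small illustration of where reboxing comes from (a sketch, not a benchmark; P, getP and pair are made-up names): once an Int field is stored unpacked, handing it to polymorphic or lazy code means building a boxed Int again, much as Java boxes an int placed in an Object-keyed HashMap.

```haskell
module Main where

-- An Int stored unboxed inside the constructor, which is what the proposal
-- would make the default for strict fields of primitive types like Int.
data P = P {-# UNPACK #-} !Int

-- Reading the field back as an ordinary Int. P holds a raw Int#, so GHC
-- generally has to allocate a fresh I# box here ("reboxing").
getP :: P -> Int
getP (P n) = n

-- A polymorphic, lazy consumer, loosely analogous to a Java HashMap of
-- Object: it can only hold boxed values.
pair :: a -> (a, a)
pair x = (x, x)

main :: IO ()
main = print (pair (getP (P 42)))
```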
Here's an idea of how to test this hypothesis:
1. Get a bunch of benchmarks.
2. Change GHC to make UNPACK a no-op for primitive types (as library authors have already worked around the lack of unpacking by using this pragma).
3. Run the benchmarks.
4. Change GHC to always unpack primitive types (regardless of the presence of an UNPACK pragma).
5. Run the benchmarks.
6. Compare the results.
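For step (6), one common way to summarize such a comparison (an assumption, not part of the proposal) is to take the per-benchmark runtime ratio between the two runs and average the ratios with a geometric mean. The input format, a list of (benchmark name, seconds) pairs, is hypothetical, and the numbers in main are toy values, not measurements.

```haskell
module Main where

-- | Runtime ratio (variant / baseline) for every benchmark that appears in
-- both runs. A ratio below 1.0 means the variant was faster.
ratios :: [(String, Double)] -> [(String, Double)] -> [(String, Double)]
ratios baseline variant =
  [ (name, v / b) | (name, b) <- baseline, Just v <- [lookup name variant] ]

-- | Geometric mean of ratios, so that a 2x speedup and a 2x slowdown
-- cancel out. The empty list yields 1 (no change).
geoMean :: [Double] -> Double
geoMean [] = 1
geoMean xs = exp (sum (map log xs) / fromIntegral (length xs))

main :: IO ()
main = do
  -- Toy numbers only, not measurements.
  let noUnpackRun = [("bench1", 2.0), ("bench2", 4.0)]      -- step 3
      alwaysUnpackRun = [("bench1", 1.0), ("bench2", 4.0)]  -- step 5
      rs = ratios noUnpackRun alwaysUnpackRun
  mapM_ print rs
  print (geoMean (map snd rs))
```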
Number (1) might be what's holding us back right now, as we feel that we don't have a good benchmark set. I suggest we try nofib first, see if there's a difference, and then move on to e.g. the shootout benchmarks.
I imagine that ignoring UNPACK pragmas selectively wouldn't be too hard. Where is the relevant code?
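The two compiler changes in steps (2) and (4) amount to a per-field unpacking policy. The sketch below models that policy in isolation; Variant, Field and shouldUnpack are hypothetical names, it matches types by name rather than by TyCon, and it ignores optimization flags such as -funbox-strict-fields. It is not GHC's actual code.

```haskell
module Main where

-- The primitive types the proposal names.
primitiveTypes :: [String]
primitiveTypes = ["Int", "Word", "Float", "Double", "Char"]

-- Hypothetical compiler variants for the experiment.
data Variant
  = Stock               -- honour UNPACK pragmas as usual
  | UnpackNoOpForPrims  -- step 2: ignore UNPACK on primitive types
  | AlwaysUnpackPrims   -- step 4: unpack primitive types regardless of pragma
  deriving (Eq, Show)

-- A field as written in the source: its type name, whether it has a bang,
-- and whether it carries an UNPACK pragma.
data Field = Field { fieldType :: String, isStrict :: Bool, hasUnpack :: Bool }

-- | Should this field be stored unpacked? Only strict fields can be
-- unpacked at all; the variant then decides how the pragma is treated.
shouldUnpack :: Variant -> Field -> Bool
shouldUnpack variant field
  | not (isStrict field) = False
  | otherwise = case variant of
      Stock -> hasUnpack field
      UnpackNoOpForPrims -> hasUnpack field && not prim
      AlwaysUnpackPrims -> hasUnpack field || prim
  where
    prim = fieldType field `elem` primitiveTypes

main :: IO ()
main = mapM_ (\v -> print (v, shouldUnpack v (Field "Int" True False)))
  [Stock, UnpackNoOpForPrims, AlwaysUnpackPrims]
```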
Cheers, Johan

Hi,
I've created an initial implementation that seems to work. I'd
appreciate it if someone could review the code (it's short!) to tell
me if it's sane, can be improved, etc:
https://github.com/tibbe/ghc/commit/6b44024173eae3029b7b43f7cc9fc7d9d801c367
On Thu, Nov 29, 2012 at 12:27 AM, Johan Tibell