Converting unboxed sum types in StgCmm

Hi! The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg https://github.com/ghc/ghc/blob/master/compiler/simplStg/UnariseStg.hs pass. I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from f :: (# Bool | Char #) -> ... to f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ... Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types. It seems to me that we will have to do the conversion in the stg-to-cmm https://github.com/ghc/ghc/blob/master/compiler/codeGen/StgCmm.hs pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from idToReg :: DynFlags -> NonVoid Id -> LocalReg to idToReg :: DynFlags -> NonVoid Id -> [LocalReg] to accommodate the fact that we might need more than one register to store a binder. Any ideas for a better solution? -- Johan

I wonder if rewriting any aliased pointer field as Any in Stg and any
non-pointer field as Word# would work. I suspect that not all non-pointer
fields (e.g. Double# on 32-bit) can be represented as Word#.
On Wed, Sep 9, 2015 at 3:22 PM, Johan Tibell
Hi!
The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg https://github.com/ghc/ghc/blob/master/compiler/simplStg/UnariseStg.hs pass.
I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from
f :: (# Bool | Char #) -> ...
to
f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...
Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types.
It seems to me that we will have to do the conversion in the stg-to-cmm https://github.com/ghc/ghc/blob/master/compiler/codeGen/StgCmm.hs pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from
idToReg :: DynFlags -> NonVoid Id -> LocalReg
to
idToReg :: DynFlags -> NonVoid Id -> [LocalReg]
to accommodate the fact that we might need more than one register to store a binder.
Any ideas for a better solution?
-- Johan

Some of the SSE types are too big for that even on 64 bit, I think.
Like DoubleX8#.
On Thu, Sep 10, 2015 at 1:16 AM, Johan Tibell
I wonder if rewriting any aliased pointer field as Any in Stg and any non-pointer field as Word# would work. I suspect that not all non-pointer fields (e.g. Double# on 32-bit) can be represented as Word#.
On Wed, Sep 9, 2015 at 3:22 PM, Johan Tibell
wrote: Hi!
The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg pass.
I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from
f :: (# Bool | Char #) -> ...
to
f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...
Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types.
It seems to me that we will have to do the conversion in the stg-to-cmm pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from
idToReg :: DynFlags -> NonVoid Id -> LocalReg
to
idToReg :: DynFlags -> NonVoid Id -> [LocalReg]
to accommodate the fact that we might need more than one register to store a binder.
Any ideas for a better solution?
-- Johan
_______________________________________________ ghc-devs mailing list ghc-devs@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

The problem is that stg is too strongly typed It’s not really typed, or at least only in a very half-hearted way. To be concrete I think you can just use Any for any Pointer arg. All STG needs to know, really, is which things are pointers. Detailed type info like “are you a Char or a Bool” is strictly jam; indeed never used I think. (I could be wrong but I’m pretty sure I’m not wrong in a fundamental way. SImon From: Johan Tibell [mailto:johan.tibell@gmail.com] Sent: 09 September 2015 23:22 To: Simon Peyton Jones; Simon Marlow; ghc-devs@haskell.org Subject: Converting unboxed sum types in StgCmm Hi! The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stghttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fsimplStg%2fUnariseStg.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=U%2bFUNsL87iEemajTnAW9SxD9N5b4%2bG8QB1q19%2fX%2bBI4%3d pass. I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from f :: (# Bool | Char #) -> ... to f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ... Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types. It seems to me that we will have to do the conversion in the stg-to-cmmhttps://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fcodeGen%2fStgCmm.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=aXKZ78eGNKbJ6eZkxZgyJHgsAXpgOBjg3Zvqj%2bq7pk0%3d pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from idToReg :: DynFlags -> NonVoid Id -> LocalReg to idToReg :: DynFlags -> NonVoid Id -> [LocalReg] to accommodate the fact that we might need more than one register to store a binder. Any ideas for a better solution? -- Johan

I'll give that a try. The main use of the stg types in the stg-to-cmm pass
is to call idPrimRep (which call typePrimRep) to figure out which register
type we need to use. I guess as long as I rewrite the stg types so they
give me the typePrimRep I want in the end I should be fine.
On Thu, Sep 10, 2015 at 2:37 AM, Simon Peyton Jones
The problem is that stg is too strongly typed
It’s not really typed, or at least only in a very half-hearted way. To be concrete I think you can just use Any for any Pointer arg. All STG needs to know, really, is which things are pointers. Detailed type info like “are you a Char or a Bool” is strictly jam; indeed never used I think. (I could be wrong but I’m pretty sure I’m not wrong in a fundamental way.
SImon
*From:* Johan Tibell [mailto:johan.tibell@gmail.com] *Sent:* 09 September 2015 23:22 *To:* Simon Peyton Jones; Simon Marlow; ghc-devs@haskell.org *Subject:* Converting unboxed sum types in StgCmm
Hi!
The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fsimplStg%2fUnariseStg.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=U%2bFUNsL87iEemajTnAW9SxD9N5b4%2bG8QB1q19%2fX%2bBI4%3d pass.
I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from
f :: (# Bool | Char #) -> ...
to
f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...
Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types.
It seems to me that we will have to do the conversion in the stg-to-cmm https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fcodeGen%2fStgCmm.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=aXKZ78eGNKbJ6eZkxZgyJHgsAXpgOBjg3Zvqj%2bq7pk0%3d pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from
idToReg :: DynFlags -> NonVoid Id -> LocalReg
to
idToReg :: DynFlags -> NonVoid Id -> [LocalReg]
to accommodate the fact that we might need more than one register to store a binder.
Any ideas for a better solution?
-- Johan

On 10/09/2015 10:37, Simon Peyton Jones wrote:
The problem is that stg is too strongly typed
It’s not really typed, or at least only in a very half-hearted way. To be concrete I think you can just use Any for any Pointer arg. All STG needs to know, really, is which things are pointers. Detailed type info like “are you a Char or a Bool” is strictly jam; indeed never used I think. (I could be wrong but I’m pretty sure I’m not wrong in a fundamental way.
Yes, the only thing the code generator needs to do with types is convert them to PrimReps (see idPrimRep), and all GC pointer types have the same PrimRep (PtrRep). Cheers Simon
SImon
*From:*Johan Tibell [mailto:johan.tibell@gmail.com] *Sent:* 09 September 2015 23:22 *To:* Simon Peyton Jones; Simon Marlow; ghc-devs@haskell.org *Subject:* Converting unboxed sum types in StgCmm
Hi!
The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fsimplStg%2fUnariseStg.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=U%2bFUNsL87iEemajTnAW9SxD9N5b4%2bG8QB1q19%2fX%2bBI4%3d pass.
I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from
f :: (# Bool | Char #) -> ...
to
f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...
Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types.
It seems to me that we will have to do the conversion in the stg-to-cmm https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fcodeGen%2fStgCmm.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=aXKZ78eGNKbJ6eZkxZgyJHgsAXpgOBjg3Zvqj%2bq7pk0%3d pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from
idToReg :: DynFlags -> NonVoid Id -> LocalReg
to
idToReg :: DynFlags -> NonVoid Id -> [LocalReg]
to accommodate the fact that we might need more than one register to store a binder.
Any ideas for a better solution?
-- Johan

I took a stab at this but ran into something I don't understand. For
recence, the whole implementation of unboxed sums is at
https://github.com/ghc/ghc/compare/master...tibbe:unboxed-sums and the
implementation of unarisation is at
https://github.com/ghc/ghc/compare/master...tibbe:unboxed-sums#diff-f5bc1f9e...
.
Running the compiler on the following file:
{-# LANGUAGE UnboxedSums #-}
module Test where
f :: (# Int | Char #) -> Int
f (# x | #) = x
{-# NOINLINE f #-}
g = f (# 1 | #)
Yields an error, like so:
ghc-stage2: panic! (the 'impossible' happened)
(GHC version 7.11.20150912 for x86_64-apple-darwin):
StgCmmEnv: variable not found
ds_svq
local binds for:
ds_gvz
ds_gvA
I probably got something wrong in UnariseStg, but I can't see what. I
printed this debug information to see the stg I'm rewriting:
unarise
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_svq]
case ds_svq of _ [Occ=Dead] {
(#_|#) x_svs [Occ=Once] -> x_svs;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function
f"#;
};,
g :: Int
[GblId, Str=DmdType] =
\u srt:SRT:[r1 :-> f] []
let {
sat_svu [Occ=Once] :: Int
[LclId, Str=DmdType] =
NO_CCS I#! [1#];
} in
case (#_|#) [sat_svu] of sat_svv { __DEFAULT -> f sat_svv; };]
unariseAlts
[81 :-> [realWorld#], svq :-> [ds_gvz, ds_gvA]]
UbxTup 2
wild_svr
[((#_|#), [x_svs], [True], x_svs),
((#|_#),
[ipv_svt],
[False],
patError "UnboxedSum.hs:5:1-15|function f"#)]
It's ds_svg that's being complained about above. I find that a bit
confusing as that variable is never used on any RHS.
Some questions that might help me get there:
- I added a new RepType for unboxed sums, like so:
data RepType = UbxTupleRep [UnaryType]
| UbxSumRep [UnaryType]
| UnaryRep UnaryType
Does this constructor make sense? I store the already flattened
representation of the sum in here, rather than having something like
[[UnaryType]] and storing each alternative.
- In unariseAlts there's a bndr argument. Is that the binder of the
scrutinee as a whole (e.g. the 'x' in case e of x { ... -> ... })?
Any other idea what I might have gotten wrong?
On Mon, Sep 14, 2015 at 1:03 AM, Simon Marlow
On 10/09/2015 10:37, Simon Peyton Jones wrote:
The problem is that stg is too strongly typed
It’s not really typed, or at least only in a very half-hearted way. To be concrete I think you can just use Any for any Pointer arg. All STG needs to know, really, is which things are pointers. Detailed type info like “are you a Char or a Bool” is strictly jam; indeed never used I think. (I could be wrong but I’m pretty sure I’m not wrong in a fundamental way.
Yes, the only thing the code generator needs to do with types is convert them to PrimReps (see idPrimRep), and all GC pointer types have the same PrimRep (PtrRep).
Cheers Simon
SImon
*From:*Johan Tibell [mailto:johan.tibell@gmail.com] *Sent:* 09 September 2015 23:22 *To:* Simon Peyton Jones; Simon Marlow; ghc-devs@haskell.org *Subject:* Converting unboxed sum types in StgCmm
Hi!
The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg < https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fsimplStg%2fUnariseStg.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=U%2bFUNsL87iEemajTnAW9SxD9N5b4%2bG8QB1q19%2fX%2bBI4%3d
pass.
I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from
f :: (# Bool | Char #) -> ...
to
f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...
Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types.
It seems to me that we will have to do the conversion in the stg-to-cmm < https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fcodeGen%2fStgCmm.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=aXKZ78eGNKbJ6eZkxZgyJHgsAXpgOBjg3Zvqj%2bq7pk0%3d
pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from
idToReg :: DynFlags -> NonVoid Id -> LocalReg
to
idToReg :: DynFlags -> NonVoid Id -> [LocalReg]
to accommodate the fact that we might need more than one register to store a binder.
Any ideas for a better solution?
-- Johan

Another question, in need to add something to AltType in StgSyn, would this
work
data AltType
= PolyAlt -- Polymorphic (a type variable)
| UbxTupAlt Int -- Unboxed tuple of this arity
| UbxSumAlt Int -- Unboxed sum of this arity
| AlgAlt TyCon -- Algebraic data type; the AltCons will be DataAlts
| PrimAlt TyCon -- Primitive data type; the AltCons will be LitAlts
or do I also have to capture which alternative was used here? Why do we
capture the arity in *tuple* case here?
On Mon, Sep 14, 2015 at 6:21 AM, Johan Tibell
I took a stab at this but ran into something I don't understand. For recence, the whole implementation of unboxed sums is at https://github.com/ghc/ghc/compare/master...tibbe:unboxed-sums and the implementation of unarisation is at https://github.com/ghc/ghc/compare/master...tibbe:unboxed-sums#diff-f5bc1f9e... .
Running the compiler on the following file:
{-# LANGUAGE UnboxedSums #-} module Test where
f :: (# Int | Char #) -> Int f (# x | #) = x {-# NOINLINE f #-}
g = f (# 1 | #)
Yields an error, like so:
ghc-stage2: panic! (the 'impossible' happened) (GHC version 7.11.20150912 for x86_64-apple-darwin): StgCmmEnv: variable not found ds_svq local binds for: ds_gvz ds_gvA
I probably got something wrong in UnariseStg, but I can't see what. I printed this debug information to see the stg I'm rewriting:
unarise [f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int [GblId, Arity=1, Str=DmdType, Unf=OtherCon []] = \r srt:SRT:[0e :-> patError] [ds_svq] case ds_svq of _ [Occ=Dead] { (#_|#) x_svs [Occ=Once] -> x_svs; (#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function f"#; };, g :: Int [GblId, Str=DmdType] = \u srt:SRT:[r1 :-> f] [] let { sat_svu [Occ=Once] :: Int [LclId, Str=DmdType] = NO_CCS I#! [1#]; } in case (#_|#) [sat_svu] of sat_svv { __DEFAULT -> f sat_svv; };] unariseAlts [81 :-> [realWorld#], svq :-> [ds_gvz, ds_gvA]] UbxTup 2 wild_svr [((#_|#), [x_svs], [True], x_svs), ((#|_#), [ipv_svt], [False], patError "UnboxedSum.hs:5:1-15|function f"#)]
It's ds_svg that's being complained about above. I find that a bit confusing as that variable is never used on any RHS.
Some questions that might help me get there:
- I added a new RepType for unboxed sums, like so:
data RepType = UbxTupleRep [UnaryType] | UbxSumRep [UnaryType] | UnaryRep UnaryType
Does this constructor make sense? I store the already flattened representation of the sum in here, rather than having something like [[UnaryType]] and storing each alternative. - In unariseAlts there's a bndr argument. Is that the binder of the scrutinee as a whole (e.g. the 'x' in case e of x { ... -> ... })?
Any other idea what I might have gotten wrong?
On Mon, Sep 14, 2015 at 1:03 AM, Simon Marlow
wrote: On 10/09/2015 10:37, Simon Peyton Jones wrote:
The problem is that stg is too strongly typed
It’s not really typed, or at least only in a very half-hearted way. To be concrete I think you can just use Any for any Pointer arg. All STG needs to know, really, is which things are pointers. Detailed type info like “are you a Char or a Bool” is strictly jam; indeed never used I think. (I could be wrong but I’m pretty sure I’m not wrong in a fundamental way.
Yes, the only thing the code generator needs to do with types is convert them to PrimReps (see idPrimRep), and all GC pointer types have the same PrimRep (PtrRep).
Cheers Simon
SImon
*From:*Johan Tibell [mailto:johan.tibell@gmail.com] *Sent:* 09 September 2015 23:22 *To:* Simon Peyton Jones; Simon Marlow; ghc-devs@haskell.org *Subject:* Converting unboxed sum types in StgCmm
Hi!
The original idea for implementing the backend part of the unboxed sums proposal was to convert from the core representation to the actual data representation (i.e. a tag followed by some pointer and non-pointer fields) in the unarise stg-to-stg < https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fsimplStg%2fUnariseStg.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=U%2bFUNsL87iEemajTnAW9SxD9N5b4%2bG8QB1q19%2fX%2bBI4%3d
pass.
I have now realized that this won't work. The problem is that stg is too strongly typed. When we "desugar" sum types we need to convert functions receiving a value e.g. from
f :: (# Bool | Char #) -> ...
to
f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...
Since stg is still typed with normal Haskell types (e.g. Bool, Char, etc), this is not possible, as we cannot represent an argument which has two different types.
It seems to me that we will have to do the conversion in the stg-to-cmm < https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fcodeGen%2fStgCmm.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=aXKZ78eGNKbJ6eZkxZgyJHgsAXpgOBjg3Zvqj%2bq7pk0%3d
pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg function will have to change from
idToReg :: DynFlags -> NonVoid Id -> LocalReg
to
idToReg :: DynFlags -> NonVoid Id -> [LocalReg]
to accommodate the fact that we might need more than one register to store a binder.
Any ideas for a better solution?
-- Johan

- data RepType = UbxTupleRep [UnaryType] | UbxSumRep [UnaryType] | UnaryRep UnaryType
Not, fully following, but ... this reptype business is orthogonal to
whether you add a normal type to the STG level that models anonymous, untagged unions, right? That is, when using Any for pointer types, they could use indicative phantom types, like "Any (Union Bool Char)", even if there's not full support for doing anything useful with (Union Bool Char) by itself. Maybe the casting machinery could greenlight a cast from Any (Union Bool Char) to Bool at least? There's already the unboxed union itself, (|# #|) , but that's different than a pointer to a union of types...

I've given this a yet some more thought. Given this simple core program:
f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType]
f =
\ (ds_dmE :: (#|#) Int Char) ->
case ds_dmE of _ [Occ=Dead] {
(#_|#) x_amy -> x_amy;
(#|_#) ipv_smK -> patError @ Int "UnboxedSum.hs:5:1-15|function f"#
}
We will get this stg pre-unarise:
unarise
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_svm]
case ds_svm of _ [Occ=Dead] {
(#_|#) x_svo [Occ=Once] -> x_svo;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function
f"#;
};]
What do we want it to look like afterwards? I currently, have this, modeled
after unboxed tuples:
post-unarise:
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_gvu ds_gvv]
case (#_|#) [ds_gvu ds_gvv] of _ [Occ=Dead] { -- <-- WHAT
SHOULD GO HERE?
(#_|#) x_svo [Occ=Once] -> x_svo;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function
f"#;
};]
Here I have performed the same rewriting of the scrutinee in the case
statement as for unboxed tuples, but note that this doesn't quite work, as
we don't know which data constructor to apply in "..." in case ... of. In
the case of tuples it's easy; there is only one.
It seems to me that we just want to rewrite the case altogether into
something that looks at the tag field of the data constructor. Also, in stg
we use the same DataCon as in core, but after stg the unboxed sum case
really only has one constructor (one with the union of all the fields),
which makes it awkward to reuse the original DataCon.
On Mon, Sep 14, 2015 at 7:27 AM, Ryan Newton
- data RepType = UbxTupleRep [UnaryType] | UbxSumRep [UnaryType] | UnaryRep UnaryType
Not, fully following, but ... this reptype business is orthogonal to whether you add a normal type to the STG level that models anonymous, untagged unions, right?
That is, when using Any for pointer types, they could use indicative phantom types, like "Any (Union Bool Char)", even if there's not full support for doing anything useful with (Union Bool Char) by itself. Maybe the casting machinery could greenlight a cast from Any (Union Bool Char) to Bool at least?
There's already the unboxed union itself, (|# #|) , but that's different than a pointer to a union of types...

It seems to me that we just want to rewrite the case altogether into something that looks at the tag field of the data constructor. Also, in stg we use the same DataCon as in core, but after stg the unboxed sum case really only has one constructor (one with the union of all the fields), which makes it awkward to reuse the original DataCon.
Is there a problem with introducing a totally new datatype at this point in the compile to represent the product (tag, wordish1, ..., wordishN, ptr1 ... ptrM)? Or, if it is an anonymous product, why can't it use existing unboxed sum machinery? Also, as an architecture thing, is there a reason this shouldn't be its own stg->stg pass? (P.S. "wordish" above has a weaselly suffix because as Dan pointed out, some unboxed things are > 64 bits.)

Johan
Sorry I’ve been buried. Let’s fix a time for a Skype call if you’d like to chat about this stuff.
Quick response to the below. I think that afterwards we want it to look like this:
post-unarise
f = \r [ ds1::Int# ds2::Ptr ]
case ds1 of
0# -> <rhs with ds2 in place of x_svo>
1# -> <rhs>
ds2 is the thing that contains either an Int or a char; ds1 is the tag that distinguishes htem.
Simon
From: Johan Tibell [mailto:johan.tibell@gmail.com]
Sent: 14 September 2015 17:03
To: Ryan Newton; Simon Peyton Jones
Cc: Simon Marlow; ghc-devs@haskell.org
Subject: Re: Converting unboxed sum types in StgCmm
I've given this a yet some more thought. Given this simple core program:
f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType]
f =
\ (ds_dmE :: (#|#) Int Char) ->
case ds_dmE of _ [Occ=Dead] {
(#_|#) x_amy -> x_amy;
(#|_#) ipv_smK -> patError @ Int "UnboxedSum.hs:5:1-15|function f"#
}
We will get this stg pre-unarise:
unarise
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_svm]
case ds_svm of _ [Occ=Dead] {
(#_|#) x_svo [Occ=Once] -> x_svo;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function f"#;
};]
What do we want it to look like afterwards? I currently, have this, modeled after unboxed tuples:
post-unarise:
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_gvu ds_gvv]
case (#_|#) [ds_gvu ds_gvv] of _ [Occ=Dead] { -- <-- WHAT SHOULD GO HERE?
(#_|#) x_svo [Occ=Once] -> x_svo;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function f"#;
};]
Here I have performed the same rewriting of the scrutinee in the case statement as for unboxed tuples, but note that this doesn't quite work, as we don't know which data constructor to apply in "..." in case ... of. In the case of tuples it's easy; there is only one.
It seems to me that we just want to rewrite the case altogether into something that looks at the tag field of the data constructor. Also, in stg we use the same DataCon as in core, but after stg the unboxed sum case really only has one constructor (one with the union of all the fields), which makes it awkward to reuse the original DataCon.
On Mon, Sep 14, 2015 at 7:27 AM, Ryan Newton

Yup, I think I have it figured out. Will just need to find the time to
write the remaining code.
On Tue, Sep 22, 2015 at 10:13 AM, Simon Peyton Jones
Johan
Sorry I’ve been buried. Let’s fix a time for a Skype call if you’d like to chat about this stuff.
Quick response to the below. I think that afterwards we want it to look like this:
post-unarise
f = \r [ ds1::Int# ds2::Ptr ]
case ds1 of
0# -> <rhs with ds2 in place of x_svo>
1# -> <rhs>
ds2 is the thing that contains either an Int or a char; ds1 is the tag that distinguishes htem.
Simon
*From:* Johan Tibell [mailto:johan.tibell@gmail.com] *Sent:* 14 September 2015 17:03 *To:* Ryan Newton; Simon Peyton Jones *Cc:* Simon Marlow; ghc-devs@haskell.org *Subject:* Re: Converting unboxed sum types in StgCmm
I've given this a yet some more thought. Given this simple core program:
f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType]
f =
\ (ds_dmE :: (#|#) Int Char) ->
case ds_dmE of _ [Occ=Dead] {
(#_|#) x_amy -> x_amy;
(#|_#) ipv_smK -> patError @ Int "UnboxedSum.hs:5:1-15|function f"#
}
We will get this stg pre-unarise:
unarise
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_svm]
case ds_svm of _ [Occ=Dead] {
(#_|#) x_svo [Occ=Once] -> x_svo;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function f"#;
};]
What do we want it to look like afterwards? I currently, have this, modeled after unboxed tuples:
post-unarise:
[f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
[GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
\r srt:SRT:[0e :-> patError] [ds_gvu ds_gvv]
case (#_|#) [ds_gvu ds_gvv] of _ [Occ=Dead] { -- <-- WHAT SHOULD GO HERE?
(#_|#) x_svo [Occ=Once] -> x_svo;
(#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function f"#;
};]
Here I have performed the same rewriting of the scrutinee in the case statement as for unboxed tuples, but note that this doesn't quite work, as we don't know which data constructor to apply in "..." in case ... of. In the case of tuples it's easy; there is only one.
It seems to me that we just want to rewrite the case altogether into something that looks at the tag field of the data constructor. Also, in stg we use the same DataCon as in core, but after stg the unboxed sum case really only has one constructor (one with the union of all the fields), which makes it awkward to reuse the original DataCon.
On Mon, Sep 14, 2015 at 7:27 AM, Ryan Newton
wrote: - data RepType = UbxTupleRep [UnaryType] | UbxSumRep [UnaryType] | UnaryRep UnaryType
Not, fully following, but ... this reptype business is orthogonal to whether you add a normal type to the STG level that models anonymous, untagged unions, right?
That is, when using Any for pointer types, they could use indicative phantom types, like "Any (Union Bool Char)", even if there's not full support for doing anything useful with (Union Bool Char) by itself. Maybe the casting machinery could greenlight a cast from Any (Union Bool Char) to Bool at least?
There's already the unboxed union itself, (|# #|) , but that's different than a pointer to a union of types...
participants (5)
-
Dan Doel
-
Johan Tibell
-
Ryan Newton
-
Simon Marlow
-
Simon Peyton Jones