Another question, in need to add something to AltType in StgSyn, would this work

data AltType
  = PolyAlt             -- Polymorphic (a type variable)
  | UbxTupAlt Int       -- Unboxed tuple of this arity
  | UbxSumAlt Int       -- Unboxed sum of this arity
  | AlgAlt    TyCon     -- Algebraic data type; the AltCons will be DataAlts
  | PrimAlt   TyCon     -- Primitive data type; the AltCons will be LitAlts

or do I also have to capture which alternative was used here? Why do we capture the arity in tuple case here?

On Mon, Sep 14, 2015 at 6:21 AM, Johan Tibell <johan.tibell@gmail.com> wrote:
I took a stab at this but ran into something I don't understand. For recence, the whole implementation of unboxed sums is at https://github.com/ghc/ghc/compare/master...tibbe:unboxed-sums and the implementation of unarisation is at https://github.com/ghc/ghc/compare/master...tibbe:unboxed-sums#diff-f5bc1f9e9c230db4cf882bf18368a818.

Running the compiler on the following file:

{-# LANGUAGE UnboxedSums #-}
module Test where

f :: (# Int | Char #) -> Int
f (# x | #) = x
{-# NOINLINE f #-}

g = f (# 1 | #)

Yields an error, like so:

ghc-stage2: panic! (the 'impossible' happened)
  (GHC version 7.11.20150912 for x86_64-apple-darwin):
StgCmmEnv: variable not found
  ds_svq
  local binds for:
  ds_gvz
  ds_gvA

I probably got something wrong in UnariseStg, but I can't see what. I printed this debug information to see the stg I'm rewriting:

unarise
  [f [InlPrag=NOINLINE] :: (#|#) Int Char -> Int
   [GblId, Arity=1, Str=DmdType, Unf=OtherCon []] =
       \r srt:SRT:[0e :-> patError] [ds_svq]
           case ds_svq of _ [Occ=Dead] {
             (#_|#) x_svs [Occ=Once] -> x_svs;
             (#|_#) _ [Occ=Dead] -> patError "UnboxedSum.hs:5:1-15|function f"#;
           };,
   g :: Int
   [GblId, Str=DmdType] =
       \u srt:SRT:[r1 :-> f] []
           let {
             sat_svu [Occ=Once] :: Int
             [LclId, Str=DmdType] =
                 NO_CCS I#! [1#];
           } in 
             case (#_|#) [sat_svu] of sat_svv { __DEFAULT -> f sat_svv; };]
unariseAlts
  [81 :-> [realWorld#], svq :-> [ds_gvz, ds_gvA]]
  UbxTup 2
  wild_svr
  [((#_|#), [x_svs], [True], x_svs),
   ((#|_#),
    [ipv_svt],
    [False],
    patError "UnboxedSum.hs:5:1-15|function f"#)]

It's ds_svg that's being complained about above. I find that a bit confusing as that variable is never used on any RHS.

Some questions that might help me get there:
  • I added a new RepType for unboxed sums, like so:

    data RepType = UbxTupleRep [UnaryType]
        | UbxSumRep [UnaryType]
        | UnaryRep UnaryType

    Does this constructor make sense? I store the already flattened representation of the sum in here, rather than having something like [[UnaryType]] and storing each alternative.
  • In unariseAlts there's a bndr argument. Is that the binder of the scrutinee as a whole (e.g. the 'x' in case e of x { ... -> ... })?
Any other idea what I might have gotten wrong?


On Mon, Sep 14, 2015 at 1:03 AM, Simon Marlow <marlowsd@gmail.com> wrote:
On 10/09/2015 10:37, Simon Peyton Jones wrote:
The problem is that stg is too strongly typed

It’s not really typed, or at least only in a very half-hearted way.  To
be concrete I think you can just use Any for any Pointer arg.   All STG
needs to know, really, is which things are pointers.  Detailed type info
like “are you a Char or a Bool” is strictly jam; indeed never used I
think.  (I could be wrong but I’m pretty sure I’m not wrong in a
fundamental way.

Yes, the only thing the code generator needs to do with types is convert them to PrimReps (see idPrimRep), and all GC pointer types have the same PrimRep (PtrRep).

Cheers
Simon




SImon

*From:*Johan Tibell [mailto:johan.tibell@gmail.com]
*Sent:* 09 September 2015 23:22
*To:* Simon Peyton Jones; Simon Marlow; ghc-devs@haskell.org
*Subject:* Converting unboxed sum types in StgCmm

Hi!

The original idea for implementing the backend part of the unboxed sums
proposal was to convert from the core representation to the actual data
representation (i.e. a tag followed by some pointer and non-pointer
fields) in the unarise stg-to-stg
<https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fsimplStg%2fUnariseStg.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=U%2bFUNsL87iEemajTnAW9SxD9N5b4%2bG8QB1q19%2fX%2bBI4%3d>
pass.

I have now realized that this won't work. The problem is that stg is too
strongly typed. When we "desugar" sum types we need to convert functions
receiving a value e.g. from

     f :: (# Bool | Char #) -> ...

to

     f :: NonPointer {-# tag#-} -> Pointer {-# Bool or Char #-} -> ...

Since stg is still typed with normal Haskell types (e.g. Bool, Char,
etc), this is not possible, as we cannot represent an argument which has
two different types.

It seems to me that we will have to do the conversion in the stg-to-cmm
<https://na01.safelinks.protection.outlook.com/?url=https%3a%2f%2fgithub.com%2fghc%2fghc%2fblob%2fmaster%2fcompiler%2fcodeGen%2fStgCmm.hs&data=01%7c01%7csimonpj%40064d.mgd.microsoft.com%7cca7beffb01494517d75108d2b9652973%7c72f988bf86f141af91ab2d7cd011db47%7c1&sdata=aXKZ78eGNKbJ6eZkxZgyJHgsAXpgOBjg3Zvqj%2bq7pk0%3d>
pass, which is quite a bit more involved. For example, StgCmmEnv.idToReg
function will have to change from

     idToReg :: DynFlags -> NonVoid Id -> LocalReg

to

     idToReg :: DynFlags -> NonVoid Id -> [LocalReg]

to accommodate the fact that we might need more than one register to
store a binder.

Any ideas for a better solution?

-- Johan