Re: GHC core representation

[This is in reply to Andrew Tolmach's mail to the Haskell list. He asked that replies should be sent to ghc-users.]
> Many months after the topic was first raised, there is finally a draft document describing a formal external syntax for GHC's "Core" intermediate language.
Wow! This is very cool - it answers so many questions that I previously had to examine the source code to figure out!

Two meta-comments before I go into the detailed comments:

1) GHC documentation tends to have a very short half-life. What can be done to minimize this problem? For example, can the tables of primops in section 4 be generated automatically?

2) I haven't seriously hacked on GHC or the GHC frontend for several years, so some of the following comments will be a little off. I'm hoping that people who are more up to date will correct me, and that their corrections will be useful to anyone like me who used to know GHC but has lost track of the details.

OK, on to the detailed comments.

1) GHC identifiers have SrcLocs (filename/line number) attached to them. It might be useful to be able to pass those into your parser. (But what are the SrcLocs used for in Core code? Would they be useful?)

2) The names you use for operations all use the somewhat unreadable Z encoding. I think you should Z-decode these names because:

   1) They're a lot more readable. For example, decode "ZLZmZgZR" = "(->)".

   2) It's possible that GHC will get confused if you fail to Z-encode a name. For example, it isn't possible to have the name "Z" in Core because GHC encodes the Haskell identifier "Z" as "ZZ".

   From memory, the Z encoding is something like this:

      ZL   -> (
      ZR   -> )
      Zm   -> -
      Zg   -> >
      Zl   -> <
      Ze   -> =
      Zc   -> :
      Zp   -> .
      Zxxx -> chr(xxx), where xxx is a sequence of digits

   (A small decoder along these lines is sketched after these comments.)

3) The restriction that Main.main have type IO a is unfortunate and, I think, unnecessary. It shouldn't be that hard to change it so that its type is more like:

      State# RealWorld -> (# State# RealWorld, Void# #)

4) While explaining namespaces, it'd be convenient to point out that you put a % in front of all keywords. Hmmm, I guess this is one reason to use Z-encoded names: % is a legal character in Haskell operator names.

5) The @a in datatype declarations threw me for a while. Later I saw you using it like the "big lambda" of an explicitly typed lambda calculus and understood what you were doing. It'd be worth making the connection explicit.

6) An alternative way of defining data constructors would be like this:

      %data BinTree :: * -> * =
        { Fork :: %forall a . BinTree a -> BinTree a -> BinTree a;
          Leaf :: %forall a . a -> BinTree a }

   That is, specify the actual type of each data constructor instead of using Haskell datatype declaration syntax. This makes the language a little easier to specify. Notice that this syntax is a little more liberal than standard Haskell syntax, because you could write types like:

      %data Expr :: * -> * =
        { Int :: Int -> Expr Int;
          App :: %forall a, b . Expr (a -> b) -> Expr a -> Expr b;
          Lam :: %forall a, b . Var a -> Expr b -> Expr (a -> b) }

   which cannot be expressed using normal Haskell syntax because the result types of the constructors differ. I saw Lennart mention this idea in a talk about 8 years ago and I've always wanted to play with it. :-) On the other hand, you might avoid this generalisation in case GHC does something weird with it.

7) In section 3.6, you say: "Value applications may be of user-defined functions, data constructors or primitives. Application of the latter two sorts need _not_ be saturated." I think you mean "none of these applications need to be saturated, although both previously published descriptions of Core required that the latter two be saturated."

8) You say that the list of case alternatives "need not be exhaustive, even if no default is given; it is a disastrous run-time error if a needed case arm is missing."

   1) Just how disastrous? Is an exception raised or does the RTS crash?

   2) I feel a little uneasy with this design decision.

9) You don't mention the _scc_ operation used for profiling.

10) There's a move afoot to define %ccall in a more generic way that would apply to calling Java functions, .NET functions, etc. This is bound to result in a change of the "ccall" name. It'd be worth adding a Working note to that effect.

11) The discussion of the terms "unboxed", "heap allocated" and "unlifted" (page 9) doesn't seem quite right. I believe it is:

      Lifted types must be heap allocated.
      Unlifted types may be heap allocated (e.g., Array#s) or unboxed,
      i.e. not heap allocated (e.g., Int#).

12) The discussion of operations on MVar# (page 10) says that they take an initial type argument of the form (State# t) for some thread "t". Is this true? Does "t" not have to be RealWorld#? What does it mean if it is not RealWorld#?

13) Are the CCallable and CReturnable classes still there? Blech! [I'm the one who implemented them - I'd hoped they'd be gone by now.]

14) If strings are represented by the "address of a C-format string" (section 4.2), how do we represent strings with embedded \0 characters in them?

15) dataToTag#, tagToEnum# and getTag# (section 4.4.1) might be used to implement the toEnum/fromEnum operations, but they may also be remnants of GHCi version 0.0 - a metacircular interpreter that was in GHC version 0.18 (or thereabouts). If the latter, someone ought to give them a decent burial.

16) You document unsafeCoerce# but not

      reallyUnsafePtrEq# :: a -> a -> Bool

    Did someone finally kill that?

17) decodeDouble# and decodeFloat# are used to extract the exponent and mantissa of a floating point number. Ignoring unboxedness, each returns an Int (the exponent) and an Integer (the mantissa). Representing that with unboxed types, you get

      (# Int#, Int#, ByteArray# #)

18) Section 4.4.8 asks what the relationships are between quotient, remainder, div and mod. These ought to be the same as those specified in the Haskell report for their boxed equivalents.

19) Your question about what indexArray# does and what alignment restrictions apply reminds me of a subtlety: when indexing into a ByteArr# or an Addr#, is the index scaled by the size of the object being read/written (as in C) or not? I am pretty sure that the index on an Addr# is not scaled (and that the value of the (Addr# + Int#) combination is subject to exactly the same alignment restrictions as those imposed by the hardware architecture). I don't recall the story for ByteArr#, but a ByteArr# object is always aligned to the nearest 4 (maybe 8) byte boundary.

20) Section 4.4.15 says that takeMVar# and putMVar# "die" if the MVar is empty (respectively, full).

    1) What does "die" mean? More generally, how is failure implemented in the primops? Do they raise exceptions? Do they return error codes? Do they call exit? Do they crash the RTS?

    2) Are you sure these ops "die"? Their boxed Haskell brethren will block the thread under the same circumstances.

21) You repeatedly mention the sequential implementation of the RTS. Last time I looked, it wasn't possible to build the RTS with concurrency turned off. If one were to build the GHC RTS with concurrency turned off, I think the concurrency primitives should not be available at all, rather than providing versions that always or mostly fail. Or maybe one could implement non-preemptive "concurrency" in much the same way as we did in Hugs? (Not sure I recommend this route - the interactions between threads and exception handlers in Hugs are somewhat tricky.)
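P.S. To make comment (2) concrete, here is a rough decoder written against the from-memory table above. It is only a sketch: the table is incomplete and may not match GHC's real encoder exactly, and the module and function names (ZDecode, zDecode) are made up for illustration.

   module ZDecode (zDecode) where

   import Data.Char (chr, isDigit)

   -- Decode one Z-escape at a time; anything that is not a recognised
   -- escape is passed through unchanged.  This follows the (incomplete,
   -- from-memory) table given in comment (2), not GHC's full table.
   zDecode :: String -> String
   zDecode []         = []
   zDecode ('Z':rest) = case rest of
     'L':cs -> '(' : zDecode cs
     'R':cs -> ')' : zDecode cs
     'Z':cs -> 'Z' : zDecode cs          -- "ZZ" encodes a literal 'Z'
     'm':cs -> '-' : zDecode cs
     'g':cs -> '>' : zDecode cs
     'l':cs -> '<' : zDecode cs
     'e':cs -> '=' : zDecode cs
     'c':cs -> ':' : zDecode cs
     'p':cs -> '.' : zDecode cs
     cs | (digits@(_:_), cs') <- span isDigit cs
            -> chr (read digits) : zDecode cs'    -- Zxxx -> chr(xxx)
     _      -> 'Z' : zDecode rest        -- unrecognised escape: leave alone
   zDecode (c:cs)     = c : zDecode cs

   -- Example from comment (2):  zDecode "ZLZmZgZR" == "(->)"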
-- Alastair Reid reid@cs.utah.edu http://www.cs.utah.edu/~reid/