
A while ago, to help myself understand newtype, data, and strictness, I tried to write down how Haskell types correspond to lifted domains. Here is a cleaned-up version of my attempt. I am not sure that what follows is entirely correct -- please point out any errors. I would also appreciate comments or pointers to where this stuff is better described. In particular, I have not considered recursive types. Every type "t" in Haskell can be denotationally modeled by a complete partial order (CPO), which we will notate as "[t]". In fact, every type in standard Haskell corresponds to a -pointed- CPO, meaning a CPO with a least element "bottom". GHC, in addition, supports -unboxed- types, which correspond to CPOs in general. Given a CPO "D", we can -lift- it to a pointed CPO "lift D" by adding a bottom. Given two (not necessarily pointed) CPOs "D" and "E", we can take their -cartesian product- "D x E", which is like the set-theoretic product. Given two pointed CPOs "D" and "E", we can take their -smash product- "D * E", which is like the set-theoretic/cartesian product, but which identifies all pairs of the form "(x,bottom)" or "(bottom,y)" into one single new bottom element. The identity of "*" is the two-element pointed CPO "1", which consists of a bottom and a top. Given two pointed CPOs "D" and "E", we can take their -coalesced sum- "D + E", which is like the set-theoretic sum, but which identifies the bottom from "D" and the bottom from "E" into one single new bottom element. The identity of "+" is the one-element pointed CPO "0", which consists only of a bottom. Given any two (not necessarily pointed) CPOs, we have the property lift D * lift E = lift (D x E). In Haskell, if you write newtype t' = T t -- , then "[t']" is exactly isomorphic to "[t]". Pattern matching on the "T" is a no-op (i.e., always succeeds). Now turn to a data declaration data t' = T1 ... | T2 ... | ... -- . Each branch in the declaration specifies a pointed CPO by naming zero or more data types, each of which can be strict or non-strict. We lift the non-strict ones, then take the smash product of everything. The result is the pointed CPO denoted by the branch. The pointed CPO "[t']" denoted by the combined data type "t'" is the coalesced sum of the pointed CPOs denoted by all the branches. For example, if we write data t' = T1 !t11 !t12 | T2 t2 | T3 -- , then [t'] = [t11] * [t12] + lift [t2] + 1 -- . Pattern-matching on any data-declared type such as "t'" above is always strict, in the sense that if the value being matched against is bottom, then the whole expression denotes bottom. For example, given how we defined "t'" above, the expression case x' of T1 _ _ -> y T2 _ -> y T3 -> y is entirely equivalent to x' `seq` y -- . The only reason to have `seq` is because primitive types such as Int cannot be pattern-matched against. (Hence people call "seq" "polymorphic case".) Unboxed types are non-pointed CPOs, so it doesn't make sense to say data t' = T !t if "t" is an unboxed type, but it does make sense to say data t' = T t because "[t]" can be lifted to a pointed CPO. You ought to be able to say newtype t' = T t to define a new unboxed type "t'" from an existing unboxed type "t", but I faintly remember this being not implemented or allowed or something. Tuple types are special notation for what could be otherwise defined using "data" declarations as follows: data () = () data (,) t1 = (,) t1 data (,,) t1 t2 = (,,) t1 t2 data (,,,) t1 t2 t3 = (,,,) t1 t2 t3 -- ... So we have [( )] = 1 [(t1 )] = lift [t1] [(t1,t2 )] = lift [t1] * lift [t2] = lift ([t1] x [t2]) [(t1,t2,t3)] = lift [t1] * lift [t2] * lift [t3] = lift ([t1] x [t2] x [t3]) ... If "[t1]" has 3 elements including its bottom, and "[t2]" has 5 elements including its bottom, then "[(t1,t2)]" has 3 * 5 + 1 = 16 elements including its bottom. Given the definitions above, let us consider some corner and not so corner declarations: (When I write that foo denotes bar, I mean that foo denotes something isomorphic to bar.) data T1 -- denotes "0", the identity for "+" -- (but this is not standard Haskell) data T2 = T2 -- denotes "1", the identity for "*" data T3 = T3 () -- denotes "lift 1", because "()" denotes "1" data T4 = T4 !() -- denotes "1", because "()" denotes "1" newtype T5 = T5 () -- denotes "1", because "()" denotes "1" data T6 = T6 !T1 Int -- denotes "0 * lift [Int]", which is just "0" Note that although both T4 and T5 denote "1", pattern-matching on them is different: case T4 undefined of T4 _ -> y === undefined case T5 undefined of T5 _ -> y === y Whew. -- Edit this signature at http://www.digitas.harvard.edu/cgi-bin/ken/sig and view Ken's picture at http://www.medianstrip.net/~julia/pix/ken.jpg "Dont waste your fire men! They couldn't hit an elephant at this dist--" -Last words of Union General John Sedgwick at the Battle of Spotsylvania 1864