
Hans van Thiel wrote:
As a general comment on the teaching of Haskell, all books and tutorials, which I've seen, appear to treat this aspect of Haskell as if it were self explanatory. This while the better known imperative languages don't have anything like it. Only Real World Haskell explains algebraic data types to some satisfaction (IMHO, of course).
(Hopefully this different take on it helps more than it hurts...) In addition to keeping the type-level and the value-level separated, Haskell does a little bit to keep the type/interface-level and the implementation-level separate. The "data" keyword introduces both a new type and also a new implementation. This is the only way of introducing new implementations. ADTs are beauty incarnate, but unfortunately not well known outside of functional languages and abstract algebra. The "newtype" keyword introduces a new type, but it reuses an old implementation under the covers. Even though they have the same underlying implementation, the newtype and the type of the old implementation are considered entirely different semantically and so one cannot be used in lieu of the other. The dubiously named "type" keyword introduces an alias shorthand for some type that already exists. These aliases are, in a sense, never checked; that is, the macro is just expanded. This means that we can't carry any additional semantic information by using aliases and so if we have: type Celsius = Int type Fahrenheit = Int ...the type checker will do nothing to save us. If we wanted to add semantic tags to the Int type in order to say what units the number represents, then we could do that with a "newtype" and the type checker would ensure that we didn't mix units. Re "data" vs "newtype", where a newtype is possible (single data constructor, which has exactly one argument) there are still a few differences at the semantic level. Since a newtype's data constructor doesn't exist at runtime, evaluating a newtype to WHNF will evaluate the argument to WHNF; hence a newtype can be thought of as the data version with an obligatory strictness annotation. In terms of bottom, this means that: data Foo = Foo Int ...has both _|_ and (Foo _|_). Whereas, both of the following: data Foo = Foo !Int newtype Foo = Foo Int ...have only _|_. It should also be noted that the overhead for newtypes is not *always* removed. In particular, if we have the following definitions: data Z = Z newtype S a = S a We must keep the tags (i.e. boxes) for S around because (S Z) and (S (S Z)) need to be distinguishable. This only really comes up with polymorphic newtypes (since that enables recursion), and it highlights the difference between strict fields and unpacked strict fields. Typically newtypes are unpacked as well as strict (hence no runtime tag overhead), but it's not guaranteed. Another operational difference between newtypes and an equivalent data declaration has to do with the type class dictionaries. IIRC, with GeneralizedNewtypeDeriving the newtype actually uses the exact same dictionaries as the underlying type, thus avoiding unwrapping/rewrapping overhead. I'm somewhat fuzzy on all the details here, bit its another reason to use newtypes when you can. -- Live well, ~wren