Of types and constructors; a question of programming style

When I'm designing datatypes for a Haskell program, I sometimes seem to end up with a slightly incoherent mixture of algebraic types and constructors. I'm wondering if anyone has worked out a principled way to decide when to introduce a new data type. Here's an example from something I'm working on: [[ data Event = Document DocURI Element | Element Name BaseURI Language Children Attributes LiIndex Subject | EndElement | Attribute Name AttributeVal | Text TextVal ]] (abbreviated from the original for discussion, and using arbitrary type names for the field types) Using this formulation, I have no access to the type of the different alternatives of the Union type thus defined. A different way of describing the type overcomes this limitation: [[ data Event = EVDocument Document | EVElement Element | EVEndElement EndElement | EVAttribute Attribute | EVText Text data Document = Document DocURI Element data Element = Element Name BaseURI Language Children Attributes LiIndex Subject data EndElement = EndElement data Attribute = Attribute Name AttributeVal data Text = Text TextVal ]] which seems to be rather repetitious. And I haven't even started to use the named field syntax here. Yet another alternative, if I'm not interested in field names, might be to use tuples and type synonyms for the alternative values: [[ data Event = Document EVDocument | Element EVElement | EndElement EVEndElement | Attribute EVAttribute | Text EVText type EVDocument = (DocURI,Element) type EVElement = (Name,BaseURI,Language,Children,Attributes,LiIndex,Subject) type EVEndElement = () type EVAttribute = (Name,AttributeVal) type EVText = TextVal ]] Is there any community consensus concerning what constitutes good style in such circumstances? #g ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact

On 2004 July 06 Tuesday 05:35, Graham Klyne wrote:
When I'm designing datatypes for a Haskell program, I sometimes seem to end up with a slightly incoherent mixture of algebraic types and constructors.
example
data Event = Document DocURI Element | Element Name BaseURI Language Children Attributes LiIndex | Subject EndElement | Attribute Name AttributeVal | Text TextVal
At first I was going to say that I would _never_ feel the need to turn a set of constructors into a set of types. But looking again at your example constructors I grasp what you mean by "incoherent". In such cases, what may help is to consider why such disparate entities would be grouped together. It is not uncommon that the reason is that they all are processed by one or a few functions. Then you can consider making those functions into a class. Whether this is desirable depends on whether splitting up the implementation of the original functions, reorganized by "type", makes the program more modular.

At 10:23 06/07/04 -0400, Scott Turner wrote:
On 2004 July 06 Tuesday 05:35, Graham Klyne wrote:
When I'm designing datatypes for a Haskell program, I sometimes seem to end up with a slightly incoherent mixture of algebraic types and constructors.
example
data Event = Document DocURI Element | Element Name BaseURI Language Children Attributes LiIndex | Subject EndElement | Attribute Name AttributeVal | Text TextVal
At first I was going to say that I would _never_ feel the need to turn a set of constructors into a set of types. But looking again at your example constructors I grasp what you mean by "incoherent". In such cases, what may help is to consider why such disparate entities would be grouped together.
The main reason in this case is that I'm implementing a specification (RDF parser [1]), with a view to keeping the code very close to the specification. Part of my goal is to provide implementation-based feedback on the accuracy of the specification, with clear traceability between the code and the specification document, and also to provide feedback on how easy it is for an implementer to follow the specification. You may notice that some of the events contain references to other specific kinds of events (Document-->Element, Element-->Attribute, etc). My implementation strategy (following the spec) is to create a sequence of Event values that are subsequently analyzed using a Parsec implementation of the grammar. Effectively, this means that I sacrifice some static type checking, and maybe have a little additional dynamic type-labelling overhead, but not too much, I think. BTW, my current solution is with no separate types [2]. #g -- [1] http://www.w3.org/TR/rdf-syntax-grammar/ [2] [[ data Event = Document { base :: String , element :: Event } | Element { name :: ScopedName , base :: String , lang :: String , children :: [Event] , attributes :: [Event] } | EndElement | Attribute { name :: ScopedName , value :: String } | CharData { value :: String } -- Derivative (non-infoset) events: | UriNode { name :: ScopedName } | BlankNode { ident :: String } | PlainLit { value :: String , lang :: String } | TypedLit { value :: String , datatype :: ScopedName } ]] ------------ Graham Klyne For email: http://www.ninebynine.org/#Contact

It sounds a little as though you want a family of n-ary union type constructor. A crude first attempt would be: data Either2 t1 t2 = In1 t1 | In2 t2 data Either3 t1 t2 t3 = In1 t1 | In2 t2 | In3 t3 ... but this would be a bit tedious to use because the names are a bit meaningless - you'd be relying heavily on type checking to keep you straight. e.g., it's kinds hard to read: f :: Either String Bool -> Either String Bool f (In1 "Fred") = In1 "Frederick" f (In2 m) = In2 (not m) Proposals for records gives you an easy way to define arbitrary product types with convenient syntax: { name = "Fred", married = True } :: { name :: String, married :: Bool } Maybe what you really want here is a way to define union types with some convenient syntax. In the following made up syntax, |[ ... |] contains a list of types to be unioned together with each type labelled by a constructor name. f :: |[ Name :: String, Married :: Bool |] -> |[ Name :: String, Married :: Bool |] f |[ Name = "Fred" |] = |[ Name = "Frederick |] f [| Married = m |] = [| Married = not m |] I wonder if an extension like this would be generally useful? (Would it be useful for defining a compiler/interpreter by first giving a simple language and then adding further operations?) -- Alastair Reid

My original question was related to use of existing facilities. I find it somewhat surprising how rich and complex a language Haskell turns out to be. I'm wary of yet more features. That said, you seem to be suggesting a kind of "anonymous" union type (to complement the proposed "anonymous" record type?). At one level, that seems to answer the question I was raising, and seems to require that alternative selection is based on (unification of?) the datatypes, rather than by simple matching of constructor labels. That sounds to me a bit like class instance selection (and I note that Scott did suggest using type classes). Could your proposal be viewed as a syntactic sugaring of classes and instances, without explicit instance declarations? #g -- At 10:35 08/07/04 +0100, Alastair Reid wrote:
It sounds a little as though you want a family of n-ary union type constructor. A crude first attempt would be:
data Either2 t1 t2 = In1 t1 | In2 t2 data Either3 t1 t2 t3 = In1 t1 | In2 t2 | In3 t3 ...
but this would be a bit tedious to use because the names are a bit meaningless - you'd be relying heavily on type checking to keep you straight. e.g., it's kinds hard to read:
f :: Either String Bool -> Either String Bool f (In1 "Fred") = In1 "Frederick" f (In2 m) = In2 (not m)
Proposals for records gives you an easy way to define arbitrary product types with convenient syntax:
{ name = "Fred", married = True } :: { name :: String, married :: Bool }
Maybe what you really want here is a way to define union types with some convenient syntax. In the following made up syntax, |[ ... |] contains a list of types to be unioned together with each type labelled by a constructor name.
f :: |[ Name :: String, Married :: Bool |] -> |[ Name :: String, Married :: Bool |] f |[ Name = "Fred" |] = |[ Name = "Frederick |] f [| Married = m |] = [| Married = not m |]
I wonder if an extension like this would be generally useful? (Would it be useful for defining a compiler/interpreter by first giving a simple language and then adding further operations?)
-- Alastair Reid
------------ Graham Klyne For email: http://www.ninebynine.org/#Contact
participants (4)
-
Alastair Reid
-
Graham Klyne
-
Graham Klyne
-
Scott Turner