How to implement this? A case for scoped record labels?

Hi, I would like to get some advice on how to best implement a protocol. The protocol in question is "Bayeux": http://svn.cometd.org/trunk/bayeux/bayeux.html. The details don't matter here - it defines a couple of requests and responses in JSON format, basically JSON objects with different properties, some of which are shared by all (e.g. "channel") and some which are specific for certain kinds of requests/responses (e.g. "subscription"). To give an example, a connect request would look like this: [ { "channel": "/meta/connect", "clientId": "Un1q31d3nt1f13r", "connectionType": "long-polling" } ] Now I leave the actual JSON parsing to the excellent Text.JSON library. My problem is how to get the types right. At first I started with a big discriminated union, e.g. data BayeuxMessage = HandshakeRequest { channel :: String , ... } | HandshakeResponse { channel :: String, successful :: Bool, ... } | ... This way I could create BayeuxMessage values by copying the Text.JSON parsed values over. However what I don't like is that many selector functions, e.g. successful, are only partial and using them with a BayeuxMessage value constructed with HandshakeRequest for example will result in a runtime error. So I think it would be better to have individual types for the protocol requests/responses, e.g. data HandshakeRequest = HandshakeRequest { channel :: String , ... } data HandshakeResponse = HandshakeResponse { channel :: String, successful :: Bool, ... } ... data BayeuxMessage = HSReq HandshakeRequest | HSRes HandshakeResponse ... This however does not work because record selectors have module scope, so the compiler will complain that channel et. al. are defined multiple times. As a workaround I could put each type into its own module, but at least GHC requires a file per module (which is *very* inconvenient IMO). If we would have scoped labels (e.g. like proposed here: http://legacy.cs.uu.nl/daan/pubs.html#scopedlabels) it seems like it would have been straightforward. So certainly I am missing something and there is a better way to design this. Hence this e-mail. I welcome any advice how this would best be done in Haskell with GHC. Many thanks, nt

ntupel@googlemail.com wrote:
This however does not work because record selectors have module scope, so the compiler will complain that channel et. al. are defined multiple times. As a workaround I could put each type into its own module, but at least GHC requires a file per module (which is *very* inconvenient IMO). If we would have scoped labels (e.g. like proposed here: http://legacy.cs.uu.nl/daan/pubs.html#scopedlabels) it seems like it would have been straightforward.
So certainly I am missing something and there is a better way to design this. Hence this e-mail. I welcome any advice how this would best be done in Haskell with GHC.
One alternative is to use Haskell's support for ad-hoc overloading. Define a typeclass for each selector (or group of selectors that must always occur together) which is polymorphic in the record type. Combine this with the separate constructor types to get something like: data HandshakeRequest = HandshakeRequest String ... data HandshakeResponse = HandshakeResponse String Bool ... ... data BayeuxMessage = HSReq HandshakeRequest | HSRes HandshakeResponse ... class BayeuxChannel r where channel :: r -> String instance BayeuxChannel HandshakeRequest where channel (HandshakeRequest ch ...) = ch instance BayeuxChannel HandshakeResponse where channel (HandshakeResponse ch _ ...) = ch ... class BayeuxSuccessful r where successful :: r -> Bool ... It's not pretty, but it gets the job done. Many people decry this as improper use of typeclasses though (and rightly so). A better approach would probably be to use GADTs or the new data families which give a sort of dual of typeclasses (typeclasses give a small set of functions for a large set of types; GADTs give a large set of functions for a small set of types[0]). Someone more familiar with those approaches should give those versions. If you want to be able to set the fields as well as read them then the classes should be more like lenses than projectors. For instance, this[1] discussion on Reddit. The two obvious options are a pair of setter and getter functions: (Whole->Part, Whole->Part->Whole); or a factored version of the same: Whole->(Part, Part->Whole). You should also take a look at the data-accessor packages[2][3] which aim to give a general solution to the lens problem. Also take a look at hptotoc[4], the Haskell implementation of Google's Protocol Buffers which has many similar problems to your Bayeaux protocol. In general, protocols designed for OO are difficult to translate into non-OO languages. [0] http://blog.codersbase.com/tag/gadt/ [1] http://www.reddit.com/r/haskell/comments/86oc3/yet_another_proposal_for_hask... [2] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/data-accessor [3] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/data-accessor-tem... [4] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hprotoc -- Live well, ~wren

On Tue, May 26, 2009 at 4:32 AM, wren ng thornton
One alternative is to use Haskell's support for ad-hoc overloading. Define a typeclass for each selector (or group of selectors that must always occur together) which is polymorphic in the record type. [...] It's not pretty, but it gets the job done. Many people decry this as improper use of typeclasses though (and rightly so).
Yes. I was experimenting a little with type classes and the more granular I defined them the more I got the feeling of producing a hack. But as you said, it would get the job done and I will probably give it a try.
A better approach would probably be to use GADTs or the new data families which give a sort of dual of typeclasses (typeclasses give a small set of functions for a large set of types; GADTs give a large set of functions for a small set of types[0]). Someone more familiar with those approaches should give those versions.
Interesting, but I fail to see how this might be applied to the problem at hand. I played with associated types and they are quite neat. But I would still be working with type classes, so how would this be different from the first approach. W.r.t. GADTs I understood these as to provide a way to be more specific about the return type of constructor functions. But my problem is mostly a scope issue, isn't it?
If you want to be able to set the fields as well as read them then the classes should be more like lenses than projectors.
I am fine with selectors for now. But thanks for the references.
Also take a look at hptotoc[4], the Haskell implementation of Google's Protocol Buffers which has many similar problems to your Bayeaux protocol. In general, protocols designed for OO are difficult to translate into non-OO languages.
From what I saw by briefly scanning the contents it seems to me the problem is again solved with the type class approach you mentioned in the beginning.
I wonder if I am completely off here, but I am surprised that there is no progress on the scoped labels front. The Haskell wiki mentioned that the status quo is due to a missing optimum in the design space, but the same can be said about generic programming in Haskell and yet, GHC ships with Scrap your boilerplate. So we have to resort to type classes hacks instead of a proper solution. OTOH I might not have understood the relevance of GADTs for this problem and it is a non-issue but prima facie it doesn't seem to be. Anyway, many thanks for your thoughtful reply. Best regards, nt

ntupel@googlemail.com wrote:
wren ng thornton wrote:
A better approach would probably be to use GADTs or the new data families which give a sort of dual of typeclasses (typeclasses give a small set of functions for a large set of types; GADTs give a large set of functions for a small set of types[0]). Someone more familiar with those approaches should give those versions.
Interesting, but I fail to see how this might be applied to the problem at hand. I played with associated types and they are quite neat. But I would still be working with type classes, so how would this be different from the first approach. W.r.t. GADTs I understood these as to provide a way to be more specific about the return type of constructor functions. But my problem is mostly a scope issue, isn't it?
I'm not familiar enough with the state of the art here to feel comfortable suggesting an implementation; hence leaving it to someone else. Technically GADTs are "just" for being more specific about the return types of constructor functions, but this is vastly more powerful than it may seem. For instance, GADTs can be combined with phantom types to great effect, e.g. for type checking dependent-like types. GADTs can easily cover record selectors that apply to all constructors, and selectors that apply to one constructor (or a set of constructors producing the same type). If the family of selector sets forms a nice tree hierarchy, you can use phantom type constructors and type variables to express subtrees of that hierarchy as types, e.g. data Name m data Successful m ... data Bayeaux mesg where HandshakeRequest :: ... -> Bayeaux (Name ...) HandshakeResponse :: ... -> Bayeaux (Name (Successful ...)) ... name :: Bayeaux (Name m) -> String name (HandshakeRequest ...) = ... name (HandshakeResponse ...) = ... ... successful :: Bayeaux (Name (Successful m)) name (HandshakeResponse ...) = ... ... I don't know if the Bayeaux protocol is amenable to this or not. And I'm sure there's a prettier way to do it anyhow. By using multiple phantom types you can encode any subset relation on selector sets that can be described by a tree-ordered space. If the subset relation is DAGy, then things get ugly again. You'll probably have to use typeclasses in some form or another eventually, the question is how much you rely on ad-hoc overloading vs how structured you can make things by using other techniques.
I wonder if I am completely off here, but I am surprised that there is no progress on the scoped labels front. The Haskell wiki mentioned that the status quo is due to a missing optimum in the design space, but the same can be said about generic programming in Haskell and yet, GHC ships with Scrap your boilerplate. So we have to resort to type classes hacks instead of a proper solution. OTOH I might not have understood the relevance of GADTs for this problem and it is a non-issue but prima facie it doesn't seem to be.
The missing optimum is a big problem leading to the status quo. I think another issue is that noone is currently working on alternatives[1][2]. For SYB and the other generics stuff, people are actively working on it so there's more desire to make the options widely available, hoping that a clear winner will emerge. Without active competition to weed out competitors, offering multiple options fragments the community. The monad transformer libraries seem to be in this quandary now. There was a lot of research a while back and there are lots of options out there, but people default to mtl for compatibility reasons and there hasn't been a strong campaign for one of the competitors to conquer mindshare and take over (though a small one is beginning now that the HP is here). [1] I have a handful of ideas I've been kicking around, but I can't say that I've actually been "working" on any of them. [2] If anyone *is* actively working in this area, I'd be curious to hear about it :) -- Live well, ~wren

On Tue, 2009-05-26 at 18:09 -0400, wren ng thornton wrote:
GADTs can easily cover record selectors that apply to all constructors, and selectors that apply to one constructor (or a set of constructors producing the same type). If the family of selector sets forms a nice tree hierarchy, you can use phantom type constructors and type variables to express subtrees of that hierarchy as types, e.g. [...] I don't know if the Bayeaux protocol is amenable to this or not. And I'm sure there's a prettier way to do it anyhow.
By using multiple phantom types you can encode any subset relation on selector sets that can be described by a tree-ordered space. If the subset relation is DAGy, then things get ugly again. You'll probably have to use typeclasses in some form or another eventually, the question is how much you rely on ad-hoc overloading vs how structured you can make things by using other techniques.
Finally I got your point. Many thanks for your explanation. So, yes, in principle GADTs seem helpful here, but it turned out that for Bayeux the relations are difficult to encode and it seems I would indeed at least partially have to use type classes again. It really is fascinating, I learned a lot in the last days about GADTs, type families, and other type trickery. Never mind that I still struggle to see an obvious implementation strategy, all proposed solutions look like workarounds to the lack of scoped record labels to me. Maybe I should just use prefixes for the record selectors of individual data types. D'oh! Thanks, nt

I wonder if I am completely off here, but I am surprised that there is no progress on the scoped labels front. The Haskell wiki mentioned that the status quo is due to a missing optimum in the design space, but the same can be said about generic programming in Haskell and yet, GHC ships with Scrap your boilerplate. So we have to resort to type classes hacks instead of a proper solution.
There are various implementations of extensible records available. HList may have the best-supported versions and the most experience, but essentially, they are simple enough to define that some packages ship with their own variants (as long as there is no agreement on the future of the language extensions needed to implement these libraries, there won't be any standard library). See the links on the Haskell wiki [1], though there are also newer entries on the GHC trac wiki [2,3]. The Haskell wiki page also points to my old first class labels proposal, which included a small example implementation based on Daan's scoped labels (there was a more recent implementation of Data.Record which noone seemed interested in, and the fairly new Data.Label suggestion offers a workaround for the lack of first class labels, see [4] for unsupported experimental versions of both). The various accessor packages and generators might be a more lightweight/portable alternative. In particular, they also cover the case of nested accessors. And, going back to your original problem, there is an intermediate stage between data BayeuxMessage = HandshakeRequest { channel :: String , ... } | HandshakeResponse { channel :: String, successful :: Bool, ... } | ... and data HandshakeRequest = HandshakeRequest { channel :: String , ... } data HandshakeResponse = HandshakeResponse { channel :: String, successful :: Bool, ... } ... data BayeuxMessage = HSReq HandshakeRequest | HSRes HandshakeResponse ... namely data HandshakeRequest = HandshakeRequest { ... } data HandshakeResponse = HandshakeResponse { successful :: Bool, ... } ... data BayeuxMessage = HSReq{ channel :: String, request :: HandshakeRequest } | HSRes{ channel :: String, response :: HandshakeResponse } ... Generally, you'll often want to use labelled fields with parameterized types, eg data NamedRecord a = Record { name :: a, ... } type StringNamedRecord = Record String type IntNamedRecord = Record Int and, no, I don't suggest to encoded types in names, this is just an abstract example;-) Mostly, don't feel bound to a single upfront design, refactor your initial code until it better fits your needs, as you discover them. Hth, Claus [1] http://www.haskell.org/haskellwiki/Extensible_record [2] http://hackage.haskell.org/trac/ghc/wiki/ExtensibleRecords [3] http://hackage.haskell.org/trac/ghc/ticket/1872 [4] http://community.haskell.org/~claus/

Hi,
Using a type class in the way Wren suggests seems to be the right way
to do this in Haskell, as it is at the moment. I don't think that
this an inappropriate use of type classes at all---in fact, it is
exactly what type classes were designed to do (i.e., allow you to
reuse the same name at different types). Note that you can combine
type classes and records to cut down on the typing:
data Request = Request { request_channel :: Channel, ... }
data Response = Response { response_channel :: Channel, ... }
class HasChannel t where
get_channel :: t -> Channel
set_channel :: Channel -> t -> t
instance HasChannel Request where
get_channel = request_channel
set_channel x t = t { response_channel = x }
and so on. It is a bit verbose, but you only have to do it once for
your protocol, and then you get the nice overloaded interface.
Actually, having the non-overloaded names might also be useful in some
contexts (e.g., to resolve ambiguities).
-Iavor
On Mon, May 25, 2009 at 7:32 PM, wren ng thornton
ntupel@googlemail.com wrote:
This however does not work because record selectors have module scope, so the compiler will complain that channel et. al. are defined multiple times. As a workaround I could put each type into its own module, but at least GHC requires a file per module (which is *very* inconvenient IMO). If we would have scoped labels (e.g. like proposed here: http://legacy.cs.uu.nl/daan/pubs.html#scopedlabels) it seems like it would have been straightforward.
So certainly I am missing something and there is a better way to design this. Hence this e-mail. I welcome any advice how this would best be done in Haskell with GHC.
One alternative is to use Haskell's support for ad-hoc overloading. Define a typeclass for each selector (or group of selectors that must always occur together) which is polymorphic in the record type. Combine this with the separate constructor types to get something like:
data HandshakeRequest = HandshakeRequest String ... data HandshakeResponse = HandshakeResponse String Bool ... ... data BayeuxMessage = HSReq HandshakeRequest | HSRes HandshakeResponse ...
class BayeuxChannel r where channel :: r -> String instance BayeuxChannel HandshakeRequest where channel (HandshakeRequest ch ...) = ch instance BayeuxChannel HandshakeResponse where channel (HandshakeResponse ch _ ...) = ch ... class BayeuxSuccessful r where successful :: r -> Bool ...
It's not pretty, but it gets the job done. Many people decry this as improper use of typeclasses though (and rightly so). A better approach would probably be to use GADTs or the new data families which give a sort of dual of typeclasses (typeclasses give a small set of functions for a large set of types; GADTs give a large set of functions for a small set of types[0]). Someone more familiar with those approaches should give those versions.
If you want to be able to set the fields as well as read them then the classes should be more like lenses than projectors. For instance, this[1] discussion on Reddit. The two obvious options are a pair of setter and getter functions: (Whole->Part, Whole->Part->Whole); or a factored version of the same: Whole->(Part, Part->Whole).
You should also take a look at the data-accessor packages[2][3] which aim to give a general solution to the lens problem. Also take a look at hptotoc[4], the Haskell implementation of Google's Protocol Buffers which has many similar problems to your Bayeaux protocol. In general, protocols designed for OO are difficult to translate into non-OO languages.
[0] http://blog.codersbase.com/tag/gadt/ [1] http://www.reddit.com/r/haskell/comments/86oc3/yet_another_proposal_for_hask... [2] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/data-accessor [3] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/data-accessor-tem... [4] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/hprotoc
-- Live well, ~wren _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Sun, May 31, 2009 at 06:20:23PM -0700, Iavor Diatchki wrote:
and so on. It is a bit verbose, but you only have to do it once for your protocol, and then you get the nice overloaded interface.
This also seems like the kind of thing perfectly suited to Template Haskell. Especially if the records might end up being modified, fields added, etc., having some TH code to regenerate all the necessary classes and instances from some compact description could be a big win, and probably not too hard to code either. -Brent

On May 25, 2009, at 08:20 , ntupel@googlemail.com wrote:
data HandshakeRequest = HandshakeRequest { channel :: String , ... } data HandshakeResponse = HandshakeResponse { channel :: String, successful :: Bool, ... } ...
data BayeuxMessage = HSReq HandshakeRequest | HSRes HandshakeResponse ...
This however does not work because record selectors have module scope, so the compiler will complain that channel et. al. are defined multiple times. As a workaround I could put each type into its own
Try -XDisambiguateRecordFields? -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH
participants (7)
-
Brandon S. Allbery KF8NH
-
Brent Yorgey
-
Claus Reinke
-
Iavor Diatchki
-
ntupel
-
ntupel@googlemail.com
-
wren ng thornton