A Lightweight Implementation of Generics and Dynamics (LIGD)

Hello all,

Johan mentioned in his last email that, starting from October 2nd (yesterday), we could begin to discuss the different approaches to Generic Programming (GP). Since I have not seen any email concerning this I'll open the hostilities, which will hopefully spark some discussion.

I provide my summary for LIGD at the end of this email, but before that I would like to discuss a few things:

1) While discussing the different approaches, we should also develop a "template" for reporting our conclusions. I think that having a base template will make our life easier. At the moment I am using the following:

- Approach (just the name of the approach)
- Required features (what do we need)
- Usage (how to use it, what different users do we have and what knowledge do they need)
- Extensibility (is it extensible, and if not, can we do anything about it?)
- Performance considerations (what are the performance impacts, and is it possible to improve them?)
- Expressibility (what kind of generic functions can we write? Producers and consumers? Different arities? Local redefinition?)
- Boilerplate (what is the boilerplate involved, and is it possible to remove it?)
- Helpful extra features (what could Haskell have that would make the approach easier to use? Are those features realistically feasible?)
- Discussion (general discussion, advantages/disadvantages, comparison with other approaches)

Note that this is not, by any means, supposed to be a proposal for the final template; it is just my initial one. We should probably discuss the template as we go along and tune it (if you think it is a good idea to have one in the first place).

2) Johan also mentioned a darcs/svn repository for the resulting paper/document. I think this is a great idea. Is anyone taking care of this? It would also be a good idea to use the repository to store some code for the different approaches. This way, people who would like to experiment with code would not have to repeat the task individually. What do you think?

3) Related to 2), we could start by collecting code for the approaches from the authors of the papers. I guess most of them are on this mailing list anyway. For example, we could now ask Ralf Hinze or James Cheney (are either of you two reading this email?) if they have some code available for LIGD that they could submit. We could then add this code to darcs/svn and do some experiments with it: organize it like a library, define generic functions, etc. If no code is available, perhaps we should find a volunteer to implement it (if that's the case for LIGD, I can volunteer).

Here is my summary of LIGD, which is not complete but which I think can be a starting point.

=======================================================================
Approach: A Lightweight Implementation of Generics and Dynamics

Required features:
- Haskell 98 + existential types

Usage:
- Library Writer: Responsible for the sums-of-products machinery, the representation data type(s) defined in the library, and for defining the class and standard instances for "Representable". The library writer needs to have deep knowledge of generic programming.
- Power User: Defines the boilerplate for new data types. Needs to be fairly familiar with the implementation.
- User: Defines generic functions by pattern matching on the structure of "Rep". This requires being comfortable with sums of products and isomorphisms. For convenience, the user should also provide a definition that uses the type class "Representable", which avoids having to pass the representations explicitly.
- End User: If "Representable" is used, then the end user can just use generic functions in the same way as any other ad-hoc (that is, type-class overloaded) functions.

Extensibility:
- Data types are non-extensible; however, we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution along these lines).

Performance considerations:
- Based on sums of products, and therefore involving considerable performance penalties.

Expressibility:
- Can do both producer and consumer functions.
- Multiple representations needed for generic functions of different arities.
- No local redefinitions.

Boilerplate:
- Isomorphisms between sums of products and data types
- Instances of Representable
- Smart constructors for representations of data types

Helpful extra features:
- GADTs (already implemented in GHC) allow more direct definitions of generic functions, since there is no need to apply the type coercions explicitly.
- A boilerplate generation mechanism. This would effectively mean that power users would not be necessary.
- Open data types and functions, for extensibility.
- Kind polymorphism, for eliminating the need for multiple representations.

Discussion: An interesting approach which is quite lightweight and fairly easy to use (especially if we did not need to handle any of the boilerplate). It is a bit outdated, because with GADTs the use can be slightly simplified. However, this also takes us further away from Haskell 98, or even Haskell' (since GADTs will not be there). I think that LIGD is still a reference for a generic programming library that uses a data type for type representations and sums of products as a family of data types. A drawback may be performance, since we need to convert a data type into its isomorphic sum-of-products representation.
=============================================================

Cheers,
Bruno
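P.S. For anyone who wants to play with code before we collect the authors' versions, here is a minimal sketch of the encoding as I understand it from the paper: the representation data type with an embedding-projection pair, a generic equality as an example consumer function, and the "Representable" class. The names (EP, Rep, geq) follow the paper only loosely, so details may well differ from Ralf and James's actual code.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- An embedding-projection pair witnessing an isomorphism between t and s.
data EP t s = EP { from :: t -> s, to :: s -> t }

self :: EP t t
self = EP id id

-- The representation type: each constructor pairs a structural case
-- with the isomorphism back to the represented type t.
data Rep t
  = RUnit (EP t ())
  | RInt  (EP t Int)
  | RChar (EP t Char)
  | forall a b. RSum  (Rep a) (Rep b) (EP t (Either a b))
  | forall a b. RProd (Rep a) (Rep b) (EP t (a, b))

rInt :: Rep Int
rInt = RInt self

rChar :: Rep Char
rChar = RChar self

-- A generic (consumer) function, defined by pattern matching on Rep;
-- note the explicit coercions through the EP component.
geq :: Rep t -> t -> t -> Bool
geq (RUnit _)        _ _ = True
geq (RInt ep)        x y = from ep x == from ep y
geq (RChar ep)       x y = from ep x == from ep y
geq (RSum ra rb ep)  x y = case (from ep x, from ep y) of
  (Left a1,  Left a2)  -> geq ra a1 a2
  (Right b1, Right b2) -> geq rb b1 b2
  _                    -> False
geq (RProd ra rb ep) x y =
  let (a1, b1) = from ep x
      (a2, b2) = from ep y
  in  geq ra a1 a2 && geq rb b1 b2

-- "Representable" lets end users omit the explicit representation.
class Representable t where
  rep :: Rep t

instance Representable Int  where rep = rInt
instance Representable Char where rep = rChar

eq :: Representable t => t -> t -> Bool
eq = geq rep
```

A power user would then supply, per data type, an EP isomorphism to its sum-of-products form plus a Representable instance.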

Bruno Oliveira:
2) Johan also mentioned a darcs/svn repository for the resulting paper/document. I think this is a great idea. Is anyone taking care of this? It would also be a good idea to use the repository to store some code for the different approaches. This way people who would like to experiment with code, would not have to repeat the task individually. What do you think?
I'd say darcs.haskell.org is the right place for that.
======================================================================= Approach: A Lightweight Implementation of Generics and Dynamics [..] It is a bit outdated because with GADTs, the use can be slightly simplified. However, this also takes us further away from Haskell 98 or even Haskell' (since GADTs will not be there).
The 2nd Haskell' StrawPoll http://hackage.haskell.org/trac/haskell-prime/wiki/StrawPoll-2 had for GADTs 5 Y 4 M 4 N So, it's currently undecided whether they will be in. (And personally, I don't think we should hold ourselves up with H98.) Manuel

Hello,
Bruno Oliveira:
2) Johan also mentioned a darcs/svn repository for the resulting paper/document. I think this is a great idea. Is anyone taking care of this? It would also be a good idea to use the repository to store some code for the different approaches. This way people who would like to experiment with code, would not have to repeat the task individually. What do you think?
I'd say darcs.haskell.org is the right place for that.
Ok, that seems good to me. Are there any volunteers for setting up the repository at darcs.haskell.org? I must confess that I am not too familiar with darcs and I don't know what is needed for setting up a new repository/directory at darcs.haskell.org. So I would prefer if someone else (other than me) would volunteer for this task.
======================================================================= Approach: A Lightweight Implementation of Generics and Dynamics [..] It is a bit outdated because with GADTs, the use can be slightly simplified. However, this also takes us further away from Haskell 98 or even Haskell' (since GADTs will not be there).
The 2nd Haskell' StrawPoll
http://hackage.haskell.org/trac/haskell-prime/wiki/StrawPoll-2
had for GADTs
5 Y 4 M 4 N
So, it's currently undecided whether they will be in. (And personally, I don't think we should hold ourselves up with H98.)
Thanks for the info, I didn't know that. One last thing: since I have not heard from Ralf Hinze and James Cheney, I'll send them an email asking if they have some code lying around. Cheers, Bruno Oliveira

Bruno Oliveira wrote:
2) Johan also mentioned a darcs/svn repository for the resulting paper/document. I think this is a great idea. Is anyone taking care of this? It would also be a good idea to use the repository to store some code for the different approaches. This way people who would like to experiment with code, would not have to repeat the task individually. What do you think?
I'd say darcs.haskell.org is the right place for that.
Ok, that seems good to me. Are there any volunteers for setting up the repository at darcs.haskell.org? I must confess that I am not too familiar with darcs and I don't know what is needed for setting up a new repository/directory at darcs.haskell.org. So I would prefer if someone else (other than me) would volunteer for this task.
I can do that if everybody is happy with it. Manuel

======================================================================= Approach: A Lightweight Implementation of Generics and Dynamics [..] It is a bit outdated because with GADTs, the use can be slightly simplified. However, this also takes us further away from Haskell 98 or even Haskell' (since GADTs will not be there).
The 2nd Haskell' StrawPoll
http://hackage.haskell.org/trac/haskell-prime/wiki/StrawPoll-2
had for GADTs
5 Y 4 M 4 N
So, it's currently undecided whether they will be in. (And personally, I don't think we should hold ourselves up with H98.)
I think this is probably something we should take up again once we have the differences between the different approaches clear. I can imagine a number of scenarios:
- designing three different libraries (Haskell 98, Haskell', and GHC + all extensions),
- a single library that is easier to use the more extensions of Haskell 98 are available,
- forgetting about Haskell 98 altogether if Haskell' is emerging at that time,
- ...
But maybe we should defer this discussion until the beginning of next year. -- Johan

Dear James, Ralf, As you might know, Johan proposed that people from the generic programming community gather and try to come up with a library for generic programming in Haskell. The discussions are supposed to start this week with LIGD, which you two proposed. Yesterday I sent an email to generics@haskell.org (our discussion mailing list) and gave a quick summary of your approach. I also proposed that we should collect code from the authors for the various approaches. You can see the original email at the end of this message. So, since we are now discussing LIGD, I was wondering if you have some code that you wouldn't mind sharing with us. It won't be terribly hard for us to encode your paper (you even have a listing), but perhaps you have some extra code that didn't make it into the paper that would be interesting to discuss. For example, you mention at the end of LIGD that using a different representation type we could encode a generic map. Also, perhaps you could comment on the use of LIGD as a basis for a generic programming library. Regards, Bruno Oliveira On Tue, 03 Oct 2006 12:47:04 +0100, Bruno Oliveira wrote:
Hello all,
Johan mentioned in his last email that, starting from October 2nd (yesterday), we could begin to discuss the different approaches to Generic Programming (GP). Since I have not seen any email concerning this I'll open the hostilities, which will hopefully spark some discussion.
I provide my summary for LIGD in the end of this email, but before that I would like to discuss a few things:
1) While discussing the different approaches, we should also develop a "template" for reporting our conclusions. I think that having a base template will make our life easier. At the moment I am using the following:
- Approach (just the name of the approach)
- Required features (what do we need)
- Usage (how to use it, what different users do we have and what knowledge do they need)
- Extensibility (is it extensible, and if not, can we do anything about it?)
- Performance considerations (what are the performance impacts, and is it possible to improve them?)
- Expressibility (what kind of generic functions can we write? Producers and consumers? Different arities? Local redefinition?)
- Boilerplate (what is the boilerplate involved, and is it possible to remove it?)
- Helpful extra features (what could Haskell have that would make the approach easier to use? Are those features realistically feasible?)
- Discussion (general discussion, advantages/disadvantages, comparison with other approaches)
but note that this is not, by any means, supposed to be a proposal for the final template. It is just my initial template. We probably should discuss the template as we go along and tune it (if you think it is a good idea to have it in the first place).
2) Johan also mentioned a darcs/svn repository for the resulting paper/document. I think this is a great idea. Is anyone taking care of this? It would also be a good idea to use the repository to store some code for the different approaches. This way people who would like to experiment with code, would not have to repeat the task individually. What do you think?
3) Related to 2), we could start by collecting code for the approaches from the authors of the papers. I guess most of them are on this mailing list anyway. For example, we could now ask Ralf Hinze or James Cheney (are either of you two reading this email?) if they have some code available for LIGD that they could submit. We could then add this code to darcs/svn and do some experiments with it: organize it like a library, define generic functions, etc. If no code is available, perhaps we should find a volunteer to implement it (if that's the case for LIGD, I can volunteer).
Here is my summary of LIGD, which is not complete but I think it can be a starting point.
======================================================================= Approach: A Lightweight Implementation of Generics and Dynamics
Required features: - Haskell 98 + existential types
Usage:
- Library Writer: Responsible for the sums-of-products machinery, the representation data type(s) defined in the library, and for defining the class and standard instances for "Representable". The library writer needs to have deep knowledge of generic programming.
- Power User: Defines the boilerplate for new data types. Needs to be fairly familiar with the implementation.
- User: Defines generic functions by pattern matching on the structure of "Rep". This requires being comfortable with sums of products and isomorphisms. For convenience, the user should also provide a definition that uses the type class "Representable", which avoids having to pass the representations explicitly.
- End User: If "Representable" is used, then the end user can just use generic functions in the same way as any other ad-hoc (that is, type-class overloaded) functions.
Extensibility: - Data types are non-extensible; however, we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution along these lines).
Performance considerations: - Based on sums of products, and therefore involving considerable performance penalties.
Expressibility: - Can do both producer and consumer functions. - Multiple representations needed for generic functions of different arities; - No local redefinitions.
Boilerplate: - Isomorphisms between sums of products and data types - Instances of Representable - Smart constructors for representation of data types
Helpful extra features: - GADTs (already implemented in GHC) allow more direct definitions of generic functions, since there is no need to apply the type coercions explicitly; - A boilerplate generation mechanism (this would effectively mean that power users would not be necessary); - Open data types and functions, for extensibility; - Kind polymorphism, for eliminating the need for multiple representations.
Discussion: An interesting approach which is quite lightweight and fairly easy to use (especially if we did not need to handle any of the boilerplate). It is a bit outdated, because with GADTs the use can be slightly simplified. However, this also takes us further away from Haskell 98, or even Haskell' (since GADTs will not be there). I think that LIGD is still a reference for a generic programming library that uses a data type for type representations and sums of products as a family of data types. A drawback may be performance, since we need to convert a data type into its isomorphic sum-of-products representation. =============================================================
Cheers,
Bruno

| I provide my summary for LIGD in the end of this email, but before that I would like to | discuss a few things: | | 1) While discussing the different approaches, we should also develop a "template" | for reporting our conclusions. I think that having a base template will make our life | easier. Good plan. Rather than gathering these summaries in email, could we put them on the Haskell Wiki? (The accompanying code could be in Darcs, but it could also be simply attachments to the relevant wiki page) Simon

Hello Simon, Wednesday, October 4, 2006, 5:16:56 PM, you wrote:
Good plan. Rather than gathering these summaries in email, could we put them on the Haskell Wiki? (The accompanying code could be in Darcs, but it could also be simply attachments to the relevant wiki page)
from my POV, it's easier to keep text on wiki and code in darcs - using "native" media makes it easier to maintain (update/pull/compare versions) both text on wiki and code in darcs -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Hi,
Sorry! I was subscribed to the list, but I never received any emails
from the generics list server until yesterday, so I didn't realize this
discussion was happening. (I think the problem was that my list account
was set to "digest"; I only received the first digest yesterday also.
So it might be good for the list owner to check on this...) On top of
that, somehow I then managed to delete both Bruno's message and the
digest.
Since I haven't done much more research on generic programming per se,
I'm probably not capable of adding much to a discussion of LIGD beyond
any comments Ralf and other experts have (e.g. in the "comparing
approaches" paper). From what I've seen, Ralf's "Fun with phantom
types", "generics for the masses" and from what I've heard Stephanie's
RepLib (caveat: haven't read that paper carefully) are in a similar
spirit and subsume much or all of what's in the LIGD paper. Also, much
more now appears to be known about complementary things like GADTs,
open/extensible types and functions which could help fix some of the
limitations of the original LIGD approach.
Assuming that GADTs do eventually become a standard feature, I'd
definitely be in favor of using them instead of explicit to/from
conversions in an implementation of LIGD, for efficiency if nothing
else. In addition, as the FCPT paper notes, there are some natural-seeming things that you can only do if equations are known to the type system. This was one motivation for our subsequent tech report on "first class phantom types", which essentially proposed extending Haskell 98 with equation guards on datatypes, i.e. GADTs. But once you have GADTs, a lot of things appear to get simpler, and simulating first-class generics/dynamics via type representations seems to be just one example.
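To make that point concrete, here is a small sketch (my own reconstruction, not code from the paper or tarball) of what the LIGD representation type might look like as a GADT: each constructor refines the represented type directly, so the explicit to/from coercions disappear and generic functions pattern match on values with no conversion overhead.

```haskell
{-# LANGUAGE GADTs #-}

-- With a GADT, each case refines t in the type checker's eyes,
-- so no embedding-projection pairs are needed.
data Rep t where
  RUnit :: Rep ()
  RInt  :: Rep Int
  RChar :: Rep Char
  RSum  :: Rep a -> Rep b -> Rep (Either a b)
  RProd :: Rep a -> Rep b -> Rep (a, b)

-- Generic equality, now coercion-free: we match on the values directly.
geq :: Rep t -> t -> t -> Bool
geq RUnit         _ _ = True
geq RInt          x y = x == y
geq RChar         x y = x == y
geq (RSum ra rb)  x y = case (x, y) of
  (Left a1,  Left a2)  -> geq ra a1 a2
  (Right b1, Right b2) -> geq rb b1 b2
  _                    -> False
geq (RProd ra rb) (a1, b1) (a2, b2) = geq ra a1 a2 && geq rb b1 b2
```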
I'm attaching a tarball containing everything that both I and ghci could
make sense of from my code directory for the HW 02 paper; it includes
Dynamic.lhs        -- Isomorphism-based equality types
DynamicLeibniz.lhs -- Leibniz-based equality types
Lambda.hs          -- de Bruijn terms via type equations
Nat.hs             -- Some Peano arithmetic using type equations
FoldRep.hs         -- Generates a "fold" function from a Rep
MapRep.hs          -- An example of a representation tailored to maps
                   --   (not sure this makes sense...)
RepFn.lhs          -- An example of an encoding of a function on types
                   --   (specifically, the one that maps each Int to a pair of Ints)
Ralf may have more/better commented code...
I may have more coherent thoughts later as well (and will try to
participate more actively now that I should be getting list emails
regularly).
--James
--
James Cheney

Thanks, Bruno, for kicking off. I have had meetings all week, and worked on the forthcoming release of Generic Haskell. I'll leave for holidays for two weeks tomorrow, so I won't rejoin the discussion until the week of October 23.
1) While discussing the different approaches, we should also develop a "template" for reporting our conclusions. I think that having a base template will make our life easier. At the moment I am using the following:
- Approach (just the name of the approach)
- Required features (what do we need)
- Usage (how to use it, what different users do we have and what knowledge do they need)
- Extensibility (is it extensible, and if not, can we do anything about it?)
- Performance considerations (what are the performance impacts, and is it possible to improve them?)
- Expressibility (what kind of generic functions can we write? Producers and consumers? Different arities? Local redefinition?)
- Boilerplate (what is the boilerplate involved, and is it possible to remove it?)
- Helpful extra features (what could Haskell have that would make the approach easier to use? Are those features realistically feasible?)
- Discussion (general discussion, advantages/disadvantages, comparison with other approaches)
but note that this is not, by any means, supposed to be a proposal for the final template. It is just my initial template. We probably should discuss the template as we go along and tune it (if you think it is a good idea to have it in the first place).
I definitely think it is a good idea to have such a template. I think you cover all relevant aspects. I recently thought about such a template again, and also included: - Subset of data types covered. Most proposals restrict the subset of data types on which generic functions can be applied. The restrictions vary for the different proposals. This is related to expressiveness, but different enough to warrant a special category, I think. Furthermore, I called Boilerplate `Amount of work per data type'; at least, that is what I think you mean by boilerplate? The name boilerplate might be a bit confusing: what is the boilerplate of SYB? Nothing, of course (well, except for deriving Typeable) — but still, I suppose you get my point. Something I'd also like to look at is: what kind of type-error messages do you get when you make a mistake in your generic function? How `comprehensible' is the program (whatever that means, and it probably means different things to different people)? At least size plays a role here, but also other aspects. And is it easy to prove properties of generic programs? Finally, I called required features portability, but there is something to say for both.
2) Johan also mentioned a darcs/svn repository for the resulting paper/document. I think this is a great idea. Is anyone taking care of this? It would also be a good idea to use the repository to store some code for the different approaches. This way people who would like to experiment with code, would not have to repeat the task individually. What do you think?
Good idea. The darcs repository seems the right way to go. Manuel offered to create one.
3) Related to 2), we could start by collecting code for the approaches from the authors of the papers. I guess most of them are on this mailing list anyway. For example, we could now ask Ralf Hinze or James Cheney (are either of you two reading this email?) if they have some code available for LIGD that they could submit. We could then add this code to darcs/svn and do some experiments with it: organize it like a library, define generic functions, etc. If no code is available, perhaps we should find a volunteer to implement it (if that's the case for LIGD, I can volunteer).
Ralf implemented some functions in LIGD for our comparing paper; I attach them to this message (together with a file Xbits that is used somewhere; maybe there is a standard Haskell library for converting strings into bits and vice versa?). These should appear in the Darcs repository when it is there. Note that the code is better viewed on paper than as code.
======================================================================= Approach: A Lightweight Implementation of Generics and Dynamics
Required features: - Haskell 98 + existential types
Usage:
- Library Writer: Responsible for the sums-of-products machinery, the representation data type(s) defined in the library, and for defining the class and standard instances for "Representable". The library writer needs to have deep knowledge of generic programming.
- Power User: Defines the boilerplate for new data types. Needs to be fairly familiar with the implementation.
- User: Defines generic functions by pattern matching on the structure of "Rep". This requires being comfortable with sums of products and isomorphisms. For convenience, the user should also provide a definition that uses the type class "Representable", which avoids having to pass the representations explicitly.
- End User: If "Representable" is used, then the end user can just use generic functions in the same way as any other ad-hoc (that is, type-class overloaded) functions.
Extensibility: - Data types are non-extensible; however, we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution along these lines).
What do you mean by extensibility here? Not data types, I assume. You mean that LIGD functions are not extensible to new data types? This is a limitation, of course. Maybe a more severe limitation is that generic functions can only be defined on types defined in Rep; we cannot add non-generic behaviour for particular data types. Open data types will solve this, but these do not exist in any Haskell implementation.
Performance considerations: - Based on sums of products, and therefore involving considerable performance penalties.
And types are passed at run-time, and interpreted. Sums of products need not incur considerable performance penalties if partial evaluators remove those intermediate layers, but we have to experiment with the behaviour here. Passing and interpreting types at run-time is harder to avoid, I think.
Expressibility: - Can do both producer and consumer functions. - Multiple representations needed for generic functions of different arities; - No local redefinitions.
Boilerplate: - Isomorphisms between sums of products and data types - Instances of Representable - Smart constructors for representation of data types
Helpful extra features: - GADTs (already implemented in GHC) allow more direct definitions of generic functions, since there is no need to apply the type coercions explicitly; - A boilerplate generation mechanism (this would effectively mean that power users would not be necessary); - Open data types and functions, for extensibility; - Kind polymorphism, for eliminating the need for multiple representations.
Discussion: An interesting approach which is quite lightweight and fairly easy to use (especially if we did not need to handle any of the boilerplate). It is a bit outdated, because with GADTs the use can be slightly simplified. However, this also takes us further away from Haskell 98, or even Haskell' (since GADTs will not be there). I think that LIGD is still a reference for a generic programming library that uses a data type for type representations and sums of products as a family of data types. A drawback may be performance, since we need to convert a data type into its isomorphic sum-of-products representation. =============================================================
I think LIGD is quite a nice library, that needs almost no Haskell extensions. But it has its limitations. -- Johan

Hi Bruno (and all), Sorry for being so sluggish---but I just wanted to chime in with my comments on Bruno's summary. On Oct 3, 2006, at 7:47 AM, Bruno Oliveira wrote:
Hello all,
======================================================================= Approach: A Lightweight Implementation of Generics and Dynamics
Required features: - Haskell 98 + existential types
Usage:
- Library Writer: Responsible for the sums-of-products machinery, the representation data type(s) defined in the library, and for defining the class and standard instances for "Representable". The library writer needs to have deep knowledge of generic programming.
- Power User: Defines the boilerplate for new data types. Needs to be fairly familiar with the implementation.
- User: Defines generic functions by pattern matching on the structure of "Rep". This requires being comfortable with sums of products and isomorphisms. For convenience, the user should also provide a definition that uses the type class "Representable", which avoids having to pass the representations explicitly.
- End User: If "Representable" is used, then the end user can just use generic functions in the same way as any other ad-hoc (that is, type-class overloaded) functions.
I'm trying to get these roles straight in my head. I think we're going to have a similar division for many of the library proposals. A generic library has several important components: the "framework", or way of representing a particular type; instances of this framework for the primitive types; helper functions for creating instances for new types; and a base set of generic operations/support code to get things started. The "Library Writer" is responsible for all of these. A user of the generic library may want to do two things: (a) define new type-indexed functions, and (b) ensure that their datatypes can be used with type-indexed functions. In Bruno's terminology:
- Power User: does both of these
- User: does (b) but not (a)
- End User: does neither; just calls existing type-indexed functions on existing datatypes.
Note that we could imagine eliminating/automating (b) for some platforms with Template Haskell.
Extensibility: - Data types are non-extensible; however, we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution along these lines).
I think I prefer the term "Specializability" (even though it is more of a mouthful). Generic functions can be specialized at particular datatypes to specific operations. (LIGD does not support this behavior, but could be extended to do so by pushing RepLib through the GADT translation.)
Performance considerations: - Based on sums of products, and therefore involving considerable performance penalties.
Besides the runtime types and the sums-of-products explosion, there is also the cost of applying the identity-function type coercions (if they are not optimized away). Those at least would be eliminated by using a GADT. Aside: should we be developing a benchmark suite if we are actually going to be making decisions about performance? I don't want to guess here. Given that we will be collecting versions of several functions for each library, we should be able to run a sort of "library shootout". I also would like to know if there are some specific Haskell extensions that could dramatically improve performance. These extensions would not be portable, of course, but perhaps for specific platforms we could support optimized execution.
Expressibility: - Can do both producer and consumer functions. - Multiple representations needed for generic functions of different arities; - No local redefinitions.
I think we should separate the expressibility of the generic functions from expressibility with respect to what type information can index such functions. For the former, we talk about producer/consumer functions (read/show), functions of different arities (map, zip), functions defined by higher-order types (map, count, reduce), and functions requiring type-indexed types (the zipper, bit-packed arrays, generalized tries). For the latter, we talk about the ability to use the library with types that contain:
- records
- nested datatypes
- higher-kinded types (what kinds?)
- datatypes with existential components (both of kind type and higher kinds)
- datatypes with first-class polymorphic components (both of kind type and higher kinds)
- the above with class constraints
- GADTs (simple, and those that subsume existentials...)
- datatypes with higher-kinded type arguments
(I see that Johan also mentions this division in his email.) What do you mean by "local redefinitions"?
Boilerplate: - Isomorphisms between sums of products and data types - Instances of Representable - Smart constructors for representation of data types
Helpful extra features: - GADTs (already implemented in GHC) allow more direct definitions of generic functions, since there is no need to apply the type coercions explicitly;
- A boilerplate generation mechanism. This would effectively mean that power users would not be necessary.
- Open data types and functions in order for extensibility.
- Kind polymorphism, for eliminating the need for multiple representations;
I don't think that kind polymorphism would solve this problem---the multiple representations are needed for functions (like map) that have different arities of kind-indexed types. Kind polymorphism would help a little, but it wouldn't be completely useful without some form of kind-indexed types. (The representations of types with higher kinds are not parametric in those kinds, nor is the behavior of type-indexed operations on those representations.)
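To illustrate why different arities call for different representations, here is a hedged sketch of an arity-two representation in the LIGD style (all names hypothetical): generic map relates a source type to a target type, so the representation carries two indices, unlike the arity-one representation used for equality or show.

```haskell
{-# LANGUAGE GADTs #-}

-- Sketch: an arity-two representation relates a source and a target
-- type, which is what generic map needs. The RFun2 case carries the
-- function to apply at the "type variable" positions.
data Rep2 a b where
  RInt2  :: Rep2 Int Int
  RFun2  :: (a -> b) -> Rep2 a b
  RProd2 :: Rep2 a c -> Rep2 b d -> Rep2 (a, b) (c, d)
  RList2 :: Rep2 a b -> Rep2 [a] [b]

gmap :: Rep2 a b -> a -> b
gmap RInt2          x      = x
gmap (RFun2 f)      x      = f x
gmap (RProd2 ra rb) (a, b) = (gmap ra a, gmap rb b)
gmap (RList2 r)     xs     = map (gmap r) xs
```

For example, gmap (RList2 (RFun2 show)) maps show over a list; an arity-one Rep has no place to put the second type index, which is why LIGD needs a separate representation per arity.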
Discussion: An interesting approach which is quite lightweight and fairly easy to use (especially if we would not need to handle any of the boilerplate). It is a bit outdated because, with GADTs, its use can be slightly simplified. However, this also takes us further away from Haskell 98 or even Haskell' (since GADTs will not be there).
I think the case for GADTs in Haskell' would be strengthened if there were other Haskell compilers than GHC that implement GADTs. Does anyone know of any?
I think that LIGD is still a reference for a generic programming library that uses a data type for type representations and sums of products as a family of data types. A drawback may be performance, since we need to convert a data type into its isomorphic sum-of-products representation.
Cheers,
Bruno
Thanks, Stephanie

Stephanie,
What do you mean by "local redefinitions"?
I suppose it could be described as "scoped specializability"; see Chapter 8 of Andres' thesis [1]. Examples of local redefinitions in GH:

-- case-insensitive equality
let equals {| a |} c1 c2 = equals {| Char |} (toUpper c1) (toUpper c2)
in equals {| [a] |} "Hello" "hEllo"

-- length of the list
let size {| a |} x = 1
in size {| [a] |} [2, 3, 5, 7]

-- map toUpper
let map {| a |} = toUpper
in map {| [a] |} "Hello, world!"

-- flattens the list of lists
let collect {| a |} x = [x]
in collect {| [[a]] |} xss

-- unlines
let collect {| a |} s = s ++ "\n"
in collect {| [a] |} ["Hello", "World"]

Cheers, Stefan

----
[1] Andres Loeh. Exploring Generic Haskell. Ph.D. thesis, Utrecht University, 2004.

Hi there. A few remarks.
Usage:
- Library Writer: Responsible for the sums-of-products machinery, for the representation data type(s) to be defined in the library, and for defining the class and standard instances for "Representable". The library writer needs to have a deep knowledge of generic programming.
- Power User: Defines the boilerplate for new data types. Needs to be fairly familiar with the implementation.
- User: Defines generic functions by pattern matching on the structure of "Rep". This requires being comfortable with sums of products and isomorphisms. For convenience, the user should also provide a definition that uses the type class "Representable", which avoids having to pass representations explicitly.
- End User: If "Representable" is used, then the end user can just use generic functions in the same way as any other ad-hoc (that is, type-class overloaded) functions.
I'm trying to get these roles straight in my head. I think we're going to have a similar division for many of the library proposals.
A Generic library has several important components: the "framework" or way of representing a particular type, instances of this framework for the primitive types, helper functions for creating instances for new types, and a base set of generic operations/support code to get things started. The "Library writer" is responsible for all of these.
A user of the generic library may want to do two things: (a) define new type-indexed functions (b) ensure that their datatypes can be used with type-indexed functions In Bruno's terminology, - Power User: does both of these - User: does (b) but not (a) - End User: does neither, just calls existing type-indexed functions on existing datatypes.
I think that this is a good summary.
Note that we could imagine eliminating/automating (b) for some platforms with Template Haskell.
Is this a statement specific to LIGD? Using Template Haskell would in my opinion be a good argument for Template Haskell itself, but not very practical, because I have the feeling that it is a relatively unstable (in the sense of interface changes) extension of the Haskell language. Also, it is definitely not going to be in Haskell', and its use wouldn't really be optional, because TH-generated code can (as far as I know) so far not be exported and saved as a normal Haskell module. Using a tool like DrIFT would have the advantage that people not having that tool available could still make full use of generic programs. They'd just have more work.
Extensibility: - Data types are non-extensible, however we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution like this).
I think I prefer the term "Specializability" (even though it is more of a mouthful.) Generic functions can be specialized at particular datatypes to specific operations. (LIGD does not support this behavior, but could be extended to do so by pushing RepLib through the GADT translation.)
The phrase "specialization" makes sense if you consider generic functions that in principle work on all data types, where we are now trying to assign a new, specific behaviour to the function for one data type. But what if we have defined a function in the style of a generic programming approach such as LIGD that does not have a catch-all case and only works for a limited number of data types? That'd correspond to an ad-hoc overloaded function. You might still want to extend such functions to new data types. And in this case, I think it's better to call the function "extensible" or "open".
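The "open" reading can be illustrated with a plain type-class encoding, where each case of the function is an instance and users extend it to new datatypes by adding instances. This is only an illustrative sketch of openness (all names are mine), not RepLib's actual interface:

```haskell
-- Sketch of an "open" generic-style function encoded with a type class:
-- users extend it to new datatypes by adding instances, and could
-- likewise override (specialize) the behaviour at an existing type.
class GSize a where
  gsize :: a -> Int

instance GSize Int  where gsize _ = 1
instance GSize Char where gsize _ = 1

instance GSize a => GSize [a] where
  gsize = sum . map gsize

-- Extending the function to a new datatype is just another instance:
data Tree a = Leaf | Node (Tree a) a (Tree a)

instance GSize a => GSize (Tree a) where
  gsize Leaf         = 0
  gsize (Node l x r) = gsize l + gsize x + gsize r
```

The price, of course, is that each case is dispatched by type name rather than derived uniformly from type structure, which is exactly the tension with datatype-genericity discussed below.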
Performance considerations: - Based on sums of products, and therefore involving considerable performance penalties.
Besides the runtime types and sums-of-products explosion, there is also the issue of applying the identity function type coercions (if they are not optimized away). Those at least would be eliminated by using a GADT.
Isn't "explosion" a bit exaggerated, or am I missing something? I think the main overhead of the sums-of-products representation lies in the fact that it is different from the representation Haskell usually has for a value. So, at least in principle, every value has to actually be converted twice in order to apply a generic function, and optimizing this is tricky, especially if we want to do it in Haskell, without applying special knowledge about the application domain.
Aside: should we be developing a benchmark suite if we are actually going to be making decisions about performance? I don't want to guess here. Given that we will be collecting versions of several functions for each library, we should be able to run a sort of "library shootout".
Yes, I think that would be a very good idea.
I also would like to know if there are some specific Haskell extensions that could dramatically improve performance. These extensions would not be portable, of course, but perhaps for specific platforms we could support optimized execution.
Maybe RULES pragmas could help, but it would almost certainly also imply writing the conversion functions in a specific style.
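A hedged sketch of what such a RULES-based approach might look like, with hypothetical conversion functions between lists and a structural view:

```haskell
-- Sketch: if the isomorphism between a datatype and its structural
-- representation is exposed through named conversion functions, a GHC
-- RULES pragma can cancel the round trip. Names are hypothetical.
data ListS a = NilS | ConsS a (ListS a)

toStruct :: [a] -> ListS a
toStruct []       = NilS
toStruct (x : xs) = ConsS x (toStruct xs)

fromStruct :: ListS a -> [a]
fromStruct NilS          = []
fromStruct (ConsS x xs)  = x : fromStruct xs

{-# RULES "fromStruct/toStruct" forall x. fromStruct (toStruct x) = x #-}
```

Note that such rules only fire during optimization, and the conversion functions must not be inlined away before the rule can match (e.g. via phase-controlled INLINE/NOINLINE pragmas) --- which is exactly the kind of "specific style" constraint mentioned above.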
Expressibility: - Can do both producer and consumer functions. - Multiple representations needed for generic functions of different arities; - No local redefinitions.
I think we should separate expressibility of the generic functions from expressibility with respect to what type information can index such functions.
For the former, we talk about producer/consumer functions (read/show), functions of different arities (map, zip), functions defined by higher-order types (map, count, reduce), and functions requiring type-indexed types (the zipper, bit-packed arrays, generalized tries).
For the latter, we talk about the ability to use the library with types that contain:
- records
- nested datatypes
- higher-kinded types (what kinds?)
- datatypes with existential components (both of kind type and higher kinds)
- datatypes with first-class polymorphic components (both of kind type and higher kinds)
- the above with class constraints
- GADTs (simple, and those that subsume existentials...)
- datatypes with higher-kinded type arguments
Yes, that is a good idea as well. I will have another look at LIGD to find out how that fits in these categories.
What do you mean by "local redefinitions"?
Stefan has already given an answer. I think Bruno mentions this here because "local redefinitions" are a bit like having functions of different arities.
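Stefan's first example (case-insensitive equality) can be approximated in a LIGD-style library by giving the representation a case that carries a locally chosen function. The following is a sketch with hypothetical names, not code from the LIGD paper:

```haskell
{-# LANGUAGE GADTs #-}
import Data.Char (toUpper)

-- Sketch of simulating GH's local redefinition in a LIGD-style library:
-- RWith carries a locally chosen equality, playing the role of the
-- redefined {| a |} case. All names are hypothetical.
data Rep t where
  RChar :: Rep Char
  RList :: Rep a -> Rep [a]
  RWith :: (t -> t -> Bool) -> Rep t   -- locally redefined case

geq :: Rep t -> t -> t -> Bool
geq RChar      c1 c2 = c1 == c2
geq (RWith eq) x  y  = eq x y
geq (RList r)  xs ys =
  length xs == length ys && and (zipWith (geq r) xs ys)

-- case-insensitive equality, analogous to Stefan's first GH example
ciEquals :: String -> String -> Bool
ciEquals = geq (RList (RWith (\c1 c2 -> toUpper c1 == toUpper c2)))
```

The RWith case is, in effect, an extra function argument threaded through the representation, which is also how the higher-arity instances of a generic function carry their per-variable behaviour --- hence the analogy with arities.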
Boilerplate: - Isomorphisms between sums of products and data types - Instances of Representable - Smart constructors for representation of data types
Helpful extra features: - GADTs (already implemented in GHC) allow more direct definitions of generic functions, since there is no need to apply the type coercions explicitly;
- A boilerplate generation mechanism. This would effectively mean that power users would not be necessary.
- Open data types and functions in order for extensibility.
- Kind polymorphism, for eliminating the need for multiple representations;
I don't think that kind polymorphism would solve this problem---the multiple representations are needed for functions (like map) that have different arities of kind-indexed types.
Kind polymorphism would help a little, but it wouldn't be completely useful without some form of kind-indexed types. (The representations of types with higher kinds are not parametric in those kinds, nor is the behavior of type-indexed operations on those representations.)
I think that's true. Nevertheless, kind polymorphism would be a necessary first step to do these things.
Discussion: An interesting approach which is quite lightweight and fairly easy to use (especially if we would not need to handle any of the boilerplate). It is a bit outdated because, with GADTs, its use can be slightly simplified. However, this also takes us further away from Haskell 98 or even Haskell' (since GADTs will not be there).
I think the case for GADTs in Haskell' would be strengthened if there were other Haskell compilers than GHC that implement GADTs. Does anyone know of any?
I haven't heard of any plans. The next HCAR is going to happen soon, so maybe I'll get a report from one of the implementation teams that tells me they're working on it. Cheers, Andres

Hello, Before any comments, just a reminder that there is an updated summary for LIGD at the Haskell Wiki http://www.haskell.org/haskellwiki/Libraries_and_tools/Generic_programming/L... where I merged in some of Johan Jeuring's and James Cheney's previous comments.
I'm trying to get these roles straight in my head. I think we're going to have a similar division for many of the library proposals.
A Generic library has several important components: the "framework" or way of representing a particular type, instances of this framework for the primitive types, helper functions for creating instances for new types, and a base set of generic operations/support code to get things started. The "Library writer" is responsible for all of these.
A user of the generic library may want to do two things: (a) define new type-indexed functions (b) ensure that their datatypes can be used with type-indexed functions In Bruno's terminology, - Power User: does both of these - User: does (b) but not (a) - End User: does neither, just calls existing type-indexed functions on existing datatypes.
I think that this is a good summary.
Right, I think this is useful too, I added this information to the template page: http://www.haskell.org/haskellwiki/Libraries_and_tools/Generic_programming/T... For now it is almost just a copy and paste from Stephanie's text in the template page. But I think eventually, we should create a separate page explaining this information about the users.
Note that we could imagine eliminating/automating (b) for some platforms with Template Haskell.
Is this a statement specific to LIGD? Using Template Haskell would in my opinion be a good argument for Template Haskell itself, but not very practical, because I have the feeling that it is a relatively unstable (in the sense of interface changes) extension of the Haskell language. Also, it is definitely not going to be in Haskell', and its use wouldn't really be optional, because TH-generated code can (as far as I know) so far not be exported and saved as a normal Haskell module.
Using a tool like DrIFT would have the advantage that people not having that tool available could still make full use of generic programs. They'd just have more work.
Like Andres, I would also be reluctant to say that TH is the best solution for (b). DrIFT has the advantage that it is not tied to GHC, and I guess it is relatively stable. Another alternative is that, provided we come up with some standard library for generic programming in Haskell, we can just propose some lightweight extension for eliminating/automating (b). That's what happens already in SyB and Derivable Type Classes.
Extensibility: - Data types are non-extensible, however we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution like this).
I think I prefer the term "Specializability" (even though it is more of a mouthful.) Generic functions can be specialized at particular datatypes to specific operations. (LIGD does not support this behavior, but could be extended to do so by pushing RepLib through the GADT translation.)
The phrase "specialization" makes sense if you consider generic functions that in principle work on all data types, where we are now trying to assign a new, specific behaviour to the function for one data type. But what if we have defined a function in the style of a generic programming approach such as LIGD that does not have a catch-all case and only works for a limited number of data types? That'd correspond to an ad-hoc overloaded function. You might still want to extend such functions to new data types. And in this case, I think it's better to call the function "extensible" or "open".
Yes, I think I would associate "specialization" with a form of partial evaluation that takes a generic function and produces a specialized function for a specific type. This is not the same as adding an ad-hoc case to a generic function and using the same term for the two may lead to some confusion.
Aside: should we be developing a benchmark suite if we are actually going to be making decisions about performance? I don't want to guess here. Given that we will be collecting versions of several functions for each library, we should be able to run a sort of "library shootout".
Yes, I think that would be a very good idea.
Right, this calls for a new Wiki page and perhaps some volunteer for the task. Anyone eager to collect generic functions, develop performance cases, and report the results on the Wiki :)?
I also would like to know if there are some specific Haskell extensions that could dramatically improve performance. These extensions would not be portable, of course, but perhaps for specific platforms we could support optimized execution.
Maybe RULES pragmas could help, but it would almost certainly also imply writing the conversion functions in a specific style.
Expressibility: - Can do both producer and consumer functions. - Multiple representations needed for generic functions of different arities; - No local redefinitions.
I think we should separate expressibility of the generic functions from expressibility with respect to what type information can index such functions.
For the former, we talk about producer/consumer functions (read/show), functions of different arities (map, zip), functions defined by higher-order types (map, count, reduce), and functions requiring type-indexed types (the zipper, bit-packed arrays, generalized tries).
For the latter, we talk about the ability to use the library with types that contain:
- records
- nested datatypes
- higher-kinded types (what kinds?)
- datatypes with existential components (both of kind type and higher kinds)
- datatypes with first-class polymorphic components (both of kind type and higher kinds)
- the above with class constraints
- GADTs (simple, and those that subsume existentials...)
- datatypes with higher-kinded type arguments
Yes, that is a good idea as well. I will have another look at LIGD to find out how that fits in these categories.
Ok, would you like to report the results in the Wiki page? There is already a section for the subset of types covered. And I added Stephanie's List to the template page. Please feel free to edit the LIGD Wiki page and add your own comments. Cheers, Bruno Oliveira

Hi all, I have a few replies to Andres' comments:
Extensibility: - Data types are non-extensible, however we can try to combine the approach with type classes to get some form of extensibility (RepLib proposes one solution like this).
I think I prefer the term "Specializability" (even though it is more of a mouthful.) Generic functions can be specialized at particular datatypes to specific operations. (LIGD does not support this behavior, but could be extended to do so by pushing RepLib through the GADT translation.)
The phrase "specialization" makes sense if you consider generic functions that in principle work on all data types, where we are now trying to assign a new, specific behaviour to the function for one data type. But what if we have defined a function in the style of a generic programming approach such as LIGD that does not have a catch-all case and only works for a limited number of data types? That'd correspond to an ad-hoc overloaded function. You might still want to extend such functions to new data types. And in this case, I think it's better to call the function "extensible" or "open".
I disagree here. Generic functions are not the same as ad-hoc overloaded functions because of datatype-genericity. This allows them to be defined by the structure of the datatype instead of its name, and is (I would argue) the key difference between generic programming and ad-hoc overloading, such as with type classes. Even though datatype genericity may not extend to all datatypes (such as those that contain functions, or those that haven't been represented), there is a large class of common types that it does apply to.

Part of the confusion is that it is often the same mechanism solving two different problems. (a) Generic functions apply *uniformly* based on type structure. (b) Generic functions apply to many *but not all* types.

Sometimes we would like a generic function to behave specially for a particular type in its domain. For example, newtype Set = S [Int] should use set equality, not list equality. Sometimes we would like to extend a generic function to a type not in its domain. For example, in newtype Set = S ([Int], Int -> Bool), perhaps the function doesn't matter and we can define equality just by looking at the integers. In both cases, generic equality for types that contain Set should use the user-specified version of equality.

Is there a framework that solves one problem but not the other? Or that solves the two problems with two different mechanisms? Personally, I think problem (a) occurs much more often than (b), which is why I prefer "specializability".
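Problem (a) can be made concrete with the Set example: structural equality on the underlying list gives the wrong answer, while the specialized version compares the represented sets. A small sketch (the particular equality used here is illustrative):

```haskell
import Data.List (nub, sort)

-- Illustrating case (a): a type with an abstraction invariant for
-- which the purely structural behaviour of a generic function is wrong.
newtype Set = S [Int]

-- structural (list) equality distinguishes S [1,2] from S [2,1] ...
listEq :: Set -> Set -> Bool
listEq (S xs) (S ys) = xs == ys

-- ... while the intended, specialized equality compares the sets,
-- ignoring order and duplicates
setEq :: Set -> Set -> Bool
setEq (S xs) (S ys) = sort (nub xs) == sort (nub ys)
```

A generic equality that traverses a datatype containing Set should pick up setEq at the Set positions, not the structural listEq --- that is what specializability asks for.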
Performance considerations: - Based on sums of products, and therefore involving considerable performance penalties.
Besides the runtime types and sums-of-products explosion, there is also the issue of applying the identity function type coercions (if they are not optimized away). Those at least would be eliminated by using a GADT.
Isn't "explosion" a bit exaggerated, or am I missing something?
Sorry. That was an obscure analogy :-). In SML, strings are arrays of characters, and the function "explode" converts a string to a list of characters. I was thinking of the sum-of-products view as converting constructor arguments from a packed tuple (that cannot be accessed directly) into an accessible list of components.
I think the main overhead of the sums-of-products representation lies in the fact that it is different from the representation Haskell usually has for a value. So, at least in principle, every value has to actually be converted twice in order to apply a generic function, and optimizing this is tricky, especially if we want to do it in Haskell, without applying special knowledge about the application domain.
When you say "application domain" what do you mean? I think that if we make sure to use special purpose types to store the arguments of the constructor in the view and provide a library of operations for those types, then (perhaps) a smart Haskell compiler could optimize things under the covers.
What do you mean by "local redefinitions"?
Stefan has already given an answer. I think Bruno mentions this here because "local redefinitions" are a bit like having functions of different arities.
Sorry if I'm being dense, but I still don't see the connection between "local redefinitions" and different arities.
I think the case for GADTs in Haskell' would be strengthened if there were other Haskell compilers than GHC that implement GADTs. Does anyone know of any?
I haven't heard of any plans. The next HCAR is going to happen soon, so maybe I'll get a report from one of the implementation teams that tells me they're working on it.
Great! Thanks, Stephanie
participants (9)
-
Andres Loeh
-
Bruno Oliveira
-
Bulat Ziganshin
-
James Cheney
-
Johan Jeuring
-
Manuel M T Chakravarty
-
Simon Peyton-Jones
-
Stefan Holdermans
-
Stephanie Weirich