
On 06/03/06, Johannes Waldmann wrote:
With respect to the discussion on records, let me throw in my usual warning: all of this seems overly obsessed with concrete representations of data types.
We already have mechanisms for abstraction. There's a gap in our ability to form certain concrete representations we might want. This paper simply describes how to add those representations to the language in a nice way.
The representation should not be exposed in the first place: you don't want to access it (=> make all fields private), and you don't want to extend it (=> implementation inheritance is bad, interface inheritance is good). (Read e.g. Introduction to Design Patterns by Gamma et al.)
You think it is a win to be able to write a function that takes "everything that has a foo :: Foo component"? I think it is not, since it is not robust design. It will only take records, and components have to be components. What if you later change the type's representation from a record to something else? If you change the component to a function? If you want a reliable notion of "everything that has a foo :: Foo", then you need to declare an interface (erm, one parameter type class).
Well, changing data representations is always inflexible. This isn't a new problem, and as you mentioned, you can still fix it with the use of typeclasses.
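To make the typeclass route concrete, here is a minimal sketch of such an interface. All of the names (Foo, HasFoo, Plain, Computed, bumpFoo) are hypothetical and chosen only for illustration. It shows client code surviving a change from a stored field to a computed value.

```haskell
-- Hypothetical names throughout: Foo, HasFoo, Plain, Computed, bumpFoo.
newtype Foo = Foo Int deriving (Eq, Show)

-- The "interface": anything that can produce and update a Foo.
class HasFoo a where
  getFoo :: a -> Foo
  setFoo :: Foo -> a -> a

-- First representation: foo is stored as a labelled field.
data Plain = Plain { plainFoo :: Foo, plainTag :: String }

instance HasFoo Plain where
  getFoo = plainFoo
  setFoo v p = p { plainFoo = v }

-- Second representation: foo is computed from other data.
newtype Computed = Computed [Int]

instance HasFoo Computed where
  getFoo (Computed xs) = Foo (sum xs)
  setFoo (Foo n) _ = Computed [n]

-- Client code depends only on the class, not on either representation.
bumpFoo :: HasFoo a => a -> a
bumpFoo r = let Foo n = getFoo r in setFoo (Foo (n + 1)) r
```

The cost is one class and one instance per type, which is exactly the boilerplate being argued about in this thread.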
My point is that the OO community has learned all this stuff the hard way (from software problems arising from naive use of objects and inheritance), and it has taken them years, if not decades, and now it looks as if we are going to joyfully repeat this whole process.
Large product types normally indicate an awkward design, yes, but they're still implicit in many real-world interfaces, and it can be quite difficult to deal with them. This gives nice ways to break them up and work with them where they naturally occur.
An important selling point of the records proposal seems to be that you don't have to declare a type name for a record type. While I don't buy this whole idea (we have a declarative programming language but we want to avoid (type) declarations?) I see a concrete problem: what if you want to make such a nameless type an instance of some type class? Then we get all sorts of overlappings.
Well, I don't know about that. You don't have to declare a type name simply because all the types here already exist. You can still newtype them. However, not all record types are polymorphic. Declaring instances for completely specified rows would not be an issue. It's not clear to me that having instances for polymorphic records would be too much of an issue either. Yes, it would be easy to get overlaps, but not much more so than with existing polymorphic types. If there's more than one polymorphic instance, then of course you get overlap, because you can construct a record type with the union of the labels from the two instances. However, are multiple row-polymorphic instances even needed? Since a record could very well satisfy both predicates in any situation like that, if you needed multiple instances, it would be better to newtype as usual.
So with respect to the original post (see the subject of this email) I tend to agree: leave records as they are. Of course they are problematic, but the main reason is not missing extensibility.
Well, the issue is just that Haskell does not actually have a record system. It has algebraic types, and while those can emulate certain aspects of records, they are not the same thing. The current "record syntax" is just syntax sugar for labelling the fields of a product in an algebraic type. It's nice syntax sugar, and I wouldn't want to get rid of it. (Though it could perhaps do with a renaming :)
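A small sketch of what "just syntax sugar" means in practice. The type T and its field types here are placeholders made up for the example. The labelled and positional forms build the same value, and pattern matching can ignore the labels entirely.

```haskell
-- Hypothetical example type; the field types are placeholders.
data T = T { x :: Int, y :: Bool, z :: String }
  deriving (Eq, Show)

-- Built with field labels.
labelled :: T
labelled = T { x = 1, y = True, z = "c" }

-- Built positionally; this is the same value as 'labelled'.
positional :: T
positional = T 1 True "c"

-- Pattern matching can also bypass the labels.
second :: T -> Bool
second (T _ b _) = b
```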
As I see it, the problem is that the named component notation was added late and still allows access to the earlier positional notation, and the component names are in the (module-global) namespace.
The problem is that people see "record syntax" and think that somehow what they're declaring is any different from an ordinary product. The syntax gives you a little more capacity for dealing with more fields, and a little bit of future proofing, but not much more, and really it's the same thing, as the ability to use the positional notation indicates.

Even with syntax sugar, using large product types in current Haskell is poor design. I'll illustrate one of the main reasons for this, and how extensible records can help fix that problem:

Suppose that A, B, and C are types and that we have:

    data T = T {x :: A, y :: B, z :: C}

which we're trying to use to simulate a record type. Then any function f :: T -> T has the ability to read and depend on all the components of the T which it is working with. There are many cases where this is completely inappropriate, but restricting access to one or more of the components is difficult. We'd need to define a new typeclass with get/set functions, and use that instead. Doing this sort of thing for every one of the fields of every product one uses is obviously not a good solution.

On the other hand, a function:

    f :: {x :: A, y :: B | r} -> {x :: A, y :: B | r}

obviously can only depend and act on the x and y components, and is not allowed to touch z at all. Sure, you might perhaps say that there's too much polymorphism there, but this usually isn't an issue, and there are still newtypes to tag things and ensure that they don't get into the wrong parts of the program. Record types would also be permitted as members of algebraic data types.

More flexible systems than just using products as records are possible using typeclasses with label types like HList, but these generally involve quite a lot of typeclass hackery which, while it's nice to see that it can be done, at some point begins to feel like an abuse of the system, when one could do a better treatment at the compiler level.
Such systems still wouldn't have properties as nice as the record system in the paper. (There is no provision for associative or commutative data/type constructors.) A related issue is that these tend to be closer in performance to association lists, which means that while extension is fast, record selection is linear time.
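Going back to the T example: the get/set typeclass workaround can be sketched in current Haskell roughly as follows. The classes HasX and HasY and the concrete definitions of A, B and C are assumptions made up for this illustration. The point is how much has to be written by hand to approximate what the row-polymorphic type would state directly.

```haskell
-- A, B and C stand in for the unspecified types in the text.
newtype A = A Int deriving (Eq, Show)
newtype B = B Int deriving (Eq, Show)
newtype C = C String deriving (Eq, Show)

data T = T { x :: A, y :: B, z :: C } deriving (Eq, Show)

-- Hypothetical classes: one get/set pair per field we want to expose.
class HasX r where
  getX :: r -> A
  setX :: A -> r -> r

class HasY r where
  getY :: r -> B
  setY :: B -> r -> r

instance HasX T where
  getX = x
  setX a t = t { x = a }

instance HasY T where
  getY = y
  setY b t = t { y = b }

-- Because f is polymorphic in r, it can only go through the class
-- methods: it has no way to inspect or change the z component.
f :: (HasX r, HasY r) => r -> r
f r = setX (A (n + m)) r
  where
    A n = getX r
    B m = getY r
```

Every further field, and every further product type, needs another class or another instance, which is the repetition objected to above.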
This would be more tolerable if we had ad-hoc overloading. Since we don't, I'm now basically putting each data declaration in a separate module and importing these qualified. (This simulates the "per-type" namespace for components.)
I think that ad-hoc overloading would be much more intolerable. In some cases the design you describe (separating a data type into a module) is appropriate, but I wouldn't hold myself to it. Usually I'd only use that if I planned to hide the constructors. A lot of the time, field labels can be renamed such that they don't overlap. Inventing new names is not hard work. (You can just put part or all of the type name in the labels, and you get basically the same effect as the module system gives you.)

 - Cale