A backwards-compatible record proposal

19 Aug 2006

      Hi -
As I've been writing a Haskell program over the past few months the main 
problem I encounter is that record field names are not local to the record 
type, and any systematic way of making them local (eg by prepending 
"_Tycon_") results in names that are just too clunky, and I feel that 
identifiers *must* use systematic naming conventions to get code that will 
be easy to understand and maintain.

Although it is relatively easy to think up a better record system, it has 
taken me up till now to discover how such a system could be integrated with 
Haskell as it is at the moment, since it's unlikely that the existing record 
system will ever disappear, at least not in the next few years, and in fact, 
there are some good points about the existing record system that I wouldn't 
like to lose.

I'll motivate the proposal then talk about how it could be implemented.

Motivation
=======

Consider the following:

    data Vector3 a = Vector3{x :: a, y::a,  z::a }

    data Normal3 a = Normal3{x :: a, y::a, z::a }

We've got a problem because (x) has been introduced twice to the top level 
namespace. In the above example, it could be argued that I should have 
written:

    data Arr3 a = Arr3 {x::a, y::a, z::a}

    newtype Vector3 a = Vector3 (Arr3 a)
    newtype Normal3 a = Normal3 (Arr3 a)

but for the sake of argument, let's suppose we can't do this - perhaps one 
of the record types has some different fields as well.

A related problem is suppose I have:

    data Size = {width :: Int, height :: Int}

    data Rect = Rect{x1, y1, x2, y2::Int}

    width :: Rect -> Int
    width Rect{x1,_,x2,_} = x2 - x1

because a record field of Size has the same name as a top level function.

This second conflict can be avoided by always using the rule that record 
fields begin with an underscore and all other variables don't.

So applying this to the first problem, we have:

    data Vector3 a = Vector3{_x :: a, _y::a,  _z::a }
    data Normal3 a = Normal3{_x :: a, _y::a, _z::a }

Of course we haven't solved it yet!

But now suppose we introduce a new piece of syntactic sugar, and write:

    data Vector3 a = Vector3{.x :: a, .y::a,  .z::a }
    data Normal3 a = Normal3{.x :: a, .y::a, .z::a }

ie putting a '.' before each field name. The intended meaning is that dotted 
field names do *not* generate top level functions. Instead they allow the 
compiler to generate instance decls as follows, where we've introduced a new 
form of identifier, the dotted id, which behaves as a postfix operator which 
binds more tightly than function application and can also be used as a class 
name (by the compiler only):

    class (.x) :: a b | a -> b where
        (.x) :: a -> b

    class (.y) :: a b | a -> b where
        (.y) :: a -> b

    class (.z) :: a b | a -> b where
        (.z) :: a -> b

For each dotted id, there is a class defined as above, which is available 
globally to the whole program as if a module containing an infinite set of 
class decls as above was exported by the Prelude.

In the module containing the data decl for the record, the compiler inserts 
the following:

    instance (.x) (Vector3 a) a where
        (.x) v = ... -- compiler generated code to access the field

Then within the rest of the program we can write:

    magSquared :: Num a => Vector3 a -> a
    magSquared v = v.x*v.x + v.y*v.y + v.z*v.z

    -- explicit type when specific function is required
    vec_x = ((.x) :: (Vector3 a -> a))

The advantage of this proposal is that it is completely backwards compatible 
with records as they are at the moment, and we can choose which fields we 
want to be dotted and which we want to just keep as normal top level 
functions.

The only extra thing we need to do is put a dot before the field names we 
want to access via the dotted syntax, but we in any case needed to use an 
underscore when we wanted a systematic way to avoid conflicts between field 
names and other top-level names so there is no extra effort involved.

Implementation
==========

The above could almost be implemented just by parsing a source file 
containing uses of dotted fields and using a conversion like:

    data Rec a = Rec { .f :: a}

    let
        rec = Rec {.f = 78}    -- dot is used here too
        p = rec.f * rec.f

==>

    data Rec a = Rec a

    instance Dot__f (Rec a) a where
        __dot_f (Rec a) = a

    let
        rec = Rec 78
        p = __dot_f rec * __dot_f rec

So far so good, but the alert reader :-) will have noticed that we now have 
a *major* problem with abstraction because although we can write:

    module M (Rec) where ...

code in another module can still say rec.f because in Haskell, all instance 
decls in a module are always exported [1]. This problem may disappear in 
Haskell' [2]. http://hackage.haskell.org/trac/haskell-prime/ticket/19

Therefore I think the desugaring would need to take place in the compiler so 
the compiler could avoid exporting the compiler-generated instances when the 
fields are not present in the module export list.

Regards, Brian.

[1] http://haskell.org/onlinereport/modules.html#import-instances
[2] http://hackage.haskell.org/trac/haskell-prime/ticket/19
-- 
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.

http://www.metamilk.com

Brian Hulley

Brian Hulley

Bernard James POPE

Brian Hulley

Brian Hulley

Brian Hulley

Gene A

Brian Hulley

Bulat Ziganshin

Brian Hulley

tags

participants (4)