Re: Records in Haskell

8 Jan 2012

2012/1/8 Greg Weber:
...
2012/1/8 Gábor Lehel 
...
...
...
Later on you write that the names of record fields are only accessible
from the record's namespace and via record syntax, but not from the
global scope. For Haskell I think it would make sense to reverse this
decision. On the one hand, it would keep backwards compatibility; on
the other hand, Haskell code is already written to avoid name clashes
between record fields, so it wouldn't introduce new problems. Large
gain, little pain. You could use the global-namespace function as you
do now, at the risk of ambiguity, or you could use the new record
syntax and avoid it. (If you were to also allow x.n syntax for
arbitrary functions, this could lead to ambiguity again... you could
solve it by preferring a record field belonging to the inferred type
over a function if both are available, but (at least in my current
state of ignorance) I would prefer to just not allow x.n for anything
other than record fields.)
Perhaps you can give some example code for what you have in mind - we do
need to figure out the preferred technique for interacting with old-style
records. Keep in mind that for new records the entire point is that they
must be name-spaced. A module could certainly export top-level functions
equivalent to how records work now (we could have a helper that generates
those functions).
Let's say you have a record.
data Record = Record { field :: String }
In existing Haskell, you refer to the accessor function as 'field' and
to the contents of the field as 'field r', where 'r' is a value of
type Record. With your proposal, you refer to the accessor function as
'Record.field' and to the contents of the field as either
'Record.field r' or 'r.field'. The point is that I see no conflict or
drawback in allowing all of these at the same time. Writing 'field' or
'field r' would work exactly as it already does, and be ambiguous if
there is more than one record field with the same name in scope. In
practice, existing code is already written to avoid this ambiguity so
it would continue to work. Or you could write 'Record.field r' or
'r.field', which would work as the proposal describes and remove the
ambiguity, and work even in the presence of multiple record fields
with the same name in scope.
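
To make the existing behaviour concrete, here is a small sketch in today's
Haskell. The helper names (contents, allContents, qualifiedContents) are made
up for illustration. The module-qualified 'Main.field' is only the nearest
analogue available today; it is not the record-namespaced 'Record.field' of
the proposal.

```haskell
-- The record from the discussion above.
data Record = Record { field :: String }

-- 'field' is an ordinary top-level function of type Record -> String.
contents :: Record -> String
contents r = field r

-- Being a plain function, it can be passed to other functions.
allContents :: [Record] -> [String]
allContents = map field

-- A file without a module header is module Main, so the same accessor
-- can also be named with a module qualifier. This is an analogy only:
-- the proposal qualifies by the record type, not by the module.
qualifiedContents :: Record -> String
qualifiedContents r = Main.field r

main :: IO ()
main = do
  putStrLn (contents (Record { field = "hello" }))
  print (allContents [Record "a", Record "b"])
  putStrLn (qualifiedContents (Record "qualified"))
```

The unqualified and qualified names refer to the same function and coexist
without trouble, which is the situation I am arguing should carry over to
the namespaced record syntax.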
The point is that I see what you gain by allowing record fields to be
referred to in a namespaced way, but I don't see what you gain by not
allowing them to be referred to in a non-namespaced way. In theory you
wouldn't care because the non-namespaced way is inferior anyways, but
in practice because all existing Haskell code does it that way, it's
significant.
My motivation for this entire change is simply to be able to use two records
with field members of the same name. This requires *not* generating
top-level functions to access record fields. I don't know if there is a
valid use case for the old top-level functions once switched over to the new
record system (other than your stated personal preference). We could
certainly have a pragma or something similar that generates top-level
functions even if the new record system is in use.
Oh, in a sense you're right. If the top-level accessor functions are
treated as if they were defined by the module containing the record,
and there is more than one with the same name, the compiler would see
it as multiple definitions and indeed report an error. On the other
hand if they are treated as imported names (conceptually, implicitly
imported from the namespace of the record, say), then the compiler
would only report an error when you actually try to use the ambiguous
name. I had been assuming the latter case without realizing it. It
corresponds to what you have now if you have multiple records imported
with overlapping field names.

Again, exporting the field accessors to global scope and deferring any
errors from ambiguity or overlap to the point of their use would not
in any way interfere with the use of those same field accessors with
the namespaced syntax. If you only use the namespaced syntax, it would
work exactly as in your proposal: the top-level accessors are never
used so no ambiguity errors are reported. If you only use the
top-level syntax, then it works almost exactly as Haskell currently
does (except you can define multiple records with overlapping field
names in the same module as long as you don't use them, which I had
not considered). The set of well-formed programs if you allow
top-level access would be almost a superset of the set of well-formed
programs if you don't. (The exception is that top-level field
accessors would conflict with non-accessor plain old functions of the
same name, whereas if they weren't visible outside of the record's
namespace they wouldn't, but I don't feel like that's a huge concern.)
...
...
...
...
All of that said, maybe having TDNR with bad syntax is preferable to
not having TDNR at all. Can't it be extended to the existing syntax
(of function application)? Or at least some better one, which is ideally
right-to-left? I don't really know the technical details...
Generalized data-namespaces: I think I'm also opposed. This would import
the problem from OO languages where functions written by the module
(class) author get to have a distinguished syntax (be inside the
namespace) over functions by anyone else (which don't).
Maybe you can show some example code? To me this is about controlling
exports of namespaces, which is already possible - I think this is
mostly a matter of convenience.
If I'm understanding correctly, you're suggesting we be able to write:
data Data = Data Int where
   twice (Data d) = 2 * d
   thrice (Data d) = 3 * d
   ...
and that if we write 'let x = Data 7 in x.thrice' it would evaluate to
21. I have two objections.
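
For comparison, a sketch of how the same thing is written in today's Haskell,
with 'twice' and 'thrice' as ordinary top-level functions. The '&' operator
from Data.Function is used here only as a stand-in for the proposed 'x.thrice'
notation (value first, function second); it is not part of the proposal.

```haskell
import Data.Function ((&))

data Data = Data Int

-- Ordinary top-level functions instead of functions inside a data namespace.
twice :: Data -> Int
twice (Data d) = 2 * d

thrice :: Data -> Int
thrice (Data d) = 3 * d

main :: IO ()
main = do
  let x = Data 7
  print (thrice x)     -- usual application, function first
  print (x & thrice)   -- value first, roughly how 'x.thrice' would read
```

Both lines print 21.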
The first is the same as with the TDNR proposal: you would have both
code that looks like
'data.firstFunction.secondFunction.thirdFunction', as well as the
existing 'thirdFunction $ secondFunction $ firstFunction data' and
'thirdFunction . secondFunction . firstFunction $ data', and if you
have both of them in the same expression (which you will) it becomes
unpleasant to read because you have to read them in opposite
directions.
This would not be possible because the functions can only be accessed from
the namespace - you could only use the dot (or T.firstFunction). It is
possible as per your complaint below:
Sorry, I was unclear here. The firstFunction, secondFunction, and
thirdFunction in my examples are *not* referring to the very same
firstFunction, secondFunction, and thirdFunction, they are all
placeholders for arbitrary functions.

My problem is that you could (and would have to, because the syntaxes
aren't interchangeable) write things like this:

foo . bar . (baz.quux.asdf) . wasd $ hjkl

Now what's the right order for reading the functions in this
expression? The correct answer is:

hjkl wasd baz quux asdf bar foo

or using numbers to denote their place:

7 6 3 4 5 2 1

If you had written the equivalent using existing Haskell syntax it would be:

foo . bar . (asdf $ quux baz) . wasd $ hjkl

and the right order for reading it is:

hjkl wasd baz quux asdf bar foo

or with numbers:

7 6 5 4 3 2 1

If you introduce heavy use of the a.b.c.d syntax you would frequently
have to jump around and switch directions while you read an
expression. If you restrict it to only field accessors I think it
would be limited and tolerable; my quarrel is with allowing arbitrary
functions (whether by TDNR or data-namespacing), in which case you
would likely as not end up with half of the functions going one way and
the other half going the other.
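
Here is a runnable sketch of the two expressions above. The definitions are
arbitrary placeholders chosen only so that the types line up, and '&' from
Data.Function stands in for the proposed left-to-right dot syntax, since
'baz.quux.asdf' is not valid in that sense today.

```haskell
import Data.Function ((&))

-- Placeholder definitions; only the shapes of the types matter.
hjkl :: Int
hjkl = 1

wasd :: Int -> Int
wasd = (+ 1)

baz :: Int
baz = 3

quux :: Int -> Int
quux = (* 2)

-- Applied to one argument, 'asdf' yields a function, so the
-- parenthesised part can sit in the middle of a composition chain.
asdf :: Int -> Int -> Int
asdf n = (+ n)

bar :: Int -> Int
bar = (* 10)

foo :: Int -> String
foo = show

-- Mixed directions: the parenthesised part reads left to right,
-- everything around it reads right to left.
mixed :: String
mixed = foo . bar . (baz & quux & asdf) . wasd $ hjkl

-- Existing syntax only: the whole expression reads right to left.
uniform :: String
uniform = foo . bar . (asdf $ quux baz) . wasd $ hjkl

main :: IO ()
main = do
  putStrLn mixed
  putStrLn uniform
```

Both expressions compute the same value; the only difference is the order in
which a reader has to visit the names.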
...
...
The second is that only the author of the datatype could put functions
into its namespace; the 'data.foo' notation would only be available
for functions written by the datatype's author, while for every other
function you would have to use 'foo data'. I dislike this special
treatment in OO languages and I dislike it here.
-- 
Work is punishment for failing to procrastinate effectively.