Re: Records in Haskell

8 Jan 2012


2012/1/8 Gábor Lehel <illissius@gmail.com>:
...
2012/1/8 Greg Weber <greg@gregweber.info>:
...
2012/1/8 Gábor Lehel <illissius@gmail.com>
...
Thank you. I have a few questions/comments.
"The module/record ambiguity is dealt with in Frege by preferring
modules and requiring a module prefix for the record if there is
ambiguity."
I think I see why they do it this way (otherwise you can't refer to a
module if a record by the same name is in scope), but on the other
hand it would seem intuitive to me to choose the more specific thing,
and a record feels more specific than a module. Maybe you could go
that way and just not give your qualified imports the same name as a
record? (Unqualified imports are in practice going to be hierarchical,
and no one's in the habit of typing those out to disambiguate things,
so I don't think it really matters if qualified records shadow them.)
In the case where a record has the same name as its containing module, it
would be more specific than the module, and preferring it makes sense. I think
doing this inside the module makes sense, as one shouldn't need to refer to
the containing module's name. We should think more about the case where
modules and records are both imported.
...
"Expressions of the form x.n: first infer the type of x. If this is
just an unbound type variable (i.e. the type is unknown yet), then
check if n is an overloaded name (i.e. a class operation). [...] Under
no circumstances, however, will the notation x.n contribute in any way
in inferring the type of x, except for the case when n is a class
operation, where an appropriate class constraint is generated."
Is this just a simple translation from x.n to n x? What's the
rationale for allowing the x.n syntax for class methods specifically, in
addition to record fields, but not for any other functions?
It is a simple translation from x.n to T.n x.
The key point being the function is only accessible through the record's
namespace.
The dot is only being used to tap into a namespace, and is not available for
general function application.
I think my question and your answer are walking past each other here.
Let me rephrase. The wiki page implies that in addition to using the
dot to tap into a namespace, you can also use it for general function
application in the specific case where the function is a class method
("appropriate class constraint is generated" etc etc). I don't
understand why. Or am I misunderstanding?
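As I read the quoted rule, the class-method case amounts to ordinary application: x.n becomes n x, and because n is a class method, its class constraint shows up in the inferred type while nothing else is learned about x. A minimal Haskell sketch of that reading; the class HasSize, its method size, and the types Bag and Label are hypothetical stand-ins for "n is a class operation", not anything from the Frege manual.

```haskell
-- Hypothetical class standing in for "n is a class operation".
class HasSize a where
  size :: a -> Int

newtype Bag = Bag [Int]
newtype Label = Label String

instance HasSize Bag where
  size (Bag xs) = length xs

instance HasSize Label where
  size (Label s) = length s

-- Under the quoted rule, x.size with x of unknown type would be read as
-- size x. The only thing inferred about x is the constraint HasSize a;
-- with this signature removed, GHC infers the same polymorphic type.
sizeOf :: HasSize a => a -> Int
sizeOf x = size x
```

In this reading the dot adds no new inference power: it is exactly as informative as writing size x.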
...
...
Later on you write that the names of record fields are only accessible
from the record's namespace and via record syntax, but not from the
global scope. For Haskell I think it would make sense to reverse this
decision. On the one hand, it would keep backwards compatibility; on
the other hand, Haskell code is already written to avoid name clashes
between record fields, so it wouldn't introduce new problems. Large
gain, little pain. You could use the global-namespace function as you
do now, at the risk of ambiguity, or you could use the new record
syntax and avoid it. (If you were to also allow x.n syntax for
arbitrary functions, this could lead to ambiguity again... you could
solve it by preferring a record field belonging to the inferred type
over a function if both are available, but (at least in my current
state of ignorance) I would prefer to just not allow x.n for anything
other than record fields.)
Perhaps you can give some example code for what you have in mind - we do
need to figure out the preferred technique for interacting with old-style
records. Keep in mind that for new records the entire point is that they
must be name-spaced. A module could certainly export top-level functions
equivalent to how records work now (we could have a helper that generates
those functions).
Let's say you have a record.
data Record = Record { field :: String }
In existing Haskell, you refer to the accessor function as 'field' and
to the contents of the field as 'field r', where 'r' is a value of
type Record. With your proposal, you refer to the accessor function as
'Record.field' and to the contents of the field as either
'Record.field r' or 'r.field'. The point is that I see no conflict or
drawback in allowing all of these at the same time. Writing 'field' or
'field r' would work exactly as it already does, and be ambiguous if
there is more than one record field with the same name in scope. In
practice, existing code is already written to avoid this ambiguity so
it would continue to work. Or you could write 'Record.field r' or
'r.field', which would work as the proposal describes and remove the
ambiguity, and work even in the presence of multiple record fields
with the same name in scope.
The point is that I see what you gain by allowing record fields to be
referred to in a namespaced way, but I don't see what you gain by not
allowing them to be referred to in a non-namespaced way. In theory you
wouldn't care because the non-namespaced way is inferior anyways, but
in practice because all existing Haskell code does it that way, it's
significant.
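For comparison, today's Haskell already lets a top-level name be used both unqualified and qualified by its own module's name, which is close to the "allow both" arrangement argued for above. A minimal sketch with the Record example; because the file has no module header its module is Main, so Main.field stands in for what Record.field would be under the proposal (Record.field and r.field are proposal syntax and do not compile here).

```haskell
data Record = Record { field :: String }

r :: Record
r = Record { field = "hello" }

-- Old style: the accessor is an ordinary top-level function.
unqualified :: String
unqualified = field r

-- Qualified by the enclosing module's name (Main, since there is no
-- module header). Under the proposal this would be Record.field r or r.field.
qualified :: String
qualified = Main.field r
```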
...
...
Later on:
"- the function that updates field x of data type T is T.{x=}
- the function that sets field x in a T to 42 is T.{x=42}
- If a::T then a.{x=} and a.{x=42} are valid"
I think this looks rather ugly. Aren't there better
alternatives? { T.x = }, { T.x = 42 }, { a.x = }, { a.x = 42 } maybe?
(Does this conflict in some unfinesseable way with explicit layout
contexts?)
I think this is one of those slightly different syntaxes that many people
will have an initial bad reaction to, however once they use it they will
like it just fine. The problem with what you are suggesting is that it would
be verbose when updating multiple fields at once. But we should investigate
if it is possible to have a syntax closer to the existing update syntax.
Good point.
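The proposed T.{x=} and T.{x=42} denote update functions. In current Haskell the same functions can be written as small wrappers around the existing update syntax. A sketch with a hypothetical record T holding fields x and y; setX, setX42 and setBoth are illustrative names. The last function shows the existing syntax updating several fields inside one pair of braces, which is the multiple-field case mentioned above.

```haskell
-- Hypothetical record used only for illustration.
data T = T { x :: Int, y :: Int } deriving (Eq, Show)

-- Rough equivalent of the proposed T.{x=}: takes the new value, then the record.
setX :: Int -> T -> T
setX v t = t { x = v }

-- Rough equivalent of the proposed T.{x=42}.
setX42 :: T -> T
setX42 = setX 42

-- Existing syntax: several fields in one update, mentioning the record once.
setBoth :: T -> T
setBoth t = t { x = 1, y = 2 }
```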
...
...
"the function that changes field x of a T by applying some function to
it is T.{x <-}"
Same comment on syntax applies. I believe this is a new feature? It
would be welcome, albeit the overloading of <- is a bit worrisome
(don't have better ideas at the moment, but I think there was a
thread). I assume T.{x <- f}, a.{x <-}, and a.{x <- f} (whatever the
syntax is) would also be valid, by analogy to the above?
Yes, new feature, so not necessary in the initial implementation. I
personally think Haskell should drop the monadic curly brackets which nobody
uses, but whatever syntax works is fine with me.
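The proposed T.{x <-} takes a function and applies it to the field. Current Haskell has no dedicated syntax for this, but it is a one-line helper. A sketch with a hypothetical record T; modifyX and bump are illustrative names, not part of the proposal.

```haskell
-- Hypothetical record used only for illustration.
data T = T { x :: Int } deriving (Eq, Show)

-- Rough equivalent of the proposed T.{x <-}: apply f to field x.
modifyX :: (Int -> Int) -> T -> T
modifyX f t = t { x = f (x t) }

-- Rough equivalent of T.{x <- (+1)}; for a given a :: T, bump a plays the
-- role of a.{x <- (+1)}.
bump :: T -> T
bump = modifyX (+ 1)
```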
...
Re: Compatibility with existing records: based on (very) cursory
inspection I don't see an obstacle to making it (near-)fully
compatible - you would just be adding some new syntax, most
significantly x.n. Backwards compatibility is a great advantage, so
why not?
Generalizing the syntax to arbitrary TDNR: I think I'm opposed to
this. The problem is that in existing Haskell the vast majority of
expressions (with the notable (and imho unfortunate) exception of
(>>=)) flow from right to left. Going the other way with record fields
isn't a big problem because it's simple and doesn't even feel like
function application so much as member-selection (like modules), but
if you were to allow any function you would soon end up with lengthy
chains of them which would clash nastily with the surrounding code.
Having to jump back and forth and switch directions while reading is
unpleasant. OO languages have this problem and I don't envy them for
it. And in particular having "a . b" mean "first do b, then do a", but
"a.b" mean "do b to a" would be confusing. (You'd already have this
problem with global namespace record field selectors, but at least
it's localized.)
I agree - I think a.b or A.b should always mean tapping into a namespace and
not be generalized outside of that.
...
All of that said, maybe having TDNR with bad syntax is preferable to
not having TDNR at all. Can't it be extended to the existing syntax
(of function application)? Or at least some better one, which is ideally
right-to-left? I don't really know the technical details...
Generalized data-namespaces: Also think I'm opposed. This would import
the problem from OO languages where functions written by the module
(class) author get to have a distinguished syntax (be inside the
namespace) over functions by anyone else (which don't).
Maybe you can show some example code? To me this is about controlling
exports of namespaces, which is already possible - I think this is mostly a
matter of convenience.
If I'm understanding correctly, you're suggesting we be able to write:
data Data = Data Int where
   twice (Data d) = 2 * d
   thrice (Data d) = 3 * d
   ...
and that if we write 'let x = Data 7 in x.thrice' it would evaluate to
21. I have two objections.
The first is the same as with the TDNR proposal: you would have both
code that looks like
'data.firstFunction.secondFunction.thirdFunction', as well as the
existing 'thirdFunction $ secondFunction $ firstFunction data' and
'thirdFunction . secondFunction . firstFunction $ data', and if you
have both of them in the same expression (which you will) it becomes
unpleasant to read because you have to read them in opposite
directions.
The second is that only the author of the datatype could put functions
into its namespace; the 'data.foo' notation would only be available
for functions written by the datatype's author, while for every other
function you would have to use 'foo data'. I dislike this special
treatment in OO languages and I dislike it here.
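In current Haskell the namespaced functions from the example would just be ordinary top-level functions, and the reading-direction concern can be seen by writing the same pipeline both ways: right to left with (.), and left to right with (&) from Data.Function (in base 4.8 and later). The pipelines themselves (thrice, then show, then length) are illustrative; the last definition mixes both directions in one expression, which is the case objected to above.

```haskell
import Data.Function ((&))

data Data = Data Int

-- The functions from the example, written as ordinary top-level functions.
twice :: Data -> Int
twice (Data d) = 2 * d

thrice :: Data -> Int
thrice (Data d) = 3 * d

-- Right to left, like most Haskell code: thrice, then show, then length.
rightToLeft :: Data -> Int
rightToLeft = length . show . thrice

-- Left to right, the order in which a dotted chain would be read.
leftToRight :: Data -> Int
leftToRight d = d & thrice & show & length

-- Both directions in one expression: the right half reads left to right,
-- the left half right to left.
mixed :: Data -> Int
mixed d = length . show $ d & thrice
```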
...
...
Another thing that would be nice is lenses to solve the
nested-record-update problem - at least the room to add them later.
Most of the proposed syntax would be unaffected, but you'd need some
syntax for the lens itself... I'm not sure what it might be. Would it
be terrible to have T.x refer to a lens rather than a getter? (I don't
know how you'd refer to the getter then, so probably yeah.) Or maybe
{ T.x }, building backwards from { T.x = }?
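To make the lens idea concrete, here is a minimal sketch assuming the simplest possible representation: a getter paired with a setter, which composes so that a nested update needs no manual unpacking. The Lens type, compose, over, and the Person/Address records are all hypothetical; existing lens libraries use a more general representation.

```haskell
-- Hypothetical lens: a getter paired with a setter.
data Lens s a = Lens { view :: s -> a, set :: a -> s -> s }

-- Focus through an outer lens, then an inner one.
compose :: Lens s b -> Lens b a -> Lens s a
compose outer inner = Lens
  { view = view inner . view outer
  , set  = \v s -> set outer (set inner v (view outer s)) s
  }

-- Modify the focused value with a function.
over :: Lens s a -> (a -> a) -> s -> s
over l f s = set l (f (view l s)) s

data Address = Address { street :: String, number :: Int } deriving (Eq, Show)
data Person  = Person  { name :: String, address :: Address } deriving (Eq, Show)

-- One lens per field, built from the ordinary accessor and update syntax.
addressL :: Lens Person Address
addressL = Lens address (\v p -> p { address = v })

numberL :: Lens Address Int
numberL = Lens number (\v a -> a { number = v })

-- Nested update without unpacking Person and Address by hand.
moveNextDoor :: Person -> Person
moveNextDoor = over (compose addressL numberL) (+ 1)
```

With only plain record syntax the same update would have to rebuild the outer record explicitly, as in p { address = (address p) { number = number (address p) + 1 } }, which is the nested-update problem referred to above.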
Another existing language very similar to Haskell whose record system
might be worth evaluating is Disciple: http://disciple.ouroborus.net/.
Unfortunately I couldn't find any specific page it seemed best to link
to.
The syntax of DDC seems the same as this proposal. However, I could not find
any specific information either.
The main things I remember being interesting about it are that it's
based on lenses, and uses some kind of extensible projectors system to
allow something similar to what you achieve with datatype-namespaces,
namely 'virtual' record fields. But I haven't studied it in detail.
Ah, I remember now where I saw a more thorough discussion: in his
thesis [1], section 2.7 (page 115) and in particular 2.7.4 (page 119). It
seems to be a very similar proposal to datatype-namespacing except it
would address my second objection above and allow third-party code to
add functions to the namespace as well. My first objection (the 'flow'
of the code being in the opposite direction to all other code) still
applies though. I couldn't find any discussion of lenses, except as
pertaining to destructive update (which is another feature of
Disciple).

[1] http://www.cse.unsw.edu.au/~benl/papers/thesis/lippmeier-impure-world.pdf
...
...
...
On Sun, Jan 8, 2012 at 2:40 AM, Greg Weber <greg@gregweber.info> wrote:
...
I have updated the wiki - the entry-level page [1] compares the different
proposals and points to a more fleshed-out explanation of the Frege proposal
[2].
I think I now understand the differences between the existing proposals and
am able to provide leadership to move this forward. Let me summarize the
state of things:
There is a debate over extensible records that we are putting off into the
future. Instead we have 2 proposals to make things better right now:
* an overloaded record fields proposal that still has implementation
concerns
* a name-spacing & simple type resolution proposal that is awaiting your
critique
The Frege language originally had overloaded record fields but then moved to
the latter system. This is very fortunate for us, as Frege's existing
experience can help inform our own decision.
Greg Weber
[1] http://hackage.haskell.org/trac/ghc/wiki/Records
[2] http://hackage.haskell.org/trac/ghc/wiki/Records/NameSpacing
On Wed, Jan 4, 2012 at 7:54 AM, Greg Weber <greg@gregweber.info> wrote:
...
The Frege author does not have a GHC mailing list account but gave a more
detailed explanation of how he goes about TDNR for records and how often it
type checks without annotation in practice.
A more general explanation is here:
http://www.reddit.com/r/haskell/comments/nph9l/records_stalled_again_leaders...
He sent a specific response to Simon's mailing list message, quoted below:
Simon Peyton-Jones is absolutely correct when he notes:
Well the most obvious issue is this. 3.2 says e.m = (T.m e) if the
expression e has type t and the type constructor of t is T and there exists
a function T.m. But that innocent-looking statement begs the *entire*
question! How do we know if "e has type t"?
The way it is done in Frege is such that, if you have a function that uses
or updates (nondestructively, of course) a "record", then at least the type
constructor of that record has to be known. This is no different than doing
it explicitly with case constructs, etc., just here you learn the types from
the constructors you write in the patterns.
Hence, it is not so that one can write a function that updates field f to
42 for any record that contains a field f:
foo x = x.{f=42}    -- type annotation required for foo or x
In practice this means you'll have to write a type annotation here and
there.
Often, the field access is not the only one that happens to some variable
of record type, or the record is the result of another function
application. In such cases, the type is known.
I estimate that in 2/3 of all cases one does not need to write (T.e x) in
sparsely type-annotated code, despite the fact that the Frege type checker
has a left-to-right bias and does not yet attempt to find the type of x in
the code that "follows" the x.e construct (after let unrolling etc.).
I think one could do better and guarantee that, if the type of x is
inferrable at all, then so will be x.e. (Still, it must be more than just a
type variable.)
On Sun, Jan 1, 2012 at 2:39 PM, Greg Weber <greg@gregweber.info> wrote:
...
On Sat, Dec 31, 2011 at 3:28 PM, Simon Peyton-Jones
<simonpj@microsoft.com> wrote:
>
> Frege has a detailed explanation of the semantics of its record
> implementation, and the language is *very* similar to Haskell. Let's just
> start by using Frege's document as the proposal. We can start a new wiki
> page as discussions are needed.
>
> If it’s a serious proposal, it needs a page to specify the design.
> Currently all we have is a paragraph on
> http://hackage.haskell.org/trac/ghc/wiki/Records, under “Better name
> spacing”.
>
> As previously stated on this thread, the Frege user manual is available
> here:
>
> http://code.google.com/p/frege/downloads/detail?name=Language-202.pdf
>
> see Sections 3.2 (primary expressions) and 4.2.1 (Algebraic Data type
> Declaration - Constructors with labeled fields)
>
> To all those concerned about Records: look at the Frege implementation
> and poke holes in it.
>
> Well the most obvious issue is this.  3.2 says
>
> e.m = (T.m e) if the expression e has type t and the type constructor
> of t is T and there exists a function T.m
>
> But that innocent-looking statement begs the *entire* question!  How do
> we know if “e has type t”?   This is the route ML takes for arithmetic
> operators: + means integer plus if the argument is of type Int, float plus
> if the argument is of type Float, and so on.
>
> Haskell type classes were specifically designed to address this
> situation. And if you apply type classes to the record situation, I think
> you end up with
>
> http://hackage.haskell.org/trac/ghc/wiki/Records/OverloadedRecordFields
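A minimal sketch of the type-class direction: two record types that both have a field logically called m (prefixed as in the t_m example further down), with one overloaded selector that works on either. The class HasM, its method getM, the type U and the function doubleM are hypothetical names for illustration, not the API of the linked proposal; the type-directed choice happens through ordinary instance resolution.

```haskell
-- Two records that both have an "m" field; prefixes keep the accessors distinct.
data T = MkT { t_m :: Int, t_n :: Bool }
data U = MkU { u_m :: Int }

-- Hypothetical class: one overloaded selector for every type with an m field.
class HasM r where
  getM :: r -> Int

instance HasM T where
  getM = t_m

instance HasM U where
  getM = u_m

-- Works for any record with an m field; the HasM constraint is what the
-- type checker would infer even without this signature.
doubleM :: HasM r => r -> Int
doubleM r = 2 * getM r
```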
More specifically, I think of this as TDNR, as opposed to the wiki page's
focus on maintaining backwards compatibility and de-sugaring to
polymorphic constraints. I had hoped that there were different ideas, or at
least more flexibility possible, for the TDNR implementation.
>
> Well, so maybe we can give up on that.  Imagine Frege without the above
> abbreviation.  The basic idea is that field names are rendered unique by
> pre-pending the module name.  As I understand it, for record selection one
> would then be forced to write (T.m e), to select the ‘m’ field.  That is,
> the qualification with T is compulsory.   The trouble with this is that
> it’s *already* possible; simply define suitably named fields
>
>   data T = MkE { t_m :: Int, t_n :: Bool }
>
> Here I have prefixed with a (lower case version of) the type name.  So
> we don’t seem to be much further ahead.
>
> Maybe one could make it optional if there is no ambiguity, much like
> Haskell’s existing qualified names.  But there is considerable ambiguity
> about whether T.m means
>
>   m imported from module T
>
> or
>
>   the m record selector of data type T
If there is ambiguity, we expect the T to be a module. So you would need
to refer to record T through its module: OtherModule.T.n or T.T.n.
Alternatively these conflicts could be compilation errors.
Either way, programmers are expected to structure their programs to avoid
conflicting names, no different than they do now.
>
> Perhaps one could make it work out.  But before we can talk about it we
> need to see a design. Which takes us back to the question of leadership.
>
I am trying to provide as much leadership on this issue as I am capable
of. Your critique is very useful in that effort.
At this point the Frege proposal without TDNR seems to be a small step
forward. We can now define records with clashing fields in the same
module. However, without TDNR we don't have convenient access to those
fields. I am contacting the Frege author to see if we can get any more
insights on implementation details.
>
> Simon
>
> We only want critiques about
>
> * achieving name-spacing right now
>
> * implementing it in such a way that extensible records could be
> implemented in its place in the future, although we will not allow that
> discussion to hold up a records implementation now, just possibly modify
> things slightly.
>
> Greg Weber
>
> On Thu, Dec 29, 2011 at 2:00 PM, Simon Peyton-Jones
> <simonpj@microsoft.com> wrote:
>
> | The lack of response, I believe, is just a lack of anyone who
> | can cut through all the noise and come up with some
> | practical way to move forward in one of the many possible
> | directions.
>
> You're right.  But it is very telling that the vast majority of
> responses on
>
> http://www.reddit.com/r/haskell/comments/nph9l/records_stalled_again_leaders...
> were not about the subject (leadership) but rather on suggesting yet
> more, incompletely-specified solutions to the original problem.  My modest
> attempt to build a consensus by articulating the simplest solution I could
> think of, manifestly failed.
>
> The trouble is that I just don't have the bandwidth (or, if I'm honest,
> the motivation) to drive this through to a conclusion. And if no one else
> does either, perhaps it isn't *that* important to anyone.  That said, it
> clearly is *somewhat* important to a lot of people, so doing nothing isn't
> very satisfactory either.
>
> Usually I feel I know how to move forward, but here I don't.
>
> Simon
>
_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
--
Work is punishment for failing to procrastinate effectively.