The discussion on records has in some ways narrowed (which is good),
but within that narrowed scope of disagreement become very
contentious on global vs. local default scope for field names. Those
in favor of information-hiding as a key feature have been pretty
vocal so far, and while others have argued, I don't think the
opposite approach has been strongly motivated. I want to put forward
at least one such strong motivation for global field names (aka
SORF-style) rather than locally declared field names as a default.
Suppose we have some sort of persistence layer implemented with
records mapped to, e.g., rows within database tables. In the
underlying data, many columns in different tables could share the
same name. Now we may want to implement the generation of these
record data declarations with some sort of automatic code
generation. Alternately, we may want to write these declarations by
hand, and automatically populate and migrate the database. It
doesn't really matter. For different columns with the same name, we
necessarily want to refer to these with the same selector -- i.e.
we're not interested in building a one-deep hierarchy where certain
"address" selectors refer to the "address" within some columns, and
other "address" selectors refer to others. Arguably, we might want
such a thing, but with sufficient tables with sufficient columns,
the complexity to the programmer in determining the right mappings
could simply be not worth it.
So, suppose we have a locally declared fields solution (such as
DORF). Now, where do these fields live? Arguably, we want a module
per record. Otherwise, we're forced to pull in the entire namespace
of our database at once, when we only want a fraction in any given
client module. Clearly, the field selectors shouldn't live in the
module for each individual record -- this means that they're
dispersed all over the place, and MyDb.Corporation must import
MyDb.Person to get the "address" label or vice-versa, which
introduces strange and spurious dependencies. The obvious solution
is to create a MyDb.Labels module which holds all the shared field
declarations. But now, we want to introduce a new table. This table
uses some labels shared with other tables, but uses some labels
which are new. To add this table, we then have to update our
MyDb.Labels module. This then forces a recompilation of the Labels
module. This in turn forces a recompilation of every module which
depends on Labels -- at a minimum, every module representing any
database table. This in turn again forces a recompilation of any
module using any database table. For a project which actually uses
the persistence layer widely and freely, this effectively means a
recompilation of the entire project! For a sufficiently large
project (and I have worked with such) this can be extremely time
consuming.
So, with any system limited to local field declarations, and with a
minimal and reasonable set of design choices to allow field sharing
where desired, we discover that it must necessarily have frequent
full rebuilds for operations which *should*
require compiling a couple of files at most.
The above argument holds under a number of modifications. For a
database/table layer, we can substitute a layer such as happstack
uses. Or, we can substitute a query API to some service which
returns results in JSON or XML (or even simply an XML parser layer).
How might we amend such a system to avoid this terribly pessimal,
bad, no-good behavior? Well, imagine we had a Labels module that
declared every possible field in advance. Now, no matter what we
wanted, it would already exist, the module would not need to be
modified, and so a full recompile would not be triggered. Such a
Labels module is clearly impossible to write in finite disk space,
and compile in finite time. However, we can provide a simulation of
precisely this functionality in the following way (using, e.g.,
hlists or the like) :
data Heof; data La; instance Letter La; data Lb; instance Letter
Lb; ...
instance Label Heof
instance (Letter a, Label b) => Label (a :* b)
class Label f => Has r f t where...
However, writing (undefined :: La :* Lb :* Lc :* Heof) for a label
"abc" is a bit of a pain. Sugar can eliminate this a bit. With a bit
more sugar, and new toys in GHC, we can simply write "abc" at the
type level rather than the value level! Clearly though, "abc" is a
very different sort of type than Bool or Int. It would be nice to
statically state that certain places which can now take any type,
can only take types like "abc". We want to restrict our types by
their "type." This "type" given to types is called a _Kind_. We can then say that things
like "abc" at the type level are of kind String. So, if we want to
say that our Has class can only take fields like "abc" we can say
the following:
class Has (r :: *) (f :: String) (t :: *) where
get :: r -> t
Which is precisely the SORF proposal!
From our above description, we can see a number of straightforward
ways to recover optional representation hiding. The most obvious
thing is simply to wrap records in newtypes.
data PrivateRecord = PR {x :: Int}
newtype PRInterface= PRI {unPRI :: PR}
Now, if the module exports PRInterface and not PrivateRecord (and
not unPRI), users can pass around PR values (wrapped in a newtype),
but since those PR values don't have Has instances, then they can't
get at the x directly and certainly can't modify it. (Standalone +
newtype deriving lets you cheat here, but arguably you just want to
make breaking abstraction explicit and painful -- impossible is
surprisingly hard in Haskell anyway).
However, we can do one better, and recover DORF and SORF behavior at
once!
class Label a
class Label f => Has (r :: *) (f :: *) (t :: *) where
get :: r -> t
instance Label (a :: String)
Now we have global field names "baked in" and by default (and indeed
proper global names, as I've argued *must be*
baked in to be usable). However, we also have the opportunity of
declaring private field names:
data MyPrivateLabel
instance Label MyPrivateLabel
Now, if you don't export MyPrivateLabel (or any type which contains
MyPrivateLabel) then your abstraction is again safe!
So this is a slight generalization of SORF, that solves (I think)
the information hiding issue, giving easy use of either global or
locally scoped labels. Sugar can be implemented as in Ian's
proposal, or otherwise. If everyone finds this agreeable (and I can
imagine no reason they wouldn't!) then that leaves us with only a
very few semantic issues left. In fact, I think the *only* remaining issue is updates,
where I think it would be useful to redirect some collective
thought.
Having solved all that, we can get back to what's really important
-- debating syntax :-)
Hope this is clarifying,
Gershom