Why I Want Global Field Names (By Default)

5 Mar 2012

      The discussion on records has in some ways narrowed (which is good), but 
within that narrowed scope of disagreement become very contentious on 
global vs. local default scope for field names. Those in favor of 
information-hiding as a key feature have been pretty vocal so far, and 
while others have argued, I don't think the opposite approach has been 
strongly motivated. I want to put forward at least one such strong 
motivation for global field names (aka SORF-style) rather than locally 
declared field names as a default.

Suppose we have some sort of persistence layer implemented with records 
mapped to, e.g., rows within database tables. In the underlying data, 
many columns in different tables could share the same name. Now we may 
want to implement the generation of these record data declarations with 
some sort of automatic code generation. Alternately, we may want to 
write these declarations by hand, and automatically populate and migrate 
the database. It doesn't really matter. For different columns with the 
same name, we necessarily want to refer to these with the same selector 
-- i.e. we're not interested in building a one-deep hierarchy where 
certain "address" selectors refer to the "address" within some columns, 
and other "address" selectors refer to others. Arguably, we might want 
such a thing, but with sufficient tables with sufficient columns, the 
complexity to the programmer in determining the right mappings could 
simply be not worth it.

So, suppose we have a locally declared fields solution (such as DORF). 
Now, where do these fields live? Arguably, we want a module per record. 
Otherwise, we're forced to pull in the entire namespace of our database 
at once, when we only want a fraction in any given client module. 
Clearly, the field selectors shouldn't live in the module for each 
individual record -- this means that they're dispersed all over the 
place, and MyDb.Corporation must import MyDb.Person to get the "address" 
label or vice-versa, which introduces strange and spurious dependencies. 
The obvious solution is to create a MyDb.Labels module which holds all 
the shared field declarations. But now, we want to introduce a new 
table. This table uses some labels shared with other tables, but uses 
some labels which are new. To add this table, we then have to update our 
MyDb.Labels module. This then forces a recompilation of the Labels 
module. This in turn forces a recompilation of every module which 
depends on Labels -- at a minimum, every module representing any 
database table. This in turn again forces a recompilation of any module 
using any database table. For a project which actually uses the 
persistence layer widely and freely, this effectively means a 
recompilation of the entire project! For a sufficiently large project 
(and I have worked with such) this can be extremely time consuming.

So, with any system limited to local field declarations, and with a 
minimal and reasonable set of design choices to allow field sharing 
where desired, we discover that it must necessarily have frequent full 
rebuilds for operations which *should* require compiling a couple of 
files at most.

The above argument holds under a number of modifications. For a 
database/table layer, we can substitute a layer such as happstack uses. 
Or, we can substitute a query API to some service which returns results 
in JSON or XML (or even simply an XML parser layer).

How might we amend such a system to avoid this terribly pessimal, bad, 
no-good behavior? Well, imagine we had a Labels module that declared 
every possible field in advance. Now, no matter what we wanted, it would 
already exist, the module would not need to be modified, and so a full 
recompile would not be triggered. Such a Labels module is clearly 
impossible to write in finite disk space, and compile in finite time. 
However, we can provide a simulation of precisely this functionality in 
the following way (using, e.g., hlists or the like) :

     data Heof; data La; instance Letter La; data Lb; instance Letter 
Lb; ...

     instance Label Heof
     instance (Letter a, Label b) => Label (a :* b)
     class Label f => Has r f t where...

However, writing (undefined :: La :* Lb :* Lc :* Heof) for a label "abc" 
is a bit of a pain. Sugar can eliminate this a bit. With a bit more 
sugar, and new toys in GHC, we can simply write "abc" at the type level 
rather than the value level! Clearly though, "abc" is a very different 
sort of type than Bool or Int. It would be nice to statically state that 
certain places which can now take any type, can only take types like 
"abc". We want to restrict our types by their "type." This "type" given 
to types is called a _Kind_. We can then say that things like "abc" at 
the type level are of kind String. So, if we want to say that our Has 
class can only take fields like "abc" we can say the following:

     class Has (r :: *) (f :: String) (t :: *) where
        get :: r -> t

Which is precisely the SORF proposal!

 From our above description, we can see a number of straightforward ways 
to recover optional representation hiding. The most obvious thing is 
simply to wrap records in newtypes.

    data PrivateRecord = PR {x :: Int}
    newtype PRInterface= PRI {unPRI :: PR}

Now, if the module exports PRInterface and not PrivateRecord (and not 
unPRI), users can pass around PR values (wrapped in a newtype), but 
since those PR values don't have Has instances, then they can't get at 
the x directly and certainly can't modify it. (Standalone + newtype 
deriving lets you cheat here, but arguably you just want to make 
breaking abstraction explicit and painful -- impossible is surprisingly 
hard in Haskell anyway).

However, we can do one better, and recover DORF and SORF behavior at once!

     class Label a

     class Label f => Has (r :: *) (f :: *) (t :: *) where
        get :: r -> t

    instance Label (a :: String)

Now we have global field names "baked in" and by default (and indeed 
proper global names, as I've argued *must be* baked in to be usable). 
However, we also have the opportunity of declaring private field names:

    data MyPrivateLabel
    instance Label MyPrivateLabel

Now, if you don't export MyPrivateLabel (or any type which contains 
MyPrivateLabel) then your abstraction is again safe!

So this is a slight generalization of SORF, that solves (I think) the 
information hiding issue, giving easy use of either global or locally 
scoped labels. Sugar can be implemented as in Ian's proposal, or 
otherwise.  If everyone finds this agreeable (and I can imagine no 
reason they wouldn't!) then that leaves us with only a very few semantic 
issues left. In fact, I think the *only* remaining issue is updates, 
where I think it would be useful to redirect some collective thought.

Having solved all that, we can get back to what's really important -- 
debating syntax :-)

Hope this is clarifying,
Gershom

Why I Want Global Field Names (By Default)

Gershom Bazerman