Why I Want Global Field Names (By Default)

The discussion on records has in some ways narrowed (which is good), but within that narrowed scope of disagreement become very contentious on global vs. local default scope for field names. Those in favor of information-hiding as a key feature have been pretty vocal so far, and while others have argued, I don't think the opposite approach has been strongly motivated. I want to put forward at least one such strong motivation for global field names (aka SORF-style) rather than locally declared field names as a default. Suppose we have some sort of persistence layer implemented with records mapped to, e.g., rows within database tables. In the underlying data, many columns in different tables could share the same name. Now we may want to implement the generation of these record data declarations with some sort of automatic code generation. Alternately, we may want to write these declarations by hand, and automatically populate and migrate the database. It doesn't really matter. For different columns with the same name, we necessarily want to refer to these with the same selector -- i.e. we're not interested in building a one-deep hierarchy where certain "address" selectors refer to the "address" within some columns, and other "address" selectors refer to others. Arguably, we might want such a thing, but with sufficient tables with sufficient columns, the complexity to the programmer in determining the right mappings could simply be not worth it. So, suppose we have a locally declared fields solution (such as DORF). Now, where do these fields live? Arguably, we want a module per record. Otherwise, we're forced to pull in the entire namespace of our database at once, when we only want a fraction in any given client module. Clearly, the field selectors shouldn't live in the module for each individual record -- this means that they're dispersed all over the place, and MyDb.Corporation must import MyDb.Person to get the "address" label or vice-versa, which introduces strange and spurious dependencies. The obvious solution is to create a MyDb.Labels module which holds all the shared field declarations. But now, we want to introduce a new table. This table uses some labels shared with other tables, but uses some labels which are new. To add this table, we then have to update our MyDb.Labels module. This then forces a recompilation of the Labels module. This in turn forces a recompilation of every module which depends on Labels -- at a minimum, every module representing any database table. This in turn again forces a recompilation of any module using any database table. For a project which actually uses the persistence layer widely and freely, this effectively means a recompilation of the entire project! For a sufficiently large project (and I have worked with such) this can be extremely time consuming. So, with any system limited to local field declarations, and with a minimal and reasonable set of design choices to allow field sharing where desired, we discover that it must necessarily have frequent full rebuilds for operations which *should* require compiling a couple of files at most. The above argument holds under a number of modifications. For a database/table layer, we can substitute a layer such as happstack uses. Or, we can substitute a query API to some service which returns results in JSON or XML (or even simply an XML parser layer). How might we amend such a system to avoid this terribly pessimal, bad, no-good behavior? Well, imagine we had a Labels module that declared every possible field in advance. Now, no matter what we wanted, it would already exist, the module would not need to be modified, and so a full recompile would not be triggered. Such a Labels module is clearly impossible to write in finite disk space, and compile in finite time. However, we can provide a simulation of precisely this functionality in the following way (using, e.g., hlists or the like) : data Heof; data La; instance Letter La; data Lb; instance Letter Lb; ... instance Label Heof instance (Letter a, Label b) => Label (a :* b) class Label f => Has r f t where... However, writing (undefined :: La :* Lb :* Lc :* Heof) for a label "abc" is a bit of a pain. Sugar can eliminate this a bit. With a bit more sugar, and new toys in GHC, we can simply write "abc" at the type level rather than the value level! Clearly though, "abc" is a very different sort of type than Bool or Int. It would be nice to statically state that certain places which can now take any type, can only take types like "abc". We want to restrict our types by their "type." This "type" given to types is called a _Kind_. We can then say that things like "abc" at the type level are of kind String. So, if we want to say that our Has class can only take fields like "abc" we can say the following: class Has (r :: *) (f :: String) (t :: *) where get :: r -> t Which is precisely the SORF proposal! From our above description, we can see a number of straightforward ways to recover optional representation hiding. The most obvious thing is simply to wrap records in newtypes. data PrivateRecord = PR {x :: Int} newtype PRInterface= PRI {unPRI :: PR} Now, if the module exports PRInterface and not PrivateRecord (and not unPRI), users can pass around PR values (wrapped in a newtype), but since those PR values don't have Has instances, then they can't get at the x directly and certainly can't modify it. (Standalone + newtype deriving lets you cheat here, but arguably you just want to make breaking abstraction explicit and painful -- impossible is surprisingly hard in Haskell anyway). However, we can do one better, and recover DORF and SORF behavior at once! class Label a class Label f => Has (r :: *) (f :: *) (t :: *) where get :: r -> t instance Label (a :: String) Now we have global field names "baked in" and by default (and indeed proper global names, as I've argued *must be* baked in to be usable). However, we also have the opportunity of declaring private field names: data MyPrivateLabel instance Label MyPrivateLabel Now, if you don't export MyPrivateLabel (or any type which contains MyPrivateLabel) then your abstraction is again safe! So this is a slight generalization of SORF, that solves (I think) the information hiding issue, giving easy use of either global or locally scoped labels. Sugar can be implemented as in Ian's proposal, or otherwise. If everyone finds this agreeable (and I can imagine no reason they wouldn't!) then that leaves us with only a very few semantic issues left. In fact, I think the *only* remaining issue is updates, where I think it would be useful to redirect some collective thought. Having solved all that, we can get back to what's really important -- debating syntax :-) Hope this is clarifying, Gershom

Gershom Bazerman
... I want to put forward at least one strong motivation for global field names (aka SORF-style) rather than locally declared field names as a default.
Hi Gershom, One of the things that's surprised me contributing to this thread is not just that different people rate different features with different priority (that's common in requirements analysis, to try to 'weight' each objective); but that what some people think is absolutely critical others think is absolutely to be avoided. (The Best example of that is the furore about dot notation.) So you're not going to expect me to agree with you. A couple of very strange things I noticed about your expectations for software development, so I'm happy to make clear the context I'm expecting: * Yes, I expect you "to pull in the entire namespace for [the] database at once". That is: in a large-scale database-oriented application, you would declare **all** your field labels (aka data dictionary) in one module, and import that into every sub-module. That's been standard practice in all the software shops I've worked in since around 1984 (System/38). That's exactly what Chris Done's example did, which started my thinking around what became DORF. I am, frankly, astonished anyone would contemplate any other approach in this day and age. Because ... * As you say, tracing all the dependencies from code mods is hard work. Developers are very poor at it. Computers are superb. I expect you to run a nightly build, why not make it easy for yourself? http://www.joelonsoftware.com/articles/fog0000000023.html http://www.joelonsoftware.com/articles/fog0000000043.html Those articles date from over a decade ago! Software shops I've worked in have run nightly builds since around mid-90's. (It'd be interesting to hear from the ghc dev team: do they run a nightly build? Although ghc is not a database-intensive application.) And what on earth sort of hardware are you running that your builds are "time consuming"? This is just not an issue in 2012. I expect large-scale applications with (say) hundreds of tables, thousands of fields, 100k's LOC to rebuild in a few hours. So to my substantive answer: * The realistic situation in modern software engineering is that you assemble your application out of libraries/packages. * You have no control over the naming used in them. * What's worse, developers are so used to strong namespacing controls, that they often use generic names that are likely to appear elsewhere. (And especially if it's in a technical domain with well-agreed jargon.) * So you're highly likely to get clashes 'by accident'. * Haskell's module/namespacing controls are perfectly adequate to manage this. * And OO, as a technique insists on namespace control, to implement encapsulation/representation-hiding. * In database-oriented applications, re-using the same name on different tables is deliberate and intentional (_not_ by accident). * H98 is simply awful for this. DORF addresses that issue. * That said, nothing in DORF stops you creating global names: simply export/import them everywhere unqualified. See: that wasn't too painful, was it? So our point of difference comes down to: what behaviour "By Default". * DORF expects it's more likely you'll want namespace control. And makes it hardly more difficult to be global. And uses module-based namespace control that is already industry-standard, viz: Haskell 98's approach. * SORF 'imposes' global everywhere, And makes it possible but awkward to control name scope. My litmus test for your approach (as for Ian's): * My record has fields `x`, `y`, and `z`: - `x` is to be gettable and settable - `y` is to be gettable but not settable - `z` is to be hidden and unguessable Please explain how your PrivateLabel approach handles those. It's not a case of whether it's possible, but rather how awkward. AntC

Gershom Bazerman
Gershom, an implementation question for your proposed approach to representation-hiding: (It's always easy to wave away awkwardness if you don't show the full mechanism. Please write up your proposal on the wiki to the same level of detail as SORF -- at least!.)
However, we can do one better, and recover DORF and SORF behavior at once!
class Label a
class Label f => Has (r :: *) (f :: *) (t :: *) where
get :: r -> t
instance Label (a :: String)
SORF is not proposing any change to data decls syntax, or the namespacing around field selector functions. How do you get from the data decl to the Has instance? (So that the program can get/set within its private module.) Specifically, with:
data MyPrivateLabel
instance Label MyPrivateLabel
How to get a data decl to generate a Has instance for MyPrivateLabel, and avoid generating a Has instance for the String Kind? How (within the private module) to desugar a call to get (or dot notation) to use MyPrivateLabel? And record update? (Remember that unlike DORF, SORF's field access is syntax-directed: it desugars dot notation to a hard-coded String Kind for the field. There's no clear proposal for record update as yet)
... If everyone finds this agreeable (and I can imagine no reason they wouldn't!) ...
That's a misleading remark: you're asking people to agree to a half-baked and sketchy proposal. I do not "find this agreeable", and I think I've put forward heaps of reasons to disagree. I expect you to: * explain what changes you're proposing to existing syntax * detail how to desugar that syntax to available Haskell * detail the existing syntax you're going to desugar differently * ideally build a prototype to prove it * demonstrate in the prototype how the namespace control works Then we can hold the beauty pageant. (Oh yes: and explain what you'd have instead of dot notation, so that we can fire up another syntax debate ;-) AntC

On 05/03/2012 11:06, AntC wrote:
Gershom Bazerman
writes: ... If everyone finds this agreeable (and I can imagine no reason they wouldn't!) ...
That's a misleading remark: you're asking people to agree to a half-baked and sketchy proposal.
I wonder if we might all stop and take a deep breath at this point. The debate is getting a little more heated than we're used to on this mailing list. There's lots of great discussion going on, and sadly for myself I think I'm missing a lot of the fine details of the debate because of the volume. So can I ask people to take a little more time when concocting their replies, read it over a couple of times, see if it can't be condensed a bit - and above all, let's be civilized. This is Haskell, and we're proud of our community. Cheers, Simon

+1
On Mar 6, 2012 11:24 AM, "Simon Marlow"
On 05/03/2012 11:06, AntC wrote:
Gershom Bazerman
writes: ... If everyone finds this agreeable (and I can
imagine no reason they wouldn't!) ...
That's a misleading remark: you're asking people to agree to a half-baked and sketchy proposal.
I wonder if we might all stop and take a deep breath at this point. The debate is getting a little more heated than we're used to on this mailing list.
There's lots of great discussion going on, and sadly for myself I think I'm missing a lot of the fine details of the debate because of the volume. So can I ask people to take a little more time when concocting their replies, read it over a couple of times, see if it can't be condensed a bit - and above all, let's be civilized. This is Haskell, and we're proud of our community.
Cheers, Simon
______________________________**_________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.**org
http://www.haskell.org/**mailman/listinfo/glasgow-**haskell-usershttp://www.haskell.org/mailman/listinfo/glasgow-haskell-users

On 3/5/12 12:03 AM, Gershom Bazerman wrote:
So, suppose we have a locally declared fields solution (such as DORF). Now, where do these fields live? Arguably, we want a module per record.
Rather, part of the point of all this is precisely that we *don't* want one module per record. That's the organization required presently, since it's the only way to use the module system for namespace resolution. Ideally we should be able to define multiple records with the same field name within a single module, rather than being forced to break the module up into many files (and polluting the module namespace along the way). Also, if your goal is to have a global namespace, that can be approximated precisely by placing all declarations in a single module. In virtue of being in a single module, they're all in the same namespace; the client then just has to import your "global" module in order to get everything. Indeed this has the benefit of modularity because if my code depends on two separate projects which both want "global" namespaces, this would allow me to use them simultaneously. If we are to have any packaging and code reuse at all, this ability to compose multiple packages is absolutely essential. Furthermore, this would *avoid* spurious recompilation, since if I upgrade my dependency on project A then I don't need to recompile any modules which only depend on project B. -- Live well, ~wren
participants (5)
-
AntC
-
Gershom Bazerman
-
Simon Marlow
-
Tom Murphy
-
wren ng thornton