Non-atomic "atoms" for type-level programming

The recent GHC trac ticket revision reminded me of the old open type-sharing problem with type tags and record labels: - if type-level tags (such as 'data TTrue'/'data TFalse') are declared repeatedly in separate modules, they represent separate types, preventing shared use (your type-level predicate doesn't return my version of 'TTrue'/'TFalse') - if record field labels (as used in your run-of-the-mill extensible record library) have to be declared, that is not only inconvenient, but prevents sharing of field labels over independent code bases (see the old Haskell' ticket #92 for discussion http://hackage.haskell.org/trac/haskell-prime/wiki/FirstClassLabels ; also http://hackage.haskell.org/trac/ghc/ticket/1872 for alternative extensible records libs) Since Haskell has no concept of type sharing, the only way to express this is to declare types in a common import. But there is no way for that common import to predict all possible future uses, and we can't just keep adding more labels/tags to it, forcing all dependencies to be updated and kept in sync. Using Template Haskell, and QuasiQuotes in particular, we can now at least work around this issue, by splitting the atoms:-) - 'undefined :: Tag' becomes 'undefined :: (TT :. Ta :. Tg)' - 'label :: Label' becomes '(Ll :< La :< Lb :< Le :< Ll) :: (Ll :< La :< Lb :< Le :< Ll)' - a common import, Data.Label, provides type letters and combinators: 'data La = La' and 'data a :< b = a :< b' 'data Ta' and 'data a :. b' .. - quasiquoting and Show instances, also provided by Data.Label, try to hide the internal structure of labels and tags, at least at expression level. Since there is no quasiquoting at type-level (why?), generating type synonyms seems the best we can do there.. - since record field labels are constructed from type letters, this would also provide a basis for - type-level numbers (type-level quasiquoting would help, though) - label ordering: http://hackage.haskell.org/trac/ghc/ticket/2104 http://hackage.haskell.org/trac/ghc/ticket/1894 In actual use, this is what tags and labels from Data.Label look like: ------------- -- the usual extensible-records-as-nested-pairs data label := value = label := value deriving Show data field :& record = field :& record deriving Show infixr :& -- no quasiquoting for types:-(, so generate type synonyms instead $(genTypeTag "TTrue") $(genTypeTag "TFalse") -- a type-level predicate, using shared tags TTrue/TFalse class HasField record label tbool | label record -> tbool instance HasField () label TFalse instance HasField ((label:=value):&record) label TTrue instance HasField record label tbool => HasField (field:&record) label tbool -- record field selection, driven by field label types class Select record label value where (!) :: record -> label -> value instance .. -- some example records -- no need to declare field labels, and they will be -- shared with other importers of Data.Label! a = ([$l|this|] := True) :&([$l|that|] := "hi") :&() b = ([$l|that|] := "there") :&([$l|x|] := 100) :&([$l|y|] := 200) :&() -- we don't even need label values for this, -- type tags are sufficient, as long as we don't -- evaluate the (undefined) values of tags c = ([$t|this|] := 'x') :&([$t|y|] := ()) :&() -- testing Show and record selection records = do print a print b print c print $ (a ! [$l|this|]) print $ (c ! [$t|this|]) print $ (a ! [$l|that|]) ++ ", " ++ (b ! [$l|that|]) ------------- *Main> [$l|label|] label *Main> :t [$l|label|] [$l|label|] :: Ll :< (La :< (Lb :< (Le :< Ll))) *Main> [$l|label|] `seq` () () *Main> [$t|label|] label *Main> :t [$t|label|] [$t|label|] :: Tl :. (Ta :. (Tb :. (Te :. Tl))) *Main> [$t|label|] `seq` () *** Exception: Prelude.undefined *Main> :t [$l|100|] [$l|100|] :: L1 :< (L0 :< L0) *Main> records (this := True) :& ((that := "hi") :& ()) (that := "there") :& ((x := 100) :& ((y := 200) :& ())) (this := 'x') :& ((y := ()) :& ()) True 'x' "hi, there" ------------- For example code, see http://community.haskell.org/~claus/misc/Data/Label.hs http://community.haskell.org/~claus/misc/Data/Label/TH.hs http://community.haskell.org/~claus/misc/labels.hs Claus

- if type-level tags (such as 'data TTrue'/'data TFalse') are declared repeatedly in separate modules, they represent separate types, preventing shared use (your type-level predicate doesn't return my version of 'TTrue'/'TFalse')
How is the need for a common import for 'data TTrue; data TFalse' different then the need for a common import for 'data Bool = True | False'? Clearly, the advent of type-level programming necessitates the design of a type-level standard library, which provides standard abstractions to enable interoperation of custom libraries. But I don't see why the module system should not scale to type-level programming. Tillmann

Am Dienstag, 14. April 2009 20:01 schrieb Tillmann Rendel:
How is the need for a common import for 'data TTrue; data TFalse' different then the need for a common import for 'data Bool = True | False'?
Why not say data True data False, instead of data TTrue data TFalse? I don’t see the reason why we should insert the “T”. Data constructors are in a different namespace than type constructors. By the way, the grapefruit-records package imports type-level, only to not define its own type-level booleans but to reuse “common” types whereas I considered type-level as the standard type level programming library. Best wishes, Wolfgang

- if type-level tags (such as 'data TTrue'/'data TFalse') are declared repeatedly in separate modules, they represent separate types, preventing shared use (your type-level predicate doesn't return my version of 'TTrue'/'TFalse')
How is the need for a common import for 'data TTrue; data TFalse' different then the need for a common import for 'data Bool = True | False'?
'Bool' is hardcoded at the language level (if/then/else), not just standard library, not just implicitly imported Prelude, so that seems very stable - stable enough for a single common standard library import (unlike many type-level programming concepts, which still require experimentation and variation). But even that is considered too unflexible - some users want alternative Preludes (be it temporarily, to develop a better standard, or permanently, for personal preferences), most authors of embedded domain-specific languages have wanted to replace 'Bool' and associated syntax/classes/ operations with a variant matching their needs. Now you have several definitions of 'Bool', some of which may be compatible with each other (say, two variants of FRP libraries that both simply lift 'Bool' into 'Time->Bool'). How do you, as a library user, express that two compatible types from different sources are to be considered equivalent, without forcing the authors of the compatible definitions to collaborate on a common "standard" library for both their projects? It is not a question of possible-in-theory, it is a question of pragmatics. The need to go beyond common imports, temporarily (as systems evolve) or permanently (because tree-like hierarchies do not fit all modularization strategies), exists for 'Bool' as well as for 'TBool'. Standard ML's answer to that kind of issue is type sharing. Haskell has no equivalent. Haskell has further needs in going beyond plain hierarchical import structures, though - think of the proposals for class aliases, or the question of how to split up package dependencies without relying on orphan instances, how to depend on packages in specific configurations, etc. Again, the ML family of advanced module systems has some answers to offer (and, yes, we can encode much of those in Haskell's type system, but who does that when writing cabal packages?). Haskell'98, by design, had the simplest module system it could get away with. These days, additional layers have accumulated around this core, based on libraries and cabal packages. These layers run into all the problems of advanced module systems, only that these problems currently aren't acknowledged as language design problems, but are treated as issues to hack around whenever someone is available to do the hacking.
Clearly, the advent of type-level programming necessitates the design of a type-level standard library, which provides standard abstractions to enable interoperation of custom libraries. But I don't see why the module system should not scale to type-level programming.
Haskell's module system is an embarrassment ignoring decades of research, its one strong point being its simplicity. There has long been an implicit assumption that advances in modular programming would come either via the type class system, or via extensible records, and that these advanced would happen within modules, without having to evolve the module system beyond simple namespace management. In practice, cabal and hackage have changed all that, introducing a de-facto module configuration system on top of the existing modules, with an evolving design. My typed non-atomic atoms don't fix any of that, but they do seem to offer a workaround for a single issue that has been around for years, and has led to several trac tickets and type-level library awkwardnesses. For instance, it isn't necessary to pre-define hundreds of constant literals for a type-level numeric library if they can be generated in client code, nor is it necessary to hand-define or template-generate an ordering relation on constructors (which some type-level libraries depend on) if it can be defined once and for all. Non of this means that it wouldn't be good to have a standard library for type-level programming - in fact, I'd expect a revised Data.Label to become a small part of such standard!-) Claus ps. If you want to know more about my view on module systems for functional languages, have a look at chapter 4 of http://community.haskell.org/~claus/publications/phd.html , titled "Module Systems for Functional Languages". It is slightly dated by now -lots of things have happened in program (de-)composition research since 1997, when that was written-, but the basis for Haskell's module system is much more ancient that that, so it might be interesting for new Haskellers to see just how old some of Haskell's "advanced" ideas really are;-)

Hi Claus, thanks for your elaborations. I'm still not convinced that a common name (e.g. TT :. Tr :. Tu :. Te) is a better interface than a common import (e.g. TypeLevel.Bool.True). In both cases, the authors of all modules have to actively collaborate, either to define common names, or to define common imports. But I begin to see how type-level atoms could help to, e.g., implement more advanced module system as type-level embedded DSLs in Haskell.
Standard ML's answer to that kind of issue is type sharing.
Does type sharing help with making modules retroactively compatible? Tillmann

Tillmann Rendel
Hi Claus,
thanks for your elaborations. I'm still not convinced that a common name (e.g. TT :. Tr :. Tu :. Te) is a better interface than a common import (e.g. TypeLevel.Bool.True). In both cases, the authors of all modules have to actively collaborate, either to define common names, or to define common imports.
But I begin to see how type-level atoms could help to, e.g., implement more advanced module system as type-level embedded DSLs in Haskell.
Standard ML's answer to that kind of issue is type sharing.
Does type sharing help with making modules retroactively compatible?
map (\i -> rot13 i) import Foo -- (c) this sig last receiving data processing entity. Inspect headers for copyright history. All rights reserved. Copying, hiring, renting, performance and/or quoting of this signature prohibited.

thanks for your elaborations. I'm still not convinced that a common name (e.g. TT :. Tr :. Tu :. Te) is a better interface than a common import (e.g. TypeLevel.Bool.True). In both cases, the authors of all modules have to actively collaborate, either to define common names, or to define common imports.
It is not a solution, it is a workaround. All it does is offer users another choice, so they can now say whether they are talking about shared or locally defined labels: module A where import Data.Label data MyLabel x = [$l|label|] y = undefined::MyLabel module B where import Data.Label data MyLabel x = [$l|label|] y = undefined::MyLabel module C where import Data.Label import A import B ok = [A.x,B.x] fails = [A.y,B.y] It does so by offering a meta-level commonality: A and B do not have to agree on a common module to declare all their common types (the author of Data.Label has no idea what labels its importers might use, other than the alphabet the labels are constructed from), they only need to agree on a common way of declaring all their shareable types.
But I begin to see how type-level atoms could help to, e.g., implement more advanced module system as type-level embedded DSLs in Haskell.
Well, atoms make labels, labels form extensible record fields, extensible records can be used as first-class modules, but there'd still be a lot missing. This is really just a small step, and though it is a useful one, it has no such high aspirations, yet. When I wrote the first-class-labels proposal for Haskell', I was thinking about possible implementations (outlined in the haskell prime wiki page I referred to) but they always looked as if they'd require substantial changes. This workaround suggests that a few well-placed localised changes to GHC might be sufficient to get first-class labels - just split the modifications over two, only losely coupled areas: - provide label constructors - provide label usage (preferably hiding the internal structure) Until someone does that, quasiquoting offers a workaround, so people can resume playing with things like type-level numbers, extensible record libraries with and without label ordering, etc. I've filed a feature request for type-level quasiquoting, in case anyone else has such needs:-) http://hackage.haskell.org/trac/ghc/ticket/3177
Standard ML's answer to that kind of issue is type sharing.
Does type sharing help with making modules retroactively compatible?
It has been too long since I looked at this in detail, but yes. The way I recall it (and the early example in [1] seems to confirm this, though SML has changed after that paper was published) is that modules have signatures, and type sharing constraints assert that parts of these signatures have to match up (in Haskell-speak: imagine modules as records, with two records R1 a and R2 b, then we can use a type 'a~b => R1 a -> R2 b -> T' to assert that both records share the same type parameter; only that Haskell modules aren't records and aren't parameterized..). It would be as if one could write modules parameterised by types, instead of declaring them locally, and being able to share a type parameter over several imports: module A(type label) where x = undefined :: label module B(type label) where x = undefined :: label module C(type label) where import A[label] import B[label] ok = [A.x,B.x] Claus [1] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.3595 Robert Harper and Mark Lillibridge, A Type-Theoretic Approach to Higher-Order Modules with Sharing

in case anyone stumbles over my ad-hoc notations, that should have been:
module A[type label] where x = undefined :: label module B[type label] where x = undefined :: label
module C[type label] where import A[label] import B[label] ok = [A.x,B.x]
assuming that: - 'module X[types]' means a module parameterized by 'types' - 'import X[types]' means a module import with parameters 'types'. Claus

Standard ML's answer to that kind of issue is type sharing. Does type sharing help with making modules retroactively compatible?
It would be as if one could write modules parameterised by types, instead of declaring them locally, and being able to share a type parameter over several imports:
module A[type label] where x = undefined :: label module B[type label] where x = undefined :: label
module C[type label] where import A[label] import B[label] ok = [A.x,B.x]
assuming that: - 'module X[types]' means a module parameterized by 'types' - 'import X[types]' means a module import with parameters 'types'.
It appears I need to qualify my earlier statement that Haskell doesn't have parameterized modules and type sharing (thanks to Tuve Nordius [1] for suggesting to use type families for the former). Here is an encoding of the hypothetical example above using type families (using one of their lesser publicized features - they can be referred to before being defined): ---------------------------------------------------- {-# LANGUAGE TypeFamilies #-} module LA where type family Label a z = undefined::Label () ---------------------------------------------------- {-# LANGUAGE TypeFamilies #-} module LB where type family Label a z = undefined::Label () ---------------------------------------------------- {-# LANGUAGE UndecidableInstances #-} {-# LANGUAGE TypeFamilies #-} module LC where import LA import LB -- express type sharing while leaving actual type open type family Label a type instance LA.Label a = LC.Label a type instance LB.Label a = LC.Label a ok2 = [LA.z,LB.z] ---------------------------------------------------- Modules 'LA' and 'LB' use the applications of the yet to be instantiated type family 'Label a' as placeholders for unknown types (ie, type families are used as glorified type synonyms, but with delayed definitions), effectively parameterizing the whole modules over these types. Module 'LC' adds its own placeholder 'LC.Label a', instantiating both 'LA.Label a' and 'LB.Label a' to the same, as yet unknown type (we're refining their definitions just enough to allow them to match identically), effectively expressing a type sharing constraint. This is probably implicit in the work comparing SML's module language with Haskell's recent type class/family extensions (can anyone confirm this with a quote/reference?), but I hadn't realized that this part of the encoding is straightforward enough to use in practice. One remaining issue is whether this encoding can be modified to allow for multiple independent instantiations of 'LA', 'LB', and 'LC' above, each with their own type parameters, in the same program. Claus [1] http://www.haskell.org/pipermail/haskell-cafe/2009-April/060665.html

Hi, Claus Reinke wrote:
One remaining issue is whether this encoding can be modified to allow for multiple independent instantiations of 'LA', 'LB', and 'LC' above, each with their own type parameters, in the same program.
Modules A and B can make their dependence on the ultimate client module more explicit. {-# LANGUAGE TypeFamilies #-} module A where type family Label client z :: client -> Label client z client = undefined {-# LANGUAGE TypeFamilies #-} module B where type family Label client z :: client -> Label client z client = undefined A client of A and B can just use them together, and ghc figures out the sharing constraints. {-# LANGUAGE TypeFamilies #-} module C where import A import B ok client = [ A.z client, B.z client] The type inferred for C.ok is ok :: (B.Label client ~ A.Label client) => client -> [A.Label client]. This seems to be already quite useful. However, a different client of A and B could hide it's use of A and B, and the type sharing constraints, from its own clients. {-# LANGUAGE TypeFamilies #-} module D (ok) where import A import B data D client = D client type family Label client type instance A.Label (D client) = D.Label client type instance B.Label (D client) = D.Label client ok :: client -> [D.Label client] ok client = [ A.z (D client), B.z (D client)] Note that D does not export the reified module identity D, and that the type of D.ok does not expose the fact that D is implemented in terms of A and B. This technique relies on the explicit management of the identities of modules both at compile-time (type annotation of D.ok) and run-time (creation of (D client) in the body of D.ok). While explicit management of modules at compile time is the point of the exercise, it would be better to avoid the passing of reified module identities at runtime. Tillmann

Hi again, Tillmann Rendel wrote:
{-# LANGUAGE TypeFamilies #-} module D (ok) where import A import B
data D client = D client
type family Label client type instance A.Label (D client) = D.Label client type instance B.Label (D client) = D.Label client
ok :: client -> [D.Label client] ok client = [ A.z (D client), B.z (D client)]
Oh, and note that I do not need UndecidableInstances here, because I match against the explicit module identity D. Tillmann

z :: client -> Label client z client = undefined
ok :: (B.Label client ~ A.Label client) => client -> [A.Label client]. ok client = [ A.z client, B.z client]
This technique relies on the explicit management of the identities of modules both at compile-time (type annotation of D.ok) and run-time (creation of (D client) in the body of D.ok). While explicit management of modules at compile time is the point of the exercise, it would be better to avoid the passing of reified module identities at runtime.
In particular, this variant requires all bindings in the module to take an explicit parameter, instead of having a single parameter for the whole module. Having a single module-identifier parameter for each binding is better than having lots of individual parameters for each binding, but the idea of a parameterized module is to avoid these extra per-binding parameters alltogether. Of course, my variant cheated a little, using the TF "feature" that TF applications hide implicit parameters from contexts, so instead of 'Label a~type=>type', we write 'Label a'. (which meant I had to fix the 'a' to something concrete to avoid ambiguous types, as I needed to use type families, not data families, to get the sharing) Nevertheless, interesting possibilities. Claus
participants (4)
-
Achim Schneider
-
Claus Reinke
-
Tillmann Rendel
-
Wolfgang Jeltsch