Capitalization and associated type families

Hi, I'm familiar with the capitalization rules for identifiers in Haskell and know that they are very useful and practical e.g. by allowing variables to be distinguished from "constants" in patterns etc. However I'm trying to understand on a much deeper level whether or not they can really be justified and indeed if there is a danger that they can lead to inconsistency in the light of future extensions to the language. For example, why is there any distinction between a type "variable" and a type "constant"? To make this clearer consider the type of (map): (a -> b) -> List a -> List b (ignoring the special [] syntax for list types which would just confuse the issue here). With explicit quantifiers, we'd write: forall a b. (a -> b) -> List a -> List b Now this begs the question: why does "List" need to start with a capital? (supposing that quantifiers were always written explicitly) The same distinction between "constants" and "variables" is also used in predicate logic e.g. On(x, Table) & Colour(x, Red) but similarly the above could surely be written just as easily by: exists x. on(x, table) & colour(x, red) where (on), (table), (colour), and (red) are just considered to be variables that have already been introduced (by some enclosing scope) instead of having to think of them as "constants" and make a distinction between "constant" and "variable". Anyway to get to the point of this post what I'm really trying to find is some explanation of why there is the concept of "constant" as opposed to "variable" or whether this concept is now redundant in the light of lexical scoping/ explicit quantification? Somewhat related to this issue is the syntax of associated type families. For example in section 5.2.2 of http://www.haskell.org/haskellwiki/GHC/Indexed_types there is the example: class C a b where data T a which raises in my mind several questions: 1) Why is (T) given a parameter? I.e. why does an associated type declared inside a class decl not automatically get all the parameters of the class passed to it so that one would just write: class C a b where data T f :: T -> T === data family T a b class C a b where ... f :: T a b -> T a b 2) Compare: class C a b t | a b -> t with class C a b where data T In other words, why is (T) a "constant" when it is declared inside the class but a "variable" when passed as a parameter? Is the fact that (T) needs to be given a parameter even when it is declared inside the "scope" of the class type parameters the result of needing to fulfill the idea of "constant" vs "variable"? 3) In the syntax: class C a b data T a it seems rather odd that the "a" in "T a" should be related to the "a" in "C a b" since one might expect that the name chosen for a parameter should be local to the data decl. Apologies for such a rambling post. I've spent literally at least 6 months scouring the internet trying to find a real discussion about capitalization rules because I'm trying to answer the following question: if I make use of a similar capitalization rule in my own language, can I guarantee that this rule will not lead to an inconsistency at some future point? Ie is there a real rock solid theoretical basis for making a distinction between "constants" and "variables" that holds even in the presence of lexical scoping? Thanks, Brian.

On Mon, 2008-08-04 at 19:51 +0100, Brian Hulley wrote:
Hi, I'm familiar with the capitalization rules for identifiers in Haskell and know that they are very useful and practical e.g. by allowing variables to be distinguished from "constants" in patterns etc.
However I'm trying to understand on a much deeper level whether or not they can really be justified and indeed if there is a danger that they can lead to inconsistency in the light of future extensions to the language.
For example, why is there any distinction between a type "variable" and a type "constant"?
To make this clearer consider the type of (map):
(a -> b) -> List a -> List b
(ignoring the special [] syntax for list types which would just confuse the issue here).
With explicit quantifiers, we'd write:
forall a b. (a -> b) -> List a -> List b
Now this begs the question: why does "List" need to start with a capital? (supposing that quantifiers were always written explicitly)
AFAIK, it doesn't; the variable/constant distinction is just there to let the implementation put the foralls in for you. Similarly, if we had explicit foralls on equations, we could say forall n x xn. drop (n+1) (x:xn) = drop n xn but forall n. drop n nil = nil and it would be perfectly clear that `nil' in the latter case was a constructor. Of course, if you want to introduce this rule, you'd better start out with it and be consistent --- require every variable to be explicitly brought into scope. I think you'd find it pretty annoying after trying to program in your new language for a while, though. jcc

Jonathan Cast wrote:
On Mon, 2008-08-04 at 19:51 +0100, Brian Hulley wrote:
For example, why is there any distinction between a type "variable" and a type "constant"?
forall a b. (a -> b) -> List a -> List b
Now this begs the question: why does "List" need to start with a capital? (supposing that quantifiers were always written explicitly)
AFAIK, it doesn't; the variable/constant distinction is just there to let the implementation put the foralls in for you.
Hi Jonathan - thanks - this confirms what I've been thinking. However consider: class Foo a where f :: a -> b -> (a, b) Here there is no capitalization distinction in the type of (f) but the implementation can still insert the "forall b." since (a) is already in scope. Therefore similarly, if type constructors like "List" were instead written using lowercase, since they would already be in scope it would be clear even without explicit quantifiers that (a) and (b), but not (list), were to be quantified in the type of (map) (assuming of course that there was no top level data decl for (a) or (b)).
Similarly, if we had explicit foralls on equations, we could say
forall n x xn. drop (n+1) (x:xn) = drop n xn
but
forall n. drop n nil = nil
and it would be perfectly clear that `nil' in the latter case was a constructor.
Actually I quite like the above because it shows that pattern matching can be understood in terms of universally quantified predicates. Regarding the verbosity, patterns are something of a special case since variables (that are introduced in the pattern) can only occur once and there is no need to have higher rank quantification in value patterns (disregarding explicit tests for _|_). Therefore special syntax could be used to make the above shorter e.g. borrowing the '?' from Felix (http://felix.sourceforge.net/doc/tutorial/introduction/en_flx_tutorial_top.h... (section 2.11/2.17 etc)): drop (?n + 1) (_ : ?xn) = drop n xn drop ?n nil = nil (where glued prefix binds tighter than application).
Of course, if you want to introduce this rule, you'd better start out with it and be consistent --- require every variable to be explicitly brought into scope. I think you'd find it pretty annoying after trying to program in your new language for a while, though.
I agree that there are many good points in favour of Haskell's use of a case distinction, not the least being that it gives definite guidance on when to use lower or upper case identifiers and therefore avoids the mess you find in C++ programs where every programmer has their own private conventions which can lead to confusion and useless debate. However it still seems to me that the case distinction, at least on the level of type constructors, becomes problematic and possibly inconsistent especially if I consider the effects of trying to incorporate something like the ML module system as well as type classes. (For example the Moby language (http://moby.cs.uchicago.edu/ ), a possible successor to SML, uses capitalization for type constructors similar to Haskell which means that type parameters in functor applications must be uppercase, which conflicts with the use of lowercase type params for classes which are a special kind of functor... (Moby itself doesn't encounter any difficulty since it doesn't have type classes)). Best regards, Brian.

On Mon, 2008-08-04 at 23:04 +0100, Brian Hulley wrote:
Jonathan Cast wrote:
On Mon, 2008-08-04 at 19:51 +0100, Brian Hulley wrote:
For example, why is there any distinction between a type "variable" and a type "constant"?
forall a b. (a -> b) -> List a -> List b
Now this begs the question: why does "List" need to start with a capital? (supposing that quantifiers were always written explicitly)
AFAIK, it doesn't; the variable/constant distinction is just there to let the implementation put the foralls in for you.
Hi Jonathan - thanks - this confirms what I've been thinking. However consider:
class Foo a where
f :: a -> b -> (a, b)
Here there is no capitalization distinction in the type of (f) but the implementation can still insert the "forall b." since (a) is already in scope. Therefore similarly, if type constructors like "List" were instead written using lowercase, since they would already be in scope it would be clear even without explicit quantifiers that (a) and (b), but not (list), were to be quantified in the type of (map) (assuming of course that there was no top level data decl for (a) or (b)).
Sure. ML does it this way, in fact. Not for type variables, but for regular pattern variables. So, IIRC, in ML the empty list is written nil. But (playing Devil's advocate) if you have a function declaration f niil = ... That's a variable pattern... The Miranda capitalization rule (which Haskell adopted) is designed to find such typos automatically, since now you have to write f Niil and Niil can't be a variable. The fact that type signatures in Haskell type classes are subject to the same problem is certainly a weakness of the design. I think if new type variables (universally quantified locally) had a different syntax, so you said f :: a -> 'b -> (a, 'b) where a by itself meant a had better already be in scope, that would be more consistent. Then you could drop the capitalization convention entirely.
Similarly, if we had explicit foralls on equations, we could say
forall n x xn. drop (n+1) (x:xn) = drop n xn
but
forall n. drop n nil = nil
and it would be perfectly clear that `nil' in the latter case was a constructor.
Actually I quite like the above because it shows that pattern matching can be understood in terms of universally quantified predicates.
Regarding the verbosity, patterns are something of a special case since variables (that are introduced in the pattern) can only occur once and there is no need to have higher rank quantification in value patterns (disregarding explicit tests for _|_). Therefore special syntax could be used to make the above shorter e.g. borrowing the '?' from Felix (http://felix.sourceforge.net/doc/tutorial/introduction/en_flx_tutorial_top.h... (section 2.11/2.17 etc)):
drop (?n + 1) (_ : ?xn) = drop n xn drop ?n nil = nil
(where glued prefix binds tighter than application).
I think this goes with ML's type variable syntax ('a), actually.
Of course, if you want to introduce this rule, you'd better start out with it and be consistent --- require every variable to be explicitly brought into scope. I think you'd find it pretty annoying after trying to program in your new language for a while, though.
I agree that there are many good points in favour of Haskell's use of a case distinction, not the least being that it gives definite guidance on when to use lower or upper case identifiers and therefore avoids the mess you find in C++ programs where every programmer has their own private conventions which can lead to confusion and useless debate.
Another system with this advantage would be to invalidate identifiers beginning with upper case letters entirely :) (If you really wanted to go B&D, disallow underscores as well. Then every identifier /has/ to be camel-case.)
However it still seems to me that the case distinction, at least on the level of type constructors, becomes problematic and possibly inconsistent especially if I consider the effects of trying to incorporate something like the ML module system as well as type classes. (For example the Moby language (http://moby.cs.uchicago.edu/ ), a possible successor to SML, uses capitalization for type constructors similar to Haskell which means that type parameters in functor applications must be uppercase, which conflicts with the use of lowercase type params for classes which are a special kind of functor... (Moby itself doesn't encounter any difficulty since it doesn't have type classes)).
It's weird. ML-derived languages are functional languages at the value level (and have regular lambda-bound variables in consequence), but are logical languages at the type level (and have logical variables in consequence). So ordinary lambda-bound variables, like in ML-style functors (parametrized modules) act more like type constants than like type variables. Food for thought --- thanks for that! jcc

Jonathan Cast wrote:
It's weird. ML-derived languages are functional languages at the value level (and have regular lambda-bound variables in consequence), but are logical languages at the type level (and have logical variables in consequence). So ordinary lambda-bound variables, like in ML-style functors (parametrized modules) act more like type constants than like type variables.
Thanks - the above seems to be a promising avenue of thought to arrive at clarity regarding case distinctions at the type level, and I can see here the basis for a good argument of why there would actually be no inconsistency regarding the use of capitals for module type members and lowercase for the class parameters. Perhaps this is also the root of the difference between associated type synonyms and class params. Ie at the type level, uppercase == functional variable (aka "constant") and lowercase == logic variable... The type level now seems to me to be quite significantly more complicated than the value level due to the mixing of functional and logic programming, not to mention that at the type level "variables" in an outer scope become "constants" in an inner scope as far as pattern matching is concerned whereas for value patterns variables are always fresh and never scoped (over other patterns). Therefore I've decided to deal with the issue of case at each level separately since it seems immediately clear that the case distinction at the value level, as in Haskell/OCaml/Moby, is obviously good due to the "foolproof" nature of value patterns you pointed out, whereas at the type level it may also be good but I'll need a few more months to think about it... ;-) Cheers, Brian.

Brian Hulley wrote:
class Foo a where
f :: a -> b -> (a, b)
Here there is no capitalization distinction in the type of (f) but the implementation can still insert the "forall b." since (a) is already in scope. Therefore similarly, if type constructors like "List" were instead written using lowercase, since they would already be in scope it would be clear even without explicit quantifiers that (a) and (b), but not (list), were to be quantified in the type of (map) (assuming of course that there was no top level data decl for (a) or (b)).
So adding b to the export list of an imported module would change the type of f? Tillmann

Tillmann Rendel wrote:
Brian Hulley wrote:
class Foo a where
f :: a -> b -> (a, b)
Here there is no capitalization distinction in the type of (f) but the implementation can still insert the "forall b." since (a) is already in scope. Therefore similarly, if type constructors like "List" were instead written using lowercase, since they would already be in scope it would be clear even without explicit quantifiers that (a) and (b), but not (list), were to be quantified in the type of (map) (assuming of course that there was no top level data decl for (a) or (b)).
So adding b to the export list of an imported module would change the type of f?
Yes, if the module in which the above class decl were defined imported a data type (b) from some other module (or defined it elsewhere in the same module) then (f)'s type would change. (Of course at the moment this can't happen because (b) is lowercase so it can't be introduced by data/type decls or imported/exported.) The type of (f) would also change if Haskell were extended to support nested classes: class Bar b where class Foo a where f :: a -> b -> (a, b) But as Jonathan pointed out this behaviour is not really all that ideal since even in Haskell at the moment a simple spelling mistake could cause silent quantification instead of an error (which is perhaps the main justification for why constructors introduced by data/type decls must currently be capitalized -- unfortunately there is no corresponding protection when writing types inside class decls): class Zap aaaaa where g :: aaaaa -> aaaa ^ oops! Cheers - Brian.

On Tue, Aug 5, 2008 at 12:22 PM, Brian Hulley
unfortunately there is no corresponding protection when writing types inside class decls:
class Zap aaaaa where g :: aaaaa -> aaaa ^ oops!
Well, I think there's a pretty good protection; instances usually won't typecheck until it's fixed. -- ryan
participants (4)
-
Brian Hulley
-
Jonathan Cast
-
Ryan Ingram
-
Tillmann Rendel