Advantages of using qualified names and a uniform naming convention

Hi - There's lots of great Haskell libraries available, but little standardization regarding naming conventions or code organization. In this post I try to illustrate some dimensions of the question of how to form names for things and offer my opinion on specific examples knowing that this of course only represents my own personal view but may nevertheless be of objective interest as an *example* of a view and a possible starting point for discussion/ unification of naming conventions in Haskell code if such a thing is even desired/desireable at all. It would be helpful to build up a more rigorous algorithm for determining names, but I think the first step is to try and isolate aspects of the problem thus the following is an initial attempt in this direction... Firstly, I found the following advice by Henning Thielemann very useful in my own code [1]: In the style of Modula-3 I define one data type or one type class per module. The module is named after the implemented type or class. Then a type is named T, and a type class C. I use them qualified, e.g. Music.T or Collection.C. Similarly, if a type has only one constructor then I call it Cons and use it qualified MidiFile.Cons [I don't agree with this last suggestion]. This style also answers the annoying question whether the module name should be in singular or plural form: Always choose singular form! where the only thing I change is that the name of the value constructor for a type with only one value constructor should be the same as the name of the type constructor eg newtype T a = T (a->Int) which seems to be the normal convention anyway and seems better imho than introducing a different identifier for the value constructor when the namespaces for values and types/classes are already distinct. The advantage of such a strict rule is that code becomes deliciously uniform and neat, even if you just follow the part of it that deals with types and modules and use some different convention for classes. It's also particularly nice if you're used to object oriented languages, since it preserves the intuition that you can deal with one thing at a time just as you would put each object into its own separate file or C++ unit. Of course, not all modules can have just one type but I find that most only need to export one type and in any case the (main) exported type can be called "T". The exception is when the types are mutually recursive, so that different types which should really have their own module have to be put into the same module due to limitations of the compiler being used (eg GHC does not yet support mutually recursive modules involving types declared using newtype deriving and also requires hs-boot files which is imho a horrible mess as bad as the need to write separate header files in C and keep decls in sync therefore I avoid them at all costs). Consider the following: import Data.IORef main = do x <- newIORef (0::Int) writeIORef x 3 compared to: import qualified Prime.Data.IORef as Ref main = do x <- Ref.new (0::Int) Ref.write x 3 To my mind the latter is infinitely cleaner looking, because the names of the functions just specify their purpose and nothing else, and it is immediately clear which module they have come from, and there is also the advantage that a shorter prefix could be used if required eg "R". In contrast, the first code example is polluted with mutiple repetitions of "IORef" - we already know we're in the IO monad so why keep stressing the point that we're using IORefs as opposed to STRefs etc? - and there is not even any certainty that the functions so named even come from that module (unless we're already familiar with the contents of the other modules of course). Not only does the latter code look startlingly beautiful, but there is a significant advantage when using an editor which is sophisticated enough to make use of it: after typing the dot in "Ref.", it should be possible for an editor to then display a pop-up list of the visible contents of the module (various possible patent issues aside :-( ) ie the programmer just needs to know the module alias name rather than the names of each individual function/value/type/class when coding. Furthermore (see there's almost no end to the advantages of this convention! ;-)) if the implementation of IORef's was improved later, the whole module can be instantly ported to use the improved implementation just by changing the one import line rather than a painful search and replace of "newIORef" by "newBetterIORef" etc. I think the current presence of names like "newIORef" in the base library is perhaps a result of historical development - qualified names or aliases may not yet have been invented so there was probably a need to follow a convention of appending the type to the purpose when creating a function/value name, but now that we have the ability to use qualified names it would seem to be a lot better if everything could be changed to use them ie Data.Set should export T not Set so you would use foo :: Set.T -> Int foo = Set.size instead of import Data.Set as Set and trusting to luck that there is no other data type in scope with the name Set, or having to use Set.Set which looks wierd. It probably goes without saying that with the use of qualified imports, symbols are absolutely gross and should be avoided at all costs. They're totally unreadable, not just because they're a squiggly unpronouncable mess, but because you need to simulate an operator parser in your head to discover what's being applied to what. Someone could even define <+> to bind tighter than <*>, so it's not even safe to rely on normal conventions, and different modules in a program could use the same symbol in totally different ways with different precedences, leading to a real headache and unnecessary bugs when jumping between code in an editor. For readable code, plenty of descriptive words and parentheses are surely preferable. The only exception I'd make is the use of >>=, >> (which is so fundamental the alternative do notation is built into the language), ($), ($!), (.), and common arithmetic ops. Returning to Data.Set, let's now consider some of the names that are used: null :: Set a -> Bool empty :: Set a (null) tests to see whether or not a set is empty, and (empty) is the empty set. The relationship between the words "null" and "empty" can only be found by looking in an English dictionary. I propose that linguistic relationships should never feature in names used in programming. Instead the relationships should always be expressed as far as possible in the forms of the identifiers as sequences of characters. Having a clear unified framework to create names would also help not only the library author but also library users. Thus I propose that (null) should actually have been called "isEmpty", so that the relationship with the use of (empty) to denote the empty set is immediately apparent, and the use of the word "is" would immediately tell you that the function is a predicate. It might even be advantageous to reserve more characters for use in identifiers (since the infamous ASCII symbols are so abhorrent anyway ;-) ) so we could have a similar rule to Scheme, that predicates would end with a question mark thus relieving us of the need to decide between "is" and "has" (to try and eliminate as much of the messiness and indecision caused by natural language as possible), though of course this would be a more long term idea eg: -- So related things appear alphabetically together... empty? :: Set a -> Bool empty :: Set a It could be argued that it would be more in keeping with left-to-right thinking to put the '?' first but then we'd lose the "related things should be together in any alphabetical list of functions/values", though such a compromise is already necessary when using "isEmpty" rather than "emptyIs" which would perhaps just sound too unnatural ;-) Moving on to Data.List, we find a confusion of different spatial, temporal, and historical viewpoints jostling valiantly for supremacy in the programmer's mind: head tail -- funny cartoon-like image of a list last init -- temporal foldl foldr -- spatial (left to right) Despite the fact that (last, init) is somehow the dual of (head, tail), we have to switch from a temporal conception of a list to a historical "cartoon" conception to move between them. Then we come to foldl, foldr where we think in terms of left and right. Also, with foldl, foldr, it is just lucky that we already know there is a word called "fold" so that we know the trailing "r" or "l" is intended to be a suffix - it's not nearly so clear in cases like "reducer" and "reducel" where "reducer" is also a single word in English. Therefore I propose that for a list there should only be one underlying concept, that of a spatial sequence going from left to right, and that all functions should be named in accordance with this alone ie: atL atLs -- similar to (x:xs) atR atRs foldL foldR thus there would only be one concept (the spatial sequence) to grasp which the programmer could then map internally onto another concept such as time if required, without being forced by the names in Data.List to jump confusingly between them. A discussion of naming in Data.List would certainly be incomplete without a mention of (nub). It must surely rank as one of the most peculiar identifiers in the whole history of programming. The documentation thankfully explains that it means "essence", but then goes on to say that it is in fact just a function for removing duplicates from a list. The mind can only convulse in the most tortuous configurations to try and reverse engineer this strange correspondence. Would the simple name (removeDuplicates) or even (asSet) not be much easier all round? Related to lists, we of course have sequences and there, instead of cons and snoc, we could just have pushL and pushR which are nice logical names that don't conjure up images of candles and devils crawling out of mirrors etc like the use of reversed spelling in "snoc". In case the above seems overly critical of other people's hard work and good intent, I'd like to confess that I myself am often tempted off the path when it comes to the creation of identifiers involving adjectives eg: type BlueCar = ... type RedCar = ... The problem here is that these names, presumably both to do with "Car", are not going to appear next to each other in any alphabetical listing (if there are other names too), whereas: type CarBlue = ... type CarRed = ... will. Thus the position of the adjective in natural language (in this case English) has to be ignored if you want a programming environment to display related things together. Finally, as a piece de resistance :-), consider the following two definitions of (>>=) for the continuation monad: newtype Cont r a = Cont { runCont :: (a -> r) -> r } m >>= k = Cont $ \c -> runCont m $ \a -> runCont (k a) c -- (1) Cont a_r_r >>= a_C_b_r_r = Cont $ \ b_r -> a_r_r (\a -> runCont (a_C_b_r_r a) b_r) -- (2) In (1), we have to know what the associativity of ($) is (and this is a real fire bed of controversy!) and the variable names, apart from (a), don't give us the faintest clue what's going on, and in fact are doubly confusing because (m) is used to represent a value of type (m a) so there is a kind of kind error in the value name. In contrast, (2) uses the simple convention that the type is reflected more or less directly in the name so that a_r_r means a value of type (a->r)->r (there is unfortunately no way to indicate bracketing in the name). Thus by reading the definition we can understand what's going on just by cancelling out components of the name thus: a_C_b_r_r a gives us C_b_r_r and runCont (C_b_r_r) gives us b_r_r and (b_r_r b_r) gives us (r) so it's obvious that we're supplying an argument of type a->r to a_r_r as required. A great thing about Haskell is that it's quite simple to start renaming things already without having to change existing modules. In my own code I've simply just created new modules for IORef and Unique that re-export the original module contents but with the unsuffixed names, then I import from a hierarchy optimistically called "Prime" instead of the normal hierarchy :-) Having said all this, it's nevertheless an open question to me, regarding the meeting between human psychology and the desire for consistent logical naming conventions, if perhaps the rapid jumping between different viewpoints encountered in the functions in Data.List etc has a stimulating effect on the imagination in terms of encouraging mobility of thought and a danger with fully rationalised names could be that they may make one dry and brittle - though at least there would be no danger of a headless fire breathing dragon emerging from a mirror because you've just typed "cons" backwards.... :-) Regards, Brian. [1] http://haskell.org/hawiki/UsingQualifiedNames (bottom of the page) -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

On Mon, 4 Sep 2006, Brian Hulley wrote:
Firstly, I found the following advice by Henning Thielemann very useful in my own code: http://haskell.org/hawiki/UsingQualifiedNames (bottom of the page)
In the style of Modula-3 I define one data type or one type class per module. The module is named after the implemented type or class. Then a type is named T, and a type class C. I use them qualified, e.g. Music.T or Collection.C. Similarly, if a type has only one constructor then I call it Cons and use it qualified MidiFile.Cons [I don't agree with this last suggestion]. This style also answers the annoying question whether the module name should be in singular or plural form: Always choose singular form!
Applause! ... and thanks for promoting that naming style. :-) (I don't want to call it "my style", but the only Haskell libraries that use it and are used by me, are those, I have written myself. :-)
where the only thing I change is that the name of the value constructor for a type with only one value constructor should be the same as the name of the type constructor eg newtype T a = T (a->Int) which seems to be the normal convention anyway and seems better imho than introducing a different identifier for the value constructor when the namespaces for values and types/classes are already distinct.
I agree. I have chosen Cons, because I consider T as an abbreviation of 'Type', and 'Type' would be not a good name for a constructor. Initially I used C, but found that this is better for type classes.
It probably goes without saying that with the use of qualified imports, symbols are absolutely gross and should be avoided at all costs. They're totally unreadable, not just because they're a squiggly unpronouncable mess, but because you need to simulate an operator parser in your head to discover what's being applied to what. Someone could even define <+> to bind tighter than <*>, so it's not even safe to rely on normal conventions, and different modules in a program could use the same symbol in totally different ways with different precedences, leading to a real headache and unnecessary bugs when jumping between code in an editor. For readable code, plenty of descriptive words and parentheses are surely preferable. The only exception I'd make is the use of >>=, >> (which is so fundamental the alternative do notation is built into the language), ($), ($!), (.), and common arithmetic ops.
I also like to remind a sparingly usage of infix operators, because of these reasons. I want to add that precedences are not only a problem for human readers but also for tools. Imagine a source code formatting tool which respects precedences. It shall format a+ b*c instead of a+b* c in order to highlight sub-expressions. However, this is only possible if the tool knows the precedences imported from all modules. That is, a module alone can be formatted correctly only if all imported modules are known and the tool must be able to fetch this information from the other modules (which are possibly present only in compiled form). http://www.haskell.org/hawiki/SyntacticSugar_2fCons
Thus I propose that (null) should actually have been called "isEmpty", so that the relationship with the use of (empty) to denote the empty set is immediately apparent, and the use of the word "is" would immediately tell you that the function is a predicate.
I like that, too!
It might even be advantageous to reserve more characters for use in identifiers (since the infamous ASCII symbols are so abhorrent anyway ;-) ) so we could have a similar rule to Scheme, that predicates would end with a question mark thus relieving us of the need to decide between "is" and "has" (to try and eliminate as much of the messiness and indecision caused by natural language as possible), though of course this would be a more long term idea eg:
-- So related things appear alphabetically together... empty? :: Set a -> Bool empty :: Set a
I think the separation of alpha-numeric characters and other symbols simplifies things, and shall be preserved.
It could be argued that it would be more in keeping with left-to-right thinking to put the '?' first but then we'd lose the "related things should be together in any alphabetical list of functions/values", though such a compromise is already necessary when using "isEmpty" rather than "emptyIs" which would perhaps just sound too unnatural ;-)
In Mathematica 'is' functions are denoted by trailing 'Q' for 'Query'. May this be an option?
Moving on to Data.List, we find a confusion of different spatial, temporal, and historical viewpoints jostling valiantly for supremacy in the programmer's mind:
head tail -- funny cartoon-like image of a list last init -- temporal foldl foldr -- spatial (left to right)
Despite the fact that (last, init) is somehow the dual of (head, tail), we have to switch from a temporal conception of a list to a historical "cartoon" conception to move between them. Then we come to foldl, foldr where we think in terms of left and right.
Concerning pairs of identifiers, I want to use the opportunity again to promote http://www.haskell.org/hawiki/PairsOfIdentifiers including some thoughts about Show/Read, where I prefer pairs like Read/Write, Show/Hide, Parse/Format would be better according to my limited knowledge of the English language.
Therefore I propose that for a list there should only be one underlying concept, that of a spatial sequence going from left to right, and that all functions should be named in accordance with this alone ie:
atL atLs -- similar to (x:xs)
I may mix atLs up with today's 'init', because it sounds like "many elements beginning from the left".
A discussion of naming in Data.List would certainly be incomplete without a mention of (nub). It must surely rank as one of the most peculiar identifiers in the whole history of programming. The documentation thankfully explains that it means "essence", but then goes on to say that it is in fact just a function for removing duplicates from a list. The mind can only convulse in the most tortuous configurations to try and reverse engineer this strange correspondence. Would the simple name (removeDuplicates) or even (asSet) not be much easier all round?
I vote for 'removeDuplicates'. The name 'nub' might be a reason for that some Haskell programmers don't know about this useful function.
The problem here is that these names, presumably both to do with "Car", are not going to appear next to each other in any alphabetical listing (if there are other names too), whereas:
type CarBlue = ... type CarRed = ...
will. Thus the position of the adjective in natural language (in this case English) has to be ignored if you want a programming environment to display related things together.
Good point. Natural languages sometimes places things in `wrong order`, that is less important things first. Maybe you know that in German numbers are spelled in quite a mixed order, that is hundreds-ones-tens.
A great thing about Haskell is that it's quite simple to start renaming things already without having to change existing modules. In my own code I've simply just created new modules for IORef and Unique that re-export the original module contents but with the unsuffixed names, then I import from a hierarchy optimistically called "Prime" instead of the normal hierarchy :-)
In order to export types with other names, you have to use 'type' assignments, and a type synonym cannot be an instance of a type class. That is these modules cannot replace the original ones. :-(

Henning Thielemann wrote:
On Mon, 4 Sep 2006, Brian Hulley wrote:
-- So related things appear alphabetically together... empty? :: Set a -> Bool empty :: Set a
I think the separation of alpha-numeric characters and other symbols simplifies things, and shall be preserved.
In Mathematica 'is' functions are denoted by trailing 'Q' for 'Query'. May this be an option?
Or possibly 'P' for 'Predicate', since the Q suffix is already used by Template Haskell though of course it's possible to use Q Exp instead of ExpQ in that case, and in fact I'm slightly mystified why Template Haskell defines type synonyms like: type ExpQ = Q Exp when all this does is to force the reader of code, on encountering the identifier (ExpQ), to have to look for it's definition, when (Q Exp) would give the definition immediately inline at the "bargain price" of one space character, and the latter would also fit with the preference for writing (IO Int) as opposed to defining a type synonym to get (IntIO). Though an advantage of 'Q' instead of 'P' for predicates would be that 'P' could then be used to mark partial versions of functions eg: viewL :: Monad m => [a] -> m (a, [a]) viewLP :: [a] -> (a, [a]) -- when we already know the list is non-empty
head tail -- funny cartoon-like image of a list
atL atLs -- similar to (x:xs)
I may mix atLs up with today's 'init', because it sounds like "many elements beginning from the left".
I see what you mean. The problem seems to be that the rest of the left view of a list is that portion of the list which lies to the right (!) so there is a conflict between using the 'L' suffix to mean "this operation refers to the left view" versus "this operation returns element(s) which are on the left". With: viewLP :: [a] -> (a, [a]) viewRP :: [a] -> ([a], a) we could use (fst . viewLP) which suggests alternatives to head, tail, last, init eg (viewLPL) to mean "the left part of the partially defined left view" ie (head). I think it is a question of individual psychology whether the name (viewRPL) is more difficult or easier to understand/remember than (init) - it might be a mistake to expect a consensus to emerge here. Maybe both methodical and "folklore" names could exist simultaneously in such cases. Best regards, Brian. -- Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us. http://www.metamilk.com

On Mon, 4 Sep 2006, Brian Hulley wrote:
The problem here is that these names, presumably both to do with "Car", are not going to appear next to each other in any alphabetical listing (if there are other names too), whereas:
type CarBlue = ... type CarRed = ...
will. Thus the position of the adjective in natural language (in this case English) has to be ignored if you want a programming environment to display related things together.
You have still not mentioned the C legacy Float and Double. Where the first one is certainly an abbreviation for "floating point number", the second one abbreviates "double precision floating point number", but the abbreviations are quite different. What about Single and Double, or FloatSingle and FloatDouble, or FloatSP and FloatDP?
participants (2)
-
Brian Hulley
-
Henning Thielemann