what I learnt from my first serious haskell programm

Hi everybody, I came to haskell recently (one month) and I have now written my first serious program. I am still busy improving it, but here is a small report of what I learned and and my impressions of the language. I found no show stoppers, but a couple of things that I didn't like much. * namespaces * First off something that disturbed me but does not seem to be discussed much are namespaces, or rather the lack of them. Yes I know that haskell does have namespaces, but with object oriented languages you basically get a namespace for each class, in haskell you need modules, and I often like to have groups of related types in the same module. This might seem like a small issue, but the fact when a program or library grows it gets worse, and for me having a consistent way of calling functions is very important, it lets me spend less time thinking about naming. This is something makes the difference between a set of libraries and well thought out framework. Records then also put everything in the module namespace, and it seems that this misfeature has already been discussed. Anyway the namespaces would not be a problem if one would use always the most specialized function (and throw an error is no such function exist) as for example is done by Aldor ( http://www.aldor.org ) a language with a very nice type system, that unfortunately has been kind of dying for a long time because of its closed source. So I am wondering how people cope with them, share your opinion, for me the best thing seem to be to try to use one module per "big" type, and then "import qualified x as y", what are good coding practices? * space leaks * Sure enough in my program I had a space leak, actually an exponential space leak. The short story: it did bite me, and I think that many people that have problems with the performance in haskell, is because of them. The whole story is that I wanted to find the optimal node at depth n of an infinite tree. I even wrote down the list of the optimal nodes at each depth from which I would take some, but even before compiling it I realized that this would never be efficient. So I passed the maximum depth as argument to the function that generates and immediately reduces the tree. It had a huge space leak, and I knew what was happening, the function wad doing a breath first search of the tree instead of a depth first search, so I put a seq to make it go deeper first. It helped, but a space leak was still there. So I used the profiler, and basically the default profiler is useless for space leaks (it just says the total amount allocated which was constant). After some thinking I finally found out (a posteriori obvious) that I had to fully evaluate the actual node the the depth, and finally return the list. In the meantime I had introduced a couple of seq too much, and also switched to a strict data structure (which helped a little but was not the biggest issue. I really looked into memory profiling after having fixed the problem (but having seen how much work it could be even when you know where the problem is). I tried to use it and I think that it would have been useful, I will see how much with the next bug. With respect to this bibliographical heap profile (+RTS -hb) is broken on my AMD 64 Linux machine with GHC 6.6.20070129, I submitted a bug. * vector & matrices * A thing that bothered me a little was the absence of good standardized vectors and matrices, for now I rolled my own 3x3, I have seen numerical prelude, maybe in the future thre will be a standard interface for matrixes... Rolling my own mattrixes/vectors I have hit against some limitations of classes, nothing terrible, but not so nice, and something that I gather is known for example the fact that you cannot overload + or (in haskell 98) you cannot have overlapping instances. Again (see namespaces problem) changing names for the operations is the way out but it is not so nice. * parallel * For the future I want to make my algorithm parallel using threads (should be easy) . Then I was thinking about being parallel between computers, so to have really make this exercice a test if I really can switch to haskell for my programming, I was thinking about MPI, suggestions? thanks for having read till here :) Fawzi

On Sat, 17 Mar 2007, Fawzi Mohamed wrote:
So I am wondering how people cope with them, share your opinion, for me the best thing seem to be to try to use one module per "big" type, and then "import qualified x as y", what are good coding practices?
Do that and use hierarchical modules if there's a bunch of functionality you want to pull together for ease of importing. Modules are there almost entirely to provide namespaces and that's why there aren't other constructs to do it. You don't want an overloading mechanism that isn't type classes, either - how else would you decide "most specific"? -- flippa@flippac.org A problem that's all in your head is still a problem. Brain damage is but one form of mind damage.

On Sat, 17 Mar 2007, Philippa Cowderoy wrote:
On Sat, 17 Mar 2007, Fawzi Mohamed wrote:
So I am wondering how people cope with them, share your opinion, for me the best thing seem to be to try to use one module per "big" type, and then "import qualified x as y", what are good coding practices?
http://haskell.org/haskellwiki/Qualified_names http://haskell.org/haskellwiki/Category:Style

On 17/03/07, Fawzi Mohamed
* namespaces *
First off something that disturbed me but does not seem to be discussed much are namespaces, or rather the lack of them.
I'm also in the middle of writing a medium-sized program in Haskell, but my experiences have been somewhat the opposite; I've found that although most people complain about the lack of namespaces I haven't really missed them.
Yes I know that haskell does have namespaces, but with object oriented languages you basically get a namespace for each class, in haskell you need modules, and I often like to have groups of related types in the same module.
Surely within a group of related types there'd be no overlapping names anyway?
Records then also put everything in the module namespace, and it seems that this misfeature has already been discussed.
I like to prefix my record accessors with three letters that describe the type. For example, in my forum software, the author of a post can be pulled out of a Post value using pstAuthor. Although this is somewhat low-tech, it's a solution that works well and makes reading code easier, while of course solving the one-namespace problem. I don't really see why anything more complex would be needed.
So I am wondering how people cope with them, share your opinion, for me the best thing seem to be to try to use one module per "big" type, and then "import qualified x as y", what are good coding practices?
A practice I've seen a lot in small- to mid-sized programs is to have a Types module that contains definitions of the types used in the program.
* vector & matrices *
A thing that bothered me a little was the absence of good standardized vectors and matrices, for now I rolled my own 3x3, I have seen numerical prelude, maybe in the future thre will be a standard interface for matrixes... Rolling my own mattrixes/vectors I have hit against some limitations of classes, nothing terrible, but not so nice, and something that I gather is known for example the fact that you cannot overload + or (in haskell 98) you cannot have overlapping instances.
Why not write up your module then send it off to the libraries@haskell.org mailing list? If this has frustrated you then it's probably frustrated others. Why not contribute something back and polish this problem off? And you can overload (+), just make your matrix type an instance of Num. If some operations shouldn't be supported (like signum, perhaps), the general strategy is just to use error (this is basically due to the fact that Num has too many methods because we don't have a sane way of splitting up type classes. Type class synonyms -- google for them -- look like a good solution, but are lacking an implementation). -- -David House, dmhouse@gmail.com

G'day all.
Quoting David House
Surely within a group of related types there'd be no overlapping names anyway? [...] I like to prefix my record accessors with three letters that describe the type.
I do pretty much that too, but I wouldn't say that I "like" to do it. You're replacing a namespace system with a naming convention. If you enjoy that sort of thing, you might be happier programming in C. (Sorry, that was unfair.)
I don't really see why anything more complex would be needed.
Famous last words. Cheers, Andrew Bromage

Thanks for the long answer David, David House wrote:
On 17/03/07, Fawzi Mohamed
wrote: [...] Surely within a group of related types there'd be no overlapping names anyway? yes, but I found out that I would have an overlap with functions that I wanted to use and function I wanted to define, resulting in quite some qualifying. Also arrays, inset,... have quite some overlapping. For array the use of the IArray typeclass kept the things nice also using Arrays and UArrays together, but adding IntSet to the whole worked only qualifying, and then I also wanted to define a fold for my type... I found Data.Foldable, but then in the end I just prefixed my operation names which is what you did for record names. Is this recommanded? It makes it difficult to change types, but at least is easy to memorize... So I am wondering how people cope with them, share your opinion, for me the best thing seem to be to try to use one module per "big" type, and then "import qualified x as y", what are good coding practices?
A practice I've seen a lot in small- to mid-sized programs is to have a Types module that contains definitions of the types used in the program.
ok I will think about it
* vector & matrices *
A thing that bothered me a little was the absence of good standardized vectors and matrices, for now I rolled my own 3x3, I have seen numerical prelude, maybe in the future thre will be a standard interface for matrixes... Rolling my own mattrixes/vectors I have hit against some limitations of classes, nothing terrible, but not so nice, and something that I gather is known for example the fact that you cannot overload + or (in haskell 98) you cannot have overlapping instances.
Why not write up your module then send it off to the libraries@haskell.org mailing list? If this has frustrated you then it's probably frustrated others. Why not contribute something back and polish this problem off?
I thought about it, the problem is that I need just a partial implementation of it, but if I will have an enough cleaned and version I will do it.
And you can overload (+), just make your matrix type an instance of Num. If some operations shouldn't be supported (like signum, perhaps), the general strategy is just to use error (this is basically due to the fact that Num has too many methods because we don't have a sane way of splitting up type classes. Type class synonyms -- google for them -- look like a good solution, but are lacking an implementation).
This is is very ugly in my opinion, because for me a type class should represent something more than just a way to overload, is something is not a number then it should not have the class Num. In this I liked very much the approach of aldor where being an instance of a class is separated from overloading, and must always be explicit, so that a type class can be used also to advertise that something (for example) is commutative and it can have exactly the same functions as another class. Maybe here I should say something explicitly that I realized I had completely forgotten to say: I like haskell very much and I enjoyed using it, but there are some areas where coming from other languages do not seem too well rounded (but maybe it is just that I am not yet fully *in* haskell. by Fawzi

On Mon, 19 Mar 2007, Fawzi Mohamed wrote:
A practice I've seen a lot in small- to mid-sized programs is to have a Types module that contains definitions of the types used in the program.
ok I will think about it
I'd avoid that and suggest a more decentralized design, where each module contributes one main type and according functions.

Quoth Henning Thielemann, nevermore,
On Mon, 19 Mar 2007, Fawzi Mohamed wrote:
A practice I've seen a lot in small- to mid-sized programs is to have a Types module that contains definitions of the types used in the program.
ok I will think about it
I'd avoid that and suggest a more decentralized design, where each module contributes one main type and according functions.
That sounds like a kind of stateless object-oriented approach. Interesting. I probably do something like that but have never really formalised my thoughts on the matter. Nice one! -- Dougal Stanton

On Mar 19, 2007, at 9:56 AM, Henning Thielemann wrote:
On Mon, 19 Mar 2007, Fawzi Mohamed wrote:
A practice I've seen a lot in small- to mid-sized programs is to have a Types module that contains definitions of the types used in the program.
ok I will think about it
I'd avoid that and suggest a more decentralized design, where each module contributes one main type and according functions.
I'd just like to mention that I've used the central "Types" module myself on occasion. The main reason is to avoid the need for mutually recursive modules, and not because its a particularly nice design. I try to arrange the user-visible API around some coherent organizing principle, such as the one Henning proposes, but that sometimes leads to module dependency cycles; factoring out a Types module is one way to break the cycles. Rob Dockins Speak softly and drive a Sherman tank. Laugh hard; it's a long way to the bank. -- TMBG

Robert Dockins wrote:
On Mar 19, 2007, at 9:56 AM, Henning Thielemann wrote:
On Mon, 19 Mar 2007, Fawzi Mohamed wrote:
A practice I've seen a lot in small- to mid-sized programs is to have a Types module that contains definitions of the types used in the program.
ok I will think about it
I'd avoid that and suggest a more decentralized design, where each module contributes one main type and according functions.
I'd just like to mention that I've used the central "Types" module myself on occasion. The main reason is to avoid the need for mutually recursive modules, and not because its a particularly nice design. I try to arrange the user-visible API around some coherent organizing principle, such as the one Henning proposes, but that sometimes leads to module dependency cycles; factoring out a Types module is one way to break the cycles.
Rob Dockins
I used a "Types" module for most of the types in the all haskell regex-* backends I wrote. Doing anything else tended to lead to cycles, like Rob mentioned. This seems to be a result of "module/import" being the one-true-and-unique-way to create a namespace combined with almost no support for recursive modules. Thus data types that (indirectly) refer to each other must be in the same namespace. Thus one cannot even have a "all data types go in separate modules" policy. As I write the above, I wonder: if a new variant of "module" that only defines data constructors and types could be created then could compilers support these if they have mutual recursive imports/references? The other alternative being: Make one "Types" module file that has a new variant of "sub-module" that defines module namespaces inside the file. This is much that same as concatenating all the separate type modules into a single file. -- Chris

On Mon, 19 Mar 2007, Chris Kuklewicz wrote:
I used a "Types" module for most of the types in the all haskell regex-* backends I wrote. Doing anything else tended to lead to cycles, like Rob mentioned.
This seems to be a result of "module/import" being the one-true-and-unique-way to create a namespace combined with almost no support for recursive modules.
Thus data types that (indirectly) refer to each other must be in the same namespace. Thus one cannot even have a "all data types go in separate modules" policy.
As I write the above, I wonder: if a new variant of "module" that only defines data constructors and types could be created then could compilers support these if they have mutual recursive imports/references?
If I remember correctly, Oberon has one file per module and allows recursive modules. The identifiers to be exported are marked with a '*' on declaration. Interface files are automatically generated from these module files. (And Oberon programmers do not need to maintain export lists.)

Robert Dockins wrote:
The main reason is to avoid the need for mutually recursive modules, and not because its a particularly nice design.
Chris Kuklewicz wrote:
This seems to be a result of "module/import" being the one-true-and-unique-way to create a namespace combined with almost no support for recursive modules.
I'd like to remind you that the Haskell98 report explicitly allows recursive modules http://haskell.org/onlinereport/modules.html However, it's a real pity that Hugs doesn't support them at all and that you have to take extra pains with .boot-files to get them in GHC. Recursive modules are the lazy evaluation of modules and "One should not obstruct access to such a vital tool". I want recursive modules for free! Regards, apfelmus

On 19/03/07, Fawzi Mohamed
This is is very ugly in my opinion, because for me a type class should represent something more than just a way to overload, is something is not a number then it should not have the class Num.
Num is a collection of types whose members can be added and subtracted and so on. As numbers are the most common example of this, one could say the members of Num _act like numbers_, rather than are numbers in themselves. Really typeclasses are all about overloading. For example, Eq is the collection of types that the equality predicate (==) applies to. I don't see this as ugly; quite the contrary, in that if you know a type instantiates Eq you can use (==) without worrying about using a type-specific equality predicate. E.g. it's nice to see the same (==) everywhere rather than seeing (say) (Int.==), (Bool.==) and so on. -- -David House, dmhouse@gmail.com

David House wrote:
On 19/03/07, Fawzi Mohamed
wrote: This is is very ugly in my opinion, because for me a type class should represent something more than just a way to overload, is something is not a number then it should not have the class Num.
Num is a collection of types whose members can be added and subtracted and so on. As numbers are the most common example of this, one could say the members of Num _act like numbers_, rather than are numbers in themselves.
Really typeclasses are all about overloading. For example, Eq is the collection of types that the equality predicate (==) applies to. I don't see this as ugly; quite the contrary, in that if you know a type instantiates Eq you can use (==) without worrying about using a type-specific equality predicate. E.g. it's nice to see the same (==) everywhere rather than seeing (say) (Int.==), (Bool.==) and so on.
Maybe I did not express me clearly enough, I think that classes are useful (and the language that I was speaking of, aldor, has them), but it is not nice that the only way to have an overloaded function is to define a type class, or that you need to fully implement a class just to overload one function of it. Vectors don't act like numbers, a vector space is not a field, even if they have some common operations. I find it misleading to define something a number when it does not satisfy all the properties of numbers. The numerical prelude might fix this, but still I think that class and overloading should be distinct concepts. Fawzi

Fawzi Mohamed
Vectors don't act like numbers, a vector space is not a field, even if they have some common operations.
That's a long-standing flaw in the design of numeric classes. It's not a problem with typeclasses per se.
I find it misleading to define something a number when it does not satisfy all the properties of numbers.
Justifiably so. But if you had a class Additive, would you be unhappy about defining (+) on non-numbers?
The numerical prelude might fix this, but still I think that class and overloading should be distinct concepts.
I think the problem here is that you are using the word class to mean something different from Haskell classes. Haskell typeclasses /are/ overloading, and that's what I understand them as. They were originally introduced as a solution to the question of how to handle equality so that one didn't have to use different names for the same concept on different types. -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html (updated 2006-09-13)

Hello Fawzi, Monday, March 19, 2007, 8:26:33 PM, you wrote:
Maybe I did not express me clearly enough, I think that classes are useful (and the language that I was speaking of, aldor, has them), but it is not nice that the only way to have an overloaded function is to define a type class
overloading without using type classes (as it is implemented in C++) is hardly compatible with type inference. imagine that you has two functions: f :: String -> Int f :: Int -> Int and write (f a). what should be type of a? it becomes a list of possible types. because type inference mechanism pushes inferred types up and down, the whole information about type of each term will become much more complicated. compare this to current requirement to overload using only type classes. in this case, you know that 'a' belongs to some class, so possible variants don't multiply at each inference step -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Fawzi,
Monday, March 19, 2007, 8:26:33 PM, you wrote:
Maybe I did not express me clearly enough, I think that classes are useful (and the language that I was speaking of, aldor, has them), but it is not nice that the only way to have an overloaded function is to define a type class
overloading without using type classes (as it is implemented in C++) is hardly compatible with type inference. imagine that you has two functions:
f :: String -> Int f :: Int -> Int
and write (f a). what should be type of a? it becomes a list of possible types. because type inference mechanism pushes inferred types up and down, the whole information about type of each term will become much more complicated. compare this to current requirement to overload using only type classes. in this case, you know that 'a' belongs to some class, so possible variants don't multiply at each inference step
That was the reason that is spoke of aldor ( http://www.aldor.com ), as it has type inference, but yes indeed this makes type inference much more difficult and undefined in some cases (also haskell extensions make inference in general impossible). So I understand the choice of haskell, but still as a user (and not implementor :) I think that aldor approach is superior. Anyway I don't want to make this into a language war, I just pointed out something where haskell in my opinion (and from my short experience with it) could be improved. Obviously the perfect language does not exist you always have some trade offs to do... still one can always imagine some imprvements... ;-) Fawzi

Hello Fawzi, Tuesday, March 20, 2007, 1:47:48 PM, you wrote:
That was the reason that is spoke of aldor ( http://www.aldor.com ), as it has type inference, but yes indeed this makes type inference much more difficult and undefined in some cases (also haskell extensions make inference in general impossible).
the problem is not only implementation, but error messages. are you want to see a message like "a should be Int, b should String, and ñ should be Double; or x should be String and y Int; or ñ should be [Int]" ? :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Fawzi,
Tuesday, March 20, 2007, 1:47:48 PM, you wrote:
That was the reason that is spoke of aldor ( http://www.aldor.com ), as
it has type inference, but yes indeed this makes type inference much more difficult and undefined in some cases (also haskell extensions make inference in general impossible).
the problem is not only implementation, but error messages. are you want to see a message like "a should be Int, b should String, and ñ should be Double; or x should be String and y Int; or ñ should be [Int]" ? :)
ambiguous function call at line xxx. Possible instances are: f: Int -> String -> Double -> a f: String -> Int -> [Int] -> a please explicitly annotate the type to disambiguate Note that you want to use also the type of the result to disambiguate. Not easy, but doable, and done, again I can understand why haskell did not do it. Fawzi

Hello Fawzi, Tuesday, March 20, 2007, 5:37:45 PM, you wrote:
ambiguous function call at line xxx. Possible instances are: f: Int -> String -> Double -> a f: String -> Int -> [Int] -> a please explicitly annotate the type to disambiguate
Note that you want to use also the type of the result to disambiguate. Not easy, but doable, and done, again I can understand why haskell did not do it.
you not yet realize the whole problem :) in traditional languages, all type information flows in one direction and when compiler see f(x,y,z) call, it already knows x/y/z types because they should be declared earlier. but in haskell types of these variable may be defined by subsequent calls. so imagine several such overloaded calls - "f x y z", then "g (h x) (h y)", and more and more. error messages will become so complicated that you will shoot the compiler :) -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin wrote:
Hello Fawzi,
Tuesday, March 20, 2007, 5:37:45 PM, you wrote:
ambiguous function call at line xxx. Possible instances are: f: Int -> String -> Double -> a f: String -> Int -> [Int] -> a please explicitly annotate the type to disambiguate
Note that you want to use also the type of the result to disambiguate. Not easy, but doable, and done, again I can understand why haskell did not do it.
you not yet realize the whole problem :) in traditional languages, all type information flows in one direction and when compiler see f(x,y,z) call, it already knows x/y/z types because they should be declared earlier. but in haskell types of these variable may be defined by subsequent calls. so imagine several such overloaded calls - "f x y z", then "g (h x) (h y)", and more and more. error messages will become so complicated that you will shoot the compiler :)
Yes looking at the paper describing the type system of aldor http://www.aldor.org/mediawiki/index.php/Reports (I had used it 5 years ago before abandoning it because it was closed source and did not work on mac) indeed there are quite some differences, for example no parametric polymorphism, but types (and just about everything) are values so you can have something similar by passing the explicit type to a function that generates modules or other functions. Fawzi

The nice thing about Haskell's overloading is that every function, like f, has a type. Not two different types, but one general type you can give it. It's a different approach to overloading. -- Lennart On Mar 20, 2007, at 14:37 , Fawzi Mohamed wrote:
Bulat Ziganshin wrote:
Hello Fawzi,
Tuesday, March 20, 2007, 1:47:48 PM, you wrote:
That was the reason that is spoke of aldor ( http:// www.aldor.com ), as
it has type inference, but yes indeed this makes type inference much more difficult and undefined in some cases (also haskell extensions make inference in general impossible).
the problem is not only implementation, but error messages. are you want to see a message like "a should be Int, b should String, and ñ should be Double; or x should be String and y Int; or ñ should be [Int]" ? :)
ambiguous function call at line xxx. Possible instances are: f: Int -> String -> Double -> a f: String -> Int -> [Int] -> a please explicitly annotate the type to disambiguate
Note that you want to use also the type of the result to disambiguate. Not easy, but doable, and done, again I can understand why haskell did not do it.
Fawzi _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 19/03/07, Fawzi Mohamed
Vectors don't act like numbers, a vector space is not a field, even if they have some common operations.
As I said in my previous email, this is because Num is too big. We need to split it down, but there's no sane way of doing this without your average numeric function needing about a thousand different constraints on it. Type class synonyms [1] look promising, but no-one's implemented them yet AFAIK. [1]: http://repetae.net/john/recent/out/classalias.html -- -David House, dmhouse@gmail.com

David House wrote:
On 19/03/07, Fawzi Mohamed
wrote: Vectors don't act like numbers, a vector space is not a field, even if they have some common operations.
As I said in my previous email, this is because Num is too big. We need to split it down, but there's no sane way of doing this without your average numeric function needing about a thousand different constraints on it. Type class synonyms [1] look promising, but no-one's implemented them yet AFAIK.
I might missing something fundamental but what is the advantage of a class alias with respect to multiple inheritance? i.e. class alias FooBar a = (Foo a, Bar a) vs class (Foo a, Bar a) => FooBar can an alias be implemented a posteriory without modifying the code of FooBar? because what one really wants to do is to split Num even if Num is not splitted, but this would create problems with type inference... Fawzi

Hello Fawzi, Monday, March 19, 2007, 1:20:37 PM, you wrote:
Also arrays, inset,... have quite some overlapping. For array the use of the IArray typeclass kept the things nice also using Arrays and UArrays together, but adding IntSet to the whole worked only qualifying, and then I also wanted to define a fold for my type...
there are two libraries, Edison and Collection, that includes type classes hierarchy for collection types. may be, it will be useful for you -- Best regards, Bulat mailto:Bulat.Ziganshin@gmail.com
participants (12)
-
ajb@spamcop.net
-
apfelmus
-
Bulat Ziganshin
-
Chris Kuklewicz
-
David House
-
Dougal Stanton
-
Fawzi Mohamed
-
Henning Thielemann
-
Jón Fairbairn
-
Lennart Augustsson
-
Philippa Cowderoy
-
Robert Dockins