
This email actually turned out much longer than I expected, but I hope it sparks interesting (and hopefully, thorough!) discussion on points that weren't touched on by previous threads on this topic. What follows describes my journey thus far exploring what I see (as a newcomer to Haskell) as a major problem with Haskell - the ease with which users can write inefficient code, and have no idea that they did. Some of this has to do with laziness, some of it with functional programming in general. My goal is to achieve (as a link further down says) *good* performance, not blazing performance. I.e., I want to avoid what should really be no-brainers, such as little-omega(1) space utilization for simple folds.

The first version of a simple program I wrote was reasonably "clean." (Throughout this email, "clean" is supposed to mean some combination of: clear, compact, modular, elegant, etc.) It's a polynomial adder, which takes in lists of (coefficient, degree) tuples, combining terms of the same degree and maintaining a sorted order of least-to-greatest degree:

type Poly = [(Int,Int)]

addPoly1 :: Poly -> Poly -> Poly
addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
    | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2
addPoly1 [] [] = []

But this doesn't use tail recursion/accumulation (which seems to me like a complete hack that is a mandatory price to pay for using FP), so I rewrote it:

addPoly :: Poly -> Poly -> Poly
addPoly p1 p2 =
    let addPoly' p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t) result
            | p1d == p2d = addPoly' p1t p2t ((p1c + p2c, p1d):result)
            | p1d >  p2d = addPoly' p1 p2t (p2h:result)
            | p1d <  p2d = addPoly' p1t p2 (p1h:result)
        addPoly' (p1:p1s) [] result = addPoly' p1s [] (p1:result)
        addPoly' [] (p2:p2s) result = addPoly' [] p2s (p2:result)
        addPoly' [] [] result = reverse result
    in addPoly' p1 p2 []

But laziness will cause this to occupy Theta(n) space of cons-ing thunks. (See appendix for a simpler example.) My third iteration will become even uglier because I will have to incorporate strictness and things like $! or seq. And there's probably a whole host of other issues that I haven't even thought of or don't know about (boxing, space leaks, etc.).
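Just so it's concrete, here is roughly what I imagine that third iteration would look like (untested, and the name addPolyStrict is just mine; the summed coefficient is forced with seq before it goes into the accumulator):

-- Hypothetical strict variant of addPoly: same accumulator structure,
-- but the new coefficient is evaluated before being consed on.
addPolyStrict :: Poly -> Poly -> Poly
addPolyStrict p1 p2 = go p1 p2 []
  where
    go xs@((xc,xd):xt) ys@((yc,yd):yt) acc
        | xd == yd  = let c = xc + yc in c `seq` go xt yt ((c, xd) : acc)
        | xd <  yd  = go xt ys ((xc, xd) : acc)
        | otherwise = go xs yt ((yc, yd) : acc)
    go (x:xt) [] acc = go xt [] (x : acc)
    go [] (y:yt) acc = go [] yt (y : acc)
    go [] [] acc     = reverse acc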
From #haskell, I got a general piece of advice:
Oct 07 22:37:20 <Cale> Actually, a lot of the time, code gets messy when people optimise because they take the wrong road to optimisation
Oct 07 22:37:31 <Cale> There are two ways to optimise a piece of Haskell code
Oct 07 22:37:43 <Cale> You can make it stricter, forcing evaluation to occur sooner
Oct 07 22:38:01 <Cale> Or you can make it lazier, ensuring that things are not demanded until actually needed
Oct 07 22:38:09 <Korollary> @wiki performance
Oct 07 22:38:09 <lambdabot> http://www.haskell.org/haskellwiki/performance
Oct 07 22:38:13 <Cale> the latter route actually tends to result in cleaner code

Of course, wanting to learn route 2 was a no-brainer. I read through that wiki page on laziness and various other resources, but was disappointed in what I found:

http://www.haskell.org/haskellwiki/Performance/Laziness only discusses the aspect of laziness where you only evaluate what you need, which by now has been repeatedly beaten into my head and sounds obvious. Aren't there other advantages to laziness? Or at least, am I not fully appreciating the "obvious" work-avoiding part of laziness? For instance, the example seems to allude to sharing (between the even and odd functions), which I think ties in strongly with laziness. I was hoping for more in-depth insights on how to take advantage of laziness to write cleaner AND more efficient code. (A loose analogy exists in computer architecture - the smaller the chip, the less power it can consume/the faster it can clock.)

http://en.wikibooks.org/wiki/Haskell/Laziness_revisited does slightly better but still just scratches the surface - it gives an example of a clean (compact and modular) yet efficient piece of code (isSubstringOf), then completely neglects to explain it (e.g., exactly why/how does it run more efficiently?). (I've copied my reading of that definition at the end of this message.)

http://users.aber.ac.uk/afc/stricthaskell.html seems to suggest that strictness is the way to go. Please say it ain't so!

http://www.algorithm.com.au/mt/haskell/haskells_performance.html says that between code optimized for performance in Haskell and in another language (like O'Caml), "which is more declarative, high-level, and most importantly, which one doesn't require an expert in the language to write" is an unfortunate (for Haskell fans) no-brainer:

"The problem then is you have to know why something is slow, and there's where Haskell can get complex. Is there a space leak? (Do you even know what a space leak is?) Is your array strict instead of lazy? If it's strict, is it boxed instead of unboxed? (Do you even know what unboxing is?) Can you use unsafeRead instead of readArray? (Did you know that unsafeRead exists? I sure didn't!)"

So, how do I take advantage of laziness to write clean *and* performant code? I've been seeking resources with concrete examples, explanations, or guides on how to "systematically" do this. Is this even possible? Or am I deluding myself, and strictness is the way to go? If so, has there been any work/research to address the issues pointed out in the last blog entry? And do most (experienced) Haskell users sacrifice cleanliness for speed, or speed for cleanliness?

I feel that an understanding of how Haskell compilers work would help immensely in writing more performant Haskell code (or just better Haskell code in general), as well as in understanding what optimizations are reasonable to expect from the compiler. (I have a university class's worth of background on the design/implementation of simple lexers/parsers/compilers/interpreters/code block graph analyses for imperative OO languages.
However, I'd love to learn more about these things for a language like Haskell, including details on lazy evaluation, pattern matching, overloading (why it's slow), currying, PFA, type classes, monads, etc. Preferably something "lighter weight" than a bunch of research papers. Any pointers?)

BTW, please also feel free to give me feedback/refinements on my code snippets. :)

Thanks a ton for any enlightenment!

Yang

Aside: http://www.matthewmorgan.net/blog/archives/2004/10/07/haskell-performance. This post brings up a most reasonable point (venturing off on a tangent now) - space leaks are a much more serious concern than performance, and they're a difficult problem even for experienced Haskell users (http://www.haskell.org/pipermail/haskell-cafe/2003-December/005587.html).
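(For reference, the isSubstringOf I'm referring to is roughly the following - my transcription, so it may not match the wikibook exactly:)

import Data.List (isPrefixOf, tails)

-- Two small, reusable building blocks; laziness means `any` stops at
-- the first match and never looks further into the string than needed.
isSubstringOf :: Eq a => [a] -> [a] -> Bool
isSubstringOf x y = any (x `isPrefixOf`) (tails y)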

Hello,

admittedly, there is a lack of material on lazy evaluation and performance. IMHO, the current wiki(book) and other articles are somewhat inadequate, which stems from the fact that the current rumor is "strictness is fast" and "arrays must be unboxed" or so. I don't agree with this, so I'll post some remarks about performance in general and laziness in particular.

My first remark about performance is unfair and amounts to discussing the issue away:

* The programmer's viewpoint about performance in Haskell should be a lazy one, i.e. you think about the performance of your code only when it's forced by someone else. If no one complains, do not even waste a second thinking about it.

So
type Poly = [(Int,Int)]
addPoly1 :: Poly -> Poly -> Poly
addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
    | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2
addPoly1 [] [] = []
is the right thing to do. Period. With that viewpoint, your sorrows will fade away :) Mine do.

In my experience, clean code by far outweighs any performance issues, as it allows one to reduce the complexity of the programming task until it actually becomes tractable. As an example, Darcs' patch mechanism is almost impossible to code in a non-functional language. And what about those open source / freeware game web sites where one reads "at some point my code base became absolutely unmaintainable and I had to abandon this project"? Linked to that is the advent of scripting languages like Perl, Python, Tcl. Why do people use these interpreted languages when they are the coffins of performance? IMHO, there is already a trend to buy medium abstraction (Java, Objective-C) by deliberately selling away performance (Java programs are so lame, Objective-C can be sped up 10% by optimizing the msg_send() assembly), and I think: Haskell has a much better benefit-to-cost ratio than the others.

The second remark is:

* On the machine level, laziness is an overhead, but it's only a constant factor. *Constant factor*!

There is a story about this in S. Skiena's "The Algorithm Design Manual" where some guy eats all the CPU of a supercomputer for a stupid task which he programmed using an asymptotically mediocre algorithm. Ouch. This already applies to your polynomials: are you sure [(Int,Int)] is the right data structure to fulfill your performance requirements on the asymptotic running time of your intended operations? Or should you use Data.Map Int Int, which grants O(log n) random access (by index i) to every coefficient a_i, as opposed to the list case where you can only fetch a_i in O(n) time?

The answer is that your original representation is quite suitable, as the common operations +, *, diff, evaluation etc. for polynomials have no "a priori" knowledge about which coefficients are 0 and which are not; they have to discover it anyway. Only internally does * need a finite map to collect common degrees. You can even write a function sum :: [Polynomial] -> Polynomial which uses a finite map to collect degrees, and define + and * in terms of it (a rough sketch follows below).

By the way, I don't know any imperative hobby programmer for whom mutable arrays are not the one-and-for-all collection data structure. Haskell offers much easier access to data structures like binary search trees with appropriate polymorphism. IMHO, Haskell is even the first language to have a binary search tree library which offers satisfying generality and ease of use. This makes your program faster!

Only the third remark starts to be about performance in Haskell itself:

* Haskell is a lazy language. Why make this your enemy, why swim against the current by making your programs more strict?

As I remember, Cale once said on IRC how the relative strength of strictness / laziness applies to functions which operate on data of different sizes:

  small -> small   doesn't really matter (minor lazy overhead, but we have strictness analysis)
  large -> small   every part of the large thing is needed in the calculation: strictness wins
  large -> small   only parts are needed: laziness wins
  large -> large   depends on the large input as above, but otherwise roughly the same
  small -> large   roughly the same

Only in the case where strictness wins do you have to insert a seq or two. As a rule of thumb, all recursive data structures like lists and trees are large. Ints are small. The point is that when you are given an Int, the first look at it will reveal all information about it. In the case of a list, a first look only reveals the first element.

What was the primary reason for laziness, anyway? The famous paper "Why Functional Programming Matters" by John Hughes gives the answer: you can code programs in a more modular way (and this without biting performance penalties). Laziness is what makes compositions like large -> large -> small (where the large thing is only partly needed) work.
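A rough sketch of the finite-map idea (untested, and the names sumP/addP/mulP are just for illustration): keep your [(coefficient, degree)] representation and use Data.Map only internally to collect equal degrees.

import qualified Data.Map as Map

type Poly = [(Int,Int)]   -- (coefficient, degree) pairs, as in your code

-- Collect the coefficients of many polynomials in one finite map,
-- keyed by degree, then read the result back in ascending degree order.
sumP :: [Poly] -> Poly
sumP ps = [ (c, d) | (d, c) <- Map.toAscList m, c /= 0 ]
  where
    m = Map.fromListWith (+) [ (d, c) | p <- ps, (c, d) <- p ]

-- + and * defined in terms of it:
addP :: Poly -> Poly -> Poly
addP p q = sumP [p, q]

mulP :: Poly -> Poly -> Poly
mulP p q = sumP [ [(pc * qc, pd + qd)] | (pc, pd) <- p, (qc, qd) <- q ]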
But this doesn't use tail recursion/accumulation (which seems to me like a complete hack that is a mandatory price to pay for using FP)
I never paid it :) You actually fell into the trap by rewriting your code: performance got worse! Because of laziness, your first version can deliver the first coefficient of the sum without calculating the others. Your tail-recursive version must calculate the whole sum before returning anything. This is because the list constructor (:) is lazy in its second argument. addPoly1 falls into the (large -> large) category.

Of course, if you have to sum Ints over a list (which is large -> small, and all list elements are needed), then it is wise to be strict and a seq will come in handy (foldl' (+) 0 in Haskell). On the other hand, if you are 'or'-ing Boolean values over a list (any in Haskell), then laziness is the right thing - why? (See the P.S. below for a small illustration.)

Your second version only adds overhead by introducing additional parameters and so on; it won't get better than the minor laziness overhead. Test both functions with "hugs +s" and be disappointed about the number of reductions/cells used. Concerning "overhead" in general, the simplest version will always do, and the compiler will figure out the rest with automatic strictness analysis, inlining, unboxing, tail recursion etc. To the programmer, tail recursion is of no importance in a lazy language.

And my last remark, again about data structures:

* (Mutable) arrays and hash tables don't fit into a functional language. Don't use them. Use Data.Map for random access.

That being said, static arrays whose entries do not change (you only have a "single shot") can be very handy for dynamic programming. In this case, it is crucial that they are "boxed", i.e. the entries are calculated lazily.

If you need any other data structure, see

  http://haskell.org/ghc/docs/latest/html/libraries/
  http://haskell.org/haskellwiki/Libraries_and_tools/Data_structures
  http://www.eecs.tufts.edu/~rdocki01/edison.html

For (in)finite maps, tries are *the* natural data structure for functional languages, see also

  Ralf Hinze. Generalizing generalized tries. Journal of Functional Programming, 10(4):327-351, July 2000.
  http://www.informatik.uni-bonn.de/~ralf/publications/GGTries.ps.gz

If you are in true need for speed, they'll outperform everything else. Should you need a custom data structure ("appendable priority search queue interval region ... tree sequence"), you can build one from the all-round Finger Tree, see also

  Ralf Hinze and Ross Paterson. Finger Trees: A Simple General-purpose Data Structure. Journal of Functional Programming, 16:2 (2006), pages 197-217.
  http://www.soi.city.ac.uk/~ross/papers/FingerTree.pdf

Regards,
apfelmus
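P.S. A small illustration of the 'or'-ing point (my own toy example, untested): because any stops at the first True, it even works on an infinite list, whereas a strict accumulating fold would have to consume the whole list first.

import Data.List (foldl')

-- any is lazy: it returns as soon as it finds a True, so the rest of
-- the (here infinite) list is never built.
lazyOr :: Bool
lazyOr = any (> 100) [1 ..]   -- True, after inspecting 101 elements

-- A strict fold must reach the end of the list before returning,
-- so on the same infinite list it never terminates:
-- strictOr = foldl' (||) False (map (> 100) [1 ..])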

On 08/10/06, apfelmus@quantentunnel.de
large -> large   depends on the large input as above, but otherwise roughly the same
small -> large   roughly the same
Actually, it's an important point that laziness is generally preferable in these cases as well, since the large result might never be completely demanded, so if the function is lazy, then it will waste no time computing things that are never needed.

I might also like to point out that by "small" and "large", we're actually referring to the number of ways in which components of the data structure can be computed separately, which tends to correspond nicely to how one usually pictures small and large pieces of data.

Cale Gibbard wrote:
On 08/10/06, apfelmus@quantentunnel.de
wrote:

large -> large   depends on the large input as above, but otherwise roughly the same
small -> large   roughly the same
Actually, it's an important point that laziness is generally preferable in these cases as well, since the large result might never be completely demanded, so if the function is lazy, then it will waste no time computing things that are never needed.
Yes, in the sense that laziness allows for compositions (f : large -> small) . (g : large -> large) to adopt demands on the "large" input from f and not from g. So to speak, laziness is to be preferred for (.. -> large) if it's preferred for (large -> small) to enable proper behavior under composition.
I might also like to point out that by "small" and "large", we're actually referring to the number of ways in which components of the datastructure can be computed separately, which tends to correspond nicely to how one usually pictures small and large pieces of data.
I'm not sure what you mean. I thought it appropriate to count the number of approximations to a given value v, i.e. count how many v' there are with v' < v in the usual CPO ordering, that is, Cons _|_ [1,2] < Cons 3 [1,2] etc. That count is either large or small. For example, Ints are small, because there is only _|_ < any number. However, by representing them as

data Digit = Zero | One
newtype Int' = Binary [Digit]

they become large. Also,

data Tree' a = Leaf | Tree !(Tree' a) a !(Tree' a)

(with strictness annotations) is small, although the values are usually "big". Is that what you mean?

Admittedly, discussing strictness in terms of large and small, which are defined by < and _|_, is a somewhat circular definition.

Regards,
apfelmus

On 09/10/06, apfelmus@quantentunnel.de
Cale Gibbard wrote:
I might also like to point out that by "small" and "large", we're actually referring to the number of ways in which components of the datastructure can be computed separately, which tends to correspond nicely to how one usually pictures small and large pieces of data.
I'm not sure what you mean. I thought it is appropriate to count the number of approximations to a given value v, i.e. count how many v' there are with v' < v in the usual CPO ordering, that is Cons _|_ [1,2] < Cons 3 [1,2] etc. That count is either large or small.
Yeah, that's exactly what I'm talking about. :)

On 10/8/06, Yang
And do most (experienced) Haskell users sacrifice cleanliness for speed, or speed for cleanliness?
Keep the internals of your code--that which will be looked at a lot--fast and ugly, while the rest can be clean. If you have a function that does something very simple, but the "pretty" way to do it takes a second to run while the ugly way is much, much faster, use the pretty one if it's only going to be needed once or twice. It's certainly not the kind of thing you want to fold your lists with: use the ugly version for that. Also, if you want, you can write both a pretty version and an ugly version, and put the pretty version in comments while the ugly version does all the real work.

On 10/8/06, ihope
On 10/8/06, Yang
wrote: And do most (experienced) Haskell users sacrifice cleanliness for speed, or speed for cleanliness?
Keep the internals of your code--that which will be looked at a lot--fast and ugly, while the rest can be clean. If you have a function that does something very simple, but the "pretty" way to do it takes a second to run while the ugly way is much, much faster, use the pretty one if it's only going to be needed once or twice. It's certainly not the kind of thing you want to fold your lists with: use the ugly version for that.
Also, if you want, you can write both a pretty version and an ugly version, and put the pretty version in comments while the ugly version does all the real work.
Another good idea when you have a pretty version which is easy to verify for correctness and an ugly version that is harder to verify is to use QuickCheck or SmallCheck and define a property that says both versions are equal for all inputs. Ugly code is notorious for holding bugs, but doing this would help test the ugly code. Jason
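With the polynomial code from the start of this thread, such a property could be as simple as this (a sketch; it assumes the addPoly1 and addPoly definitions from Yang's message are in scope, together with type Poly = [(Int,Int)]):

import Test.QuickCheck

-- The "pretty" direct version and the "ugly" accumulator version
-- should agree on every pair of inputs.
prop_addPolyAgree :: Poly -> Poly -> Bool
prop_addPolyAgree p q = addPoly1 p q == addPoly p q

-- To run:  quickCheck prop_addPolyAgree

(For a more faithful test you would generate inputs already sorted by degree, but even on arbitrary lists the two versions should agree, since they perform the same comparisons step for step.)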

On Sun, 2006-10-08 at 15:25 -0700, Jason Dagit wrote:
Another good idea when you have a pretty version which is easy to verify for correctness and an ugly version that is harder to verify is to use QuickCheck or SmallCheck and define a property that says both versions are equal for all inputs. Ugly code is notorious for holding bugs, but doing this would help test the ugly code.
This is exactly how we tested Data.ByteString and to great effect I think. We uncovered loads of bugs during testing. The few bugs uncovered by our users since it has been released have invariably been in things we didn't have QC properties for. Duncan

duncan.coutts:
On Sun, 2006-10-08 at 15:25 -0700, Jason Dagit wrote:
Another good idea when you have a pretty version which is easy to verify for correctness and an ugly version that is harder to verify is to use QuickCheck or SmallCheck and define a property that says both versions are equal for all inputs. Ugly code is notorious for holding bugs, but doing this would help test the ugly code.
This is exactly how we tested Data.ByteString and to great effect I think. We uncovered loads of bugs during testing. The few bugs uncovered by our users since it has been released have invariably been in things we didn't have QC properties for.
Yes, I agree with this. By checking fast-but-ugly code against slow-but-obvious code, we were able to catch bugs before Data.ByteString was deployed in the outside world, and before the bugs could hurt anyone. These days, Data.ByteString has some 2000 lines of QC properties, which are run on every darcs commit. -- Don

Yang wrote:
type Poly = [(Int,Int)]
addPoly1 :: Poly -> Poly -> Poly
addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
    | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2
addPoly1 [] [] = []
But this doesn't use tail recursion/accumulation
Indeed it doesn't. Now remind me, why is that supposed to be a Bad Thing? The above code exhibits a maximum of laziness and runs with no useless space overhead. Apart from the expression (p1c + p2c), which you probably want to evaluate eagerly, it is close to perfect.
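(One way to get that eager evaluation while keeping everything else as it is - just a sketch, with Yang's type Poly = [(Int,Int)] assumed in scope:)

-- Only the first guard changes: the summed coefficient is forced with
-- seq before the cons cell is returned.
addPoly1 :: Poly -> Poly -> Poly
addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
    | p1d == p2d = let c = p1c + p2c in c `seq` ((c, p1d) : addPoly1 p1t p2t)
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2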
so I rewrote it: [...]
But laziness will cause this to occupy Theta(n)-space of cons-ing thunks.
No, it doesn't. Insisting on accumulator recursion does. Actually, using reverse does. Think about it, a strict reverse cannot use less than O(n) space, either.
I was hoping for more in-depth insights on how to take advantage of laziness to write cleaner AND more efficient code.
Try to explain why your first iteration was bad. You'll achieve enlightenment at the point where your explanation fails.

Udo.
--
"If you have no motive for living - take the lead on the climb for once; maybe it'll go wrong." -- from a summit register

On 10/8/06, Udo Stenzel u.stenzel-at-web.de wrote:
Yang wrote:
type Poly = [(Int,Int)]
addPoly1 :: Poly -> Poly -> Poly addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t) | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t | p1d < p2d = p1h : addPoly1 p1t p2 | p1d > p2d = p2h : addPoly1 p1 p2t addPoly1 p1 [] = p1 addPoly1 [] p2 = p2 addPoly1 [] [] = []
But this doesn't use tail recursion/accumulation
Indeed it doesn't. Now remind me, why is that supposed to be a Bad Thing? The above code exhibits a maximum of laziness and runs with no useless space overhead. Apart from the expression (p1c + p2c), which you probably want to evaluate eagerly, it is close to perfect.
so I rewrote it: [...]
But laziness will cause this to occupy Theta(n)-space of cons-ing thunks.
No, it doesn't. Insisting on accumulator recursion does. Actually, using reverse does. Think about it, a strict reverse cannot use less than O(n) space, either.
Well, in general, the problem you run into is this, where we use linear space for the thunks:

foldl (+) 0 [1,2,3]
= foldl (+) (0 + 1) [2,3]
= foldl (+) ((0 + 1) + 2) [3]
= foldl (+) (((0 + 1) + 2) + 3) []
= ((0 + 1) + 2) + 3
= (1 + 2) + 3
= 3 + 3
= 6

whereas with strictness, you use constant space:

foldl' f z [] = z
foldl' f z (x:xs) = let u = f z x in u `seq` foldl' f u xs

foldl' (+) 0 [1,2,3]
= let u = 0 + 1 in u `seq` foldl' (+) u [2,3]
= foldl' (+) 1 [2,3]
= let u = 1 + 2 in u `seq` foldl' (+) u [3]
= foldl' (+) 3 [3]
= let u = 3 + 3 in u `seq` foldl' (+) u []
= foldl' (+) 6 []
= 6
I was hoping for more in-depth insights on how to take advantage of laziness to write cleaner AND more efficient code.
Try to explain why your first iteration was bad. You'll achieve enlightenment at the point where your explanation fails.

Yang wrote:
But laziness will cause this to occupy Theta(n)-space of cons-ing thunks.
No, it doesn't. Insisting on accumulator recursion does. Actually, using reverse does. Think about it, a strict reverse cannot use less than O(n) space, either.
Well, in general, the problem you run into is this, where we use linear space for the thunks:
foldl (+) 0 [1,2,3] [etc.]
The point is that the claim "in general" is wrong. The problem arises for the special case (foldl (+) 0), but things are different in your special case, addPoly1.

Regards,
apfelmus

I think your first try looks good. The only thing to worry about would be the + being too lazy. But that's easy to fix at the same time as improving your code in another respect. It's usually good to use real types instead of synonyms, so let's do that.

data Nom = Nom Int Int
type PolyNom = [Nom]

...
addPoly1 p1@(p1h@(Nom p1c p1d):p1t) p2@(p2h@(Nom p2c p2d):p2t)
    | p1d == p2d = Nom (p1c + p2c) p1d : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
...

And then to make the + strict we just change the data type definition.

data Nom = Nom !Int Int

-- Lennart

On Oct 8, 2006, at 12:58, Yang wrote:
This email actually turned out much longer than I expected, but I hope it sparks interesting (and hopefully, thorough!) discussion on points that weren't touched on by previous threads on this topic. What follows describes my journey thus far exploring what I see (as a newcomer to Haskell) as a major problem with Haskell - the ease with which users can write inefficient code, and have no idea that they did. Some of this has to do with laziness, some of it with the functional programming in general. My goal is to achieve (as a link further down says) *good* performance, not blazing performance. I.e., I want to avoid what should really be no-brainers such as little-omega(1) space utilization for simple folds.
The first version of a simple program I wrote was reasonably "clean." (Throughout this email, "clean" is supposed to mean some combination of: clear, compact, modular, elegant, etc.) It's a polynomial adder, which takes in lists of (coefficient, degree) tuples, combining terms of the same degree and maintaining a sorted order of least-to-greatest degree:
type Poly = [(Int,Int)]
addPoly1 :: Poly -> Poly -> Poly
addPoly1 p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t)
    | p1d == p2d = (p1c + p2c, p1d) : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2
addPoly1 [] [] = []
But this doesn't use tail recursion/accumulation (which seems to me like a complete hack that is a mandatory price to pay for using FP), so I rewrote it:
addPoly :: Poly -> Poly -> Poly
addPoly p1 p2 =
    let addPoly' p1@(p1h@(p1c,p1d):p1t) p2@(p2h@(p2c,p2d):p2t) result
            | p1d == p2d = addPoly' p1t p2t ((p1c + p2c, p1d):result)
            | p1d >  p2d = addPoly' p1 p2t (p2h:result)
            | p1d <  p2d = addPoly' p1t p2 (p1h:result)
        addPoly' (p1:p1s) [] result = addPoly' p1s [] (p1:result)
        addPoly' [] (p2:p2s) result = addPoly' [] p2s (p2:result)
        addPoly' [] [] result = reverse result
    in addPoly' p1 p2 []
But laziness will cause this to occupy Theta(n)-space of cons-ing thunks. (See appendix for a simpler example.) My third iteration will become even uglier because I will have to incorporate strictness and things like $! or seq. And there's probably a whole host of other issues that I haven't even thought of or don't know about (boxing, space leaks, etc.).
From #haskell, I got a general piece of advice:
Oct 07 22:37:20 <Cale> Actually, a lot of the time, code gets messy when people optimise because they take the wrong road to optimisation
Oct 07 22:37:31 <Cale> There are two ways to optimise a piece of Haskell code
Oct 07 22:37:43 <Cale> You can make it stricter, forcing evaluation to occur sooner
Oct 07 22:38:01 <Cale> Or you can make it lazier, ensuring that things are not demanded until actually needed
Oct 07 22:38:09 <Korollary> @wiki performance
Oct 07 22:38:09 <lambdabot> http://www.haskell.org/haskellwiki/performance
Oct 07 22:38:13 <Cale> the latter route actually tends to result in cleaner code
Of course, to want to learn route 2 was a no-brainer. I read through that wiki page on laziness and various other resources, but was disappointed in what I found:
http://www.haskell.org/haskellwiki/Performance/Laziness only discusses the aspect of laziness where you only evaluate what you need, which by now has been repeatedly beaten into my head and sounds obvious. Aren't there other advantages to laziness? Or at least, am I not fully appreciating the "obvious" work-avoiding part of laziness? For instance, the example seems to allude to sharing (between the even and odd functions), which I think ties in strongly with laziness. I was hoping for more in-depth insights on how to take advantage of laziness to write cleaner AND more efficient code. (A loose analogy exists in computer architecture - the smaller the chip, the less power it can consume/the faster it can clock.)
http://en.wikibooks.org/wiki/Haskell/Laziness_revisited does slightly better but still just scratches the surface - it gives an example of a clean (compact and modular) yet efficient piece of code (isSubstringOf), then completely neglects explaining it (e.g., exactly why/how does it run more efficiently?).
http://users.aber.ac.uk/afc/stricthaskell.html seems to suggest that strictness is the way to go. Please say it ain't so!
http://www.algorithm.com.au/mt/haskell/haskells_performance.html says that between code optimized for performance in Haskell and in another language (like O'Caml), "which is more declarative, high-level, and most importantly, which one doesn't require an expert in the language to write" is an unfortunate (for Haskell fans) no-brainer:
"The problem then is you have to know why something is slow, and there's where Haskell can get complex. Is there a space leak? (Do you even know what a space leak is?) Is your array strict instead of lazy? If it's strict, is it boxed instead of unboxed? (Do you even know what unboxing is?) Can you use unsafeRead instead of readArray? (Did you know that unsafeRead exists? I sure didn't!)"
So, how do I take advantage of laziness to write clean *and* performant code? I've been seeking resources with concrete examples, explanations, or guides on how to "systematically" do this. Is this even possible? Or am I deluding myself, and strictness is the way to go? If so, has there been any work/research to address the issues pointed out in the last blog entry? And do most (experienced) Haskell users sacrifice cleanliness for speed, or speed for cleanliness?
I feel that an understanding of how Haskell compilers work would help immensely in writing more performant Haskell code (or just better Haskell code in general), as well as in understanding what optimizations are reasonable to expect from the compiler. (I have a university class's worth of background on the design/implementation of simple lexers/parsers/compilers/interpreters/code block graph analyses for imperative OO languages. However, I'd love to learn more about these things for a language like Haskell, including details on lazy evaluation, pattern matching, overloading (why it's slow), currying, PFA, type classes, monads, etc. Preferably something "lighter weight" than a bunch of research papers. Any pointers?)
BTW, please also feel free to give me feedback/refinements on my code snippets. :)
Thanks a ton for any enlightenment!
Yang
Aside: http://www.matthewmorgan.net/blog/archives/2004/10/07/haskell-performance. This post brings up a most reasonable point (venturing off on a tangent now) - space leaks are a much more serious concern than performance, and they're a difficult problem even for experienced Haskell users (http://www.haskell.org/pipermail/haskell-cafe/2003-December/005587.html).

Lennart Augustsson wrote:
I think your first try looks good. [snip]

...
addPoly1 p1@(p1h@(Nom p1c p1d):p1t) p2@(p2h@(Nom p2c p2d):p2t)
    | p1d == p2d = Nom (p1c + p2c) p1d : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
...
The last comparison is redundant (this was in the original version too) because p1d > p2d is implied (certainly for this case where p1d, p2d :: Int) by the fall-through from not satisfying == and <, so how about:

addPoly1 p1@(p1h@(Nom p1c p1d):p1t) p2@(p2h@(Nom p2c p2d):p2t)
    | p1d == p2d = Nom (p1c + p2c) p1d : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | otherwise  = p2h : addPoly1 p1 p2t

Regards, Brian.

--
Logic empowers us and Love gives us purpose. Yet still phantoms restless for eras long past, congealed in the present in unthought forms, strive mightily unseen to destroy us.

http://www.metamilk.com

"Brian Hulley"
Lennart Augustsson wrote:
I think your first try looks good. [snip]

...
addPoly1 p1@(p1h@(Nom p1c p1d):p1t) p2@(p2h@(Nom p2c p2d):p2t)
    | p1d == p2d = Nom (p1c + p2c) p1d : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | p1d >  p2d = p2h : addPoly1 p1 p2t
...
The last comparison is redundant (this was in the original version too) because p1d > p2d is implied (certainly for this case where p1d, p2d::Int) by the fall through from not satisfying == and < so how about:
addPoly1 p1@(p1h@(Nom p1c p1d):p1t) p2@(p2h@(Nom p2c p2d):p2t)
    | p1d == p2d = Nom (p1c + p2c) p1d : addPoly1 p1t p2t
    | p1d <  p2d = p1h : addPoly1 p1t p2
    | otherwise  = p2h : addPoly1 p1 p2t
Surely all but one of the comparisons is unnecessary? If you use `compare` instead of (==) and friends, won't one do? (I'm assuming that the compiler can convert cases on LT, EQ and GT into something sensible -- after all, wasn't that the purpose of compare?)

--
Jón Fairbairn
Jon.Fairbairn@cl.cam.ac.uk
http://www.chaos.org.uk/~jf/Stuff-I-dont-want.html (updated 2006-09-13)
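For concreteness, the compare version would presumably look something like this (a sketch only, reusing Lennart's Nom type; untested):

data Nom = Nom !Int Int   -- strict coefficient, as in Lennart's message

-- One comparison per step, dispatching on the Ordering it returns.
addPoly1 :: [Nom] -> [Nom] -> [Nom]
addPoly1 p1@(p1h@(Nom p1c p1d):p1t) p2@(p2h@(Nom p2c p2d):p2t) =
    case compare p1d p2d of
        EQ -> Nom (p1c + p2c) p1d : addPoly1 p1t p2t
        LT -> p1h : addPoly1 p1t p2
        GT -> p2h : addPoly1 p1 p2t
addPoly1 p1 [] = p1
addPoly1 [] p2 = p2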

Hello Jon, Tuesday, October 10, 2006, 1:18:52 PM, you wrote:
Surely all but one of the comparisons is unnecessary? If you use `compare` instead of (==) and friends, won't one do (I'm assuming that the compiler can convert cases on LT, EQ and GT into something sensible -- after all, wasn't that the purpose of compare?)?
That would be too smart for GHC; the actual code is:

compareInt# :: Int# -> Int# -> Ordering
compareInt# x# y#
    | x# <#  y# = LT
    | x# ==# y# = EQ
    | otherwise = GT

--
Best regards,
Bulat    mailto:Bulat.Ziganshin@gmail.com

Bulat Ziganshin
Hello Jon,
Tuesday, October 10, 2006, 1:18:52 PM, you wrote:
Surely all but one of the comparisons is unnecessary? If you use `compare` instead of (==) and friends, won't one do (I'm assuming that the compiler can convert cases on LT, EQ and GT into something sensible -- after all, wasn't that the purpose of compare?)?
That would be too smart for GHC; the actual code is:
compareInt# :: Int# -> Int# -> Ordering
compareInt# x# y#
    | x# <#  y# = LT
    | x# ==# y# = EQ
    | otherwise = GT
But once that's been inlined and through whatever code generator, what then? If it doesn't get turned into one test on the data and conditional jumps on sign bits, something isn't doing a thorough job... -- Jón Fairbairn Jon.Fairbairn@cl.cam.ac.uk

On Oct 10, 2006, at 14:58 , Jón Fairbairn wrote:
Bulat Ziganshin
writes: Hello Jon,
Tuesday, October 10, 2006, 1:18:52 PM, you wrote:
Surely all but one of the comparisons is unnecessary? If you use `compare` instead of (==) and friends, won't one do (I'm assuming that the compiler can convert cases on LT, EQ and GT into something sensible -- after all, wasn't that the purpose of compare?)?
That would be too smart for GHC; the actual code is:
compareInt# :: Int# -> Int# -> Ordering
compareInt# x# y#
    | x# <#  y# = LT
    | x# ==# y# = EQ
    | otherwise = GT
But once that's been inlined and through whatever code generator, what then? If it doesn't get turned into one test on the data and conditional jumps on sign bits, something isn't doing a thorough job...
Assuming your machine architecture supports something like condition codes. On, e.g., the MIPS you would need to test for < and == separately. -- Lennart

Assuming your machine architecture supports something like condition codes. On, e.g., the MIPS you would need to test for < and == separately.
And even if your machine supports condition codes, you'll need one test plus two conditional jumps. Not much better than MIPS's 2 independent tests plus 2 conditional jumps. Stefan
participants (13)
-
apfelmus@quantentunnel.de
-
Brian Hulley
-
Bulat Ziganshin
-
Cale Gibbard
-
dons@cse.unsw.edu.au
-
Duncan Coutts
-
ihope
-
Jason Dagit
-
Jón Fairbairn
-
Lennart Augustsson
-
Stefan Monnier
-
Udo Stenzel
-
Yang