Re: [Haskell] [Fwd: Re: Computer Language Shootout]

I hoped at least to stimulate interest in repeating the GP experiment with the latest GHC version. until that happens, I'd be wary of drawing too many conclusions for today's applications from this paper. a two-orders-of-magnitude difference would seem to imply programming problems to me (though the authors did have the problem/excuse of a lack of profiling tools).
(the experiment: http://www.dcs.ed.ac.uk/home/stg/sfpw/book/Garmendia-Doval/cameraready.ps)
not only is the experiment old (ghc-4.08), the paper doesn't show much code, and what it does show looks suspiciously like a case of ML-mindset programming in Haskell (the kind that people get help with on the various Haskell lists nearly every day, nowadays).

according to 3.5, the Haskell version came first, was used for experiments, and errors in design were corrected; then came the ML version, which was at that point already substantially faster (from which i would be tempted to conclude that the coding style was a better fit for ML than for Haskell; cf. also 3.5.1, third paragraph). the ML version was then profiled and improved, after which these improvements were backported from the ML to the Haskell version! okay, profiling was not available for the Haskell version back then, but using ML profiling to improve a Haskell version sounds highly dangerous to me, even more so as the authors do not even mention any awareness of this danger.

in 3.5.1, we see Alleles represented as a BoolVector, which sounds fine until we see it converted to its list of associations, which is then foldl-ed (!) over with a non-strict function, for every evaluation (good that those chromosome lengths appear to be only 60..). this is the main evaluation function for fitness, so it should very much be inner-loop code, with population sizes and generation counts ranging up to 7500/700 and 50/150000. of course, the same function in the ML version just means running a loop over a vector, with a strict accumulator..

that is just the code shown in the paper - and it is the only piece of code, apart from structure declarations. perhaps inlining and strictness analysis caught this particular problem, and perhaps calling out to C for random numbers didn't slow down the program, either - the paper just doesn't give enough information to tell.

"So far, we have found the Standard ML version to out-perform the Haskell version by over two orders of magnitude. Despite extensive study of the Haskell version, no clear reason has emerged for this poor performance." note that they do not say "any Haskell version would have to be slow because of..", they say they don't know.

well, enough of this fud!-) i hope it is fair to say that not just today's compilers are different, but also the user communities. i cannot imagine anyone experiencing this level of performance difference in a concrete application without asking questions about it on one of the Haskell lists, nor would i expect such questions to remain unanswered for long enough to merit a paper (at least not about the question; Data.ByteString shows that there are still papers to be written about answers;-). and if it is too complicated for a public discussion, the project in question could always hire Haskell expertise, in the form of consulting (although i see that most entries have disappeared from the Haskell consultants list, probably because they tend to answer questions for free on the mailing lists?-).

claus
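p.s. for concreteness, a minimal sketch of the shape described above. the exact function folded over isn't reproduced here, so counting set alleles stands in for the fitness calculation, and BoolVector is assumed to be an immutable Bool array:

import Data.Array
import Data.List (foldl')

-- assumption: the paper's BoolVector is an immutable array of Bool
type Alleles = Array Int Bool

-- the pattern described in 3.5.1: convert the vector to its
-- association list and foldl over it with a non-strict function;
-- the lazy accumulator builds a chain of thunks per evaluation
fitnessLazy :: Alleles -> Int
fitnessLazy v = foldl (\acc (_, b) -> if b then acc + 1 else acc) 0 (assocs v)

-- the same fold with a strict accumulator (foldl'); this is the
-- obvious first fix, and what the ML loop gets for free
fitnessStrict :: Alleles -> Int
fitnessStrict v = foldl' (\acc (_, b) -> if b then acc + 1 else acc) 0 (assocs v)

-- e.g. fitnessStrict (listArray (0,59) (replicate 60 True)) == 60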

On Tue, 2007-02-27 at 16:51 +0000, Claus Reinke wrote:
> okay, profiling was not available for the Haskell version back then, but using ML profiling to improve a Haskell version sounds highly dangerous to me, even more so as the authors do not even mention any awareness of this danger.
>
> in 3.5.1, we see Alleles represented as a BoolVector, which sounds fine until we see it converted to its list of associations, which is then foldl-ed (!) over with a non-strict function, for every evaluation (good that those chromosome lengths appear to be only 60..). this is the main evaluation function for fitness, so it should very much be inner-loop code, with population sizes and generation counts ranging up to 7500/700 and 50/150000. of course, the same function in the ML version just means running a loop over a vector, with a strict accumulator..
It'd be interesting to get the real code for this. Partly to just try optimising it, but more so as a real test case for list/array fusion.

As far as I see, there's no reason that consuming an assoc list of a bool vector with a foldl' (the ' is probably essential) should be slow. If it's fused properly, no list cells should ever be allocated; we should get the loop over the vector.

Duncan
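p.s. for illustration, the loop one would hope fusion produces, written by hand here (same assumed Bool-array representation and stand-in bit-counting fitness as in the sketch above):

{-# LANGUAGE BangPatterns #-}
import Data.Array

-- assumed representation, as in the earlier sketch
type Alleles = Array Int Bool

-- a strict, index-driven loop over the vector: no intermediate
-- list cells, index and accumulator forced at every step
fitnessLoop :: Alleles -> Int
fitnessLoop v = go lo 0
  where
    (lo, hi) = bounds v
    go !i !acc
      | i > hi    = acc
      | otherwise = go (i + 1) (if v ! i then acc + 1 else acc)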

Duncan Coutts wrote:

> It'd be interesting to get the real code for this. Partly to just try optimising it, but more so as a real test case for list/array fusion.
>
> As far as I see, there's no reason that consuming an assoc list of a bool vector with a foldl' (the ' is probably essential) should be slow. If it's fused properly, no list cells should ever be allocated; we should get the loop over the vector.
fooling around with a dummy framework (a loop that runs evaluate population*generations times) suggests that it is far easier to get order-of-magnitude slow-downs and space leaks by not evaluating the fitness results to the end than by not evaluating a short inner loop (provided that the inner loop's result is forced at each outer iteration, i.e. making sure that the next generation is computed from the fitness *before* the next iteration starts). that's why i mentioned that this might not have been the source of their trouble.

but as they did have resource trouble, it worries me that they present this kind of code without even discussing possible implications, strictness, or optimisations. yes, it would be interesting to see whether the rest of the code follows the same pattern (in which case i wouldn't be surprised about resource issues), or whether there is anything more interesting going on.

as it stands, guessing at what the authors might have wanted to say in this paper isn't helping anyone.

claus
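p.s. the skeleton of that dummy framework, to make the forcing point concrete. fitness and breeding are arbitrary stand-ins here, not the paper's code:

{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

type Individual = Int            -- arbitrary stand-in for a chromosome

fitness :: Individual -> Double
fitness = fromIntegral           -- stand-in fitness function

-- toy selection: keep individuals whose fitness passes a threshold,
-- then "mutate"; the point is only that it consumes the fitness list
nextGeneration :: [Double] -> [Individual] -> [Individual]
nextGeneration fits pop = [ x + 1 | (f, x) <- zip fits pop, f >= 0 ]

evolve :: Int -> [Individual] -> [Individual]
evolve 0 pop = pop
evolve n pop =
  let fits = map fitness pop
      -- force every fitness value before recursing; leave this out
      -- and unevaluated fitness thunks pile up across generations
      !_   = foldl' (flip seq) () fits
  in  evolve (n - 1) (nextGeneration fits pop)

without the forcing line, each generation is just a thunk referring back to all earlier generations, and nothing is evaluated until the final population is demanded - exactly the kind of outer-loop leak that dwarfs any inner-loop inefficiency.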