
Hi all,

I'm taking a look at the "Computer Language Shootout Benchmarks": http://shootout.alioth.debian.org/

It looks like Haskell doesn't do very well. It seems to be near the bottom of the pile in most tests. Is this due to the inherent design of Haskell, or is it merely the fact that GHC is young and hasn't had as much time to optimize as other compilers?

For example, another very slow language is Ruby. In Ruby's case, there is a design factor that will always make it slow. I wonder if Haskell is in a similar situation.

Yes yes, I know that a high-level language trades CPU time for programmer time. I'm still interested in the CPU time question, though :)

Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
 /\/_/
 \/_/  I am not over-weight, I am under-tall.
 /

On 23.12 19:14, Daniel Carrera wrote:
I'm taking a look at the "Computer Language Shootout Benchmarks".
http://shootout.alioth.debian.org/
It looks like Haskell doesn't do very well. It seems to be near the bottom of the pile in most tests. Is this due to the inherent design of Haskell or is it merely the fact that GHC is young and hasn't had as much time to optimize as other compilers?
Some reasons for this include:

* Efficient string handling functions are packaged separately (faststring etc.) and thus not included in the shootout entries.
* The tests change faster than Haskell people write efficient versions of the programs.
* Most of the Haskell results are taking an imperative implementation and just translating it.
* In many cases other languages use arrays while the Haskell implementation uses lists. Haskell does have efficient arrays (see the small sketch below).

- Einar Karttunen
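For the last point, a minimal sketch of what "efficient arrays" looks like in practice (my own example, not a shootout entry): an unboxed UArray gives O(1) indexing with no per-element boxing, unlike a linked list.

import Data.Array.Unboxed (UArray, listArray, (!))

-- A lookup table of squares stored as a strict, unboxed array.
squares :: UArray Int Int
squares = listArray (0, 9999) [i * i | i <- [0 .. 9999]]

main :: IO ()
main = print (squares ! 1234 + squares ! 42)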

This "benchmark" only tests specific implementations of these little tests, tests mostly designed to do imperative things in an imperative way. What that means is that the results are completely subject to (1) how good the submission for that test was, (2) the choice of tests in the first place, and (3) startup times for loading the binaries into memory (GHC makes big binaries that are arguably much faster if you do things in "daemon" mode, for example).

From a glance, I also don't buy that most of this Haskell code is very idiomatic or optimized, but I could be wrong. It looks like literal translations of imperative code into Haskell, which is not the way Haskell works. (It reminds me of Paul Graham's 1-D measure of programming language expressivity: how hard is it to make a number incrementer, which assumes very much that side effects are a good thing --- the Haskell code is quite short considering the fact that this is not something the language was designed for, and it's not considered "the right approach" unless you have a good reason to do it this way.)

I'm also not surprised that "true, blue" Haskell folk haven't necessarily jumped on this and submitted better code, because it is pretty much a game created to make C win---it already wins! That being said, I think it's neat that someone put something together to play with.

I also //do// agree that Haskell at its best is still too slow, and that even implementations of a great language like Haskell can always be better, but, as you said, the programmer/human side wins out---for me, for Haskell, anyway. Plus, a lot of this code could be rewritten in C and then just have the Haskell code foreign-function-call it, since that is arguably a very good way ("the right way"?) to use Haskell to do imperative things, especially time-sensitive inner loops in large applications. But notice that none of these programs are big enough to be considered "large" applications, and none of the tests measure prototyping, refactoring, design time, mathematical tractability/provability, or programmer time (or programmer **pain**!), especially for non-trivial programs, which none of these are.

Just my $.02

Jared.
--
jupdike@gmail.com http://www.updike.org/~jared/ reverse ")-:"

At Fri, 23 Dec 2005 19:14:46 +0000, Daniel Carrera wrote:
Hi all,
I'm taking a look at the "Computer Language Shootout Benchmarks".
http://shootout.alioth.debian.org/
It looks like Haskell doesn't do very well. It seems to be near the bottom of the pile in most tests. Is this due to the inherent design of Haskell or is it merely the fact that GHC is young and hasn't had as much time to optimize as other compilers?
Rumor has it that the poor results are due to the inherent design of the shootout itself. The shootout seems to test how fast specific algorithms can be executed in various languages. Instead of testing:

"Write a program that solves this problem quickly"

the tests are:

"Write a program that solves this problem quickly, using *this* algorithm"

And many of the algorithms that you are required to use are a very poor match for Haskell. If the challenge was simply to solve the problem using any algorithm you wanted, then Haskell would probably fare much better.

Another possible factor is known deficiencies in the current IO library used by most Haskell compilers. It is my understanding that the current IO library was not implemented with speed and efficiency as a top priority. If someone took the time to optimize or rewrite the current library code, it might be possible to greatly increase the overall speed of IO-intensive Haskell programs without any changes to the compilers or language.

In theory, compiled Haskell code ought to be faster than C. Optimization techniques like "whole program compilation" allow for optimizations that would not be possible under most languages. However, many of these techniques are still being developed, are hard to implement, and are not fully understood. Projects like jhc aim to figure out which of these techniques actually work in the real world.

Personally, I think one of the main reasons why Haskell is 'slow' is because it is fast enough for most people. Given a choice of 'more features' or 'more speed', many people prefer 'more features'. That said, the GHC maintainers manage to do both somehow.

j.

ps. Everything I said in this email may be wrong...

Jeremy Shaw wrote:
Rumor has it that the poor results are due to the inherent design of the shootout itself. The shootout seems to test how fast specific algorithms can be executed in various languages. Instead of testing:
"Write a program that solves this problem quickly"
the tests are:
"Write a program that solves this problem quickly, using *this* algorithm"
At one time someone was working on creating the "Computer Language Shoot-in" to compensate for this deficiency... http://shootin.sourceforge.net/
And many of the algorithms that you are required to use are a very poor match for haskell.
True. But there are some tests like "fasta" that appear to have a laziness induced space leak that presumably could be fixed. http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all Greg Buchholz

Greg Buchholz wrote:
True. But there are some tests like "fasta" that appear to have a laziness induced space leak that presumably could be fixed.
http://shootout.alioth.debian.org/benchmark.php?test=fasta&lang=all
Playing with the space-leaking code did not show any obvious reason to expect the space leak. By commenting out the third output in main it did not leak space. Could someone look at the original code at the link above and let me know why there is a 300 MB space leak?

But if you never express huge lists, then it won't leak space. Here is a tweaked version, called tweak6.hs, that interleaves making and writing a line of output. It runs in 2.16 MB of RSIZE instead of over 300 MB. Still slow, however.

If you want more speed without extra libraries, then use an unboxed array of Word8 (length of one line: 60 bytes), since Unicode Char and linked lists are overkill. The other killer is probably the linear lookup through the array of tuples to find the chosen symbol; hand-coded if/then branches could be used to speed that up. For even faster speed, do not use 0.0 to 1.0 probabilities at all: instead of normalize :: Int -> Double followed by a (Double -> Base) lookup, remove the use of Double and do an (Int -> Base) lookup.

But the space leak was the embarrassing part, so I just fixed that.

-- Chris Kuklewicz

---------------
-- The Great Computer Language Shootout
-- http://shootout.alioth.debian.org/
-- contributed by Jeff Newbern
-- Modified to tweak6.hs by Chris Kuklewicz

import Control.Monad.Trans
import Control.Monad.State
import System.IO (hFlush,stdout)

{- original
$ time ./fasta.ghc_run 2500000 > /dev/null
real    1m17.596s
user    0m44.838s
sys     0m3.322s
330+ Megs! Space leaking all over the place. No idea why.

$ time ./fasta.tweak6 2500000 > /dev/null
real    0m30.477s
user    0m23.356s
sys     0m0.744s
2.2 Megs RSIZE
-}

-- Uses random generation code derived from Simon Marlow and Einar Karttunen's
-- "random" test entry.

-- Original version note: This code has not been optimized *at all*.
-- It is written to be clear and to follow standard Haskell idioms as
-- much as possible (but we have to match the stateful PRNG idiom in
-- the test definition, which is oriented toward an imperative
-- language). Performance is decent with ghc -O2, but if it can be
-- improved without sacrificing the clarity of the code, by all means
-- go for it!

-- Modified tweak6.hs version note: Use a StateT around IO to manage
-- the seed. This makes interleaving the generation and output of the
-- sequence easier. It is only *minimally* abstracted.

import System(getArgs)

type Base = Char
type Sequence = [Base]

alu :: Sequence -- predefined sequence
alu =    "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGG"
      ++ "GAGGCCGAGGCGGGCGGATCACCTGAGGTCAGGAGTTCGAGA"
      ++ "CCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACTAAAAAT"
      ++ "ACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCA"
      ++ "GCTACTCGGGAGGCTGAGGCAGGAGAATCGCTTGAACCCGGG"
      ++ "AGGCGGAGGTTGCAGTGAGCCGAGATCGCGCCACTGCACTCC"
      ++ "AGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"

type BaseFrequency = (Base,Double)

iub :: [BaseFrequency]
iub = [ ('a', 0.27), ('c', 0.12), ('g', 0.12), ('t', 0.27),
        ('B', 0.02), ('D', 0.02), ('H', 0.02), ('K', 0.02),
        ('M', 0.02), ('N', 0.02), ('R', 0.02), ('S', 0.02),
        ('V', 0.02), ('W', 0.02), ('Y', 0.02) ]

homosapiens :: [BaseFrequency]
homosapiens = [ ('a', 0.3029549426680), ('c', 0.1979883004921),
                ('g', 0.1975473066391), ('t', 0.3015094502008) ]

-- select a base whose interval covers the given double
chooseBase :: [BaseFrequency] -> Double -> Base
chooseBase [(b,_)]    _ = b
chooseBase ((b,f):xs) p | p < f     = b
                        | otherwise = chooseBase xs (p-f)

type Seed = Int
type Pseudo a = StateT Seed IO a

prng :: Pseudo Double
prng = let nextSeed s  = (s * ia + ic) `mod` im
           normalize n = (fromIntegral n) * (1.0 / fromIntegral im)
           im = 139968; ia = 3877; ic = 29573
       in do seed <- get
             let seed' = nextSeed seed
             put seed'
             return (normalize seed')

prngN count = sequence $ replicate count prng

-- write a sequence in Fasta format
writeFasta :: String -> String -> Sequence -> IO ()
writeFasta label title sequence =
  do putStrLn $ ">" ++ (label ++ (" " ++ title))
     writeWrapped 60 sequence
  where writeWrapped _   []  = do return ()
        writeWrapped len str = do let (s1,s2) = splitAt len str
                                  putStrLn s1
                                  writeWrapped len s2

writeFastaHeader :: (MonadIO m) => String -> String -> m ()
writeFastaHeader label title = liftIO $ putStrLn $ ">" ++ (label ++ (" " ++ title))

writeWrapped' :: Int -> Int -> (Double->Base) -> Pseudo ()
writeWrapped' wrap total trans =
  let work c = case c of
        0 -> return ()
        n -> do let c' = min wrap n
                    nextC = c - c'
                s <- liftM (map trans) (prngN c')
                liftIO $ putStrLn s
                work nextC
  in work total

writeWrapped = writeWrapped' 60

main = do args <- getArgs
          let n = if (null args) then 1000 else read (head args)
          writeFasta "ONE" "Homo sapiens alu" (take (2*n) (cycle alu))
          writeFastaHeader "TWO" "IUB ambiguity codes"
          seed' <- execStateT (writeWrapped (3*n) (chooseBase iub)) 42
          writeFastaHeader "THREE" "Homo sapiens frequency"
          execStateT (writeWrapped (5*n) (chooseBase homosapiens)) seed'
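As an aside, the Int-only lookup suggested above might look roughly like this (an illustration only, not part of tweak6.hs; Base and BaseFrequency are the types from the code above, and the threshold scaling is my own guess at how one would do it):

-- Precompute cumulative thresholds scaled to the PRNG modulus so that the
-- raw Int seed can be used directly, with no Doubles in the inner loop.
type IntFrequency = (Base, Int)

scaleFreqs :: Int -> [BaseFrequency] -> [IntFrequency]
scaleFreqs im freqs = zip bases (map (round . (* fromIntegral im)) cumulative)
  where (bases, probs) = unzip freqs
        cumulative     = scanl1 (+) probs

chooseBaseInt :: [IntFrequency] -> Int -> Base
chooseBaseInt [(b, _)] _ = b
chooseBaseInt ((b, threshold):rest) seed
  | seed < threshold = b
  | otherwise        = chooseBaseInt rest seed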

Daniel Carrera writes:
http://shootout.alioth.debian.org/
It looks like Haskell doesn't do very well. It seems to be near the bottom of the pile in most tests. Is this due to the inherent design of Haskell or is it merely the fact that GHC is young and hasn't had as much time to optimize as other compilers?
It's because nobody took the time to write faster entries for the tests where Haskell is at the bottom of the pile. The "Computer Language Shootout Benchmark" is a fun idea, but it's quite pointless to draw any conclusions about programming languages from those results. If it were included in the contest, native assembler code would win every time. Peter

Peter Simons wrote:
It's because nobody took the time to write faster entries for the tests where Haskell is at the bottom of the pile. The "Computer Language Shootout Benchmark" is a fun idea, but it's quite pointless to draw any conclusions about programming languages from those results. If it were included in the contest, native assembler code would win every time.
If the results could be trusted, they would be useful. You could balance the expected loss in performance against other factors (e.g. speed of development).

Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
 /\/_/
 \/_/  I am not over-weight, I am under-tall.
 /

Daniel Carrera writes:
If the results could be trusted, they would be useful. You could balance the expected loss in performance against other factors (e.g. speed of development).
How do you measure the time it takes to come up with a QuickSort algorithm that, implemented in Haskell, crushes the MergeSort algorithm all other languages use? ;-) Peter

Peter Simons wrote:
How do you measure the time it takes to come up with a QuickSort algorithm that, implemented in Haskell, crushes the MergeSort algorithm all other languages use? ;-)
:-) I've been impressed with Haskell so far. Today I implemented a reverse Polish notation calculator. The code is 55 lines, it's easy to read, and it only took a few hours for this humble newbie to implement. The second function I wrote in Haskell was quicksort :)

But still... I think that speed benchmarks can be useful. For example, I really like Ruby, but its speed would really stop me from using it for many applications, even though it's a very expressive language. So it's good to know what speed limitations a language has. Right now I'd say that if I need a complex algorithm (e.g. an RPN calculator) I'd do it in Haskell, but when I have a simple algorithm and performance is an issue (e.g. modeling the collision of two galaxies) I'd use C.

Cheers,
Daniel.
--
 /\/`) http://oooauthors.org
/\/_/  http://opendocumentfellowship.org
 /\/_/
 \/_/  I am not over-weight, I am under-tall.
 /
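(For the curious: a toy RPN evaluator, my own sketch rather than Daniel's 55-line calculator, can be little more than a fold over the token list with a stack of operands.)

evalRPN :: String -> Double
evalRPN = head . foldl step [] . words
  where step (y:x:rest) "+" = (x + y) : rest
        step (y:x:rest) "-" = (x - y) : rest
        step (y:x:rest) "*" = (x * y) : rest
        step (y:x:rest) "/" = (x / y) : rest
        step stack      tok = read tok : stack

-- ghci> evalRPN "3 4 + 2 *"
-- 14.0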

Daniel Carrera writes:
when I have a simple algorithm and performance is an issue [...] I'd use C.
You don't have to. You can write very fast programs in Haskell. I never really finished the article I wanted to write about this subject, but the fragment I have might be interesting or even useful nonetheless: http://cryp.to/blockio/fast-io.html http://cryp.to/blockio/fast-io.lhs The text uses one of the Language Shootout's tasks as an example. Peter

On 23 Dec 2005 22:29:02 +0100, Peter Simons
Daniel Carrera writes:
when I have a simple algorithm and performance is an issue [...] I'd use C.
You don't have to. You can write very fast programs in Haskell.
I never really finished the article I wanted to write about this subject, but the fragment I have might be interesting or even useful nonetheless:
http://cryp.to/blockio/fast-io.html http://cryp.to/blockio/fast-io.lhs
One of the interesting points that this illustrates (to me) is that the "obvious" approach in Haskell can be seriously non-optimal in terms of performance. Add to this the fact that tuning functional programs is a non-trivial art, and it becomes quite easy to see why Haskell might come across as slow.

Your wcLazy implementation is (as you say) the "obvious" approach. Your two other approaches aren't particularly obvious, although they are far more efficient. It would be interesting to see standalone code for wcIOB (where you're allowed to assume that any helpers you need, like your block IO library, are available from the standard library). This would help in comparing the "obviousness" of the two approaches.

Paul

On Dec 24, 2005, at 6:02 PM, Paul Moore wrote:
One of the interesting points that this illustrates (to me) is that the "obvious" approach in Haskell can be seriously non-optimal in terms of performance. Add to this the fact that tuning functional programs is a non-trivial art, and it becomes quite easy to see why Haskell might come across as slow.
I got totally burned by this trying to write a heavily threaded, heavy networking app in Haskell. The obvious approach was a dead-end. Joel -- http://wagerlabs.com/

Hello Joel, Saturday, December 24, 2005, 9:29:17 PM, you wrote:
One of the interesting points that this illustrates (to me) is that the "obvious" approach in Haskell can be seriously non-optimal in terms of performance. Add to this the fact that tuning functional programs is a non-trivial art, and it becomes quite easy to see why Haskell might come across as slow.
JR> I got totally burned by this trying to write a heavily threaded,
JR> heavy networking app in Haskell. The obvious approach was a dead-end.

Your problem is not Haskell or GHC; your problem is just a lack of experience in this area. Remember how I cut the number of your threads by a factor of four; remember that I found a simple and obvious solution to the problem you tried to fix for two weeks, ending with an attempt to write your own scheduler; remember that you started low-level optimization of the "functional pickling" library, which is inefficient by its functional-composition design, instead of giving all the serialization libraries a chance and selecting the most effective one.

So I think your problems are due to bad design decisions caused by lack of experience. Two weeks ago, when you skipped my suggestions about improving this design and answered that you would use a "systematic approach", I foresaw that you would fail and say that Haskell is a bad language.

Haskell really is a slow language, but even if it were 100 times better, it can't write programs by itself; some help from the programmer is still needed :)

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

On Dec 25, 2005, at 10:13 AM, Bulat Ziganshin wrote:
Hello Joel, [...] so I think your problems are due to bad design decisions caused by lack of experience. Two weeks ago, when you skipped my suggestions about improving this design and answered that you would use a "systematic approach", I foresaw that you would fail and say that Haskell is a bad language
Yes and no. The systematic approach that I used was profiling the serialization code and tweaking all that I could. I saved my profiling reports after each run and tracked the changes that I made. I will blog about it after Simon M. comes back and suggests how to squeeze the last bit out of it.

Regardless of this, it looks to me like I could easily have around 4Mb of network traffic per second with about 4k threads and complicated nested structures to serialize and deserialize. Trying to tackle far less data suggests to me that it's not gonna happen. So I will try to take this as far as I can in Haskell, once I have the heavy artillery to back me up. If the results are good then I will use them in later applications of the same nature, but in the meantime I'm rewriting this particular app in Erlang.

There are other apps to write in Haskell, but my overall lesson is that the choice of the right tool for the task at hand will help far more than any optimizations you can produce. Especially when the natural approach to the problem will just produce the results required.

I spent 3 months with this app, going from total Haskell newbie who had not seen Haskell before to someone who can read Haskell Core code and tackle FFI and networking in a heavily concurrent environment. I still don't know what "the right thing" is.

The Erlang rewrite should take me 2-3 days, with FFI being the teeth-grinding bit. I need the FFI (which totally sucks compared to Haskell's) because my use of SSL and ZLib is non-standard, otherwise these are already provided to you. I expect that the app will just work. No memory or time leaks.

The serialization approach that I'm using is based on the pickler combinators but slightly different. I think I'll backport it to Haskell to compare.

The next project on my plate is real-time collusion detection in poker. I first thought of using the Dazzle approach and Bayesian networks, but the focus again seems to be on high-throughput networking and scalability. What language will I use? Haskell is no sure bet.

A disclaimer for those who might think that I'm a newbie disillusioned with Haskell... I totally love Haskell, it just seems that you need to be a guru to do what I'm trying to do and I have not yet reached this status. I will not rest until I'm able to see just _what_ kind of performance can be squeezed out of Haskell for my application. I'll also continue to seek enlightenment, but I need to finish this project before New Year's.

Joel

--
http://wagerlabs.com/

On Sun, Dec 25, 2005 at 12:20:38PM +0000, Joel Reymont wrote:
A disclaimer for those who might think that I'm a newbie disillusioned with Haskell... I totally love Haskell, it just seems that you need to be a guru to do what I'm trying to do and I have not yet reached this status.
I am pleased to hear this. I would love to be able to help you more often; alas, my job has been very time-demanding recently (but quite rewarding too :-). And we have Christmas right now.

I think your work will be very important and valuable for the community. You've shown where Haskell could be better, and I believe it will catch up eventually, either through improvements in GHC, in libraries, or simply in documentation.

Merry Christmas!
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

On Dec 25, 2005, at 1:14 PM, Tomasz Zielonka wrote:
I think your work will be very important and valuable for the community. You've shown where Haskell could be better, and I believe it will catch up eventually, either through improvements in GHC, in libraries, or simply in documentation.
Thank you Tomasz! I think eventually will come sooner than you think :-). On my TODO list for the next few weeks:

1) Set up a Haskell vs. Erlang heavily-multithreaded serialization shootout
2) Work with Simon M. to make sure Haskell is on par, and try to hack GHC to add whatever is needed
3) Add profiling support for STM to GHC (including retainer profiling)
4) Adapt the Zipper FileServer OS to #1. See if single-threading with delimited continuations is better than multi-threading with unbound threads.
5) If #4 is a yes, then investigate why

And of course I will blog about the whole thing as I go along.

Thanks, Joel

--
http://wagerlabs.com/

If anyone is interested, I posted the Erlang version of the pickler combinators at http://wagerlabs.com/erlang/pickle.erl

Original paper at http://research.microsoft.com/~akenn/fun/picklercombinators.pdf

Notice that I did away with "sequ" while preserving "wrap" and friends. Erlang does not have enums, so I added support for those.

I did not test the performance of this code yet. I used the "most natural way of doing things", though, and I expect performance to be excellent. I'm most interested in whether it's possible to get the performance of the Haskell version on par with this one.

Thanks, Joel

--
http://wagerlabs.com/
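For comparison, the core of Kennedy's pickler combinators in Haskell can be sketched in a few lines (my own simplification, not Joel's serialization code; real picklers thread more state and write to buffers rather than lists):

import Data.Word (Word8)

-- A pickler pairs a serializer with a parser over a byte stream.
data PU a = PU { appP :: a -> [Word8] -> [Word8]
               , appU :: [Word8] -> (a, [Word8]) }

byte :: PU Word8
byte = PU (:) (\(b:bs) -> (b, bs))

pair :: PU a -> PU b -> PU (a, b)
pair pa pb = PU (\(a, b) -> appP pa a . appP pb b)
                (\bs -> let (a, bs')  = appU pa bs
                            (b, bs'') = appU pb bs'
                        in ((a, b), bs''))

wrap :: (a -> b, b -> a) -> PU a -> PU b
wrap (to, from) pa = PU (appP pa . from)
                        (\bs -> let (a, bs') = appU pa bs in (to a, bs'))

pickle :: PU a -> a -> [Word8]
pickle p x = appP p x []

unpickle :: PU a -> [Word8] -> a
unpickle p = fst . appU p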

From: Joel Reymont
To: Bulat Ziganshin
CC: Peter Simons, haskell-cafe@haskell.org
Subject: [Haskell-cafe] Haskell vs. Erlang for heavy-duty network apps (was Re: Haskell Speed)
Date: Sun, 25 Dec 2005 12:20:38 +0000

On Dec 25, 2005, at 10:13 AM, Bulat Ziganshin wrote:
Hello Joel, [...] so I think your problems are due to bad design decisions caused by lack of experience. Two weeks ago, when you skipped my suggestions about improving this design and answered that you would use a "systematic approach", I foresaw that you would fail and say that Haskell is a bad language
Yes and no. The systematic approach that I used was profiling the serialization code and tweaking all that I could. I saved my profiling reports after each run and tracked the changes that I made. I will blog about it after Simon M. comes back and suggests how to squeeze the last bit out of it.
Regardless of this, it looks to me like I could easily have around 4Mb of network traffic per second with about 4k threads and complicated nested structures to serialize and deserialize. Trying to tackle far less data suggests to me that it's not gonna happen. So I will try to take this as far as I can in Haskell, once I have the heavy artillery to back me up. If the results are good then I will use them in later applications of the same nature but in the meantime I'm rewriting this particular app in Erlang.
Sounds familiar? :)

http://www.jetcafe.org/~npc/doc/euc00-sendmail.html

A similar experience with Erlang from about 5 years ago :0)

Greetings, Bane.

Paul Moore writes:
It would be interesting to see standalone code for wcIOB (where you're allowed to assume that any helpers you need, like your block IO library, are available from the standard library). This would help in comparing the "obviousness" of the two approaches.
A simple version of the program -- which doesn't need any 3rd party modules to compile -- is attached below. My guess is that this approach to I/O is quite obvious, too, if you have experience with system programming in C.

IMHO, the main point of the example in the article is that

  wc :: String -> (Int, Int, Int)
  wc file = ( length (lines file)
            , length (words file)
            , length file
            )

is a crappy word-counting algorithm. I'm not sure whether conclusions about functional programming in general or even programming in Haskell can be derived from this code. Most people seem to have trouble with lazy evaluation, first of all.

Peter

-- Compile with: ghc -O2 -funbox-strict-fields -o wc wc.hs

module Main ( main ) where

import System.IO
import Foreign

type Count = Int

data CountingState = ST !Bool !Count !Count !Count
                     deriving (Show)

initCST :: CountingState
initCST = ST True 0 0 0

wc :: Char -> CountingState -> CountingState
wc '\n' (ST _     l w c) = ST True  (l+1) w     (c+1)
wc ' '  (ST _     l w c) = ST True  l     w     (c+1)
wc '\t' (ST _     l w c) = ST True  l     w     (c+1)
wc  _   (ST True  l w c) = ST False l     (w+1) (c+1)
wc  _   (ST False l w c) = ST False l     w     (c+1)

bufsize :: Int                  -- our I/O buffer size
bufsize = 4096

type IOHandler st = Ptr Word8 -> Int -> st -> IO st

countBuf :: IOHandler CountingState
countBuf _   0 st@(ST _ _ _ _) = return st
countBuf ptr n st@(ST _ _ _ _) = do
  c <- peek ptr
  let st' = wc (toEnum (fromEnum c)) st
  countBuf (ptr `plusPtr` 1) (n - 1) st'

loop :: Handle -> Int -> IOHandler st -> st -> IO st
loop h n f st' = allocaArray n (\ptr' -> loop' ptr' st')
  where
    loop' ptr st = st `seq` do
      rc <- hGetBuf h ptr n
      if rc == 0
        then return st
        else f ptr rc st >>= loop' ptr

main :: IO ()
main = do
  ST _ l w c <- loop stdin bufsize countBuf initCST
  putStrLn . shows l . (' ':) . shows w . (' ':) $ show c

On Sun, Dec 25, 2005 at 12:24:38PM +0100, Peter Simons wrote:
Paul Moore writes:
It would be interesting to see standalone code for wcIOB (where you're allowed to assume that any helpers you need, like your block IO library, are available from the standard library). This would help in comparing the "obviousness" of the two approaches.
A simple version of the program -- which doesn't need any 3rd party modules to compile -- is attached below. My guess is that this approach to I/O is quite obvious, too, if you have experience with system programming in C.
IMHO, the main point of the example in the article is that
wc :: String -> (Int, Int, Int)
wc file = ( length (lines file)
          , length (words file)
          , length file
          )
is a crappy word-counting algorithm.
I have a crazy idea: what if we computed all three length applications concurrently, with the RTS preempting a thread when it generates too many unreclaimable nodes? Do you know what I mean? The ideal effect would be that the three threads chase each other along the list, so that only a constant part of the list is in memory at any time (no space leak).

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

Tomasz Zielonka writes:
wc :: String -> (Int, Int, Int)
wc file = ( length (lines file)
          , length (words file)
          , length file
          )
I have a crazy idea: what if we computed all three length applications concurrently, with the RTS preempting a thread when it generates too many unreclaimable nodes?
That's a pretty good idea, actually. What is the state of the 'par' primitive in GHC 6.x? Now that I think of it, it seems that adding a proper "compute in parallel" annotation could make a big difference in this case. Peter
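Something like this, perhaps (a minimal sketch of the annotation idea using Control.Parallel's par and pseq; note that the sparks alone don't guarantee the three traversals stay close together on the list, so the space behaviour still isn't assured):

import Control.Parallel (par, pseq)

-- Spark the line and word counts while evaluating the character count.
wcPar :: String -> (Int, Int, Int)
wcPar file = ls `par` ws `par` (cs `pseq` (ls, ws, cs))
  where ls = length (lines file)
        ws = length (words file)
        cs = length file

main :: IO ()
main = getContents >>= print . wcPar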

On 25 Dec 2005 12:24:38 +0100, Peter Simons
Paul Moore writes:
It would be interesting to see standalone code for wcIOB (where you're allowed to assume that any helpers you need, like your block IO library, are available from the standard library). This would help in comparing the "obviousness" of the two approaches.
A simple version of the program -- which doesn't need any 3rd party modules to compile -- is attached below. My guess is that this approach to I/O is quite obvious, too, if you have experience with system programming in C.
Hmm, I can't honestly believe that you feel that your code is as "obvious" as the original. I'm not unfamiliar with monads and state, and I know C well, but it took me a significant amount of time to decipher your code (even knowing what it was intended to do), whereas I knew what the original was doing instantly.
IMHO, the main point of the example in the article is that
wc :: String -> (Int, Int, Int)
wc file = ( length (lines file)
          , length (words file)
          , length file
          )
is a crappy word-counting algorithm.
Dunno. It's certainly not a bad (executable!) definition of the problem. My point is that Haskell allows me to write *very* clear "executable pseudocode", but that code is not a good starting point for writing production-quality code.

To me, the point of the original article was to demonstrate how to get from the original version to an efficient version, in reasonable steps. I'm not sure how well it achieved that, particularly as the final code didn't clearly (to me) separate out standard libraries, supporting (but non-standard) libraries, and application code. But the article was specifically noted to be incomplete by the author, so that's not surprising.
I'm not sure whether conclusions about functional programming in general or even programming in Haskell can be derived from this code. Most people seem to have trouble with lazy-evaluation, first of all.
The biggest conclusion to me is that techniques for *readable* and "obvious" code in Haskell differ significantly from techniques for efficient code. I think that conclusion is not isolated to this one specific example...

Back to where this came from, my view is that this is an education issue: tutorials tend to focus on lazy, functional techniques, and not on efficiency. This is true for C (or any other language) tutorials as well, but in languages where the step from naive to efficient code isn't as large, it's not such an issue (mailing list questions in C or Java about "my code isn't fast enough" tend to result in advice on fairly comprehensible tweaks that can be made; in Haskell, they tend to result in monadic rewrites which bear little relationship to the original code, so the original poster hasn't got much chance of understanding what happened...)

But the material is available, so people *can* learn. It just needs some effort (but possibly more than it should...)

Paul

Back to where this came from, my view is that this is an education issue - tutorials tend to focus on lazy, functional techniques, and not on efficiency.
But the material is available, so people *can* learn. It just needs some effort (but possibly more than it should...)
Can anyone suggest some good tutorials, papers, or books to read for learning how to reason about laziness, specifically, time and space complexity? I seem to remember that Richard Bird's Introduction to Functional Programming book has a chapter or so on this subject, but what other material would anyone recommend for trying to understand how to write efficient, lazy algorithms? Maybe in the spirit of updating Wikis and such, we can collect this sort of material together... Jared. -- jupdike@gmail.com http://www.updike.org/~jared/ reverse ")-:"

On Mon, Dec 26, 2005 at 11:58:52AM +0000, Paul Moore wrote:
My point is that Haskell allows me to write *very* clear "executable pseudocode", but that code is not a good starting point for writing production-quality code.
It is often a good start, but not always a good end ;-) Recently, I have written some code to do something we previously thought would be impractical because of unacceptable performance. I did it in Haskell in the most obvious way I could. It works, because Haskell helps me to look at the problem itself and solve efficiency problems in the big picture, not only in the details.
I'm not sure whether conclusions about functional programming in general or even programming in Haskell can be derived from this code. Most people seem to have trouble with lazy-evaluation, first of all.
The biggest conclusion to me is that techniques for *readable* and "obvious" code in Haskell differ significantly from techniques for efficient code. I think that conclusion is not isolated to this one specific example...
That's not my experience. I have often been surprised that I was able to transform the "obvious", inefficient code into an efficient version in a short sequence of small steps. The best way to do this is to first modularize the code, splitting it into smaller functions, perhaps introducing some type classes, non-IO monads, etc. If you know what you are doing, you can later optimize the code locally, modularly.

Also, you say that "monadic rewrites bear little relationship to the original code". I often start writing my *pure* code in monadic form, so I have this problem much less often.

A good intro to modularizing functional programs is in the "Why Functional Programming Matters" article: http://www.md.chalmers.se/~rjmh/Papers/whyfp.html

I read it when I was learning Haskell a couple of years ago, but now I know I didn't get it back then. When I read it again this week and compared it to my experiences, I thought "Wow, this man knows what he's talking about!". I daresay that if you don't say something like "Wow" while reading the article, then you didn't get it ;-)
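To illustrate what "pure code in monadic form" can mean (my own sketch, not from the article): the word-count state below lives in a State monad, so a later switch to a strict fold or to block IO stays local to step and wcState instead of forcing a rewrite.

import Control.Monad (forM_)
import Control.Monad.State (execState, modify')

data Counts = Counts !Int !Int !Int !Bool   -- lines, words, chars, "inside a word"

step :: Char -> Counts -> Counts
step c (Counts l w n inWord)
  | c == '\n'      = Counts (l + 1) w (n + 1) False
  | c `elem` " \t" = Counts l w (n + 1) False
  | inWord         = Counts l w (n + 1) True
  | otherwise      = Counts l (w + 1) (n + 1) True

wcState :: String -> (Int, Int, Int)
wcState s = let Counts l w n _ = execState (forM_ s (modify' . step)) (Counts 0 0 0 False)
            in (l, w, n)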
Back to where this came from, my view is that this is an education issue - tutorials tend to focus on lazy, functional techniques, and not on efficiency.
If you want to focus on efficiency from the start, you write an assembler tutorial ;-) Well, you have a point. We need "Efficient Haskell" tutorials for people who have already gone through the basic tutorials. I think there are even one or two already.
This is true for C (or any other language) tutorials as well, but in languages where the step from naive to efficient code isn't as large, it's not such an issue (mailing list questions in C or Java about "my code isn't fast enough" tend to result in advice on fairly comprehensible tweaks that can be made;
On the other hand, in those languages it is often difficult to write high level, specification-like code. No wonder the step is so small ;-)
in Haskell, they tend to result in monadic rewrites which bear little relationship to the original code - so the original poster hasn't got much chance of understanding what happened...)
You don't have that problem if your obvious code is monadic (not necessarily IO!) from the start ;-)

Best regards
Tomasz

--
I am searching for a programmer who is good at least in some of
[Haskell, ML, C++, Linux, FreeBSD, math] for work in Warsaw, Poland

wc :: String -> (Int, Int, Int)
wc file = ( length (lines file)
          , length (words file)
          , length file
          )
Most people tend to think that imperative programming novices don't even start their obvious solutions from something as inefficient as this. Well, I beg to differ.

For almost a decade, most (I dare claim even all) Pascal and C compilers were "three-pass" or "two-pass". That means the compiler perhaps reads the input two or three times (each for a different task, just like the above), or perhaps reads the input once, produces an intermediate form and saves it to disk, then reads the intermediate form from disk, produces a second intermediate form and saves it to disk, then reads the second intermediate form from disk, and only then can it produce machine code.

It must have been the obvious method, since even though it was obviously crappy, everyone was doing it, and "everyone" obviously refers to both novices and gurus. It must also have been pretty non-obvious how to transition from the obvious slow method to a one-pass fast method, since for almost a decade no one did it. Most of us had to wait until someone figured it out, and then we had Turbo Pascal.

And that was the case with imperative programming.

Albert Lai writes:
For almost a decade, most (I dare claim even all) Pascal and C compilers were "three-pass" or "two-pass". It means perhaps the compiler reads the input two or three times [...], or perhaps the compiler reads the input once, produces an intermediate form and saves it to disk, then reads the intermediate form from disk, produces a second intermediate form and saves it to disk, then reads the second intermediate form from disk, then it can produce machine code.
It must have been the obvious method, since even though it was obviously crappy [...].
I beg to differ. Highly modular software is not necessarily crappy if you're writing something as complex as a C or Pascal compiler -- especially in times when RAM existed only in minuscule amounts. A highly modularized algorithm for counting lines, words, and characters in an input file, however, is something altogether different. I doubt anyone but the most inexperienced novices would have tried to do that in three passes in a strict language.

Peter

On Thu, 2005-12-29 at 15:56 -0500, Albert Lai wrote: . . .
one-pass fast method, since for almost a decade no one did it. Most of us had to wait until someone figured it out and then we had Turbo
Judging from comments by U. Ammann [1],[2], part of the original Pascal implementation team at ETH Zurich, the first Pascal compiler for the CDC 6x00 family of computers was written in 1970-1971, and a second, developed from scratch for the same computers but compiling the revised language, was written in 1972-1974. I think both of these compilers were one-pass.

-- Bill Wood

[1] U. Ammann, "The Zurich Implementation", in D.W. Barron (ed.), _Pascal, The Language and its Implementation_, pp. 63-82, John Wiley and Sons, 1981.
[2] U. Ammann, "Code Generation for a Pascal Compiler", in D.W. Barron (ed.), _Pascal, The Language and its Implementation_, pp. 83-124, John Wiley and Sons, 1981.

Hello Albert,

Thursday, December 29, 2005, 11:56:12 PM, you wrote:

AL> For almost a decade, most (I dare claim even all) Pascal and C
AL> compilers were "three-pass" or "two-pass".

1) Pascal was designed as a one-pass compiled language; highly optimizing compilers used additional passes to generate better code

2) multiple passes were used in times when there was just not enough memory to perform all the passes in parallel

--
Best regards,
 Bulat                            mailto:bulatz@HotPOP.com

From: Daniel Carrera
To: Haskell-Cafe@haskell.org
Subject: [Haskell-cafe] Haskell Speed
Date: Fri, 23 Dec 2005 19:14:46 +0000

Hi all,
I'm taking a look at the "Computer Language Shootout Benchmarks".
http://shootout.alioth.debian.org/
It looks like Haskell doesn't do very well.
Of course, the first example uses [String] instead of Data.HashTable as the other languages do. Imagine how a C program would perform if it used a list rather than a hash table :) I didn't look further after that.

Greetings, Bane.
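For reference, the hash-table version of that kind of lookup might look like this (a sketch only; the Data.HashTable module mentioned above belonged to older versions of base, so this assumes the hashtables package as a stand-in, and countWords is a made-up helper):

import qualified Data.HashTable.IO as H   -- from the "hashtables" package

-- Count word frequencies with a real hash table instead of an association list.
countWords :: [String] -> IO (H.BasicHashTable String Int)
countWords ws = do
  table <- H.new
  mapM_ (\w -> do old <- H.lookup table w
                  H.insert table w (maybe 1 (+1) old))
        ws
  return table

main :: IO ()
main = do
  input <- getContents
  table <- countWords (words input)
  H.toList table >>= mapM_ print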

Daniel Carrera wrote:
It looks like Haskell doesn't do very well. It seems to be near the bottom of the pile in most tests. Is this due to the inherent design of Haskell or is it merely the fact that GHC is young and hasn't had as much time to optimize as other compilers?
I don't think it's that bad. It depends on the particular test, but it's almost comparable to Java, iirc. On some tests, it's terrible, though.
For example, another very slow language is Ruby. In Ruby's case, there is a design factor that will always make it slow. I wonder if Haskell is in a smilar situation.
Haskell's syntax and type system are powerful enough that technically there are a lot of optimizations possible without involving the FFI. It may become ugly, though, and less and less safe, e.g. if you have to use unsafeWrite's to update arrays to eliminate bounds checks, etc.

A lot of the benchmark problems (at least the ones GHC seems to do worse than usual on, e.g. http://shootout.alioth.debian.org/gp4/benchmark.php?test=revcomp&lang=all) involve some sort of string processing. Idiomatic Haskell dictates that one uses a linked list of Chars because FastString is not part of the language. That is a lot of overhead for values as small as one byte. Also, the input string is 25M characters long in the revcomp case, thus there's a lot of difference between reversing it with and without in-place updates. If you look at the OCaml implementations, they usually use references, in-place updates and compile with bounds checks disabled (but that is idiomatic OCaml code).

However, I don't think it is right to downplay these benchmarks. Such little tasks exist in one form or another in bigger programs. Perhaps we should include mutable arrays in 'idiomatic' Haskell as well. Otherwise it is similar to proposing that std::getline() take a std::list<char> as an argument, from a performance point of view.

And it's not right to blame naive implementors, either. I couldn't have guessed that the difference between the two Haskell implementations for sum-file would be so massive. It's a pity that the super-slow version could very well be the version your coworker would have written even if you wouldn't have.

Cheers,
Koray
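To make the in-place point concrete, here is a small sketch (my own illustration, not the shootout entry) of reversing a mutable unboxed byte array without building an intermediate list:

import Control.Monad (forM_)
import Control.Monad.ST (ST)
import Data.Array.ST (STUArray, getBounds, newListArray, readArray, runSTUArray, writeArray)
import Data.Array.Unboxed (elems)
import Data.Word (Word8)

-- Swap elements from both ends towards the middle, in place.
reverseInPlace :: STUArray s Int Word8 -> ST s ()
reverseInPlace arr = do
  (lo, hi) <- getBounds arr
  forM_ [0 .. (hi - lo + 1) `div` 2 - 1] $ \k -> do
    x <- readArray arr (lo + k)
    y <- readArray arr (hi - k)
    writeArray arr (lo + k) y
    writeArray arr (hi - k) x

main :: IO ()
main = print . elems $ runSTUArray $ do
  a <- newListArray (0, 4) [1, 2, 3, 4, 5]
  reverseInPlace a
  return a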
participants (15)
- Albert Lai
- Bill Wood
- Branimir Maksimovic
- Bulat Ziganshin
- Chris Kuklewicz
- Daniel Carrera
- Einar Karttunen
- Greg Buchholz
- Jared Updike
- Jeremy Shaw
- Joel Reymont
- Paul Moore
- Peter Simons
- S Koray Can
- Tomasz Zielonka