
OK, what did I do wrong here?
module Main () where
import Data.List
top = 10 ^ 8 :: Int
main = do
let numbers = [1 .. top]
print $ foldl' (+) 0 numbers
-- Runtime is 20 seconds.
#include

OK, what did I do wrong here?
When making a request for help on a compiler issue, you failed to include key information to make it possible to reproduce your problem, and what you did include was broken or incorrect. The three programs that submitted don't do even do the same thing. Let's look into this further. * Program 1:
module Main () where
import Data.List
top = 10 ^ 8 :: Int
main = do let numbers = [1 .. top] print $ foldl' (+) 0 numbers
-- Runtime is 20 seconds.
Well, let's see if we can reproduce this: $ ghc -O2 A.hs --make [1 of 1] Compiling Main ( A.hs, A.o ) Linking A ... $ time ./A 5000000050000000 ./A 1.54s user 0.01s system 98% cpu 1.571 total Nope. OK, so this seems like user error. Without more info about how you conducted your experiment, the results are meaningless. My guess is that you compiled it without optimisations? Nope, not that, $ ghc -O0 --make A.hs $ time ./A 5000000050000000 ./A 2.65s user 0.01s system 99% cpu 2.667 tota So even with all optimisations disabled, it is still an order of magnitude faster than the number you presented. Resolution: invalid. Not reproducible. * Program 2
#include
int main(void) { int total = 0; int n;
for (n = 1, n < 100000000; n++) total += n;
printf("%d", n); }
// Runtime is 0.8 seconds.
Ok. Let's try this then, a C program: $ gcc t.c t.c: In function ‘main’: t.c:8: error: expected ‘;’ before ‘)’ token Ah, an incorrect C program. Correcting the OP's typo: $ time ./a.out 100000000 ./a.out 0.41s user 0.00s system 100% cpu 0.416 total So its actually a different program. Is this supposed to print 'total'? This program seems wrong in a number of other ways too. Resolution: non-sequitor
Program 3
module Main () where
import Data.List
top = 10 ^ 8 :: Int
kernel i o = if i < top then o `seq` kernel (i+1) (o+i) else o
main = do print $ kernel 1 0
-- Runtime is 0.5 seconds. Clearly these two nearly identical Haskell programs should have exactly the same runtime. Instead, one is 40x slower. Clearly something has gone catastrophically wrong here. The whole point of GHC's extensive optimiser passes are to turn the first example into the second example - but something somewhere hasn't worked. Any suggestions?
Ok, another program. Let's try this. $ ghc -O2 B.hs --make [1 of 1] Compiling Main ( B.hs, B.o ) Linking B ... $ time ./B 4999999950000000 ./B 0.18s user 0.00s system 98% cpu 0.186 total Oh, this is produces yet another result. In 0.186 seconds. So, going back to the original question, what did you do wrong? If you're seeking input for a technical error relating to performance, you should have, but failed to: * Use programs that implement the same algorithm * Indicate which compiler versions/optimisations/architecture you're on * State what you expected the results to be. Besides the technical aspects, your presentation categorises the post somwhere between internet crank and internet troll, as: * You used an inflammatory title, which doesn't inspire trust. * You jumped to conclusions on the fundamental nature of a technology, striking at its reasons for existing, without considering to recheck your assumptions. So yes, epic fail. And after three years of this, I'm not hopeful. But perhaps you can take some lessons from this post to improve your next one? ------------------------------------------------------------------------ Now, assuming good faith, and you were just very confused about what you were measuring, or how to measure it, or how to ask for help in a technical forum, here's some more fun: let's write your 1st program, and see if GHC can transform it into your 3rd program. $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.10.1 $ uname -msr Linux 2.6.28-ARCH x86_64 $ gcc --version gcc (GCC) 4.3.3 $ ghc-pkg list uvector uvector-0.1.0.3 Here's a a program written with combinators in a high level style (and using a library written in a high level way): import Data.Array.Vector main = print . sumU . enumFromToU 0 $ (10^8 :: Int) Compiling it, $ ghc -O2 --make C.hs Which yields the following core, $wfold_s15D :: Int# -> Int# -> Int# $wfold_s15D = \ (ww1_s15a :: Int#) (ww2_s15e :: Int#) -> case ># ww2_s15e ww_s154 of wild_a12I { False -> $wfold_s15D (+# ww1_s15a ww2_s15e) (+# ww2_s15e 1); True -> ww1_s15a Because GHC knows how to optimise loops of these forms. And the resulting assembly is pretty nice, s16o_info: .Lc17g: cmpq 6(%rbx),%rdi jg .Lc17j leaq 1(%rdi),%rax addq %rdi,%rsi movq %rax,%rdi jmp s16o_info Running it, $ time ./C 5000000050000000 ./C 0.17s user 0.00s system 99% cpu 0.175 total We can also try the C backend, though we probably don't expect much change here. $ ghc -O2 -fvia-C -optc-O3 --make C.hs Yielding an inner loop of, s16o_info: cmpq 6(%rbx), %rdi jg .L2 addq %rdi, %rsi leaq 1(%rdi), %rdi jmp s16o_info A little different in the inner loop if we run the code though GCC. Interesting! $ time ./C 5000000050000000 ./C 0.15s user 0.00s system 99% cpu 0.155 total Some small difference. -- Don
participants (2)
-
Andrew Coppin
-
Don Stewart