
rendel:
Prabhakar Ragde wrote:
divisors i = [j | j<-[1..i-1], i `mod` j == 0] main = print [i | i<-[1..10000], i == sum (divisors i)]
Jerzy Karczmarczuk wrote:
My point didn't concern that point. Haskell compiler cannot change an algorithm using lists into something which deals with indexable arrays, usually faster. Indexing may be faster than the indirection, and the allocation of memory costs. And there is laziness...
This may be true, but it isn't relevant in this case, since the "obvious c program" doesn't need any arrays, only two loops:
for (int i = 1; i <= 10000; i++) { int sum = 0; for (int j = 1; j < i; j++) if (i % j == 0) sum += i; if (sum == i) print(i); }
Loops can be expressed with lazy lists in Haskell. Therefore, the presented Haskell program is perfectly equivalent to the "obvious c program".
So what we would hope is that GHC could transform a set of composed lazy list functions into a doubly nested strict loop in Int#... Let's see if we can get that result from GHC, using a bit of fusion. First, to explain what is happening, let's first replace the `mod` call with `rem`, which is faster, and then desugar the list comprehension and enumerations syntax, to expose the underlying program: default(Int) divisors i = filter (\j -> i `rem`j == 0) (enumFromTo 1 (i-1)) main = print $ filter (\i -> i == sum (divisors i)) (enumFromTo 1 10000) Looking at this we see some good chances for fusion to take place: the enumFromTo should fuse twice with 'filter', using build/foldr fusion. And with stream fusion, the left fold 'sum' should also fuse with pipeline that results from divisors. So my prediction would be that this program would run slightly faster with stream fusion. Let's see... Compiling with -O2 and ghc 6.8 snapshot, with build/foldr fusion, we see two fusion sites, as expected, and a spec-constr of 'sum: $ ghc-6.9.20070916 A.hs -O2 -ddump-simpl-stats RuleFired 1 SPEC Data.List.sum 2 fold/build Good, running this: $ time ./A-stream [6,28,496,8128] ./A-stream 1.29s user 0.02s system 99% cpu 1.319 total Now we can try with stream fusion, using the stream-fusible list library here: http://www.cse.unsw.edu.au/~dons/code/streams/list/ To use these list functions in preference to the default, we have to import: import Prelude hiding (filter,sum,enumFromTo) import Data.List.Stream and since the base library doesn't include stream fusible enumFromTo, we'll have to write our own definition in terms of stream: import Data.Stream (enumFromToInt,unstream) enumFromTo i j = unstream (enumFromToInt i j) Ok, that's easy. Compiling this, we hope to see 3 fusion sites, and all heap-allocated Ints removed: $ ghc-6.9.20070916 A.hs -O2 -ddump-simpl-stats -package list RuleFired 2 filter -> fusible 1 sumInt -> fusible 1 sum spec Int 3 STREAM stream/unstream fusion Terrific! The 'sum' was specialised to Int, then translated to a stream version, the two filters also were translated, then 3 fusion sites were found and fused. Our program should now run faster: $ time ./A-stream [6,28,496,8128] ./A-stream 1.23s user 0.01s system 99% cpu 1.251 total And so it does, with no list allocated for the sum loop. In fact the entire program reduces to a strict unboxed nested loop: unfold = Int# -> [Int] wloop_sum_sV5 :: Int# -> Int# -> Int# So almost identical types to the C program (bar for the return [Int]). Finally, we can manually translate the C code into a confusing set of nested loops with interleaved IO, main = loop 1 where loop !i | i > 10000 = return () | otherwise = if i == go i 0 1 then print i >> loop (i+1) else loop (i+1) go !i !s !j | j <= i-1 = if i `rem` j == 0 then go i (s+j) (j+1) else go i s (j+1) | otherwise = s And we get *no speed benefit* at all! time ./A-loop 1.24s user 0.00s system 98% cpu 1.256 total So the lesson is: write in a high level style, and the compiler can do the work for you. Or, GHC is pretty smart on high level code. -- Don