Re: [Haskell-cafe] Fusing foldr's

26 Oct 2007

      Thanks for letting me know about the Data.Strict library on Hackage. I 
will definitely make use of that! BTW, you left out an import 
Data.List(foldl') in your example.

My timing test is an order of magnitude worse than yours. Do you have an 
extra zero in your list endpoint?
...
I fed these functions to ghc with the -O2 and -threaded flags and
timed them using the list [1..10000000]. The result (best times out of
several runs):
avg4: 284 ms
avgS: 184 ms
avgP: 248 ms
Using ghc -threaded -O2 --make Avg.hs for ghc 6.6.1, I ran your tests on 
[1..10000000] and got the user times:

avg4: 12.75 s
avgS:  3.65 s
avgP: 15.56 s

The funny thing is that avg4/avgS = 3.5 for and only 1.5 for you. I 
understand that with only 1 processor my avgP time may be twice yours, 
but not the avgS or avg4.

I have the following machine:

Main memory size: 2026 Mbytes
Num Processors: 1
Processor Type: Intel(R) Xeon(TM) CPU 2.80GHz x32
Clock Speed: 2790 MHZ

Josef Svenningsson wrote:
...
Sorry for reacting so late on this mail. I'm digging through some old mails...
On 10/12/07, Dan Weston  wrote:
...
Always check optimizations to make sure they are not pessimizations!
Actually, traversing the list twice is very cheap compared to space
leakage, and accumulating pairs requires tuple boxing and unboxing which
I don't know how to get GHC not to do.
I agree hole-heartedly that replacing multiple traversals with a
single traversal should be done with care as it more often than not
results in a pessimization. Indeed you showed just that with your
examples! But I'd thought it'd be interesting to see how it can
actually be an improvement if done carefully.
\begin{code}
import Control.Arrow
import qualified Data.Strict.Tuple as T
import Data.Strict.Tuple (Pair(..))
import Control.Parallel
avg4 = uncurry (/) . (foldl' (+) 0 &&& foldl' (\x y -> x + 1) 0)
avgS = T.uncurry (/) . foldl' (\p n -> ((+n) *!* (+1)) p) (0 :!: 0)
avgP = uncurry (/) . (foldl' (+) 0 &!& foldl' (\x y -> x + 1) 0)
(*!*) f g (a :!: b) = f a :!: g b
(&!&) f g a = fa `par` (fa,ga)
  where fa = f a
        ga = g a
\end{code}
avg4 is the function which was best among Dan's benchmarks. avgS uses
strict tuples. I just threw in avgP for fun, it traverses the lists in
parallel. Note: I do have a dual core machine so it makes sense to try
avgP.
I fed these functions to ghc with the -O2 and -threaded flags and
timed them using the list [1..10000000]. The result (best times out of
several runs):
avg4: 284 ms
avgS: 184 ms
avgP: 248 ms
It seems doing a single traversal can be faster if your write your
function carefully. Doing the traversal in parallel was beneficial but
not as good as the single traversal.
Cheers,
/Josef