
On Saturday 11 September 2010 14:46:48, Bas van Dijk wrote:
On Sat, Sep 11, 2010 at 5:13 AM, michael rice
wrote: Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
So to summarize from fastest to slowest:
f1: mean: 80.04258 ms f4: mean: 80.80089 ms f3: mean: 82.66586 ms f2: mean: 83.13315 ms
To find out why f1 is the fastest you can look at the core using ghc-core[2]:
f4_go :: Int -> [Int] -> [Int]
f3_$sf3 :: [Int] -> Int -> [Int]
f2_$sf2 :: [Int] -> Int -> [Int]
I don't immediately see the reason for the time difference between f4, f3 and f2. The inner loops all seem equivalent.
I don't pretend to understand at the processor level why f4's loop is faster that f2's and f3's, but I've observed on several occasions that parameter order plays a big rĂ´le for performance. Apart from the rule to order them in increasing order of variation, on my box, getting Int# parameters first (in the core) is better (in particular, getting them before Double# parameters, doesn't seem to make so much difference for lists or boxed types in general), so that might explain the difference. And, just for kicks: f5 :: [Int] -> [Int] f5 [] = [] f5 xs@(_:ys) = zipWith (+) xs (ys ++ [0]) and for speed: f6 :: [Int] -> [Int] f6 [] = [] f6 (x:xs) = go x xs where go y [] = [y] go y (z:zs) = let s = y+z in s `seq` (s : go z zs) f7 :: [Int] -> [Int] f7 [] = [] f7 (x:xs) = go xs x where go [] y = [y] go (z:zs) y = let s = y+z in s `seq` (s : go zs z) f7: mean: 32.06289 ms f6: mean: 32.70934 ms f1: mean: 39.27808 ms f4: mean: 40.30768 ms f2: mean: 41.05561 ms f3: mean: 41.49728 ms f5: mean: 59.87034 ms Well, actually it isn't so clear cut between f2, f3 and f4, I've even had benchmark runs where f4 was slower than f2 and f3, the order of f2 and f3 changes too. Also sometimes f6 is faster than f7, but usually f7 is a little faster than f6, there's a largish gap to f1, a smaller gap to f4, closely followed by f2 and f3, f5 trailing by a long distance. So, let's look at the core for f6 and f7 (inner loops only): Rec { TestFuns.f6_$sgo :: [GHC.Types.Int] -> GHC.Prim.Int# -> [GHC.Types.Int] GblId [Arity 2 NoCafRefs Str: DmdType SL] TestFuns.f6_$sgo = \ (sc_sok :: [GHC.Types.Int]) (sc1_sol :: GHC.Prim.Int#) -> case sc_sok of _ { [] -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# sc1_sol) (GHC.Types.[] @ GHC.Types.Int); : z_ae4 zs_ae5 -> case z_ae4 of _ { GHC.Types.I# y_amw -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# (GHC.Prim.+# sc1_sol y_amw)) (TestFuns.f6_$sgo zs_ae5 y_amw) } } end Rec } Yay, the Int parameter got unboxed, so one visit less to the heap per round. But GHC switched the parameter order, so let's try to fix that by switching the order ourselves: Rec { TestFuns.f7_$sgo :: GHC.Prim.Int# -> [GHC.Types.Int] -> [GHC.Types.Int] GblId [Arity 2 NoCafRefs Str: DmdType LS] TestFuns.f7_$sgo = \ (sc_sot :: GHC.Prim.Int#) (sc1_sou :: [GHC.Types.Int]) -> case sc1_sou of _ { [] -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# sc_sot) (GHC.Types.[] @ GHC.Types.Int); : z_aeb zs_aec -> case z_aeb of _ { GHC.Types.I# y_amw -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# (GHC.Prim.+# sc_sot y_amw)) (TestFuns.f7_$sgo y_amw zs_aec) } } end Rec } Exactly what I wanted :)
Regards,
Bas