
Which of these would be more costly for a long list? f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs) Michael

On Friday 10 September 2010 11:13:50 pm michael rice wrote:
Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int]
f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
Another option would be: f [x] = [x] f (x:xs@(y:_)) = (x + y) : f xs However, I believe I've done tests in the past, and your second example generates the same code when optimizations are on (that is, it doesn't build a new y:xs, but reuses the existing one), and that should perform the same as your first implementation. All that said, I'm not sure you'd be able to see the difference anyway. -- Dan

Hi Dan,
I wasn't aware of the third option, at least this particular variant of the *as* pattern. I've only seen it like this
f s@(x:xs) = ...
i.e., outside the parens.
The cost question arose as I was deciding which way to write it.
Thanks,
Michael
--- On Fri, 9/10/10, Dan Doel
Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int]
f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
Another option would be: f [x] = [x] f (x:xs@(y:_)) = (x + y) : f xs However, I believe I've done tests in the past, and your second example generates the same code when optimizations are on (that is, it doesn't build a new y:xs, but reuses the existing one), and that should perform the same as your first implementation. All that said, I'm not sure you'd be able to see the difference anyway. -- Dan

On Sat, Sep 11, 2010 at 5:13 AM, michael rice
Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
Use Criterion[1] to find out: module Main where import Criterion.Main f1, f2, f3, f4 :: [Int] -> [Int] f1 [x] = [x] f1 (x:xs) = x + head xs : f1 xs f2 [x] = [x] f2 (x:y:xs) = x + y : f2 (y:xs) f3 [x] = [x] f3 (x:xs@(y:_)) = x + y : f3 xs f4 [x] = [x] f4 (x:y:xs) = x + y : go y xs where go x [] = [x] go x (y:xs) = x + y : go y xs benchMark s f = bench s $ whnf (\n -> sum $ f [1..n]) 1000000 main = defaultMain [ benchMark "f1" f1 , benchMark "f2" f2 , benchMark "f3" f3 , benchMark "f4" f4 ] now compile and run it: $ ghc --make Benchmark.hs -O2 -o benchmark ... $ ./benchmark warming up estimating clock resolution... mean is 24.29944 us (40001 iterations) found 1405 outliers among 39999 samples (3.5%) 721 (1.8%) high mild 684 (1.7%) high severe estimating cost of a clock call... mean is 1.844233 us (45 iterations) found 2 outliers among 45 samples (4.4%) 2 (4.4%) high severe benchmarking f1 collecting 100 samples, 1 iterations each, in estimated 7.917595 s bootstrapping with 100000 resamples mean: 80.04258 ms, lb 79.85129 ms, ub 80.24094 ms, ci 0.950 std dev: 1.000711 ms, lb 878.8460 us, ub 1.179558 ms, ci 0.950 found 1 outliers among 100 samples (1.0%) variance introduced by outliers: 0.990% variance is unaffected by outliers benchmarking f2 collecting 100 samples, 1 iterations each, in estimated 8.171391 s bootstrapping with 100000 resamples mean: 83.13315 ms, lb 82.93615 ms, ub 83.33348 ms, ci 0.950 std dev: 1.017999 ms, lb 904.5153 us, ub 1.174008 ms, ci 0.950 variance introduced by outliers: 0.990% variance is unaffected by outliers benchmarking f3 collecting 100 samples, 1 iterations each, in estimated 8.297014 s bootstrapping with 100000 resamples mean: 82.66586 ms, lb 82.34780 ms, ub 83.39774 ms, ci 0.950 std dev: 2.339937 ms, lb 976.2940 us, ub 4.133495 ms, ci 0.950 found 9 outliers among 100 samples (9.0%) 7 (7.0%) high mild 2 (2.0%) high severe variance introduced by outliers: 0.998% variance is unaffected by outliers benchmarking f4 collecting 100 samples, 1 iterations each, in estimated 8.080888 s bootstrapping with 100000 resamples mean: 80.80089 ms, lb 80.61719 ms, ub 80.99542 ms, ci 0.950 std dev: 968.1706 us, lb 872.7758 us, ub 1.097217 ms, ci 0.950 variance introduced by outliers: 0.990% variance is unaffected by outliers So to summarize from fastest to slowest: f1: mean: 80.04258 ms f4: mean: 80.80089 ms f3: mean: 82.66586 ms f2: mean: 83.13315 ms To find out why f1 is the fastest you can look at the core using ghc-core[2]: $ ghc-core -- -O2 Benchmark.hs f1 :: [Int] -> [Int] GblId f1 = \ (ds_d1h0 :: [Int]) -> case ds_d1h0 of _ { [] -> f11; : x_a12V ds1_d1h1 -> case ds1_d1h1 of _ { [] -> : @ Int x_a12V ([] @ Int); : ipv_s1hv ipv1_s1hw -> : @ Int (case x_a12V of _ { I# x1_a1k6 -> case ipv_s1hv of _ { I# y_a1ka -> I# (+# x1_a1k6 y_a1ka) } }) (f1_$sf1 ipv1_s1hw ipv_s1hv) } } f1_$sf1 :: [Int] -> Int -> [Int] GblId f1_$sf1 = \ (sc_s1FL :: [Int]) (sc1_s1FM :: Int) -> case sc_s1FL of _ { [] -> : @ Int sc1_s1FM ([] @ Int); : ipv_s1hv ipv1_s1hw -> : @ Int (case sc1_s1FM of _ { I# x_a1k6 -> case ipv_s1hv of _ { I# y_a1ka -> I# (+# x_a1k6 y_a1ka) } }) (f1_$sf1 ipv1_s1hw ipv_s1hv) } f4 :: [Int] -> [Int] GblId f4 = \ (ds_d1gl :: [Int]) -> case ds_d1gl of _ { [] -> f41; : x_a13f ds1_d1gm -> case ds1_d1gm of _ { [] -> : @ Int x_a13f ([] @ Int); : y_a13h xs_a13i -> : @ Int (plusInt x_a13f y_a13h) (f4_go y_a13h xs_a13i) } } f4_go = \ (x_a13k :: Int) (ds_d1gu :: [Int]) -> case ds_d1gu of _ { [] -> : @ Int x_a13k ([] @ Int); : y_a13m xs_a13n -> : @ Int (plusInt x_a13k y_a13m) (f4_go y_a13m xs_a13n) } f3 :: [Int] -> [Int] GblId f3 = \ (ds_d1gD :: [Int]) -> case ds_d1gD of _ { [] -> f31; : x_a13b ds1_d1gE -> case ds1_d1gE of _ { [] -> : @ Int x_a13b ([] @ Int); : y_a13e ds2_d1gF -> : @ Int (plusInt x_a13b y_a13e) (f3_$sf3 ds2_d1gF y_a13e) } } f3_$sf3 :: [Int] -> Int -> [Int] GblId f3_$sf3 = \ (sc_s1G3 :: [Int]) (sc1_s1G4 :: Int) -> case sc_s1G3 of _ { [] -> : @ Int sc1_s1G4 ([] @ Int); : y_a13e ds_d1gF -> : @ Int (plusInt sc1_s1G4 y_a13e) (f3_$sf3 ds_d1gF y_a13e) } f2 :: [Int] -> [Int] GblId f2 = \ (ds_d1gP :: [Int]) -> case ds_d1gP of _ { [] -> f21; : x_a137 ds1_d1gQ -> case ds1_d1gQ of _ { [] -> : @ Int x_a137 ([] @ Int); : y_a139 xs_a13a -> : @ Int (plusInt x_a137 y_a139) (f2_$sf2 xs_a13a y_a139) } } f2_$sf2 :: [Int] -> Int -> [Int] GblId f2_$sf2 = \ (sc_s1FU :: [Int]) (sc1_s1FV :: Int) -> case sc_s1FU of _ { [] -> : @ Int sc1_s1FV ([] @ Int); : y_a139 xs_a13a -> : @ Int (plusInt sc1_s1FV y_a139) (f2_$sf2 xs_a13a y_a139) } The reason that f1 is faster than the rest is that GHC is somehow able to unpack the Ints and use the more efficient +# instead of the slower plusInt. I don't immediately see the reason for the time difference between f4, f3 and f2. The inner loops all seem equivalent. Regards, Bas [1] http://hackage.haskell.org/package/criterion [2] http://hackage.haskell.org/package/ghc-core

Hi Baz,
That's quite an analysis, one I'll keep for future reference.
So, my original coding was the fastest. Guess I should stop second guessing myself. ;-)
Michael
--- On Sat, 9/11/10, Bas van Dijk
Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
Use Criterion[1] to find out: module Main where import Criterion.Main f1, f2, f3, f4 :: [Int] -> [Int] f1 [x] = [x] f1 (x:xs) = x + head xs : f1 xs f2 [x] = [x] f2 (x:y:xs) = x + y : f2 (y:xs) f3 [x] = [x] f3 (x:xs@(y:_)) = x + y : f3 xs f4 [x] = [x] f4 (x:y:xs) = x + y : go y xs where go x [] = [x] go x (y:xs) = x + y : go y xs benchMark s f = bench s $ whnf (\n -> sum $ f [1..n]) 1000000 main = defaultMain [ benchMark "f1" f1 , benchMark "f2" f2 , benchMark "f3" f3 , benchMark "f4" f4 ] now compile and run it: $ ghc --make Benchmark.hs -O2 -o benchmark ... $ ./benchmark warming up estimating clock resolution... mean is 24.29944 us (40001 iterations) found 1405 outliers among 39999 samples (3.5%) 721 (1.8%) high mild 684 (1.7%) high severe estimating cost of a clock call... mean is 1.844233 us (45 iterations) found 2 outliers among 45 samples (4.4%) 2 (4.4%) high severe benchmarking f1 collecting 100 samples, 1 iterations each, in estimated 7.917595 s bootstrapping with 100000 resamples mean: 80.04258 ms, lb 79.85129 ms, ub 80.24094 ms, ci 0.950 std dev: 1.000711 ms, lb 878.8460 us, ub 1.179558 ms, ci 0.950 found 1 outliers among 100 samples (1.0%) variance introduced by outliers: 0.990% variance is unaffected by outliers benchmarking f2 collecting 100 samples, 1 iterations each, in estimated 8.171391 s bootstrapping with 100000 resamples mean: 83.13315 ms, lb 82.93615 ms, ub 83.33348 ms, ci 0.950 std dev: 1.017999 ms, lb 904.5153 us, ub 1.174008 ms, ci 0.950 variance introduced by outliers: 0.990% variance is unaffected by outliers benchmarking f3 collecting 100 samples, 1 iterations each, in estimated 8.297014 s bootstrapping with 100000 resamples mean: 82.66586 ms, lb 82.34780 ms, ub 83.39774 ms, ci 0.950 std dev: 2.339937 ms, lb 976.2940 us, ub 4.133495 ms, ci 0.950 found 9 outliers among 100 samples (9.0%) 7 (7.0%) high mild 2 (2.0%) high severe variance introduced by outliers: 0.998% variance is unaffected by outliers benchmarking f4 collecting 100 samples, 1 iterations each, in estimated 8.080888 s bootstrapping with 100000 resamples mean: 80.80089 ms, lb 80.61719 ms, ub 80.99542 ms, ci 0.950 std dev: 968.1706 us, lb 872.7758 us, ub 1.097217 ms, ci 0.950 variance introduced by outliers: 0.990% variance is unaffected by outliers So to summarize from fastest to slowest: f1: mean: 80.04258 ms f4: mean: 80.80089 ms f3: mean: 82.66586 ms f2: mean: 83.13315 ms To find out why f1 is the fastest you can look at the core using ghc-core[2]: $ ghc-core -- -O2 Benchmark.hs f1 :: [Int] -> [Int] GblId f1 = \ (ds_d1h0 :: [Int]) -> case ds_d1h0 of _ { [] -> f11; : x_a12V ds1_d1h1 -> case ds1_d1h1 of _ { [] -> : @ Int x_a12V ([] @ Int); : ipv_s1hv ipv1_s1hw -> : @ Int (case x_a12V of _ { I# x1_a1k6 -> case ipv_s1hv of _ { I# y_a1ka -> I# (+# x1_a1k6 y_a1ka) } }) (f1_$sf1 ipv1_s1hw ipv_s1hv) } } f1_$sf1 :: [Int] -> Int -> [Int] GblId f1_$sf1 = \ (sc_s1FL :: [Int]) (sc1_s1FM :: Int) -> case sc_s1FL of _ { [] -> : @ Int sc1_s1FM ([] @ Int); : ipv_s1hv ipv1_s1hw -> : @ Int (case sc1_s1FM of _ { I# x_a1k6 -> case ipv_s1hv of _ { I# y_a1ka -> I# (+# x_a1k6 y_a1ka) } }) (f1_$sf1 ipv1_s1hw ipv_s1hv) } f4 :: [Int] -> [Int] GblId f4 = \ (ds_d1gl :: [Int]) -> case ds_d1gl of _ { [] -> f41; : x_a13f ds1_d1gm -> case ds1_d1gm of _ { [] -> : @ Int x_a13f ([] @ Int); : y_a13h xs_a13i -> : @ Int (plusInt x_a13f y_a13h) (f4_go y_a13h xs_a13i) } } f4_go = \ (x_a13k :: Int) (ds_d1gu :: [Int]) -> case ds_d1gu of _ { [] -> : @ Int x_a13k ([] @ Int); : y_a13m xs_a13n -> : @ Int (plusInt x_a13k y_a13m) (f4_go y_a13m xs_a13n) } f3 :: [Int] -> [Int] GblId f3 = \ (ds_d1gD :: [Int]) -> case ds_d1gD of _ { [] -> f31; : x_a13b ds1_d1gE -> case ds1_d1gE of _ { [] -> : @ Int x_a13b ([] @ Int); : y_a13e ds2_d1gF -> : @ Int (plusInt x_a13b y_a13e) (f3_$sf3 ds2_d1gF y_a13e) } } f3_$sf3 :: [Int] -> Int -> [Int] GblId f3_$sf3 = \ (sc_s1G3 :: [Int]) (sc1_s1G4 :: Int) -> case sc_s1G3 of _ { [] -> : @ Int sc1_s1G4 ([] @ Int); : y_a13e ds_d1gF -> : @ Int (plusInt sc1_s1G4 y_a13e) (f3_$sf3 ds_d1gF y_a13e) } f2 :: [Int] -> [Int] GblId f2 = \ (ds_d1gP :: [Int]) -> case ds_d1gP of _ { [] -> f21; : x_a137 ds1_d1gQ -> case ds1_d1gQ of _ { [] -> : @ Int x_a137 ([] @ Int); : y_a139 xs_a13a -> : @ Int (plusInt x_a137 y_a139) (f2_$sf2 xs_a13a y_a139) } } f2_$sf2 :: [Int] -> Int -> [Int] GblId f2_$sf2 = \ (sc_s1FU :: [Int]) (sc1_s1FV :: Int) -> case sc_s1FU of _ { [] -> : @ Int sc1_s1FV ([] @ Int); : y_a139 xs_a13a -> : @ Int (plusInt sc1_s1FV y_a139) (f2_$sf2 xs_a13a y_a139) } The reason that f1 is faster than the rest is that GHC is somehow able to unpack the Ints and use the more efficient +# instead of the slower plusInt. I don't immediately see the reason for the time difference between f4, f3 and f2. The inner loops all seem equivalent. Regards, Bas [1] http://hackage.haskell.org/package/criterion [2] http://hackage.haskell.org/package/ghc-core

On Sat, Sep 11, 2010 at 7:10 AM, michael rice
Hi Baz,
That's quite an analysis, one I'll keep for future reference.
So, my original coding was the fastest. Guess I should stop second guessing myself. ;-)
I think Bas's point was actually that you should second guess yourself and use Criterion/ghc-core if you want to know which way is fastest. On the other hand, these are microbrenchmarks and should not be taken as representative of how they will perform inside a larger application. Evidence is always nice to have :) Jason

With Haskell, there's always something new to learn, and on haskell-cafe there are always patient people to assist in that endeavor.
Thanks, all.
Michael
--- On Sat, 9/11/10, Jason Dagit

On Saturday 11 September 2010 14:46:48, Bas van Dijk wrote:
On Sat, Sep 11, 2010 at 5:13 AM, michael rice
wrote: Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
So to summarize from fastest to slowest:
f1: mean: 80.04258 ms f4: mean: 80.80089 ms f3: mean: 82.66586 ms f2: mean: 83.13315 ms
To find out why f1 is the fastest you can look at the core using ghc-core[2]:
f4_go :: Int -> [Int] -> [Int]
f3_$sf3 :: [Int] -> Int -> [Int]
f2_$sf2 :: [Int] -> Int -> [Int]
I don't immediately see the reason for the time difference between f4, f3 and f2. The inner loops all seem equivalent.
I don't pretend to understand at the processor level why f4's loop is faster that f2's and f3's, but I've observed on several occasions that parameter order plays a big rôle for performance. Apart from the rule to order them in increasing order of variation, on my box, getting Int# parameters first (in the core) is better (in particular, getting them before Double# parameters, doesn't seem to make so much difference for lists or boxed types in general), so that might explain the difference. And, just for kicks: f5 :: [Int] -> [Int] f5 [] = [] f5 xs@(_:ys) = zipWith (+) xs (ys ++ [0]) and for speed: f6 :: [Int] -> [Int] f6 [] = [] f6 (x:xs) = go x xs where go y [] = [y] go y (z:zs) = let s = y+z in s `seq` (s : go z zs) f7 :: [Int] -> [Int] f7 [] = [] f7 (x:xs) = go xs x where go [] y = [y] go (z:zs) y = let s = y+z in s `seq` (s : go zs z) f7: mean: 32.06289 ms f6: mean: 32.70934 ms f1: mean: 39.27808 ms f4: mean: 40.30768 ms f2: mean: 41.05561 ms f3: mean: 41.49728 ms f5: mean: 59.87034 ms Well, actually it isn't so clear cut between f2, f3 and f4, I've even had benchmark runs where f4 was slower than f2 and f3, the order of f2 and f3 changes too. Also sometimes f6 is faster than f7, but usually f7 is a little faster than f6, there's a largish gap to f1, a smaller gap to f4, closely followed by f2 and f3, f5 trailing by a long distance. So, let's look at the core for f6 and f7 (inner loops only): Rec { TestFuns.f6_$sgo :: [GHC.Types.Int] -> GHC.Prim.Int# -> [GHC.Types.Int] GblId [Arity 2 NoCafRefs Str: DmdType SL] TestFuns.f6_$sgo = \ (sc_sok :: [GHC.Types.Int]) (sc1_sol :: GHC.Prim.Int#) -> case sc_sok of _ { [] -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# sc1_sol) (GHC.Types.[] @ GHC.Types.Int); : z_ae4 zs_ae5 -> case z_ae4 of _ { GHC.Types.I# y_amw -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# (GHC.Prim.+# sc1_sol y_amw)) (TestFuns.f6_$sgo zs_ae5 y_amw) } } end Rec } Yay, the Int parameter got unboxed, so one visit less to the heap per round. But GHC switched the parameter order, so let's try to fix that by switching the order ourselves: Rec { TestFuns.f7_$sgo :: GHC.Prim.Int# -> [GHC.Types.Int] -> [GHC.Types.Int] GblId [Arity 2 NoCafRefs Str: DmdType LS] TestFuns.f7_$sgo = \ (sc_sot :: GHC.Prim.Int#) (sc1_sou :: [GHC.Types.Int]) -> case sc1_sou of _ { [] -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# sc_sot) (GHC.Types.[] @ GHC.Types.Int); : z_aeb zs_aec -> case z_aeb of _ { GHC.Types.I# y_amw -> GHC.Types.: @ GHC.Types.Int (GHC.Types.I# (GHC.Prim.+# sc_sot y_amw)) (TestFuns.f7_$sgo y_amw zs_aec) } } end Rec } Exactly what I wanted :)
Regards,
Bas

On Sat, Sep 11, 2010 at 4:16 PM, Daniel Fischer
... I don't pretend to understand at the processor level why f4's loop is faster that f2's and f3's, but I've observed on several occasions that parameter order plays a big rôle for performance. Apart from the rule to order them in increasing order of variation, on my box, getting Int# parameters first (in the core) is better (in particular, getting them before Double# parameters, doesn't seem to make so much difference for lists or boxed types in general), so that might explain the difference. ...
Thanks, I'll keep that in mind. Bas

michael rice schrieb:
Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
What about empty lists? How about zipWith (+) xs (drop 1 xs ++ [0]) ? Since I often need to combine adjacent list elements, I have defined mapAdjacent in utility-ht.

Hi Henning,
Thanks for the tip, I'll check it out.
A related but more general question: on average, what's more efficient, pattern matching or function calls?
Michael
--- On Sun, 9/12/10, Henning Thielemann
Which of these would be more costly for a long list?
f :: [Int] -> [Int] f [x] = [x] f (x:xs) = x + (head xs) : f xs
f :: [Int] -> [Int] f [x] = [x] f (x:y:xs) = x + y : f (y:xs)
What about empty lists? How about zipWith (+) xs (drop 1 xs ++ [0]) ? Since I often need to combine adjacent list elements, I have defined mapAdjacent in utility-ht.
participants (6)
-
Bas van Dijk
-
Dan Doel
-
Daniel Fischer
-
Henning Thielemann
-
Jason Dagit
-
michael rice