Speeding up Data.List.inits

19 Jul 2014

      Summary: yes, we can, by a LOT. Yes, I know how. Yes, I've done some
benchmarking to demonstrate. Yes, it is even very simple. And yes, the
results are correct, including laziness requirements.

Background: I was looking at the code for Data.List.inits in base-4.7.0.0
(function renamed for clarity):

initsDL                   :: [a] -> [[a]]
initsDL xs                =  [] : case xs of
                                  []      -> []
                                  x : xs' -> map (x :) (initsDL xs')

The recursive maps made me suspicious. I decided to try writing a different
version:

initsHO xs = map ($ []) (scanl q id xs)
  where
    q f x = f . (x:)

I mentioned this on #haskell and noted that it would be a nice exercise to
write it with less fancy function footwork using map reverse. rasfar
responded (modulo naming) with

initsR xs = map reverse (scanl q [] xs)
  where
    q acc x = x : acc

rasfar ran a few informal benchmarks suggesting that initsR is faster than
initsHO, which in turn is significantly faster than the current
implementation in Data.List. I have now run Criterion benchmarks testing
performance in three ways: reduction of inits [1..n] to normal form,
reduction of (inits [1..n])!!(n-1) to normal form, and reduction of (length
(inits [1..n])) to normal form. In each case the Criterion test case is the
list [1..n], so there is no risk of some sort of weird fusion occurring.
The results are extremely clear in all three test scenarios, with a variety
of values of n:  initsR is a little faster than initsHO, and inits HO is
much, much faster than initsDL. The differences become apparent even for
small values of n (10-50), but when n gets up to 100000, you don't even
want to wait for initsDL to finish. This was all using ghc 7.8.3 with -O2.
I've attached my benchmarking code, which also includes some basic
correctness and laziness tests. Note that the first group of benchmarks
tests a variety of vaues of n. For the second group, I was tired so I just
twiddled it by hand. For the third group... Criterion estimated that
benchmarking length(initsDL(100000)) would take 124582.9 seconds, so I
decided to stop there.

Conclusion: we should replace Data.List.inits with initsR, unless someone
comes up with something that beats initsR.

David Feuer

Henning Thielemann

David Feuer

Henning Thielemann

David Feuer

Henning Thielemann

Edward Kmett

tags

participants (3)