
Summary: yes, we can, by a LOT. Yes, I know how. Yes, I've done some benchmarking to demonstrate. Yes, it is even very simple. And yes, the results are correct, including laziness requirements. Background: I was looking at the code for Data.List.inits in base-4.7.0.0 (function renamed for clarity): initsDL :: [a] -> [[a]] initsDL xs = [] : case xs of [] -> [] x : xs' -> map (x :) (initsDL xs') The recursive maps made me suspicious. I decided to try writing a different version: initsHO xs = map ($ []) (scanl q id xs) where q f x = f . (x:) I mentioned this on #haskell and noted that it would be a nice exercise to write it with less fancy function footwork using map reverse. rasfar responded (modulo naming) with initsR xs = map reverse (scanl q [] xs) where q acc x = x : acc rasfar ran a few informal benchmarks suggesting that initsR is faster than initsHO, which in turn is significantly faster than the current implementation in Data.List. I have now run Criterion benchmarks testing performance in three ways: reduction of inits [1..n] to normal form, reduction of (inits [1..n])!!(n-1) to normal form, and reduction of (length (inits [1..n])) to normal form. In each case the Criterion test case is the list [1..n], so there is no risk of some sort of weird fusion occurring. The results are extremely clear in all three test scenarios, with a variety of values of n: initsR is a little faster than initsHO, and inits HO is much, much faster than initsDL. The differences become apparent even for small values of n (10-50), but when n gets up to 100000, you don't even want to wait for initsDL to finish. This was all using ghc 7.8.3 with -O2. I've attached my benchmarking code, which also includes some basic correctness and laziness tests. Note that the first group of benchmarks tests a variety of vaues of n. For the second group, I was tired so I just twiddled it by hand. For the third group... Criterion estimated that benchmarking length(initsDL(100000)) would take 124582.9 seconds, so I decided to stop there. Conclusion: we should replace Data.List.inits with initsR, unless someone comes up with something that beats initsR.

Am 19.07.2014 09:51, schrieb David Feuer:
Background: I was looking at the code for Data.List.inits in base-4.7.0.0 (function renamed for clarity):
initsDL :: [a] -> [[a]] initsDL xs = [] : case xs of [] -> [] x : xs' -> map (x :) (initsDL xs')
The recursive maps made me suspicious. I decided to try writing a different version:
initsHO xs = map ($ []) (scanl q id xs) where q f x = f . (x:)
I mentioned this on #haskell and noted that it would be a nice exercise to write it with less fancy function footwork using map reverse. rasfar responded (modulo naming) with
initsR xs = map reverse (scanl q [] xs) where q acc x = x : acc
The 'map' gives efficient access to the first elements of the sub-lists, thus it seems that initsDL is faster than initsR when you access only prefixes of the sub-lists. E.g. Prelude> head $ last $ inits [0..10000000::Int]

I'm not sure about the theory of it, but if I load the module in GHCi and run head $ last $ initsR [0..100000::Int] I get a response *immediately*. If I run head $ last $ initsDL [0..100000::Int] I wrote this email while waiting for that to complete, and gave up on it. There may be an asymptotic performance bug here. On Sat, Jul 19, 2014 at 4:14 AM, Henning Thielemann < schlepptop@henning-thielemann.de> wrote:
Am 19.07.2014 09:51, schrieb David Feuer:
Background: I was looking at the code for Data.List.inits in
base-4.7.0.0 (function renamed for clarity):
initsDL :: [a] -> [[a]] initsDL xs = [] : case xs of [] -> [] x : xs' -> map (x :) (initsDL xs')
The recursive maps made me suspicious. I decided to try writing a different version:
initsHO xs = map ($ []) (scanl q id xs) where q f x = f . (x:)
I mentioned this on #haskell and noted that it would be a nice exercise to write it with less fancy function footwork using map reverse. rasfar responded (modulo naming) with
initsR xs = map reverse (scanl q [] xs) where q acc x = x : acc
The 'map' gives efficient access to the first elements of the sub-lists, thus it seems that initsDL is faster than initsR when you access only prefixes of the sub-lists. E.g.
Prelude> head $ last $ inits [0..10000000::Int]

Am 19.07.2014 10:25, schrieb David Feuer:
I'm not sure about the theory of it, but if I load the module in GHCi and run head $ last $ initsR [0..100000::Int] I get a response *immediately*. If I run head $ last $ initsDL [0..100000::Int] I wrote this email while waiting for that to complete, and gave up on it. There may be an asymptotic performance bug here.
Sorry, I got the implementation of initsDL wrong (recursed into the wrong inits-function).

I don't really see how recursing into some other inits function would help. The same laziness requirement should hold everywhere: take (n+1) $ inits ([1 .. n] ++ undefined) = inits [1 .. n] Maybe I don't understand what you mean properly. On Sat, Jul 19, 2014 at 4:29 AM, Henning Thielemann < schlepptop@henning-thielemann.de> wrote:
Am 19.07.2014 10:25, schrieb David Feuer:
I'm not sure about the theory of it, but if I load the module in GHCi
and run head $ last $ initsR [0..100000::Int] I get a response *immediately*. If I run head $ last $ initsDL [0..100000::Int] I wrote this email while waiting for that to complete, and gave up on it. There may be an asymptotic performance bug here.
Sorry, I got the implementation of initsDL wrong (recursed into the wrong inits-function).

Am 19.07.2014 10:45, schrieb David Feuer:
I don't really see how recursing into some other inits function would help. The same laziness requirement should hold everywhere:
take (n+1) $ inits ([1 .. n] ++ undefined) = inits [1 .. n]
Maybe I don't understand what you mean properly.
I wrote my own initsDL (with different name) and in the recursive call did not call initsDL but accidentally initsR (with different name), thus initsDL became essentially initsR except for the first recursion step.

Good catch, the common inits and tails version of subsequences makes pretty heavy use of that, as do many of the uses of inits where you use it to get k-element substrings. I'd be curious if an okasaki-style catenable output restricted deque could recover both efficient head access and reasonably efficient appends, but the overhead of that is probably way out of line with the performance at stake! -Edward On Sat, Jul 19, 2014 at 4:14 AM, Henning Thielemann < schlepptop@henning-thielemann.de> wrote:
Am 19.07.2014 09:51, schrieb David Feuer:
Background: I was looking at the code for Data.List.inits in
base-4.7.0.0 (function renamed for clarity):
initsDL :: [a] -> [[a]] initsDL xs = [] : case xs of [] -> [] x : xs' -> map (x :) (initsDL xs')
The recursive maps made me suspicious. I decided to try writing a different version:
initsHO xs = map ($ []) (scanl q id xs) where q f x = f . (x:)
I mentioned this on #haskell and noted that it would be a nice exercise to write it with less fancy function footwork using map reverse. rasfar responded (modulo naming) with
initsR xs = map reverse (scanl q [] xs) where q acc x = x : acc
The 'map' gives efficient access to the first elements of the sub-lists, thus it seems that initsDL is faster than initsR when you access only prefixes of the sub-lists. E.g.
Prelude> head $ last $ inits [0..10000000::Int]
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (3)
-
David Feuer
-
Edward Kmett
-
Henning Thielemann