Re: Formal proposal: replace Data.List.inits

Minor errors in copying/memory:
initsHO = map ($[]) . scanl q id
where
q f x = f . (x :)
On Sun, Jul 20, 2014 at 2:03 AM, David Feuer
As previously discussed, Data.List.inits has terrible performance: something as simple as
print $ length $ inits [100000]
will take an absurdly long time. The simplest decent improvement seems to be
initsHO = map ($[]) . scanl q [] where q f x = f . (q :)
A slight transformation of this by Andrew Seniuk (rasfar on #haskell) gives the slightly faster
initsR = map reverse . scanl (flip (:)) []
As Henning Thielemann and Edward Kmett pointed out, these perform poorly when calculating things like `map head $ drop 100000 $ inits [100050]` because the costs of the reverses or compositions are monolithic and unshared. Edward Kmett pointed out that this could probably be fixed by using a queue (although he isn't sure it's worth doing so). As it turns out, it can, and with only a trivial penalty in other cases. The following uses a stripped-down partial implementation of Okasaki's banker's queue (see the attached file for a version with explanatory comments as well as the benchmark code):
import Data.Bits (popCount) import Data.Word (Word)
data Queue a = Queue {-# UNPACK #-} !Word [a] [a]
queue :: Word -> [a] -> [a] -> Queue a queue lp1 f r | popCount lp1 /= 1 = Queue lp1 f r | otherwise = Queue lp1 (f ++ reverse r) []
emptyQ :: Queue a emptyQ = Queue 1 [] []
snocQ :: Queue a -> a -> Queue a snocQ (Queue lp1 f r) x = queue (lp1 + 1) f (x:r)
toListQ :: Queue a -> [a] toListQ (Queue _ f r) = f ++ reverse r
initsQ :: [a] -> [[a]] initsQ = map toListQ . scanl snocQ emptyQ
This implementation appears to take care of the performance wrinkle in initsR and initsHO, while being only very slightly slower in other contexts. It's longer, certainly, but it's not long.
I have attached the Criterion report produced by the attached program comparing initsR to initsQ. Previous tests definitively showed that initsHO is a small constant factor worse than initsR, and Data.List.inits is incomparably worse, so I limited this round to initsR and initsQ.
I therefore propose that we should replace the current implementation of inits with initsQ, the queue-based one, unless someone comes up with something faster and/or simpler very soon.

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 I suppose this is a no-brainer. +1 The patch should include a reasonable amount of the proposal email, preferably in the source itself, to explain why we're changing it. - -- Alexander alexander@plaimi.net https://secure.plaimi.net/~alexander -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iF4EAREIAAYFAlPLoNQACgkQRtClrXBQc7WjcQD+MbrIOQ/YKkdp0jm27hF73/MC B2pCgcoPeqt7PhNusHQA/2aeCbspUvwgjAEYJJNualLNvwkdELI9g/gdrtqDHEEn =/8/h -----END PGP SIGNATURE-----

The only question that's not quite a no-brainer is whether the situations
that slow down the initsR implementation will ever occur in practice. I'm
not sure of the answer.
On Jul 20, 2014 6:58 AM, "Alexander Berntsen"
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
I suppose this is a no-brainer. +1
The patch should include a reasonable amount of the proposal email, preferably in the source itself, to explain why we're changing it. - -- Alexander alexander@plaimi.net https://secure.plaimi.net/~alexander -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iF4EAREIAAYFAlPLoNQACgkQRtClrXBQc7WjcQD+MbrIOQ/YKkdp0jm27hF73/MC B2pCgcoPeqt7PhNusHQA/2aeCbspUvwgjAEYJJNualLNvwkdELI9g/gdrtqDHEEn =/8/h -----END PGP SIGNATURE----- _______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

David Feuer wrote:
The only question that's not quite a no-brainer is whether the situations that slow down the initsR implementation will ever occur in practice. I'm not sure of the answer.
Do such cases exist? For reference, we're discussing initsR = map reverse . scanl (flip (:)) [] initsHO = map ($ []) . scanl (\f x -> f . (x:)) id initsR is slow if for some reason, only the first couple of elements of the result lists are used in a computation, as happens in foldl' (+) 0 . map head . tail . inits In that case, initsR shows quadratic behavior. However, that is also true for initsHO, because of the way that the calls to (.) are nested. For better laziness, one can base 'inits' on 'take': initsT xs = [] : zipWith (\l _ -> take l xs) [1..] xs which with a bit of tuning comes surprisingly close to initsR [*]: initsT' xs = [] : go (1 :: Int) xs where go !l (_:ls) = take' l xs : go (l+1) ls go _ [] = [] take' 0 _ = [] take' n (x:xs) = x : take' (n-1) xs I find it surprising because the function has to allocate a thunk for every call of take', whereas the 'reverse' calls of initsR can directly evaluate the (:) constructors as they construct their results. In any case my preference is for 'initsR', perhaps with a hint in the documentation that the zipWith/take variant may be faster in some cases. Cheers, Bertram [*] some ad-hoc timings (sum' = foldl' (+) 0, all inits* functions compiled with -O2): - using only first elements of result lists:
sum' $ map head $ tail $ inits [1..10000] 10000 (5.38 secs, 5905331504 bytes) sum' $ map head $ tail $ initsR [1..10000] 10000 (0.55 secs, 1203045184 bytes) sum' $ map head $ tail $ initsHO [1..10000] 10000 (1.11 secs, 1205462216 bytes) sum' $ map head $ tail $ initsT [1..10000] 10000 (0.01 secs, 7226208 bytes) sum' $ map head $ tail $ initsT' [1..10000] 10000 (0.01 secs, 8119224 bytes)
- using whole result:
sum' $ map sum' $ tail $ inits [1..10000] 166716670000 (7.79 secs, 7900276296 bytes) sum' $ map sum' $ tail $ initsR [1..10000] 166716670000 (1.35 secs, 2006560272 bytes) sum' $ map sum' $ tail $ initsHO [1..10000] 166716670000 (1.93 secs, 2003170216 bytes) sum' $ map sum' $ tail $ initsT [1..10000] 166716670000 (2.16 secs, 3603697344 bytes) sum' $ map sum' $ tail $ initsT' [1..10000] 166716670000 (1.61 secs, 3603897320 bytes)

I don't know why you're looking at initsHO, which is almost the same as initsR but slightly slower. Have you looked at initsQ (preferably implemented with scanl')? That's the one that fixes the bad cases. On Aug 19, 2014 12:47 PM, "Bertram Felgenhauer" < bertram.felgenhauer@googlemail.com> wrote:
David Feuer wrote:
The only question that's not quite a no-brainer is whether the situations that slow down the initsR implementation will ever occur in practice. I'm not sure of the answer.
Do such cases exist? For reference, we're discussing
initsR = map reverse . scanl (flip (:)) [] initsHO = map ($ []) . scanl (\f x -> f . (x:)) id
initsR is slow if for some reason, only the first couple of elements of the result lists are used in a computation, as happens in
foldl' (+) 0 . map head . tail . inits
In that case, initsR shows quadratic behavior. However, that is also true for initsHO, because of the way that the calls to (.) are nested.
For better laziness, one can base 'inits' on 'take':
initsT xs = [] : zipWith (\l _ -> take l xs) [1..] xs
which with a bit of tuning comes surprisingly close to initsR [*]:
initsT' xs = [] : go (1 :: Int) xs where go !l (_:ls) = take' l xs : go (l+1) ls go _ [] = [] take' 0 _ = [] take' n (x:xs) = x : take' (n-1) xs
I find it surprising because the function has to allocate a thunk for every call of take', whereas the 'reverse' calls of initsR can directly evaluate the (:) constructors as they construct their results.
In any case my preference is for 'initsR', perhaps with a hint in the documentation that the zipWith/take variant may be faster in some cases.
Cheers,
Bertram
[*] some ad-hoc timings (sum' = foldl' (+) 0, all inits* functions compiled with -O2):
- using only first elements of result lists:
sum' $ map head $ tail $ inits [1..10000] 10000 (5.38 secs, 5905331504 bytes) sum' $ map head $ tail $ initsR [1..10000] 10000 (0.55 secs, 1203045184 bytes) sum' $ map head $ tail $ initsHO [1..10000] 10000 (1.11 secs, 1205462216 bytes) sum' $ map head $ tail $ initsT [1..10000] 10000 (0.01 secs, 7226208 bytes) sum' $ map head $ tail $ initsT' [1..10000] 10000 (0.01 secs, 8119224 bytes)
- using whole result:
sum' $ map sum' $ tail $ inits [1..10000] 166716670000 (7.79 secs, 7900276296 bytes) sum' $ map sum' $ tail $ initsR [1..10000] 166716670000 (1.35 secs, 2006560272 bytes) sum' $ map sum' $ tail $ initsHO [1..10000] 166716670000 (1.93 secs, 2003170216 bytes) sum' $ map sum' $ tail $ initsT [1..10000] 166716670000 (2.16 secs, 3603697344 bytes) sum' $ map sum' $ tail $ initsT' [1..10000] 166716670000 (1.61 secs, 3603897320 bytes)
Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

David Feuer wrote:
I don't know why you're looking at initsHO, which is almost the same as initsR but slightly slower. Have you looked at initsQ (preferably implemented with scanl')? That's the one that fixes the bad cases.
No, because I looked at the wrong part of your mail. Adding timings:
- using only first elements of result lists:
sum' $ map head $ tail $ inits [1..10000] 10000 (5.38 secs, 5905331504 bytes) sum' $ map head $ tail $ initsR [1..10000] 10000 (0.55 secs, 1203045184 bytes) sum' $ map head $ tail $ initsHO [1..10000] 10000 (1.11 secs, 1205462216 bytes) sum' $ map head $ tail $ initsT [1..10000] 10000 (0.01 secs, 7226208 bytes) sum' $ map head $ tail $ initsT' [1..10000] 10000 (0.01 secs, 8119224 bytes) sum' $ map head $ tail $ initsQ [1..10000] 10000 (0.01 secs, 7309616 bytes) sum' $ map head $ tail $ initsQ' [1..10000] 10000 (0.01 secs, 8908320 bytes)
All these results are pretty meaningless beyond demonstrating that initsT and initsQ are sufficiently lazy.
- using whole result:
sum' $ map sum' $ tail $ inits [1..10000] 166716670000 (7.79 secs, 7900276296 bytes) sum' $ map sum' $ tail $ initsR [1..10000] 166716670000 (1.35 secs, 2006560272 bytes) sum' $ map sum' $ tail $ initsHO [1..10000] 166716670000 (1.93 secs, 2003170216 bytes) sum' $ map sum' $ tail $ initsT [1..10000] 166716670000 (2.16 secs, 3603697344 bytes) sum' $ map sum' $ tail $ initsT' [1..10000] 166716670000 (1.61 secs, 3603897320 bytes)
sum' $ map sum' $ tail $ initsQ [1..10000] 166716670000 (3.30 secs, 3194869384 bytes) sum' $ map sum' $ tail $ initsQ' [1..10000] 166716670000 (3.26 secs, 3195015296 bytes)
I've also run some criterion benchmarks, see the html files at http://int-e.eu/~bf3/haskell/inits/ (Is there a way to get criterion to split the summary table in its html output by groups?) Inits.hs contains the various inits implementations. inits.tar.gz contains everything. Interestingly, initsQ looks better than initsT in these benchmarks. Cheers, Bertram

On Aug 19, 2014 6:42 PM, "Bertram Felgenhauer" < bertram.felgenhauer@googlemail.com> wrote:
Interestingly, initsQ looks better than initsT in these benchmarks.
I love the simplicity of initsT, but unlike initsQ (with Joachim Breitner's fusing scanl') it won't fuse properly with with a list producer. I don't know if that can be fixed or not. If so ... that would be great. In any case, it would be very interesting indeed to see how it performs in a library that uses stream fusion instead of fold/build—there's a library in the vector package for that, and it currently uses the broken version of inits; you might want to try initsT out with that one.

Ah, I am being a little silly. Tunnel vision. The fusion only matters if
you're skipping a bunch of lists. So ... I don't know why your results are
worse, really.
On Aug 19, 2014 7:04 PM, "David Feuer"
On Aug 19, 2014 6:42 PM, "Bertram Felgenhauer" < bertram.felgenhauer@googlemail.com> wrote:
Interestingly, initsQ looks better than initsT in these benchmarks.
I love the simplicity of initsT, but unlike initsQ (with Joachim Breitner's fusing scanl') it won't fuse properly with with a list producer. I don't know if that can be fixed or not. If so ... that would be great. In any case, it would be very interesting indeed to see how it performs in a library that uses stream fusion instead of fold/build—there's a library in the vector package for that, and it currently uses the broken version of inits; you might want to try initsT out with that one.

What I'm seeing is very different from what you're seeing. In
particular, my tests with print $ sum $ concat $ inits [1..10000]
indicate that your initsT' function is much faster, and allocates much
less than, the initsQ version. Could you explain what your test was
doing? What version of GHC were you running them on? Another thing: I
also see the performance impact of using take' instead of take, but I
can't for the life of me understand why. Do you know?
If others see similar results, I think we should scrap initsQ and use
your initsT'.
On Tue, Aug 19, 2014 at 6:42 PM, Bertram Felgenhauer
David Feuer wrote:
I don't know why you're looking at initsHO, which is almost the same as initsR but slightly slower. Have you looked at initsQ (preferably implemented with scanl')? That's the one that fixes the bad cases.
No, because I looked at the wrong part of your mail. Adding timings:
- using only first elements of result lists:
sum' $ map head $ tail $ inits [1..10000] 10000 (5.38 secs, 5905331504 bytes) sum' $ map head $ tail $ initsR [1..10000] 10000 (0.55 secs, 1203045184 bytes) sum' $ map head $ tail $ initsHO [1..10000] 10000 (1.11 secs, 1205462216 bytes) sum' $ map head $ tail $ initsT [1..10000] 10000 (0.01 secs, 7226208 bytes) sum' $ map head $ tail $ initsT' [1..10000] 10000 (0.01 secs, 8119224 bytes) sum' $ map head $ tail $ initsQ [1..10000] 10000 (0.01 secs, 7309616 bytes) sum' $ map head $ tail $ initsQ' [1..10000] 10000 (0.01 secs, 8908320 bytes)
All these results are pretty meaningless beyond demonstrating that initsT and initsQ are sufficiently lazy.
- using whole result:
sum' $ map sum' $ tail $ inits [1..10000] 166716670000 (7.79 secs, 7900276296 bytes) sum' $ map sum' $ tail $ initsR [1..10000] 166716670000 (1.35 secs, 2006560272 bytes) sum' $ map sum' $ tail $ initsHO [1..10000] 166716670000 (1.93 secs, 2003170216 bytes) sum' $ map sum' $ tail $ initsT [1..10000] 166716670000 (2.16 secs, 3603697344 bytes) sum' $ map sum' $ tail $ initsT' [1..10000] 166716670000 (1.61 secs, 3603897320 bytes)
sum' $ map sum' $ tail $ initsQ [1..10000] 166716670000 (3.30 secs, 3194869384 bytes) sum' $ map sum' $ tail $ initsQ' [1..10000] 166716670000 (3.26 secs, 3195015296 bytes)
I've also run some criterion benchmarks, see the html files at
http://int-e.eu/~bf3/haskell/inits/
(Is there a way to get criterion to split the summary table in its html output by groups?)
Inits.hs contains the various inits implementations. inits.tar.gz contains everything.
Interestingly, initsQ looks better than initsT in these benchmarks.
Cheers,
Bertram
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

The problem I was seeing with initsQ2 was caused by a bad interaction of
different things fusing badly. It seems relatively unlikely that the
underlying issues with list fusion will be resolved by the next GHC
release, so we need to decide what to do about that. Some options, from my
favorite to my least favorite:
1. Use the fusing scanl', but mark inits NOINLINE. The proposed
implementation wouldn't really wouldn't be such a hot fusion prospect in
any case. The only interesting sort of fusion I see is (concat . inits),
which we *could* implement with a special rule, but no one else seems to
think it's worth the trouble.
2. Use a non-fusing scanl'.
3. Manually inline the scanl' and such, using what GHC compiles initsQ2
into instead of using initsQ2.
On Sat, Aug 30, 2014 at 9:27 PM, David Feuer
What I'm seeing is very different from what you're seeing. In particular, my tests with print $ sum $ concat $ inits [1..10000] indicate that your initsT' function is much faster, and allocates much less than, the initsQ version. Could you explain what your test was doing? What version of GHC were you running them on? Another thing: I also see the performance impact of using take' instead of take, but I can't for the life of me understand why. Do you know?
If others see similar results, I think we should scrap initsQ and use your initsT'.
On Tue, Aug 19, 2014 at 6:42 PM, Bertram Felgenhauer
wrote: David Feuer wrote:
I don't know why you're looking at initsHO, which is almost the same as initsR but slightly slower. Have you looked at initsQ (preferably implemented with scanl')? That's the one that fixes the bad cases.
No, because I looked at the wrong part of your mail. Adding timings:
- using only first elements of result lists:
sum' $ map head $ tail $ inits [1..10000] 10000 (5.38 secs, 5905331504 bytes) sum' $ map head $ tail $ initsR [1..10000] 10000 (0.55 secs, 1203045184 bytes) sum' $ map head $ tail $ initsHO [1..10000] 10000 (1.11 secs, 1205462216 bytes) sum' $ map head $ tail $ initsT [1..10000] 10000 (0.01 secs, 7226208 bytes) sum' $ map head $ tail $ initsT' [1..10000] 10000 (0.01 secs, 8119224 bytes) sum' $ map head $ tail $ initsQ [1..10000] 10000 (0.01 secs, 7309616 bytes) sum' $ map head $ tail $ initsQ' [1..10000] 10000 (0.01 secs, 8908320 bytes)
All these results are pretty meaningless beyond demonstrating that initsT and initsQ are sufficiently lazy.
- using whole result:
sum' $ map sum' $ tail $ inits [1..10000] 166716670000 (7.79 secs, 7900276296 bytes) sum' $ map sum' $ tail $ initsR [1..10000] 166716670000 (1.35 secs, 2006560272 bytes) sum' $ map sum' $ tail $ initsHO [1..10000] 166716670000 (1.93 secs, 2003170216 bytes) sum' $ map sum' $ tail $ initsT [1..10000] 166716670000 (2.16 secs, 3603697344 bytes) sum' $ map sum' $ tail $ initsT' [1..10000] 166716670000 (1.61 secs, 3603897320 bytes)
sum' $ map sum' $ tail $ initsQ [1..10000] 166716670000 (3.30 secs, 3194869384 bytes) sum' $ map sum' $ tail $ initsQ' [1..10000] 166716670000 (3.26 secs, 3195015296 bytes)
I've also run some criterion benchmarks, see the html files at
http://int-e.eu/~bf3/haskell/inits/
(Is there a way to get criterion to split the summary table in its html output by groups?)
Inits.hs contains the various inits implementations. inits.tar.gz contains everything.
Interestingly, initsQ looks better than initsT in these benchmarks.
Cheers,
Bertram
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Hi, Am Freitag, den 05.09.2014, 18:07 -0400 schrieb David Feuer:
The problem I was seeing with initsQ2 was caused by a bad interaction of different things fusing badly. It seems relatively unlikely that the underlying issues with list fusion will be resolved by the next GHC release, so we need to decide what to do about that. Some options, from my favorite to my least favorite:
1. Use the fusing scanl', but mark inits NOINLINE. The proposed implementation wouldn't really wouldn't be such a hot fusion prospect in any case. The only interesting sort of fusion I see is (concat . inits), which we *could* implement with a special rule, but no one else seems to think it's worth the trouble.
This seems to be a good way forward. This way, we get a faster inits in, and can then consider the issue of fusing separately. Greetings, Joachim
-- Joachim “nomeata” Breitner mail@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nomeata@joachim-breitner.de • GPG-Key: 0xF0FBF51F Debian Developer: nomeata@debian.org
participants (4)
-
Alexander Berntsen
-
Bertram Felgenhauer
-
David Feuer
-
Joachim Breitner