[GHC] #9345: Data.List.inits is extremely slow

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: libraries/base | Version: 7.8.3 Keywords: | Differential Revisions: Operating System: Unknown/Multiple | Architecture: Type of failure: Runtime | Unknown/Multiple performance bug | Difficulty: Easy (less Test Case: | than 1 hour) Blocking: | Blocked By: | Related Tickets: -------------------------------------+------------------------------------- As discussed on libraries@haskell.org, `Data.List.inits` is extremely slow (try running `print $ length $ inits [1..100000]` if you don't believe me). As discussed, there are at least two reasonable fixes. One of them (named `initsR` in the attached) is a one-liner and gives very good performance in general, but poor performance in certain cases that may or may not appear in real code. The other (named `initsQ` in the attached) is slightly more complex and slightly slower in general, but its performance appears to be robust. I would personally lean toward `initsQ` for `Data.List`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Operating System: Unknown/Multiple Differential Revisions: | Type of failure: Runtime Architecture: | performance bug Unknown/Multiple | Test Case: Difficulty: Easy (less | Blocking: than 1 hour) | Blocked By: | Related Tickets: | -------------------------------------+------------------------------------- Comment (by nomeata): Interesting. Can you elaborate (just for the record here) when `initsQ` is faster? I would find it strange to add such complexity “just for” `inits`. Now, if we had such a `Queue` type (which looks useful in general) in base anyways, using it would be a different story... How is the performance when the existing `Seq` is used instead of `Queue`? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:1 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Operating System: Unknown/Multiple Differential Revisions: | Type of failure: Runtime Architecture: | performance bug Unknown/Multiple | Test Case: Difficulty: Easy (less | Blocking: than 1 hour) | Blocked By: | Related Tickets: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:1 nomeata]:
Interesting. Can you elaborate (just for the record here) when `initsQ` is faster?
I would find it strange to add such complexity “just for” `inits`. Now, if we had such a `Queue` type (which looks useful in general) in base anyways, using it would be a different story...
How is the performance when the existing `Seq` is used instead of `Queue`?
I'm attaching the Criterion comparison of `initsR`, `initsQ`, and a new `initsS` that uses `Data.Sequence`. On most tests, `initsS` is much slower than `initsQ`, reflecting the fact that sequences are much heavier data structures than these snoc-builder almost-queues. `initsQ` is faster when the heads (or more generally the first few elements) of several of the elements of the result list are inspected by traversing the result list. If you just calculate `head $ initsR [1..n] !! n`, then the time required to reverse the final list can be amortized over the steps in `!!n`. But if you were to calculate, for some inexplicable reason, `sum $ map head $ initsR [1..n]`, you only have `n` steps over which to amortize `n^2` work. The vague notion I've had in the back of my mind for a vaguely realistic example is some sort of breadth-first traversal of `inits [1..n]`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:2 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Operating System: Unknown/Multiple Differential Revisions: | Type of failure: Runtime Architecture: | performance bug Unknown/Multiple | Test Case: Difficulty: Easy (less | Blocking: than 1 hour) | Blocked By: | Related Tickets: | -------------------------------------+------------------------------------- Comment (by nomeata): JFTR: Seeing that the rear list of `Queue` is only consed and reversed, I tried to change it to a difference list (`[a] -> [a]`). The difference in performance is small and mixed, sometimes slightly better, sometimes slightly worse. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:3 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Operating System: Unknown/Multiple Differential Revisions: | Type of failure: Runtime Architecture: | performance bug Unknown/Multiple | Test Case: Difficulty: Easy (less | Blocking: than 1 hour) | Blocked By: | Related Tickets: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:1 nomeata]:
I would find it strange to add such complexity “just for” `inits`. Now, if we had such a `Queue` type (which looks useful in general) in base anyways, using it would be a different story...
If it were added, it wouldn't be called `Queue`, because (unlike the real banker's queue) it's actually incapable of implementing `uncons` efficiently. It's really just a very limited, but very fast and light, list builder. If it's useful for things other than implementing `inits`, I would not be opposed to adding it in a separate module with some sufficiently clear name. I don't know enough about what sorts of things should and should not be provided by base to really comment much beyond that. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:4 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: libraries/base | Version: 7.8.3 Resolution: | Keywords: Differential: | Operating System: Unknown/Multiple Architecture: Unknown/Multiple | Type of failure: Runtime Difficulty: Easy (less than 1 | performance bug hour) | Test Case: Blocked By: | Blocking: Related Tickets: | -------------------------------------+------------------------------------- Changes (by dfeuer): * milestone: => 7.8.4 Comment: I'm taking the liberty of setting an early milestone for this one because the current situation is silly and the change is entirely local. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:5 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Replying to [comment:1 nomeata]:
I would find it strange to add such complexity “just for” `inits`. Now, if we had such a `Queue` type (which looks useful in general) in base anyways, using it would be a different story...
If it were added, it wouldn't be called `Queue`, because (unlike the real banker's queue) it's actually incapable of implementing `uncons` efficiently. It's really just a very limited, but very fast and light,
#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata): Replying to [comment:4 dfeuer]: list builder. If it's useful for things other than implementing `inits`, I would not be opposed to adding it in a separate module with some sufficiently clear name. I don't know enough about what sorts of things should and should not be provided by base to really comment much beyond that. After reading more of the code I agree that this not too big “just for” inits, and I’m in favor of adding it. I also thought about ways to make it even faster. In particular, I tried to make `GHC` get rid of `Queue` and simply pass the three values around as arguments to the inner loop, and also make this fuse. With this definition it works: {{{ myScanl' :: (b -> a -> b) -> b -> [a] -> [b] myScanl' f a bs = build $ \c n -> a `seq` a `c` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `c` g b'))) (\b -> b `seq` n) bs a initsQ2 :: [a] -> [[a]] initsQ2 = map toListQ . myScanl' snocQ emptyQ }}} and yields consistently better results than `initsQ`. (See attached report, `initsQ2`. Ignore `initsQDL`, thats the same, but with the rear list implemented as a difference list, which makes little difference.) This raises the question whether we want such a fusing strict `scanl'` in `Data.List` as well – it turned out to be useful here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:6 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): That's a neat idea. If I'm not mistaken, `seq` in the argument to `build` leads to a proof obligation to justify the safety of the fusion. Do you have a proof? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:7 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata):
That's a neat idea. If I'm not mistaken, seq in the argument to build leads to a proof obligation to justify the safety of the fusion. Do you have a proof?
No. What exactly is the proof obligation in this case? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:8 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:8 nomeata]:
That's a neat idea. If I'm not mistaken, seq in the argument to build leads to a proof obligation to justify the safety of the fusion. Do you have a proof?
No. What exactly is the proof obligation in this case?
According to the [http://www.haskell.org/haskellwiki/Correctness_of_short_cut_fusion short cut fusion page] on the Haskell Wiki, the foldr/build equivalence in general is only guaranteed when the argument to build does not use `seq`; otherwise there are ways to break it so that the fused program is less lazy than it should be. That page gives one way around that limitation, which involves restricting the other arguments to `foldr`, but that's not an option here—a "bare" `build` is exposed to the world. So if I understand things right (which is never guaranteed at all), you're left having to write your own proof, from scratch or based on other theorems not mentioned on that page, that the foldr/build rule is safe here. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:9 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): I've worked on it a bit, and if I'm not mistaken, `myScanl'` is safe if, whenever `a /= _|_`, and however I choose `evil`, `wrong`, and `f`, the "correct" expression {{{ e1 = a `evil` foldr evil wrong $ foldr (\b g x -> (let b' = f x b in b' `seq` (b' : g b'))) (\b -> b `seq` []) bs a }}} gives the same result as the fused expression {{{ e2 = a `evil` foldr (\b g x -> (let b' = f x b in b' `seq` (b' `evil` g b'))) (\b -> b `seq` wrong) bs a }}} But I'm not sure how to prove this or find a counterexample. Any ideas? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:10 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata): I forgot to attach the report mentioned in comment:6, doing that now. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:11 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:11 nomeata]:
I forgot to attach the report mentioned in comment:6, doing that now.
I was finally able to figure out how to read that file, and ''wow''. Those are some impressive results. If we can verify that the fusion on the left is safe, `initsQ2` is clearly the way to go. If we can't, it looks like we probably want to use the same thing, but without the `build`. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:12 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata):
I was finally able to figure out how to read that file, and wow. Those are some impressive results.
Of course the allocation and destruction of the `Queue` is not for free. Just to make sure I didn’t do anything stupid: Can you reproduce my results? If we are worried about `seq` we can simply inline my `myScanl'` in `initsQ2` and replace the `seq` by `case` statements. Or leave the `seq` but argue that we are always applying them to values of type `Queue`, nothing breaks. And by that argument, as long as `myScanl` is only used for `initsQ2`, we should be safe. I guess we only should worry about actually adding `scanl'` to the libraries, but that’s a different issue and should be discussed in #9356 I guess. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:13 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: normal | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:13 nomeata]:
If we are worried about `seq` we can simply inline my `myScanl'` in `initsQ2` and replace the `seq` by `case` statements. Or leave the `seq` but argue that we are always applying them to values of type `Queue`, nothing breaks. And by that argument, as long as `myScanl` is only used for `initsQ2`, we should be safe.
I guess we only should worry about actually adding `scanl'` to the
We could also write `initsQ` in the completely inlined form, i.e. the core above. Plus point: No need to define the `Queue` datatype (which would be dead code anyways). Minus point: Doesn’t look elegant, no fusion
I believe I have now pretty much completed the proof. It's very ugly, because I don't have a sufficiently firm understanding of the higher-order fold involved, but here's the outline (minus horrors of immature mathematics): {{{ foldr evil wrong (scanl' f a bs) = foldr evil wrong $ a `seq` a : foldr (\b g x -> let b' = f x b in b' `seq` (b` : g b')) (\b -> b `seq` []) bs a -- Call this e1 a bs =?= (\c n -> a `seq` a `c` foldr (\b g x -> let b' = f x b in b' `seq` (b' `c` g b')) (\b -> b `seq` n) bs a) evil wrong -- Call this e2 a bs }}} There are two trivial base cases: {{{#!haskell e1 [] = e2 [] = evil a wrong e1 _|_ = e2 _|_ = evil a _|_ }}} Then the part I found difficult was proving that {{{#!haskell e1 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e1 b1' bs }}} to match the much simpler {{{#!haskell e2 a (q:bs) = evil a $ let b1' = f a q in b1' `seq` e2 b1' bs }}} For the infinite case, I don't know the formalism involved, but the general idea is that `evil` can only inspect a finite amount of its second argument before producing something, so we can always (temporarily) cut off the list somewhere past that. libraries, but that’s a different issue and should be discussed in #9356 I guess. Yes, but I now think we should. possible. It doesn't look bad at all, but I think it's pretty much write-only code—it's hard to see what it's doing and why, and hard to experiment with as we've been doing (swapping modular pieces around to try different things). Giving up possible fusion to attain less clarity does not look like a good plan to me. I'm going to do some benchmarking and profiling myself; I'd like to toss a fusable non-strict scanl into the ring for comparison, and ramp up the scale on some of the benchmarks to improve resolution. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:14 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by dfeuer): * priority: normal => high -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:15 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by dfeuer): * status: new => patch Comment: My testing indicates that for large lists (e.g., 10,000 elements), Bertram Felgenhauer's simple and mostly very clean `initsT'` implementation is much better than any variation on `initsQ` when concatenating the results. I wrapped it up in `build` to get a ''little'' fusion potential (not essential) and uploaded a patch. I would still like very much to understand why using `take'` gives slightly better performance than using `take`, but it very clearly does. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:16 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): If someone is able to do benchmark comparisons for lists of various sizes, that would really be great; I'm not currently able to use Criterion on GHC 7.9. It's possible that `initsQ` is better when the list is fairly small relative to the cache size and `initsT'` is better when it's large, in which case someone will have to make a judgement about which case is more important. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:17 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata): I’m really worryied seeing `INLINE` on such a rather large function. Doesn’t that mean that we will never use the compiled `inits` from the library but rather re-compile it at every use site? That does not seem to be good. (Disclaimer: Have not actually tried it.) I have run criterion on HEAD earlier, I might be able to set it up again. Do you have a ready-to-run benchmark that you would like me to run? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:18 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

I’m really worryied seeing `INLINE` on such a rather large function. Doesn’t that mean that we will never use the compiled `inits` from the
#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:18 nomeata]: library but rather re-compile it at every use site? That does not seem to be good. (Disclaimer: Have not actually tried it.)
I have run criterion on HEAD earlier, I might be able to set it up
again. Do you have a ready-to-run benchmark that you would like me to run? I think we can just skip the `build` and get rid of `INLINE` without any significant harm. The fusion potential with this definition is quite limited anyway. The most interesting-looking fusion opportunities (not enabled by that definition anyway) seem to be `inits/enumFromTo` and `concat/inits`. The former is great if things are, and remain, unboxed, and horrible otherwise, and we can't tell the difference, unfortunately. The latter might be worth thinking about, if anyone actually writes that. This looks very good in Core, if we can find a way to accomplish it: {{{#!hs concat (inits xs) = build (\cons nil -> let go _ _ [] = nil go n k (y:ys) | n == k = go (n+1) 0 xs | otherwise = y `cons` go n (k+1) ys in go 1 0 xs) }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:19 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): I've attached a new patch without the `build` or inlining, and upgrading from `Int` to `Word` to double the number of lists we can pull from `inits . cycle` before the world explodes. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:20 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Changes (by dfeuer): * status: patch => new Comment: I worked with Bertram Felgenhauer a bit tonight, and it looks like he's right; initsQ2 really is better. Unfortunately, it appears that the implementation of `scanl'` given here has the potential to blow things up for `initsQ2` when combined with `concat`. In particular, if I do {{{#!hs main = print $ sum $ concat $ initsQ2 [1..10000::Int] }}} then it uses up all my RAM and takes forever. Interrupting fusion between `initsQ2` and `[1..1000]` does not help at all, but interrupting it between `concat` and `initsQ2` seems to work. Removing the `sum` also prevents it from using all my RAM. If I replace `sum` with `foldl' (+) 0` then it runs in a small amount of RAM, but allocation is still very high and it's very slow. I'm still struggling to tease apart the interactions here. One thing that ''seems'' clear is that at its worst, it fails to recognize that it can treat the lazy `foldl` of `sum` as a strict `foldl'`, which is disastrous. But that's not the only thing going wrong. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:21 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata): Didn’t look at the core yet, but this might be a case of non-linear recursion where an arity analysis in insufficient to get good code for a left fold, see 5.1.1 in http://pp.ipd.kit.edu/uploads/publikationen/breitner14callarity.pdf. I can have a closer look later. Can you, with these examples, please always include one complete self-contained file? Otherwise there is a risk that I might be testing with the wrong version of inits/scanl'/whatever. I assume you are on GHC HEAD? -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:22 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

Didn’t look at the core yet, but this might be a case of non-linear recursion where an arity analysis in insufficient to get good code for a left fold, see 5.1.1 in http://pp.ipd.kit.edu/uploads/publikationen/breitner14callarity.pdf.
I can have a closer look later. Can you, with these examples, please always include one complete self-contained file? Otherwise there is a risk
#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:22 nomeata]: that I might be testing with the wrong version of inits/scanl'/whatever. I assume you are on GHC HEAD? I've attached a complete, self-contained test case. As I mentioned in an email, you should keep a close eye on RAM usage or run this program with heap limits set if you don't want to crash your computer. If this is an arity analysis problem that's too hard to fix, we can work around it by writing `inits xs = don'tFuse . map toListQ . scanl' snocQ emptyQ` to ensure that `inits` doesn't fuse on the problematic (and, in fact, rather unimportant) side. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:23 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata): Thanks. I changed it to `foo = sum $ concat $ initsQ' $ [1..10000::Int]` (so that I don’t get any print-related stuff), and `-ddump-call-arity` gives me this code: {{{ #!hs [LclId, Arity=2, CallArity=2, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=2, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [0 0] 150 0}] c = \ (x :: [Int]) (y [OS=OneShot] :: Int -> Int) -> letrec { go [Occ=LoopBreaker] :: [Int] -> Int -> Int [LclId, Arity=1, CallArity=1, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=1, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [30] 110 60}] go = \ (ds :: [Int]) -> case ds of _ [Occ=Dead] { [] -> y; : y ys -> let { ds1 [OS=OneShot] :: Int -> Int [LclId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 20 0}] ds1 = go ys } in \ (ds2 :: Int) -> ds1 ($fNumInt_$c+ ds2 y) }; } in go x foo :: Int [LclIdX, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 634 0}] foo = c (++ ([]) (reverse1 ([]) ([]))) (letrec { go [Occ=LoopBreaker] :: Int# -> Queue Int -> Int -> Int [LclId, Arity=2, CallArity=2, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=2, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [80 20] 464 0}] go = \ (x :: Int#) (eta :: Queue Int) -> let { b :: Int [LclId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 20}] b = I# x } in case eta of _ [Occ=Dead] { Queue dt f r -> let { a :: Word# [LclId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=True, Guidance=IF_ARGS [] 1 0}] a = plusWord# dt (__word 1) } in let { r :: [Int] [LclId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=True, ConLike=True, WorkFree=True, Expandable=True, Guidance=IF_ARGS [] 10 30}] r = : b r } in case word2Int# (popCnt# a) of _ [Occ=Dead] { __DEFAULT -> c (++ f (reverse1 r ([]))) (case x of wild { __DEFAULT -> go (+# wild 1) (Queue a f r); 10000 -> \ (eta :: Int) -> eta }); 1 -> let { ipv :: [Int] [LclId, Str=DmdType, Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False, ConLike=False, WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 60 0}] ipv = ++ f (reverse1 r ([])) } in c (++ ipv (reverse1 ([]) ([]))) (case x of wild { __DEFAULT -> go (+# wild 1) (Queue a ipv ([])); 10000 -> \ (eta :: Int) -> eta }) } }; } in go 1 a) (I# 0) }}} The `go` inside `foo` is the interesting function. Its type has three arguments, but its outermost lambda only takes two. This is common when fusing a left-fold. Only that Call Arity is not sufficient to see that we’d want this to be expanded to thre arguments (`CallArity=3`). Would `CallArity=3` be correct? Yes, because the recursive calls to `go` call it with two arguments, pass that to `c`, which calls it at most once (so no sharing could be lost) with at least one argument. Unfortunately, that information is currently not available to the call arity analysis (§5.1.1 in the paper linked above). Call Arity is a forward analysis, so it is hard to see how it could make use of that. And even if it did, it would still have to pass a function (in that case a PAP) to `c`... :-( -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:24 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Just to keep us on the same page, I rewrote `scanl'` like this so it can be written back to a simple form. {{{#!hs {-# NOINLINE [1] scanl' #-} scanl' :: (b -> a -> b) -> b -> [a] -> [b] scanl' f q ls = q `seq` q : (case ls of [] -> [] x:xs -> scanl' f (f q x) xs) {-# RULES "scanl'" [~1] forall f a bs . scanl' f a bs = build (\c n -> a `c` foldr (scanlFB' f c) (flipSeqScanl' n) bs a) "scanl'List" [1] forall f a bs . foldr (scanlFB' f (:)) (flipSeqScanl' []) bs a = tailScanl' f a bs #-} {-# INLINE [0] scanlFB' #-} scanlFB' f c = \b g x -> let b' = f x b in b' `seq` b' `c` g b' {-# INLINE [0] flipSeqScanl' #-} flipSeqScanl' = flip seq {-# NOINLINE [1] tailScanl' #-} tailScanl' f a bs = a `seq` foldr (scanlFB' f (:)) (flip seq []) bs a }}} This doesn't seem to affect the problem, however. I am curious if there is a way to manually force whatever's getting the wrong arity to get the right one. Bertram pointed out that fusing `inits` on the right with an expensive producer is ''really'' bad, and I realized that fusing it on the right with something that leads to lots of reboxing is also really bad. So for now, probably the best thing is just to NOINLINE inits altogether. Unrelatedly: I was successful in my quest to write rules to make `concat` and `inits` fuse into something that's actually good (in case anyone actually uses such a thing). Interestingly the arity/strictness analysis only works on it if it's inlined (maybe you can figure out a fix for that?): {{{#!hs -- We don't particularly like the idea of inlining this function, since it's -- a bit large, but it appears to be necessary to get the analyses to work -- right and avoid eating all available memory. Making this INLINE [0] to -- be able to rewrite it to a simpler form unfortunately seems to make it miss some -- analysis that speeds it up by a relatively small but still significant amount. -- There may be some way around this problem, but I haven't found it yet. {-# INLINE concatInitsFB #-} concatInitsFB xs cons nil = let go _ _ [] = nil go n k (y:ys) | n == k = go (n+1) 0 xs | otherwise = y `cons` go n (k+1) ys in go (1::Int) 0 xs {-# RULES -- We might catch concat before it's rewritten. "concatInits1" forall xs . concat (inits xs) = build (concatInitsFB xs) -- If not, we might be able to recognize its rewritten form. "concatInits2" forall c n xs . foldr (\x y -> foldr c y x) n (inits xs) = concatInitsFB xs c n #-} }}} -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:25 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by nomeata):
This doesn't seem to affect the problem, however.
Yes, I don’t think the problem in this particular case lies with init per se. I think it’s more like concat, but only when part of the recursion (the `c` above) is not inlined. That might also be what you observe.
I am curious if there is a way to manually force whatever's getting the wrong arity to get the right one
Not with list fusion as we have it right now, I think. It should work with https://github.com/takano-akio/ww-fusion (which I should follow up on again). I don’t think a rule for `concat (inits)` is worth it, but it may help us understand the problem better. I wonder if maybe the rule for `concat` can be rewritten so that the resulting recursion is nicer, but I don’t see it right now (but I’m too jetlagged to think anyways). -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:26 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

This doesn't seem to affect the problem, however.
Yes, I don’t think the problem in this particular case lies with `init`
I am curious if there is a way to manually force whatever's getting
#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: Type: bug | Status: new Priority: high | Milestone: 7.8.4 Component: | Version: 7.8.3 libraries/base | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: | -------------------------------------+------------------------------------- Comment (by dfeuer): Replying to [comment:26 nomeata]: per se. I think it’s more like `concat`, but only when part of the recursion (the `c` above) is not inlined. That might also be what you observe. I hope you'll be able to tease this apart; `inits` isn't very important in the grand scheme of things, but `concat` is. the wrong arity to get the right one
Not with list fusion as we have it right now, I think. It should work
with https://github.com/takano-akio/ww-fusion (which I should follow up on again). That looks very interesting, by which I mean I couldn't understand it at all the first time around. I'll have to try again. If it can really solve this class of problem in a fairly thorough fashion, it will certainly be worth the added complexity.
I wonder if maybe the rule for `concat` can be rewritten so that the resulting recursion is nicer, but I don’t see it right now (but I’m too jetlagged to think anyways).
Rest well! -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:27 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: ekmett Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: Core | Version: 7.8.3 Libraries | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: Phab:D329 | -------------------------------------+------------------------------------- Changes (by dfeuer): * cc: core-libraries-committee@… (added) * status: new => patch * differential: => Phab:D329 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:29 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: ekmett Type: bug | Status: patch Priority: high | Milestone: 7.8.4 Component: Core | Version: 7.8.3 Libraries | Keywords: Resolution: | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: Phab:D329 | -------------------------------------+------------------------------------- Changes (by dfeuer): * milestone: 7.10.1 => 7.8.4 Comment: The `initsR` version, at least, can slip into 7.8.4 without any ripples. Changing milestone back per thoughtpolice. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:31 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow
-------------------------------------+-------------------------------------
Reporter: dfeuer | Owner: ekmett
Type: bug | Status: patch
Priority: high | Milestone: 7.8.4
Component: Core | Version: 7.8.3
Libraries | Keywords:
Resolution: | Architecture: Unknown/Multiple
Operating System: | Difficulty: Easy (less than 1
Unknown/Multiple | hour)
Type of failure: Runtime | Blocked By:
performance bug | Related Tickets:
Test Case: |
Blocking: |
Differential Revisions: Phab:D329 |
-------------------------------------+-------------------------------------
Comment (by Joachim Breitner

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: ekmett Type: bug | Status: closed Priority: high | Milestone: 7.8.4 Component: Core | Version: 7.8.3 Libraries | Keywords: Resolution: fixed | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: Phab:D329 | -------------------------------------+------------------------------------- Changes (by nomeata): * status: patch => closed * resolution: => fixed -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:33 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: ekmett Type: bug | Status: closed Priority: high | Milestone: 7.10.1 Component: Core | Version: 7.8.3 Libraries | Keywords: Resolution: fixed | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: Phab:D329 | -------------------------------------+------------------------------------- Changes (by thoughtpolice): * milestone: 7.8.4 => 7.10.1 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:34 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: ekmett Type: bug | Status: merge Priority: high | Milestone: 7.8.4 Component: Core | Version: 7.8.3 Libraries | Keywords: Resolution: fixed | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: Phab:D329 | -------------------------------------+------------------------------------- Changes (by dfeuer): * status: closed => merge * milestone: 7.10.1 => 7.8.4 -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:35 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler

#9345: Data.List.inits is extremely slow -------------------------------------+------------------------------------- Reporter: dfeuer | Owner: ekmett Type: bug | Status: closed Priority: high | Milestone: 7.8.4 Component: Core | Version: 7.8.3 Libraries | Keywords: Resolution: fixed | Architecture: Unknown/Multiple Operating System: | Difficulty: Easy (less than 1 Unknown/Multiple | hour) Type of failure: Runtime | Blocked By: performance bug | Related Tickets: Test Case: | Blocking: | Differential Revisions: Phab:D329 | -------------------------------------+------------------------------------- Changes (by thoughtpolice): * status: merge => closed Comment: Merged. -- Ticket URL: http://ghc.haskell.org/trac/ghc/ticket/9345#comment:36 GHC http://www.haskell.org/ghc/ The Glasgow Haskell Compiler
participants (1)
-
GHC