Re: Speeding up Data.List.inits

I did a little more benchmarking, and it turns out that the overhead
to use Okasaki-style persistently efficient banker's queues is not
terribly severe because these queues are simple and therefore fast
(it's around 1.5 times as slow at its very worst, but that may have
been a fluke--the performance generally seems comparable). It does
indeed give far, far better performance when looking at the heads of
many of the lists at the end of the result, as Edward Kmett predicted.
I don't know if these queues are available in a standard library, but
the following is a full implementation of initsQ, including what it
needs of the queue implementation:
data Queue a = Queue
{front :: [a],
lenF :: Int,
rear :: [a],
lenR ::Int} deriving (Show)
empty = Queue {front=[],lenF=0,rear=[],lenR=0}
queue f lF r lR
| lR <= lF = Queue f lF r lR
| otherwise = Queue {front = f ++ reverse r,
lenF = lF + lR,
rear = [],
lenR = 0}
(Queue f lF r lR) |> x = queue f lF (x:r) (lR+1)
toListQ (Queue f _ r _) = f ++ reverse r
initsQ xs = map toListQ (scanl (|>) empty xs)
On Sat, Jul 19, 2014 at 5:25 AM, Edward Kmett
The half-baked thought was that reaching the kth list is O(k), and reversing _each list_ is O(k), but with a smarter structure you might share some of the work of construction across multiple lists. e.g. if you had an inits like function that returned a list of queues rather than a list of lists you could actually do less work overall setting up the inits asymptotically, sharing the work. If you only take a constant prefix off each queue then this can change the overhead from O(n^2) to = O(kn).
A reversal or difference list requires you to pay separately for each list.
I we are in "violent agreement", however, as I indicated in my comment that the constant factors almost definitely make it a non-starter for something like Data.List. ;)
-Edward
On Sat, Jul 19, 2014 at 4:30 AM, David Feuer
wrote: Edward Kmett, reaching the kth list is O(k). Reversing it is also O(k). I don't know that anything as fancy as something by Okasaki could improve matters—such data structures tend to have large constant factors. Something much worse than that seems to be wrong with the Data.List version at the moment.
On Sat, Jul 19, 2014 at 4:25 AM, Edward Kmett
wrote: Good catch, the common inits and tails version of subsequences makes pretty heavy use of that, as do many of the uses of inits where you use it to get k-element substrings.
I'd be curious if an okasaki-style catenable output restricted deque could recover both efficient head access and reasonably efficient appends, but the overhead of that is probably way out of line with the performance at stake!
-Edward
On Sat, Jul 19, 2014 at 4:14 AM, Henning Thielemann
wrote: Am 19.07.2014 09:51, schrieb David Feuer:
Background: I was looking at the code for Data.List.inits in base-4.7.0.0 (function renamed for clarity):
initsDL :: [a] -> [[a]] initsDL xs = [] : case xs of [] -> [] x : xs' -> map (x :) (initsDL xs')
The recursive maps made me suspicious. I decided to try writing a different version:
initsHO xs = map ($ []) (scanl q id xs) where q f x = f . (x:)
I mentioned this on #haskell and noted that it would be a nice exercise to write it with less fancy function footwork using map reverse. rasfar responded (modulo naming) with
initsR xs = map reverse (scanl q [] xs) where q acc x = x : acc
The 'map' gives efficient access to the first elements of the sub-lists, thus it seems that initsDL is faster than initsR when you access only prefixes of the sub-lists. E.g.
Prelude> head $ last $ inits [0..10000000::Int]
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

In fact, since the queues are only used in a very restricted way, we can
make them even smaller. Something like this (completely untested) code:
data Queue a = Queue {-# UNPACK #-}!Int [a] [a]
empty = Queue 1 [] []
queue sp1 f r
| popCount sp1 /= 1 = Queue sp1 f r
| otherwise. = Queue sp1 (f ++ reverse r) []
(Queue sp1 f r) |> x = queue (sp1 + 1) f (x:r)
toListQ (Queue _ f r) = f ++ reverse r
On Jul 19, 2014 1:33 PM, "David Feuer"
I did a little more benchmarking, and it turns out that the overhead to use Okasaki-style persistently efficient banker's queues is not terribly severe because these queues are simple and therefore fast (it's around 1.5 times as slow at its very worst, but that may have been a fluke--the performance generally seems comparable). It does indeed give far, far better performance when looking at the heads of many of the lists at the end of the result, as Edward Kmett predicted. I don't know if these queues are available in a standard library, but the following is a full implementation of initsQ, including what it needs of the queue implementation:
data Queue a = Queue {front :: [a], lenF :: Int, rear :: [a], lenR ::Int} deriving (Show)
empty = Queue {front=[],lenF=0,rear=[],lenR=0}
queue f lF r lR | lR <= lF = Queue f lF r lR | otherwise = Queue {front = f ++ reverse r, lenF = lF + lR, rear = [], lenR = 0}
(Queue f lF r lR) |> x = queue f lF (x:r) (lR+1)
toListQ (Queue f _ r _) = f ++ reverse r
initsQ xs = map toListQ (scanl (|>) empty xs)
On Sat, Jul 19, 2014 at 5:25 AM, Edward Kmett
wrote: The half-baked thought was that reaching the kth list is O(k), and
reversing _each list_ is O(k), but with a smarter structure you might share some of the work of construction across multiple lists. e.g. if you had an inits like function that returned a list of queues rather than a list of lists you could actually do less work overall setting up the inits asymptotically, sharing the work. If you only take a constant prefix off each queue then this can change the overhead from O(n^2) to = O(kn).
A reversal or difference list requires you to pay separately for each
list.
I we are in "violent agreement", however, as I indicated in my comment
that the constant factors almost definitely make it a non-starter for something like Data.List. ;)
-Edward
On Sat, Jul 19, 2014 at 4:30 AM, David Feuer
Edward Kmett, reaching the kth list is O(k). Reversing it is also O(k).
I don't know that anything as fancy as something by Okasaki could improve matters—such data structures tend to have large constant factors. Something much worse than that seems to be wrong with the Data.List version at the moment.
On Sat, Jul 19, 2014 at 4:25 AM, Edward Kmett
wrote: Good catch, the common inits and tails version of subsequences makes
wrote: pretty heavy use of that, as do many of the uses of inits where you use it to get k-element substrings.
I'd be curious if an okasaki-style catenable output restricted deque
could recover both efficient head access and reasonably efficient appends, but the overhead of that is probably way out of line with the performance at stake!
-Edward
On Sat, Jul 19, 2014 at 4:14 AM, Henning Thielemann <
schlepptop@henning-thielemann.de> wrote:
Am 19.07.2014 09:51, schrieb David Feuer:
Background: I was looking at the code for Data.List.inits in base-4.7.0.0 (function renamed for clarity):
initsDL :: [a] -> [[a]] initsDL xs = [] : case xs of [] -> [] x : xs' -> map (x :) (initsDL xs')
The recursive maps made me suspicious. I decided to try writing a different version:
initsHO xs = map ($ []) (scanl q id xs) where q f x = f . (x:)
I mentioned this on #haskell and noted that it would be a nice
exercise
to write it with less fancy function footwork using map reverse. rasfar responded (modulo naming) with
initsR xs = map reverse (scanl q [] xs) where q acc x = x : acc
The 'map' gives efficient access to the first elements of the sub-lists, thus it seems that initsDL is faster than initsR when you access only prefixes of the sub-lists. E.g.
Prelude> head $ last $ inits [0..10000000::Int]
_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries
participants (1)
-
David Feuer