<*> for Data.Sequence: mission accomplished

All right! It's taken a long time, but I finally managed to write a
nice implementation of `<*>` for `Data.Sequence`! It looks nothing
like my original concept, but it accomplishes my incremental
performance goals while also being a little better than the current
implementation when forcing the whole result. Anyone interested can
see the code in the `ap` branch of treeowl/containers on GitHub. There
are still some ugly special cases for sufficiently small
arguments--any ideas for changing that would be most welcome.
Many thanks to Joachim Breitner for helping implement an earlier
approach, to Ross Paterson for coming up with a solution to a previous
formulation of the problem (thus helping me understand that I had
asked the wrong question), and to Carter Schonwald for putting up with
my stream of consciousness rambling.
On Sun, Nov 23, 2014 at 12:07 AM, David Feuer
OK, sorry for the flood of posts, but I think I've found a way to make that work. Specifically, I think I can write a three-Seq append that takes the total size and uses it to be as lazy as possible in the second of the three Seqs. I'm still working out the details, but I think it will work. It does the (possibly avoidable) rebuilding, but I *think* it's at least asymptotically optimal. Of course, if Ross Paterson can find something more efficient, that'd be even better.
On Sat, Nov 22, 2014 at 10:10 PM, David Feuer
wrote: OK, so I've thought about this some more. I think the essential *concept* I want is close to this, but it won't quite work this way:
fs <*> xs = equalJoin $ fmap (<$> xs) fs
equalJoin :: Int -> Seq (Seq a) -> Seq a equalJoin n s | length s <= 2*n = simpleJoin s | otherwise = simpleJoin pref >< equalJoin (2*n) mid >< simpleJoin suff where (pref, s') = splitAt n s (mid, suff) = splitAt (length s - 2*n) s'
simpleJoin :: Seq (Seq a) -> Seq a simpleJoin s | null s = empty | length s == 1 = index s 0 | otherwise = simpleJoin front >< simpleJoin back where (front,back) = splitAt (length s `quot` 2) s
I think the reason this doesn't work is that >< is too strict. I believe the only potential way around this is to dig into the FingerTree representation and build the thing top-down. I still don't understand how (if at all) this can be done.
On Sat, Nov 22, 2014 at 12:57 PM, David Feuer
wrote: The ideal goal, which has taken me forever to identify and which may well be unattainable, is to get O(log(min{i,mn-i})) access to each element of the result, while maintaining O(mn) time to force it entirely. Each of these is possible separately, of course. To get them both, if it's possible, we need to give up on the list-like approach and start splitting Seqs instead of lists. As we descend, we want to pass a single thunk to each element of each Digit to give it just enough to do its thing. Representing the splits efficiently and/or memoizing them could be a bit of a challenge.
On Fri, Nov 21, 2014 at 02:00:16PM -0500, David Feuer wrote:
To be precise, I *think* using the fromList approach for <*> makes us create O (n) thunks in order to extract the last element of the result. If we build the result inward, I *think* we can avoid this, getting the last element of the result in O(1) time and space. But my understanding of this data structure remains primitive.
This modification of the previous should do that.
mult :: Seq (a -> b) -> Seq a -> Seq b mult sfs sxs = fromTwoLists (length sfs * length sxs) ys rev_ys where fs = toList sfs rev_fs = toRevList sfs xs = toList sxs rev_xs = toRevList sxs ys = [f x | f <- fs, x <- xs] rev_ys = [f x | f <- rev_fs, x <- rev_xs]
-- toRevList xs = toList (reverse xs) toRevList :: Seq a -> [a] toRevList = foldl (flip (:)) []
-- Build a tree lazy in the middle, from a list and its reverse. -- -- fromTwoLists (length xs) xs (reverse xs) = fromList xs -- -- Getting the kth element from either end involves forcing the lists -- to length k. fromTwoLists :: Int -> [a] -> [a] -> Seq a fromTwoLists len_xs xs rev_xs = Seq $ mkTree2 len_xs 1 (map Elem xs) (map Elem rev_xs)
-- Construct a fingertree from the first n elements of xs. -- The arguments must satisfy n <= length xs && rev_xs = reverse xs. -- Each element of xs has the same size, provided as an argument. mkTree2 :: Int -> Int -> [a] -> [a] -> FingerTree a mkTree2 n size xs rev_xs | n == 0 = Empty | n == 1 = let [x1] = xs in Single x1 | n < 6 = let nl = n `div` 2 l = Data.List.take nl xs r = Data.List.take (n - nl) rev_xs in Deep totalSize (mkDigit l) Empty (mkRevDigit r) | otherwise = let size' = 3*size n' = (n-4) `div` 3 digits = n - n'*3 nl = digits `div` 2 (l, xs') = Data.List.splitAt nl xs (r, rev_xs') = Data.List.splitAt (digits - nl) rev_xs nodes = mkNodes size' xs' rev_nodes = mkRevNodes size' rev_xs' in Deep totalSize (mkDigit l) (mkTree2 n' size' nodes rev_nodes) (mkRevDigit r) where totalSize = n*size
mkDigit :: [a] -> Digit a mkDigit [x1] = One x1 mkDigit [x1, x2] = Two x1 x2 mkDigit [x1, x2, x3] = Three x1 x2 x3 mkDigit [x1, x2, x3, x4] = Four x1 x2 x3 x4
-- length xs <= 4 => mkRevDigit xs = mkDigit (reverse xs) mkRevDigit :: [a] -> Digit a mkRevDigit [x1] = One x1 mkRevDigit [x2, x1] = Two x1 x2 mkRevDigit [x3, x2, x1] = Three x1 x2 x3 mkRevDigit [x4, x3, x2, x1] = Four x1 x2 x3 x4
mkNodes :: Int -> [a] -> [Node a] mkNodes size (x1:x2:x3:xs) = Node3 size x1 x2 x3:mkNodes size xs
-- length xs `mod` 3 == 0 => -- mkRevNodes size xs = reverse (mkNodes size (reverse xs)) mkRevNodes :: Int -> [a] -> [Node a] mkRevNodes size (x3:x2:x1:xs) = Node3 size x1 x2 x3:mkRevNodes size xs _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 19-12-2014 19:02, David Feuer wrote:
All right! It's taken a long time, but I finally managed to write a nice implementation of `<*>` for `Data.Sequence`! It looks nothing like my original concept, but it accomplishes my incremental performance goals while also being a little better than the current implementation when forcing the whole result. Anyone interested can see the code in the `ap` branch of treeowl/containers on GitHub. There are still some ugly special cases for sufficiently small arguments--any ideas for changing that would be most welcome.
Link for the lazy like me: https://github.com/treeowl/containers/compare/ap
Many thanks to Joachim Breitner for helping implement an earlier approach, to Ross Paterson for coming up with a solution to a previous formulation of the problem (thus helping me understand that I had asked the wrong question), and to Carter Schonwald for putting up with my stream of consciousness rambling.
Nice work! I've got some questions/suggestions: - It would be nice to have at least one comment explaining the decisions surrounding the code and how it works. I read it without previous knowledge about how it would accomplish its goal and it was a bit painful. It's pretty clever code! - Why is ap3FT the only of the ap*FT family that cares about length fs < 2? - Why have the ap* family of functions instead of just deconstructing fs and wrapping Seq on apShort? Cheers! -- Felipe.

On Dec 19, 2014 7:01 PM, "Felipe Lessa"
- It would be nice to have at least one comment explaining the decisions surrounding the code and how it works. I read it without previous knowledge about how it would accomplish its goal and it was a bit painful. It's pretty clever code!
Thanks! I plan to write up a full explanation with diagrams and everything, but that will take me a bit.
- Why is ap3FT the only of the ap*FT family that cares about length fs < 2?
I wrote ap2FT and ap3FT well before I figured out how (or if!) the rest would work. I forgot to strip those cases from ap3FT. Thanks!
- Why have the ap* family of functions instead of just deconstructing fs and wrapping Seq on apShort?
Because I didn't think of it. Thanks.
participants (2)
-
David Feuer
-
Felipe Lessa