
Sorry about the delay in responding. On Tue, Jun 02, 2009 at 04:53:44PM -0500, Louis Wasserman wrote:
> http://hackage.haskell.org/trac/ghc/ticket/3271
>
> Data.Sequence is meant as a general-purpose implementation of finite sequences. The range of methods it offers, however, is considerably more constrained than Data.List, even allowing for the constraint that sequences are finite.
>
> The following methods cannot be efficiently implemented based on currently exported methods from Data.Sequence.
>
> * zipWith and its variants. Note: it suffices to define zipWith alone.
> * replicate. (This can be implemented in O(log n) time with node sharing.)
This is great stuff: your version of replicate is obviously a big win, though I would simplify replicate slightly to

    replicateFinger n x = case n of
        0 -> Empty
        1 -> Single x
        2 -> deep one Empty one
        3 -> deep two Empty one
        4 -> deep two Empty two
        5 -> deep three Empty two
        _ -> case (n - 6) `quotRem` 3 of
            (q, 0) -> deep three (replicateFinger q node) three
            (q, 1) -> deep four (replicateFinger q node) three
            (q, 2) -> deep four (replicateFinger q node) four
      where
        one   = One x        -- Maximize node sharing.
        two   = Two x x
        three = Three x x x
        four  = Four x x x x
        node  = node3 x x x

It's not quite true that zipWith can't be implemented efficiently using the current interface: one can do a mapAccumL over the shorter sequence with a toList of the other one. But it's convenient to have, and it's also more efficient to have separate zipWith3 and zipWith4 instead of iterating zipWith.

You've also given concise definitions of many of the other functions, which will be convenient to users. But I have a general concern about the maintainability of large amounts of repetitive (often machine-generated) code. That may seem hypocritical, as I put such code in there myself (notably in append), but I think there's a trade-off. The QuickCheck tests are a great help (as they were in the original development), but this code is still difficult to manage. I think it needs to bring considerable gains to justify the maintenance cost.

The most extreme case is sort, where I think your 400-line version would need to be a lot faster than the naive

    sort :: Ord a => Seq a -> Seq a
    sort = fromList . Data.List.sort . Data.Foldable.toList

to justify inclusion, but you don't include measurements. The loss of stability is also problematic. Incidentally, the following would be O(n log n) and stable, but I haven't measured it:

    newtype OrdList a = OrdList { getList :: [a] }

    sort :: Ord a => Seq a -> Seq a
    sort = fromList . getList . foldMap wrap
      where wrap x = OrdList [x]

    instance Ord a => Monoid (OrdList a) where
        mempty = OrdList []
        OrdList xs `mappend` OrdList ys = OrdList (merge xs ys)

Hmm, it's not clear how to make that fit sortBy.

The zips are only 80 lines, but I wonder whether a simpler version of those might also give comparable performance. Similarly, I wonder if your partition is much faster than the naive

    partition :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
    partition p = foldMap partition1
      where
        partition1 x
          | p x       = (singleton x, empty)
          | otherwise = (empty, singleton x)

By the way, if you have partition you might as well have filter too.

Why have you introduced consDigitToTree and snocDigitToTree?

Could tails be

    tails = scanr (<|) empty

symmetrically with inits?

I think it's problematic that iterate

    iterate :: Int -> (a -> a) -> a -> Seq a

has a different signature from the list version. But then a new function is not much easier to use than

    fromList . take n . Data.List.iterate f

and that ought to be deforested to something reasonable anyway.
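To spell out the mapAccumL workaround mentioned above, here is a sketch written against the exported interface only. zipWith' is just a placeholder name, and this leans on the Traversable instance of Seq; I haven't benchmarked it against a native implementation.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq
import Data.Traversable (mapAccumL)

-- Walk the shorter sequence with mapAccumL, threading a toList of
-- the other one through as the accumulator.
zipWith' :: (a -> b -> c) -> Seq a -> Seq b -> Seq c
zipWith' f xs ys
  | Seq.length xs <= Seq.length ys = snd (mapAccumL step (toList ys) xs)
  | otherwise                      = zipWith' (flip f) ys xs
  where
    -- xs is no longer than ys here, so the list cannot run dry.
    step (b : bs) a = (bs, f a b)
    step []       _ = error "zipWith': impossible"
```

The flip in the second guard keeps the argument order of f right while always traversing the shorter sequence, so the cost is governed by the length of the result.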
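The merge assumed by the OrdList instance above is the usual stable two-way merge of sorted lists; it is not a standard library function, so for completeness something like:

```haskell
-- Stable two-way merge: on ties the left element comes first, which
-- is what makes the foldMap-based sort stable, since foldMap combines
-- the singletons in left-to-right order.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xs@(x : xt) ys@(y : yt)
  | y < x     = y : merge xs yt
  | otherwise = x : merge xt ys
```

The strict inequality in the first guard is what preserves the relative order of equal elements.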
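For what it's worth, the suggested tails can even be written from outside the library, assuming a scanr over Seq is available (tails' is a placeholder name):

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (<|), empty)
import qualified Data.Sequence as Seq

-- Each scanr step conses one more element onto the suffix to its
-- right, starting from the empty suffix, mirroring Data.List.tails.
tails' :: Seq a -> Seq (Seq a)
tails' = Seq.scanr (<|) empty
```

So tails' (fromList [1,2,3]) yields the suffixes [1,2,3], [2,3], [3] and the empty sequence, in that order, matching Data.List.tails.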