
Sorry about the delay in responding. On Tue, Jun 02, 2009 at 04:53:44PM -0500, Louis Wasserman wrote:
> http://hackage.haskell.org/trac/ghc/ticket/3271
>
> Data.Sequence is meant as a general-purpose implementation of finite sequences. The range of methods it offers, however, is considerably more constrained than Data.List, even allowing for the constraint that sequences are finite.
>
> The following methods cannot be efficiently implemented based on currently exported methods from Data.Sequence.
>
> * zipWith and its variants. Note: it suffices to define zipWith alone.
> * replicate. (This can be implemented in O(log n) time with node sharing.)
This is great stuff: your version of replicate is obviously a big win, though I would simplify replicate slightly to

    replicateFinger n x = case n of
        0 -> Empty
        1 -> Single x
        2 -> deep one Empty one
        3 -> deep two Empty one
        4 -> deep two Empty two
        5 -> deep three Empty two
        _ -> case (n - 6) `quotRem` 3 of
            (q, 0) -> deep three (replicateFinger q node) three
            (q, 1) -> deep four (replicateFinger q node) three
            (q, 2) -> deep four (replicateFinger q node) four
      where
        one   = One x        -- Maximize node sharing.
        two   = Two x x
        three = Three x x x
        four  = Four x x x x
        node  = node3 x x x

It's not quite true that zipWith can't be implemented efficiently using the current interface: one can do a mapAccumL over the shorter sequence with a toList of the other one. But it's convenient to have, and it's also more efficient to have separate zipWith3 and zipWith4 instead of iterating zipWith.

You've also given concise definitions of many of the other functions, which will be convenient to users. But I have a general concern about the maintainability of large amounts of repetitive (often machine-generated) code. That may seem hypocritical, as I put such code in there myself (notably in append), but I think there's a trade-off. The QuickCheck tests are a great help (as they were in the original development), but this code is still difficult to manage. I think it needs to bring considerable gains to justify the maintenance cost.

The most extreme case is sort, where I think your 400-line version would need to be a lot faster than the naive

    sort :: Ord a => Seq a -> Seq a
    sort = fromList . Data.List.sort . Data.Foldable.toList

to justify inclusion, but you don't include measurements. The loss of stability is also problematic. Incidentally, the following would be O(n log n) and stable, but I haven't measured it:

    newtype OrdList a = OrdList { getList :: [a] }

    sort :: Ord a => Seq a -> Seq a
    sort = fromList . getList . foldMap wrap
      where wrap x = OrdList [x]

    instance Ord a => Monoid (OrdList a) where
        mempty = OrdList []
        OrdList xs `mappend` OrdList ys = OrdList (merge xs ys)

Hmm, it's not clear how to make that fit sortBy.

The zips are only 80 lines, but I wonder whether a simpler version of those might also give comparable performance. Similarly, I wonder if your partition is much faster than the naive

    partition :: (a -> Bool) -> Seq a -> (Seq a, Seq a)
    partition p = foldMap partition1
      where
        partition1 x
          | p x       = (singleton x, empty)
          | otherwise = (empty, singleton x)

By the way, if you have partition you might as well have filter too.

Why have you introduced consDigitToTree and snocDigitToTree?

Could tails be

    tails = scanr (<|) empty

symmetrically with inits?

I think it's problematic that iterate

    iterate :: Int -> (a -> a) -> a -> Seq a

has a different signature from the list version. But then a new function is not much easier to use than

    fromList . take n . Data.List.iterate f

and that ought to be deforested to something reasonable anyway.
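To spell out the mapAccumL workaround mentioned above, here is a sketch written against the exported interface only. zipWith' is just a placeholder name, and this leans on the Traversable instance of Seq; I haven't benchmarked it against a native implementation.

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq
import Data.Traversable (mapAccumL)

-- Walk the shorter sequence with mapAccumL, threading a toList of
-- the other one through as the accumulator.
zipWith' :: (a -> b -> c) -> Seq a -> Seq b -> Seq c
zipWith' f xs ys
  | Seq.length xs <= Seq.length ys = snd (mapAccumL step (toList ys) xs)
  | otherwise                      = zipWith' (flip f) ys xs
  where
    -- xs is no longer than ys here, so the list cannot run dry.
    step (b : bs) a = (bs, f a b)
    step []       _ = error "zipWith': impossible"
```

The flip in the second guard keeps the argument order of f right while always traversing the shorter sequence, so the cost is governed by the length of the result.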
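The merge assumed by the OrdList instance above is the usual stable two-way merge of sorted lists; it is not a standard library function, so for completeness something like:

```haskell
-- Stable two-way merge: on ties the left element comes first, which
-- is what makes the foldMap-based sort stable, since foldMap combines
-- the singletons in left-to-right order.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge xs@(x : xt) ys@(y : yt)
  | y < x     = y : merge xs yt
  | otherwise = x : merge xt ys
```

The strict inequality in the first guard is what preserves the relative order of equal elements.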
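For what it's worth, the suggested tails can even be written from outside the library, assuming a scanr over Seq is available (tails' is a placeholder name):

```haskell
import Data.Foldable (toList)
import Data.Sequence (Seq, (<|), empty)
import qualified Data.Sequence as Seq

-- Each scanr step conses one more element onto the suffix to its
-- right, starting from the empty suffix, mirroring Data.List.tails.
tails' :: Seq a -> Seq (Seq a)
tails' = Seq.scanr (<|) empty
```

So tails' (fromList [1,2,3]) yields the suffixes [1,2,3], [2,3], [3] and the empty sequence, in that order, matching Data.List.tails.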