[Haskell-cafe] V.I.P.s and the associativity of merge'

30 Dec 2010

      Ok,  after mulling over the issues that Will Ness has brought up in
the last few days [1],  I think I have a partial explanation for the
apparent tension between Will's observations and Heinrich Apfelmus's
Implicit Heaps article [2],  which both concern the implementation of
mergeAll [3].

The merge' function takes two ordered lists,  with the head of the
first list less than the head of the second,  and merges their
contents:

   merge' [] ys = ys
   merge' (x:xs) ys = x : merge xs ys

The nice thing about this function is we can merge an infinite number
of lists by folding right,  if assume that the heads of those lists
are appropriately ordered.  This appears in Richard Bird's code at the
end of Melissa O'Neill's "Genuine Sieve of Eratosthenes",  though
undoubtedly this observation has been independently made by many
people.

Now,  in an ordinary sense,   merge' *is* an associative operator,  in
that given three fully defined ordered lists with ordered heads,
merge' xs (merge' ys zs) == merge' (merge' xs ys) zs.   This allows us
to merge an infinite number of lists using an arbitrary tree of merge'
operations,  without ever worrying that we will return an incorrect
result.   (However,  we might get stuck in a non-productive loop,  for
example if we fold left over an infinite list of ordered lists)

However,  Heinrich's article uses a stronger sense of associativity
which includes laziness properties and thus captures some operational
characteristics.  In this sense,  merge' *is not* associative.    To
remedy this,  Heinrich uses VIPs to strengthen the associativity
property of merge'.   This raises the question,  is there some
combination of the shape of the merge' tree and some input for which
using VIPs dramatically changes the efficiency of a mergeAll
operation?   I suspect the answer is yes,  though I don't know for
sure at this point in time.

However,  I do tacitly believe that the current tree that mergeAll
uses doesn't exhibit this property for any input,   and so I have
simplified the implementations of mergeAll and unionAll in the latest
version of data-ordlist-0.4.4 by avoiding the use of VIPs.   This has
the nice side benefit of modestly improving performance when the
elements of the result are highly biased towards the first list.

Best,
Leon

[1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666
[2] http://apfelmus.nfshost.com/articles/implicit-heaps.html
[3] http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data...

[Haskell-cafe] V.I.P.s and the associativity of merge'

Leon Smith