
Leon Smith wrote:
Ok, after mulling over the issues that Will Ness has brought up in the last few days [1], I think I have a partial explanation for the apparent tension between Will's observations and Heinrich Apfelmus's Implicit Heaps article [2], which both concern the implementation of mergeAll [3].
[1] http://permalink.gmane.org/gmane.comp.lang.haskell.cafe/84666 [2] http://apfelmus.nfshost.com/articles/implicit-heaps.html [3] http://hackage.haskell.org/packages/archive/data-ordlist/0.4.4/doc/html/Data...
[...]
This raises the question, is there some combination of the shape of the merge' tree and some input for which using VIPs dramatically changes the efficiency of a mergeAll operation? I suspect the answer is yes, though I don't know for sure at this point in time.
However, I do tacitly believe that the current tree that mergeAll uses doesn't exhibit this property for any input, and so I have simplified the implementations of mergeAll and unionAll in the latest version of data-ordlist-0.4.4 by avoiding the use of VIPs. This has the nice side benefit of modestly improving performance when the elements of the result are highly biased towards the first list.
Will Ness
For those who remember the discussion about this about a year ago, it turns out there was a simpler version after all lurking somewhere in there (or is it _out_?).
primes = 2: primes' where primes' = 3: 5: [7,9..] `minus` tfold [ [p*p,p*p+2*p..] | p <- primes' ] tfold ((x:xs):t) = x : xs `union` tfold (pairs t) pairs ((x:xs):ys:t) = (x: union xs ys) : pairs t
Unfortunately, it turns out that this program is clear, shorter ... and subtly wrong. :) Here an example where the VIP merge would give a different result bad = tfold $ (1:10:undefined) : (2:3:5:undefined) : (4:undefined) : error "bad" We have ghci> bad [1,2*** Exception: bad but the VIP version would give at least ghci> bad [1,2,3,4*** Exception: bad / Prelude: undefined In other words, this new program already tries to compare the number 3 to the fourth list when it is still clear that only the first three lists are relevant. Note that this doesn't necessarily mean that the program does not work for prime numbers, but *proving* correctness is now considerably more difficult because you need estimates about the growth and distribution of prime numbers. The VIP version always works as long as there are infinitely many primes. Also, since unnecessary comparisons are performed, it is no longer clear whether the time and space complexity stays the same. (Which is not as bad as it sounds, since we didn't know them well in the first place anyway). More worryingly, changing the tree shape now affects correctness. Regards, Heinrich Apfelmus -- http://apfelmus.nfshost.com