[Haskell-cafe] Incremental sequence sorting

18 Mar 2015

      I had an idea, and it's kind of complicated, so I was wondering if I
could get a sanity check from someone before putting *too* much time
into it. Specifically, I was thinking about a sort of incremental
sorting of sequences (in Data.Sequence). The idea is very closely
related to the incremental sorting described by Navarro and Paredes (
http://www.dcc.uchile.cl/~gnavarro/ps/algor09.pdf ).

The basic concept: sequences are represented as Hinze-Paterson 2-3
finger trees. They are fairly lazy, so building them from the top down
tends to be a good thing. Suppose I have a sequence to be sorted. To
start building the result sequence, I need to know which elements go
in the first digit and which go in the last digit. To do this, I need
two order statistics, starting with one separating the first digit
from the rest, then one separating the last digit from the rest. Once
I have separated these, I can produce the top of the sequence. The
digit separation is the same all down the spine. Within digits, 2-3
trees split similarly.

I can get the order statistics using median-of-medians. As Navarro and
Paredes (and presumably others) describe, median-of-medians partially
sorts the sequence in the process of finding the desired order
statistic. Navarro and Paredes use a stack of indices into an array to
take advantage of this, but I need things rather more incremental.
What I was thinking is that I could use a finger tree of sequences (a
sequence of sequences making it easy to find where a particular total
length is reached) to separate the different segments from each other.
Each time I need to split on a particular order statistic, I can
isolate the necessary segment, use median-of-medians (splitting it
into pieces), and then append the pieces onto either side as
appropriate.

Does anyone have a sense of

1. Whether this is sane and/or
2. What the chances are that its performance will be competitive with
either of the current Data.Sequence sorting functions and/or
3. How I should actually implement median-of-medians for this
particular application (as there seem to be very few Haskell
implementations of the algorithm, and I've found no optimized ones)?

Thanks,
David Feuer