Honest constant-space mergesort

12 Jan 2019

      I am, for entertainment, trying to figure out how to implement a
constant-space bottom-up mergesort in Haskell.  This is quite easy to at
least describe imperatively: break the list into ordered blocks, merge
adjacent ones, and repeat with double the length until there's only one
block and you're done.  For a linked list like [a] in Haskell, this should
be constant-space.  Unfortunately, it is not.

You can definitely express the above algorithm nicely if you organize those
blocks into an [[a]], but then you are creating O(n) new lists.  Even if
you go to extremes to keep the blocks concatenated, though, you still have
the issue that every time you merge two lists, you are creating a new list
and that involves the allocator.  (Note that the implementation of
Data.List.sort does this.)

The C equivalent would proceed by just overwriting the linking pointers in
each node, without allocating new nodes to do so, but of course, that's not
going to work in Haskell.  I considered using something like STRef to
represent the links (so, departing from the plain [a] type) but even then,
if you overwrite the contents of one of those, the garbage collector still
needs to know.  This is better than allocating more space, but it is not
necessary because the algorithm never actually makes any of the nodes
unreachable.  Also, I understand that using a lot of STRefs is bad for
efficiency.

Is there any way to do this in (GHC) Haskell and get the algorithm down to
the bare minimum?  If there is, is it any different than just "writing C in
Haskell"?  That is, can one coerce GHC into emitting the mutable-link
operations and not doing collections, from high-level code?  Although I've
definitely seen examples of algorithms that compile down to such unboxed
depths, this one has been really resistant to improvement.

Thanks,
Ryan Reich

Ryan Reich

Li-yao Xia

Frank Staals

Ryan Reich

tags

participants (3)