
I am, for entertainment, trying to figure out how to implement a constant-space bottom-up mergesort in Haskell. This is quite easy to at least describe imperatively: break the list into ordered blocks, merge adjacent ones, and repeat with double the length until there's only one block and you're done. For a linked list like [a] in Haskell, this should be constant-space. Unfortunately, it is not. You can definitely express the above algorithm nicely if you organize those blocks into an [[a]], but then you are creating O(n) new lists. Even if you go to extremes to keep the blocks concatenated, though, you still have the issue that every time you merge two lists, you are creating a new list and that involves the allocator. (Note that the implementation of Data.List.sort does this.) The C equivalent would proceed by just overwriting the linking pointers in each node, without allocating new nodes to do so, but of course, that's not going to work in Haskell. I considered using something like STRef to represent the links (so, departing from the plain [a] type) but even then, if you overwrite the contents of one of those, the garbage collector still needs to know. This is better than allocating more space, but it is not necessary because the algorithm never actually makes any of the nodes unreachable. Also, I understand that using a lot of STRefs is bad for efficiency. Is there any way to do this in (GHC) Haskell and get the algorithm down to the bare minimum? If there is, is it any different than just "writing C in Haskell"? That is, can one coerce GHC into emitting the mutable-link operations and not doing collections, from high-level code? Although I've definitely seen examples of algorithms that compile down to such unboxed depths, this one has been really resistant to improvement. Thanks, Ryan Reich

On 1/12/19 6:25 AM, Ryan Reich wrote:
I considered using something like STRef to represent the links (so, departing from the plain [a] type) but even then, if you overwrite the contents of one of those, the garbage collector still needs to know. This is better than allocating more space, but it is not necessary because the algorithm never actually makes any of the nodes unreachable.
I thought this did not apply to a copying GC, does it? What is actually in an STRef? If it's small enough then unpacking the tail reference like this may be a good idea: data List s a = Nil | Cons a {-# UNPACK #-} !(STRef s (List s a)) Li-yao

Ryan Reich
I am, for entertainment, trying to figure out how to implement a constant-space bottom-up mergesort in Haskell. This is quite easy to at least describe imperatively: break the list into ordered blocks, merge adjacent ones, and repeat with double the length until there's only one block and you're done. For a linked list like [a] in Haskell, this should be constant-space. Unfortunately, it is not.
You can definitely express the above algorithm nicely if you organize those blocks into an [[a]], but then you are creating O(n) new lists. Even if you go to extremes to keep the blocks concatenated, though, you still have the issue that every time you merge two lists, you are creating a new list and that involves the allocator. (Note that the implementation of Data.List.sort does this.)
The C equivalent would proceed by just overwriting the linking pointers in each node, without allocating new nodes to do so, but of course, that's not going to work in Haskell. I considered using something like STRef to represent the links (so, departing from the plain [a] type) but even then, if you overwrite the contents of one of those, the garbage collector still needs to know. This is better than allocating more space, but it is not necessary because the algorithm never actually makes any of the nodes unreachable. Also, I understand that using a lot of STRefs is bad for efficiency.
Is there any way to do this in (GHC) Haskell and get the algorithm down to the bare minimum? If there is, is it any different than just "writing C in Haskell"? That is, can one coerce GHC into emitting the mutable-link operations and not doing collections, from high-level code? Although I've definitely seen examples of algorithms that compile down to such unboxed depths, this one has been really resistant to improvement.
Thanks, Ryan Reich _______________________________________________ Haskell-Cafe mailing list To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe Only members subscribed via the mailman list are allowed to post.
If you want your sort function to have type: Ord a => [a] -> [a] then I would think that you'd need to at least n new space anyway (where n is the length of the list), since you cannot actually overwrite the input list anyway. -- - Frank

With list fusion, neither input nor output actually is necessarily
allocated. Though I take your point, I'm not griping about setup code that
maintains a pure facade, but about the allocations that purity demands in
the middle.
On Sat, Jan 12, 2019, 13:59 Frank Staals Ryan Reich I am, for entertainment, trying to figure out how to implement a
constant-space bottom-up mergesort in Haskell. This is quite easy to at
least describe imperatively: break the list into ordered blocks, merge
adjacent ones, and repeat with double the length until there's only one
block and you're done. For a linked list like [a] in Haskell, this
should
be constant-space. Unfortunately, it is not. You can definitely express the above algorithm nicely if you organize
those
blocks into an [[a]], but then you are creating O(n) new lists. Even if
you go to extremes to keep the blocks concatenated, though, you still
have
the issue that every time you merge two lists, you are creating a new
list
and that involves the allocator. (Note that the implementation of
Data.List.sort does this.) The C equivalent would proceed by just overwriting the linking pointers
in
each node, without allocating new nodes to do so, but of course, that's
not
going to work in Haskell. I considered using something like STRef to
represent the links (so, departing from the plain [a] type) but even
then,
if you overwrite the contents of one of those, the garbage collector
still
needs to know. This is better than allocating more space, but it is not
necessary because the algorithm never actually makes any of the nodes
unreachable. Also, I understand that using a lot of STRefs is bad for
efficiency. Is there any way to do this in (GHC) Haskell and get the algorithm down
to
the bare minimum? If there is, is it any different than just "writing C
in
Haskell"? That is, can one coerce GHC into emitting the mutable-link
operations and not doing collections, from high-level code? Although
I've
definitely seen examples of algorithms that compile down to such unboxed
depths, this one has been really resistant to improvement. Thanks,
Ryan Reich
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to:
http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post. If you want your sort function to have type: Ord a => [a] -> [a] then I
would think that you'd need to at least n new space anyway (where n is
the length of the list), since you cannot actually overwrite the input
list anyway. -- - Frank
participants (3)
-
Frank Staals
-
Li-yao Xia
-
Ryan Reich