Re: Status of nubOrd (Proposal #2629)

On Wed, Feb 24, 2010 at 3:23 PM, Gwern Branwen
nub . sort (and the reverse) are very frequent. The attached sort.txt was produced by
find bin/ -name "*.hs" -exec sh -c "grep sort {} | grep nub && echo {}" \; > sort.txt
It's very crude and obviously produces lots of false positives & negatives, but even so, there's at least >50 uses of nub . sort etc.
Incidentally, data-ordlist has a nubSort implementation that's never more expensive than Data.List.sort. It's the same mergesort algorithm provided in Data.List except that it removes duplicates as it merges the lists. The library also provides a linear-time nub and nubBy on ordered lists, however this implementation is not used as part of nubSort. http://hackage.haskell.org/package/data-ordlist Best, Leon

On Sun, Feb 28, 2010 at 9:39 PM, Leon Smith
On Wed, Feb 24, 2010 at 3:23 PM, Gwern Branwen
wrote: nub . sort (and the reverse) are very frequent. The attached sort.txt was produced by
find bin/ -name "*.hs" -exec sh -c "grep sort {} | grep nub && echo {}" \; > sort.txt
It's very crude and obviously produces lots of false positives & negatives, but even so, there's at least >50 uses of nub . sort etc.
Incidentally, data-ordlist has a nubSort implementation that's never more expensive than Data.List.sort. It's the same mergesort algorithm provided in Data.List except that it removes duplicates as it merges the lists.
Hm. But sort was recently modified with some tricks from nhc, IIRC. Is it up to date?
The library also provides a linear-time nub and nubBy on ordered lists, however this implementation is not used as part of nubSort.
http://hackage.haskell.org/package/data-ordlist
Best, Leon
Thanks for the pointers. I briefly looked at data-ordlist during the searching, but I didn't take the time to figure out what exactly it was doing. -- gwern

On Mon, Mar 1, 2010 at 2:35 PM, Gwern Branwen
Hm. But sort was recently modified with some tricks from nhc, IIRC. Is it up to date?
data-ordlist is up to date now. Thanks for the heads up! I played a bit with implementing the ascending part of the new implementation using a splitAt-style recursion, instead of a DiffList technique. In some cases it appears to be better, in others a bit worse, and in most cases no substantial changes. The difference appears to be mostly related to memory use patterns. If you sort an already sorted list, this technique is ~45% faster, but if you sort a list that consists of long runs of ascending chains, it's a bit slower. I don't know why, and haven't dug any deeper. Best, Leon
participants (2)
-
Gwern Branwen
-
Leon Smith