
Krzysztof Skrze;tnicki wrote:
Ok, my turn now. Let's think about algorithm that takes equivalence relation EQ, ordering relation ORD on abstraction classes generated by this equivalence ( -> equivalence classes ) and divides given input elements XS into appropriate classes and then prints them out according to given ordering ORD. If we pose the restriction (let's call it (*)), that input XS should have at most one element from every abstraction class, we get sorting in a way that you desire. Hovewer, if we don't pose that restriction the algorithm still makes perfect sense and is given by standard library sortBy.
I see no reason for restriction (*).
I don't understand the above paragraph. Let's consider a simple example: (sort [a,b]) in the case we have: (compare a b = EQ) Which of the following 4 possible results are correct/incorrect? 1- [a,a] 2- [a,b] 3- [b,a] 4- [b,b] I would say they are all correct and equivalent for any "sane" Ord instance, though from the point of view of space efficiency 1 or 4 are preferable to 2 or 3.
For efficiency reasons there could be additional class StrictEq. If the type is in that class then we can assume (*) and use more space-efficient algorithm.
Now, the problem with
treeSort = concatMap (reverse . snd) . Map.toAscList . Map.fromListWith (++) . map (\x -> (x,[x]))
is that i can't tell any compact way of implementing treeSortBy in nice manner. This is because Data.Map does not allow us to provide our own comparison test instead of compare.
As a practical matter, for benchmarking you should also count the actual number of comparisons needed, not just execution times for simple types (Ints presumably). Also, I think you'll find that the AVL lib gives better performance than Data.Map, particularly for sorted inputs. Unfortunately you can't use this implementation in the standard libs without making the AVL lib a standard lib (the same happens to be true of Data.Map too, thought this is widely perceived as being standard because of ghc library bundling :-) But actually I would say that if either (both) of these is faster than the the standard sort then this is some kind of performance bug with the current ghc release. They weren't faster last time I tested (with Ints). I also happen to think that sort should be made an Ord class method, so that trie based sorts are possible (which should be faster for complex data types). We should only use sort = sortBy compare as the default. Regards -- Adrian Hey