
nubBy is a very good suggestion. Added! Regarding good hash functions: if your data structure is algebraic, you can derive generic and Hashable will give you a pretty good hash function:
data ADT a = C0 Int String | C1 [a] deriving Generic
instance Hashable a => Hashable (ADT a)
It's magic!
- Clark
On Mon, Jul 15, 2013 at 11:35 PM, Richard A. O'Keefe
On 16/07/2013, at 3:21 PM, Clark Gaebel wrote:
I'm still against having an Ord version, since my intuition tells me that hash-based data structures are faster than ordered ones.
There are at least four different things that "an Ord version" might mean:
- first sort a list, then eliminate duplicates - sort a list eliminating duplicates stably as you go (think 'merge sort', using 'union' instead of 'merge') - build a balanced tree set as you go - having a list that is already sorted, use that to eliminated duplicates cheaply.
These things have different costs. For example, if there are N elements of which U are unique, the first as O(N.log N) cost, the third has O(N.log U) cost, and the fourth has O(N) cost.
What I want is more often ordNubBy than ordNub, though.
Someone else can write the patch, though!
As a tangent, can anyone think of a data structure for which you can write an Ord instance but Hashable/Eq is impossible (or prove otherwise)? How about the converse?
Since Ord has Eq as a superclass, and since 0 is a functionally correct hash value for anything, if you can implement Ord you can obviously implement Hashable/Eq. Whether it is *useful* to do so is another question.
It turns out that it _is_ possible to define good quality hash functions on sets, but most code in the field to do so is pretty bad. (Just a modular sum or exclusive or.)
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe