
2011/5/26 Jacek Generowicz
On 2011 May 26, at 11:16, Christopher Done wrote:
On 26 May 2011 10:45, Jacek Generowicz
wrote: What is the Haskell approach to efficient comparison and lookup of objects by their identity?
Often you just provide your own and implement Eq.
I should be able to run the program on data that becomes available at run time.
Typically you define an id generator and zip anything coming from the input stream up with that generator.
Makes sense.
Whatever algorithm I choose to use for the optimization, will have to do lots of comparisons of Groups and Persons where their *identity* is all that matters: you don't need to look inside the objects.
To achieve this abstraction the usual way is just implementing Eq:
instance Eq Person where Person{personId=id1} == Person{personId=id2} = id1 == id2
Any comments on the relative efficiency of the above as compared to
A == B in the context of
data Foo = A | B | C | D | ... lots more ...
?
(I imagine that a Sufficiently Smart Compiler could reduce (==) :: Person Person to just integer comparison.)
I'm pretty sure GHC will do that for you. An approach similar to the one by Chris Done is to use a bi-directional map between Persons and Ints along the lines of data IdMap a = IdMap (IntMap a) (Map a Int) You can then associate a unique Int with each of your Persons and use this during your comparison. For associating Ints to the Persons, a simple fold or a State monad computation suffice. For the lookups on the further properties of Persons, an additional argument or the Reader monad will do. If you use a State monad and a single operation that associates an Int to a Person, then you additionally get the guarantee (inside a monadic computation) that no two Persons will be associated with the same Int. Efficiency-wise, you'll have O(log(n)) association, O(min(n,W)) access time, and O(1) comparison time with a very low constant factor. See the IntMap documentation for the O(min(n,W)) explanation. Additionally, the code is pure with all the nice properties that come with it. By the way this problem is very similar to the one of observable sharing. See this thread: http://www.haskell.org/pipermail/haskell-cafe/2008-February/039639.html best regards, Simon