Efficient object identity (aka symbols as data)

[ TLDR: How do you do Lisp symbols in Haskell? ] What is the Haskell approach to efficient comparison and lookup of objects by their identity? Maybe a toy example would help to explain what I mean. Imagine that I want to use Haskell to maximize happiness in a situation where a bunch of people have to be assigned to certain groups according to their preferences, and some constraints on the group sizes. Conceptually my input data might be structured as follows: data Group = Group GroupName MaxGroupSize data Person = Person Name Group Group Group The program should partition the people into groups, attempting to get everyone into their most favoured groups. Whatever algorithm I choose to use for the optimization, will have to do lots of comparisons of Groups and Persons where their *identity* is all that matters: you don't need to look inside the objects. On the other hand, sometimes we will have to look inside the objects, for example in order to answer queries such as "Which group did Johnny get and how far down his list of choices did that group feature?". I should be able to run the program on data that becomes available at run time. The lisper in me is crying out for (lisp-like-)symbols which I can create from the input data at run time and on which some extra information can be hung. How would I organize something like this in Haskell? Thanks.

On Thu, May 26, 2011 at 04:45, Jacek Generowicz
What is the Haskell approach to efficient comparison and lookup of objects by their identity?
ghc uses Data.Unique to generate unique internal identifiers to associate with things. (Think gensym. Hm, except last time I did anything serious with Lisp, it was Maclisp... does gensym even still exist, or did CL do something inscrutable with it?) Beyond that, the existence of functions such as reallyUnsafePointerEquality# makes me think it's a Hard Problem.

On 2011 May 26, at 11:12, Brandon Allbery wrote:
On Thu, May 26, 2011 at 04:45, Jacek Generowicz
wrote: What is the Haskell approach to efficient comparison and lookup of objects by their identity?
ghc uses Data.Unique to generate unique internal identifiers to associate with things.
At first blush it sounds like the sort of thing I'm after. Thanks.
(Think gensym. Hm, except last time I did anything serious with Lisp, it was Maclisp... does gensym even still exist, or did CL do something inscrutable with it?)
CL has gensym. But gensym does seem to be overkill in the case I presented. I *could*, for example, say data Person = Fred | Johnny | Sally | Belinda | Kate | Roger | Eric [... and so on for a few squillion ...] data Group = Amazing | Brilliant | Cool | Great | Funky [ ... and so on for a few dozen ...] preferences :: Map Person (Group, Group, Group) In this setup (Fred == Johnny) will be very cheap, and (lookup Fred preferences) will be less cheap but possible. I'd use gensym if I need a new symbol right about which I know nothing other than I want it to be new and unique (nobody has created it before, and nobody will in the future). In the example scenario, I know what I want the symbol is to be called, and am prepared to accept the responsibility for avoiding duplicates, but I still want to be able create it at run time. In the Haskell snippet above, the compiler protects me against duplication, but forces me to know the data at compile time. In Lisp terms, I'm looking for make-symbol and intern.
Beyond that, the existence of functions such as reallyUnsafePointerEquality# makes me think it's a Hard Problem.
:-)

On Thu, May 26, 2011 at 05:41, Jacek Generowicz
On 2011 May 26, at 11:12, Brandon Allbery wrote:
(Think gensym. Hm, except last time I did anything serious with Lisp, it was Maclisp... does gensym even still exist, or did CL do something inscrutable with it?)
But gensym does seem to be overkill in the case I presented. (...)
In Lisp terms, I'm looking for make-symbol and intern. I think I just landed on "inscrutable"; (gensym) used to do pretty much that, it rolled symbols starting from 'a0000 for the first one generated in a given interpreter. (It has occurred to me that I was not entirely clear; the "Mac" in "Maclisp" was MACSYMA. Ancient stuff.)

On 2011 May 26, at 11:59, Brandon Allbery wrote:
On Thu, May 26, 2011 at 05:41, Jacek Generowicz
wrote: On 2011 May 26, at 11:12, Brandon Allbery wrote: (Think gensym. Hm, except last time I did anything serious with Lisp, it was Maclisp... does gensym even still exist, or did CL do something inscrutable with it?)
But gensym does seem to be overkill in the case I presented. (...) In Lisp terms, I'm looking for make-symbol and intern.
I think I just landed on "inscrutable";
Nah, it's probably just me confusing you.
(gensym) used to do pretty much that, it rolled symbols starting from 'a0000 for the first one generated in a given interpreter.
It still does essentially that. (Just don't be fooled by the "name" a0000: you can't access that symbol through that name. Aaaah, maybe in Maclisp it really did intern them under that name, but that would surprise me). '(gensym)' in CL is a bit like 'object()' in Python or 'new MyEmptyClass' in C++: the key point being that if you don't bind the thing being created right now, you'll never be able to get your hands on it again. Coming back on topic: Yes, I could use gensym for my purposes (though CL provides a variety of similar yet subtly different tools, which is why gensym feels a bit weird in this context).

On 26 May 2011 10:45, Jacek Generowicz
What is the Haskell approach to efficient comparison and lookup of objects by their identity?
Often you just provide your own and implement Eq.
I should be able to run the program on data that becomes available at run time.
Typically you define an id generator and zip anything coming from the input stream up with that generator. So: persons <- fmap (zipWith addId [1..] . map parsePerson) (hGetContents handle) where addId id person = person { personId = id }
Whatever algorithm I choose to use for the optimization, will have to do lots of comparisons of Groups and Persons where their *identity* is all that matters: you don't need to look inside the objects.
To achieve this abstraction the usual way is just implementing Eq: instance Eq Person where Person{personId=id1} == Person{personId=id2} = id1 == id2 You can also implement ordering if required: instance Ord Person where Person{personId=id1} `compare` Person{personId=id2} = id1 `compare` id2 Then you can write person1 == person2 and person1 > person2, etc. without "looking inside" the object, and all the library functions you have available like the list functions that work on Eq instances and Ord instances will work for your data type automatically. There're also some packages on Hackage for generating unique ids.

On 2011 May 26, at 11:16, Christopher Done wrote:
On 26 May 2011 10:45, Jacek Generowicz
wrote: What is the Haskell approach to efficient comparison and lookup of objects by their identity?
Often you just provide your own and implement Eq.
I should be able to run the program on data that becomes available at run time.
Typically you define an id generator and zip anything coming from the input stream up with that generator.
Makes sense.
Whatever algorithm I choose to use for the optimization, will have to do lots of comparisons of Groups and Persons where their *identity* is all that matters: you don't need to look inside the objects.
To achieve this abstraction the usual way is just implementing Eq:
instance Eq Person where Person{personId=id1} == Person{personId=id2} = id1 == id2
Any comments on the relative efficiency of the above as compared to A == B in the context of data Foo = A | B | C | D | ... lots more ... ? (I imagine that a Sufficiently Smart Compiler could reduce (==) :: Person Person to just integer comparison.) Thank you.

2011/5/26 Jacek Generowicz
On 2011 May 26, at 11:16, Christopher Done wrote:
On 26 May 2011 10:45, Jacek Generowicz
wrote: What is the Haskell approach to efficient comparison and lookup of objects by their identity?
Often you just provide your own and implement Eq.
I should be able to run the program on data that becomes available at run time.
Typically you define an id generator and zip anything coming from the input stream up with that generator.
Makes sense.
Whatever algorithm I choose to use for the optimization, will have to do lots of comparisons of Groups and Persons where their *identity* is all that matters: you don't need to look inside the objects.
To achieve this abstraction the usual way is just implementing Eq:
instance Eq Person where Person{personId=id1} == Person{personId=id2} = id1 == id2
Any comments on the relative efficiency of the above as compared to
A == B in the context of
data Foo = A | B | C | D | ... lots more ...
?
(I imagine that a Sufficiently Smart Compiler could reduce (==) :: Person Person to just integer comparison.)
I'm pretty sure GHC will do that for you. An approach similar to the one by Chris Done is to use a bi-directional map between Persons and Ints along the lines of data IdMap a = IdMap (IntMap a) (Map a Int) You can then associate a unique Int with each of your Persons and use this during your comparison. For associating Ints to the Persons, a simple fold or a State monad computation suffice. For the lookups on the further properties of Persons, an additional argument or the Reader monad will do. If you use a State monad and a single operation that associates an Int to a Person, then you additionally get the guarantee (inside a monadic computation) that no two Persons will be associated with the same Int. Efficiency-wise, you'll have O(log(n)) association, O(min(n,W)) access time, and O(1) comparison time with a very low constant factor. See the IntMap documentation for the O(min(n,W)) explanation. Additionally, the code is pure with all the nice properties that come with it. By the way this problem is very similar to the one of observable sharing. See this thread: http://www.haskell.org/pipermail/haskell-cafe/2008-February/039639.html best regards, Simon

On 26/05/2011 10:59 AM, Jacek Generowicz wrote:
Any comments on the relative efficiency of the above as compared to
A == B in the context of
data Foo = A | B | C | D | ... lots more ...
?
(I imagine that a Sufficiently Smart Compiler could reduce (==) :: Person Person to just integer comparison.)
My understanding is that if you have a constructor with no fields, it gets allocated as a compile-time constant. In other words, "C" is just a pointer to a static data structure somewhere in the program binary, and (==) effectively becomes pointer equity. OTOH, I am not a GHC developer...

On Thu, May 26, 2011 at 14:56, Andrew Coppin
My understanding is that if you have a constructor with no fields, it gets allocated as a compile-time constant. In other words, "C" is just a pointer to a static data structure somewhere in the program binary, and (==) effectively becomes pointer equity.
When used as a general function, at least; my understanding is that it's usually reduced to an integer tag (and that this is relied on to do fast conversions internally)

On 26/05/2011 07:56 PM, Andrew Coppin wrote:
On 26/05/2011 10:59 AM, Jacek Generowicz wrote:
Any comments on the relative efficiency of the above as compared to
A == B in the context of
data Foo = A | B | C | D | ... lots more ...
?
(I imagine that a Sufficiently Smart Compiler could reduce (==) :: Person Person to just integer comparison.)
My understanding is that if you have a constructor with no fields, it gets allocated as a compile-time constant. In other words, "C" is just a pointer to a static data structure somewhere in the program binary, and (==) effectively becomes pointer equity.
OTOH, I am not a GHC developer...
...and what I *should* of course have written is "go benchmark it". ;-)

Based on the description it looks like you could be looking for:
http://hackage.haskell.org/package/simple-atom
G
On Thu, May 26, 2011 at 10:45 AM, Jacek Generowicz
[ TLDR: How do you do Lisp symbols in Haskell? ]
What is the Haskell approach to efficient comparison and lookup of objects by their identity?
Maybe a toy example would help to explain what I mean.
Imagine that I want to use Haskell to maximize happiness in a situation where a bunch of people have to be assigned to certain groups according to their preferences, and some constraints on the group sizes. Conceptually my input data might be structured as follows:
data Group = Group GroupName MaxGroupSize data Person = Person Name Group Group Group
The program should partition the people into groups, attempting to get everyone into their most favoured groups.
Whatever algorithm I choose to use for the optimization, will have to do lots of comparisons of Groups and Persons where their *identity* is all that matters: you don't need to look inside the objects. On the other hand, sometimes we will have to look inside the objects, for example in order to answer queries such as "Which group did Johnny get and how far down his list of choices did that group feature?".
I should be able to run the program on data that becomes available at run time.
The lisper in me is crying out for (lisp-like-)symbols which I can create from the input data at run time and on which some extra information can be hung. How would I organize something like this in Haskell?
Thanks.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
--
Gregory Collins

On Thu, May 26, 2011 at 7:26 PM, Gregory Collins
Based on the description it looks like you could be looking for:
http://hackage.haskell.org/package/simple-atom
G
Coincidentally, I put up a question at stackoverflow for this just yesterday - http://stackoverflow.com/questions/6164260/why-doesnt-haskell-have-symbols-a... . Why doesn't Haskell have built in syntactic sugar for atoms? -- Anupam
participants (10)
-
Andrew Coppin
-
Anupam Jain
-
Brandon Allbery
-
Christopher Done
-
Don Stewart
-
Gregory Collins
-
Jacek Generowicz
-
Jacek Generowicz
-
Simon Meier
-
Stephen Tetley