
On 22 December 2005 13:43, Christian Maeder wrote:
for storing highly shared data structures we use so called Annotated Terms (shortly ATerms, details below).
http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm
In contrast to the Binary (or GhcBinary) class we compute the sharing, which saves a lot of space for data types that keep redundant information.
With this we can store some of our data structures (of course only non-cyclic and finite ones) in a few KBs that need MBs if stored without sharing (as when using the Binary or the Show/Read classes).
So far so good. The problem remaining is that an object is _traversed_ as if being unshared and thus the _time_ for the ATermTable construction becomes too long for us.
GHC's internal data structures (on the heap) are in many cases shared, by pointer references. I.e. if I add a (single) symbol table to every symbol that I use, then the symbol table will not be copied, but only a reference added to my symbol.
How can I detect this sharing in order to avoid traversing the very same symbol table for every symbol?
I've tried to use a "Map (Ptr ()) ShATerm". So before traversing an object I look up its address and check if is was traversed before (if not I traverse it and store it in my map for future lookups).
1.) I'm not sure if it is safe to use "Ptr ()" (meanwhile garbage collection, heap compaction and what not could happen).
Right - Ptr isn't the right thing here, because GC will move objects around. That's why we have StablePtr and StableName. In fact, what you really want here is the pointer-equality memo table implementation in the Memo module (package util). This is scheduled to be removed in 6.6, but it will be available as a Cabal package. Cheers, Simon