
On Thursday 02 Mar 2006 10:10 pm, John Meacham wrote:
On Thu, Mar 02, 2006 at 09:13:11PM +0000, Adrian Hey wrote:
I think what we should be aiming for in the long term is automatically derived Generalised Tries (as discussed recently).
Would it be possible to mock up an implementation of this using Data.Generics by chance? so we can use a generalized tree for anything in Data and Ord? I find the idea of a derived generalized trie quite intriguing. would associated types be the best way to express such a thing (so you can use 'Trie a' everywhere as a type rather than some complicated class predicate).
I'm afraid I'm not sufficiently well versed in bleedin' edge type theory and type system extensions to comment. But I would say that my preferance would be for some compile time solution, rather than Data.Generics (because of the runtime cost which I suspect this may involve). This exercise is all about performance after all, so I don't want to throw any away unnecessarily. I think your earlier idea of using DrIFT seemed good, though I'm not sure what you can do with it or whether it needs modifying in any way. (For generalised tries we're deriving new types and then deriving new class instances of the newly derived types.) Since we last discussed this I've been thinking about this and have a few points I'd like to mention if you'd like to give this a shot .. Regarding efficiency, I think that in these nested Maps of Maps of Maps.. it's likely that the overwhelming majority of these Maps are singletons and also that you will get long singleton chains. Also for the Map tuples used for algebraic data type constructors, it's likely that most of the Maps in the tuple are empty. So there's probably some optimised representation you could use to exploit these facts. A good test would be to see if the derivation produced a Trie implemntation for Lists that was as efficient as a handmade ListMap using Tries, maybe similar to.. http://homepages.nildram.co.uk/~ahey/HLibs/Data.StringMap/ Regarding the role of the Ord class in all this, I think we should still insist that Key types are instances of Ord (and that the Trie implementation is consistent with that instance) for several reasons.. 1 - Compatibility with balanced tree implementations. 2 - To give meaningful ordering for functions like elemsAscending etc.. 3 - To enable a standard solution to the "Set of Sets" problem that doesn't require deriving yet another generalised Trie type for the original generalised Trie type, if you see what I mean (more about this later). For derived instances of Ord this probably isn't too hard. It seems somewhat more difficult for hand written instances (e.g. one that used a different ordering from the default for performance reasons). Regarding the "Set of Sets" problem, if the element type is an instance of Ord then we can uniquely represent a Set as a sorted list (obtained via elemsAscending of similar). So we could just use a general Trie based ListMap(Set) implementation for these. All that's needed to implement this an efficient Map implementation for the element type (which has already been derived). This all seems doable in practice, though no doubt there's some gotcha which I've overlooked. I have no plans to do this in the near future but it would be really good if someone else wanted to give it a go (someone such as yourself for example :-). Regards -- Adrian Hey