
Hi,

I wonder how well
http://www.haskell.org/ghc/docs/latest/html/libraries/fgl/Data-Graph-Inductive-Internal-FiniteMap.html
performs wrt. other Map implementations. It may be worth replacing this implementation, too.

Cheers
Christian

Hello,

On Monday 09 Jan 2006 1:11 pm, Christian Maeder wrote:
Hi,
I wonder how well
http://www.haskell.org/ghc/docs/latest/html/libraries/fgl/Data-Graph-Inductive-Internal-FiniteMap.html
performs wrt. other Map implementations.
It may be worth replacing this implementation, too.
I suspect it doesn't compare too well with Data.Tree.AVL, though I've never measured it, to be honest. It's based on "AVL trees" (or rather trees where left and right sub-tree heights differ by at most 1), but doesn't seem to use the AVL algorithm itself. IIRC it stores the (boxed!) height in each tree node, not a "balance factor". As well as requiring an extra field (same as Adams), this also causes other inefficiencies: insertion requires inspection of nodes which aren't on the insertion path to determine the balance factor, which is bound to slow things down.

That said, it should still do a pretty good job of balancing, and the balancing mechanism itself and node size will have little impact on lookup times (for example).

It'd be worth a look to see what functions this provides that weren't provided by the old FiniteMap implementation, and whether or not these are still absent from Data.Map (I believe this was the original motivation for FGL having its own private implementation in the first place).

Regards
--
Adrian Hey
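For illustration, the two node layouts being contrasted here look roughly like this. These are hypothetical types sketched for this discussion, not the actual FGL or Data.Tree.AVL source:

  -- Height stored in every node: an extra Int field per node, and
  -- rebalancing after insertion must read the heights of the children,
  -- i.e. of nodes that aren't on the insertion path itself.
  data HTree k v
      = HTip
      | HNode (HTree k v) Int k v (HTree k v)   -- Int = subtree height

  height :: HTree k v -> Int
  height HTip              = 0
  height (HNode _ h _ _ _) = h

  -- Balance factor encoded in the constructor (in the style of
  -- Data.Tree.AVL): no extra field, and a node's balance is known
  -- without inspecting its children at all.
  data ATree k v
      = ATip
      | LNode (ATree k v) k v (ATree k v)   -- left subtree higher by 1
      | ENode (ATree k v) k v (ATree k v)   -- both subtrees same height
      | RNode (ATree k v) k v (ATree k v)   -- right subtree higher by 1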

On Monday 09 Jan 2006 6:28 pm, Adrian Hey wrote:
I suspect it doesn't compare too well with Data.Tree.AVL, though I've never measured it, to be honest. It's based on "AVL trees" (or rather trees where left and right sub-tree heights differ by at most 1), but doesn't seem to use the AVL algorithm itself. IIRC it stores the (boxed!) height in each tree node, not a "balance factor". As well as requiring an extra field (same as Adams), this also causes other inefficiencies: insertion requires inspection of nodes which aren't on the insertion path to determine the balance factor, which is bound to slow things down.
Oh, and not forgetting that it's lazy (no strict fields or seqs) anywhere. This may be a good thing, but I suspect (for AVL trees at least) it's a bad thing, because of the way height/balance information propagates up the tree.

Regards
--
Adrian Hey
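A minimal sketch of the strictness point, using a hypothetical node type (not the FGL code): with a lazy height field, every insertion leaves a chain of unevaluated height thunks up the rebuilt spine, whereas a strict field keeps the height evaluated as each node is constructed.

  data Tree k v
      = Tip
      | Node (Tree k v) !Int k v (Tree k v)   -- strict height field

  height :: Tree k v -> Int
  height Tip              = 0
  height (Node _ h _ _ _) = h

  -- Smart constructor: the strict field forces the computed height
  -- immediately, so no thunk chain builds up as nodes are rebuilt on
  -- the way back up from an insertion.
  node :: Tree k v -> k -> v -> Tree k v -> Tree k v
  node l k v r = Node l (1 + max (height l) (height r)) k v r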

Adrian Hey wrote:
It'd be worth a look to see what functions this provides that weren't provided by the old FiniteMap implementation, and whether or not these are still absent from Data.Map (I believe this was the original motivation for FGL having its own private implementation in the first place).
splitFM :: Ord a => FiniteMap a b -> a -> Maybe (FiniteMap a b, (a, b))
    -- combines delFrom and lookup

splitMinFM :: Ord a => FiniteMap a b -> Maybe (FiniteMap a b, (a, b))
    -- combines splitFM and minFM

The above two are special. splitFM could be expressed using Data.Map.updateLookupWithKey (with one traversal), I think. splitMinFM corresponds to Data.Map.deleteFindMin.

Christian
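A sketch of how those two might be written against Data.Map, as suggested above. The function names simply mirror the FGL signatures quoted here; this is not the FGL code itself:

  import qualified Data.Map as Map

  -- Delete and look up in one traversal: updateLookupWithKey with a
  -- function that always returns Nothing deletes the key and returns
  -- the value it had (if any).
  splitFM :: Ord a => Map.Map a b -> a -> Maybe (Map.Map a b, (a, b))
  splitFM m k = case Map.updateLookupWithKey (\_ _ -> Nothing) k m of
      (Just v,  m') -> Just (m', (k, v))
      (Nothing, _ ) -> Nothing

  -- deleteFindMin gives the minimal binding and the remaining map;
  -- it's partial on empty maps, hence the null check.
  splitMinFM :: Ord a => Map.Map a b -> Maybe (Map.Map a b, (a, b))
  splitMinFM m
      | Map.null m = Nothing
      | otherwise  = let ((k, v), m') = Map.deleteFindMin m
                     in Just (m', (k, v))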

On Monday 09 Jan 2006 6:28 pm, Adrian Hey wrote:
That said, it should still do a pretty good job of balancing, and the balancing mechanism itself and node size will have little impact on lookup times (for example).
..or maybe not. Looking at this code again reminds me of something strange that happened when I tested the RedBlack tree implementation Chris Okasaki sent me a while back: it required precisely twice as many comparisons as the RedBlack code Malcolm Wallace supplied. I'm not sure exactly why, but I suspect it had something to do with the use of guarded equational style code (or maybe if-then-elses) rather than "case compare x y of.." style code.

The FGL implementation seems to do the same thing. So I guess lookup performance might not be too good either, at least for non-trivial comparisons.

Regards
--
Adrian Hey
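To make the difference concrete, here is a sketch over a hypothetical search tree type. The guarded style below can perform two full comparisons per node, while the case-of-compare style performs exactly one:

  data Tree k v = Tip | Node (Tree k v) k v (Tree k v)

  -- Guarded equational style: (<) and then (>) may each run a full
  -- comparison, so an expensive Ord instance is consulted twice per
  -- node visited.
  lookupG :: Ord k => k -> Tree k v -> Maybe v
  lookupG _ Tip = Nothing
  lookupG k (Node l k' v r)
      | k < k'    = lookupG k l
      | k > k'    = lookupG k r
      | otherwise = Just v

  -- "case compare" style: exactly one comparison per node visited.
  lookupC :: Ord k => k -> Tree k v -> Maybe v
  lookupC _ Tip = Nothing
  lookupC k (Node l k' v r) = case compare k k' of
      LT -> lookupC k l
      GT -> lookupC k r
      EQ -> Just v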

On Wednesday 11 Jan 2006 7:09 am, Adrian Hey wrote:
The FGL implementation seems to do the same thing. So I guess lookup performance might not be too good either, at least for non-trivial comparisons.
Well, speaking of "non-trivial comparisons" reminds me of something that's been worrying me about this whole approach to implementing Sets and Maps, i.e. requiring that element/key types are instances of Ord and assuming that lookup etc. is achieved by performing some kind of binary search. This seems wrong in general. E.g. a trie based implementation of ListSet/ListMap does not require that lists are instances of Ord. Furthermore, a trie based implementation for lists is likely to be very much more efficient than the proposed polymorphic Sets/Maps based on balanced binary trees, because it avoids a lot of repeated comparison on the same list element.

What I really want is Sets/Maps that work efficiently where the elements/keys can be arbitrarily complex data structures, so comparison is non-trivial and typically quite expensive. (I guess we need to restrict "arbitrarily complex" to exclude cyclic data structures.) So couldn't we generalise the idea of tries to work with any data type?

Suppose I wanted a Set or Map of pairs. The obvious implementation (using current polymorphic Sets/Maps) would be..

  -- Set of pairs (a,b)
  newtype Set2 a b = Set2 (Map a (Set b))

  -- Map of pairs (a,b) to values v
  newtype Map2 a b v = Map2 (Map a (Map b v))

..and for triples..

  -- Set of triples (a,b,c)
  newtype Set3 a b c = Set3 (Map a (Set2 b c))
  -- or perhaps..
  newtype Set3 a b c = Set3 (Map2 a b (Set c))

  -- Map of triples (a,b,c) to values v
  newtype Map3 a b c v = Map3 (Map a (Map2 b c v))
  -- or perhaps..
  newtype Map3 a b c v = Map3 (Map2 a b (Map c v))

This can obviously be extended to any product type or data constructor of arbitrary arity. For sum of product types with N distinct constructors, what's needed is an N-tuple of product Sets/Maps (one for each constructor), with nullary constructors just using a simple Bool/Maybe.

So, assuming this scheme can be used to implement Sets/Maps for any product or algebraic data type, the only real implementation needed is something like IntSet/IntMap (which could be used for CharSet/CharMap etc. too). We'd probably want a hand generated implementation for lists using "proper" tries, but other than these cases couldn't all this stuff just be derived by the compiler? (It would be nice if it could.)

The only problem I see here is the structural equality problem we've discussed at some length recently. The above scheme assumes structural equality. If this isn't what's wanted then I guess we'd have to rely on hand written instances (as is the case with Eq/Ord). This also highlights another reason why IMO specifying "biasing" is bad: doing this implicitly assumes that Sets/Maps somehow contain concrete structural representations of elements/keys. This is not necessarily so. It's not true with any implementation like that described above (or even simple tries).

Hmm.., just thought of something else. Although this scheme does not require types to be instances of Ord, it would be nice if derived or hand written instances of Set/Map were consistent with any Ord instances, so we could have meaningful ordering with functions like elemsAscending, foldAscending etc..

  elemsAscending :: (Set s e, Ord e) => s e -> [e]
  foldAscending  :: (Set s e, Ord e) => (e -> a -> a) -> a -> s e -> a

Any comments? Or maybe suggestions about how to use exotic ghc extensions (about which I understand little) to implement this?
(I know we shouldn't be assuming ghc, in an ideal world :-)

[P.S. Now of course, having written all this, I decided this must surely have been done already by someone :-) Googling for "generalised tries" quickly took me to this paper by Ralf Hinze..
http://citeseer.ist.psu.edu/233124.html
..which in turn references Chris Okasaki's book. So why don't we implement something like this? (somehow :-)]

Regards
--
Adrian Hey
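As a concrete rendering of the idea, here is a small sketch of the pair and sum cases on top of Data.Map. All the names here (Map2, EitherMap and their operations) are made up for this example; the point is just that lookup on a composite key never performs a comparison on the composite as a whole:

  import qualified Data.Map as Map

  -- Map keyed by pairs: a map of maps, as in the Map2 newtype above.
  newtype Map2 a b v = Map2 (Map.Map a (Map.Map b v))

  empty2 :: Map2 a b v
  empty2 = Map2 Map.empty

  -- Map.union is left-biased, so an existing binding for (a,b) is
  -- overwritten by the new one.
  insert2 :: (Ord a, Ord b) => (a, b) -> v -> Map2 a b v -> Map2 a b v
  insert2 (a, b) v (Map2 m) =
      Map2 (Map.insertWith Map.union a (Map.singleton b v) m)

  -- Each component is compared only within its own sub-map; the pair
  -- itself is never compared.
  lookup2 :: (Ord a, Ord b) => (a, b) -> Map2 a b v -> Maybe v
  lookup2 (a, b) (Map2 m) = Map.lookup a m >>= Map.lookup b

  -- Map keyed by a sum type (Either a b): one sub-map per constructor,
  -- illustrating the "N-tuple of product Maps" idea above.
  data EitherMap a b v = EitherMap (Map.Map a v) (Map.Map b v)

  lookupE :: (Ord a, Ord b) => Either a b -> EitherMap a b v -> Maybe v
  lookupE (Left  a) (EitherMap ma _ ) = Map.lookup a ma
  lookupE (Right b) (EitherMap _  mb) = Map.lookup b mb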