
Hello Christian On Friday 21 Oct 2005 3:41 pm, Christian Maeder wrote:
(It would also be handy if your modules could be passed to the same properties and performance checker if we had those.)
Yes, definining a common API which useable in actual applications seems rather difficult, but even if we just agreed a common simple API for correctness testing and benchmarking that would be useful (these could be made much more thorough if they didn't have to be re-written for each new implementation).
A different performance profile is of course fine (and desirable if there is a big gain in certain or even most cases).
AVL seems to gain in two ways. 1- It does a better job of balancing in pathological cases like growing a tree from a sorted list (where the tree is not known to be sorted). 2- The Hedge algorithm seems to be a bit of a problem IMO. The actual benchmark times for AVL union and Data.Set union aren't significantly different for sets of Ints (AVL seemed marginally faster, but nothing to get excited about). But if you count the number of comparisons actually performed Data.Set (Hedge algorithm) is doing about 3..5 times as many comparisons. Presumably this would be reflected in measured execution times in cases where set element comparison was less trivial than comparing Ints. Also, you'd expect the additional indirection overhead of using an AVL tree of association pairs (vs. specialised Data.Map) to slow down lookup somewhat. But it doesn't seem to significantly IIRC, even with Ints. Not sure why. Maybe better balancing is giving shorter search paths, or maybe use of Ord dictionary passing (vs. HOFs) is more expensive. One problem with AVL is the complexity of most of the code means producing specialised unboxed implementations manually is not an attractive proposition. Hopefully compilers will do this sort of thing for us one day. But in reality I expect this sort of thing to be more of an issue for benchmark style code using Ints. For more complex keys and comparisons the indirection cost will be relatively insignificant I think. For polymorpic implementations counting the actual number of comparisons required by some function is just as important IMO.
So far, the problem always was to agree on an accepted interface (that I didn't want to be a class but leave to the discipline of library writers.)
How about something like Data.Set.Tree.AVL (and Data.Set.Seq.Ordered) or Data.SetByAVLTree (and Data.SetByOrderedSeq)?
Dunno. At one time I did have something like Data.Tree.AVL.Clone.Data.Set and Data.Tree.AVL.Clone.Data.Map, but I never actually posted the implementations. (I wish I had now because it seems I've lost them.) Regards -- Adrian Hey