Re: Data.* collections maintenance

22 Oct 2005

      Hello Christian

On Friday 21 Oct 2005 3:41 pm, Christian Maeder wrote:
...
(It would also be handy if your modules could be passed to the same
properties and performance checker if we had those.)
Yes, definining a common API which useable in actual applications
seems rather difficult, but even if we just agreed a common simple
API for correctness testing and benchmarking that would be useful
(these could be made much more thorough if they didn't have to
be re-written for each new implementation).
...
A different performance profile is of course fine (and desirable if
there is a big gain in certain or even most cases).
AVL seems to gain in two ways.

1- It does a better job of balancing in pathological cases like growing
   a tree from a sorted list (where the tree is not known to be sorted).

2- The Hedge algorithm seems to be a bit of a problem IMO. The actual
   benchmark times for AVL union and Data.Set union aren't
   significantly different for sets of Ints (AVL seemed marginally
   faster, but nothing to get excited about).
   But if you count the number of comparisons actually performed
   Data.Set (Hedge algorithm) is doing about 3..5 times as many
   comparisons. Presumably this would be reflected in measured
   execution times in cases where set element comparison was less
   trivial than comparing Ints.

Also, you'd expect the additional indirection overhead of using
an AVL tree of association pairs (vs. specialised Data.Map) to
slow down lookup somewhat. But it doesn't seem to significantly
IIRC, even with Ints. Not sure why. Maybe better balancing is
giving shorter search paths, or maybe use of Ord dictionary
passing (vs. HOFs) is more expensive.

One problem with AVL is the complexity of most of the code means
producing specialised unboxed implementations manually is not an
attractive proposition. Hopefully compilers will do this sort of
thing for us one day.

But in reality I expect this sort of thing to be more of an issue
for benchmark style code using Ints. For more complex keys and
comparisons the indirection cost will be relatively insignificant
I think. For polymorpic implementations counting the actual number
of comparisons required by some function is just as important IMO.
...
So far, the problem always was to agree on an accepted interface (that I
didn't want to be a class but leave to the discipline of library writers.)
How about something like Data.Set.Tree.AVL (and Data.Set.Seq.Ordered)
or Data.SetByAVLTree (and Data.SetByOrderedSeq)?
Dunno. At one time I did have something like Data.Tree.AVL.Clone.Data.Set
and Data.Tree.AVL.Clone.Data.Map, but I never actually posted the
implementations. (I wish I had now because it seems I've lost them.)

Regards
--
Adrian Hey