
On 20 May 2004 07:50, Adrian Hey wrote:
On Wednesday 19 May 2004 10:47 am, Simon Marlow wrote:
I'm happy for this to be "the" Data.Trees.AVL library. I'm sure others will want to comment on the details. I wonder whether it would make sense to have a specialised AVLMap type, which would let you optimise the representation of AVL-based finite maps?
Do you mean a newtype wrapper for the current AVL type, or a distinct algebraic type which contains both the key and the associated value as separate fields (like the Map type folks were talking about recently)?
If you mean the former, one thing I was thinking about doing was creating an AVL based clone of the current Data.FiniteMap library so folk could experiment with the effect of using both on real programs just by changing their imports.
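(For concreteness, a hypothetical sketch of that "former" option; names and details are illustrative rather than the library's actual API:)

    -- The AVL type encodes the balance factor in the constructor,
    -- so each node carries just three fields (a sketch; the real
    -- Data.Tree.AVL may differ in detail):
    data AVL e = E                      -- empty
               | N (AVL e) e (AVL e)    -- right subtree deeper
               | Z (AVL e) e (AVL e)    -- balanced
               | P (AVL e) e (AVL e)    -- left subtree deeper

    -- A Data.FiniteMap-style clone is then a newtype over an AVL
    -- of (key,value) pairs, so client code need only change its
    -- import:
    newtype FiniteMap k v = FM (AVL (k, v))

    lookupFM :: Ord k => FiniteMap k v -> k -> Maybe v
    lookupFM (FM t) k = go t
     where
      go  E        = Nothing
      go (N l e r) = step l e r
      go (Z l e r) = step l e r
      go (P l e r) = step l e r
      step l (k', v) r = case compare k k' of
        LT -> go l
        EQ -> Just v
        GT -> go r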
Actually I meant the latter, but this is a good idea too. Data.Map.AVL?
If you mean the latter, I'm not keen on this because it would mean duplicating a lot of code for the new type, and I'm not sure that it would be an optimisation at all. I considered this but decided against it on the grounds that I suspect a major contributor to the true cost of tree operations is heap burn rate. The AVL tree type has only 3 fields in each constructor (not 5 as with the Map type, for example), and the code was carefully written to avoid unnecessary heap construction wherever possible. Specialising would result in 4 fields in each constructor.
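(For concreteness, a hypothetical sketch of the specialised map type being argued against here, alongside the 3-field representation sketched earlier; constructor names are illustrative:)

    -- A specialised AVL map would carry key and value as separate
    -- node fields: 4 fields per constructor instead of 3.
    data AVLMap k v
       = ME                                  -- empty
       | MN (AVLMap k v) k v (AVLMap k v)    -- right subtree deeper
       | MZ (AVLMap k v) k v (AVLMap k v)    -- balanced
       | MP (AVLMap k v) k v (AVLMap k v)    -- left subtree deeper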
The total space overhead for the tree would be smaller, because the pair constructor is inlined into the node: 5 words per tree node instead of 7, and one fewer eval each time you examine a node. However, if a lot of rebalancing happens on the tree, then being able to share the key/value pairs might be a win. Code duplication is clearly a problem, though. Anyway, t'was just a thought...
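(A rough sketch of the accounting behind the 5-vs-7 figure, assuming GHC-style heap objects of one word for the constructor header plus one per field:)

    --   Z l (k,v) r    -- generic node:      1+3 = 4 words
    --   (,) k v        -- plus its pair:     1+2 = 3 words  => 7 total
    --
    --   MZ l k v r     -- specialised node:  1+4 = 5 words  => 5 total
    --
    -- Rebalancing rebuilds nodes in both representations, but the
    -- generic one can keep sharing the existing (k,v) pairs, which
    -- is the possible win mentioned above.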
I'd also like to book "Data.COrdering", if that's OK.
Data.COrdering is less obviously the right thing. Can you give some motivation for the design here?
Because you generally do end up combining values which turn out to be "equal" in some manner. If you don't use a function which returns a COrdering, then you have to either..
Pass the comparison and combining function separately down your recursive search or whatever. But this adds runtime overhead which is often unnecessary, since the combining function is only ever used when the comparison returns EQ.
or..
Use some default combining function (like pairing) and apply the real combining function later. But pairing is the only reasonable default combining function I can think of, and it's not always the right type (e.g. for pushing elements in a tree, or set union), and even when it's OK type-wise you've still got the overhead of an additional map operation (e.g. for set intersection). You could get around the type problem using lists, I suppose, and use (++) as the default combining function, but this doesn't seem very attractive to me.
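(A sketch of the COrdering idea; the actual Data.COrdering may differ in detail, and the helper name below is hypothetical:)

    -- Like Ordering, but the "equal" case carries the result of
    -- combining the two equal values:
    data COrdering a = Lt | Eq a | Gt

    -- Fuse an ordinary comparison with a combining function, so
    -- the combining function is only applied when the comparison
    -- returns EQ:
    withCombine :: (e -> e -> Ordering) -> (e -> e -> a)
                -> (e -> e -> COrdering a)
    withCombine cmp comb x y = case cmp x y of
      LT -> Lt
      EQ -> Eq (comb x y)
      GT -> Gt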
Right, I think I see the reasoning. In the case of writing/pushing, the COrdering allows you to pass around a composition of a comparison and a combining function, which looks cleaner. It's not obviously more efficient than passing the two functions around, because the actual comparisons will be going through an extra level of higher-order function (except that you've INLINEd everything in sight, so the COrdering combinators probably disappear :-). In the case of lookup, it doesn't look like you need COrdering: just apply the "combining function" to the result of genRead/genTryRead, right? But given that you're using COrdering for writes, using it for reads too is reasonable. Ok, I guess I'm more or less convinced.

Cheers,
Simon
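(For reference, a hypothetical sketch of a COrdering-driven search in the style of genTryRead, using the AVL and COrdering types sketched above; the real function's name order and signature are assumptions:)

    -- Search an AVL tree with a single combining comparison: the
    -- caller's function says which way to go, and on "equal" it
    -- has already produced the result.
    genTryRead :: (e -> COrdering a) -> AVL e -> Maybe a
    genTryRead sel = go
     where
      go  E        = Nothing
      go (N l e r) = step l e r
      go (Z l e r) = step l e r
      go (P l e r) = step l e r
      step l e r = case sel e of
        Lt   -> go l      -- search key is below e
        Eq a -> Just a    -- found: a is the combined result
        Gt   -> go r      -- search key is above e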