
On Wednesday 19 May 2004 10:47 am, Simon Marlow wrote:
I'm happy for this to be "the" Data.Trees.AVL library. I'm sure others will want to comment on the details. I wonder whether it would make sense to have a specialised AVLMap type, which would let you optimise the representation of AVL-based finite maps?
Do you mean a newtype wrapper for the current AVL type or distinct algebraic type which contains both the key and the associated value as separate fields (like the Map type folks were talking about recently)? If you mean the former, one thing I was thinking about doing was creating an AVL based clone of the current Data.FiniteMap library so folk could experiment with the effect of using both on real programs just by changing their imports. If you mean the latter, I'm not keen on this because it would mean duplication of a lot of code for the new type, and I'm not sure that it would be an optimisation at all. I considered this but decided against it on the grounds that I suspect a major contributor to the true cost of tree operations is heap burn rate. The AVL tree type has only 3 fields in each constructor (not 5 as with the Map type for example), and the code was carefully written to avoid unnecessary heap construction wherever possible. Doing this would result in 4 fields in each constructor. But I may be wrong about this, and I guess it wouldn't be to hard to find out for sure by doing a cut and paste job once the current AVL code is a bit more stable. Keeping pairs separately has it's own overheads, like an additional level of indirection during searches and more heap records (so possibly slower gc). But if the tree is so stable that lookup times dominate costs it may be better to convert the entire thing into an array (or maybe an unboxed array of Ints and a corresponding array of values) and do binary searches on that instead.
I'd also like to book "Data.COrdering" too, if that's OK.
Data.COrdering is less obviously the right thing. Can you give some motivation for the design here?
Because you you generally do end up combining values which turn out to be "equal", in some manner. If you don't use a function which returns a COrdering, then you have to either.. Pass the comparison and combining function separately down your recursive search or whatever. But this will have extra runtime overhead, which is often unnecessary since the only time the combining function is ever used is when the comparison returns EQ. or.. Use some default combining function (like pairing) and apply the real combining function later. But pairing is the only reasonable default combining function I can think of, and this not always the right type (e.g. for pushing elements in a tree or set union), and even when it's OK type wise you've still got the overhead of an additional map operation (e.g for set intersection). You could get around the type problem using lists I suppose, and use (++) as the default combining function, but this doesn't seem very attractive to me. Anyway, if that doesn't persuade you I can give a more pragmatic reason :-) The AVL library depends on COrdering, so if it's going to exist then COrdering has to exist too. I could make it part of the AVL library, but since COrdering is not in any way dependent on the AVL library (and potentially useful in it's own right elsewhere) then I think it should exist independently of the AVL library. This will stop people having to reinvent their own (incompatible) COrdering equivalents. Of course, I'm open to persuasion regarding the type name and the support functions provided in the module, if anybody has strong objections to what I've chosen. Regards -- Adrian Hey