
On 20 May 2004 07:50, Adrian Hey wrote:
On Wednesday 19 May 2004 10:47 am, Simon Marlow wrote:
I'm happy for this to be "the" Data.Trees.AVL library. I'm sure others will want to comment on the details. I wonder whether it would make sense to have a specialised AVLMap type, which would let you optimise the representation of AVL-based finite maps?
Do you mean a newtype wrapper for the current AVL type, or a distinct algebraic type which contains both the key and the associated value as separate fields (like the Map type folks were talking about recently)?
If you mean the former, one thing I was thinking about doing was creating an AVL based clone of the current Data.FiniteMap library so folk could experiment with the effect of using both on real programs just by changing their imports.
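(For concreteness, a hypothetical sketch of that "former" option; names and details are illustrative rather than the library's actual API:)

    -- The AVL type encodes the balance factor in the constructor,
    -- so each node carries just three fields (a sketch; the real
    -- Data.Tree.AVL may differ in detail):
    data AVL e = E                      -- empty
               | N (AVL e) e (AVL e)    -- right subtree deeper
               | Z (AVL e) e (AVL e)    -- balanced
               | P (AVL e) e (AVL e)    -- left subtree deeper

    -- A Data.FiniteMap-style clone is then a newtype over an AVL
    -- of (key,value) pairs, so client code need only change its
    -- import:
    newtype FiniteMap k v = FM (AVL (k, v))

    lookupFM :: Ord k => FiniteMap k v -> k -> Maybe v
    lookupFM (FM t) k = go t
     where
      go  E        = Nothing
      go (N l e r) = step l e r
      go (Z l e r) = step l e r
      go (P l e r) = step l e r
      step l (k', v) r = case compare k k' of
        LT -> go l
        EQ -> Just v
        GT -> go r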
Actually I meant the latter, but this is a good idea too. Data.Map.AVL?
If you mean the latter, I'm not keen on this because it would mean duplicating a lot of code for the new type, and I'm not sure that it would be an optimisation at all. I considered this but decided against it on the grounds that I suspect a major contributor to the true cost of tree operations is heap burn rate. The AVL tree type has only 3 fields in each constructor (not 5 as with the Map type, for example), and the code was carefully written to avoid unnecessary heap construction wherever possible. Specialising would result in 4 fields in each constructor.
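(For concreteness, a hypothetical sketch of the specialised map type being argued against here, alongside the 3-field representation sketched earlier; constructor names are illustrative:)

    -- A specialised AVL map would carry key and value as separate
    -- node fields: 4 fields per constructor instead of 3.
    data AVLMap k v
       = ME                                  -- empty
       | MN (AVLMap k v) k v (AVLMap k v)    -- right subtree deeper
       | MZ (AVLMap k v) k v (AVLMap k v)    -- balanced
       | MP (AVLMap k v) k v (AVLMap k v)    -- left subtree deeper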
The total space overhead for the tree would be smaller, because the pair constructor is inlined into the node: 5 words per tree node instead of 7, and one fewer eval each time you examine a node. However, if a lot of rebalancing happens on the tree, then being able to share the key/value pairs might be a win. Code duplication is clearly a problem, though. Anyway, t'was just a thought...
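(A rough sketch of the accounting behind the 5-vs-7 figure, assuming GHC-style heap objects of one word for the constructor header plus one per field:)

    --   Z l (k,v) r    -- generic node:      1+3 = 4 words
    --   (,) k v        -- plus its pair:     1+2 = 3 words  => 7 total
    --
    --   MZ l k v r     -- specialised node:  1+4 = 5 words  => 5 total
    --
    -- Rebalancing rebuilds nodes in both representations, but the
    -- generic one can keep sharing the existing (k,v) pairs, which
    -- is the possible win mentioned above.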
I'd also like to book "Data.COrdering", if that's OK.
Data.COrdering is less obviously the right thing. Can you give some motivation for the design here?
Because you generally do end up combining values which turn out to be "equal" in some manner. If you don't use a function which returns a COrdering, then you have to either..
Pass the comparison and combining function separately down your recursive search or whatever. But this adds runtime overhead which is often unnecessary, since the combining function is only ever used when the comparison returns EQ.
or..
Use some default combining function (like pairing) and apply the real combining function later. But pairing is the only reasonable default combining function I can think of, and it's not always the right type (e.g. for pushing elements in a tree, or set union), and even when it's OK type-wise you've still got the overhead of an additional map operation (e.g. for set intersection). You could get around the type problem using lists, I suppose, and use (++) as the default combining function, but this doesn't seem very attractive to me.
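(A sketch of the COrdering idea; the actual Data.COrdering may differ in detail, and the helper name below is hypothetical:)

    -- Like Ordering, but the "equal" case carries the result of
    -- combining the two equal values:
    data COrdering a = Lt | Eq a | Gt

    -- Fuse an ordinary comparison with a combining function, so
    -- the combining function is only applied when the comparison
    -- returns EQ:
    withCombine :: (e -> e -> Ordering) -> (e -> e -> a)
                -> (e -> e -> COrdering a)
    withCombine cmp comb x y = case cmp x y of
      LT -> Lt
      EQ -> Eq (comb x y)
      GT -> Gt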
Right, I think I see the reasoning. In the case of writing/pushing, the COrdering allows you to pass around a composition of a comparison and a combining function, which looks cleaner. It's not obviously more efficient than passing the two functions around, because the actual comparisons will be going through an extra level of higher-order function (except that you've INLINEd everything in sight, so the COrdering combinators probably disappear :-). In the case of lookup, it doesn't look like you need COrdering: just apply the "combining function" to the result of genRead/genTryRead, right? But given that you're using COrdering for writes, using it for reads too is reasonable. Ok, I guess I'm more or less convinced.

Cheers,
Simon
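(For reference, a hypothetical sketch of a COrdering-driven search in the style of genTryRead, using the AVL and COrdering types sketched above; the real function's name order and signature are assumptions:)

    -- Search an AVL tree with a single combining comparison: the
    -- caller's function says which way to go, and on "equal" it
    -- has already produced the result.
    genTryRead :: (e -> COrdering a) -> AVL e -> Maybe a
    genTryRead sel = go
     where
      go  E        = Nothing
      go (N l e r) = step l e r
      go (Z l e r) = step l e r
      go (P l e r) = step l e r
      step l e r = case sel e of
        Lt   -> go l      -- search key is below e
        Eq a -> Just a    -- found: a is the combined result
        Gt   -> go r      -- search key is above e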