
Jean-Philippe Bernardy wrote:
On 2/27/06, Ross Paterson
wrote: We are planning to move to AVL trees for A. Hey, which do not provide a O(1) size function. What will be gain for this move? And what will we lose, apart from O(1) size?
The gains: 1. Christian Maeder and others report a significant performance improvement. This is the main motivating factor.
Indeed, the gain is significant (at least 5% lookup speedup!). Adrian Hey simply proved to me that AVL trees (with his clever constructor choice without an explicit height) are a better data structure than Adam's "Efficient sets: a balancing act". And it's sort of regrettable that the old FiniteMap and the current Data.Map are based on the latter (until someone finds a speedup for Adam's sets)!
2. The AVL tree library is very comprehensive and can be very useful by itself.
I don't share this point. The trees suffer the severe problem that the order argument can easily be mixed up. So I'd prefer not to directly use these trees. I also did not like the hscpp stuff and the many (tricky?) modules when installing it.
3. A separate balanced Tree infrastructure provide many benefits: * Shared code between Map and Set. * Conversion between those should then become more efficient (requested by John Meacham)
Without sharing the current Data.Map is faster and even the currently fastest "AVL (k, a)" would become faster. However, I would not sacrifice sharing for an only marginal speedup (below 1%). (But it would be interesting to know the speedup.) further personal judgements: 1. Data.IntMap and Data.IntSet don't need to be touched since they are fast. 2. (portable) Data.Map and Data.Set modules belong into the base package. These data structures would be useful for ghc itself. 3. Jean-Philippe's collection framework may sit on top in a different package (supporting more generic applications) 4. A few interface cleanups should be no problem Cheers Christian