
Hello Folks,
Sorry to post this here but my ISP seems to have been blacklisted by
University of Bergen SpamAssassin or something and I didn't want to
appear to ignore Ketil.
-------- Original Message --------
Subject: Re: Data.Map, Data.IntMap documentation
Date: Mon, 20 Aug 2007 09:05:56 +0100
From: Adrian Hey
On Thu, 2007-08-16 at 11:52 +0100, Adrian Hey wrote:
http://darcs.haskell.org/packages/collections-ghc6.6/Data/Map/AVL.hs http://darcs.haskell.org/packages/collections-ghc6.6/Data.Trie.General/Data/...
I have some Map-intensive code which may be a useful benchmark for these (i.e. most of the time is spent building and then doing lookups in a large Map - 100s of Mb, typically).
Is AVL compatible enough that I would only need to change the imports to test this?
It should be, unless you're using indexing or something. Otherwise you might get some deprecation warnings (but those functions that are deprecated are still implemented). Also, the Monoid instance is different. Also, size is O(n), not O(1). (But the only reason this is O(1) with Data.Map is that you pay extra for this information on every operation that created the Map in the first place).
What kind of performance would you expect compared to the regular Map?
It depends what your doing. There are some aspects of the current Data.Map that will cause sucky performance in some circumstances, but if your not falling foul of those then I would expect a modest but not particularly dramatic performance increase. Particular problems with Data.Map seem to be... 1 - Poor balancing can mean it can take over twice as long to grow a tree by insertion in "pathological" cases (keys are in fact sorted but you don't know in advance that they are sorted). 2 - For union,intesection etc. Hedge algorithm seems to require something 4 to 5 times as many comparisons as divide and conquer for large maps. Of course what effect this actually has depends on key type and cost of comparison. The clone also gives you much better strictness contol, which may have some influence on performance, space behaviour etc for better or worse (but only if you modify your code to use it of course :-). The only possible adverse factor with the clone is that there's an extra level of indirection overhead in that the underlying implementation is an AVL tree of boxed (key,value) pairs. Again how significant this is depends on key type and comparison costs. For lookup it may be significant for simple keys with cheap comparison. There's also some advantage in reduced heap burn rate as a result of this during insertion etc.. (smaller tree records), and of course better balancing also helps keep paths shorter and heap burn rate down. Anyway, it'd be good to get some feedback re what effect it has on your program, or even whether or not it works :-). It hasn't been given any real thorough tests as such (but then again neither has Data.Map if bug reports are anything to go by). But it's a fairly simple wrapper around AVL (which has been thoroughly tested), so hopefully there's not too much wrong with it. If you try it you should checkout the darcs repository (the hackage version is out of date and nothing like the version in the repository). Regards -- Adrian Hey