
- Since someone raised the issue of test suites for performance and correctness, I think it would also be interesting to investigate what effect the strictness annotations in the tree constructors have on performance. Everyone takes for granted that bangs=faster, but I've noticed, as have others, that removing these strictness annotations actually make things run faster. Instead, the tree construction _functions_ should be made strict using seq. The Map construction functions are already strict enough on account of the balancing. Try it for yourself, remove the bangs from the sub-tree fields in the Bin constructor, and run a little benchmark. This would open up the possibility for lazy versions of functions like mapWithKey and mapKeysMonotonic. I had mentioned this previously, but few seemed to care, so I just made the change locally. http://www.haskell.org/pipermail/libraries/2008-August/010371.html . If we we're going to start a little Map/Set/IntMap/IntSet working group, then I'd like to throw that idea back into the mix.
Of course, anything that boosts performance is a win. Does anyone know of any Map benchmarks? I suppose I could write some, but I'd be sort of surprised if someone else hasn't already done that. Then we could throw in all the various avl / redblack / whatever map variants in there and see how they stack up.
IntMap is hard coded to Int, which is 32 bits on a 32 bit architecture, and 64 bits on a 64 bit architecture. I think the main reason for this is so that the key can be unpacked/specialized for extra performance, since that is the primary purpose of the library. If you just need sublinear insert/lookup/delete then use a Map.
To be honest, I had no particular rational reason for wanting IntMap (and in fact I'm not using it). I just heard some mumbling on the list about "Data.Map so slow, IntMap so much faster especially for unions". Though to be honest, probably what most matters to me is how much garbage an insert generates (and similarly how much the resulting structures will share). I didn't run any tests or anything, so I wasn't that upset to not be able to use IntMap :)
Are you galled that the key type is not hardcoded to Int64, or that the key is not allowed to be any instance of the Bits class? In the former case, you'd inflate the size of the tree and take a small speed hit on 32bit archs, and in the latter I think you'd have even worse performance. I think the choice of Int is wise because it is the fastest, and that is the point of the library, and if you think about it you're not going to be dealing with more that 2^32 Ints on a 32 bit computer. (What would they be referencing? Not array positions or file offsets.) If you're trying to
Points in time. I was using Int64 at the time, but I'm actually using Doubles now. Plain Data.Map is working fine in practice.
optimize, say, "Map (Int32,Int32) a" into "IntMap64 a", there are other ways to accomplish that:
Ah yes, splitting the numbers would work for 64 bit words.