RE: Data.* collections maintenance

newer
RE: [Haskell] Read Instances for...

older
Data.* collections maintenance

Simon Marlow

24 Oct 2005 24 Oct '05

9:43 a.m.

On 22 October 2005 11:58, Adrian Hey wrote:

...

One problem with AVL is the complexity of most of the code means producing specialised unboxed implementations manually is not an attractive proposition. Hopefully compilers will do this sort of thing for us one day.

But in reality I expect this sort of thing to be more of an issue for benchmark style code using Ints. For more complex keys and comparisons the indirection cost will be relatively insignificant I think. For polymorpic implementations counting the actual number of comparisons required by some function is just as important IMO.

It's not just benchmark code - in GHC we use Int-based maps all over the place for speed. Given your promising results for polymorphic AVL trees, it's a fair bet that you could beat Data.IntMap with a specialised Int version. If you did, we would be forced to use Data.Tree.AVL in GHC :) Cheers, Simon

Show replies by date

Sebastian Sylvan

24 Oct 24 Oct

10:17 a.m.

New subject: Data.* collections maintenance

On 10/24/05, Simon Marlow wrote:

...

On 22 October 2005 11:58, Adrian Hey wrote:

...
One problem with AVL is the complexity of most of the code means producing specialised unboxed implementations manually is not an attractive proposition. Hopefully compilers will do this sort of thing for us one day.

But in reality I expect this sort of thing to be more of an issue for benchmark style code using Ints. For more complex keys and comparisons the indirection cost will be relatively insignificant I think. For polymorpic implementations counting the actual number of comparisons required by some function is just as important IMO.

It's not just benchmark code - in GHC we use Int-based maps all over the place for speed. Given your promising results for polymorphic AVL trees, it's a fair bet that you could beat Data.IntMap with a specialised Int version. If you did, we would be forced to use Data.Tree.AVL in GHC :)

Wouldn't an Array (or DiffArray) based HashMap be even better? /S -- Sebastian Sylvan +46(0)736-818655 UIN: 44640862

Adrian Hey

1:02 p.m.

New subject: Data.* collections maintenance

On Monday 24 Oct 2005 10:43 am, Simon Marlow wrote:

...

It's not just benchmark code - in GHC we use Int-based maps all over the place for speed.

Sure. What I meant was that the only reason anybody is likely to use the polymorphic version with Ints is for benchmarking the polymorphic version. In real code they'll be using something better suited to Ints. I didn't mean IntMaps/Sets wouldn't be used at all in real code.

...

Given your promising results for polymorphic AVL trees, it's a fair bet that you could beat Data.IntMap with a specialised Int version.

Well, as it happens I do have a cut down unboxed version of AVL for IntMaps which I produced just to see how it performed. I'm sure you're all eager to know how it compared with Data.IntMap. Results were a bit mixed.. IIRC lookup was about 30% faster, insertion and deletion were about 10% slower and union was about 70% slower. The union used the same divide and conquer algorithm as the proper AVL library. This might be a case where comparison is so cheap that the Hedge algorithm would be a net win. But as I felt disinclined to write the code needed to find out for sure that is just speculation :-) Regards -- Adrian Hey

Daan Leijen

1:24 p.m.

New subject: Data.* collections maintenance

Adrian Hey wrote:

...

IIRC lookup was about 30% faster, insertion and deletion were about 10% slower and union was about 70% slower.

The union used the same divide and conquer algorithm as the proper AVL library. This might be a case where comparison is so cheap that the Hedge algorithm would be a net win. But as I felt disinclined to write the code needed to find out for sure that is just speculation :-)

Note that IntMap/IntSet use big-endian patricia trees and that therefore union is very efficient (which is good for compilers that collect for example free variables bottom up..). i.e. The IntMap/IntSet do not use a "hedge" algorithm whatsoever but something completely different -- and it makes no sense to compare comparisons here. It is amazing that lookup is faster on AVL trees -- this goes against all my intuition -- did you measure random lookups? All the best, -- Daan.

...

Regards -- Adrian Hey

_______________________________________________ Libraries mailing list Libraries@haskell.org http://www.haskell.org/mailman/listinfo/libraries

Adrian Hey

25 Oct 25 Oct

10:27 a.m.

New subject: Data.* collections maintenance

On Monday 24 Oct 2005 2:24 pm, Daan Leijen wrote:

...

It is amazing that lookup is faster on AVL trees -- this goes against all my intuition -- did you measure random lookups?

Yes, I've just run the benchmarks again (with ghc 6.4), results for maps of 500000 random Ints are (time in no particular units).. Data.IntMap AVL.IntMap Insert 214.0 227.0 Lookup 146.6 104.8 Delete 253.0 254.0 Union 66.0 96.0 Intersection 15.0 41.0 However, I seem to recall somebody finding a bug somewhere in Data.IntMap a while ago. I'm not sure if that's the case, but if so in may be influencing the timings, for better or worse. Can anybody enlighten me about that? (or refute this allegation :-) I can supply AVL.IntMap code and benchmark code if anybody's interested. Regards -- Adrian Hey

7197

Age (days ago)

7198

Last active (days ago)

List overview

Download

4 comments

4 participants

participants (4)

Adrian Hey
Daan Leijen
Sebastian Sylvan
Simon Marlow