
Hi,

I have a Haskell program I'm trying to optimise and could use some assistance. It's for rank aggregation - taking a bunch of partial rankings of some items from users and turning them into an overall ranking (aka "that thing that Hammer Principle does").

The code is here: https://github.com/DRMacIver/hs-rank-aggregation. The interesting code lives in the Algorithms.RankAggregation module. The sample data I'm running it on is samples/electornot (voting data from http://electornot.org.uk/). Here's a recent profile: https://gist.github.com/1ee9356f45330e9f8caa

It's doing pretty well on this data, in that it takes 8 seconds for 180k records on my computer (down from about 80 seconds earlier today), but I'd like it to be faster - ideally under a second. Admittedly this is an arbitrary figure and I will just move the goalposts if I achieve it, but it would still be nice. :-)

I suspect it may be hitting the limits of what can be improved without actually changing the algorithm (which I'm open to doing, but not if it makes the results significantly worse) - but the last three or four times I've thought that, it was shortly followed by an optimisation which halved the runtime or better.

The most likely avenue for improvement is the fact that it allocates nearly 5GB of data, which appears to be mostly transient: watching the process as it runs, memory usage never seems to rise above 150MB, which is large but unproblematic. The biggest culprit is divideAndConquer, which is probably generating a silly number of intermediate lists, but it's not obvious how to fix that - maybe by making the tree structure in the recursion more explicit (rough sketch of what I mean in the P.S. below)? I don't know.

Suggestions very welcome.

Regards,
David
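
P.S. To make "making the tree structure more explicit" concrete, here's the rough shape of what I have in mind. It's only a sketch: the names (DCTree, buildTree, foldTree, divideAndConquerV, leafSize) and types are made up rather than taken from the repo, and it assumes the combine step only needs the two sub-results. The idea is to build the recursion tree once over Data.Vector slices (splitAt on a boxed Vector is O(1) and copy-free), then fold over that tree, instead of re-splitting lists at every level:

    import qualified Data.Vector as V

    -- Explicit recursion tree: leaves hold small chunks, branches hold the
    -- two halves whose results get combined.
    data DCTree a = Leaf (V.Vector a)
                  | Branch (DCTree a) (DCTree a)

    -- Build the tree by slicing (no copying) rather than splitting lists.
    buildTree :: Int -> V.Vector a -> DCTree a
    buildTree leafSize v
      | V.length v <= leafSize = Leaf v
      | otherwise              = Branch (buildTree leafSize l) (buildTree leafSize r)
      where (l, r) = V.splitAt (V.length v `div` 2) v

    -- Fold the tree with a base case for leaves and a combining step for branches.
    foldTree :: (V.Vector a -> b) -> (b -> b -> b) -> DCTree a -> b
    foldTree base _     (Leaf v)     = base v
    foldTree base merge (Branch l r) = merge (foldTree base merge l)
                                             (foldTree base merge r)

    -- Hypothetical driver: whatever the real base case and merge step are
    -- would slot in here, e.g. ranking a small chunk and merging two rankings.
    divideAndConquerV :: Int -> (V.Vector a -> b) -> (b -> b -> b) -> [a] -> b
    divideAndConquerV leafSize base merge =
      foldTree base merge . buildTree leafSize . V.fromList

If the merge step itself builds big intermediate lists this obviously doesn't save anything there, but it would at least stop the splitting side of the recursion from allocating.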