
Hi,

I have a Haskell program I'm trying to optimise and could use some assistance. It's for rank aggregation - taking a bunch of partial rankings of some items from users and turning them into an overall ranking (aka "that thing that Hammer Principle does").

The code is here: https://github.com/DRMacIver/hs-rank-aggregation. The interesting code lives in the Algorithms.RankAggregation module. The sample data I'm running it on is samples/electornot (voting data from http://electornot.org.uk/). Here's a recent profile: https://gist.github.com/1ee9356f45330e9f8caa

It's doing pretty well on this data, in that it takes 8 seconds for 180k records on my computer (down from about 80 seconds earlier today), but I'd like it to be faster - ideally under a second. Admittedly this is an arbitrary figure and I will just move the goalposts if I achieve it, but it would still be nice. :-)

I suspect it may be hitting the limits of what can be improved without actually changing the algorithm (which I'm open to doing, but not if it makes the results significantly worse) - but the last three or four times I've thought that, it was shortly followed by an optimisation which halved the runtime or better.

The most likely avenue for improvement is the fact that it allocates nearly 5GB of data, which appears to be mostly transient: watching the process as it runs, memory usage never seems to rise above 150MB, which is large but unproblematic. The biggest culprit is divideAndConquer, which is probably generating a silly number of intermediate lists, but it's not obvious how to fix that - maybe by making the tree structure in the recursion more explicit (rough sketch of what I mean in the P.S. below)? I don't know.

Suggestions very welcome.

Regards,
David
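
P.S. To make "making the tree structure more explicit" concrete, here's the rough shape of what I have in mind. It's only a sketch: the names (DCTree, buildTree, foldTree, divideAndConquerV, leafSize) and types are made up rather than taken from the repo, and it assumes the combine step only needs the two sub-results. The idea is to build the recursion tree once over Data.Vector slices (splitAt on a boxed Vector is O(1) and copy-free), then fold over that tree, instead of re-splitting lists at every level:

    import qualified Data.Vector as V

    -- Explicit recursion tree: leaves hold small chunks, branches hold the
    -- two halves whose results get combined.
    data DCTree a = Leaf (V.Vector a)
                  | Branch (DCTree a) (DCTree a)

    -- Build the tree by slicing (no copying) rather than splitting lists.
    buildTree :: Int -> V.Vector a -> DCTree a
    buildTree leafSize v
      | V.length v <= leafSize = Leaf v
      | otherwise              = Branch (buildTree leafSize l) (buildTree leafSize r)
      where (l, r) = V.splitAt (V.length v `div` 2) v

    -- Fold the tree with a base case for leaves and a combining step for branches.
    foldTree :: (V.Vector a -> b) -> (b -> b -> b) -> DCTree a -> b
    foldTree base _     (Leaf v)     = base v
    foldTree base merge (Branch l r) = merge (foldTree base merge l)
                                             (foldTree base merge r)

    -- Hypothetical driver: whatever the real base case and merge step are
    -- would slot in here, e.g. ranking a small chunk and merging two rankings.
    divideAndConquerV :: Int -> (V.Vector a -> b) -> (b -> b -> b) -> [a] -> b
    divideAndConquerV leafSize base merge =
      foldTree base merge . buildTree leafSize . V.fromList

If the merge step itself builds big intermediate lists this obviously doesn't save anything there, but it would at least stop the splitting side of the recursion from allocating.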