Sorting efficiency

newer
Talk about Haskell at Ubucon in...

David Feuer

4 Aug 2012 4 Aug '12

5:23 p.m.

I'm writing a toy program (for a SPOJ problem--see https://www.spoj.pl/problems/ABCDEF/ ) and the profiler says my performance problem is that I'm spending too much time sorting. I'm using Data.List.sort on [Int32] (it's a 32-bit architecture). Others, using other languages, have managed to solve the problem within the time limit using the same approach I've taken (I believe), but mine is taking too long. Any suggestions? Do I need to do something insane like sorting in an STUArray? David Feuer

Show replies by date

Clark Gaebel

4 Aug 4 Aug

7:55 p.m.

It's generally not advisable to use Data.List for performance-sensitive parts of an application. Try using Data.Vector instead: http://hackage.haskell.org/package/vector On Sat, Aug 4, 2012 at 11:23 AM, David Feuer wrote:

...

I'm writing a toy program (for a SPOJ problem--see https://www.spoj.pl/problems/ABCDEF/ ) and the profiler says my performance problem is that I'm spending too much time sorting. I'm using Data.List.sort on [Int32] (it's a 32-bit architecture). Others, using other languages, have managed to solve the problem within the time limit using the same approach I've taken (I believe), but mine is taking too long. Any suggestions? Do I need to do something insane like sorting in an STUArray?

David Feuer

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

David Feuer

8:47 p.m.

I realized my algorithm is insane. The correct way to sort [a*b|a<-A, b<-B] is clearly to sort A and B, then for each a in A construct either map (a*) B or map (a*) (reverse B), depending on the sign of a, then merge all these results together with a merge that collapses duplicates. I was multiplying and then sorting, which is way worse. The same (modulo sign) goes for adding lists. On Aug 4, 2012 1:55 PM, "Clark Gaebel" wrote:

...

It's generally not advisable to use Data.List for performance-sensitive parts of an application.

Try using Data.Vector instead: http://hackage.haskell.org/package/vector

On Sat, Aug 4, 2012 at 11:23 AM, David Feuer wrote:

...
I'm writing a toy program (for a SPOJ problem--see https://www.spoj.pl/problems/ABCDEF/ ) and the profiler says my performance problem is that I'm spending too much time sorting. I'm using Data.List.sort on [Int32] (it's a 32-bit architecture). Others, using other languages, have managed to solve the problem within the time limit using the same approach I've taken (I believe), but mine is taking too long. Any suggestions? Do I need to do something insane like sorting in an STUArray?

David Feuer

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Heinrich Hördegen

5 Aug 5 Aug

9:59 a.m.

Hi David, can this help you? http://www.scribd.com/doc/81029312/5/Sorting-pairwise-sums Heinrich Am 04.08.2012 20:47, schrieb David Feuer:

...

I realized my algorithm is insane. The correct way to sort [a*b|a<-A, b<-B] is clearly to sort A and B, then for each a in A construct either map (a*) B or map (a*) (reverse B), depending on the sign of a, then merge all these results together with a merge that collapses duplicates. I was multiplying and then sorting, which is way worse. The same (modulo sign) goes for adding lists. On Aug 4, 2012 1:55 PM, "Clark Gaebel" wrote:

...
Its generally not advisable to use Data.List for performance-sensitive parts of an application.

Try using Data.Vector instead: http://hackage.haskell.org/package/vector [4]

On Sat, Aug 4, 2012 at 11:23 AM, David Feuer wrote:

...
Im writing a toy program (for a SPOJ problem--see https://www.spoj.pl/problems/ABCDEF/ [1] ) and the profiler says my performance problem is that Im spending too much time sorting. Im using Data.List.sort on [Int32] (its a 32-bit architecture). Others, using other languages, have managed to solve the problem within the time limit using the same approach Ive taken (I believe), but mine is taking too long. Any suggestions? Do I need to do something insane like sorting in an STUArray?

David Feuer

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org [2] http://www.haskell.org/mailman/listinfo/haskell-cafe [3]

Links: ------ [1] https://www.spoj.pl/problems/ABCDEF/ [2] mailto:Haskell-Cafe@haskell.org [3] http://www.haskell.org/mailman/listinfo/haskell-cafe [4] http://hackage.haskell.org/package/vector [5] mailto:david.feuer@gmail.com [6] mailto:cgaebel@uwaterloo.ca

David Feuer

11:37 a.m.

Unfortunately, I doubt it can. That algorithm reduces the number of comparisons a good bit, but the asymptotic complexity of its other operations remains the same, with apparently bad constant factors (according to the article). Also, that article describes the algorithm in terms of sorting [a+b| a<-A, b<-A], whereas I'm actually dealing (in the more substantial phase) with [a+b | a<-A, b<-B], where B is much larger than A. Using the merging approach I described in my last email, I can reduce the complexity from mn log(mn) in the naive algorithm to mn log (min {m,n}). Since m=100 and n~=10,000, I should be able to get nearly double the speed of the naive approach, as long as my multi-merge is fast enough. On Sun, Aug 5, 2012 at 3:59 AM, Heinrich Hördegen wrote:

...

Hi David,

can this help you?

http://www.scribd.com/doc/81029312/5/Sorting-pairwise-sums

Heinrich

Am 04.08.2012 20:47, schrieb David Feuer:

...
I realized my algorithm is insane. The correct way to sort [a*b|a<-A, b<-B] is clearly to sort A and B, then for each a in A construct either map (a*) B or map (a*) (reverse B), depending on the sign of a, then merge all these results together with a merge that collapses duplicates. I was multiplying and then sorting, which is way worse. The same (modulo sign) goes for adding lists. On Aug 4, 2012 1:55 PM, "Clark Gaebel" wrote:

...
Its generally not advisable to use Data.List for

performance-sensitive parts of an application.

Try using Data.Vector instead: http://hackage.haskell.org/package/vector [4]

On Sat, Aug 4, 2012 at 11:23 AM, David Feuer wrote:

...
Im writing a toy program (for a SPOJ problem--see https://www.spoj.pl/problems/ABCDEF/ [1] ) and the profiler says my performance problem is that Im spending too much time sorting. Im using Data.List.sort on [Int32] (its a 32-bit architecture).

Others, using other languages, have managed to solve the problem within the time limit using the same approach Ive taken (I believe), but

mine is taking too long. Any suggestions? Do I need to do something insane like sorting in an STUArray?

David Feuer

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org [2] http://www.haskell.org/mailman/listinfo/haskell-cafe [3]

Links: ------ [1] https://www.spoj.pl/problems/ABCDEF/ [2] mailto:Haskell-Cafe@haskell.org [3] http://www.haskell.org/mailman/listinfo/haskell-cafe [4] http://hackage.haskell.org/package/vector [5] mailto:david.feuer@gmail.com [6] mailto:cgaebel@uwaterloo.ca

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

David Feuer

11:45 a.m.

Where by "nearly double", of course, I really mean "nearly triple". I'm a little tired. David On Sun, Aug 5, 2012 at 5:37 AM, David Feuer wrote:

...

Unfortunately, I doubt it can. That algorithm reduces the number of comparisons a good bit, but the asymptotic complexity of its other operations remains the same, with apparently bad constant factors (according to the article). Also, that article describes the algorithm in terms of sorting [a+b| a<-A, b<-A], whereas I'm actually dealing (in the more substantial phase) with [a+b | a<-A, b<-B], where B is much larger than A. Using the merging approach I described in my last email, I can reduce the complexity from mn log(mn) in the naive algorithm to mn log (min {m,n}). Since m=100 and n~=10,000, I should be able to get nearly double the speed of the naive approach, as long as my multi-merge is fast enough.

On Sun, Aug 5, 2012 at 3:59 AM, Heinrich Hördegen wrote:

...
Hi David,

can this help you?

http://www.scribd.com/doc/81029312/5/Sorting-pairwise-sums

Heinrich

Am 04.08.2012 20:47, schrieb David Feuer:

...
I realized my algorithm is insane. The correct way to sort [a*b|a<-A, b<-B] is clearly to sort A and B, then for each a in A construct either map (a*) B or map (a*) (reverse B), depending on the sign of a, then merge all these results together with a merge that collapses duplicates. I was multiplying and then sorting, which is way worse. The same (modulo sign) goes for adding lists. On Aug 4, 2012 1:55 PM, "Clark Gaebel" wrote:

...
Its generally not advisable to use Data.List for

performance-sensitive parts of an application.

Try using Data.Vector instead: http://hackage.haskell.org/package/vector [4]

On Sat, Aug 4, 2012 at 11:23 AM, David Feuer wrote:

...
Im writing a toy program (for a SPOJ problem--see https://www.spoj.pl/problems/ABCDEF/ [1] ) and the profiler says my performance problem is that Im spending too much time sorting. Im using Data.List.sort on [Int32] (its a 32-bit architecture).

Others, using other languages, have managed to solve the problem within the time limit using the same approach Ive taken (I believe), but

mine is taking too long. Any suggestions? Do I need to do something insane like sorting in an STUArray?

David Feuer

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org [2] http://www.haskell.org/mailman/listinfo/haskell-cafe [3]

Links: ------ [1] https://www.spoj.pl/problems/ABCDEF/ [2] mailto:Haskell-Cafe@haskell.org [3] http://www.haskell.org/mailman/listinfo/haskell-cafe [4] http://hackage.haskell.org/package/vector [5] mailto:david.feuer@gmail.com [6] mailto:cgaebel@uwaterloo.ca

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

4892

Age (days ago)

4893

Last active (days ago)

List overview

Download

5 comments

3 participants

participants (3)

Clark Gaebel
David Feuer
Heinrich Hördegen