How to improve its performance ? - Haskell-Cafe - Haskell.org

newer
Parsec to parse tree structures?

How to improve its performance ?

older
Parsec monad transformer with IO?

zaxis

17 Mar 2010 17 Mar '10

11:53 p.m.

import Data.List combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = (map (x:) (combination xs) )++ (combination xs) samp = [1..100] allTwoGroup = [(x, samp\\x) | x <- combination samp] The above code is used to calculate all the two groups from sample data ? It is very slow ! Sincerely! ----- fac n = let { f = foldr (*) 1 [1..n] } in f -- View this message in context: http://old.nabble.com/How-to-improve-its-performance---tp27940036p27940036.h... Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Reply

Sign in to reply online Use email software

Show replies by date

Daniel Fischer

18 Mar 18 Mar

1:14 a.m.

Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:

import Data.List

combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as representing a set), I think. And, apart from the order in which they are generated, it's the same as Data.List.subsequences (only less efficient).

samp = [1..100] allTwoGroup = [(x, samp\\x) | x <- combination samp]

The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual). There are two points where you waste time. First, in combination (x:xs) you calculate (combination xs) twice. If the order in which the sublists come doesn't matter, it's better to do it only once: combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs] Second, (\\) is slow, xs \\ ys is O(length xs * length ys). Also, (\\) requires an Eq constraint. If you're willing to constrain the type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered lists, you can replace (\\) by the much faster difference of oredered lists (implementation left as an exercise for the reader). But you can work with unconstrained types, and faster, if you build the two complementary sublists at the same time. The idea is, -- An empty list has one partition into two complementary sublists: partitions2 [] = [([],[])] -- For a nonempty list (x:xs), the partitions into two complementary -- sublists each have x either in the first sublist or in the second. -- Each partition induces a corresponding partition of the tail, xs, -- by removing x from the group in which it appears. -- Conversely, every partition ox xs gives rise to two partitions -- of (x:xs), by adding x to either the first or the second sublist. So partitions2 (x:xs) = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ] We can also write the second case as partitions2 (x:xs) = concatMap (choice x) (partitions2 xs) where choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)] Now it's very easy to recognise that partitions2 is a fold, partitions2 xs = foldr (concatMap . choice) [([],[])] xs

Sincerely!

Reply

Sign in to reply online Use email software

Alexander Solla

1:24 a.m.

On Mar 17, 2010, at 6:14 PM, Daniel Fischer wrote:

I found it surprisingly not-slow (code compiled with -O2, as usual). There are two points where you waste time.

I found one big point where time is wasted: in computing the powerset of a list. He's making 2^n elements, and then iterating through them all and filtering, but only needs n^2 or n `choose` 2 of the (depending on the semantics for his "groups"). The answer is to do something like: allPairs list = [(x,y) | x <- list, y <- list] to get it done in n^2 time.

Reply

Sign in to reply online Use email software

zaxis

3:33 a.m.

`allPairs list = [(x,y) | x <- list, y <- list] ` is not what `combination` does !

let allPairs list = [(x,y) | x <- list, y <- list] allPairs [1,2,3] [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

combination [1,2,3] [[1,2,3],[2,3],[1,3],[3],[1,2],[2],[1],[]]

Alexander Solla-2 wrote:

On Mar 17, 2010, at 6:14 PM, Daniel Fischer wrote:

...
I found it surprisingly not-slow (code compiled with -O2, as usual). There are two points where you waste time.

I found one big point where time is wasted: in computing the powerset of a list. He's making 2^n elements, and then iterating through them all and filtering, but only needs n^2 or n `choose` 2 of the (depending on the semantics for his "groups").

The answer is to do something like:

allPairs list = [(x,y) | x <- list, y <- list]

to get it done in n^2 time. _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

----- fac n = let { f = foldr (*) 1 [1..n] } in f -- View this message in context: http://old.nabble.com/How-to-improve-its-performance---tp27940036p27941343.h... Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Reply

Sign in to reply online Use email software

Alexander Solla

4:03 a.m.

On Mar 17, 2010, at 8:33 PM, zaxis wrote:

`allPairs list = [(x,y) | x <- list, y <- list] ` is not what `combination` does !

...
let allPairs list = [(x,y) | x <- list, y <- list] allPairs [1,2,3] [(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Yeah, I know that. I said so specifically. combination computes the power set of a list. You said your goal was to compute a set of "two groups". You don't need the power set in order to compute a set of pairs. Moreover, computing the power set is a slow operation. Indeed, it is the source of your slowness.

Reply

Sign in to reply online Use email software

Ozgur Akgun

11:44 a.m.

As Daniel pointed out earlier, this may be of a little bit of help: http://haskell.org/ghc/docs/latest/html/libraries/base-4.2.0.0/src/Data-List... On 18 March 2010 04:03, Alexander Solla wrote:

On Mar 17, 2010, at 8:33 PM, zaxis wrote:

...
`allPairs list = [(x,y) | x <- list, y <- list] ` is not what `combination` does !

...
let allPairs list = [(x,y) | x <- list, y <- list] allPairs [1,2,3]

[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Yeah, I know that. I said so specifically. combination computes the power set of a list. You said your goal was to compute a set of "two groups". You don't need the power set in order to compute a set of pairs. Moreover, computing the power set is a slow operation. Indeed, it is the source of your slowness.

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Ozgur Akgun

Reply

Sign in to reply online Use email software

Daniel Fischer

4:13 p.m.

Am Donnerstag 18 März 2010 05:03:28 schrieb Alexander Solla:

On Mar 17, 2010, at 8:33 PM, zaxis wrote:

...
`allPairs list = [(x,y) | x <- list, y <- list] ` is not what `combination` does !

...
let allPairs list = [(x,y) | x <- list, y <- list] allPairs [1,2,3]

[(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(3,1),(3,2),(3,3)]

Yeah, I know that. I said so specifically. combination computes the power set of a list. You said your goal was to compute a set of "two groups". You don't need the power set in order to compute a set of pairs. Moreover, computing the power set is a slow operation. Indeed, it is the source of your slowness.

I think you've been fooled by the names. If you look at the code (or run it on a smaller sample), you'll see that allTwoGroup computes the list of all partitions of sample into two complementary sets. If sample were [1,2,3], the result would be (apart from the order) [([1,2,3],[]),([1,2],[3]),([1,3],[2]),([2,3],[1]),([1],[2,3]),([2],[1,3]), ([3],[1,2]),([],[1,2,3])] with sample = [1 .. 100], it's a list of 2^100 elements, that'll take a long time to compute no matter how.

Reply

Sign in to reply online Use email software

zaxis

3:29 a.m.

The time is wasted to run combination even if use `combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs] ' instead. in ghci

combination [1..20] will wait for a long time .......

Daniel Fischer-4 wrote:

Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:

...
import Data.List

combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as representing a set), I think. And, apart from the order in which they are generated, it's the same as Data.List.subsequences (only less efficient).

...
samp = [1..100] allTwoGroup = [(x, samp\\x) | x <- combination samp]

The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

...
? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual). There are two points where you waste time. First, in

combination (x:xs)

you calculate (combination xs) twice. If the order in which the sublists come doesn't matter, it's better to do it only once:

combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]

Second, (\\) is slow, xs \\ ys is O(length xs * length ys). Also, (\\) requires an Eq constraint. If you're willing to constrain the type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered lists, you can replace (\\) by the much faster difference of oredered lists (implementation left as an exercise for the reader).

But you can work with unconstrained types, and faster, if you build the two complementary sublists at the same time. The idea is, -- An empty list has one partition into two complementary sublists: partitions2 [] = [([],[])] -- For a nonempty list (x:xs), the partitions into two complementary -- sublists each have x either in the first sublist or in the second. -- Each partition induces a corresponding partition of the tail, xs, -- by removing x from the group in which it appears. -- Conversely, every partition ox xs gives rise to two partitions -- of (x:xs), by adding x to either the first or the second sublist. So partitions2 (x:xs) = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]

We can also write the second case as

partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)

where

choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]

Now it's very easy to recognise that partitions2 is a fold,

partitions2 xs = foldr (concatMap . choice) [([],[])] xs

...
Sincerely!

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

----- fac n = let { f = foldr (*) 1 [1..n] } in f -- View this message in context: http://old.nabble.com/How-to-improve-its-performance---tp27940036p27941317.h... Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Reply

Sign in to reply online Use email software

Daniel Fischer

5:16 p.m.

Am Donnerstag 18 März 2010 04:29:53 schrieb zaxis:

The time is wasted to run combination even if use `combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs] ' instead. in ghci

...
combination [1..20]

will wait for a long time .......

Hm, really? Prelude> :set +s Prelude> let combination [] = [[]]; combination (x:xs) = [x:ys | ys <- combination xs] ++ combination xs (0.00 secs, 1818304 bytes) Prelude> let combs [] = [[]]; combs (x:xs) = concat [[x:ys,ys] | ys <- combs xs] (0.00 secs, 2102280 bytes) Prelude> length [1 .. 2^20] 1048576 (0.07 secs, 49006024 bytes) Prelude> length $ combination [1 .. 20] 1048576 (8.28 secs, 915712452 bytes) Prelude> length $ combs [1 .. 20] 1048576 (0.78 secs, 146841964 bytes) That's interpreted, so not optimised. Optimisation narrows the gap, speeds both up significantly, so put combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = [x:ys | ys <- combination xs] ++ combination xs combs :: [a] -> [[a]] combs [] = [[]] combs (x:xs) = concat [[x:ys,ys] | ys <- combs xs] mcombs :: [a] -> [[a]] mcombs = foldr (flip (>>=) . f) [[]] where f x xs = [x:xs,xs] in a file, compile with -O2 and load into ghci: Prelude Parts> length $ combination [1 .. 20] 1048576 (0.16 secs, 43215220 bytes) Prelude Parts> length $ combs [1 .. 20] 1048576 (0.05 secs, 55573436 bytes) Prelude Parts> length $ mcombs [1 .. 20] 1048576 (0.06 secs, 55572692 bytes) Prelude Parts> length $ combination [1 .. 24] 16777216 (3.06 secs, 674742880 bytes) Prelude Parts> length $ combs [1 .. 24] 16777216 (0.62 secs, 881788880 bytes) Prelude Parts> length $ mcombs [1 .. 24] 16777216 (0.62 secs, 881788956 bytes) Prelude Parts> length [1 .. 2^24] 16777216 (0.64 secs, 675355184 bytes) So combs and the pointfree combinator version mcombs are equally fast and significantly faster than combination. In fact they're as fast as a simple enumeration. Now, if you actually let ghci print out the result, the printing takes a long time. So much that the difference in efficiency is hardly discernible or not at all.

Daniel Fischer-4 wrote:

...
Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:

...
import Data.List

combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as representing a set), I think. And, apart from the order in which they are generated, it's the same as Data.List.subsequences (only less efficient).

...
samp = [1..100] allTwoGroup = [(x, samp\\x) | x <- combination samp]

The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

...
? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual). There are two points where you waste time. First, in

combination (x:xs)

you calculate (combination xs) twice. If the order in which the sublists come doesn't matter, it's better to do it only once:

combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]

Second, (\\) is slow, xs \\ ys is O(length xs * length ys). Also, (\\) requires an Eq constraint. If you're willing to constrain the type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered lists, you can replace (\\) by the much faster difference of oredered lists (implementation left as an exercise for the reader).

But you can work with unconstrained types, and faster, if you build the two complementary sublists at the same time. The idea is, -- An empty list has one partition into two complementary sublists: partitions2 [] = [([],[])] -- For a nonempty list (x:xs), the partitions into two complementary -- sublists each have x either in the first sublist or in the second. -- Each partition induces a corresponding partition of the tail, xs, -- by removing x from the group in which it appears. -- Conversely, every partition ox xs gives rise to two partitions -- of (x:xs), by adding x to either the first or the second sublist. So partitions2 (x:xs) = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]

We can also write the second case as

partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)

where

choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]

Now it's very easy to recognise that partitions2 is a fold,

partitions2 xs = foldr (concatMap . choice) [([],[])] xs

Reply

Sign in to reply online Use email software

zaxis

19 Mar 19 Mar

1:56 a.m.

%cat Test.hs module Test(mcombs) where import Data.List mcombs = foldr (flip (>>=) . f) [[]] where f x xs = [x:xs,xs] %ghc -c -O2 Test.hs %ghci

:l Test Ok, modules loaded: Test.

:set +s length $ mcombs [1..20] 1048576 (0.06 secs, 56099528 bytes)

length $ mcombs [1..50] ^CInterrupted.

Daniel Fischer-4 wrote:

Am Donnerstag 18 März 2010 04:29:53 schrieb zaxis:

...
The time is wasted to run combination even if use `combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs] ' instead. in ghci

...
combination [1..20]

will wait for a long time .......

Hm, really?

Prelude> :set +s Prelude> let combination [] = [[]]; combination (x:xs) = [x:ys | ys <- combination xs] ++ combination xs (0.00 secs, 1818304 bytes) Prelude> let combs [] = [[]]; combs (x:xs) = concat [[x:ys,ys] | ys <- combs xs] (0.00 secs, 2102280 bytes) Prelude> length [1 .. 2^20] 1048576 (0.07 secs, 49006024 bytes) Prelude> length $ combination [1 .. 20] 1048576 (8.28 secs, 915712452 bytes) Prelude> length $ combs [1 .. 20] 1048576 (0.78 secs, 146841964 bytes)

That's interpreted, so not optimised. Optimisation narrows the gap, speeds both up significantly, so put

combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = [x:ys | ys <- combination xs] ++ combination xs

combs :: [a] -> [[a]] combs [] = [[]] combs (x:xs) = concat [[x:ys,ys] | ys <- combs xs]

mcombs :: [a] -> [[a]] mcombs = foldr (flip (>>=) . f) [[]] where f x xs = [x:xs,xs]

in a file, compile with -O2 and load into ghci:

Prelude Parts> length $ combination [1 .. 20] 1048576 (0.16 secs, 43215220 bytes) Prelude Parts> length $ combs [1 .. 20] 1048576 (0.05 secs, 55573436 bytes) Prelude Parts> length $ mcombs [1 .. 20] 1048576 (0.06 secs, 55572692 bytes) Prelude Parts> length $ combination [1 .. 24] 16777216 (3.06 secs, 674742880 bytes) Prelude Parts> length $ combs [1 .. 24] 16777216 (0.62 secs, 881788880 bytes) Prelude Parts> length $ mcombs [1 .. 24] 16777216 (0.62 secs, 881788956 bytes) Prelude Parts> length [1 .. 2^24] 16777216 (0.64 secs, 675355184 bytes)

So combs and the pointfree combinator version mcombs are equally fast and significantly faster than combination. In fact they're as fast as a simple enumeration.

Now, if you actually let ghci print out the result, the printing takes a long time. So much that the difference in efficiency is hardly discernible or not at all.

...
Daniel Fischer-4 wrote:

...
Am Donnerstag 18 März 2010 00:53:28 schrieb zaxis:

...
import Data.List

combination :: [a] -> [[a]] combination [] = [[]] combination (x:xs) = (map (x:) (combination xs) )++ (combination xs)

That would normally be called sublists (or subsets, if one regards lists as representing a set), I think. And, apart from the order in which they are generated, it's the same as Data.List.subsequences (only less efficient).

...
samp = [1..100] allTwoGroup = [(x, samp\\x) | x <- combination samp]

The above code is used to calculate all the two groups from sample data

All partitions into two sublists/sets/samples.

...
? It is very slow !

I found it surprisingly not-slow (code compiled with -O2, as usual). There are two points where you waste time. First, in

combination (x:xs)

you calculate (combination xs) twice. If the order in which the sublists come doesn't matter, it's better to do it only once:

combination (x:xs) = concat [(x:ys), ys] | ys <- combination xs]

Second, (\\) is slow, xs \\ ys is O(length xs * length ys). Also, (\\) requires an Eq constraint. If you're willing to constrain the type further, to (Ord a => [a] -> [([a],[a])]), and call it only on ordered lists, you can replace (\\) by the much faster difference of oredered lists (implementation left as an exercise for the reader).

But you can work with unconstrained types, and faster, if you build the two complementary sublists at the same time. The idea is, -- An empty list has one partition into two complementary sublists: partitions2 [] = [([],[])] -- For a nonempty list (x:xs), the partitions into two complementary -- sublists each have x either in the first sublist or in the second. -- Each partition induces a corresponding partition of the tail, xs, -- by removing x from the group in which it appears. -- Conversely, every partition ox xs gives rise to two partitions -- of (x:xs), by adding x to either the first or the second sublist. So partitions2 (x:xs) = concat [ [(x:ys,zs),(ys,x:zs)] | (ys,zs) <- partitions2 xs ]

We can also write the second case as

partitions2 (x:xs) = concatMap (choice x) (partitions2 xs)

where

choice x (ys,zs) = [(x:ys,zs),(ys,x:zs)]

Now it's very easy to recognise that partitions2 is a fold,

partitions2 xs = foldr (concatMap . choice) [([],[])] xs

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

----- fac n = let { f = foldr (*) 1 [1..n] } in f -- View this message in context: http://old.nabble.com/How-to-improve-its-performance---tp27940036p27950876.h... Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Reply

Sign in to reply online Use email software

Daniel Fischer

2:24 a.m.

Am Freitag 19 März 2010 02:56:36 schrieb zaxis:

%cat Test.hs module Test(mcombs) where import Data.List

mcombs = foldr (flip (>>=) . f) [[]] where f x xs = [x:xs,xs]

%ghc -c -O2 Test.hs %ghci

...
:l Test

Ok, modules loaded: Test.

...
:set +s

length $ mcombs [1..20] 1048576 (0.06 secs, 56099528 bytes)

...
length $ mcombs [1..50]

^CInterrupted.

Yes, 2^50 = 1125899906842624. If your computer generated 10^9 sublists per second, you'd wait more than ten days for the answer. Since something like 25 million - 100 million per second is more in the region of what ordinary computers can achieve, it'd rather be three months to over a year. Prelude Parts> mcombs [1 .. 50] !! 100000000 [1,2,3,4,5,6,7,8,10,11,12,13,18,20,26,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50] (4.35 secs, 5254369992 bytes) Prelude Parts> length $ replicate (10^8) () 100000000 (2.10 secs, 2806344336 bytes) Really, not bad, IMO.

Reply

Sign in to reply online Use email software

5596

Age (days ago)

5598

Last active (days ago)

Download

10 comments

4 participants

tags

participants (4)

Alexander Solla
Daniel Fischer
Ozgur Akgun
zaxis