Add 'subsequences' and 'permutations' to Data.List (ticket #1990)

Hello all, As Thomas Hartman noted in a recent cafe thread [1], haskell 1.3 included the functions 'subsequences' and 'permutations'. I think these functions are quite useful, and I don't know why they were ever removed. This is a proposal to add these two functions to Data.List. The implementation is taken directly from the Haskell 1.3 report [2]. Trac ticket: #1990 Discussion period ends: January 7th (since there is a holiday comming up) Twan [1] http://article.gmane.org/gmane.comp.lang.haskell.cafe/33535 [2] http://haskell.cs.yale.edu/haskell-report/List.html

Hi
As Thomas Hartman noted in a recent cafe thread [1], haskell 1.3 included the functions 'subsequences' and 'permutations'. I think these functions are quite useful, and I don't know why they were ever removed.
Agreed and support. Obvious functions with unambiguous names. Thanks Neil

Support, though I'd tweak the code in your patch a little: subsequences :: [a] -> [[a]] subsequences [] = [[]] subsequences (x:xs) = let s = subsequences xs in s ++ map (x:) s permutations :: [a] -> [[a]] permutations [] = [[]] permutations (x:xs) = concatMap interleave $ permutations xs where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys) In subsequences, make sure we don't calculate "subsequences xs" twice. In permutations, use : instead of ++ in one place, use concatMap instead of list comprehension, and take out unnecessary parameter to interleave.

David Benbennick wrote:
Support, though I'd tweak the code in your patch a little:
subsequences :: [a] -> [[a]] subsequences [] = [[]] subsequences (x:xs) = let s = subsequences xs in s ++ map (x:) s
This will cause a space leak, as 's' will be kept around while its first copy is consumed. (1) subsequences (x:xs) = let s = subsequences xs in concatMap (\xs -> [xs, x:xs]) s behaves better in that regard, but changes the order of the result lists. This version can also be made to work for infinite lists with a small modification, (2) subsequences (x:xs) = let s = subsequences xs in [] : tail (concatMap (\ys -> [ys, x:ys]) s) finally, we could make it slightly more lazy, as follows: (3) subsequences :: [a] -> [[a]] subsequences xs = [] : case xs of [] -> [] (x:xs) -> tail (concatMap (\ys -> [ys, x:ys]) (subsequences xs)) I prever both (2) and (3) over (1) and the original, and I'm leaning towards (3). (As an example where (3) is superior to (2), consider subsequences (1:2:undefined) (2) gives []:[1]:undefined, while (3) gives []:[1]:[2]:[1,2]:undefined) The only problem I see it that the output order is changed - does anybody feel that it is particularily important? Bertram

Bertram Felgenhauer wrote:
finally, we could make it slightly more lazy, as follows:
(3) subsequences :: [a] -> [[a]] subsequences xs = [] : case xs of [] -> [] (x:xs) -> tail (concatMap (\ys -> [ys, x:ys]) (subsequences xs))
You can get rid of the ugly call to 'tail': subsequences :: [a] -> [[a]] subsequences xs = [] : subsequences' xs where subsequences' [] = [] subsequences' (x:xs) = [x] : concatMap (\ys -> [ys, x:ys]) (subsequences' xs)
The only problem I see it that the output order is changed - does anybody feel that it is particularily important?
I don't think it matters much. This new version starts with things from the start of the list, as opposed to the end: subsequencesOld "abc" == ["","c","b","bc","a","ac","ab","abc"] subsequencesNew "abc" == ["","a","b","ab","c","ac","bc","abc"] If anything, I think this makes more sense, especially since it works for infinite lists. Twan

On Dec 18, 2007 12:56 PM, Bertram Felgenhauer
finally, we could make it slightly more lazy
Good point, your version is much better. The same issue applies to permutations. I haven't had time to write out the code yet, but I can imagine a version of permutations that does: permutations [1..] = [1...], [2,1, ...], [1,3,2 ...], [2,3,1 ...], [3,1,2 ...], [3,2,1 ...], ... so that the expression (take 10 $ map (take 10) $ permutations [1..]) isn't bottom.

David Benbennick wrote:
On Dec 18, 2007 12:56 PM, Bertram Felgenhauer
wrote: finally, we could make it slightly more lazy
Good point, your version is much better.
The same issue applies to permutations. I haven't had time to write out the code yet, but I can imagine a version of permutations that does: ...
Using mutual recursion between a version including the identity and one not including it, you can get: permutations1 :: [a] -> [[a]] permutations1 xxs = xxs : permutations3b xxs permutations1' [] = [] permutations1' (x:xs) = tail $ concatMap interleave $ permutations1 xs where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys) Testing: > map (take 5) $ take 5 $ permutations1 [1..] [[1,2,3,4,5],[2,1,3,4,5],[2,3,1,4,5],[2,3,4,1,5],[2,3,4,5,1]] Again this has a call to tail. We have that tail $ concatMap interleave $ permutations1 xs = tail $ concatMap interleave $ (xs : permutations1 xs) = tail (interleave xs ++ concatMap interleave (permutations1 xs)) = tail (interleave xs) ++ concatMap interleave (permutations1 xs) So making a special case for "tail . interleave": permutations2 :: [a] -> [[a]] permutations2 xxs = xxs : permutations2' xxs permutations2' [] = [] permutations2' (x:xs) = interleave' xs ++ concatMap interleave (permutations2' xs) where interleave ys = (x:ys) : interleave' ys interleave' [] = [] interleave' (y:ys) = map (y:) (interleave ys) The next step would be to eliminate the (++) calls in interleave. This is not so easy, because of the last line, "map (y:) (interleave ys)". We can't use the ShowS trick here directly. The way out is of course to get rid of the map permutations3 :: [a] -> [[a]] permutations3 xxs = xxs : permutations3' xxs permutations3' [] = [] permutations3' (x:xs) = interleave' id xs $ foldr (interleave id) [] (permutations3' xs) where interleave f ys r = f (x:ys) : interleave' f ys r interleave' f [] r = r interleave' f (y:ys) r = interleave (f . (y:)) ys r And this is indeed a lot faster (ghc 6.8.1 -O2): time $ print $ sum $ map sum $ permutations $ [1..10] permutations2: 3.875000 sec permutations3: 1.625000 sec Unfortunatly, the clarity of the algorithm has not improved. Twan

On Dec 18, 2007 6:25 PM, Twan van Laarhoven
David Benbennick wrote:
The same issue applies to permutations. I haven't had time to write out the code yet, but I can imagine a version of permutations that does: ...
Using mutual recursion between a version including the identity and one not including it, you can get:
Actually, I would like permutations to satisfy: map (take n) (take (factorial n) $ permutations [1..]) == permutations [1..n] Not only is that a neat property, but it forces maximal laziness. That means, for example, that: take 6 $ map (take 3) $ permutations (1:2:3:undefined) doesn't throw an exception.

David Benbennick wrote:
Actually, I would like permutations to satisfy:
map (take n) (take (factorial n) $ permutations [1..]) == permutations [1..n]
Not only is that a neat property, but it forces maximal laziness.
That would indeed be nice. After a lot of work I have come up with a function that does this: permutations4 xs = perms xs [[]] [] where perms xxs ps qs = map (++xxs) ps ++ case xxs of [] -> [] (x:xs) -> perms xs (concatMap (interleave x) (ps ++ qs)) (map (++[x]) (ps ++ qs)) interleave x [] = [] interleave x (y:ys) = (x:y:ys) : map (y:) (interleave x ys) Unfortunatly this function is really slow, 10.734375 sec, compared to 3.875000 sec for the initial version (see previous message for details of testing). This is of course the result of way too many (++) calls. Twan

Twan van Laarhoven wrote:
Unfortunatly this function is really slow, 10.734375 sec, compared to 3.875000 sec for the initial version (see previous message for details of testing). This is of course the result of way too many (++) calls.
Twan
I agree :). I have spent some time optimizing the function (both for readability and speed). I ended up with permutations5 xs = xs : perms xs [[]] where perms [] ps = [] perms (x:xs) ps = concatMap interleave' ps ++ perms xs (concatMap interleave ps) where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (x:) (interleave ys) interleave' [] = [] interleave' (y:ys) = (x:y:ys++xs) : map (x:) (interleave' ys) Or using 'foldr' instead of 'concatMap' permutations6 xs = xs : perms xs [[]] where perms [] ps = [] perms (x:xs) ps = (flip . foldr) (interleave' id) ps $ perms xs $ (flip . foldr) (interleave id) ps [] where interleave f [] r = f [x] : r interleave f (y:ys) r = f (x:y:ys) : interleave (f . (y:)) ys r interleave' f [] r = r interleave' f (y:ys) r = f (x:y:ys++xs) : interleave' (f . (y:)) ys r The (flip.foldr) is just a trick to get the arguments to foldr in the 'right' order, i.e. (flip.foldr) :: (a -> b -> b) -> ([a] -> b -> b). I am not too happy with this function, especially the two similar both subtly different interleave functions. I am not sure what version should go into the base library. The concatMap version (permutations5) makes the most sense to me, since number 6 just screams obfuscation. Benchmark times: non-lazy concatMap 3.250s non-lazy foldr 1.500s lazy concatMap 3.640s lazy foldr 2.500s Twan

Unfortunately, permutations5 has an error:
*Main> permutations5 [1..3] [[1,2,3],[2,1,3],[3,2,1],[3,3,1],[3,2,2],[3,3,2]] *Main> permutations6 [1..3] [[1,2,3],[2,1,3],[3,2,1],[2,3,1],[3,1,2],[1,3,2]]

Chris Kuklewicz wrote:
Unfortunately, permutations5 has an error:
*Main> permutations5 [1..3] [[1,2,3],[2,1,3],[3,2,1],[3,3,1],[3,2,2],[3,3,2]] *Main> permutations6 [1..3] [[1,2,3],[2,1,3],[3,2,1],[2,3,1],[3,1,2],[1,3,2]]
Oops, the "map (x:) ..." in interleave should be "map (y:) ...", permutations5 xs = xs : perms xs [[]] where perms [] ps = [] perms (x:xs) ps = concatMap interleave' ps ++ perms xs (concatMap interleave ps) where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys) interleave' [] = [] interleave' (y:ys) = (x:y:ys++xs) : map (y:) (interleave' ys) Twan

It just occurred to me that permutations doesn't quite agree with the mathematical concept:
permutations "aaa" ["aaa","aaa","aaa","aaa","aaa","aaa"]
But that's not really a problem, since you can always just compose
permutations with (Set.toList . Set.fromList).
On Dec 20, 2007 7:05 AM, Twan van Laarhoven
Chris Kuklewicz wrote:
Unfortunately, permutations5 has an error:
I suggest including some QuickCheck properties with this proposal, especially a property that would have caught that error. I'm thinking of four properties: 1) If x = permutations y, then check length x is right. 2) Check each element of x is a permutation of y 3) Check each permutation of y is an element of x 4) Check for laziness By the way, to improve mailing list laziness: the proposal is at http://hackage.haskell.org/trac/ghc/ticket/1990

David Benbennick wrote:
Actually, I would like permutations to satisfy: map (take n) (take (factorial n) $ permutations [1..]) == permutations [1..n]
Twan van Laarhoven wrote:
permutations5 xs = xs : perms xs [[]] where perms [] ps = [] perms (x:xs) ps = concatMap interleave' ps ++ perms xs (concatMap interleave ps) where interleave [] = [[x]] interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys) interleave' [] = [] interleave' (y:ys) = (x:y:ys++xs) : map (y:) (interleave' ys)
Twan, I think you are working a little too hard to satisfy the consistency property. You only need to satisfy it for permutations where the first n elements of the permutation are the same as the first n elements of the original list. Other than that, you can just use the faster function that you defined earlier. Here is a quick effort that beats permutations5 by using your previous permutations3: permutations7 xs = xs : (concat $ zipWith newPerms (init $ tail $ tails xs) (init $ tail $ inits xs)) where newPerms (t:ts) = map (++ts) . concatMap (interleave t) . permutations3 interleave t [y] = [[t, y]] interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys') (sorry about using the name interleave yet again, ugh) On my machine the result is: control: 1.037048 sec permutations3: 1.4078301 sec permutations5: 3.280238 sec permutations7: 3.0912578 sec

Yitzchak Gale wrote:
Twan, I think you are working a little too hard to satisfy the consistency property. You only need to satisfy it for permutations where the first n elements of the permutation are the same as the first n elements of the original list. Other than that, you can just use the faster function that you defined earlier.
Here is a quick effort that beats permutations5 by using your previous permutations3:
permutations7 xs = xs : (concat $ zipWith newPerms (init $ tail $ tails xs) (init $ tail $ inits xs)) where newPerms (t:ts) = map (++ts) . concatMap (interleave t) . permutations3 interleave t [y] = [[t, y]] interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys')
That looks quite nice. Unfortunatly your function is too strict with the normal inits and tails. After replacing those with these lazier versions (which should be in Data.List) it works much better. inits' xxs = [] : case xxs of [] -> [] (x:xs) -> map (x:) (inits' xs) tails' xxs = xxs : case xxs of [] -> [] (_:xs) -> tails' xs There is also a problem with "permutations7 []". The problem is 'tail', it is not needed. Replacing the first two lines with: permutations7' xs = xs : (concat $ init $ zipWith newPerms (tails' xs) (inits' xs)) solves that problem, and it is also shorter. It is also possible to get rid of inits and tails entirely: permutations8 xs = xs : newPerms xs [] where newPerms [] is = [] newPerms (t:ts) is = concatMap interleave (permutations8 is) ++ newPerms ts (t:is) where interleave [] = [] interleave (y:ys) = (t:y:ys++ts) : map (y:) (interleave ys) A foldr version is of course also possible permutations8b xs = xs : newPerms xs [] where newPerms [] is = [] newPerms (t:ts) is = foldr (interleave id) (newPerms ts (t:is)) (permutations8b is) where interleave f [] r = r interleave f yys@(y:ys) r = f (t:yys++ts) : interleave (f . (y:)) ys r Some run times: permutations7': 4.750 sec permutations7', using 3 for recursion: 4.250 sec permutations8: 3.984 sec permutations8b: 2.250 sec permutations8b, using 3 for recursion: 1.984 sec My current preference is 8 or 8b, using a different function in the recursion is going to far for my taste. Twan

Twan van Laarhoven wrote:
permutations7': 4.750 sec permutations7', using 3 for recursion: 4.250 sec permutations8: 3.984 sec permutations8b: 2.250 sec permutations8b, using 3 for recursion: 1.984 sec
Great work!
My current preference is 8 or 8b, using a different function in the recursion is going to far for my taste.
Perhaps they could be combined somehow, e.g., re-use interleave. You should run your times with a control that just adds up 10! copies of [1..10]. On my machine, that takes about 1 sec. So when we subtract out the control, satisfying the consistency property increases run time by at least a factor of 4. The property is nice, but is it worth that penalty? I'm not sure, I'd be interested in hearing other people's opinions. Regards, Yitz

So when we subtract out the control, satisfying the consistency property increases run time by at least a factor of 4.
Oops, a factor of 3. But anyway, same point.
The property is nice, but is it worth that penalty? I'm not sure, I'd be interested in hearing other people's opinions.
-Yitz

Twan van Laarhoven wrote:
permutations8b xs = xs : newPerms xs [] where newPerms [] is = [] newPerms (t:ts) is = foldr (interleave id) (newPerms ts (t:is)) (permutations8b is) where interleave f [] r = r interleave f yys@(y:ys) r = f (t:yys++ts) : interleave (f . (y:)) ys r
Replacing interleave with interleave xs r = snd $ interleave' id xs r interleave' f [] r = (ts, r) interleave' f y(y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r in (y:us, f (t:y:us) : zs) Gets rid of the (++). I call this version 8c. This has the run times: permuations8c, permutations3 in recursion 1.969 sec permuations8c, permutations8c in recursion 2.094 sec The difference between these two variants is quite small, not worth the effort in my opinion. This is also getting quite close to the optimal non-lazy version (permutations3, 1.625 sec). Yitzchak Gale wrote:
So when we subtract out the control, satisfying the consistency property increases run time by at least a factor of 3.
What exactly are you calculating here? Subtracting the control doesn't make sense. You are comparing apples to oranges. What if permutations1, say 1.1* were almost as fast as the control, while permutations2 takes 1.2*. Would you call permutations2 twice as slow? What if the control were permutations0, the fastest possible permutations function. How can permutations2 be twice as slow as permutations1, while it is only 1.2 times as slow as the fastest permutations function?
The property is nice, but is it worth that penalty? I'm not sure, I'd be interested in hearing other people's opinions.
It is not so much about that property, more about the lazyness it allows. Permutations should give as many results as possible. This means that: map (take n) . take (factorial n) $ permutations ([1..n] ++ undefined) should not give an error. Since 'undefined' is not inspected the function can not depend on it, which gives the necessary (but not sufficient) condition: map (take n) . take (factorial n) $ permutations [1..]) == permutations [1..n] I think lazyness is very important in the standard library. If you want things to be strict, why are you programming in Haskell? :) Also, the constant-factor of 'permutations' hardly matters. Any program that does something interesting will be doing more for each permutation than just summing numbers. If you replace the trivial 'map sum' with something slightly more complicated, which permutations function you used is not going to matter much. Twan

I wrote:
So when we subtract out the control, satisfying the consistency property increases run time by at least a factor of 3.
Twan van Laarhoven wrote:
What exactly are you calculating here?
The cost of the logic that actually generates the permutations, after subtracting off the cost of adding up 10! lists of 10 numbers, which is an artifact of our ad hoc performance test. True, we also subtract off the cost of iterating over all of those copies, which, due to laziness, is shared somewhat with the generating algorithm. But that sharing is common to all algorithms, so I think this is fair. Anyway, no need to dwell on this point. It is not so relevant anymore, see below.
The property is nice, but is it worth that penalty? I'm not sure, I'd be interested in hearing other people's opinions.
No one else responded. I am taking that as support for the consistency condition and Twan's approach. Besides, after Twan's further progress, there is hardly a penalty anymore (less than a factor of 2 even according to my logic, if you accept it). As one further sanity check, let's see if we are re-inventing the wheel by checking Knuth. The relevant chapter is Volume 4, section 7.2.1.2, published in Volume 4 Fascicle 2. There is no link to this on Knuth's home page anymore, so I won't link to it either due to copywrite concerns. But let's just say that for those who, like me, have no access to a printed copy of Knuth, Google "knuth permutations" >>= I'm feeling lucky gets you there as of this writing. So, yes, of course, we have re-invented the wheel. But Twan seems to have done quite a good job of it. First of all, permutations3 is essentially Knuth's Algorithm P. As far as Knuth knows, that algorithm was first published by English church bell ringers in the 1600's. So it is quite appropriate for the present season. Knuth's Algorithm G is a general method of generating algorithms that satisfy our consistency condition. The algorithms are parametrized over decompositions of the group structure of the permutation group into Sims tables. Unfortunately, it is not clear which of these many algorithms is Twan's permutations8, nor is it clear (to me) from Knuth which one of them would be fastest. But I agree with Twan that permutations8 must be close to optimal. Regards, Yitz

Bertram Felgenhauer proposed the following version of subsequences:
(3) subsequences :: [a] -> [[a]] subsequences xs = [] : case xs of [] -> [] (x:xs) -> tail (concatMap (\ys -> [ys, x:ys]) (subsequences xs))
I think that lazyness is attractive, and sequence of results is unimportant, so agree with his choice of (3). I see additional improvements in saving the tail: subsequences, nonEmptySubsequences :: [a] -> [[a]] subsequences xs = [] : nonEmptySubsequences xs nonEmptySubsequences [] = [] nonEmptySubsequences (x:xs) = [x] : -- concatMap (\ ys -> [ys, x:ys]) (nonEmptySubsequences xs) foldr f [] (nonEmptySubsequences xs) where f ys r = ys : (x : ys) : r Testing this with: main = do a : _ <- getArgs putStrLn $ show $ sum $ map sum $ subsequences [1 .. read a] and ghc-6.8.2 -O produces the following user times (minimum out of five runs) for argument 24 for the two variants: foldr 6.796s concatMap 8.044s So eliminating (++) clearly pays off, too. Wolfram

Twan van Laarhoven wrote:
Hello all,
As Thomas Hartman noted in a recent cafe thread [1], haskell 1.3 included the functions 'subsequences' and 'permutations'. I think these functions are quite useful, and I don't know why they were ever removed. This is a proposal to add these two functions to Data.List. The implementation is taken directly from the Haskell 1.3 report [2].
Trac ticket: #1990 Discussion period ends: January 7th (since there is a holiday comming up)
The discussion period for this proposal is over. Everyone here seems to agree it is a good idea. The patch attached to the trac ticket contains the best versions discussed in this thread: - Wolfram(kahl@cas.mcmaster.ca)'s version of subsequences - my permutations8c Can someone with the power to do so apply this patch? Twan

Twan van Laarhoven wrote:
Twan van Laarhoven wrote:
Hello all,
As Thomas Hartman noted in a recent cafe thread [1], haskell 1.3 included the functions 'subsequences' and 'permutations'. I think these functions are quite useful, and I don't know why they were ever removed. This is a proposal to add these two functions to Data.List. The implementation is taken directly from the Haskell 1.3 report [2].
Trac ticket: #1990 Discussion period ends: January 7th (since there is a holiday comming up)
The discussion period for this proposal is over. Everyone here seems to agree it is a good idea.
The patch attached to the trac ticket contains the best versions discussed in this thread: - Wolfram(kahl@cas.mcmaster.ca)'s version of subsequences - my permutations8c
Can someone with the power to do so apply this patch?
I've milestoned it for inclusion in GHC 6.10. Cheers, Simon
participants (8)
-
Bertram Felgenhauer
-
Chris Kuklewicz
-
David Benbennick
-
kahl@cas.mcmaster.ca
-
Neil Mitchell
-
Simon Marlow
-
Twan van Laarhoven
-
Yitzchak Gale