partitions of a multiset

Hi all, I've written some code to generate set partitions: import Control.Arrow import Data.List -- pSet' S generates the power set of S, with each subset paired -- with its complement. -- e.g. pSet' [1,2] = [([1,2],[]),([1],[2]),([2],[1]),([],[1,2])]. pSet' :: [a] -> [([a],[a])] pSet' [] = [([],[])] pSet' (x:xs) = mp first ++ mp second where mp which = map (which (x:)) psRest psRest = pSet' xs -- partitions S generates a list of partitions of S. -- e.g. partitions [1,2,3] = [[[1,2,3]],[[1,2],[3]],[[1,3],[2]],[[1],[2,3]],[[1],[2],[3]]]. partitions :: [a] -> [[[a]]] partitions [] = [[]] partitions (x:xs) = (pSet' xs) >>= ((x:) *** partitions >>> uncurry (map . (:))) However, this version of partitions generates duplicates when given a multiset, for example: *Combinatorics> partitions [1,1,2] [[[1,1,2]],[[1,1],[2]],[[1,2],[1]],[[1],[1,2]],[[1],[1],[2]]] The partition [[1,2],[1]] is generated twice (order doesn't matter). I'd like to write a version of partitions which generates duplicate-free output even for input multisets, but haven't come up with a good method yet. Any ideas? -Brent PS Yes, this is for Project Euler. =)

Here's the approach I would try. 1. Use Data.List.group to group your multiset, eg [1,1,2] -> [[1,1],[2]] 2. Now apply your partitions function to each of the groups [[1,1],[2]] -> [ [([1,1],[]), ([1],[1]), ([],[1,1])], [([2],[]), ([],[2])] ] (Actually of course, you can probably write a simpler function to do this) 3. Then you just need a function which can list all possible ways of combining the partial partitions (so it's a kind of Cartesian product). I leave you the pleasure of writing the code.

On 7/23/07, DavidA
Here's the approach I would try. 1. Use Data.List.group to group your multiset, eg [1,1,2] -> [[1,1],[2]] 2. Now apply your partitions function to each of the groups [[1,1],[2]] -> [ [([1,1],[]), ([1],[1]), ([],[1,1])], [([2],[]), ([],[2])] ] (Actually of course, you can probably write a simpler function to do this) 3. Then you just need a function which can list all possible ways of combining the partial partitions (so it's a kind of Cartesian product).
I leave you the pleasure of writing the code.
It seems to me (unless I've missed something?) that this method generates the power set of the original multiset (i.e. all subsets) rather than partitions. (Although, to be sure, it does seem like an elegant and efficient method for doing so; my code for generating the power set of a multiset is somewhat different, I'll have to try this method too.) A partition of a set S is a set of pairwise disjoint subsets of S whose union is S; the definition for a multiset is similar but altered somewhat to allow for the possibility of multiple element copies. I'm not sure what a formal mathematical definition would be off the top of my head; but in Haskell, given a list L :: [a], I'm looking for all partitions P :: [[a]] where (sort . concat $ P) == (sort L). I think I've figured out a method of doing this efficiently when trying to list all factorizations of a number (the original application), but it takes advantage of some specific properties of the problem and I'm still interested in a general solution. It would of course be possible to generate all partitions and then use nubBy with suitable list equality (under recursive sorting), but I feel there must be a way to more directly generate unique partitions in the first place. -Brent

On 7/24/07, Brent Yorgey
I'm not sure what a formal mathematical definition would be off the top of my head; but in Haskell, given a list L :: [a], I'm looking for all partitions P :: [[a]] where (sort . concat $ P) == (sort L).
Here is quick attempt that requires Ord [a] and expects a sorted list. It may very well not be correct, but it seemed to get all the simple cases I tried right. Can you find a counterexample where it doesn't work? import Data.List (nub, (\\)) subsets [] = [[]] subsets xs = []:[ a:b | a <- nub xs, let (_:tl) = dropWhile (/=a) xs, b <- subsets tl ] multiPart [] = [[]] multiPart xs = [ a:b | a <- takeWhile ((head xs ==) . head) $ tail $ subsets xs, b <- multiPart $ xs \\ a, null b || a <= head b ] It would be nice if one could get rid of the (<=) and hence Ord without allowing duplicates. Furthermore, I haven't worried at all about space or time efficiency while writing this. I'd want to get it right first. Pekka

On 7/24/07, Pekka Karjalainen
On 7/24/07, Brent Yorgey
wrote: given a list L :: [a], I'm looking for all partitions P :: [[a]] where (sort . concat $ P) == (sort L).
Here is quick attempt that requires Ord [a] and expects a sorted list. It may very well not be correct, but it seemed to get all the simple cases I tried right. Can you find a counterexample where it doesn't work?
import Data.List (nub, (\\))
subsets [] = [[]] subsets xs = []:[ a:b | a <- nub xs, let (_:tl) = dropWhile (/=a) xs, b <- subsets tl ]
multiPart [] = [[]] multiPart xs = [ a:b | a <- takeWhile ((head xs ==) . head) $ tail $ subsets xs, b <- multiPart $ xs \\ a, null b || a <= head b ]
very nice! I had an inkling that the solution would depend on an ordering of subsets, but I hadn't come up with the right way to do it. Your code seems to make sense to me, and I also checked it successfully with some QuickCheck properties: -- check that every "partition" generated by multiPart is in fact a valid partition. propMultiPartCorrect :: [Int] -> Bool propMultiPartCorrect set = all (== sset) (map (sort . concat) $ multiPart sset) where sset = sort set -- check that multiPart does not generate the same partition twice. propMultiPartDisjoint :: [Int] -> Bool propMultiPartDisjoint set = nub parts == parts where parts = sort . map sort . multiPart . sort $ set I didn't write a test to check that multiPart doesn't miss any partitions, but inspecting some small examples seems to indicate that it doesn't. It would be nice if one could get rid of the (<=) and hence Ord
without allowing duplicates. Furthermore, I haven't worried at all about space or time efficiency while writing this. I'd want to get it right first.
One optimization that might (?) make it faster is to generate subset complements along with subsets (as I did in my original code), which would allow getting rid of the \\. But optimizing certainly isn't my forte. As for getting rid of the Ord, I'm going to go out on a limb and conjecture that the Ord is necessary for an efficient implementation (i.e. with only Eq, the best you can do is to generate ALL partitions and then eliminate duplicates). Off the top of my head I'm not sure how you'd go about trying to prove it, though. -Brent
participants (3)
-
Brent Yorgey
-
DavidA
-
Pekka Karjalainen