
Gabriel Scherer wrote:
I must apologize: a part of my post about quicksort didn't make sense. If one chooses the pivot position randomly, the elements shouldn't be split with even probability, because there would be no control over the sizes of the resulting lists.
No worries, my statement about the probability of the left part having length k doesn't make sense either, since the probabilities 1/(n `over` k) don't even add up to one. What is clear is that there is no *a priori* reason that 1/2 should work.
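(As a quick check: for n = 3 the terms 1/(3 `over` k) are 1, 1/3, 1/3, 1 for k = 0,...,3, and no sensible range of k makes them sum to one.)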
If I understand it correctly, your solution doesn't pick a pivot position, but dynamically adapts the list-size probabilities during traversal.
Yes. Note, however, that I intended to pick a pivot *element*, i.e. an element which, just like in ordinary quicksort, will not be permuted subsequently. This is subtly different from a pivot *position*, which divides the list into two parts but has no element associated with it. I think this is important when trying to analyze the naive 1/2 scheme, but it's immaterial in your proposal:
I have a different solution that picks a pivot position, then chooses the elements with carefully weighted probabilities to get the right left-hand and right-hand sizes. The key idea comes from your analysis (more precisely, from the presence of the (n `over` k) probabilities): for a given k (the pivot position), choose a random subset of k elements as the left-hand side of the pivot.
import System.Random (randomRIO)
quickshuffle :: [a] -> IO [a]
quickshuffle []  = return []
quickshuffle [x] = return [x]
quickshuffle xs  = do
    (ls, rs) <- partition xs
    sls <- quickshuffle ls
    srs <- quickshuffle rs
    return (sls ++ srs)
-- The idea is that to partition a list of length n, we choose a pivot
-- position randomly (1 <= k <= n-1), then choose a subset of k elements
-- in the list to be on the left side, and the other n-k on the right
-- side.
--
-- To choose a random subset of length k among n, scan the list and
-- add each element with probability
--
--   (number of elements left to pick) / (number of elements left to scan)
--
partition :: [a] -> IO ([a], [a])
partition xs = do
    let n = length xs
    k <- randomRIO (1, n-1)
    split n k ([], []) xs
  where
    split n k (ls, rs) [] = return (ls, rs)
    split n k (ls, rs) (x:xs) = do
        p <- randomRIO (1, n)
        if p <= k
            then split (n - 1) (k - 1) (x:ls, rs) xs
            else split (n - 1) k (ls, x:rs) xs
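A minimal way to run this, assuming the definitions above live in one module (the list [1..10] is just an arbitrary example):

main :: IO ()
main = do
    -- shuffle the numbers 1..10 and print one random permutation
    xs <- quickshuffle [1 .. 10 :: Int]
    print xs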
Yes, this algorithm should work. Of course, while the probability

  (number of elements left to pick) / (number of elements left to scan)

for picking an element x seems to be the only sensible choice, one still has to prove that this gives a uniform distribution. (Embarrassingly, my article about "merge shuffle" lacks a proof, too; I plan to rewrite it at some point.)

Proving uniformity proceeds in two steps.

First, we have to argue that picking the first k elements uniformly and then permuting them does give a uniform *total* permutation. This is fairly obvious, though. Namely, the set of possible permutations of n elements can be partitioned into (n `over` k) classes, where two permutations belong to the same class if they have the same first k elements (in any order). For instance, with k = 3 and n = 5,

  [1,2,3,5,4] is in the same class as [3,2,1,4,5]
      because the first k elements are {1,2,3} in both cases;

  [1,2,3,5,4] is in a different class than [4,2,1,3,5]
      because the first k elements are {1,2,3} and {1,2,4} respectively.

Now, to pick a random permutation, we first pick a class at random and then pick a permutation from this class. The point is that for reasons of symmetry, all classes have the same size! Namely, there are k! * (n-k)! permutations in every class. That means we should pick the class with uniform probability, and that's exactly what the split function intends to do.

Second, we have to argue that split does indeed pick a class (= a set of k elements) *uniformly*. The reasoning for that is as follows: consider a particular element x. There are

  (n-1 `over` k-1) classes that contain x
  (n-1 `over` k  ) classes that don't contain x

(this correctly adds up to (n `over` k) classes in total). Hence, the probability that the class contains x is

    classes with x / total classes
  = (n-1 `over` k-1) / (n `over` k)
  = ( (n-1)! / ((k-1)!(n-k)!) ) / ( n! / (k!(n-k)!) )
  = (n-1)! / n! * k! / (k-1)!
  = k / n
  = elements left to pick / elements left to scan

This shows that split does indeed use the correct probability for including x.


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com
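A quick empirical way to sanity-check the uniformity argument is to tally the outcomes of quickshuffle on a tiny list over many runs; with a uniform shuffle, the 6 permutations of [1,2,3] should come up with roughly equal frequency. A minimal sketch, assuming the quickshuffle code quoted above is in scope and Data.Map from the containers package is available:

import qualified Data.Map as Map
import Control.Monad (replicateM)

-- Shuffle [1,2,3] many times and count how often each permutation
-- occurs; with a uniform shuffle each of the 6 permutations should
-- appear roughly 60000 / 6 = 10000 times.
checkUniform :: IO ()
checkUniform = do
    results <- replicateM 60000 (quickshuffle [1, 2, 3 :: Int])
    let counts = Map.fromListWith (+) [ (r, 1 :: Int) | r <- results ]
    mapM_ print (Map.toList counts)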