
A small remark I forgot.
First, It seems to me that the choice of k need not be random, but
only arbitrary. I believe that any fixed k would work. In particular,
it is interesting to compare this algorithm with k = n/2, and Heinrich
Apfelmus's algorithm based on mergesort (
http://apfelmus.nfshost.com/articles/random-permutations.html ):
- mergeshuffle split the unshuffled list directly, shuffle each part,
then merge them carefully
- quickshuffle split the unshuffled list carefully, shuffle each part,
then merge them directly
mergeshuffle's `merge` and quickshuffle's `split` are very similar
and, in a sense, dual.
`merge` reason on the probability (number of elements in the left list
to merge) / (total number of elements to merge)
`split` reason on the probability (number of elements left to put in
the left list) / (total number of elements left to split)
On Thu, Sep 2, 2010 at 3:34 PM, Gabriel Scherer
I must apologize : a part of my post about quicksort didn't make sense : if one choose the pivot position randomly, elements shouldn't be splitted with even probability, because there would be no control over the size of the results list.
If I understand it correctly, your solution doesn't pick a pivot position, but dynamically adapt list size probabilities during traversal.
I have a different solution, that pick a pivot position, then choose the elements with carefully weighted probability to get the right left-hand and right-hand sizes. The key idea comes from your analysis (more precisely, from the presence of the n `over` k probabilities) : for a given k (the pivot), choose a random subset of k elements as the left-hand side of the pivot.
import Random (Random, StdGen, randomRIO) import Control.Monad (liftM)
quickshuffle :: [a] -> IO [a] quickshuffle [] = return [] quickshuffle [x] = return [x] quickshuffle xs = do (ls, rs) <- partition xs sls <- quickshuffle ls srs <- quickshuffle rs return (sls ++ srs)
-- The idea is that to partition a list of length n, we choose a pivot -- position randomly (1 < k < n), then choose a subset of k elements -- in the list to be on the left side, and the other n-k on the right -- side. -- -- To choose a random subset of length k among n, scan the list and -- add each element with probability -- (number of elements left to pick) / (number of elements left to scan) -- partition :: [a] -> IO ([a], [a]) partition xs = do let n = length xs k <- randomRIO (1, n-1) split n k ([], []) xs where split n k (ls, rs) [] = return (ls, rs) split n k (ls, rs) (x:xs) = do p <- randomRIO (1, n) if p <= k then split (n - 1) (k - 1) (x:ls, rs) xs else split (n - 1) k (ls, x:rs) xs
I have also written an implementation for my former algorithm :
import Data.List (mapAccumL, sortBy) import System.Random (RandomGen, split, randoms) import Data.Ord (Ordering) import Data.Function (on) -- compare two real numbers as infinite sequences of booleans real_cmp :: [Bool] -> [Bool] -> Ordering real_cmp (True:_) (False:_) = LT real_cmp (False:_) (True:_) = GT real_cmp (_:xs) (_:ys) = real_cmp xs ys -- weight each element with a random real number weight_list :: RandomGen g => g -> [a] -> [([Bool], a)] weight_list g = snd . mapAccumL weight g where weight g x = let (g1, g2) = split g in (g1, (randoms g2, x)) -- shuffle by sorting on weights shuffle :: RandomGen g => g -> [a] -> [a] shuffle g = map snd . sort_on_weights . weight_list g where sort_on_weights = sortBy (real_cmp `on` fst)
Interesting question! Adapting quick sort is not that easy, though.
First, you can skip choosing the pivot position because it is already entailed by the choices of elements left and right to it.
Second, probability 1/2 won't work, it doesn't give a uniform distribution. In order to get that, the remaining input xs has to be partitioned into two lists xs = (ls,rs) such that
probability that length ys == k is 1/(n `over` k)
where
n `over` k = n! / (k! * (n-k)!)
is the binomial coefficient. After all, calling "quickrandom" recursively on each of the two lists will give two permutations with probability 1/k! and 1/(n-k)! and the probability for a compound permutation is
1/(n `over` k) * 1/k! * 1/(n-k)! = 1/n!
as desired. In contrast, distributing elements with probability 1/2 would give
probability that length ys == k is (n `over` k) * 2^(-n)
which would skew the distribution heavily towards permutations where the pivot element is in the middle.
However, it is possible to divide the list properly, though I haven't worked out the exact numbers. The method would be
divide (x:xs) = do (ls,rs) <- divide xs random <- uniform (0, 1) :: Random Double if random <= p (length xs) (length ls) then return (x:ls, rs) else return (ls, x:rs)
where the probability p of putting the element x into the left part has to be chosen such that
1/(n `over` k) = 1/(n-1 `over` k-1) * p (n-1) (k-1) + 1/(n-1 `over` k ) * (1 - p (n-1) k)
Regards, Heinrich Apfelmus