
Gabriel Scherer wrote:
I must apologize: a part of my post about quicksort didn't make sense. If one chooses the pivot position randomly, the elements shouldn't be split with even probability, because there would be no control over the sizes of the resulting lists.
No worries, my statement about the probability of the left part having length k doesn't make sense either, since the probabilities 1/(n `over` k) don't even add up to one. What is clear is that there is no *a priori* reason that 1/2 should work.
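(As a quick check: for n = 3 the terms 1/(3 `over` k) are 1, 1/3, 1/3, 1 for k = 0,...,3, and no sensible range of k makes them sum to one.)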
If I understand it correctly, your solution doesn't pick a pivot position, but dynamically adapts the list-size probabilities during traversal.
Yes. Note, however, that I intended to pick a pivot *element*, i.e. an element which, just like in ordinary quicksort, will not be permuted subsequently. This is subtly different from a pivot *position*, which divides the list into two parts but has no element associated with it. I think this is important when trying to analyze the naive 1/2 scheme, but it's immaterial in your proposal:
I have a different solution that picks a pivot position, then chooses the elements with carefully weighted probabilities to get the right left-hand and right-hand sizes. The key idea comes from your analysis (more precisely, from the presence of the (n `over` k) probabilities): for a given k (the pivot position), choose a random subset of k elements as the left-hand side of the pivot.
import System.Random (randomRIO)
quickshuffle :: [a] -> IO [a]
quickshuffle []  = return []
quickshuffle [x] = return [x]
quickshuffle xs  = do
    (ls, rs) <- partition xs
    sls <- quickshuffle ls
    srs <- quickshuffle rs
    return (sls ++ srs)
-- The idea is that to partition a list of length n, we choose a pivot
-- position randomly (1 <= k <= n-1), then choose a subset of k elements
-- in the list to be on the left side, and the other n-k on the right
-- side.
--
-- To choose a random subset of length k among n, scan the list and
-- add each element with probability
--
--   (number of elements left to pick) / (number of elements left to scan)
--
partition :: [a] -> IO ([a], [a])
partition xs = do
    let n = length xs
    k <- randomRIO (1, n-1)
    split n k ([], []) xs
  where
    split n k (ls, rs) [] = return (ls, rs)
    split n k (ls, rs) (x:xs) = do
        p <- randomRIO (1, n)
        if p <= k
            then split (n - 1) (k - 1) (x:ls, rs) xs
            else split (n - 1) k (ls, x:rs) xs
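A minimal way to run this, assuming the definitions above live in one module (the list [1..10] is just an arbitrary example):

main :: IO ()
main = do
    -- shuffle the numbers 1..10 and print one random permutation
    xs <- quickshuffle [1 .. 10 :: Int]
    print xs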
Yes, this algorithm should work. Of course, while the probability

  (number of elements left to pick) / (number of elements left to scan)

for picking an element x seems to be the only sensible choice, one still has to prove that this gives a uniform distribution. (Embarrassingly, my article about "merge shuffle" lacks a proof, too; I plan to rewrite it at some point.)

Proving uniformity proceeds in two steps.

First, we have to argue that picking the first k elements uniformly and then permuting them does give a uniform *total* permutation. This is fairly obvious, though. Namely, the set of possible permutations of n elements can be partitioned into (n `over` k) classes, where two permutations belong to the same class if they have the same first k elements (in any order). For instance, with k = 3 and n = 5,

  [1,2,3,5,4] is in the same class as [3,2,1,4,5]
      because the first k elements are {1,2,3} in both cases;

  [1,2,3,5,4] is in a different class than [4,2,1,3,5]
      because the first k elements are {1,2,3} and {1,2,4} respectively.

Now, to pick a random permutation, we first pick a class at random and then pick a permutation from this class. The point is that for reasons of symmetry, all classes have the same size! Namely, there are k! * (n-k)! permutations in every class. That means we should pick the class with uniform probability, and that's exactly what the split function intends to do.

Second, we have to argue that split does indeed pick a class (= a set of k elements) *uniformly*. The reasoning for that is as follows: consider a particular element x. There are

  (n-1 `over` k-1) classes that contain x
  (n-1 `over` k  ) classes that don't contain x

(this correctly adds up to (n `over` k) classes in total). Hence, the probability that the class contains x is

    classes with x / total classes
  = (n-1 `over` k-1) / (n `over` k)
  = ( (n-1)! / ((k-1)!(n-k)!) ) / ( n! / (k!(n-k)!) )
  = (n-1)! / n! * k! / (k-1)!
  = k / n
  = elements left to pick / elements left to scan

This shows that split does indeed use the correct probability for including x.


Regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com
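A quick empirical way to sanity-check the uniformity argument is to tally the outcomes of quickshuffle on a tiny list over many runs; with a uniform shuffle, the 6 permutations of [1,2,3] should come up with roughly equal frequency. A minimal sketch, assuming the quickshuffle code quoted above is in scope and Data.Map from the containers package is available:

import qualified Data.Map as Map
import Control.Monad (replicateM)

-- Shuffle [1,2,3] many times and count how often each permutation
-- occurs; with a uniform shuffle each of the 6 permutations should
-- appear roughly 60000 / 6 = 10000 times.
checkUniform :: IO ()
checkUniform = do
    results <- replicateM 60000 (quickshuffle [1, 2, 3 :: Int])
    let counts = Map.fromListWith (+) [ (r, 1 :: Int) | r <- results ]
    mapM_ print (Map.toList counts)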