Re: [Haskell-beginners] randomize the order of a list

2 Sep 2010

      A small remark I forgot.

First, It seems to me that the choice of k need not be random, but
only arbitrary. I believe that any fixed k would work. In particular,
it is interesting to compare this algorithm with k = n/2, and Heinrich
Apfelmus's algorithm based on mergesort (
http://apfelmus.nfshost.com/articles/random-permutations.html ):

- mergeshuffle split the unshuffled list directly, shuffle each part,
then merge them carefully
- quickshuffle split the unshuffled list carefully, shuffle each part,
then merge them directly

mergeshuffle's `merge` and quickshuffle's `split` are very similar
and, in a sense, dual.
`merge` reason on the probability (number of elements in the left list
to merge) / (total number of elements to merge)
`split` reason on the probability (number of elements left to put in
the left list) / (total number of elements left to split)

On Thu, Sep 2, 2010 at 3:34 PM, Gabriel Scherer
 wrote:
...
I must apologize : a part of my post about quicksort didn't make sense
: if one choose the pivot position randomly, elements shouldn't be
splitted with even probability, because there would be no control over
the size of the results list.
If I understand it correctly, your solution doesn't pick a pivot
position, but dynamically adapt list size probabilities during
traversal.
I have a different solution, that pick a pivot position, then choose
the elements with carefully weighted probability to get the right
left-hand and right-hand sizes. The key idea comes from your analysis
(more precisely, from the presence of the n `over` k probabilities) :
for a given k (the pivot), choose a random subset of k elements as the
left-hand side of the pivot.
import Random (Random, StdGen, randomRIO)
import Control.Monad (liftM)
quickshuffle :: [a] -> IO [a]
quickshuffle [] = return []
quickshuffle [x] = return [x]
quickshuffle xs = do
            (ls, rs) <- partition xs
            sls <- quickshuffle ls
            srs <- quickshuffle rs
            return (sls ++ srs)
-- The idea is that to partition a list of length n, we choose a pivot
-- position randomly (1 < k < n), then choose a subset of k elements
-- in the list to be on the left side, and the other n-k on the right
-- side.
--
-- To choose a random subset of length k among n, scan the list and
-- add each element with probability
--   (number of elements left to pick) / (number of elements left to scan)
--
partition :: [a] -> IO ([a], [a])
partition xs = do
         let n = length xs
         k <- randomRIO (1, n-1)
         split n k ([], []) xs
 where split n k (ls, rs) [] = return (ls, rs)
       split n k (ls, rs) (x:xs) = do
           p <- randomRIO (1, n)
           if p <= k
              then split (n - 1) (k - 1) (x:ls, rs) xs
              else split (n - 1) k       (ls, x:rs) xs
I have also written an implementation for my former algorithm :
import Data.List (mapAccumL, sortBy)
import System.Random (RandomGen, split, randoms)
import Data.Ord (Ordering)
import Data.Function (on)
-- compare two real numbers as infinite sequences of booleans
real_cmp :: [Bool] -> [Bool] -> Ordering
real_cmp (True:_) (False:_) = LT
real_cmp (False:_) (True:_) = GT
real_cmp (_:xs) (_:ys) = real_cmp xs ys
-- weight each element with a random real number
weight_list :: RandomGen g => g -> [a] -> [([Bool], a)]
weight_list g = snd . mapAccumL weight g
               where weight g x =
                                let (g1, g2) = split g in
                                (g1, (randoms g2, x))
-- shuffle by sorting on weights
shuffle :: RandomGen g => g -> [a] -> [a]
shuffle g = map snd . sort_on_weights . weight_list g
       where sort_on_weights = sortBy (real_cmp `on` fst)
...
Interesting question! Adapting quick sort is not that easy, though.
First, you can skip choosing the pivot position because it is already
entailed by the choices of elements left and right to it.
Second, probability 1/2 won't work, it doesn't give a uniform
distribution. In order to get that, the remaining input  xs  has to be
partitioned into two lists  xs = (ls,rs)  such that
     probability that  length ys == k  is   1/(n `over` k)
where
     n `over` k = n! / (k! * (n-k)!)
is the binomial coefficient. After all, calling "quickrandom"
recursively on each of the two lists will give two permutations with
probability  1/k!  and  1/(n-k)!  and the probability for a compound
permutation is
    1/(n `over` k) * 1/k! * 1/(n-k)! = 1/n!
as desired. In contrast, distributing elements with probability 1/2
would give
    probability that  length ys == k  is  (n `over` k) * 2^(-n)
which would skew the distribution heavily towards permutations where the
pivot element is in the middle.
However, it is possible to divide the list properly, though I haven't
worked out the exact numbers. The method would be
     divide (x:xs) = do
          (ls,rs) <- divide xs
          random  <- uniform (0, 1) :: Random Double
          if random <= p (length xs) (length ls)
              then return (x:ls, rs)
              else return (ls, x:rs)
where the probability  p  of putting the element  x  into the left part
has to be chosen such that
    1/(n `over` k) =
        1/(n-1 `over` k-1) * p (n-1) (k-1)
      + 1/(n-1 `over` k  ) * (1 - p (n-1) k)
Regards,
Heinrich Apfelmus