Re: generalized shuffle

25 Mar 2009

      oleg@okmij.org ha scritto:
...
Hello!
...
The reason is that in a project I need to extract random elements
    from a list (and I need to extract a lot of them), and using "normal"
    methods [1] seems to be rather inefficient.
Note that I have used an incorrect term, sorry.
What I need, in detail, is to:
Given a population of 100480507 elements, extract 17770 random samples, 
where each sample has a size between 1 and 480189.

Yes, it is again my Netflix Prize project; here I'm trying to write a 
program to generate the entire training data set using random data.
...
...
[1] by normal method I mean: extract a random number, in the range
     [0, n] (where n is the length of the sequence), and get the element
     indexed by that number.
BTW, have you tried to use Data.IntMap? That is, put the sequence into
an IntMap (each element being indexed by its index) and then extract
elements at random; IntMap seems to be quite efficient...
This should be ok if I need independent random choices from a 
population, but I need random samples, instead, sorry for the confusion.

Do you think that it is possible to further optimize your tree 
structure? Or to use IntMap, instead?
...
Cheers,
  Oleg
Thanks   Manlio

Manlio Perillo

tags

participants (1)