
oleg@okmij.org ha scritto:
Hello!
The reason is that in a project I need to extract random elements from a list (and I need to extract a lot of them), and using "normal" methods [1] seems to be rather inefficient.
Note that I have used an incorrect term, sorry. What I need, in detail, is to: Given a population of 100480507 elements, extract 17770 random samples, where each sample has a size between 1 and 480189. Yes, it is again my Netflix Prize project; here I'm trying to write a program to generate the entire training data set using random data.
[1] by normal method I mean: extract a random number, in the range [0, n] (where n is the length of the sequence), and get the element indexed by that number.
BTW, have you tried to use Data.IntMap? That is, put the sequence into an IntMap (each element being indexed by its index) and then extract elements at random; IntMap seems to be quite efficient...
This should be ok if I need independent random choices from a population, but I need random samples, instead, sorry for the confusion. Do you think that it is possible to further optimize your tree structure? Or to use IntMap, instead?
Cheers, Oleg
Thanks Manlio
participants (1)
-
Manlio Perillo