random shuffle and random list partition

Hi. For my Netflix Prize project I have implemented two reusable modules. The first module implements a random shuffle on immutable lists. It uses http://okmij.org/ftp/Haskell/perfect-shuffle.txt, with an additional "wrapper" function, having a more friendly interface. The second module implements a function used to partition a list into n sublists of random length. I have pasted the modules here: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2483 http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485 If someone is interested (and if Oleg give me permission), I can release them as a package on Hackage. I need to improve documentation, however. In future I can add an implementation of the random shuffle algorithm on mutable arrays in the ST monad. Manlio

Hi Manlio, Manlio Perillo wrote:
For my Netflix Prize project I have implemented two reusable modules. The first module implements a random shuffle on immutable lists... The second module implements a function used to partition a list into n sublists of random length.
Very nice!
If someone is interested (and if Oleg give me permission), I can release them as a package on Hackage.
Please do that. While I think Oleg's tree method is beautiful, in practice it may be re-inventing the wheel. I haven't tested it, but I doubt that this implementation is much better than using the classical shuffle algorithm on an IntMap. It's essentially the same tree inside. That's what I usually use for this, and it works fine.
In future I can add an implementation of the random shuffle algorithm on mutable arrays in the ST monad.
I've tried that in the past. Surprisingly, it wasn't faster than using trees. Perhaps I did something wrong. Or perhaps the difference only becomes apparent for huge lists. As you point out, your partition algorithm is not fair. Using your Random.Shuffle and a well-know trick from combinatorics, you can easily get a fair partitions function: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2495 Regards, Yitz

Yitzchak Gale ha scritto:
[...] While I think Oleg's tree method is beautiful, in practice it may be re-inventing the wheel. I haven't tested it, but I doubt that this implementation is much better than using the classical shuffle algorithm on an IntMap.
Do you have a working implementation?
It's essentially the same tree inside. That's what I usually use for this, and it works fine.
Oleg implementation is rather efficient, but it requires a lot of memory for huge lists. Here, as an example, two programs, one in Python and one in Haskell. The default Python generator in Python use the Mersenne Twister, but returning floats number in the range [0, 1]. # Python version from random import shuffle n = 10000000 m = 10 l = range(1, n + 1) shuffle(l) print l[:m] -- Haskell version module Main where import Random.Shuffle import System.Random.Mersenne.Pure64 (newPureMT) n = 10000000 m = 10 l = [1 .. n] main = do gen <- newPureMT print $ take m $ shuffle' l n gen The Python version performances are: real 0m16.812s user 0m16.469s sys 0m0.280s 150 MB memory usage The Haskell version performances are: real 0m8.757s user 0m7.920s sys 0m0.792s 800 MB memory usage
In future I can add an implementation of the random shuffle algorithm on mutable arrays in the ST monad.
I've tried that in the past. Surprisingly, it wasn't faster than using trees. Perhaps I did something wrong. Or perhaps the difference only becomes apparent for huge lists.
Can you try it on the list I have posted above?
As you point out, your partition algorithm is not fair. Using your Random.Shuffle and a well-know trick from combinatorics, you can easily get a fair partitions function:
Thanks, this is very nice. I have to run some benchmarks to see if it is efficient. Regards Manlio

Yitzchak Gale ha scritto:
Hi Manlio,
Manlio Perillo wrote:
For my Netflix Prize project I have implemented two reusable modules. The first module implements a random shuffle on immutable lists... The second module implements a function used to partition a list into n sublists of random length.
[...]
As you point out, your partition algorithm is not fair. Using your Random.Shuffle and a well-know trick from combinatorics, you can easily get a fair partitions function:
Someone added an alternative implementation: http://hpaste.org/fastcgi/hpaste.fcgi/view?id=2485#a2497 Regards Manlio
participants (2)
-
Manlio Perillo
-
Yitzchak Gale