Performance problem with random numbers

ntupel

12 Oct 2007 12 Oct '07

10:09 p.m.

Dear all, I have implemented a small module to generate random items with a given probability distribution using the alias approach [1] and unfortunately compared to similar implementations in C++ or Java it is about 10 times slower. I have to confess that I am still in the early stages of writing Haskell programs so there is hopefully something I can do about it and I hope some helpful soul on this list can give me a hint. I have attached my implementation and a small testing routine which runs in about 5 seconds on my machine (when compiled with -O2) whereas my Java-Implementation finishes in about 0.48 seconds. Profiling indicates that the time is mostly spend in System.Random.random and System.Random.randomR so I wonder if these are slow or what else might cause the relative slowness of my implementation. Many thanks for your responses, Thoralf PS: I would also appreciate any feedback about the module from a design perspective. I bet I miss lots of good Haskell idioms. ---------------- [1] The alias methods moves probability mass of a non-uniform probability distribution around to create a uniform distribution. Lets say you have three items "a", "b", and "c" with distribution [0.2, 0.1, 0.7] then a uniform distribution would assign 1/3 to each so "a" and "b" need to be "filled up" with exactly the same probability mass that "c" has too much. Then 2 uniform random numbers are generated. The first one to pick an item and the second one to pick either the item itself if the value is in the original part or the alias otherwise. A much better explanation can be found on the web somewhere. Anyway it should not matter with regards to the performance problem I have.

Attachments:

main.hs (text/x-haskell — 394 bytes)
random.hs (text/x-haskell — 2.7 KB)

Show replies by date

Stefan O'Rear

13 Oct 13 Oct

3:25 a.m.

On Sat, Oct 13, 2007 at 12:09:57AM +0200, ntupel wrote:

...

Dear all,

I have implemented a small module to generate random items with a given probability distribution using the alias approach [1] and unfortunately compared to similar implementations in C++ or Java it is about 10 times slower. I have to confess that I am still in the early stages of writing Haskell programs so there is hopefully something I can do about it and I hope some helpful soul on this list can give me a hint.

...

setup :: (Ord a, IArray a2 a, IArray a1 e, Num a) => [e] -> [a] -> (a1 Int e, a1 Int e, a2 Int a) calcAlias :: (Ord e, Num e, IArray a e, Ix i, IArray a2 e1, IArray a1 e1) => a2 i e1 -> a1 i e1 -> a i e -> [i] -> [i] -> (a1 i e1, a i e) next :: (IArray a2 e1, IArray a e1, Ord e, IArray a1 e, RandomGen t, Random e) => (a Int e1, a2 Int e1, a1 Int e) -> t -> (e1, t) randomList :: (Random e, RandomGen t1, IArray a2 e, Ord e, IArray a t, IArray a1 t) => (a Int t, a1 Int t, a2 Int e) -> t1 -> [t]

I haven't looked at your code in depth, but these long contexts are something of a red flag. Overloading in Haskell is a very neat mechanism, but it is not really suitable for inner loops; each type class listed turns into an extra parameter, and proxy methods use indirect function calls for the operations (which unfortunately show up in profiles with the same names as the specific operations). I would try specializing to StdGen, UArray, and Int, for RandomGen, IArray, and Random respectively. P.S. Most real programs (outside of specialized niches like Monte-Carlo simulation) spend neglible amounts of time in random number generation. Profile before optimizing! If you've already profiled the real program, ignore this postscript. Stefan

Don Stewart

4:33 a.m.

stefanor:

...

On Sat, Oct 13, 2007 at 12:09:57AM +0200, ntupel wrote:

...
Dear all,

I have implemented a small module to generate random items with a given probability distribution using the alias approach [1] and unfortunately compared to similar implementations in C++ or Java it is about 10 times slower. I have to confess that I am still in the early stages of writing Haskell programs so there is hopefully something I can do about it and I hope some helpful soul on this list can give me a hint.

...
setup :: (Ord a, IArray a2 a, IArray a1 e, Num a) => [e] -> [a] -> (a1 Int e, a1 Int e, a2 Int a) calcAlias :: (Ord e, Num e, IArray a e, Ix i, IArray a2 e1, IArray a1 e1) => a2 i e1 -> a1 i e1 -> a i e -> [i] -> [i] -> (a1 i e1, a i e) next :: (IArray a2 e1, IArray a e1, Ord e, IArray a1 e, RandomGen t, Random e) => (a Int e1, a2 Int e1, a1 Int e) -> t -> (e1, t) randomList :: (Random e, RandomGen t1, IArray a2 e, Ord e, IArray a t, IArray a1 t) => (a Int t, a1 Int t, a2 Int e) -> t1 -> [t]

I haven't looked at your code in depth, but these long contexts are something of a red flag. Overloading in Haskell is a very neat mechanism, but it is not really suitable for inner loops; each type class listed turns into an extra parameter, and proxy methods use indirect function calls for the operations (which unfortunately show up in profiles with the same names as the specific operations).

I would try specializing to StdGen, UArray, and Int, for RandomGen, IArray, and Random respectively.

P.S. Most real programs (outside of specialized niches like Monte-Carlo simulation) spend neglible amounts of time in random number generation. Profile before optimizing! If you've already profiled the real program, ignore this postscript.

Stefan

I agree with Stefan's analysis. Very suspicious types for high performance code! And again, the only time random generation actually mattered to me was in a high perf monte carlo simulation, after all the inner loops had been specialised. -- Don

Don Stewart

4:40 a.m.

dons:

...

stefanor:

...
On Sat, Oct 13, 2007 at 12:09:57AM +0200, ntupel wrote:

...
Dear all,

I have implemented a small module to generate random items with a given probability distribution using the alias approach [1] and unfortunately compared to similar implementations in C++ or Java it is about 10 times slower. I have to confess that I am still in the early stages of writing Haskell programs so there is hopefully something I can do about it and I hope some helpful soul on this list can give me a hint.

...
setup :: (Ord a, IArray a2 a, IArray a1 e, Num a) => [e] -> [a] -> (a1 Int e, a1 Int e, a2 Int a) calcAlias :: (Ord e, Num e, IArray a e, Ix i, IArray a2 e1, IArray a1 e1) => a2 i e1 -> a1 i e1 -> a i e -> [i] -> [i] -> (a1 i e1, a i e) next :: (IArray a2 e1, IArray a e1, Ord e, IArray a1 e, RandomGen t, Random e) => (a Int e1, a2 Int e1, a1 Int e) -> t -> (e1, t) randomList :: (Random e, RandomGen t1, IArray a2 e, Ord e, IArray a t, IArray a1 t) => (a Int t, a1 Int t, a2 Int e) -> t1 -> [t]

I haven't looked at your code in depth, but these long contexts are something of a red flag. Overloading in Haskell is a very neat mechanism, but it is not really suitable for inner loops; each type class listed turns into an extra parameter, and proxy methods use indirect function calls for the operations (which unfortunately show up in profiles with the same names as the specific operations).

I would try specializing to StdGen, UArray, and Int, for RandomGen, IArray, and Random respectively.

P.S. Most real programs (outside of specialized niches like Monte-Carlo simulation) spend neglible amounts of time in random number generation. Profile before optimizing! If you've already profiled the real program, ignore this postscript.

Stefan

I agree with Stefan's analysis. Very suspicious types for high performance code!

And again, the only time random generation actually mattered to me was in a high perf monte carlo simulation, after all the inner loops had been specialised.

I should add: and in that case we called the mersenne twister generator carefully optimised in C. -- Don

ntupel

10:52 a.m.

On Fri, 2007-10-12 at 20:25 -0700, Stefan O'Rear wrote:

...

On Sat, Oct 13, 2007 at 12:09:57AM +0200, ntupel wrote:

...
setup :: (Ord a, IArray a2 a, IArray a1 e, Num a) => [e] -> [a] -> (a1 Int e, a1 Int e, a2 Int a) calcAlias :: (Ord e, Num e, IArray a e, Ix i, IArray a2 e1, IArray a1 e1) => a2 i e1 -> a1 i e1 -> a i e -> [i] -> [i] -> (a1 i e1, a i e) next :: (IArray a2 e1, IArray a e1, Ord e, IArray a1 e, RandomGen t, Random e) => (a Int e1, a2 Int e1, a1 Int e) -> t -> (e1, t) randomList :: (Random e, RandomGen t1, IArray a2 e, Ord e, IArray a t, IArray a1 t) => (a Int t, a1 Int t, a2 Int e) -> t1 -> [t]

...

I would try specializing to StdGen, UArray, and Int, for RandomGen, IArray, and Random respectively.

Thanks for your reply Stefan. Unfortunately I could measure only a relatively small improvement by changing to concrete types, e.g. using setup :: [a] -> [Double] -> (Array Int a, Array Int a, UArray Int Double) calcAlias :: Array Int a -> Array Int a -> UArray Int Double -> [Int] -> [Int] -> (Array Int a, UArray Int Double) next :: (Array Int a, Array Int a, UArray Int Double) -> StdGen -> (a, StdGen) randomList :: (Array Int a, Array Int a, UArray Int Double) -> StdGen -> [a] the sample code was about one second faster when compiled with -O2. Profiling again indicated that most time was spend in random and randomR (I manually added cost centers into "next"): main +RTS -p -RTS total time = 8.00 secs (160 ticks @ 50 ms) total alloc = 2,430,585,728 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc random Random 60.0 54.5 randomR Random 20.0 23.3 next Random 17.5 17.0 main Main 1.9 2.5 randomList Random 0.6 2.8 previously (i.e. with long class contexts) it looked like this: main +RTS -p -RTS total time = 7.85 secs (157 ticks @ 50 ms) total alloc = 2,442,579,716 bytes (excludes profiling overheads) COST CENTRE MODULE %time %alloc random Random 58.6 54.5 randomR Random 22.9 23.3 next Random 14.6 16.5 main Main 2.5 2.5 randomList Random 1.3 3.1 Many thanks again, Thoralf

Brandon S. Allbery KF8NH

1:56 p.m.

On Oct 13, 2007, at 6:52 , ntupel wrote:

...

On Fri, 2007-10-12 at 20:25 -0700, Stefan O'Rear wrote:

...
On Sat, Oct 13, 2007 at 12:09:57AM +0200, ntupel wrote:

...
setup :: (Ord a, IArray a2 a, IArray a1 e, Num a) => [e] -> [a] -

...
(a1 Int e, a1 Int e, a2 Int a) calcAlias :: (Ord e, Num e, IArray a e, Ix i, IArray a2 e1, IArray a1 e1) => a2 i e1 -> a1 i e1 -> a i e -> [i] -> [i] -> (a1 i e1, a i e) next :: (IArray a2 e1, IArray a e1, Ord e, IArray a1 e, RandomGen t, Random e) => (a Int e1, a2 Int e1, a1 Int e) -> t -> (e1, t) randomList :: (Random e, RandomGen t1, IArray a2 e, Ord e, IArray a t, IArray a1 t) => (a Int t, a1 Int t, a2 Int e) -> t1 -> [t]

...

...
I would try specializing to StdGen, UArray, and Int, for RandomGen, IArray, and Random respectively.

Thanks for your reply Stefan. Unfortunately I could measure only a relatively small improvement by changing to concrete types, e.g. using (...) COST CENTRE MODULE %time %alloc

random Random 60.0 54.5 randomR Random 20.0 23.3 next Random 17.5 17.0 main Main 1.9 2.5 randomList Random 0.6 2.8

Now you need to start forcing things; given laziness, things tend to only get forced when in IO, which leads to time being accounted to the routine where the forcing happened. If random / randomR are invoked with large unevaluated thunks, their forcing will generally be attributed to them, not to functions within the thunks. (Yes, this means profiling lazy programs is a bit of a black art.) -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

ntupel

3:40 p.m.

On Sat, 2007-10-13 at 09:56 -0400, Brandon S. Allbery KF8NH wrote:

...

Now you need to start forcing things; given laziness, things tend to only get forced when in IO, which leads to time being accounted to the routine where the forcing happened. If random / randomR are invoked with large unevaluated thunks, their forcing will generally be attributed to them, not to functions within the thunks.

But AFAIK random and randomR only take a StdGen (plus a range argument in case of randomR) as argument so I don't understand where the unevaluated thunks might be actually? (Maybe I should have said that random and randomR are the ones from GHC's System.Random module.) Thanks, Thoralf

Brandon S. Allbery KF8NH

4:42 p.m.

On Oct 13, 2007, at 11:40 , ntupel wrote:

...

On Sat, 2007-10-13 at 09:56 -0400, Brandon S. Allbery KF8NH wrote:

...
Now you need to start forcing things; given laziness, things tend to only get forced when in IO, which leads to time being accounted to the routine where the forcing happened. If random / randomR are invoked with large unevaluated thunks, their forcing will generally be attributed to them, not to functions within the thunks.

But AFAIK random and randomR only take a StdGen (plus a range argument in case of randomR) as argument so I don't understand where the unevaluated thunks might be actually? (Maybe I should have said that random and randomR are the ones from GHC's System.Random module.)

Your apparently simple StdGen argument is actually a sort of program state (represented by unevaluated thunks, not by a state monad; see below) which gets altered with every invocation of random. If nothing is forced until the very end, it in effect becomes an expression which produces the desired StdGen, with the uses of the previous StdGen values as "side effects" of its computation that occur when the thunk is evaluated at the end. I'm not sure I'm up to working through an example of what this looks like. Suffice it to say that in a lazy language like Haskell, almost any "simple" expression can in practice end up being a suspended computation (a thunk) consisting of whatever is supposed to produce it. And in the general case (e.g. you don't use strictness annotations) the only way to force evaluation is to do I/O, so it's quite normal for a "naive" program to end up being one big thunk dangling off a PutStrLn. So why does random get tagged for it? Because it's a state-like function (that is, a function of the form s -> (a,s); compare to the definition of the State monad) which takes a StdGen and produces a modified StdGen, so when Haskell finally evaluates the thunk most of the activity happens in the context of evaluating that modification. Hopefully someone reading -cafe can explain it better; I'm pretty lousy at it, as you can probably tell. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

ntupel

5:30 p.m.

On Sat, 2007-10-13 at 12:42 -0400, Brandon S. Allbery KF8NH wrote:

...

Your apparently simple StdGen argument is actually a sort of program state (represented by unevaluated thunks, not by a state monad; see below) which gets altered with every invocation of random. If nothing is forced until the very end, it in effect becomes an expression which produces the desired StdGen, with the uses of the previous StdGen values as "side effects" of its computation that occur when the thunk is evaluated at the end. I'm not sure I'm up to working through an example of what this looks like.

Thanks Brandon. I understand your argument but I don't know how to put it into practice, i.e. how to force the evaluation of StdGen. - Thoralf

Brandon S. Allbery KF8NH

5:35 p.m.

On Oct 13, 2007, at 13:30 , ntupel wrote:

...

On Sat, 2007-10-13 at 12:42 -0400, Brandon S. Allbery KF8NH wrote:

...
Your apparently simple StdGen argument is actually a sort of program state (represented by unevaluated thunks, not by a state monad; see below) which gets altered with every invocation of random. If nothing is forced until the very end, it in effect becomes an expression which produces the desired StdGen, with the uses of the previous StdGen values as "side effects" of its computation that occur when the thunk is evaluated at the end. I'm not sure I'm up to working through an example of what this looks like.

Thanks Brandon. I understand your argument but I don't know how to put it into practice, i.e. how to force the evaluation of StdGen.

For starters, look into "seq". Try applying it to any expression using a generated random number. This should force evaluation to occur somewhere other than when random is trying to figure out what StdGen value it's been told to use as its initial state. Alternately you can put all the uses in IO and use Control.Exception.evaluate (or even print). This is probably not what you want to do in your actual production code, however. -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

ntupel

6:59 p.m.

On Sat, 2007-10-13 at 13:35 -0400, Brandon S. Allbery KF8NH wrote:

...

For starters, look into "seq". Try applying it to any expression using a generated random number. This should force evaluation to occur somewhere other than when random is trying to figure out what StdGen value it's been told to use as its initial state.

Ok, but I still wonder where that might be. random and randomR are used in a function named "next" as show here: next :: (Array Int a, Array Int a, UArray Int Double) -> StdGen -> (a, StdGen) next (xs, as, rs) g = let n = length $ indices xs (x1, g1) = randomR (0, n - 1) g (x2, g2) = random g1 r = rs!x1 in if x2 <= r then (xs!x1, g2) else (as!x1, g2) x1 and x2 are used in the same function so I assume this already requires their evaluation. The only function that calls "next" is randomList: randomList :: (Array Int a, Array Int a, UArray Int Double) -> StdGen -> [a] randomList t g = let (n, g') = next t g in n:randomList t g' Cf. my original e-mail for the complete program.

...

Alternately you can put all the uses in IO and use Control.Exception.evaluate (or even print). This is probably not what you want to do in your actual production code, however.

Right. This is not what I want. Many thanks again, Thoralf

ntupel

14 Oct 14 Oct

9:54 p.m.

New subject: Laziness (was: Performance problem with random numbers)

On Sat, 2007-10-13 at 09:56 -0400, Brandon S. Allbery KF8NH wrote:

...

Now you need to start forcing things; given laziness, things tend to only get forced when in IO, which leads to time being accounted to the routine where the forcing happened. If random / randomR are invoked with large unevaluated thunks, their forcing will generally be attributed to them, not to functions within the thunks.

(Yes, this means profiling lazy programs is a bit of a black art.)

After more testing I finally realized how right you are. It appears that my problem is not related to random/randomR but only to laziness. I came up with a test that doesn't use random numbers at all and still needs about 2.5 seconds to complete (it is really just meaningless computations): module Main where import Data.List main :: IO () main = do let n = 1000000 :: Int print $ foldl' (\x y -> seq y x) 0 (take n $ test 1 [1,2..]) test :: Int -> [Int] -> [Int] test t g = let (n, g') = next t g in n:test t g' next :: Int -> [Int] -> (Int, [Int]) next x (y:ys) = let n = func y in if n <= 0.5 then (x, ys) else (0, ys) where func x = fromIntegral x / (10 ^ len x) where len 0 = 0 len n = 1 + len (n `div` 10) Now my problem still is, that I don't know how to speed things up. I tried putting seq and $! at various places with no apparent improvement. Maybe I need to find a different data structure for my random module and lazy lists are simply not working well enough here? Thanks, Thoralf

Brandon S. Allbery KF8NH

10:14 p.m.

New subject: Laziness (was: Performance problem with random numbers)

On Oct 14, 2007, at 17:54 , ntupel wrote:

...

Now my problem still is, that I don't know how to speed things up. I tried putting seq and $! at various places with no apparent improvement. Maybe I need to find a different data structure for my random module and lazy lists are simply not working well enough here?

Unfortunately I'm not so good at that myself. Even more unfortunately, my understanding is that randomly using seq and/or $! not only usually doesn't help, but can actually make things slower; and to do it right, you need to refer to the "simplified" Core Haskell code generated by GHC. And understanding *that* requires rather more familiarity with Core than I have. :/ -- brandon s. allbery [solaris,freebsd,perl,pugs,haskell] allbery@kf8nh.com system administrator [openafs,heimdal,too many hats] allbery@ece.cmu.edu electrical and computer engineering, carnegie mellon university KF8NH

Derek Elkins

10:29 p.m.

New subject: Laziness (was: Performance problem with random numbers)

On Sun, 2007-10-14 at 18:14 -0400, Brandon S. Allbery KF8NH wrote:

...

On Oct 14, 2007, at 17:54 , ntupel wrote:

...
Now my problem still is, that I don't know how to speed things up. I tried putting seq and $! at various places with no apparent improvement. Maybe I need to find a different data structure for my random module and lazy lists are simply not working well enough here?

Unfortunately I'm not so good at that myself. Even more unfortunately, my understanding is that randomly using seq and/or $! not only usually doesn't help, but can actually make things slower; and to do it right, you need to refer to the "simplified" Core Haskell code generated by GHC. And understanding *that* requires rather more familiarity with Core than I have. :/

A lot of times just unfolding a few evaluations by hand (perhaps mentally) will point out issues readily and readily suggest there solution. After a while you will know what kinds of things are problematic and not write such code to begin with. Unfortunately, this is not something widely and well understood and is not part of almost any of the available educational material for Haskell. Programming in a lazy language is more different than programming in an eager one than almost any resource states.

David Roundy

15 Oct 15 Oct

2:48 p.m.

New subject: Laziness (was: Performance problem with random numbers)

On Sun, Oct 14, 2007 at 11:54:54PM +0200, ntupel wrote:

...

On Sat, 2007-10-13 at 09:56 -0400, Brandon S. Allbery KF8NH wrote:

...
Now you need to start forcing things; given laziness, things tend to only get forced when in IO, which leads to time being accounted to the routine where the forcing happened. If random / randomR are invoked with large unevaluated thunks, their forcing will generally be attributed to them, not to functions within the thunks.

(Yes, this means profiling lazy programs is a bit of a black art.)

After more testing I finally realized how right you are. It appears that my problem is not related to random/randomR but only to laziness. I came up with a test that doesn't use random numbers at all and still needs about 2.5 seconds to complete (it is really just meaningless computations):

Here's a modified version of your code that prints out a real result, by using sum rather than seq to force the computation: module Main where main :: IO () main = do let n = 1000000 :: Int print $ sum (take n $ test 1 [1,2..]) test :: Int -> [Int] -> [Int] test t g = let (n, g') = next t g in n:test t g' next :: Int -> [Int] -> (Int, [Int]) next x (y:ys) = let n = func y in if n <= 0.5 then (x, ys) else (0, ys) where func x = fromIntegral x / (10 ^ len x) where len 0 = 0 len n = 1 + len (n `div` 10) On my computer this takes 4 seconds to run. I can speed it up by an order of magnitude by writing code that is friendlier to the compiler: module Main where main :: IO () main = do let n = 1000000 :: Int print $ sum (take n $ test 1 [1,2..]) test :: Int -> [Int] -> [Int] test t g = map f g where f :: Int -> Int f y = if func y <= 0.5 then t else 0 func :: Int -> Double func x = fromIntegral x / mypow x mypow 0 = 1 mypow n = 10*(mypow (n `div` 10)) Switching to map and simplifying the structure gained me 30% or so, but the big improvement came from the elimination of the use of (^) by writing mypow (ill-named). I have no idea if this example will help your actual code, but it illustrates that at least in this example, it's pretty easy to gain an order of magnitude in speed. (That "func" is a weird function, by the way.) Incidentally, implementing the same program in C, I get: #include int test(int, int); double func(int); int mypow(int); int mypow(int n) { double result = 1; while (n>0) { result *= 10; n /= 10; } return result; } double func(int x) { return x / (double) mypow(x); } int test(int t, int y) { if (func(y) <= 0.5) { return t; } else { return 0; } } int main() { int i; int sum = 0; for (i=0;i<1000000;i++) { sum += test(1,i); } printf("sum is %d\n", sum); return 0; } Which runs more than 10 times faster than my Haskell version, so there's obviously still a lot of room for optimization. :( Incidentally, a version written in C that uses pow for the 10^(len n) runs in only half the time of my haskell version (five time the time of the C version I give)--confirming that pow is indeed a very expensive operation (as I already knew) and that if you call the pow function it *ought* to dominate your timing. But we've also still clearly got some seriously painful loop overhead. :( -- David Roundy Department of Physics Oregon State University

ntupel

9:28 p.m.

New subject: Laziness (was: Performance problem with random numbers)

On Mon, 2007-10-15 at 10:48 -0400, David Roundy wrote:

...

I have no idea if this example will help your actual code, but it illustrates that at least in this example, it's pretty easy to gain an order of magnitude in speed. (That "func" is a weird function, by the way.)

Thanks for your reply David, Unfortunately my original problem was that System.Random.{random, randomR} is used instead of all these weird test functions that I came up with during experimentation. And I can't force anything inside StdGen so I see no way of speeding up the original program sans replacing the random number generator itself. When I did that I became about 4 times faster than with System.Random but still an order of magnitude slower than for instance by using the Java implementation (and I can confirm that (^) is *very* expensive in this context). Many thanks again, Thoralf

Isaac Dupree

13 Oct 13 Oct

9:33 p.m.

ntupel wrote:

...

Thanks for your reply Stefan. Unfortunately I could measure only a relatively small improvement by changing to concrete types

...

the sample code was about one second faster when compiled with -O2. Profiling again indicated that most time was spend in random and randomR

GHC StdGen's random and randomR are somewhat slow. I found that changing to a custom ((x*a + b) `mod` c) random-generator (instance of RandomGen) much sped things up (since nothing depended on the random numbers being good quality). (Then I switched to a small C function to do the randomization and make all the OpenGL calls, and it sped up by another factor of 4.) Isaac

Don Stewart

9:37 p.m.

isaacdupree:

...

ntupel wrote:

...
Thanks for your reply Stefan. Unfortunately I could measure only a relatively small improvement by changing to concrete types

...
the sample code was about one second faster when compiled with -O2. Profiling again indicated that most time was spend in random and randomR

GHC StdGen's random and randomR are somewhat slow. I found that changing to a custom ((x*a + b) `mod` c) random-generator (instance of RandomGen) much sped things up (since nothing depended on the random numbers being good quality). (Then I switched to a small C function to do the randomization and make all the OpenGL calls, and it sped up by another factor of 4.)

I've seen similar results switching to the SIMD mersenne twister C implementation for randoms: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html If there's interest, I can package up the bindings for hackage. -- Don

ntupel

11:15 p.m.

On Sat, 2007-10-13 at 14:37 -0700, Don Stewart wrote:

...

I've seen similar results switching to the SIMD mersenne twister C implementation for randoms:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

If there's interest, I can package up the bindings for hackage.

I would definitely be interested. Many thanks, Thoralf

Isaac Dupree

14 Oct 14 Oct

12:08 a.m.

Don Stewart wrote:

...

I've seen similar results switching to the SIMD mersenne twister C implementation for randoms:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

If there's interest, I can package up the bindings for hackage.

looks nice... at least for those of us who have non-old computer CPUs.... Is there a decent way to implement 'split'? A way that doesn't take too long to run, and produces fairly independent generators? Isaac

Lennart Augustsson

4:05 p.m.

The Mersenne twister should be able to split better than most. but I'm not sure how efficient it is. On 10/14/07, Isaac Dupree wrote:

...

Don Stewart wrote:

...
I've seen similar results switching to the SIMD mersenne twister C implementation for randoms:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html

If there's interest, I can package up the bindings for hackage.

looks nice... at least for those of us who have non-old computer CPUs.... Is there a decent way to implement 'split'? A way that doesn't take too long to run, and produces fairly independent generators?

Isaac

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Simon Peyton-Jones

16 Oct 16 Oct

10:07 a.m.

We'd be delighted if someone offered a more performant library to put into future GHC releases. Simon | -----Original Message----- | From: haskell-cafe-bounces@haskell.org [mailto:haskell-cafe-bounces@haskell.org] On Behalf Of Don | Stewart | Sent: 13 October 2007 22:38 | To: Isaac Dupree | Cc: haskell-cafe@haskell.org | Subject: Re: [Haskell-cafe] Performance problem with random numbers | | isaacdupree: | > ntupel wrote: | > >Thanks for your reply Stefan. Unfortunately I could measure only a | > >relatively small improvement by changing to concrete types | > | > >the sample code was about one second faster when compiled with -O2. | > >Profiling again indicated that most time was spend in random and randomR | > | > GHC StdGen's random and randomR are somewhat slow. I found that | > changing to a custom ((x*a + b) `mod` c) random-generator (instance of | > RandomGen) much sped things up (since nothing depended on the random | > numbers being good quality). (Then I switched to a small C function to | > do the randomization and make all the OpenGL calls, and it sped up by | > another factor of 4.) | > | | I've seen similar results switching to the SIMD mersenne twister C | implementation for randoms: | | http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/index.html | | If there's interest, I can package up the bindings for hackage. | | -- Don | _______________________________________________ | Haskell-Cafe mailing list | Haskell-Cafe@haskell.org | http://www.haskell.org/mailman/listinfo/haskell-cafe

Peter Verswyvelen

4:07 p.m.

Does the GHC code generator makes use of SIMD instructions? Maybe via the C compiler? Cheers, Peter Simon Peyton-Jones wrote:

...

We'd be delighted if someone offered a more performant library to put into future GHC releases.

| I've seen similar results switching to the SIMD mersenne twister C | implementation for randoms:

Stefan O'Rear

8:37 p.m.

On Tue, Oct 16, 2007 at 06:07:39PM +0200, Peter Verswyvelen wrote:

...

Does the GHC code generator makes use of SIMD instructions? Maybe via the C compiler?

No. GHC uses GCC extensions, and GCC doesn't support automatic SIMD use. (You could use -unreg and an advanced compiler. Good luck finding a compiler smart enough to work around the idiocies incurred translating Haskell to ANSI C; -unreg is very slow) Stefan

ntupel

17 Oct 17 Oct

7:14 a.m.

On Sat, 2007-10-13 at 18:33 -0300, Isaac Dupree wrote:

...

GHC StdGen's random and randomR are somewhat slow. I found that changing to a custom ((x*a + b) `mod` c) random-generator (instance of RandomGen) much sped things up (since nothing depended on the random numbers being good quality).

Yes, I also switched now to a simple custom linear congruential generator (which is random enough for this task) and restructured the code a bit and am happy now since it is even a bit faster than the Java implementation :) Thanks, Thoralf

6474

Age (days ago)

6479

Last active (days ago)

List overview

Download

24 comments

10 participants

participants (10)

Brandon S. Allbery KF8NH
David Roundy
Derek Elkins
Don Stewart
Isaac Dupree
Lennart Augustsson
ntupel
Peter Verswyvelen
Simon Peyton-Jones
Stefan O'Rear

Performance problem with random numbers

tags

participants (10)