
i'm implementing a benchmark which includes a detailed specification for a random number generator. for any of the kernels outlined in the benchmark, i might have to generate a set of random numbers R, which has a length n, using the following formulas: R[k] = ((2^-46)(X[k])) mod 2^46, where X[k] = (a^k)s where the values of a and s are constant and defined below. many of the kernels in the benchmark require a large number of randoms to be generated (in the tens of millions). when i invoke the following getRandAt function that many times to build up a list, evaluation of the list takes forever (somewhere between 5 and 10 minutes). i've tried optimizing this several different ways, with no luck. i though i might post my code here and see if anyone notices anything i'm doing wrong that might be causing such a large bottleneck: --constants a :: Int64 a = 5^13 divisor :: Int64 divisor = 2^46 multiplier :: Float multiplier = 2**(-46) --gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 -> Int64 -> Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor thanks all in advance for your help! -james

james.swaine:
i'm implementing a benchmark which includes a detailed specification for a random number generator. for any of the kernels outlined in the benchmark, i might have to generate a set of random numbers R, which has a length n, using the following formulas:
R[k] = ((2^-46)(X[k])) mod 2^46, where
X[k] = (a^k)s
where the values of a and s are constant and defined below. many of the kernels in the benchmark require a large number of randoms to be generated (in the tens of millions). when i invoke the following getRandAt function that many times to build up a list, evaluation of the list takes forever (somewhere between 5 and 10 minutes). i've tried optimizing this several different ways, with no luck. i though i might post my code here and see if anyone notices anything i'm doing wrong that might be causing such a large bottleneck:
--constants a :: Int64 a = 5^13
divisor :: Int64 divisor = 2^46
multiplier :: Float multiplier = 2**(-46)
--gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 -> Int64 -> Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor
thanks all in advance for your help!
Using ghc -O2 --make There's nothing wrong with this code, really: Z.$wgetRandAt :: Int# -> Int# -> Float# and an inner loop of: Z.$w$j :: Int# -> Float# Z.$w$j = \ (w_sHx :: Int#) -> case Z.^1 Z.lit1 Z.lvl of w1_XFs { I64# ww_XFv -> case ww_XFv of wild_aFB { __DEFAULT -> case minBound3 of wild1_aFC { I64# b1_aFE -> case ==# w_sHx b1_aFE of wild2_aFG { False -> case modInt# w_sHx wild_aFB of wild3_aFJ { __DEFAULT -> timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# wild3_aFJ) }; True -> case wild_aFB of wild3_aFM { __DEFAULT -> case modInt# w_sHx wild3_aFM of wild4_aFN { __DEFAULT -> timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# wild4_aFN) }; (-1) -> overflowError `cast` (CoUnsafe (forall a_aFS. a_aFS) Float# :: forall a_aFS. a_aFS ~ Float#) } } }; 0 -> divZeroError `cast` (CoUnsafe (forall a_aFT. a_aFT) Float# :: forall a_aFT. a_aFT ~ Float#) } Which is just fine. Inlining those constants explicitly might be a good idea, then we get an outer loop of: Z.$wgetRandAt :: Int# -> Int# -> Float# Z.$wgetRandAt = \ (ww_sHG :: Int#) (ww1_sHK :: Int#) -> case ww_sHG of wild_B1 { __DEFAULT -> case Z.lvl3 of wild1_aEd { I64# x#_aEf -> case Z.^ Z.a (I64# wild_B1) of wild2_XFe { I64# x#1_XFh -> case Z.lvl1 of w_aE7 { I64# ww2_aE9 -> case ww2_aE9 of wild3_aFB { __DEFAULT -> case minBound3 of wild11_aFC { I64# b1_aFE -> let { ww3_aFz [ALWAYS Just L] :: Int# ww3_aFz = *# x#1_XFh ww1_sHK } in case ==# ww3_aFz b1_aFE of wild21_aFG { False -> case modInt# ww3_aFz wild3_aFB of wild31_aFJ { __DEFAULT -> Z.$w$j (*# x#_aEf wild31_aFJ) }; True -> case wild3_aFB of wild31_aFM { __DEFAULT -> case modInt# ww3_aFz wild31_aFM of wild4_aFN { __DEFAULT -> Z.$w$j (*# x#_aEf wild4_aFN) }; (-1) -> overflowError `cast` (CoUnsafe (forall a_aFS. a_aFS) Float# :: forall a_aFS. a_aFS ~ Float#) } 0 -> divZeroError `cast` (CoUnsafe (forall a_aFT. a_aFT) Float# :: forall a_aFT. a_aFT ~ Float#) } } } }; 0 -> timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# ww1_sHK) } which is the fast path, then error / bounds checking. this looks perfectly acceptable. What does sound troublesome is using lazy lists .. that's more likely to be the bottleneck. -- Don

2009/2/26 James Swaine
--gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 -> Int64 -> Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor
One thing that comes to mind is that this exponentiation, with a very big exponent, could potentially take a very long time. I believe that GHC implements (^) using a repeated squaring technique, so it runs in log(k) time, which ought to be no problem. I'm not sure about other compilers though. Also note: (a^k * seed) `mod` divisor = ((a^k `mod` divisor) * seed) `mod` divisor = (a^(k `mod` phi(divisor)) * seed) `mod` divisor. Where phi is the Euler totient function: phi(2^46) = 2^23. Modulo errors... it's been a while since I've done this stuff. Luke

Am Donnerstag, 26. Februar 2009 18:48 schrieb Luke Palmer:
2009/2/26 James Swaine
--gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 -> Int64 -> Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor
One thing that comes to mind is that this exponentiation, with a very big exponent, could potentially take a very long time. I believe that GHC implements (^) using a repeated squaring technique, so it runs in log(k) time, which ought to be no problem. I'm not sure about other compilers though.
Another thing: if you don't need to pick random indices, but use them in order, it may be faster to have a list of the random numbers : randInts = iterate ((`mod` divisor) . (*a)) seed or carry x[k] around as state.
Also note:
(a^k * seed) `mod` divisor = ((a^k `mod` divisor) * seed) `mod` divisor = (a^(k `mod` phi(divisor)) * seed) `mod` divisor.
Where phi is the Euler totient function: phi(2^46) = 2^23.
phi(2^n) = 2^(n-1). Apart from that, correct.
Modulo errors... it's been a while since I've done this stuff.
Luke
participants (4)
-
Daniel Fischer
-
Don Stewart
-
James Swaine
-
Luke Palmer