
james.swaine:
i'm implementing a benchmark which includes a detailed specification for a random number generator. for any of the kernels outlined in the benchmark, i might have to generate a set of random numbers R, which has a length n, using the following formulas:
R[k] = ((2^-46)(X[k])) mod 2^46, where
X[k] = (a^k)s
where the values of a and s are constant and defined below. many of the kernels in the benchmark require a large number of randoms to be generated (in the tens of millions). when i invoke the following getRandAt function that many times to build up a list, evaluation of the list takes forever (somewhere between 5 and 10 minutes). i've tried optimizing this several different ways, with no luck. i though i might post my code here and see if anyone notices anything i'm doing wrong that might be causing such a large bottleneck:
--constants a :: Int64 a = 5^13
divisor :: Int64 divisor = 2^46
multiplier :: Float multiplier = 2**(-46)
--gets r[k], which is the value at the kth --position in the overall sequence of --pseudorandom numbers getRandAt :: Int64 -> Int64 -> Float getRandAt 0 seed = multiplier * (fromIntegral seed) getRandAt k seed = multiplier * (fromIntegral x_next) where x_prev = (a^k * seed) `mod` divisor x_next = (a * x_prev) `mod` divisor
thanks all in advance for your help!
Using ghc -O2 --make There's nothing wrong with this code, really: Z.$wgetRandAt :: Int# -> Int# -> Float# and an inner loop of: Z.$w$j :: Int# -> Float# Z.$w$j = \ (w_sHx :: Int#) -> case Z.^1 Z.lit1 Z.lvl of w1_XFs { I64# ww_XFv -> case ww_XFv of wild_aFB { __DEFAULT -> case minBound3 of wild1_aFC { I64# b1_aFE -> case ==# w_sHx b1_aFE of wild2_aFG { False -> case modInt# w_sHx wild_aFB of wild3_aFJ { __DEFAULT -> timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# wild3_aFJ) }; True -> case wild_aFB of wild3_aFM { __DEFAULT -> case modInt# w_sHx wild3_aFM of wild4_aFN { __DEFAULT -> timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# wild4_aFN) }; (-1) -> overflowError `cast` (CoUnsafe (forall a_aFS. a_aFS) Float# :: forall a_aFS. a_aFS ~ Float#) } } }; 0 -> divZeroError `cast` (CoUnsafe (forall a_aFT. a_aFT) Float# :: forall a_aFT. a_aFT ~ Float#) } Which is just fine. Inlining those constants explicitly might be a good idea, then we get an outer loop of: Z.$wgetRandAt :: Int# -> Int# -> Float# Z.$wgetRandAt = \ (ww_sHG :: Int#) (ww1_sHK :: Int#) -> case ww_sHG of wild_B1 { __DEFAULT -> case Z.lvl3 of wild1_aEd { I64# x#_aEf -> case Z.^ Z.a (I64# wild_B1) of wild2_XFe { I64# x#1_XFh -> case Z.lvl1 of w_aE7 { I64# ww2_aE9 -> case ww2_aE9 of wild3_aFB { __DEFAULT -> case minBound3 of wild11_aFC { I64# b1_aFE -> let { ww3_aFz [ALWAYS Just L] :: Int# ww3_aFz = *# x#1_XFh ww1_sHK } in case ==# ww3_aFz b1_aFE of wild21_aFG { False -> case modInt# ww3_aFz wild3_aFB of wild31_aFJ { __DEFAULT -> Z.$w$j (*# x#_aEf wild31_aFJ) }; True -> case wild3_aFB of wild31_aFM { __DEFAULT -> case modInt# ww3_aFz wild31_aFM of wild4_aFN { __DEFAULT -> Z.$w$j (*# x#_aEf wild4_aFN) }; (-1) -> overflowError `cast` (CoUnsafe (forall a_aFS. a_aFS) Float# :: forall a_aFS. a_aFS ~ Float#) } 0 -> divZeroError `cast` (CoUnsafe (forall a_aFT. a_aFT) Float# :: forall a_aFT. a_aFT ~ Float#) } } } }; 0 -> timesFloat# (powerFloat# __float 2.0 __float -46.0) (int2Float# ww1_sHK) } which is the fast path, then error / bounds checking. this looks perfectly acceptable. What does sound troublesome is using lazy lists .. that's more likely to be the bottleneck. -- Don