Re: [Haskell-cafe] Performance Issue

26 Feb 2009

      james.swaine:
...
i'm implementing a benchmark which includes a detailed specification for a
random number generator.  for any of the kernels outlined in the benchmark, i
might have to generate a set of random numbers R, which has a length n, using
the following formulas:
R[k] = ((2^-46)(X[k])) mod 2^46, where
X[k] = (a^k)s
where the values of a and s are constant and defined below. 
many of the kernels in the benchmark require a large number of randoms to be
generated (in the tens of millions).  when i invoke the following getRandAt
function that many times to build up a list, evaluation of the list takes
forever (somewhere between 5 and 10 minutes).  i've tried optimizing this
several different ways, with no luck.  i though i might post my code here and
see if anyone notices anything i'm doing wrong that might be causing such a
large bottleneck:
--constants
a :: Int64
a = 5^13
divisor :: Int64
divisor = 2^46
multiplier :: Float
multiplier = 2**(-46)
--gets r[k], which is the value at the kth
--position in the overall sequence of
--pseudorandom numbers
getRandAt :: Int64 -> Int64 -> Float
getRandAt 0 seed = multiplier * (fromIntegral seed)
getRandAt k seed = multiplier * (fromIntegral x_next)
    where
        x_prev = (a^k * seed) `mod` divisor
        x_next = (a * x_prev) `mod` divisor
thanks all in advance for your help!
Using ghc -O2 --make

There's nothing wrong with this code, really:

    Z.$wgetRandAt :: Int# -> Int# -> Float#

and an inner loop of:

    Z.$w$j :: Int# -> Float#
    Z.$w$j =
      \ (w_sHx :: Int#) ->
        case Z.^1 Z.lit1 Z.lvl of w1_XFs { I64# ww_XFv ->
        case ww_XFv of wild_aFB {
          __DEFAULT ->
            case minBound3 of wild1_aFC { I64# b1_aFE ->
            case ==# w_sHx b1_aFE of wild2_aFG {
              False ->
                case modInt# w_sHx wild_aFB of wild3_aFJ { __DEFAULT ->
                timesFloat#
                  (powerFloat# __float 2.0 __float -46.0)
                  (int2Float# wild3_aFJ)
                };
              True ->
                case wild_aFB of wild3_aFM {
                  __DEFAULT ->
                    case modInt# w_sHx wild3_aFM of wild4_aFN { __DEFAULT ->
                    timesFloat#
                      (powerFloat# __float 2.0 __float -46.0)
                      (int2Float# wild4_aFN)
                    };
                  (-1) ->
                    overflowError
                    `cast` (CoUnsafe (forall a_aFS. a_aFS) Float#
                            :: forall a_aFS. a_aFS ~ Float#)
                }
            }
            };
          0 ->
            divZeroError
            `cast` (CoUnsafe (forall a_aFT. a_aFT) Float#
                    :: forall a_aFT. a_aFT ~ Float#)
        }

Which is just fine.

Inlining those constants explicitly might be a good idea, then we get an outer loop of:

    Z.$wgetRandAt :: Int# -> Int# -> Float#
    Z.$wgetRandAt =
      \ (ww_sHG :: Int#) (ww1_sHK :: Int#) ->
        case ww_sHG of wild_B1 {
          __DEFAULT ->
            case Z.lvl3 of wild1_aEd { I64# x#_aEf ->
            case Z.^ Z.a (I64# wild_B1)
            of wild2_XFe { I64# x#1_XFh ->
            case Z.lvl1 of w_aE7 { I64# ww2_aE9 ->
            case ww2_aE9 of wild3_aFB {
              __DEFAULT ->
                case minBound3 of wild11_aFC { I64# b1_aFE ->
                let {
                  ww3_aFz [ALWAYS Just L] :: Int#

                  ww3_aFz = *# x#1_XFh ww1_sHK } in
                case ==# ww3_aFz b1_aFE of wild21_aFG {
                  False ->
                    case modInt# ww3_aFz wild3_aFB
                    of wild31_aFJ { __DEFAULT ->
                    Z.$w$j (*# x#_aEf wild31_aFJ)
                    };
                  True ->
                    case wild3_aFB of wild31_aFM {
                      __DEFAULT ->
                        case modInt# ww3_aFz wild31_aFM
                        of wild4_aFN { __DEFAULT ->
                        Z.$w$j (*# x#_aEf wild4_aFN)
                        };
                      (-1) ->
                        overflowError
                        `cast` (CoUnsafe (forall a_aFS. a_aFS) Float#
                                :: forall a_aFS. a_aFS ~ Float#)
                    }
          0 ->
            divZeroError
            `cast` (CoUnsafe (forall a_aFT. a_aFT) Float#
                    :: forall a_aFT. a_aFT ~ Float#)
        }
        }
        }
        };
      0 ->
        timesFloat#
          (powerFloat# __float 2.0 __float -46.0)
          (int2Float# ww1_sHK)
    }

which is the fast path, then error / bounds checking.

this looks perfectly acceptable.

What does sound troublesome is using lazy lists .. that's more likely to be the bottleneck.

-- Don

Re: [Haskell-cafe] Performance Issue

Don Stewart