PI calculation - Newbie question

Hi Group, I am just trying to learn the lang and Implemented this PI calculator. It is really slow and very memory consuming (much much slower than its equivalent in Clojure for instance) I think the problem is in "rs <- sequence (replicate n isRandIn)" - But I don't know how to get around it (how do I get a lazy sequence of rs? Is it the problem anyway?) -- p.hs simple PI calculator, using the Monte Carlo Method import System( getArgs ) import System.Random inCirc :: (Double, Double) -> Int inCirc (x,y) = if ((dx * dx) + (dy * dy)) < 0.25 then 1 else 0 where dx = x - 0.5 dy = y - 0.5 randPoint :: IO (Double, Double) randPoint = do x <-getStdRandom (randomR (0, 1 :: Double)) y <-getStdRandom (randomR (0, 1 :: Double)) return (x, y) isRandIn = do p <- randPoint return (inCirc p) main = do args <- getArgs let n = if null args then 10000 else read $ (head args)::Int rs <- sequence (replicate n isRandIn) let pi = (fromIntegral(sum rs) / fromIntegral n) * 4 print pi

Am Freitag 29 Januar 2010 11:59:30 schrieb Gabi:
Hi Group,
I am just trying to learn the lang and Implemented this PI calculator. It is really slow and very memory consuming (much much slower than its equivalent in Clojure for instance)
I think the problem is in "rs <- sequence (replicate n isRandIn)" - But I don't know how to get around it (how do I get a lazy sequence of rs? Is it the problem anyway?)
It's one part of the problem. sequence (replicate n isRandIn) requires all n random number computations to be in memory at once. When you want far more than 10000 random points, that'll take a lot of memory. The more evil part is that the things aren't actually calculated when you call sequence, so what you get is a long list of thunks "calculate when needed" which take a really large amount of space ( ./MCPi 100000 +RTS -sstderr 3.14048 252,873,148 bytes allocated in the heap 48,792,700 bytes copied during GC 8,861,192 bytes maximum residency (5 sample(s)) 128,768 bytes maximum slop 23 MB total memory in use (0 MB lost due to fragmentation) ./MCPi 1000000 +RTS -sstderr 3.1401 2,523,687,224 bytes allocated in the heap 527,553,588 bytes copied during GC 107,345,604 bytes maximum residency (8 sample(s)) 7,619,396 bytes maximum slop 219 MB total memory in use (2 MB lost due to fragmentation) ). You can reduce the space requirements by forcing the calculations earlier, e.g. isRandIn = do p <- randPoint return $! inCirc p The ($!) requires the evaluation of (inCirc p) [to the outermost constructor, since it's an Int, that means complete evaluation], the list that sequence produces is a list of evaluated Ints, needs far less space. Unfortunately, that is much slower. I'm not sure why. However, it's not good to do that much in IO (and getStdRandom isn't exactly fast, it has to retrieve the random generator and store the modified generator). The bulk of the algorithm is a pure calculation, given the *lazy* list of pseudo-random coordinates. So ---------------------------------------------------------------------- -- curried version of inCirc, no need to construct pairs inCirc :: Double -> Double -> Int inCirc x y | dx*dx + dy*dy < 0.25 = 1 | otherwise = 0 where dx = x - 0.5 dy = y - 0.5 -- transform a list of coordinates into a list of indicators -- whether the point is inside the circle (sorry for the -- stupid name) inCircles :: [Double] -> [Int] inCircles (x:y:zs) = inCirc x y : inCircles zs inCircles _ = [] -- given a count of experiments and an infinite list of coordinates, -- calculate an approximation to pi calcPi :: Int -> [Double] -> Double calcPi n ds = fromIntegral ct / fromIntegral n * 4 where ct = sum . take n $ inCircles ds -- now the IO part is only -- * getting the number of experiments and -- * getting the StdGen main :: IO () main = do args <- getArgs sg <- getStdGen let n = case args of (a:_) -> read a _ -> 10000 print $ calcPi n (randomRs (0,1) sg) ---------------------------------------------------------------------- Now the list of coordinates is consumed as it is produced, things can immediately be garbage-collected (if compiled with optimisations, without optimisations, you would have to replace "sum" with "foldl' (+) 0"). It runs in constant space and faster. It's still slow, though, because System.Random's StdGen is slow. For faster generation of pseudo-random numbers, take a look at e.g. mersenne-random (there are other fast PRNG packages on hackage, too, IIRC).
-- p.hs simple PI calculator, using the Monte Carlo Method
import System( getArgs ) import System.Random inCirc :: (Double, Double) -> Int inCirc (x,y) = if ((dx * dx) + (dy * dy)) < 0.25 then 1 else 0 where dx = x - 0.5 dy = y - 0.5
randPoint :: IO (Double, Double) randPoint = do x <-getStdRandom (randomR (0, 1 :: Double)) y <-getStdRandom (randomR (0, 1 :: Double)) return (x, y)
isRandIn = do p <- randPoint return (inCirc p)
main = do args <- getArgs let n = if null args then 10000 else read $ (head args)::Int
rs <- sequence (replicate n isRandIn) let pi = (fromIntegral(sum rs) / fromIntegral n) * 4 print pi

On Fri, Jan 29, 2010 at 12:59:30PM +0200, Gabi wrote:
I think the problem is in "rs <- sequence (replicate n isRandIn)" - But I don't know how to get around it (how do I get a lazy sequence of rs? Is it the problem anyway?)
First of all, we don't like to be inside IO just for getting random numbers. It is better to write, for example, type Point = (Double, Double) randPoint :: StdGen -> (Point, StdGen) randPoint gen = let (x, gen') = randomR (0,1) gen (y, gen'') = randomR (0,1) gen' in ((x,y),gen'') and then thread the generator yourself randPoints :: StdGen -> [Point] randPoints gen = let (p,gen') = randPoint gen in p : randPoints gen' -- lazy! To use those functions, something like import Control.Applicative ((<$>)) inCirc :: Point -> Bool inCirc = ... calcPi :: Int -> [Point] -> Double calcPi n pts = let s = length $ filter inCirc $ take n pts in 4 * fromIntegral s / fromIntegral n main = do pi <- calcPi 10000 . randPoints <$> newStdGen print pi I haven't tested performance, but it shouldn't be worse :), and it's cleaner and more idiomatic. To be even clearer you could use MonadRandom[1] which encapsulates the passing of StdGen between computations. [1] http://hackage.haskell.org/package/MonadRandom However System.Random (used by MonadRandom as well) is known to be very very very very slow. If you want more performance then you should use one of the other PRNG libraries on Hackage: http://hackage.haskell.org/package/mwc-random http://hackage.haskell.org/package/mersenne-random http://hackage.haskell.org/package/gsl-random The reason why System.Random is slow is because it must support 'split', which none of the packages above support. HTH, -- Felipe.
participants (3)
-
Daniel Fischer
-
Felipe Lessa
-
Gabi