
[Moved to haskell-cafe@] Chad.Scherrer:
Thanks, Don. I need some time to take a closer look at this. I'm trying to see, what is it about the code you posted that makes it so much more efficient? Is it the loop in the IO monad is compiled into an honest-to-goodness loop? Is this specific to IO, or does it work out this way for monads in general? Or maybe some gains were made by switching from using StdGen with a specified seed to using the OS-supplied random number generator?
-Chad
Ok, lets try to work this out a bit more closely, rather than just throwing IO at it. The code that produces the following profiles is attached at the end. Before: total time = 0.28 secs (14 ticks @ 20 ms) total alloc = 42,614,556 bytes (excludes profiling overheads) randomEntry Main 71.4 78.0 randomArray Main 21.4 19.6 tester Main 7.1 2.3 Add cost centers to each line of randomEntry: total time = 0.46 secs (23 ticks @ 20 ms) total alloc = 49,301,332 bytes (excludes profiling overheads) randomEntry Main 87.0 71.2 randomArray Main 13.0 16.9 tester Main 0.0 2.0 4 Main 0.0 3.2 3 Main 0.0 3.2 1 Main 0.0 3.2 Hmm. Lets try to force the value returned from randomEntry: total time = 0.14 secs (7 ticks @ 20 ms) total alloc = 17,691,948 bytes (excludes profiling overheads) randomEntry Main 85.7 79.4 randomArray Main 14.3 12.1 tester Main 0.0 5.7 1 Main 0.0 2.6 Hmm. Some progress, but not clear what to do next. Now, lets try that loop: total time = 0.10 secs (5 ticks @ 20 ms) total alloc = 17,722,552 bytes (excludes profiling overheads) randomEntry Main 100.0 79.2 tester Main 0.0 5.6 randomArray Main 0.0 12.3 1 Main 0.0 2.6 randomArray is a little faster, but still randomEntry is costing us. Ok, lets see what happens if we switch to using the global stdgen instead of threading our own: total time = 0.00 secs (0 ticks @ 20 ms) total alloc = 137,544 bytes (excludes profiling overheads) CAF System.Random 0.0 1.8 CAF GHC.Handle 0.0 1.1 CAF GHC.Float 0.0 36.3 randomEntry Main 0.0 26.4 Bang! So I think our problem is a space leak in the monad state. Just to check, lets get rid of the explicit loop: total time = 0.00 secs (0 ticks @ 20 ms) total alloc = 139,340 bytes (excludes profiling overheads) CAF System.Random 0.0 1.8 CAF GHC.Handle 0.0 1.0 CAF GHC.Float 0.0 35.8 randomEntry Main 0.0 26.0 randomArray Main 0.0 3.7 main Main 0.0 1.0 CAF Main 0.0 30.0 Ok, so it wasn't the loop. Lets try to squish this monad state a bit better - it's getting allocated a lot. How about we put the StdGen in a strict field of a data type. data S = S !StdGen randomEntry :: State S Float randomEntry = do (S g) <- get (x, g') <- return $ randomR (0,1) g put (S g') return x total time = 0.10 secs (5 ticks @ 20 ms) total alloc = 15,536,944 bytes (excludes profiling overheads) randomArray Main 80.0 76.9 1 Main 20.0 16.6 tester Main 0.0 6.4 Just taking the original code and wrapping the state in the S constructor more than halves the memory consumption (compared to the original code), but doesn't squish the space leak .. lots of S's are getting allocated. How about we stick the StdGen in an STRef inside the ST monad (basically like the stdGen IORef, but not in IO): total time = 0.00 secs (0 ticks @ 20 ms) total alloc = 160,760 bytes (excludes profiling overheads) CAF System.Random 0.0 1.5 CAF GHC.Float 0.0 31.1 tester Main 0.0 2.3 randomEntry Main 0.0 33.3 randomArray Main 0.0 4.1 CAF Main 0.0 25.7 Ah ha. That'll do. Lesson: avoid hidden space leaks in monads. -- Don Here's the final ST code: ------------------------------------------------------------------------ import Control.Monad.ST import System.Random import Data.Array.Diff import Data.Array.Unboxed import Data.STRef import Control.Monad type Vector = UArray Int Float type Matrix = DiffArray Int Vector randomEntry :: STRef s StdGen -> ST s Float randomEntry r = do g <- readSTRef r (x, g') <- return $ randomR (0,1) g writeSTRef r g' return x randomArray :: (IArray arr a, Monad m) => Int -> m a -> m (arr Int a) randomArray n x = sequence (replicate n x) >>= return . listArray (1,n) tester :: Int -> [Matrix] tester n = runST (do ref <- newSTRef (mkStdGen 1) let randomMatrix i j = randomArray i $ randomVector j randomVector n = randomArray n (randomEntry ref) liftM repeat $ randomMatrix n n) main :: IO () main = print $ ((tester 10) !! 10000) ! 5 And here's some of the other versions of the code, if you're interested, to see how I was thinking. ------------------------------------------------------------------------ -- Generate a random number from the unit interval -- randomEntry :: State S Float -- randomEntry :: IO Float randomEntry = do (S g) <- {-# SCC "1" #-} get (x, g') <- {-# SCC "2" #-} return $ randomR (0,1) g {-# SCC "3" #-} put (S g') {-# SCC "4" #-} return x randomEntry = getStdRandom (randomR (0,1)) -- Build an array of n random things -- randomArray :: (IArray arr a) => Int -> State StdGen a -> State StdGen (arr Int a) -- randomArray :: (IArray arr a) => Int -> State S a -> State S (arr Int a) randomArray n x = mapState arr . sequence $ replicate n x where arr (a,s) = (array (1,n) $ zip [1..n] a, s) randomArray n x = do let loop i | i == 0 = return [] | otherwise = do f <- {-# SCC "a1" #-} x fs <- {-# SCC "a3" #-} loop (i-1) {-# SCC "a3" #-} return (f : fs) fs <- {-# SCC "a4" #-} loop n {-# SCC "a5" #-} return $ listArray (1,n) fs randomArray n x = do fs <- sequence (replicate n x) return $ listArray (1,n) fs -- A random vector is an array of random entries -- randomVector :: Int -> IO Vector -- randomVector :: Int -> State StdGen Vector -- randomVector :: Int -> State S Vector randomVector n = randomArray n randomEntry -- a random matrix is an array of random vectors. -- randomMatrix :: Int -> Int -> State StdGen Matrix -- randomMatrix :: Int -> Int -> IO Matrix randomMatrix i j r = randomArray i $ randomVector j tester n = fst $ runState vals $ S (mkStdGen 1) where vals = sequence . repeat $ randomMatrix n n tester n = unsafePerformIO $ do setStdGen (mkStdGen 1) liftM repeat $ randomMatrix n n
-----Original Message----- From: Donald Bruce Stewart [mailto:dons@cse.unsw.edu.au] Sent: Fri 8/19/2005 7:42 PM To: Scherrer, Chad Cc: haskell@haskell.org Subject: Re: [Haskell] Random matrices
Chad.Scherrer:
I'm doing some statistical calculations, and I decided to try this out in Haskell to see how it goes. I'm really enjoying using the language, so I hope I can straighten this out so I can keep using it at work.
I keep getting stack overflows, so I think my code must be too lazy somewhere (that's what that means, right?) Here is my code to build random vectors and matrices.
I was suspicious about all those zips and repeats, so did a bit of profiling. This only revealed that the creation of random floats was chewing up about 70% of time and space. Still suspicious of that replicate code, I decided to rewrite it as a loop (I moved the code into the IO monad just because it is easier to hack in IO). This fixed the issue.
-- Don
Before:
paprika$ ghc -package mtl -O -prof -auto-all N.hs paprika$ ./a.out +RTS -p Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize' to increase it.
total time = 0.16 secs (8 ticks @ 20 ms) total alloc = 42,614,984 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc randomEntry Main 50.0 78.0 randomArray Main 50.0 19.6
------------------------------------------------------------------------
After: paprika$ ghc -O -prof -auto-all M.hs paprika$ ./a.out +RTS -p array (1,10) [(1,0.73394084),(2,0.39095977),(3,0.18828022),(4,0.19094983),(5,0.83119744),(6,0.3594179),(7,0.43519533),(8,0.39867708),(9,0.15676379),(10,0.4503187)]
total time = 0.00 secs (0 ticks @ 20 ms) total alloc = 141,368 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc CAF System.Random 0.0 5.6 CAF GHC.Handle 0.0 1.0 CAF GHC.Float 0.0 35.2 randomEntry Main 0.0 25.7
------------------------------------------------------------------------
import Data.Array.Unboxed import Data.Array.Diff import System.Random import Control.Monad
type Vector = UArray Int Float type Matrix = DiffArray Int Vector
-- Generate a random number from the unit interval randomEntry :: IO Float randomEntry = getStdRandom (randomR (0,1))
-- Build an array of n random things randomArray :: (IArray arr a) => Int -> IO a -> IO (arr Int a) randomArray n x = do let loop i | i == 0 = return [] | otherwise = do f <- x fs <- loop (i-1) return (f : fs) fs <- loop n return $ listArray (1,n) fs
-- A random vector is an array of random entries randomVector :: Int -> IO Vector randomVector n = randomArray n randomEntry
-- a random matrix is an array of random vectors. randomMatrix :: Int -> Int -> IO Matrix randomMatrix i j = randomArray i (randomVector j)
tester :: Int -> IO [Matrix] tester n = liftM repeat $ randomMatrix n n
main :: IO () main = tester 10 >>= \as -> print $ (as !! 10000) ! 5
------------------------------------------------------------------------
participants (1)
-
dons@cse.unsw.edu.au