
On Sunday, 1 March 2009 at 21:03, Phil wrote:
Hi,
Thanks for the replies - I haven't had a chance to try out everything suggested yet - but your explanations of transformers nailed it for me.
However, in terms of performance when stacking, I've come across something I'm struggling to explain - I was wondering if anyone could offer up an explanation. I've rewritten my code twice: once with 3 stacked monads, and once with 2 stacked monads and a load of maps. Heuristically I would have thought the 3-stack version would perform as well as, or even better than, the 2-stack solution, but the 2-stack solution is MUCH faster and uses MUCH less memory. They share 90% of the same code and both chain together the same number of computations using replicateM.
Not quite, the triple stack uses replicateM_, which should be a little cheaper.
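(For reference, the difference is just that replicateM_ throws the results away instead of building a list of them; their types in Control.Monad are roughly:

replicateM  :: Monad m => Int -> m a -> m [a]   -- run the action n times, collect the results
replicateM_ :: Monad m => Int -> m a -> m ()    -- run the action n times, discard the results
)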
From profiling I can see that the pure function 'reflect' takes up most of the umph in both cases - which I'd expect. But in the triple stacked version the garbage collector is using up >90% of the time.
I've tried using BangPatterns to reduce memory usage in the triple stack version - doing this I can halve the time it takes, but it still runs at over twice the time of the two-stack version. The BangPatterns were also put in the common code, in the reflect function - so I'd expect both solutions to need them?
One thing that helps much is to use

import Control.Monad.State.Strict

Using the default lazy State monad, you build enormous thunks in the states, which harms the triple stack even more than the double stack. With the strict State monad (and a strict left fold instead of foldr in the double stack), I get:

./v6tripleStrict +RTS -sstderr -K16M
10.450674088955589
    444,069,720 bytes allocated in the heap
    234,808,472 bytes copied during GC (scavenged)
     30,504,688 bytes copied during GC (not scavenged)
     41,074,688 bytes maximum residency (9 sample(s))

        786 collections in generation 0 ( 21.03s)
          9 collections in generation 1 (  2.54s)

        106 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.05s  (  4.21s elapsed)
  GC    time   23.57s  ( 24.18s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   27.62s  ( 28.40s elapsed)

  %GC time      85.3%  (85.2% elapsed)

  Alloc rate    109,646,844 bytes per MUT second

  Productivity  14.7% of total user, 14.3% of total elapsed

./v6doubleStrict +RTS -sstderr
10.450674088955592
    388,795,972 bytes allocated in the heap
    177,748,228 bytes copied during GC (scavenged)
     23,953,900 bytes copied during GC (not scavenged)
     44,560,384 bytes maximum residency (9 sample(s))

        710 collections in generation 0 ( 11.62s)
          9 collections in generation 1 (  3.03s)

         94 Mb total memory in use

  INIT  time    0.01s  (  0.00s elapsed)
  MUT   time   13.54s  ( 13.91s elapsed)
  GC    time   14.65s  ( 15.02s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   28.20s  ( 28.93s elapsed)

  %GC time      52.0%  (51.9% elapsed)

  Alloc rate    28,693,429 bytes per MUT second

  Productivity  48.0% of total user, 46.8% of total elapsed

So, yes, the triple stack uses more memory, but not terribly much more. However, it spends much more time gc'ing, but as its MUT time is much less, the total times are not much different. Now, if we give them enough heap space to begin with:

./v6tripleStrict +RTS -sstderr -K16M -H192M
10.450674088955589
    444,077,972 bytes allocated in the heap
     95,828,976 bytes copied during GC (scavenged)
     15,441,936 bytes copied during GC (not scavenged)
     36,147,200 bytes maximum residency (2 sample(s))

          5 collections in generation 0 (  2.16s)
          2 collections in generation 1 (  0.43s)

        185 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.20s  (  4.55s elapsed)
  GC    time    2.59s  (  2.74s elapsed)
  EXIT  time    0.00s  (  0.95s elapsed)
  Total time    6.79s  (  7.29s elapsed)

  %GC time      38.1%  (37.6% elapsed)

  Alloc rate    105,732,850 bytes per MUT second

  Productivity  61.9% of total user, 57.6% of total elapsed

./v6doubleStrict +RTS -sstderr -K16M -H192M
10.450674088955592
    388,806,408 bytes allocated in the heap
     46,446,680 bytes copied during GC (scavenged)
         77,852 bytes copied during GC (not scavenged)
        159,744 bytes maximum residency (2 sample(s))

          4 collections in generation 0 (  1.36s)
          2 collections in generation 1 (  0.03s)

        182 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    4.53s  (  5.11s elapsed)
  GC    time    1.39s  (  1.44s elapsed)
  EXIT  time    0.00s  (  0.02s elapsed)
  Total time    5.92s  (  6.55s elapsed)

  %GC time      23.5%  (21.9% elapsed)

  Alloc rate    85,829,229 bytes per MUT second

  Productivity  76.5% of total user, 69.2% of total elapsed

MUCH better. I have no idea why the MUT time for the double stack decreases so much, though.
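(For reference, the change is only to the import in the common code - the State API is unchanged, the strict module just matches the (result, state) pair strictly at each bind. Note it is still not strict in the state value itself, which is why the bang pattern on newStockSum further down matters too:

-- was: import Control.Monad.State
import Control.Monad.State.Strict
import Debug.Trace
)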
Even though both pieces of code are a bit untidy, the triple stacked monad 'feels' nicer to me - everything is encapsulated away and one evaluation in main yields the result. From purely a design perspective I prefer it - but obviously not if it runs like a dog!
Any ideas why the triple stack runs so slow?
It suffers horribly from laziness. One thing is the lazy State monad, another is the implementation of mc.
Thanks again!
Phil
***************** Triple Stack Specific Impl:
type MonteCarloStateT = StateT Double
mc :: MonteCarloStateT BoxMullerQuasiState ()
mc = StateT $ \s -> do
    nextNormal <- generateNormal
    let stochastic = 0.2*1*nextNormal
    let drift = 0.05 - (0.5*(0.2*0.2))*1
    let newStockSum = payOff 100 ( 100 * exp ( drift + stochastic ) ) + s
    return ((),newStockSum)
Don't use a new let on each line, have it all in one let-block. And, please, force the evaluation of newStockSum:

mc :: MonteCarloStateT BoxMullerQuasiState ()
mc = StateT $ \s -> do
    nextNormal <- generateNormal
    let stochastic = 0.2*1*nextNormal
        drift = 0.05 - (0.5*(0.2*0.2))*1
        !newStockSum = payOff 100 ( 100 * exp ( drift + stochastic ) ) + s
    return ((),newStockSum)

Now:

./v8tripleStrict +RTS -sstderr
10.450674088955589
    396,391,172 bytes allocated in the heap
         65,252 bytes copied during GC (scavenged)
         22,272 bytes copied during GC (not scavenged)
         40,960 bytes maximum residency (1 sample(s))

        757 collections in generation 0 (  0.02s)
          1 collections in generation 1 (  0.00s)

          1 Mb total memory in use

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    3.36s  (  3.49s elapsed)
  GC    time    0.02s  (  0.05s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    3.38s  (  3.54s elapsed)

  %GC time       0.6%  (1.3% elapsed)

  Alloc rate    117,973,563 bytes per MUT second

  Productivity  99.4% of total user, 94.9% of total elapsed

w00t!
iterations = 1000000

main :: IO()
main = do
    let sumOfPayOffs = evalState ( evalStateT ( execStateT (do replicateM_ iterations mc) $ 0 ) $ (Nothing,nextHalton) ) $ (1,[3,5])
    let averagePO = sumOfPayOffs / fromIntegral iterations
    let discountPO = averagePO * exp (-0.05)
    print discountPO
Again, don't needlessly multiply the lets.
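A single let-block version of that main might look like this (untested, just the same expressions grouped as suggested):

main :: IO()
main = do
    let sumOfPayOffs = evalState ( evalStateT ( execStateT (replicateM_ iterations mc) 0 ) (Nothing,nextHalton) ) (1,[3,5])
        averagePO  = sumOfPayOffs / fromIntegral iterations
        discountPO = averagePO * exp (-0.05)
    print discountPO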
***************** Double Stack and Map Specific Impl:
iterations = 1000000

main :: IO()
main = do
    let normals = evalState ( evalStateT (do replicateM iterations generateNormal) $ (Nothing,nextHalton) ) $ (1,[3,5])
    let stochastic = map (0.2*1*) normals
    let sde = map ((( 0.05 - (0.5*(0.2*0.2)) )*1)+) stochastic
    let expiryMult = map exp sde
    let expiry = map (100*) expiryMult
    let payoff = map (payOff 100) expiry
    let averagePO = (foldr (+) 0 payoff) / fromIntegral iterations
    let discountPO = averagePO * exp (-0.05)
    print discountPO
Same here, but what is important for performance is to replace the foldr with foldl'.
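A sketch of that change - foldl' comes from Data.List and forces its accumulator at each step, so the sum is not built up as a million-deep chain of (+) thunks the way foldr (+) 0 builds it here:

import Data.List (foldl')   -- at the top of the module

-- in main, replacing the foldr line:
    let averagePO = foldl' (+) 0 payoff / fromIntegral iterations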
***************** Common Code Used By Both Methods:
import Control.Monad.State
import Debug.Trace
-- State Monad for QRNGs - stores current iteration and list of
-- bases to compute
type QuasiRandomState = State (Int,[Int])
nextHalton :: QuasiRandomState [Double]
nextHalton = do
    (n,bases) <- get
    let !nextN = n+1
    put (nextN,bases)
    return $ map (reflect (n,1,0)) bases
type ReflectionThreadState = (Int,Double,Double)
reflect :: ReflectionThreadState -> Int -> Double
reflect (k,f,h) base
    | k <= 0 = h
    | otherwise = reflect (newK,newF,newH) base
    where newK = k `div` base
          newF = f / fromIntegral base
          newH = h + fromIntegral(k `mod` base) * newF
-- So we are defining a state transformer whose state is a 'Maybe Double' and an
-- operating function for the inner monad, of type QuasiRandomState, returning a [Double].
-- We then say that it wraps a QuasiRandomState (State monad) - it must, of course:
-- if we pass it a function that operates on these monads, we must wrap the same
-- type of monad. And finally it returns a Double.
type BoxMullerStateT = StateT (Maybe Double, QuasiRandomState [Double])
type BoxMullerQuasiState = BoxMullerStateT QuasiRandomState
generateNormal :: BoxMullerQuasiState Double
generateNormal = StateT $ \s -> case s of
    (Just d,qrnFunc)  -> return (d,(Nothing,qrnFunc))
    (Nothing,qrnFunc) -> do
        qrnBaseList <- qrnFunc
        let (norm1,norm2) = boxMuller (head qrnBaseList) (head $ tail qrnBaseList)
        return (norm1,(Just norm2,qrnFunc))
boxMuller :: Double -> Double -> (Double,Double)
-- boxMuller rn1 rn2 | trace ( "rn1 " ++ show rn1 ++ " rn2 " ++ show rn2 ) False = undefined
boxMuller rn1 rn2 = (normal1,normal2)
    where r = sqrt ( (-2)*log rn1)
          twoPiRn2 = 2 * pi * rn2
          normal1 = r * cos ( twoPiRn2 )
          normal2 = r * sin ( twoPiRn2 )
payOff :: Double -> Double -> Double
payOff strike stock
    | (stock - strike) > 0 = stock - strike
    | otherwise = 0
Cheers, Daniel