Memory usage when passing arrays in state

Hello all. I am currently implementing an emulation of a CPU, in which the CPU's RAM is part of the internal state that is passed around in the program using a state monad. However, the program performs unexpectingly bad, and some profiling information makes us believe that the problem is the high memory usage of the program. The program below is similar to our main program used when testing a sorting algorithm in this CPU: module Main where import Control.Monad.State.Lazy import Data.Word import Data.Array.Diff import Control.Concurrent (threadDelay) data LoopState = LoopState { intVal :: Integer , diff :: DiffUArray Word8 Word8 } initState :: LoopState initState = LoopState 0 (array (0x00,0xFF) [(idx,0)|idx<-[0x00..0xFF]]) main :: IO () main = do execStateT looper initState >>= putStrLn . show . intVal looper :: StateT LoopState IO () looper = do st <- get let res = intVal st + 1 idx = fromIntegral res put $ st { intVal = res, diff = (diff st) // [(idx,idx)] } if res == 13000000 then return () else looper Of course our program does more than updating a counter ;-) Compiling and running this program yields the following result: [~]:[olaussot] >> ghc --make -O2 -o array ArrayPlay.hs [~]:[olaussot] >> ./array +RTS -sstderr ./array +RTS -sstderr 13000000 313,219,740 bytes allocated in the heap 1,009,986,984 bytes copied during GC 200,014,828 bytes maximum residency (8 sample(s)) 4,946,648 bytes maximum slop 393 MB total memory in use (3 MB lost due to fragmentation) Generation 0: 590 collections, 0 parallel, 3.06s, 3.09s elapsed Generation 1: 8 collections, 0 parallel, 3.56s, 4.21s elapsed INIT time 0.00s ( 0.00s elapsed) MUT time 0.27s ( 0.27s elapsed) GC time 6.62s ( 7.30s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 6.89s ( 7.57s elapsed) %GC time 96.1% (96.4% elapsed) Alloc rate 1,155,958,754 bytes per MUT second Productivity 3.9% of total user, 3.6% of total elapsed Why does the program spend 96.1% of its total running time collecting garbage? Any tips to make this program perform better are appreciated. Please do tell if anything is unclear. -- Tobias Olausson tobsan@gmail.com

This may be completely unrelated to your problem, but there's a ticket
in the GHC trac saying that DiffArray is unusably slow:
http://hackage.haskell.org/trac/ghc/ticket/2727 . It doesn't analyze
the cause of the slowness, so it's quite possible that it may be
related to GC as in your case.
Cheers,
Dan
On Tue, Mar 3, 2009 at 7:44 PM, Tobias Olausson
Hello all. I am currently implementing an emulation of a CPU, in which the CPU's RAM is part of the internal state that is passed around in the program using a state monad. However, the program performs unexpectingly bad, and some profiling information makes us believe that the problem is the high memory usage of the program.
The program below is similar to our main program used when testing a sorting algorithm in this CPU:
module Main where
import Control.Monad.State.Lazy import Data.Word import Data.Array.Diff import Control.Concurrent (threadDelay)
data LoopState = LoopState { intVal :: Integer , diff :: DiffUArray Word8 Word8 }
initState :: LoopState initState = LoopState 0 (array (0x00,0xFF) [(idx,0)|idx<-[0x00..0xFF]])
main :: IO () main = do execStateT looper initState >>= putStrLn . show . intVal
looper :: StateT LoopState IO () looper = do st <- get let res = intVal st + 1 idx = fromIntegral res put $ st { intVal = res, diff = (diff st) // [(idx,idx)] } if res == 13000000 then return () else looper
Of course our program does more than updating a counter ;-) Compiling and running this program yields the following result:
[~]:[olaussot] >> ghc --make -O2 -o array ArrayPlay.hs [~]:[olaussot] >> ./array +RTS -sstderr ./array +RTS -sstderr 13000000 313,219,740 bytes allocated in the heap 1,009,986,984 bytes copied during GC 200,014,828 bytes maximum residency (8 sample(s)) 4,946,648 bytes maximum slop 393 MB total memory in use (3 MB lost due to fragmentation)
Generation 0: 590 collections, 0 parallel, 3.06s, 3.09s elapsed Generation 1: 8 collections, 0 parallel, 3.56s, 4.21s elapsed
INIT time 0.00s ( 0.00s elapsed) MUT time 0.27s ( 0.27s elapsed) GC time 6.62s ( 7.30s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 6.89s ( 7.57s elapsed)
%GC time 96.1% (96.4% elapsed)
Alloc rate 1,155,958,754 bytes per MUT second
Productivity 3.9% of total user, 3.6% of total elapsed
Why does the program spend 96.1% of its total running time collecting garbage? Any tips to make this program perform better are appreciated. Please do tell if anything is unclear.
-- Tobias Olausson tobsan@gmail.com _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Thank you Daniel.
As I understood it DiffArrays are supposed to be faster than the regular
Array due to the fact that it doesnt copy the entire Array, but just updates
the position that changes, and keeps some kind of "changelog" on the array.
But when looking at the statistics for my sample program, it seems that
it allocates a lot more than what should be needed, which would indicate that
maybe the array is copied anyway.
At this point, the DiffArray/DiffUArray are the only functional arrays, right?
I mean, I can add two and two together and see that it equals...four, and
if the only functional array is sort of broken, that means so is my program.
Are there any alternatives that are fast aswell?
//Tobias
2009/3/4 Daniel Peebles
This may be completely unrelated to your problem, but there's a ticket in the GHC trac saying that DiffArray is unusably slow: http://hackage.haskell.org/trac/ghc/ticket/2727 . It doesn't analyze the cause of the slowness, so it's quite possible that it may be related to GC as in your case.
Cheers, Dan
On Tue, Mar 3, 2009 at 7:44 PM, Tobias Olausson
wrote: Hello all. I am currently implementing an emulation of a CPU, in which the CPU's RAM is part of the internal state that is passed around in the program using a state monad. However, the program performs unexpectingly bad, and some profiling information makes us believe that the problem is the high memory usage of the program.
The program below is similar to our main program used when testing a sorting algorithm in this CPU:
module Main where
import Control.Monad.State.Lazy import Data.Word import Data.Array.Diff import Control.Concurrent (threadDelay)
data LoopState = LoopState { intVal :: Integer , diff :: DiffUArray Word8 Word8 }
initState :: LoopState initState = LoopState 0 (array (0x00,0xFF) [(idx,0)|idx<-[0x00..0xFF]])
main :: IO () main = do execStateT looper initState >>= putStrLn . show . intVal
looper :: StateT LoopState IO () looper = do st <- get let res = intVal st + 1 idx = fromIntegral res put $ st { intVal = res, diff = (diff st) // [(idx,idx)] } if res == 13000000 then return () else looper
Of course our program does more than updating a counter ;-) Compiling and running this program yields the following result:
[~]:[olaussot] >> ghc --make -O2 -o array ArrayPlay.hs [~]:[olaussot] >> ./array +RTS -sstderr ./array +RTS -sstderr 13000000 313,219,740 bytes allocated in the heap 1,009,986,984 bytes copied during GC 200,014,828 bytes maximum residency (8 sample(s)) 4,946,648 bytes maximum slop 393 MB total memory in use (3 MB lost due to fragmentation)
Generation 0: 590 collections, 0 parallel, 3.06s, 3.09s elapsed Generation 1: 8 collections, 0 parallel, 3.56s, 4.21s elapsed
INIT time 0.00s ( 0.00s elapsed) MUT time 0.27s ( 0.27s elapsed) GC time 6.62s ( 7.30s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 6.89s ( 7.57s elapsed)
%GC time 96.1% (96.4% elapsed)
Alloc rate 1,155,958,754 bytes per MUT second
Productivity 3.9% of total user, 3.6% of total elapsed
Why does the program spend 96.1% of its total running time collecting garbage? Any tips to make this program perform better are appreciated. Please do tell if anything is unclear.
-- Tobias Olausson tobsan@gmail.com _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
-- Tobias Olausson tobsan@gmail.com

Am Mittwoch, 4. März 2009 02:30 schrieb Tobias Olausson:
Thank you Daniel. As I understood it DiffArrays are supposed to be faster than the regular
They may be supposed to be faster, but they aren't. If you want anything resembling speed, use UArrays, STUArrays, or, if your array elements cannot be unboxed, plain Arrays and STArrays, or some other array-package from Hackage (I don't know which are good, though, the above are good enough for me).
Array due to the fact that it doesnt copy the entire Array, but just updates the position that changes, and keeps some kind of "changelog" on the array. But when looking at the statistics for my sample program, it seems that it allocates a lot more than what should be needed, which would indicate that maybe the array is copied anyway. At this point, the DiffArray/DiffUArray are the only functional arrays, right?
No. They are probably the most dysfunctional arrays around.
I mean, I can add two and two together and see that it equals...four, and if the only functional array is sort of broken, that means so is my program. Are there any alternatives that are fast aswell?
//Tobias

Am Mittwoch, 4. März 2009 01:44 schrieb Tobias Olausson:
Hello all. I am currently implementing an emulation of a CPU, in which the CPU's RAM is part of the internal state that is passed around in the program using a state monad. However, the program performs unexpectingly bad, and some profiling information makes us believe that the problem is the high memory usage of the program.
The program below is similar to our main program used when testing a sorting algorithm in this CPU:
module Main where
import Control.Monad.State.Lazy
Not good, use Control.Monad.State.Strict
import Data.Word import Data.Array.Diff import Control.Concurrent (threadDelay)
data LoopState = LoopState { intVal :: Integer , diff :: DiffUArray Word8 Word8
Diff(U)Arrays tend to be slow, use them with care.
}
initState :: LoopState initState = LoopState 0 (array (0x00,0xFF) [(idx,0)|idx<-[0x00..0xFF]])
main :: IO () main = do execStateT looper initState >>= putStrLn . show . intVal
looper :: StateT LoopState IO () looper = do st <- get let res = intVal st + 1 idx = fromIntegral res put $ st { intVal = res, diff = (diff st) // [(idx,idx)] } if res == 13000000 then return () else looper
You're being too lazy, building a huge thunk that only gets evaluated at the end of the loop. You have to force evaluation earlier.
Of course our program does more than updating a counter ;-) Compiling and running this program yields the following result:
[~]:[olaussot] >> ghc --make -O2 -o array ArrayPlay.hs [~]:[olaussot] >> ./array +RTS -sstderr ./array +RTS -sstderr 13000000 313,219,740 bytes allocated in the heap 1,009,986,984 bytes copied during GC 200,014,828 bytes maximum residency (8 sample(s)) 4,946,648 bytes maximum slop 393 MB total memory in use (3 MB lost due to fragmentation)
Generation 0: 590 collections, 0 parallel, 3.06s, 3.09s elapsed Generation 1: 8 collections, 0 parallel, 3.56s, 4.21s elapsed
INIT time 0.00s ( 0.00s elapsed) MUT time 0.27s ( 0.27s elapsed) GC time 6.62s ( 7.30s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 6.89s ( 7.57s elapsed)
%GC time 96.1% (96.4% elapsed)
Alloc rate 1,155,958,754 bytes per MUT second
Productivity 3.9% of total user, 3.6% of total elapsed
Why does the program spend 96.1% of its total running time collecting garbage? Any tips to make this program perform better are appreciated. Please do tell if anything is unclear.
Nothing gets evaluated until the end, so nothing can be discarded earlier. ---------------------------------------------------------------------- {-# LANGUAGE BangPatterns #-} module Main where import Control.Monad.State.Strict import Data.Word import Data.Array.Unboxed import Data.Array.ST import Data.Array.MArray update :: UArray Word8 Word8 -> Word8 -> Word8 -> UArray Word8 Word8 update arr i v = runSTUArray $ do sar <- unsafeThaw arr writeArray sar i v return sar data LoopState = LoopState { intVal :: !Integer , diff :: !(UArray Word8 Word8) } initState :: LoopState initState = LoopState 0 (array (0x00,0xFF) [(idx,0)|idx<-[0x00 .. 0xFF]]) main :: IO () main = do execStateT looper initState >>= putStrLn . show . intVal looper :: StateT LoopState IO () looper = do LoopState i df <- get let res = i + 1 idx = fromIntegral res !ndf = update df idx idx put (LoopState res ndf) if res == 13000000 then return () else looper ---------------------------------------------------------------------- Is much better behaved. I didn't investigate if every strictness annotation is necessary.
-- Tobias Olausson tobsan@gmail.com
Cheers, Daniel

I've found DiffArrays to be way too slow/memory-hogging for real usage.
Since you are in IO already (StateT s IO), you'll probably be better
off using a mutable array for a data structure.
Some things are still best done in the imperative style. You can be a
bit safer by using ST as the bottom monad, however, and returning the
result as a pure array. This gives you a single copy per "runST".
Alternatively, use IntMap instead of arrays if you want a pure data
structure with reasonably efficient update/lookup.
-- ryan
On Tue, Mar 3, 2009 at 4:44 PM, Tobias Olausson
Hello all. I am currently implementing an emulation of a CPU, in which the CPU's RAM is part of the internal state that is passed around in the program using a state monad. However, the program performs unexpectingly bad, and some profiling information makes us believe that the problem is the high memory usage of the program.
The program below is similar to our main program used when testing a sorting algorithm in this CPU:
module Main where
import Control.Monad.State.Lazy import Data.Word import Data.Array.Diff import Control.Concurrent (threadDelay)
data LoopState = LoopState { intVal :: Integer , diff :: DiffUArray Word8 Word8 }
initState :: LoopState initState = LoopState 0 (array (0x00,0xFF) [(idx,0)|idx<-[0x00..0xFF]])
main :: IO () main = do execStateT looper initState >>= putStrLn . show . intVal
looper :: StateT LoopState IO () looper = do st <- get let res = intVal st + 1 idx = fromIntegral res put $ st { intVal = res, diff = (diff st) // [(idx,idx)] } if res == 13000000 then return () else looper
Of course our program does more than updating a counter ;-) Compiling and running this program yields the following result:
[~]:[olaussot] >> ghc --make -O2 -o array ArrayPlay.hs [~]:[olaussot] >> ./array +RTS -sstderr ./array +RTS -sstderr 13000000 313,219,740 bytes allocated in the heap 1,009,986,984 bytes copied during GC 200,014,828 bytes maximum residency (8 sample(s)) 4,946,648 bytes maximum slop 393 MB total memory in use (3 MB lost due to fragmentation)
Generation 0: 590 collections, 0 parallel, 3.06s, 3.09s elapsed Generation 1: 8 collections, 0 parallel, 3.56s, 4.21s elapsed
INIT time 0.00s ( 0.00s elapsed) MUT time 0.27s ( 0.27s elapsed) GC time 6.62s ( 7.30s elapsed) EXIT time 0.00s ( 0.00s elapsed) Total time 6.89s ( 7.57s elapsed)
%GC time 96.1% (96.4% elapsed)
Alloc rate 1,155,958,754 bytes per MUT second
Productivity 3.9% of total user, 3.6% of total elapsed
Why does the program spend 96.1% of its total running time collecting garbage? Any tips to make this program perform better are appreciated. Please do tell if anything is unclear.
-- Tobias Olausson tobsan@gmail.com _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (4)
-
Daniel Fischer
-
Daniel Peebles
-
Ryan Ingram
-
Tobias Olausson