Array.ST is not being nice to me

Hi all, I have a problem with regards to the speed of ST arrays. In short: a Data.Map.Map outperforms Data.Array.ST in my application, whereas as far as I understand it, the ST array should be quicker. My application is a compiler. It compiles some source code into a (huge) number of boolean equations. These equations are finally printed to stdout; they form my "byte-code". Also the equations are defined in terms of the other equations again. So, for instance, compiled byte code may look like: 1 == 5 5 == True /\ 3 3 == False To optimize the compiled code, I do some tricks like constant propagation. So, for instance, if I know that one equation always results in some constant, I substitute that constant for that equation. So above set of equations will be rewritten as: 1 == False 5 == False 3 == False ------ All in all, my compiler does the following: - It generates a set of equations for some statement in the source code - It rewrites these equations - It remembers for each equation which other equations it depends upon, such that if these are rewritten to something simple, this equation may be rewritten too. So there are a lot of lookups of the equations, especially for all the rewritings. All in all, I store quite some data for each equation, amongst which are: data EquationState = EQS {cDefinition::Equation, usedBy::Map.Map Int (), ...} First, I stored all the equations in a Data.Map.Map. Int EquationState. But then I thought it'd be more clever to put it into an ST array, seen the large number of lookups. So now i put it into an ST array. However, the code runs much, much slower now (about 4 times). The ST array does use slightly less memory though. I did some profiling. It seems that it is not the GC that is making over-hours. So my theory now is: I do a large number of lookups. Only a small number of these lookups trigger a change in the array. One EquationState is pretty big. Is it so that when you look something up from an ST array, the whole element gets deeply copied in memory, just in case you would change it in the array? Seen the size of my elements, I could imagine that that would cause a whole bunch of needless copying. Or is there some other reason for this slower behaviour of ST array? Robert

Chris Kuklewicz wrote:
So my theory now is: I do a large number of lookups.
Try using Data.Array.Base.unsafeRead (and maybe ata.Array.Base.unsafeWrite). These avoid the bounds checking on the index each time you lookup something in the array.
Right. Also keep an eye on the GC time (+RTS -sstderr) if you're using boxed mutable arrays - we don't do card-marking, so GC is pretty sub-optimal when it comes to large mutable boxed arrays. Decreasing the number of GCs with +RTS -A<big> can help if you're suffereing from this - but always try with and without, sometimes it makes things worse by making less effective use of the L2 cache. Cheers, Simon

Hi all,
So my theory now is: I do a large number of lookups.
Try using Data.Array.Base.unsafeRead (and maybe ata.Array.Base.unsafeWrite). These avoid the bounds checking on the index each time you lookup something in the array.
Right. Also keep an eye on the GC time (+RTS -sstderr) if you're using boxed mutable arrays - we don't do card-marking, so GC is pretty sub-optimal when it comes to large mutable boxed arrays. Decreasing the number of GCs with +RTS -A<big> can help if you're suffereing from this -
but always try with and without, sometimes it makes things worse by making less effective use of the L2 cache.
Thanks for you advice. I changed all the reads to unsafeReads and all the writes to unsafeWrites. That indeed sped up things a bit (about 10 percent on the total running time). I also checked the GC time, but that is (only) 30% of the total running time. So still, the program with ST array is about 3 times slower than the program with Data.Map. Also, the function lookupEq that looks up the states of the equations from the array, is responsible for 20% of the running time, and 19% of the allocated memory. I would say that is should allocate almost no memory: lookupEq :: Int -> MyMonad (Maybe EquationState) lookupEq cid = do s <- get mEs <- lift $ unsafeRead s cid return mEs type MyMonad a = forall s2. StateT (Eqns s2) (ST s2) a type Eqns s2 = STArray s2 Int (Maybe EquationState) data EquationState = EQS {cDefinition::Equation, usedBy::Map.Map Int (), ...} How can it be the lookupEq allocates memory? Is is not that, because of some tricky strictness/lazyness thing about ST array, unsafeRead causes the element that is read from the array to be copied in memory? Regards, Robert

Robert van Herk wrote:
Hi all,
So my theory now is: I do a large number of lookups.
Try using Data.Array.Base.unsafeRead (and maybe ata.Array.Base.unsafeWrite). These avoid the bounds checking on the index each time you lookup something in the array.
Right. Also keep an eye on the GC time (+RTS -sstderr) if you're using boxed mutable arrays - we don't do card-marking, so GC is pretty sub-optimal when it comes to large mutable boxed arrays. Decreasing the number of GCs with +RTS -A<big> can help if you're suffereing from this - but always try with and without, sometimes it makes things worse by making less effective use of the L2 cache.
Thanks for you advice.
I changed all the reads to unsafeReads and all the writes to unsafeWrites. That indeed sped up things a bit (about 10 percent on the total running time).
I also checked the GC time, but that is (only) 30% of the total running time.
So still, the program with ST array is about 3 times slower than the program with Data.Map.
Also, the function lookupEq that looks up the states of the equations from the array, is responsible for 20% of the running time, and 19% of the allocated memory. I would say that is should allocate almost no memory:
lookupEq :: Int -> MyMonad (Maybe EquationState) lookupEq cid = do s <- get mEs <- lift $ unsafeRead s cid return mEs
type MyMonad a = forall s2. StateT (Eqns s2) (ST s2) a type Eqns s2 = STArray s2 Int (Maybe EquationState)
data EquationState = EQS {cDefinition::Equation, usedBy::Map.Map Int (), ...}
Perhaps your monad is not being optimised away? Look at the -ddump-simpl code for lookupEq to figure out why it is allocating memory. Cheers, Simon
participants (3)
-
Chris Kuklewicz
-
Robert van Herk
-
Simon Marlow