
I've been working on a program over the last few days to evolve cellular automata rules using a genetic algorithm. Luckily, this email has nothing to do with CAs but everything to do with Haskell performance.

For those who don't know, a CA is represented as a row of cells, where each can be either black (on/1) or white (off/0). A CA is "run" by generating a new row from the previous row according to some rule. Each cell is examined in turn and, based on the state of its neighbors and itself, a new cell in the next row is generated that is either black or white.

The function below is my "step" function that generates this new row. It's at the heart of my program and where all the execution time is spent. In this scenario it's executed around 800 million times. On my relatively modest desktop using GHC 6.8.1, the program takes about 30 seconds to run. Here's the function, with some of the type declarations:

  data Rule = Rule { ruleWidth :: Int, rules :: UArray Int Bool }

  data Ring = Ring { currIdx :: !Int, vals :: UArray Int Bool,
                     lower, upper, size :: !Int }

  type CA = Ring

  currR :: Int -> Ring -> Bool
  currR amt r@(Ring curr arr l u s) = unsafeAt arr ((curr + amt) `mod` s)

  stepWithUArray :: Int -> Rule -> CA -> CA
  stepWithUArray rowLen rule@(Rule width rules) row =
    let upper = rowLen - 1
        getRule currIdx = pattern' start 0
          where
            start = currIdx - width
            end   = currIdx + width
            pattern' cnt !result
              | cnt > end = result
              | otherwise =
                  if (currR cnt row)
                    then pattern' (cnt + 1) (result * 2 + 1)
                    else pattern' (cnt + 1) (result * 2)
        makeNewRow :: ST s (ST.STUArray s Int Bool)
        makeNewRow = do
          arr <- ST.newArray_ (0, upper)
          let fill idx
                | idx > upper = return ()
                | otherwise = do
                    unsafeWrite arr idx (unsafeAt rules (getRule idx))
                    fill (idx + 1)
          fill 0
          return $! arr
    in fromUArray (ST.runSTUArray makeNewRow)

fromUArray produces a new Ring (i.e. CA) from an array. 'makeNewRow' iterates over every cell in the current row, using getRule to get the new value for each cell, and returns the new row as an array. getRule essentially treats the neighbors of the current cell as bits, with the most significant to the left. An index into the rules array is constructed based on the values around the cell being examined (which wrap around at the ends, thus the Ring construct). That index is used to get the value of the new cell from the rules array.

Profiling shows that the following lines take up the most execution and allocation:

  makeNewRow = ...          -- 20.5% execution, 26.7% allocation
  if (currR cnt row) ...    -- 14.7% execution, 26.6% allocation, in pattern'
  currR ...                 -- 14.7% execution,  0% allocation

Any suggestions for improving this code? Thanks in advance!

Justin

p.s. The entire program is attached. Compile with ghc -O2 -funbox-strict-fields -fbang-patterns --make GA-CA.hs. It runs 25 rules on each of 25 initial CAs for 2 generations.

p.p.s. On the bright side, this program has excellent memory performance. It uses a constant 2 - 7 MB, depending on the initial parameters, for the entire run. Beautiful!

One observation is that you're doing a lot of redundant Bool -> Int
conversion.
As you iterate across the array in the fill loop, the rule you are using for the
next cell is almost entirely determined by the rule you are using for the
current cell; lop off the top bit, shift left by one, and add the new bit.
Instead, you're recalculating the entire rule at that point.
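(A minimal sketch of that shift-and-mask update, for concreteness. The name nextIndex is made up, and it assumes the index has 2*width + 1 bits, matching the neighborhood getRule scans; the loop would compute the very first index the way getRule does now and then just roll it along for each new cell.)

  import Data.Bits ((.&.), (.|.), shiftL)

  -- Given the rules-array index used for the current cell and the incoming
  -- neighbor bit, produce the index for the next cell: shift left, OR in
  -- the new bit, and mask away the old top bit.
  nextIndex :: Int -> Int -> Bool -> Int
  nextIndex width prev newBit = ((prev `shiftL` 1) .|. b) .&. mask
    where
      b    = if newBit then 1 else 0
      mask = (1 `shiftL` (2 * width + 1)) - 1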
Sadly, (at least as of GHC 6.6.1) the performance of the operations in
Data.Bits was horrendous, and any time I wanted to use them for
performance, I ended up going through the FFI and writing C code. I haven't
had a chance to test this on GHC 6.8.
In this case, that might not be so bad, however. You can probably write a
10-20 line C function that implements "fill" and call it via the FFI.
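(For concreteness, the Haskell side of such a binding might look roughly like the sketch below. The C function name fill_row and its argument layout are made up here, and the actual marshalling of the arrays - e.g. via Data.Array.Storable and withStorableArray - is left out.)

  {-# LANGUAGE ForeignFunctionInterface #-}

  import Foreign.Ptr (Ptr)
  import Foreign.C.Types (CInt)
  import Data.Word (Word8)

  -- Hypothetical C worker: reads the old row and the rule table and fills
  -- in the new row, one byte per cell.
  foreign import ccall unsafe "fill_row"
    c_fillRow :: Ptr Word8  -- old row
              -> Ptr Word8  -- new row (output)
              -> Ptr Word8  -- rule table
              -> CInt       -- row length
              -> CInt       -- rule width
              -> IO ()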
Alternatively, you could create a new rule table which maps a rule and a
bool into a new rule index. This would only be twice the size of the rule
table, and you can index that way.
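(A sketch of what building such a table could look like; mkNextIdx and stepIdx are made-up names, and the index layout is the one getRule already uses, most significant neighbor on the left. Entry 2*i + b holds the successor of index i when the incoming bit is b, so the table is exactly twice the size of the rule table.)

  import Data.Array.Unboxed (UArray, listArray, (!))
  import Data.Bits ((.&.), (.|.), shiftL)

  -- Precompute the shift-and-mask step for every (index, new bit) pair.
  mkNextIdx :: Int -> UArray Int Int
  mkNextIdx width =
      listArray (0, 2 * n - 1)
        [ ((i `shiftL` 1) .|. b) .&. (n - 1) | i <- [0 .. n - 1], b <- [0, 1] ]
    where
      n = 1 `shiftL` (2 * width + 1)   -- number of possible rule indices

  -- Index for the next cell, given the current index and the incoming bit.
  stepIdx :: UArray Int Int -> Int -> Bool -> Int
  stepIdx table idx b = table ! (2 * idx + (if b then 1 else 0))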
Also, what stops getRule from going off the end of the array? I didn't see
anything that prevented that in the code, and you're using unsafeAt, which
seems like a potential bug.
-- ryan

On 11/13/07, Ryan Ingram wrote:
Also, what stops getRule from going off the end of the array? I didn't see anything that prevented that in the code, and you're using unsafeAt, which seems like a potential bug.
Never mind, I realized this is a ring buffer with `mod` s. That's another slow operation when you're doing code as tight as this. If you can guarantee the ring is a power of 2 in size you can use a mask instead, or use my original suggestion of deriving rules from the previous rule and the new bit; the initial state is determined by the last bits in the buffer and you never wrap.
-- ryan

On Nov 13, 2007 2:21 PM, Ryan Ingram wrote:
Never mind, I realized this is a ring buffer with `mod` s. That's another slow operation when you're doing code as tight as this. If you can guarantee the ring is a power of 2 in size you can use a mask instead, or use my original suggestion of deriving rules from the previous rule and the new bit; the initial state is determined by the last bits in the buffer and you never wrap.
I can't guarantee the ring is a power of 2 but do you feel like explaining the mask suggestion anyways?
Thanks for the bits suggestion - I'll see if that helps performance at all. It looks like you have to be very careful in which concrete type you choose or you'll get a lot of conversion going on.
Justin

On Tue, Nov 13, 2007 at 02:45:33PM -0800, Justin Bailey wrote:
On Nov 13, 2007 2:21 PM, Ryan Ingram wrote:
Never mind, I realized this is a ring buffer with `mod` s. That's another slow operation when you're doing code as tight as this. If you can guarantee the ring is a power of 2 in size you can use a mask instead, or use my original suggestion of deriving rules from the previous rule and the new bit; the initial state is determined by the last bits in the buffer and you never wrap.
I can't guarantee the ring is a power of 2 but do you feel like explaining the mask suggestion anyways?
Thanks for the bits suggestion - I'll see if that helps performance at all. It looks like you have to be very careful in which concrete type you choose or you'll get a lot of conversion going on.
About how wide are your rules usually?
Stefan

Sure, if the ring size is a power of two, and a is greater than or equal
to 0, then
a `mod` ringSize == a .&. (ringSize - 1)
that is:
a `mod` 8 == a .&. 7
a `mod` 256 == a .&. 255
etc.
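(Spelled out against the Ring from the original post, in case it helps; currRMasked is a made-up name, and it only works if size is forced to be a power of two. Adding s before masking keeps the sum non-negative for the small negative offsets getRule passes in.)

  import Data.Array.Base (unsafeAt)
  import Data.Array.Unboxed (UArray)
  import Data.Bits ((.&.))

  -- Ring repeated from the original post so the snippet stands alone.
  data Ring = Ring { currIdx :: !Int, vals :: UArray Int Bool
                   , lower, upper, size :: !Int }

  -- Like currR, but with the `mod` replaced by a single mask.
  currRMasked :: Int -> Ring -> Bool
  currRMasked amt (Ring curr arr _ _ s) =
      unsafeAt arr ((curr + amt + s) .&. (s - 1))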
On 11/13/07, Justin Bailey wrote:
On Nov 13, 2007 2:21 PM, Ryan Ingram wrote:
Never mind, I realized this is a ring buffer with `mod` s. That's another slow operation when you're doing code as tight as this. If you can guarantee the ring is a power of 2 in size you can use a mask instead, or use my original suggestion of deriving rules from the previous rule and the new bit; the initial state is determined by the last bits in the buffer and you never wrap.
I can't guarantee the ring is a power of 2 but do you feel like explaining the mask suggestion anyways?
Thanks for the bits suggestion - I'll see if that helps performance at all. It looks like you have to be very careful in which concrete type you choose or you'll get a lot of conversion going on.
Justin

I implemented bit shifting to get the next rule, as you suggested, and that cut my run time by 75%. It went from 200 seconds to do 100 rules on 100 CAs down to 50 seconds. Amazing.
Justin

On Tue, 2007-11-13 at 14:21 -0800, Ryan Ingram wrote:
On 11/13/07, Ryan Ingram wrote:
Also, what stops getRule from going off the end of the array? I didn't see anything that prevented that in the code, and you're using unsafeAt, which seems like a potential bug.
Never mind, I realized this is a ring buffer with `mod` s. That's another slow operation when you're doing code as tight as this. If you can guarantee the ring is a power of 2 in size you can use a mask instead, or use my original suggestion of deriving rules from the previous rule and the new bit; the initial state is determined by the last bits in the buffer and you never wrap.
rem is faster than mod and equivalent if everything is positive.
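(As a small illustration - wrapIndex is a made-up helper - the offsets currR wraps can be slightly negative, so shifting by the ring size first keeps the dividend non-negative and lets rem stand in for mod:)

  -- wrapIndex i s == i `mod` s for a ring of positive size s, provided
  -- i >= -s: adding s keeps the dividend non-negative, and for non-negative
  -- dividends rem and mod agree.
  wrapIndex :: Int -> Int -> Int
  wrapIndex i s = (i + s) `rem` s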
participants (4)
- Derek Elkins
- Justin Bailey
- Ryan Ingram
- Stefan O'Rear