[Haskell-cafe] Performance help

13 Nov 2007

      I've been working on a program over the last few days to evolve
cellular automata rules using a genetic algorithm. Luckily, this email
has nothing to do with CAs but everything to do with Haskell
performance.

For those who don't know, a CA is represented as a row of cells, where
each can be either black (on/1) or white (off/0). A CA is "run" by
generating a new row from the previous row according to some rule.
Each cell is examined in turn and based on the state of it's neighbors
and itself, a new cell in the next row is generated that is either
black or white.

The function below is my "step" function that generates this new row.
It's at the heart of my program and where all  the execution time is
spent. In this scenario it's executed around 800 million times. On my
relatively modest desktop using GHC 6.8.1, the program takes about 30
seconds to run. Here's the function, with some of the type
declarations:

data Rule = Rule { ruleWidth :: Int, rules :: UArray Int Bool }
data Ring = Ring { currIdx :: !Int, vals :: (UArray Int Bool), lower,
upper, size:: !Int }
type CA = Ring

currR :: Int -> Ring -> Bool
currR amt r@(Ring curr arr l u s) = unsafeAt arr ((curr + amt) `mod` s)

stepWithUArray :: Int -> Rule -> CA -> CA
stepWithUArray rowLen rule@(Rule width rules) row =
  let upper = rowLen - 1
      getRule currIdx = pattern' start 0
        where
          start = currIdx - width
          end = currIdx + width
          pattern' cnt !result
            | cnt > end = result
            | otherwise = if (currR cnt row)
                            then pattern' (cnt + 1) (result * 2 + 1)
                            else pattern' (cnt + 1) (result * 2)
      makeNewRow :: ST s (ST.STUArray s Int Bool)
      makeNewRow =
        do
          arr <- ST.newArray_ (0, upper)
          let fill idx
                | idx > upper = return ()
                | otherwise = do
                    unsafeWrite arr idx (unsafeAt rules (getRule idx))
                    fill (idx + 1)
          fill 0
          return $! arr
  in fromUArray (ST.runSTUArray makeNewRow)

fromUArray produces a new Ring (i.e. CA) from an array. 'makeNewRow'
iterates over every cell in the current row using getRule to get the
new value for each cell and returns the new row as an array. getRule
essentially treats the neighbors of the current cell as bits, with the
most significant to the left. An index into the rules array is
constructed based on the values around the cell being examined (which
wrap on the ends, thus the Ring construct). That index is used to get
the value of the new cell from the rules array.

Profiling shows that the following lines take up the most execution
and allocation:

  makeNewRow = ... -- 20.5% execution,  26.7% allocation
  if (currR cnt row) ... -- 14.7% execution, 26.6% allocation,  in pattern'
  currR ... -- 14.7% execution, 0% allocation

Any suggestions for improving this code? Thanks in advance!

Justin

p.s. The entire program is attached. Compile with ghc -O2
-funbox-strict-fields -fbang-patterns --make GA-CA.hs. It runs 25
rules on each of 25 initial CAs for 2 generations.
p.p.s. On the bright side, this program has excellent memory
performance. It uses a constant 2 - 7 MB depending on the initial
parameters for the entire run. Beautiful!

[Haskell-cafe] Performance help

Justin Bailey