
I posted awhile back asking for help improving my cellular automata program. I am competing with a C program which evolves CAs using a fairly simple genetic algorithm. The algorithm involves evaluating 100 rules on 100 CAs, 100 times. In C this takes about 1 second. In my Haskell version, it takes about 20 minutes. That's an improvement from my first attempts, which took 7 hours. All the time is spent in the function that evaluates one iteration of a given CA. It examines each cell and its neighbors and, based on the values found, updates the cell to be white or black. Here is the code, followed by an explanation of what it's doing. I'm looking for any suggestions on making this faster. Prettier helps too but not at the expense of speed. http://hpaste.org/4151#a0 I represent the automata as an array of integers, where each bit represents a cell. Automata 'wrap' around, so marching through the array and determing the value of all neighbors is tricky around the edges. My goal was to minimize conditionals as much as possible and to avoid any "modulo" arithmetic. I wanted to use straightforward array access. My implemention uses three functions: fill, fillS and fillE. Each evalutaion of fill will determine the new values for cells represented by the current array element. fill handles two conditions - when the array element being examined is the last element or when it is not. It calls fillS with appropriate arguments, the results of which become the value for the integer in the current array position. fillS then handles two conditions - when the "neighborhood" of the bit being examined crosses to the "next" array element and when it does not. When the "neighborhood" crosses to the next array element, fillE is called. fillE finishes determining the values for the remaining cells in the integer and returns, ending the current evaluation. There are some other details in the code (for example firstRule) which could be made faster and/or cleaner, but their contribution is far outweighed by the fill function. If there is one thing I'm happy about, its the memory performance of this code. It runs in a constant 7 - 10MB of memory, even when I'm doing 100 generations with 100 rules and 100 CAs. Each CA is evaluated for 250 steps, which means every generation does 2.5M steps. That kind of memory usage is pretty sweet. Thanks in advance to those willing to take the time to look at this code! Justin

On Nov 29, 2007 10:31 PM, Justin Bailey
I represent the automata as an array of integers, where each bit represents a cell.
Why don't you use an UArray of Bools? They're implemented as bit arrays internally, AFAIK (e.g. see http://www.haskell.org/haskellwiki/Shootout/Nsieve ). And then you would get rid of a lot of shifts and masks in your code -- clearer and faster, the Haskell Way (TM). Well, I guess it would be faster, but I can't really tell. Could you provide a main function implementing an "use case" for benchmark? Good luck. -- Felipe.

felipe.lessa:
On Nov 29, 2007 10:31 PM, Justin Bailey
wrote: I represent the automata as an array of integers, where each bit represents a cell.
Why don't you use an UArray of Bools? They're implemented as bit arrays internally, AFAIK (e.g. see http://www.haskell.org/haskellwiki/Shootout/Nsieve ). And then you would get rid of a lot of shifts and masks in your code -- clearer and faster, the Haskell Way (TM).
Probably you meant to point to this bit / bool benchmark, http://www.haskell.org/haskellwiki/Shootout/Nsieve_Bits#New_entry

On Nov 29, 2007 4:45 PM, Felipe Lessa
Why don't you use an UArray of Bools? They're implemented as bit arrays internally, AFAIK (e.g. see http://www.haskell.org/haskellwiki/Shootout/Nsieve ). And then you would get rid of a lot of shifts and masks in your code -- clearer and faster, the Haskell Way (TM).
fill, fillE and fillS are all looking at a integer value, and not doing any array access. I'm skeptical that an array of bools, even if its internally a mask on an int, could be faster.
Well, I guess it would be faster, but I can't really tell. Could you provide a main function implementing an "use case" for benchmark?
Sure, here is one http://hpaste.org/4151#a1 Compile with: ghc -O2 --make Test.hs It runs a random rule for 1000 steps. Justin

It is interesting, that the naive implementation import Data.List (tails) neighbours :: Int -> [a] -> [[a]] neighbours w = rotL . take w . map (take 3) . tails . cycle rotL :: [a] -> [a] rotL xs = last xs : init xs type Rule a = [a] -> a step :: Int -> Rule a -> [a] -> [a] step w f = map f . neighbours w rule110 :: Rule Char rule110 " " = ' ' rule110 "X " = ' ' rule110 "XXX" = ' ' rule110 _ = 'X' main = let f = step 149 rule110 init = replicate 148 ' ' ++ "X" in mapM_ putStrLn $ take 10000 $ iterate f init is only 3 times slower than your quite complex, hard to follow and hard to debug implementation. As always, I prefer to write most code in Haskell, quick, easy, nice, reasonable fast, ... If speed matters, I switch to some lower level language, as you did staying inside Haskell. /BR, Mirko Rahn

On Friday 30 November 2007 00:31, Justin Bailey wrote:
I represent the automata as an array of integers, where each bit represents a cell.
Mathematica uses a single arbitrary-precision integer to represent each generation of a 1D automaton. The rules to derive the next generation are compiled into arithmetic operations on the integer. The offloads all such work onto your big number library and, with GMP, will be as fast in Haskell as most other languages. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/products/?e

On Nov 29, 2007 9:11 PM, Jon Harrop
Mathematica uses a single arbitrary-precision integer to represent each generation of a 1D automaton. The rules to derive the next generation are compiled into arithmetic operations on the integer. The offloads all such work onto your big number library and, with GMP, will be as fast in Haskell as most other languages.
Does GHC already use the GMP library for Integer? It looks that way but I'm not positive. That'd be ironic, if the higher-level Integer representation is faster than a low-level bitwise one ... Still, I suspect accessing individual "bits" might kill me if I'm not able to move most of the calculation into a call to the library. Do you mind elaborating on how rules are compiled into 'arithmetic' operations or providing a link? Justin

Justin Bailey wrote:
On Nov 29, 2007 9:11 PM, Jon Harrop
wrote: Mathematica uses a single arbitrary-precision integer to represent each generation of a 1D automaton. The rules to derive the next generation are compiled into arithmetic operations on the integer. The offloads all such work onto your big number library and, with GMP, will be as fast in Haskell as most other languages.
Does GHC already use the GMP library for Integer? It looks that way but I'm not positive. That'd be ironic, if the higher-level Integer representation is faster than a low-level bitwise one ... Still, I suspect accessing individual "bits" might kill me if I'm not able to move most of the calculation into a call to the library.
Do you mind elaborating on how rules are compiled into 'arithmetic' operations or providing a link?
This would mean that you are able to extract a global rule from your local rules (plural if you include topological dependencies) which is not possible in the general case. Global behavior was characterized in term of information propagation (classification), which was far not enough to deduce a global rule. But I haven't look at the domain for a decade now ;-) a+, ld.

On Nov 30, 2007 6:03 PM, Justin Bailey
On Nov 29, 2007 9:11 PM, Jon Harrop
wrote: Mathematica uses a single arbitrary-precision integer to represent each generation of a 1D automaton. The rules to derive the next generation are compiled into arithmetic operations on the integer. The offloads all such work onto your big number library and, with GMP, will be as fast in Haskell as most other languages.
Does GHC already use the GMP library for Integer? It looks that way but I'm not positive. That'd be ironic, if the higher-level Integer representation is faster than a low-level bitwise one ... Still, I suspect accessing individual "bits" might kill me if I'm not able to move most of the calculation into a call to the library.
Do you mind elaborating on how rules are compiled into 'arithmetic' operations or providing a link?
Integer is an instance of Bits, so you can just use bitshifts and bitops to do any 1-D rule. There might be a more efficient way. For rule 110: 111 110 101 100 011 010 001 000 0 1 1 0 1 1 1 0 Algebraically, that's not (a && b && c || a && not b && not c || not a && not b && not c) So just translate that to: rule z = complement ((a .&. b .&. c) .|. (a .&. b' .&. c') .|. (a' .&. b' .&. c')) where a = z `shift` (-1) b = z c = z `shift` 1 a' = complement a b' = complement b c' = complement c Modulo some algebra to make it even better, but this should be pretty darn fast... Luke
participants (7)
-
Don Stewart
-
Felipe Lessa
-
Jon Harrop
-
Justin Bailey
-
Laurent Deniau
-
Luke Palmer
-
Mirko Rahn