I'm back with another version of my cellular automata simulator. Since the last iteration I discovered GHC's unlifted types and the primitive operations that go with them. Using these types, rather than Ints, sped my code up by 2x or more.

http://hpaste.org/4151#a2 -- first half of program
http://hpaste.org/4151#a3 -- remaining (note 3 lines or so from first half are repeated)

The key observation came from looking at the Core files, which showed a lot of int2word# and word2int# conversions going on. Figuring out how to remove those led me to the unlifted types. Coding with these types is not easy (for example, I couldn't see a way to write a Word# value directly - I had to write stuff like "int2Word# 1#"), but because I had an existing algorithm to guide me, combined with type checking, it was a fairly straightforward implementation.

At first I felt kind of bad about using these operations, but then I noticed they are used pretty heavily by GHC itself. If it's good enough for GHC, it's good enough for me. The 2x performance gain didn't hurt either. Finally, the safety that comes from using the ST monad is just awesome. I've got low-level bit munging combined with high-level abstraction where I need it. So cool!

I was disappointed to find early on that using higher-order functions in tight loops was a performance killer. It's unfortunate because I started with a very elegant, short implementation based on a simple Ring buffer and map. The current version is certainly harder to understand and has some weird limitations. However, having the simple implementation let me use quickcheck to compare their results on random rules and inputs, which gave me high confidence that my complex implemenation is correct.

One thing I absolutely love about this program is its memory performance. It manages to stay within 1 - 10 MB of memory, depending on how much output is produced. How cool is that?

On Dec 3, 2007 2:44 AM, Mirko Rahn <rahn@ira.uka.de > wrote:

It is interesting, that the naive implementation

...

is only 3 times slower than your quite complex, hard to follow and hard
to debug implementation.

Now the naive implementation is 100x slower, so I don't feel so bad about this comment any more.

As always, I prefer to write most code in Haskell, quick, easy, nice,
reasonable fast, ... If speed matters, I switch to some lower level
language, as you did staying inside Haskell.

I have to take exception here - I *want* to write my code in Haskell. If Haskell isn't fast enough to beat the C implementation, I'd rather find a way to make my Haskell program faster than switch to some other language.

Justin