
Hello,

I'm writing some matrix multiplication and inversion functions for small matrices (mostly 3x3 and 4x4, for 3D graphics, modeling, simulation, etc.). I noticed that matrix multiplication was a bottleneck, so I set out to optimize it and found that using unsafeRead instead of (!) (or instead of readArray in stateful code) helped a lot.

I then went on to optimize my Gaussian elimination function and found just the opposite: unsafeRead is slower than readArray. This struck me as very odd, considering that readArray calls unsafeRead.

If there is a "good" reason why the compiler optimized readArray better than unsafeRead, I'd like to know what it is so that I can make all my array code safe as well as fast. (By "good" reason I mean something deterministic and repeatable, not just luck.)

On the other hand, if this is a fluke, I'm inclined to think that it's not the safe code that is freakishly fast, but the unsafe code that is needlessly slow. That is, something about my program is hindering optimization of the unsafe code. What is it?

Attached are the profiling results and a test program with a handful of matrix multiplication and Gaussian elimination functions to illustrate what I've seen. This happens on both AMD64 and Intel Core architectures.

Thanks for any insight,
Scott
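
P.S. For anyone reading without the attachment, here is a minimal sketch of the two access styles being compared. It is not the attached test program. The names Mat, mulWith, readSafe and readUnsafe are made up for illustration, and it assumes 3x3 unboxed matrices with bounds ((0,0),(2,2)), so that the flat zero-based index expected by unsafeRead is i*3 + j. It is only meant to show the difference in how elements are addressed, not to reproduce the timings.

```haskell
import Control.Monad (foldM, forM_)
import Control.Monad.ST (ST)
import Data.Array.Base (unsafeRead)
import Data.Array.ST (STUArray, newArray, readArray, runSTUArray, thaw, writeArray)
import Data.Array.Unboxed (UArray, elems, listArray)

-- Hypothetical matrix type for this sketch: row-major, bounds ((0,0),(2,2)).
type Mat = UArray (Int, Int) Double

n :: Int
n = 3

-- Bounds-checked read: readArray maps (i,j) to a flat offset and checks it.
readSafe :: STUArray s (Int, Int) Double -> Int -> Int -> ST s Double
readSafe arr i j = readArray arr (i, j)

-- Unchecked read: unsafeRead takes the flat zero-based offset directly.
-- Assumes the bounds are exactly ((0,0),(n-1,n-1)); nothing verifies this.
readUnsafe :: STUArray s (Int, Int) Double -> Int -> Int -> ST s Double
readUnsafe arr i j = unsafeRead arr (i * n + j)

-- Naive n x n product, parameterised over the element reader.
mulWith :: (STUArray s (Int, Int) Double -> Int -> Int -> ST s Double)
        -> Mat -> Mat -> ST s (STUArray s (Int, Int) Double)
mulWith rd a b = do
  ma <- thaw a
  mb <- thaw b
  mc <- newArray ((0, 0), (n - 1, n - 1)) 0
  forM_ [0 .. n - 1] $ \i ->
    forM_ [0 .. n - 1] $ \j -> do
      -- Accumulate the dot product of row i of a and column j of b.
      s <- foldM (\acc k -> do
                    x <- rd ma i k
                    y <- rd mb k j
                    return (acc + x * y))
                 0 [0 .. n - 1]
      writeArray mc (i, j) s
  return mc

mulSafe :: Mat -> Mat -> Mat
mulSafe a b = runSTUArray (mulWith readSafe a b)

mulUnsafe :: Mat -> Mat -> Mat
mulUnsafe a b = runSTUArray (mulWith readUnsafe a b)

main :: IO ()
main = do
  let a = listArray ((0, 0), (2, 2)) [1 .. 9] :: Mat
  print (elems (mulSafe a a))
  -- Both readers should produce the same matrix.
  print (mulSafe a a == mulUnsafe a a)
```

Passing the reader as a function argument keeps the sketch short, but it may itself affect how GHC optimizes the loop, so for real measurements the two variants would be better written out separately.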