I've also posted this question (and my code) to Stack Overflow (http://stackoverflow.com/questions/10747079/what-are-the-key-differences-between-the-repa-2-and-3-apis), so if anyone here has the answer, I'll post it there for the world's reference.

Using the Repa 2 API, I have written some simple image convolution tests which run more than fast enough. The trick to getting good performance was to call 'force' after every array transformation.

I can't quite figure out the analogous thing to do with Repa 3. On Stack Overflow you can see my Repa 3 code, which runs correctly but very slowly. It is not clear to me exactly how the monadic 'computeP' functions in Repa 3 are intended to be used: I have several calls to 'force' in my Repa 2 code, but only one call to 'computeP' in the Repa 3 version. I've read the excellent "Numeric Haskell" Repa tutorial by Don S, but it doesn't cover Repa 3.

My thanks in advance!