OK, some interesting things:

1. INLINE pragmas seem to have no substantial effect.

2. Regrettably, using Dimensional does hurt performance: the code is about 1.5x slower with it. The fusion techniques we currently rely on are fragile enough to undermine the promise of "overhead-free" newtype abstractions.

3. I got a *huge* performance boost by calculating `outputs` through `runIdentity` rather than treating it as an IO action. Several times faster. This makes sense, but I'm surprised the results are so drastic. 
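For anyone following along, here is a minimal sketch of what I mean in point 3. It is not my actual code: `scale`, `outputsPure` and `outputsIO` are hypothetical names, and the toy element-wise map stands in for the real computation. It relies only on Repa's `computeP` being polymorphic in its monad, so the same expression can be run in `Identity` or in `IO`.

```haskell
module Main where

import Data.Array.Repa (Array, DIM1, U, Z (..), (:.) (..))
import qualified Data.Array.Repa as R
import Data.Functor.Identity (runIdentity)

-- Hypothetical stand-in for the real per-element work.
scale :: Double -> Double
scale x = 2 * x
{-# INLINE scale #-}

-- Pure variant: run computeP in the Identity monad and unwrap the result.
outputsPure :: Array U DIM1 Double -> Array U DIM1 Double
outputsPure input = runIdentity (R.computeP (R.map scale input))

-- IO variant: the same expression, but computeP is run in IO.
outputsIO :: Array U DIM1 Double -> IO (Array U DIM1 Double)
outputsIO input = R.computeP (R.map scale input)

main :: IO ()
main = do
  let input = R.fromListUnboxed (Z :. 4) [1, 2, 3, 4] :: Array U DIM1 Double
  print (R.toList (outputsPure input))  -- prints [2.0,4.0,6.0,8.0]
  viaIO <- outputsIO input
  print (R.toList viaIO)                -- prints [2.0,4.0,6.0,8.0]
```

Both variants give the same result; whether the pure one is faster in a given program presumably depends on what GHC manages to inline and fuse around it, so this only shows the shape of the change, not the speedup.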

At this point, `mapStencil2

On Sat, Jul 2, 2016 at 3:46 AM, Ben Lippmeier <benl@ouroborus.net> wrote:

On 2 Jul 2016, at 1:34 PM, William Yager <will.yager@gmail.com> wrote:

1) Put INLINE pragmas on all the leaf functions, especially ‘kernel’. If the compiler does not inline these functions they won’t fuse. This is a key problem with the Repa approach to fusion.

2) The ‘dimensional’ package wraps a data type around all those values. I’m not convinced the simplifier will be able to undo the wrapping / unwrapping. You’ll need to inspect the core code to check.

Ben.