
Hi, I do have the problem that my code is not completely optimized. We begin with -- Ring.hs class Ring a where rmult :: a -> a -> a zero :: a -- PrimitiveArray.hs class PrimArrayOps a b where data PrimArray a b :: * unsafeIndex :: PrimArray a b -> a -> b -- PAInstances.hs -- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays instance (Prim a) => PrimArrayOps where data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #-} !ByteArray unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-} case (i*(mJ+1)+j) of idx -> indexByteArray arr idx -- RNAfoldFunctions.hs -- VU.Unbox a, because of the vector package import Data.Vector.Unboxed as VU class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where opt = VU.foldl' rmult zero $ base turnertables inp table table i j base trnr inp m m1 i j = VU.zipWith rmult ms m1s where cnt = j-i-2 -- TODO when to stop? ms = VU.map (\ik -> m `unsafeIndex` ik) $ VU.generate cnt (\k -> (i,i+k)) m1s = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k -> (k+1+i,j)) {-# INLINE multibranchCloseBase #-} If I now use this stuff... -- [1] instance Ring Int where rmult = min zero = 10000 instance FoldFunctions Int module MyProgram where main = do let val = opt trnr inp myM myM1 15 78 I get this core [1]. If I do this -- [2] instance Ring Int where rmult = min zero = 10000 instance FoldFunctions Int opt = VU.foldl' rmult zero $ base turnertables inp table table i j module MyProgram where main = do let myM = PrimArray of (Int,Int) with Int values ... let val = opt trnr inp myM myM1 15 78 Is there a way to get the program without an explicit FoldFunctions instance to specialize to the same Core as the second? The runtime (Criterion was used) is 7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs everywhere but they do not help optimizing. These functions run O(n^2) times, with n between 100 and 10000. The core shows that temporary arrays are even created, filled, and then the fold run over them. So basically, can I get code running through several class instances to be optimized at the caller, where all instances are known? Otherwise I could live with [2], but as the code will almost always be the same for FoldFunctions instances, it would be really nice to be able to use the defaults that were defined on (Ring a, VU.Unbox a, Prim a). Thanks, Christian