late optimization of typeclass callers

Hi, I do have the problem that my code is not completely optimized. We begin with -- Ring.hs class Ring a where rmult :: a -> a -> a zero :: a -- PrimitiveArray.hs class PrimArrayOps a b where data PrimArray a b :: * unsafeIndex :: PrimArray a b -> a -> b -- PAInstances.hs -- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays instance (Prim a) => PrimArrayOps where data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #-} !ByteArray unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-} case (i*(mJ+1)+j) of idx -> indexByteArray arr idx -- RNAfoldFunctions.hs -- VU.Unbox a, because of the vector package import Data.Vector.Unboxed as VU class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where opt = VU.foldl' rmult zero $ base turnertables inp table table i j base trnr inp m m1 i j = VU.zipWith rmult ms m1s where cnt = j-i-2 -- TODO when to stop? ms = VU.map (\ik -> m `unsafeIndex` ik) $ VU.generate cnt (\k -> (i,i+k)) m1s = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k -> (k+1+i,j)) {-# INLINE multibranchCloseBase #-} If I now use this stuff... -- [1] instance Ring Int where rmult = min zero = 10000 instance FoldFunctions Int module MyProgram where main = do let val = opt trnr inp myM myM1 15 78 I get this core [1]. If I do this -- [2] instance Ring Int where rmult = min zero = 10000 instance FoldFunctions Int opt = VU.foldl' rmult zero $ base turnertables inp table table i j module MyProgram where main = do let myM = PrimArray of (Int,Int) with Int values ... let val = opt trnr inp myM myM1 15 78 Is there a way to get the program without an explicit FoldFunctions instance to specialize to the same Core as the second? The runtime (Criterion was used) is 7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs everywhere but they do not help optimizing. These functions run O(n^2) times, with n between 100 and 10000. The core shows that temporary arrays are even created, filled, and then the fold run over them. So basically, can I get code running through several class instances to be optimized at the caller, where all instances are known? Otherwise I could live with [2], but as the code will almost always be the same for FoldFunctions instances, it would be really nice to be able to use the defaults that were defined on (Ring a, VU.Unbox a, Prim a). Thanks, Christian

It's hard to read your code because the line breaks have been removed. Moreover I think your code depends on some unspecified Hackage package. And I couldn't find the enclosed Core dumps. Nor do you specify which version of GHC you are using. Still I believe that the nub is this. You have a class with a default method declaration class FoldFunctions a where opt = <blah> and you get different behaviour if you say instance FoldFunctions Int from instance FoldFunctions Int where opt = <blah> That's surprising because the second case simply fills in the method with the identical code to the code in the class declaration. Is that right? If so, I think you might want to try the HEAD. Can you do that (use a binary snapshot). I think I fixed this default-method-inlining stuff so that there is no difference between the above two. But I could have failed, or there could be something else going on in your example. Simon | -----Original Message----- | From: glasgow-haskell-users-bounces@haskell.org [mailto:glasgow-haskell- | users-bounces@haskell.org] On Behalf Of Christian Höner zu Siederdissen | Sent: 26 August 2010 10:36 | To: Glasgow-Haskell-Users | Subject: late optimization of typeclass callers | | Hi, | | I do have the problem that my code is not completely optimized. We begin with | | -- Ring.hs | class Ring a where | rmult :: a -> a -> a | zero :: a | | -- PrimitiveArray.hs | class PrimArrayOps a b where | data PrimArray a b :: * | unsafeIndex :: PrimArray a b -> a -> b | | -- PAInstances.hs | -- for any 'a' of Data.Primitive.Types by Roman, have unboxed arrays instance | (Prim a) => PrimArrayOps where | data PrimArray (Int,Int) a = PaIIxI {-# UNPACK #-} !(Int,Int) {-# UNPACK #- | } !ByteArray | unsafeIndex (PaIIxI (mI,mJ) arr) (i,j) = {-# CORE "IIxIunsafeIndex" #-} | case (i*(mJ+1)+j) of idx -> indexByteArray arr idx | | -- RNAfoldFunctions.hs | -- VU.Unbox a, because of the vector package import Data.Vector.Unboxed as VU | class (Ring a, VU.Unbox a, Prim a) => FoldFunctions a where | opt = VU.foldl' rmult zero $ base turnertables inp table table i j | | base trnr inp m m1 i j = VU.zipWith rmult ms m1s where | cnt = j-i-2 -- TODO when to stop? | ms = VU.map (\ik -> m `unsafeIndex` ik) $ VU.generate cnt (\k -> | (i,i+k)) | m1s = VU.map (\kj -> m1 `unsafeIndex` kj) $ VU.generate cnt (\k -> | (k+1+i,j)) {-# INLINE multibranchCloseBase #-} | | | | If I now use this stuff... | -- [1] | instance Ring Int where | rmult = min | zero = 10000 | instance FoldFunctions Int | module MyProgram where | main = do | let val = opt trnr inp myM myM1 15 78 | | I get this core [1]. If I do this | -- [2] | instance Ring Int where | rmult = min | zero = 10000 | instance FoldFunctions Int | opt = VU.foldl' rmult zero $ base turnertables inp table table i j module | MyProgram where main = do | let myM = PrimArray of (Int,Int) with Int values | ... | let val = opt trnr inp myM myM1 15 78 | | Is there a way to get the program without an explicit FoldFunctions instance | to specialize to the same Core as the second? The runtime (Criterion was | used) is 7.1us (or worse) for [1] and 2.7us for [2]. I could put nice INLINEs | everywhere but they do not help optimizing. These functions run O(n^2) times, | with n between 100 and 10000. The core shows that temporary arrays are even | created, filled, and then the fold run over them. | | So basically, can I get code running through several class instances to be | optimized at the caller, where all instances are known? Otherwise I could | live with [2], but as the code will almost always be the same for | FoldFunctions instances, it would be really nice to be able to use the | defaults that were defined on (Ring a, VU.Unbox a, Prim a). | | Thanks, | Christian
participants (2)
-
Christian Höner zu Siederdissen
-
Simon Peyton-Jones