Much faster complex monad stack based on CPS state

newer
instance Enum Double considered...

Nicu Ionita

26 Sep 2011 26 Sep '11

11:07 p.m.

Hello list, Starting from this emails (http://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on). (Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.) Even more amazingly is that I could program it although I cannot really understand the Cont & ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast. I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it. Nicu

Show replies by date

Ryan Ingram

28 Sep 28 Sep

12:35 a.m.

New subject: Much faster complex monad stack based on CPS state

My guess is that Cont plays really nicely with GHC's inliner, so things that end up looking like return x >>= \y -> ... get optimized really well return x >>= f -- inline >>= = ContState $ \s0 k -> runCS (return x) s0 $ \a s1 -> runCS (f a) s1 k -- inline return = ContState $ \s0 k -> runCS (ContState $ \s2 k2 -> k2 x s2) s0 $ \a s1 -> runCS (f a) s1 k -- runCS record selector = ContState $ \s0 k -> (\s2 k2 -> k2 x s2) s0 $ \a s1 -> runCS (f a) s1 k -- beta = ContState $ \s0 k -> (\k2 -> k2 x s0) $ \a s1 -> runCS (f a) s1 k -- beta = ContState $ \s0 k -> (\a s1 -> runCS (f a) s1 k) x s0 -- beta = ContState $ \s0 k -> runCS (f x) s0 k and then further inlining of f can take place. On Mon, Sep 26, 2011 at 4:07 PM, Nicu Ionita wrote:

...

Hello list,

Starting from this emails (http://web.archiveorange.com/** archive/v/nDNOvSM4JT3GJRSjOm9Phttp://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P **) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on).

(Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.)

Even more amazingly is that I could program it although I cannot really understand the Cont & ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast.

I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it.

Nicu

______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafe http://www.haskell.org/mailman/listinfo/haskell-cafe

Nicu Ionita

8:55 p.m.

New subject: Much faster complex monad stack based on CPS state

Am 28.09.2011 02:35, schrieb Ryan Ingram:

...

My guess is that Cont plays really nicely with GHC's inliner, so things that end up looking like

return x >>= \y -> ...

get optimized really well

return x >>= f -- inline >>= = ContState $ \s0 k -> runCS (return x) s0 $ \a s1 -> runCS (f a) s1 k -- inline return = ContState $ \s0 k -> runCS (ContState $ \s2 k2 -> k2 x s2) s0 $ \a s1 -> runCS (f a) s1 k -- runCS record selector = ContState $ \s0 k -> (\s2 k2 -> k2 x s2) s0 $ \a s1 -> runCS (f a) s1 k -- beta = ContState $ \s0 k -> (\k2 -> k2 x s0) $ \a s1 -> runCS (f a) s1 k -- beta = ContState $ \s0 k -> (\a s1 -> runCS (f a) s1 k) x s0 -- beta = ContState $ \s0 k -> runCS (f x) s0 k

and then further inlining of f can take place.

I was even thinking - and this would have been the next idea to try if I couldn't get your example code to run so fast - to define some rules for the state monad (transformer) to "fuse" such expressions like m >>= f >>= g = ... or even modify f >>= modify g = modify (g . f) and perhaps other variations, so that it would perhaps end up in some nice combination of f and g, avoiding the intermediate tuples, hopefully with better performance. But then I did not follow it, and I want to concentrate on further improvements with the new code. The way is still long, because the top engines (written in C or C++) can do about 10 mil nps on my machine :-) Nicu

Bas van Dijk

10:56 a.m.

New subject: Much faster complex monad stack based on CPS state

On 27 September 2011 01:07, Nicu Ionita wrote:

...

I wonder why the transformers library does not use this kind of state monad definition.

One disadvantage of ContT and I guess any CPS based monad transformer is that they interact badly with exception handling functions like catch. See [1] for a bug that was caused because of using catch in ContT: Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2]. Regards, Bas [1] http://thread.gmane.org/gmane.comp.lang.haskell.cafe/76262/ [2] http://hackage.haskell.org/package/monad-control

Ertugrul Soeylemez

12:25 p.m.

New subject: Much faster complex monad stack based on CPS state

Bas van Dijk wrote:

...

Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2].

Is that even possible? I tried hard to come up with just a MonadFix instance for CPS-based monads, and I failed. I would think that MonadTransControl is just as hard, if not even harder. Greets, Ertugrul -- nightmare = unsafePerformIO (getWrongWife >>= sex) http://ertes.de/

Bas van Dijk

2:15 p.m.

New subject: Much faster complex monad stack based on CPS state

On 28 September 2011 14:25, Ertugrul Soeylemez wrote:

...

Bas van Dijk wrote:

...
Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2].

Is that even possible?

I once tried and failed so I believe it's not possible. Bas

Thomas Schilling

2:28 p.m.

New subject: Much faster complex monad stack based on CPS state

Well, you can get something close with the help of IORefs, but I forgot the details. I believe this is the paper that explains it: "Value recursion in the continuation monad" by Magnus Carlsson http://www.carlssonia.org/ogi/mdo-callcc.pdf On 28 September 2011 15:15, Bas van Dijk wrote:

...

On 28 September 2011 14:25, Ertugrul Soeylemez wrote:

...
Bas van Dijk wrote:

...
Because of this reason I don't provide a MonadTransControl instance for ContT in monad-control[2].

Is that even possible?

I once tried and failed so I believe it's not possible.

Bas

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

-- Push the envelope. Watch it bend.

Yves Parès

12:05 p.m.

New subject: Much faster complex monad stack based on CPS state

Interesting, so what have you used to get that speedup? A monad stack of ContT and State (*)? Just the Cont monad? (*) If so, were you using the strict version of State? Would it be possible to see the differences between the 2 versions of you code? 2011/9/27 Nicu Ionita

...

Hello list,

Starting from this emails (http://web.archiveorange.com/** archive/v/nDNOvSM4JT3GJRSjOm9Phttp://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P **) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on).

(Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.)

Even more amazingly is that I could program it although I cannot really understand the Cont & ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast.

I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it.

Nicu

______________________________**_________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/**mailman/listinfo/haskell-cafe http://www.haskell.org/mailman/listinfo/haskell-cafe

Nicu Ionita

8:41 p.m.

New subject: Much faster complex monad stack based on CPS state

Am 28.09.2011 14:05, schrieb Yves Parès:

...

Interesting, so what have you used to get that speedup? A monad stack of ContT and State (*)? Just the Cont monad?

This is a module with a state monad transformer that I used before (the name STPlus is misleading - and sorry for the long email): {-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances #-} module Search.SearchMonad ( STPlus, return, (>>=), get, put, gets, modify, lift, liftIO, runSearch, execSearch ) where import Control.Monad import Control.Monad.State hiding (lift, gets, modify) newtype STPlus s m a = STPlus { runSTPlus :: s -> m (a, s) } {-# INLINE runSTPlus #-} instance Monad m => Monad (STPlus s m) where {-# INLINE return #-} return v = STPlus (\s -> return (v, s)) {-# INLINE (>>=) #-} (>>=) = bindSTPlus {-# INLINE bindSTPlus #-} bindSTPlus :: Monad m => STPlus s m a -> (a -> STPlus s m b) -> STPlus s m b bindSTPlus ms f = STPlus $ \s -> case runSTPlus ms s of m -> m >>= \(v', s') -> case f v' of fv -> runSTPlus fv s' instance Monad m => MonadState s (STPlus s m) where {-# INLINE get #-} get = STPlus $ \s -> return (s, s) {-# INLINE put #-} put s = STPlus $ \_ -> return ((), s) instance MonadIO m => MonadIO (STPlus s m) where {-# INLINE liftIO #-} liftIO = lift . liftIO runSearch :: Monad m => STPlus s m a -> s -> m (a, s) runSearch = runSTPlus execSearch ms s = liftM snd $ runSearch ms s {-# INLINE lift #-} lift :: Monad m => m a -> STPlus s m a lift m = STPlus $ \s -> m >>= \v -> return (v, s) {-# INLINE gets #-} gets :: Monad m => (s -> a) -> STPlus s m a -- gets f = STPlus $ \s -> return (f s, s) gets f = STPlus $ \s -> case f s of fs -> return (fs, s) {-# INLINE modify #-} modify :: Monad m => (s -> s) -> STPlus s m () modify f = STPlus $ \s -> case f s of fs -> return ((), fs) And this is how the module looks now: {-# LANGUAGE RankNTypes, MultiParamTypeClasses, FlexibleInstances #-} module Search.SearchMonadCPS ( STPlus, return, (>>=), get, put, gets, modify, lift, liftIO, runSearch, execSearch ) where import Control.Monad import Control.Monad.State hiding (lift, gets, modify) newtype STPlus r s m a = STPlus { runSTPlus :: s -> (a -> s -> m r) -> m r } instance Monad (STPlus r s m) where return a = STPlus $ \s k -> k a s c >>= f = STPlus $ \s0 k -> runSTPlus c s0 $ \a s1 -> runSTPlus (f a) s1 k instance MonadState s (STPlus r s m) where get = STPlus $ \s k -> k s s put s = STPlus $ \_ k -> k () s instance MonadIO m => MonadIO (STPlus r s m) where {-# INLINE liftIO #-} liftIO = lift . liftIO runSearch :: Monad m => STPlus (a, s) s m a -> s -> m (a, s) runSearch c s = runSTPlus c s $ \a s0 -> return (a, s0) execSearch ms s = liftM snd $ runSearch ms s {-# INLINE lift #-} lift :: Monad m => m a -> STPlus r s m a lift m = STPlus $ \s k -> m >>= \a -> k a s {-# INLINE gets #-} gets :: Monad m => (s -> a) -> STPlus r s m a gets f = STPlus $ \s k -> k (f s) s {-# INLINE modify #-} modify :: Monad m => (s -> s) -> STPlus r s m () modify f = STPlus $ \s k -> k () (f s) And then I have (in different modules): Client code (starting an PV search to a given depth): type CtxIO = ReaderT Context IO bestMoveCont :: Int -> MyState -> Maybe Int -> [Move] -> [Move] -> CtxIO IterResult bestMoveCont ... = do ... ((sc, path, rmvsf), statf) <- runSearch (alphaBeta abc) stati ... Search framework: class Monad m => Node m where staticVal :: m Int -- static evaluation of a node materVal :: m Int -- material evaluation (for prune purpose) genEdges :: Int -> Int -> Bool -> m ([Move], [Move]) -- generate all legal edges genTactEdges :: m [Move] -- generate all edges in tactical positions ... type Search m a = forall r. STPlus r PVState m a alphaBeta :: Node m => ABControl -> m (Int, [Move], [Move]) alphaBeta abc = do let !d = maxdepth abc rmvs = Alt $ rootmvs abc lpv = Seq $ lastpv abc searchReduced a b = pvRootSearch a b d lpv rmvs True searchFull = pvRootSearch salpha0 sbeta0 d lpv rmvs False r <- if useAspirWin ... pvRootSearch :: Node m => Int -> Int -> Int -> Seq Move -> Alt Move -> Bool -> Search m (Int, Seq Move, Alt Move) ... And then the chess specific implementation of the game state in another module: type Game r m = STPlus r MyState m ... instance CtxMon m => Node (Game r m) where staticVal = staticVal0 materVal = materVal0 genEdges = genMoves ... genMoves :: CtxMon m => Int -> Int -> Bool -> Game r m ([Move], [Move]) genMoves depth absdp pv = do Nicu

...

(*) If so, were you using the strict version of State?

Would it be possible to see the differences between the 2 versions of you code?

2011/9/27 Nicu Ionita mailto:nicu.ionita@acons.at>

Hello list,

Starting from this emails (http://web.archiveorange.com/archive/v/nDNOvSM4JT3GJRSjOm9P) I could refactor my code (a UCI chess engine, with complex functions, in which the search has a complex monad stack) to run twice as fast as with even some hand unroled state transformer! So from 23-24 kilo nodes per second it does now 45 to 50 kNps! And it looks like there is still some improvement room (I have to play a little bit with strictness annotations and so on).

(Previously I tried specializations, then I removed a lot of polimorphism, but nothing helped, it was like hitting a wall.)

Even more amazingly is that I could program it although I cannot really understand the Cont & ContT, but just taking the code example from Ryan Ingram (newtype ContState r s a = ...) and looking a bit at the code from ContT (from the transformers library), and after fixing some compilation errors, it worked and was so fast.

I wonder why the transformers library does not use this kind of state monad definition. Or does it, and what I got is just because of the unrolling? Are there monad (transformers) libraries which are faster? I saw the library kan-extensions but I did not understand (yet) how to use it.

Nicu

_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org mailto:Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

5033

Age (days ago)

5035

Last active (days ago)

List overview

Download

8 comments

6 participants

participants (6)

Bas van Dijk
Ertugrul Soeylemez
Nicu Ionita
Ryan Ingram
Thomas Schilling
Yves Parès