
I'm using a control structure that's a variation of a monad and I'm interested in whether - it's got a name - it deserves a name (!) - anything else similar is used elsewhere Please excuse the longer post... I have two programs that need to interact with the outside world, and I want to constrain the nature of these interactions. I don't want to just sprinkle IO throughout the code. In the first program, I am reading on-demand from a database - just reading, not making any changes. In the second, I am requesting computations to be evaluated externally, in order to take advantage of a grid of machines. In both of these cases, the external requests don't change the state of the world, and the programs can be consided "pure" as long as the world isn't being changed by some other means. Hence the requests can be reordered, performed in parallel, and optimised in various ways without affecting the result. To make this concrete, if an external request has a type like: a -> m b where a is the input to the request, b is the response, and m is some monad, probably IO. Then a data structure capturing calculations over these requests, with result type c can be: data SCalc a b c = SCResult c | SCStage { sc_req :: a, sc_calc :: b -> SCalc a b c } The idea is that we either have a result, or we need to make a external request, whose response is used to generate a new calculation. "Running" such a calculation is straightforward: runSC :: (Monad m ) => SCalc a b c -> (a -> m b) -> m c runSC (SCResult v) _ = return v runSC (SCStage a cf) reqf = do b <- reqf a runSC (cf b) reqf and calculations can be sequence by making a monad instance: instance Monad (SCalc a b) where return = SCResult (>>=) (SCResult v) cf = cf v (>>=) (SCStage a cf1) cf = SCStage a (\b -> cf1 b >>= cf) Where it gets interesting, however, is that when they don't depend on each other, we can run these calculations in parallel. We need the ability to merge requests, and split responses. Hence, class Req a where merge :: a -> a -> a class Resp b where split :: b -> (b,b) par :: (Req a, Resp b) => SCalc a b c -> SCalc a b d -> SCalc a b (c,d) The par primitive above can be used to define other parallel operations, such as parList :: (Req a, Resp b) => [SCalc a b c] -> SCalc a b [c] I found it worthwhile to try and visualise what's going on here. Let's say I have 4 calculations that I want to run in parallel. The first doesn't need a request; the second needs to make a single request (A1); the third needs to make two requests where the second (B2) depends on the result of the first (B1), etc. The resulting parallel operations will be done in 3 batches, looking like: batch1 batch2 batch3 result calc1 ------------------------> V0 calc2 A1 ----------------> V1 calc3 B1 B2 -------> V2 calc4 C1 C2 C3 --> V3 (excuse the ascii art). batch1 will consist of A1,B1,C1 merged together; batch2 of B2,C3 merged; etc. In practice, I've used the above data types to abstract out database access in some reporting code. It works quite well, as use of the parallel primitive above means that the haskell code talking to the database sees all of the information in each batch simultaneously, so it can optimise the queries, remove redundant requests etc. It also makes the reporting code pure despite the fact that information is being loaded on demand from the db (without any unsafe calls behind the scene). I guess the use of the term "pure" here should be qualified: the impure code has been factored out to a single function in a different module that has a limited and well defined interface. I haven't implemented the grid calculation example described above, though I see that it ought to be able to work similarly, potentially removing duplicate calculation requests, etc. So my questions are: does this sort of "monad allowing parallel evaluation" structure have a name? Is it an existing design pattern in fp somewhere that I haven't seen? thanks, Tim

"Tim Docker"
I'm using a control structure that's a variation of a monad and I'm interested in whether
- it's got a name - it deserves a name (!) - anything else similar is used elsewhere
You might have reinvented arrows in some sense: http://www.haskell.org/arrows/syntax.html http://www.md.chalmers.se/Cs/Research/Functional/Fudgets/userguide.html The sequencing and parallellizing seems similar, as well as something fuzzy about the notions of streams. Thinking of streams, in some way the stream fusion system is similar, if you mentally add parallelism. If not, it's an interesting read in any case: http://www.cse.unsw.edu.au/~dons/papers/CLS07.html

This monad seems to be basically the same as Prompt; see http://www.haskell.org/pipermail/haskell-cafe/2007-November/034830.html, the only difference I see is that Prompt allows the return value's type to be based on the request instead of forcing everything to be wrapped in a single result type. You implemented the monad operations exactly the same as Prompt, and your bind operator suffers from the same quadratic behavior problem that was pointed out in that thread. As was pointed out there, what you are doing is turning the potential side effects of your computations into a term algebra, which allows you to write different "interpretation" functions to use when running the calculation (the reqf passed to runSC). As far as I can tell, this pattern is general enough to implement any computation, so it's not surprising that you found it possible to use it to implement parallel computation. As an example, here's the State monad implemented in terms of SCalc:
data StateReq s = Get | Put s get :: SCalc (StateReq s) s s get = SCStage Get return put :: s -> SCalc (StateReq s) s () put s = SCStage (Put s) (const $ return ())
runState :: SCalc (StateReq a) s b -> s -> (a, s) runState (SCResult v) s = (v, s) runState (SCStage Get cont) s = runState (cont s) s runState (SCStage (Put s) cont) _ = runState (cont s) s
I think it's a useful pattern and I definitely am getting a lot of use out of Prompt in my code. But I'm trying to figure out the best way to eliminate the quadratic behavior of (>>=) that is exhibited by, for example: foldl1 (>>=) $ take 100 $ repeat $ (\x -> put (x+1) >>= get) $ 0 The only way I've found so far is to wrap Prompt inside of ContT which solves the problem in much the same way that difference lists (newtype DList a = [a] -> [a]) solve the problem of quadratic time append for lists. -- ryan

Tim Docker wrote:
I found it worthwhile to try and visualise what's going on here. Let's say I have 4 calculations that I want to run in parallel. The first doesn't need a request; the second needs to make a single request (A1); the third needs to make two requests where the second (B2) depends on the result of the first (B1), etc. The resulting parallel operations will be done in 3 batches, looking like:
This sounds similar to "futures", where a request for a parallel computation returns immediately, but the "value" returned is just a placeholder for the result, which will be filled in when it becomes available. Paul.

What you described has a Java parallel:
In the first program, I am reading on-demand from a database - just reading, not making any changes.
This is similar to EJB's entity bean. Usually EJB requires XA driver to preserve database state across the cluster, but in your read-only case, this is not needed. HDBC puts you into the IO monad. You need to get yourself out of IO monad if you believe your code is "pure".
In the second, I am requesting computations to be evaluated externally, in order to take advantage of a grid of machines.
This is similar to J2EE's clustering, where beans can sit on any node of the cluster and be accessed via RMI. The ability to handle (de)serialization is the key here. Java's solution is proprietary, while Microsoft dot NET has choosen SOAP to request computation over network. Network calls will also put you into the IO monad, which has to be hidden in your case.. If you came up with a solution, it might have invented the H2EE. Steve
participants (5)
-
Achim Schneider
-
Paul Johnson
-
Ryan Ingram
-
Steve Lihn
-
Tim Docker