ANNOUNCE: CC-delcont-0.1; Delimited continuations for Haskell

Hello, After a bit of hacking, and much documenting, I'm pleased to announce a preliminary release of a delimited continuation library for Haskell. The implementation is drawn from "A Monadic Framework for Delimited Continuations"[1] by Dybvig, Petyon Jones and Sabry, although it has some abstraction available over said implementation (there is both a monad and a transformer; control operations will work interchangeably with either. In addition, all four control operators from "Shift to Control"[2] are implemented). In addition, I have done the adaptation necessary to include the Haskell implementation of the dynamic scoping investigated in "Delimited Dynamic Binding"[3] by Kiselyov, Shan and Sabry, and it has been included. One can think of it as embeddings of the reader or state monads into the delimited control monads (only more flexible). In the future, I'd like to also, possibly, include something like Oleg's generic zippers, and possibly other useful abstractions based on delimited continuations, if I find them. But that's work for another day, I suppose. If you wish to try it out, a package is available on hackage: Page: http://hackage.haskell.org/cgi-bin/hackage-scripts/package/CC-delcont-0.1 Tabrall: http://hackage.haskell.org/packages/archive/CC-delcont/0.1/CC-delcont-0.1.ta... However, I should warn that it's fairly dependent on GHC HEAD at the moment: 1) It uses the Unsafe.Coerce module 2) It uses GADTs to store type class evidence 3) It uses GHC-features that haddock will choke on, so haddock.ghc will be required for docs (and I haven't gotten that working yet to check that the haddock is bug-free), which in turn requires GHC HEAD. 1 & 2 could be relaxed if there's interest. Please feel free to let me know if you find any bugs, or have any suggestions for improvements. I'll follow up this message with a message containing an example program and a discussion thereof. Cheers, Dan Doel 1: http://www.cs.indiana.edu/~sabry/papers/monadicDC.pdf 2: http://www.cs.rutgers.edu/~ccshan/recur/recur.pdf 3: http://okmij.org/ftp/papers/DDBinding.pdf

This is intended as an illustration of how one might use the CC-delcont library, and/or what it might be good for at all. For more, a mailing list search for 'oleg' would likely be fruitful, as this library is heavily derived from the delimited continuation implementation he's used in past mails. Essentially, this is a port of Oleg's 'Incremental, undoable parsing' as seen on the OCaml mailing list. It does lack some of the properties of the OCaml implementation, however (which I'll talk more about later). The original mail is available here: http://caml.inria.fr/pub/ml-archives/caml-list/2007/07/7a34650001bf6876b71c7... This message is (hopefully) literate haskell, and should be able to be run directly, as long as the CC-delcont library is installed.
{-# OPTIONS_GHC -fglasgow-exts #-}
module Parse where
import Text.ParserCombinators.Parsec
import Control.Monad import Control.Monad.Trans import Control.Monad.CC import Control.Monad.CC.Dynvar
------------ The Inverter ------------ First comes a datatype that is, in some sense, a reification of the act of parsing. Done, of course, means that the parsing is over, and it holds the result of parsing. ReqChar means that the parser is paused, waiting for more input. It holds a position, indicating how many characters it's read so far.
data StreamReq m a where Done :: Monad m => a -> StreamReq m a ReqChar :: Monad m => Int -> (Maybe Char -> m (StreamReq m a)) -> StreamReq m a
fromDone :: StreamReq m a -> a fromDone (Done a) = a
instance Show (StreamReq m a) where show (Done _) = "Done" show (ReqChar p _) = "Position: " ++ show p
And now comes the magic. toList below takes a function that, given a position in a stream (list), returns the element that should be at that position, and builds such a stream. The important part is that this function works in the context of an arbitrary monad. streamInv is such a "position -> element" function, but when asked for a character, it captures 'the rest of the stream production,' and reifies it into a StreamReq using delimited continuations. This is what allows one to, in essence, get a pointer into 'the act of parsing,' and pass such things around, and even have multiple pointers in play at once.
streamInv :: MonadDelimitedCont p s m => p (StreamReq m a) -> Int -> m (Maybe Char) streamInv p pos = shift p (\sk -> return $ ReqChar pos (sk . return))
toList :: Monad m => (Int -> m (Maybe a)) -> m [a] toList f = toList' 0 where toList' n = f n >>= maybe (return []) (\c -> liftM (c:) $ toList' (n+1))
The following three functions operate on the resumable parsers, either providing input, or telling the parser that there is no more input. Hopefully they're fairly straight forward.
provide :: Monad m => Char -> StreamReq m a -> m (StreamReq m a) provide _ d@(Done _) = return d provide c (ReqChar _ f) = f $ Just c
finish :: Monad m => StreamReq m a -> m (StreamReq m a) finish d@(Done _) = return d finish (ReqChar _ f) = f Nothing
provideSome :: Monad m => String -> StreamReq m a -> m (StreamReq m a) provideSome [] s = return s provideSome (x:xs) s = provide x s >>= provideSome xs
The following is sort of a wrapper. It turns a parsing function into a resumable parser. As can be seen, the parser function is pure; you can turn any existing parse function into a resumable parser in this manner. In other words, there's no need to worry about having to go through all your code, putting in delimited continuation constraints on all your functions; no "monad pollution" is involved.
invertParse :: (MonadIO m, MonadDelimitedCont p s m) => (String -> a) -> m (StreamReq m a) invertParse parser = reset $ \p -> (Done . parser) `liftM` toList (streamInv p)
---------- The parser ---------- This is the parser that will be inverted in the example. It reads a sum of several numbers; something like: 1 + 23 + 46 + 59 + 102 and returns the list of numbers to be summed (as an [Integer]). As can be seen, this is just an ordinary Parsec parser.
plus = char '+' >> spaces >> return (++) number = many1 digit >>= \n -> spaces >> return (read n :: Integer) total p = p >>= \a -> getInput >>= guard . null >> return a sump = total $ chainl1 (liftM return number) plus
----- Usage ----- This portion finally makes use of the above stream inversion to interesting effect (I hope). It will repeatedly ask for input to parse, gradually feeding that input into the parser, until a blank line is provided, at which point it assumes the user is finished. However, at each user input, the program stops to see if the input up to that point would have been a successful parse. If so, it saves that partial parser. If, when the user signals the end of input, the parse fails, the program will resume taking input from the last successful parse. The saving of the successful parses is achieved through the use of the dynamically scoped variables also included in this library. They can, essentially, be viewed as more flexible versions of the reader or state monads (depending on whether you use the set operations or not) that can be used within the context of delimited continuation monads.
prompt q p = do liftIO $ print p liftIO $ putStrLn "(Empty line to stop)" liftIO $ putStr ": " l <- liftIO getLine if null l then finish p else saveValid q p >> provideSome l p >>= prompt q
saveValid q p = do p' <- finish p case p' of Done (Right _) -> dset q p >> return () _ -> return ()
check _ p@(Done (Right _)) = return p check q p = do liftIO $ putStrLn "Invalid parse." liftIO $ putStrLn "Restarting from last good parse." dupp q (prompt q) >>= check q
main :: IO () main = runCCT $ do q <- dnew p <- invertParse (parse sump "Equation") dlet q p (prompt q p >>= check q >>= liftIO . print . fromDone)
And now for the caveats... If one does some experimentation, he will find that things don't work exactly the same as the OCaml version. For instance, if one tries to emulate the OCaml example exactly, by doing something like (unchecked; I think the code is well-formed): reset (\p -> toList (streamInv p) >>= return . lexer >>= printTokens >> return (Done ())) he'll find that tokens are only printed once a given resumable parser is closed (which may, of course, happen multiple times), and that if he closes multiple parsers from the same source, tokens will be printed for each parser, even over the shared portions. I believe these have the following causes: 1) The delay is due to being highly-sequentialized. Before the lexer and printTokens can do anything, toList has to get the entire list to pass along the monadic pipeline. It can't be produced lazily, because the pieces are being pulled out of monadic computations, which have to be sequenced. 2) The duplication is (possibly) a related matter. One way to solve #1 (I think) would be to parse over a list which incorporated monads. So a cons would have a value at head, and as a tail, a monadic action to produce the rest of the list. If this were used, only enough monadic actions as are needed to produce the output would be performed. If the output of the lexer were produced similarly, the printing could execute in a lazy fashion, and, possibly, the branching caused by the continuation monad could be limited to only the portions of the parse that aren't duplicated. For an example of this sort of idea explored, see "ListT done right" on the wiki: http://haskell.org/haskellwiki/ListT_done_right However, such a solution involves a lot of the monad creep that people tend not to like, and which the above code avoids. The duplication of the output of a printTokens function isn't a big deal; after all, in most cases, one would want to get whole results out of alternate parses, *and then* inspect/print them. However, the over-strictness can be a pain. In fact, it has a direct consequence for the above code, in that parse errors are never detected until an inverted parser is closed. With a lazy list input, the following is well defined: parse sump "Sum" ("1 + 2 - 3" ++ undefined) The result will be a parse error, as the '-' is immediately recognized as a problem. However, if one feeds "1 + 2 - 3" into a resumable parser, it will not recognize that the parse cannot complete, and advance to Done. Instead, it waits until the parser is closed, at which point, it can produce the input list, and get to the parse error. This is one reason that the code above periodically closes the parser, checks for a good parse, and saves the unclosed version if it finds one. However, there are parsers where prefixes of correct input will not themselves parse (consider a parser for something like "1 + 2 + 3 + 4 = 10"; without an '=' and right-hand side of the equation, it's not a correct parse, though it may be a prefix to one), and the above method won't be able to restart as closely to an error as one that, say, saved after each input but resumed when legitimately bad input automatically kicked the parser to Done. Whether this is enough motivation to push monads further into one's data structures, or drop Haskell and go with implicitly monadic code everywhere in OCaml is left as an exercise to the reader. :) Anyhow, I hope I've demonstrated something of interest, and given you a taste of what you can do with this library, and how to do it. There is some more explanation and some elementary examples in the haddock, although due to the code's use of some exotic GHC-isms, haddock.ghc, at the least, will be needed to generate the documentation, and I have yet to get that fully working here, so there may still be bugs in the haddock. Cheers, Dan Doel

Hello again, I apologize for replying to myself, but since no one else is talking to me, I suppose I have no choice. :) Anyhow, in case some people were intrigued, but simply didn't speak up (and because I was interested in seeing how easily it could be done), I took the liberty of implementing a version of the parser inverter that mimics the OCaml semantics pretty closely (I think). As I mentioned, this involves making a list data type that incorporates monads, so that it can be lazy in the side effects used to produce it. In short it looks like this: data MList' m a = MNil | MCons a (MList m a) type MList m a = m (MList' m a) So, each list tail (including the entire list) is associated with a side effect, which has the ultimate effect that you can build lists in ways such as: toMList :: Monad m => m (Maybe t) -> MList m t toMList gen = gen >>= maybe nil (`cons` toMList gen) This is the MList analogue of the toList function from the previous list (slightly modified here to demonstrate the similarity): toList :: Monad m => m (Maybe a) -> m [a] toList gen = gen >>= maybe (return []) (\c -> liftM (c:) $ toList gen) However, toList uses liftM, which will strictly sequence the effects (the recursive toList call has to complete before the whole list is returned), whereas toMList simply adds the *monadic action* to produce the rest of the list as the tail, and so the side effects it entails don't actually occur until a consumer asks to see that part of the list. So, the proof is in the output. The sample program (source included as an attachment) demonstrates normal lexing (where the underlying monad is just IO) and inverted lexing (which uses delimited continuations layered over IO). The 'lexing' is just the 'words' function adapted to MLists (I thought about doing a full-on parser, but I think that'd require making the parser a monad transformer (essentially) over the base monad, which would be complex, to say the least). The relevant parts look like so: normalLex :: IO () normalLex = printTokens (wordsML (liftList "The quick brown fox jumps over the lazy dog")) reqLex :: CCT ans IO () reqLex = do p1 <- begin p2 <- provideSome "The quick brown " p1 pStrLn "Break 1" p3 <- provideSome "fox jumps over " p2 pStrLn "Break 2" p4 <- provideSome "the laz" p3 pStrLn "Break 3" provideSome "y dog" p4 >>= finish pStrLn "Rollback" provideSome "iest dog" p4 >>= finish return () Which main invokes appropriately. Output looks like so: Normal Lexing ------------- The quick brown fox jumps over the lazy dog ------------- Inverted Lexing --------------- The quick brown Break 1 fox jumps over Break 2 the Break 3 lazy dog Rollback laziest dog --------------- So, success! Tokens are printed out as soon as the lexer is able to recognize them, properly interleaved with other IO side effects, and resuming from an intermediate parse does not cause duplication of output. So, that wasn't really that hard to hack up. However, I should mention that it wasn't trivial, either. When converting list functions to MList functions, you have to be very careful not to perform side effects twice. For instance, my first pass gave output like: ... he uick rown Break 1 ox ... Although it worked fine with the normal lexer. The culprit? I had written nullML like so: nullML :: Monad m => MList m a -> m Bool nullML m = isNothing `liftM` uncons m But in that version, testing for null, and then using the list performs side effects twice, and due to the way the delimited continuations produce MLists, characters were getting dropped! The correct version is: nullML :: Monad m => MList m a -> m (Bool, MList m a) nullML m = uncons m >>= maybe (return (True, nil)) (\(a,m') -> return (False, a `cons` m')) Which returns both whether the list is null, and a new list that won't perform a duplicate side effect. So, I guess what I'm saying is that reasoning about code with lots of embedded side effects can be difficult. :) As a final aside, it should be noted that to get the desired effect (that is, laziness with interleaved side effects), it's important to make use of the monadic data structures as much as possible. For instance, wordsML produces not an (m [MList m a]) or MList m [a] or anything like that (although the latter may work), but an MList m (MList m a), which is important for the effects to be able to get a hold over printTokens. However, if you want to produce something that's not a list, say, a tree, you'll have to write an MTree, or, in general, one lazy-effectful data structure for each corresponding pure structure you'd want to use. What a pain! However, there may be a way to alleviate that if you write all your structures in terms of shape functors. For instance: data ListShape a x = LNil | LCons a x newtype Fix f = In { out :: f (Fix f) } -- I think this is right type List a = Fix (ListShape a) And in general, many recursive data structures can be expressed as the fixed-point of shape functors. The kicker is, you can get the monadic version for free: newtype MShape m f x = M (f (m x)) type MList m a = m (Fix (MShape m (ListShape a))) -- = m (MShape m (ListShape a) (Fix (MShape m (ListShape a)))) -- = m (ListShape a (m (Fix (MShape m (ListShape a))))) -- = m (ListShape a (MList m a)) -- = m (LNil | LCons a (MList m a)) -- same as our manual definition -- I think the above substitutions are right, but I may have -- misstepped Of course, I haven't investigated this avenue, so I don't know if it helps in actually writing functions that *use* such data structures (and it might kill your ability to deforest/use an optimized representation underneath). However, I thought it was a cute use of the sort of thing you're likely to see in papers that apply category theory to Haskell, but typically not in practice. Anyhow, I hope that was of some interest to at least someone out there. If you have questions or comments, feel free to respond. Cheers Dan Doel
participants (1)
-
Dan Doel