
This is intended as an illustration of how one might use the CC-delcont library, and/or what it might be good for at all. For more, a mailing list search for 'oleg' would likely be fruitful, as this library is heavily derived from the delimited continuation implementation he's used in past mails. Essentially, this is a port of Oleg's 'Incremental, undoable parsing' as seen on the OCaml mailing list. It does lack some of the properties of the OCaml implementation, however (which I'll talk more about later). The original mail is available here: http://caml.inria.fr/pub/ml-archives/caml-list/2007/07/7a34650001bf6876b71c7... This message is (hopefully) literate haskell, and should be able to be run directly, as long as the CC-delcont library is installed.
{-# OPTIONS_GHC -fglasgow-exts #-}
module Parse where
import Text.ParserCombinators.Parsec
import Control.Monad import Control.Monad.Trans import Control.Monad.CC import Control.Monad.CC.Dynvar
------------ The Inverter ------------ First comes a datatype that is, in some sense, a reification of the act of parsing. Done, of course, means that the parsing is over, and it holds the result of parsing. ReqChar means that the parser is paused, waiting for more input. It holds a position, indicating how many characters it's read so far.
data StreamReq m a where Done :: Monad m => a -> StreamReq m a ReqChar :: Monad m => Int -> (Maybe Char -> m (StreamReq m a)) -> StreamReq m a
fromDone :: StreamReq m a -> a fromDone (Done a) = a
instance Show (StreamReq m a) where show (Done _) = "Done" show (ReqChar p _) = "Position: " ++ show p
And now comes the magic. toList below takes a function that, given a position in a stream (list), returns the element that should be at that position, and builds such a stream. The important part is that this function works in the context of an arbitrary monad. streamInv is such a "position -> element" function, but when asked for a character, it captures 'the rest of the stream production,' and reifies it into a StreamReq using delimited continuations. This is what allows one to, in essence, get a pointer into 'the act of parsing,' and pass such things around, and even have multiple pointers in play at once.
streamInv :: MonadDelimitedCont p s m => p (StreamReq m a) -> Int -> m (Maybe Char) streamInv p pos = shift p (\sk -> return $ ReqChar pos (sk . return))
toList :: Monad m => (Int -> m (Maybe a)) -> m [a] toList f = toList' 0 where toList' n = f n >>= maybe (return []) (\c -> liftM (c:) $ toList' (n+1))
The following three functions operate on the resumable parsers, either providing input, or telling the parser that there is no more input. Hopefully they're fairly straight forward.
provide :: Monad m => Char -> StreamReq m a -> m (StreamReq m a) provide _ d@(Done _) = return d provide c (ReqChar _ f) = f $ Just c
finish :: Monad m => StreamReq m a -> m (StreamReq m a) finish d@(Done _) = return d finish (ReqChar _ f) = f Nothing
provideSome :: Monad m => String -> StreamReq m a -> m (StreamReq m a) provideSome [] s = return s provideSome (x:xs) s = provide x s >>= provideSome xs
The following is sort of a wrapper. It turns a parsing function into a resumable parser. As can be seen, the parser function is pure; you can turn any existing parse function into a resumable parser in this manner. In other words, there's no need to worry about having to go through all your code, putting in delimited continuation constraints on all your functions; no "monad pollution" is involved.
invertParse :: (MonadIO m, MonadDelimitedCont p s m) => (String -> a) -> m (StreamReq m a) invertParse parser = reset $ \p -> (Done . parser) `liftM` toList (streamInv p)
---------- The parser ---------- This is the parser that will be inverted in the example. It reads a sum of several numbers; something like: 1 + 23 + 46 + 59 + 102 and returns the list of numbers to be summed (as an [Integer]). As can be seen, this is just an ordinary Parsec parser.
plus = char '+' >> spaces >> return (++) number = many1 digit >>= \n -> spaces >> return (read n :: Integer) total p = p >>= \a -> getInput >>= guard . null >> return a sump = total $ chainl1 (liftM return number) plus
----- Usage ----- This portion finally makes use of the above stream inversion to interesting effect (I hope). It will repeatedly ask for input to parse, gradually feeding that input into the parser, until a blank line is provided, at which point it assumes the user is finished. However, at each user input, the program stops to see if the input up to that point would have been a successful parse. If so, it saves that partial parser. If, when the user signals the end of input, the parse fails, the program will resume taking input from the last successful parse. The saving of the successful parses is achieved through the use of the dynamically scoped variables also included in this library. They can, essentially, be viewed as more flexible versions of the reader or state monads (depending on whether you use the set operations or not) that can be used within the context of delimited continuation monads.
prompt q p = do liftIO $ print p liftIO $ putStrLn "(Empty line to stop)" liftIO $ putStr ": " l <- liftIO getLine if null l then finish p else saveValid q p >> provideSome l p >>= prompt q
saveValid q p = do p' <- finish p case p' of Done (Right _) -> dset q p >> return () _ -> return ()
check _ p@(Done (Right _)) = return p check q p = do liftIO $ putStrLn "Invalid parse." liftIO $ putStrLn "Restarting from last good parse." dupp q (prompt q) >>= check q
main :: IO () main = runCCT $ do q <- dnew p <- invertParse (parse sump "Equation") dlet q p (prompt q p >>= check q >>= liftIO . print . fromDone)
And now for the caveats... If one does some experimentation, he will find that things don't work exactly the same as the OCaml version. For instance, if one tries to emulate the OCaml example exactly, by doing something like (unchecked; I think the code is well-formed): reset (\p -> toList (streamInv p) >>= return . lexer >>= printTokens >> return (Done ())) he'll find that tokens are only printed once a given resumable parser is closed (which may, of course, happen multiple times), and that if he closes multiple parsers from the same source, tokens will be printed for each parser, even over the shared portions. I believe these have the following causes: 1) The delay is due to being highly-sequentialized. Before the lexer and printTokens can do anything, toList has to get the entire list to pass along the monadic pipeline. It can't be produced lazily, because the pieces are being pulled out of monadic computations, which have to be sequenced. 2) The duplication is (possibly) a related matter. One way to solve #1 (I think) would be to parse over a list which incorporated monads. So a cons would have a value at head, and as a tail, a monadic action to produce the rest of the list. If this were used, only enough monadic actions as are needed to produce the output would be performed. If the output of the lexer were produced similarly, the printing could execute in a lazy fashion, and, possibly, the branching caused by the continuation monad could be limited to only the portions of the parse that aren't duplicated. For an example of this sort of idea explored, see "ListT done right" on the wiki: http://haskell.org/haskellwiki/ListT_done_right However, such a solution involves a lot of the monad creep that people tend not to like, and which the above code avoids. The duplication of the output of a printTokens function isn't a big deal; after all, in most cases, one would want to get whole results out of alternate parses, *and then* inspect/print them. However, the over-strictness can be a pain. In fact, it has a direct consequence for the above code, in that parse errors are never detected until an inverted parser is closed. With a lazy list input, the following is well defined: parse sump "Sum" ("1 + 2 - 3" ++ undefined) The result will be a parse error, as the '-' is immediately recognized as a problem. However, if one feeds "1 + 2 - 3" into a resumable parser, it will not recognize that the parse cannot complete, and advance to Done. Instead, it waits until the parser is closed, at which point, it can produce the input list, and get to the parse error. This is one reason that the code above periodically closes the parser, checks for a good parse, and saves the unclosed version if it finds one. However, there are parsers where prefixes of correct input will not themselves parse (consider a parser for something like "1 + 2 + 3 + 4 = 10"; without an '=' and right-hand side of the equation, it's not a correct parse, though it may be a prefix to one), and the above method won't be able to restart as closely to an error as one that, say, saved after each input but resumed when legitimately bad input automatically kicked the parser to Done. Whether this is enough motivation to push monads further into one's data structures, or drop Haskell and go with implicitly monadic code everywhere in OCaml is left as an exercise to the reader. :) Anyhow, I hope I've demonstrated something of interest, and given you a taste of what you can do with this library, and how to do it. There is some more explanation and some elementary examples in the haddock, although due to the code's use of some exotic GHC-isms, haddock.ghc, at the least, will be needed to generate the documentation, and I have yet to get that fully working here, so there may still be bugs in the haddock. Cheers, Dan Doel