[Haskell-cafe] Re: ANNOUNCE: CC-delcont-0.1; Delimited continuations for Haskell

16 Jul 2007

      Hello again,

I apologize for replying to myself, but since no one else is talking to me, I 
suppose I have no choice. :)

Anyhow, in case some people were intrigued, but simply didn't speak up (and 
because I was interested in seeing how easily it could be done), I took the 
liberty of implementing a version of the parser inverter that mimics the 
OCaml semantics pretty closely (I think). As I mentioned, this involves 
making a list data type that incorporates monads, so that it can be lazy in 
the side effects used to produce it. In short it looks like this:

    data MList' m a = MNil | MCons a (MList m a)
    type MList m a = m (MList' m a)

So, each list tail (including the entire list) is associated with a side 
effect, which has the ultimate effect that you can build lists in ways such 
as:

    toMList :: Monad m => m (Maybe t) -> MList m t
    toMList gen = gen >>= maybe nil (`cons` toMList gen)

This is the MList analogue of the toList function from the previous list 
(slightly modified here to demonstrate the similarity):

    toList :: Monad m => m (Maybe a) -> m [a]
    toList gen = gen >>= maybe (return []) (\c -> liftM (c:) $ toList gen)

However, toList uses liftM, which will strictly sequence the effects (the 
recursive toList call has to complete before the whole list is returned), 
whereas toMList simply adds the *monadic action* to produce the rest of the 
list as the tail, and so the side effects it entails don't actually occur 
until a consumer asks to see that part of the list.

So, the proof is in the output. The sample program (source included as an 
attachment) demonstrates normal lexing (where the underlying monad is just 
IO) and inverted lexing (which uses delimited continuations layered over IO). 
The 'lexing' is just the 'words' function adapted to MLists (I thought about 
doing a full-on parser, but I think that'd require making the parser a monad 
transformer (essentially) over the base monad, which would be complex, to say 
the least). The relevant parts look like so:

    normalLex :: IO ()
    normalLex = printTokens
                   (wordsML
                      (liftList
                         "The quick brown fox jumps over the lazy dog"))

    reqLex :: CCT ans IO ()
    reqLex = do p1 <- begin
                p2 <- provideSome "The quick brown " p1
                pStrLn "Break 1"
                p3 <- provideSome "fox jumps over " p2
                pStrLn "Break 2"
                p4 <- provideSome "the laz" p3
                pStrLn "Break 3"
                provideSome "y dog" p4 >>= finish
                pStrLn "Rollback"
                provideSome "iest dog" p4 >>= finish
                return ()

Which main invokes appropriately. Output looks like so:

    Normal Lexing
    -------------
    The
    quick
    brown
    fox
    jumps
    over
    the
    lazy
    dog
    -------------

    Inverted Lexing
    ---------------
    The
    quick
    brown
    Break 1
    fox
    jumps
    over
    Break 2
    the
    Break 3
    lazy
    dog
    Rollback
    laziest
    dog
    ---------------

So, success! Tokens are printed out as soon as the lexer is able to recognize 
them, properly interleaved with other IO side effects, and resuming from an 
intermediate parse does not cause duplication of output.

So, that wasn't really that hard to hack up. However, I should mention that it 
wasn't trivial, either. When converting list functions to MList functions, 
you have to be very careful not to perform side effects twice. For instance, 
my first pass gave output like:

    ...
    he
    uick
    rown
    Break 1
    ox
    ...

Although it worked fine with the normal lexer. The culprit? I had written 
nullML like so:

    nullML :: Monad m => MList m a -> m Bool
    nullML m = isNothing `liftM` uncons m

But in that version, testing for null, and then using the list performs side 
effects twice, and due to the way the delimited continuations produce MLists, 
characters were getting dropped! The correct version is:

    nullML :: Monad m => MList m a -> m (Bool, MList m a)
    nullML m = uncons m >>= maybe (return (True, nil))
                                  (\(a,m') -> return (False, a `cons` m'))

Which returns both whether the list is null, and a new list that won't perform 
a duplicate side effect. So, I guess what I'm saying is that reasoning about 
code with lots of embedded side effects can be difficult. :)

As a final aside, it should be noted that to get the desired effect (that is, 
laziness with interleaved side effects), it's important to make use of the 
monadic data structures as much as possible. For instance, wordsML produces 
not an (m [MList m a]) or MList m [a] or anything like that (although the 
latter may work), but an MList m (MList m a), which is important for the 
effects to be able to get a hold over printTokens. However, if you want to 
produce something that's not a list, say, a tree, you'll have to write an 
MTree, or, in general, one lazy-effectful data structure for each 
corresponding pure structure you'd want to use. What a pain!

However, there may be a way to alleviate that if you write all your structures 
in terms of shape functors. For instance:

    data ListShape a x = LNil | LCons a x
    newtype Fix f = In { out :: f (Fix f) } -- I think this is right
    type List a = Fix (ListShape a)

And in general, many recursive data structures can be expressed as the 
fixed-point of shape functors. The kicker is, you can get the monadic version 
for free:

    newtype MShape m f x = M (f (m x))
    type MList m a = m (Fix (MShape m (ListShape a)))
        -- = m (MShape m (ListShape a) (Fix (MShape m (ListShape a))))
        -- = m (ListShape a (m (Fix (MShape m (ListShape a)))))
        -- = m (ListShape a (MList m a))
        -- = m (LNil | LCons a (MList m a)) -- same as our manual definition
        -- I think the above substitutions are right, but I may have
        -- misstepped

Of course, I haven't investigated this avenue, so I don't know if it helps in 
actually writing functions that *use* such data structures (and it might kill 
your ability to deforest/use an optimized representation underneath). 
However, I thought it was a cute use of the sort of thing you're likely to 
see in papers that apply category theory to Haskell, but typically not in 
practice.

Anyhow, I hope that was of some interest to at least someone out there. If you 
have questions or comments, feel free to respond.

Cheers
Dan Doel