
Johan Tibell wrote:
Hi Chris,
Thanks for providing feedback. It's much appreciated.
On Wed, Jul 30, 2008 at 11:34 PM, Chris Kuklewicz
wrote: The differences: Your error handling is via Alternative, and if the first branch advances (consumes input) then the second branch is not attempted. The state can only go forward (by 1 byte) or remain in place (there is no look-ahead). If reading past the end, then either (*) the new first byte works (*) the new first fails and the Alternative is ready to try it.
This is by design. This is enough for parsing many network protocols like HTTP and avoids having to deal with backtracking and the potential space leaks and bookkeeping overhead that might come with it.
I agree that the design space of get and parsing monads is quite broad. That is why my small endeavor now has 3 monads: MyGetW with MonadCont, MyGet without MonadCont, and MyGetSimplified without MonadCont,Reader,Writer,State,Trans.
The management of saved state on the stack of pending operations is much simpler with your commit-if-advance semantics and much more complicated with my rollback semantics. Oddly, it seems your committed operations do not immediately release the pending handlers so they can be garbage collected. This same kind of issue motivated me to improve the implementation of binary-strict's incremental get.
I'm not sure what you mean by releasing pending handlers. It's probably something I've not considered. Could you please elaborate?
Absolutely. It does not affect the semantics of the computation, but it does affect the space performance. Your Hyena.Parser has a state, a success continuation, and a failure continuation. The state has a count of the number of bytes consumed. The Alterantive <|> operation is the interesting one:
data S = S S.ByteString Int64 Bool
newtype Parser a = Parser { unParser :: forall r. S -> (a -> S -> IResult r) -> (S -> IResult r) -> IResult r }
instance Alternative Parser where empty = Parser $ \s _ fail -> fail s
p <|> p' = Parser $ \s@(S _ pos _) succ fail -> unParser p s succ $ \s'@(S _ pos' _) -> if pos == pos' then unParser p' s' succ fail else fail s'
I will rewrite this for clarity, and define the function fail' as
p <|> p' = Parser $ \s@(S _ pos _) succ fail -> let fail' s'@(S _ pos' _) = if pos == pos' then unParser p' s' succ fail else fail s' in unParser p s succ fail'
It is now clear that <|> works by allocating fail' which hold a reference to the original pos and succ and fail. Happily, it does not hold a reference to the original S.ByteString because your handlers have commit-if-advance semantics. The commit-if-advance semantics come from (if pos == pos') which checks to see whether p has consumed any of the input. In practice pos <= pos' will always hold, since p cannot have gone backwards. When does fail' get released? It gets released after p fails by calling it or after p and thus <|> succeeds. But fail' reverts in behavior to fail as soon as p advances pos. So Parser keeps fail' longer than needed. For example: if p uses bytes then even though bytes advances for a while the fail' is not released until it hits an error or p finishes. The core problem is the error handlers in Hyena.Parser are written in the form of nested scopes and this does not model their semantics. They behave like the Bool in the state and thus 'fail' can be moved into 'S' ... which I have done in the attached file. The meat of the difference is in 'satisfy'. When pos is incemented to (pos+1) the (Maybe fail) is set to Nothing which signifies that the top-level failure continuation 'failed' is in effect. The 'fail' functions, which are allocated in <|> as before, are thus immediately released when the parser advances. Whether this makes it better or not depends on how deep the nested stack <|> calls become. A very deep stack had many fail' waiting in vain under the original Hyena.Parser and are released much much sooner under my version. [ Also, I deleted IPartial. ] Cheers, Chris K {-# LANGUAGE Rank2Types #-} -- modified by Chris Kuklewicz -- * remove IResult -- * move fail continuation to state S, inside Maybe [changes S to (S r)] -- * clear fail to Nothing when advancing -- * use new 'callFailed' to invoke failure continuation ------------------------------------------------------------------------ -- | -- Module : Hyena.Parser -- Copyright : (c) Johan Tibell 2008 -- License : BSD3-style (see LICENSE) -- -- Maintainer : johan.tibell@gmail.com -- Stability : experimental -- Portability : portable -- -- An incremental LL(1) parser combinator library. -- ------------------------------------------------------------------------ module Hyena.Parser ( -- * The Parser type Parser, Result(..), runParser, -- * Primitive parsers satisfy, byte, bytes, module Control.Applicative ) where import Control.Applicative import qualified Data.ByteString as S import Data.Int (Int64) import Data.Word (Word8) import Prelude hiding (fail, succ) import Text.Show.Functions import Debug.Trace -- --------------------------------------------------------------------- -- The Parser type -- | The parse state. data S r = S {-# UNPACK #-} !S.ByteString {-# UNPACK #-} !Int64 {-# UNPACK #-} !Bool {-# UNPACK #-} !(Maybe (S r -> Result r)) -- error hander moved to state deriving Show setFail :: S r -> Maybe (S r -> Result r) -> S r setFail (S a b c _) mfail = (S a b c mfail) getFail :: S r -> Maybe (S r -> Result r) getFail (S _ _ _ mfail) = mfail clearFail :: S r -> S r clearFail (S a b c _) = S a b c Nothing -- | A parse either succeeds, fails or returns a suspension with which -- the parsing can be resumed. data Result a = Finished a S.ByteString -- ^ Parsing succeeded and produced a value of type -- @a@. The returned 'S.ByteString' is the remaining -- unconsumed input. | Failed Int64 -- ^ Parsing failed at the given position. Either -- because the parser didn't match the input or because -- an unexpected end of input was reached during -- parsing. | Partial (Maybe S.ByteString -> Result a) -- ^ The parsing needs more input to continue. Pass in -- @Just input@ to continue parsing and @Nothing@ to -- signal end of input. If @Nothing@ is passed the -- 'Result' is either 'Finished' or 'Failed'. deriving Show -- | A parser takes a parse state, a success continuation and returns a 'Result'. newtype Parser a = Parser { unParser :: forall r. S r -> (a -> S r -> Result r) -> Result r } -- --------------------------------------------------------------------- -- Instances instance Functor Parser where fmap f p = Parser $ \s succ -> unParser p s (succ . f) {-# INLINE fmap #-} instance Applicative Parser where pure a = Parser $ \s succ -> succ a s {-# INLINE pure #-} p <*> p' = Parser $ \s succ -> let newSucc f s' = unParser p' s' (succ . f) in unParser p s newSucc {-# INLINE (<*>) #-} instance Alternative Parser where empty = Parser $ \s _ -> callFailed s {-# INLINE empty #-} p <|> p' = Parser $ \s@(S _ _ _ mfail) succ -> let newS = setFail s (Just (\s' -> unParser p' (setFail s' mfail) succ)) in unParser p newS succ {-# INLINE (<|>) #-} -- --------------------------------------------------------------------- -- Running a parser initState :: S.ByteString -> S r initState bs = S bs 0 False Nothing {-# INLINE initState #-} -- | This is the final continuation that turns a successful parse into -- a 'Result'. finished :: r -> S r -> Result r finished v (S bs _ _ _) = Finished v bs -- | This is the final continuation that turns an unsuccessful parse -- into a 'Result'. failed :: S r -> Result a failed (S _ pos _ _) = Failed pos -- | TODO: Write documentation. runParser :: Parser a -> S.ByteString -> Result a runParser p bs = unParser p (initState bs) finished callFailed :: S r -> Result r callFailed s@(S _ _ _ Nothing) = failed s callFailed s@(S _ _ _ (Just fail)) = fail s -- --------------------------------------------------------------------- -- Primitive parsers -- | The parser @satisfy p@ succeeds for any byte for which the -- supplied function @p@ returns 'True'. Returns the byte that is -- actually parsed. satisfy :: (Word8 -> Bool) -> Parser Word8 satisfy p = Parser $ \s@(S bs pos eof mfail) succ -> case S.uncons bs of Just (b, bs') -> if p b then succ b (S bs' (pos + 1) eof Nothing) -- clear (Maybe fail) on advance else callFailed s Nothing -> if eof then callFailed s else Partial $ \x -> case x of Just bs' -> retry (S bs' pos eof mfail) Nothing -> callFailed (S bs pos True mfail) where retry s' = unParser (satisfy p) s' succ -- | @byte b@ parses a single byte @b@. Returns the parsed byte -- (i.e. @b@). byte :: Word8 -> Parser Word8 byte b = satisfy (== b) -- | @bytes bs@ parses a sequence of bytes @bs@. Returns the parsed -- bytes (i.e. @bs@). bytes :: S.ByteString -> Parser S.ByteString bytes = fmap S.pack . go where go bs = case S.uncons bs of Just (b, bs') -> liftA2 (:) (byte b) (go bs') Nothing -> pure []