
Johan Tibell wrote:
> I've written what I believe to be a similar, continuation-based parser. I haven't uploaded my latest patches (basically faster combinators), but the idea can be seen in the file here:
>
> http://www.johantibell.com/cgi-bin/gitweb.cgi?p=hyena.git;a=blob;f=Hyena/Par...
>
> The use case is parsing HTTP without resorting to lazy I/O.
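The core of "parsing without lazy I/O" is a parser that, when it runs out of input, hands back a continuation instead of blocking. Here is a minimal sketch of that idea. It assumes String chunks and uses hypothetical names (Result, Parser, anyChar, line, parse, feed), and it is not the Hyena code linked above. A Partial result holds the rest of the computation, so the caller can read the next chunk from the socket and feed it in.

```haskell
-- Hypothetical sketch of a resumable parser; all names are illustrative.
data Result a
  = Done a String                -- parsed value and leftover input
  | Fail String                  -- error message
  | Partial (String -> Result a) -- suspended: feed more input ("" = end of input)

-- The Bool records whether end of input has already been signalled.
newtype Parser a = Parser { runParser :: Bool -> String -> Result a }

instance Functor Parser where
  fmap f p = p >>= pure . f

instance Applicative Parser where
  pure x = Parser $ \_ s -> Done x s
  pf <*> px = pf >>= \f -> fmap f px

instance Monad Parser where
  Parser p >>= k = Parser $ \eof s -> bindR eof (p eof s)
    where
      -- Continue with k once the first parser has a result; if it suspended,
      -- wrap its continuation so the rest of the computation resumes too.
      bindR eof (Done a rest) = runParser (k a) eof rest
      bindR _   (Fail e)      = Fail e
      bindR eof (Partial f)   = Partial (\more -> bindR (eof || null more) (f more))

-- Read one character, suspending if the current chunk is exhausted.
anyChar :: Parser Char
anyChar = Parser go
  where
    go _     (c:cs) = Done c cs
    go True  []     = Fail "unexpected end of input"
    go False []     = Partial (\more -> go (null more) more)

-- Read up to and including '\n', returning the line without the newline.
line :: Parser String
line = do
  c <- anyChar
  if c == '\n' then pure "" else (c :) <$> line

-- Start a parse on the first chunk of input.
parse :: Parser a -> String -> Result a
parse p = runParser p False

-- Hand the next chunk to a suspended parse.
feed :: Result a -> String -> Result a
feed (Partial f) s = f s
feed (Done a rest) s = Done a (rest ++ s)
feed r@(Fail _) _ = r
```

For example, `feed (parse line "GET / HT") "TP/1.1\nHost"` yields `Done "GET / HTTP/1.1" "Host"`. Feeding the empty string to a suspended parse signals end of input and produces a `Fail`. A production parser would more likely work on strict ByteString chunks.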
I have just read through your code. It is quite similar. The differences:

Your error handling is via Alternative, and if the first branch advances (consumes input), then the second branch is not attempted. The state can only go forward (by 1 byte) or remain in place (there is no look-ahead). When reading past the end of the available input, either:

  * the new first byte works, or
  * the new first byte fails, and the Alternative is ready to try the other branch.

In MyGet/MyGetW/MyGetSimplified the MonadError semantics are different. If the first alternative fails, then the parser state is rolled back and the second alternative is tried from the same starting point as the first. If the first alternative triggered more input from IPartial, then this input is still visible to the second alternative.

Managing the saved state on the stack of pending operations is much simpler with your commit-if-advance semantics and much more complicated with my rollback semantics. Oddly, it seems your committed operations do not immediately release the pending handlers, so those handlers cannot be garbage collected right away. This same kind of issue motivated me to improve the implementation of binary-strict's incremental get.

On a different note: Hmmm... In the MyGetW implementation I could add a fancier throwError/Alternative command that lets the user "commit" to the current branch and immediately release (discard) the pending handler and second branch. Something like:
    d = mplus (mplus a b) c
      where
        a = do
          comThing <- getCommitter
          x <- getWord32be
          catchError (commitTo comThing >> throwError "WTF")
                     (\errMsg -> liftIO (print errMsg))
        b = liftIO (print "b")
        c = liftIO (print "c")
In the above pseudocode, the commitTo causes the throwError to bypass both the (\errMsg -> ...) handler and the "b" alternative; control goes to "c" instead. The comThing is an opaque value, the unique ID of the current error-handler frame, which in this case is the frame for "mplus a b". So commitTo lets the system abandon the (\errMsg -> ...) handler and the "b" code immediately, allowing them to be garbage collected.

Hmmm....

--
Chris
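The control flow of commitTo can be checked without any parsing. Below is a minimal pure sketch built on assumptions. M is a hypothetical toy monad. orElse stands in for mplus and opens a fresh handler frame for its first branch. catchErr stands in for catchError. getCommitter returns the innermost frame's ID, and a log list replaces liftIO (print ...). None of these are MyGetW's or mtl's actual definitions. The sketch models only which branches run, not the release of memory.

```haskell
-- Hypothetical toy model of "commit to a handler frame"; not MyGetW or mtl.
type FrameId = Int

-- nextId: source of fresh frame IDs; committed: the frame committed to, if any.
data St = St { nextId :: Int, committed :: Maybe FrameId }

-- Reads the innermost frame ID, threads St, and logs which branches ran.
newtype M a = M { runM :: FrameId -> St -> (Either String a, St, [String]) }

instance Functor M where
  fmap f m = m >>= pure . f

instance Applicative M where
  pure x = M $ \_ st -> (Right x, st, [])
  mf <*> mx = mf >>= \f -> fmap f mx

instance Monad M where
  M m >>= k = M $ \fr st -> case m fr st of
    (Left e, st', w)  -> (Left e, st', w)
    (Right a, st', w) ->
      let (r, st'', w') = runM (k a) fr st' in (r, st'', w ++ w')

say :: String -> M ()
say s = M $ \_ st -> (Right (), st, [s])

throwErr :: String -> M a
throwErr e = M $ \_ st -> (Left e, st, [])

getCommitter :: M FrameId
getCommitter = M $ \fr st -> (Right fr, st, [])

commitTo :: FrameId -> M ()
commitTo k = M $ \_ st -> (Right (), st { committed = Just k }, [])

-- Like catchError, except that a failure raised after a commit skips the handler.
catchErr :: M a -> (String -> M a) -> M a
catchErr (M m) h = M $ \fr st -> case m fr st of
  (Left e, st', w) | committed st' == Nothing ->
    let (r, st'', w') = runM (h e) fr st' in (r, st'', w ++ w')
  other -> other

-- Like mplus: runs the first branch in a fresh frame. The second branch is
-- tried only if the first failed without committing. A commit to this frame
-- is cleared on exit, so an enclosing orElse can still try its alternative.
orElse :: M a -> M a -> M a
orElse (M first) second = M $ \fr st ->
  let k = nextId st
      (r, st', w) = first k st { nextId = k + 1 }
      release s = if committed s == Just k then s { committed = Nothing } else s
  in case r of
       Left _ | committed st' == Nothing ->
         let (r', st'', w') = runM second fr st' in (r', st'', w ++ w')
       _ -> (r, release st', w)

run :: M a -> (Either String a, [String])
run m = let (r, _, w) = runM m 0 (St 1 Nothing) in (r, w)

-- The pseudocode example, with a flag to turn the commit on or off.
example :: Bool -> M ()
example useCommit = (a `orElse` b) `orElse` c
  where
    a = do
      comThing <- getCommitter
      say "a"  -- stands in for getWord32be
      catchErr (commitIf comThing >> throwErr "WTF") (\errMsg -> say errMsg)
    commitIf k = if useCommit then commitTo k else pure ()
    b = say "b"
    c = say "c"
```

In GHCi, `run (example True)` gives `(Right (),["a","c"])`: the handler and "b" are skipped and "c" runs. `run (example False)` gives `(Right (),["a","WTF"])`, because without the commit the handler catches the error. Keeping the commitment in the state and clearing it when its frame exits means a commit never leaks past the frame it names.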