
On Sat, 11 Apr 2009 02:02:04 -0700
Michael Mossey
I'll looking at the parser example, page 242 in Chapter 10 of Real World Haskell, and they are defining a type of monadic parser with the help of an operator they call ==>
You can find chapter 10 online. This ebook doesn't have page numbers, but you can find the example I'm looking at in the second called "A more interesting parser", about 40% of the way down:
http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-...
The authors have defined their parser by chaining together functions with ==>. The first function is "getState". What confuses me is: they use getState to "get the state out of the Parser," but a Parser is by definition a function that takes the parse state as its lone argument. I don't understand why they can't drop getState entirely. Maybe it's a simply a way to avoid wrapping the entire function in Parser (...)?
Some of this stuff looks "inefficient" to me, but I realize that in a lazy language with an optimizing compiler you can often write long chains of functions (many of which discard their results) and not impede efficiency.
Thanks, Mike
_______________________________________________ Beginners mailing list Beginners@haskell.org http://www.haskell.org/mailman/listinfo/beginners
This is a matter of good coding style (in the sense of modularity, maintanability, etc.). As you've seen it, the authors have first started with a (>>?) function that allows them to combine Maybes, that is to combine functions with a possibility of error. We therefore had the following functions and types:
data Maybe a -- The type of parsing results (>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b -- To combine parsing functions
Nothing :: Maybe a -- To signal an error Just :: a -> Maybe a -- To return a result without error
Code written using this (i.e. parseP5_take2) makes explicit use of the representation of parsing results as Maybe values. When they added implicit state and error messages to the machinery, they had to modify their parsing code by renaming things around. More importantly, the changes affected all their code, even the parts that wouldn't make use of the new functionality. If they had already written a huge body of code that could work perfectly well without reporting errors or accessing state, they'd still have needed to refactor it. Now assume they had, instead, treated the type of parsing results as abstract from the beginning:
newtype Parse a = Parse (Maybe a) -- The type of parsing results (===>) :: Parse a -> (a -> Parse b) -> Parse b p ===> f = case p of Parse Nothing -> Parse Nothing Parse (Just a) -> f a
failure :: Parse a failure = Parse Nothing
result :: a -> Parse a result a = Parse (Just a)
Writing their code in terms of Parse, (===>), failure and result rather than Maybe, (>>?), Nothing and Just wouldn't have been any more difficult. But they could then have added implicit state and error messages by simpliy replacing the definitions:
newtype Parse a = Parse (ParseState -> Either String (a, ParseState)) -- The type of parsing results (===>) :: Parse a -> (a -> Parse b) -> Parse b (===>) = ...
failure = bail "generic error" result = identity
-- And the new primitives: bail :: String -> Parse a getState :: Parse ParseState putState :: ParseState -> Parse ()
Doing so, code that didn't make use of the new functionality wouldn't have needed to be modified _at all_. Code that made use of it would simply have called the new functions bail, getState and putState at appropriate points. Given the magnitude of the change in functionality, this is actually pretty incredible. Now, and to finally answer your question: Besides state and errors, there are a number of other useful facilities that can fit in the framework of a (Parse a) type and a (===>) function as above. Real World Haskell will introduce them in a systematic way starting with Chapter 14 (Monads). As long as our parsing code treats the Parse type as abstract, adding another one of these facilities (like, say, backtracking) will be as straightforward as adding state and error messages was in my example above, and won't require any change in code that doesn't make use of it. If we started explicitly using the fact that (Parse a) is a function on ParseState, things would be different. It is, generally, a matter of good style in all programming paradigms to treat values as abstract as often as possible and only access them through a well-defined interface that isn't dependent on their actual representations. This is even more true in cases like this, when the interface in question conforms to a well-known pattern (in this particular case, it's a Monad) that can accomodate a large number of different capabilities and uses.