question, chapter 10 Real World Haskell

I'm looking at the parser example, page 242 in Chapter 10 of Real World Haskell, where they define a type of monadic parser with the help of an operator they call ==>. You can find Chapter 10 online. This ebook doesn't have page numbers, but you can find the example I'm looking at in the section called "A more interesting parser", about 40% of the way down: http://book.realworldhaskell.org/read/code-case-study-parsing-a-binary-data-...

The authors have defined their parser by chaining together functions with ==>. The first function is "getState". What confuses me is: they use getState to "get the state out of the Parser," but a Parser is by definition a function that takes the parse state as its lone argument. I don't understand why they can't drop getState entirely. Maybe it's simply a way to avoid wrapping the entire function in Parse (...)?

Some of this stuff looks "inefficient" to me, but I realize that in a lazy language with an optimizing compiler you can often write long chains of functions (many of which discard their results) without impeding efficiency.

Thanks, Mike

On Sat, 11 Apr 2009 02:02:04 -0700, Michael Mossey wrote:
I'm looking at the parser example, page 242 in Chapter 10 of Real World Haskell, where they define a type of monadic parser with the help of an operator they call ==> ...
This is a matter of good coding style (in the sense of modularity, maintainability, etc.). As you've seen, the authors first started with a (>>?) function that allows them to combine Maybes, that is, to combine functions with a possibility of error. We therefore had the following functions and types:
data Maybe a                                    -- The type of parsing results
(>>?) :: Maybe a -> (a -> Maybe b) -> Maybe b   -- To combine parsing functions

Nothing :: Maybe a       -- To signal an error
Just    :: a -> Maybe a  -- To return a result without error
Code written using this (i.e. parseP5_take2) makes explicit use of the representation of parsing results as Maybe values. When they added implicit state and error messages to the machinery, they had to modify their parsing code by renaming things around. More importantly, the changes affected all their code, even the parts that wouldn't make use of the new functionality. If they had already written a huge body of code that could work perfectly well without reporting errors or accessing state, they'd still have needed to refactor it. Now assume they had, instead, treated the type of parsing results as abstract from the beginning:
newtype Parse a = Parse (Maybe a)   -- The type of parsing results

(===>) :: Parse a -> (a -> Parse b) -> Parse b
p ===> f = case p of
    Parse Nothing  -> Parse Nothing
    Parse (Just a) -> f a
failure :: Parse a
failure = Parse Nothing

result :: a -> Parse a
result a = Parse (Just a)
Writing their code in terms of Parse, (===>), failure and result rather than Maybe, (>>?), Nothing and Just wouldn't have been any more difficult. But they could then have added implicit state and error messages by simply replacing the definitions:
-- The type of parsing results:
newtype Parse a = Parse (ParseState -> Either String (a, ParseState))

(===>) :: Parse a -> (a -> Parse b) -> Parse b
(===>) = ...

failure = bail "generic error"
result  = identity

-- And the new primitives:
bail     :: String -> Parse a
getState :: Parse ParseState
putState :: ParseState -> Parse ()
Doing so, code that didn't make use of the new functionality wouldn't have needed to be modified _at all_. Code that did make use of it would simply have called the new functions bail, getState and putState at appropriate points. Given the magnitude of the change in functionality, this is actually pretty incredible.

Now, to finally answer your question: besides state and errors, there are a number of other useful facilities that can fit in the framework of a (Parse a) type and a (===>) function as above. Real World Haskell introduces them in a systematic way starting with Chapter 14 (Monads). As long as our parsing code treats the Parse type as abstract, adding another one of these facilities (like, say, backtracking) will be as straightforward as adding state and error messages was in my example above, and won't require any change in code that doesn't use it. If we started explicitly relying on the fact that (Parse a) is a function on ParseState, things would be different.

It is, generally, a matter of good style in all programming paradigms to treat values as abstractly as possible and to access them only through a well-defined interface that doesn't depend on their actual representation. This is even more true in cases like this one, where the interface conforms to a well-known pattern (here, a Monad) that can accommodate a large number of different capabilities and uses.
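For concreteness, here is a minimal sketch of what (===>) and the new primitives could look like over the state-and-errors representation. The bodies are my own filling-in rather than the chapter's exact code (the book's (==>) has essentially this shape, and its bail also prepends the current offset to the message), and ParseState is assumed to be the record with string and offset fields from the chapter, here written with a runParse record field as the book does:

import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- The parse state, as in the chapter: remaining input plus current offset.
data ParseState = ParseState { string :: L.ByteString, offset :: Int64 }

newtype Parse a = Parse { runParse :: ParseState -> Either String (a, ParseState) }

(===>) :: Parse a -> (a -> Parse b) -> Parse b
p ===> f = Parse $ \st ->
    case runParse p st of
        Left err       -> Left err             -- an error stops the whole chain
        Right (a, st') -> runParse (f a) st'   -- thread the updated state into f

bail :: String -> Parse a
bail err = Parse $ \_ -> Left err              -- fail, discarding the state

identity :: a -> Parse a
identity a = Parse $ \st -> Right (a, st)      -- return a result, state unchanged

getState :: Parse ParseState
getState = Parse $ \st -> Right (st, st)       -- hand the current state to the caller

putState :: ParseState -> Parse ()
putState st = Parse $ \_ -> Right ((), st)     -- replace the state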

Quentin Moser wrote:
On Sat, 11 Apr 2009 02:02:04 -0700, Michael Mossey wrote:
This is a matter of good coding style (in the sense of modularity, maintainability, etc.).
As you've seen, the authors first started with a (>>?) function that allows them to combine Maybes, that is, to combine functions with a possibility of error. ...
Thanks for the detailed explanation. I like how you show that the original >>? function, and the method of signaling either an error or a result, could have been written with ==> notation.

But I'm still confused on a point. Let me put it this way. The authors wrote:

-- file: ch10/Parse.hs
-- import the Word8 type from Data.Word
parseByte :: Parse Word8
parseByte =
    getState ==> \initState ->
    case L.uncons (string initState) of
      Nothing ->
          bail "no more input"
      Just (byte, remainder) ->
          putState newState ==> \_ ->
          identity byte
        where newState  = initState { string = remainder, offset = newOffset }
              newOffset = offset initState + 1

Why couldn't they have avoided getState by writing

parseByte = Parse (\initState ->
    case L.uncons (string initState) of
      ...
      ... newOffset = offset initState + 1)

...because a parser is by definition a function that takes ParseState as its input. I understand you are saying that in future chapters they may introduce new capabilities that fit within this framework, so maybe that's what I'm not seeing. Maybe they will redefine getState (or something equivalent) so that it does more than grab the unchanged state from the Parse. But I'm curious to know if my second implementation works the same as the first, in theory.

Thanks

On Saturday, 11 April 2009 16:41:50, Michael Mossey wrote:
Quentin Moser wrote:
On Sat, 11 Apr 2009 02:02:04 -0700, Michael Mossey wrote:
This is a matter of good coding style (in the sense of modularity, maintainability, etc.).
As you've seen, the authors first started with a (>>?) function that allows them to combine Maybes, that is, to combine functions with a possibility of error.
...
Thanks for the detailed explanation. I like how you show that the original >>? function, and the method of signaling either an error or a result, could have been written with ==> notation.
But, I'm still confused on a point. Let me put it this way. The authors wrote:
-- file: ch10/Parse.hs
-- import the Word8 type from Data.Word
parseByte :: Parse Word8
parseByte =
    getState ==> \initState ->
    case L.uncons (string initState) of
      Nothing ->
          bail "no more input"
      Just (byte, remainder) ->
          putState newState ==> \_ ->
          identity byte
        where newState  = initState { string = remainder, offset = newOffset }
              newOffset = offset initState + 1
Why couldn't they have avoided getState by writing
parseByte = Parse (\initState ->
    case L.uncons (string initState) of
      ...
      ... newOffset = offset initState + 1)
...because a parser is by definition a function that takes ParseState as its input. I understand you are saying that in future chapters they may introduce new capabilities that fit within this framework, so maybe that's what I'm not seeing. Maybe they will redefine getState (or something equivalent) so that it does more than grab the unchanged state from the Parse.
In a real library, the constructor Parse would not be exported, to allow later changing the implementation of the Parse type without breaking user code. So user code like parseByte cannot access the constructor and must use the exported API (getState, putState, identity, bail, ...).
But I'm curious to know if my second implementation works the same as the first, in theory.
As long as the constructor is accessible, yes.
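Spelled out, the direct version would look roughly like the sketch below. It assumes the chapter's representation, i.e. that the Parse constructor wraps a function of type ParseState -> Either String (a, ParseState), and that the string and offset accessors are in scope; the bodies are illustrative, not the book's code:

import Data.Word (Word8)
import qualified Data.ByteString.Lazy as L

parseByte :: Parse Word8
parseByte = Parse $ \initState ->
    case L.uncons (string initState) of
        Nothing                -> Left "no more input"   -- what bail does, minus the offset info
        Just (byte, remainder) ->
            let newState = initState { string = remainder
                                     , offset = offset initState + 1 }
            in Right (byte, newState)                    -- what putState followed by identity amounts to

It behaves the same as the getState/putState version; it just bakes in the knowledge of how Parse is represented.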
Thanks

Daniel Fischer wrote:
In a real library, the constructor Parse would not be exported, to allow later changing the implementation of the Parse type without breaking user code. So user code like parseByte cannot access the constructor and must use the exported API (getState, putState, identity, bail, ...).
Thanks for the point, I get it.

One other question. Later in the chapter they define peekByte as follows:

-- file: ch10/Parse.hs
peekByte :: Parse (Maybe Word8)
peekByte = (fmap fst . L.uncons . string) <$> getState

Here they are accessing the 'string' field of the state, so whoever writes this function needs the accessor functions. At this point I'm wondering how much state is really getting hidden. Or maybe peekByte would only be written inside the original library.

I was just playing around and discovered you can choose whether or not to export the accessor functions. Let's say we have

module Mod where

import Data.Int (Int64)

data What = What { acc :: Int64 }

Now in a separate module we write:

-- All fine:
import Mod

x1 = What 3
x2 = What { acc = 3 }
x3 = acc x1

Now we revisit the first module and write

module Mod ( What(What) ) where

Now,

import Mod

x1 = What 3            -- Fine
x2 = What { acc = 3 }  -- Error
x3 = acc x1            -- Error

So apparently this exports the constructor but not the accessor fields: the value can be constructed in the normal way, but not with record syntax. Finally,

module Mod ( What(acc) ) where
-- exports the accessor, which can be used to access but not to construct

module Mod ( What(..) ) where
-- exports everything

So the lesson is: if we are going to allow users to write functions like peekByte, we have to export the accessor functions, but not necessarily any constructors.

On Sat, 11 Apr 2009 09:37:53 -0700, Michael Mossey wrote:
One other question. Later in the chapter they define peekByte as follows:
-- file: ch10/Parse.hs
peekByte :: Parse (Maybe Word8)
peekByte = (fmap fst . L.uncons . string) <$> getState

Here they are accessing the 'string' field of the state, so whoever writes this function needs the accessor functions. At this point I'm wondering how much state is really getting hidden. Or maybe peekByte would only be written inside the original library.
Even before worrying about the accessor functions, a parsing library would in the first place not even export getState, putState, or the ParseState type. It would instead provide functions like parseByte and peekByte as primitives from which all complex parsers will be built.
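As a concrete sketch of what such a module header might look like (the module name and the exact export list are my assumptions, not the book's code; the definitions themselves are omitted):

module Parse
    ( Parse        -- exported abstractly: no constructor, no runParse
    , parse        -- run a parser over an input ByteString
    , parseByte    -- primitive parsers, written inside the library
    , peekByte
    , bail
    , (==>)
    , identity
    -- ParseState, getState and putState are deliberately not exported
    ) where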

On Saturday, 11 April 2009 19:04:22, Quentin Moser wrote:
On Sat, 11 Apr 2009 09:37:53 -0700, Michael Mossey wrote:
One other question. Later in the chapter they define peekByte as follows:
-- file: ch10/Parse.hs
peekByte :: Parse (Maybe Word8)
peekByte = (fmap fst . L.uncons . string) <$> getState

Here they are accessing the 'string' field of the state, so whoever writes this function needs the accessor functions. At this point I'm wondering how much state is really getting hidden. Or maybe peekByte would only be written inside the original library.
The point is not so much hiding the state, but hiding the details of the Parse type.

newtype Parse a = Parse { runParse :: ParseState -> Either String (a, ParseState) }

is, except for the names, StateT ParseState (Either String) a. What peekByte assumes is that there is, implicitly at least, an instance MonadState ParseState Parse. peekByte won't break if the implementation of Parse is changed, as long as that is maintained. If you look at the Haddock docs for Parsec, you'll see that there the State type is exposed while GenParser is only exported as an abstract type, allowing the details to be changed if deemed advantageous.
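To make that implicit instance concrete, it would be something along the lines of the sketch below. This assumes the MonadState class from mtl and a Monad instance for Parse (which the book only introduces later); it is not code from the chapter:

{-# LANGUAGE MultiParamTypeClasses #-}
import Control.Monad.State.Class (MonadState(..))

-- getState and putState are the chapter's primitives; the instance just
-- exposes them under the standard get/put names.
instance MonadState ParseState Parse where
    get = getState
    put = putState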
Even before worrying about the accessor functions, a parsing library would in the first place not even export getState, putState, or the ParseState type. It would instead provide functions like parseByte and peekByte as primitives from which all complex parsers will be built.
Parsec does: getParserState, setParserState.