Re: [Haskell-cafe] Designing a Parser

Actually, I haven't sent this question to the list before. So you're in no danger of repeating yourself. Thanks for your kind reply anyway Paul At 07:03 17/02/2008, you wrote:
On Feb 17, 2008 6:20 AM, PR Stanley
wrote: I can't think of an elegant pattern for the last function. I've already tried set comprehension. However, something tells me that the answer may lie in a complex recursive pattern. I'm not afraid of complex solutions but I naturally see them as an easy way out. It's those clear simple patterns that separate men from mice. :-) I'd be grateful for any advice on this and indeed my approach as a whole. If you think I'm on the wrong path from the start feel free to let me know. I'm not asking for the answer. I'd like to work that out by myself although some guidance would be most appreciated.
Yes, I definitely think you're on the wrong path from the start. In high school when I was learning C++ I wrote an arithmetic expression parser in something like this fashion. I scanned the input for the first opening bracket, then I walked forward to the matching closing bracket and extracted the subexpression and evaluated it.
The approach turned out to be both complicated and inefficient. I think the biggest problem was that it didn't scale well with implementation complexity; eg. to add support for prefix functions like sin() was almost impossible. Think about how you would go about doing such things using your approach.
I think I've seen you ask questions relating to this on the list before, and at the risk of repeating others, I suggest the ReadS in the Haskell Prelude. You get the benefit of it being a clean, top-down solution as well as the benefit of having a lot of primitive Haskell types already implemented for you (such as Integers).
I'll describe the approach again, describing the combinators, and then afterward showing how you might use them to accomplish this problem. I'll leave the details up to you, since you can learn a lot by implementing combinator libraries like this.
First, you build a bunch of functions which are "parsers". A "parser" is just a function of type String -> [(a,String)] for some type 'a'; that is, it takes a string and returns a list of "parses", where a parse is the value parsed paired with the remainder of the string. So if you have a function:
parseInt :: String -> [(Int,String)]
Then you can expect this result:
parseInt "123 hello world" = [(123, " hello world")]
Or maybe even:
parseInt "123 hello world" = [(123, " hello world"), (12, "3 hello world"), (1, "23 hello world")]
But that latter behavior is not recommended in this case (i.e. it is advisable force int parsing to be greedy).
Then to combine parsers you can use some combination operations:
sequenceParsers :: (String -> [(a,String)]) -> (a -> String -> [(b, String)]) -> String -> [(b,String)]
That one may be easier to see if you use the built-in type synonym ReadS a = String -> [(a, String)]:
sequenceParsers :: ReadS a -> (a -> ReadS b) -> ReadsB
The second argument here is a function, because we have already parsed the first argument and know its value, so the second argument ought to be able to use it. We can also write:
alternateParsers :: ReadS a -> ReadS a -> ReadS a
Which gives all valid parses that the first one recognizes concatenated with all valid parsers that the second one recognizes.
Implementation of these combinators is left to you. Since the types of these functions are quite general, you can use a type-directed approach (i.e. if your implementation uses all available data and it typechecks, it's probably correct).
Now that you have these, how do you use them to actually parse something? Let's parse simple logical expressions. First we need a data structure to parse into:
data Exp = Variable String | AndExp Exp Exp | OrExp Exp Exp
Let's do it with a top down coding strategy: write a function to parse an expression using whatever helper functions we need but haven't written yet :-)
parseExp :: ReadS Exp parseExp = parseAnyOf [ parseVariable, parseAndExp, parseOrExp ]
Where we haven't written parseAnyOf yet. Write that inductively on the list:
parseAnyOf :: [ReadS a] -> ReadS a parseAnyOf [] = \input -> [] parseAnyOf (p:ps) = alternateParsers p (parseAnyOf ps)
And jump in to the next thing we haven't written, parseVariable:
parseVariable :: ReadS Exp parseVariable = mapParser Variable parseString
mapParser doesn't actually parse anything, it just parses whatever its argument does and applies a function to the result:
mapParser :: (a -> b) -> ReadS a -> ReadS b mapParser :: (a -> b) -> (String -> [(a,String)]) -> String -> [(b,String)]
I rewrote the type signature to help guide your implementation.
That should get you started, and show you how the approach usually goes. You seem to already get the idea of writing many small functions and composing them together. This is the same idea, except the functions are abstracting the problem more.
Luke

Writing a parsing library like this is a great learning experience; Graham Hutton wrote a paper you can follow along with entitled "Monadic Parsing in Haskell" at http://www.cs.nott.ac.uk/~gmh/bib.html#pearl But if you're just interested in writing a parser, and not in writing a parser generator, I recommend Parsec; it's included with GHC as Text.ParserCombinators.Parsec, and there is a tutorial in the second part of this page: http://legacy.cs.uu.nl/daan/download/parsec/parsec.html -- ryan
participants (2)
-
PR Stanley
-
Ryan Ingram