Consuming Rule Based Parsing

Christopher Howard

16 Nov 2012 16 Nov '12

8:55 a.m.

I haven't learned anything substantive yet regarding Haskell approaches to parsing, so I would appreciate any pointers towards appropriate modules, abstractions, tutorials, or such like. Specifically: For a particular application, I want to play around with a rules (grammer) based parsing which consumes an incoming data stream. That is, I'm not parsing a single type of document with a set structure. Rather, the idea is that I have a list of symbols (characters, numbers, whatever) and pattern match the front part of the list according to some (grammer) rule (which itself could be a conjunction or disjunction of rules) in a set of rules. If the match succeeds, then I consume that part of the list, and then analyze the remaining list, starting again with the first rule in my rule set. If the match fails, then I try the next rule in my rule set. (Sort of like Prolog's context free grammars and difference lists.) Backtracking would also be cool (to see what other results I can get). Is there a particular Haskell abstraction or system suitable for what I have described? -- frigidcode.com

Attachments:

signature.asc (application/pgp-signature — 551 bytes)

Show replies by date

Karl Voelker

16 Nov 16 Nov

9:55 a.m.

On Fri, Nov 16, 2012 at 12:55 AM, Christopher Howard < christopher.howard@frigidcode.com> wrote:

...

That is, I'm not parsing a single type of document with a set structure.

I think you are. See below.

...

Rather, the idea is that I have a list of symbols (characters, numbers, whatever) and pattern match the front part of the list according to some (grammer) rule (which itself could be a conjunction or disjunction of rules) in a set of rules. If the match succeeds, then I consume that part of the list, and then analyze the remaining list, starting again with the first rule in my rule set. If the match fails, then I try the next rule in my rule set.

If the things which can appear on the stream have grammars A, B, and C, then you can describe the grammar of the stream as a whole by: S -> (A | B | C)* So you are still just parsing an S document.

...

Backtracking would also be cool (to see what other results I can get).

Unregulated backtracking tends to produce slow parsers. And for a reasonably thought-out grammar, backtracking is not usually necessary. But it can be done.

...

Is there a particular Haskell abstraction or system suitable for what I have described?

There is an approach called "parser combinators". The basic idea is this: type Parser a = String -> [(a, String)] In other words, a parser of values of type "a" is a function which takes a string and returns all possible parses of some prefix of that string as an "a"-value, each paired with whatever input was left after parsing. You can then start writing functions to combine parsers. Thus, you end up with "parser combinators". There is a library called Parsec which is the industrial-strength incarnation of the parser combinator concept. But I would recommend that you start by getting familiar with a home-grown parser combinator library of your own. -Karl

Christopher Howard

17 Nov 17 Nov

1:13 a.m.

On 11/16/2012 12:55 AM, Karl Voelker wrote:

...

There is an approach called "parser combinators". The basic idea is this:

type Parser a = String -> [(a, String)]

In other words, a parser of values of type "a" is a function which takes a string and returns all possible parses of some prefix of that string as an "a"-value, each paired with whatever input was left after parsing.

You can then start writing functions to combine parsers. Thus, you end up with "parser combinators".

There is a library called Parsec which is the industrial-strength incarnation of the parser combinator concept. But I would recommend that you start by getting familiar with a home-grown parser combinator library of your own.

-Karl

Thank you for your help. Can you elaborate a little more on your explanation? So, would "a" be some data type representing the atoms of the grammar? Or some kind of recursive data type that could represent any kind of valid structure? Or...? I'll wait until I'm clear on that point before asking about how I would combine parsers. (Presumably, two combined parsers would both have to have the same "a" type.) -- frigidcode.com

Karl Voelker

3:04 a.m.

On Fri, Nov 16, 2012 at 5:13 PM, Christopher Howard < christopher.howard@frigidcode.com> wrote:

...

Thank you for your help. Can you elaborate a little more on your explanation? So, would "a" be some data type representing the atoms of the grammar? Or some kind of recursive data type that could represent any kind of valid structure? Or...?

The type parameter "a" is the output of that particular parser. So, a parser for an integer might have type String -> Integer, while a parser for some complicated data type Foo would have type String -> Foo. I'll wait until I'm clear on that point before asking about how I would

...

combine parsers. (Presumably, two combined parsers would both have to have the same "a" type.)

Since the parsers in this scheme are just functions, there are endless ways they could be combined, and the input and output types may or may not match. Some combinators you will probably want: andThen :: Parser a -> (a -> Parser b) -> Parser b orElse :: Parser a -> Parser a -> Parser a And you might also want: succeedWith :: a -> Parser a -Karl

Christopher Howard

3:53 a.m.

On 11/16/2012 06:04 PM, Karl Voelker wrote:

...

On Fri, Nov 16, 2012 at 5:13 PM, Christopher Howard mailto:christopher.howard@frigidcode.com> wrote:

Thank you for your help. Can you elaborate a little more on your explanation? So, would "a" be some data type representing the atoms of the grammar? Or some kind of recursive data type that could represent any kind of valid structure? Or...?

The type parameter "a" is the output of that particular parser. So, a parser for an integer might have type String -> Integer, while a parser for some complicated data type Foo would have type String -> Foo.

I'll wait until I'm clear on that point before asking about how I would combine parsers. (Presumably, two combined parsers would both have to have the same "a" type.)

Since the parsers in this scheme are just functions, there are endless ways they could be combined, and the input and output types may or may not match.

Some combinators you will probably want:

andThen :: Parser a -> (a -> Parser b) -> Parser b orElse :: Parser a -> Parser a -> Parser a

And you might also want:

succeedWith :: a -> Parser a

-Karl

Maybe I'm thinking about this all wrong... But it isn't quite clear to me how I make a generic function or operator that combines two Parsers of one (or two) types to make another Parser of a separate type. Say, for instance, I have a grammar which consists of atomic Nouns, atomic Verbs, and Sentences which are a combination of the two. So naturally I'd expect to have something like: data Noun = Noun String data Verb = Verb String data Sentence = Sentence Noun Verb nounParser :: Parser Noun nounParser = ... verbParser :: Parser Verb verbParser = ... sentenceParser :: Parser Sentence sentenceParser = nounParser <+> verbParser (<+>) :: ? (<+>) f g = ? -- frigidcode.com

Karl Voelker

4:03 a.m.

On Fri, Nov 16, 2012 at 7:53 PM, Christopher Howard < christopher.howard@frigidcode.com> wrote:

...

data Noun = Noun String data Verb = Verb String data Sentence = Sentence Noun Verb

nounParser :: Parser Noun nounParser = ...

verbParser :: Parser Verb verbParser = ...

This is a good example to start with.

...

sentenceParser :: Parser Sentence sentenceParser = nounParser <+> verbParser

(<+>) :: ? (<+>) f g = ?

Based on the types of nounParser, verbParser, and sentenceParser, we can infer that: (<+>) :: Parser Noun -> Parser Verb -> Parser Sentence But I suspect you were hoping that <+> would be more general-purpose. The combinator you want is the one I previously called "andThen": andThen :: Parser a -> (a -> Parser b) -> Parser b Which you could use like this: sentenceParser = nounParser `andThen` (\noun -> verbParser `andThen` (\verb -> succeedWith (Sentence noun verb))) Notice that the difference between your <+> and my andThen is that andThen requires the caller to provide the function that combines the two inputs into one output. This is what keeps it general. Now for a slight digression: If this looks frustratingly verbose, that's because it is. But if you make Parser into a monad (where return is succeedWith and >>= is andThen), you can use the syntactic sugar that is "do notation": sentenceParser = do noun <- nounParser verb <- verbParser return (Sentence noun verb) -Karl

4613

Age (days ago)

4614

Last active (days ago)

List overview

Download

5 comments

2 participants

participants (2)

Christopher Howard
Karl Voelker