parsec and multiple simultaneous token types

28 Mar 2010

      Hello,

I have some input which is divided up into segments like this:

["foo", "hi", "there", "world"]

And I want to use parsec to parse the segments. I am looking for a way to be
able to use Char parsers on each segment, but also parse the list of
segments as a whole.

I have an implementation which works, but I am not feeling that satisfied
with it. An example parser looks like this:

testp :: GenParser String () (Char, String, String)
testp =
  do segment "foo"
     st <- p2u (char 'h' >> char 'o')
     sg <- anySegment
     sg' <- anySegment
     return (st,sg, sg')

If you use it to parse ["foo", "hi", "there", "world"] you get:

*Main> test
["foo","hi","there","world"] (segment 2 character 2):
unexpected "i"
expecting "o"

This is good -- the error tells you the segment and the offset. The
combinators, segment and anySegment match on String tokens. The p2u function
converts a Char parser into a String parser.

The primary issue is that I can not figure out how to implement p2u so that
it works under Parsec 2 and 3 with out changes.

The secondary issue is that I feel like the DSL is not that great. I would
rather write the above parser like this:

testp :: GenParser String () (Char, String, String)
testp =
  do string "foo"
       nextSegment
       char 'h' >> char 'o'
       nextSegment
       sg <- many anyChar
       nextSegment
       sg' <- many anyChar
       return (st,sg, sg')

so, the combinators like 'many' would only see to the end of the current
segment. nextSegment would bring the next segment into context (or possibly
fail if the current segment was not completely consumed). I could also,
perhaps, reimplement anySegment as:

anySegment =
   do s <- many anyChar
        nextSegment
        return s

and write:

testp :: GenParser String () (Char, String, String)
testp =
  do string "foo"
       nextSegment
       char 'h' >> char 'o'
       nextSegment
       sg <- anySegment
       sg' <- anySegment
       return (st,sg, sg')

I don't see a way to implement this on top of parsec though.

Am I missing some clever ideas here?

I have attached my code that implements the first method. Any ideas how to
make it parsec 2/3 agnostic ? The primary issue is that if I use runParser
on the inner Char parser, I need some way to stick a ParseError back into
the parent context (with out just converting it to a String and calling
fail). I can see how to get/set the inputState, etc, in a portable way, but
I don't see how to set the ParseError in a portable way.The attached code
requires 3.1.0 because it uses mkPT / runParsecT :(

thanks!
- jeremy

Jeremy Shaw

Stephen Tetley

tags

participants (2)