uu-parsinglib - Greedy Parser - Haskell-Cafe

13 Jan 2015

In uu-parsinglib the choice combinator (<|>) is symmetric and does not commit to either alternative.
If the grammar being parsed is ambiguous, the corresponding parser will fail at run time on any input that has more than one parse.

Consider, for instance, this HTML parser:

pTag = pOpenTag <|> pCloseTag <|> pCommentTag <|> pContent

pContent = Content <$> some (satisfy (/= '<'))
pHtml = some pTag

This parser will fail on "<a>123</a>", because the input may be interpreted as any of:
[ [Open a, Content "123", Close a],
  [Open a, Content "12", Content "3", Close a],
  [Open a, Content "1", Content "2", Content "3", Close a] ... ]
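
The ambiguity can be reproduced without uu-parsinglib. The following is only an illustrative sketch using ReadP from base, whose many1 is also non-greedy and returns every parse; it is not the uu-parsinglib API, and the name allChunkings is made up for this example.

```haskell
import Text.ParserCombinators.ReadP (ReadP, eof, many1, readP_to_S, satisfy)

-- One run of non-'<' characters. ReadP's many1 is non-greedy,
-- so this parser may also stop before the run ends.
content :: ReadP String
content = many1 (satisfy (/= '<'))

-- Every way of splitting the whole input into consecutive content chunks.
-- (allChunkings is a hypothetical helper, named for this sketch only.)
allChunkings :: String -> [[String]]
allChunkings input =
  [chunks | (chunks, "") <- readP_to_S (many1 content <* eof) input]
```

Evaluating allChunkings "123" in GHCi yields several results rather than one, which is the same ambiguity that makes the uu-parsinglib parser above fail.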

In other parsing libraries such as parsec, the choice operator <|> is left-biased: it commits to the first alternative that makes
any progress. As a result, some is greedy there, and Content "123" would be parsed.
The operator <<|> in uu-parsinglib has the same greedy behaviour.

I would like to disambiguate the grammar so that the first result ([Open a, Content "123", Close a]) is selected (longest-match rule).
I suppose this can be done by using <<|> to define a greedySome for use in pContent. However, I am wondering whether
there is another way to do this, without using the <<|> operator.
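
To make the greedySome idea concrete, here is a rough sketch of its shape. It again uses ReadP from base as a stand-in, with ReadP's left-biased choice (<++) playing the role that <<|> would play in uu-parsinglib; greedyMany and chunkings are names made up for this example, and the uu-parsinglib version is assumed (not verified here) to look the same with <<|> substituted.

```haskell
import Text.ParserCombinators.ReadP (ReadP, eof, readP_to_S, satisfy, (<++))

-- One or more occurrences of p, taking as many as possible.
greedySome :: ReadP a -> ReadP [a]
greedySome p = (:) <$> p <*> greedyMany p

-- Zero or more occurrences of p. The biased choice only falls back to
-- the empty list when another p cannot be parsed.
greedyMany :: ReadP a -> ReadP [a]
greedyMany p = greedySome p <++ pure []

-- The longest run of non-'<' characters.
content :: ReadP String
content = greedySome (satisfy (/= '<'))

-- All complete parses of the input as a sequence of content chunks.
chunkings :: String -> [[String]]
chunkings input =
  [cs | (cs, "") <- readP_to_S (greedySome content <* eof) input]
```

With the biased choice, chunkings "123" has exactly one result, [["123"]], which corresponds to the longest-match reading.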

Any help is appreciated.

All the best,
Marco
