
Ben Franksen wrote:
Next thing I'll try is to transform such a grammar into an actual parser...
Which I also managed to get working. However, this exposed yet another problem I am not sure how to solve. The problem manifests itself with non-left-factored rules like Number ::= Digit Number | Digit Translating such a grammar directly into a Parsec parser leads to parse errors because Parsec's choice operator is predictive: if a production has consumed any input the whole choice fails, even if alternative productions would not: *Main> P.parseTest (parseGrammar myGrm) "2" parse error at (line 1, column 2): unexpected end of input expecting Number Of course, one solution is to apply Parsec's try combinator to all choices in a rule. But this rather defeats the purpose of using a (by default) predictive parser in the first place which is to avoid unnecessary backtracking. So, a better solution is to left-factor the grammar before translating to Parsec. Since we have a data representation of the grammar that we can readily analyse and transform, this should be possible given some suitable algorithm. But how is this transformation to be typed? My first naive attempt was to define (recap: nt :: * -> * is the type of nonterminals, t :: * is the type of terminals a.k.a. tokens, and a is the result type):
leftFactor :: Grammar nt t a -> Grammar nt t a
Of course, this is wrong: Left-factoring is expected to introduce new nonterminals, so on the right-hand side of the type we should not have the same 'nt' as on the left. Instead we shoudl have some other type that is "'nt' extended with new constructors". Moreover, we cannot statically know how many new nonterminals are added, so we cannot simply apply a type function to nt. Is this solvable at all in Haskell or do I need proper dependent types to express this? I have very vague ideas that revolve around setting up some recursive type function that on each level adds one constructor, define a common interface with a (multiparam) type class and then use existential quantification in the result type to hide the resulting type of nonterminals. The next question is: Even if this turns out to be possible, isn't it overkill? Maybe it is better to use an infinite type for the nonterminals in the first place and let the grammar be a partial function? OTOH, the formulation of the grammar as a function that pattern matches on the nonterminals is very elegant. Cheers Ben