
On 2009-10-07 17:29, Robert Atkey wrote:
A deep embedding of a parsing DSL (really a context-sensitive grammar DSL) would look something like the following. I think I saw something like this in the Agda2 code somewhere, but I stumbled across it when I was trying to work out what "free" applicative functors were.
The Agda code you saw may have been due to Ulf Norell and me. There is a note about it on my web page: http://www.cs.nott.ac.uk/~nad/publications/danielsson-norell-parser-combinat...
Note that these grammars are strictly less powerful than the ones that can be expressed using Parsec because we only have a fixed range of possibilities for each rule, rather than allowing previously parsed input to determine what the parser will accept in the future.
Previously parsed input /can/ determine what the parser will accept in the future (as pointed out by Peter Ljunglöf in his licentiate thesis). Consider the following grammar for the context-sensitive language {aⁿbⁿcⁿ| n ∈ ℕ}: data NT a where Start :: NT () -- Start ∷= aⁿbⁿcⁿ ABC :: Nat -> NT () -- ABC n ∷= aˡbⁿ⁺ˡcⁿ⁺ˡ X :: Char -> Nat -> NT () -- X x n ∷= xⁿ g :: Grammar NT g Start = nt (ABC 0) g (ABC n) = char 'a' <* nt (ABC (succ n)) <|> nt (X 'b' n) <* nt (X 'c' n) g (X c n) | n == 0 = pure () | otherwise = char c <* nt (X c (pred n))
And a general definition for parsing single-digit numbers. This works for any set of non-terminals, so it is a reusable component that works for any grammar:
Things become more complicated if the reusable component is defined using non-terminals which take rules (defined using an arbitrary non-terminal type) as arguments. Exercise: Define a reusable variant of the Kleene star, without using grammars of infinite depth. -- /NAD This message has been checked for viruses but the contents of an attachment may still contain software viruses, which could damage your computer system: you are advised to perform your own checks. Email communications with the University of Nottingham may be monitored as permitted by UK legislation.