
Thanks,
Adding state to the lexer seems to be the way to go.
2011/2/16 Mihai Maruseac
On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky wrote:
Hi,
Using alex+happy, how could I parse lines like these?
"mr <username> says <message>\n"
where both <username> and <message> may contain arbitrary characters (except eol)?
If I make the lexer tokens

    -- actions written for Alex's "basic" wrapper (String -> Token)
    "mr "     { \_ -> T_Mr }
    " says "  { \_ -> T_Says }
    \r?\n     { \_ -> T_Eol }
    .         { \s -> T_Char (head s) }
and the parser tokens

    %token
      'mr '     { T_Mr }
      ' says '  { T_Says }
      eol       { T_Eol }
      char      { T_Char $$ }
    ...
    line :: { (String, String) }
      : 'mr ' string ' says ' string eol   { ($2, $4) }

    string :: { String }
      : char          { [$1] }
      | char string   { $1 : $2 }
then I get a parse error when <username> or <message> contains the substring "mr ", because the parser encounters a T_Mr token where it expects a char.
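To make the failure concrete, here is a small plain-Haskell sketch (no Alex or Happy involved; `naiveTokens`, `parseLine` and `chars` are hypothetical names) that mimics the lexer rules and the `line`/`string` grammar above. Because the tokenizer keeps no state, "mr " inside the message still comes out as T_Mr, and the parser rejects the line.

```haskell
import Data.List (isPrefixOf)

-- Token type mirroring the constructors used in the lexer rules.
data Token = T_Mr | T_Says | T_Eol | T_Char Char
  deriving (Eq, Show)

-- Stateless tokenizer. Checking the keywords before the single-character
-- case gives the same tokens as Alex's longest-match rule for these rules.
naiveTokens :: String -> [Token]
naiveTokens [] = []
naiveTokens s
  | "mr "    `isPrefixOf` s = T_Mr   : naiveTokens (drop 3 s)
  | " says " `isPrefixOf` s = T_Says : naiveTokens (drop 6 s)
  | "\r\n"   `isPrefixOf` s = T_Eol  : naiveTokens (drop 2 s)
  | "\n"     `isPrefixOf` s = T_Eol  : naiveTokens (drop 1 s)
naiveTokens (c:cs) = T_Char c : naiveTokens cs

-- Hand-written equivalent of the 'line' rule.
parseLine :: [Token] -> Maybe (String, String)
parseLine (T_Mr : rest) = do
  (user, rest1) <- chars rest
  case rest1 of
    T_Says : rest2 -> do
      (msg, rest3) <- chars rest2
      case rest3 of
        [T_Eol] -> Just (user, msg)
        _       -> Nothing          -- e.g. a stray T_Mr inside the message
    _ -> Nothing
parseLine _ = Nothing

-- Equivalent of the 'string' rule: one or more T_Char tokens.
chars :: [Token] -> Maybe (String, [Token])
chars ts = case span isChar ts of
  ([], _)    -> Nothing
  (cs, rest) -> Just ([c | T_Char c <- cs], rest)
  where
    isChar (T_Char _) = True
    isChar _          = False
```

For example, `naiveTokens "mr bob says hi mr x\n"` contains a second T_Mr after the T_Says, so `parseLine` returns Nothing, while `parseLine (naiveTokens "mr bob says hello\n")` gives `Just ("bob","hello")`.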
A workaround is to mention all the small tokens in my <string> definition:

    string :: { String }
      : {- empty -}       { [] }
      | 'mr ' string      { "mr " ++ $2 }
      | ' says ' string   { " says " ++ $2 }
      | char string       { $1 : $2 }
but that is weird and I'm sure there is a better way.
I don't have an implementation right now, but you could try using lexer states or user state to record whether you have already lexed the 'mr ' part (and so on). I guess you could use Alex's monadUserState wrapper (just as I found after a night without sleep [1]; solved now).
-- Mihai
[1]: http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html
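A minimal sketch of the lexer-state approach, again in plain Haskell rather than an Alex spec, assuming hypothetical names (`LexState`, `statefulTokens`). Each state plays the role of an Alex start code: "mr " is a keyword only at the start of a line, " says " only while reading the username, and inside the message everything up to the end of line is a plain character. The emitted tokens fit the original `line`/`string` grammar unchanged. One assumption to note: the username ends at the first " says ".

```haskell
import Data.List (isPrefixOf)

data Token = T_Mr | T_Says | T_Eol | T_Char Char
  deriving (Eq, Show)

-- Hypothetical lexer states, analogous to Alex start codes.
data LexState = LineStart | InUser | InMessage
  deriving (Eq, Show)

statefulTokens :: String -> [Token]
statefulTokens = go LineStart
  where
    go :: LexState -> String -> [Token]
    go _ [] = []
    -- End of line is recognised in every state and resets the state.
    go _ s
      | Just rest <- eol s = T_Eol : go LineStart rest
    -- "mr " is a keyword only at the start of a line.
    go LineStart s
      | "mr " `isPrefixOf` s = T_Mr : go InUser (drop 3 s)
    -- " says " is a keyword only while reading the username.
    go InUser s
      | " says " `isPrefixOf` s = T_Says : go InMessage (drop 6 s)
    -- Everything else is an ordinary character in the current state.
    go st (c:cs) = T_Char c : go st cs

    eol :: String -> Maybe String
    eol ('\r':'\n':rest) = Just rest
    eol ('\n':rest)      = Just rest
    eol _                = Nothing
```

Here `statefulTokens "mr bob says hi mr x\n"` yields T_Mr, the chars of "bob", T_Says, the chars of "hi mr x", then T_Eol. In an actual .x file the same switching would presumably be done with start codes on the rules and `begin` under one of Alex's monad wrappers; the state names above are only illustrative.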