happy + alex parsing question

Hi, using alex+happy, how could I parse lines like these?

    "mr <username> says <message>\n"

where both <username> and <message> may contain arbitrary characters (except eol)? If I make lexer tokens

    "mr "    { T_Mr }
    " says " { T_Says }
    \r?\n    { T_Eol }
    .        { T_Char $$ }

and parser

    'mr '    { T_Mr }
    ' says ' { T_Says }
    eol      { T_Eol }
    char     { T_Char }
    ...
    line :: { (String, String) }
      : 'mr ' string ' says ' string eol { ($2, $4) }
    string :: { String }
      : char { [ $1 ] }
      | char string { $1 : $2 }

then I get an error when <username> or <message> contains a "mr " substring, because the parser encounters a T_Mr token. A workaround is to mention all the small tokens in my <string> definition:

    string :: { String }
      : { [] }
      | 'mr ' string { "mr " ++ $2 }
      | ' says ' string { " says " ++ $2 }
      | char string { $1 : $2 }

but that is weird and I'm sure there is a better way. Thanks in advance, Roman.

On Wed, Feb 16, 2011 at 5:31 PM, Roman Dzvinkovsky
Hi,
using alex+happy, how could I parse lines like these?
"mr <username> says <message>\n"
I don't have an implementation right now, but you could try using lexer states or user data to record whether you have already parsed the 'mr ' part (etc.). I guess you could use Alex's monadUserState wrapper (just as I found after a night without sleep [1]; solved now).
-- Mihai
[1]: http://www.haskell.org/pipermail/haskell-cafe/2011-February/089330.html

Thanks,
adding state to lexer seems to be the way to go.
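
To make the lexer-state idea concrete, here is a minimal sketch in plain Haskell rather than an Alex specification. The `LexState` type and the `tokenize` function are hypothetical names, not taken from the thread; the state plays the role that start codes would play in Alex. "mr " is treated as a keyword only at the start of a line, and " says " only while reading the username, so neither can appear as a stray token inside the message. It assumes the first " says " ends the username, so a username that itself contains " says " is still ambiguous.

```haskell
import Data.List (stripPrefix)

-- Token type mirroring the constructors named in the question.
data Token = T_Mr | T_Says | T_Eol | T_Char Char
  deriving (Eq, Show)

-- Hypothetical lexer states, standing in for Alex start codes.
data LexState
  = AtLineStart  -- "mr " is a keyword only here
  | InUsername   -- " says " is a keyword only here
  | InMessage    -- everything up to end of line is plain text
  deriving (Eq, Show)

tokenize :: String -> [Token]
tokenize = go AtLineStart
  where
    go _ [] = []
    -- End of line (with optional carriage return) resets the state.
    go _ ('\r' : '\n' : rest) = T_Eol : go AtLineStart rest
    go _ ('\n' : rest) = T_Eol : go AtLineStart rest
    go AtLineStart s
      | Just rest <- stripPrefix "mr " s = T_Mr : go InUsername rest
    go InUsername s
      | Just rest <- stripPrefix " says " s = T_Says : go InMessage rest
    -- A line that does not start with "mr " is malformed; emit plain
    -- characters so that the parser rejects it.
    go AtLineStart (c : rest) = T_Char c : go InMessage rest
    go st (c : rest) = T_Char c : go st rest
```

With this token stream the original `string` rule (only `char` alternatives) should be enough, because `T_Mr` and `T_Says` are produced at most once per line.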

On 16 February 2011 15:31, Roman Dzvinkovsky
using alex+happy, how could I parse lines like these?
"mr <username> says <message>\n"
Alex has both user states and powerful regex and character set operators (complement and set difference). That said, LR parsing plus Alex lexing doesn't look like a satisfactory match for this input format. I'd either go with regexps or write a hand-coded lexer and do all the work in the lexer, as the result just needs to be a list of pairs, [(String,String)]. If you are using this input format as a test case for learning how to use Happy+Alex, it isn't a good starting point. You'd be better off choosing something with more structure and less problematic tokens, such as an expression parser.
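
As a rough illustration of the hand-coded route, the sketch below does all the work with list functions from base and returns the [(String,String)] result directly. The names `splitOnFirst`, `parseLine` and `parseLog` are made up for this example. It assumes the username ends at the first " says ", and that any malformed line makes the whole parse fail.

```haskell
import Data.List (isSuffixOf, stripPrefix)

-- Split a string at the first occurrence of a separator, returning
-- the text before it and the text after it.
splitOnFirst :: String -> String -> Maybe (String, String)
splitOnFirst sep = go []
  where
    go acc s
      | Just rest <- stripPrefix sep s = Just (reverse acc, rest)
    go _ [] = Nothing
    go acc (c : cs) = go (c : acc) cs

-- Parse one line (without its line terminator) into (username, message).
parseLine :: String -> Maybe (String, String)
parseLine l = do
  body <- stripPrefix "mr " l
  splitOnFirst " says " body

-- Parse a whole input; Nothing if any line is malformed.
parseLog :: String -> Maybe [(String, String)]
parseLog = traverse (parseLine . dropCR) . lines
  where
    -- Tolerate "\r\n" line endings by dropping a trailing '\r'.
    dropCR s = if "\r" `isSuffixOf` s then init s else s
```

For example, `parseLine "mr mr x says hello says hi"` gives `Just ("mr x","hello says hi")`: the "mr " inside the username and the second " says " inside the message are both treated as ordinary text.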
participants (3)
- Mihai Maruseac
- Roman Dzvinkovsky
- Stephen Tetley