Expression parsing problem

I'm writing a paper as a replacement for writing exam and decided to write a simple compiler (got a little experience with it). However, I got trouble in parsing expression. The grammar: expression = "get" | [ "+" | "-" ] term { ( "+" | "-" ) term } term = factor { ( "*" | "/" ) factor } factor = IDENTIFIER | VALUE | "(" expression ")" I can't make term parse, for instance "1 * 2 / 3" (the number is not important, identifier is also accepted). It stops after parsing 2, i.e. only the first multiplication is parsed. Interchanging * and / gives the same result, only differs in operation. Whichever got encountered first will be parsed. The same problem also arises from expression, where it can't parse "1 + 2 - 3". Both problems are identical, but I can't figure out what's wrong (don't count the optional +/- before term in expression, I haven't done it yet). Sorry, but I'm lack of knowledge about Monad. I know it can be done better with it, but I need to learn a lot about it, while I don't have enough time (only 2 weeks). Below are necessary definitions for the parser (some taken from the scanner). For testing purpose, please try: expression [("1",Value),("+",Plus),("2",Value),("-",Minus),("3",Value),("EOF",EOF)] term [("1",Value),("*",Times),("2",Value),("/",Slash),("3",Value),("EOF",EOF)] expression [("1",Value),("-",Minus),("2",Value),("+",Plus),("3",Value),("EOF",EOF)] term [("1",Value),("/",Slash),("2",Value),("*",Times),("3",Value),("EOF",EOF)]
data Token = Identifier | OpenBlock | CloseBlock | SemiColon | Slash | Equals | OpenBrace | CloseBrace | Minus | Times | Plus | Nil | Value | Var | Const | Put | Get | Comma | EOF deriving (Show,Eq)
type Symbol = (String,Token) type ASL = [Symbol]
type ParseFunc = ASL -> (ASL,[String])
expression :: ParseFunc expression (h:s) | snd h == Get = (s,["IN"]) | op `elem` [Plus,Minus] = (s2,r1 ++ r2 ++ [operation op]) | otherwise = (s1,r1) where (s1,r1) = term (h:s) (s2,r2) = term $ tail s1 op = if s1 /= [] then snd $ head s1 else Nil expression s = (s,[])
term :: ParseFunc term s = if op `elem` [Times,Slash] then (s2,r1 ++ r2 ++ [operation op]) else (s1,r1) where (s1,r1) = factor s (s2,r2) = factor $ tail s1 op = if s1 /= [] then snd $ head s1 else Nil
factor :: ParseFunc factor ((id,Identifier):s) = (s,["LOAD " ++ id]) factor ((val,Value):s) = (s,["PUSH " ++ val]) factor (("(",OpenBrace):s) = if head s1 == (")",CloseBrace) then (tail s1,r1) else error $ "\")\" expected, got" ++ (show $ fst $ head s1) where (s1,r1) = expression s factor s = (s,[])
-- View this message in context: http://www.nabble.com/Expression-parsing-problem-tp23610457p23610457.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

The grammar: expression = "get" | [ "+" | "-" ] term { ( "+" | "-" ) term } term = factor { ( "*" | "/" ) factor } factor = IDENTIFIER | VALUE | "(" expression ")"
I can't make term parse, for instance "1 * 2 / 3"
Indeed, the grammar does not admit "1*2/3" as a sentence of that language although it will admit "(1*2)/3" or "1*(2/3)". If you wish to allow sequences of infix operators without bracketting, then examples of the standard grammar for this can be found by searching the web for "expression term factor", e.g. http://en.wikipedia.org/wiki/Syntax_diagram suggests: expression ::= term | term "+" expression term ::= factor | factor "*" term factor ::= constant | variable | "(" expression ")" Regards, Malcolm

Indeed, the grammar does not admit "1*2/3" as a sentence ...
Huh? Why not? "1 * 2 / 3" should match factor "*" factor "/" factor. Remember that { } is repetition, so it should be able to handle such term.
expression ::= term | term "+" expression term ::= factor | factor "*" term factor ::= constant | variable | "(" expression ")"
Oh, left recursion. Well, it should be easy to transform: expression ::= term | moreTerm term ::= factor | moreFactor moreTerm ::= term "+" expression factor ::= constant | variable | "(" expression ")" moreFactor := factor "*" term correct? -- View this message in context: http://www.nabble.com/Expression-parsing-problem-tp23610457p23611617.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Hello,
2009/5/19 leledumbo
expression ::= term | term "+" expression term ::= factor | factor "*" term factor ::= constant | variable | "(" expression ")"
Oh, left recursion. Well, it should be easy to transform:
expression ::= term | moreTerm term ::= factor | moreFactor moreTerm ::= term "+" expression factor ::= constant | variable | "(" expression ")" moreFactor := factor "*" term
correct?
I think not. See for instance:
expression ::= term | moreTerm moreTerm ::= term "+" expression
An expression begins by a term or a moreTerm… which itself begins by a term. You still have the left recursion problem, I think. What you mean was probably that: expression ::= term moreTerm term ::= factor moreFactor factor ::= constant | variable | "(" expression ")" moreTerm ::= "+" expression | nothing moreFactor ::= "*" expression | nothing nothing ::= Unfortunately, if this work (I'm not entirely sure), it is right associative. Example of parsing left associative operators can be found on the net, however. Finally, I strongly suggest you to take a look at the Parsec library [1] (unless you can't?). It provide a "buildExpressionParser" function which takes care of associativity and precedence for you. [1] http://legacy.cs.uu.nl/daan/download/parsec/parsec.html

Haha... yes, thanks. It was a mistake, I thought I did it too fast. -- View this message in context: http://www.nabble.com/Expression-parsing-problem-tp23610457p23632282.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Why is Symbol = (String, Token)? A more sensible token type would
include values in the Value constructor and string identifiers in the
Identifier constructor; the strings in everything else seem redundant.
A more pure/monadic parser would have a type like this:
data Result a = Error String | OK [a]
newtype Parser a = Parser (ASL -> Result (ASL, a))
Try to write these functions:
return :: a -> Parser a
(>>=) :: Parser a -> (a -> Parser b) -> Parser b
Next write some simple state modification:
token :: Parser Token
(or, if you insist on your symbol type)
token :: Parser Symbol
expect :: Token -> Parser ()
Then build on these to write:
expression :: Parser Expression
term :: Parser Expression
factor :: Parser Expression
for some suitable type Expression
Good luck, sounds like a tough but interesting project!
-- ryan
On Mon, May 18, 2009 at 11:28 PM, leledumbo
I'm writing a paper as a replacement for writing exam and decided to write a simple compiler (got a little experience with it). However, I got trouble in parsing expression.
The grammar: expression = "get" | [ "+" | "-" ] term { ( "+" | "-" ) term } term = factor { ( "*" | "/" ) factor } factor = IDENTIFIER | VALUE | "(" expression ")"
I can't make term parse, for instance "1 * 2 / 3" (the number is not important, identifier is also accepted). It stops after parsing 2, i.e. only the first multiplication is parsed. Interchanging * and / gives the same result, only differs in operation. Whichever got encountered first will be parsed.
The same problem also arises from expression, where it can't parse "1 + 2 - 3". Both problems are identical, but I can't figure out what's wrong (don't count the optional +/- before term in expression, I haven't done it yet).
Sorry, but I'm lack of knowledge about Monad. I know it can be done better with it, but I need to learn a lot about it, while I don't have enough time (only 2 weeks).
Below are necessary definitions for the parser (some taken from the scanner).
For testing purpose, please try: expression [("1",Value),("+",Plus),("2",Value),("-",Minus),("3",Value),("EOF",EOF)] term [("1",Value),("*",Times),("2",Value),("/",Slash),("3",Value),("EOF",EOF)] expression [("1",Value),("-",Minus),("2",Value),("+",Plus),("3",Value),("EOF",EOF)] term [("1",Value),("/",Slash),("2",Value),("*",Times),("3",Value),("EOF",EOF)]
data Token = Identifier | OpenBlock | CloseBlock | SemiColon | Slash | Equals | OpenBrace | CloseBrace | Minus | Times | Plus | Nil | Value | Var | Const | Put | Get | Comma | EOF deriving (Show,Eq)
type Symbol = (String,Token) type ASL = [Symbol]
type ParseFunc = ASL -> (ASL,[String])
expression :: ParseFunc expression (h:s) | snd h == Get = (s,["IN"]) | op `elem` [Plus,Minus] = (s2,r1 ++ r2 ++ [operation op]) | otherwise = (s1,r1) where (s1,r1) = term (h:s) (s2,r2) = term $ tail s1 op = if s1 /= [] then snd $ head s1 else Nil expression s = (s,[])
term :: ParseFunc term s = if op `elem` [Times,Slash] then (s2,r1 ++ r2 ++ [operation op]) else (s1,r1) where (s1,r1) = factor s (s2,r2) = factor $ tail s1 op = if s1 /= [] then snd $ head s1 else Nil
factor :: ParseFunc factor ((id,Identifier):s) = (s,["LOAD " ++ id]) factor ((val,Value):s) = (s,["PUSH " ++ val]) factor (("(",OpenBrace):s) = if head s1 == (")",CloseBrace) then (tail s1,r1) else error $ "\")\" expected, got" ++ (show $ fst $ head s1) where (s1,r1) = expression s factor s = (s,[])
-- View this message in context: http://www.nabble.com/Expression-parsing-problem-tp23610457p23610457.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

Why is Symbol = (String, Token)? A more sensible token type would include values in the Value constructor and string identifiers in the Identifier constructor; the strings in everything else seem redundant.
Surely you didn't read my original post, do you? I have a very limited knowledge of Monad and I try to find a solution using my current skills because the due date is within two weeks. Therefore, I don't think I can create a Monadic parser for this. -- View this message in context: http://www.nabble.com/Expression-parsing-problem-tp23610457p23612618.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.

Surely you didn't read my original post, do you? I have a very limited knowledge of Monad and I try to find a solution using my current skills because the due date is within two weeks. Therefore, I don't think I can create a Monadic parser for this.
I think you're giving up way too easily. My claim is that learning to make a monadic parser would actually be *faster* than implementing it this way, and you'll be a better programmer at the end of it. There's a great functional pearl on this at http://www.cs.nott.ac.uk/~gmh/bib.html#pearl ; you do yourself a disservice if you don't read it. It's only 7 pages! -- ryan

I hope you're right. 7 pages... 1-2 nights should be enough. Thanks for all. -- View this message in context: http://www.nabble.com/Expression-parsing-problem-tp23610457p23614011.html Sent from the Haskell - Haskell-Cafe mailing list archive at Nabble.com.
participants (4)
-
leledumbo
-
Loup Vaillant
-
Malcolm Wallace
-
Ryan Ingram