[Haskell-cafe] Overlaping Parsec rules

7 Mar 2011

      Hi,

to solve this ICFP task http://www.cs.cornell.edu/icfp/task.htm I'm currnetly working on the parser. With the hint from Thu (reading Phillip Wadlers monadic parser paper) and consulting http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing I produce a first working version of the parser.
After this great moment I completed the token definition and near all parsing rules. For the complete code have a look at http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer

data GmlToken =

        -- structures

        FunctionToken TokenSequence     |

        ArrayToken TokenSequence        |

        -- control operators

        ApplyToken                      |

        IfToken                         |

        -- number operators

        AddiToken                       |

        AddfToken                       |

        ACosToken                       |

        ASinToken                       |

        ClampfToken                     |

        CosToken                        |

        DiviToken                       |

        DivfToken                       |

        EqiToken                        |

        EqfToken                        |

        FloorToken                      |

        FracToken                       |

        LessiToken                      |

        LessfToken                      |

        ModiToken                       |

        MuliToken                       |

        MulfToken                       |

        NegiToken                       |

        NegfToken                       |

        ToRealToken                     |

        SinToken                        |

        SqrtToken                       |

        SubiToken                       |

        SubfToken                       |

        -- points

        GetXToken                       |

        GetYToken                       |

        GetZToken                       |

        PointToken                      |

        -- arrays

        GetToken                        |

        LengthToken                     |

        -- environment

        IdentifierToken String          |

        BinderToken String              |

        -- types

        BoolToken Bool                  |

        IntToken Int                    |

        RealToken Double                |

        StringToken String              deriving Show

And

parseGml :: String -> [GmlToken]

parseGml input = case parse parseList "gml" input of

    Left err -> error ("Parse error: " ++ (show err))

    Right gml -> gml

parseList = sepBy parseGml' spaces

parseGml' =

        -- structures

--        parseFunction

--    <|> parseArray

        -- control operators

        parseControlOperators

        -- number operators

    <|> parseNumberOperators

        -- points

    <|> parsePointOperators

        -- arrays

    <|> parseArrayOperators

        -- types

    <|> parseBool

    <|> parseInteger

    <|> parseString

        -- environment

    <|> parseIdentifier

    <|> parseBinder

parseArray = parseSequence '[' ']'

parseFunction = parseSequence '{' '}'

parseSequence start end = do char start

                             array <- parseList

                             char end

                             return $ ArrayToken array

parseControlOperators = parseApply <|> parseIf

parseApply = do string "apply"

                return $ ApplyToken

parseIf = do string "if"

             return $ IfToken

parseNumberOperators = do string "addi"

                          return $ AddiToken

                   <|> do string "addf"

                          return $ AddfToken

                   <|> do string "acos"

                          return $ ACosToken

                   <|> do string "asind"

                          return $ ASinToken

                   <|> do string "clampf"

                          return $ ClampfToken

                   <|> do string "cos"

                          return $ CosToken

                   <|> do string "divi"

                          return $ DiviToken

                   <|> do string "divf"

                          return $ DivfToken

                   <|> do string "eqi"

                          return $ EqiToken

                   <|> do string "eqf"

                          return $ EqfToken

                   <|> do string "floor"

                          return $ FloorToken

                   <|> do string "frac"

                          return $ FracToken

                   <|> do string "lessi"

                          return $ LessiToken

                   <|> do string "lessf"

                          return $ LessfToken

                   <|> do string "modi"

                          return $ ModiToken

                   <|> do string "muli"

                          return $ MuliToken

                   <|> do string "mulf"

                          return $ MulfToken

                   <|> do string "negi"

                          return $ NegiToken

                   <|> do string "negf"

                          return $ NegfToken

                   <|> do string "real"

                          return $ ToRealToken

                   <|> do string "sin"

                          return $ SinToken

                   <|> do string "Sqrt"

                          return $ SqrtToken

                   <|> do string "subi"

                          return $ SubiToken

                   <|> do string "subf"

                          return $ SubfToken

parsePointOperators = do string "getx"

                         return $ GetXToken

                  <|> do string "gety"

                         return $ GetYToken

                  <|> do string "getz"

                         return $ GetZToken

                  <|> do string "point"

                         return $ PointToken

parseArrayOperators = do string "get"

                         return $ GetToken

                  <|> do string "length"

                         return $ LengthToken

parseInteger = liftM (IntToken . read) $ many1 digit

parseBool = do string "true"

               return $ BoolToken True

        <|> do string "false"

               return $ BoolToken False

parseString = do char '"'

                 string <- many (noneOf "\"")

                 char '"'

                 return $ StringToken string

parseIdentifier = do identifier <- many (noneOf " ")

                     return $ IdentifierToken identifier

parseBinder = do char '/'

                 binder <- many (noneOf " ")

                 return $ BinderToken binder

parseComment = do char '%'

                  many (noneOf "")

                  newline

                  return $ ()

spaces = skipMany1 space

After gluing all this together in my mind all worked well. But it doesn't. The test file for parsing looks like:

{ /self /n
  n 2 lessi
  { 1 }
  { n 1 subi self self apply n muli }
  if
} /fact

12 fact fact apply

*       I think there is a problem with overlaping rules. There is a parser rule consuming "negi" and resulting in the NegiToken. A single "n" is a valid identifier. For the example file my parser says: unexpected " ", expecting "negi"
*       I think the same problem is present for "parseInteger" and "parseReal" (currently no in code but looks like "parseReal = do a <- many1 digit \n char '.' \n b <- many1 digit \n return $ RealToken (read (a ++ "." ++ b)"
*       Something with "parseFunction" is going really wrong.
*       All parsig rules are designed with the condition that each "construct" is separated by whitespaces. For this before parsing the input will be preprocessed: insert spaces, removing whitespaces and so on. Now in the parsed result appears (IdentifierToken ""). I think my version is not the best way to parse a identifer:
parseIdentifier = do identifier <- many (noneOf " ")
                     return $ IdentifierToken identifier

Please help me.

Klaus

[Haskell-cafe] Overlaping Parsec rules

Hauschild, Klaus (EXT)