Overlaping Parsec rules

7 Mar 2011

      Hi,

to solve this ICFP task http://www.cs.cornell.edu/icfp/task.htm I'm currnetly working on the parser. With the hint from Thu (reading Phillip Wadlers monadic parser paper) and consulting http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing I produce a first working version of the parser.
After this great moment I completed the token definition and near all parsing rules. For the complete code have a look at http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer

data GmlToken =

        -- structures

        FunctionToken TokenSequence     |

        ArrayToken TokenSequence        |

        -- control operators

        ApplyToken                      |

        IfToken                         |

        -- number operators

        AddiToken                       |

        AddfToken                       |

        ACosToken                       |

        ASinToken                       |

        ClampfToken                     |

        CosToken                        |

        DiviToken                       |

        DivfToken                       |

        EqiToken                        |

        EqfToken                        |

        FloorToken                      |

        FracToken                       |

        LessiToken                      |

        LessfToken                      |

        ModiToken                       |

        MuliToken                       |

        MulfToken                       |

        NegiToken                       |

        NegfToken                       |

        ToRealToken                     |

        SinToken                        |

        SqrtToken                       |

        SubiToken                       |

        SubfToken                       |

        -- points

        GetXToken                       |

        GetYToken                       |

        GetZToken                       |

        PointToken                      |

        -- arrays

        GetToken                        |

        LengthToken                     |

        -- environment

        IdentifierToken String          |

        BinderToken String              |

        -- types

        BoolToken Bool                  |

        IntToken Int                    |

        RealToken Double                |

        StringToken String              deriving Show

And

parseGml :: String -> [GmlToken]

parseGml input = case parse parseList "gml" input of

    Left err -> error ("Parse error: " ++ (show err))

    Right gml -> gml

parseList = sepBy parseGml' spaces

parseGml' =

        -- structures

--        parseFunction

--    <|> parseArray

        -- control operators

        parseControlOperators

        -- number operators

    <|> parseNumberOperators

        -- points

    <|> parsePointOperators

        -- arrays

    <|> parseArrayOperators

        -- types

    <|> parseBool

    <|> parseInteger

    <|> parseString

        -- environment

    <|> parseIdentifier

    <|> parseBinder

parseArray = parseSequence '[' ']'

parseFunction = parseSequence '{' '}'

parseSequence start end = do char start

                             array <- parseList

                             char end

                             return $ ArrayToken array

parseControlOperators = parseApply <|> parseIf

parseApply = do string "apply"

                return $ ApplyToken

parseIf = do string "if"

             return $ IfToken

parseNumberOperators = do string "addi"

                          return $ AddiToken

                   <|> do string "addf"

                          return $ AddfToken

                   <|> do string "acos"

                          return $ ACosToken

                   <|> do string "asind"

                          return $ ASinToken

                   <|> do string "clampf"

                          return $ ClampfToken

                   <|> do string "cos"

                          return $ CosToken

                   <|> do string "divi"

                          return $ DiviToken

                   <|> do string "divf"

                          return $ DivfToken

                   <|> do string "eqi"

                          return $ EqiToken

                   <|> do string "eqf"

                          return $ EqfToken

                   <|> do string "floor"

                          return $ FloorToken

                   <|> do string "frac"

                          return $ FracToken

                   <|> do string "lessi"

                          return $ LessiToken

                   <|> do string "lessf"

                          return $ LessfToken

                   <|> do string "modi"

                          return $ ModiToken

                   <|> do string "muli"

                          return $ MuliToken

                   <|> do string "mulf"

                          return $ MulfToken

                   <|> do string "negi"

                          return $ NegiToken

                   <|> do string "negf"

                          return $ NegfToken

                   <|> do string "real"

                          return $ ToRealToken

                   <|> do string "sin"

                          return $ SinToken

                   <|> do string "Sqrt"

                          return $ SqrtToken

                   <|> do string "subi"

                          return $ SubiToken

                   <|> do string "subf"

                          return $ SubfToken

parsePointOperators = do string "getx"

                         return $ GetXToken

                  <|> do string "gety"

                         return $ GetYToken

                  <|> do string "getz"

                         return $ GetZToken

                  <|> do string "point"

                         return $ PointToken

parseArrayOperators = do string "get"

                         return $ GetToken

                  <|> do string "length"

                         return $ LengthToken

parseInteger = liftM (IntToken . read) $ many1 digit

parseBool = do string "true"

               return $ BoolToken True

        <|> do string "false"

               return $ BoolToken False

parseString = do char '"'

                 string <- many (noneOf "\"")

                 char '"'

                 return $ StringToken string

parseIdentifier = do identifier <- many (noneOf " ")

                     return $ IdentifierToken identifier

parseBinder = do char '/'

                 binder <- many (noneOf " ")

                 return $ BinderToken binder

parseComment = do char '%'

                  many (noneOf "")

                  newline

                  return $ ()

spaces = skipMany1 space

After gluing all this together in my mind all worked well. But it doesn't. The test file for parsing looks like:

{ /self /n
  n 2 lessi
  { 1 }
  { n 1 subi self self apply n muli }
  if
} /fact

12 fact fact apply

*       I think there is a problem with overlaping rules. There is a parser rule consuming "negi" and resulting in the NegiToken. A single "n" is a valid identifier. For the example file my parser says: unexpected " ", expecting "negi"
*       I think the same problem is present for "parseInteger" and "parseReal" (currently no in code but looks like "parseReal = do a <- many1 digit \n char '.' \n b <- many1 digit \n return $ RealToken (read (a ++ "." ++ b)"
*       Something with "parseFunction" is going really wrong.
*       All parsig rules are designed with the condition that each "construct" is separated by whitespaces. For this before parsing the input will be preprocessed: insert spaces, removing whitespaces and so on. Now in the parsed result appears (IdentifierToken ""). I think my version is not the best way to parse a identifer:
parseIdentifier = do identifier <- many (noneOf " ")
                     return $ IdentifierToken identifier

Please help me.

Klaus

Hauschild, Klaus (EXT)

Christian Maeder

Hauschild, Klaus (EXT)

Christian Maeder

Hauschild, Klaus (EXT)

Christian Maeder

Christian Maeder

Christian Maeder

Hauschild, Klaus (EXT)

Christian Maeder

Christian Maeder

Christian Maeder

Stephen Tetley

Christian Maeder

Daniel Fischer

Hauschild, Klaus (EXT)

tags

participants (4)