Re: [Haskell-cafe] Overlaping Parsec rules

7 Mar 2011

      Thanks Christian,

I adapted the "keyword" parser and now "n" <-> "negi" does not occur.

But there are still other problems to solve. If I activate the parseFunction the parser will answer to fact.gml unexpected end of input, expecting space or "}".

Any ideas?

-----Ursprüngliche Nachricht-----
Von: Christian Maeder [mailto:Christian.Maeder@dfki.de] 
Gesendet: Montag, 7. März 2011 12:23
An: Hauschild, Klaus (EXT)
Cc: haskell-cafe@haskell.org
Betreff: Re: Overlaping Parsec rules

You should parse keywords using:

keyword s = try (string s) >> notFollowedBy (letter <|> digit)

C.

Am 07.03.2011 11:34, schrieb Hauschild, Klaus (EXT):
...
Hi,
to solve this ICFP task _http://www.cs.cornell.edu/icfp/task.htm_ I'm
currnetly working on the parser. With the hint from Thu (reading Phillip
Wadlers monadic parser paper) and consulting
_http://en.wikibooks.org/wiki/Write_Yourself_a_Scheme_in_48_Hours/Parsing_
I produce a first working version of the parser.
After this great moment I completed the token definition and near all
parsing rules. For the complete code have a look at
_http://code.google.com/p/hgmltracer/source/browse/#svn%2Ftrunk%2FhGmlTracer_
data GmlToken =
-- structures
FunctionToken TokenSequence     |
ArrayToken TokenSequence        |
-- control operators
ApplyToken                      |
IfToken                         |
-- number operators
AddiToken                       |
AddfToken                       |
ACosToken                       |
ASinToken                       |
ClampfToken                     |
CosToken                        |
DiviToken                       |
DivfToken                       |
EqiToken                        |
EqfToken                        |
FloorToken                      |
FracToken                       |
LessiToken                      |
LessfToken                      |
ModiToken                       |
MuliToken                       |
MulfToken                       |
NegiToken                       |
NegfToken                       |
ToRealToken                     |
SinToken                        |
SqrtToken                       |
SubiToken                       |
SubfToken                       |
-- points
GetXToken                       |
GetYToken                       |
GetZToken                       |
PointToken                      |
-- arrays
GetToken                        |
LengthToken                     |
-- environment
IdentifierToken String          |
BinderToken String              |
-- types
BoolToken Bool                  |
IntToken Int                    |
RealToken Double                |
StringToken String              deriving Show
And
parseGml :: String -> [GmlToken]
parseGml input = case parse parseList "gml" input of
Left err -> error ("Parse error: " ++ (show err))
Right gml -> gml
parseList = sepBy parseGml' spaces
parseGml' =
-- structures
--        parseFunction
--    <|> parseArray
-- control operators
parseControlOperators
-- number operators
<|> parseNumberOperators
-- points
<|> parsePointOperators
-- arrays
<|> parseArrayOperators
-- types
<|> parseBool
<|> parseInteger
<|> parseString
-- environment
<|> parseIdentifier
<|> parseBinder
parseArray = parseSequence '[' ']'
parseFunction = parseSequence '{' '}'
parseSequence start end = do char start
array <- parseList
char end
return $ ArrayToken array
parseControlOperators = parseApply <|> parseIf
parseApply = do string "apply"
return $ ApplyToken
parseIf = do string "if"
return $ IfToken
parseNumberOperators = do string "addi"
return $ AddiToken
<|> do string "addf"
return $ AddfToken
<|> do string "acos"
return $ ACosToken
<|> do string "asind"
return $ ASinToken
<|> do string "clampf"
return $ ClampfToken
<|> do string "cos"
return $ CosToken
<|> do string "divi"
return $ DiviToken
<|> do string "divf"
return $ DivfToken
<|> do string "eqi"
return $ EqiToken
<|> do string "eqf"
return $ EqfToken
<|> do string "floor"
return $ FloorToken
<|> do string "frac"
return $ FracToken
<|> do string "lessi"
return $ LessiToken
<|> do string "lessf"
return $ LessfToken
<|> do string "modi"
return $ ModiToken
<|> do string "muli"
return $ MuliToken
<|> do string "mulf"
return $ MulfToken
<|> do string "negi"
return $ NegiToken
<|> do string "negf"
return $ NegfToken
<|> do string "real"
return $ ToRealToken
<|> do string "sin"
return $ SinToken
<|> do string "Sqrt"
return $ SqrtToken
<|> do string "subi"
return $ SubiToken
<|> do string "subf"
return $ SubfToken
parsePointOperators = do string "getx"
return $ GetXToken
<|> do string "gety"
return $ GetYToken
<|> do string "getz"
return $ GetZToken
<|> do string "point"
return $ PointToken
parseArrayOperators = do string "get"
return $ GetToken
<|> do string "length"
return $ LengthToken
parseInteger = liftM (IntToken . read) $ many1 digit
parseBool = do string "true"
return $ BoolToken True
<|> do string "false"
return $ BoolToken False
parseString = do char '"'
string <- many (noneOf "\"")
char '"'
return $ StringToken string
parseIdentifier = do identifier <- many (noneOf " ")
return $ IdentifierToken identifier
parseBinder = do char '/'
binder <- many (noneOf " ")
return $ BinderToken binder
parseComment = do char '%'
many (noneOf "")
newline
return $ ()
spaces = skipMany1 space
After gluing all this together in my mind all worked well. But it
doesn't. The test file for parsing looks like:
{ /self /n
  n 2 lessi
  { 1 }
  { n 1 subi self self apply n muli }
  if
} /fact
12 fact fact apply
* I think there is a problem with overlaping rules. There is a
      parser rule consuming "negi" and resulting in the NegiToken. A
      single "n" is a valid identifier. For the example file my parser
      says: unexpected " ", expecting "negi"
    * I think the same problem is present for "parseInteger" and
      "parseReal" (currently no in code but looks like "parseReal = do a
      <- many1 digit \n char '.' \n b <- many1 digit \n return $
      RealToken (read (a ++ "." ++ b)"
    * Something with "parseFunction" is going really wrong.
    * All parsig rules are designed with the condition that each
      "construct" is separated by whitespaces. For this before parsing
      the input will be preprocessed: insert spaces, removing
      whitespaces and so on. Now in the parsed result appears
      (IdentifierToken ""). I think my version is not the best way to
      parse a identifer:
parseIdentifier = *do* identifier <- many (noneOf " ")
                     return $ *IdentifierToken* identifier
Please help me.
Klaus
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe