
Joel Reymont wrote:
Meaning that if a keyword Output is followed by ":" and an identifier and then "(NumericSimple)" then add identifier to the symbol table as a Number and box it in a constructor.
Then in my lexer I do a lookup to check if I have seen this identifier and if I have seen one of type TypeNumOut I return the token NUM instead of ID. This ensures that I can have rules with the token NUM as opposed to ID everywhere.
I use a set of strings for the symbol table (I don't record the types of the identifiers, but you can add that back). I don't allow for whitespace, but you can add that back. The parser returns a string rather than a constructor with a string, but you can add that back too.

It is necessary to fuse the lexer and the parser together, so that they share state; but we can fuse them in a way that still leaves a recognizable boundary. E.g., in the code below, string "blah", ident, num, name, and numeric_simple are lexers (so when you add back whitespace you know which ones are the suspects), and p0 is a parser that calls the lexers and does the extra bookkeeping.

The name lexer returns a sum type, so you can use its two cases to signify whether a name is in the table or not; then ident and num can fail on the wrong cases. (Alternatively, you can eliminate the sum type by copying the name code into the ident code and the num code.)

import Text.ParserCombinators.Parsec
import Control.Monad (mzero)
import qualified Data.Set as Set

main = do { input <- getLine
          ; print (runParser p0 Set.empty "stdin" input)
          }

-- parser: Output:<ident>(<digits>), then record <ident> in the table
p0 = do { string "Output"
        ; string ":"
        ; i <- ident
        ; string "("
        ; numeric_simple
        ; string ")"
        ; updateState (Set.insert i)
        ; return i
        }

numeric_simple = many digit

-- lexer: succeeds only on a name that is not yet in the table
ident = do { n <- name
           ; case n of { ID i -> return i ; _ -> mzero }
           }

-- lexer: reads a name and classifies it by looking it up in the table
name = do { c0 <- letter
          ; cs <- many alphaNum
          ; let n = c0 : cs
          ; table <- getState
          ; return (if n `Set.member` table then NUM n else ID n)
          }

data Name = NUM String | ID String

-- lexer: succeeds only on a name that is already in the table
num = do { n <- name
         ; case n of { NUM i -> return i ; _ -> mzero }
         }
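
For what it's worth, here is a rough sketch of the alternative without the sum type, plus a small parser that uses num after a declaration. This is only one way to do it, under a few assumptions of mine: rather than literally copying the name code twice I factor it into a helper called rawName; I wrap ident and num in try so that a failed table check does not consume input (the code above does not do this); and the ";" separator, the names declare and declThenUse, and the type synonym P are all made up for the illustration.

```haskell
import Text.ParserCombinators.Parsec
import Control.Monad (guard)
import qualified Data.Set as Set

-- Hypothetical synonym: parsers over String with the symbol table as user state.
type P = GenParser Char (Set.Set String)

-- Hypothetical helper: reads a name without consulting the table.
rawName :: P String
rawName = do
  c0 <- letter
  cs <- many alphaNum
  return (c0 : cs)

-- Succeeds only on a name that is NOT in the table.
-- 'try' (my addition) makes a failed check backtrack instead of consuming input.
ident :: P String
ident = try $ do
  n <- rawName
  table <- getState
  guard (n `Set.notMember` table)
  return n

-- Succeeds only on a name that IS in the table.
num :: P String
num = try $ do
  n <- rawName
  table <- getState
  guard (n `Set.member` table)
  return n

numeric_simple :: P String
numeric_simple = many digit

-- Same job as p0: parse Output:<ident>(<digits>) and record the identifier.
declare :: P String
declare = do
  _ <- string "Output"
  _ <- string ":"
  i <- ident
  _ <- string "("
  _ <- numeric_simple
  _ <- string ")"
  updateState (Set.insert i)
  return i

-- Hypothetical follow-up: a declaration, a ';', then a use of a declared name.
declThenUse :: P (String, String)
declThenUse = do
  i <- declare
  _ <- char ';'
  n <- num
  return (i, n)

main :: IO ()
main = do
  print (runParser declThenUse Set.empty "demo" "Output:x(1);x")
  print (runParser declThenUse Set.empty "demo" "Output:x(1);y")
```

The first line printed should be Right ("x","x"), since x is in the table by the time num runs; the second should be a Left with a parse error, since y was never declared. Whether you want the try is a design choice: with it you can write ident <|> num and have the second branch get a chance, at the cost of less precise error positions.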