Parsing indentation-based languages with Parsec

13 Apr 2007

Hi, first-time list poster :)

I've searched around a bit but haven't been able to find any examples of
this. I want to parse a language (such as Haskell or Python) that uses
only EOL as the 'statement' separator and indentation levels to indicate
block structure, and I'd like to use Parsec's nice library to do it.

The first thing I noticed is that Parsec's whiteSpace parser skips EOL
as ordinary whitespace, so I need to redefine it. Is this the correct
way to do it? I've only been using Haskell for a week or so, so I'm not
too sure about record structures and updating them...

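-- Imports as in the Parsec tutorial (assumed; not shown in my original code)
import Text.ParserCombinators.Parsec
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language

lexer :: P.TokenParser ()
lexer =
    (P.makeTokenParser emptyDef
        { commentLine     = "#"
        , nestedComments  = True
        , identStart      = letter
        , identLetter     = letter
        , opStart         = oneOf "+*/-="
        , opLetter        = oneOf "+*/-="
        , reservedNames   = []
        , reservedOpNames = []
        , caseSensitive   = False
        })
        { -- update lexer fields
          P.whiteSpace = skipMany (char ' ')  -- just gobble spaces, not EOL
        }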

(I got the basic code from the tutorial contained within the Parsec
docs.)
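
One point that may be worth checking here: as far as I can tell, updating the P.whiteSpace field after makeTokenParser has run replaces only that one field. Token parsers such as P.lexeme and P.identifier were already built around the original whitespace parser, so they would still skip newlines. Below is a minimal sketch of one way around that, assuming Parsec 3 (Text.Parsec) and hand-written token parsers in place of the Token module. The names hspace, lexeme, statement and program are made up for illustration and are not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Skip spaces and tabs only, so that newlines stay significant.
hspace :: Parser ()
hspace = skipMany (oneOf " \t")

-- Hypothetical stand-in for P.lexeme: run a parser, then skip
-- trailing horizontal whitespace (never a newline).
lexeme :: Parser a -> Parser a
lexeme p = p <* hspace

-- Identifiers are runs of letters, as in the lexer definition above.
identifier :: Parser String
identifier = lexeme (many1 letter)

-- A toy 'statement': one or more identifiers terminated by EOL.
statement :: Parser [String]
statement = many1 identifier <* newline

-- A program is a sequence of statements, one per line.
program :: Parser [[String]]
program = many statement <* eof

main :: IO ()
main = print (parse program "" "foo bar\nbaz\n")
```

Because every token goes through the local lexeme, the newline is never swallowed as whitespace and can act as the statement separator.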

For handling the indented blocks I thought I would use something to hold
current indentation state, as Parsec has support for threading state
through all the parsers.
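
A rough sketch of that idea, assuming Parsec 3 (Text.Parsec) and a made-up toy language in which each line is a single word and a trailing ':' opens a more deeply indented block. The user state is an Int holding the indentation of the current block. The names IParser, Stmt, stmt, block and program are hypothetical, and tabs, blank lines and comments are deliberately ignored.

```haskell
import Text.Parsec

-- User state: indentation (number of leading spaces) of the current block.
type IParser a = Parsec String Int a

data Stmt
  = Line String          -- a plain one-word statement
  | Block String [Stmt]  -- "name:" followed by an indented body
  deriving (Show, Eq)

-- Parse one statement that sits at exactly the current indentation.
stmt :: IParser Stmt
stmt = do
  lvl <- getState
  -- Exactly lvl leading spaces; 'try' so a mismatch consumes nothing.
  _ <- try (count lvl (char ' ') <* notFollowedBy (char ' '))
  name <- many1 letter
  opensBlock <- option False (True <$ char ':')
  _ <- newline
  if opensBlock
    then Block name <$> block
    else return (Line name)

-- Parse an indented body: peek at the next line's indentation, require it
-- to be deeper than the enclosing level, and restore the level afterwards.
block :: IParser [Stmt]
block = do
  outer <- getState
  inner <- lookAhead (length <$> many (char ' '))
  if inner <= outer
    then parserFail "expected an indented block"
    else do
      putState inner
      body <- many1 stmt
      putState outer
      return body

program :: IParser [Stmt]
program = many stmt <* eof

main :: IO ()
main = print (runParser program 0 "" "a:\n  b\nc\n")
```

The initial state 0 passed to runParser means top-level statements start in column one. A dedent simply makes stmt fail without consuming input, which ends the inner many1 and hands control back to the enclosing block.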

Is this the right way to go about this? Has anyone done the 'groundwork'
with parsing such languages so I don't need to reinvent this?

Thanks in advance,
- porges.

George Pollard
