A Pascal-like Language Compiler

Hi all, I have been learning haskell for the last month or so. I have just started becoming familiar with monads and looking at Parsec. As part of the Language Translators Course this semester, in the university I study, we are to implement a Pascal-like toy language to C translator[1]. The aim is to take a simple subset of Pascal and translate it to an intermediate 3 operand format, before translating it to simple C. The time I have for this project is about a month. I am contemplating doing this in Haskell, since I have read in several places that Haskell (and other functional languages) is very good for writing compilers. In fact, I have started writing code for lexing the basic tokens. I do have doubts about my ability to finish it. The curiosity to do something of a moderate size in Haskell, and the extremely friendly folks on #haskell, tipped me on this decision. I looked around for similar things on the internet, in vain. I have several doubts at this stage some of which include, * how the parser will determine the scope of the variables, i.e, how can a symbol table be implemented (in the state of the parser ?) * how easy or difficult will basic code optimisation (like unused variables, and loops with no side effects) and type checking be? * how would one go about implementing basic error recovery in the parser (make it skip the current line, or block) ? I would love it if I get a few pointers from people who have been there, done that. Or am I being too ambitious too early? :) [1] One of the suggested programming exercises at the end of the "Dragon" Compilers book, by Aho, et al. -- Pranesh S

Oops, should have hit the replay-to-all button, so sorry Pranesh for
double posting.
Hi,
have you considered using BNCF[1]? With BNFC you can define a grammar
for your language then let the BNFC converter generate a Haskell
parser (along with an ADT representing syntax trees).
On Mon, Oct 20, 2008 at 2:09 PM, Pranesh Srinivasan
I looked around for similar things on the internet, in vain. I have several doubts at this stage some of which include,
* how the parser will determine the scope of the variables, i.e, how can a symbol table be implemented (in the state of the parser ?)
I think it is probably better to exclude the variable look up, type checking etc from the parsing phase. Here's an example of a general scheme: 1. Parse input file into an abstract syntax tree representation. 2. Perform type checking on your syntax tree. 3. Transform the syntax tree using re-write rules and optimisations. 4. Pretty print the syntax tree in order to output code of your target language.
* how easy or difficult will basic code optimisation (like unused variables, and loops with no side effects) and type checking be?
I would say probably easier then in many other languages since Haskell has such a great support for recursive data types, pattern matching etc.
* how would one go about implementing basic error recovery in the parser (make it skip the current line, or block) ?
Not really sure what you mean by error recovery but when using BNFC all of the tedious work of implementing the parsers is automated. Regards, Joel [1] http://www.cs.chalmers.se/Cs/Research/Language-technology/BNFC/

Hey all, Thanks to everyone for replying. I was severly caught up with work for the last two days. The link Larry gave for the lookahead states seems very nice :). But I am not sure if Ill be looking to calculate the lookahead states myself, or let BNFC do the job? In either case, I think merely printing the line number where the error occured should do in the worse case. It will be exciting to have nice error messages though. I am starting to reliase the advantage of pattern matching being present, like Chris had said. I mean it definitely beats maintaining and checking a flag, the way you would do it in C :)
Definitely not, just go for it. In the IPL course @ UU we implemented Thanks, Chris :).
1. Parse input file into an abstract syntax tree representation. 2. Perform type checking on your syntax tree. 3. Transform the syntax tree using re-write rules and optimisations. 4. Pretty print the syntax tree in order to output code of your target language.
That seems like a very nice scheme to follow. I had a similar method in mind. Step 3 is what I am really worried about. How easy/difficult will it be in a pure func language, to "transform" the sytnax tree. I have to take a deeper look at BNFC. But from an initial look, it seems way too powerful for me to use? At least as powerful as yacc. And that with Haskell, should give me a very good toolset-advantage? @Chris : The ipl course website, seems very helpful, especially some of the lectures on abstract syntax. -- Pranesh Srinivasan, Third Year Student, Computer Science & Engineering, Indian Institute of Technology - Madras. http://spranesh.googlepages.com http://www.cse.iitm.ac.in/~spranesh/

On Thu, Oct 23, 2008 at 12:18 AM, Pranesh Srinivasan
That seems like a very nice scheme to follow. I had a similar method in mind. Step 3 is what I am really worried about. How easy/difficult will it be in a pure func language, to "transform" the sytnax tree.
I think the pureness is actually an advantage here, since the transformation is a function from one syntax tree to another. If however, you feel the need of keeping a state and you don't wanna pass around this explicitly, you may consider using a state monad.
I have to take a deeper look at BNFC. But from an initial look, it seems way too powerful for me to use? At least as powerful as yacc. And that with Haskell, should give me a very good toolset-advantage?
If the project is about writing parsers then using BNFC would probably be considered cheating. But for practical purposes I think it's a great tool that would save you a lot of work. - Joel

On 10/23/08 03:23, Joel Björnson wrote:
On Thu, Oct 23, 2008 at 12:18 AM, Pranesh Srinivasan
wrote:
[snip]
That seems like a very nice scheme to follow. I had a similar method in mind. Step 3 is what I am really worried about. How easy/difficult will it be in a pure func language, to "transform" the sytnax tree.
I think the pureness is actually an advantage here, since the transformation is a function from one syntax tree to another. If however, you feel the need of keeping a state and you don't wanna pass around this explicitly, you may consider using a state monad.
I have to take a deeper look at BNFC. But from an initial look, it seems way too powerful for me to use? At least as powerful as yacc. And that with Haskell, should give me a very good toolset-advantage?
If the project is about writing parsers then using BNFC would probably be considered cheating. But for practical purposes I think it's a great tool that would save you a lot of work.
- Joel
HI Joel. Does BNFC calculate the lookahead sets for a grammar? I'm attempting to do this with haskell, but I would hate to "reinvent the wheel". I would think if haskell can tranform a syntax tree, then haskell could transform a grammar tree to calculate the lookahead sets. My purpose it not so much create a compiler as illustrate how it's done. I found the Dragon's book description hard to follow and thought redoing it using functors (to transform the tree into another, with different operators) then simplify the transformed trees (one tree for each lookahead set) then find the fixpoint of those trees (really forest: 1 tree for each production) would be a good way to demonstrate the method. -Larry

Hi,
On Thu, Oct 23, 2008 at 3:37 PM, Larry Evans
Does BNFC calculate the lookahead sets for a grammar? I'm attempting to do this with haskell, but I would hate to "reinvent the wheel". I would think if haskell can tranform a syntax tree, then haskell could transform a grammar tree to calculate the lookahead sets. My purpose it not so much create a compiler as illustrate how it's done. I found the Dragon's book description hard to follow and thought redoing it using functors (to transform the tree into another, with different operators) then simplify the transformed trees (one tree for each lookahead set) then find the fixpoint of those trees (really forest: 1 tree for each production) would be a good way to demonstrate the method.
As stated at the BNFC web page, "BNFC is a compiler compiler compiler", i.e. it generates specifications for other lexers and parsers based on the given grammar (expressed in Backus Nour Form). In Haskell mode it outputs specifications for Alex[1] and Happy[2] which in turn can be used to generate haskell code for lexing and parsing. So if I understood you correctly, I think the answer to your question is no. But maybe you wanna have a look the haskell parser code generated by Happy? Unfortunately I really don't know anything about the implementations of either BNFC or Happy, so I'm sure others may provide you with more insightful explanations. - Joel [1] http://www.haskell.org/alex/ [2] http://www.haskell.org/happy/
participants (3)
-
Joel Björnson
-
Larry Evans
-
Pranesh Srinivasan