http://www.cs.cornell.edu/icfp/task.htm

Hi, Currently I'm trying to lern Haskell by doing. After doing some examples I plan to solve an ICFP task (see subject). In short: build an interpreter for a stack based language thata describes a 3D scene. Ray trace this scene in an image. My current source state can be found here: http://code.google.com/p/hgmltracer/source/browse/ My first goal is to develop the interpreter of the GML language. My source contains a data GmlToken with various constructors for the different operators, sequences, int, real, string an so on. Some code for the evaluation is also there und working. But the parser converting a string (contained in the program file provided as command line argument) is hard stuff and I'm stuck. Can any one help me with ideas or concepts for this parser? Thanks

2011/2/25 Hauschild, Klaus (EXT)
Hi,
Currently I'm trying to lern Haskell by doing. After doing some examples I plan to solve an ICFP task (see subject). In short: build an interpreter for a stack based language thata describes a 3D scene. Ray trace this scene in an image. My current source state can be found here: http://code.google.com/p/hgmltracer/source/browse/ My first goal is to develop the interpreter of the GML language. My source contains a data GmlToken with various constructors for the different operators, sequences, int, real, string an so on. Some code for the evaluation is also there und working. But the parser converting a string (contained in the program file provided as command line argument) is hard stuff and I'm stuck.
Can any one help me with ideas or concepts for this parser?
Hi, You have chosen to develop a very interesting program and I'm sure a lot of people could help you if you face any problem. But it would be easier for those people to help you if you could be a bit more specific about what works (or what you understand) and where you have some problem. In this case, maybe you could copy in the mail some code (possibly cutting it down to what really matter). For instance, I think central to your mail is this data structure: data GmlToken = IntToken Int | RealToken Double | BoolToken Bool | StringToken String | FunctionToken [GmlSequence] | ArrayTokenToken [GmlSequence] | BinderToken String | IdentifierToken String | AddiToken and the function parse :: [String] -> TokenSequence -> TokenSequence parse = ... At first view I would say that parse should be (and I call it tokenize) tokenize :: String -> [GmlToken] Now, maybe you can provide some sample inputs *with* their expected output and tell us what is really a problem for you. Alternatively I can assume too much (or not enough) things from your mail and you can enlighten me :) Cheers, Thu

Hi Thu,
You read my mind. Ok, for the details. Here are my data structur for the differen tokens (currently not complete):
data GmlToken = IntToken Int
| RealToken Double
| BoolToken Bool
| StringToken String
| FunctionToken TokenSequence
| ArrayToken TokenSequence
| BinderToken String
| IdentifierToken String
| AddiToken
type TokenSequence = [GmlToken]
type TokenStack = [GmlToken]
Now I'm working on the tokenize method. There a public and a private version (indicated by ')
tokenize :: String → TokenSequence
-- foreward to intern method
tokenize program = tokenize' (words program) []
tokenize' :: [String] → TokenSequence → TokenSequence
-- last recursion step
tokenize' [] tokens
= tokens
-- "addi" is a reserved keyword for the AddiToken
tokenize' ("addi" : words) tokens
= tokenize' words (AddiToken : tokens)
-- binder tokens start with '/'
tokenize' (('/' : binder) : words) tokens
= tokenize' words ((BinderToken binder) : tokens)
-- if nothing matches it is an identifier
tokenize' (name : words) tokens
= tokenize' words ((IdentifierToken name) : tokens)
With this code I'm able to tokenize the different reserved keywords of the GML language, binder and identifier (these two handle variable access).
Now the problem: How to write the code to tokenize numbers (integer and double) and arrays and functions. I hope it is clear what is meant with numbers. Arrays and functions are embedded token sequences enclosed by "[" and "]" or "{" and "}".
Some examples:
tokenize' ["2", "2", "addi"] [] results in [IntToken 2, IntToken 2, AddiToken]
tokenize' ["2.0", "2.0", "addi"] [] results in [RealToken 2.0, RealToken 2.0, AddiToken]
tokenize' ["[", "foo", "/bar", "]"] results in [ArrayToken [IdentifierToken "foo", BinderToken "bar"]]
tokenize' ["{", "foo", "/bar", "}"] results in [FunctionToken [IdentifierToken "foo", BinderToken "bar"]]
-----Ursprüngliche Nachricht-----
Von: Vo Minh Thu [mailto:noteed@gmail.com]
Gesendet: Freitag, 25. Februar 2011 17:44
An: Hauschild, Klaus (EXT)
Cc: haskell-cafe@haskell.org
Betreff: Re: [Haskell-cafe] http://www.cs.cornell.edu/icfp/task.htm
2011/2/25 Hauschild, Klaus (EXT)
Hi,
Currently I'm trying to lern Haskell by doing. After doing some examples I plan to solve an ICFP task (see subject). In short: build an interpreter for a stack based language thata describes a 3D scene. Ray trace this scene in an image. My current source state can be found here: http://code.google.com/p/hgmltracer/source/browse/ My first goal is to develop the interpreter of the GML language. My source contains a data GmlToken with various constructors for the different operators, sequences, int, real, string an so on. Some code for the evaluation is also there und working. But the parser converting a string (contained in the program file provided as command line argument) is hard stuff and I'm stuck.
Can any one help me with ideas or concepts for this parser?
Hi, You have chosen to develop a very interesting program and I'm sure a lot of people could help you if you face any problem. But it would be easier for those people to help you if you could be a bit more specific about what works (or what you understand) and where you have some problem. In this case, maybe you could copy in the mail some code (possibly cutting it down to what really matter). For instance, I think central to your mail is this data structure: data GmlToken = IntToken Int | RealToken Double | BoolToken Bool | StringToken String | FunctionToken [GmlSequence] | ArrayTokenToken [GmlSequence] | BinderToken String | IdentifierToken String | AddiToken and the function parse :: [String] -> TokenSequence -> TokenSequence parse = ... At first view I would say that parse should be (and I call it tokenize) tokenize :: String -> [GmlToken] Now, maybe you can provide some sample inputs *with* their expected output and tell us what is really a problem for you. Alternatively I can assume too much (or not enough) things from your mail and you can enlighten me :) Cheers, Thu

Hello,
From what I see here, you can use a well-known technique call descent recursive parser. The idee is to do exactly what you did but involving a call to some other function which should do some kind of sub-work.
Actually, you can see the pattern already in the code you provided; for instance
tokenize' (('/' : binder) : words) tokens
= tokenize' words ((BinderToken binder) : tokens)
would become something like
parse' ('/' : words) tokens
= parse' rest (t : tokens)
where (t, rest) = parseBinder words -- parseBinder should parse
the input that follows '/'
so for an array:
parse' ('[' : words) tokens
= parse' rest (t : tokens)
where (t, rest) = parseArray words -- parseArray should parse the
input that follows '[', until and including ']'
Remarks:
- You see that the parseX that parses some X returns t, the 'token' we
are interested in, and also the rest of the input, i.e. the part not
consumed by parseX.
- tokenize is named parse: parseArray for instance doesn't return a
list of tokens, but a single thing that represent a complete array,
including its elements.
- because of the first point, you should define 'parse' as returning
([GmlToken], String). The return type has the same shape as your
sub-parser: (<return-type>, <unconsumed-input>). So do not pass the
'tokens' argument in the recursive call, but let the caller make the
(:) append. This way, instead of doing an append, you can make
something more useful before returning the <return-type>, like
building a nice representation of an array.
After you have managed to digest my no-so-well explained prose, have a
read at monadic parsing. I believe Philip Wadler has a nice
introductory paper about that. This is exactly what you need. Then,
once you understood that, have a look at Parsec, a standard monadic
parsing library for Haskell. (You don't need to read about monadic
parsing to do your project, but it happens to be exactly what my
answer is all about).
HTH,
Thu
2011/2/28 Hauschild, Klaus (EXT)
Hi Thu,
You read my mind. Ok, for the details. Here are my data structur for the differen tokens (currently not complete):
data GmlToken = IntToken Int | RealToken Double | BoolToken Bool | StringToken String | FunctionToken TokenSequence | ArrayToken TokenSequence | BinderToken String | IdentifierToken String | AddiToken
type TokenSequence = [GmlToken] type TokenStack = [GmlToken]
Now I'm working on the tokenize method. There a public and a private version (indicated by ')
tokenize :: String → TokenSequence -- foreward to intern method tokenize program = tokenize' (words program) []
tokenize' :: [String] → TokenSequence → TokenSequence -- last recursion step tokenize' [] tokens = tokens -- "addi" is a reserved keyword for the AddiToken tokenize' ("addi" : words) tokens = tokenize' words (AddiToken : tokens) -- binder tokens start with '/' tokenize' (('/' : binder) : words) tokens = tokenize' words ((BinderToken binder) : tokens) -- if nothing matches it is an identifier tokenize' (name : words) tokens = tokenize' words ((IdentifierToken name) : tokens)
With this code I'm able to tokenize the different reserved keywords of the GML language, binder and identifier (these two handle variable access). Now the problem: How to write the code to tokenize numbers (integer and double) and arrays and functions. I hope it is clear what is meant with numbers. Arrays and functions are embedded token sequences enclosed by "[" and "]" or "{" and "}". Some examples:
tokenize' ["2", "2", "addi"] [] results in [IntToken 2, IntToken 2, AddiToken] tokenize' ["2.0", "2.0", "addi"] [] results in [RealToken 2.0, RealToken 2.0, AddiToken]
tokenize' ["[", "foo", "/bar", "]"] results in [ArrayToken [IdentifierToken "foo", BinderToken "bar"]]
tokenize' ["{", "foo", "/bar", "}"] results in [FunctionToken [IdentifierToken "foo", BinderToken "bar"]]
-----Ursprüngliche Nachricht----- Von: Vo Minh Thu [mailto:noteed@gmail.com] Gesendet: Freitag, 25. Februar 2011 17:44 An: Hauschild, Klaus (EXT) Cc: haskell-cafe@haskell.org Betreff: Re: [Haskell-cafe] http://www.cs.cornell.edu/icfp/task.htm
2011/2/25 Hauschild, Klaus (EXT)
: Hi,
Currently I'm trying to lern Haskell by doing. After doing some examples I plan to solve an ICFP task (see subject). In short: build an interpreter for a stack based language thata describes a 3D scene. Ray trace this scene in an image. My current source state can be found here: http://code.google.com/p/hgmltracer/source/browse/ My first goal is to develop the interpreter of the GML language. My source contains a data GmlToken with various constructors for the different operators, sequences, int, real, string an so on. Some code for the evaluation is also there und working. But the parser converting a string (contained in the program file provided as command line argument) is hard stuff and I'm stuck.
Can any one help me with ideas or concepts for this parser?
Hi,
You have chosen to develop a very interesting program and I'm sure a lot of people could help you if you face any problem. But it would be easier for those people to help you if you could be a bit more specific about what works (or what you understand) and where you have some problem.
In this case, maybe you could copy in the mail some code (possibly cutting it down to what really matter).
For instance, I think central to your mail is this data structure:
data GmlToken = IntToken Int | RealToken Double | BoolToken Bool | StringToken String | FunctionToken [GmlSequence] | ArrayTokenToken [GmlSequence] | BinderToken String | IdentifierToken String | AddiToken
and the function
parse :: [String] -> TokenSequence -> TokenSequence parse = ...
At first view I would say that parse should be (and I call it tokenize)
tokenize :: String -> [GmlToken]
Now, maybe you can provide some sample inputs *with* their expected output and tell us what is really a problem for you.
Alternatively I can assume too much (or not enough) things from your mail and you can enlighten me :)
Cheers, Thu

Hi Klaus, from your code I cannot tell where exactly you're stuck. A general hint may be to have a look at the parsec library. On 02/25/2011 09:51 AM, Hauschild, Klaus (EXT) wrote:
Hi, Currently I'm trying to lern Haskell by doing. After doing some examples I plan to solve an ICFP task (see subject). In short: build an interpreter for a stack based language thata describes a 3D scene. Ray trace this scene in an image. My current source state can be found here: _http://code.google.com/p/hgmltracer/source/browse/_ My first goal is to develop the interpreter of the GML language. My source contains a data GmlToken with various constructors for the different operators, sequences, int, real, string an so on. Some code for the evaluation is also there und working. But the parser converting a string (contained in the program file provided as command line argument) is hard stuff and I'm stuck. Can any one help me with ideas or concepts for this parser? Thanks
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (3)
-
Hauschild, Klaus (EXT)
-
Tobias Schoofs
-
Vo Minh Thu