Very simple parser

Gregory Propf

2 Jul 2007 2 Jul '07

9:25 p.m.

As a programming exercise I'm trying to use the State monad to create a simple parser. It's for a very simple assembly language for a simple virtual machine. The state is a string of instructions. I want to be able to call something like getNextInstruction to pull out the next instruction and then update the state (string). I know I can do this non-monadically by just passing the string explicitly each time but I'd like to learn more about the State monad. I also know about Parsec and Happy and so forth but this is just an exercise so I want to do it this way. Any ideas? I can't seem to get anything to work. I've tried different things but I suspect I'm just missing something basic. Can someone post a simple prototype for this? Just assume the instructions are integers. ____________________________________________________________________________________ Get the Yahoo! toolbar and be alerted to new email wherever you're surfing. http://new.toolbar.yahoo.com/toolbar/features/mail/index.php

Attachments:

attachment.html (text/html — 1.2 KB)

Show replies by date

Arie Peterson

2 Jul 2 Jul

9:46 p.m.

Gregory Propf wrote:

...

As a programming exercise I'm trying to use the State monad to create a simple parser. It's for a very simple assembly language for a simple virtual machine. The state is a string of instructions. I want to be able to call something like getNextInstruction to pull out the next instruction and then update the state (string). I know I can do this non-monadically by just passing the string explicitly each time but I'd like to learn more about the State monad. I also know about Parsec and Happy and so forth but this is just an exercise so I want to do it this way. Any ideas? I can't seem to get anything to work. I've tried different things but I suspect I'm just missing something basic. Can someone post a simple prototype for this? Just assume the instructions are integers.

Did you look at the documentation for the State monad? http://haskell.org/ghc/docs/latest/html/libraries/mtl/Control-Monad-State-La... It also contains some examples. Getting the next instruction sounds like a job for 'get'; to remove it from the state you might use 'put' or 'modify'. If this doesn't help, can you show something you tried? Greetings, Arie

Stefan O'Rear

10:12 p.m.

vim: set ft=lhaskell: On Mon, Jul 02, 2007 at 02:25:57PM -0700, Gregory Propf wrote: | As a programming exercise I'm trying to use the State monad to create | a simple parser. It's for a very simple assembly language for a | simple virtual machine. The state is a string of instructions. I | want to be able to call something like getNextInstruction to pull out | the next instruction and then update the state (string). I know I can | do this non-monadically by just passing the string explicitly each | time but I'd like to learn more about the State monad. I also know | about Parsec and Happy and so forth but this is just an exercise so I | want to do it this way. Any ideas? I can't seem to get anything to | work. I've tried different things but I suspect I'm just missing | something basic. Can someone post a simple prototype for this? Just | assume the instructions are integers. For an example, here is the simplest parser type I know of; LL(1) a la Crenshaw. Our parser is simply a stateful computation using the rest of the input.

...

import Char import Control.Monad.State

...

type P = State [Char]

Simple primitives. We need to be able to see what the next character is. Notice that we return Maybe because there might not be a next character. Also note the use of the State data constructor to modify the value and return at the same time.

...

look :: P (Maybe Char) look = State $ \ st -> case st of [] -> (Nothing, []) (c:cs) -> (Just c, c:cs)

We need to do tests occasionally on the values.

...

isDigit' = maybe False isDigit digitToInt' = maybe 0 digitToInt

getc is similar, but it removes the character. This is typically done after making a decision based on look.

...

getc :: P (Maybe Char) getc = State $ \ st -> case st of [] -> (Nothing, []) (c:cs) -> (Just c, cs)

If we find inconsistent input, we signal a fatal error using the fail function already defined for State. A more featureful monad such as ErrorT (State [Char]) could be used to make error conditions non-fatal for the program. Often, we know what the lookahead will be; we can use this for better error messages. (We should also store the text position in the state, but for pedagogical reasons I will ignore that). For instance, if we are expecting something but don't find it, we use expected:

...

expected str = do context <- gets (show . take 20) fail $ str ++ " expected; found " ++ context ++ " instead"

Skipping whitespace is very important to handle our lines. Note that we also handle #-comments.

...

white = look >>= \ch -> case ch of Just '#' -> line >> white Just ' ' -> getc >> white Just '\t' -> getc >> white _ -> return ()

line = look >>= \ch -> case ch of Just '\n' -> return () Nothing -> return () _ -> getc >> line

Another common pattern is to skip over a noise character while verifying that it was correct. Notice the use of expected to handle error message formatting. Also note that we skip whitespace afterward - from here on we will maintain the invariant that the cursor does not point to spaces.

...

match ch = look >>= \realch -> if (realch == Just ch) then getc else expected (show ch)

Parsing integers is one of the problem requirements; it is handled by dispatching on the first character, then parsing a natural.

...

number = look >>= \ch -> case ch of Just '+' -> match '+' >> natural 0 Just '-' -> match '-' >> negate `fmap` natural 0 _ -> natural 0

Naturals can be handled using a simple loop. Notice that we check lookahead at the *end*, this is necessary to avoid parsing eg "xx" as a natural 0. We use an accumulating parameter.

...

natural acc = look >>= \ch -> do unless (isDigit' ch) $ expected "number"

getc let acc' = digitToInt' ch + acc * 10

look >>= \lk -> if isDigit' lk then natural acc' else white >> return acc'

A line of input to our assembler consists of horizontal whitespace, an optional number, more horizontal whitespace, and a newline.

...

inputLine = do white look >>= \ch -> if isDigit' ch then do number white else return () match '\n'

All our input consists of input lines for as long as there is more.

...

input = look >>= maybe (return ()) (\_ -> inputLine >> input)

...

main = interact $ show . runState input

Stefan

Hugh Perkins

10:46 p.m.

Graham Hutton has some great tutorials on parsing. Check out the "Are parsers monodic?" thread (not exact name) for a good reference. There's also a good tutorial at http://www.cs.nott.ac.uk/~gmh/book.html In Section "Slides", click on "8 Functional parsers", but you may just want to start from 1. They're really quick and painless. Graham Hutton's tutorials are about the only tutorials on monads that make sense to me. YMMV of course. Other than that... a list is an instance of State, I think (?), so you can do something like (writing this in directly, without trying to compile): processor :: State a processor = do value <- gets head case value of "blah" -> return blah "foo" -> return foo dotest = evalState( processor )["blah","foo"] Note that I'm a total newbie, and I didnt check this compiles (almost certainly doesnt) so take this with a wodge of salt I cant say I really like the way I have a case that selects on strings to decide which function to call. If someone knows a more elegant/spelling-safe way of doing this that'd be really useful generally. For example something like this could be more spelling safe (if it worked) (maybe it does???): case value of (showConstr $ toConstr $ blah) -> return blah (showConstr $ toConstr $ foo) -> return foo

6575

Age (days ago)

6575

Last active (days ago)

List overview

Download

3 comments

4 participants

participants (4)

Arie Peterson
Gregory Propf
Hugh Perkins
Stefan O'Rear