
vim: set ft=lhaskell: On Mon, Jul 02, 2007 at 02:25:57PM -0700, Gregory Propf wrote: | As a programming exercise I'm trying to use the State monad to create | a simple parser. It's for a very simple assembly language for a | simple virtual machine. The state is a string of instructions. I | want to be able to call something like getNextInstruction to pull out | the next instruction and then update the state (string). I know I can | do this non-monadically by just passing the string explicitly each | time but I'd like to learn more about the State monad. I also know | about Parsec and Happy and so forth but this is just an exercise so I | want to do it this way. Any ideas? I can't seem to get anything to | work. I've tried different things but I suspect I'm just missing | something basic. Can someone post a simple prototype for this? Just | assume the instructions are integers. For an example, here is the simplest parser type I know of; LL(1) a la Crenshaw. Our parser is simply a stateful computation using the rest of the input.
import Char import Control.Monad.State
type P = State [Char]
Simple primitives. We need to be able to see what the next character is. Notice that we return Maybe because there might not be a next character. Also note the use of the State data constructor to modify the value and return at the same time.
look :: P (Maybe Char) look = State $ \ st -> case st of [] -> (Nothing, []) (c:cs) -> (Just c, c:cs)
We need to do tests occasionally on the values.
isDigit' = maybe False isDigit digitToInt' = maybe 0 digitToInt
getc is similar, but it removes the character. This is typically done after making a decision based on look.
getc :: P (Maybe Char) getc = State $ \ st -> case st of [] -> (Nothing, []) (c:cs) -> (Just c, cs)
If we find inconsistent input, we signal a fatal error using the fail function already defined for State. A more featureful monad such as ErrorT (State [Char]) could be used to make error conditions non-fatal for the program. Often, we know what the lookahead will be; we can use this for better error messages. (We should also store the text position in the state, but for pedagogical reasons I will ignore that). For instance, if we are expecting something but don't find it, we use expected:
expected str = do context <- gets (show . take 20) fail $ str ++ " expected; found " ++ context ++ " instead"
Skipping whitespace is very important to handle our lines. Note that we also handle #-comments.
white = look >>= \ch -> case ch of Just '#' -> line >> white Just ' ' -> getc >> white Just '\t' -> getc >> white _ -> return ()
line = look >>= \ch -> case ch of Just '\n' -> return () Nothing -> return () _ -> getc >> line
Another common pattern is to skip over a noise character while verifying that it was correct. Notice the use of expected to handle error message formatting. Also note that we skip whitespace afterward - from here on we will maintain the invariant that the cursor does not point to spaces.
match ch = look >>= \realch -> if (realch == Just ch) then getc else expected (show ch)
Parsing integers is one of the problem requirements; it is handled by dispatching on the first character, then parsing a natural.
number = look >>= \ch -> case ch of Just '+' -> match '+' >> natural 0 Just '-' -> match '-' >> negate `fmap` natural 0 _ -> natural 0
Naturals can be handled using a simple loop. Notice that we check lookahead at the *end*, this is necessary to avoid parsing eg "xx" as a natural 0. We use an accumulating parameter.
natural acc = look >>= \ch -> do unless (isDigit' ch) $ expected "number"
getc let acc' = digitToInt' ch + acc * 10
look >>= \lk -> if isDigit' lk then natural acc' else white >> return acc'
A line of input to our assembler consists of horizontal whitespace, an optional number, more horizontal whitespace, and a newline.
inputLine = do white look >>= \ch -> if isDigit' ch then do number white else return () match '\n'
All our input consists of input lines for as long as there is more.
input = look >>= maybe (return ()) (\_ -> inputLine >> input)
main = interact $ show . runState input
Stefan