Haskell Quiz Solution - Haskell Newbie Requesting Review

As part of the Ruby Quiz in Haskell solutions appearing on the wiki recently, I just a solution to Ruby Quiz #100 - create a bytecode interpreter for a simple expression language. Like I said, the code below parses simple integer arithmetic expressions and generates byte codes for a hypothetical stack-based intepreter that would evaluate the expressions. To run it, save it as a literate haskell file and run "interpret_tests". That just shows correctness, though. Other output can be obtained by running "compile_tests" (shows bytes for all tests), "generate_tests" (symbolic bytecodes for all tests), and "eval_tests" (evaluate ASTs for all tests). To see the AST generated for a example expression, try something like 'parse "2-2-2"'. I'm just learning Haskell (about a month in) and if anyone has time and desire to critique the code below, I'd love to hear it. I come from an OOP (primarily C# & Ruby) background, so I'm really interested in getting a handle on the functional/Haskell "way" of coding. Thanks for any feedback! Justin p.s. This code is also available on the wiki: http://www.haskell.org/haskellwiki/Haskell_Quiz/Bytecode_Compiler/Solution_J... p.p.s. The original ruby quiz is available at: http://www.rubyquiz.com/quiz100.html \begin{code} import Text.ParserCombinators.Parsec hiding (parse) import qualified Text.ParserCombinators.Parsec as P (parse) import Text.ParserCombinators.Parsec.Expr import Data.Bits import Data.Int -- Represents various operations that can be applied -- to expressions. data Op = Plus | Minus | Mult | Div | Pow | Mod | Neg deriving (Show, Eq) -- Represents expression we can build - either numbers or expressions -- connected by operators. This structure is the basis of the AST built -- when parsing data Expression = Statement Op Expression Expression | Val Integer | Empty deriving (Show) -- Define the byte codes that can be generated. data Bytecode = NOOP | CONST Integer | LCONST Integer | ADD | SUB | MUL | POW | DIV | MOD | SWAP deriving (Show) -- Using imported Parsec.Expr library, build a parser for expressions. expr :: Parser Expression expr = buildExpressionParser table factor <?> "expression" where -- Recognizes a factor in an expression factor = do{ char '(' ; x <- expr ; char ')' ; return x } <|> number <?> "simple expression" -- Recognizes a number number :: Parser Expression number = do{ ds <- many1 digit ; return (Val (read ds)) } > "number" -- Specifies operator, associativity, precendence, and constructor to execute -- and built AST with. table = [[prefix "-" (Statement Mult (Val (-1)))], [binary "^" (Statement Pow) AssocRight], [binary "*" (Statement Mult) AssocLeft, binary "/" (Statement Div) AssocLeft, binary "%" (Statement Mod) AssocLeft], [binary "+" (Statement Plus) AssocLeft, binary "-" (Statement Minus) AssocLeft] ] where binary s f assoc = Infix (do{ string s; return f}) assoc prefix s f = Prefix (do{ string s; return f}) -- Parses a string into an AST, using the parser defined above parse s = case P.parse expr "" s of Right ast -> ast Left e -> error $ show e -- Take AST and evaluate (mostly for testing) eval (Val n) = n eval (Statement op left right) | op == Mult = eval left * eval right | op == Minus = eval left - eval right | op == Plus = eval left + eval right | op == Div = eval left `div` eval right | op == Pow = eval left ^ eval right | op == Mod = eval left `mod` eval right -- Takes an AST and turns it into a byte code list generate stmt = generate' stmt [] where generate' (Statement op left right) instr = let li = generate' left instr ri = generate' right instr lri = li ++ ri in case op of Plus -> lri ++ [ADD] Minus -> lri ++ [SUB] Mult -> lri ++ [MUL] Div -> lri ++ [DIV] Mod -> lri ++ [MOD] Pow -> lri ++ [POW] generate' (Val n) instr = if abs(n) > 32768 then LCONST n : instr else CONST n : instr -- Takes a statement and converts it into a list of actual bytes to -- be interpreted compile s = toBytes (generate $ parse s) -- Convert a list of byte codes to a list of integer codes. If LCONST or CONST -- instruction are seen, correct byte representantion is produced toBytes ((NOOP):xs) = 0 : toBytes xs toBytes ((CONST n):xs) = 1 : (toConstBytes (fromInteger n)) ++ toBytes xs toBytes ((LCONST n):xs) = 2 : (toLConstBytes (fromInteger n)) ++ toBytes xs toBytes ((ADD):xs) = 0x0a : toBytes xs toBytes ((SUB):xs) = 0x0b : toBytes xs toBytes ((MUL):xs) = 0x0c : toBytes xs toBytes ((POW):xs) = 0x0d : toBytes xs toBytes ((DIV):xs) = 0x0e : toBytes xs toBytes ((MOD):xs) = 0x0f : toBytes xs toBytes ((SWAP):xs) = 0x0a : toBytes xs toBytes [] = [] -- Convert number to CONST representation (2 element list) toConstBytes n = toByteList 2 n toLConstBytes n = toByteList 4 n -- Convert a number into a list of 8-bit bytes (big-endian/network byte order). -- Make sure final list is size elements long toByteList :: Bits Int => Int -> Int -> [Int] toByteList size n = reverse $ take size (toByteList' n) where toByteList' a = (a .&. 255) : toByteList' (a `shiftR` 8) -- All tests defined by the quiz, with the associated values they should evaluate to. test1 = [(2+2, "2+2"), (2-2, "2-2"), (2*2, "2*2"), (2^2, "2^2"), (2 `div` 2, "2/2"), (2 `mod` 2, "2%2"), (3 `mod` 2, "3%2")] test2 = [(2+2+2, "2+2+2"), (2-2-2, "2-2-2"), (2*2*2, "2*2*2"), (2^2^2, "2^2^2"), (4 `div` 2 `div` 2, "4/2/2"), (7`mod`2`mod`1, "7%2%1")] test3 = [(2+2-2, "2+2-2"), (2-2+2, "2-2+2"), (2*2+2, "2*2+2"), (2^2+2, "2^2+2"), (4 `div` 2+2, "4/2+2"), (7`mod`2+1, "7%2+1")] test4 = [(2+(2-2), "2+(2-2)"), (2-(2+2), "2-(2+2)"), (2+(2*2), "2+(2*2)"), (2*(2+2), "2*(2+2)"), (2^(2+2), "2^(2+2)"), (4 `div` (2+2), "4/(2+2)"), (7`mod`(2+1), "7%(2+1)")] test5 = [(-2+(2-2), "-2+(2-2)"), (2-(-2+2), "2-(-2+2)"), (2+(2 * -2), "2+(2*-2)")] test6 = [((3 `div` 3)+(8-2), "(3/3)+(8-2)"), ((1+3) `div` (2 `div` 2)*(10-8), "(1+3)/(2/2)*(10-8)"), ((1*3)*4*(5*6), "(1*3)*4*(5*6)"), ((10`mod`3)*(2+2), "(10%3)*(2+2)"), (2^(2+(3 `div` 2)^2), "2^(2+(3/2)^2)"), ((10 `div` (2+3)*4), "(10/(2+3)*4)"), (5+((5*4)`mod`(2+1)), "5+((5*4)%(2+1))")] -- Evaluates the tests and makes sure the expressions match the expected values eval_tests = concat $ map eval_tests [test1, test2, test3, test4, test5, test6] where eval_tests ((val, stmt):ts) = let eval_val = eval $ parse stmt in if val == eval_val then ("Passed: " ++ stmt) : eval_tests ts else ("Failed: " ++ stmt ++ "(" ++ show eval_val ++ ")") : eval_tests ts eval_tests [] = [] -- Takes all the tests and displays symbolic bytes codes for each generate_tests = concat $ map generate_all [test1,test2,test3,test4,test5,test6] where generate_all ((val, stmt):ts) = (stmt, generate (parse stmt)) : generate_all ts generate_all [] = [] -- Takes all tests and generates a list of bytes representing them compile_tests = concat $ map compile_all [test1,test2,test3,test4,test5,test6] where compile_all ((val, stmt):ts) = (stmt, compile stmt) : compile_all ts compile_all [] = [] interpret_tests = concat $ map f' [test1, test2, test3, test4, test5, test6] where f' tests = map f'' tests f'' (expected, stmt) = let value = fromIntegral $ interpret [] $ compile stmt in if value == expected then "Passed: " ++ stmt else "Failed: " ++ stmt ++ "(" ++ (show value) ++ ")" fromBytes n xs = let int16 = (fromIntegral ((fromIntegral int32) :: Int16)) :: Int int32 = byte xs byte xs = foldl (\accum byte -> (accum `shiftL` 8) .|. (byte)) (head xs) (take (n - 1) (tail xs)) in if n == 2 then int16 else int32 interpret [] [] = error "no result produced" interpret (s1:s) [] = s1 interpret s (o:xs) | o < 10 = interpret ((fromBytes (o*2) xs):s) (drop (o*2) xs) interpret (s1:s2:s) (o:xs) | o == 16 = interpret (s2:s1:s) xs | otherwise = interpret (((case o of 10 -> (+); 11 -> (-); 12 -> (*); 13 -> (^); 14 -> div; 15 -> mod) s2 s1):s) xs \end{code}

Justin Bailey wrote:
As part of the Ruby Quiz in Haskell solutions appearing on the wiki recently, I just a solution to Ruby Quiz #100 - create a bytecode interpreter for a simple expression language.
Like I said, the code below parses simple integer arithmetic expressions and generates byte codes for a hypothetical stack-based intepreter that would evaluate the expressions. To run it, save it as a literate haskell file and run "interpret_tests". That just shows correctness, though. Other output can be obtained by running "compile_tests" (shows bytes for all tests), "generate_tests" (symbolic bytecodes for all tests), and "eval_tests" (evaluate ASTs for all tests).
To see the AST generated for a example expression, try something like 'parse "2-2-2"'.
I'm just learning Haskell (about a month in) and if anyone has time and desire to critique the code below, I'd love to hear it. I come from an OOP (primarily C# & Ruby) background, so I'm really interested in getting a handle on the functional/Haskell "way" of coding. Thanks for any feedback!
I haven't had time to look over your code, but this reminds me of a fun paper, "A type-correct, stack-safe, provably correct expression compiler in Epigram" The code consists of an expression interpreter, a stack machine emulator, a compiler, and a proof that forall expr, evaluate exper == execute (compile expr). (proofs are functions - go Curry-Howard) You can find the paper at http://www.cs.nott.ac.uk/~jjw/ Brandon

Justin Bailey wrote:
As part of the Ruby Quiz in Haskell solutions appearing on the wiki recently, I just a solution to Ruby Quiz #100 - create a bytecode interpreter for a simple expression language.
Like I said, the code below parses simple integer arithmetic expressions and generates byte codes for a hypothetical stack-based intepreter that would evaluate the expressions. To run it, save it as a literate haskell file and run "interpret_tests". That just shows correctness, though. Other output can be obtained by running "compile_tests" (shows bytes for all tests), "generate_tests" (symbolic bytecodes for all tests), and "eval_tests" (evaluate ASTs for all tests).
To see the AST generated for a example expression, try something like 'parse "2-2-2"'.
I'm just learning Haskell (about a month in) and if anyone has time and desire to critique the code below, I'd love to hear it. I come from an OOP (primarily C# & Ruby) background, so I'm really interested in getting a handle on the functional/Haskell "way" of coding. Thanks for any feedback!
Looks nice, especially if you're just getting started. The overall structure looks good, I've just made a bunch of little changes to the details. Mostly I found repeated patterns to replace with library functions or extract as helper functions. In particular, you often wrote fun (x:xs) = <stuff with x> : fun xs fun [] = [] instead of fun xs = map <stuff> xs Getting a little fancier, defining the fold over your expression type captures the recursion pattern in eval and generate. It's fairly handy for defining constant folding too, if you want that. Haskell is very easy to refactor. You pull out some functions, the code shrinks, you see more subtle patterns, you pull out more functions. Eventually you start noticing and reifying patterns between those functions, like a star off the main sequence burning the products of the last round of fusion, until finally the "degeneracy pressure" or fixed boilerplate of introducing and calling a new abstraction stops the collapse. And Haskell has very little syntactic overhead, no classes, keywords on function definitions, or even block delimiters - it's like neutronium instead of that bloated electron-degenerate matter :) I wonder, what's the programming equivalent of a black hole? Happy Hacking Brandon \begin{code} import Text.ParserCombinators.Parsec hiding (parse) import qualified Text.ParserCombinators.Parsec as P (parse) import Text.ParserCombinators.Parsec.Expr import Data.Bits import Data.Int -- Represents various operations that can be applied -- to expressions. data Op = Plus | Minus | Mult | Div | Pow | Mod | Neg deriving (Show, Eq) -- Represents expression we can build - either numbers or expressions -- connected by operators. This structure is the basis of the AST built -- when parsing data Expression = Val Integer | Statement Op Expression Expression deriving (Show) foldExpression val stmt = f where f (Val n) = val n f (Statement op l r) = stmt op (f l) (f r) -- Define the byte codes that can be generated. data Bytecode = NOOP | CONST Integer | LCONST Integer | ADD | SUB | MUL | POW | DIV | MOD | SWAP deriving (Show) -- Using imported Parsec.Expr library, build a parser for expressions. expr :: Parser Expression expr = buildExpressionParser table factor <?> "expression" where -- Recognizes a factor in an expression factor = between (char '(') (char ')') expr <|> number <?> "simple expression" -- Recognizes a number number = fmap (Val . read) (many1 digit) > "number" -- Specifies operator, associativity, precendence, and constructor to execute -- and built AST with. table = [[prefix "-" (Statement Mult (Val (-1)))], [binary "^" Pow AssocRight], [binary "*" Mult AssocLeft, binary "/" Div AssocLeft, binary "%" Mod AssocLeft], [binary "+" Plus AssocLeft, binary "-" Minus AssocLeft]] where binary s op assoc = Infix (do{ string s; return (Statement op)}) assoc prefix s f = Prefix (do{ string s; return f}) -- Parses a string into an AST, using the parser defined above parse s = case P.parse expr "" s of Right ast -> ast Left e -> error $ show e -- Take AST and evaluate (mostly for testing) eval = foldExpression id evalOp evalOp op = case op of Mult -> (*) Minus -> (-) Plus -> (+) Div -> div Pow -> (^) Mod -> mod -- Takes an AST and turns it into a byte code list generate = foldExpression generateVal (\op l r -> l ++ r ++ generateOp op) where generateVal n = if abs n > 2^(2*8)-1 then [CONST n] else [LCONST n] generateOp op = case op of Plus -> [ADD] Minus -> [SUB] Mult -> [MUL] Div -> [DIV] Mod -> [MOD] Pow -> [POW] -- Takes a statement and converts it into a list of actual bytes to -- be interpreted compile s = toBytes (generate $ parse s) -- Convert a list of byte codes to a list of integer codes. If LCONST or CONST -- instruction are seen, correct byte representantion is produced toBytes xs = concatMap instToBytes xs instToBytes instr = case instr of NOOP -> [0] (CONST n) -> 1 : toByteList 2 (fromInteger n) (LCONST n) -> 2 : toByteList 4 (fromInteger n) ADD -> [0x0a] SUB -> [0x0b] MUL -> [0x0c] POW -> [0x0d] DIV -> [0x0e] MOD -> [0x0f] SWAP -> [0x0a] -- Convert a number into a list of 8-bit bytes (big-endian/network byte order). -- Make sure final list is size elements long toByteList :: Int -> Int -> [Int] toByteList size n = reverse $ take size (toByteList' n) where toByteList' a = (a .&. 255) : toByteList' (a `shiftR` 8) fromBytes xs = foldl (\accum byte -> (accum `shiftL` 8) .|. byte) 0 xs -- The stack machine for binary bytecodes interpret [] [] = error "no result produced" interpret (s1:s) [] = s1 interpret s (o:xs) | o < 10 = interpret (fromBytes (take (o*2) xs):s) (drop (o*2) xs) interpret (s1:s2:s) (o:xs) | o == 16 = interpret (s2:s1:s) xs | otherwise = interpret (evalOpCode o s2 s1:s) xs evalOpCode o = case o of 10 -> (+) 11 -> (-) 12 -> (*) 13 -> (^) 14 -> div 15 -> mod -- All tests defined by the quiz, with the associated values they should evaluate to. test1 = [(2+2, "2+2"), (2-2, "2-2"), (2*2, "2*2"), (22, "22"), (2 `div` 2, "2/2"), (2 `mod` 2, "2%2"), (3 `mod` 2, "3%2")] test2 = [(2+2+2, "2+2+2"), (2-2-2, "2-2-2"), (2*2*2, "2*2*2"), (22^2, "22^2"), (4 `div` 2 `div` 2, "4/2/2"), (7`mod`2`mod`1, "7%2%1")] test3 = [(2+2-2, "2+2-2"), (2-2+2, "2-2+2"), (2*2+2, "2*2+2"), (22+2, "22+2"), (4 `div` 2+2, "4/2+2"), (7`mod`2+1, "7%2+1")] test4 = [(2+(2-2), "2+(2-2)"), (2-(2+2), "2-(2+2)"), (2+(2*2), "2+(2*2)"), (2*(2+2), "2*(2+2)"), (2^(2+2), "2^(2+2)"), (4 `div` (2+2), "4/(2+2)"), (7`mod`(2+1), "7%(2+1)")] test5 = [(-2+(2-2), "-2+(2-2)"), (2-(-2+2), "2-(-2+2)"), (2+(2*(-2)), "2+(2*-2)")] test6 = [((3 `div` 3)+(8-2), "(3/3)+(8-2)"), ((1+3) `div` (2 `div` 2)*(10-8), "(1+3)/(2/2)*(10-8)"), ((1*3)*4*(5*6), "(1*3)*4*(5*6)"), ((10`mod`3)*(2+2), "(10%3)*(2+2)"), -- (2^(2+(3 `div` 2)2), "2^(2+(3/2)2)"), -- maybe *2 at the end? ((10 `div` (2+3)*4), "(10/(2+3)*4)"), (5+((5*4)`mod`(2+1)), "5+((5*4)%(2+1))")] suite = [test1, test2, test3, test4, test5, test6] suiteResults doCase = [doCase val stmt | batch <- suite, (val, stmt) <- batch] checkResult fun expected arg | result == expected = "Passed: " ++ show arg | otherwise = "Failed: " ++ show arg ++ "(" ++ show result ++ ")" where result = fun arg showResult fun _ arg = (arg, fun arg) -- Evaluates the tests and makes sure the expressions match the expected values eval_tests = suiteResults (checkResult (eval . parse)) -- Takes all the tests and displays symbolic bytes codes for each generate_tests = suiteResults (showResult (generate . parse)) -- Takes all tests and generates a list of bytes representing them compile_tests = suiteResults (showResult compile) -- Execute the binary bytecode for each test, and make sure the results are expected interpret_tests = suiteResults (checkResult (fromIntegral . interpret [] . compile)) \end{code}

On 11/9/06, Brandon Moore
Looks nice, especially if you're just getting started.
The overall structure looks good, I've just made a bunch of little changes to the details. Mostly I found repeated patterns to replace with library functions or extract as helper functions.
Thanks very much! I really appreciate you taking the time to look through this code and perform the refactoring you did. Now, I hope you don't mind me asking a lot of questions about it :)
Getting a little fancier, defining the fold over your expression type captures the recursion pattern in eval and generate. It's fairly handy for defining constant folding too, if you want that.
Do you have any tips for recognizing these patterns? Its still hard for me to see them. Is there a general way to think of them? Comparing the two code pieces, I can see how the structure of the recursion was similar, but not the same. Is there a "pattern" for which pieces are common and which are unique? For example, I can think of foldl as "folding" a function over a list, with a given base case. Is there something similar for thinking about recursion?
I wonder, what's the programming equivalent of a black hole?
To stretch the analogy to the breaking point, what about virtual particles and Hawking radiation? And what does the event horizon look like? LOL.
foldExpression val stmt = f where f (Val n) = val n f (Statement op l r) = stmt op (f l) (f r)
This is great. It took me a while to realize that 'val' is a function for translating values, and 'stmt' is for translating statements. Really cool!
number = fmap (Val . read) (many1 digit) > "number"
How is this working? I read it as 'map (Val (read)) (string)' ('map', because its applied the List version of fmap). Is that correct? How does 'read' get the string argument? I would assume read is evaluated, and then its result and the string would be passed as arguments to Val. Clearly that's not right - can you correct me?
-- Takes an AST and turns it into a byte code list generate = foldExpression generateVal (\op l r -> l ++ r ++ generateOp op) where generateVal n = if abs n > 2^(2*8)-1 then [CONST n] else [LCONST n] generateOp op = case op of Plus -> [ADD] Minus -> [SUB] Mult -> [MUL] Div -> [DIV] Mod -> [MOD] Pow -> [POW]
This is what clued me into how foldExpression was working. I especially like how the lambda works to generate the correct bytecode for the operator, and how "l" and "r" are already recursively evaluated by the "f" function returned from foldExpression. I just wonder how I'll ever spot similar patterns ;)
eval_tests = suiteResults (checkResult (eval . parse))
generate_tests = suiteResults (showResult (generate . parse))
interpret_tests = suiteResults (checkResult (fromIntegral . interpret [] . compile))
Above are all more examples of partial functions and function composition. I understand the first concept, but function composition escapes me somehow. What are the rules for partial functions getting arguments when they are eventually supplied? For example, in 'interpret_tests' I can see that the function (fromIntegral . interpret . compile) gets applied to the statement via 'checkResult', but it seems to me that fromIntegral should get teh argument (i.e. because I read it is '(fromIntegral (interpret (compile)))'). Clearly, I'm wrong. Do arguments get consumed by partially applied functions regardless of their "depth"? Thanks again for your time looking at this code and maybe even answering these questions. I've already learned a ton just seeing the refactor. Justin

Quoth Justin Bailey, nevermore,
Above are all more examples of partial functions and function composition. I understand the first concept, but function composition escapes me somehow. What are the rules for partial functions getting arguments when they are eventually supplied? For example, in 'interpret_tests' I can see that the function (fromIntegral . interpret . compile) gets applied to the statement via 'checkResult', but it seems to me that fromIntegral should get teh argument (i.e. because I read it is '(fromIntegral (interpret (compile)))'). Clearly, I'm wrong. Do arguments get consumed by partially applied functions regardless of their "depth"?
The operators (.) and ($) are used to join functions together, but in slightly different ways. Taking your example above, we would use ($) to obtain nested functions:
fromIntegral $ interpret $ compile == fromIntegral (interpret (compile))
As you noted that doesn't seem right --- how does compile capture its input? Well, the (.) operator is slightly different. It captures variables and passes them into the 'innermost' function, a bit like this:
f . g = \x -> f (g x)
In this respect you can treat 'f . g' as a single functional entity which takes the same number and type of functions as 'g' and return whatever type 'f' returns. As in the type signature:
(.) :: (b -> c) -> (a -> b) -> a -> c
If it helps, think of something like
map (f . g . h) xs
as identical to the following (although obviously much more succinct and orders of magnitude clearer)
map (f') xs where f' = \x -> f (g (h x))
Cheers,
D.
--
Dougal Stanton

On 11/10/06, Dougal Stanton
As you noted that doesn't seem right --- how does compile capture its input? Well, the (.) operator is slightly different. It captures variables and passes them into the 'innermost' function, a bit like this:
That is a great explanation. I've got a much better understanding of the operator now - thanks very much! Justin
participants (3)
-
Brandon Moore
-
Dougal Stanton
-
Justin Bailey