Please critique my code (a simple lexer)

Hi all, I've been teaching myself Haskell lately (I come from the C#/Python world). I wrote a simplistic lexer, and I was hoping I could get a code review or two. The code that follows is a stand-alone app that works under GHC. A few concerns of mine: - My `consume` function seems basic enough that it should be library code, but my searches turned up empty. Did I miss anything? - Is `case _ of x:xs -> x:xsr where xsr = something xs` a common idiom? It happened twice in my code, and it seems odd to split the first element away from the rest of the list as it's processed. - Is creating data structures with simple field names like `kind`, `offset`, etc a good practice? Since the names are global functions, I worry about namespace pollution, or stomping on functions defined elsewhere. Thanks in advance for anyone willing to take the time. -- code follows module Main where import qualified Data.Map as Map data Lexer = Lexer String makeLexer :: String -> Lexer makeLexer fn = Lexer fn data Loc = Loc {offset :: Int, line :: Int, column :: Int} locInc loc n = Loc (offset loc + n) (line loc) (column loc + n) locNL loc = Loc (offset loc + 1) (line loc + 1) 1 data TokenKind = Colon | RArrow1 | Def | Var | Identifier String | EOF deriving Show data Token = Token {lexer :: Lexer, loc :: Loc, kind :: TokenKind} idStart = ['a'..'z'] ++ ['A'..'Z'] ++ "!@$%^&*-_=+|<>/?" idNext = idStart ++ ['0'..'9'] ++ "'\"" namedTokens = Map.fromList [ ("def", Def), ("var", Var)] doLex :: Lexer -> String -> [Token] doLex lexer = doLex' lexer (Loc 0 1 1) doLex' lexer loc source = case source of [] -> [makeToken EOF] ' ':xs -> more (locInc loc 1) xs '\n':xs -> more (locNL loc) xs ':':xs -> makeToken Colon : more (locInc loc 1) xs '-':'>':xs -> makeToken RArrow1 : more (locInc loc 2) xs x:xs | x `elem` idStart -> makeToken kind : more (locInc loc $ length name) xsr where (namer, xsr) = consume idNext xs name = x:namer kind = maybe (Identifier name) id $ Map.lookup name namedTokens _ -> error "Invalid character in source" where makeToken = Token lexer loc more = doLex' lexer consume :: Eq a => [a] -> [a] -> ([a], [a]) consume want xs = case xs of x:xs | x `elem` want -> (x:xsr, rest) where (xsr, rest) = consume want xs _ -> ([], xs) main :: IO () main = do let toks = doLex (makeLexer "") "def x -> y" in putStrLn $ show $ map kind toks

John Simon, Tue 2012-05-22 @ 10:13:07-0500:
- My `consume` function seems basic enough that it should be library code, but my searches turned up empty. Did I miss anything?
consume = span . flip elem
- Is creating data structures with simple field names like `kind`, `offset`, etc a good practice? Since the names are global functions, I worry about namespace pollution, or stomping on functions defined elsewhere.
If you don't intend your module to be imported and used as a library, then there's no reason to worry about this. If you do intend it to be used that way, then it's probably still not worth worrying about, as name clashes can be resolved at the import level via qualified imports or `hiding` lists. If it ends up really being a problem, you can always add a namespace prefix to those names, though honestly I find that kind of ugly. The compiler will always catch cases of ambiguity caused by multiple definitions of the same name being in scope, so you don't have to worry about this causing inadvertent runtime bugs.

Another suggestion is to use pattern matching at the function level:
doLex' lexer loc [] = [makeToken EOF]
doLex' lexer loc (x:xs) = case x of
' ' -> more (locInc loc 1) xs
'\n' -> more (locNL loc) xs
...
_ ->
That saves you from having to deconstruct repeatedly in your case
statements.
You might also want to check out the excellent HLint (available on
hackage), which will give you even more suggestions.
On Tue, May 22, 2012 at 8:36 AM, Taylor Hedberg
John Simon, Tue 2012-05-22 @ 10:13:07-0500:
- My `consume` function seems basic enough that it should be library code, but my searches turned up empty. Did I miss anything?
consume = span . flip elem
- Is creating data structures with simple field names like `kind`, `offset`, etc a good practice? Since the names are global functions, I worry about namespace pollution, or stomping on functions defined elsewhere.
If you don't intend your module to be imported and used as a library, then there's no reason to worry about this. If you do intend it to be used that way, then it's probably still not worth worrying about, as name clashes can be resolved at the import level via qualified imports or `hiding` lists. If it ends up really being a problem, you can always add a namespace prefix to those names, though honestly I find that kind of ugly.
The compiler will always catch cases of ambiguity caused by multiple definitions of the same name being in scope, so you don't have to worry about this causing inadvertent runtime bugs.
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On 5/22/12 11:13 AM, John Simon wrote:
- Is `case _ of x:xs -> x:xsr where xsr = something xs` a common idiom? It happened twice in my code, and it seems odd to split the first element away from the rest of the list as it's processed.
I don't know how common it is in practice, but that's fmap for the PreList functor. Allow me to explain... (tl;dr: there's some non-trivial theoretical backing, if you're interested in recursion theory. Though again, I'm not sure how often it actually comes up for lists.) Here's the list type, if we defined it ourselves: data List a = Nil | Cons a (List a) Like other recursive types, we can decompose this into a non-recursive type plus a generic version of recursion: data Fix f = MkFix (f (Fix f)) -- N.B., MkFix :: f (Fix f) -> Fix f data PreList a recurse = PreNil | PreCons a recurse type List' a = Fix (PreList a) The new List' is essentially the same as the old List: to :: List a -> List' a to Nil = Fix PreNil to (Cons x xs) = Fix (PreCons x (to xs)) fro :: List' a -> List a fro (MkFix PreNil) = Nil fro (MkFix (PreCons x xs)) = Cons x (fro xs) -- and we can prove: -- > to . fro == id -- > fro . to == id There's a whole bunch of other machinery that comes from this perspective, but the important thing is that it all depends on the fact that PreList is a functor in a very specific way, namely that it applies the function to all the recursion sites (i.e., "one level down" if we're thinking of the fixed-point version): instance Functor (PreList a) where fmap f PreNil = PreNil fmap f (PreCons x xs) = PreCons x (f xs) Of course, Fix(PreList a) ---aka List a--- has its own functor instance, but that's unrelated. -- Live well, ~wren
participants (4)
-
Eric Rasmussen
-
John Simon
-
Taylor Hedberg
-
wren ng thornton