
I am getting frustrated trying to implement a parser for a minimal TeX subset. I am essentially trying to implement something that recognizes \commands (no parameters yet), paragraphs (two newlines), and text (almost anything else).

The code I have so far is:

    module UnTeX where

    import Text.ParserCombinators.Parsec
    import Text.ParserCombinators.Parsec.Prim
    import Text.ParserCombinators.Parsec.Language
    import qualified Text.ParserCombinators.Parsec.Token as T

    data UnTeX = Command String
               | Text String
               | Paragraph
               deriving Show

    command :: Parser UnTeX
    command = do
        char '\\'
        cmd <- many1 (letter <|> digit) <?> "command"
        return $ Command cmd

    paragraph :: Parser UnTeX
    paragraph = do
        newline
        newline
        return $ Paragraph

    text :: Parser UnTeX
    text = do
        txt <- many1 (alphaNum <|> space)
        return $ Text txt

I have no illusions: this is probably far from what I actually need to be doing. But I am finding the documentation on all this to be spotty. The 'parsec.pdf' is 7 years old and its code doesn't always work.

Can anyone point me in the right direction? I just want something able to do this simply. After an initial version works, I would like to have commands with parameters, something like \command[param][param]{body}, but I need to have something working first.

Thank you for any help you can provide.

Jeffrey.

Each one of the little combinators seems to work as advertized. Are you having trouble fitting them together?

-- _jsn

    module UnTeX where

    import Text.ParserCombinators.Parsec
    import Text.ParserCombinators.Parsec.Prim
    import Text.ParserCombinators.Parsec.Language
    import qualified Text.ParserCombinators.Parsec.Token as T

    data UnTeX = Command String [String] String
               | Text String
               | Paragraph
               deriving Show

    -- I don't remember TeX very well, so I'm not sure this is right.
    command :: Parser UnTeX
    command = do
        char '\\'
        cmd <- ident
        p <- orNot params []
        b <- orNot body ""
        return $ Command cmd p b
      where
        params = many1 $ between (char '[') (char ']') ident
        body = do
            char '{'
            text <- many1 $ noneOf "}"
            char '}'
            return text
        ident = many1 $ letter <|> digit
        orNot p n = choice [try p , return n]

    paragraph :: Parser UnTeX
    paragraph = do
        newline
        newline
        return $ Paragraph

    text :: Parser UnTeX
    text = do
        txt <- many1 (alphaNum <|> space)
        return $ Text txt
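
One way to see how the pieces fit together is to combine them with `many` and `<|>` and run the result with `parse`. The following is only a sketch: `document` and `parseUnTeX` are names made up for illustration, and the simpler one-field `Command` from the first message is assumed. It also shows one thing worth knowing when combining these parsers: Parsec's `space` accepts any whitespace character, newlines included, so `text` as written swallows blank lines before `paragraph` can see them.

```haskell
import Text.ParserCombinators.Parsec

data UnTeX = Command String | Text String | Paragraph
  deriving (Show, Eq)

command :: Parser UnTeX
command = do
  _ <- char '\\'
  cmd <- many1 (letter <|> digit) <?> "command"
  return (Command cmd)

paragraph :: Parser UnTeX
paragraph = do
  _ <- newline
  _ <- newline
  return Paragraph

-- Same definition as in the thread: `space` also matches '\n'.
text :: Parser UnTeX
text = do
  txt <- many1 (alphaNum <|> space)
  return (Text txt)

-- Hypothetical top-level parser: try each alternative, repeatedly.
-- `try` lets `paragraph` back out if only one newline is present.
document :: Parser [UnTeX]
document = many (try paragraph <|> command <|> text)

-- Hypothetical entry point: String in, Either an error or [UnTeX] out.
parseUnTeX :: String -> Either ParseError [UnTeX]
parseUnTeX = parse document "(input)"
```

With these definitions, `parseUnTeX "\\foo bar"` gives `Right [Command "foo",Text " bar"]`, while `parseUnTeX "a\n\nb"` gives `Right [Text "a\n\nb"]` with no `Paragraph` at all, because `text` consumed the two newlines.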

*This message was sent in reply to Jason Dusek but the reply went to him, not the list.

I plan to have Command String, CommandParams [String], CommandParamsWithArgs [String] [String] or something to that effect. If each of these is a combinator, then I imagine you can use <|> with them. But realistically, I have no idea how to get this to take a string input (or something like getContents) and have [UnTeX] come out the other end.

The other problem I have is that while Command* can come anywhere in the text, everything goes back to Text unless it is a Paragraph. The spaces after a Command* up to the next letter have to be ignored, and superfluous spaces within Text itself should be ignored too. I also can't have two Paragraphs right beside each other, because that makes little sense. So from what I can guess, I need a lexer; I think it was called a lexeme lexer.

The syntax for Command* is like this:

    \command
    \command[arg]
    \command[arg][arg][...]
    \command[...]{body}
    \command[...]{body1}{body2}{...}

This is necessary because you could have something like \frac{a}{b}. I am trying to be more consistent in my use of this than LaTeX/TeX is.

I might need to implement something like a table generator eventually, but hopefully that would be for the backend, because I would like to translate this stuff into HTML and other outputs eventually.

Thank you for your help,
Jeffrey.

On Sat, 2008-11-08 at 01:00 -0800, Jason Dusek wrote:
> Each one of the little combinators seems to work as advertized. Are you having trouble fitting them together?
>
> -- _jsn
>
>     module UnTeX where
>
>     import Text.ParserCombinators.Parsec
>     import Text.ParserCombinators.Parsec.Prim
>     import Text.ParserCombinators.Parsec.Language
>     import qualified Text.ParserCombinators.Parsec.Token as T
>
>     data UnTeX = Command String [String] String
>                | Text String
>                | Paragraph
>                deriving Show
>
>     -- I don't remember TeX very well, so I'm not sure this is right.
>     command :: Parser UnTeX
>     command = do
>         char '\\'
>         cmd <- ident
>         p <- orNot params []
>         b <- orNot body ""
>         return $ Command cmd p b
>       where
>         params = many1 $ between (char '[') (char ']') ident
>         body = do
>             char '{'
>             text <- many1 $ noneOf "}"
>             char '}'
>             return text
>         ident = many1 $ letter <|> digit
>         orNot p n = choice [try p , return n]
>
>     paragraph :: Parser UnTeX
>     paragraph = do
>         newline
>         newline
>         return $ Paragraph
>
>     text :: Parser UnTeX
>     text = do
>         txt <- many1 (alphaNum <|> space)
>         return $ Text txt
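
The requirements in the message above (string in, [UnTeX] out; any number of [arg] and {body} groups; whitespace after a command skipped; runs of whitespace collapsed; no adjacent Paragraphs) could be sketched roughly as follows, without a separate lexer. This is an illustration under assumptions, not a reference design: the three-field `Command` constructor, the helper names `blank`, `document` and `parseUnTeX`, and the choice to treat two or more consecutive newlines as a single paragraph break are all made up here. Nested braces and brackets are not handled.

```haskell
import Text.ParserCombinators.Parsec

-- Assumed shape: name, [..] arguments, {..} bodies.
data UnTeX = Command String [String] [String] | Text String | Paragraph
  deriving (Show, Eq)

-- Whitespace that does NOT start a paragraph break:
-- a space, a tab, or a single newline not followed by another newline.
blank :: Parser Char
blank = oneOf " \t" <|> try (newline <* notFollowedBy newline)

-- \name[arg]...{body}... ; trailing blanks after the command are dropped.
command :: Parser UnTeX
command = do
  _ <- char '\\'
  name <- many1 (letter <|> digit) <?> "command name"
  args <- many (between (char '[') (char ']') (many (noneOf "]")))
  bodies <- many (between (char '{') (char '}') (many (noneOf "}")))
  skipMany blank
  return (Command name args bodies)

-- Two newlines start a break; `spaces` then eats any further blank
-- lines, so two Paragraphs never come out side by side.
paragraph :: Parser UnTeX
paragraph = do
  _ <- try (newline >> newline)
  spaces
  return Paragraph

-- Words joined by single spaces; each run of blanks becomes one " ".
text :: Parser UnTeX
text = Text . concat <$> many1 (word <|> gap)
  where
    word = many1 (noneOf "\\ \t\n")
    gap = many1 blank >> return " "

document :: Parser [UnTeX]
document = do
  spaces
  items <- many (paragraph <|> command <|> text)
  eof
  return items

-- Hypothetical entry point; the input could equally come from getContents.
parseUnTeX :: String -> Either ParseError [UnTeX]
parseUnTeX = parse document "(input)"
```

For example, `parseUnTeX "\\frac{a}{b} is  a\nfraction\n\n\n\\sec[1][x]{T}"` yields `Right [Command "frac" [] ["a","b"],Text "is a fraction",Paragraph,Command "sec" ["1","x"] ["T"]]`. Ending `document` with `eof` means unparsable input is reported as a `Left` error instead of being silently dropped.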

What you have looks good. Can you specifically describe the problems you are having?

-Brent
Participants (3): Brent Yorgey, Jason Dusek, Jeffrey Drake