
*This message was sent in reply to Jason Dusek but the reply went to him, not the list. I plan to have Command String, CommandParams [String], CommandParamsWithArgs [String] [String] or something to that effect. If each of these are combinators, then I imagine you can use <|> with them. But realistically, I have no idea how to get this to take a string input (or something like getContents) and have [UnTeX] come out the other end. The problem that I have also is that while Command* can come anywhere in the text, everything goes back to Text unless it is a Paragraph. The spaces after a Command* up till the next letter have to be ignored, and superfluous spaces within Text itself also should be. I also can't have two Paragraphs right beside each other, because that makes little sense. So from what I can guess - I need a lexer, I think it was called a lexeme lexer. The syntax for Command* is like this: \command \command[arg] \command[arg][arg][...] \command[...]{body} \command[...]{body1}{body2}{...} The reason why this is necessary is because you could have something like \frac{a}{b}. I am trying to be more consistent with my use of this than LaTeX/TeX is. I might need to implement something like a table generator eventually, but this would be hopefully for the backend. Because I would like to translate this stuff into HTML and other outputs eventually. Thank you for your help, Jeffrey. On Sat, 2008-11-08 at 01:00 -0800, Jason Dusek wrote:
Each one of the little combinators seems to work as advertized. Are you having trouble fitting them together?
-- _jsn
module UnTeX where
import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Prim import Text.ParserCombinators.Parsec.Language import qualified Text.ParserCombinators.Parsec.Token as T
data UnTeX = Command String [String] String | Text String | Paragraph deriving Show
-- I don't remember TeX very well, so I'm not sure this is right. command :: Parser UnTeX command = do char '\\' cmd <- ident p <- orNot params [] b <- orNot body "" return $ Command cmd p b where params = many1 $ between (char '[') (char ']') ident body = do char '{' text <- many1 $ noneOf "}" char '}' return text ident = many1 $ letter <|> digit orNot p n = choice [try p , return n]
paragraph :: Parser UnTeX paragraph = do newline newline return $ Paragraph
text :: Parser UnTeX text = do txt <- many1 (alphaNum <|> space) return $ Text txt