
Hi, I'm playing since a few hours with Parsec and trying to write a small html (fragment) parser, but I'm stuck in a point which I really can't understand. The problem seem to be either in "parseProperCont" or in "closing" (see code below). It looks like closing does not work (but it is a very simple function!) or (also hard to believe) function "try" from Parsec has some problems. Anyway I get this answer: Prelude ParseSHtml> pf parseHtmlFrg "ptest.txt" Left "ptest.txt" (line 5, column 2): unexpected "/" expecting element name when I'm parsing this file: <div id="normtext"> one line with break<br /> another line <br /><br /> Mail: <a href="mailto:user@dom.at">user@dom.at</a> </div> with this code (sorry for the longer mail): import Text.ParserCombinators.Parsec hiding (label) import Text.XHtml.Strict -- Helper function: parse a string up to one of the given chars upTo :: [Char] -> Parser [Char] upTo ds = many1 (noneOf ds) parseHtmlFrg :: Parser Html parseHtmlFrg = do many space choice [parseElem, parseText] <?> "html fragment" parseElem :: Parser Html parseElem = do en <- parseElTag many1 space (ats, cnt) <- restElem en return $ compElem en cnt ! ats <?> "html element" -- Compose a html element from tag name and content compElem en cnt = if isNoHtml cnt then itag en else tag en cnt parseElTag :: Parser String parseElTag = do char '<' en <- elemName return en <?> "element tag" elemName :: Parser String elemName = many1 lower <?> "element name" restElem :: String -> Parser ([HtmlAttr], Html) restElem nm = do ats <- parseAttList ht <- (restElNoCont <|> restElCont nm) return (ats, ht) <?> ("> or /> to close the tag " ++ nm) -- Rest element with no content restElNoCont = do char '/' char '>' return noHtml <?> "/>" -- Rest element with content restElCont nm = do char '>' many space els <- parseProperCont nm return $ concatHtml els <?> "element with content" -- Parse closing tag or proper content(s) parseProperCont :: String -> Parser [Html] parseProperCont nm = try (do closing nm return [] ) <|> (do h <- parseHtmlFrg hs <- parseProperCont nm return (h:hs) ) -- <|> return [] <?> "proper element content" closing nm = do char '<' char '/' nm1 <- elemName char '>' if nm1 == nm then return () else fail $ nm ++ ", encountered " ++ nm1 <?> ("closing of " ++ nm) -- Parse a html attribute parseAttr :: Parser HtmlAttr parseAttr = do at <- many1 lower char '=' va <- parseQuote many space return $ strAttr at va <?> "Attribut" parseAttList = many1 parseAttr <|> return [] <?> "attribute list" -- Parse a quoted string parseQuote :: Parser String parseQuote = do char '"' cs <- upTo ['"'] char '"' return cs -- Parse a text element parseText :: Parser Html parseText = do s <- upTo "<" return (stringToHtml s) <?> "some text" -- For tests: pf p file = parseFromFile p file Nicu

Here's a handy simple function I've found very useful. You'll obviously also need to import Debug.Trace: pTrace s = pt <|> return () where pt = try $ do x <- try $ many1 anyChar trace (s++": " ++x) $ try $ char 'z' fail x It could perhaps be cleaner, but it does the job for me fine. Just insert a line like pTrace "label" anywhere in your parsing functions and whenever parsec hits that line you get a nice line of output: "label: <rest of string to be parsed>" This tends to help track down just where your code goes wrong. Try works like it should in my experience, but that doesn't necessarily mean it works how you expect. Regards, s On Jan 20, 2008, at 12:12 PM, Nicu Ionita wrote:
Hi,
I'm playing since a few hours with Parsec and trying to write a small html (fragment) parser, but I'm stuck in a point which I really can't understand.
The problem seem to be either in "parseProperCont" or in "closing" (see code below). It looks like closing does not work (but it is a very simple function!) or (also hard to believe) function "try" from Parsec has some problems.
Anyway I get this answer:
Prelude ParseSHtml> pf parseHtmlFrg "ptest.txt" Left "ptest.txt" (line 5, column 2): unexpected "/" expecting element name
when I'm parsing this file:
<div id="normtext"> one line with break<br /> another line <br /><br /> Mail: <a href="mailto:user@dom.at">user@dom.at</a> </div>
with this code (sorry for the longer mail):
import Text.ParserCombinators.Parsec hiding (label) import Text.XHtml.Strict
-- Helper function: parse a string up to one of the given chars upTo :: [Char] -> Parser [Char] upTo ds = many1 (noneOf ds)
parseHtmlFrg :: Parser Html parseHtmlFrg = do many space choice [parseElem, parseText] > "html fragment"
parseElem :: Parser Html parseElem = do en <- parseElTag many1 space (ats, cnt) <- restElem en return $ compElem en cnt ! ats > "html element"
-- Compose a html element from tag name and content compElem en cnt = if isNoHtml cnt then itag en else tag en cnt
parseElTag :: Parser String parseElTag = do char '<' en <- elemName return en > "element tag"
elemName :: Parser String elemName = many1 lower > "element name"
restElem :: String -> Parser ([HtmlAttr], Html) restElem nm = do ats <- parseAttList ht <- (restElNoCont <|> restElCont nm) return (ats, ht) > ("> or /> to close the tag " ++ nm)
-- Rest element with no content restElNoCont = do char '/' char '>' return noHtml > "/>"
-- Rest element with content restElCont nm = do char '>' many space els <- parseProperCont nm return $ concatHtml els > "element with content"
-- Parse closing tag or proper content(s) parseProperCont :: String -> Parser [Html] parseProperCont nm = try (do closing nm return [] ) <|> (do h <- parseHtmlFrg hs <- parseProperCont nm return (h:hs) ) -- <|> return [] > "proper element content"
closing nm = do char '<' char '/' nm1 <- elemName char '>' if nm1 == nm then return () else fail $ nm ++ ", encountered " ++ nm1 > ("closing of " ++ nm)
-- Parse a html attribute parseAttr :: Parser HtmlAttr parseAttr = do at <- many1 lower char '=' va <- parseQuote many space return $ strAttr at va <?> "Attribut" parseAttList = many1 parseAttr <|> return [] <?> "attribute list"
-- Parse a quoted string parseQuote :: Parser String parseQuote = do char '"' cs <- upTo ['"'] char '"' return cs
-- Parse a text element parseText :: Parser Html parseText = do s <- upTo "<" return (stringToHtml s) > "some text"
-- For tests: pf p file = parseFromFile p file
Nicu
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
participants (2)
-
Nicu Ionita
-
Sterling Clover