[Newbie][Parsec] Skipping to desired phrase

Hi all,

I've got a String containing HTML and I'd like to extract some information from it. Specifically, the information starts right after some phrase (let's say "nicelinks"). How do I skip all the HTML up to that phrase?

Here is what I have so far:

    p_rest = do
        skipMany ((try (string "nicelinks")) <|> anyHeadString)
        text <- many1 anyChar
        return [text]

    anyHeadString = do
        c <- anyChar
        return [c]

But after running

    parse p_rest [] html

I get:

    Left (line 112, column 15):
    unexpected end of input
    expecting "nicelinks"

What am I doing wrong?

Best regards

Your skipMany never stops at nicelinks. It skips a lot of characters,
then when it gets to nicelinks it skips that too, then carries on
skipping characters (and any further nicelinks) until it hits eof. At
that point many1 anyChar needs at least one character to capture, but
there aren't any left, so it barfs. manyTill stops at the first match
instead:

    p_rest = do
        manyTill anyChar (try (string "nicelinks")) <?> "fdsa"
        text <- many1 anyChar <?> "asdf"
        return [text]

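For anyone who wants to load this into GHCi, here is a minimal self-contained sketch of the same manyTill idea next to the original skipMany attempt. The names afterMarker and greedySkip are made up for illustration, and the type signatures assume the input is a plain String.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: skip everything up to and including the marker,
-- then return whatever follows it (at least one character is required).
afterMarker :: String -> Parser String
afterMarker marker = do
    _ <- manyTill anyChar (try (string marker))
    many1 anyChar

-- The original approach, for comparison. The anyChar alternative always
-- succeeds while input remains, so skipMany runs to the end of the input
-- and many1 anyChar then has nothing left to read.
greedySkip :: Parser String
greedySkip = do
    skipMany (try (string "nicelinks") <|> fmap (: []) anyChar)
    many1 anyChar
```

With this loaded, `parse (afterMarker "nicelinks") "" "<p>nicelinks:123</p>"` gives `Right ":123</p>"`, while `parse greedySkip ""` on the same input gives a `Left` reporting unexpected end of input.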
This works; however, I have a funny feeling you want the anyChar to be
something more complex than a single character, which is why you went
down this route. I had the same problem and someone helped me on
Stack Overflow with a solution. This is a case where you pretty much
have to use recursion to get what you want.

    import Text.Parsec

    html = "<head>nicelinks:123</head>"

    p_rest = do
        string "nicelinks" <|> anyHeadString <?> "fdsa"
        p_rest <|> manyTill anyChar (try anyHeadString) <?> "asdf"

    anyHeadString = try (string "<head>") <|> string "</head>"

    main = do
        print $ parse p_rest [] html

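The recursive idea can also be packaged as a small general-purpose combinator. This is only a sketch under assumptions: skipTo, linkId and firstLinkId are hypothetical names, not part of Parsec, and the "digits after nicelinks:" target is just an illustration of a pattern more complex than a fixed string.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: try p at the current position; if it fails,
-- backtrack, drop one character and recurse. Fails at end of input
-- if p never matches.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar *> skipTo p)

-- Illustrative target: the digits that follow "nicelinks:".
linkId :: Parser String
linkId = string "nicelinks:" *> many1 digit

-- Find the first such id anywhere in the input.
firstLinkId :: String -> Either ParseError String
firstLinkId = parse (skipTo linkId) ""
```

For example, `firstLinkId "<head>nicelinks:123</head>"` gives `Right "123"`. Because each attempt is wrapped in try, a partial match such as a bare "nicelinks" with no colon is backtracked over and the search continues one character later.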
On Fri, Jul 29, 2011 at 4:27 AM, Kamil Ciemniewski wrote:
> [snip]
_______________________________________________
web-devel mailing list
web-devel@haskell.org
http://www.haskell.org/mailman/listinfo/web-devel
participants (2): David McBride, Kamil Ciemniewski