
Alexander.Vladislav.Popov wrote:
Help me to parse multiline comments (like /* ... */ or even like /*- ... -*/) using attoparsec.
Daniel Fischer wrote:
The docs for manyTill at http://hackage.haskell.org/packages/archive/attoparsec/0.9.1.2/doc/html/Data... Attoparsec-Combinator.html have the example
simpleComment = string "<!--" *> manyTill anyChar (try (string "-->"))
Unfortunately, many of the examples in the documentation for Attoparsec are just lifted directly from the Parsec documentation, so they need to be modified somewhat to work in Attoparsed. This is one of those cases. First of all, the string function in Attoparsec has type string :: ByteString -> Parser ByteString So you can't use a String as its argument directly; you need to wrap the String with Data.ByteString.Char8.pack. If you are parsing source code text, you anyway would be better off using Text instead of String and attoparsec-text instead of attoparsec. That, together with the OverloadedStrings extension, will solve that problem and more. Then you don't even need the "string" function. Just use string literals to parse strings, and the compiler will automatically insert calls to "string". The next issue is that you don't need "try" - unlike in Parsec, the string combinator in Attoparsec does *not* consume any input when it fails. Finally, if you are bothering to use attoparsec, it is presumably because you want the super speed it is able to achieve using fusion. But the documentation points out that you don't get that fusion for combinators like manyTill, only for "byte-oriented" combinators like takeTill. So you should factor the parser to parse the comment (quickly!) into chunks beginning with '*', then use manyTill only on the chunks. Here is an untested example with the above modifications: comment = "/*" .*> (emptyComment <|> T.concat <$> commentChunks) emptyComment = "*/" .*> pure T.empty commentChunks = manyTill (takeWhile1 (/= '*')) (string "*/") I did need to use the string keyword with manyTill, because manyTill is too polymorphic for the compiler to be able to deduce the type of the string literal. Since manyTill is so common, I often define a type-specialized version of it for convenience: manyTillS :: Parser a -> Parser Text -> Parser [a] manyTillS = manyTill That is analogous to ".*>", attoparsec-text's type-specialized version of "*>" from Control.Applicative. Then you can write: commentChunks = takeWhile1 (/= '*') `manyTillS` "*/" Regards, Yitz