ANNOUNCE: Megaparsec – an improved and actively maintained fork of Parsec

Hello! I'd like to announce the release of Megaparsec, a fork of Parsec that has been in the works for the past 2 months. There's a lot of improvements and bugfixes under the hood – as well as a new test suite with 128 Quickcheck tests covering 80% of code (Parsec has only 3 tests, by the way) – but first I'd like to explain why a fork was needed, since forking a popular library is a pretty drastic measure and should be accompanied by an explanation. (A disclaimer: I've been given permission to announce the library, but I'm neither the author nor an expert on parsing.) Hackage: https://hackage.haskell.org/package/megaparsec Changelog (including a list of differences from Parsec): https://hackage.haskell.org/package/megaparsec/changelog Github: https://github.com/mrkkrp/megaparsec If you ever had any ideas about what Parsec should've done differently, or what amazing new combinators it should include, etc., post your ideas here: https://github.com/mrkkrp/megaparsec/issues Why fork Parsec instead of writing a new library ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ There are many parsing libraries on Hackage; Parsec was one of the first to appear, and it's still the one that is recommended to beginners when they ask about doing parsing in Haskell. Most other libraries aren't exactly trying to compete with Parsec – instead they explore new directions. Like it or not, it remains a fact that a lot of people are being recommended Parsec, a lot of people are using Parsec, and a lot of people will probably continue to use Parsec since there's no clear alternative to it. Writing a new and *different* library probably won't change it. Even new and *similar* libraries (trifecta and attoparsec are similar enough) haven't removed the need for Parsec, only mitigated it somewhat. Perhaps one day trifecta or something else will completely replace Parsec, but right now we still have to put up with Parsec. So, what we need (or at least what would be nice to have) is simply a better Parsec. What's wrong with original Parsec ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Parsec's development has stagnated long ago; you'd have a hard time getting your pull request merged, even if the only thing it does is fixing a typo (I'm not even talking about anything more controversial than that). Parsec isn't perfect or bug-free, and there *is* a need for those pull requests – just look at its Github page, where you'll find 8 unmerged PRs (and 9 open issues): https://github.com/aslatter/parsec In addition to things that could be fixed but simply aren't, there are some inconveniences (and bugs!) that are hard to fix without breaking backwards compatibility: * “notFollowedBy eof” will just silently not do what you expect it to do (a bug old enough to be considered an undocumented feature) * “<|>” and “many” are redefined and so importing Text.Parsec clashes with Control.Applicative (this can't be trivially fixed in Parsec because its “<|>” has different precedence from Control.Applicative's “<|>”) * you can't wrap Parser into monad transformers (I'm talking about things like “WriterT [String] Parser a” – if this was possible, there'd be no need for “user state” baked into ParsecT) * “Text.Parsec.Token” is not flexible enough for many needs; if you depend on it, one day you may find that you have to copy the whole module to get the behavior you want (look at https://github.com/aslatter/parsec/issues/15 and https://github.com/aslatter/parsec/issues/24, for example). In short, there may be a lot of value in improving Parsec – and that's where Megaparsec comes in. What is Megaparsec and how is it different ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Megaparsec “reboots” the development of Parsec. It's not backwards compatible, but it *is* compatible enough to avoid having to rewrite all Parsec tutorials, and code written for Parsec can be converted to use Megaparsec pretty mechanically. Given that, I would recommend using Megaparsec instead of Parsec from now on, unless you need compatibility with GHCs older than 7.10. Here is a detailed account of some of the bigger changes: === Error reporting === Megaparsec's errors messages are significantly more accurate than Parsec's in some cases, as the following demonstration shows. Parsec: > parseTest (try (string "let") <|> string "lexical") "le" parse error at (line 1, column 1): unexpected end of input expecting "lexical" Megaparsec: > parseTest (try (string "let") <|> string "lexical") "le" parse error at line 1, column 1: unexpected "le" expecting "let" or "lexical" And here's another one, showing off how sometimes “try a <|> b” is a bit less harmful than usual. Just in case, I'm talking about this blog post: http://blog.ezyang.com/2014/05/parsec-try-a-or-b-considered-harmful/ If you don't want to read it in full, the gist of it is that if you use “try” too liberally, Parsec's error messages become much worse than they could be. In this example, we're trying to parse a simple version of Haskell's import statements, which can look either as “import Foo” or “import qualified Foo as B”. If you write a parser like this: try pQualifiedImport <|> pImport you will get an uninformative error message on the following input: import qualified Foo s B Specifically, the “try pQualifiedImport” branch will fail, and then the “pImport” branch will stumble upon “qualified” – while the actual error isn't there, but in misspelled “as”. The advice given in the blog post (“The scope of backtracking try should be minimized”) is good, and applies to Megaparsec just as well as it applies to Parsec. However, a curious thing is that in this particular case Parsec *should* be able to tell you that the error is in misspelled “as”, because Parsec implements the longest match rule – errors that occur later in text are given precedence over errors that occur earlier. The code is there, but somehow it doesn't do what it's supposed to do (and Megaparsec fixes that). Here's the code from the article, rewritten for Megaparsec: import Text.Megaparsec import qualified Text.Megaparsec.Lexer as L data Stmt = QualifiedImport String String | Import String deriving (Show) pStmt = try pQualifiedImport <|> pImport pImport = do keyword "import" Import <$> upperCaseIdentifier pQualifiedImport = do keyword "import" keyword "qualified" QualifiedImport <$> upperCaseIdentifier <*> (keyword "as" *> upperCaseIdentifier) upperCaseIdentifier = lexeme $ (:) <$> upperChar <*> many (alphaNumChar <|> oneOf "_.") lexeme = L.lexeme (hidden space) keyword = L.symbol (hidden space) And here are the error messages it produces. Megaparsec: > parseTest (pStmt >> eof) "import qualified Foo s B" parse error at line 1, column 22: unexpected 's' expecting "as" Parsec (with minor modifications): > parseTest (pStmt >> eof) "import qualified Foo s B" parse error at (line 1, column 8): unexpected "q" expecting uppercase letter === Integration with monad transformers === The key type of Parsec is “ParsecT” (others, such as “Parsec” and “Parser”, are just type synonyms). It lets you use parsers with other monads – for instance, if you use “ParsecT String () IO”, you can have IO in your parsers. This is the reason, by the way, why “char” has the type char :: Stream s m Char => Char -> ParsecT s u m Char instead of a simpler char :: Char -> Parser Char (If it was the latter, you wouldn't be able to use “char” and IO in the same parser.) However, even this type isn't general enough for all things you might want to do. Imagine that you want some parsers to generate warnings when they are run, and later you want to collect those warnings and do something with them. This sounds like what Writer was invented for, so you try to use it. Now all your parsers have this type: ParsecT String () (Writer [Warning]) (Where “Warning” is, say, a synonym for String.) Unfortunately, this is not how you should've composed ParsecT and Writer, because this way you don't get backtracking – in other words, if you try to do optional (try someParser) and someParser generates warnings but then fails, “optional” won't be able to make them disappear – they will still be recorded. Same with State – if you do x <* notFollowedBy y and “y” changes the state, this change will be recorded even tho it's not what you want most of the time. (You can use Parsec's internal state and it *will* work, but it doesn't help you when you want to use Writer or something else instead of State.) What you want in such situations is WriterT [Warning] Parser () StateT YourState Parser () ... but you can't get it because “char” and all other primitive parsers simply don't have those types. With Parsec, the only solution is to apply “lift” to all parsers you want to use, which is pretty annoying. Megaparsec solves this by introducing an mtl-style “MonadParsec” class, making primitive parsers members of this class, and providing instances of MonadParsec for various monad transformers. (If you have ever used the ‘parsers’ library, you may recognise this approach.) I think being able to get backtracking behavior without relying on inelegant ways like “Parsec user state” is pretty neat, even if it's not something every Parsec user needs. === Lexing === Parsec has a lexing module (Text.Parsec.Token): http://hackage.haskell.org/package/parsec/docs/Text-Parsec-Token.html If you're not familiar with lexing, the idea is as follows. When you are parsing a programming language, you often have to solve the same set of problems – parsing numbers, string literals (with all those escaping rules), identifiers/operators/keywords, comments, making all parsers handle whitespace (since there can be whitespace between pretty much any 2 tokens), and so on. With “Text.Parsec.Token” you could just specify what counts as whitespace, as a comment, as an identifier character, etc. and get a set of parsers “for free”. Parsec achieves this by defining a huge structure called “GenLanguageDef” that contains the specification of your language: data GenLanguageDef s u m = LanguageDef { commentStart :: String, commentEnd :: String, commentLine :: String, nestedComments :: Bool, identStart :: ParsecT s u m Char, identLetter :: ParsecT s u m Char, opStart :: ParsecT s u m Char, opLetter :: ParsecT s u m Char, reservedNames :: [String], reservedOpNames :: [String], caseSensitive :: Bool } Then you use “makeTokenParser” on it to generate another huge structure containing lots of useful parsers: makeTokenParser :: GenLanguageDef ... -> GenTokenParser ... data GenTokenParser s u m = TokenParser { identifier :: ParsecT s u m String, operator :: ParsecT s u m String, charLiteral :: ParsecT s u m Char, stringLiteral :: ParsecT s u m String, natural :: ParsecT s u m Integer, integer :: ParsecT s u m Integer, float :: ParsecT s u m Double, lexeme :: forall a. ParsecT s u m a -> ParsecT s u m a, parens :: forall a. ParsecT s u m a -> ParsecT s u m a, braces :: forall a. ParsecT s u m a -> ParsecT s u m a, comma :: ParsecT s u m String, colon :: ParsecT s u m String, ... } What's the problem with this approach? It's very inflexible – the moment you want to change something that wasn't supposed to be changed, you're on your own. Do you want to special-case “-- |” comments to use them as doc strings? The easiest solution is to copy the whole module into your own project: https://github.com/aslatter/parsec/issues/15 Do you want to handle newlines by yourself (for instance, to allow them to be expression separators)? The easiest solution is to fork Parsec: https://github.com/aslatter/parsec/issues/24 (Even if those turn out not to be the easiest solutions, it's still somewhat telling that they were what the authors ended up with.) Megaparsec uses a simpler, more flexible approach. See the docs here: https://hackage.haskell.org/package/megaparsec/docs/Text-Megaparsec-Lexer.ht... The linked module provides 3 categories of functions: * “integer”, “decimal”, “float”, “charLiteral”, etc are generic parsers that you can use to parse... well, things that they are named after. Some of those – like “integer” – are occasionally useful even if you're not parsing any languages (how often did you have to write “read <$> many1 digit”? now you don't have to). * “skipLineComment” and “skipBlockComment” generate comment parsers, and “space” combines them together. You can make comment parsers arbitrarily complex before passing them to “space”, or you can write your own space-skipping combinator. * “lexeme”, “symbol”, and “symbol'” make lexemes out of things; a lexeme, by convention, expects no leading whitespace and skips all trailing whitespace. So, instead of having parsers being passed to each other under the hood, you now have to pass them by yourself – except that you don't actually have to do much passing, because you can just write import qualified Text.Megaparsec.Lexer as L lexeme = L.lexeme space symbol = L.symbol space and off you go: keyword = label "keyword" . symbol parens = between (symbol "(") (symbol ")") ... etc ... How you can help ~~~~~~~~~~~~~~~~ * Take a project, modify it to use Megaparsec, file an issue if you've encountered any difficulties. * Take a Parsec tutorial and rewrite it for Megaparsec. (And then please send an email so that a link to your tutorial could be added to the README file.) * As I've already mentioned before, if you have any ideas about what could be changed/improved in Parsec, they likely apply to Megaparsec as well – propose them on the issue tracker. * Report typos in documentation (same goes for mistakes, unclear phrasing, etc). If you're shy (like me) and don't like opening issues, just ping me on IRC (I'm “indiagreen” on Freenode) if you spot anything.

(Artyom, I hope you can forward the following to the original author and, ideally, have them subscribe to the mailing list) As a heavy user of parsec, I have to say this is simply awesome. I've read through the announcement and the changelog, and I have to say the changes seem to be spot on and scratch so many old itches I have personally struggled with when using parsec. I'm at the point where I am seriously considering porting all my code to megaparsec; however, lack of compatibility with GHC <7.10 is a bit of a deal-breaker at this point. I'd like to discuss the decision to fork parsec. Now, I'm not judging: as long as the license permits it, whether to fork or not is really up to the forker. However, it seems the author of megaparsec would like it to become the new standard parsing library (see "How you can help"). I'd like that too. However, I don't think that forking is the best way to achieve that. Of the two reasons that were given here, the first (stagnant development and maintenance) is a good reason to become either a co-maintainer, or a new maintainer. There seems to be a pretty smooth process for that now, and it doesn't raise any eyebrows when someone requests to be the new maintainer having good reasons for it (like willingness to contribute time and effort to the package). The second reason is backwards compatibility. It's true the changes might break some code, but that's why we have major versions. And the overall architecture and interface is still in line with parsec. So, to me it makes perfect sense to call megaparsec a parsec-4.0.0.0 and get some continuity from the older library. That way megaparsec becomes the default just by inheritance. So, if the author of megaparsec is open to becoming a maintainer of parsec, I'm sure the Hackage Trustees would be willing to help. If not, I'm personally willing to champion that. /Andrey On 09/29/2015 02:42 PM, Artyom wrote:
Hello!
I'd like to announce the release of Megaparsec, a fork of Parsec that has been in the works for the past 2 months. There's a lot of improvements and bugfixes under the hood – as well as a new test suite with 128 Quickcheck tests covering 80% of code (Parsec has only 3 tests, by the way) – but first I'd like to explain why a fork was needed, since forking a popular library is a pretty drastic measure and should be accompanied by an explanation. (A disclaimer: I've been given permission to announce the library, but I'm neither the author nor an expert on parsing.) [..snip..]

Il giorno 29/set/2015, alle ore 21:46, Andrey Chudnov
ha scritto: (Artyom, I hope you can forward the following to the original author and, ideally, have them subscribe to the mailing list)
As a heavy user of parsec, I have to say this is simply awesome. I've read through the announcement and the changelog, and I have to say the changes seem to be spot on and scratch so many old itches I have personally struggled with when using parsec. I'm at the point where I am seriously considering porting all my code to megaparsec; however, lack of compatibility with GHC <7.10 is a bit of a deal-breaker at this point.
I'd like to discuss the decision to fork parsec. Now, I'm not judging: as long as the license permits it, whether to fork or not is really up to the forker.
However, it seems the author of megaparsec would like it to become the new standard parsing library (see "How you can help"). I'd like that too. However, I don't think that forking is the best way to achieve that. Of the two reasons that were given here, the first (stagnant development and maintenance) is a good reason to become either a co-maintainer, or a new maintainer. There seems to be a pretty smooth process for that now, and it doesn't raise any eyebrows when someone requests to be the new maintainer having good reasons for it (like willingness to contribute time and effort to the package).
The second reason is backwards compatibility. It's true the changes might break some code, but that's why we have major versions. And the overall architecture and interface is still in line with parsec. So, to me it makes perfect sense to call megaparsec a parsec-4.0.0.0 and get some continuity from the older library. That way megaparsec becomes the default just by inheritance.
So, if the author of megaparsec is open to becoming a maintainer of parsec, I'm sure the Hackage Trustees would be willing to help. If not, I'm personally willing to champion that.
/Andrey
+1 Megaparsec seems awesome to me, especially the monad transformers issue is exactly what I missed in using Parsec. As someone that often teaches a bit of haskell, I would also like it to become "Parsec 4.0” if that’s possible. That would save a lot of boilerplate in docs and tutorials about why I’m talking about megaparsec while it seems the most used library is parsec and explain the reasons of the fork etc.. every time. To megaparsec’s author: great work! Bye, Nicola

Agreed. Great work, and if we can find a path to this being a foundation
for or inspiring parsec 4.0, that would probably be a net win for everyone.
On Wednesday, September 30, 2015, Nicola Gigante
Il giorno 29/set/2015, alle ore 21:46, Andrey Chudnov < achudnov@gmail.com javascript:;> ha scritto:
(Artyom, I hope you can forward the following to the original author and, ideally, have them subscribe to the mailing list)
As a heavy user of parsec, I have to say this is simply awesome. I've read through the announcement and the changelog, and I have to say the changes seem to be spot on and scratch so many old itches I have personally struggled with when using parsec. I'm at the point where I am seriously considering porting all my code to megaparsec; however, lack of compatibility with GHC <7.10 is a bit of a deal-breaker at this point.
I'd like to discuss the decision to fork parsec. Now, I'm not judging: as long as the license permits it, whether to fork or not is really up to the forker.
However, it seems the author of megaparsec would like it to become the new standard parsing library (see "How you can help"). I'd like that too. However, I don't think that forking is the best way to achieve that. Of the two reasons that were given here, the first (stagnant development and maintenance) is a good reason to become either a co-maintainer, or a new maintainer. There seems to be a pretty smooth process for that now, and it doesn't raise any eyebrows when someone requests to be the new maintainer having good reasons for it (like willingness to contribute time and effort to the package).
The second reason is backwards compatibility. It's true the changes might break some code, but that's why we have major versions. And the overall architecture and interface is still in line with parsec. So, to me it makes perfect sense to call megaparsec a parsec-4.0.0.0 and get some continuity from the older library. That way megaparsec becomes the default just by inheritance.
So, if the author of megaparsec is open to becoming a maintainer of parsec, I'm sure the Hackage Trustees would be willing to help. If not, I'm personally willing to champion that.
/Andrey
+1
Megaparsec seems awesome to me, especially the monad transformers issue is exactly what I missed in using Parsec.
As someone that often teaches a bit of haskell, I would also like it to become "Parsec 4.0” if that’s possible. That would save a lot of boilerplate in docs and tutorials about why I’m talking about megaparsec while it seems the most used library is parsec and explain the reasons of the fork etc.. every time.
To megaparsec’s author: great work!
Bye, Nicola _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org javascript:; http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

I think the easiest way might be adding Mark Karpov (the author of megaparsec) as the maintainer of parsec and let him run with it. (Assuming he would like that too.) As a package trustee, Carter, you probably have the best chance of making that happen. Although, I'd love to hear from the 4 (sic!) parsec maintainers regarding the state of the library, their commitment to maintenance and their opinion of megaparsec. Request for comment from Antoine Latter, Derek Elkins, Ian Lynagh and Roman Cheplyaka. On 09/30/2015 03:32 PM, Carter Schonwald wrote:
Agreed. Great work, and if we can find a path to this being a foundation for or inspiring parsec 4.0, that would probably be a net win for everyone.
On Wednesday, September 30, 2015, Nicola Gigante
mailto:nicola.gigante@gmail.com> wrote: > Il giorno 29/set/2015, alle ore 21:46, Andrey Chudnov
javascript:;> ha scritto: > > (Artyom, I hope you can forward the following to the original author and, ideally, have them subscribe to the mailing list) > > As a heavy user of parsec, I have to say this is simply awesome. I've read through the announcement and the changelog, and I have to say the changes seem to be spot on and scratch so many old itches I have personally struggled with when using parsec. I'm at the point where I am seriously considering porting all my code to megaparsec; however, lack of compatibility with GHC <7.10 is a bit of a deal-breaker at this point. > > I'd like to discuss the decision to fork parsec. Now, I'm not judging: as long as the license permits it, whether to fork or not is really up to the forker. > > However, it seems the author of megaparsec would like it to become the new standard parsing library (see "How you can help"). I'd like that too. However, I don't think that forking is the best way to achieve that. Of the two reasons that were given here, the first (stagnant development and maintenance) is a good reason to become either a co-maintainer, or a new maintainer. There seems to be a pretty smooth process for that now, and it doesn't raise any eyebrows when someone requests to be the new maintainer having good reasons for it (like willingness to contribute time and effort to the package). > > The second reason is backwards compatibility. It's true the changes might break some code, but that's why we have major versions. And the overall architecture and interface is still in line with parsec. So, to me it makes perfect sense to call megaparsec a parsec-4.0.0.0 and get some continuity from the older library. That way megaparsec becomes the default just by inheritance. > > So, if the author of megaparsec is open to becoming a maintainer of parsec, I'm sure the Hackage Trustees would be willing to help. If not, I'm personally willing to champion that. > > /Andrey > +1
Megaparsec seems awesome to me, especially the monad transformers issue is exactly what I missed in using Parsec.
As someone that often teaches a bit of haskell, I would also like it to become "Parsec 4.0” if that’s possible. That would save a lot of boilerplate in docs and tutorials about why I’m talking about megaparsec while it seems the most used library is parsec and explain the reasons of the fork etc.. every time.
To megaparsec’s author: great work!
Bye, Nicola _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org javascript:; http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

On Wed, Sep 30, 2015 at 08:51:09PM -0400, Andrey Chudnov wrote:
I think the easiest way might be adding Mark Karpov (the author of megaparsec) as the maintainer of parsec and let him run with it. (Assuming he would like that too.) As a package trustee, Carter, you probably have the best chance of making that happen.
Although, I'd love to hear from the 4 (sic!) parsec maintainers regarding the state of the library, their commitment to maintenance and their opinion of megaparsec.
Request for comment from Antoine Latter, Derek Elkins, Ian Lynagh and Roman Cheplyaka.
Maybe they aren't subscribed to the ML, courtesy ping to Antoine Latter.

On 10/01/2015 03:51 AM, Andrey Chudnov wrote:
I think the easiest way might be adding Mark Karpov (the author of megaparsec) as the maintainer of parsec and let him run with it. (Assuming he would like that too.) As a package trustee, Carter, you probably have the best chance of making that happen.
Although, I'd love to hear from the 4 (sic!) parsec maintainers regarding the state of the library, their commitment to maintenance and their opinion of megaparsec.
Request for comment from Antoine Latter, Derek Elkins, Ian Lynagh and Roman Cheplyaka.
Indeed four of us happen to have hackage upload rights for parsec, but I believe the only actual maintainer is Antoine. For what it's worth, I've been long dissatisfied with the state of parsec and would love to see it evolve. But let's hear from Antoine. Roman
participants (6)
-
Andrey Chudnov
-
Artyom
-
Carter Schonwald
-
Francesco Ariis
-
Nicola Gigante
-
Roman Cheplyaka