
Hello!
On Mon, Nov 23, 2009 at 12:29 PM, Antoine Latter
On Sun, Nov 22, 2009 at 11:56 AM, Antoine Latter
wrote: Running 'pandoc --strict' over the Markdown readme.text takes:
~0.09s with pandoc built against parsec-2 ~0.19s with pandoc built against parsec-3
on my machine.
I have a branch of parsec-3 which seems to brings us back to parsec-2 numbers, but also fails the rst-reader test-case in the pandoc testing suite:
In reply to my own post, the branch of parsec posted now passes all of the pandoc test cases.
If there are any other consumers of the parsec library that have tests I can run let me know.
The 'many' combinator is one of those things that can look right, be wrong, yet work for almost everything.
I finally had some time to test it. After running it multiple times (of course, it would be nice to use criterion here), I'm getting numbers in this neighborhood: $ ghc --version The Glorious Glasgow Haskell Compilation System, version 6.12.0.20091121 $ time pandoc --strict README > /dev/null parsec 2: --- real 0m0.140s user 0m0.130s sys 0m0.010s parsec 3 - antoine's: --- real 0m0.151s user 0m0.150s sys 0m0.010s parsec 3 - hackage: --- real 0m0.243s user 0m0.240s sys 0m0.010s Nice work, Antoine! Paulo

On Mon, Nov 23, 2009 at 12:29 PM, Antoine Latter
wrote:
I finally had some time to test it. After running it multiple times (of course, it would be nice to use criterion here), I'm getting numbers in this neighborhood:
I used criterion to compare pandoc compiled with parsec2 to pandoc compiled with your version of parsec3. (The benchmark is converting testsuite.txt from markdown to HTML.) The difference was minor: parsec2: mean: 67.66576 ms, lb 67.56722 ms, ub 67.88983 ms, ci 0.950 std dev: 722.3878 us, lb 323.0343 us, ub 1.356013 ms, ci 0.950 parsec3: mean: 68.20847 ms, lb 68.16387 ms, ub 68.26284 ms, ci 0.950 std dev: 252.7773 us, lb 204.5512 us, ub 325.2424 us, ci 0.950 So, once you release the new parsec3, I am prepared to remove the "parsec < 3" restriction from the libraries I maintain: pandoc, highlighting-kate, filestore, gitit, and yst. John

Hi John,
On Fri, Dec 4, 2009 at 3:09 PM, John MacFarlane
I used criterion to compare pandoc compiled with parsec2 to pandoc compiled with your version of parsec3. (The benchmark is converting testsuite.txt from markdown to HTML.) The difference was minor:
parsec2: mean: 67.66576 ms, lb 67.56722 ms, ub 67.88983 ms, ci 0.950 std dev: 722.3878 us, lb 323.0343 us, ub 1.356013 ms, ci 0.950
parsec3: mean: 68.20847 ms, lb 68.16387 ms, ub 68.26284 ms, ci 0.950 std dev: 252.7773 us, lb 204.5512 us, ub 325.2424 us, ci 0.950
So, once you release the new parsec3, I am prepared to remove the "parsec < 3" restriction from the libraries I maintain: pandoc, highlighting-kate, filestore, gitit, and yst.
John
Great! Antoine, other tests we should do? Derek, I apologize if you're already following this, but can you give us your opinion? Paulo

On Fri, Dec 4, 2009 at 7:47 PM, Paulo Tanimoto
Great! Antoine, other tests we should do? Derek, I apologize if you're already following this, but can you give us your opinion?
Paulo
Well, the more eyes and test cases we can get on the new code the better. I'd also like to encourage anyone who's familiar with the idea of how parsec works to take a look at my changes to make sure they're comprehensible. Antoine

On Sun, Dec 6, 2009 at 8:40 AM, Antoine Latter
Well, the more eyes and test cases we can get on the new code the better. I'd also like to encourage anyone who's familiar with the idea of how parsec works to take a look at my changes to make sure they're comprehensible.
Where would I look to find a darcs repo with your changes? It would be great to see parsec3 finally replace parsec2. Besides the performance issue, are there any other considerations keeping it from becoming the default? (I'd also like to write up a patch to add Data.Text support, not surprisingly.)

Hi Bryan,
On Tue, Dec 15, 2009 at 3:13 PM, Bryan O'Sullivan
Where would I look to find a darcs repo with your changes? It would be great to see parsec3 finally replace parsec2. Besides the performance issue, are there any other considerations keeping it from becoming the default? (I'd also like to write up a patch to add Data.Text support, not surprisingly.)
This is the darcs repository, it's Antoine's work: http://community.haskell.org/~aslatter/code/parsec/cps I don't know about the second question, I think performance and just inertia were the big factors. I don't know, for example, if it's like quickcheck, where the new version is missing a few things that the old one provided. Everything appears to work for Pandoc though. Data.Text support would be wonderful! Paulo

On Tue, Dec 15, 2009 at 3:13 PM, Bryan O'Sullivan
Besides the performance issue, are there any other considerations keeping it from becoming the default?
One thing that makes me a bit hesitant is that it's a pretty big change to the core parser data structure, to the extent that I'm not sure I should even call it "Parsec." Reading through the Parsec technical report, one of the innovations that made Parsec what it was is that it introduced a new way of returning the four possible parse results Previous work with parsers returning good error message indicated parse reults with the following sort of data structure: data ParseResult s a = EmptyOk s a -- parsed ok but did not consume any input | EmptyError ErrorMessage -- did not parse okay, but did not consume any input | ConsumedOk s a -- parsed ok and consumed input | ConsumedError ErrorMessage -- did not parse okay and consumed input To something like: data Consumed a = Empty a | Consumed a data Reply s a = Ok s a | Error ErrorMessage data ParseResult s a = Consumed (Reply s a) This change allows us to determine whether a parser consumes input before forcing the computation that determines if the parser succeeds, improving performance and getting rid of a space leak. My branch takes us back to returning the four flattened results, but offers them as four continuations to take during parsing. So my biggest reservation is if I can even call my branch the same parser as Parsec. Antoine

So my biggest reservation is if I can even call my branch the same parser as Parsec.
Maybe we could go the way regexes: say "parsec" is an API, and allow multiple backends with different characteristics as long as they basically implement that API. In practice, a lot of parser combinator libraries have something close to the parsec API already. They don't have to be completely compatible, i.e. you might have a 'parse' function take different arguments, but it would be practical to have a standard set of core combinators. Or come up with a new name for the "API" or call in parseclike or something :)

What is the relationship between the "Parsec API", Applicative and Alternative? Is the only point of overlap `<|>`? -- Jason Dusek

2009/12/16 Jason Dusek
What is the relationship between the "Parsec API", Applicative and Alternative? Is the only point of overlap `<|>`?
Hello everyone, Lots of functions in Text.ParserCombinators.Parsec.Combinator can be defined with only with an obligation on Applicative and no need to access the token stream, parser state. etc. No surprise of course, as the equivalents were available in UU parsing. Here are some if the ones form Text.ParserCombinators.Parsec.Combinator defined just with Applicative obligations that I did for a two continuation parser monad a while ago. I was a bit surprised no-one had put something similar up on Hackage, maybe they have but its not yet indexed by Hayoo. The code is a bit old and might be some way off optimal, though the combinators do show the utility of applicative cons (<:>). Parsec's character parsers (Text.ParserCombinators.Parsec.Char) need access to the input. My attempts to define similar ones with only type class obligations rather than some concrete character type were dismal failures. Best wishes Stephen -- Applicative cons (<:>) :: Applicative f => f a -> f [a] -> f [a] (<:>) p1 p2 = (:) <$> p1 <*> p2 choice :: Alternative f => [f a] -> f a choice = foldr (<|>) empty count :: Applicative f => Int -> f a -> f [a] count i p | i <= 0 = pure [] | otherwise = p <:> count (i-1) p between :: Applicative f => f open -> f close -> f a -> f a between o c a = o *> a <* c option :: Alternative f => a -> f a -> f a option x p = p <|> pure x optionMaybe :: Alternative f => f a -> f (Maybe a) optionMaybe = optional -- aka Parsecs /optional/ optionUnit :: Alternative f => f a -> f () optionUnit p = () <$ p <|> pure () skipMany1 :: Alternative f => f a -> f () skipMany1 p = p *> skipMany p skipMany :: Alternative f => f a -> f () skipMany p = many_p where many_p = some_p <|> pure () some_p = p *> many_p -- | @many1@ an alias for @some@. many1 :: Alternative f => f a -> f [a] many1 = some sepBy :: Alternative f => f a -> f b -> f [a] sepBy p sep = sepBy1 p sep <|> pure [] sepBy1 :: Alternative f => f a -> f b -> f [a] sepBy1 p sep = p <:> step where step = (sep *> p) <:> step <|> pure [] sepEndBy :: Alternative f => f a -> f b -> f [a] sepEndBy p sep = sepEndBy1 p sep <|> pure [] sepEndBy1 :: Alternative f => f a -> f b -> f [a] sepEndBy1 p sep = (p <* sep) <:> step where step = (p <* sep) <:> step <|> pure [] manyTill :: Alternative f => f a -> f b -> f [a] manyTill p end = step <|> pure [] where step = p <:> (step <|> (pure [] <$> end))

2009/12/16 Stephen Tetley
2009/12/16 Jason Dusek
: What is the relationship between the "Parsec API", Applicative and Alternative? Is the only point of overlap `<|>`?
Lots of functions in Text.ParserCombinators.Parsec.Combinator can be defined with only with an obligation on Applicative and no need to access the token stream, parser state. etc. No surprise of course, as the equivalents were available in UU parsing.
Is UU parsing more "Applicative aware", then? When you say "obligation", what do you mean? -- Jason Dusek

Hi Jason
UU parsing somewhat invented the Applicative style - it defined the
usual combinators from Control.Applicative (<$>), (<*>), (<*), (*>)
etc. but didn't have an 'Applicative' type class.
By obligation, I mean relying only on the Applicative class for the
derived operations, here manyTill, choice etc.
Best wishes
Stephen
2009/12/16 Jason Dusek
Is UU parsing more "Applicative aware", then?
When you say "obligation", what do you mean?

Hi everyone,
While you are discussing performance of parsing combinator libraries,
I though I'd mention "parsimony", available from Hackage. It has as
good performance as parsec v2 but it also has support for different
buffer types (e.g., byte strings, including support for utf8 decoding,
etc) which is similar to parsec v3. I also think that it has a
slightly simpler API.
-Iavor
On Wed, Dec 16, 2009 at 4:45 AM, Stephen Tetley
Hi Jason
UU parsing somewhat invented the Applicative style - it defined the usual combinators from Control.Applicative (<$>), (<*>), (<*), (*>) etc. but didn't have an 'Applicative' type class.
By obligation, I mean relying only on the Applicative class for the derived operations, here manyTill, choice etc.
Best wishes
Stephen
2009/12/16 Jason Dusek
: ... Is UU parsing more "Applicative aware", then?
When you say "obligation", what do you mean?
Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe

On Fri, Dec 4, 2009 at 1:09 PM, John MacFarlane
On Mon, Nov 23, 2009 at 12:29 PM, Antoine Latter
wrote: I finally had some time to test it. After running it multiple times (of course, it would be nice to use criterion here), I'm getting numbers in this neighborhood:
I used criterion to compare pandoc compiled with parsec2 to pandoc compiled with your version of parsec3. (The benchmark is converting testsuite.txt from markdown to HTML.) The difference was minor:
Very nice, I was interested in parsec 3 but scared off by the reports of slowness, as I'm sure many others were. Is there any document out there describing the differences between 2 and 3? I gathered 3 allows more flexibility wrt the input, so you can more easily use ByteString or Text, but it would be nice to have a doc saying what the new features are and why we should be interested in upgrading. The old parsec docs were out of date even for parsec 2, and looks like they haven't been updated. The new ones look like they use haddock which is great, that was a gripe I had about the old doc. However, the haddock docs are less friendly than the old doc. So my suggestion is to paste the old introduction (with Daan permission, of course) or something similar into the Text.Parsec description field, along with links to more detailed descriptions and tutorial in the style of v2 on haskell.org along with a 2 vs. 3 doc, even if they're sketchy and brief. Or if it's ok I could just send some darcs patches :)

On Fri, Dec 4, 2009 at 11:01 PM, Evan Laforge
On Fri, Dec 4, 2009 at 1:09 PM, John MacFarlane
wrote: On Mon, Nov 23, 2009 at 12:29 PM, Antoine Latter
wrote: I finally had some time to test it. After running it multiple times (of course, it would be nice to use criterion here), I'm getting numbers in this neighborhood:
I used criterion to compare pandoc compiled with parsec2 to pandoc compiled with your version of parsec3. (The benchmark is converting testsuite.txt from markdown to HTML.) The difference was minor:
Very nice, I was interested in parsec 3 but scared off by the reports of slowness, as I'm sure many others were.
Is there any document out there describing the differences between 2 and 3? I gathered 3 allows more flexibility wrt the input, so you can more easily use ByteString or Text, but it would be nice to have a doc saying what the new features are and why we should be interested in upgrading.
Basically, the main (only) significant changes are that Parsec 3 provides a monad transformer rather than just a monad and the input has been generalized to take arbitrary "Streams" rather than lists of tokens.
The old parsec docs were out of date even for parsec 2, and looks like they haven't been updated. The new ones look like they use haddock which is great, that was a gripe I had about the old doc. However, the haddock docs are less friendly than the old doc. So my suggestion is to paste the old introduction (with Daan permission, of course) or something similar into the Text.Parsec description field, along with links to more detailed descriptions and tutorial in the style of v2 on haskell.org along with a 2 vs. 3 doc, even if they're sketchy and brief.
The Parsec Letter applies to Parsec 3 readily. The only thing that needs changing is the module names and possibly one or two function names which the haddock documentation should readily point out. The letter obviously does not cover the new features of Parsec 3; for that there is only the haddock at this point. It would be nice to update the Parsec letter but that would ideally require the document source and necessarily require Daan's permission. Unfortunately, no one has been able to get in touch with Daan on this issue to my knowledge.
Or if it's ok I could just send some darcs patches :)
You can certainly email me patches and I'll likely apply them if there are no copyright or licensing issues.
participants (9)
-
Antoine Latter
-
Bryan O'Sullivan
-
Derek Elkins
-
Evan Laforge
-
Iavor Diatchki
-
Jason Dusek
-
John MacFarlane
-
Paulo Tanimoto
-
Stephen Tetley