
Hi all,

I am happy to announce binary-parsers, a ByteString parsing library built on binary. I borrowed a lot of the design, tests, and documentation from attoparsec so that I could get it into shape very quickly; thank you bos! And thanks to binary's excellent design, the codebase is very small (<500 LOC).

From my benchmarks it's insanely fast: it outperforms attoparsec by 10%~30% in the aeson benchmark, and it's also slightly faster than scanner (a non-backtracking parser designed for speed) in the HTTP request benchmark. I'd like to ask you to give it a shot if you need super fast ByteString parsing.

These parsers are just binary's Get monads, so you can seamlessly combine them with combinators from the binary package. You can now write more complex Binary instances using comprehensive combinators, together with serialization packages like blaze-textual. All these goodies come for free!

Happy hacking!

GitHub: https://github.com/winterland1989/binary-parsers
Hackage: http://hackage.haskell.org/package/binary-parsers

Winterland
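Since these parsers are just binary's Get monad, mixing length-prefixed binary data with textual content in a single parser is direct. The sketch below is a minimal illustration using only binary's own Data.Binary.Get API; no binary-parsers-specific names are assumed:

```haskell
import qualified Data.Binary.Get as G
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as L

-- A length-prefixed ASCII field: a little-endian Word32 length,
-- then that many raw bytes. Plain Data.Binary.Get combinators only.
lengthPrefixed :: G.Get B.ByteString
lengthPrefixed = do
  n <- G.getWord32le
  G.getByteString (fromIntegral n)

main :: IO ()
main =
  case G.runGetOrFail lengthPrefixed (L.fromStrict (B.pack "\x05\x00\x00\x00hello")) of
    Left  (_, _, err)   -> putStrLn ("parse error: " ++ err)
    Right (_, _, field) -> B.putStrLn field   -- prints "hello"
```

Because a binary-parsers parser is the same Get type, combinators like this compose with its textual ones in the same do-block.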

On Thu, Sep 22, 2016 at 7:47 PM, 韩冬(基础平台部) wrote:
Yay! More users of my bytestring-lexing package :) Since attoparsec's numeric parsers are dreadfully slow, can you tell how much of your speedup is due to bytestring-lexing vs. how much is due to other differences vs. aeson?

-- Live well, ~wren

Hi wren!
Yes, I noticed that attoparsec's numeric parsers are slow. I have a benchmark set comparing attoparsec and binary-parsers on different sample JSON files; it's on GitHub: https://github.com/winterland1989/binary-parsers.
I'm pretty sure bytestring-lexing helped a lot: the average decoding speed improvement is around 20%, but the numeric-only benchmarks (integers and numbers) improved by 30%!
Parsing is just one part of JSON decoding; lots of time is spent on unescaping, etc. So the parser's improvement is quite large IMHO.
BTW, can you provide a version of the lexer which doesn't check whether a Word is a digit? In binary-parsers I use something like `takeWhile isDigit` to extract the input ByteString, so there's no need to verify this in the lexer again. Maybe we can get another performance improvement.
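For concreteness, the two-pass shape being described — slice off the digit run first, then convert the slice — looks roughly like this. This is an illustrative sketch, not binary-parsers' actual code; note the conversion pass re-walks bytes that `span` already validated:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit, ord)

-- Two passes: `span isDigit` scans and validates the digit run,
-- then foldl' walks the same bytes again to build the number.
readDigits :: B.ByteString -> Maybe (Int, B.ByteString)
readDigits bs =
  case B.span isDigit bs of
    (digits, rest)
      | B.null digits -> Nothing
      | otherwise     ->
          Just (B.foldl' (\acc c -> acc * 10 + (ord c - ord '0')) 0 digits, rest)
```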
Cheers!
Winterland

On Sun, Oct 2, 2016 at 3:17 AM, 韩冬(基础平台部) wrote:
Hi wren!
Yes, i noticed that attoparsec's numeric parsers are slow. I have a benchmark set to compare attoparsec and binary-parsers on different sample JSON files, it's on github: https://github.com/winterland1989/binary-parsers.
I'm pretty sure bytestring-lexing helped a lot, for example, the average decoding speed improvement is around 20%, but numeric only benchmarks(integers and numbers) improved by 30% !
So still some substantial gains for non-numeric stuff, nice!
Parsing is just a part of JSON decoding, lots of time is spent on unescaping, .etc. So the parser's improvement is quite large IMHO.
BTW, can you provide a version of lexer which doesn't check whether a Word is a digit? In binary-parsers i use something like `takeWhile isDigit` to extract the input ByteString, so there's no need to verify this in lexer again. Maybe we can have another performance improvement.
I suppose I could, but then it wouldn't be guaranteed to return correct answers. The way things are set up now, the intended workflow is that wherever you're expecting a number, you should just hand the ByteString over to bytestring-lexing (i.e., not bother scanning/pre-lexing via `takeWhile isDigit`) and it'll give back the answer together with the remainder of the input. This ensures that you don't need to do two passes over the characters. So, for attoparsec itself you'd wrap it up with something like:

    decimal :: Integral a => Parser a
    decimal = get >>= \bs ->
        case readDecimal bs of
            Nothing       -> fail "error message"
            Just (a, bs') -> put bs' >> return a

Alas, `get` isn't exported[1], but you get the idea. Of course, for absolute performance you may want to inline all the combinators to see if there's stuff you can get rid of.

The only reason for scanning ahead is in case you're dealing with lazy bytestrings and so need to glue them together in order to use bytestring-lexing. Older versions of the library did have support for lazy bytestrings, but I removed it because it was bitrotten and unused. But if you really need it, I can add new variants of the lexers for dealing with the possibility of requesting new data when the input runs out.

[1] http://hackage.haskell.org/package/attoparsec-0.13.1.0/docs/src/Data-Attopar...

-- Live well, ~wren
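The single-pass workflow described here — validate and accumulate in one traversal, returning the value together with the unconsumed remainder — can be sketched as below. This `readDecimal` is a simplified hand-rolled stand-in for bytestring-lexing's function of the same name, not its actual implementation:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit, ord)

-- One pass: each byte is checked and folded into the accumulator in
-- the same step. Simplified stand-in for bytestring-lexing's
-- readDecimal (no overflow handling).
readDecimal :: Integral a => B.ByteString -> Maybe (a, B.ByteString)
readDecimal bs0
  | Just (c, _) <- B.uncons bs0, isDigit c = Just (go 0 bs0)
  | otherwise                              = Nothing
  where
    go acc bs =
      case B.uncons bs of
        Just (c, bs') | isDigit c ->
          go (acc * 10 + fromIntegral (ord c - ord '0')) bs'
        _ -> (acc, bs)
```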

The only reason for scanning ahead is in case you're dealing with lazy bytestrings and so need to glue them together in order to use bytestring-lexing. Older versions of the library did have support for lazy bytestrings, but I removed it because it was bitrotten and unused. But if you really need it, I can add new variants of the lexers for dealing with the possibility of requesting new data when the input runs out.
Yes, please! The only reason I have to use `takeWhile isDigit` myself is that `takeWhile` takes care of partial input for me, but if you can provide a version which makes it easy to deal with incremental input, then I can rely on bytestring-lexing completely. You may be interested in the `scanChunks` combinator in binary-parsers. Let's work something out; if you need any help, please tell me. Thanks!

Cheers!~
winter
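One shape such incremental variants could take is a result type that suspends at the end of a chunk and resumes when the caller supplies more input. This is a sketch of the idea only; none of these names exist in bytestring-lexing or binary-parsers:

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit, ord)

-- Hypothetical incremental lexer result: either a finished value
-- with leftover input, or a request for the next chunk.
data LexResult a
  = Done a B.ByteString
  | Partial (B.ByteString -> LexResult a)

-- Accumulate decimal digits across chunk boundaries: suspend when a
-- chunk runs out, finish at the first non-digit byte.
decimalChunked :: Int -> B.ByteString -> LexResult Int
decimalChunked acc bs =
  case B.uncons bs of
    Nothing -> Partial (decimalChunked acc)
    Just (c, bs')
      | isDigit c -> decimalChunked (acc * 10 + (ord c - ord '0')) bs'
      | otherwise -> Done acc bs
```

Feeding `"12"` and then `"34 "` yields `Done 1234 " "`, i.e., the digit run is assembled across the chunk boundary without gluing the ByteStrings together first.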
_______________________________________________
Haskell-Cafe mailing list
To (un)subscribe, modify options or view archives go to: http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
Only members subscribed via the mailman list are allowed to post.

Hi wren,

BTW, I think it would be a good idea to host your code on GitHub, which makes it easier to send patches etc. Can you mirror your bytestring-lexing repo to GitHub?

Happy hacking!
winter

On Tue, Oct 11, 2016 at 3:14 AM, winter wrote:
Like most of my darcs repos, it's already mirrored to GitHub: https://github.com/wrengr/bytestring-lexing

-- Live well, ~wren

韩冬(基础平台部) writes:
What in particular changed to produce these performance improvements? I have been maintaining attoparsec recently and would be happy to merge any semantics-preserving changes or new combinators that would help existing users benefit from your work.

Cheers,
- Ben

Hi Ben!

I'm not familiar enough with attoparsec's internals to give you a concrete answer, since the core parser types of binary and attoparsec are so different. But I guess it may have something to do with the GHC specializer, because attoparsec parametrizes the input type to support both Text and ByteString (which is a bad decision IMHO).

Actually, I think the current binary Decoder type can be improved further following attoparsec: the Pos state could be encoded directly into the CPS parser type. I'll try to see whether this is an improvement or not.

In an ideal world, I think we should have a fast parser for ByteString which supports both binary's getWordXX and ASCII textual content, and a fast parser specialized for Text. Let me know your thoughts!

Cheers!
Winter

Sent from my iPhone
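The "Pos encoded directly into the CPS parser type" idea might look something like the sketch below — hypothetical types for illustration only, not binary's actual Decoder or attoparsec's internals:

```haskell
{-# LANGUAGE RankNTypes #-}
import qualified Data.ByteString.Char8 as B

-- CPS parser threading the remaining input and the byte position
-- straight through the success continuation, instead of carrying
-- Pos in a separate state value. Hypothetical sketch only.
newtype Parser a = Parser
  { runParser :: forall r.
       B.ByteString                                   -- remaining input
    -> Int                                            -- byte position
    -> (B.ByteString -> Int -> a -> Either String r)  -- success continuation
    -> Either String r
  }

anyByte :: Parser Char
anyByte = Parser $ \bs pos k ->
  case B.uncons bs of
    Nothing       -> Left ("unexpected end of input at byte " ++ show pos)
    Just (c, bs') -> k bs' (pos + 1) c

parse :: Parser a -> B.ByteString -> Either String a
parse (Parser p) bs = p bs 0 (\_ _ a -> Right a)
```

Keeping the position as a plain strict argument like this lets GHC unbox it across combinators, which is one plausible reason the encoding matters for speed.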

On Tue, Oct 11, 2016 at 2:38 PM, Ben Gamari wrote:
What in particular changed to produce these performance improvements? I have been maintaining attoparsec recently and would be happy to merge any semantics-preserving changes or new combinators that would help existing users benefit from your work.
I can't speak to Winter's work, but for my part: one of the big things is bytestring-lexing. I'd mentioned using it to Bryan before, but that never went anywhere. I also have a handful of other minor patches which help optimize a few combinators here and there. I'd sent a bunch of these to Bryan back when I did them; some got merged, but I think some also got lost in the shuffle. I can resend them if you'd like.

-- Live well, ~wren
participants (5)
- Ben Gamari
- winter
- wren romano
- wren romano
- 韩冬(基础平台部)