
Hi wren,

BTW, I think it would be a good idea to host your code on GitHub, where it is easier to send patches, etc. Could you mirror your bytestring-lexing repo to GitHub?

Happy hacking!
winter
On Oct 10, 2016, at 11:39, winter
wrote: On Oct 9, 2016, at 13:56, wren romano
wrote: On Sun, Oct 2, 2016 at 3:17 AM, 韩冬(基础平台部)
wrote: Hi wren!
Yes, I noticed that attoparsec's numeric parsers are slow. I have a benchmark set comparing attoparsec and binary-parsers on different sample JSON files; it's on GitHub: https://github.com/winterland1989/binary-parsers.
I'm pretty sure bytestring-lexing helped a lot: for example, the average decoding speed improved by around 20%, while the numeric-only benchmarks (integers and numbers) improved by 30%!
So still some substantial gains for non-numeric stuff, nice!
Parsing is just one part of JSON decoding; a lot of time is spent on unescaping, etc. So the parser's improvement is quite large, IMHO.
BTW, can you provide a version of the lexer which doesn't check whether a Word is a digit? In binary-parsers I use something like `takeWhile isDigit` to extract the input ByteString, so there's no need to verify this in the lexer again. Maybe we can get another performance improvement.
I suppose I could, but then it wouldn't be guaranteed to return correct answers. The way things are set up now, the intended workflow is that wherever you're expecting a number, you should just hand the ByteString over to bytestring-lexing (i.e., not bother scanning/pre-lexing via `takeWhile isDigit`) and it'll give back the answer together with the remainder of the input. This ensures that you don't need to do two passes over the characters. So, for Attoparsec itself you'd wrap it up with something like:
    decimal :: Integral a => Parser a
    decimal =
        get >>= \bs ->
        case readDecimal bs of
        Nothing       -> fail "error message"
        Just (a, bs') -> put bs' >> return a
Alas `get` isn't exported[1], but you get the idea. Of course, for absolute performance you may want to inline all the combinators to see if there's stuff you can get rid of.
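To make the single-pass workflow concrete, here is a minimal sketch that depends only on the bytestring package. `readDecimalSketch` is a hypothetical stand-in for bytestring-lexing's `readDecimal`, specialised to `Int`, and the `Parser` newtype is a toy state-passing parser invented for illustration, not Attoparsec's real type. The point is only that validating digits and accumulating the value happen in the same loop, and that the unconsumed remainder is handed back.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as B
import Data.Char (isDigit, ord)

-- Hypothetical stand-in for bytestring-lexing's readDecimal (Int only).
-- Each character is checked and folded into the result in one pass.
readDecimalSketch :: B.ByteString -> Maybe (Int, B.ByteString)
readDecimalSketch bs0 = case B.uncons bs0 of
    Just (c, rest) | isDigit c -> Just (go (digit c) rest)
    _                          -> Nothing  -- no leading digit: not a number
  where
    digit c = ord c - ord '0'
    go !acc bs = case B.uncons bs of
        Just (c, rest) | isDigit c -> go (acc * 10 + digit c) rest
        _                          -> (acc, bs)  -- value plus unconsumed input

-- Toy parser (an assumption for this sketch, not Attoparsec's Parser):
-- a function from the remaining input to an error or a result with the rest.
newtype Parser a = Parser
    { runParser :: B.ByteString -> Either String (a, B.ByteString) }

-- Same shape as the get/put version above: read the current input,
-- lex a number from it, and store back whatever was not consumed.
decimal :: Parser Int
decimal = Parser $ \bs -> case readDecimalSketch bs of
    Nothing -> Left "expected a decimal number"
    Just r  -> Right r

main :: IO ()
main = do
    print (runParser decimal (B.pack "123, 456"))  -- Right (123,", 456")
    print (runParser decimal (B.pack "abc"))       -- Left "expected a decimal number"
```

A pre-scan with `takeWhile isDigit` followed by a separate conversion would visit each digit twice; here each digit is visited once.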
The only reason for scanning ahead is in case you're dealing with lazy bytestrings and so need to glue them together in order to use bytestring-lexing. Older versions of the library did have support for lazy bytestrings, but I removed it because it was bitrotten and unused. But if you really need it, I can add new variants of the lexers for dealing with the possibility of requesting new data when the input runs out.
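For the lazy case, the scan-ahead-and-glue approach could look roughly like the sketch below. `readDecimalLazy` is a hypothetical helper name, and `B.readInt` from the bytestring package is used as a stand-in for a strict lexer such as `readDecimal` so that the example needs no extra dependencies. This is the two-pass workaround described above, not a proposal for how new lazy variants would be implemented.

```haskell
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Char (isDigit)

-- Hypothetical helper: scan ahead over the lazy input to find the digits,
-- glue just that prefix into one strict chunk, then lex it strictly.
readDecimalLazy :: L.ByteString -> Maybe (Int, L.ByteString)
readDecimalLazy lbs =
    let (digits, rest) = L.span isDigit lbs       -- first pass: find the digits
    in case B.readInt (L.toStrict digits) of      -- second pass: convert them
        Just (n, leftover) | B.null leftover -> Just (n, rest)
        _                                    -> Nothing

main :: IO ()
main = do
    -- The number is deliberately split across two chunks.
    let input = L.fromChunks [B.pack "12", B.pack "34", B.pack " tail"]
    print (readDecimalLazy input)  -- Just (1234," tail")
```

Only the digit prefix is copied into a strict ByteString; the rest of the lazy input is returned untouched.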
[1] http://hackage.haskell.org/package/attoparsec-0.13.1.0/docs/src/Data-Attopar...
--
Live well,
~wren