
Interesting, thanks for the link. However we're trying to do _even less_
than that - this is just the "scan the document to produce the Elias-Fano
encoding" step, except without needing to keep hold of it for later. It's
not quite as trivial as that paper makes out, as (a) it doesn't mention
the possibility that the documents might not be well-formed, and (b) it
doesn't really talk about dealing with the special delimiting characters
within string literals, for which you need to drag along a bunch of state
while you're scanning.
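To give a flavour of the string-literal bookkeeping I mean, here's a very
cut-down, untested sketch (nowhere near a real validator - it only tracks
an in-string flag, an escape flag and the nesting depth, and ignores
numbers, keywords and UTF-8 entirely):

import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Illustrative state only: an in-string flag, an "escape pending" flag,
-- and the {}/[] nesting depth.
data ScanState = ScanState
  { inString :: !Bool
  , escaped  :: !Bool
  , depth    :: !Int
  }

scanByte :: ScanState -> Word8 -> ScanState
scanByte st w
  | escaped st  = st { escaped = False }
  | inString st = case w of
      0x5c -> st { escaped = True }       -- backslash starts an escape
      0x22 -> st { inString = False }     -- unescaped quote ends the literal
      _    -> st
  | otherwise   = case w of
      0x22 -> st { inString = True }      -- quote opens a literal
      0x7b -> st { depth = depth st + 1 } -- '{'
      0x5b -> st { depth = depth st + 1 } -- '['
      0x7d -> st { depth = depth st - 1 } -- '}'
      0x5d -> st { depth = depth st - 1 } -- ']'
      _    -> st

-- Rough structural check only: balanced nesting, no unterminated string.
looksBalanced :: BL.ByteString -> Bool
looksBalanced bs =
  let end = BL.foldl' scanByte (ScanState False False 0) bs
  in  not (inString end) && depth end == 0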
This'd be pretty trivial to do in a C-like language now that we've got the
DFA tables built, and I may resort to that at some point if we can't work
out how to avoid the allocations in Haskell-land, but I'd quite like to be
able to solve problems of this form without unnecessarily resorting to C.
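For concreteness, the shape I'm hoping for in Haskell is roughly the
following (untested sketch; dfaStep is just a stand-in for a lookup into
the real DFA tables): walk the strict chunks of the lazy ByteString with a
tail-recursive worker whose state is a single Int, so that GHC can
hopefully unbox everything and the inner loop allocates nothing per byte.

{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)

-- Stand-in for the real transition-table lookup.
dfaStep :: Int -> Word8 -> Int
dfaStep s _ = s

-- Run the DFA over the lazy ByteString chunk by chunk; the Int state
-- and the chunk index should stay unboxed in the inner loop.
runDFA :: BL.ByteString -> Int
runDFA = goChunks 0 . BL.toChunks
  where
    goChunks !s []       = s
    goChunks !s (c : cs) = goChunks (goBytes s 0 (BS.length c) c) cs

    goBytes !s !i !n c
      | i >= n    = s
      | otherwise = goBytes (dfaStep s (BU.unsafeIndex c i)) (i + 1) n c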
Cheers,
On 11 May 2017 at 17:30, Bardur Arantsson wrote:
On 2017-05-11 18:12, David Turner wrote:
Dear Café,
We have a stream of ~900-byte JSON documents (lazy ByteStrings) that we would like to validate (NB not parse) as quickly as possible. We just need to check that the bytes are a syntactically valid JSON document and then pass them elsewhere; we do not care what the content is. The documents are encoded in UTF-8 and have no extraneous whitespace.
No particular recommendations, but you might want to look into semi-indexing [1] as a strategy. It looks plausible that it would be possible to do that without a lot of allocation; see the paper for details. (I think there's also a demo implementation in Python on GH.)
[1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf