
11 May 2017, 4:30 p.m.
On 2017-05-11 18:12, David Turner wrote:
> Dear Café,
>
> We have a stream of ~900-byte JSON documents (lazy ByteStrings) that we would like to validate (NB not parse) as quickly as possible. We just need to check that the bytes are a syntactically valid JSON document and then pass them elsewhere; we do not care what the content is. The documents are encoded in UTF-8 and have no extraneous whitespace.
No particular recommendations, but you might want to look into semi-indexing[1] as a strategy. It looks plausible that this could be done without a lot of allocation; see the paper for details. (I think there's also a demo implementation in Python on GitHub.)

[1] http://www.di.unipi.it/~ottavian/files/semi_index_cikm.pdf
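To give a rough idea of the kind of single pass involved, here is a minimal sketch in Haskell. It is not the paper's construction (which stores a succinct encoding rather than a list of offsets) and it is not a validator: it only records the byte offsets of the structural characters `{ } [ ] : ,` that lie outside string literals. The names `structuralOffsets` and `isStructural` are made up for illustration, and going through `BL.unpack` is for clarity, not speed. Scanning bytes rather than characters is safe for UTF-8 input because the bytes for `"` and `\` never occur inside a multi-byte sequence.

```haskell
{-# LANGUAGE BangPatterns #-}

import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)
import Data.Word (Word8)

-- Scanner state: outside any string literal, inside one,
-- or inside one immediately after a backslash.
data ScanState = Outside | InString | Escaped

-- | Byte offsets of structural characters that are not inside a
-- string literal. Hypothetical helper; it does not check validity.
structuralOffsets :: BL.ByteString -> [Int64]
structuralOffsets = go Outside 0 . BL.unpack
  where
    go :: ScanState -> Int64 -> [Word8] -> [Int64]
    go _ _ [] = []
    go !st !i (w : ws) = case st of
      -- The byte after a backslash is skipped, whatever it is.
      Escaped -> go InString (i + 1) ws
      InString
        | w == 0x5C -> go Escaped (i + 1) ws   -- backslash
        | w == 0x22 -> go Outside (i + 1) ws   -- closing quote
        | otherwise -> go InString (i + 1) ws
      Outside
        | w == 0x22      -> go InString (i + 1) ws  -- opening quote
        | isStructural w -> i : go Outside (i + 1) ws
        | otherwise      -> go Outside (i + 1) ws

-- | True for the bytes { } [ ] : ,
isStructural :: Word8 -> Bool
isStructural w = w `elem` [0x7B, 0x7D, 0x5B, 0x5D, 0x3A, 0x2C]
```

For the input `{"a":[1,"x,y"]}` this yields the offsets 0, 4, 5, 7, 13 and 14; the comma inside `"x,y"` is skipped. A real validator would still need to check nesting, literals and numbers on top of a pass like this.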