JSON parser that returns the rest of the string that was not used

As someone who spent many years putting data in S-expression format, it seems natural to me to write multiple S-expressions (or JSON objects) to a file, and expect a reader to be able to read them back one at a time.
This seems comparatively uncommon in the JSON world. Accordingly, it looks like the most popular JSON parsing lib, Aeson, doesn't directly provide this functionality. Functions like decode just return a "Maybe a", not the left-over input, meaning that you would need to somehow split up your multi-object file before attempting to parse, which is annoying and error-prone.
It looks like maybe you can get Aeson to do what I want by dropping down to the attoparsec layer and messing with IResult.
But is there a better way to do this? Would this be a good convenience routine to add to aeson in a PR? I.e. would anyone else use this?
Thanks, -Ryan

You could drop down to the attoparsec layer, but instead of messing with IResults, use it to make another parser that will parse all the objects in the file.
E.g. json `sepBy` skipSpace :: Parser [Value]
sepBy and skipSpace both taken from Data.Attoparsec.Text
_______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
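A minimal sketch of the `sepBy` suggestion above, not a definitive implementation. It assumes aeson's `json` parser from `Data.Aeson.Parser` (the module linked elsewhere in this thread for aeson 0.11; newer aeson releases appear to ship it in the separate attoparsec-aeson package). Because that parser consumes ByteString input, the sketch takes `sepBy` and `skipSpace` from `Data.Attoparsec.ByteString.Char8` rather than `Data.Attoparsec.Text`. The names `jsonValues` and `parseAll` and the sample input are made up for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Aeson (Value)
import           Data.Aeson.Parser (json)
import           Data.Attoparsec.ByteString.Char8
                   (Parser, endOfInput, parseOnly, sepBy, skipSpace)
import qualified Data.ByteString.Char8 as B8

-- | Zero or more JSON values separated by optional whitespace.
jsonValues :: Parser [Value]
jsonValues = json `sepBy` skipSpace

-- | Parse a whole buffer. Requiring end of input turns a malformed value in
-- the middle of the file into an error instead of a silently shortened list.
parseAll :: B8.ByteString -> Either String [Value]
parseAll = parseOnly (jsonValues <* skipSpace <* endOfInput)

main :: IO ()
main = print (parseAll "{\"a\": 1} [2, 3]\n true")
```

This reads the whole input before returning anything, so it answers the "multiple values in one file" part of the question but not the "one at a time" part.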

Thanks, I'll have to try it and see if the Parser [Value] can enable
streaming/incremental IO.
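A list-returning parser only delivers its result once the whole list has been parsed, so on its own it does not stream. A sketch closer to the original request runs `json` once per value and hands back the unconsumed input from attoparsec's `Done` result. It assumes the same `Data.Aeson.Parser` module as above; `decodePrefix` and `decodeAll` are hypothetical names, not part of aeson.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Aeson (Value)
import           Data.Aeson.Parser (json)
import           Data.Attoparsec.ByteString (IResult (..), feed, parse)
import qualified Data.ByteString.Char8 as B8
import           Data.Char (isSpace)

-- | Parse one JSON value off the front of the input and return it together
-- with the rest of the input that was not used.
decodePrefix :: B8.ByteString -> Either String (Value, B8.ByteString)
decodePrefix input =
  -- Feeding an empty chunk tells attoparsec that no more input is coming,
  -- which resolves a 'Partial' result (e.g. when the input ends in a number).
  case feed (parse json input) B8.empty of
    Done rest v  -> Right (v, rest)
    Fail _ _ err -> Left err
    Partial _    -> Left "incomplete JSON value"

-- | Repeatedly chew values off the front until only whitespace remains.
decodeAll :: B8.ByteString -> Either String [Value]
decodeAll input
  | B8.null trimmed = Right []
  | otherwise = do
      (v, rest) <- decodePrefix trimmed
      (v :) <$> decodeAll rest
  where
    trimmed = B8.dropWhile isSpace input

main :: IO ()
main = do
  print (snd <$> decodePrefix "true [1, 2]")
  print (length <$> decodeAll "{\"a\": 1}\n[2, 3]\n42\n")
```

This version still works on a buffer that is already in memory. For incremental IO one would, instead of feeding the empty chunk, answer each `Partial` with the next chunk read from the handle.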

Last spring some colleagues and I wrote a properly streaming JSON parser
that could incrementally emit JSON primitive values as it's fed bytestrings.
There are some corner cases that come up with respect to how attoparsec's float
parser parses 0.0 as two different zero literals depending on how it's
split between chunks, but that aside it's a pretty simple stack machine
implementation that worked out pretty well.

Hi Ryan
Isn't this a problem of JSON rather than its parsers?
That's to say I believe (but could easily be wrong...) that a file
with multiple JSON objects would be ill-formed; it would be
well-formed if the multiple objects were in a single top-level array.

On Sun, May 29, 2016 at 1:53 PM, Stephen Tetley wrote:
Isn't this a problem of JSON rather than its parsers?
I can understand that a file with multiple JSONs is not a legal "JSON
text". But... isn't that issue separate from whether parsers expect
terminated strings, or, conversely, are tolerant of arbitrary text
following the JSON expr? Scheme "read" functions from time immemorial
would read the first expression off a handle without worrying about what
followed it! It doesn't mean the whole file needs to be valid JSON, just
that each prefix chewed off the front is valid JSON.
Thanks to Nikita for the links to json-stream and
json-incremental-decoder. My understanding is that if I use a top-level
array to wrap the objects, then these approaches will let me retain
streaming/incremental IO. I'm not sure yet how to use this to stream
output from a monadic computation.
Let me be specific about the scenario I'm trying to handle:
Criterion loops over benchmarks, and after running each, it writes the
report out to disk appending it to a file:
https://github.com/bos/criterion/blob/fb815c928af2cb089cea9399503304530e2788...
This way, the report doesn't sit in memory affecting subsequent benchmarks.
(I.e. polluting the live set for major GC.) When all benchmarks are
complete, the reports are read back from the file.
There are bugs in the binary serialization used in the linked code. We
want to switch it to dump and read back in JSON instead.
In this case, we can just write an initial "[" to the file, and then
serialize one JSON object at a time, interspersed with ",". That's ok...
but it's kind of an ugly solution -- it requires that we, the client of
the JSON serialization API, make assumptions about the serialization format
and reimplement a tiny fraction of it.
Cheers,
-Ryan
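The "[" and "," workaround described above can be sketched roughly as follows, using only base. This is an illustration, not Criterion's code: `element`, `closing`, and `streamArray` are hypothetical names, and the reports are stand-in strings that are assumed to be already encoded as JSON (in practice they would come from the JSON library's encoder).

```haskell
import System.IO (Handle, hPutStr, hPutStrLn, stdout)

-- | Text to write for the element at the given 0-based position: the opening
-- bracket before the first element, a comma before every later one.
element :: Int -> String -> String
element 0 encoded = '[' : encoded
element _ encoded = ',' : encoded

-- | Closing text, given how many elements were written.
closing :: Int -> String
closing 0 = "[]"
closing _ = "]"

-- | Run the actions in order, writing each (already encoded) result as soon
-- as it is produced so that none of them has to stay in memory.
streamArray :: Handle -> [IO String] -> IO ()
streamArray h actions = do
  mapM_ (\(i, act) -> act >>= hPutStr h . element i) (zip [0 ..] actions)
  hPutStrLn h (closing (length actions))

main :: IO ()
main = streamArray stdout [pure "{\"name\":\"fib\"}", pure "{\"name\":\"sort\"}"]
```

The framing logic is small, but it is exactly the piece of the serialization format that the client ends up reimplementing.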

On 30/05/16 5:53 AM, Stephen Tetley wrote:
Hi Ryan
Isn't this a problem of JSON rather than its parsers?
That's to say I believe (but could easily be wrong...) that a file with multiple JSON objects would be ill-formed; it would be well-formed if the multiple objects were in a single top-level array.
"A file with multiple JSON objects would be ill-formed" -- it would be an ill-formed *what*?
The media type application/json appears to describe a format containing precisely one JSON value, but RFC 7159 is otherwise silent about streams of JSON values.
JSON is sometimes used as the format for entries in logs; it would be pretty useless for that if you couldn't have more than one in a sequence.
If a JSON value is true, false, or null it ends at its last letter; if it's a string it ends at the closing double quote; if it's an array it ends at the closing ]; if it's an object it ends at the closing }; only if it is a number is there any need to check the next character, but then only one character needs to be checked, and thanks to the requirement that numbers be in ASCII, only one byte needs to be checked, there being no need to decode the next Unicode code point in full.

Thanks Richard. I didn't know that the spec was precise about the JSON
expr not going beyond the closing character. (I wasn't sure, for instance,
if it would also include whitespace after that point.)
For logging, I bet it helps if people try to enforce the invariant that
JSON text doesn't internally include newlines...
Best,
-Ryan
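The one-value-per-line convention mentioned above can be sketched with aeson's `eitherDecodeStrict`, assuming no encoded value contains a raw newline. `decodeLines` is a hypothetical helper name, not an aeson function.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Aeson (Value, eitherDecodeStrict)
import qualified Data.ByteString.Char8 as B8
import           Data.Char (isSpace)

-- | Decode a log in which every non-blank line holds exactly one JSON value.
-- The first line that fails to decode aborts the whole run with its error.
decodeLines :: B8.ByteString -> Either String [Value]
decodeLines =
  traverse eitherDecodeStrict . filter (not . B8.all isSpace) . B8.lines

main :: IO ()
main = do
  print (decodeLines "{\"a\": 1}\n\n[2, 3]\n")
  -- A value split across two lines breaks the invariant and is rejected.
  print (decodeLines "{\"a\":\n1}\n")
```

Splitting on newlines keeps the reader independent of the JSON grammar, at the cost of requiring the writer to keep every value on a single line.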

On 31/05/16 1:16 AM, Ryan Newton wrote:
Thanks Richard. I didn't know that the spec was precise about the JSON expr not going beyond the closing character.
First, older versions of JSON required a value to be an object or an array.
Second, the JSON grammar in RFC 7159 is quite precise.
    Insignificant whitespace is allowed before or after any of the six structural characters.
    -- That is, [ ] { } , :
    -- Oddly enough, the specification does NOT say that whitespace
    -- is allowed before or after any other token; it appears that
    -- "false" is legal at JSON top level but " false" is not.
    ws = *(
           %x20 /    ; Space
           %x09 /    ; Horizontal tab
           %x0A /    ; Line feed or New line
           %x0D )    ; Carriage return
    value = false / null / true / object / array / number / string
    false = %x66.61.6c.73.65   ; false
    null  = %x6e.75.6c.6c      ; null
    true  = %x74.72.75.65      ; true
Note that it's not a matter of scanning a sequence of letters and then checking for particular values, it must be one of those three exact sequences. Once you have read the "e" of "false" there is no point in reading any further. You certainly don't need to skip white space, indeed, if you take the specification literally, you mustn't. (But, sigh, it IS ok to skip white space after a final ] or }. Such are the standards the net is made from.)
    object = ws %x7B ws [ member *( ws %x2C ws member ) ] ws %x7D ws
    member = string ws %x3A ws value
    array  = ws %x5B ws [ value *( ws %x2C ws value ) ] ws %x5D ws
And yes, the grammar is ambiguous. Consider
    "[ [ ] ]"
Does the first white space character go with the first left bracket or the second one? All they needed to do was to say that strings, any other values, and , : ] } can be preceded by insignificant white space, and the ambiguity would be gone and " false" would be legal.
Every kind of number ends with a block of digits; since white space isn't allowed after numbers, the next character, whatever it is, should not be consumed, but must be checked to make sure it is not a digit.
I wonder if anyone has a JSON parser that follows the letter of the standard?
Preparing this message has made me realise that (a) mine doesn't and (b) I don't really want it to.
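The end-of-value rules described in the messages above can be sketched as a small scanner that only finds where the first value stops and returns the rest of the input. It uses base only and is deliberately loose: it does not validate the value, does not check that bracket kinds match, and approximates a number as a run of characters that can occur in one. All function names are hypothetical.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (listToMaybe)

-- | Length of the JSON value at the very start of the input (no leading
-- whitespace), or Nothing if no value end can be found.
valueLength :: String -> Maybe Int
valueLength s = case s of
  '"' : r -> (1 +) <$> stringRest r
  c : _
    | c `elem` "[{"          -> bracketed 0 s
    -- A number ends at the first character that cannot continue it, so one
    -- character of lookahead is enough.
    | c `elem` "-0123456789" ->
        Just (length (takeWhile (`elem` "+-.eE0123456789") s))
  -- true, false and null end at their last letter.
  _ -> listToMaybe
         [length lit | lit <- ["true", "false", "null"], lit `isPrefixOf` s]

-- | Characters up to and including the closing quote, given the text that
-- follows the opening quote. A backslash always escapes the next character.
stringRest :: String -> Maybe Int
stringRest ('"' : _)      = Just 1
stringRest ('\\' : _ : r) = (2 +) <$> stringRest r
stringRest (_ : r)        = (1 +) <$> stringRest r
stringRest []             = Nothing

-- | Scan an array or object up to the bracket that closes it, tracking the
-- nesting depth and skipping over strings so brackets inside them are ignored.
bracketed :: Int -> String -> Maybe Int
bracketed depth s = case s of
  [] -> Nothing
  '"' : r -> do
    n <- stringRest r
    m <- bracketed depth (drop n r)
    pure (1 + n + m)
  c : r
    | c `elem` "[{" -> (1 +) <$> bracketed (depth + 1) r
    | c `elem` "]}" ->
        if depth == 1 then Just 1 else (1 +) <$> bracketed (depth - 1) r
    | otherwise     -> (1 +) <$> bracketed depth r

-- | Split the input into its first value and everything after it.
splitFirst :: String -> Maybe (String, String)
splitFirst s = (`splitAt` s) <$> valueLength s

main :: IO ()
main = do
  print (splitFirst "{\"a\":[1,\"]\"]} true")
  print (splitFirst "12.5e3,x")
```

Only the number case looks at a character beyond the value itself; every other case stops on the value's own final character.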

I know of at least two packages providing the incremental JSON parsing
functionality:
http://hackage.haskell.org/package/json-stream
http://hackage.haskell.org/package/json-incremental-decoder
Being the author of the latter one I recommend checking out both.

For aeson, use the Parser/attoparsec module
https://hackage.haskell.org/package/aeson-0.11.2.0/docs/Data-Aeson-Parser.ht...
participants (7)
- Adam Bergmark
- Carter Schonwald
- Nikita Volkov
- Richard A. O'Keefe
- Ryan Newton
- Sanae
- Stephen Tetley