Embarrassed Haskeller -- why is Read so bad? What are alternatives?

We have great tools for [de]serializing both to binary and to JSON (binary, cereal, json, aeson, etc). But we have rather weak support for [de]serializing to human-readable Haskell-ish external formats.

Read instances are both (1) slow, because they take Strings, and (2) they don't allow sensible error messages! (A historical decision: choosing Maybe rather than Either.)

Right now I'm working on a project with a Racketeer who is trying to read a 6000-line file as a single list data structure. The derived Read instance just tells him it won't parse, with NO error information, line number, etc. To someone used to Scheme readers, that's rather poor.

I wish I had something better to tell him! I am not aware of a library to recommend, other than switching to JSON format on disk, or manually kludging together a parsing hack that, for example, puts one element of the list on each line and makes much smaller calls to "read".

Argh!, -Ryan
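[Editor's note: the one-element-per-line kludge mentioned above can be sketched in a few lines with readEither from Text.Read in base, which at least recovers a line number on failure. The helper name readLines is invented here for illustration; it is not a library function.]

```haskell
import Text.Read (readEither)

-- Sketch of the workaround: one element per line, parsed with a
-- small per-line call, so a failure can report a line number.
readLines :: Read a => String -> Either String [a]
readLines input = traverse parseLine (zip [1 :: Int ..] (lines input))
  where
    parseLine (n, l) = case readEither l of
      Right x  -> Right x
      Left err -> Left ("line " ++ show n ++ ": " ++ err ++ ": " ++ show l)
```

This keeps each call to the Read machinery small, and turns the single opaque "no parse" into a located one.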

On Tue, Oct 15, 2013 at 11:01 AM, Ryan Newton wrote:
We have great tools for [de]serializing both to binary and to JSON (binary, cereal, json, aeson, etc). But we have rather weak support for [de]serializing to human readable Haskell-ish external formats.
What we have is parsec, attoparsec, trifecta, etc. --- parsers are so trivial to build in Haskell that it's easier to swot up one that does exactly what you need on the fly than it is to build a better Read with enough flexibility to cover a majority of use cases.

--
brandon s allbery kf8nh / sine nomine associates
allbery.b@gmail.com / ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net
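[Editor's note: as a sketch of that point (not from the thread; parseIntList and its helpers are invented names), a throwaway parser for bracketed Int lists that reports the offset of the first failure fits in a screenful of plain Haskell, no library required.]

```haskell
import Data.Char (isDigit, isSpace)

-- Parser state: (offset into the input, remaining input).
type St = (Int, String)

-- Skip whitespace, keeping the offset accurate.
skipWS :: St -> St
skipWS (i, s) = let (ws, rest) = span isSpace s in (i + length ws, rest)

expect :: Char -> St -> Either String St
expect c st = case skipWS st of
  (i, x:xs) | x == c -> Right (i + 1, xs)
  (i, _)             -> Left ("expected " ++ show c ++ " at offset " ++ show i)

int :: St -> Either String (Int, St)
int st = case skipWS st of
  (i, s) -> case span isDigit s of
    ([], _)    -> Left ("expected digit at offset " ++ show i)
    (ds, rest) -> Right (read ds, (i + length ds, rest))

-- Parse a "[1, 2, 3]"-style list of non-negative Ints,
-- reporting the offset of the first failure.
parseIntList :: String -> Either String [Int]
parseIntList input = do
  st1 <- expect '[' (0, input)
  case skipWS st1 of
    (_, ']':_) -> Right []
    _          -> do (x, st2) <- int st1
                     go [x] st2
  where
    go acc st = case skipWS st of
      (i, ',':rest) -> do (x, st') <- int (i + 1, rest)
                          go (acc ++ [x]) st'
      (_, ']':_)    -> Right acc
      (i, _)        -> Left ("expected ',' or ']' at offset " ++ show i)
```

For example, `parseIntList "[1, 2 3]"` fails with a message pointing at the offset of the stray `3` rather than a bare "no parse".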

On Oct 15, 2013, at 17:05, Brandon Allbery wrote:
Feel free to completely disagree; it is easy to write a simple parser package, but designing one which gives nice error messages, corrects input instead of halting, can deal with ambiguous grammars, does not hang on to the input, and produces results online wherever possible is not so easy.

By the way: the http://hackage.haskell.org/package/ChristmasTree package contains Template Haskell code which can be used to read printed Haskell data type values in LINEAR TIME, and knows how to deal with infix constructors without having to include swarms of extra parentheses which make the trivial parsers take exponential time.

Doaitse
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

The polyparse package has a complete drop-in replacement for Read, called Text.Parse. Its lexer is derived closely from the Haskell Prelude's lex. It already has instances for all of the standard datatypes, and you can use DrIFT to generate instances for any datatypes outside the usual Prelude types. The parser is fast, lazy, and space-efficient. It gives good error messages.
Regards,
Malcolm
On 15 Oct 2013, at 04:01 PM, Ryan Newton wrote:

Thanks!
That looks perfect and that is exactly what my googling failed to turn up.

Hmm, currently trying to learn how to incorporate DrIFT into a cabal package. Polyparse doesn't support any other deriving mechanisms, does it? (GHC.Generics, or Template Haskell based, or Neil Mitchell's "derive" preprocessor?)

Also, any way to get line numbers in the errors when parsing a big string?


Dear Ryan,
Read instances are both (1) slow, because they take Strings, and (2) they don't allow sensible error messages! (Historical decision choosing Maybe rather than Either.)
Surprisingly, I have found (1) to be a rather small issue. Read has at least two other features that make it abysmally slow.

- Read is built on top of a lexer (Prelude.lex), which in turn is based on ReadP. This means checking a lot of cases which are often useless, since while a parser would likely know which token types to expect next, 'lex' doesn't. It even causes otherwise inexplicable non-termination. For example,

      read ('"' : repeat ' ') :: Int

  never terminates, because 'lex' keeps on trying to determine whether the given string starts with a valid string literal or not. ReadP supports unlimited backtracking, a well-known source of bad performance in parsers (at least when the alternatives are returned as a list, which ReadP does).

- Read itself *also* supports unlimited backtracking; it's built around the ReadS type, ReadS a = String -> [(a, String)]. Again, this causes bad performance.

In comparison, unfolding, say, a bytestring to a temporary list that is consumed by a parser is cheap. Of course, it's still a good idea to avoid this. I second the suggestions to use one of the modern parser libraries instead.
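[Editor's note: the list-of-successes behaviour described above can be seen directly with ReadP from base. This small demo is not part of the original message; partialParses is an invented name.]

```haskell
import Text.ParserCombinators.ReadP (many, readP_to_S, satisfy)
import Data.Char (isDigit)

-- readP_to_S :: ReadP a -> ReadS a, where ReadS a = String -> [(a, String)].
-- Every point at which the parser could have stopped is kept alive in the
-- result list -- the unlimited backtracking at issue.
partialParses :: String -> [(String, String)]
partialParses = readP_to_S (many (satisfy isDigit))
```

Running `partialParses "12"` yields three results: the parser stopping after zero, one, or two digits. The result list grows with every position at which the parser could have stopped, which is why ambiguous grammars make list-of-successes parsers so expensive.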
Argh!,
Indeed. Bertram
participants (6)
- Bertram Felgenhauer
- Brandon Allbery
- Johannes Waldmann
- malcolm.wallace
- Ryan Newton
- S. Doaitse Swierstra