Embarrassed Haskeller -- why is Read so bad? What are alternatives?

We have great tools for [de]serializing both to binary and to JSON (binary, cereal, json, aeson, etc). But we have rather weak support for [de]serializing to human-readable Haskell-ish external formats.

Read instances are both (1) slow, because they take Strings, and (2) they don't allow sensible error messages! (A historical decision: choosing Maybe rather than Either.)

Right now I'm working on a project with a Racketeer who is trying to read a 6000-line file as a single list data structure. The derived Read instance just tells him it won't parse, with NO error information, line number, etc. To someone used to Scheme readers, that's rather poor.

I wish I had something better to tell him! I am not aware of a library to recommend, other than switching to JSON format on disk, or manually kludging together a parsing hack that, for example, puts one element of the list on each line and makes much smaller calls to "read".

Argh!, -Ryan
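[Editor's note: the one-element-per-line kludge mentioned above can be sketched in a few lines with readEither from Text.Read in base, which at least recovers a line number on failure. The helper name readLines is invented here for illustration; it is not a library function.]

```haskell
import Text.Read (readEither)

-- Sketch of the workaround: one element per line, parsed with a
-- small per-line call, so a failure can report a line number.
readLines :: Read a => String -> Either String [a]
readLines input = traverse parseLine (zip [1 :: Int ..] (lines input))
  where
    parseLine (n, l) = case readEither l of
      Right x  -> Right x
      Left err -> Left ("line " ++ show n ++ ": " ++ err ++ ": " ++ show l)
```

This keeps each call to the Read machinery small, and turns the single opaque "no parse" into a located one.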

On Tue, Oct 15, 2013 at 11:01 AM, Ryan Newton wrote:
We have great tools for [de]serializing both to binary and to JSON (binary, cereal, json, aeson, etc). But we have rather weak support for [de]serializing to human readable Haskell-ish external formats.
What we have is parsec, attoparsec, trifecta, etc. --- parsers are so trivial to build in Haskell that it's easier to swot up one that does exactly what you need on the fly than it is to build a better Read with enough flexibility to cover a majority of use cases.

--
brandon s allbery kf8nh / sine nomine associates
allbery.b@gmail.com / ballbery@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad
http://sinenomine.net
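[Editor's note: as a sketch of that point (not from the thread; parseIntList and its helpers are invented names), a throwaway parser for bracketed Int lists that reports the offset of the first failure fits in a screenful of plain Haskell, no library required.]

```haskell
import Data.Char (isDigit, isSpace)

-- Parser state: (offset into the input, remaining input).
type St = (Int, String)

-- Skip whitespace, keeping the offset accurate.
skipWS :: St -> St
skipWS (i, s) = let (ws, rest) = span isSpace s in (i + length ws, rest)

expect :: Char -> St -> Either String St
expect c st = case skipWS st of
  (i, x:xs) | x == c -> Right (i + 1, xs)
  (i, _)             -> Left ("expected " ++ show c ++ " at offset " ++ show i)

int :: St -> Either String (Int, St)
int st = case skipWS st of
  (i, s) -> case span isDigit s of
    ([], _)    -> Left ("expected digit at offset " ++ show i)
    (ds, rest) -> Right (read ds, (i + length ds, rest))

-- Parse a "[1, 2, 3]"-style list of non-negative Ints,
-- reporting the offset of the first failure.
parseIntList :: String -> Either String [Int]
parseIntList input = do
  st1 <- expect '[' (0, input)
  case skipWS st1 of
    (_, ']':_) -> Right []
    _          -> do (x, st2) <- int st1
                     go [x] st2
  where
    go acc st = case skipWS st of
      (i, ',':rest) -> do (x, st') <- int (i + 1, rest)
                          go (acc ++ [x]) st'
      (_, ']':_)    -> Right acc
      (i, _)        -> Left ("expected ',' or ']' at offset " ++ show i)
```

For example, `parseIntList "[1, 2 3]"` fails with a message pointing at the offset of the stray `3` rather than a bare "no parse".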

On Oct 15, 2013, at 17:05, Brandon Allbery wrote:
Feel free to completely disagree; it is easy to write a simple parser package, but designing one which gives nice error messages, corrects input instead of halting, can deal with ambiguous grammars, does not hang on to the input, and produces results online wherever possible is not so easy.

By the way: the http://hackage.haskell.org/package/ChristmasTree package contains Template Haskell code which can be used to read printed Haskell data type values in LINEAR TIME, and knows how to deal with infix constructors without having to include swarms of extra parentheses which make the trivial parsers take exponential time.

Doaitse
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

The polyparse package has a complete drop-in replacement for Read, called Text.Parse. Its lexer is derived closely from the Haskell Prelude's lex. It already has instances for all of the standard datatypes, and you can use DrIFT to generate instances for any datatypes outside the usual Prelude types. The parser is fast, lazy, and space-efficient. It gives good error messages.
Regards,
Malcolm
On 15 Oct 2013, at 04:01 PM, Ryan Newton wrote:

Thanks!
That looks perfect and that is exactly what my googling failed to turn up.

Hmm, currently trying to learn how to incorporate DrIFT into a cabal package. Polyparse doesn't support any other deriving mechanisms, does it? (GHC.Generics, or Template Haskell based, or Neil Mitchell's "derive" preprocessor?)

Also, any way to get line numbers in the errors when parsing a big string?


Dear Ryan,
Read instances are both (1) slow, because they take Strings, and (2) they don't allow sensible error messages! (Historical decision choosing Maybe rather than Either.)
Surprisingly, I have found (1) to be a rather small issue. Read has at least two other features that make it abysmally slow.

- Read is built on top of a lexer (Prelude.lex), which in turn is based on ReadP. This means checking a lot of cases which are often useless, since while a parser would likely know which token types to expect next, 'lex' doesn't. It even causes otherwise inexplicable non-termination. For example,

      read ('"' : repeat ' ') :: Int

  never terminates, because 'lex' keeps on trying to determine whether the given string starts with a valid string literal or not. ReadP supports unlimited backtracking, a well-known source of bad performance in parsers (at least when the alternatives are returned as a list, which ReadP does).

- Read itself *also* supports unlimited backtracking; it's built around the ReadS type, ReadS a = String -> [(a, String)]. Again, this causes bad performance.

In comparison, unfolding, say, a bytestring to a temporary list that is consumed by a parser is cheap. Of course, it's still a good idea to avoid this. I second the suggestions to use one of the modern parser libraries instead.
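[Editor's note: the list-of-successes behaviour described above can be seen directly with ReadP from base. This small demo is not part of the original message; partialParses is an invented name.]

```haskell
import Text.ParserCombinators.ReadP (many, readP_to_S, satisfy)
import Data.Char (isDigit)

-- readP_to_S :: ReadP a -> ReadS a, where ReadS a = String -> [(a, String)].
-- Every point at which the parser could have stopped is kept alive in the
-- result list -- the unlimited backtracking at issue.
partialParses :: String -> [(String, String)]
partialParses = readP_to_S (many (satisfy isDigit))
```

Running `partialParses "12"` yields three results: the parser stopping after zero, one, or two digits. The result list grows with every position at which the parser could have stopped, which is why ambiguous grammars make list-of-successes parsers so expensive.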
Argh!,
Indeed. Bertram
participants (6)
- Bertram Felgenhauer
- Brandon Allbery
- Johannes Waldmann
- malcolm.wallace
- Ryan Newton
- S. Doaitse Swierstra