How efficient is read?

I have a lot of structured data in a program written in a different language, which I would like to read in and analyze with Haskell. And I'm free to format this data in any shape or form from the other language.

Could I define a Haskell type for this data that derives the default Read, then simply print out Haskell code from the program and 'read' it in? Would this be horribly inefficient? It would save me the time of writing a parser. -Tom
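For concreteness, a minimal sketch of the approach being asked about (the type Sample, its fields and the file name are made up for illustration); the other program would have to print values in exactly the syntax the derived instances expect:

import Control.Monad (forM_)

-- Hypothetical stand-in for the structured data; deriving Read and Show
-- gives a textual format that 'read' can parse back without a hand-written
-- parser.
data Sample = Sample Int (Maybe String) [Double]
  deriving (Show, Read)

main :: IO ()
main = do
  -- The other program would print one value per line in Haskell syntax,
  -- e.g.:  Sample 3 (Just "abc") [1.0,2.5]
  txt <- readFile "data.txt"
  let samples = map read (lines txt) :: [Sample]
  forM_ samples print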

tomahawkins:
Could I define a Haskell type for this data that derives the default Read, then simply print out Haskell code from the program and 'read' it in? Would this be horribly inefficient?
It would be easy, but inefficient for more than, say, 100k of data. Deriving Binary will be faster and almost as easy (the derive script is in the binary/scripts dir).
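A rough sketch of the Binary route for the same made-up Sample type (instance written by hand rather than generated by the derive script; the producing program would of course have to emit the matching byte format):

import Data.Binary (Binary (..), decodeFile, encodeFile)

data Sample = Sample Int (Maybe String) [Double]
  deriving (Show)

-- Roughly what the derive script would generate, written out by hand.
instance Binary Sample where
  put (Sample n ms ds) = put n >> put ms >> put ds
  get = do
    n  <- get
    ms <- get
    ds <- get
    return (Sample n ms ds)

main :: IO ()
main = do
  encodeFile "data.bin" [Sample 3 (Just "abc") [1.0, 2.5]]
  samples <- decodeFile "data.bin" :: IO [Sample]
  mapM_ print samples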

In fact, the time you'd spend writing Read instances would not compare to the half hour required to learn Parsec. And your parser will be efficient (at least, according to the guys from the parser team ;-). A small illustrative sketch follows this message.

Cheers,
PE

On 08/05/2010, at 23:32, Tom Hawkins wrote:
Could I define a Haskell type for this data that derives the default Read, then simply print out Haskell code from the program and 'read' it in? Would this be horribly inefficient?
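To give a flavour of what that half hour buys, a small illustrative Parsec parser for a made-up line-oriented format (an integer, a space, then a label per line); nothing here is specific to Tom's data:

import Text.ParserCombinators.Parsec

data Sample = Sample Int String
  deriving (Show)

-- One record per line: an integer, a single space, then a label.
sample :: Parser Sample
sample = do
  n <- many1 digit
  _ <- char ' '
  s <- many1 (noneOf "\n")
  return (Sample (read n) s)

samples :: Parser [Sample]
samples = sample `sepEndBy` newline

main :: IO ()
main = print (parse samples "example" "1 foo\n2 bar\n")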

PEM> In fact, the time you'd spend writing read instances would not
PEM> compare to the half hour required to learn parsec.

Maybe the wiki could be updated to give more clues for a newcomer: http://www.haskell.org/haskellwiki/Parsec

In particular:
- link 1 points to the Parsec site, with almost 10-year-old documentation for a previous major release
- link 3 is broken

The rest of the page is a bit terse as well. I'm really wondering what one should start reading to learn how to parse a stream in Haskell.

-- Paul

On 9 May 2010 08:45, Paul R wrote:
http://www.haskell.org/haskellwiki/Parsec
in particular:
- link 1 points to the Parsec site, with almost 10-year-old documentation for a previous major release
- link 3 is broken
The rest of the page is a bit terse as well. I'm really wondering what one should start reading to learn how to parse a stream in Haskell.
Hi Paul

The 10 year old documentation is very good though - for my taste, Parsec 2.0 is the best documented Haskell lib I've seen.

If you want to parse a stream, you don't want Parsec, as it isn't an online parser - online meaning 'streaming', i.e. it can produce some results during the 'work' rather than a single result at the end. From the descriptions on Hackage, Parsimony and uu-parsinglib sound like better candidates; similarly, one of the Polyparse modules provides an online parser.

If you want to learn how to write a streaming parser, pick one of those - start work and post back to this list if/when you have problems. Remember that a non-streaming parser is simpler than a streaming one: you might want to write a version that works on short input first, and your result type has to support streaming (probably best if it is a list); see the toy sketch below. Also, for any parser, but especially an online one, you'll have to be careful to use backtracking sparingly.

Best wishes
Stephen
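As a toy illustration of the 'online' point (no parser library involved; the Record type and format are made up): because the result is a lazy list, the first records come out before the rest of the input is ever examined.

data Record = Record Int String
  deriving (Show)

-- A hand-rolled, lazily produced parse: one record per line, with no error
-- handling - just enough to show results appearing incrementally.
parseRecords :: String -> [Record]
parseRecords = map parseLine . lines
  where
    parseLine l = case words l of
      (n : rest) -> Record (read n) (unwords rest)
      []         -> error "blank line"

main :: IO ()
main =
  -- The input below is unbounded, yet this prints three records and stops,
  -- because only the consumed prefix is ever parsed.
  mapM_ print (take 3 (parseRecords input))
  where
    input = concatMap (\i -> show i ++ " item\n") ([1 ..] :: [Int])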

Stephen Tetley wrote:
The 10 year old documentation is very good though - for my taste, Parsec 2.0 is the best documented Haskell lib I've seen.
But does it help with Parsec-3?
If you want to parse a stream, you don't want Parsec as it isn't an online parser - online meaning 'streaming' i.e. it can produce some results during the 'work' rather than a single result at the end.
I thought this was one of the new features in Parsec-3... -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On 9 May 2010 11:42, Ivan Lazar Miljenovic wrote:
If you want to parse a stream, you don't want Parsec as it isn't an online parser - online meaning 'streaming' i.e. it can produce some results during the 'work' rather than a single result at the end.
I thought this was one of the new features in Parsec-3...
Hi Ivan

Possibly? If so, maybe the authors ought to mention it in the .cabal file / package description. I know it can use bytestrings, which have efficiency advantages over String, but that doesn't make it online.

Best wishes
Stephen

Stephen Tetley wrote:
Possibly? If so, maybe the authors ought to mention it in the .cabal file / package description. I know it can use bytestrings, which have efficiency advantages over String, but that doesn't make it online.
Well, RWH talks about Parsec's "input stream", so maybe I'm just confusing the terms: http://book.realworldhaskell.org/read/using-parsec.html -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

On May 9, 2010, at 06:53, Ivan Lazar Miljenovic wrote:
Well, RWH talks about Parsec's "input stream", so maybe I'm just confusing the terms: http://book.realworldhaskell.org/read/using-parsec.html
Hm. I'd understand that as referring to the fact that Parsec 3 can use arbitrary input types instead of [Char], not to streams as in stream fusion or lazy processing. -- Brandon S. Allbery KF8NH

On 9 May 2010 13:25, Brandon S. Allbery KF8NH wrote:
Hm. I'd understand that as referring to the fact that Parsec 3 can use arbitrary input types instead of [Char], not to streams as in stream fusion or lazy processing.
Hi Brandon

Yes - that's my impression too. There is a Parsec iteratee package on Hackage that would presumably support streaming (that's to say online, or synonymously, 'piecemeal' / lazy processing). However, unless I could find a tutorial, I'd go with Polyparse or uu-parsinglib (Doaitse Swierstra has a tech report that gives a very detailed guide to uu-parsinglib).

Best wishes
Stephen

Hello Stephen,

Stephen> The 10 year old documentation is very good though - for my
Stephen> taste, Parsec 2.0 is the best documented Haskell lib I've seen.

Indeed the doc for 2.0 is really comprehensive, but didn't the library evolve a lot between release 2.0 and 3.1?

Stephen> If you want to parse a stream, you don't want Parsec as it isn't
Stephen> an online parser - online meaning 'streaming' i.e. it can produce
Stephen> some results during the 'work' rather than a single result at the
Stephen> end. From the descriptions on Hackage, Parsimony and uu-parsinglib
Stephen> sound like better candidates; similarly one of the Polyparse
Stephen> modules provides an online parser.

Thank you for this well detailed explanation. It was just me misusing the word "stream", I was actually meaning a simple bounded string. As a first shot I might try to add a new Reader to pandoc, which makes use of Parsec 3 - maybe a Textile one, which is not in yet.

regards,
-- Paul

On 10 May 2010 09:32, Paul R wrote:
Indeed the doc for 2.0 is really comprehensive, but didn't the library evolve a lot between release 2.0 and 3.1?
Hi Paul

I think the internals evolved a lot more than the interface - so it can handle parsing byte-strings etc. There was quite a long blog post aggregated to Planet Haskell a few months ago detailing the new internals; unfortunately I can't remember the author's name, so I can't find you a reference (I'm sure it wasn't Derek Elkins, who maintains Parsec).

I still use Parsec 2.1 myself (I've no need to parse large files where byte-strings would be a clear advantage), so I'm not the best person to comment, but I've just scanned the Haddock documentation on Hackage and the interfaces look very similar. The modules have slightly different namespaces - so imports will be different, and one would have to choose which text type to use (Text.Parsec.ByteString, Text.Parsec.ByteString.Lazy or Text.Parsec.String) and import the appropriate module to get the "parseFromFile" function; roughly as sketched below.

Best wishes
Stephen
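For reference, a minimal sketch of the Parsec 3 spelling of those imports (the parser and file name are made up; the combinators themselves are unchanged from Parsec 2, where everything came from Text.ParserCombinators.Parsec):

import Text.Parsec (digit, many1, newline, sepEndBy)
import Text.Parsec.String (Parser, parseFromFile)

-- A trivial example parser: newline-separated integers.
number :: Parser Int
number = fmap read (many1 digit)

main :: IO ()
main = parseFromFile (number `sepEndBy` newline) "numbers.txt" >>= print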

On May 10, 2010, at 04:32, Paul R wrote:
Stephen> If you want to parse a stream, you don't want Parsec as it isn't
Stephen> an online parser - online meaning
Thank you for this well detailed explanation. It was just me misusing the word "stream", I was actually meaning a simple bounded string.
That's not misuse, it's just confusing the usual parser terminology with the usual Haskell terminology. (Hence also the earlier confusion about token streams in Parsec 3.) -- Brandon S. Allbery KF8NH

In fact, the time you'd spend writing read instances would not compare to the half hour required to learn parsec. And your parser will be efficient (at least, according to the guys from the parser team ;-)
I agree that Read is likely to be inefficient, but the more important aspect is that it gives you no useful error message if the parse fails. Parser combinators are really rather easy to learn and use, and tend to give decent error reports when something goes wrong.

In fact, if you just want Read-like functionality for a set of Haskell datatypes, use polyparse: the DrIFT tool can derive polyparse's Text.Parse class (the equivalent of Read) for you, so you do not even need to write the parser yourself!

I would caution against using Parsec if your dataset is large. Parsec does not return anything until it has seen the entire input, so can use a huge amount of memory. The other day someone was observing on haskell-cafe that parsing a 9Mb XML file using a Parsec-based parser required >7Gb of memory, compared with 1.3Gb for a strict polyparse-based parser (still too much), and the happy conclusion was that the lazy polyparse variant uses a negligible amount by comparison.

(Declaration of interest: I wrote polyparse.)

Regards,
Malcolm
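Purely to illustrate the error-message point above (a toy sketch using Parsec, only because it has already come up in this thread; the same applies to other combinator libraries):

import Text.ParserCombinators.Parsec

number :: Parser Int
number = fmap read (many1 digit)

main :: IO ()
main = do
  -- 'read "oops" :: Int' would only throw "Prelude.read: no parse", with no
  -- hint of where the input went wrong; the non-throwing 'reads' just
  -- returns an empty list.
  print (reads "oops" :: [(Int, String)])
  -- A combinator parser instead returns a ParseError carrying a source
  -- position and what was expected there.
  print (parse (number `sepBy` char ' ') "input" "12 oops 34")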

Malcolm Wallace wrote:
(Declaration of interest: I wrote polyparse.)
For which I, for one, am grateful! (So, when are you going to release an updated version with a fixed definition of discard for the lazy parser? :p) -- Ivan Lazar Miljenovic Ivan.Miljenovic@gmail.com IvanMiljenovic.wordpress.com

In fact, if you just want Read-like functionality for a set of Haskell datatypes, use polyparse: the DrIFT tool can derive polyparse's Text.Parse class (the equivalent of Read) for you, so you do not even need to write the parser yourself!
Cabal install DrIFT-cabalized complains. What is the module "Rules"? I've never seen it before. Is there a quick fix? I didn't see a "build-depends" line in my ~/.cabal/config file.

e0082888@e0082888-laptop:~$ cabal install DrIFT-cabalized
Resolving dependencies...
Configuring DrIFT-cabalized-2.2.3.1...
Preprocessing executables for DrIFT-cabalized-2.2.3.1...
Building DrIFT-cabalized-2.2.3.1...

src/DrIFT.hs:19:17:
    Could not find module `Rules':
      It is a member of the hidden package `ghc-6.12.2'.
      Perhaps you need to add `ghc' to the build-depends in your .cabal file.
      Use -v to see a list of the files searched for.
cabal: Error: some packages failed to install:
DrIFT-cabalized-2.2.3.1 failed during the building phase. The exception was:
ExitFailure 1

On Mon, May 10, 2010 at 4:50 PM, Tom Hawkins wrote:
Cabal install DrIFT-cabalized complains. What is the module "Rules"? I've never seen it before. Is there a quick fix?
The tarball was missing its Rules.hs; as it happens, GHC has a module named Rules.hs as well, hence the confusing error. I've uploaded a fresh one that should work. -- gwern

The tarball was missing its Rules.hs; as it happens, GHC has a module named Rules.hs as well, hence the confusing error. I've uploaded a fresh one that should work.
Thanks. This builds and installs fine. But I think there is something wrong with the generated parser. It doesn't look for (..) groupings. For example:

data Something = Something Int (Maybe String)
  deriving Show
  {-! derive : Parse !-}

There is nothing in the generated parser to look for parens around the Maybe in case it is a (Just string). Am I missing something?

On Tue, May 11, 2010 at 12:16 AM, Tom Hawkins wrote:
But I think there is something wrong with the generated parser. It doesn't look for (..) groupings. Am I missing something?
I don't know. If you could check whether the original DrIFT has that error as well, then I suspect DrIFT's author, Meacham, would be interested to know. (I only maintain DrIFT-cabalized as a packaging fork; I tried not to change any actual functionality.) -- gwern

data Something = Something Int (Maybe String)
  deriving Show
  {-! derive : Parse !-}
There is nothing in the generated parser to look for parens around the Maybe in case it is a (Just string).
Sorry, that will be my fault. I contributed the rules for deriving Parse to DrIFT. I am on holiday right now, but will try to take a look shortly. Regards, Malcolm

On May 9, 2010, at 12:32 AM, Tom Hawkins wrote:
Could I define a Haskell type for this data that derives the default Read, then simply print out Haskell code from the program and 'read' it in? Would this be horribly inefficient?
If your types contain infix constructors, the derived Read instances may be almost unusable; see http://hackage.haskell.org/trac/ghc/ticket/1544
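A made-up example of the kind of type that warning concerns (see the ticket for the details):

infixl 6 :+:

-- The derived Show prints infix constructors infix, e.g. "Lit 1 :+: Lit 2",
-- and the derived Read then has to parse that form back; the linked ticket
-- is about that going badly for types like this.
data Expr = Lit Int
          | Expr :+: Expr
  deriving (Show, Read)

main :: IO ()
main = print (read "Lit 1 :+: Lit 2" :: Expr)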

There is the ChristmasTree package (http://hackage.haskell.org/package/ChristmasTree), which provides a very fast Read alternative by deriving grammars for each datatype. If you want to know the speed differences, see http://www.cs.uu.nl/wiki/bin/view/Center/TTTAS for more information (it's in the "Haskell, Do You Read Me?" paper; see section 5 for a comparison of efficiency). -chris

On 9 May 2010, at 05:32, Tom Hawkins wrote:
Could I define a Haskell type for this data that derives the default Read, then simply print out Haskell code from the program and 'read' it in? Would this be horribly inefficient?
participants (12)
- Brandon S. Allbery KF8NH
- Chris Eidhof
- Daniel Gorín
- Don Stewart
- Gwern Branwen
- Ivan Lazar Miljenovic
- Malcolm Wallace
- Malcolm Wallace
- Paul R
- Pierre-Etienne Meunier
- Stephen Tetley
- Tom Hawkins