
Maciej Piechotka
On Tue, 2010-02-09 at 16:41 +0000, John Lato wrote:
See http://inmachina.net/~jwlato/haskell/ParsecIteratee.hs for a valid Stream instance using iteratee. Also Gregory Collins recently posted an iteratee wrapper for Attoparsec to haskell-cafe. To my knowledge these are not yet in any packages, but hackage is vast.
Hmm. Am I correct that his implementation caches everything?
The one that John posted (iteratees on top of parsec) has to keep a copy of the entire input, because parsec wants to be able to do arbitrary backtracking on the stream. Attoparsec provides an *incremental* parser. You feed it bite-sized chunks of an input stream, and it either says "ok, I'm done, here's your value, and the rest of the stream I didn't use" or "I couldn't finish, here's a parser continuation you can feed more chunks to." This, of course, is a perfect conceptual match for iteratees -- with a little bit of plumbing you should be able to parse a stream in O(1) space.
I tried to rewrite the implementation using... well imperative linked list. For trivial benchmark it have large improvement (althought it may be due to error in test such as using ByteString) and, I believe, that it allows to free memory before finish.
Results of test on Core 2 Duo 2.8 GHz: 10: 0.000455s 0.000181s 100: 0.000669s 0.001104s 1000: 0.005209s 0.023704s 10000: 0.053292s 1.423443s 100000: 0.508093s 132.208597s
Which column corresponds to which module here, and which module are you
benchmarking against, John's or mine?
G
--
Gregory Collins