Re: [Haskell-cafe] Re: Haskell-Cafe Digest, Vol 78, Issue 14

11 Feb 2010

      On Thu, 2010-02-11 at 11:00 -0500, Gregory Collins wrote:
...
Maciej Piechotka  writes:
...
On Tue, 2010-02-09 at 16:41 +0000, John Lato wrote:
...
See http://inmachina.net/~jwlato/haskell/ParsecIteratee.hs for a valid
Stream instance using iteratee.  Also Gregory Collins recently posted
an iteratee wrapper for Attoparsec to haskell-cafe.  To my knowledge
these are not yet in any packages, but hackage is vast.
Hmm. Am I correct that his implementation caches everything?
The one that John posted (iteratees on top of parsec) has to keep a copy
of the entire input, because parsec wants to be able to do arbitrary
backtracking on the stream.
Well. Not quite. AFAIU (and ByteString implementation indicate so) the
uncons have a type
    uncons :: s -> m (Maybe (t, s))

Where s indicates the position on the stream. Since it is impossible to
get back from having s alone the GC should be free to finalize all
memory allocated to cache the stream before the first living s.

I.e. if input is:

text = 'L':'o':'r':'e':'m':' ':'i':'p':'s':'u':'m':[]
                    ^               ^
                    s1              s2

and s1 and s2 are position in the stream (for stream that is list)
GC can free Lor part. It seems that it might be significant in real live
as try calls are relatively short comparing with rest of code.

By keeping s as 'pointer to' element second uncons have O(1) time
instead of O(n).
...
...
I tried to rewrite the implementation using... well imperative linked
list. For trivial benchmark it have large improvement (althought it may
be due to error in test such as using ByteString) and, I believe, that
it allows to free memory before finish.
Results of test on Core 2 Duo 2.8 GHz:
10:	0.000455s	0.000181s
100:	0.000669s	0.001104s
1000:	0.005209s	0.023704s
10000:	0.053292s	1.423443s
100000:	0.508093s	132.208597s
Which column corresponds to which module here, and which module are you
benchmarking against, John's or mine?
G
As I'm implementing for parsec (I don't know attoparsec) [as a kind of
exercise to get iteratee better] I benchmarked against John's. My
results are on the left.

I forgot to compile and optimize. Here's result for ByteString:
        Mine            John's
10:	0.000425s	0.000215s
100:	0.000616s	0.001963s
1000:	0.0041s		0.048359s
10000:	0.041694s	4.492774s
100000:	0.309289s	434.238449s

And []:
        Mine            John's
10:	0.000605s	0.000932s
100:	0.001464s	0.008101s
1000:	0.004036s	0.054125s
10000:	0.032341s	1.36938s
100000:	0.317859s	115.846891s

Regards and sorry for confusion 

PS. Sorry - I know that test is somehow simplistic but I thing it
emulates real live situation where you are interested in small amount of
elements around current position (short try). I also think
that /dev/zero have relatively predictable access time (it does not need
to be loaded from slow disk first after which it can be accessed from
cache, it does not run out of entropy etc.).