Re: [Haskell-cafe] Re: Haskell-Cafe Digest, Vol 78, Issue 14

12 Feb 2010

      On Thu, 2010-02-11 at 17:49 -0600, John Lato wrote:
...
On Thu, Feb 11, 2010 at 10:00 AM, Gregory Collins
 wrote:
...
Maciej Piechotka  writes:
...
On Tue, 2010-02-09 at 16:41 +0000, John Lato wrote:
...
See http://inmachina.net/~jwlato/haskell/ParsecIteratee.hs for a valid
Stream instance using iteratee.  Also Gregory Collins recently posted
an iteratee wrapper for Attoparsec to haskell-cafe.  To my knowledge
these are not yet in any packages, but hackage is vast.
Hmm. Am I correct that his implementation caches everything?
The one that John posted (iteratees on top of parsec) has to keep a copy
of the entire input, because parsec wants to be able to do arbitrary
backtracking on the stream.
This is true, however I believe this alternative approach is also
correct.  The Cursor holds the stream state, and parsec holds on to
the Cursor for backtracking.  Data is only read within the Iteratee
monad when it goes beyond the currently available cursors, at which
point another cursor is added to the linked list (implemented with
IORef or other mutable reference).
The downside to this approach is that data is consumed from the
iteratee stream for a partial parse, even if the parse fails.  I did
not want this behavior, so I chose a different approach.
Hmm. AFAIU your code you are doing something like:
...
concatCursor :: (Monad m, Reference r m, StreamChunk c el)
             => Cursor r m c el -> m (c el)
concatCursor c = liftM mconcat (concatCursor' c)
concatCursor' :: (Monad m, Reference r m, StreamChunk c el)
              => Cursor r m c el -> m [c el]
concatCursor' (Cursor r v) =
  liftM2 (:) (return v) (readRef r >>= concatNextCursor')
concatNextCursor' :: (Monad m, Reference r m, StreamChunk c el)
                 => NextCursor r m c el -> m [c el]
concatNextCursor' (NextCursor c) = concatCursor' $! c
concatNextCursor' _              = return $! []
parsecIteratee' :: (Monad m, Reference r m, StreamChunk c el)
                => ParsecT (Cursor r m c el) u (IterateeG c el m) a
                -> u
                -> SourceName
                -> IterateeG c el m (Either ParseError a)
parsecIteratee' p u sn = do
   c <- lift $ mkCursor :: IterateeG c el m (Cursor r m c el)
   res <- runParserT (liftM2 (,) p getInput) u sn c
   case res of
     Right (a, c) -> do sc <- lift $ concatCursor c
                        liftI $! Done (Right a) $! Chunk $! sc
     Left   err   -> return $ Left err
Which seems that it should work (I just checked if it is suppose to
compile). Unfortunately I need to work the clash between transformers
and mtl).

EDIT. Ops. sorry. It will not work. However it will (as it should)
return the remaining part back to stream.
...
...
...
I tried to rewrite the implementation using... well imperative linked
list. For trivial benchmark it have large improvement (althought it may
be due to error in test such as using ByteString) and, I believe, that
it allows to free memory before finish.
Results of test on Core 2 Duo 2.8 GHz:
10:   0.000455s       0.000181s
100:  0.000669s       0.001104s
1000: 0.005209s       0.023704s
10000:        0.053292s       1.423443s
100000:       0.508093s       132.208597s
I'm surprised your version has better performance for small numbers of
elements.  I wonder if it's partially due to more aggressive inlining
from GHC or something of that nature.  Or maybe your version compiles
to a tighter loop as elements can be gc'd.
It is possible as my code was in the same module. I'll try to use 2
different modules.
...
I expected poor performance of my code for larger numbers of elements,
as demonstrated here.
I haven't tested for more then 1e5 (which was in comment).
...
I envisioned the usage scenario where parsers would be relatively
short (<20 chars), and most of the work would be done directly with
iteratees.  In this case it would be more important to preserve the
stream state in the case of a failed parse, and the performance issues
of appending chunks wouldn't arise either.
Fortunately parsec does not limit the number of streams per monad so it
is up to user which one he will choose (depending on problem).
...
Cheers,
John
Regards
PS. Why iteratee uses transformers? It seems to be identical (both have
functional dependencies etc.) to mtl except that mtl is standard in
platform. Using both lead to clashes between names.