On Wed, Sep 9, 2009 at 12:16 PM, Don Stewart <dons@galois.com> wrote:
ekmett:
> Hi Anakim,
>  
> Nice to see someone else working in this space.
>  
> I have also been working on a set of parallel parsing techniques, which can use
> small Parsec parsers for local context sensitivity.
>  
> See the second set of slides in
> http://comonad.com/reader/2009/iteratees-parsec-and-monoid/ for an
> overview of how I'm doing something similar
> to feed Parsec independent chunks. Note that this approach bypasses the need
> for a separate sequential scan, which otherwise floods your cache, and lets you
> get closer to the performance limit imposed by Amdahl's law.
>  
> The code in the second set of slides can be adapted to your case: load
> everything into a lazy bytestring or fingertree of strict bytestrings, then for
> each strict bytestring chunk in parallel, scan it for the first newline, and
> then start an iteratee based parsec parser from that point. I use the iteratee
> based parsec parsers so that when I glue the partial parses together I can feed
> the unparsed data on the left side of the first newline in each chunk to the
> parser I'm joining on the left. I provide a monoid for the purpose of gluing
> together these partial parses, which encapsulates this behavior. 

> You can get quite a long way by using bytestring-mmap and strict
> bytestrings. The first ensures your IO overhead will be low, and the
> second ensures you don't migrate work needlessly.
 
I'm actually using bytestring-mmap in a toy compiler built around these techniques, though at present I'm using System.IO.Posix.MMap.Lazy rather than the strict version.
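For reference, the loading step looks roughly like this (assuming I'm remembering the bytestring-mmap API correctly -- unsafeMMapFile being the entry point in both the strict and lazy modules; loadStrict and loadChunks are names I'm making up here):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import System.IO.Posix.MMap (unsafeMMapFile)        -- strict mapping
import qualified System.IO.Posix.MMap.Lazy as Lazy  -- lazy, chunked mapping

-- Map the whole file in as one strict ByteString. Zero-copy, but the
-- file must not change underneath us -- hence the "unsafe".
loadStrict :: FilePath -> IO B.ByteString
loadStrict = unsafeMMapFile

-- Map it lazily instead: the lazy variant yields a lazy ByteString whose
-- strict chunks are convenient input for per-chunk parallel parsing.
loadChunks :: FilePath -> IO [B.ByteString]
loadChunks path = L.toChunks <$> Lazy.unsafeMMapFile path
```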
 
As for migrating work, maybe I'm not understanding your point, but my whole goal was to migrate the chunking and parsing out to other cores, so parMap rwhnf'ing the monoidal reduction over the chunks seems to be exactly what I want. I could perhaps get better results by assigning contiguous ranges of the input to the same thread, and by controlling more deliberately which core merges which chunks, perhaps mixing pseq and par to keep runs on the same core where possible.
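By mixing pseq and par I mean something like the following sketch: a tree-shaped reduction where par sparks the right half while pseq forces the left half on the current thread, so contiguous runs of chunks tend to merge locally (parFold is a made-up name, and Sum stands in for the partial-parse monoid):

```haskell
import Data.Monoid (Sum(..))
import GHC.Conc (par, pseq)

-- Reduce a list of monoid values as a balanced tree: spark the right
-- half to run elsewhere, force the left half here, then combine.
-- Contiguous input ranges stay together within each half.
parFold :: Monoid m => [m] -> m
parFold []  = mempty
parFold [x] = x
parFold xs  = r `par` (l `pseq` (l <> r))
  where
    (ls, rs) = splitAt (length xs `div` 2) xs
    l = parFold ls
    r = parFold rs

main :: IO ()
main = print (getSum (parFold (map Sum [1 .. 100 :: Int])))  -- prints 5050
```

Note that pseq/par only evaluate their arguments to WHNF, so in the real setting the merge would want a deeper forcing strategy than rwhnf to ensure the partial parses are actually evaluated on the sparking core.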
 
Could you elaborate?
 
-Edward