ANN: The attosplit package splits lazy bytestrings lazily

21 Dec 2010

This is a belated announcement of a package I uploaded to
Hackage last week.

When processing a large input stream, one of the most important
techniques is to split the input into smaller pieces and process
each piece separately.

If that large input stream happens to be a lazy bytestring,
the attosplit package provides a simple but powerful way
to use that technique.

The attosplit package exports a single function:

split :: Parser Strict.ByteString -> Lazy.ByteString -> [Lazy.ByteString]

It splits a lazy bytestring into a lazy list of lazy bytestrings, at
boundaries defined by an attoparsec parser. Whenever the
parser matches, the result of the match is prepended to the
new lazy bytestring which begins at that point.

For example, suppose lazyH is a lazy bytestring read from a
gigantic HTML file, and you want to process each section
beginning with an <H1> tag separately. Each section could
itself still be quite large, so you want to process each section
lazily as well. Then

split (string "<H1>") lazyH

is a lazy list of lazy bytestrings. The first lazy bytestring
gives you all of the bytes up to the first <H1> tag.
The rest of the lazy bytestrings give each section
beginning with <H1> separately. This allows you to read
the entire file lazily, while processing each individual section
lazily on its own.
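
To make the shape of that result concrete, here is a small self-contained
model of the behavior described above. It is only a sketch: splitOnLiteral
is a hypothetical name, it handles only a fixed literal boundary rather
than an arbitrary parser, it scans byte by byte with no attention to
efficiency, and it is not how attosplit itself is implemented. Edge cases
(such as input that begins with the boundary) are my assumption, not
something the package documents here.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical model of the result shape, for a literal boundary only.
-- The first piece is everything before the first boundary; every later
-- piece begins with the boundary itself.
splitOnLiteral :: S.ByteString -> L.ByteString -> [L.ByteString]
splitOnLiteral delim input
  | S.null delim = [input]            -- avoid looping on an empty boundary
  | otherwise    = first : sections rest
  where
    (first, rest) = breakAtDelim input
    lazyDelim     = L.fromStrict delim
    delimLen      = L.length lazyDelim

    -- Split just before the next occurrence of the boundary.
    breakAtDelim s
      | L.null s                   = (L.empty, L.empty)
      | lazyDelim `L.isPrefixOf` s = (L.empty, s)
      | otherwise =
          let (pre, post) = breakAtDelim (L.tail s)
          in  (L.cons (L.head s) pre, post)

    -- Each remaining section starts with the boundary that opened it.
    sections s
      | L.null s  = []
      | otherwise =
          let (body, more) = breakAtDelim (L.drop delimLen s)
          in  L.append lazyDelim body : sections more
```

With this model, splitting "intro<H1>A<H1>B" at "<H1>" gives the three
pieces "intro", "<H1>A" and "<H1>B".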

Of course, you are not limited to splitting only at boundaries
defined by a string. You have the full power of attoparsec
at your fingertips.
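
As one illustration, a boundary parser could accept any heading tag from
<H1> to <H6>, in either case. The sketch below is only an example of
such a parser: headingOpen is a hypothetical name, and the module
Data.Attoparsec.ByteString.Char8 is the one found in current attoparsec
releases, which may differ from the module layout you have installed.

```haskell
import Data.Attoparsec.ByteString.Char8
  (Parser, char, inClass, parseOnly, satisfy)
import qualified Data.ByteString.Char8 as S

-- Hypothetical boundary parser: matches <H1> .. <H6> or <h1> .. <h6>
-- and returns the matched tag as a strict bytestring.
headingOpen :: Parser S.ByteString
headingOpen = do
  _ <- char '<'
  h <- satisfy (inClass "Hh")
  d <- satisfy (inClass "1-6")
  _ <- char '>'
  pure (S.pack ['<', h, d, '>'])
```

Since it has type Parser Strict.ByteString, a parser like this could be
passed as the first argument of split in place of string "<H1>".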

http://hackage.haskell.org/package/attosplit

Enjoy!

-Yitz

Yitzchak Gale
