ANN: The attosplit package splits lazy bytestrings lazily

This is a belated announcement of a package I uploaded to hackage last week.

When processing a large input stream, one of the most important techniques is to split the input into smaller pieces and process each piece separately. If that large input stream happens to be a lazy bytestring, the attosplit package provides a simple but powerful way to use that technique.

The attosplit package exports a single function:

    split :: Parser Strict.ByteString -> Lazy.ByteString -> [Lazy.ByteString]

It splits a lazy bytestring into a lazy list of lazy bytestrings, at boundaries defined by an attoparsec parser. Whenever the parser matches, the result of the match is prepended to the new lazy bytestring which begins at that point.

For example, suppose lazyH is a lazy bytestring read from a gigantic HTML file, and you want to process each section beginning with an <H1> tag separately. Each section could itself still be quite large, so you want to process each section lazily as well. Then

    split (string "<H1>") lazyH

is a lazy list of lazy bytestrings. The first lazy bytestring gives you all of the bytes up to the first <H1> tag. The rest of the lazy bytestrings give each section beginning with <H1> separately. This allows you to read the entire file lazily, while processing each individual section lazily on its own.

Of course, you are not limited to split only at boundaries defined by a string. You have the full power of attoparsec at your fingertips.

http://hackage.haskell.org/package/attosplit

Enjoy!
-Yitz
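P.S. To make the splitting behavior described above concrete, here is a minimal sketch that mimics it for the special case of a fixed string boundary such as <H1>. It is not the attosplit implementation and does not use attoparsec at all; splitOnTag and breakOn are hypothetical names invented for this illustration, and only the bytestring package is assumed. It is also less lazy than the real thing, since a piece is produced only after its end has been located.

```haskell
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy.Char8 as L

-- Hypothetical helper: break a lazy bytestring just before the first
-- occurrence of the pattern. If the pattern never occurs, everything
-- goes into the first component. Naive scan, for illustration only.
breakOn :: L.ByteString -> L.ByteString -> (L.ByteString, L.ByteString)
breakOn pat s =
  case [ n | (n, t) <- zip [0 ..] (L.tails s), pat `L.isPrefixOf` t ] of
    (n : _) -> L.splitAt n s
    []      -> (s, L.empty)

-- Hypothetical stand-in for `split (string tag)`: the first piece is
-- everything before the first tag, and every later piece begins with
-- the tag itself.
splitOnTag :: S.ByteString -> L.ByteString -> [L.ByteString]
splitOnTag tag input
  | S.null tag = [input]
  | otherwise  = first : sections rest
  where
    pat = L.fromStrict tag
    (first, rest) = breakOn pat input

    -- Invariant: the argument is either empty or starts with the tag.
    sections s
      | L.null s  = []
      | otherwise =
          let (body, more) = breakOn pat (L.drop (L.length pat) s)
          in L.append pat body : sections more
```

With these definitions, splitOnTag (S.pack "<H1>") (L.pack "intro<H1>a<H1>b") gives the three pieces "intro", "<H1>a" and "<H1>b". In this sketch, concatenating the pieces always gives back the original input.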
Yitzchak Gale