
I have a similar issue, I think. The problem with attoparsec is that it only covers the unmarshalling side; writing data to disk still requires manually marshalling values into ByteStrings. Data.Binary with Data.Derive provides a clean, proven (decode . encode == id) way of doing this. If there's a way to accomplish this with attoparsec, I'd love to know.
Max
On Jul 28, 2010, at 10:32 PM, Gregory Collins wrote:
Conrad Parker writes:
Hi,
I am reading data from a file as strict bytestrings and processing them in an iteratee. As the parsing code uses Data.Binary, the strict bytestrings are then converted to lazy bytestrings (using fromWrap which Gregory Collins posted here in January:
-- | wrapped bytestring -> lazy bytestring
fromWrap :: I.WrappedByteString Word8 -> L.ByteString
fromWrap = L.fromChunks . (:[]) . I.unWrap
This just makes a 1-chunk lazy bytestring:
(L.fromChunks . (:[])) :: S.ByteString -> L.ByteString
). The parsing is then done with the library function Data.Binary.Get.runGetState:
-- | Run the Get monad applies a 'get'-based parser on the input
-- ByteString. Additional to the result of get it returns the number of
-- consumed bytes and the rest of the input.
runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
The issue I am seeing is that runGetState consumes more bytes than the length of the input bytestring, while reporting an apparently successful get (i.e. it does not call error/fail). I was able to work around this by checking whether the number of bytes consumed exceeds the input length, and if so ignoring the result of get and simply prepending the input bytestring to the next chunk in the continuation.
Something smells fishy here. I have a hard time believing that binary is reading more input than is available. Could you post more code, please?
However I am curious as to why this apparent lack of bounds checking happens. My guess is that Get does not check the length of the input bytestring, perhaps to avoid forcing lazy bytestring inputs; does that make sense?
Would a better long-term solution be to use a strict-bytestring binary parser (like cereal)? So far I've avoided that as there is not yet a corresponding ieee754 parser.
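For comparison, here is a minimal sketch of a bounds-checked parse of a single strict chunk. It assumes a later release of binary that provides runGetOrFail (not the runGetState discussed above) and a bytestring release that provides L.fromStrict. The names pairParser and parseChunk are made up for illustration.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import Data.Binary.Get (Get, getWord32be, runGetOrFail)
import Data.Word (Word32)

-- Hypothetical record parser: two big-endian 32-bit words.
pairParser :: Get (Word32, Word32)
pairParser = (,) <$> getWord32be <*> getWord32be

-- Parse one strict chunk. Returns Nothing when the parser fails
-- (for example because the chunk is too short), otherwise the value
-- together with the number of bytes consumed.
parseChunk :: S.ByteString -> Maybe ((Word32, Word32), Int)
parseChunk chunk =
  case runGetOrFail pairParser (L.fromStrict chunk) of
    Left _                 -> Nothing
    Right (_, consumed, v) -> Just (v, fromIntegral consumed)
```

With this shape, a Nothing result is the signal to prepend the chunk to the next one, rather than comparing the consumed count against the input length after the fact.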
If you're using iteratees you could try attoparsec + attoparsec-iteratee which would be a more natural way to bolt parsers together. The attoparsec-iteratee package exports:
parserToIteratee :: (Monad m) => Parser a -> IterateeG WrappedByteString Word8 m a
Attoparsec is an incremental parser so this technique allows you to parse a stream in constant space (i.e. without necessarily having to retain all of the input). It also hides the details of the annoying buffering/bytestring twiddling you would be forced to do otherwise.
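The same incremental idea can be sketched without attoparsec, assuming a later release of binary that exports runGetIncremental, Decoder, pushChunk and pushEndOfInput. This is not the attoparsec-iteratee API, only an illustration of feeding strict chunks to a parser one at a time; feedChunks is a hypothetical helper name.

```haskell
import qualified Data.ByteString as S
import Data.Binary.Get
  ( Decoder (..), Get, getWord32be
  , pushChunk, pushEndOfInput, runGetIncremental )
import Data.Word (Word32)

-- Feed strict chunks to an incremental decoder one at a time.
-- Returns the parsed value plus any unconsumed input, or the
-- decoder's error message if it fails or the input runs out.
feedChunks :: Get a -> [S.ByteString] -> Either String (a, S.ByteString)
feedChunks parser = go (runGetIncremental parser)
  where
    go (Done rest _ v) cs       = Right (v, S.concat (rest : cs))
    go (Fail _ _ msg)  _        = Left msg
    go d@(Partial _)   (c : cs) = go (d `pushChunk` c) cs
    go d@(Partial _)   []       = go (pushEndOfInput d) []
```

A value that straddles a chunk boundary is handled by the decoder itself: it returns Partial until enough bytes have arrived, so the caller never has to re-prepend input by hand.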
Cheers, G -- Gregory Collins