
On Thu, Apr 14, 2011 at 9:50 AM, Daniel Fischer
Well, I would have expected that, but [snip] searches the entire block and checks whether the second component is empty to see whether there's any match at all. So I thought that'd be the desired behaviour.
Oops, you're right =). I wonder what the OP's intended behaviour was.
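(For reference, the whole-block check being described sounds like Data.ByteString.breakSubstring, whose second component is empty when the pattern isn't found; the actual code was snipped, so this is just a guess at what it looked like:

import qualified Data.ByteString as B

-- Sketch only: test whether a signature occurs anywhere in a block.
-- breakSubstring returns (before, rest); 'rest' is empty iff there is
-- no match, which is the check mentioned above.
blockContains :: B.ByteString -> B.ByteString -> Bool
blockContains sig block = not (B.null (snd (B.breakSubstring sig block))))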
On Thu, Apr 14, 2011 at 7:14 AM, Sean Perry
The idea is to walk the disk looking for a signature, say NTFS or EXT. Since we do not know where the block containing this identifier is, we read the blocks in, one at a time.
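The handle-based version of that wasn't shown in the thread, but it presumably looks something along these lines (a rough sketch only; the 512-byte block size and the check at the start of each block are assumptions here):

import qualified Data.ByteString as B
import System.IO

-- Sketch of the plain-IO approach described above (assumed, not the
-- poster's actual code): read fixed-size blocks and check each one for
-- the signature until EOF.
findSignature :: B.ByteString -> FilePath -> IO (Maybe Int)
findSignature sig path = withBinaryFile path ReadMode (go 0)
  where
    blockSize = 512                       -- assumed disk block size
    go i h = do
      block <- B.hGet h blockSize
      if B.null block
        then return Nothing               -- reached end of file/device
        else if sig `B.isPrefixOf` block
               then return (Just i)       -- found it in block i
               else go (i + 1) h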
You may use an iteratee as well, which gives clean, efficient code that doesn't depend on IO (among other things). Using the "enumerator" package [1] (and assuming that you just want to check the beginning of each block), you could write:

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
import qualified Data.Enumerator as E
import qualified Data.Enumerator.Binary as EB

-- 'chunkSize' is the size of each block being scanned; it is defined in
-- the attached full example.
searchForPattern :: Monad m
                 => B.ByteString
                 -> E.Iteratee B.ByteString m (Maybe Int)
searchForPattern pat
    | restLen >= 0 = go 0
    | otherwise    = return Nothing
  where
    patL    = L.fromChunks [pat]
    patLen  = B.length pat
    restLen = chunkSize - patLen
    go i = do
      str <- EB.take (fromIntegral patLen)
      case (L.length str < fromIntegral patLen, str == patL) of
        (True, _) -> return Nothing       -- ran out of input: no match
        (_, True) -> return (Just i)      -- block i starts with the pattern
        _         -> do EB.drop (fromIntegral restLen)
                        go $! i + 1

Using lazy bytestrings could make your code leak memory, leak handles and/or choke on exceptions. Using handles directly in IO (as in your original approach) makes the code harder to test and leaves all the gritty details up to you (such as reading blocks larger than 512 bytes). This iteratee is easily testable with pure code, does not leak, handles exceptions gracefully, and you don't have to worry about how the file is being read.

I've attached a full working example. In the example I read in blocks of 16 KiB, but you may easily adjust that (without compromising correctness).

Cheers! =)

[1] http://hackage.haskell.org/package/enumerator

--
Felipe.
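P.S.: Driving the iteratee over a file or device looks roughly like this (a sketch, not the attached example itself; the device path and signature are placeholders, and EB.enumHandle takes an explicit buffer size if you want to control the read size yourself, e.g. 16 KiB):

import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let sig = B8.pack "NTFS"                -- placeholder signature
  -- enumFile streams the file in chunks and feeds them to the iteratee;
  -- run_ runs the pipeline and yields the iteratee's result.
  result <- E.run_ (EB.enumFile "/dev/sda" E.$$ searchForPattern sig)
  print result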