
On Thursday 14 April 2011 12:14:59, Sean Perry wrote:
> The idea is to walk the disk looking for a signature, say NTFS or EXT. Since we do not know where the block containing this identifier is, we read the blocks in one at a time.
It may be more efficient to read the entire thing into a lazy ByteString instead of reading in very small strict chunks. That also makes the eof check unnecessary.
> Long term I would like to support command line arguments for the name of the file and the offset to start looking.
> Comments on style, idioms, etc. welcomed. I am particularly interested in the hIsEOF check and if there is a better way to handle that.
$ cabal install stringsearch

-- faster searches for a pattern than provided by bytestring
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Search as L

blockSize = 512

main = do
    stuff <- L.readFile "blocks"
    case L.indices (BC.pack "PART") stuff of
        [] -> putStrLn "Not Found."
        (i:_) -> putStrLn $ "Found at " ++ show (i `quot` blockSize) ++ "."
    putStrLn "Done."
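For the long-term goal of taking the file name and the start offset from the command line, something along these lines might do as a starting point. It is only a sketch: parseArgs, firstIndex and the usage string are made-up names, the offset is assumed to be given in bytes, and firstIndex is a naive stand-in (bytestring only) for L.indices from stringsearch so that the example has no extra dependencies.

```haskell
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Lazy.Char8 as LC
import Data.Int (Int64)
import Data.List (findIndex)
import System.Environment (getArgs)
import Text.Read (readMaybe)

blockSize :: Int64
blockSize = 512

-- Hypothetical helper: accept "FILE" or "FILE OFFSET" (offset in bytes).
parseArgs :: [String] -> Either String (FilePath, Int64)
parseArgs [file] = Right (file, 0)
parseArgs [file, off] =
    case readMaybe off of
        Just n | n >= 0 -> Right (file, n)
        _               -> Left ("bad offset: " ++ off)
parseArgs _ = Left "usage: findsig FILE [OFFSET]"

-- Naive stand-in for L.indices: byte index of the first match, if any.
firstIndex :: L.ByteString -> L.ByteString -> Maybe Int64
firstIndex pat = fmap fromIntegral . findIndex (pat `L.isPrefixOf`) . L.tails

main :: IO ()
main = do
    args <- getArgs
    case parseArgs args of
        Left err -> putStrLn err
        Right (file, off) -> do
            stuff <- L.readFile file
            -- L.drop skips the first off bytes; add off back to report
            -- the block number relative to the start of the file.
            case firstIndex (LC.pack "PART") (L.drop off stuff) of
                Nothing -> putStrLn "Not Found."
                Just i  -> putStrLn $ "Found at " ++ show ((off + i) `quot` blockSize) ++ "."
            putStrLn "Done."
```

Keeping the argument handling in a pure function means it can be tried out in ghci without touching a file. For large images the stringsearch version of the search is the better choice; the naive one is quadratic in the worst case.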
> import qualified Data.ByteString as B
> import qualified Data.ByteString.Char8 as BC
> import IO
> import System.IO
>
> chunkSize = 512
>
> searchForPattern handle pat = searchForPattern' 0 handle pat
>
> searchForPattern' index handle pat = do
>     eof <- hIsEOF handle
>     if eof
>         then return Nothing
>         else do
>             bytes <- B.hGet handle chunkSize
>             case BC.breakSubstring pat bytes of
>                 (x, y) | BC.null y -> searchForPattern' (index + 1) handle pat
>                        | otherwise -> return (Just index)
>
> main = do
>     fromHandle <- openBinaryFile "blocks" ReadMode
>     result <- searchForPattern fromHandle (BC.pack "PART")
>     case result of
>         Nothing -> putStr "Not Found.\n"
>         Just n -> putStr $ "Found at " ++ show n ++ ".\n"
>     hClose fromHandle
>     putStr "Done.\n"
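
On the hIsEOF question, if you want to keep the strict chunk-at-a-time loop: B.hGet returns an empty ByteString once end of file has been reached, so the separate hIsEOF test can be replaced by a B.null check on what was read. A possible rewrite, as a sketch only (the local name go and the use of withBinaryFile and B.isInfixOf are my choices, not something the original code requires):

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import System.IO

chunkSize :: Int
chunkSize = 512

-- Index of the first chunk that contains pat, or Nothing at end of file.
searchForPattern :: Handle -> B.ByteString -> IO (Maybe Int)
searchForPattern handle pat = go 0
  where
    go index = do
        bytes <- B.hGet handle chunkSize
        if B.null bytes
            then return Nothing            -- nothing left to read
            else if pat `B.isInfixOf` bytes
                then return (Just index)
                else go (index + 1)

main :: IO ()
main = do
    -- withBinaryFile closes the handle even if the search throws.
    result <- withBinaryFile "blocks" ReadMode $ \h ->
        searchForPattern h (BC.pack "PART")
    putStrLn $ maybe "Not Found." (\n -> "Found at " ++ show n ++ ".") result
    putStrLn "Done."
```

Like the original, this only finds a pattern that lies entirely within one chunk.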